Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev

hotfix(blocks): bump stagehand ^0.5.1 → ^3.4.0 to fix yanked litellm (#12539)
## Summary

**Critical CI fix**: litellm was compromised in a supply chain attack (versions 1.82.7/1.82.8 contained infostealer malware), and PyPI subsequently yanked many litellm versions, including the 1.7x range that stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all PRs.

- Bump `stagehand` from `^0.5.1` to `^3.4.0`. Stagehand v3 is a Stainless-generated HTTP API client that **no longer depends on litellm**, which removes litellm from our dependency tree entirely
- Migrate stagehand blocks to use `AsyncStagehand` + the session-based API (`sessions.start`, `session.navigate/act/observe/extract`)
- Net reduction of ~430 lines in `poetry.lock` from dropping litellm and its transitive dependencies

## Why

All CI pipelines are blocked because `poetry lock` fails to resolve the yanked litellm versions that stagehand 0.5.x required.

## Test plan

- [x] CI passes (poetry lock resolves, backend tests green)
- [ ] Verify stagehand blocks still function with the new session-based API
2026-04-08 03:00:28 -04:00 · 2026-03-24 21:18:11 +07:00 · 2026-03-24 21:17:19 +07:00 · 2026-03-24 20:27:46 +07:00 · 2026-03-24 19:16:42 +07:00 · 2026-03-24 10:59:04 +00:00
2165 changed files with 736198 additions and 2011 deletions
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -17,6 +17,14 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
 gh pr view {N}
 ```

+## Read the PR description
+
+Understand the **Why / What / How** before addressing comments — you need context to make good fixes:
+
+```bash
+gh pr view {N} --json body --jq '.body'
+```
+
 ## Fetch comments (all sources)

 ### 1. Inline review threads — GraphQL (primary source of actionable items)
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -17,6 +17,16 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
 gh pr view {N}
 ```

+## Read the PR description
+
+Before reading code, understand the **why**, **what**, and **how** from the PR description:
+
+```bash
+gh pr view {N} --json body --jq '.body'
+```
+
+Every PR should have a Why / What / How structure. If any of these are missing, note it as feedback.
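A minimal sketch of the description check, assuming the three sections appear as markdown headings such as `## Why` (the helper name `missing_pr_sections` is hypothetical). It reads the PR body on stdin and prints the names of any missing sections, so an empty result means all three are present.

```shell
# Hypothetical helper: print which of Why / What / How have no markdown
# heading (any level, case-insensitive) in the PR body read from stdin.
missing_pr_sections() {
  local body section missing=""
  body=$(cat)
  for section in Why What How; do
    # "([^[:alnum:]]|$)" stops "How" from matching a heading like "## However"
    if ! printf '%s\n' "$body" | grep -qiE "^#+[[:space:]]*${section}([^[:alnum:]]|$)"; then
      missing="${missing:+$missing }$section"
    fi
  done
  printf '%s\n' "$missing"
}
```

For example: `gh pr view {N} --json body --jq '.body' | missing_pr_sections`. Descriptions that use bold labels instead of headings would need a looser pattern.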
+
 ## Read the diff

 ```bash
@@ -34,6 +44,8 @@ gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews

 ## What to check

+**Description quality:** Does the PR description cover Why (motivation/problem), What (summary of changes), and How (approach/implementation details)? If any are missing, request them — you can't judge the approach without understanding the problem and intent.
+
 **Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).

 **Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -0,0 +1,754 @@
+---
+name: pr-test
+description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
+user-invocable: true
+argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
+metadata:
+  author: autogpt-team
+  version: "2.0.0"
+---
+
+# Manual E2E Test
+
+Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
+
+## Critical Requirements
+
+These are NON-NEGOTIABLE. Every test run MUST satisfy ALL the following:
+
+### 1. Screenshots at Every Step
+- Take a screenshot at EVERY significant test step — not just at the end
+- Every test scenario MUST have at least one BEFORE and one AFTER screenshot
+- Name screenshots sequentially: `{NN}-{action}-{state}.png` (e.g., `01-credits-before.png`, `02-credits-after.png`)
+- If a screenshot is missing for a scenario, the test is INCOMPLETE — go back and take it
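The `{NN}-{action}-{state}.png` convention can be enforced with a small helper so numbering never collides or skips. A minimal sketch; `next_screenshot` is a hypothetical name, and it assumes every screenshot in the results directory starts with a two-digit index.

```shell
# Hypothetical helper: print the next sequential screenshot path, e.g.
#   next_screenshot "$RESULTS_DIR" credits before  ->  .../03-credits-before.png
# when 01-*.png and 02-*.png already exist.
next_screenshot() {
  local dir=$1 action=$2 state=$3 last=0 f n
  for f in "$dir"/[0-9][0-9]-*.png; do
    [ -e "$f" ] || continue   # the glob matched nothing
    n=${f##*/}                # strip the directory
    n=${n%%-*}                # keep the two-digit prefix
    n=${n#0}                  # drop a leading zero so $((...)) never sees "08"/"09" as octal
    [ "$n" -gt "$last" ] && last=$n
  done
  printf '%s/%02d-%s-%s.png\n' "$dir" $((last + 1)) "$action" "$state"
}
```

Usage: `agent-browser --session-name pr-test screenshot "$(next_screenshot "$RESULTS_DIR" credits before)"`.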
+
+### 2. Screenshots MUST Be Posted to PR
+- Push ALL screenshots to a temp branch `test-screenshots/pr-{N}`
+- Post a PR comment with ALL screenshots embedded inline using GitHub raw URLs
+- This is NOT optional — every test run MUST end with a PR comment containing screenshots
+- If screenshot upload fails, retry. If it still fails, list failed files and require manual drag-and-drop/paste attachment in the PR comment
+
+### 3. State Verification with Before/After Evidence
+- For EVERY state-changing operation (API call, user action), capture the state BEFORE and AFTER
+- Log the actual API response values (e.g., `credits_before=100, credits_after=95`)
+- Screenshot MUST show the relevant UI state change
+- Compare expected vs actual values explicitly — do not just eyeball it
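To make the expected-vs-actual comparison explicit and hard to skip, a small assertion helper can log both values and return non-zero on mismatch. `expect_eq` is a hypothetical name, not an existing tool in the repo.

```shell
# Hypothetical helper: log expected vs actual and fail loudly on mismatch.
# Usage: expect_eq LABEL EXPECTED ACTUAL
expect_eq() {
  if [ "$2" = "$3" ]; then
    printf 'PASS %s: expected=%s actual=%s\n' "$1" "$2" "$3"
  else
    printf 'FAIL %s: expected=%s actual=%s\n' "$1" "$2" "$3" >&2
    return 1
  fi
}
```

For example, `expect_eq credits_after 95 "$AFTER"` after an action assumed to cost 5 of 100 credits. Values are compared as strings, so normalize numbers (e.g. through `jq`) before comparing.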
+
+### 4. Negative Test Cases Are Mandatory
+- Test at least ONE negative case per feature (e.g., insufficient credits, invalid input, unauthorized access)
+- Verify error messages are user-friendly and accurate
+- Verify the system state did NOT change after a rejected operation
+
+### 5. Test Report Must Include Full Evidence
+Each test scenario in the report MUST have:
+- **Steps**: What was done (exact commands or UI actions)
+- **Expected**: What should happen
+- **Actual**: What actually happened
+- **API Evidence**: Before/after API response values for state-changing operations
+- **Screenshot Evidence**: Before/after screenshots with explanations
+
+## State Manipulation for Realistic Testing
+
+When testing features that depend on specific states (rate limits, credits, quotas):
+
+1. **Use Redis CLI to set counters directly:**
+   ```bash
+   # Find the Redis container
+   REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
+   # Set a key with expiry
+   docker exec $REDIS_CONTAINER redis-cli SET key value EX ttl
+   # Example: Set rate limit counter to near-limit
+   docker exec $REDIS_CONTAINER redis-cli SET "rate_limit:user:test@test.com" 99 EX 3600
+   # Example: Check current value
+   docker exec $REDIS_CONTAINER redis-cli GET "rate_limit:user:test@test.com"
+   ```
+
+2. **Use API calls to check before/after state:**
+   ```bash
+   # BEFORE: Record current state
+   BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
+   echo "Credits BEFORE: $BEFORE"
+
+   # Perform the action...
+
+   # AFTER: Record new state and compare
+   AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
+   echo "Credits AFTER: $AFTER"
+   echo "Delta: $(( BEFORE - AFTER ))"
+   ```
+
+3. **Take screenshots BEFORE and AFTER state changes** — the UI must reflect the backend state change
+
+4. **Never rely on mocked/injected browser state** — always use real backend state. Do NOT use `agent-browser eval` to fake UI state. The backend must be the source of truth.
+
+5. **Use direct DB queries when needed:**
+   ```bash
+   # Query via Supabase's PostgREST or docker exec into the DB
+   docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"
+   ```
+
+6. **After every API test, verify the state change actually persisted:**
+   ```bash
+   # Example: After a credits purchase, verify DB matches API
+   API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
+   DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
+   [ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"
+   ```
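One pitfall in the delta calculation in item 2: if the request fails or the field is missing, `jq '.credits'` prints `null`, and Bash arithmetic silently treats `null` as an unset variable (0), producing a bogus delta; a decimal value makes `$(( ))` error out instead. A minimal guard, assuming credits are plain numbers (`credit_delta` is a hypothetical name), validates both values and does the subtraction in `awk`:

```shell
# Hypothetical guard: print BEFORE - AFTER only if both look like numbers,
# otherwise report the raw values on stderr and fail.
credit_delta() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    num = "^-?[0-9]+(\\.[0-9]+)?$"
    if (before !~ num || after !~ num) {
      printf "ERROR: non-numeric credits (before=%s, after=%s)\n", before, after > "/dev/stderr"
      exit 1
    }
    print before - after
  }'
}
```

Usage: `DELTA=$(credit_delta "$BEFORE" "$AFTER") || echo "Could not compute delta"`.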
+
+## Arguments
+
+- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
+- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
+
+## Step 0: Resolve the target
+
+```bash
+# If argument is a PR number, find its worktree
+gh pr view {N} --json headRefName --jq '.headRefName'
+# If argument is a path, use it directly
+```
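When the argument is a PR number, `headRefName` gives a branch name rather than a path. One way to map it to a worktree, assuming the branch is already checked out in one, is to parse `git worktree list --porcelain`, which prints a `worktree <path>` line followed by a `branch refs/heads/<name>` line for each worktree. The helper name below is hypothetical.

```shell
# Hypothetical helper: read `git worktree list --porcelain` on stdin and print
# the path of the worktree that has branch $1 checked out (nothing if none).
worktree_for_branch() {
  awk -v want="refs/heads/$1" '
    /^worktree / { path = substr($0, 10) }   # "worktree " is 9 chars; keeps spaces in paths
    $1 == "branch" && $2 == want { print path; exit }
  '
}
```

For example, `git worktree list --porcelain | worktree_for_branch "$(gh pr view {N} --json headRefName --jq '.headRefName')"`; an empty result means no worktree has that branch checked out yet.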
+
+Determine:
+- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
+- `WORKTREE_PATH` — the worktree directory
+- `PLATFORM_DIR` — `$WORKTREE_PATH/autogpt_platform`
+- `BACKEND_DIR` — `$PLATFORM_DIR/backend`
+- `FRONTEND_DIR` — `$PLATFORM_DIR/frontend`
+- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
+- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
+- `RESULTS_DIR` — `$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
+
+Create the results directory:
+```bash
+PR_NUMBER=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
+# Truncate to 50 chars BEFORE trimming dashes, so the cut cannot leave a trailing "-"
+PR_TITLE=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | head -c 50 | sed 's/^-//;s/-$//')
+RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
+mkdir -p "$RESULTS_DIR"
+```
+
+**Test user credentials** (for logging into the UI or verifying results manually):
+- Email: `test@test.com`
+- Password: `testtest123`
+
+## Step 1: Understand the PR
+
+Before testing, understand what changed:
+
+```bash
+cd $WORKTREE_PATH
+
+# Read PR description to understand the WHY
+gh pr view {N} --json body --jq '.body'
+
+git log --oneline dev..HEAD | head -20
+git diff dev --stat
+```
+
+Read the PR description (Why / What / How) and changed files to understand:
+1. **Why** does this PR exist? What problem does it solve?
+2. **What** feature/fix does this PR implement?
+3. **How** does it work? What's the approach?
+4. What components are affected? (backend, frontend, copilot, executor, etc.)
+5. What are the key user-facing behaviors to test?
+
+## Step 2: Write test scenarios
+
+Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
+
+```markdown
+# Test Plan: PR #{N} — {title}
+
+## Scenarios
+1. [Scenario name] — [what to verify]
+2. ...
+
+## API Tests (if applicable)
+1. [Endpoint] — [expected behavior]
+   - Before state: [what to check before]
+   - After state: [what to verify changed]
+
+## UI Tests (if applicable)
+1. [Page/component] — [interaction to test]
+   - Screenshot before: [what to capture]
+   - Screenshot after: [what to capture]
+
+## Negative Tests (REQUIRED — at least one per feature)
+1. [What should NOT happen] — [how to trigger it]
+   - Expected error: [what error message/code]
+   - State unchanged: [what to verify did NOT change]
+```
+
+**Be critical** — include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.
+
+## Step 3: Environment setup
+
+### 3a. Copy .env files from the root worktree
+
+The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
+
+```bash
+# CRITICAL: .env files are NOT checked into git. They must be copied manually.
+cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
+cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
+cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
+```
+
+### 3b. Configure copilot authentication
+
+The copilot needs an LLM API to function. Two approaches (try subscription first):
+
+#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
+
+The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
+
+Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
+
+```bash
+# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
+bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
+```
+
+**How it works:** The script reads the OAuth token from:
+- **macOS**: system keychain (`"Claude Code-credentials"`)
+- **Linux/WSL**: `~/.claude/.credentials.json`
+- **Windows**: `%APPDATA%/claude/.credentials.json`
+
+It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
+
+**Note:** The OAuth token expires (~24h). If copilot returns auth errors, re-run the script and recreate the copilot executor: `bash "$BACKEND_DIR/scripts/refresh_claude_token.sh" --env-file "$BACKEND_DIR/.env" && cd "$PLATFORM_DIR" && docker compose up -d copilot_executor`
+
+#### Option 2: OpenRouter API key mode (fallback)
+
+If subscription mode doesn't work, switch to API key mode using OpenRouter:
+
+```bash
+# In $BACKEND_DIR/.env, ensure these are set:
+CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
+CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
+CHAT_BASE_URL=https://openrouter.ai/api/v1
+CHAT_USE_CLAUDE_AGENT_SDK=true
+```
+
+Use `perl` to update these values in place:
+```bash
+ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2-)
+[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
+perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
+# Add or update CHAT_API_KEY and CHAT_BASE_URL
+grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
+grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
+```
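The grep/perl chain above interpolates the key into a Perl replacement, so a value containing `|`, `$` or `@` would be mangled, and the `A && B || C` form appends a duplicate line if the Perl edit itself fails. A more defensive sketch (hypothetical helper `set_env_var`) updates or appends a `KEY=VALUE` line with `awk`, reading both strings from the environment so no character in them is special:

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing the first
# existing "KEY=..." line (and dropping duplicates) or appending one.
set_env_var() {
  local key=$1 value=$2 file=$3 tmp rc
  tmp=$(mktemp) || return 1
  # ENVIRON avoids awk -v escape processing and regex metacharacters.
  KEY="$key" VALUE="$value" awk '
    index($0, ENVIRON["KEY"] "=") == 1 {
      if (!seen) print ENVIRON["KEY"] "=" ENVIRON["VALUE"]
      seen = 1
      next
    }
    { print }
    END { if (!seen) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage: `set_env_var CHAT_USE_CLAUDE_CODE_SUBSCRIPTION false "$BACKEND_DIR/.env"` and `set_env_var CHAT_API_KEY "$ORKEY" "$BACKEND_DIR/.env"`.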
+
+### 3c. Stop conflicting containers
+
+```bash
+# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
+docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
+  docker stop "$name" 2>/dev/null
+done
+```
+
+### 3e. Build and start
+
+```bash
+cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
+if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
+
+cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
+if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
+```
+
+**Note:** The build above uses `--no-cache` because Docker BuildKit may reuse cached `COPY` layers from a previous build on a different branch, leaving the container running old code (e.g. missing the PR's changes). If you drop the flag to speed up a rebuild and the PR changes don't show up, rebuild with `--no-cache`.
+
+**Expected time:** 3-8 minutes for a cached build, 5-10 minutes with `--no-cache`.
+
+### 3f. Wait for services to be ready
+
+```bash
+# Poll until backend and frontend respond
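+READY=false
+for i in $(seq 1 60); do
+  BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
+  FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
+  if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
+    echo "Services ready"
+    READY=true
+    break
+  fi
+  sleep 5
+done
+# 60 attempts x 5s = 5 minutes; fail loudly instead of testing against a dead stack
+[ "$READY" = true ] || { echo "ERROR: services not ready after 5 minutes (backend=$BACKEND, frontend=$FRONTEND)"; exit 1; }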
+```
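The wait loop above, the per-service wait in Fix mode, and the blob-upload retries in Step 7 are all the same poll-until-success pattern. A minimal generic version, assuming the check is passed as a command (`retry_until` and `http_ok` are hypothetical names):

```shell
# Hypothetical helper: run "$@" up to TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
  local tries=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "ERROR: gave up after $tries attempts: $*" >&2
  return 1
}

# Example check: succeed only when the URL answers with HTTP 200.
http_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}
```

Usage: `retry_until 60 5 http_ok http://localhost:8006/docs && retry_until 60 5 http_ok http://localhost:3000`.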
+
+
+### 3h. Create test user and get auth token
+
+```bash
+ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
+
+# Signup (idempotent — returns "User already registered" if exists)
+RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
+  -H "apikey: $ANON_KEY" \
+  -H 'Content-Type: application/json' \
+  -d '{"email":"test@test.com","password":"testtest123"}')
+
+# If "Database error finding user", restart supabase-auth and retry
+if echo "$RESULT" | grep -q "Database error"; then
+  docker restart supabase-auth && sleep 5
+  curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
+    -H "apikey: $ANON_KEY" \
+    -H 'Content-Type: application/json' \
+    -d '{"email":"test@test.com","password":"testtest123"}'
+fi
+
+# Get auth token
+TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
+  -H "apikey: $ANON_KEY" \
+  -H 'Content-Type: application/json' \
+  -d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
+[ -n "$TOKEN" ] || { echo "ERROR: failed to get an auth token (did signup succeed?)"; exit 1; }
+```
+
+**Use this token for ALL API calls:**
+```bash
+curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
+```
+
+## Step 4: Run tests
+
+### Service ports reference
+
+| Service | Port | URL |
+|---------|------|-----|
+| Frontend | 3000 | http://localhost:3000 |
+| Backend REST | 8006 | http://localhost:8006 |
+| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
+| Executor | 8002 | http://localhost:8002 |
+| Copilot Executor | 8008 | http://localhost:8008 |
+| WebSocket | 8001 | http://localhost:8001 |
+| Database Manager | 8005 | http://localhost:8005 |
+| Redis | 6379 | localhost:6379 |
+| RabbitMQ | 5672 | localhost:5672 |
+
+### API testing
+
+Use `curl` with the auth token for backend API tests. **For EVERY API call that changes state, record before/after values:**
+
+```bash
+# Example: List agents
+curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
+
+# Example: Create an agent
+curl -s -X POST http://localhost:8006/api/graphs \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{...}' | jq .
+
+# Example: Run an agent
+curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"data": {...}}'
+
+# Example: Get execution results
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
+```
+
+**State verification pattern (use for EVERY state-changing API call):**
+```bash
+# 1. Record BEFORE state
+BEFORE_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
+echo "BEFORE: $BEFORE_STATE"
+
+# 2. Perform the action
+ACTION_RESULT=$(curl -s -X POST ... | jq .)
+echo "ACTION RESULT: $ACTION_RESULT"
+
+# 3. Record AFTER state
+AFTER_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
+echo "AFTER: $AFTER_STATE"
+
+# 4. Log the comparison
+echo "=== STATE CHANGE VERIFICATION ==="
+echo "Before: $BEFORE_STATE"
+echo "After: $AFTER_STATE"
+echo "Expected change: {describe what should have changed}"
+```
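For negative tests the key assertion is the opposite one: the state must be identical before and after a rejected call. A minimal sketch (hypothetical `assert_unchanged`) compares two snapshots such as `$BEFORE_STATE` and `$AFTER_STATE` and prints a unified diff when they differ:

```shell
# Hypothetical helper: succeed if two state snapshots are identical;
# otherwise print a unified diff to stderr and fail.
assert_unchanged() {
  local before after rc
  before=$(mktemp) && after=$(mktemp) || return 1
  printf '%s\n' "$1" > "$before"
  printf '%s\n' "$2" > "$after"
  diff -u "$before" "$after" >&2
  rc=$?
  rm -f "$before" "$after"
  if [ "$rc" -eq 0 ]; then
    echo "UNCHANGED (as expected)"
  else
    echo "FAIL: state changed after an operation that should have been rejected" >&2
  fi
  return "$rc"
}
```

Since the comparison is textual, canonicalize JSON snapshots first (e.g. with `jq -S .`, which sorts object keys) so key order alone doesn't produce a spurious diff.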
+
+### Browser testing with agent-browser
+
+```bash
+# Close any existing session
+agent-browser close 2>/dev/null || true
+
+# Use --session-name to persist cookies across navigations
+# This means login only needs to happen once per test session
+agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
+
+# Get interactive elements
+agent-browser --session-name pr-test snapshot | grep "textbox\|button"
+
+# Login
+agent-browser --session-name pr-test fill {email_ref} "test@test.com"
+agent-browser --session-name pr-test fill {password_ref} "testtest123"
+agent-browser --session-name pr-test click {login_button_ref}
+sleep 5
+
+# Dismiss cookie banner if present
+agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
+
+# Navigate — cookies are preserved so login persists
+agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
+
+# Take screenshot
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
+
+# Interact with elements
+agent-browser --session-name pr-test fill {ref} "text"
+agent-browser --session-name pr-test press "Enter"
+agent-browser --session-name pr-test click {ref}
+agent-browser --session-name pr-test click 'text=Button Text'
+
+# Read page content
+agent-browser --session-name pr-test snapshot | grep "text:"
+```
+
+**Key pages:**
+- `/copilot` — CoPilot chat (for testing copilot features)
+- `/build` — Agent builder (for testing block/node features)
+- `/build?flowID={id}` — Specific agent in builder
+- `/library` — Agent library (for testing listing/import features)
+- `/library/agents/{id}` — Agent detail with run history
+- `/marketplace` — Marketplace
+
+### Checking logs
+
+```bash
+# Backend REST server
+docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
+
+# Executor (runs agent graphs)
+docker logs autogpt_platform-executor-1 2>&1 | tail -30
+
+# Copilot executor (runs copilot chat sessions)
+docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
+
+# Frontend
+docker logs autogpt_platform-frontend-1 2>&1 | tail -30
+
+# Filter for errors
+docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
+```
+
+### Copilot chat testing
+
+The copilot uses SSE streaming. To test via API:
+
+```bash
+# Create a session
+SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{}' | jq -r '.id // .session_id // ""')
+
+# Stream a message (SSE - will stream chunks)
+curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"message": "Hello, what can you help me with?"}' \
+  --max-time 60 2>/dev/null | head -50
+```
+
+Or test via browser (preferred for UI verification):
+```bash
+agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
+# ... fill chat input and press Enter, wait 20-30s for response
+```
+
+## Step 5: Record results and take screenshots
+
+**Take a screenshot at EVERY significant test step** — before and after interactions, on success, and on failure. This is NON-NEGOTIABLE.
+
+**Required screenshot pattern for each test scenario:**
+```bash
+# BEFORE the action
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-before.png
+
+# Perform the action...
+
+# AFTER the action
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-after.png
+```
+
+**Naming convention:**
+```bash
+# Examples:
+# $RESULTS_DIR/01-login-page-before.png
+# $RESULTS_DIR/02-login-page-after.png
+# $RESULTS_DIR/03-credits-page-before.png
+# $RESULTS_DIR/04-credits-purchase-after.png
+# $RESULTS_DIR/05-negative-insufficient-credits.png
+# $RESULTS_DIR/06-error-state.png
+```
+
+**Minimum requirements:**
+- At least TWO screenshots per test scenario (before + after)
+- At least ONE screenshot for each negative test case showing the error state
+- If a test fails, screenshot the failure state AND any error logs visible in the UI
+
+## Step 6: Show results to user with screenshots
+
+**CRITICAL: After all tests complete, you MUST show every screenshot to the user using the Read tool, with an explanation of what each screenshot shows.** This is the most important part of the test report — the user needs to visually verify the results.
+
+For each screenshot:
+1. Use the `Read` tool to display the PNG file (Claude can read images)
+2. Write a 1-2 sentence explanation below it describing:
+   - What page/state is being shown
+   - What the screenshot proves (which test scenario it validates)
+   - Any notable details visible in the UI
+
+Format the output like this:
+
+```markdown
+### Screenshot 1: {descriptive title}
+[Read the PNG file here]
+
+**What it shows:** {1-2 sentence explanation of what this screenshot proves}
+
+---
+```
+
+After showing all screenshots, output a **detailed** summary table:
+
+| # | Scenario | Result | API Evidence | Screenshot Evidence |
+|---|----------|--------|-------------|-------------------|
+| 1 | {name} | PASS/FAIL | Before: X, After: Y | 01-before.png, 02-after.png |
+| 2 | ... | ... | ... | ... |
+
+**IMPORTANT:** As you show each screenshot and record test results, persist them in shell variables for Step 7:
+
+```bash
+# Build these variables during Step 6 — they are required by Step 7's script
+# NOTE: declare -A requires Bash 4.0+. macOS ships Bash 3.2 as /bin/bash (zsh is the
+# default login shell), so install Bash 5.x via Homebrew; Linux typically has Bash 5.x.
+# If you are stuck on Bash <4, use a plain variable with a lookup function instead.
+declare -A SCREENSHOT_EXPLANATIONS=(
+  ["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
+  ["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
+  # ... one entry per screenshot, using the same explanations you showed the user above
+)
+
+TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
+| 2 | Credits purchase | PASS | Before: 100, After: 95 | 03-credits-before.png, 04-credits-after.png |
+| 3 | Insufficient credits (negative) | PASS | Credits: 0, rejected | 05-insufficient-credits-error.png |"
+# ... one row per test scenario with actual results
+```
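For the Bash <4 fallback, one option (a sketch; the variable and function names are hypothetical) keeps the explanations as `file|explanation` lines and looks them up with `read`:

```shell
# Bash 3.2-compatible alternative to `declare -A`: one "file|explanation"
# entry per line (file names must not contain "|").
SCREENSHOT_EXPLANATIONS_LIST='01-login-page.png|Shows the login page loaded successfully with SSO options visible.
02-builder-with-block.png|The builder canvas displays the newly added block connected to the trigger.'

# Print the explanation for screenshot $1; return 1 if there is no entry.
screenshot_explanation() {
  local name text
  while IFS='|' read -r name text; do
    if [ "$name" = "$1" ]; then
      printf '%s\n' "$text"
      return 0
    fi
  done <<EOF
$SCREENSHOT_EXPLANATIONS_LIST
EOF
  return 1
}
```

In Step 7 this would replace `EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"` with `EXPLANATION=$(screenshot_explanation "$BASENAME")`; the existing empty-explanation check still applies.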
+
+## Step 7: Post test report as PR comment with screenshots
+
+Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees), then post a comment with inline images and per-screenshot explanations.
+
+**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
+
+```bash
+# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
+REPO="Significant-Gravitas/AutoGPT"
+SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
+SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
+
+# Step 1: Create blobs for each screenshot and build tree JSON
+# Retry each blob upload up to 3 times. If still failing, list them at end of report.
+shopt -s nullglob
+SCREENSHOT_FILES=("$RESULTS_DIR"/*.png)
+if [ ${#SCREENSHOT_FILES[@]} -eq 0 ]; then
+  echo "ERROR: No screenshots found in $RESULTS_DIR. Test run is incomplete."
+  exit 1
+fi
+TREE_JSON='['
+FIRST=true
+FAILED_UPLOADS=()
+for img in "${SCREENSHOT_FILES[@]}"; do
+  BASENAME=$(basename "$img")
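+  # Strip line wraps (GNU base64 wraps at 76 chars) so the value is a valid JSON string
+  B64=$(base64 < "$img" | tr -d '\n')
+  BLOB_SHA=""
+  for attempt in 1 2 3; do
+    # Send the payload on stdin: a large screenshot passed as a single -f argument
+    # can exceed Linux's 128 KiB per-argument limit and fail on every retry
+    BLOB_SHA=$(printf '{"content":"%s","encoding":"base64"}' "$B64" | gh api "repos/${REPO}/git/blobs" --input - --jq '.sha' 2>/dev/null || true)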
+    [ -n "$BLOB_SHA" ] && break
+    sleep 1
+  done
+  if [ -z "$BLOB_SHA" ]; then
+    FAILED_UPLOADS+=("$img")
+    continue
+  fi
+  if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
+  TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
+done
+TREE_JSON+=']'
+
+# Step 2: Create tree, commit, and branch ref
+TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
+COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+  -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+  -f tree="$TREE_SHA" \
+  --jq '.sha')
+gh api "repos/${REPO}/git/refs" \
+  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
+  -f sha="$COMMIT_SHA" 2>/dev/null \
+  || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
+    -X PATCH -f sha="$COMMIT_SHA" -F force=true
+```
+
+Then post the comment with **inline images AND explanations for each screenshot**:
+
+```bash
+REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
+
+# Build image markdown using uploaded image URLs; skip FAILED_UPLOADS (listed separately)
+
+IMAGE_MARKDOWN=""
+for img in "${SCREENSHOT_FILES[@]}"; do
+  BASENAME=$(basename "$img")
+  TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
+  # Skip images that failed to upload — they will be listed at the end
+  IS_FAILED=false
+  for failed in "${FAILED_UPLOADS[@]}"; do
+    [ "$(basename "$failed")" = "$BASENAME" ] && IS_FAILED=true && break
+  done
+  if [ "$IS_FAILED" = true ]; then
+    continue
+  fi
+  EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
+  if [ -z "$EXPLANATION" ]; then
+    echo "ERROR: Missing screenshot explanation for $BASENAME. Add it to SCREENSHOT_EXPLANATIONS in Step 6."
+    exit 1
+  fi
+  IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
+### ${TITLE}
+![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
+${EXPLANATION}
+"
+done
+
+# Write comment body to file to avoid shell interpretation issues with special characters
+COMMENT_FILE=$(mktemp)
+# If any uploads failed, append a section listing them with instructions
+FAILED_SECTION=""
+if [ ${#FAILED_UPLOADS[@]} -gt 0 ]; then
+  FAILED_SECTION="
+## ⚠️ Failed Screenshot Uploads
+The following screenshots could not be uploaded via the GitHub API after 3 retries.
+**To add them:** drag-and-drop or paste these files into a PR comment manually:
+"
+  for failed in "${FAILED_UPLOADS[@]}"; do
+    FAILED_SECTION="${FAILED_SECTION}
+- \`$(basename "$failed")\` (local path: \`$failed\`)"
+  done
+  FAILED_SECTION="${FAILED_SECTION}
+
+**Run status:** INCOMPLETE until the files above are manually attached and visible inline in the PR."
+fi
+
+cat > "$COMMENT_FILE" <<INNEREOF
+## E2E Test Report
+
+| # | Scenario | Result | API Evidence | Screenshot Evidence |
+|---|----------|--------|-------------|-------------------|
+${TEST_RESULTS_TABLE}
+
+${IMAGE_MARKDOWN}
+${FAILED_SECTION}
+INNEREOF
+
+gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
+rm -f "$COMMENT_FILE"
+```
+
+**The PR comment MUST include:**
+1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
+2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
+3. A 1-2 sentence explanation below each screenshot describing what it proves
+
+This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
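`TEST_RESULTS_TABLE` is pasted into the comment verbatim, so a row with a missing or extra cell breaks the table rendering. A quick pre-flight check to run before building the comment, assuming cells never contain a literal `|` (the helper name is hypothetical):

```shell
# Hypothetical check: every non-empty row must have exactly 6 pipes,
# i.e. the 5 columns of the report table. Bad rows are reported on stderr.
check_results_table() {
  printf '%s\n' "$1" | awk -F'|' '
    NF == 0 { next }
    NF - 1 != 6 {
      printf "Malformed row (%d pipes, expected 6): %s\n", NF - 1, $0 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  '
}
```

Usage: `check_results_table "$TEST_RESULTS_TABLE" || exit 1` before writing `$COMMENT_FILE`.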
+
+## Fix mode (--fix flag)
+
+When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
+
+### Fix protocol for EVERY issue found (including UX issues):
+
+1. **Identify** the root cause in the code — read the relevant source files
+2. **Write a failing test first** (TDD): For backend bugs, write a test marked with `pytest.mark.xfail(reason="...")`. For frontend/Playwright bugs, write a test with `.fixme` annotation. Run it to confirm it fails as expected.
+3. **Screenshot** the broken state: `agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-broken-{description}.png`
+4. **Fix** the code in the worktree
+5. **Rebuild** ONLY the affected service (not the whole stack):
+   ```bash
+   cd $PLATFORM_DIR && docker compose up --build -d {service_name}
+   # e.g., docker compose up --build -d rest_server
+   # e.g., docker compose up --build -d frontend
+   ```
+6. **Wait** for the service to be ready (poll health endpoint)
+7. **Re-test** the same scenario
+8. **Screenshot** the fixed state: `agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-fixed-{description}.png`
+9. **Remove the xfail/fixme marker** from the test written in step 2, and verify it passes
+10. **Verify** the fix did not break other scenarios (run a quick smoke test)
+11. **Commit and push** immediately:
+   ```bash
+   cd $WORKTREE_PATH
+   git add -A
+   git commit -m "fix: {description of fix}"
+   git push
+   ```
+12. **Continue** to the next test scenario
+
+### Fix loop (like pr-address)
+
+```text
+test scenario → find issue (bug OR UX problem) → screenshot broken state
+→ fix code → rebuild affected service only → re-test → screenshot fixed state
+→ verify no regressions → commit + push
+→ repeat for next scenario
+→ after ALL scenarios pass, run full re-test to verify everything together
+```
+
+**Key differences from non-fix mode:**
+- UX issues count as bugs — fix them (bad alignment, confusing labels, missing loading states)
+- Every fix MUST have a before/after screenshot pair proving it works
+- Commit after EACH fix, not in a batch at the end
+- The final re-test must produce a clean set of all-passing screenshots
+
+## Known issues and workarounds
+
+### Problem: "Database error finding user" on signup
+**Cause:** Supabase auth service schema cache is stale after migration.
+**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
+
+### Problem: Copilot returns auth errors in subscription mode
+**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is not set or expired.
+**Fix:** Re-extract the OAuth token from the macOS keychain (see step 3b, Option 1) and recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env var on startup. No `npm install` or `claude login` needed — the SDK bundles its own CLI binary.
+
+### Problem: agent-browser can't find chromium
+**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
+**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
+
+### Problem: agent-browser selector matches multiple elements
+**Cause:** `text=X` matches all elements containing that text.
+**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
+
+### Problem: Frontend shows cookie banner blocking interaction
+**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
+
+### Problem: Container loses npm packages after rebuild
+**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
+**Fix:** Add packages to the Dockerfile instead of installing at runtime.
+
+### Problem: Services not starting after `docker compose up`
+**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
+
+### Problem: Docker uses cached layers with old code (PR changes not visible)
+**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
+**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
+
+### Problem: `agent-browser open` loses login session
+**Cause:** Without session persistence, `agent-browser open` starts fresh.
+**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
+
+### Problem: Supabase auth returns "Database error querying schema"
+**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
+**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.
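The pr-test skill publishes screenshots entirely server-side: it creates blobs, then a tree, a commit and a ref. A rough sketch of that flow follows, under stated assumptions; it is not the skill's actual script. `publish_files` and the injected `call(method, path, payload)` transport are hypothetical, and the sketch assumes screenshots go to a brand-new branch that holds only those files. The endpoint paths and payload fields follow GitHub's Git Database REST API.

```python
import base64
from typing import Any, Callable

# Hypothetical transport: call(method, path, payload) -> parsed JSON response.
# In the skill this could wrap `gh api`; here it is injected so it can be faked.
Call = Callable[[str, str, dict[str, Any]], dict[str, Any]]


def publish_files(
    call: Call,
    repo: str,
    branch: str,
    files: dict[str, bytes],
    message: str,
) -> str:
    """Create blobs, a tree, a root commit and a new branch ref server-side.

    `repo` is "owner/name". Returns the SHA of the new commit.
    """
    entries = []
    for path, data in files.items():
        # 1. One blob per file; binary content (PNGs) is sent base64-encoded.
        blob = call(
            "POST",
            f"repos/{repo}/git/blobs",
            {"content": base64.b64encode(data).decode("ascii"), "encoding": "base64"},
        )
        entries.append(
            {"path": path, "mode": "100644", "type": "blob", "sha": blob["sha"]}
        )

    # 2. A tree listing only these blobs (no base_tree, so nothing else is kept).
    tree = call("POST", f"repos/{repo}/git/trees", {"tree": entries})

    # 3. A commit pointing at the tree; an empty parents list makes a root commit.
    commit = call(
        "POST",
        f"repos/{repo}/git/commits",
        {"message": message, "tree": tree["sha"], "parents": []},
    )

    # 4. A new branch ref pointing at the commit. No local checkout or push.
    call(
        "POST",
        f"repos/{repo}/git/refs",
        {"ref": f"refs/heads/{branch}", "sha": commit["sha"]},
    )
    return commit["sha"]
```

Injecting `call` keeps the sketch testable without network access. Updating an existing branch would instead need `PATCH repos/{repo}/git/refs/heads/{branch}`, plus a `base_tree` and a parent commit so unrelated files survive.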
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,8 +1,12 @@
-<!-- Clearly explain the need for these changes: -->
+### Why / What / How
+
+<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
+<!-- What: What does this PR change? Summarize the changes at a high level. -->
+<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->

 ### Changes 🏗️

-<!-- Concisely describe all of the changes made in this pull request: -->
+<!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. -->

 ### Checklist 📋

--- a/autogpt_platform/CLAUDE.md
+++ b/autogpt_platform/CLAUDE.md
@@ -55,6 +55,7 @@ AutoGPT Platform is a monorepo containing:
 - Create the PR against the `dev` branch of the repository.
 - Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
 - Use conventional commit messages (see below)
+- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
 - Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
 - Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
  ```bash
--- a/autogpt_platform/autogpt_libs/poetry.lock
+++ b/autogpt_platform/autogpt_libs/poetry.lock
@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
+# This file is automatically @generated by Poetry 2.2.1 and should not be changed by hand.

 [[package]]
 name = "annotated-doc"
@@ -67,7 +67,7 @@ description = "Backport of asyncio.Runner, a context manager that controls event
 optional = false
 python-versions = "<3.11,>=3.8"
 groups = ["dev"]
-markers = "python_version < \"3.11\""
+markers = "python_version == \"3.10\""
 files = [
    {file = "backports_asyncio_runner-1.2.0-py3-none-any.whl", hash = "sha256:0da0a936a8aeb554eccb426dc55af3ba63bcdc69fa1a600b5bb305413a4477b5"},
    {file = "backports_asyncio_runner-1.2.0.tar.gz", hash = "sha256:a5aa7b2b7d8f8bfcaa2b57313f70792df84e32a2a746f585213373f900b42162"},
@@ -541,7 +541,7 @@ description = "Backport of PEP 654 (exception groups)"
 optional = false
 python-versions = ">=3.7"
 groups = ["main", "dev"]
-markers = "python_version < \"3.11\""
+markers = "python_version == \"3.10\""
 files = [
    {file = "exceptiongroup-1.3.0-py3-none-any.whl", hash = "sha256:4d111e6e0c13d0644cad6ddaa7ed0261a0b36971f6d23e7ec9b4b9097da78a10"},
    {file = "exceptiongroup-1.3.0.tar.gz", hash = "sha256:b241f5885f560bc56a59ee63ca4c6a8bfa46ae4ad651af316d4e81817bb9fd88"},
@@ -2181,14 +2181,14 @@ testing = ["coverage (>=6.2)", "hypothesis (>=5.7.1)"]

 [[package]]
 name = "pytest-cov"
-version = "7.0.0"
+version = "7.1.0"
 description = "Pytest plugin for measuring coverage."
 optional = false
 python-versions = ">=3.9"
 groups = ["dev"]
 files = [
-    {file = "pytest_cov-7.0.0-py3-none-any.whl", hash = "sha256:3b8e9558b16cc1479da72058bdecf8073661c7f57f7d3c5f22a1c23507f2d861"},
-    {file = "pytest_cov-7.0.0.tar.gz", hash = "sha256:33c97eda2e049a0c5298e91f519302a1334c26ac65c1a483d6206fd458361af1"},
+    {file = "pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678"},
+    {file = "pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2"},
 ]

 [package.dependencies]
@@ -2342,30 +2342,30 @@ pyasn1 = ">=0.1.3"

 [[package]]
 name = "ruff"
-version = "0.15.0"
+version = "0.15.7"
 description = "An extremely fast Python linter and code formatter, written in Rust."
 optional = false
 python-versions = ">=3.7"
 groups = ["dev"]
 files = [
-    {file = "ruff-0.15.0-py3-none-linux_armv6l.whl", hash = "sha256:aac4ebaa612a82b23d45964586f24ae9bc23ca101919f5590bdb368d74ad5455"},
-    {file = "ruff-0.15.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:dcd4be7cc75cfbbca24a98d04d0b9b36a270d0833241f776b788d59f4142b14d"},
-    {file = "ruff-0.15.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d747e3319b2bce179c7c1eaad3d884dc0a199b5f4d5187620530adf9105268ce"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:650bd9c56ae03102c51a5e4b554d74d825ff3abe4db22b90fd32d816c2e90621"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a6664b7eac559e3048223a2da77769c2f92b43a6dfd4720cef42654299a599c9"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6f811f97b0f092b35320d1556f3353bf238763420ade5d9e62ebd2b73f2ff179"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:761ec0a66680fab6454236635a39abaf14198818c8cdf691e036f4bc0f406b2d"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:940f11c2604d317e797b289f4f9f3fa5555ffe4fb574b55ed006c3d9b6f0eb78"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcbca3d40558789126da91d7ef9a7c87772ee107033db7191edefa34e2c7f1b4"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9a121a96db1d75fa3eb39c4539e607f628920dd72ff1f7c5ee4f1b768ac62d6e"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5298d518e493061f2eabd4abd067c7e4fb89e2f63291c94332e35631c07c3662"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:afb6e603d6375ff0d6b0cee563fa21ab570fd15e65c852cb24922cef25050cf1"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:77e515f6b15f828b94dc17d2b4ace334c9ddb7d9468c54b2f9ed2b9c1593ef16"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6f6e80850a01eb13b3e42ee0ebdf6e4497151b48c35051aab51c101266d187a3"},
-    {file = "ruff-0.15.0-py3-none-win32.whl", hash = "sha256:238a717ef803e501b6d51e0bdd0d2c6e8513fe9eec14002445134d3907cd46c3"},
-    {file = "ruff-0.15.0-py3-none-win_amd64.whl", hash = "sha256:dd5e4d3301dc01de614da3cdffc33d4b1b96fb89e45721f1598e5532ccf78b18"},
-    {file = "ruff-0.15.0-py3-none-win_arm64.whl", hash = "sha256:c480d632cc0ca3f0727acac8b7d053542d9e114a462a145d0b00e7cd658c515a"},
-    {file = "ruff-0.15.0.tar.gz", hash = "sha256:6bdea47cdbea30d40f8f8d7d69c0854ba7c15420ec75a26f463290949d7f7e9a"},
+    {file = "ruff-0.15.7-py3-none-linux_armv6l.whl", hash = "sha256:a81cc5b6910fb7dfc7c32d20652e50fa05963f6e13ead3c5915c41ac5d16668e"},
+    {file = "ruff-0.15.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:722d165bd52403f3bdabc0ce9e41fc47070ac56d7a91b4e0d097b516a53a3477"},
+    {file = "ruff-0.15.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:7fbc2448094262552146cbe1b9643a92f66559d3761f1ad0656d4991491af49e"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b39329b60eba44156d138275323cc726bbfbddcec3063da57caa8a8b1d50adf"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:87768c151808505f2bfc93ae44e5f9e7c8518943e5074f76ac21558ef5627c85"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fb0511670002c6c529ec66c0e30641c976c8963de26a113f3a30456b702468b0"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0d19644f801849229db8345180a71bee5407b429dd217f853ec515e968a6912"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4806d8e09ef5e84eb19ba833d0442f7e300b23fe3f0981cae159a248a10f0036"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dce0896488562f09a27b9c91b1f58a097457143931f3c4d519690dea54e624c5"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:1852ce241d2bc89e5dc823e03cff4ce73d816b5c6cdadd27dbfe7b03217d2a12"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5f3e4b221fb4bd293f79912fc5e93a9063ebd6d0dcbd528f91b89172a9b8436c"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:b15e48602c9c1d9bdc504b472e90b90c97dc7d46c7028011ae67f3861ceba7b4"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1b4705e0e85cedc74b0a23cf6a179dbb3df184cb227761979cc76c0440b5ab0d"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:112c1fa316a558bb34319282c1200a8bf0495f1b735aeb78bfcb2991e6087580"},
+    {file = "ruff-0.15.7-py3-none-win32.whl", hash = "sha256:6d39e2d3505b082323352f733599f28169d12e891f7dd407f2d4f54b4c2886de"},
+    {file = "ruff-0.15.7-py3-none-win_amd64.whl", hash = "sha256:4d53d712ddebcd7dace1bc395367aec12c057aacfe9adbb6d832302575f4d3a1"},
+    {file = "ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2"},
+    {file = "ruff-0.15.7.tar.gz", hash = "sha256:04f1ae61fc20fe0b148617c324d9d009b5f63412c0b16474f3d5f1a1a665f7ac"},
 ]

 [[package]]
@@ -2564,7 +2564,7 @@ description = "A lil' TOML parser"
 optional = false
 python-versions = ">=3.8"
 groups = ["dev"]
-markers = "python_version < \"3.11\""
+markers = "python_version == \"3.10\""
 files = [
    {file = "tomli-2.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678e4fa69e4575eb77d103de3df8a895e1591b48e740211bd1067378c69e8249"},
    {file = "tomli-2.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:023aa114dd824ade0100497eb2318602af309e5a55595f76b626d6d9f3b7b0a6"},
@@ -2912,4 +2912,4 @@ type = ["pytest-mypy"]
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.10,<4.0"
-content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"
+content-hash = "e0936a065565550afed18f6298b7e04e814b44100def7049f1a0d68662624a39"
--- a/autogpt_platform/autogpt_libs/pyproject.toml
+++ b/autogpt_platform/autogpt_libs/pyproject.toml
@@ -26,8 +26,8 @@ pyright = "^1.1.408"
 pytest = "^8.4.1"
 pytest-asyncio = "^1.3.0"
 pytest-mock = "^3.15.1"
-pytest-cov = "^7.0.0"
-ruff = "^0.15.0"
+pytest-cov = "^7.1.0"
+ruff = "^0.15.7"

 [build-system]
 requires = ["poetry-core"]
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -121,36 +121,20 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

-# Install agent-browser (Copilot browser tool) + Chromium.
-# On amd64: install runtime libs + run `agent-browser install` to download
-#   Chrome for Testing (pinned version, tested with Playwright).
-# On arm64: install system chromium package — Chrome for Testing has no ARM64
-#   binary. AGENT_BROWSER_EXECUTABLE_PATH is set at runtime by the entrypoint
-#   script (below) to redirect agent-browser to the system binary.
-ARG TARGETARCH
+# Install agent-browser (Copilot browser tool) using the system chromium package.
+# Chrome for Testing (the binary agent-browser downloads via `agent-browser install`)
+# has no ARM64 builds, so we use the distro-packaged chromium instead — verified to
+# work with agent-browser via Docker tests on arm64; amd64 is validated in CI.
+# Note: system chromium tracks the Debian package schedule rather than a pinned
+# Chrome for Testing release. If agent-browser requires a specific Chrome version,
+# verify compatibility against the chromium package version in the base image.
 RUN apt-get update \
-    && if [ "$TARGETARCH" = "arm64" ]; then \
-         apt-get install -y --no-install-recommends chromium fonts-liberation; \
-       else \
-         apt-get install -y --no-install-recommends \
-           libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
-           libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
-           libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
-           libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
-           fonts-liberation libfontconfig1; \
-       fi \
+    && apt-get install -y --no-install-recommends chromium fonts-liberation \
    && rm -rf /var/lib/apt/lists/* \
    && npm install -g agent-browser \
-    && ([ "$TARGETARCH" = "arm64" ] || agent-browser install) \
    && rm -rf /tmp/* /root/.npm

-# On arm64 the system chromium is at /usr/bin/chromium; set
-# AGENT_BROWSER_EXECUTABLE_PATH so agent-browser's daemon uses it instead of
-# Chrome for Testing (which has no ARM64 binary). On amd64 the variable is left
-# unset so agent-browser uses the Chrome for Testing binary it downloaded above.
-RUN printf '#!/bin/sh\n[ -x /usr/bin/chromium ] && export AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium\nexec "$@"\n' \
-    > /usr/local/bin/entrypoint.sh \
-    && chmod +x /usr/local/bin/entrypoint.sh
+ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium

 WORKDIR /app/autogpt_platform/backend

@@ -173,5 +157,4 @@ RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \

 ENV PORT=8000

-ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
 CMD ["rest"]
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
@@ -0,0 +1,93 @@
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from backend.data.graph import get_graph_as_admin
+
+# Shared constants
+ADMIN_USER_ID = "admin-user-id"
+CREATOR_USER_ID = "other-creator-id"
+GRAPH_ID = "test-graph-id"
+GRAPH_VERSION = 3
+
+
+def _make_mock_graph(user_id: str = CREATOR_USER_ID) -> MagicMock:
+    graph = MagicMock()
+    graph.userId = user_id
+    graph.id = GRAPH_ID
+    graph.version = GRAPH_VERSION
+    graph.Nodes = []
+    return graph
+
+
+@pytest.mark.asyncio
+async def test_admin_can_access_pending_agent_not_owned() -> None:
+    """Admin must be able to access a graph they don't own even if it's not
+    APPROVED in the marketplace. This is the core use case: reviewing a
+    submitted-but-pending agent from the admin dashboard."""
+    mock_graph = _make_mock_graph()
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    with (
+        patch(
+            "backend.data.graph.AgentGraph.prisma",
+        ) as mock_prisma,
+        patch(
+            "backend.data.graph.GraphModel.from_db",
+            return_value=mock_graph_model,
+        ),
+    ):
+        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
+
+        result = await get_graph_as_admin(
+            graph_id=GRAPH_ID,
+            version=GRAPH_VERSION,
+            user_id=ADMIN_USER_ID,
+            for_export=False,
+        )
+
+    assert (
+        result is not None
+    ), "Admin should be able to access a pending agent they don't own"
+    assert result is mock_graph_model
+
+
+@pytest.mark.asyncio
+async def test_admin_download_pending_agent_with_subagents() -> None:
+    """Admin export (for_export=True) of a pending agent must include
+    sub-graphs. This exercises the full export code path that the Download
+    button uses."""
+    mock_graph = _make_mock_graph()
+    mock_sub_graph = MagicMock(name="SubGraph")
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    with (
+        patch(
+            "backend.data.graph.AgentGraph.prisma",
+        ) as mock_prisma,
+        patch(
+            "backend.data.graph.get_sub_graphs",
+            new_callable=AsyncMock,
+            return_value=[mock_sub_graph],
+        ) as mock_get_sub,
+        patch(
+            "backend.data.graph.GraphModel.from_db",
+            return_value=mock_graph_model,
+        ) as mock_from_db,
+    ):
+        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
+
+        result = await get_graph_as_admin(
+            graph_id=GRAPH_ID,
+            version=GRAPH_VERSION,
+            user_id=ADMIN_USER_ID,
+            for_export=True,
+        )
+
+    assert result is not None, "Admin export of pending agent must succeed"
+    mock_get_sub.assert_awaited_once_with(mock_graph)
+    mock_from_db.assert_called_once_with(
+        graph=mock_graph,
+        sub_graphs=[mock_sub_graph],
+        for_export=True,
+    )
--- a/autogpt_platform/backend/backend/api/features/v1.py
+++ b/autogpt_platform/backend/backend/api/features/v1.py
@@ -592,6 +592,11 @@ async def fulfill_checkout(user_id: Annotated[str, Security(get_user_id)]):
 async def configure_user_auto_top_up(
    request: AutoTopUpConfig, user_id: Annotated[str, Security(get_user_id)]
 ) -> str:
+    """Configure auto top-up settings and perform an immediate top-up if needed.
+
+    Raises HTTPException(422) if the request parameters are invalid or if the
+    top-up fails with a known client error; other errors propagate unchanged.
+    """
    if request.threshold < 0:
        raise HTTPException(status_code=422, detail="Threshold must be greater than 0")
    if request.amount < 500 and request.amount != 0:
@@ -606,10 +611,20 @@ async def configure_user_auto_top_up(
    user_credit_model = await get_user_credit_model(user_id)
    current_balance = await user_credit_model.get_credits(user_id)

-    if current_balance < request.threshold:
-        await user_credit_model.top_up_credits(user_id, request.amount)
-    else:
-        await user_credit_model.top_up_credits(user_id, 0)
+    try:
+        if current_balance < request.threshold:
+            await user_credit_model.top_up_credits(user_id, request.amount)
+        else:
+            await user_credit_model.top_up_credits(user_id, 0)
+    except ValueError as e:
+        known_messages = (
+            "must not be negative",
+            "already exists for user",
+            "No payment method found",
+        )
+        if any(msg in str(e) for msg in known_messages):
+            raise HTTPException(status_code=422, detail=str(e))
+        raise

    await set_auto_top_up(
        user_id, AutoTopUpConfig(threshold=request.threshold, amount=request.amount)
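The new `try`/`except` in `configure_user_auto_top_up` converts only recognised `ValueError` messages into a 422 and re-raises everything else, so those failures still surface as server errors. A minimal standalone sketch of that mapping follows. It assumes a stand-in `HttpError` class in place of FastAPI's `HTTPException` so it runs without dependencies; `translate_top_up_error` is a hypothetical helper name.

```python
class HttpError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException, for illustration only."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Substrings the route treats as client errors (same tuple as in the diff).
KNOWN_TOP_UP_ERRORS = (
    "must not be negative",
    "already exists for user",
    "No payment method found",
)


def translate_top_up_error(exc: ValueError) -> None:
    """Re-raise a known ValueError as a 422; re-raise anything else unchanged."""
    if any(msg in str(exc) for msg in KNOWN_TOP_UP_ERRORS):
        raise HttpError(status_code=422, detail=str(exc)) from exc
    raise exc
```

Matching on message substrings couples the route to the exact wording used by the credit model. A dedicated exception type would be a sturdier contract if that wording ever changes.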
--- a/autogpt_platform/backend/backend/api/features/workspace/routes.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py
@@ -188,6 +188,7 @@ async def upload_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
    file: UploadFile,
    session_id: str | None = Query(default=None),
+    overwrite: bool = Query(default=False),
 ) -> UploadFileResponse:
    """
    Upload a file to the user's workspace.
@@ -248,7 +249,9 @@ async def upload_file(
    # Write file via WorkspaceManager
    manager = WorkspaceManager(user_id, workspace.id, session_id)
    try:
-        workspace_file = await manager.write_file(content, filename)
+        workspace_file = await manager.write_file(
+            content, filename, overwrite=overwrite
+        )
    except ValueError as e:
        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e

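`upload_file` now forwards an `overwrite` query flag to `WorkspaceManager.write_file`, and it still maps a `ValueError` to 409 Conflict. The manager's internals are not part of this diff, so the in-memory `FakeWorkspace` and `upload` helper below are purely hypothetical. They only illustrate the contract the route appears to rely on: without `overwrite`, writing to an existing name raises `ValueError`.

```python
class FakeWorkspace:
    """Hypothetical in-memory stand-in for WorkspaceManager's write semantics."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def write_file(self, content: bytes, filename: str, overwrite: bool = False) -> str:
        if filename in self.files and not overwrite:
            # The route turns this ValueError into HTTP 409 Conflict.
            raise ValueError(f"File already exists: {filename}")
        self.files[filename] = content
        return filename


def upload(ws: FakeWorkspace, content: bytes, filename: str, overwrite: bool = False) -> bool:
    """Return True if stored, False where the route would respond with 409."""
    try:
        ws.write_file(content, filename, overwrite=overwrite)
    except ValueError:
        return False
    return True
```

Because `overwrite` defaults to `False`, existing clients keep the conflict behaviour unless they opt in with `?overwrite=true`.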
--- a/autogpt_platform/backend/backend/api/rest_api.py
+++ b/autogpt_platform/backend/backend/api/rest_api.py
@@ -210,13 +210,22 @@ instrument_fastapi(
 def handle_internal_http_error(status_code: int = 500, log_error: bool = True):
    def handler(request: fastapi.Request, exc: Exception):
        if log_error:
-            logger.exception(
-                "%s %s failed. Investigate and resolve the underlying issue: %s",
-                request.method,
-                request.url.path,
-                exc,
-                exc_info=exc,
-            )
+            if status_code >= 500:
+                logger.exception(
+                    "%s %s failed. Investigate and resolve the underlying issue: %s",
+                    request.method,
+                    request.url.path,
+                    exc,
+                    exc_info=exc,
+                )
+            else:
+                logger.warning(
+                    "%s %s failed with %d: %s",
+                    request.method,
+                    request.url.path,
+                    status_code,
+                    exc,
+                )

        hint = (
            "Adjust the request and retry."
@@ -266,12 +275,10 @@ async def validation_error_handler(


 app.add_exception_handler(PrismaError, handle_internal_http_error(500))
-app.add_exception_handler(
-    FolderAlreadyExistsError, handle_internal_http_error(409, False)
-)
-app.add_exception_handler(FolderValidationError, handle_internal_http_error(400, False))
-app.add_exception_handler(NotFoundError, handle_internal_http_error(404, False))
-app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403, False))
+app.add_exception_handler(FolderAlreadyExistsError, handle_internal_http_error(409))
+app.add_exception_handler(FolderValidationError, handle_internal_http_error(400))
+app.add_exception_handler(NotFoundError, handle_internal_http_error(404))
+app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403))
 app.add_exception_handler(RequestValidationError, validation_error_handler)
 app.add_exception_handler(pydantic.ValidationError, validation_error_handler)
 app.add_exception_handler(MissingConfigError, handle_internal_http_error(503))
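After this change, `handle_internal_http_error` logs a full traceback only for 5xx statuses and downgrades 4xx ones to a one-line warning. That is what allows the 400/403/404/409 registrations to drop `log_error=False` without flooding error-level logs. A stdlib-only sketch of that branching follows; `log_http_error` is a hypothetical name, not the module's API.

```python
import logging

logger = logging.getLogger("rest_api_sketch")


def log_http_error(method: str, path: str, status_code: int, exc: Exception) -> None:
    """Mirror the handler's branching: traceback for 5xx, warning for 4xx."""
    if status_code >= 500:
        # logger.exception logs at ERROR level; exc_info=exc attaches that
        # exception's traceback even when called outside an except block.
        logger.exception(
            "%s %s failed. Investigate and resolve the underlying issue: %s",
            method,
            path,
            exc,
            exc_info=exc,
        )
    else:
        # Client errors are expected traffic: one WARNING line, no traceback.
        logger.warning("%s %s failed with %d: %s", method, path, status_code, exc)
```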
--- a/autogpt_platform/backend/backend/blocks/autopilot.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot.py
@@ -15,6 +15,12 @@ from backend.blocks._base import (
    BlockSchemaInput,
    BlockSchemaOutput,
 )
+from backend.copilot.permissions import (
+    CopilotPermissions,
+    ToolName,
+    all_known_tool_names,
+    validate_block_identifiers,
+)
 from backend.data.model import SchemaField

 if TYPE_CHECKING:
@@ -96,6 +102,50 @@ class AutoPilotBlock(Block):
            advanced=True,
        )

+        tools: list[ToolName] = SchemaField(
+            description=(
+                "Tool names to filter. Works with tools_exclude to form an "
+                "allow-list or deny-list. "
+                "Leave empty to apply no tool filter."
+            ),
+            default=[],
+            advanced=True,
+        )
+
+        tools_exclude: bool = SchemaField(
+            description=(
+                "Controls how the 'tools' list is interpreted. "
+                "True (default): 'tools' is a deny-list — listed tools are blocked, "
+                "all others are allowed. An empty 'tools' list means allow everything. "
+                "False: 'tools' is an allow-list — only listed tools are permitted."
+            ),
+            default=True,
+            advanced=True,
+        )
+
+        blocks: list[str] = SchemaField(
+            description=(
+                "Block identifiers to filter when the copilot uses run_block. "
+                "Each entry can be: a block name (e.g. 'HTTP Request'), "
+                "a full block UUID, or the first 8 hex characters of the UUID "
+                "(e.g. 'c069dc6b'). Works with blocks_exclude. "
+                "Leave empty to apply no block filter."
+            ),
+            default=[],
+            advanced=True,
+        )
+
+        blocks_exclude: bool = SchemaField(
+            description=(
+                "Controls how the 'blocks' list is interpreted. "
+                "True (default): 'blocks' is a deny-list — listed blocks are blocked, "
+                "all others are allowed. An empty 'blocks' list means allow everything. "
+                "False: 'blocks' is an allow-list — only listed blocks are permitted."
+            ),
+            default=True,
+            advanced=True,
+        )
+
        # timeout_seconds removed: the SDK manages its own heartbeat-based
        # timeouts internally; wrapping with asyncio.timeout corrupts the
        # SDK's internal stream (see service.py CRITICAL comment).
@@ -184,7 +234,7 @@ class AutoPilotBlock(Block):

    async def create_session(self, user_id: str) -> str:
        """Create a new chat session and return its ID (mockable for tests)."""
-        from backend.copilot.model import create_chat_session
+        from backend.copilot.model import create_chat_session  # avoid circular import

        session = await create_chat_session(user_id)
        return session.session_id
@@ -196,6 +246,7 @@ class AutoPilotBlock(Block):
        session_id: str,
        max_recursion_depth: int,
        user_id: str,
+        permissions: "CopilotPermissions | None" = None,
    ) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
        """Invoke the copilot and collect all stream results.

@@ -209,14 +260,21 @@ class AutoPilotBlock(Block):
            session_id: Chat session to use.
            max_recursion_depth: Maximum allowed recursion nesting.
            user_id: Authenticated user ID.
+            permissions: Optional capability filter restricting tools/blocks.

        Returns:
            A tuple of (response_text, tool_calls, history_json, session_id, usage).
        """
-        from backend.copilot.sdk.collect import collect_copilot_response
+        from backend.copilot.sdk.collect import (
+            collect_copilot_response,  # avoid circular import
+        )

        tokens = _check_recursion(max_recursion_depth)
+        perm_token = None
        try:
+            effective_permissions, perm_token = _merge_inherited_permissions(
+                permissions
+            )
            effective_prompt = prompt
            if system_context:
                effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
@@ -225,6 +283,7 @@ class AutoPilotBlock(Block):
                session_id=session_id,
                message=effective_prompt,
                user_id=user_id,
+                permissions=effective_permissions,
            )

            # Build a lightweight conversation summary from streamed data.
@@ -271,6 +330,8 @@ class AutoPilotBlock(Block):
            )
        finally:
            _reset_recursion(tokens)
+            if perm_token is not None:
+                _inherited_permissions.reset(perm_token)

    async def run(
        self,
@@ -295,6 +356,13 @@ class AutoPilotBlock(Block):
            yield "error", "max_recursion_depth must be at least 1."
            return

+        # Validate and build permissions eagerly — fail before creating a session.
+        permissions = await _build_and_validate_permissions(input_data)
+        if isinstance(permissions, str):
+            # Validation error returned as a string message.
+            yield "error", permissions
+            return
+
        # Create session eagerly so the user always gets the session_id,
        # even if the downstream stream fails (avoids orphaned sessions).
        sid = input_data.session_id
@@ -312,6 +380,7 @@ class AutoPilotBlock(Block):
                session_id=sid,
                max_recursion_depth=input_data.max_recursion_depth,
                user_id=execution_context.user_id,
+                permissions=permissions,
            )

            yield "response", response
@@ -374,3 +443,78 @@ def _reset_recursion(
    """Restore recursion depth and limit to their previous values."""
    _autopilot_recursion_depth.reset(tokens[0])
    _autopilot_recursion_limit.reset(tokens[1])
+
+
+# ---------------------------------------------------------------------------
+# Permission helpers
+# ---------------------------------------------------------------------------
+
+# Inherited permissions from a parent AutoPilotBlock execution.
+# This acts as a ceiling: child executions can only be more restrictive.
+_inherited_permissions: contextvars.ContextVar["CopilotPermissions | None"] = (
+    contextvars.ContextVar("_inherited_permissions", default=None)
+)
+
+
+async def _build_and_validate_permissions(
+    input_data: "AutoPilotBlock.Input",
+) -> "CopilotPermissions | str":
+    """Build a :class:`CopilotPermissions` from block input and validate it.
+
+    Returns a :class:`CopilotPermissions` on success or a human-readable
+    error string if validation fails.
+    """
+    # Tool names are validated by Pydantic via the ToolName Literal type
+    # at model construction time — no runtime check needed here.
+    # Validate block identifiers against live block registry.
+    if input_data.blocks:
+        invalid_blocks = await validate_block_identifiers(input_data.blocks)
+        if invalid_blocks:
+            return (
+                f"Unknown block identifier(s) in 'blocks': {invalid_blocks}. "
+                "Use find_block to discover valid block names and IDs. "
+                "You may also use the first 8 characters of a block UUID."
+            )
+
+    return CopilotPermissions(
+        tools=list(input_data.tools),
+        tools_exclude=input_data.tools_exclude,
+        blocks=input_data.blocks,
+        blocks_exclude=input_data.blocks_exclude,
+    )
+
+
+def _merge_inherited_permissions(
+    permissions: "CopilotPermissions | None",
+) -> "tuple[CopilotPermissions | None, contextvars.Token[CopilotPermissions | None] | None]":
+    """Merge *permissions* with any inherited parent permissions.
+
+    The merged result is stored back into the contextvar so that any nested
+    AutoPilotBlock invocation (sub-agent) inherits the merged ceiling.
+
+    Returns a tuple of (merged_permissions, reset_token).  The caller MUST
+    reset the contextvar via ``_inherited_permissions.reset(token)`` in a
+    ``finally`` block when ``reset_token`` is not None — this prevents
+    permission leakage between sequential independent executions in the same
+    asyncio task.
+    """
+    parent = _inherited_permissions.get()
+
+    if permissions is None and parent is None:
+        return None, None
+
+    all_tools = all_known_tool_names()
+
+    if permissions is None:
+        permissions = CopilotPermissions()  # allow-all; will be narrowed by parent
+
+    merged = (
+        permissions.merged_with_parent(parent, all_tools)
+        if parent is not None
+        else permissions
+    )
+
+    # Store merged permissions as the new inherited ceiling for nested calls.
+    # Return the token so the caller can restore the previous value in finally.
+    token = _inherited_permissions.set(merged)
+    return merged, token
--- a/autogpt_platform/backend/backend/blocks/autopilot_permissions_test.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot_permissions_test.py
@@ -0,0 +1,265 @@
+"""Tests for AutoPilotBlock permission fields and validation."""
+
+from __future__ import annotations
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from pydantic import ValidationError
+
+from backend.blocks.autopilot import (
+    AutoPilotBlock,
+    _build_and_validate_permissions,
+    _inherited_permissions,
+    _merge_inherited_permissions,
+)
+from backend.copilot.permissions import CopilotPermissions, all_known_tool_names
+from backend.data.execution import ExecutionContext
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_input(**kwargs) -> AutoPilotBlock.Input:
+    defaults = {
+        "prompt": "Do something",
+        "system_context": "",
+        "session_id": "",
+        "max_recursion_depth": 3,
+        "tools": [],
+        "tools_exclude": True,
+        "blocks": [],
+        "blocks_exclude": True,
+    }
+    defaults.update(kwargs)
+    return AutoPilotBlock.Input(**defaults)
+
+
+# ---------------------------------------------------------------------------
+# _build_and_validate_permissions
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+class TestBuildAndValidatePermissions:
+    async def test_empty_inputs_returns_empty_permissions(self):
+        inp = _make_input()
+        result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert result.is_empty()
+
+    async def test_valid_tool_names_accepted(self):
+        inp = _make_input(tools=["run_block", "web_fetch"], tools_exclude=True)
+        result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert result.tools == ["run_block", "web_fetch"]
+        assert result.tools_exclude is True
+
+    async def test_invalid_tool_rejected_by_pydantic(self):
+        """Invalid tool names are now caught at Pydantic validation time
+        (Literal type), before ``_build_and_validate_permissions`` is called."""
+        with pytest.raises(ValidationError, match="not_a_real_tool"):
+            _make_input(tools=["not_a_real_tool"])
+
+    async def test_valid_block_name_accepted(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            inp = _make_input(blocks=["HTTP Request"], blocks_exclude=True)
+            result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert result.blocks == ["HTTP Request"]
+
+    async def test_valid_partial_uuid_accepted(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            inp = _make_input(blocks=["c069dc6b"], blocks_exclude=False)
+            result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+
+    async def test_invalid_block_identifier_returns_error(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            inp = _make_input(blocks=["totally_fake_block"])
+            result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, str)
+        assert "totally_fake_block" in result
+        assert "Unknown block identifier" in result
+
+    async def test_sdk_builtin_tool_names_accepted(self):
+        inp = _make_input(tools=["Read", "Task", "WebSearch"], tools_exclude=False)
+        result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert not result.tools_exclude
+
+    async def test_empty_blocks_skips_validation(self):
+        # Should not call validate_block_identifiers at all when blocks=[].
+        with patch(
+            "backend.blocks.autopilot.validate_block_identifiers"
+        ) as mock_validate:
+            inp = _make_input(blocks=[])
+            await _build_and_validate_permissions(inp)
+            mock_validate.assert_not_called()
+
+
+# ---------------------------------------------------------------------------
+# _merge_inherited_permissions
+# ---------------------------------------------------------------------------
+
+
+class TestMergeInheritedPermissions:
+    def test_no_permissions_no_parent_returns_none(self):
+        merged, token = _merge_inherited_permissions(None)
+        assert merged is None
+        assert token is None
+
+    def test_permissions_no_parent_returned_unchanged(self):
+        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        merged, token = _merge_inherited_permissions(perms)
+        try:
+            assert merged is perms
+            assert token is not None
+        finally:
+            if token is not None:
+                _inherited_permissions.reset(token)
+
+    def test_child_narrows_parent(self):
+        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        # Set parent as inherited
+        outer_token = _inherited_permissions.set(parent)
+        try:
+            child = CopilotPermissions(tools=["web_fetch"], tools_exclude=True)
+            merged, inner_token = _merge_inherited_permissions(child)
+            try:
+                assert merged is not None
+                all_t = all_known_tool_names()
+                effective = merged.effective_allowed_tools(all_t)
+                assert "bash_exec" not in effective
+                assert "web_fetch" not in effective
+            finally:
+                if inner_token is not None:
+                    _inherited_permissions.reset(inner_token)
+        finally:
+            _inherited_permissions.reset(outer_token)
+
+    def test_none_permissions_with_parent_uses_parent(self):
+        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        outer_token = _inherited_permissions.set(parent)
+        try:
+            merged, inner_token = _merge_inherited_permissions(None)
+            try:
+                assert merged is not None
+                # Merged should have parent's restrictions
+                effective = merged.effective_allowed_tools(all_known_tool_names())
+                assert "bash_exec" not in effective
+            finally:
+                if inner_token is not None:
+                    _inherited_permissions.reset(inner_token)
+        finally:
+            _inherited_permissions.reset(outer_token)
+
+    def test_child_cannot_expand_parent_whitelist(self):
+        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        outer_token = _inherited_permissions.set(parent)
+        try:
+            # Child tries to allow more tools
+            child = CopilotPermissions(
+                tools=["run_block", "bash_exec"], tools_exclude=False
+            )
+            merged, inner_token = _merge_inherited_permissions(child)
+            try:
+                assert merged is not None
+                effective = merged.effective_allowed_tools(all_known_tool_names())
+                assert "bash_exec" not in effective
+                assert "run_block" in effective
+            finally:
+                if inner_token is not None:
+                    _inherited_permissions.reset(inner_token)
+        finally:
+            _inherited_permissions.reset(outer_token)
+
+
+# ---------------------------------------------------------------------------
+# AutoPilotBlock.run — validation integration
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+class TestAutoPilotBlockRunPermissions:
+    async def _collect_outputs(self, block, input_data, user_id="test-user"):
+        """Helper to collect all yields from block.run()."""
+        ctx = ExecutionContext(
+            user_id=user_id,
+            graph_id="g1",
+            graph_exec_id="ge1",
+            node_exec_id="ne1",
+            node_id="n1",
+        )
+        outputs = {}
+        async for key, val in block.run(input_data, execution_context=ctx):
+            outputs[key] = val
+        return outputs
+
+    async def test_invalid_tool_rejected_by_pydantic(self):
+        """Invalid tool names are caught at Pydantic validation (Literal type)."""
+        with pytest.raises(ValidationError, match="not_a_tool"):
+            _make_input(tools=["not_a_tool"])
+
+    async def test_invalid_block_yields_error(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            block = AutoPilotBlock()
+            inp = _make_input(blocks=["nonexistent_block"])
+            outputs = await self._collect_outputs(block, inp)
+        assert "error" in outputs
+        assert "nonexistent_block" in outputs["error"]
+
+    async def test_empty_prompt_yields_error_before_permission_check(self):
+        block = AutoPilotBlock()
+        inp = _make_input(prompt="   ", tools=["run_block"])
+        outputs = await self._collect_outputs(block, inp)
+        assert "error" in outputs
+        assert "Prompt cannot be empty" in outputs["error"]
+
+    async def test_valid_permissions_passed_to_execute(self):
+        """Permissions are forwarded to execute_copilot when valid."""
+        block = AutoPilotBlock()
+        captured: dict = {}
+
+        async def fake_execute_copilot(self_inner, **kwargs):
+            captured["permissions"] = kwargs.get("permissions")
+            return (
+                "ok",
+                [],
+                '[{"role":"user","content":"hi"}]',
+                "test-sid",
+                {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
+            )
+
+        with patch.object(
+            AutoPilotBlock, "create_session", new=AsyncMock(return_value="test-sid")
+        ), patch.object(AutoPilotBlock, "execute_copilot", new=fake_execute_copilot):
+            inp = _make_input(tools=["run_block"], tools_exclude=False)
+            outputs = await self._collect_outputs(block, inp)
+
+        assert "error" not in outputs
+        perms = captured.get("permissions")
+        assert isinstance(perms, CopilotPermissions)
+        assert perms.tools == ["run_block"]
+        assert perms.tools_exclude is False
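The merge tests above define the narrowing rule. A deny-list removes tools from the full set, and an allow-list keeps only the listed tools. A child's effective set is then intersected with its parent's, so a child can add exclusions but can never re-enable a tool. The sketch below models only that rule using a hypothetical `ToolFilter` dataclass. It is not the real `CopilotPermissions.merged_with_parent`, whose implementation is not part of this diff, and it ignores block filters. It also assumes that an empty deny-list means allow-all, which is what the `CopilotPermissions()  # allow-all` line suggests.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolFilter:
    """Hypothetical simplification of a (tools, tools_exclude) pair."""

    tools: tuple[str, ...] = ()
    tools_exclude: bool = True  # True = deny-list, False = allow-list

    def effective(self, all_tools: frozenset[str]) -> frozenset[str]:
        if self.tools_exclude:
            # Empty deny-list means "allow everything".
            return all_tools - frozenset(self.tools)
        return all_tools & frozenset(self.tools)


def effective_with_parent(
    child: ToolFilter | None, parent: ToolFilter | None, all_tools: frozenset[str]
) -> frozenset[str]:
    """A child may only narrow its parent: the two effective sets are intersected."""
    result = (child or ToolFilter()).effective(all_tools)
    if parent is not None:
        result &= parent.effective(all_tools)
    return result


ALL_TOOLS = frozenset({"run_block", "bash_exec", "web_fetch", "find_block"})

parent = ToolFilter(tools=("run_block",), tools_exclude=False)
child = ToolFilter(tools=("run_block", "bash_exec"), tools_exclude=False)
print(sorted(effective_with_parent(child, parent, ALL_TOOLS)))  # ['run_block']
```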
--- a/autogpt_platform/backend/backend/blocks/llm.py
+++ b/autogpt_platform/backend/backend/blocks/llm.py
@@ -49,6 +49,9 @@ settings = Settings()
 logger = TruncatedLogger(logging.getLogger(__name__), "[LLM-Block]")
 fmt = TextFormatter(autoescape=False)

+# HTTP status codes for user-caused errors that should not be reported to Sentry.
+USER_ERROR_STATUS_CODES = (401, 403, 429)
+
 LLMProviderName = Literal[
    ProviderName.AIML_API,
    ProviderName.ANTHROPIC,
@@ -796,6 +799,19 @@ async def llm_call(
            )
        prompt = result.messages

+    # Sanitize unpaired surrogates in message content to prevent
+    # UnicodeEncodeError when httpx encodes the JSON request body.
+    for msg in prompt:
+        content = msg.get("content")
+        if isinstance(content, str):
+            try:
+                content.encode("utf-8")
+            except UnicodeEncodeError:
+                logger.warning("Sanitized unpaired surrogates in LLM prompt content")
+                msg["content"] = content.encode("utf-8", errors="surrogatepass").decode(
+                    "utf-8", errors="replace"
+                )
+
    # Calculate available tokens based on context window and input length
    estimated_input_tokens = estimate_token_count(prompt)
    model_max_output = llm_model.max_output_tokens or int(2**15)
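A lone surrogate is a legal character in a Python `str`, but UTF-8 cannot encode it, so serializing the request body raises `UnicodeEncodeError`. One way such a character can arrive is from JSON containing an escaped `\ud800`, which `json.loads` accepts. The loop above first encodes with `errors="surrogatepass"`, which writes the surrogate out as invalid UTF-8 bytes. It then decodes with `errors="replace"`, which turns those bytes into U+FFFD replacement characters. Here is a standalone sketch of the same steps, using the hypothetical helper name `sanitize_surrogates`:

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger("surrogate-sketch")


def sanitize_surrogates(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Replace characters that UTF-8 cannot encode in string ``content`` fields."""
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, str):
            continue  # multi-part (list) content is left untouched
        try:
            content.encode("utf-8")
        except UnicodeEncodeError:
            logger.warning("Sanitized unpaired surrogates in LLM prompt content")
            # surrogatepass writes the surrogate as (invalid) UTF-8 bytes;
            # errors="replace" then decodes those bytes as U+FFFD.
            msg["content"] = content.encode("utf-8", errors="surrogatepass").decode(
                "utf-8", errors="replace"
            )
    return messages


prompt = [{"role": "user", "content": "before \ud800 after"}]
print(ascii(sanitize_surrogates(prompt)[0]["content"]))
```

Only string `content` is handled. List-valued (multi-part) content passes through unchanged, both here and in the diff, and each surrogate may become more than one U+FFFD.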
@@ -878,65 +894,60 @@ async def llm_call(
        client = anthropic.AsyncAnthropic(
            api_key=credentials.api_key.get_secret_value()
        )
-        try:
-            resp = await client.messages.create(
-                model=llm_model.value,
-                system=sysprompt,
-                messages=messages,
-                max_tokens=max_tokens,
-                tools=an_tools,
-                timeout=600,
-            )
+        resp = await client.messages.create(
+            model=llm_model.value,
+            system=sysprompt,
+            messages=messages,
+            max_tokens=max_tokens,
+            tools=an_tools,
+            timeout=600,
+        )

-            if not resp.content:
-                raise ValueError("No content returned from Anthropic.")
+        if not resp.content:
+            raise ValueError("No content returned from Anthropic.")

-            tool_calls = None
-            for content_block in resp.content:
-                # Antropic is different to openai, need to iterate through
-                # the content blocks to find the tool calls
-                if content_block.type == "tool_use":
-                    if tool_calls is None:
-                        tool_calls = []
-                    tool_calls.append(
-                        ToolContentBlock(
-                            id=content_block.id,
-                            type=content_block.type,
-                            function=ToolCall(
-                                name=content_block.name,
-                                arguments=json.dumps(content_block.input),
-                            ),
-                        )
+        tool_calls = None
+        for content_block in resp.content:
+            # Unlike OpenAI, Anthropic returns tool calls as content blocks,
+            # so iterate through them to collect the tool_use entries.
+            if content_block.type == "tool_use":
+                if tool_calls is None:
+                    tool_calls = []
+                tool_calls.append(
+                    ToolContentBlock(
+                        id=content_block.id,
+                        type=content_block.type,
+                        function=ToolCall(
+                            name=content_block.name,
+                            arguments=json.dumps(content_block.input),
+                        ),
                    )
-
-            if not tool_calls and resp.stop_reason == "tool_use":
-                logger.warning(
-                    f"Tool use stop reason but no tool calls found in content. {resp}"
                )

-            reasoning = None
-            for content_block in resp.content:
-                if hasattr(content_block, "type") and content_block.type == "thinking":
-                    reasoning = content_block.thinking
-                    break
-
-            return LLMResponse(
-                raw_response=resp,
-                prompt=prompt,
-                response=(
-                    resp.content[0].name
-                    if isinstance(resp.content[0], anthropic.types.ToolUseBlock)
-                    else getattr(resp.content[0], "text", "")
-                ),
-                tool_calls=tool_calls,
-                prompt_tokens=resp.usage.input_tokens,
-                completion_tokens=resp.usage.output_tokens,
-                reasoning=reasoning,
+        if not tool_calls and resp.stop_reason == "tool_use":
+            logger.warning(
+                f"Tool use stop reason but no tool calls found in content. {resp}"
            )
-        except anthropic.APIError as e:
-            error_message = f"Anthropic API error: {str(e)}"
-            logger.error(error_message)
-            raise ValueError(error_message)
+
+        reasoning = None
+        for content_block in resp.content:
+            if hasattr(content_block, "type") and content_block.type == "thinking":
+                reasoning = content_block.thinking
+                break
+
+        return LLMResponse(
+            raw_response=resp,
+            prompt=prompt,
+            response=(
+                resp.content[0].name
+                if isinstance(resp.content[0], anthropic.types.ToolUseBlock)
+                else getattr(resp.content[0], "text", "")
+            ),
+            tool_calls=tool_calls,
+            prompt_tokens=resp.usage.input_tokens,
+            completion_tokens=resp.usage.output_tokens,
+            reasoning=reasoning,
+        )
    elif provider == "groq":
        if tools:
            raise ValueError("Groq does not support tools.")
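OpenAI's chat completions API returns tool calls in a separate `tool_calls` field. Anthropic's Messages API instead returns them as `tool_use` blocks inside `content`, mixed in with `text` blocks (and `thinking` blocks when extended thinking is enabled). That is why the loop in this hunk scans every block. The simplified sketch below uses a hypothetical `Block` dataclass in place of the SDK's content block types, and plain dicts in place of `ToolContentBlock`/`ToolCall`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Block:
    """Simplified stand-in for an Anthropic Messages API content block."""

    type: str  # "text", "tool_use", "thinking", ...
    text: str = ""
    id: str = ""
    name: str = ""
    input: dict[str, Any] = field(default_factory=dict)


def extract_tool_calls(content: list[Block]) -> list[dict[str, Any]] | None:
    """Collect every tool_use block; return None when there are none."""
    tool_calls: list[dict[str, Any]] = []
    for block in content:
        if block.type == "tool_use":
            tool_calls.append(
                {
                    "id": block.id,
                    "type": block.type,
                    "function": {
                        "name": block.name,
                        # Arguments are serialized to a JSON string, OpenAI-style.
                        "arguments": json.dumps(block.input),
                    },
                }
            )
    return tool_calls or None


response_content = [
    Block(type="text", text="Let me look that up."),
    Block(type="tool_use", id="toolu_01", name="get_weather", input={"city": "Paris"}),
]
print(extract_tool_calls(response_content))
```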
@@ -1449,7 +1460,16 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
                    yield "prompt", self.prompt
                    return
            except Exception as e:
-                logger.exception(f"Error calling LLM: {e}")
+                is_user_error = (
+                    isinstance(e, (anthropic.APIStatusError, openai.APIStatusError))
+                    and e.status_code in USER_ERROR_STATUS_CODES
+                )
+                if is_user_error:
+                    logger.warning(f"Error calling LLM: {e}")
+                    error_feedback_message = f"Error calling LLM: {e}"
+                    break
+                else:
+                    logger.exception(f"Error calling LLM: {e}")
                if (
                    "maximum context length" in str(e).lower()
                    or "token limit" in str(e).lower()
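The new `except` branch sorts failures by HTTP status. Codes 401, 403 and 429 (bad credentials, missing permission, rate limit or quota) are problems the user has to fix. They are logged at WARNING and the retry loop stops. Any other failure is still logged with a traceback. If Sentry's logging integration keeps its default of turning ERROR-level records into events, this keeps user errors out of Sentry. The sketch below shows the classification using a hypothetical `FakeAPIStatusError`. Unlike the diff, which checks `isinstance` against the Anthropic and OpenAI `APIStatusError` classes, it duck-types on a `status_code` attribute.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("llm-error-sketch")

# Same tuple as USER_ERROR_STATUS_CODES in llm.py.
USER_ERROR_STATUS_CODES = (401, 403, 429)


class FakeAPIStatusError(Exception):
    """Hypothetical stand-in for an SDK error carrying an HTTP status code."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def is_user_error(exc: BaseException) -> bool:
    """True for auth, permission and rate-limit failures the user must fix."""
    return getattr(exc, "status_code", None) in USER_ERROR_STATUS_CODES


def report_llm_error(exc: BaseException) -> str:
    """Log *exc* at a level matching its cause and return that cause."""
    if is_user_error(exc):
        # WARNING: visible in logs, but not an ERROR-level record.
        logger.warning("Error calling LLM: %s", exc)
        return "user"
    # ERROR with traceback, as logger.exception does inside an except block.
    logger.error("Error calling LLM: %s", exc, exc_info=exc)
    return "platform"
```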
--- a/autogpt_platform/backend/backend/blocks/smart_decision_maker.py
+++ b/autogpt_platform/backend/backend/blocks/smart_decision_maker.py
@@ -258,9 +258,10 @@ def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str,
    return {call_id: count for call_id, count in pending_calls.items() if count > 0}


-class SmartDecisionMakerBlock(Block):
+class OrchestratorBlock(Block):
    """
-    A block that uses a language model to make smart decisions based on a given prompt.
+    A block that uses a language model to orchestrate tool calls, supporting both
+    single-shot and iterative agent mode execution.
    """

    class Input(BlockSchemaInput):
@@ -401,8 +402,8 @@ class SmartDecisionMakerBlock(Block):
            description="Uses AI to intelligently decide what tool to use.",
            categories={BlockCategory.AI},
            block_type=BlockType.AI,
-            input_schema=SmartDecisionMakerBlock.Input,
-            output_schema=SmartDecisionMakerBlock.Output,
+            input_schema=OrchestratorBlock.Input,
+            output_schema=OrchestratorBlock.Output,
            test_input={
                "prompt": "Hello, World!",
                "credentials": llm.TEST_CREDENTIALS_INPUT,
@@ -440,7 +441,7 @@ class SmartDecisionMakerBlock(Block):
        tool_name = custom_name if custom_name else block.name

        tool_function: dict[str, Any] = {
-            "name": SmartDecisionMakerBlock.cleanup(tool_name),
+            "name": OrchestratorBlock.cleanup(tool_name),
            "description": block.description,
        }
        sink_block_input_schema = block.input_schema
@@ -451,7 +452,7 @@ class SmartDecisionMakerBlock(Block):
            field_name = link.sink_name
            is_dynamic = is_dynamic_field(field_name)
            # Clean property key to ensure Anthropic API compatibility for ALL fields
-            clean_field_name = SmartDecisionMakerBlock.cleanup(field_name)
+            clean_field_name = OrchestratorBlock.cleanup(field_name)
            field_mapping[clean_field_name] = field_name

            if is_dynamic:
@@ -485,7 +486,7 @@ class SmartDecisionMakerBlock(Block):
            field_name = link.sink_name
            is_dynamic = is_dynamic_field(field_name)
            # Always use cleaned field name for property key (Anthropic API compliance)
-            clean_field_name = SmartDecisionMakerBlock.cleanup(field_name)
+            clean_field_name = OrchestratorBlock.cleanup(field_name)

            if is_dynamic:
                base_name = extract_base_field_name(field_name)
@@ -542,7 +543,7 @@ class SmartDecisionMakerBlock(Block):
        tool_name = custom_name if custom_name else sink_graph_meta.name

        tool_function: dict[str, Any] = {
-            "name": SmartDecisionMakerBlock.cleanup(tool_name),
+            "name": OrchestratorBlock.cleanup(tool_name),
            "description": sink_graph_meta.description,
        }

@@ -552,7 +553,7 @@ class SmartDecisionMakerBlock(Block):
        for link in links:
            field_name = link.sink_name

-            clean_field_name = SmartDecisionMakerBlock.cleanup(field_name)
+            clean_field_name = OrchestratorBlock.cleanup(field_name)
            field_mapping[clean_field_name] = field_name

            sink_block_input_schema = sink_node.input_default["input_schema"]
@@ -618,17 +619,13 @@ class SmartDecisionMakerBlock(Block):
                raise ValueError(f"Sink node not found: {links[0].sink_id}")

            if sink_node.block_id == AgentExecutorBlock().id:
-                tool_func = (
-                    await SmartDecisionMakerBlock._create_agent_function_signature(
-                        sink_node, links
-                    )
+                tool_func = await OrchestratorBlock._create_agent_function_signature(
+                    sink_node, links
                )
                return_tool_functions.append(tool_func)
            else:
-                tool_func = (
-                    await SmartDecisionMakerBlock._create_block_function_signature(
-                        sink_node, links
-                    )
+                tool_func = await OrchestratorBlock._create_block_function_signature(
+                    sink_node, links
                )
                return_tool_functions.append(tool_func)

@@ -908,7 +905,7 @@ class SmartDecisionMakerBlock(Block):
                task=node_exec_future,
            )

-            # Execute the node directly since we're in the SmartDecisionMaker context
+            # Execute the node directly since we're in the Orchestrator context
            node_exec_future.set_result(
                await execution_processor.on_node_execution(
                    node_exec=node_exec_entry,
@@ -934,7 +931,7 @@ class SmartDecisionMakerBlock(Block):
            )

        except Exception as e:
-            logger.error(f"Tool execution with manager failed: {e}")
+            logger.warning(f"Tool execution with manager failed: {e}")
            # Return error response
            return _create_tool_response(
                tool_call.id,
@@ -1112,7 +1109,7 @@ class SmartDecisionMakerBlock(Block):
                return
        elif input_data.last_tool_output:
            logger.error(
-                f"[SmartDecisionMakerBlock-node_exec_id={node_exec_id}] "
+                f"[OrchestratorBlock-node_exec_id={node_exec_id}] "
                f"No pending tool calls found. This may indicate an issue with the "
                f"conversation history, or the tool giving response more than once."
                f"This should not happen! Please check the conversation history for any inconsistencies."
@@ -1249,7 +1246,7 @@ class SmartDecisionMakerBlock(Block):
                emit_key = f"tools_^_{sink_node_id}_~_{original_field_name}"

                logger.debug(
-                    "[SmartDecisionMakerBlock|geid:%s|neid:%s] emit %s",
+                    "[OrchestratorBlock|geid:%s|neid:%s] emit %s",
                    graph_exec_id,
                    node_exec_id,
                    emit_key,
--- a/autogpt_platform/backend/backend/blocks/stagehand/blocks.py
+++ b/autogpt_platform/backend/backend/blocks/stagehand/blocks.py
@@ -1,13 +1,8 @@
 import logging
-import signal
-import threading
-import warnings
-from contextlib import contextmanager
 from enum import Enum

-# Monkey patch Stagehands to prevent signal handling in worker threads
-import stagehand.main
-from stagehand import Stagehand
+from stagehand import AsyncStagehand
+from stagehand.types.session_act_params import Options as ActOptions

 from backend.blocks.llm import (
    MODEL_METADATA,
@@ -28,46 +23,6 @@ from backend.sdk import (
    SchemaField,
 )

-# Suppress false positive cleanup warning of litellm (a dependency of stagehand)
-warnings.filterwarnings("ignore", module="litellm.llms.custom_httpx")
-
-# Store the original method
-original_register_signal_handlers = stagehand.main.Stagehand._register_signal_handlers
-
-
-def safe_register_signal_handlers(self):
-    """Only register signal handlers in the main thread"""
-    if threading.current_thread() is threading.main_thread():
-        original_register_signal_handlers(self)
-    else:
-        # Skip signal handling in worker threads
-        pass
-
-
-# Replace the method
-stagehand.main.Stagehand._register_signal_handlers = safe_register_signal_handlers
-
-
-@contextmanager
-def disable_signal_handling():
-    """Context manager to temporarily disable signal handling"""
-    if threading.current_thread() is not threading.main_thread():
-        # In worker threads, temporarily replace signal.signal with a no-op
-        original_signal = signal.signal
-
-        def noop_signal(*args, **kwargs):
-            pass
-
-        signal.signal = noop_signal
-        try:
-            yield
-        finally:
-            signal.signal = original_signal
-    else:
-        # In main thread, don't modify anything
-        yield
-
-
 logger = logging.getLogger(__name__)


@@ -148,13 +103,10 @@ class StagehandObserveBlock(Block):
        instruction: str = SchemaField(
            description="Natural language description of elements or actions to discover.",
        )
-        iframes: bool = SchemaField(
-            description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
-            default=True,
-        )
-        domSettleTimeoutMs: int = SchemaField(
-            description="Timeout in milliseconds for DOM settlement.Wait longer for dynamic content",
-            default=45000,
+        dom_settle_timeout_ms: int = SchemaField(
+            description="Timeout in ms to wait for the DOM to settle after navigation.",
+            default=30000,
+            advanced=True,
        )

    class Output(BlockSchemaOutput):
@@ -185,32 +137,28 @@ class StagehandObserveBlock(Block):

        logger.debug(f"OBSERVE: Using model provider {model_credentials.provider}")

-        with disable_signal_handling():
-            stagehand = Stagehand(
-                api_key=stagehand_credentials.api_key.get_secret_value(),
-                project_id=input_data.browserbase_project_id,
+        async with AsyncStagehand(
+            browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
+            browserbase_project_id=input_data.browserbase_project_id,
+            model_api_key=model_credentials.api_key.get_secret_value(),
+        ) as client:
+            session = await client.sessions.start(
                model_name=input_data.model.provider_name,
-                model_api_key=model_credentials.api_key.get_secret_value(),
+                dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
            )
+            try:
+                await session.navigate(url=input_data.url)

-            await stagehand.init()
-
-        page = stagehand.page
-
-        assert page is not None, "Stagehand page is not initialized"
-
-        await page.goto(input_data.url)
-
-        observe_results = await page.observe(
-            input_data.instruction,
-            iframes=input_data.iframes,
-            domSettleTimeoutMs=input_data.domSettleTimeoutMs,
-        )
-        for result in observe_results:
-            yield "selector", result.selector
-            yield "description", result.description
-            yield "method", result.method
-            yield "arguments", result.arguments
+                observe_response = await session.observe(
+                    instruction=input_data.instruction,
+                )
+                for result in observe_response.data.result:
+                    yield "selector", result.selector
+                    yield "description", result.description
+                    yield "method", result.method
+                    yield "arguments", result.arguments
+            finally:
+                await session.end()


 class StagehandActBlock(Block):
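The migrated blocks combine `async with AsyncStagehand(...)` with a `try`/`finally` around the session. This ensures `session.end()` runs even if navigation or observation fails, before the client itself is closed. The sketch below shows that lifecycle using hypothetical in-memory `FakeClient`/`FakeSession` classes instead of the real Stagehand SDK, and it does not attempt to reproduce the SDK's method signatures.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator


class FakeSession:
    """Hypothetical stand-in for a remote browser session."""

    def __init__(self, log: list[str], fail_on_navigate: bool) -> None:
        self.log = log
        self.fail_on_navigate = fail_on_navigate

    async def navigate(self, url: str) -> None:
        self.log.append(f"navigate {url}")
        if self.fail_on_navigate:
            raise RuntimeError("navigation failed")

    async def observe(self, instruction: str) -> list[str]:
        self.log.append(f"observe {instruction}")
        return ["#search", "#submit"]

    async def end(self) -> None:
        self.log.append("end")


class FakeClient:
    """Hypothetical async API client used as an async context manager."""

    def __init__(self, log: list[str], fail_on_navigate: bool = False) -> None:
        self.log = log
        self.fail_on_navigate = fail_on_navigate

    async def __aenter__(self) -> FakeClient:
        self.log.append("open")
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # Close the client; returning None lets any exception propagate.
        self.log.append("close")

    async def start_session(self) -> FakeSession:
        self.log.append("start")
        return FakeSession(self.log, self.fail_on_navigate)


async def observe_selectors(
    url: str, instruction: str, log: list[str], fail_on_navigate: bool = False
) -> AsyncIterator[str]:
    async with FakeClient(log, fail_on_navigate) as client:
        session = await client.start_session()
        try:
            await session.navigate(url)
            for selector in await session.observe(instruction):
                yield selector
        finally:
            # Always end the remote session, whether the body succeeded or raised.
            await session.end()


async def collect(fail_on_navigate: bool) -> tuple[list[str], list[str]]:
    log: list[str] = []
    results: list[str] = []
    try:
        async for selector in observe_selectors(
            "https://example.com", "find the search box", log, fail_on_navigate
        ):
            results.append(selector)
    except RuntimeError:
        pass  # the error still reaches the caller after cleanup
    return results, log


if __name__ == "__main__":
    print(asyncio.run(collect(fail_on_navigate=True)))
```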
@@ -242,24 +190,22 @@ class StagehandActBlock(Block):
            description="Variables to use in the action. Variables contains data you want the action to use.",
            default_factory=dict,
        )
-        iframes: bool = SchemaField(
-            description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
-            default=True,
+        dom_settle_timeout_ms: int = SchemaField(
+            description="Timeout in ms to wait for the DOM to settle after navigation.",
+            default=30000,
+            advanced=True,
        )
-        domSettleTimeoutMs: int = SchemaField(
-            description="Timeout in milliseconds for DOM settlement.Wait longer for dynamic content",
-            default=45000,
-        )
-        timeoutMs: int = SchemaField(
-            description="Timeout in milliseconds for DOM ready. Extended timeout for slow-loading forms",
-            default=60000,
+        timeout_ms: int = SchemaField(
+            description="Timeout in ms for each action.",
+            default=30000,
+            advanced=True,
        )

    class Output(BlockSchemaOutput):
        success: bool = SchemaField(
            description="Whether the action was completed successfully"
        )
-        message: str = SchemaField(description="Details about the action’s execution.")
+        message: str = SchemaField(description="Details about the action's execution.")
        action: str = SchemaField(description="Action performed")

    def __init__(self):
@@ -282,32 +228,33 @@ class StagehandActBlock(Block):

        logger.debug(f"ACT: Using model provider {model_credentials.provider}")

-        with disable_signal_handling():
-            stagehand = Stagehand(
-                api_key=stagehand_credentials.api_key.get_secret_value(),
-                project_id=input_data.browserbase_project_id,
+        async with AsyncStagehand(
+            browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
+            browserbase_project_id=input_data.browserbase_project_id,
+            model_api_key=model_credentials.api_key.get_secret_value(),
+        ) as client:
+            session = await client.sessions.start(
                model_name=input_data.model.provider_name,
-                model_api_key=model_credentials.api_key.get_secret_value(),
+                dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
            )
+            try:
+                await session.navigate(url=input_data.url)

-            await stagehand.init()
-
-        page = stagehand.page
-
-        assert page is not None, "Stagehand page is not initialized"
-
-        await page.goto(input_data.url)
-        for action in input_data.action:
-            action_results = await page.act(
-                action,
-                variables=input_data.variables,
-                iframes=input_data.iframes,
-                domSettleTimeoutMs=input_data.domSettleTimeoutMs,
-                timeoutMs=input_data.timeoutMs,
-            )
-            yield "success", action_results.success
-            yield "message", action_results.message
-            yield "action", action_results.action
+                for action in input_data.action:
+                    act_options = ActOptions(
+                        variables=dict(input_data.variables),
+                        timeout=input_data.timeout_ms,
+                    )
+                    act_response = await session.act(
+                        input=action,
+                        options=act_options,
+                    )
+                    result = act_response.data.result
+                    yield "success", result.success
+                    yield "message", result.message
+                    yield "action", result.action_description
+            finally:
+                await session.end()


 class StagehandExtractBlock(Block):
@@ -335,13 +282,10 @@ class StagehandExtractBlock(Block):
        instruction: str = SchemaField(
            description="Natural language description of elements or actions to discover.",
        )
-        iframes: bool = SchemaField(
-            description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
-            default=True,
-        )
-        domSettleTimeoutMs: int = SchemaField(
-            description="Timeout in milliseconds for DOM settlement.Wait longer for dynamic content",
-            default=45000,
+        dom_settle_timeout_ms: int = SchemaField(
+            description="Timeout in ms to wait for the DOM to settle after navigation.",
+            default=30000,
+            advanced=True,
        )

    class Output(BlockSchemaOutput):
@@ -367,24 +311,21 @@ class StagehandExtractBlock(Block):

        logger.debug(f"EXTRACT: Using model provider {model_credentials.provider}")

-        with disable_signal_handling():
-            stagehand = Stagehand(
-                api_key=stagehand_credentials.api_key.get_secret_value(),
-                project_id=input_data.browserbase_project_id,
+        async with AsyncStagehand(
+            browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
+            browserbase_project_id=input_data.browserbase_project_id,
+            model_api_key=model_credentials.api_key.get_secret_value(),
+        ) as client:
+            session = await client.sessions.start(
                model_name=input_data.model.provider_name,
-                model_api_key=model_credentials.api_key.get_secret_value(),
+                dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
            )
+            try:
+                await session.navigate(url=input_data.url)

-            await stagehand.init()
-
-        page = stagehand.page
-
-        assert page is not None, "Stagehand page is not initialized"
-
-        await page.goto(input_data.url)
-        extraction = await page.extract(
-            input_data.instruction,
-            iframes=input_data.iframes,
-            domSettleTimeoutMs=input_data.domSettleTimeoutMs,
-        )
-        yield "extraction", str(extraction.model_dump()["extraction"])
+                extract_response = await session.extract(
+                    instruction=input_data.instruction,
+                )
+                yield "extraction", str(extract_response.data.result)
+            finally:
+                await session.end()
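All three Stagehand blocks now repeat the same lifecycle. Each one opens an `AsyncStagehand` client, starts a session, navigates, runs a single operation, and calls `session.end()` in a `finally`. The sketch below shows one possible way to factor that into an async context manager. It is not the stagehand SDK: `FakeClient`, `FakeSessions` and `FakeSession` are hypothetical stand-ins that only mimic the `sessions.start(...)`, `navigate(url=...)` and `end()` calls used in the diff, and `navigated_session` is a hypothetical helper name.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class FakeSession:
    """Hypothetical stand-in for a Stagehand session; records lifecycle calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def navigate(self, url: str) -> None:
        self.calls.append(f"navigate:{url}")

    async def end(self) -> None:
        self.calls.append("end")


class FakeSessions:
    """Hypothetical stand-in for `client.sessions`; start() ignores its kwargs."""

    def __init__(self) -> None:
        self.started: list[FakeSession] = []

    async def start(self, **kwargs) -> FakeSession:
        session = FakeSession()
        self.started.append(session)
        return session


class FakeClient:
    """Hypothetical stand-in exposing only `client.sessions.start(...)`."""

    def __init__(self) -> None:
        self.sessions = FakeSessions()


@asynccontextmanager
async def navigated_session(client, url: str, **start_kwargs) -> AsyncIterator[FakeSession]:
    # Start outside the try: if start() itself fails there is no session to end.
    session = await client.sessions.start(**start_kwargs)
    try:
        await session.navigate(url=url)
        yield session
    finally:
        # Runs whether the body completes normally or raises.
        await session.end()


async def demo() -> FakeClient:
    client = FakeClient()
    async with navigated_session(client, "https://example.com", model_name="m") as s:
        s.calls.append("act")
    try:
        async with navigated_session(client, "https://example.org"):
            raise ValueError("action failed")
    except ValueError:
        pass
    return client


client = asyncio.run(demo())
print([s.calls for s in client.sessions.started])
```

With a helper along these lines, each block's `run` could wrap its single `observe`/`act`/`extract` call in one `async with`, keeping the cleanup guarantee in one place instead of three.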
--- a/autogpt_platform/backend/backend/blocks/test/test_llm.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_llm.py
@@ -1,9 +1,18 @@
+from typing import cast
 from unittest.mock import AsyncMock, MagicMock, patch

+import anthropic
+import httpx
+import openai
 import pytest

+import backend.blocks.llm as llm
 from backend.data.model import NodeExecutionStats

+# TEST_CREDENTIALS_INPUT is a plain dict that satisfies AICredentials at runtime
+# but not at the type level. Cast once here to avoid per-test suppressors.
+_TEST_AI_CREDENTIALS = cast(llm.AICredentials, llm.TEST_CREDENTIALS_INPUT)
+

 class TestLLMStatsTracking:
    """Test that LLM blocks correctly track token usage statistics."""
@@ -655,3 +664,148 @@ class TestAITextSummarizerValidation:
        error_message = str(exc_info.value)
        assert "Expected a string summary" in error_message
        assert "received dict" in error_message
+
+
+def _make_anthropic_status_error(status_code: int) -> anthropic.APIStatusError:
+    """Create an anthropic.APIStatusError with the given status code."""
+    request = httpx.Request("POST", "https://api.anthropic.com/v1/messages")
+    response = httpx.Response(status_code, request=request)
+    return anthropic.APIStatusError(
+        f"Error code: {status_code}", response=response, body=None
+    )
+
+
+def _make_openai_status_error(status_code: int) -> openai.APIStatusError:
+    """Create an openai.APIStatusError with the given status code."""
+    response = httpx.Response(
+        status_code, request=httpx.Request("POST", "https://api.openai.com/v1/chat")
+    )
+    return openai.APIStatusError(
+        f"Error code: {status_code}", response=response, body=None
+    )
+
+
+class TestUserErrorStatusCodeHandling:
+    """Test that user-caused LLM API errors (401/403/429) break the retry loop
+    and are logged as warnings, while server errors (500) trigger retries."""
+
+    @pytest.mark.asyncio
+    @pytest.mark.parametrize("status_code", [401, 403, 429])
+    async def test_anthropic_user_error_breaks_retry_loop(self, status_code: int):
+        """401/403/429 Anthropic errors should break immediately, not retry."""
+        import backend.blocks.llm as llm
+
+        block = llm.AIStructuredResponseGeneratorBlock()
+        call_count = 0
+
+        async def mock_llm_call(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            raise _make_anthropic_status_error(status_code)
+
+        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
+            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
+                prompt="Test",
+                expected_format={"key": "desc"},
+                model=llm.DEFAULT_LLM_MODEL,
+                credentials=_TEST_AI_CREDENTIALS,
+                retry=3,
+            )
+
+            with pytest.raises(RuntimeError):
+                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
+                    pass
+
+        assert (
+            call_count == 1
+        ), f"Expected exactly 1 call for status {status_code}, got {call_count}"
+
+    @pytest.mark.asyncio
+    @pytest.mark.parametrize("status_code", [401, 403, 429])
+    async def test_openai_user_error_breaks_retry_loop(self, status_code: int):
+        """401/403/429 OpenAI errors should break immediately, not retry."""
+        import backend.blocks.llm as llm
+
+        block = llm.AIStructuredResponseGeneratorBlock()
+        call_count = 0
+
+        async def mock_llm_call(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            raise _make_openai_status_error(status_code)
+
+        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
+            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
+                prompt="Test",
+                expected_format={"key": "desc"},
+                model=llm.DEFAULT_LLM_MODEL,
+                credentials=_TEST_AI_CREDENTIALS,
+                retry=3,
+            )
+
+            with pytest.raises(RuntimeError):
+                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
+                    pass
+
+        assert (
+            call_count == 1
+        ), f"Expected exactly 1 call for status {status_code}, got {call_count}"
+
+    @pytest.mark.asyncio
+    async def test_server_error_retries(self):
+        """500 errors should be retried (not break immediately)."""
+        import backend.blocks.llm as llm
+
+        block = llm.AIStructuredResponseGeneratorBlock()
+        call_count = 0
+
+        async def mock_llm_call(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            raise _make_anthropic_status_error(500)
+
+        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
+            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
+                prompt="Test",
+                expected_format={"key": "desc"},
+                model=llm.DEFAULT_LLM_MODEL,
+                credentials=_TEST_AI_CREDENTIALS,
+                retry=3,
+            )
+
+            with pytest.raises(RuntimeError):
+                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
+                    pass
+
+        assert (
+            call_count > 1
+        ), f"Expected multiple retry attempts for 500, got {call_count}"
+
+    @pytest.mark.asyncio
+    async def test_user_error_logs_warning_not_exception(self):
+        """User-caused errors should log with logger.warning, not logger.exception."""
+        import backend.blocks.llm as llm
+
+        block = llm.AIStructuredResponseGeneratorBlock()
+
+        async def mock_llm_call(*args, **kwargs):
+            raise _make_anthropic_status_error(401)
+
+        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
+            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
+                prompt="Test",
+                expected_format={"key": "desc"},
+                model=llm.DEFAULT_LLM_MODEL,
+                credentials=_TEST_AI_CREDENTIALS,
+            )
+
+            with (
+                patch.object(llm.logger, "warning") as mock_warning,
+                patch.object(llm.logger, "exception") as mock_exception,
+                pytest.raises(RuntimeError),
+            ):
+                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
+                    pass
+
+        mock_warning.assert_called_once()
+        mock_exception.assert_not_called()
--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker.py
@@ -57,7 +57,7 @@ async def execute_graph(
@pytest.mark.asyncio(loop_scope="session")
 async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):
    from backend.blocks.agent import AgentExecutorBlock
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock
    from backend.data import graph

    test_user = await create_test_user()
@@ -66,7 +66,7 @@ async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):

    nodes = [
        graph.Node(
-            block_id=SmartDecisionMakerBlock().id,
+            block_id=OrchestratorBlock().id,
            input_default={
                "prompt": "Hello, World!",
                "credentials": creds,
@@ -108,10 +108,10 @@ async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):


@pytest.mark.asyncio(loop_scope="session")
-async def test_smart_decision_maker_function_signature(server: SpinTestServer):
+async def test_orchestrator_function_signature(server: SpinTestServer):
    from backend.blocks.agent import AgentExecutorBlock
    from backend.blocks.basic import StoreValueBlock
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock
    from backend.data import graph

    test_user = await create_test_user()
@@ -120,7 +120,7 @@ async def test_smart_decision_maker_function_signature(server: SpinTestServer):

    nodes = [
        graph.Node(
-            block_id=SmartDecisionMakerBlock().id,
+            block_id=OrchestratorBlock().id,
            input_default={
                "prompt": "Hello, World!",
                "credentials": creds,
@@ -169,7 +169,7 @@ async def test_smart_decision_maker_function_signature(server: SpinTestServer):
    )
    test_graph = await create_graph(server, test_graph, test_user)

-    tool_functions = await SmartDecisionMakerBlock._create_tool_node_signatures(
+    tool_functions = await OrchestratorBlock._create_tool_node_signatures(
        test_graph.nodes[0].id
    )
    assert tool_functions is not None, "Tool functions should not be None"
@@ -198,12 +198,12 @@ async def test_smart_decision_maker_function_signature(server: SpinTestServer):


@pytest.mark.asyncio
-async def test_smart_decision_maker_tracks_llm_stats():
-    """Test that SmartDecisionMakerBlock correctly tracks LLM usage stats."""
+async def test_orchestrator_tracks_llm_stats():
+    """Test that OrchestratorBlock correctly tracks LLM usage stats."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Mock the llm.llm_call function to return controlled data
    mock_response = MagicMock()
@@ -224,14 +224,14 @@ async def test_smart_decision_maker_tracks_llm_stats():
        new_callable=AsyncMock,
        return_value=mock_response,
    ), patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=[],
    ):

        # Create test input
-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Should I continue with this task?",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -274,12 +274,12 @@ async def test_smart_decision_maker_tracks_llm_stats():


@pytest.mark.asyncio
-async def test_smart_decision_maker_parameter_validation():
-    """Test that SmartDecisionMakerBlock correctly validates tool call parameters."""
+async def test_orchestrator_parameter_validation():
+    """Test that OrchestratorBlock correctly validates tool call parameters."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Mock tool functions with specific parameter schema
    mock_tool_functions = [
@@ -327,13 +327,13 @@ async def test_smart_decision_maker_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_with_typo,
    ) as mock_llm_call, patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -394,13 +394,13 @@ async def test_smart_decision_maker_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_missing_required,
    ), patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -454,13 +454,13 @@ async def test_smart_decision_maker_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_valid,
    ), patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -518,13 +518,13 @@ async def test_smart_decision_maker_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_all_params,
    ), patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -562,12 +562,12 @@ async def test_smart_decision_maker_parameter_validation():


@pytest.mark.asyncio
-async def test_smart_decision_maker_raw_response_conversion():
-    """Test that SmartDecisionMaker correctly handles different raw_response types with retry mechanism."""
+async def test_orchestrator_raw_response_conversion():
+    """Test that OrchestratorBlock correctly handles different raw_response types with the retry mechanism."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Mock tool functions
    mock_tool_functions = [
@@ -637,7 +637,7 @@ async def test_smart_decision_maker_raw_response_conversion():
    with patch(
        "backend.blocks.llm.llm_call", new_callable=AsyncMock
    ) as mock_llm_call, patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
@@ -646,7 +646,7 @@ async def test_smart_decision_maker_raw_response_conversion():
        # Second call returns successful response
        mock_llm_call.side_effect = [mock_response_retry, mock_response_success]

-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Test prompt",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -715,12 +715,12 @@ async def test_smart_decision_maker_raw_response_conversion():
        new_callable=AsyncMock,
        return_value=mock_response_ollama,
    ), patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=[],  # No tools for this test
    ):
-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Simple prompt",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -771,12 +771,12 @@ async def test_smart_decision_maker_raw_response_conversion():
        new_callable=AsyncMock,
        return_value=mock_response_dict,
    ), patch.object(
-        SmartDecisionMakerBlock,
+        OrchestratorBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=[],
    ):
-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Another test",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -811,12 +811,12 @@ async def test_smart_decision_maker_raw_response_conversion():


@pytest.mark.asyncio
-async def test_smart_decision_maker_agent_mode():
+async def test_orchestrator_agent_mode():
    """Test that agent mode executes tools directly and loops until finished."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Mock tool call that requires multiple iterations
    mock_tool_call_1 = MagicMock()
@@ -893,7 +893,7 @@ async def test_smart_decision_maker_agent_mode():
    with patch("backend.blocks.llm.llm_call", llm_call_mock), patch.object(
        block, "_create_tool_node_signatures", return_value=mock_tool_signatures
    ), patch(
-        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
+        "backend.blocks.orchestrator.get_database_manager_async_client",
        return_value=mock_db_client,
    ), patch(
        "backend.executor.manager.async_update_node_execution_status",
@@ -929,7 +929,7 @@ async def test_smart_decision_maker_agent_mode():
        }

        # Test agent mode with max_iterations = 3
-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Complete this task using tools",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -969,12 +969,12 @@ async def test_smart_decision_maker_agent_mode():


@pytest.mark.asyncio
-async def test_smart_decision_maker_traditional_mode_default():
+async def test_orchestrator_traditional_mode_default():
    """Test that default behavior (agent_mode_max_iterations=0) works as traditional mode."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Mock tool call
    mock_tool_call = MagicMock()
@@ -1018,7 +1018,7 @@ async def test_smart_decision_maker_traditional_mode_default():
    ):

        # Test default behavior (traditional mode)
-        input_data = SmartDecisionMakerBlock.Input(
+        input_data = OrchestratorBlock.Input(
            prompt="Test prompt",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -1060,12 +1060,12 @@ async def test_smart_decision_maker_traditional_mode_default():


@pytest.mark.asyncio
-async def test_smart_decision_maker_uses_customized_name_for_blocks():
-    """Test that SmartDecisionMakerBlock uses customized_name from node metadata for tool names."""
+async def test_orchestrator_uses_customized_name_for_blocks():
+    """Test that OrchestratorBlock uses customized_name from node metadata for tool names."""
    from unittest.mock import MagicMock

    from backend.blocks.basic import StoreValueBlock
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock
    from backend.data.graph import Link, Node

    # Create a mock node with customized_name in metadata
@@ -1080,7 +1080,7 @@ async def test_smart_decision_maker_uses_customized_name_for_blocks():
    mock_link.sink_name = "input"

    # Call the function directly
-    result = await SmartDecisionMakerBlock._create_block_function_signature(
+    result = await OrchestratorBlock._create_block_function_signature(
        mock_node, [mock_link]
    )

@@ -1091,12 +1091,12 @@ async def test_smart_decision_maker_uses_customized_name_for_blocks():


@pytest.mark.asyncio
-async def test_smart_decision_maker_falls_back_to_block_name():
-    """Test that SmartDecisionMakerBlock falls back to block.name when no customized_name."""
+async def test_orchestrator_falls_back_to_block_name():
+    """Test that OrchestratorBlock falls back to block.name when no customized_name."""
    from unittest.mock import MagicMock

    from backend.blocks.basic import StoreValueBlock
-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock
    from backend.data.graph import Link, Node

    # Create a mock node without customized_name
@@ -1111,7 +1111,7 @@ async def test_smart_decision_maker_falls_back_to_block_name():
    mock_link.sink_name = "input"

    # Call the function directly
-    result = await SmartDecisionMakerBlock._create_block_function_signature(
+    result = await OrchestratorBlock._create_block_function_signature(
        mock_node, [mock_link]
    )

@@ -1122,11 +1122,11 @@ async def test_smart_decision_maker_falls_back_to_block_name():


@pytest.mark.asyncio
-async def test_smart_decision_maker_uses_customized_name_for_agents():
-    """Test that SmartDecisionMakerBlock uses customized_name from metadata for agent nodes."""
+async def test_orchestrator_uses_customized_name_for_agents():
+    """Test that OrchestratorBlock uses customized_name from metadata for agent nodes."""
    from unittest.mock import AsyncMock, MagicMock, patch

-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock
    from backend.data.graph import Link, Node

    # Create a mock node with customized_name in metadata
@@ -1152,10 +1152,10 @@ async def test_smart_decision_maker_uses_customized_name_for_agents():
    mock_db_client.get_graph_metadata.return_value = mock_graph_meta

    with patch(
-        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
+        "backend.blocks.orchestrator.get_database_manager_async_client",
        return_value=mock_db_client,
    ):
-        result = await SmartDecisionMakerBlock._create_agent_function_signature(
+        result = await OrchestratorBlock._create_agent_function_signature(
            mock_node, [mock_link]
        )

@@ -1166,11 +1166,11 @@ async def test_smart_decision_maker_uses_customized_name_for_agents():


@pytest.mark.asyncio
-async def test_smart_decision_maker_agent_falls_back_to_graph_name():
+async def test_orchestrator_agent_falls_back_to_graph_name():
    """Test that agent node falls back to graph name when no customized_name."""
    from unittest.mock import AsyncMock, MagicMock, patch

-    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+    from backend.blocks.orchestrator import OrchestratorBlock
    from backend.data.graph import Link, Node

    # Create a mock node without customized_name
@@ -1196,10 +1196,10 @@ async def test_smart_decision_maker_agent_falls_back_to_graph_name():
    mock_db_client.get_graph_metadata.return_value = mock_graph_meta

    with patch(
-        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
+        "backend.blocks.orchestrator.get_database_manager_async_client",
        return_value=mock_db_client,
    ):
-        result = await SmartDecisionMakerBlock._create_agent_function_signature(
+        result = await OrchestratorBlock._create_agent_function_signature(
            mock_node, [mock_link]
        )

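The rename also moves every `patch(...)` target from `backend.blocks.smart_decision_maker...` to `backend.blocks.orchestrator...`. `unittest.mock.patch` replaces a name on the module where the code looks it up, so the target has to follow the block into its new module. A self-contained illustration follows. The `demo_helpers` and `demo_orchestrator` modules are hypothetical and are built at runtime for the example.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical module that *defines* the function (like a db client helper module).
helpers = types.ModuleType("demo_helpers")
exec("def get_client():\n    return 'real client'\n", helpers.__dict__)
sys.modules["demo_helpers"] = helpers

# Hypothetical module that *imports and uses* it (like backend.blocks.orchestrator).
consumer = types.ModuleType("demo_orchestrator")
exec(
    "from demo_helpers import get_client\n"
    "def run():\n"
    "    return get_client()\n",
    consumer.__dict__,
)
sys.modules["demo_orchestrator"] = consumer

# Patching the defining module leaves the name already bound in the consumer intact.
with patch("demo_helpers.get_client", return_value="mock"):
    via_definition = consumer.run()

# Patching the module where the name is looked up takes effect.
with patch("demo_orchestrator.get_client", return_value="mock"):
    via_lookup = consumer.run()

print(via_definition, via_lookup)  # real client mock
```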
--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dict.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dict.py
@@ -3,12 +3,12 @@ from unittest.mock import Mock
 import pytest

 from backend.blocks.data_manipulation import AddToListBlock, CreateDictionaryBlock
-from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+from backend.blocks.orchestrator import OrchestratorBlock


@pytest.mark.asyncio
-async def test_smart_decision_maker_handles_dynamic_dict_fields():
-    """Test Smart Decision Maker can handle dynamic dictionary fields (_#_) for any block"""
+async def test_orchestrator_handles_dynamic_dict_fields():
+    """Test that OrchestratorBlock handles dynamic dictionary fields (_#_) for any block."""

    # Create a mock node for CreateDictionaryBlock
    mock_node = Mock()
@@ -23,24 +23,24 @@ async def test_smart_decision_maker_handles_dynamic_dict_fields():
            source_name="tools_^_create_dict_~_name",
            sink_name="values_#_name",  # Dynamic dict field
            sink_id="dict_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_age",
            sink_name="values_#_age",  # Dynamic dict field
            sink_id="dict_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_city",
            sink_name="values_#_city",  # Dynamic dict field
            sink_id="dict_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
    ]

    # Generate function signature
-    signature = await SmartDecisionMakerBlock._create_block_function_signature(
+    signature = await OrchestratorBlock._create_block_function_signature(
        mock_node, mock_links  # type: ignore
    )

@@ -70,8 +70,8 @@ async def test_smart_decision_maker_handles_dynamic_dict_fields():


@pytest.mark.asyncio
-async def test_smart_decision_maker_handles_dynamic_list_fields():
-    """Test Smart Decision Maker can handle dynamic list fields (_$_) for any block"""
+async def test_orchestrator_handles_dynamic_list_fields():
+    """Test that OrchestratorBlock handles dynamic list fields (_$_) for any block."""

    # Create a mock node for AddToListBlock
    mock_node = Mock()
@@ -86,18 +86,18 @@ async def test_smart_decision_maker_handles_dynamic_list_fields():
            source_name="tools_^_add_to_list_~_0",
            sink_name="entries_$_0",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_add_to_list_~_1",
            sink_name="entries_$_1",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
    ]

    # Generate function signature
-    signature = await SmartDecisionMakerBlock._create_block_function_signature(
+    signature = await OrchestratorBlock._create_block_function_signature(
        mock_node, mock_links  # type: ignore
    )
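The tests above wire tool arguments into sink names that use dynamic-field delimiters: `_#_` for dictionary keys (`values_#_name`) and `_$_` for list indices (`entries_$_0`). The companion `test_smart_decision_maker_dynamic_fields.py` tests also use `_@_` for object attributes (`data_@_user_name`). Below is a minimal sketch of how such a sink name could be split into its base field and key. `parse_dynamic_sink` and `DynamicSink` are hypothetical helpers, not the API of `backend.data.dynamic_fields`, and the delimiter-to-kind mapping is inferred from the test comments.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

# Hypothetical: delimiter -> container kind, inferred from the test comments.
Kind = Literal["dict", "list", "object"]
_DELIMITERS: dict[str, Kind] = {"_#_": "dict", "_$_": "list", "_@_": "object"}


@dataclass(frozen=True)
class DynamicSink:
    base: str  # field on the sink block, e.g. "values"
    kind: Kind | None  # None for a regular (non-dynamic) field
    key: str | None  # dict key, list index or attribute name


def parse_dynamic_sink(sink_name: str) -> DynamicSink:
    """Split a sink name such as 'values_#_name' at its left-most delimiter."""
    hits = [(sink_name.find(d), d) for d in _DELIMITERS if d in sink_name]
    if not hits:
        return DynamicSink(base=sink_name, kind=None, key=None)
    pos, delim = min(hits)
    return DynamicSink(
        base=sink_name[:pos],
        kind=_DELIMITERS[delim],
        key=sink_name[pos + len(delim) :],
    )
```

For example, `parse_dynamic_sink("entries_$_0")` gives `DynamicSink(base='entries', kind='list', key='0')`. Splitting at the left-most delimiter only parses the outermost level. Nested names would need a recursive call on `key`.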

--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dynamic_fields.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dynamic_fields.py
@@ -1,4 +1,4 @@
-"""Comprehensive tests for SmartDecisionMakerBlock dynamic field handling."""
+"""Comprehensive tests for OrchestratorBlock dynamic field handling."""

 import json
 from unittest.mock import AsyncMock, MagicMock, Mock, patch
@@ -6,7 +6,7 @@ from unittest.mock import AsyncMock, MagicMock, Mock, patch
 import pytest

 from backend.blocks.data_manipulation import AddToListBlock, CreateDictionaryBlock
-from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
+from backend.blocks.orchestrator import OrchestratorBlock
 from backend.blocks.text import MatchTextPatternBlock
 from backend.data.dynamic_fields import get_dynamic_field_description

@@ -37,7 +37,7 @@ async def test_dynamic_field_description_generation():
@pytest.mark.asyncio
 async def test_create_block_function_signature_with_dict_fields():
    """Test that function signatures are created correctly for dictionary dynamic fields."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Create a mock node for CreateDictionaryBlock
    mock_node = Mock()
@@ -52,19 +52,19 @@ async def test_create_block_function_signature_with_dict_fields():
            source_name="tools_^_create_dict_~_values___name",  # Sanitized source
            sink_name="values_#_name",  # Original sink
            sink_id="dict_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_values___age",  # Sanitized source
            sink_name="values_#_age",  # Original sink
            sink_id="dict_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_values___email",  # Sanitized source
            sink_name="values_#_email",  # Original sink
            sink_id="dict_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
    ]

@@ -100,7 +100,7 @@ async def test_create_block_function_signature_with_dict_fields():
@pytest.mark.asyncio
 async def test_create_block_function_signature_with_list_fields():
    """Test that function signatures are created correctly for list dynamic fields."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Create a mock node for AddToListBlock
    mock_node = Mock()
@@ -115,19 +115,19 @@ async def test_create_block_function_signature_with_list_fields():
            source_name="tools_^_add_list_~_0",
            sink_name="entries_$_0",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_add_list_~_1",
            sink_name="entries_$_1",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_add_list_~_2",
            sink_name="entries_$_2",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
    ]

@@ -154,7 +154,7 @@ async def test_create_block_function_signature_with_list_fields():
@pytest.mark.asyncio
 async def test_create_block_function_signature_with_object_fields():
    """Test that function signatures are created correctly for object dynamic fields."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Create a mock node for MatchTextPatternBlock (simulating object fields)
    mock_node = Mock()
@@ -169,13 +169,13 @@ async def test_create_block_function_signature_with_object_fields():
            source_name="tools_^_extract_~_user_name",
            sink_name="data_@_user_name",  # Dynamic object field
            sink_id="extract_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_extract_~_user_email",
            sink_name="data_@_user_email",  # Dynamic object field
            sink_id="extract_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
    ]

@@ -197,11 +197,11 @@ async def test_create_block_function_signature_with_object_fields():
@pytest.mark.asyncio
 async def test_create_tool_node_signatures():
    """Test that the mapping between sanitized and original field names is built correctly."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Mock the database client and connected nodes
    with patch(
-        "backend.blocks.smart_decision_maker.get_database_manager_async_client"
+        "backend.blocks.orchestrator.get_database_manager_async_client"
    ) as mock_db:
        mock_client = AsyncMock()
        mock_db.return_value = mock_client
@@ -281,7 +281,7 @@ async def test_create_tool_node_signatures():
@pytest.mark.asyncio
 async def test_output_yielding_with_dynamic_fields():
    """Test that outputs are yielded correctly with dynamic field names mapped back."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # No more sanitized mapping needed since we removed sanitization

@@ -309,13 +309,13 @@ async def test_output_yielding_with_dynamic_fields():

    # Mock the LLM call
    with patch(
-        "backend.blocks.smart_decision_maker.llm.llm_call", new_callable=AsyncMock
+        "backend.blocks.orchestrator.llm.llm_call", new_callable=AsyncMock
    ) as mock_llm:
        mock_llm.return_value = mock_response

        # Mock the database manager to avoid HTTP calls during tool execution
        with patch(
-            "backend.blocks.smart_decision_maker.get_database_manager_async_client"
+            "backend.blocks.orchestrator.get_database_manager_async_client"
        ) as mock_db_manager, patch.object(
            block, "_create_tool_node_signatures", new_callable=AsyncMock
        ) as mock_sig:
@@ -420,7 +420,7 @@ async def test_output_yielding_with_dynamic_fields():
@pytest.mark.asyncio
 async def test_mixed_regular_and_dynamic_fields():
    """Test handling of blocks with both regular and dynamic fields."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Create a mock node
    mock_node = Mock()
@@ -450,19 +450,19 @@ async def test_mixed_regular_and_dynamic_fields():
            source_name="tools_^_test_~_regular",
            sink_name="regular_field",  # Regular field
            sink_id="test_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_test_~_dict_key",
            sink_name="values_#_key1",  # Dynamic dict field
            sink_id="test_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
        Mock(
            source_name="tools_^_test_~_dict_key2",
            sink_name="values_#_key2",  # Dynamic dict field
            sink_id="test_node_id",
-            source_id="smart_decision_node_id",
+            source_id="orchestrator_node_id",
        ),
    ]

@@ -488,7 +488,7 @@ async def test_mixed_regular_and_dynamic_fields():
@pytest.mark.asyncio
 async def test_validation_errors_dont_pollute_conversation():
    """Test that validation errors are only used during retries and don't pollute the conversation."""
-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # Track conversation history changes
    conversation_snapshots = []
@@ -535,7 +535,7 @@ async def test_validation_errors_dont_pollute_conversation():

    # Mock the LLM call
    with patch(
-        "backend.blocks.smart_decision_maker.llm.llm_call", new_callable=AsyncMock
+        "backend.blocks.orchestrator.llm.llm_call", new_callable=AsyncMock
    ) as mock_llm:
        mock_llm.side_effect = mock_llm_call

@@ -565,7 +565,7 @@ async def test_validation_errors_dont_pollute_conversation():

            # Mock the database manager to avoid HTTP calls during tool execution
            with patch(
-                "backend.blocks.smart_decision_maker.get_database_manager_async_client"
+                "backend.blocks.orchestrator.get_database_manager_async_client"
            ) as mock_db_manager:
                # Set up the mock database manager for agent mode
                mock_db_client = AsyncMock()
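Most changes in this file only retarget `patch()` strings from `backend.blocks.smart_decision_maker...` to `backend.blocks.orchestrator...`. `unittest.mock.patch` replaces a name in the namespace where it is looked up, so the target must name the module that the code under test actually reads from. The self-contained sketch below uses a throwaway module. `orchestrator_demo` and `smart_decision_maker_demo` are hypothetical stand-ins, not the real block modules. It shows that the correct target takes effect and that a target naming a module that no longer exists fails loudly when the patch starts. If an old module path were still importable, for example as a re-export shim, patching it would succeed but would not affect the code under test.

```python
import sys
import types
from unittest.mock import patch

# Source for a throwaway module; run() resolves llm_call from its own globals.
_SRC = '''
def llm_call():
    return "real"

def run():
    return llm_call()
'''

orchestrator_demo = types.ModuleType("orchestrator_demo")
exec(_SRC, orchestrator_demo.__dict__)
sys.modules["orchestrator_demo"] = orchestrator_demo


def run_patched() -> str:
    # Patch the name where run() looks it up: orchestrator_demo.llm_call.
    with patch("orchestrator_demo.llm_call", return_value="mocked"):
        return orchestrator_demo.run()


def stale_target_fails() -> bool:
    # The target module is imported when the patch starts; a missing one raises.
    try:
        with patch("smart_decision_maker_demo.llm_call"):
            pass
    except ImportError:
        return True
    return False
```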
--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_responses_api.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_responses_api.py
@@ -1,6 +1,6 @@
-"""Tests for SmartDecisionMakerBlock compatibility with the OpenAI Responses API.
+"""Tests for OrchestratorBlock compatibility with the OpenAI Responses API.

-The SmartDecisionMakerBlock manages conversation history in the Chat Completions
+The OrchestratorBlock manages conversation history in the Chat Completions
 format, but OpenAI models now use the Responses API which has a fundamentally
 different conversation structure.  These tests document:

@@ -27,8 +27,8 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from backend.blocks.smart_decision_maker import (
-    SmartDecisionMakerBlock,
+from backend.blocks.orchestrator import (
+    OrchestratorBlock,
    _combine_tool_responses,
    _convert_raw_response_to_dict,
    _create_tool_response,
@@ -733,7 +733,7 @@ class TestUpdateConversation:

    def test_dict_raw_response_no_reasoning_no_tools(self):
        """Dict raw_response, no reasoning → appends assistant dict."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        resp = self._make_response({"role": "assistant", "content": "hi"})
        block._update_conversation(prompt, resp)
@@ -741,7 +741,7 @@ class TestUpdateConversation:

    def test_dict_raw_response_with_reasoning_no_tool_calls(self):
        """Reasoning present, no tool calls → reasoning prepended."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        resp = self._make_response(
            {"role": "assistant", "content": "answer"},
@@ -757,7 +757,7 @@ class TestUpdateConversation:

    def test_dict_raw_response_with_reasoning_and_anthropic_tool_calls(self):
        """Reasoning + Anthropic tool_use in content → reasoning skipped."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        raw = {
            "role": "assistant",
@@ -772,7 +772,7 @@ class TestUpdateConversation:

    def test_with_tool_outputs(self):
        """Tool outputs → extended onto prompt."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        resp = self._make_response({"role": "assistant", "content": None})
        outputs = [{"role": "tool", "tool_call_id": "call_1", "content": "r"}]
@@ -782,7 +782,7 @@ class TestUpdateConversation:

    def test_without_tool_outputs(self):
        """No tool outputs → only assistant message appended."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        resp = self._make_response({"role": "assistant", "content": "done"})
        block._update_conversation(prompt, resp, None)
@@ -790,7 +790,7 @@ class TestUpdateConversation:

    def test_string_raw_response(self):
        """Ollama string → wrapped as assistant dict."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        resp = self._make_response("hello from ollama")
        block._update_conversation(prompt, resp)
@@ -800,7 +800,7 @@ class TestUpdateConversation:

    def test_responses_api_text_response_produces_valid_items(self):
        """Responses API text response → conversation items must have valid role."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = [
            {"role": "system", "content": "sys"},
            {"role": "user", "content": "user"},
@@ -820,7 +820,7 @@ class TestUpdateConversation:

    def test_responses_api_function_call_produces_valid_items(self):
        """Responses API function_call → conversation items must have valid type."""
-        block = SmartDecisionMakerBlock()
+        block = OrchestratorBlock()
        prompt: list[dict] = []
        resp = self._make_response(
            _MockResponse(output=[_MockFunctionCall("tool", "{}", call_id="call_1")])
@@ -856,7 +856,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
    """
    import backend.blocks.llm as llm_module

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    # First response: tool call
    mock_tc = MagicMock()
@@ -936,7 +936,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
    with patch("backend.blocks.llm.llm_call", llm_mock), patch.object(
        block, "_create_tool_node_signatures", return_value=tool_sigs
    ), patch(
-        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
+        "backend.blocks.orchestrator.get_database_manager_async_client",
        return_value=mock_db,
    ), patch(
        "backend.executor.manager.async_update_node_execution_status",
@@ -945,7 +945,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
        "backend.integrations.creds_manager.IntegrationCredentialsManager"
    ):

-        inp = SmartDecisionMakerBlock.Input(
+        inp = OrchestratorBlock.Input(
            prompt="Improve this",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -992,7 +992,7 @@ async def test_traditional_mode_conversation_valid_for_responses_api():
    """Traditional mode: the yielded conversation must contain only valid items."""
    import backend.blocks.llm as llm_module

-    block = SmartDecisionMakerBlock()
+    block = OrchestratorBlock()

    mock_tc = MagicMock()
    mock_tc.function.name = "my_tool"
@@ -1028,7 +1028,7 @@ async def test_traditional_mode_conversation_valid_for_responses_api():
        "backend.blocks.llm.llm_call", new_callable=AsyncMock, return_value=resp
    ), patch.object(block, "_create_tool_node_signatures", return_value=tool_sigs):

-        inp = SmartDecisionMakerBlock.Input(
+        inp = OrchestratorBlock.Input(
            prompt="Do it",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
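The module docstring above notes that the block keeps history in the Chat Completions format, while OpenAI models now use the Responses API. The sketch below illustrates the shape difference the tests guard against. It is not the module's `_convert_raw_response_to_dict` or `_create_tool_response`, and `chat_to_responses_items` / `is_valid_item` are hypothetical helpers. A Chat Completions `tool` message corresponds to a Responses API `function_call_output` item keyed by `call_id`, and each assistant `tool_calls` entry corresponds to a `function_call` item. The validity check only covers the item kinds used here.

```python
import json
from typing import Any

_VALID_ROLES = {"system", "developer", "user", "assistant"}
_VALID_TYPES = {"function_call", "function_call_output"}


def chat_to_responses_items(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert one Chat Completions message into Responses API input items."""
    role = message.get("role")
    if role == "tool":
        # Tool results become function_call_output items keyed by call_id.
        return [
            {
                "type": "function_call_output",
                "call_id": message["tool_call_id"],
                "output": message.get("content") or "",
            }
        ]
    if role == "assistant" and message.get("tool_calls"):
        items: list[dict[str, Any]] = []
        if message.get("content"):
            items.append({"role": "assistant", "content": message["content"]})
        for call in message["tool_calls"]:
            fn = call["function"]
            args = fn["arguments"]
            items.append(
                {
                    "type": "function_call",
                    "call_id": call["id"],
                    "name": fn["name"],
                    # Arguments are carried as a JSON string.
                    "arguments": args if isinstance(args, str) else json.dumps(args),
                }
            )
        return items
    # Plain system/user/assistant messages keep their role/content shape.
    return [{"role": role, "content": message.get("content") or ""}]


def is_valid_item(item: dict[str, Any]) -> bool:
    """Sketch-level check: an item needs a known role or a known type."""
    return item.get("role") in _VALID_ROLES or item.get("type") in _VALID_TYPES
```

A raw `{"role": "tool", ...}` dict fails `is_valid_item`, which is the kind of leftover Chat Completions item the "valid role / valid type" tests above are meant to catch.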
--- a/autogpt_platform/backend/backend/copilot/context.py
+++ b/autogpt_platform/backend/backend/copilot/context.py
@@ -17,6 +17,9 @@ from backend.util.workspace import WorkspaceManager
 if TYPE_CHECKING:
    from e2b import AsyncSandbox

+    from backend.copilot.permissions import CopilotPermissions
+
+
 # Allowed base directory for the Read tool.  Public so service.py can use it
 # for sweep operations without depending on a private implementation detail.
 # Respects CLAUDE_CONFIG_DIR env var, consistent with transcript.py's
@@ -43,6 +46,12 @@ _current_sandbox: ContextVar["AsyncSandbox | None"] = ContextVar(
 )
 _current_sdk_cwd: ContextVar[str] = ContextVar("_current_sdk_cwd", default="")

+# Current execution's capability filter.  None means "no restrictions".
+# Set by set_execution_context(); read by run_block and service.py.
+_current_permissions: "ContextVar[CopilotPermissions | None]" = ContextVar(
+    "_current_permissions", default=None
+)
+

 def encode_cwd_for_cli(cwd: str) -> str:
    """Encode a working directory path the same way the Claude CLI does.
@@ -63,6 +72,7 @@ def set_execution_context(
    session: ChatSession,
    sandbox: "AsyncSandbox | None" = None,
    sdk_cwd: str | None = None,
+    permissions: "CopilotPermissions | None" = None,
 ) -> None:
    """Set per-turn context variables used by file-resolution tool handlers."""
    _current_user_id.set(user_id)
@@ -70,6 +80,7 @@ def set_execution_context(
    _current_sandbox.set(sandbox)
    _current_sdk_cwd.set(sdk_cwd or "")
    _current_project_dir.set(_encode_cwd_for_cli(sdk_cwd) if sdk_cwd else "")
+    _current_permissions.set(permissions)


 def get_execution_context() -> tuple[str | None, ChatSession | None]:
@@ -77,6 +88,11 @@ def get_execution_context() -> tuple[str | None, ChatSession | None]:
    return _current_user_id.get(), _current_session.get()


+def get_current_permissions() -> "CopilotPermissions | None":
+    """Return the capability filter for the current execution, or None if unrestricted."""
+    return _current_permissions.get()
+
+
 def get_current_sandbox() -> "AsyncSandbox | None":
    """Return the E2B sandbox for the current session, or None if not active."""
    return _current_sandbox.get()
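`_current_permissions` is a `ContextVar`, so a value stored by `set_execution_context()` is scoped to the current execution context. Each asyncio task runs in a copy of the context that was current when the task was created, so concurrently running turns do not see each other's filters. The stdlib-only sketch below demonstrates that property, assuming each turn runs in its own task. `Perms` and `handle_turn` are hypothetical stand-ins for `CopilotPermissions` and a per-turn handler.

```python
from __future__ import annotations

import asyncio
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class Perms:
    # Hypothetical stand-in for CopilotPermissions.
    tools: tuple[str, ...] = ()


_perms: ContextVar[Perms | None] = ContextVar("_perms", default=None)


def get_perms() -> Perms | None:
    return _perms.get()


async def handle_turn(name: str, perms: Perms | None) -> tuple[str, Perms | None]:
    _perms.set(perms)  # visible only inside this task's copy of the context
    await asyncio.sleep(0)  # yield so the other turn runs in between
    return name, get_perms()


async def main() -> list[tuple[str, Perms | None]]:
    restricted = Perms(tools=("run_block",))
    # gather() wraps each coroutine in its own task, each with a context copy.
    return await asyncio.gather(
        handle_turn("restricted", restricted),
        handle_turn("open", None),
    )
```

Running `asyncio.run(main())` returns each turn's own value, and the caller's context still reports `None` afterwards.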
--- a/autogpt_platform/backend/backend/copilot/context_test.py
+++ b/autogpt_platform/backend/backend/copilot/context_test.py
@@ -11,6 +11,7 @@ import pytest
 from backend.copilot.context import (
    SDK_PROJECTS_DIR,
    _current_project_dir,
+    get_current_permissions,
    get_current_sandbox,
    get_execution_context,
    get_sdk_cwd,
@@ -18,6 +19,7 @@ from backend.copilot.context import (
    resolve_sandbox_path,
    set_execution_context,
 )
+from backend.copilot.permissions import CopilotPermissions


 def _make_session() -> MagicMock:
@@ -61,6 +63,19 @@ def test_get_current_sandbox_returns_set_value():
    assert get_current_sandbox() is mock_sandbox


+def test_set_and_get_current_permissions():
+    """set_execution_context stores permissions; get_current_permissions returns it."""
+    perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+    set_execution_context("u1", _make_session(), permissions=perms)
+    assert get_current_permissions() is perms
+
+
+def test_get_current_permissions_defaults_to_none():
+    """get_current_permissions returns None when no permissions have been set."""
+    set_execution_context("u1", _make_session())
+    assert get_current_permissions() is None
+
+
 def test_get_sdk_cwd_empty_when_not_set():
    """get_sdk_cwd returns empty string when sdk_cwd is not set."""
    set_execution_context("u1", _make_session(), sdk_cwd=None)
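The new tests build `CopilotPermissions(tools=["run_block"], tools_exclude=False)`, which is a whitelist. The `permissions.py` module in this diff documents `tools_exclude=True` as a blacklist and `False` as a whitelist, with an empty list meaning no filtering. It also merges a child with its parent by intersecting their effective-allowed sets. Below is a minimal standalone sketch of that set arithmetic over a toy tool universe. `ALL_TOOLS`, `effective_allowed` and `merge` are hypothetical names. Because `merge` returns the intersection set directly, an empty intersection stays "nothing allowed" instead of being re-encoded as a list.

```python
from __future__ import annotations

# Toy universe; the real one is ALL_TOOL_NAMES, derived from the ToolName Literal.
ALL_TOOLS = frozenset({"run_block", "web_fetch", "find_block", "Read", "Write"})


def effective_allowed(
    tools: list[str], tools_exclude: bool, universe: frozenset[str] = ALL_TOOLS
) -> frozenset[str]:
    """Blacklist when tools_exclude is True, whitelist otherwise; empty = no filter."""
    if not tools:
        return universe
    listed = frozenset(tools)
    return universe - listed if tools_exclude else universe & listed


def merge(
    child: tuple[list[str], bool], parent: tuple[list[str], bool]
) -> frozenset[str]:
    """A child may only narrow what its parent allows (set intersection)."""
    return effective_allowed(*child) & effective_allowed(*parent)
```

For example, a child that blacklists `Write` under a parent that whitelists `run_block` and `Write` ends up with only `run_block`.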
--- a/autogpt_platform/backend/backend/copilot/permissions.py
+++ b/autogpt_platform/backend/backend/copilot/permissions.py
@@ -0,0 +1,430 @@
+"""Copilot execution permissions — tool and block allow/deny filtering.
+
+:class:`CopilotPermissions` is the single model used everywhere:
+
+- ``AutoPilotBlock`` reads four block-input fields and builds one instance.
+- ``stream_chat_completion_sdk`` applies it when constructing
+  ``ClaudeAgentOptions.allowed_tools`` / ``disallowed_tools``.
+- ``run_block`` reads it from the contextvar to gate block execution.
+- Recursive (sub-agent) invocations merge parent and child so children
+  can only be *more* restrictive, never more permissive.
+
+Tool names
+----------
+Users specify the **short name** as it appears in ``TOOL_REGISTRY`` (e.g.
+``run_block``, ``web_fetch``) or as an SDK built-in (e.g. ``Read``,
+``Task``, ``WebSearch``).  Internally these are mapped to the full SDK
+format (``mcp__copilot__run_block``, ``Read``, …) by
+:func:`apply_tool_permissions`.
+
+Block identifiers
+-----------------
+Each entry in ``blocks`` may be one of:
+
+- A **full UUID** (``c069dc6b-c3ed-4c12-b6e5-d47361e64ce6``)
+- A **partial UUID** — the first 8-character hex segment (``c069dc6b``)
+- A **block name** (case-insensitive, e.g. ``"HTTP Request"``)
+
+:func:`validate_block_identifiers` resolves all entries against the live
+block registry and returns any that could not be matched.
+
+Semantics
+---------
+``tools_exclude=True``  (default) — ``tools`` is a **blacklist**; listed
+tools are denied and everything else is allowed.  An empty list means
+"allow all" (no filtering).
+
+``tools_exclude=False`` — ``tools`` is a **whitelist**; only listed tools
+are allowed.  An empty list is likewise treated as "allow all".
+
+``blocks_exclude`` follows the same pattern for ``blocks``.
+
+Recursion inheritance
+---------------------
+:meth:`CopilotPermissions.merged_with_parent` produces a new instance that
+is at most as permissive as the parent:
+
+- Tools: effective-allowed sets are intersected then stored as a whitelist.
+- Blocks: the parent is stored in ``_parent`` and consulted during every
+  :meth:`is_block_allowed` call so both constraints must pass.
+"""
+
+from __future__ import annotations
+
+import re
+from typing import Literal, get_args
+
+from pydantic import BaseModel, PrivateAttr
+
+# ---------------------------------------------------------------------------
+# Constants — single source of truth for all accepted tool names
+# ---------------------------------------------------------------------------
+
+# Literal type combining all valid tool names — used by AutoPilotBlock.Input
+# so the frontend renders a multi-select dropdown.
+# This is the SINGLE SOURCE OF TRUTH.  All other name sets are derived from it.
+ToolName = Literal[
+    # Platform tools (must match keys in TOOL_REGISTRY)
+    "add_understanding",
+    "bash_exec",
+    "browser_act",
+    "browser_navigate",
+    "browser_screenshot",
+    "connect_integration",
+    "continue_run_block",
+    "create_agent",
+    "create_feature_request",
+    "create_folder",
+    "customize_agent",
+    "delete_folder",
+    "delete_workspace_file",
+    "edit_agent",
+    "find_agent",
+    "find_block",
+    "find_library_agent",
+    "fix_agent_graph",
+    "get_agent_building_guide",
+    "get_doc_page",
+    "get_mcp_guide",
+    "list_folders",
+    "list_workspace_files",
+    "move_agents_to_folder",
+    "move_folder",
+    "read_workspace_file",
+    "run_agent",
+    "run_block",
+    "run_mcp_tool",
+    "search_docs",
+    "search_feature_requests",
+    "update_folder",
+    "validate_agent_graph",
+    "view_agent_output",
+    "web_fetch",
+    "write_workspace_file",
+    # SDK built-ins
+    "Edit",
+    "Glob",
+    "Grep",
+    "Read",
+    "Task",
+    "TodoWrite",
+    "WebSearch",
+    "Write",
+]
+
+# Frozen set of all valid tool names — derived from the Literal.
+ALL_TOOL_NAMES: frozenset[str] = frozenset(get_args(ToolName))
+
+# SDK built-in tool names — uppercase-initial names are SDK built-ins.
+SDK_BUILTIN_TOOL_NAMES: frozenset[str] = frozenset(
+    n for n in ALL_TOOL_NAMES if n[0].isupper()
+)
+
+# Platform tool names — everything that isn't an SDK built-in.
+PLATFORM_TOOL_NAMES: frozenset[str] = ALL_TOOL_NAMES - SDK_BUILTIN_TOOL_NAMES
+
+# Compiled regex patterns for block identifier classification.
+_FULL_UUID_RE = re.compile(
+    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
+    re.IGNORECASE,
+)
+_PARTIAL_UUID_RE = re.compile(r"^[0-9a-f]{8}$", re.IGNORECASE)
+
+
+# ---------------------------------------------------------------------------
+# Helper — block identifier matching
+# ---------------------------------------------------------------------------
+
+
+def _block_matches(identifier: str, block_id: str, block_name: str) -> bool:
+    """Return True if *identifier* resolves to the given block.
+
+    Resolution order:
+    1. Full UUID — exact case-insensitive match against *block_id*.
+    2. Partial UUID (8 hex chars, first segment) — prefix match.
+    3. Name — case-insensitive equality against *block_name*.
+    """
+    ident = identifier.strip()
+    if _FULL_UUID_RE.match(ident):
+        return ident.lower() == block_id.lower()
+    if _PARTIAL_UUID_RE.match(ident):
+        return block_id.lower().startswith(ident.lower())
+    return ident.lower() == block_name.lower()
+
+
+# ---------------------------------------------------------------------------
+# Model
+# ---------------------------------------------------------------------------
+
+
+class CopilotPermissions(BaseModel):
+    """Capability filter for a single copilot execution.
+
+    Attributes:
+        tools: Tool names to filter (short names, e.g. ``run_block``).
+        tools_exclude: When True (default) ``tools`` is a blacklist;
+            when False it is a whitelist.  Ignored when *tools* is empty.
+        blocks: Block identifiers (name, full UUID, or 8-char partial UUID).
+        blocks_exclude: Same semantics as *tools_exclude* but for blocks.
+    """
+
+    tools: list[str] = []
+    tools_exclude: bool = True
+    blocks: list[str] = []
+    blocks_exclude: bool = True
+
+    # Private: parent permissions for recursion inheritance.
+    # Set only by merged_with_parent(); never exposed in block input schema.
+    _parent: CopilotPermissions | None = PrivateAttr(default=None)
+
+    # ------------------------------------------------------------------
+    # Tool helpers
+    # ------------------------------------------------------------------
+
+    def effective_allowed_tools(self, all_tools: frozenset[str]) -> frozenset[str]:
+        """Compute the set of short tool names that are permitted.
+
+        Args:
+            all_tools: Universe of valid short tool names.
+
+        Returns:
+            Subset of *all_tools* that pass the filter.
+        """
+        if not self.tools:
+            return frozenset(all_tools)
+        tool_set = frozenset(self.tools)
+        if self.tools_exclude:
+            return all_tools - tool_set
+        return all_tools & tool_set
+
+    # ------------------------------------------------------------------
+    # Block helpers
+    # ------------------------------------------------------------------
+
+    def is_block_allowed(self, block_id: str, block_name: str) -> bool:
+        """Return True if the block may be executed under these permissions.
+
+        Checks this instance first, then consults the parent (if any) so
+        the entire inheritance chain is respected.
+        """
+        if not self._check_block_locally(block_id, block_name):
+            return False
+        if self._parent is not None:
+            return self._parent.is_block_allowed(block_id, block_name)
+        return True
+
+    def _check_block_locally(self, block_id: str, block_name: str) -> bool:
+        """Check *only* this instance's block filter (ignores parent)."""
+        if not self.blocks:
+            return True  # No filter → allow all
+        matched = any(
+            _block_matches(identifier, block_id, block_name)
+            for identifier in self.blocks
+        )
+        return not matched if self.blocks_exclude else matched
+
+    # ------------------------------------------------------------------
+    # Recursion / merging
+    # ------------------------------------------------------------------
+
+    def merged_with_parent(
+        self,
+        parent: CopilotPermissions,
+        all_tools: frozenset[str],
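+    ) -> CopilotPermissions:
+        """Return a new instance that is at most as permissive as *parent*.
+
+        - Tools: intersection of effective-allowed sets, stored as a whitelist.
+          An empty ``tools`` list means "no filter", so an empty intersection
+          is stored as a blacklist of every tool instead.
+        - Blocks: parent is stored internally; both constraints are applied
+          during :meth:`is_block_allowed`.
+        """
+        merged_tools = self.effective_allowed_tools(
+            all_tools
+        ) & parent.effective_allowed_tools(all_tools)
+        if merged_tools:
+            tools, tools_exclude = sorted(merged_tools), False
+        else:
+            # An empty whitelist would be read as "allow all"; deny everything.
+            tools, tools_exclude = sorted(all_tools), True
+        result = CopilotPermissions(
+            tools=tools,
+            tools_exclude=tools_exclude,
+            blocks=self.blocks,
+            blocks_exclude=self.blocks_exclude,
+        )
+        result._parent = parent
+        return result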
+
+    # ------------------------------------------------------------------
+    # Convenience
+    # ------------------------------------------------------------------
+
+    def is_empty(self) -> bool:
+        """Return True when no filtering is configured (allow-all passthrough)."""
+        return not self.tools and not self.blocks and self._parent is None
+
+
+# ---------------------------------------------------------------------------
+# Validation helpers
+# ---------------------------------------------------------------------------
+
+
+def all_known_tool_names() -> frozenset[str]:
+    """Return all short tool names accepted in *tools*.
+
+    Returns the pre-computed ``ALL_TOOL_NAMES`` set (derived from the
+    ``ToolName`` Literal).  On first call, also verifies consistency with
+    the live ``TOOL_REGISTRY``.
+    """
+    _assert_tool_names_consistent()
+    return ALL_TOOL_NAMES
+
+
+def validate_tool_names(tools: list[str]) -> list[str]:
+    """Return entries in *tools* that are not valid tool names.
+
+    Args:
+        tools: List of short tool name strings to validate.
+
+    Returns:
+        List of invalid names (empty if all are valid).
+    """
+    return [t for t in tools if t not in ALL_TOOL_NAMES]
+
+
+_tool_names_checked = False
+
+
+def _assert_tool_names_consistent() -> None:
+    """Verify that ``PLATFORM_TOOL_NAMES`` matches ``TOOL_REGISTRY`` keys.
+
+    Called once lazily (TOOL_REGISTRY has heavy imports).  Raises
+    ``AssertionError`` with a helpful diff if they diverge.
+    """
+    global _tool_names_checked
+    if _tool_names_checked:
+        return
+    _tool_names_checked = True
+
+    from backend.copilot.tools import TOOL_REGISTRY
+
+    registry_keys: frozenset[str] = frozenset(TOOL_REGISTRY.keys())
+    declared: frozenset[str] = PLATFORM_TOOL_NAMES
+    if registry_keys != declared:
+        missing = registry_keys - declared
+        extra = declared - registry_keys
+        parts: list[str] = [
+            "PLATFORM_TOOL_NAMES in permissions.py is out of sync with TOOL_REGISTRY."
+        ]
+        if missing:
+            parts.append(f"  Missing from PLATFORM_TOOL_NAMES: {sorted(missing)}")
+        if extra:
+            parts.append(f"  Extra in PLATFORM_TOOL_NAMES: {sorted(extra)}")
+        parts.append("  Update the ToolName Literal to match.")
+        raise AssertionError("\n".join(parts))
+
+
+async def validate_block_identifiers(
+    identifiers: list[str],
+) -> list[str]:
+    """Resolve each block identifier and return those that could not be matched.
+
+    Args:
+        identifiers: List of block identifiers (name, full UUID, or partial UUID).
+
+    Returns:
+        List of identifiers that matched no known block.
+    """
+    from backend.blocks import get_blocks
+
+    # get_blocks() returns dict[block_id_str, BlockClass]; instantiate once to get names.
+    block_registry = get_blocks()
+    block_info = {bid: cls().name for bid, cls in block_registry.items()}
+    invalid: list[str] = []
+    for ident in identifiers:
+        matched = any(
+            _block_matches(ident, bid, bname) for bid, bname in block_info.items()
+        )
+        if not matched:
+            invalid.append(ident)
+    return invalid
+
+
+# ---------------------------------------------------------------------------
+# SDK tool-list application
+# ---------------------------------------------------------------------------
+
+
+def apply_tool_permissions(
+    permissions: CopilotPermissions,
+    *,
+    use_e2b: bool = False,
+) -> tuple[list[str], list[str]]:
+    """Compute (allowed_tools, extra_disallowed) for :class:`ClaudeAgentOptions`.
+
+    Takes the base allowed/disallowed lists from
+    :func:`~backend.copilot.sdk.tool_adapter.get_copilot_tool_names` /
+    :func:`~backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools` and
+    applies *permissions* on top.
+
+    Returns:
+        ``(allowed_tools, extra_disallowed)`` where *allowed_tools* is the
+        possibly-narrowed list to pass to ``ClaudeAgentOptions.allowed_tools``
+        and *extra_disallowed* is the list to pass to
+        ``ClaudeAgentOptions.disallowed_tools``.
+    """
+    from backend.copilot.sdk.tool_adapter import (
+        _READ_TOOL_NAME,
+        MCP_TOOL_PREFIX,
+        get_copilot_tool_names,
+        get_sdk_disallowed_tools,
+    )
+    from backend.copilot.tools import TOOL_REGISTRY
+
+    base_allowed = get_copilot_tool_names(use_e2b=use_e2b)
+    base_disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)
+
+    if permissions.is_empty():
+        return base_allowed, base_disallowed
+
+    all_tools = all_known_tool_names()
+    effective = permissions.effective_allowed_tools(all_tools)
+
+    # In E2B mode, SDK built-in file tools (Read, Write, Edit, Glob, Grep)
+    # are replaced by MCP equivalents (read_file, write_file, ...).
+    # Map each SDK built-in name to its E2B MCP name so users can use the
+    # familiar names in their permissions and the E2B tools are included.
+    _SDK_TO_E2B: dict[str, str] = {}
+    if use_e2b:
+        from backend.copilot.sdk.e2b_file_tools import E2B_FILE_TOOL_NAMES
+
+        _SDK_TO_E2B = dict(
+            zip(
+                ["Read", "Write", "Edit", "Glob", "Grep"],
+                E2B_FILE_TOOL_NAMES,
+                strict=False,
+            )
+        )
+
+    # Build an updated allowed list by mapping short names → SDK names and
+    # keeping only those present in the original base_allowed list.
+    def to_sdk_names(short: str) -> list[str]:
+        names: list[str] = []
+        if short in TOOL_REGISTRY:
+            names.append(f"{MCP_TOOL_PREFIX}{short}")
+        elif short in _SDK_TO_E2B:
+            # E2B mode: map SDK built-in file tool to its MCP equivalent.
+            names.append(f"{MCP_TOOL_PREFIX}{_SDK_TO_E2B[short]}")
+        else:
+            names.append(short)  # SDK built-in — used as-is
+        return names
+
+    # SDK-facing names of all tools the permissions allow
+    permitted_sdk: set[str] = set()
+    for s in effective:
+        permitted_sdk.update(to_sdk_names(s))
+    # Always include the internal Read tool (used by SDK for large/truncated outputs)
+    permitted_sdk.add(f"{MCP_TOOL_PREFIX}{_READ_TOOL_NAME}")
+
+    filtered_allowed = [t for t in base_allowed if t in permitted_sdk]
+
+    # Disallowed = base disallowed tools plus any tools filtered out of base_allowed
+    removed = set(base_allowed) - set(filtered_allowed)
+    extra_disallowed = list(set(base_disallowed) | removed)
+
+    return filtered_allowed, extra_disallowed
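The subtlest step in `apply_tool_permissions` is translating short names into SDK names and then narrowing the base list. Below is a standalone sketch of that step. It is not the module's code. The prefix, `READ_TOOL_NAME`, the registry contents and the E2B name list are assumptions copied from the fixtures in `permissions_test.py`. It also sorts the disallowed list, whereas the real function returns an unordered `list(set(...))`.

```python
from __future__ import annotations

# Assumed values for illustration only; they mirror the fixtures in
# permissions_test.py, not the live TOOL_REGISTRY or tool_adapter module.
MCP_TOOL_PREFIX = "mcp__copilot__"
READ_TOOL_NAME = "Read"  # assumed value of tool_adapter._READ_TOOL_NAME
REGISTRY = {"run_block", "web_fetch", "bash_exec"}
SDK_FILE_TOOLS = ["Read", "Write", "Edit", "Glob", "Grep"]
E2B_FILE_TOOLS = ["read_file", "write_file", "edit_file", "glob", "grep"]


def to_sdk_name(short: str, sdk_to_e2b: dict[str, str]) -> str:
    """Translate a user-facing short tool name into the name the SDK sees."""
    if short in REGISTRY:
        return f"{MCP_TOOL_PREFIX}{short}"  # platform tool exposed over MCP
    if short in sdk_to_e2b:
        return f"{MCP_TOOL_PREFIX}{sdk_to_e2b[short]}"  # E2B replacement tool
    return short  # SDK built-in, used as-is


def filter_allowed(
    base_allowed: list[str],
    base_disallowed: list[str],
    effective: set[str],
    use_e2b: bool,
) -> tuple[list[str], list[str]]:
    """Narrow base_allowed to the permitted tools; disallow everything removed."""
    sdk_to_e2b = dict(zip(SDK_FILE_TOOLS, E2B_FILE_TOOLS)) if use_e2b else {}
    permitted = {to_sdk_name(s, sdk_to_e2b) for s in effective}
    permitted.add(f"{MCP_TOOL_PREFIX}{READ_TOOL_NAME}")  # never filtered out
    allowed = [t for t in base_allowed if t in permitted]
    removed = set(base_allowed) - set(allowed)
    # sorted() is a choice made in this sketch for deterministic output.
    disallowed = sorted(set(base_disallowed) | removed)
    return allowed, disallowed


# E2B mode: whitelisting "Read" keeps read_file but drops write_file and Task.
allowed, disallowed = filter_allowed(
    base_allowed=[
        "mcp__copilot__run_block",
        "mcp__copilot__Read",
        "mcp__copilot__read_file",
        "mcp__copilot__write_file",
        "Task",
    ],
    base_disallowed=["Bash"],
    effective={"run_block", "Read"},
    use_e2b=True,
)
# allowed    → ["mcp__copilot__run_block", "mcp__copilot__Read", "mcp__copilot__read_file"]
# disallowed → ["Bash", "Task", "mcp__copilot__write_file"]
```

`mcp__copilot__Read` survives even though only the short name `Read` was whitelisted. In E2B mode, `Read` maps to `read_file`, and the internal Read tool is always re-added.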
--- a/autogpt_platform/backend/backend/copilot/permissions_test.py
+++ b/autogpt_platform/backend/backend/copilot/permissions_test.py
@@ -0,0 +1,579 @@
+"""Tests for CopilotPermissions — tool/block capability filtering."""
+
+from __future__ import annotations
+
+import pytest
+
+from backend.copilot.permissions import (
+    ALL_TOOL_NAMES,
+    PLATFORM_TOOL_NAMES,
+    SDK_BUILTIN_TOOL_NAMES,
+    CopilotPermissions,
+    _block_matches,
+    all_known_tool_names,
+    apply_tool_permissions,
+    validate_block_identifiers,
+    validate_tool_names,
+)
+from backend.copilot.tools import TOOL_REGISTRY
+
+# ---------------------------------------------------------------------------
+# _block_matches
+# ---------------------------------------------------------------------------
+
+
+class TestBlockMatches:
+    BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
+    BLOCK_NAME = "HTTP Request"
+
+    def test_full_uuid_match(self):
+        assert _block_matches(self.BLOCK_ID, self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_full_uuid_case_insensitive(self):
+        assert _block_matches(self.BLOCK_ID.upper(), self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_full_uuid_no_match(self):
+        other = "aaaaaaaa-0000-0000-0000-000000000000"
+        assert not _block_matches(other, self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_partial_uuid_match(self):
+        assert _block_matches("c069dc6b", self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_partial_uuid_case_insensitive(self):
+        assert _block_matches("C069DC6B", self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_partial_uuid_no_match(self):
+        assert not _block_matches("deadbeef", self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_name_match(self):
+        assert _block_matches("HTTP Request", self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_name_case_insensitive(self):
+        assert _block_matches("http request", self.BLOCK_ID, self.BLOCK_NAME)
+        assert _block_matches("HTTP REQUEST", self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_name_no_match(self):
+        assert not _block_matches("Unknown Block", self.BLOCK_ID, self.BLOCK_NAME)
+
+    def test_partial_uuid_not_matching_as_name(self):
+        # "c069dc6b" is 8 hex chars → treated as partial UUID, NOT name match
+        assert not _block_matches(
+            "c069dc6b", "ffffffff-0000-0000-0000-000000000000", "c069dc6b"
+        )
+
+
+# ---------------------------------------------------------------------------
+# CopilotPermissions.effective_allowed_tools
+# ---------------------------------------------------------------------------
+
+
+ALL_TOOLS = frozenset(
+    ["run_block", "web_fetch", "bash_exec", "find_agent", "Task", "Read"]
+)
+
+
+class TestEffectiveAllowedTools:
+    def test_empty_list_allows_all(self):
+        perms = CopilotPermissions(tools=[], tools_exclude=True)
+        assert perms.effective_allowed_tools(ALL_TOOLS) == ALL_TOOLS
+
+    def test_empty_whitelist_allows_all(self):
+        # edge: tools_exclude=False but empty list → allow all
+        perms = CopilotPermissions(tools=[], tools_exclude=False)
+        assert perms.effective_allowed_tools(ALL_TOOLS) == ALL_TOOLS
+
+    def test_blacklist_removes_listed(self):
+        perms = CopilotPermissions(tools=["bash_exec", "web_fetch"], tools_exclude=True)
+        result = perms.effective_allowed_tools(ALL_TOOLS)
+        assert "bash_exec" not in result
+        assert "web_fetch" not in result
+        assert "run_block" in result
+        assert "Task" in result
+
+    def test_whitelist_keeps_only_listed(self):
+        perms = CopilotPermissions(tools=["run_block", "Task"], tools_exclude=False)
+        result = perms.effective_allowed_tools(ALL_TOOLS)
+        assert result == frozenset(["run_block", "Task"])
+
+    def test_whitelist_unknown_tool_yields_empty(self):
+        perms = CopilotPermissions(tools=["nonexistent"], tools_exclude=False)
+        result = perms.effective_allowed_tools(ALL_TOOLS)
+        assert result == frozenset()
+
+    def test_blacklist_unknown_tool_ignored(self):
+        perms = CopilotPermissions(tools=["nonexistent"], tools_exclude=True)
+        result = perms.effective_allowed_tools(ALL_TOOLS)
+        assert result == ALL_TOOLS
+
+
+# ---------------------------------------------------------------------------
+# CopilotPermissions.is_block_allowed
+# ---------------------------------------------------------------------------
+
+
+BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
+BLOCK_NAME = "HTTP Request"
+
+
+class TestIsBlockAllowed:
+    def test_empty_allows_everything(self):
+        perms = CopilotPermissions(blocks=[], blocks_exclude=True)
+        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_blacklist_blocks_listed(self):
+        perms = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
+        assert not perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_blacklist_allows_unlisted(self):
+        perms = CopilotPermissions(blocks=["Other Block"], blocks_exclude=True)
+        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_whitelist_allows_listed(self):
+        perms = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=False)
+        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_whitelist_blocks_unlisted(self):
+        perms = CopilotPermissions(blocks=["Other Block"], blocks_exclude=False)
+        assert not perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_partial_uuid_blacklist(self):
+        perms = CopilotPermissions(blocks=["c069dc6b"], blocks_exclude=True)
+        assert not perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_full_uuid_whitelist(self):
+        perms = CopilotPermissions(blocks=[BLOCK_ID], blocks_exclude=False)
+        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_parent_blocks_when_child_allows(self):
+        parent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
+        child = CopilotPermissions(blocks=[], blocks_exclude=True)
+        child._parent = parent
+        assert not child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_parent_allows_when_child_blocks(self):
+        parent = CopilotPermissions(blocks=[], blocks_exclude=True)
+        child = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
+        child._parent = parent
+        assert not child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_both_must_allow(self):
+        parent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=False)
+        child = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=False)
+        child._parent = parent
+        assert child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+    def test_grandparent_blocks_propagate(self):
+        grandparent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
+        parent = CopilotPermissions(blocks=[], blocks_exclude=True)
+        parent._parent = grandparent
+        child = CopilotPermissions(blocks=[], blocks_exclude=True)
+        child._parent = parent
+        assert not child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+
+# ---------------------------------------------------------------------------
+# CopilotPermissions.merged_with_parent
+# ---------------------------------------------------------------------------
+
+
+class TestMergedWithParent:
+    def test_tool_intersection(self):
+        all_t = frozenset(["run_block", "web_fetch", "bash_exec"])
+        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        child = CopilotPermissions(tools=["web_fetch"], tools_exclude=True)
+        merged = child.merged_with_parent(parent, all_t)
+        effective = merged.effective_allowed_tools(all_t)
+        assert "bash_exec" not in effective
+        assert "web_fetch" not in effective
+        assert "run_block" in effective
+
+    def test_parent_whitelist_narrows_child(self):
+        all_t = frozenset(["run_block", "web_fetch", "bash_exec"])
+        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        child = CopilotPermissions(tools=[], tools_exclude=True)  # allow all
+        merged = child.merged_with_parent(parent, all_t)
+        effective = merged.effective_allowed_tools(all_t)
+        assert effective == frozenset(["run_block"])
+
+    def test_child_cannot_expand_parent_whitelist(self):
+        all_t = frozenset(["run_block", "web_fetch", "bash_exec"])
+        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        child = CopilotPermissions(
+            tools=["run_block", "bash_exec"], tools_exclude=False
+        )
+        merged = child.merged_with_parent(parent, all_t)
+        effective = merged.effective_allowed_tools(all_t)
+        # bash_exec was not in parent's whitelist → must not appear
+        assert "bash_exec" not in effective
+        assert "run_block" in effective
+
+    def test_merged_stored_as_whitelist(self):
+        all_t = frozenset(["run_block", "web_fetch"])
+        parent = CopilotPermissions(tools=[], tools_exclude=True)
+        child = CopilotPermissions(tools=[], tools_exclude=True)
+        merged = child.merged_with_parent(parent, all_t)
+        assert not merged.tools_exclude  # stored as whitelist
+        assert set(merged.tools) == {"run_block", "web_fetch"}
+
+    def test_block_parent_stored(self):
+        all_t = frozenset(["run_block"])
+        parent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
+        child = CopilotPermissions(blocks=[], blocks_exclude=True)
+        merged = child.merged_with_parent(parent, all_t)
+        # Parent restriction is preserved via _parent
+        assert not merged.is_block_allowed(BLOCK_ID, BLOCK_NAME)
+
+
+# ---------------------------------------------------------------------------
+# CopilotPermissions.is_empty
+# ---------------------------------------------------------------------------
+
+
+class TestIsEmpty:
+    def test_default_is_empty(self):
+        assert CopilotPermissions().is_empty()
+
+    def test_with_tools_not_empty(self):
+        assert not CopilotPermissions(tools=["bash_exec"]).is_empty()
+
+    def test_with_blocks_not_empty(self):
+        assert not CopilotPermissions(blocks=["HTTP Request"]).is_empty()
+
+    def test_with_parent_not_empty(self):
+        perms = CopilotPermissions()
+        perms._parent = CopilotPermissions(tools=["bash_exec"])
+        assert not perms.is_empty()
+
+
+# ---------------------------------------------------------------------------
+# validate_tool_names
+# ---------------------------------------------------------------------------
+
+
+class TestValidateToolNames:
+    def test_valid_registry_tool(self):
+        assert validate_tool_names(["run_block", "web_fetch"]) == []
+
+    def test_valid_sdk_builtin(self):
+        assert validate_tool_names(["Read", "Task", "WebSearch"]) == []
+
+    def test_invalid_tool(self):
+        result = validate_tool_names(["nonexistent_tool"])
+        assert "nonexistent_tool" in result
+
+    def test_mixed(self):
+        result = validate_tool_names(["run_block", "fake_tool"])
+        assert "fake_tool" in result
+        assert "run_block" not in result
+
+    def test_empty_list(self):
+        assert validate_tool_names([]) == []
+
+
+# ---------------------------------------------------------------------------
+# validate_block_identifiers (async)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+class TestValidateBlockIdentifiers:
+    async def test_empty_list(self):
+        result = await validate_block_identifiers([])
+        assert result == []
+
+    async def test_valid_full_uuid(self, mocker):
+        mock_block = mocker.MagicMock()
+        mock_block.return_value.name = "HTTP Request"
+        mocker.patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
+        )
+        result = await validate_block_identifiers(
+            ["c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"]
+        )
+        assert result == []
+
+    async def test_invalid_identifier(self, mocker):
+        mock_block = mocker.MagicMock()
+        mock_block.return_value.name = "HTTP Request"
+        mocker.patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
+        )
+        result = await validate_block_identifiers(["totally_unknown"])
+        assert "totally_unknown" in result
+
+    async def test_partial_uuid_match(self, mocker):
+        mock_block = mocker.MagicMock()
+        mock_block.return_value.name = "HTTP Request"
+        mocker.patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
+        )
+        result = await validate_block_identifiers(["c069dc6b"])
+        assert result == []
+
+    async def test_name_match(self, mocker):
+        mock_block = mocker.MagicMock()
+        mock_block.return_value.name = "HTTP Request"
+        mocker.patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
+        )
+        result = await validate_block_identifiers(["http request"])
+        assert result == []
+
+
+# ---------------------------------------------------------------------------
+# apply_tool_permissions
+# ---------------------------------------------------------------------------
+
+
+class TestApplyToolPermissions:
+    def test_empty_permissions_returns_base_unchanged(self, mocker):
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=["mcp__copilot__run_block", "mcp__copilot__web_fetch", "Task"],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=["Bash"],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": object(), "web_fetch": object()},
+        )
+        perms = CopilotPermissions()
+        allowed, disallowed = apply_tool_permissions(perms, use_e2b=False)
+        assert "mcp__copilot__run_block" in allowed
+        assert "mcp__copilot__web_fetch" in allowed
+
+    def test_blacklist_removes_tool(self, mocker):
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=[
+                "mcp__copilot__run_block",
+                "mcp__copilot__web_fetch",
+                "mcp__copilot__bash_exec",
+                "Task",
+            ],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=["Bash"],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {
+                "run_block": object(),
+                "web_fetch": object(),
+                "bash_exec": object(),
+            },
+        )
+        mocker.patch(
+            "backend.copilot.permissions.all_known_tool_names",
+            return_value=frozenset(["run_block", "web_fetch", "bash_exec", "Task"]),
+        )
+        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
+        assert "mcp__copilot__bash_exec" not in allowed
+        assert "mcp__copilot__run_block" in allowed
+
+    def test_whitelist_keeps_only_listed(self, mocker):
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=[
+                "mcp__copilot__run_block",
+                "mcp__copilot__web_fetch",
+                "Task",
+                "WebSearch",
+            ],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=["Bash"],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": object(), "web_fetch": object()},
+        )
+        mocker.patch(
+            "backend.copilot.permissions.all_known_tool_names",
+            return_value=frozenset(["run_block", "web_fetch", "Task", "WebSearch"]),
+        )
+        perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
+        assert "mcp__copilot__run_block" in allowed
+        assert "mcp__copilot__web_fetch" not in allowed
+        assert "Task" not in allowed
+
+    def test_read_tool_always_included_even_when_blacklisted(self, mocker):
+        """mcp__copilot__Read must stay in allowed even if Read is explicitly blacklisted."""
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=[
+                "mcp__copilot__run_block",
+                "mcp__copilot__Read",
+                "Task",
+            ],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=[],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": object()},
+        )
+        mocker.patch(
+            "backend.copilot.permissions.all_known_tool_names",
+            return_value=frozenset(["run_block", "Read", "Task"]),
+        )
+        # Explicitly blacklist Read
+        perms = CopilotPermissions(tools=["Read"], tools_exclude=True)
+        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
+        assert "mcp__copilot__Read" in allowed  # always preserved for SDK internals
+        assert "mcp__copilot__run_block" in allowed
+        assert "Task" in allowed
+
+    def test_read_tool_always_included_with_narrow_whitelist(self, mocker):
+        """mcp__copilot__Read must stay in allowed even when not in a whitelist."""
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=[
+                "mcp__copilot__run_block",
+                "mcp__copilot__Read",
+                "Task",
+            ],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=[],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": object()},
+        )
+        mocker.patch(
+            "backend.copilot.permissions.all_known_tool_names",
+            return_value=frozenset(["run_block", "Read", "Task"]),
+        )
+        # Whitelist only run_block — Read not listed
+        perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
+        assert "mcp__copilot__Read" in allowed  # always preserved for SDK internals
+        assert "mcp__copilot__run_block" in allowed
+
+    def test_e2b_file_tools_included_when_sdk_builtin_whitelisted(self, mocker):
+        """In E2B mode, whitelisting 'Read' must include mcp__copilot__read_file."""
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=[
+                "mcp__copilot__run_block",
+                "mcp__copilot__Read",
+                "mcp__copilot__read_file",
+                "mcp__copilot__write_file",
+                "Task",
+            ],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=["Bash", "Read", "Write", "Edit", "Glob", "Grep"],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": object()},
+        )
+        mocker.patch(
+            "backend.copilot.permissions.all_known_tool_names",
+            return_value=frozenset(["run_block", "Read", "Write", "Task"]),
+        )
+        mocker.patch(
+            "backend.copilot.sdk.e2b_file_tools.E2B_FILE_TOOL_NAMES",
+            ["read_file", "write_file", "edit_file", "glob", "grep"],
+        )
+        # Whitelist Read and run_block — E2B read_file should be included
+        perms = CopilotPermissions(tools=["Read", "run_block"], tools_exclude=False)
+        allowed, _ = apply_tool_permissions(perms, use_e2b=True)
+        assert "mcp__copilot__read_file" in allowed
+        assert "mcp__copilot__run_block" in allowed
+        # Write not whitelisted — write_file should NOT be included
+        assert "mcp__copilot__write_file" not in allowed
+
+    def test_e2b_file_tools_excluded_when_sdk_builtin_blacklisted(self, mocker):
+        """In E2B mode, blacklisting 'Read' must also remove mcp__copilot__read_file."""
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
+            return_value=[
+                "mcp__copilot__run_block",
+                "mcp__copilot__Read",
+                "mcp__copilot__read_file",
+                "Task",
+            ],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
+            return_value=["Bash", "Read", "Write", "Edit", "Glob", "Grep"],
+        )
+        mocker.patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": object()},
+        )
+        mocker.patch(
+            "backend.copilot.permissions.all_known_tool_names",
+            return_value=frozenset(["run_block", "Read", "Task"]),
+        )
+        mocker.patch(
+            "backend.copilot.sdk.e2b_file_tools.E2B_FILE_TOOL_NAMES",
+            ["read_file", "write_file", "edit_file", "glob", "grep"],
+        )
+        # Blacklist Read — E2B read_file should also be removed
+        perms = CopilotPermissions(tools=["Read"], tools_exclude=True)
+        allowed, _ = apply_tool_permissions(perms, use_e2b=True)
+        assert "mcp__copilot__read_file" not in allowed
+        assert "mcp__copilot__run_block" in allowed
+        # mcp__copilot__Read is always preserved for SDK internals
+        assert "mcp__copilot__Read" in allowed
+
+
+# ---------------------------------------------------------------------------
+# SDK_BUILTIN_TOOL_NAMES sanity check
+# ---------------------------------------------------------------------------
+
+
+class TestSdkBuiltinToolNames:
+    def test_expected_builtins_present(self):
+        expected = {
+            "Read",
+            "Write",
+            "Edit",
+            "Glob",
+            "Grep",
+            "Task",
+            "WebSearch",
+            "TodoWrite",
+        }
+        assert expected.issubset(SDK_BUILTIN_TOOL_NAMES)
+
+    def test_platform_names_match_tool_registry(self):
+        """PLATFORM_TOOL_NAMES (derived from ToolName Literal) must match TOOL_REGISTRY keys."""
+        registry_keys = frozenset(TOOL_REGISTRY.keys())
+        assert PLATFORM_TOOL_NAMES == registry_keys, (
+            f"ToolName Literal is out of sync with TOOL_REGISTRY. "
+            f"Missing: {registry_keys - PLATFORM_TOOL_NAMES}, "
+            f"Extra: {PLATFORM_TOOL_NAMES - registry_keys}"
+        )
+
+    def test_all_tool_names_is_union(self):
+        """ALL_TOOL_NAMES must equal PLATFORM_TOOL_NAMES | SDK_BUILTIN_TOOL_NAMES."""
+        assert ALL_TOOL_NAMES == PLATFORM_TOOL_NAMES | SDK_BUILTIN_TOOL_NAMES
+
+    def test_no_overlap_between_platform_and_sdk(self):
+        """Platform and SDK built-in names must not overlap."""
+        assert PLATFORM_TOOL_NAMES.isdisjoint(SDK_BUILTIN_TOOL_NAMES)
+
+    def test_known_tools_includes_registry_and_builtins(self):
+        known = all_known_tool_names()
+        assert "run_block" in known
+        assert "Read" in known
+        assert "Task" in known
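The tests above exercise `_block_matches` and the `CopilotPermissions` methods, but their bodies fall outside this excerpt. The following sketch reimplements only the behaviour the tests pin down, and it rests on several assumptions. A partial UUID is taken to be exactly 8 hex characters, as the test comment states. `Perms` is a hypothetical stand-in dataclass. The parent chain is modelled as a plain `parent` field rather than the private `_parent` attribute.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed formats: canonical 8-4-4-4-12 UUIDs, and "partial" IDs that are
# exactly the first 8 hex characters of a block ID.
_FULL_UUID = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")
_PARTIAL_UUID = re.compile(r"^[0-9a-f]{8}$")


def block_matches(ident: str, block_id: str, block_name: str) -> bool:
    """Match ident against a block by full UUID, 8-char UUID prefix, or name."""
    needle = ident.lower()
    if _FULL_UUID.match(needle):
        return needle == block_id.lower()
    if _PARTIAL_UUID.match(needle):
        # Treated as an ID prefix only; never falls back to a name match.
        return block_id.lower().startswith(needle)
    return needle == block_name.lower()


@dataclass
class Perms:
    """Hypothetical stand-in for CopilotPermissions."""

    tools: list[str] = field(default_factory=list)
    tools_exclude: bool = True
    blocks: list[str] = field(default_factory=list)
    blocks_exclude: bool = True
    parent: Perms | None = None

    def effective_allowed_tools(self, all_tools: frozenset[str]) -> frozenset[str]:
        if not self.tools:
            return all_tools  # an empty list means "no restriction"
        listed = frozenset(self.tools)
        # Blacklist subtracts; whitelist intersects (unknown names drop out).
        return all_tools - listed if self.tools_exclude else all_tools & listed

    def is_block_allowed(self, block_id: str, block_name: str) -> bool:
        if self.blocks:
            hit = any(block_matches(b, block_id, block_name) for b in self.blocks)
            if hit == self.blocks_exclude:
                return False  # listed in a blacklist, or absent from a whitelist
        # Every ancestor must also allow the block.
        return self.parent is None or self.parent.is_block_allowed(block_id, block_name)


# A child with no restrictions still inherits its parent's blacklist.
child = Perms(parent=Perms(blocks=["HTTP Request"], blocks_exclude=True))
```

Checking the parent chain recursively is what lets a grandparent's blacklist reach a grandchild, as in `test_grandparent_blocks_propagate`.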
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -12,34 +12,18 @@ from backend.copilot.tools import TOOL_REGISTRY
 # Shared technical notes that apply to both SDK and baseline modes
 _SHARED_TOOL_NOTES = f"""\

-### Sharing files with the user
-After saving a file to the persistent workspace with `write_workspace_file`,
-share it with the user by embedding the `download_url` from the response in
-your message as a Markdown link or image:
+### Sharing files
+After `write_workspace_file`, embed the `download_url` in Markdown:
+- File: `[report.csv](workspace://file_id#text/csv)`
+- Image: `![chart](workspace://file_id#image/png)`
+- Video: `![recording](workspace://file_id#video/mp4)`

- **Any file** — shows as a clickable download link:
-  `[report.csv](workspace://file_id#text/csv)`
- **Image** — renders inline in chat:
-  `![chart](workspace://file_id#image/png)`
- **Video** — renders inline in chat with player controls:
-  `![recording](workspace://file_id#video/mp4)`
-
-The `download_url` field in the `write_workspace_file` response is already
-in the correct format — paste it directly after the `(` in the Markdown.
-
-### Passing file content to tools — @@agptfile: references
-Instead of copying large file contents into a tool argument, pass a file
-reference and the platform will load the content for you.
-
-Syntax: `@@agptfile:<uri>[<start>-<end>]`
-
- `<uri>` **must** start with `workspace://` or `/` (absolute path):
-  - `workspace://<file_id>` — workspace file by ID
-  - `workspace:///<path>` — workspace file by virtual path
-  - `/absolute/local/path` — ephemeral or sdk_cwd file
-  - E2B sandbox absolute path (e.g. `/home/user/script.py`)
- `[<start>-<end>]` is an optional 1-indexed inclusive line range.
- URIs that do not start with `workspace://` or `/` are **not** expanded.
+### File references — @@agptfile:
+Pass large file content to tools by reference: `@@agptfile:<uri>[<start>-<end>]`
+- `workspace://<file_id>` or `workspace:///<path>` — workspace files
+- `/absolute/path` — local/sandbox files
+- `[start-end]` — optional 1-indexed line range
+- Multiple references per argument are supported. Only `workspace://` URIs and absolute paths are expanded.

 Examples:
 ```
@@ -50,21 +34,9 @@ Examples:
@@agptfile:/home/user/script.py
 ```

-You can embed a reference inside any string argument, or use it as the entire
-value.  Multiple references in one argument are all expanded.
+**Structured data**: When the entire argument is a single file reference, the platform auto-parses by extension/MIME. Supported: JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel (.xlsx only, first sheet; legacy `.xls` is NOT supported). If the format is unrecognised or parsing fails, the content is returned as a plain string.

-**Structured data**: When the **entire** argument value is a single file
-reference (no surrounding text), the platform automatically parses the file
-content based on its extension or MIME type.  Supported formats: JSON, JSONL,
-CSV, TSV, YAML, TOML, Parquet, and Excel (.xlsx — first sheet only).
-For example, pass `@@agptfile:workspace://<id>` where the file is a `.csv` and
-the rows will be parsed into `list[list[str]]` automatically.  If the format is
-unrecognised or parsing fails, the content is returned as a plain string.
-Legacy `.xls` files are **not** supported — only the modern `.xlsx` format.
-
-**Type coercion**: The platform also coerces expanded values to match the
-block's expected input types.  For example, if a block expects `list[list[str]]`
-and the expanded value is a JSON string, it will be parsed into the correct type.
+**Type coercion**: The platform auto-coerces expanded string values to match block input types (e.g. JSON string → `list[list[str]]`).

 ### Media file inputs (format: "file")
 Some block inputs accept media files — their schema shows `"format": "file"`.
@@ -91,6 +63,50 @@ Example — committing an image file to GitHub:
 }}
 ```

+### Writing large files — CRITICAL
+**Never write an entire large document in a single tool call.**  When the
+content you want to write exceeds ~2000 words, the tool call's output token
+limit will silently truncate the arguments, producing an empty `{{}}` input
+that fails repeatedly.
+
+**Preferred: compose from file references.**  If the data is already in
+files (tool outputs, workspace files), compose the report in one call
+using `@@agptfile:` references — the system expands them inline:
+
+```bash
+cat > report.md << 'EOF'
+# Research Report
+## Data from web research
+@@agptfile:/home/user/web_results.txt
+## Block execution output
+@@agptfile:workspace://<file_id>
+## Conclusion
+<brief synthesis>
+EOF
+```
+
+**Fallback: write section-by-section.**  When you must generate content
+from conversation context (no files to reference), split into multiple
+`bash_exec` calls — one section per call:
+
+```bash
+cat > report.md << 'EOF'
+# Section 1
+<content from your earlier tool call results>
+EOF
+```
+```bash
+cat >> report.md << 'EOF'
+# Section 2
+<content from your earlier tool call results>
+EOF
+```
+Use `cat >` for the first chunk and `cat >>` to append subsequent chunks.
+Do not re-fetch or re-generate data you already have from prior tool calls.
+
+After building the file, reference it with `@@agptfile:` in other tools:
+`@@agptfile:/home/user/report.md`
+
 ### Sub-agent tasks
 - When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
@@ -166,17 +182,12 @@ def _build_storage_supplement(

 ## Tool notes

-### Shell commands
- The SDK built-in Bash tool is NOT available.  Use the `bash_exec` MCP tool
-  for shell commands — it runs {sandbox_type}.
-
-### Working directory
- Your working directory is: `{working_dir}`
- All SDK file tools AND `bash_exec` operate on the same filesystem
- Use relative paths or absolute paths under `{working_dir}` for all file operations
+### Shell & filesystem
+- The SDK built-in Bash tool is NOT available. Use `bash_exec` for shell commands ({sandbox_type}). Working dir: `{working_dir}`
+- SDK file tools (Read/Write/Edit/Glob/Grep) and `bash_exec` share one filesystem — use relative or absolute paths under this dir.
+- `read_workspace_file`/`write_workspace_file` operate on **persistent cloud workspace storage** (separate from the working dir).

 ### Two storage systems — CRITICAL to understand
-
 1. **{storage_system_1_name}** (`{working_dir}`):
 {characteristics}
 {persistence}
--- a/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
+++ b/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
@@ -143,11 +143,11 @@ To use an MCP (Model Context Protocol) tool as a node in the agent:
   tool_arguments.
 6. Output: `result` (the tool's return value) and `error` (error message)

-### Using SmartDecisionMakerBlock (AI Orchestrator with Agent Mode)
+### Using OrchestratorBlock (AI Orchestrator with Agent Mode)

 To create an agent where AI autonomously decides which tools or sub-agents to
 call in a loop until the task is complete:
-1. Create a `SmartDecisionMakerBlock` node
+1. Create an `OrchestratorBlock` node
   (ID: `3b191d9f-356f-482d-8238-ba04b6d18381`)
 2. Set `input_default`:
   - `agent_mode_max_iterations`: Choose based on task complexity:
@@ -169,8 +169,8 @@ call in a loop until the task is complete:
 3. Wire the `prompt` input from an `AgentInputBlock` (the user's task)
 4. Create downstream tool blocks — regular blocks **or** `AgentExecutorBlock`
   nodes that call sub-agents
-5. Link each tool to the SmartDecisionMaker: set `source_name: "tools"` on
-   the SmartDecisionMaker side and `sink_name: <input_field>` on each tool
+5. Link each tool to the Orchestrator: set `source_name: "tools"` on
+   the Orchestrator side and `sink_name: <input_field>` on each tool
   block's input. Create one link per input field the tool needs.
 6. Wire the `finished` output to an `AgentOutputBlock` for the final result
 7. Credentials (LLM API key) are configured by the user in the platform UI
@@ -178,35 +178,35 @@ call in a loop until the task is complete:

 **Example — Orchestrator calling two sub-agents:**
 - Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
- Node 2: `SmartDecisionMakerBlock` (input_default:
+- Node 2: `OrchestratorBlock` (input_default:
  `{"agent_mode_max_iterations": 10, "conversation_compaction": true}`)
 - Node 3: `AgentExecutorBlock` (sub-agent A — set `graph_id`, `graph_version`,
  `input_schema`, `output_schema` from library agent)
 - Node 4: `AgentExecutorBlock` (sub-agent B — same pattern)
 - Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
 - Links:
-  - Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
-  - SDM→Agent A (per input field): `source_name: "tools"`,
+  - Input→Orchestrator: `source_name: "result"`, `sink_name: "prompt"`
+  - Orchestrator→Agent A (per input field): `source_name: "tools"`,
    `sink_name: "<agent_a_input_field>"`
-  - SDM→Agent B (per input field): `source_name: "tools"`,
+  - Orchestrator→Agent B (per input field): `source_name: "tools"`,
    `sink_name: "<agent_b_input_field>"`
-  - SDM→Output: `source_name: "finished"`, `sink_name: "value"`
+  - Orchestrator→Output: `source_name: "finished"`, `sink_name: "value"`

 **Example — Orchestrator calling regular blocks as tools:**
 - Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
- Node 2: `SmartDecisionMakerBlock` (input_default:
+- Node 2: `OrchestratorBlock` (input_default:
  `{"agent_mode_max_iterations": 5, "conversation_compaction": true}`)
 - Node 3: `GetWebpageBlock` (regular block — the AI calls it as a tool)
 - Node 4: `AITextGeneratorBlock` (another regular block as a tool)
 - Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
 - Links:
-  - Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
-  - SDM→GetWebpage: `source_name: "tools"`, `sink_name: "url"`
-  - SDM→AITextGenerator: `source_name: "tools"`, `sink_name: "prompt"`
-  - SDM→Output: `source_name: "finished"`, `sink_name: "value"`
+  - Input→Orchestrator: `source_name: "result"`, `sink_name: "prompt"`
+  - Orchestrator→GetWebpage: `source_name: "tools"`, `sink_name: "url"`
+  - Orchestrator→AITextGenerator: `source_name: "tools"`, `sink_name: "prompt"`
+  - Orchestrator→Output: `source_name: "finished"`, `sink_name: "value"`

 Regular blocks work exactly like sub-agents as tools — wire each input
-field from `source_name: "tools"` on the SmartDecisionMaker side.
+field from `source_name: "tools"` on the Orchestrator side.

 ### Example: Simple AI Text Processor

--- a/autogpt_platform/backend/backend/copilot/sdk/collect.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/collect.py
@@ -7,7 +7,20 @@ without implementing their own event loop.

 from __future__ import annotations

-from typing import Any
+from typing import TYPE_CHECKING, Any
+
+from backend.copilot.response_model import (
+    StreamError,
+    StreamTextDelta,
+    StreamToolInputAvailable,
+    StreamToolOutputAvailable,
+    StreamUsage,
+)
+
+from .service import stream_chat_completion_sdk
+
+if TYPE_CHECKING:
+    from backend.copilot.permissions import CopilotPermissions


 class CopilotResult:
@@ -39,6 +52,7 @@ async def collect_copilot_response(
    message: str,
    user_id: str,
    is_user_message: bool = True,
+    permissions: "CopilotPermissions | None" = None,
 ) -> CopilotResult:
    """Consume :func:`stream_chat_completion_sdk` and return aggregated results.

@@ -53,6 +67,8 @@ async def collect_copilot_response(
        message: The user message / prompt.
        user_id: Authenticated user ID.
        is_user_message: Whether this is a user-initiated message.
+        permissions: Optional capability filter.  When provided, restricts
+            which tools and blocks the copilot may use during this execution.

    Returns:
        A :class:`CopilotResult` with the aggregated response text,
@@ -61,16 +77,6 @@ async def collect_copilot_response(
    Raises:
        RuntimeError: If the stream yields a ``StreamError`` event.
    """
-    from backend.copilot.response_model import (
-        StreamError,
-        StreamTextDelta,
-        StreamToolInputAvailable,
-        StreamToolOutputAvailable,
-        StreamUsage,
-    )
-
-    from .service import stream_chat_completion_sdk
-
    result = CopilotResult()
    response_parts: list[str] = []
    tool_calls_by_id: dict[str, dict[str, Any]] = {}
@@ -80,6 +86,7 @@ async def collect_copilot_response(
        message=message,
        is_user_message=is_user_message,
        user_id=user_id,
+        permissions=permissions,
    ):
        if isinstance(event, StreamTextDelta):
            response_parts.append(event.delta)
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -2,19 +2,20 @@

 import asyncio
 import base64
-import functools
 import json
 import logging
 import os
 import re
 import shutil
-import subprocess
 import sys
 import time
 import uuid
 from collections.abc import AsyncGenerator, AsyncIterator
 from dataclasses import dataclass
-from typing import Any, NamedTuple, cast
+from typing import TYPE_CHECKING, Any, NamedTuple, cast
+
+if TYPE_CHECKING:
+    from backend.copilot.permissions import CopilotPermissions

 from claude_agent_sdk import (
    AssistantMessage,
@@ -31,6 +32,7 @@ from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
 from pydantic import BaseModel

 from backend.copilot.context import get_workspace_manager
+from backend.copilot.permissions import apply_tool_permissions
 from backend.data.redis_client import get_redis_async
 from backend.executor.cluster_lock import AsyncClusterLock
 from backend.util.exceptions import NotFoundError
@@ -77,10 +79,15 @@ from ..tracking import track_user_message
 from .compaction import CompactionTracker, filter_compaction_messages
 from .response_adapter import SDKResponseAdapter
 from .security_hooks import create_security_hooks
+from .subscription import validate_subscription as _validate_claude_code_subscription
 from .tool_adapter import (
+    cancel_pending_tool_tasks,
    create_copilot_mcp_server,
    get_copilot_tool_names,
    get_sdk_disallowed_tools,
+    pre_launch_tool_call,
+    reset_stash_event,
+    reset_tool_failure_counters,
    set_execution_context,
    wait_for_stash,
 )
@@ -106,6 +113,20 @@ config = ChatConfig()
 # Non-context errors (network, auth, rate-limit) are NOT retried.
 _MAX_STREAM_ATTEMPTS = 3

+# Hard circuit breaker: abort the stream if the model sends this many
+# consecutive tool calls with empty parameters (a sign of context
+# saturation or serialization failure).  Empty input ({}) is never
+# legitimate — even one is suspicious, three is conclusive.
+_EMPTY_TOOL_CALL_LIMIT = 3
+
+# User-facing error shown when the empty-tool-call circuit breaker trips.
+_CIRCUIT_BREAKER_ERROR_MSG = (
+    "AutoPilot was unable to complete the tool call "
+    "— this usually happens when the response is "
+    "too large to fit in a single tool call. "
+    "Try breaking your request into smaller parts."
+)
+
 # Patterns that indicate the prompt/request exceeds the model's context limit.
 # Matched case-insensitively against the full exception chain.
 _PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -164,6 +185,19 @@ def _is_prompt_too_long(err: BaseException) -> bool:
    return False


+def _is_tool_only_message(sdk_msg: object) -> bool:
+    """Return True if *sdk_msg* is an AssistantMessage containing only ToolUseBlocks.
+
+    Such a message represents a parallel tool-call batch (no text output yet).
+    The ``bool(…content)`` guard prevents vacuous-truth evaluation on an empty list.
+    """
+    return (
+        isinstance(sdk_msg, AssistantMessage)
+        and bool(sdk_msg.content)
+        and all(isinstance(b, ToolUseBlock) for b in sdk_msg.content)
+    )
+
+
 class ReducedContext(NamedTuple):
    builder: TranscriptBuilder
    use_resume: bool
@@ -458,37 +492,6 @@ def _resolve_sdk_model() -> str | None:
    return model


-@functools.cache
-def _validate_claude_code_subscription() -> None:
-    """Validate Claude CLI is installed and responds to `--version`.
-
-    Cached so the blocking subprocess check runs at most once per process
-    lifetime.  A failure (CLI not installed) is a config error that requires
-    a process restart anyway.
-    """
-    claude_path = shutil.which("claude")
-    if not claude_path:
-        raise RuntimeError(
-            "Claude Code CLI not found. Install it with: "
-            "npm install -g @anthropic-ai/claude-code"
-        )
-    result = subprocess.run(
-        [claude_path, "--version"],
-        capture_output=True,
-        text=True,
-        timeout=10,
-    )
-    if result.returncode != 0:
-        raise RuntimeError(
-            f"Claude CLI check failed (exit {result.returncode}): "
-            f"{result.stderr.strip()}"
-        )
-    logger.info(
-        "Claude Code subscription mode: CLI version %s",
-        result.stdout.strip(),
-    )
-
-
 def _build_sdk_env(
    session_id: str | None = None,
    user_id: str | None = None,
@@ -1028,15 +1031,122 @@ def _dispatch_response(
    return response


-class _TransientErrorHandled(Exception):
+class _HandledStreamError(Exception):
    """Raised by `_run_stream_attempt` after it has already yielded a
-    `StreamError` for a transient API error.
+    `StreamError` to the client (e.g. transient API error, circuit breaker).

    This signals the outer retry loop that the attempt failed so it can
    perform session-message rollback and set the `ended_with_stream_error`
    flag, **without** yielding a duplicate `StreamError` to the client.
+
+    Attributes:
+        error_msg: The user-facing error message to persist.
+        code: Machine-readable error code (e.g. ``circuit_breaker_empty_tool_calls``).
+        retryable: Whether the frontend should offer a retry button.
    """

+    def __init__(
+        self,
+        message: str,
+        error_msg: str | None = None,
+        code: str | None = None,
+        retryable: bool = True,
+    ):
+        super().__init__(message)
+        self.error_msg = error_msg
+        self.code = code
+        self.retryable = retryable
+
+
+@dataclass
+class _EmptyToolBreakResult:
+    """Result of checking for empty tool calls in a single AssistantMessage."""
+
+    count: int  # Updated consecutive counter
+    tripped: bool  # Whether the circuit breaker fired
+    error: StreamError | None  # StreamError to yield (if tripped)
+    error_msg: str | None  # Error message (if tripped)
+    error_code: str | None  # Error code (if tripped)
+
+
+def _check_empty_tool_breaker(
+    sdk_msg: object,
+    consecutive: int,
+    ctx: _StreamContext,
+    state: _RetryState,
+) -> _EmptyToolBreakResult:
+    """Detect consecutive empty tool calls and trip the circuit breaker.
+
+    Returns an ``_EmptyToolBreakResult`` with the updated counter and, if the
+    breaker tripped, the ``StreamError`` to yield plus the error metadata.
+    """
+    if not isinstance(sdk_msg, AssistantMessage):
+        return _EmptyToolBreakResult(consecutive, False, None, None, None)
+
+    empty_tools = [
+        b.name for b in sdk_msg.content if isinstance(b, ToolUseBlock) and not b.input
+    ]
+    if not empty_tools:
+        # Reset on any AssistantMessage without empty tool calls (including
+        # text-only messages, where empty_tools is an empty list).
+        return _EmptyToolBreakResult(0, False, None, None, None)
+
+    consecutive += 1
+
+    # Log full diagnostics on first occurrence only; subsequent hits just
+    # log the counter to reduce noise.
+    if consecutive == 1:
+        logger.warning(
+            "%s Empty tool call detected (%d/%d): "
+            "tools=%s, model=%s, error=%s, "
+            "block_types=%s, cumulative_usage=%s",
+            ctx.log_prefix,
+            consecutive,
+            _EMPTY_TOOL_CALL_LIMIT,
+            empty_tools,
+            sdk_msg.model,
+            sdk_msg.error,
+            [type(b).__name__ for b in sdk_msg.content],
+            {
+                "prompt": state.usage.prompt_tokens,
+                "completion": state.usage.completion_tokens,
+                "cache_read": state.usage.cache_read_tokens,
+            },
+        )
+    else:
+        logger.warning(
+            "%s Empty tool call detected (%d/%d): tools=%s",
+            ctx.log_prefix,
+            consecutive,
+            _EMPTY_TOOL_CALL_LIMIT,
+            empty_tools,
+        )
+
+    if consecutive < _EMPTY_TOOL_CALL_LIMIT:
+        return _EmptyToolBreakResult(consecutive, False, None, None, None)
+
+    logger.error(
+        "%s Circuit breaker: aborting stream after %d "
+        "consecutive empty tool calls. "
+        "This is likely caused by the model attempting "
+        "to write content too large for a single tool "
+        "call's output token limit. The model should "
+        "write large files in chunks using bash_exec "
+        "with cat >> (append).",
+        ctx.log_prefix,
+        consecutive,
+    )
+    error_msg = _CIRCUIT_BREAKER_ERROR_MSG
+    error_code = "circuit_breaker_empty_tool_calls"
+    _append_error_marker(ctx.session, error_msg, retryable=True)
+    return _EmptyToolBreakResult(
+        count=consecutive,
+        tripped=True,
+        error=StreamError(errorText=error_msg, code=error_code),
+        error_msg=error_msg,
+        error_code=error_code,
+    )
+

 async def _run_stream_attempt(
    ctx: _StreamContext,
@@ -1071,6 +1181,12 @@ async def _run_stream_attempt(
        accumulated_tool_calls=[],
    )
    ended_with_stream_error = False
+    # Stores the error message used by _append_error_marker so the outer
+    # retry loop can re-append the correct message after session rollback.
+    stream_error_msg: str | None = None
+    stream_error_code: str | None = None
+
+    consecutive_empty_tool_calls = 0

    async with ClaudeSDKClient(options=state.options) as client:
        logger.info(
@@ -1161,18 +1277,43 @@ async def _run_stream_attempt(
                        "suppressing raw error text",
                        ctx.log_prefix,
                    )
+                    stream_error_msg = FRIENDLY_TRANSIENT_MSG
+                    stream_error_code = "transient_api_error"
                    _append_error_marker(
                        ctx.session,
-                        FRIENDLY_TRANSIENT_MSG,
+                        stream_error_msg,
                        retryable=True,
                    )
                    yield StreamError(
-                        errorText=FRIENDLY_TRANSIENT_MSG,
-                        code="transient_api_error",
+                        errorText=stream_error_msg,
+                        code=stream_error_code,
                    )
                    ended_with_stream_error = True
                    break

+            # Parallel tool execution: pre-launch every ToolUseBlock as an
+            # asyncio.Task the moment its AssistantMessage arrives.  The SDK
+            # sends one AssistantMessage per tool call when issuing parallel
+            # calls, so each message is pre-launched independently.  The MCP
+            # handlers will await the already-running task instead of executing
+            # fresh, making all concurrent tool calls run in parallel.
+            #
+            # Also determine if the message is a tool-only batch (all content
+            # items are ToolUseBlocks) — such messages have no text output yet,
+            # so we skip the wait_for_stash flush below.
+            is_tool_only = False
+            if isinstance(sdk_msg, AssistantMessage) and sdk_msg.content:
+                is_tool_only = True
+                # NOTE: Pre-launches are sequential (each await completes
+                # file-ref expansion before the next starts).  This is fine
+                # since expansion is typically sub-ms; a future optimisation
+                # could gather all pre-launches concurrently.
+                for tool_use in sdk_msg.content:
+                    if isinstance(tool_use, ToolUseBlock):
+                        await pre_launch_tool_call(tool_use.name, tool_use.input)
+                    else:
+                        is_tool_only = False
+
            # Race-condition fix: SDK hooks (PostToolUse) are
            # executed asynchronously via start_soon() — the next
            # message can arrive before the hook stashes output.
@@ -1186,15 +1327,12 @@ async def _run_stream_attempt(
            # AssistantMessages (each containing only
            # ToolUseBlocks), we must NOT wait/flush — the prior
            # tools are still executing concurrently.
-            is_parallel_continuation = isinstance(sdk_msg, AssistantMessage) and all(
-                isinstance(b, ToolUseBlock) for b in sdk_msg.content
-            )
            if (
                state.adapter.has_unresolved_tool_calls
                and isinstance(sdk_msg, (AssistantMessage, ResultMessage))
-                and not is_parallel_continuation
+                and not is_tool_only
            ):
-                if await wait_for_stash(timeout=0.5):
+                if await wait_for_stash():
                    await asyncio.sleep(0)
                else:
                    logger.warning(
@@ -1209,13 +1347,17 @@ async def _run_stream_attempt(
            if isinstance(sdk_msg, ResultMessage):
                logger.info(
                    "%s Received: ResultMessage %s "
-                    "(unresolved=%d, current=%d, resolved=%d)",
+                    "(unresolved=%d, current=%d, resolved=%d, "
+                    "num_turns=%d, cost_usd=%s, result=%s)",
                    ctx.log_prefix,
                    sdk_msg.subtype,
                    len(state.adapter.current_tool_calls)
                    - len(state.adapter.resolved_tool_calls),
                    len(state.adapter.current_tool_calls),
                    len(state.adapter.resolved_tool_calls),
+                    sdk_msg.num_turns,
+                    sdk_msg.total_cost_usd,
+                    (sdk_msg.result or "")[:200],
                )
                if sdk_msg.subtype in (
                    "error",
@@ -1272,6 +1414,18 @@ async def _run_stream_attempt(
                    )
                    entries_replaced = True

+            # --- Hard circuit breaker for empty tool calls ---
+            breaker = _check_empty_tool_breaker(
+                sdk_msg, consecutive_empty_tool_calls, ctx, state
+            )
+            consecutive_empty_tool_calls = breaker.count
+            if breaker.tripped and breaker.error is not None:
+                stream_error_msg = breaker.error_msg
+                stream_error_code = breaker.error_code
+                yield breaker.error
+                ended_with_stream_error = True
+                break
+
            # --- Dispatch adapter responses ---
            for response in state.adapter.convert_message(sdk_msg):
                dispatched = _dispatch_response(
@@ -1352,8 +1506,10 @@ async def _run_stream_attempt(
    # to the client (StreamError yielded above), raise so the outer retry
    # loop can rollback session messages and set its error flags properly.
    if ended_with_stream_error:
-        raise _TransientErrorHandled(
-            "Transient API error handled — StreamError already yielded"
+        raise _HandledStreamError(
+            "Stream error handled — StreamError already yielded",
+            error_msg=stream_error_msg,
+            code=stream_error_code,
        )


@@ -1364,6 +1520,7 @@ async def stream_chat_completion_sdk(
    user_id: str | None = None,
    session: ChatSession | None = None,
    file_ids: list[str] | None = None,
+    permissions: "CopilotPermissions | None" = None,
    **_kwargs: Any,
 ) -> AsyncIterator[StreamBaseResponse]:
    """Stream chat completion using Claude Agent SDK.
@@ -1609,7 +1766,13 @@ async def stream_chat_completion_sdk(

        yield StreamStart(messageId=message_id, sessionId=session_id)

-        set_execution_context(user_id, session, sandbox=e2b_sandbox, sdk_cwd=sdk_cwd)
+        set_execution_context(
+            user_id,
+            session,
+            sandbox=e2b_sandbox,
+            sdk_cwd=sdk_cwd,
+            permissions=permissions,
+        )

        # Fail fast when no API credentials are available at all.
        sdk_env = _build_sdk_env(session_id=session_id, user_id=user_id)
@@ -1635,8 +1798,11 @@ async def stream_chat_completion_sdk(
            on_compact=compaction.on_compact,
        )

-        allowed = get_copilot_tool_names(use_e2b=use_e2b)
-        disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)
+        if permissions is not None:
+            allowed, disallowed = apply_tool_permissions(permissions, use_e2b=use_e2b)
+        else:
+            allowed = get_copilot_tool_names(use_e2b=use_e2b)
+            disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)

        def _on_stderr(line: str) -> None:
            """Log a stderr line emitted by the Claude CLI subprocess."""
@@ -1746,6 +1912,12 @@ async def stream_chat_completion_sdk(
        )

        for attempt in range(_MAX_STREAM_ATTEMPTS):
+            # Clear any stale stash signal from the previous attempt so
+            # wait_for_stash() doesn't fire prematurely on a leftover event.
+            reset_stash_event()
+            # Reset tool-level circuit breaker so failures from a previous
+            # (rolled-back) attempt don't carry over to the fresh attempt.
+            reset_tool_failure_counters()
            if attempt > 0:
                logger.info(
                    "%s Retrying with reduced context (%d/%d)",
@@ -1801,6 +1973,10 @@ async def stream_chat_completion_sdk(
                    if not isinstance(event, StreamHeartbeat):
                        events_yielded += 1
                    yield event
+                # Cancel any pre-launched tasks that were never dispatched
+                # by the SDK (e.g. edge-case SDK behaviour changes). Mirrors the
+                # cancel_pending_tool_tasks() calls on the three error paths below.
+                await cancel_pending_tool_tasks()
                break  # Stream completed — exit retry loop
            except asyncio.CancelledError:
                logger.warning(
@@ -1809,26 +1985,42 @@ async def stream_chat_completion_sdk(
                    attempt + 1,
                    _MAX_STREAM_ATTEMPTS,
                )
+                # Cancel any pre-launched tasks so they don't continue executing
+                # against a rolled-back or abandoned session.
+                await cancel_pending_tool_tasks()
                raise
-            except _TransientErrorHandled:
+            except _HandledStreamError as exc:
                # _run_stream_attempt already yielded a StreamError and
                # appended an error marker.  We only need to rollback
                # session messages and set the error flag — do NOT set
                # stream_err so the post-loop code won't emit a
                # duplicate StreamError.
                logger.warning(
-                    "%s Transient error handled in stream attempt "
-                    "(attempt %d/%d, events_yielded=%d)",
+                    "%s Stream error handled in attempt "
+                    "(attempt %d/%d, code=%s, events_yielded=%d)",
                    log_prefix,
                    attempt + 1,
                    _MAX_STREAM_ATTEMPTS,
+                    exc.code or "transient",
                    events_yielded,
                )
                session.messages = session.messages[:pre_attempt_msg_count]
+                # transcript_builder still contains entries from the aborted
+                # attempt that no longer match session.messages.  Skip upload
+                # so a future --resume doesn't replay rolled-back content.
+                skip_transcript_upload = True
                # Re-append the error marker so it survives the rollback
                # and is persisted by the finally block (see #2947655365).
-                _append_error_marker(session, FRIENDLY_TRANSIENT_MSG, retryable=True)
+                # Use the specific error message from the attempt (e.g.
+                # circuit breaker msg) rather than always the generic one.
+                _append_error_marker(
+                    session,
+                    exc.error_msg or FRIENDLY_TRANSIENT_MSG,
+                    retryable=True,
+                )
                ended_with_stream_error = True
+                # Cancel any pre-launched tasks from the failed attempt.
+                await cancel_pending_tool_tasks()
                break
            except Exception as e:
                stream_err = e
@@ -1845,6 +2037,9 @@ async def stream_chat_completion_sdk(
                    exc_info=True,
                )
                session.messages = session.messages[:pre_attempt_msg_count]
+                # Cancel any pre-launched tasks from the failed attempt so they
+                # don't continue executing against the rolled-back session.
+                await cancel_pending_tool_tasks()
                if events_yielded > 0:
                    # Events were already sent to the frontend and cannot be
                    # unsent.  Retrying would produce duplicate/inconsistent
@@ -1854,11 +2049,13 @@ async def stream_chat_completion_sdk(
                        log_prefix,
                        events_yielded,
                    )
+                    skip_transcript_upload = True
                    ended_with_stream_error = True
                    break
                if not is_context_error:
                    # Non-context errors (network, auth, rate-limit) should
                    # not trigger compaction — surface the error immediately.
+                    skip_transcript_upload = True
                    ended_with_stream_error = True
                    break
                continue
--- a/autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py
@@ -1,21 +1,23 @@
 """Unit tests for extracted service helpers.

 Covers ``_is_prompt_too_long``, ``_reduce_context``, ``_iter_sdk_messages``,
-and the ``ReducedContext`` named tuple.
+``ReducedContext``, and ``_is_tool_only_message``.
 """

 from __future__ import annotations

 import asyncio
 from collections.abc import AsyncGenerator
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock, MagicMock, patch

 import pytest
+from claude_agent_sdk import AssistantMessage, TextBlock, ToolUseBlock

 from .conftest import build_test_transcript as _build_transcript
 from .service import (
    ReducedContext,
    _is_prompt_too_long,
+    _is_tool_only_message,
    _iter_sdk_messages,
    _reduce_context,
 )
@@ -281,3 +283,55 @@ class TestIterSdkMessages:
        first = await gen.__anext__()
        assert first == "first"
        await gen.aclose()  # should cancel pending task cleanly
+
+
+# ---------------------------------------------------------------------------
+# _is_tool_only_message (parallel tool-call batch detection)
+# ---------------------------------------------------------------------------
+
+
+class TestIsParallelContinuation:
+    """Unit tests for ``_is_tool_only_message``, the tool-only batch check used in the streaming loop.
+
+    Verifies the vacuous-truth guard (empty content must return False) and the
+    boundary cases for mixed TextBlock+ToolUseBlock messages.
+    """
+
+    def _make_tool_block(self) -> MagicMock:
+        block = MagicMock(spec=ToolUseBlock)
+        return block
+
+    def test_all_tool_use_blocks_is_parallel(self):
+        """AssistantMessage with only ToolUseBlocks is a parallel continuation."""
+        msg = MagicMock(spec=AssistantMessage)
+        msg.content = [self._make_tool_block(), self._make_tool_block()]
+        assert _is_tool_only_message(msg) is True
+
+    def test_empty_content_is_not_parallel(self):
+        """AssistantMessage with empty content must NOT be treated as parallel.
+
+        Without the bool(sdk_msg.content) guard, all() on an empty iterable
+        returns True via vacuous truth — this test ensures the guard is present.
+        """
+        msg = MagicMock(spec=AssistantMessage)
+        msg.content = []
+        assert _is_tool_only_message(msg) is False
+
+    def test_mixed_text_and_tool_blocks_not_parallel(self):
+        """AssistantMessage with text + tool blocks is NOT a parallel continuation."""
+        msg = MagicMock(spec=AssistantMessage)
+        text_block = MagicMock(spec=TextBlock)
+        msg.content = [text_block, self._make_tool_block()]
+        assert _is_tool_only_message(msg) is False
+
+    def test_non_assistant_message_not_parallel(self):
+        """Non-AssistantMessage types are never parallel continuations."""
+        assert _is_tool_only_message("not a message") is False
+        assert _is_tool_only_message(None) is False
+        assert _is_tool_only_message(42) is False
+
+    def test_single_tool_block_is_parallel(self):
+        """Single ToolUseBlock AssistantMessage is a parallel continuation."""
+        msg = MagicMock(spec=AssistantMessage)
+        msg.content = [self._make_tool_block()]
+        assert _is_tool_only_message(msg) is True
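The guard these tests pin down can be sketched in isolation. This is a minimal sketch, not the `_is_tool_only_message` from `service.py`: the `TextBlock`, `ToolUseBlock` and `AssistantMessage` dataclasses are hypothetical stand-ins for the `claude_agent_sdk` types, and `is_tool_only_message` is an assumed reimplementation of the predicate the tests describe.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the claude_agent_sdk message and block types.
@dataclass
class TextBlock:
    text: str


@dataclass
class ToolUseBlock:
    name: str


@dataclass
class AssistantMessage:
    content: list = field(default_factory=list)


def is_tool_only_message(msg: object) -> bool:
    """True only for a non-empty AssistantMessage whose blocks are all tool calls."""
    if not isinstance(msg, AssistantMessage):
        return False
    # all() over an empty list is True (vacuous truth), so require content first.
    return bool(msg.content) and all(
        isinstance(block, ToolUseBlock) for block in msg.content
    )


print(all([]))  # True, which is why the bool() guard is needed
print(is_tool_only_message(AssistantMessage(content=[])))  # False
print(is_tool_only_message(AssistantMessage([ToolUseBlock("run_block")])))  # True
```

Short-circuiting on `bool(msg.content)` keeps the return value a plain `bool` in every branch, which is what lets assertions of the form `is True` / `is False` hold.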
--- a/autogpt_platform/backend/backend/copilot/sdk/subscription.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/subscription.py
@@ -0,0 +1,144 @@
+"""Claude Code subscription auth helpers.
+
+Handles locating the SDK-bundled CLI binary, provisioning credentials from
+environment variables, and validating that subscription auth is functional.
+"""
+
+import functools
+import json
+import logging
+import os
+import shutil
+import subprocess
+
+logger = logging.getLogger(__name__)
+
+
+def find_bundled_cli() -> str:
+    """Locate the Claude CLI binary bundled inside ``claude_agent_sdk``.
+
+    Falls back to ``shutil.which("claude")`` if the SDK bundle is absent.
+    """
+    try:
+        from claude_agent_sdk._internal.transport.subprocess_cli import (
+            SubprocessCLITransport,
+        )
+
+        path = SubprocessCLITransport._find_bundled_cli(None)  # type: ignore[arg-type]
+        if path:
+            return str(path)
+    except Exception:
+        pass
+    system_path = shutil.which("claude")
+    if system_path:
+        return system_path
+    raise RuntimeError(
+        "Claude CLI not found — neither the SDK-bundled binary nor a "
+        "system-installed `claude` could be located."
+    )
+
+
+def provision_credentials_file() -> None:
+    """Write ``~/.claude/.credentials.json`` from env when running headless.
+
+    If ``CLAUDE_CODE_OAUTH_TOKEN`` is set (an OAuth *access* token obtained
+    from ``claude auth status`` or extracted from the macOS keychain), this
+    helper writes a minimal credentials file so the bundled CLI can
+    authenticate without an interactive ``claude login``.
+
+    A ``CLAUDE_CODE_REFRESH_TOKEN`` env var is optional but recommended —
+    it lets the CLI silently refresh an expired access token.
+    """
+    access_token = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN", "").strip()
+    if not access_token:
+        return
+
+    creds_dir = os.path.expanduser("~/.claude")
+    creds_path = os.path.join(creds_dir, ".credentials.json")
+
+    # Don't overwrite an existing credentials file (e.g. from a volume mount).
+    if os.path.exists(creds_path):
+        logger.debug("Credentials file already exists at %s — skipping", creds_path)
+        return
+
+    os.makedirs(creds_dir, exist_ok=True)
+
+    creds = {
+        "claudeAiOauth": {
+            "accessToken": access_token,
+            "refreshToken": os.environ.get("CLAUDE_CODE_REFRESH_TOKEN", "").strip(),
+            "expiresAt": 0,
+            "scopes": [
+                "user:inference",
+                "user:profile",
+                "user:sessions:claude_code",
+            ],
+        }
+    }
+    with os.fdopen(os.open(creds_path, os.O_WRONLY | os.O_CREAT, 0o600), "w") as f:
+        json.dump(creds, f)
+    logger.info("Provisioned Claude credentials file at %s", creds_path)
+
+
+@functools.cache
+def validate_subscription() -> None:
+    """Validate the bundled Claude CLI is reachable and authenticated.
+
+    Cached, so after a successful run the blocking subprocess checks never repeat
+    (exceptions are not cached, so a failed check reruns on the next call).  Also
+    provisions ``~/.claude/.credentials.json`` from ``CLAUDE_CODE_OAUTH_TOKEN`` if set.
+    """
+    provision_credentials_file()
+
+    cli = find_bundled_cli()
+    result = subprocess.run(
+        [cli, "--version"],
+        capture_output=True,
+        text=True,
+        timeout=10,
+    )
+    if result.returncode != 0:
+        raise RuntimeError(
+            f"Claude CLI check failed (exit {result.returncode}): "
+            f"{result.stderr.strip()}"
+        )
+    logger.info(
+        "Claude Code subscription mode: CLI version %s",
+        result.stdout.strip(),
+    )
+
+    # Verify the CLI is actually authenticated.
+    auth_result = subprocess.run(
+        [cli, "auth", "status"],
+        capture_output=True,
+        text=True,
+        timeout=10,
+        env={
+            **os.environ,
+            "ANTHROPIC_API_KEY": "",
+            "ANTHROPIC_AUTH_TOKEN": "",
+            "ANTHROPIC_BASE_URL": "",
+        },
+    )
+    if auth_result.returncode != 0:
+        raise RuntimeError(
+            "Claude CLI is not authenticated. Either:\n"
+            "  • Set CLAUDE_CODE_OAUTH_TOKEN env var (from `claude auth status` "
+            "or macOS keychain), or\n"
+            "  • Mount ~/.claude/.credentials.json into the container, or\n"
+            "  • Run `claude login` inside the container."
+        )
+    try:
+        status = json.loads(auth_result.stdout)
+        if not status.get("loggedIn"):
+            raise RuntimeError(
+                "Claude CLI reports loggedIn=false. Set CLAUDE_CODE_OAUTH_TOKEN "
+                "or run `claude login`."
+            )
+        logger.info(
+            "Claude subscription auth: method=%s, email=%s",
+            status.get("authMethod"),
+            status.get("email"),
+        )
+    except json.JSONDecodeError:
+        logger.warning("Could not parse `claude auth status` output")
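A credentials file holds OAuth tokens, so how it is created matters. The following is a minimal sketch of an atomic, owner-only write for a file with the same `claudeAiOauth` shape used above. `write_credentials` is a hypothetical helper, not part of this PR: it uses `O_EXCL` so that "skip if the file already exists" and "create the file" happen in one step, and mode `0o600` (narrowed further by the process umask) so other users cannot read the tokens.

```python
import json
import os
import tempfile


def write_credentials(path: str, access_token: str, refresh_token: str = "") -> bool:
    """Create an owner-only credentials file; return False if one already exists."""
    os.makedirs(os.path.dirname(path), mode=0o700, exist_ok=True)
    creds = {
        "claudeAiOauth": {
            "accessToken": access_token,
            "refreshToken": refresh_token,
            "expiresAt": 0,
            "scopes": ["user:inference", "user:profile", "user:sessions:claude_code"],
        }
    }
    try:
        # O_EXCL makes "create only if absent" a single atomic step, and the
        # 0o600 mode (further narrowed by the umask) keeps the tokens private.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False  # e.g. a volume-mounted file: leave it untouched
    with os.fdopen(fd, "w") as f:
        json.dump(creds, f)
    return True


with tempfile.TemporaryDirectory() as home:
    creds_path = os.path.join(home, ".claude", ".credentials.json")
    print(write_credentials(creds_path, "example-access-token"))  # True
    print(write_credentials(creds_path, "another-token"))  # False
```

Compared with an `os.path.exists` check followed by `open(..., "w")`, this closes the window in which another process could create the file between the check and the write.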
--- a/autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
@@ -0,0 +1,96 @@
+"""Tests for the tool call circuit breaker in tool_adapter.py."""
+
+import pytest
+
+from backend.copilot.sdk.tool_adapter import (
+    _MAX_CONSECUTIVE_TOOL_FAILURES,
+    _check_circuit_breaker,
+    _clear_tool_failures,
+    _consecutive_tool_failures,
+    _record_tool_failure,
+)
+
+
+@pytest.fixture(autouse=True)
+def _reset_tracker():
+    """Reset the circuit breaker tracker for each test."""
+    token = _consecutive_tool_failures.set({})
+    yield
+    _consecutive_tool_failures.reset(token)
+
+
+class TestCircuitBreaker:
+    def test_no_trip_below_threshold(self):
+        """Circuit breaker should not trip before reaching the limit."""
+        args = {"file_path": "/tmp/test.txt"}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES - 1):
+            assert _check_circuit_breaker("write_file", args) is None
+            _record_tool_failure("write_file", args)
+        # Still under the limit
+        assert _check_circuit_breaker("write_file", args) is None
+
+    def test_trips_at_threshold(self):
+        """Circuit breaker should trip after reaching the failure limit."""
+        args = {"file_path": "/tmp/test.txt"}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
+            assert _check_circuit_breaker("write_file", args) is None
+            _record_tool_failure("write_file", args)
+        # Now it should trip
+        result = _check_circuit_breaker("write_file", args)
+        assert result is not None
+        assert "STOP" in result
+        assert "write_file" in result
+
+    def test_different_args_tracked_separately(self):
+        """Different args should have separate failure counters."""
+        args_a = {"file_path": "/tmp/a.txt"}
+        args_b = {"file_path": "/tmp/b.txt"}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
+            _record_tool_failure("write_file", args_a)
+        # args_a should trip
+        assert _check_circuit_breaker("write_file", args_a) is not None
+        # args_b should NOT trip
+        assert _check_circuit_breaker("write_file", args_b) is None
+
+    def test_different_tools_tracked_separately(self):
+        """Different tools should have separate failure counters."""
+        args = {"file_path": "/tmp/test.txt"}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
+            _record_tool_failure("tool_a", args)
+        # tool_a should trip
+        assert _check_circuit_breaker("tool_a", args) is not None
+        # tool_b with same args should NOT trip
+        assert _check_circuit_breaker("tool_b", args) is None
+
+    def test_empty_args_tracked(self):
+        """Empty args ({}) — the exact failure pattern from the bug — should be tracked."""
+        args = {}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
+            _record_tool_failure("write_file", args)
+        assert _check_circuit_breaker("write_file", args) is not None
+
+    def test_clear_resets_counter(self):
+        """Clearing failures should reset the counter."""
+        args = {}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
+            _record_tool_failure("write_file", args)
+        _clear_tool_failures("write_file")
+        assert _check_circuit_breaker("write_file", args) is None
+
+    def test_success_clears_failures(self):
+        """A successful call should reset the failure counter."""
+        args = {}
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES - 1):
+            _record_tool_failure("write_file", args)
+        # Success clears failures
+        _clear_tool_failures("write_file")
+        # Should be able to fail again without tripping
+        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES - 1):
+            _record_tool_failure("write_file", args)
+        assert _check_circuit_breaker("write_file", args) is None
+
+    def test_no_tracker_returns_none(self):
+        """If tracker is not initialized, circuit breaker should not trip."""
+        _consecutive_tool_failures.set(None)  # type: ignore[arg-type]
+        _record_tool_failure("write_file", {})  # should not raise
+        assert _check_circuit_breaker("write_file", {}) is None
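These tests treat the breaker as a counter keyed by tool name plus a canonical JSON rendering of the arguments. A minimal standalone sketch of that idea follows, assuming a plain instance dict in place of the `_consecutive_tool_failures` ContextVar; `ToolCircuitBreaker` is hypothetical and not part of `tool_adapter.py`.

```python
import json
from typing import Any


class ToolCircuitBreaker:
    """Failure counter keyed by (tool name, canonical JSON of args)."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures  # plays the role of the module constant
        self.failures: dict[str, int] = {}

    @staticmethod
    def key(tool: str, args: dict[str, Any]) -> str:
        # sort_keys gives {"a": 1, "b": 2} and {"b": 2, "a": 1} the same key;
        # default=str keeps non-JSON values from raising.
        return f"{tool}:{json.dumps(args, sort_keys=True, default=str)}"

    def tripped(self, tool: str, args: dict[str, Any]) -> bool:
        return self.failures.get(self.key(tool, args), 0) >= self.max_failures

    def record_failure(self, tool: str, args: dict[str, Any]) -> None:
        k = self.key(tool, args)
        self.failures[k] = self.failures.get(k, 0) + 1

    def clear(self, tool: str) -> None:
        # Any success resets every args variant recorded for that tool.
        prefix = f"{tool}:"
        for k in [k for k in self.failures if k.startswith(prefix)]:
            del self.failures[k]


breaker = ToolCircuitBreaker()
for _ in range(3):
    breaker.record_failure("write_file", {})
print(breaker.tripped("write_file", {}))  # True
print(breaker.tripped("write_file", {"file_path": "/tmp/a.txt"}))  # False
breaker.clear("write_file")
print(breaker.tripped("write_file", {}))  # False
```

Keying on arguments rather than on the tool alone means a model that keeps varying its input is not cut off, while one stuck repeating the same call (such as `write_file` with `{}`) is.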
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
@@ -16,6 +16,7 @@ from typing import TYPE_CHECKING, Any
 from claude_agent_sdk import create_sdk_mcp_server, tool

 from backend.copilot.context import (
+    _current_permissions,
    _current_project_dir,
    _current_sandbox,
    _current_sdk_cwd,
@@ -41,6 +42,8 @@ from .e2b_file_tools import E2B_FILE_TOOL_NAMES, E2B_FILE_TOOLS
 if TYPE_CHECKING:
    from e2b import AsyncSandbox

+    from backend.copilot.permissions import CopilotPermissions
+
 logger = logging.getLogger(__name__)

 # Max MCP response size in chars — keeps tool output under the SDK's 10 MB JSON buffer.
@@ -50,6 +53,14 @@ _MCP_MAX_CHARS = 500_000
 MCP_SERVER_NAME = "copilot"
 MCP_TOOL_PREFIX = f"mcp__{MCP_SERVER_NAME}__"

+# Map from tool_name -> Queue of pre-launched (task, args) pairs.
+# Initialised per-session in set_execution_context() so concurrent sessions
+# never share the same dict.
+_TaskQueueItem = tuple[asyncio.Task[dict[str, Any]], dict[str, Any]]
+_tool_task_queues: ContextVar[dict[str, asyncio.Queue[_TaskQueueItem]] | None] = (
+    ContextVar("_tool_task_queues", default=None)
+)
+
 # Stash for MCP tool outputs before the SDK potentially truncates them.
 # Keyed by tool_name → full output string. Consumed (popped) by the
 # response adapter when it builds StreamToolOutputAvailable.
@@ -66,12 +77,23 @@ _stash_event: ContextVar[asyncio.Event | None] = ContextVar(
    "_stash_event", default=None
 )

+# Circuit breaker: tracks consecutive tool failures to detect infinite retry loops.
+# When a tool is called repeatedly with empty/identical args and keeps failing,
+# this counter is incremented.  After _MAX_CONSECUTIVE_TOOL_FAILURES identical
+# failures, the _truncating wrapper rejects further identical calls with a STOP error.
+_MAX_CONSECUTIVE_TOOL_FAILURES = 3
+_consecutive_tool_failures: ContextVar[dict[str, int]] = ContextVar(
+    "_consecutive_tool_failures",
+    default=None,  # type: ignore[arg-type]
+)
+

 def set_execution_context(
    user_id: str | None,
    session: ChatSession,
    sandbox: "AsyncSandbox | None" = None,
    sdk_cwd: str | None = None,
+    permissions: "CopilotPermissions | None" = None,
 ) -> None:
    """Set the execution context for tool calls.

@@ -83,14 +105,83 @@ def set_execution_context(
        session: Current chat session.
        sandbox: Optional E2B sandbox; when set, bash_exec routes commands there.
        sdk_cwd: SDK working directory; used to scope tool-results reads.
+        permissions: Optional capability filter restricting tools/blocks.
    """
    _current_user_id.set(user_id)
    _current_session.set(session)
    _current_sandbox.set(sandbox)
    _current_sdk_cwd.set(sdk_cwd or "")
    _current_project_dir.set(_encode_cwd_for_cli(sdk_cwd) if sdk_cwd else "")
+    _current_permissions.set(permissions)
    _pending_tool_outputs.set({})
    _stash_event.set(asyncio.Event())
+    _tool_task_queues.set({})
+    _consecutive_tool_failures.set({})
+
+
+def reset_stash_event() -> None:
+    """Clear any stale stash signal left over from a previous stream attempt.
+
+    ``_stash_event`` is created once per session in ``set_execution_context`` and
+    reused across retry attempts.  A PostToolUse hook from a failed attempt may
+    leave the event set; calling this at the start of each retry prevents
+    ``wait_for_stash`` from returning prematurely on a stale signal.
+    """
+    event = _stash_event.get(None)
+    if event is not None:
+        event.clear()
+
+
+async def cancel_pending_tool_tasks() -> None:
+    """Cancel all queued pre-launched tasks for the current execution context.
+
+    Call this when a stream attempt aborts (error, cancellation) to prevent
+    pre-launched tasks from continuing to execute against a rolled-back session.
+    Tasks that are already done are skipped; in-flight tasks are cancelled and
+    awaited so that any cleanup (``finally`` blocks, DB rollbacks) completes
+    before the next retry starts.
+    """
+    queues = _tool_task_queues.get()
+    if not queues:
+        return
+    cancelled_tasks: list[asyncio.Task] = []
+    for tool_name, queue in list(queues.items()):
+        cancelled = 0
+        while not queue.empty():
+            task, _args = queue.get_nowait()
+            if not task.done():
+                task.cancel()
+                cancelled_tasks.append(task)
+                cancelled += 1
+        if cancelled:
+            logger.debug(
+                "Cancelled %d pre-launched task(s) for tool '%s'", cancelled, tool_name
+            )
+    queues.clear()
+    # Await all cancelled tasks so their cleanup (finally blocks, DB rollbacks)
+    # completes before the next retry attempt starts new pre-launches.
+    # Use a timeout to prevent hanging indefinitely if a task's cleanup is stuck.
+    if cancelled_tasks:
+        try:
+            await asyncio.wait_for(
+                asyncio.gather(*cancelled_tasks, return_exceptions=True),
+                timeout=5.0,
+            )
+        except TimeoutError:
+            logger.warning(
+                "Timed out waiting for %d cancelled task(s) to clean up",
+                len(cancelled_tasks),
+            )
+
+
+def reset_tool_failure_counters() -> None:
+    """Reset all tool-level circuit breaker counters.
+
+    Called at the start of each SDK retry attempt so that failure counts
+    from a previous (rolled-back) attempt do not carry over and prematurely
+    trip the breaker on a fresh attempt with different context.
+    """
+    _consecutive_tool_failures.set({})


 def pop_pending_tool_output(tool_name: str) -> str | None:
@@ -155,12 +246,13 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
    by waiting on the ``_stash_event``, which is signaled by
    :func:`stash_pending_tool_output`.

-    Returns ``True`` if a stash signal was received, ``False`` on timeout.
+    Uses ``asyncio.Event.wait()`` so it returns the instant the hook signals —
+    the timeout is purely a safety net for the case where the hook never fires.
+    Returns ``True`` if the stash signal was received, ``False`` on timeout.

-    The 2.0 s default was chosen based on production metrics: the original
-    0.5 s caused frequent timeouts under load (parallel tool calls, large
-    outputs).  2.0 s gives a comfortable margin while still failing fast
-    when the hook genuinely will not fire.
+    The 2.0 s default was chosen to accommodate slower tool startup in cloud
+    sandboxes while still failing fast when the hook genuinely will not fire.
+    With the parallel pre-launch path, hooks typically fire well under 1 ms.
    """
    event = _stash_event.get(None)
    if event is None:
@@ -169,7 +261,7 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
    if event.is_set():
        event.clear()
        return True
-    # Slow path: wait for the hook to signal.
+    # Slow path: block until the hook signals or the safety timeout expires.
    try:
        async with asyncio.timeout(timeout):
            await event.wait()
@@ -179,6 +271,82 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
        return False


+async def pre_launch_tool_call(tool_name: str, args: dict[str, Any]) -> None:
+    """Pre-launch a tool as a background task so parallel calls run concurrently.
+
+    Called when an AssistantMessage with ToolUseBlocks is received, before the
+    SDK dispatches the MCP ``tools/call`` requests. The tool handler then awaits
+    the pre-launched task instead of executing the tool a second time.
+
+    The tool_name may include an MCP prefix (e.g. ``mcp__copilot__run_block``);
+    the prefix is stripped automatically before looking up the tool.
+
+    Ordering guarantee: the Claude Agent SDK dispatches MCP ``tools/call`` requests
+    in the same order as the ToolUseBlocks appear in the AssistantMessage.
+    Pre-launched tasks are queued FIFO per tool name, so the N-th handler for a
+    given tool name dequeues the N-th pre-launched task — result and args always
+    correspond when the SDK preserves order (which it does in the current SDK).
+    """
+    queues = _tool_task_queues.get()
+    if queues is None:
+        return
+
+    # Strip the MCP server prefix (e.g. "mcp__copilot__") to get the bare tool name.
+    # Use removeprefix so tool names that themselves contain "__" are handled correctly.
+    bare_name = tool_name.removeprefix(MCP_TOOL_PREFIX)
+
+    base_tool = TOOL_REGISTRY.get(bare_name)
+    if base_tool is None:
+        return
+
+    user_id, session = get_execution_context()
+    if session is None:
+        return
+
+    # Expand @@agptfile: references before launching the task.
+    # The _truncating wrapper (which normally handles expansion) runs AFTER
+    # pre_launch_tool_call — the pre-launched task would otherwise receive raw
+    # @@agptfile: tokens and fail to resolve them inside _execute_tool_sync.
+    # Use _build_input_schema (same path as _truncating) for schema-aware expansion.
+    input_schema: dict[str, Any] | None
+    try:
+        input_schema = _build_input_schema(base_tool)
+    except Exception:
+        input_schema = None  # schema unavailable — skip schema-aware expansion
+    try:
+        args = await expand_file_refs_in_args(
+            args, user_id, session, input_schema=input_schema
+        )
+    except FileRefExpansionError as exc:
+        logger.warning(
+            "pre_launch_tool_call: @@agptfile expansion failed for %s: %s — skipping pre-launch",
+            bare_name,
+            exc,
+        )
+        return
+
+    task = asyncio.create_task(_execute_tool_sync(base_tool, user_id, session, args))
+    # Retrieve and log any exception in a done callback so asyncio never emits
+    # "Task exception was never retrieved" for a task that is never dequeued.
+    task.add_done_callback(
+        lambda t, name=bare_name: (
+            logger.warning(
+                "Pre-launched task for %s raised unhandled: %s",
+                name,
+                t.exception(),
+            )
+            if not t.cancelled() and t.exception()
+            else None
+        )
+    )
+
+    if bare_name not in queues:
+        queues[bare_name] = asyncio.Queue[_TaskQueueItem]()
+    # Store (task, args) so the handler can log a warning if the SDK dispatches
+    # calls in a different order than the ToolUseBlocks appeared in the message.
+    queues[bare_name].put_nowait((task, args))
+
+
 async def _execute_tool_sync(
    base_tool: BaseTool,
    user_id: str | None,
@@ -187,8 +355,10 @@ async def _execute_tool_sync(
 ) -> dict[str, Any]:
    """Execute a tool synchronously and return MCP-formatted response.

-    Note: ``@@agptfile:`` expansion is handled upstream in the ``_truncating`` wrapper
-    so all registered handlers (BaseTool, E2B, Read) expand uniformly.
+    Note: ``@@agptfile:`` expansion should be performed by the caller before
+    invoking this function.  For the normal (non-parallel) path it is handled
+    by the ``_truncating`` wrapper; for the pre-launched parallel path it is
+    handled in :func:`pre_launch_tool_call` before the task is created.
    """
    effective_id = f"sdk-{uuid.uuid4().hex[:12]}"
    result = await base_tool.execute(
@@ -217,6 +387,66 @@ def _mcp_error(message: str) -> dict[str, Any]:
    }


+def _failure_key(tool_name: str, args: dict[str, Any]) -> str:
+    """Compute a stable fingerprint for (tool_name, args) used by the circuit breaker."""
+    args_key = json.dumps(args, sort_keys=True, default=str)
+    return f"{tool_name}:{args_key}"
+
+
+def _check_circuit_breaker(tool_name: str, args: dict[str, Any]) -> str | None:
+    """Check if a tool has hit the consecutive failure limit.
+
+    Tracks failures keyed by (tool_name, args_fingerprint). Returns an error
+    message if the circuit breaker has tripped, or None if the call should proceed.
+    """
+    tracker = _consecutive_tool_failures.get(None)
+    if tracker is None:
+        return None
+
+    key = _failure_key(tool_name, args)
+    count = tracker.get(key, 0)
+    if count >= _MAX_CONSECUTIVE_TOOL_FAILURES:
+        logger.warning(
+            "Circuit breaker tripped for tool %s after %d consecutive "
+            "identical failures (args=%s)",
+            tool_name,
+            count,
+            key[len(tool_name) + 1 :][:200],
+        )
+        return (
+            f"STOP: Tool '{tool_name}' has failed {count} consecutive times with "
+            "the same arguments. Do NOT retry this tool call. "
+            "If you were trying to write content to a file, instead respond with "
+            "the content directly as a text message to the user."
+        )
+    return None
+
+
+def _record_tool_failure(tool_name: str, args: dict[str, Any]) -> None:
+    """Record a tool failure for circuit breaker tracking."""
+    tracker = _consecutive_tool_failures.get(None)
+    if tracker is None:
+        return
+    key = _failure_key(tool_name, args)
+    tracker[key] = tracker.get(key, 0) + 1
+
+
+def _clear_tool_failures(tool_name: str) -> None:
+    """Clear failure tracking for a tool on success.
+
+    Clears ALL args variants for the tool, not just the successful call's args.
+    This gives the tool a "fresh start" on any success, which is appropriate for
+    the primary use case (detecting infinite loops with identical failing args).
+    """
+    tracker = _consecutive_tool_failures.get(None)
+    if tracker is None:
+        return
+    # Clear all entries for this tool name
+    keys_to_remove = [k for k in tracker if k.startswith(f"{tool_name}:")]
+    for k in keys_to_remove:
+        del tracker[k]
+
+
 def create_tool_handler(base_tool: BaseTool):
    """Create an async handler function for a BaseTool.

@@ -225,7 +455,83 @@ def create_tool_handler(base_tool: BaseTool):
    """

    async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
-        """Execute the wrapped tool and return MCP-formatted response."""
+        """Execute the wrapped tool and return MCP-formatted response.
+
+        If a pre-launched task exists (from parallel tool pre-launch in the
+        message loop), await it instead of executing fresh.
+        """
+        queues = _tool_task_queues.get()
+        if queues and base_tool.name in queues:
+            queue = queues[base_tool.name]
+            if not queue.empty():
+                task, launch_args = queue.get_nowait()
+                # Sanity-check: warn if the args don't match — this can happen
+                # if the SDK dispatches tool calls in a different order than the
+                # ToolUseBlocks appeared in the AssistantMessage (unlikely but
+                # could occur in future SDK versions or with SDK bugs).
+                # We compare full values (not just keys) so that two run_block
+                # calls with different block_id values are caught even though
+                # both have the same key set.
+                if launch_args != args:
+                    logger.warning(
+                        "Pre-launched task for %s: arg mismatch "
+                        "(launch_keys=%s, call_keys=%s) — cancelling "
+                        "pre-launched task and falling back to direct execution",
+                        base_tool.name,
+                        (
+                            sorted(launch_args.keys())
+                            if isinstance(launch_args, dict)
+                            else type(launch_args).__name__
+                        ),
+                        (
+                            sorted(args.keys())
+                            if isinstance(args, dict)
+                            else type(args).__name__
+                        ),
+                    )
+                    if not task.done():
+                        task.cancel()
+                        # Await cancellation to prevent duplicate concurrent
+                        # execution for blocks with side effects.
+                        try:
+                            await task
+                        except (asyncio.CancelledError, Exception):
+                            pass
+                    # Fall through to the direct-execution path below.
+                else:
+                    # Args match — await the pre-launched task.
+                    try:
+                        result = await task
+                    except asyncio.CancelledError:
+                        # Re-raise: CancelledError may be propagating from the
+                        # outer streaming loop being cancelled — swallowing it
+                        # would mask the cancellation and prevent proper cleanup.
+                        logger.warning(
+                            "Pre-launched tool %s was cancelled — re-raising",
+                            base_tool.name,
+                        )
+                        raise
+                    except Exception as e:
+                        logger.error(
+                            "Pre-launched tool %s failed: %s",
+                            base_tool.name,
+                            e,
+                            exc_info=True,
+                        )
+                        return _mcp_error(
+                            f"Failed to execute {base_tool.name}. "
+                            "Check server logs for details."
+                        )
+
+                    # Pre-truncate the result so the _truncating wrapper (which
+                    # wraps this handler) receives an already-within-budget
+                    # value. _truncating handles stashing — we must NOT stash
+                    # here or the output will be appended twice to the FIFO
+                    # queue and pop_pending_tool_output would return a duplicate
+                    # entry on the second call for the same tool.
+                    return truncate(result, _MCP_MAX_CHARS)
+
+        # No pre-launched task — execute directly (fallback for non-parallel calls).
        user_id, session = get_execution_context()

        if session is None:
@@ -234,8 +540,12 @@ def create_tool_handler(base_tool: BaseTool):
        try:
            return await _execute_tool_sync(base_tool, user_id, session, args)
        except Exception as e:
-            logger.error(f"Error executing tool {base_tool.name}: {e}", exc_info=True)
-            return _mcp_error(f"Failed to execute {base_tool.name}: {e}")
+            logger.error(
+                "Error executing tool %s: %s", base_tool.name, e, exc_info=True
+            )
+            return _mcp_error(
+                f"Failed to execute {base_tool.name}. Check server logs for details."
+            )

    return tool_handler

@@ -358,6 +668,15 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
        Applied once to every registered tool."""

        async def wrapper(args: dict[str, Any]) -> dict[str, Any]:
+            # Circuit breaker: stop infinite retry loops with identical args.
+            # Fingerprint the original (pre-expansion) args so check and record
+            # use the same key: @@agptfile: expansion rebinds ``args`` to an
+            # expanded copy, which would otherwise produce a different key.
+            original_args = args
+            stop_msg = _check_circuit_breaker(tool_name, original_args)
+            if stop_msg:
+                return _mcp_error(stop_msg)
+
            user_id, session = get_execution_context()
            if session is not None:
                try:
@@ -365,6 +684,7 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
                        args, user_id, session, input_schema=input_schema
                    )
                except FileRefExpansionError as exc:
+                    _record_tool_failure(tool_name, original_args)
                    return _mcp_error(
                        f"@@agptfile: reference could not be resolved: {exc}. "
                        "Ensure the file exists before referencing it. "
@@ -374,6 +694,12 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
            result = await fn(args)
            truncated = truncate(result, _MCP_MAX_CHARS)

+            # Track consecutive failures for circuit breaker
+            if truncated.get("isError"):
+                _record_tool_failure(tool_name, original_args)
+            else:
+                _clear_tool_failures(tool_name)
+
            # Stash the text so the response adapter can forward our
            # middle-out truncated version to the frontend instead of the
            # SDK's head-truncated version (for outputs >~100 KB the SDK
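`pre_launch_tool_call`, `tool_handler` and `cancel_pending_tool_tasks` together implement a per-tool FIFO of pre-started tasks. Below is a minimal standalone sketch of that pattern under stated assumptions: `PreLauncher` and `slow_double` are hypothetical, and the sketch leaves out `@@agptfile:` expansion, truncation, stashing, the circuit breaker and MCP response formatting. Like the docstring in this PR, it assumes handler calls arrive in the same order as the ToolUseBlocks.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolFn = Callable[[dict[str, Any]], Awaitable[Any]]


class PreLauncher:
    """Start tool calls early; hand each result to the matching handler call."""

    def __init__(self, tools: dict[str, ToolFn]) -> None:
        self.tools = tools
        # FIFO of (task, args) per tool name, like _tool_task_queues.
        self.queues: dict[str, asyncio.Queue] = {}

    def pre_launch(self, name: str, args: dict[str, Any]) -> None:
        # Must be called from inside a running event loop (create_task).
        task = asyncio.create_task(self.tools[name](args))
        self.queues.setdefault(name, asyncio.Queue()).put_nowait((task, args))

    async def handle(self, name: str, args: dict[str, Any]) -> Any:
        queue = self.queues.get(name)
        if queue is not None and not queue.empty():
            task, launch_args = queue.get_nowait()
            if launch_args == args:
                return await task  # has been running since pre_launch()
            # Order mismatch: cancel the stale task and wait for it to finish,
            # so a side-effecting tool never runs twice at the same time.
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)
        return await self.tools[name](args)  # direct-execution fallback

    async def cancel_all(self, timeout: float = 5.0) -> None:
        # Abort path: cancel unclaimed tasks and give their cleanup a bounded wait.
        pending = []
        for queue in self.queues.values():
            while not queue.empty():
                task, _args = queue.get_nowait()
                if not task.done():
                    task.cancel()
                    pending.append(task)
        self.queues.clear()
        if pending:
            try:
                await asyncio.wait_for(
                    asyncio.gather(*pending, return_exceptions=True), timeout
                )
            except asyncio.TimeoutError:
                pass


async def slow_double(args: dict[str, Any]) -> int:
    await asyncio.sleep(0.05)
    return args["x"] * 2


async def demo() -> list[int]:
    launcher = PreLauncher({"double": slow_double})
    # Two ToolUseBlocks in one AssistantMessage: both start right away.
    launcher.pre_launch("double", {"x": 1})
    launcher.pre_launch("double", {"x": 2})
    # Handlers arrive later in the same order; each awaits its own task.
    first = await launcher.handle("double", {"x": 1})
    second = await launcher.handle("double", {"x": 2})
    return [first, second]


print(asyncio.run(demo()))  # [2, 4]
```

Because both tasks start before either handler runs, the demo takes roughly one `sleep` rather than two. On an args mismatch the sketch cancels and drains the stale task before falling back to direct execution, which mirrors the warning path in `tool_handler`.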
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py
@@ -1,16 +1,26 @@
-"""Tests for tool_adapter helpers: truncation, stash, context vars."""
+"""Tests for tool_adapter helpers: truncation, stash, context vars, parallel pre-launch."""
+
+import asyncio
+from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

 from backend.copilot.context import get_sdk_cwd
+from backend.copilot.response_model import StreamToolOutputAvailable
+from backend.copilot.sdk.file_ref import FileRefExpansionError
 from backend.util.truncate import truncate

 from .tool_adapter import (
    _MCP_MAX_CHARS,
    _text_from_mcp_result,
+    cancel_pending_tool_tasks,
+    create_tool_handler,
    pop_pending_tool_output,
+    pre_launch_tool_call,
+    reset_stash_event,
    set_execution_context,
    stash_pending_tool_output,
+    wait_for_stash,
 )

 # ---------------------------------------------------------------------------
@@ -120,6 +130,69 @@ class TestToolOutputStash:
        assert pop_pending_tool_output("a") == "alpha"


+# ---------------------------------------------------------------------------
+# reset_stash_event / wait_for_stash
+# ---------------------------------------------------------------------------
+
+
+class TestResetStashEvent:
+    """Tests for reset_stash_event — the stale-signal fix for retry attempts."""
+
+    @pytest.fixture(autouse=True)
+    def _init_context(self):
+        set_execution_context(
+            user_id="test",
+            session=None,  # type: ignore[arg-type]
+            sandbox=None,
+        )
+
+    @pytest.mark.asyncio
+    async def test_reset_clears_stale_signal(self):
+        """After reset, wait_for_stash does NOT return immediately (blocks until timeout)."""
+        # Simulate a stale signal left by a failed attempt's PostToolUse hook.
+        stash_pending_tool_output("some_tool", "stale output")
+        # The stash_pending_tool_output call sets the event.
+        # Now reset it — simulating start of a new retry attempt.
+        reset_stash_event()
+        # wait_for_stash should block and time out since the event was cleared.
+        result = await wait_for_stash(timeout=0.05)
+        assert result is False, (
+            "wait_for_stash should have timed out after reset_stash_event, "
+            "but it returned True — stale signal was not cleared"
+        )
+
+    @pytest.mark.asyncio
+    async def test_wait_returns_true_when_signaled_after_reset(self):
+        """After reset, a new stash signal is correctly detected."""
+        reset_stash_event()
+
+        async def _signal_after_delay():
+            await asyncio.sleep(0.01)
+            stash_pending_tool_output("tool", "fresh output")
+
+        asyncio.create_task(_signal_after_delay())
+        result = await wait_for_stash(timeout=1.0)
+        assert result is True
+
+    @pytest.mark.asyncio
+    async def test_retry_scenario_stale_event_does_not_fire_prematurely(self):
+        """Simulates: attempt 1 leaves event set → reset → attempt 2 waits correctly."""
+        # Attempt 1: hook fires and sets the event
+        stash_pending_tool_output("t", "attempt-1-output")
+        # Pop it so the stash is empty (simulating normal consumption)
+        pop_pending_tool_output("t")
+
+        # Between attempts: reset (as service.py does before each retry)
+        reset_stash_event()
+
+        # Attempt 2: wait_for_stash should NOT return True immediately
+        result = await wait_for_stash(timeout=0.05)
+        assert result is False, (
+            "Stale event from attempt 1 caused wait_for_stash to return "
+            "prematurely in attempt 2"
+        )
+
+
 # ---------------------------------------------------------------------------
 # _truncating wrapper (integration via create_copilot_mcp_server)
 # ---------------------------------------------------------------------------
@@ -168,3 +241,534 @@ class TestTruncationAndStashIntegration:
        text = _text_from_mcp_result(truncated)
        assert len(text) < len(big_text)
        assert len(str(truncated)) <= _MCP_MAX_CHARS
+
+
+# ---------------------------------------------------------------------------
+# Parallel pre-launch infrastructure
+# ---------------------------------------------------------------------------
+
+
+def _make_mock_tool(name: str, output: str = "result") -> MagicMock:
+    """Return a BaseTool mock that returns a successful StreamToolOutputAvailable."""
+    tool = MagicMock()
+    tool.name = name
+    tool.parameters = {"properties": {}, "required": []}
+    tool.execute = AsyncMock(
+        return_value=StreamToolOutputAvailable(
+            toolCallId="test-id",
+            output=output,
+            toolName=name,
+            success=True,
+        )
+    )
+    return tool
+
+
+def _make_mock_session() -> MagicMock:
+    """Return a minimal ChatSession mock."""
+    return MagicMock()
+
+
+def _init_ctx(session=None):
+    set_execution_context(
+        user_id="user-1",
+        session=session,  # type: ignore[arg-type]
+        sandbox=None,
+    )
+
+
+class TestPreLaunchToolCall:
+    """Tests for pre_launch_tool_call and the queue-based parallel dispatch."""
+
+    @pytest.fixture(autouse=True)
+    def _init(self):
+        _init_ctx(session=_make_mock_session())
+
+    @pytest.mark.asyncio
+    async def test_unknown_tool_is_silently_ignored(self):
+        """pre_launch_tool_call does nothing for tools not in TOOL_REGISTRY."""
+        # Should not raise even if the tool name is completely unknown
+        await pre_launch_tool_call("nonexistent_tool", {})
+
+    @pytest.mark.asyncio
+    async def test_mcp_prefix_stripped_before_registry_lookup(self):
+        """mcp__copilot__run_block is looked up as 'run_block'."""
+        mock_tool = _make_mock_tool("run_block")
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("mcp__copilot__run_block", {"block_id": "b1"})
+
+        # The task was enqueued — mock_tool.execute should be called once
+        # (may not complete immediately but should start)
+        await asyncio.sleep(0)  # yield to event loop
+        mock_tool.execute.assert_awaited_once()
+
+    @pytest.mark.asyncio
+    async def test_bare_tool_name_without_prefix(self):
+        """Tool names without __ separator are looked up as-is."""
+        mock_tool = _make_mock_tool("run_block")
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+
+        await asyncio.sleep(0)
+        mock_tool.execute.assert_awaited_once()
+
+    @pytest.mark.asyncio
+    async def test_task_enqueued_fifo_for_same_tool(self):
+        """Two pre-launched calls for the same tool name are enqueued FIFO."""
+        results = []
+
+        async def slow_execute(*args, **kwargs):
+            results.append(len(results))
+            return StreamToolOutputAvailable(
+                toolCallId="id",
+                output=str(len(results) - 1),
+                toolName="t",
+                success=True,
+            )
+
+        mock_tool = _make_mock_tool("t")
+        mock_tool.execute = AsyncMock(side_effect=slow_execute)
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"t": mock_tool},
+        ):
+            await pre_launch_tool_call("t", {"n": 1})
+            await pre_launch_tool_call("t", {"n": 2})
+            await asyncio.sleep(0)
+
+        assert mock_tool.execute.await_count == 2
+
+    @pytest.mark.asyncio
+    async def test_file_ref_expansion_failure_skips_pre_launch(self):
+        """When @@agptfile: expansion fails, pre_launch_tool_call skips the task.
+
+        The handler should then fall back to direct execution (which will also
+        fail with a proper MCP error via _truncating's own expansion).
+        """
+        mock_tool = _make_mock_tool("run_block", output="should-not-execute")
+
+        with (
+            patch(
+                "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+                {"run_block": mock_tool},
+            ),
+            patch(
+                "backend.copilot.sdk.tool_adapter.expand_file_refs_in_args",
+                AsyncMock(side_effect=FileRefExpansionError("@@agptfile:missing.txt")),
+            ),
+        ):
+            # Should not raise — expansion failure is handled gracefully
+            await pre_launch_tool_call("run_block", {"text": "@@agptfile:missing.txt"})
+            await asyncio.sleep(0)
+
+        # No task was pre-launched — execute was not called
+        mock_tool.execute.assert_not_awaited()
+
+
+class TestCreateToolHandlerParallel:
+    """Tests for create_tool_handler using pre-launched tasks."""
+
+    @pytest.fixture(autouse=True)
+    def _init(self):
+        _init_ctx(session=_make_mock_session())
+
+    @pytest.mark.asyncio
+    async def test_handler_uses_prelaunched_task(self):
+        """Handler pops and awaits the pre-launched task rather than re-executing."""
+        mock_tool = _make_mock_tool("run_block", output="pre-launched result")
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+            await asyncio.sleep(0)  # let task start
+
+            handler = create_tool_handler(mock_tool)
+            result = await handler({"block_id": "b1"})
+
+        assert result["isError"] is False
+        text = result["content"][0]["text"]
+        assert "pre-launched result" in text
+        # Should only have been called once (the pre-launched task), not twice
+        mock_tool.execute.assert_awaited_once()
+
+    @pytest.mark.asyncio
+    async def test_handler_does_not_double_stash_for_prelaunched_task(self):
+        """Pre-launched task result must NOT be stashed by tool_handler directly.
+
+        The _truncating wrapper wraps tool_handler and handles stashing after
+        tool_handler returns.  If tool_handler also stashed, the output would be
+        appended twice to the FIFO queue and pop_pending_tool_output would return
+        a duplicate on the second call.
+
+        This test calls tool_handler directly (without _truncating) and asserts
+        that nothing was stashed — confirming stashing is deferred to _truncating.
+        """
+        mock_tool = _make_mock_tool("run_block", output="stash-me")
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+            await asyncio.sleep(0)
+
+            handler = create_tool_handler(mock_tool)
+            result = await handler({"block_id": "b1"})
+
+        assert result["isError"] is False
+        assert "stash-me" in result["content"][0]["text"]
+        # tool_handler must NOT stash — _truncating (which wraps handler) does it.
+        # Calling pop here (without going through _truncating) should return None.
+        not_stashed = pop_pending_tool_output("run_block")
+        assert not_stashed is None, (
+            "tool_handler must not stash directly — _truncating handles stashing "
+            "to prevent double-stash in the FIFO queue"
+        )
+
+    @pytest.mark.asyncio
+    async def test_handler_falls_back_when_queue_empty(self):
+        """When no pre-launched task exists, handler executes directly."""
+        mock_tool = _make_mock_tool("run_block", output="direct result")
+
+        # Don't call pre_launch_tool_call — queue is empty
+        handler = create_tool_handler(mock_tool)
+        result = await handler({"block_id": "b1"})
+
+        assert result["isError"] is False
+        text = result["content"][0]["text"]
+        assert "direct result" in text
+        mock_tool.execute.assert_awaited_once()
+
+    @pytest.mark.asyncio
+    async def test_handler_cancelled_error_propagates(self):
+        """CancelledError from a pre-launched task is re-raised to preserve cancellation semantics."""
+        mock_tool = _make_mock_tool("run_block")
+        mock_tool.execute = AsyncMock(side_effect=asyncio.CancelledError())
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+            await asyncio.sleep(0)
+
+            handler = create_tool_handler(mock_tool)
+            with pytest.raises(asyncio.CancelledError):
+                await handler({"block_id": "b1"})
+
+    @pytest.mark.asyncio
+    async def test_handler_exception_returns_mcp_error(self):
+        """Exception from a pre-launched task is caught and returned as MCP error."""
+        mock_tool = _make_mock_tool("run_block")
+        mock_tool.execute = AsyncMock(side_effect=RuntimeError("block exploded"))
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+            await asyncio.sleep(0)
+
+            handler = create_tool_handler(mock_tool)
+            result = await handler({"block_id": "b1"})
+
+        assert result["isError"] is True
+        assert "Failed to execute run_block" in result["content"][0]["text"]
+
+    @pytest.mark.asyncio
+    async def test_two_same_tool_calls_dispatched_in_order(self):
+        """Two pre-launched tasks for the same tool are consumed in FIFO order."""
+        call_order = []
+
+        async def execute_with_tag(*args, **kwargs):
+            tag = kwargs.get("block_id", "?")
+            call_order.append(tag)
+            return StreamToolOutputAvailable(
+                toolCallId="id", output=f"out-{tag}", toolName="run_block", success=True
+            )
+
+        mock_tool = _make_mock_tool("run_block")
+        mock_tool.execute = AsyncMock(side_effect=execute_with_tag)
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "first"})
+            await pre_launch_tool_call("run_block", {"block_id": "second"})
+            await asyncio.sleep(0)
+
+            handler = create_tool_handler(mock_tool)
+            r1 = await handler({"block_id": "first"})
+            r2 = await handler({"block_id": "second"})
+
+        assert "out-first" in r1["content"][0]["text"]
+        assert "out-second" in r2["content"][0]["text"]
+        assert call_order == [
+            "first",
+            "second",
+        ], f"Expected FIFO dispatch order but got {call_order}"
+
+    @pytest.mark.asyncio
+    async def test_arg_mismatch_falls_back_to_direct_execution(self):
+        """When pre-launched args differ from SDK args, handler cancels pre-launched
+        task and falls back to direct execution with the correct args."""
+        mock_tool = _make_mock_tool("run_block", output="direct-result")
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            # Pre-launch with args {"block_id": "wrong"}
+            await pre_launch_tool_call("run_block", {"block_id": "wrong"})
+            await asyncio.sleep(0)
+
+            # SDK dispatches with different args
+            handler = create_tool_handler(mock_tool)
+            result = await handler({"block_id": "correct"})
+
+        assert result["isError"] is False
+        # The tool was called twice: once by pre-launch (wrong args), once by
+        # direct fallback (correct args). The result should come from the
+        # direct execution path.
+        assert mock_tool.execute.await_count == 2
+
+    @pytest.mark.asyncio
+    async def test_no_session_falls_back_gracefully(self):
+        """When session is None and no pre-launched task, handler returns MCP error."""
+        mock_tool = _make_mock_tool("run_block")
+        # session=None means get_execution_context returns (user_id, None)
+        set_execution_context(user_id="u", session=None, sandbox=None)  # type: ignore[arg-type]
+
+        handler = create_tool_handler(mock_tool)
+        result = await handler({"block_id": "b1"})
+
+        assert result["isError"] is True
+        assert "session" in result["content"][0]["text"].lower()
+
+
+# ---------------------------------------------------------------------------
+# cancel_pending_tool_tasks
+# ---------------------------------------------------------------------------
+
+
+class TestCancelPendingToolTasks:
+    """Tests for cancel_pending_tool_tasks — the stream-abort cleanup helper."""
+
+    @pytest.fixture(autouse=True)
+    def _init(self):
+        _init_ctx(session=_make_mock_session())
+
+    @pytest.mark.asyncio
+    async def test_cancels_queued_tasks(self):
+        """Queued tasks are cancelled and the queue is cleared."""
+        ran = False
+
+        async def never_run(*_args, **_kwargs):
+            nonlocal ran
+            await asyncio.sleep(10)  # long enough to still be pending
+            ran = True
+
+        mock_tool = _make_mock_tool("run_block")
+        mock_tool.execute = AsyncMock(side_effect=never_run)
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+            await asyncio.sleep(0)  # let task start
+            await cancel_pending_tool_tasks()
+            await asyncio.sleep(0)  # let cancellation propagate
+
+        assert not ran, "Task should have been cancelled before completing"
+
+    @pytest.mark.asyncio
+    async def test_noop_when_no_tasks_queued(self):
+        """cancel_pending_tool_tasks does not raise when queues are empty."""
+        await cancel_pending_tool_tasks()  # should not raise
+
+    @pytest.mark.asyncio
+    async def test_handler_does_not_find_cancelled_task(self):
+        """After cancel, tool_handler falls back to direct execution."""
+        mock_tool = _make_mock_tool("run_block", output="direct-fallback")
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"run_block": mock_tool},
+        ):
+            await pre_launch_tool_call("run_block", {"block_id": "b1"})
+            await asyncio.sleep(0)
+            await cancel_pending_tool_tasks()
+
+            # Queue is now empty — handler should execute directly
+            handler = create_tool_handler(mock_tool)
+            result = await handler({"block_id": "b1"})
+
+        assert result["isError"] is False
+        assert "direct-fallback" in result["content"][0]["text"]
+
+
+# ---------------------------------------------------------------------------
+# Concurrent / parallel pre-launch scenarios
+# ---------------------------------------------------------------------------
+
+
+class TestAllParallelToolsPrelaunchedIndependently:
+    """Simulate SDK sending N separate AssistantMessages for the same tool concurrently."""
+
+    @pytest.fixture(autouse=True)
+    def _init(self):
+        _init_ctx(session=_make_mock_session())
+
+    @pytest.mark.asyncio
+    async def test_all_parallel_tools_prelaunched_independently(self):
+        """5 pre-launches for the same tool all enqueue independently and run concurrently.
+
+        Each task sleeps for PER_TASK_S seconds. If they ran sequentially the total
+        wall time would be ~5*PER_TASK_S. Running concurrently it should finish in
+        roughly PER_TASK_S (plus scheduling overhead).
+        """
+        PER_TASK_S = 0.05
+        N = 5
+        started: list[int] = []
+        finished: list[int] = []
+
+        async def slow_execute(*args, **kwargs):
+            idx = len(started)
+            started.append(idx)
+            await asyncio.sleep(PER_TASK_S)
+            finished.append(idx)
+            return StreamToolOutputAvailable(
+                toolCallId=f"id-{idx}",
+                output=f"result-{idx}",
+                toolName="bash_exec",
+                success=True,
+            )
+
+        mock_tool = _make_mock_tool("bash_exec")
+        mock_tool.execute = AsyncMock(side_effect=slow_execute)
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"bash_exec": mock_tool},
+        ):
+            for i in range(N):
+                await pre_launch_tool_call("bash_exec", {"cmd": f"echo {i}"})
+
+            # Measure only the concurrent execution window, not pre-launch overhead.
+            # Starting the timer here avoids false failures on slow CI runners where
+            # the pre_launch_tool_call setup takes longer than the concurrent sleep.
+            t0 = asyncio.get_running_loop().time()
+            await asyncio.sleep(PER_TASK_S * 2)
+            elapsed = asyncio.get_running_loop().time() - t0
+
+        assert mock_tool.execute.await_count == N
+        assert len(finished) == N
+        # Concurrency is proven by len(finished) == N after only 2 * PER_TASK_S;
+        # sequential runs need ~N * PER_TASK_S (0.25s); elapsed is a sanity check.
+        assert elapsed < N * PER_TASK_S, (
+            f"Expected concurrent execution (<{N * PER_TASK_S:.2f}s) "
+            f"but sleep window took {elapsed:.2f}s"
+        )
+
+
+class TestHandlerReturnsResultFromCorrectPrelaunchedTask:
+    """Pop pre-launched tasks in order and verify each returns its own result."""
+
+    @pytest.fixture(autouse=True)
+    def _init(self):
+        _init_ctx(session=_make_mock_session())
+
+    @pytest.mark.asyncio
+    async def test_handler_returns_result_from_correct_prelaunched_task(self):
+        """Two pre-launches for the same tool: first handler gets first result, second gets second."""
+
+        async def execute_with_cmd(*args, **kwargs):
+            cmd = kwargs.get("cmd", "?")
+            return StreamToolOutputAvailable(
+                toolCallId="id",
+                output=f"output-for-{cmd}",
+                toolName="bash_exec",
+                success=True,
+            )
+
+        mock_tool = _make_mock_tool("bash_exec")
+        mock_tool.execute = AsyncMock(side_effect=execute_with_cmd)
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"bash_exec": mock_tool},
+        ):
+            await pre_launch_tool_call("bash_exec", {"cmd": "alpha"})
+            await pre_launch_tool_call("bash_exec", {"cmd": "beta"})
+            await asyncio.sleep(0)  # let both tasks start
+
+            handler = create_tool_handler(mock_tool)
+            r1 = await handler({"cmd": "alpha"})
+            r2 = await handler({"cmd": "beta"})
+
+        text1 = r1["content"][0]["text"]
+        text2 = r2["content"][0]["text"]
+        assert "output-for-alpha" in text1, f"Expected alpha result, got: {text1}"
+        assert "output-for-beta" in text2, f"Expected beta result, got: {text2}"
+        assert mock_tool.execute.await_count == 2
+
+
+class TestFiveConcurrentPrelaunchAllComplete:
+    """Pre-launch 5 tasks; consume all 5 via handlers; assert all succeed."""
+
+    @pytest.fixture(autouse=True)
+    def _init(self):
+        _init_ctx(session=_make_mock_session())
+
+    @pytest.mark.asyncio
+    async def test_five_concurrent_prelaunch_all_complete(self):
+        """All 5 pre-launched tasks complete and return successful results."""
+        N = 5
+        call_count = 0
+
+        async def counting_execute(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            n = call_count
+            return StreamToolOutputAvailable(
+                toolCallId=f"id-{n}",
+                output=f"done-{n}",
+                toolName="bash_exec",
+                success=True,
+            )
+
+        mock_tool = _make_mock_tool("bash_exec")
+        mock_tool.execute = AsyncMock(side_effect=counting_execute)
+
+        with patch(
+            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
+            {"bash_exec": mock_tool},
+        ):
+            for i in range(N):
+                await pre_launch_tool_call("bash_exec", {"cmd": f"task-{i}"})
+
+            await asyncio.sleep(0)  # let all tasks start
+
+            handler = create_tool_handler(mock_tool)
+            results = []
+            for i in range(N):
+                results.append(await handler({"cmd": f"task-{i}"}))
+
+        assert (
+            mock_tool.execute.await_count == N
+        ), f"Expected {N} execute calls, got {mock_tool.execute.await_count}"
+        for i, result in enumerate(results):
+            assert result["isError"] is False, f"Result {i} should not be an error"
+            text = result["content"][0]["text"]
+            assert "done-" in text, f"Result {i} missing expected output: {text}"
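The pre-launch tests above imply a contract:

- Calls are queued per tool name in FIFO order.
- An MCP prefix such as `mcp__copilot__` is stripped before lookup.
- The handler pops the oldest task and awaits it only if its args match the SDK's args. Otherwise it cancels the task and executes directly.
- An empty queue also falls back to direct execution.

Below is a self-contained sketch of that contract, assuming the queue holds `(args, task)` pairs. `PreLaunchQueue` and its methods are hypothetical; the real `tool_adapter` code is not shown in this diff.

```python
import asyncio
from collections import defaultdict, deque
from typing import Any, Awaitable, Callable

Executor = Callable[[dict[str, Any]], Awaitable[str]]


class PreLaunchQueue:
    """Hypothetical per-tool FIFO of speculatively started tool calls."""

    def __init__(self) -> None:
        self._queues: dict[str, deque] = defaultdict(deque)

    def pre_launch(self, tool_name: str, args: dict[str, Any], run: Executor) -> None:
        # Strip an MCP-style prefix ("mcp__copilot__run_block" -> "run_block").
        bare = tool_name.rsplit("__", 1)[-1]
        task = asyncio.create_task(run(args))
        self._queues[bare].append((args, task))

    async def handle(self, tool_name: str, args: dict[str, Any], run: Executor) -> str:
        queue = self._queues.get(tool_name)
        if queue:
            launched_args, task = queue.popleft()
            if launched_args == args:
                return await task  # reuse the pre-launched result
            task.cancel()  # args diverged: discard the speculative run
        return await run(args)  # no usable pre-launch: execute directly

    def cancel_all(self) -> None:
        # Stream-abort cleanup: cancel and drop every queued task.
        for queue in self._queues.values():
            while queue:
                _, task = queue.popleft()
                task.cancel()
```

Comparing args before awaiting is what makes the FIFO safe. If the SDK dispatches calls in a different order or with different args than were pre-launched, the mismatch costs one wasted run instead of returning the wrong result.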
--- a/autogpt_platform/backend/backend/copilot/tools/add_understanding.py
+++ b/autogpt_platform/backend/backend/copilot/tools/add_understanding.py
@@ -22,13 +22,12 @@ class AddUnderstandingTool(BaseTool):

    @property
    def description(self) -> str:
-        return """Capture and store information about the user's business context,
-workflows, pain points, and automation goals. Call this tool whenever the user
-shares information about their business. Each call incrementally adds to the
-existing understanding - you don't need to provide all fields at once.
-
-Use this to build a comprehensive profile that helps recommend better agents
-and automations for the user's specific needs."""
+        return (
+            "Store user's business context, workflows, pain points, and automation goals. "
+            "Call whenever the user shares business info. Each call incrementally merges "
+            "with existing data — provide only the fields you have. "
+            "Builds a profile that helps recommend better agents for the user's needs."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
@@ -20,9 +20,9 @@ SSRF protection:

 Requires:
  npm install -g agent-browser
-  agent-browser install   (downloads Chromium, one-time — skipped in Docker
-                           where system chromium is pre-installed and
-                           AGENT_BROWSER_EXECUTABLE_PATH is set)
+  In Docker: system chromium package with AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
+             (set automatically — no `agent-browser install` needed).
+  Locally: run `agent-browser install` to download Chromium.
 """

 import asyncio
@@ -410,18 +410,11 @@ class BrowserNavigateTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Navigate to a URL using a real browser. Returns an accessibility "
-            "tree snapshot listing the page's interactive elements with @ref IDs "
-            "(e.g. @e3) that can be used with browser_act. "
-            "Session persists — cookies and login state carry over between calls. "
-            "Use this (with browser_act) for multi-step interaction: login flows, "
-            "form filling, button clicks, or anything requiring page interaction. "
-            "For plain static pages, prefer web_fetch — no browser overhead. "
-            "For authenticated pages: navigate to the login page first, use browser_act "
-            "to fill credentials and submit, then navigate to the target page. "
-            "Note: for slow SPAs, the returned snapshot may reflect a partially-loaded "
-            "state. If elements seem missing, use browser_act with action='wait' and a "
-            "CSS selector or millisecond delay, then take a browser_screenshot to verify."
+            "Navigate to a URL in a real browser. Returns accessibility tree with @ref IDs "
+            "for browser_act. Session persists (cookies/auth carry over). "
+            "For static pages, prefer web_fetch. "
+            "For SPAs, elements may load late — use browser_act with wait + browser_screenshot to verify. "
+            "For auth: navigate to login, fill creds and submit with browser_act, then navigate to target."
        )

    @property
@@ -431,13 +424,13 @@ class BrowserNavigateTool(BaseTool):
            "properties": {
                "url": {
                    "type": "string",
-                    "description": "The HTTP/HTTPS URL to navigate to.",
+                    "description": "HTTP/HTTPS URL to navigate to.",
                },
                "wait_for": {
                    "type": "string",
                    "enum": ["networkidle", "load", "domcontentloaded"],
                    "default": "networkidle",
-                    "description": "When to consider navigation complete. Use 'networkidle' for SPAs (default).",
+                    "description": "Navigation completion strategy (default: networkidle).",
                },
            },
            "required": ["url"],
@@ -556,14 +549,12 @@ class BrowserActTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Interact with the current browser page. Use @ref IDs from the "
-            "snapshot (e.g. '@e3') to target elements. Returns an updated snapshot. "
-            "Supported actions: click, dblclick, fill, type, scroll, hover, press, "
+            "Interact with the current browser page using @ref IDs from the snapshot. "
+            "Actions: click, dblclick, fill, type, scroll, hover, press, "
            "check, uncheck, select, wait, back, forward, reload. "
-            "fill clears the field before typing; type appends without clearing. "
-            "wait accepts a CSS selector (waits for element) or milliseconds string (e.g. '1000'). "
-            "Example login flow: fill @e1 with email → fill @e2 with password → "
-            "click @e3 (submit) → browser_navigate to the target page."
+            "fill clears field first; type appends. "
+            "wait accepts CSS selector or milliseconds (e.g. '1000'). "
+            "Returns updated snapshot."
        )

    @property
@@ -589,30 +580,21 @@ class BrowserActTool(BaseTool):
                        "forward",
                        "reload",
                    ],
-                    "description": "The action to perform.",
+                    "description": "Action to perform.",
                },
                "target": {
                    "type": "string",
-                    "description": (
-                        "Element to target. Use @ref from snapshot (e.g. '@e3'), "
-                        "a CSS selector, or a text description. "
-                        "Required for: click, dblclick, fill, type, hover, check, uncheck, select. "
-                        "For wait: a CSS selector to wait for, or milliseconds as a string (e.g. '1000')."
-                    ),
+                    "description": "@ref ID (e.g. '@e3'), CSS selector, or text. Required for: click, dblclick, fill, type, hover, check, uncheck, select. For wait: CSS selector or milliseconds string (e.g. '1000').",
                },
                "value": {
                    "type": "string",
-                    "description": (
-                        "For fill/type: the text to enter. "
-                        "For press: key name (e.g. 'Enter', 'Tab', 'Control+a'). "
-                        "For select: the option value to select."
-                    ),
+                    "description": "Text for fill/type, key for press (e.g. 'Enter'), option for select.",
                },
                "direction": {
                    "type": "string",
                    "enum": ["up", "down", "left", "right"],
                    "default": "down",
-                    "description": "For scroll: direction to scroll.",
+                    "description": "Scroll direction (default: down).",
                },
            },
            "required": ["action"],
@@ -759,12 +741,10 @@ class BrowserScreenshotTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Take a screenshot of the current browser page and save it to the workspace. "
-            "IMPORTANT: After calling this tool, immediately call read_workspace_file "
-            "with the returned file_id to display the image inline to the user — "
-            "the screenshot is not visible until you do this. "
-            "With annotate=true (default), @ref labels are overlaid on interactive "
-            "elements, making it easy to see which @ref ID maps to which element on screen."
+            "Screenshot the current browser page and save to workspace. "
+            "annotate=true overlays @ref labels on elements. "
+            "IMPORTANT: After calling, you MUST immediately call read_workspace_file with the "
+            "returned file_id to display the image inline."
        )

    @property
@@ -775,12 +755,12 @@ class BrowserScreenshotTool(BaseTool):
                "annotate": {
                    "type": "boolean",
                    "default": True,
-                    "description": "Overlay @ref labels on interactive elements (default: true).",
+                    "description": "Overlay @ref labels (default: true).",
                },
                "filename": {
                    "type": "string",
                    "default": "screenshot.png",
-                    "description": "Filename to save in the workspace.",
+                    "description": "Workspace filename (default: screenshot.png).",
                },
            },
        }
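The `target` and `wait` rules in the `browser_act` schema above can be sketched as a small argument check. This is a hypothetical helper, not the tool's actual code. The `@e<digits>` ref pattern is inferred only from the schema's `'@e3'` example, and the `missing_action` / `missing_target` error codes are borrowed from the integration tests later in this diff.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Assumed ref format, inferred from the schema's '@e3' example.
_REF_RE = re.compile(r"^@e\d+$")

# Actions the schema lists as requiring a target.
TARGET_REQUIRED = {
    "click", "dblclick", "fill", "type", "hover", "check", "uncheck", "select",
}


@dataclass(frozen=True)
class WaitTarget:
    kind: str  # "ms" or "selector"
    value: Union[int, str]


def classify_target(target: str) -> str:
    """Return 'ref' for snapshot refs like '@e3', else 'selector_or_text'."""
    return "ref" if _REF_RE.match(target.strip()) else "selector_or_text"


def parse_wait_target(target: str) -> WaitTarget:
    """For wait: an all-digit string is milliseconds, anything else a CSS selector."""
    t = target.strip()
    if not t:
        raise ValueError("wait needs a CSS selector or a millisecond count")
    if t.isascii() and t.isdigit():
        return WaitTarget("ms", int(t))
    return WaitTarget("selector", t)


def check_act_args(action: str, target: str = "") -> Optional[str]:
    """Return an error code for missing arguments, or None if the call looks valid."""
    if not action:
        return "missing_action"
    if action in TARGET_REQUIRED and not target.strip():
        return "missing_target"
    return None
```

Distinguishing a CSS selector from a free-text description is deliberately left to the browser layer here; the sketch only separates refs from everything else.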
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser_integration_test.py
@@ -0,0 +1,351 @@
+"""Integration tests for agent-browser + system chromium.
+
+These tests actually invoke the agent-browser binary via subprocess and require:
+  - agent-browser installed (npm install -g agent-browser)
+  - AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium (set in Docker)
+
+Run with:
+    poetry run test
+
+Or to run only this file:
+    poetry run pytest backend/copilot/tools/agent_browser_integration_test.py -v -p no:autogpt_platform
+
+Skipped automatically when agent-browser binary is not found.
+Tests that hit external sites are marked ``integration`` and skipped by default
+in CI (use ``-m integration`` to include them).
+
+Two test tiers:
+  - CLI tests: call agent-browser subprocess directly (no backend imports needed)
+  - Tool class tests: call BrowserNavigateTool/BrowserActTool._execute() directly
+    with user_id=None (skips workspace/DB interactions — no Postgres/RabbitMQ needed)
+"""
+
+import concurrent.futures
+import os
+import shutil
+import subprocess
+import tempfile
+from datetime import datetime, timezone
+from urllib.parse import urlparse
+
+import pytest
+
+from backend.copilot.model import ChatSession
+from backend.copilot.tools.agent_browser import BrowserActTool, BrowserNavigateTool
+from backend.copilot.tools.models import (
+    BrowserActResponse,
+    BrowserNavigateResponse,
+    ErrorResponse,
+)
+
+pytestmark = pytest.mark.skipif(
+    shutil.which("agent-browser") is None,
+    reason="agent-browser binary not found",
+)
+
+_SESSION = "integration-test-session"
+
+
+def _agent_browser(
+    *args: str, session: str = _SESSION, timeout: int = 30
+) -> tuple[int, str, str]:
+    """Run agent-browser for the given session, return (rc, stdout, stderr)."""
+    result = subprocess.run(
+        ["agent-browser", "--session", session, "--session-name", session, *args],
+        capture_output=True,
+        text=True,
+        timeout=timeout,
+    )
+    return result.returncode, result.stdout, result.stderr
+
+
+def _close_session(session: str, timeout: int = 5) -> None:
+    """Best-effort close for a browser session; never raises on failure."""
+    try:
+        subprocess.run(
+            ["agent-browser", "--session", session, "--session-name", session, "close"],
+            capture_output=True,
+            timeout=timeout,
+        )
+    except (subprocess.TimeoutExpired, OSError):
+        pass
+
+
+@pytest.fixture(autouse=True)
+def _teardown():
+    """Close the shared test session after each test (best-effort)."""
+    yield
+    _close_session(_SESSION)
+
+
+# ---------------------------------------------------------------------------
+# Tests
+# ---------------------------------------------------------------------------
+
+
+def test_chromium_executable_env_is_set():
+    """AGENT_BROWSER_EXECUTABLE_PATH must be set and point to an executable binary."""
+    exe = os.environ.get("AGENT_BROWSER_EXECUTABLE_PATH", "")
+    assert exe, "AGENT_BROWSER_EXECUTABLE_PATH is not set"
+    assert os.path.isfile(exe), f"Chromium binary not found at {exe}"
+    assert os.access(exe, os.X_OK), f"Chromium binary at {exe} is not executable"
+
+
+@pytest.mark.integration
+def test_navigate_returns_success():
+    """agent-browser can open a public URL using system chromium."""
+    rc, _, stderr = _agent_browser("open", "https://example.com")
+    assert rc == 0, f"open failed (rc={rc}): {stderr}"
+
+
+@pytest.mark.integration
+def test_get_title_after_navigate():
+    """get title returns the page title after navigation."""
+    rc, _, _ = _agent_browser("open", "https://example.com")
+    assert rc == 0
+
+    rc, stdout, stderr = _agent_browser("get", "title", timeout=10)
+    assert rc == 0, f"get title failed: {stderr}"
+    assert "example" in stdout.lower()
+
+
+@pytest.mark.integration
+def test_get_url_after_navigate():
+    """get url returns the navigated URL."""
+    rc, _, _ = _agent_browser("open", "https://example.com")
+    assert rc == 0
+
+    rc, stdout, stderr = _agent_browser("get", "url", timeout=10)
+    assert rc == 0, f"get url failed: {stderr}"
+    assert urlparse(stdout.strip()).netloc == "example.com"
+
+
+@pytest.mark.integration
+def test_snapshot_returns_interactive_elements():
+    """snapshot -i -c lists interactive elements on the page."""
+    rc, _, _ = _agent_browser("open", "https://example.com")
+    assert rc == 0
+
+    rc, stdout, stderr = _agent_browser("snapshot", "-i", "-c", timeout=15)
+    assert rc == 0, f"snapshot failed: {stderr}"
+    assert len(stdout.strip()) > 0, "snapshot returned empty output"
+
+
+@pytest.mark.integration
+def test_screenshot_produces_valid_png():
+    """screenshot saves a non-empty, valid PNG file."""
+    rc, _, _ = _agent_browser("open", "https://example.com")
+    assert rc == 0
+
+    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
+        tmp = f.name
+    try:
+        rc, _, stderr = _agent_browser("screenshot", tmp, timeout=15)
+        assert rc == 0, f"screenshot failed: {stderr}"
+        size = os.path.getsize(tmp)
+        assert size > 1000, f"PNG too small ({size} bytes) — likely blank or corrupt"
+        with open(tmp, "rb") as f:
+            assert f.read(4) == b"\x89PNG", "Output is not a valid PNG"
+    finally:
+        os.unlink(tmp)
+
+
+@pytest.mark.integration
+def test_scroll_down():
+    """scroll down succeeds without error."""
+    rc, _, _ = _agent_browser("open", "https://example.com")
+    assert rc == 0
+
+    rc, _, stderr = _agent_browser("scroll", "down", timeout=10)
+    assert rc == 0, f"scroll failed: {stderr}"
+
+
+@pytest.mark.integration
+def test_fill_form_field():
+    """fill writes text into an input field."""
+    rc, _, _ = _agent_browser("open", "https://httpbin.org/forms/post")
+    assert rc == 0
+
+    rc, _, stderr = _agent_browser(
+        "fill", "input[name=custname]", "IntegrationTestUser", timeout=10
+    )
+    assert rc == 0, f"fill failed: {stderr}"
+
+
+@pytest.mark.integration
+def test_concurrent_independent_sessions():
+    """Two independent sessions can navigate in parallel without interference."""
+    session_a = "integration-concurrent-a"
+    session_b = "integration-concurrent-b"
+
+    try:
+        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
+            fut_a = pool.submit(
+                _agent_browser, "open", "https://example.com", session=session_a
+            )
+            fut_b = pool.submit(
+                _agent_browser, "open", "https://httpbin.org/html", session=session_b
+            )
+            rc_a, _, err_a = fut_a.result(timeout=40)
+            rc_b, _, err_b = fut_b.result(timeout=40)
+        assert rc_a == 0, f"session_a open failed: {err_a}"
+        assert rc_b == 0, f"session_b open failed: {err_b}"
+
+        rc_ua, url_a, err_ua = _agent_browser(
+            "get", "url", session=session_a, timeout=10
+        )
+        rc_ub, url_b, err_ub = _agent_browser(
+            "get", "url", session=session_b, timeout=10
+        )
+        assert rc_ua == 0, f"session_a get url failed: {err_ua}"
+        assert rc_ub == 0, f"session_b get url failed: {err_ub}"
+        assert urlparse(url_a.strip()).netloc == "example.com"
+        assert urlparse(url_b.strip()).netloc == "httpbin.org"
+    finally:
+        _close_session(session_a)
+        _close_session(session_b)
+
+
+@pytest.mark.integration
+def test_close_session():
+    """close shuts down the browser daemon cleanly."""
+    rc, _, _ = _agent_browser("open", "https://example.com")
+    assert rc == 0
+
+    rc, _, stderr = _agent_browser("close", timeout=10)
+    assert rc == 0, f"close failed: {stderr}"
+
+
+# ---------------------------------------------------------------------------
+# Python tool class integration tests
+#
+# These tests exercise the actual BrowserNavigateTool / BrowserActTool Python
+# classes (not just the CLI binary) to verify the full call path — URL
+# validation, subprocess dispatch, response parsing — works with system
+# chromium.  user_id=None skips workspace/DB interactions so no Postgres or
+# RabbitMQ is needed.
+# ---------------------------------------------------------------------------
+
+_TOOL_SESSION_ID = "integration-tool-test-session"
+_TEST_SESSION = ChatSession(
+    session_id=_TOOL_SESSION_ID,
+    user_id="test-user",
+    messages=[],
+    usage=[],
+    started_at=datetime.now(timezone.utc),
+    updated_at=datetime.now(timezone.utc),
+)
+
+
+@pytest.fixture(autouse=False)
+def _close_tool_session():
+    """Tear down the tool-test browser session after each tool test."""
+    yield
+    _close_session(_TOOL_SESSION_ID)
+
+
+@pytest.mark.integration
+@pytest.mark.asyncio
+async def test_tool_navigate_returns_response(_close_tool_session):
+    """BrowserNavigateTool._execute returns a BrowserNavigateResponse with real content."""
+    tool = BrowserNavigateTool()
+    resp = await tool._execute(
+        user_id=None, session=_TEST_SESSION, url="https://example.com"
+    )
+    assert isinstance(
+        resp, BrowserNavigateResponse
+    ), f"Expected BrowserNavigateResponse, got: {resp}"
+    assert urlparse(resp.url).netloc == "example.com"
+    assert resp.title, "Expected non-empty page title"
+    assert resp.snapshot, "Expected non-empty accessibility snapshot"
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize(
+    "ssrf_url",
+    [
+        "http://169.254.169.254/",  # AWS/GCP/Azure metadata endpoint
+        "http://127.0.0.1/",  # IPv4 loopback
+        "http://10.0.0.1/",  # RFC-1918 private range
+        "http://[::1]/",  # IPv6 loopback
+        "http://0.0.0.0/",  # Wildcard / INADDR_ANY
+    ],
+)
+async def test_tool_navigate_blocked_url(ssrf_url: str, _close_tool_session):
+    """BrowserNavigateTool._execute rejects internal/private URLs (SSRF guard)."""
+    tool = BrowserNavigateTool()
+    resp = await tool._execute(user_id=None, session=_TEST_SESSION, url=ssrf_url)
+    assert isinstance(
+        resp, ErrorResponse
+    ), f"Expected ErrorResponse for SSRF URL {ssrf_url!r}, got: {resp}"
+    assert resp.error == "blocked_url"
+
+
+@pytest.mark.asyncio
+async def test_tool_navigate_missing_url(_close_tool_session):
+    """BrowserNavigateTool._execute returns an error when url is empty."""
+    tool = BrowserNavigateTool()
+    resp = await tool._execute(user_id=None, session=_TEST_SESSION, url="")
+    assert isinstance(resp, ErrorResponse)
+    assert resp.error == "missing_url"
+
+
+@pytest.mark.integration
+@pytest.mark.asyncio
+async def test_tool_act_scroll(_close_tool_session):
+    """BrowserActTool._execute can scroll after a navigate."""
+    nav = BrowserNavigateTool()
+    nav_resp = await nav._execute(
+        user_id=None, session=_TEST_SESSION, url="https://example.com"
+    )
+    assert isinstance(nav_resp, BrowserNavigateResponse)
+
+    act = BrowserActTool()
+    resp = await act._execute(
+        user_id=None, session=_TEST_SESSION, action="scroll", direction="down"
+    )
+    assert isinstance(
+        resp, BrowserActResponse
+    ), f"Expected BrowserActResponse, got: {resp}"
+    assert resp.action == "scroll"
+
+
+@pytest.mark.integration
+@pytest.mark.asyncio
+async def test_tool_act_fill_and_click(_close_tool_session):
+    """BrowserActTool._execute can fill a form field (the click step is not exercised)."""
+    nav = BrowserNavigateTool()
+    nav_resp = await nav._execute(
+        user_id=None, session=_TEST_SESSION, url="https://httpbin.org/forms/post"
+    )
+    assert isinstance(nav_resp, BrowserNavigateResponse)
+
+    act = BrowserActTool()
+    resp = await act._execute(
+        user_id=None,
+        session=_TEST_SESSION,
+        action="fill",
+        target="input[name=custname]",
+        value="ToolIntegrationTest",
+    )
+    assert isinstance(resp, BrowserActResponse), f"fill failed: {resp}"
+
+
+@pytest.mark.asyncio
+async def test_tool_act_missing_action(_close_tool_session):
+    """BrowserActTool._execute returns an error when action is missing."""
+    act = BrowserActTool()
+    resp = await act._execute(user_id=None, session=_TEST_SESSION, action="")
+    assert isinstance(resp, ErrorResponse)
+    assert resp.error == "missing_action"
+
+
+@pytest.mark.asyncio
+async def test_tool_act_missing_target(_close_tool_session):
+    """BrowserActTool._execute returns an error when click target is missing."""
+    act = BrowserActTool()
+    resp = await act._execute(
+        user_id=None, session=_TEST_SESSION, action="click", target=""
+    )
+    assert isinstance(resp, ErrorResponse)
+    assert resp.error == "missing_target"
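The parametrized `test_tool_navigate_blocked_url` cases describe the SSRF guard only by its inputs. A minimal sketch of a check that rejects all five, using only the standard `ipaddress` and `socket` modules, might look like the following. `is_blocked_url` is a hypothetical name; the tool's real guard is not part of this diff and may differ. Resolving a hostname here and letting the browser resolve it again leaves a DNS-rebinding window, so a production guard would also need to pin or re-check the address actually connected to.

```python
import ipaddress
import socket
from typing import List, Union
from urllib.parse import urlparse

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def _is_disallowed(ip: IPAddress) -> bool:
    """True for loopback, private, link-local (cloud metadata), wildcard, etc."""
    # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1 so the IPv4 rules apply.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_unspecified
        or ip.is_reserved
        or ip.is_multicast
    )


def is_blocked_url(url: str) -> bool:
    """Hypothetical SSRF guard: block non-HTTP(S) URLs and internal addresses."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host:
        return True
    try:
        candidates: List[IPAddress] = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal: resolve it and check every returned address.
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return True  # unresolvable hosts are rejected
        # sockaddr[0] is the address; strip any IPv6 zone suffix like "%eth0".
        candidates = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    return any(_is_disallowed(ip) for ip in candidates)
```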
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
@@ -7,7 +7,7 @@ from typing import Any
 from .helpers import (
    AGENT_EXECUTOR_BLOCK_ID,
    MCP_TOOL_BLOCK_ID,
-    SMART_DECISION_MAKER_BLOCK_ID,
+    TOOL_ORCHESTRATOR_BLOCK_ID,
    AgentDict,
    are_types_compatible,
    generate_uuid,
@@ -31,7 +31,7 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
 _GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
 _TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"

-# Defaults applied to SmartDecisionMakerBlock nodes by the fixer.
+# Defaults applied to OrchestratorBlock nodes by the fixer.
 _SDM_DEFAULTS: dict[str, int | bool] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
@@ -1639,8 +1639,8 @@ class AgentFixer:

        return agent

-    def fix_smart_decision_maker_blocks(self, agent: AgentDict) -> AgentDict:
-        """Fix SmartDecisionMakerBlock nodes to ensure agent-mode defaults.
+    def fix_orchestrator_blocks(self, agent: AgentDict) -> AgentDict:
+        """Fix OrchestratorBlock nodes to ensure agent-mode defaults.

        Ensures:
        1. ``agent_mode_max_iterations`` defaults to ``10`` (bounded agent mode)
@@ -1657,7 +1657,7 @@ class AgentFixer:
        nodes = agent.get("nodes", [])

        for node in nodes:
-            if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
+            if node.get("block_id") != TOOL_ORCHESTRATOR_BLOCK_ID:
                continue

            node_id = node.get("id", "unknown")
@@ -1670,7 +1670,7 @@ class AgentFixer:
                if field not in input_default or input_default[field] is None:
                    input_default[field] = default_value
                    self.add_fix_log(
-                        f"SmartDecisionMakerBlock {node_id}: "
+                        f"OrchestratorBlock {node_id}: "
                        f"Set {field}={default_value!r}"
                    )

@@ -1763,8 +1763,8 @@ class AgentFixer:
        # Apply fixes for MCPToolBlock nodes
        agent = self.fix_mcp_tool_blocks(agent)

-        # Apply fixes for SmartDecisionMakerBlock nodes (agent-mode defaults)
-        agent = self.fix_smart_decision_maker_blocks(agent)
+        # Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
+        agent = self.fix_orchestrator_blocks(agent)

        # Apply fixes for AgentExecutorBlock nodes (sub-agents)
        if library_agents:
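The defaulting rule in `fix_orchestrator_blocks` (fill a field when it is absent or `None`, and log each change) can be sketched as a standalone function. This is an illustration, not the fixer itself: `apply_orchestrator_defaults` is a hypothetical name, `DEFAULTS` holds only the two `_SDM_DEFAULTS` entries visible in this diff, and creating a missing `input_default` dict is an assumption about node shape.

```python
import copy
from typing import Any, Dict, List, Tuple

# Copied from helpers.TOOL_ORCHESTRATOR_BLOCK_ID in this diff.
TOOL_ORCHESTRATOR_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"

# Only the entries visible in the diff; the real _SDM_DEFAULTS may contain more.
DEFAULTS: Dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
}


def apply_orchestrator_defaults(
    agent: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of agent with orchestrator defaults filled in, plus a fix log."""
    agent = copy.deepcopy(agent)  # leave the caller's dict untouched
    log: List[str] = []
    for node in agent.get("nodes", []):
        if node.get("block_id") != TOOL_ORCHESTRATOR_BLOCK_ID:
            continue
        node_id = node.get("id", "unknown")
        input_default = node.setdefault("input_default", {})  # assumed shape
        for field, default_value in DEFAULTS.items():
            # Same condition as the fixer: missing or explicitly None.
            if field not in input_default or input_default[field] is None:
                input_default[field] = default_value
                log.append(f"OrchestratorBlock {node_id}: Set {field}={default_value!r}")
    return agent, log
```

Unlike the fixer, which mutates the agent in place, this sketch returns a copy; an explicit `False` is kept because only missing or `None` values are replaced.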
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/helpers.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/helpers.py
@@ -12,7 +12,7 @@ __all__ = [
    "AGENT_OUTPUT_BLOCK_ID",
    "AgentDict",
    "MCP_TOOL_BLOCK_ID",
-    "SMART_DECISION_MAKER_BLOCK_ID",
+    "TOOL_ORCHESTRATOR_BLOCK_ID",
    "UUID_REGEX",
    "are_types_compatible",
    "generate_uuid",
@@ -34,7 +34,7 @@ UUID_REGEX = re.compile(r"^" + UUID_RE_STR + r"$")

 AGENT_EXECUTOR_BLOCK_ID = "e189baac-8c20-45a1-94a7-55177ea42565"
 MCP_TOOL_BLOCK_ID = "a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4"
-SMART_DECISION_MAKER_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
+TOOL_ORCHESTRATOR_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
 AGENT_INPUT_BLOCK_ID = "c0a8e994-ebf1-4a9c-a4d8-89d09c86741b"
 AGENT_OUTPUT_BLOCK_ID = "363ae599-353e-4804-937e-b2ee3cef3da4"

--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/validator.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/validator.py
@@ -10,7 +10,7 @@ from .helpers import (
    AGENT_INPUT_BLOCK_ID,
    AGENT_OUTPUT_BLOCK_ID,
    MCP_TOOL_BLOCK_ID,
-    SMART_DECISION_MAKER_BLOCK_ID,
+    TOOL_ORCHESTRATOR_BLOCK_ID,
    AgentDict,
    are_types_compatible,
    get_defined_property_type,
@@ -827,18 +827,18 @@ class AgentValidator:

        return valid

-    def validate_smart_decision_maker_blocks(
+    def validate_orchestrator_blocks(
        self,
        agent: AgentDict,
        node_lookup: dict[str, dict[str, Any]] | None = None,
    ) -> bool:
-        """Validate that SmartDecisionMakerBlock nodes have downstream tools.
+        """Validate that OrchestratorBlock nodes have downstream tools.

-        Checks that each SmartDecisionMakerBlock node has at least one link
+        Checks that each OrchestratorBlock node has at least one link
        with ``source_name == "tools"`` connecting to a downstream block.
        Without tools, the block has nothing to call and will error at runtime.

-        Returns True if all SmartDecisionMakerBlock nodes are valid.
+        Returns True if all OrchestratorBlock nodes are valid.
        """
        valid = True
        nodes = agent.get("nodes", [])
@@ -848,7 +848,7 @@ class AgentValidator:
        non_tool_block_ids = {AGENT_INPUT_BLOCK_ID, AGENT_OUTPUT_BLOCK_ID}

        for node in nodes:
-            if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
+            if node.get("block_id") != TOOL_ORCHESTRATOR_BLOCK_ID:
                continue

            node_id = node.get("id", "unknown")
@@ -863,7 +863,7 @@ class AgentValidator:
            max_iter = input_default.get("agent_mode_max_iterations")
            if max_iter is not None and not isinstance(max_iter, int):
                self.add_error(
-                    f"SmartDecisionMakerBlock node '{customized_name}' "
+                    f"OrchestratorBlock node '{customized_name}' "
                    f"({node_id}) has non-integer "
                    f"agent_mode_max_iterations={max_iter!r}. "
                    f"This field must be an integer."
@@ -871,7 +871,7 @@ class AgentValidator:
                valid = False
            elif isinstance(max_iter, int) and max_iter < -1:
                self.add_error(
-                    f"SmartDecisionMakerBlock node '{customized_name}' "
+                    f"OrchestratorBlock node '{customized_name}' "
                    f"({node_id}) has invalid "
                    f"agent_mode_max_iterations={max_iter}. "
                    f"Use -1 for infinite or a positive number for "
@@ -880,7 +880,7 @@ class AgentValidator:
                valid = False
            elif isinstance(max_iter, int) and max_iter > 100:
                self.add_error(
-                    f"SmartDecisionMakerBlock node '{customized_name}' "
+                    f"OrchestratorBlock node '{customized_name}' "
                    f"({node_id}) has agent_mode_max_iterations="
                    f"{max_iter} which is unusually high. Values above "
                    f"100 risk excessive cost and long execution times. "
@@ -890,7 +890,7 @@ class AgentValidator:
                valid = False
            elif max_iter == 0:
                self.add_error(
-                    f"SmartDecisionMakerBlock node '{customized_name}' "
+                    f"OrchestratorBlock node '{customized_name}' "
                    f"({node_id}) has agent_mode_max_iterations=0 "
                    f"(traditional mode). The agent generator only supports "
                    f"agent mode (set to -1 for infinite or a positive "
@@ -908,7 +908,7 @@ class AgentValidator:

            if not has_tools:
                self.add_error(
-                    f"SmartDecisionMakerBlock node '{customized_name}' "
+                    f"OrchestratorBlock node '{customized_name}' "
                    f"({node_id}) has no downstream tool blocks connected. "
                    f"Connect at least one block to its 'tools' output so "
                    f"the AI has tools to call."
@@ -1025,8 +1025,8 @@ class AgentValidator:
                self.validate_mcp_tool_blocks(agent),
            ),
            (
-                "SmartDecisionMaker blocks",
-                self.validate_smart_decision_maker_blocks(agent, node_lookup),
+                "Orchestrator blocks",
+                self.validate_orchestrator_blocks(agent, node_lookup),
            ),
        ]

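The per-node checks in `validate_orchestrator_blocks` can be condensed into the following sketch. The `agent_mode_max_iterations` branches mirror the ones in the diff. The link-walking part is an approximation: the hunk performing the real tool check is not shown, so the `source_id` / `sink_id` / `source_name` link keys are assumptions, and agent input/output blocks are treated as non-tools based on `non_tool_block_ids`.

```python
from typing import Any, Dict, List, Optional

# IDs copied from helpers.py in this diff.
TOOL_ORCHESTRATOR_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
AGENT_INPUT_BLOCK_ID = "c0a8e994-ebf1-4a9c-a4d8-89d09c86741b"
AGENT_OUTPUT_BLOCK_ID = "363ae599-353e-4804-937e-b2ee3cef3da4"
NON_TOOL_BLOCK_IDS = {AGENT_INPUT_BLOCK_ID, AGENT_OUTPUT_BLOCK_ID}


def check_max_iterations(max_iter: Any) -> Optional[str]:
    """Mirror the agent_mode_max_iterations branches; None means acceptable."""
    if max_iter is None:
        return None  # unset: the fixer fills in the default
    if not isinstance(max_iter, int):
        return "must be an integer"
    if max_iter < -1:
        return "use -1 for infinite or a positive number"
    if max_iter > 100:
        return "unusually high (above 100)"
    if max_iter == 0:
        return "0 is traditional mode, which the generator does not support"
    return None


def has_downstream_tools(
    node_id: str, links: List[Dict[str, Any]], node_lookup: Dict[str, Dict[str, Any]]
) -> bool:
    """True if some 'tools' link from node_id reaches a block other than agent input/output."""
    for link in links:
        if link.get("source_id") != node_id or link.get("source_name") != "tools":
            continue
        sink = node_lookup.get(link.get("sink_id", ""))
        if sink is not None and sink.get("block_id") not in NON_TOOL_BLOCK_IDS:
            return True
    return False


def orchestrator_errors(agent: Dict[str, Any]) -> List[str]:
    """Collect one message per failed check across all orchestrator nodes."""
    nodes = agent.get("nodes", [])
    links = agent.get("links", [])
    lookup = {n.get("id", ""): n for n in nodes}
    errors: List[str] = []
    for node in nodes:
        if node.get("block_id") != TOOL_ORCHESTRATOR_BLOCK_ID:
            continue
        node_id = node.get("id", "unknown")
        max_iter = node.get("input_default", {}).get("agent_mode_max_iterations")
        problem = check_max_iterations(max_iter)
        if problem:
            errors.append(f"{node_id}: agent_mode_max_iterations {problem}")
        if not has_downstream_tools(node_id, links, lookup):
            errors.append(f"{node_id}: no downstream tool blocks connected")
    return errors
```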
--- a/autogpt_platform/backend/backend/copilot/tools/agent_output.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_output.py
@@ -108,22 +108,12 @@ class AgentOutputTool(BaseTool):

    @property
    def description(self) -> str:
-        return """Retrieve execution outputs from agents in the user's library.
-
-        Identify the agent using one of:
-        - agent_name: Fuzzy search in user's library
-        - library_agent_id: Exact library agent ID
-        - store_slug: Marketplace format 'username/agent-name'
-
-        Select which run to retrieve using:
-        - execution_id: Specific execution ID
-        - run_time: 'latest' (default), 'yesterday', 'last week', or ISO date 'YYYY-MM-DD'
-
-        Wait for completion (optional):
-        - wait_if_running: Max seconds to wait if execution is still running (0-300).
-          If the execution is running/queued, waits up to this many seconds for completion.
-          Returns current status on timeout. If already finished, returns immediately.
-        """
+        return (
+            "Retrieve execution outputs from a library agent. "
+            "Identify by agent_name, library_agent_id, or store_slug. "
+            "Filter by execution_id or run_time. "
+            "Optionally wait for running executions."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
@@ -132,32 +122,29 @@ class AgentOutputTool(BaseTool):
            "properties": {
                "agent_name": {
                    "type": "string",
-                    "description": "Agent name to search for in user's library (fuzzy match)",
+                    "description": "Agent name (fuzzy match).",
                },
                "library_agent_id": {
                    "type": "string",
-                    "description": "Exact library agent ID",
+                    "description": "Library agent ID.",
                },
                "store_slug": {
                    "type": "string",
-                    "description": "Marketplace identifier: 'username/agent-slug'",
+                    "description": "Marketplace 'username/agent-name'.",
                },
                "execution_id": {
                    "type": "string",
-                    "description": "Specific execution ID to retrieve",
+                    "description": "Specific execution ID.",
                },
                "run_time": {
                    "type": "string",
-                    "description": (
-                        "Time filter: 'latest', 'yesterday', 'last week', or 'YYYY-MM-DD'"
-                    ),
+                    "description": "Time filter: 'latest', 'today', 'yesterday', 'last week', 'last 7 days', 'last month', 'last 30 days', 'YYYY-MM-DD', or ISO datetime.",
                },
                "wait_if_running": {
                    "type": "integer",
-                    "description": (
-                        "Max seconds to wait if execution is still running (0-300). "
-                        "If running, waits for completion. Returns current state on timeout."
-                    ),
+                    "description": "Max seconds to wait if still running (0-300). Returns current state on timeout.",
+                    "minimum": 0,
+                    "maximum": 300,
                },
            },
            "required": [],
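The `wait_if_running` contract (clamp to 0-300 seconds, return immediately if the run already finished, return the current state on timeout) maps onto a simple polling loop. In this sketch `wait_for_execution`, the `get_status` callback, the polling interval, and the terminal status names are all hypothetical; only the 0-300 bounds come from the schema.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical terminal states; the real execution status enum may differ.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TERMINATED"}


async def wait_for_execution(
    get_status: Callable[[], Awaitable[str]],
    wait_if_running: int,
    poll_interval: float = 1.0,
) -> str:
    """Poll until the run finishes or wait_if_running seconds elapse; return the last status."""
    wait = max(0, min(300, wait_if_running))  # schema bounds: 0-300
    status = await get_status()
    if status in TERMINAL_STATUSES or wait == 0:
        return status  # already finished, or the caller asked not to wait
    loop = asyncio.get_running_loop()
    deadline = loop.time() + wait
    while status not in TERMINAL_STATUSES:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # timeout: report the current (still running) state
        await asyncio.sleep(min(poll_interval, remaining))
        status = await get_status()
    return status
```

Clamping inside the function keeps out-of-range values safe even if a caller bypasses the schema's `minimum`/`maximum`.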
--- a/autogpt_platform/backend/backend/copilot/tools/bash_exec.py
+++ b/autogpt_platform/backend/backend/copilot/tools/bash_exec.py
@@ -42,15 +42,9 @@ class BashExecTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Execute a Bash command or script. "
-            "Full Bash scripting is supported (loops, conditionals, pipes, "
-            "functions, etc.). "
-            "The working directory is shared with the SDK Read/Write/Edit/Glob/Grep "
-            "tools — files created by either are immediately visible to both. "
-            "Execution is killed after the timeout (default 30s, max 120s). "
-            "Returns stdout and stderr. "
-            "Useful for file manipulation, data processing, running scripts, "
-            "and installing packages."
+            "Execute a Bash command or script. Shares filesystem with SDK file tools. "
+            "Useful for scripts, data processing, and package installation. "
+            "Killed after timeout (default 30s, max 120s)."
        )

    @property
@@ -60,13 +54,11 @@ class BashExecTool(BaseTool):
            "properties": {
                "command": {
                    "type": "string",
-                    "description": "Bash command or script to execute.",
+                    "description": "Bash command or script.",
                },
                "timeout": {
                    "type": "integer",
-                    "description": (
-                        "Max execution time in seconds (default 30, max 120)."
-                    ),
+                    "description": "Max seconds (default 30, max 120).",
                    "default": 30,
                },
            },
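The timeout behaviour described for `bash_exec` (default 30 s, capped at 120 s, process killed on expiry, stdout and stderr returned) can be sketched with `asyncio` subprocesses. `run_bash` is a hypothetical helper rather than the tool's implementation, and the 1-second floor on the timeout is an assumption.

```python
import asyncio
from typing import Optional, Tuple

DEFAULT_TIMEOUT = 30  # seconds, from the tool description
MAX_TIMEOUT = 120  # seconds, from the tool description


async def run_bash(
    command: str, timeout: int = DEFAULT_TIMEOUT, cwd: Optional[str] = None
) -> Tuple[Optional[int], str, str]:
    """Run `command` with bash -c; return (returncode, stdout, stderr).

    returncode is None when the command was killed for exceeding the timeout.
    """
    timeout = max(1, min(timeout, MAX_TIMEOUT))  # assumed 1 s floor
    proc = await asyncio.create_subprocess_exec(
        "bash", "-c", command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        cwd=cwd,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Only the bash process itself is killed; grandchildren it spawned
        # may outlive it unless they run in their own process group.
        proc.kill()
        await proc.wait()
        return None, "", f"killed after {timeout}s timeout"
    return proc.returncode, out.decode(errors="replace"), err.decode(errors="replace")
```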
--- a/autogpt_platform/backend/backend/copilot/tools/conftest.py
+++ b/autogpt_platform/backend/backend/copilot/tools/conftest.py
@@ -0,0 +1,20 @@
+"""Local conftest for copilot/tools tests.
+
+Overrides the session-scoped `server` and `graph_cleanup` autouse fixtures from
+backend/conftest.py so that integration tests in this directory do not trigger
+the full SpinTestServer startup (which requires Postgres + RabbitMQ).
+"""
+
+import pytest_asyncio
+
+
+@pytest_asyncio.fixture(scope="session", loop_scope="session")
+async def server():  # type: ignore[override]
+    """No-op server stub — tools tests don't need the full backend."""
+    return None
+
+
+@pytest_asyncio.fixture(scope="session", loop_scope="session", autouse=True)
+async def graph_cleanup():  # type: ignore[override]
+    """No-op graph cleanup stub."""
+    yield
--- a/autogpt_platform/backend/backend/copilot/tools/continue_run_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/continue_run_block.py
@@ -30,12 +30,7 @@ class ContinueRunBlockTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Continue executing a block after human review approval. "
-            "Use this after a run_block call returned review_required. "
-            "Pass the review_id from the review_required response. "
-            "The block will execute with the original pre-approved input data."
-        )
+        return "Resume block execution after a run_block call returned review_required. Pass the review_id."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -44,10 +39,7 @@ class ContinueRunBlockTool(BaseTool):
            "properties": {
                "review_id": {
                    "type": "string",
-                    "description": (
-                        "The review_id from a previous review_required response. "
-                        "This resumes execution with the pre-approved input data."
-                    ),
+                    "description": "review_id from the review_required response.",
                },
            },
            "required": ["review_id"],
--- a/autogpt_platform/backend/backend/copilot/tools/create_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/create_agent.py
@@ -23,12 +23,8 @@ class CreateAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Create a new agent workflow. Pass `agent_json` with the complete "
-            "agent graph JSON you generated using block schemas from find_block. "
-            "The tool validates, auto-fixes, and saves.\n\n"
-            "IMPORTANT: Before calling this tool, search for relevant existing agents "
-            "using find_library_agent that could be used as building blocks. "
-            "Pass their IDs in the library_agent_ids parameter."
+            "Create a new agent from JSON (nodes + links). Validates, auto-fixes, and saves. "
+            "Before calling, search for existing agents with find_library_agent."
        )

    @property
@@ -42,34 +38,21 @@ class CreateAgentTool(BaseTool):
            "properties": {
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "The agent JSON to validate and save. "
-                        "Must contain 'nodes' and 'links' arrays, and optionally "
-                        "'name' and 'description'."
-                    ),
+                    "description": "Agent graph with 'nodes' and 'links' arrays.",
                },
                "library_agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": (
-                        "List of library agent IDs to use as building blocks."
-                    ),
+                    "description": "Library agent IDs as building blocks.",
                },
                "save": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to save the agent. Default is true. "
-                        "Set to false for preview only."
-                    ),
+                    "description": "Save the agent (default: true). False for preview.",
                    "default": True,
                },
                "folder_id": {
                    "type": "string",
-                    "description": (
-                        "Optional folder ID to save the agent into. "
-                        "If not provided, the agent is saved at root level. "
-                        "Use list_folders to find available folders."
-                    ),
+                    "description": "Folder ID to save into (default: root).",
                },
            },
            "required": ["agent_json"],
--- a/autogpt_platform/backend/backend/copilot/tools/customize_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/customize_agent.py
@@ -23,9 +23,7 @@ class CustomizeAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Customize a marketplace or template agent. Pass `agent_json` "
-            "with the complete customized agent JSON. The tool validates, "
-            "auto-fixes, and saves."
+            "Customize a marketplace/template agent. Validates, auto-fixes, and saves."
        )

    @property
@@ -39,32 +37,21 @@ class CustomizeAgentTool(BaseTool):
            "properties": {
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "Complete customized agent JSON to validate and save. "
-                        "Optionally include 'name' and 'description'."
-                    ),
+                    "description": "Customized agent JSON with nodes and links.",
                },
                "library_agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": (
-                        "List of library agent IDs to use as building blocks."
-                    ),
+                    "description": "Library agent IDs as building blocks.",
                },
                "save": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to save the customized agent. Default is true."
-                    ),
+                    "description": "Save the agent (default: true). False for preview.",
                    "default": True,
                },
                "folder_id": {
                    "type": "string",
-                    "description": (
-                        "Optional folder ID to save the agent into. "
-                        "If not provided, the agent is saved at root level. "
-                        "Use list_folders to find available folders."
-                    ),
+                    "description": "Folder ID to save into (default: root).",
                },
            },
            "required": ["agent_json"],
--- a/autogpt_platform/backend/backend/copilot/tools/edit_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/edit_agent.py
@@ -23,12 +23,8 @@ class EditAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Edit an existing agent. Pass `agent_json` with the complete "
-            "updated agent JSON you generated. The tool validates, auto-fixes, "
-            "and saves.\n\n"
-            "IMPORTANT: Before calling this tool, if the changes involve adding new "
-            "functionality, search for relevant existing agents using find_library_agent "
-            "that could be used as building blocks."
+            "Edit an existing agent. Validates, auto-fixes, and saves. "
+            "Before calling, search for existing agents with find_library_agent."
        )

    @property
@@ -42,33 +38,20 @@ class EditAgentTool(BaseTool):
            "properties": {
                "agent_id": {
                    "type": "string",
-                    "description": (
-                        "The ID of the agent to edit. "
-                        "Can be a graph ID or library agent ID."
-                    ),
+                    "description": "Graph ID or library agent ID to edit.",
                },
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "Complete updated agent JSON to validate and save. "
-                        "Must contain 'nodes' and 'links'. "
-                        "Include 'name' and/or 'description' if they need "
-                        "to be updated."
-                    ),
+                    "description": "Updated agent JSON with nodes and links.",
                },
                "library_agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": (
-                        "List of library agent IDs to use as building blocks for the changes."
-                    ),
+                    "description": "Library agent IDs as building blocks.",
                },
                "save": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to save the changes. "
-                        "Default is true. Set to false for preview only."
-                    ),
+                    "description": "Save changes (default: true). False for preview.",
                    "default": True,
                },
            },
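The create, customize, and edit tools above now share terse parameter descriptions. In the `create_agent` schema from the first hunk, `agent_json` is still the only required argument, and `save` defaults to true. Below is a minimal sketch of checking tool-call arguments against such a schema using only the standard library. `check_args` and the trimmed `PARAMETERS` copy are hypothetical. The check covers only the `required` keyword and the top-level `type` of declared properties; a full JSON Schema validator does much more.

```python
from typing import Any

# Trimmed copy of the create_agent parameter schema from the diff above.
PARAMETERS: dict[str, Any] = {
    "type": "object",
    "properties": {
        "agent_json": {"type": "object"},
        "library_agent_ids": {"type": "array", "items": {"type": "string"}},
        "save": {"type": "boolean", "default": True},
        "folder_id": {"type": "string"},
    },
    "required": ["agent_json"],
}

# JSON Schema type names used above, mapped to Python types.
_PY_TYPES: dict[str, type] = {
    "object": dict,
    "array": list,
    "boolean": bool,
    "string": str,
}


def check_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return problems found by a minimal required/type check (empty = OK)."""
    errors = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        prop = schema.get("properties", {}).get(name)
        # Keys without a declared property are skipped; this sketch does not
        # model additionalProperties or nested "items" constraints.
        if prop is None:
            continue
        expected = _PY_TYPES.get(prop.get("type", ""))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{name} should be of type {prop['type']}")
    return errors


print(check_args({"agent_json": {"nodes": [], "links": []}}, PARAMETERS))  # []
print(check_args({"save": "yes"}, PARAMETERS))
# ['missing required field: agent_json', 'save should be of type boolean']
```

Whether the default of `true` is applied when `save` is omitted depends on the tool's own code. In JSON Schema, `default` is an annotation, and validators do not apply it.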
--- a/autogpt_platform/backend/backend/copilot/tools/feature_requests.py
+++ b/autogpt_platform/backend/backend/copilot/tools/feature_requests.py
@@ -134,11 +134,7 @@ class SearchFeatureRequestsTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Search existing feature requests to check if a similar request "
-            "already exists before creating a new one. Returns matching feature "
-            "requests with their ID, title, and description."
-        )
+        return "Search existing feature requests. Check before creating a new one."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -234,14 +230,9 @@ class CreateFeatureRequestTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Create a new feature request or add a customer need to an existing one. "
-            "Always search first with search_feature_requests to avoid duplicates. "
-            "If a matching request exists, pass its ID as existing_issue_id to add "
-            "the user's need to it instead of creating a duplicate. "
-            "IMPORTANT: Never include personally identifiable information (PII) in "
-            "the title or description — no names, emails, phone numbers, company "
-            "names, or other identifying details. Write titles and descriptions in "
-            "generic, feature-focused language."
+            "Create a feature request or add need to existing one. "
+            "Search first to avoid duplicates. Pass existing_issue_id to add to existing. "
+            "Never include PII (names, emails, phone numbers, company names) in title/description."
        )

    @property
@@ -251,28 +242,15 @@ class CreateFeatureRequestTool(BaseTool):
            "properties": {
                "title": {
                    "type": "string",
-                    "description": (
-                        "Title for the feature request. Must be generic and "
-                        "feature-focused — do not include any user names, emails, "
-                        "company names, or other PII."
-                    ),
+                    "description": "Feature request title. No names, emails, or company info.",
                },
                "description": {
                    "type": "string",
-                    "description": (
-                        "Detailed description of what the user wants and why. "
-                        "Must not contain any personally identifiable information "
-                        "(PII) — describe the feature need generically without "
-                        "referencing specific users, companies, or contact details."
-                    ),
+                    "description": "What the user wants and why. No names, emails, or company info.",
                },
                "existing_issue_id": {
                    "type": "string",
-                    "description": (
-                        "If adding a need to an existing feature request, "
-                        "provide its Linear issue ID (from search results). "
-                        "Omit to create a new feature request."
-                    ),
+                    "description": "Linear issue ID to add need to (from search results).",
                },
            },
            "required": ["title", "description"],
--- a/autogpt_platform/backend/backend/copilot/tools/find_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_agent.py
@@ -18,10 +18,7 @@ class FindAgentTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Discover agents from the marketplace based on capabilities and "
-            "user needs, or look up a specific agent by its creator/slug ID."
-        )
+        return "Search marketplace agents by capability, or look up by slug ('username/agent-name')."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -30,7 +27,7 @@ class FindAgentTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": "Search query describing what the user wants to accomplish, or a creator/slug ID (e.g. 'username/agent-name') for direct lookup. Use single keywords for best results.",
+                    "description": "Search keywords, or 'username/agent-name' for direct slug lookup.",
                },
            },
            "required": ["query"],
--- a/autogpt_platform/backend/backend/copilot/tools/find_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_block.py
@@ -5,6 +5,7 @@ from prisma.enums import ContentType

 from backend.blocks import get_block
 from backend.blocks._base import BlockType
+from backend.copilot.context import get_current_permissions
 from backend.copilot.model import ChatSession
 from backend.data.db_accessors import search

@@ -38,7 +39,7 @@ COPILOT_EXCLUDED_BLOCK_TYPES = {

 # Specific block IDs excluded from CoPilot (STANDARD type but still require graph context)
 COPILOT_EXCLUDED_BLOCK_IDS = {
-    # SmartDecisionMakerBlock - dynamically discovers downstream blocks via graph topology;
+    # OrchestratorBlock - dynamically discovers downstream blocks via graph topology;
    # usable in agent graphs (guide hardcodes its ID) but cannot run standalone.
    "3b191d9f-356f-482d-8238-ba04b6d18381",
 }
@@ -54,13 +55,9 @@ class FindBlockTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Search for available blocks by name or description, or look up a "
-            "specific block by its ID. "
-            "Blocks are reusable components that perform specific tasks like "
-            "sending emails, making API calls, processing text, etc. "
-            "IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
-            "The response includes each block's id, name, and description. "
-            "Call run_block with the block's id **with no inputs** to see detailed inputs/outputs and execute it."
+            "Search blocks by name or description. Returns block IDs for run_block. "
+            "Always call this FIRST to get block IDs before using run_block. "
+            "Then call run_block with the block's id and empty input_data to see its detailed schema."
        )

    @property
@@ -70,19 +67,11 @@ class FindBlockTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find blocks by name or description, "
-                        "or a block ID (UUID) for direct lookup. "
-                        "Use keywords like 'email', 'http', 'text', 'ai', etc."
-                    ),
+                    "description": "Search keywords (e.g. 'email', 'http', 'ai').",
                },
                "include_schemas": {
                    "type": "boolean",
-                    "description": (
-                        "If true, include full input_schema and output_schema "
-                        "for each block. Use when generating agent JSON that "
-                        "needs block schemas. Default is false."
-                    ),
+                    "description": "Include full input/output schemas (for agent JSON generation).",
                    "default": False,
                },
            },
@@ -161,6 +150,19 @@ class FindBlockTool(BaseTool):
                            session_id=session_id,
                        )

+                    # Check block-level permissions — hide denied blocks entirely
+                    perms = get_current_permissions()
+                    if perms is not None and not perms.is_block_allowed(
+                        block.id, block.name
+                    ):
+                        return NoResultsResponse(
+                            message=f"No blocks found for '{query}'",
+                            suggestions=[
+                                "Search for an alternative block by name",
+                            ],
+                            session_id=session_id,
+                        )
+
                    summary = BlockInfoSummary(
                        id=block.id,
                        name=block.name,
@@ -207,6 +209,7 @@ class FindBlockTool(BaseTool):
                )

            # Enrich results with block information
+            perms = get_current_permissions()
            blocks: list[BlockInfoSummary] = []
            for result in results:
                block_id = result["content_id"]
@@ -223,6 +226,12 @@ class FindBlockTool(BaseTool):
                ):
                    continue

+                # Skip blocks denied by execution permissions
+                if perms is not None and not perms.is_block_allowed(
+                    block.id, block.name
+                ):
+                    continue
+
                summary = BlockInfoSummary(
                    id=block_id,
                    name=block.name,
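The `find_block` change applies the same permission check on both lookup paths. A denied block fetched by UUID gets a `NoResultsResponse` ("No blocks found"), and denied blocks are skipped in search results, so they stay hidden entirely. Below is a minimal sketch of the search-side filtering. It assumes a permissions object exposing the `is_block_allowed(block_id, block_name)` method used in the diff. The `BlockPermissions` class, its allow/deny sets, and `filter_search_results` are hypothetical, because the real return type of `get_current_permissions()` is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BlockPermissions:
    """Hypothetical stand-in for what get_current_permissions() returns."""

    # None means "no allow-list": every block passes unless it is denied.
    allowed: Optional[set[str]] = None
    denied: set[str] = field(default_factory=set)

    def is_block_allowed(self, block_id: str, block_name: str) -> bool:
        # Assumption: entries may name a block by ID or by display name.
        keys = {block_id, block_name}
        if keys & self.denied:
            return False
        return self.allowed is None or bool(keys & self.allowed)


def filter_search_results(
    results: list[dict[str, Any]], perms: Optional[BlockPermissions]
) -> list[dict[str, Any]]:
    """Drop denied blocks; perms=None means unrestricted, as in the diff."""
    if perms is None:
        return list(results)
    return [r for r in results if perms.is_block_allowed(r["id"], r["name"])]


results = [
    {"id": "a1", "name": "Send Email"},
    {"id": "b2", "name": "HTTP Request"},
]
print(filter_search_results(results, BlockPermissions(denied={"HTTP Request"})))
# [{'id': 'a1', 'name': 'Send Email'}]
```

Silently dropping a denied block, instead of returning a "permission denied" error, keeps the model from learning that the block exists. The diff's comment states the same intent: "hide denied blocks entirely".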
--- a/autogpt_platform/backend/backend/copilot/tools/find_block_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_block_test.py
@@ -69,8 +69,8 @@ class TestFindBlockFiltering:
        assert BlockType.HUMAN_IN_THE_LOOP in COPILOT_EXCLUDED_BLOCK_TYPES
        assert BlockType.AGENT in COPILOT_EXCLUDED_BLOCK_TYPES

-    def test_excluded_block_ids_contains_smart_decision_maker(self):
-        """Verify SmartDecisionMakerBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
+    def test_excluded_block_ids_contains_orchestrator(self):
+        """Verify OrchestratorBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
        assert "3b191d9f-356f-482d-8238-ba04b6d18381" in COPILOT_EXCLUDED_BLOCK_IDS

    @pytest.mark.asyncio(loop_scope="session")
@@ -120,18 +120,18 @@ class TestFindBlockFiltering:

    @pytest.mark.asyncio(loop_scope="session")
    async def test_excluded_block_id_filtered_from_results(self):
-        """Verify SmartDecisionMakerBlock is filtered from search results."""
+        """Verify OrchestratorBlock is filtered from search results."""
        session = make_session(user_id=_TEST_USER_ID)

-        smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
+        orchestrator_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
        search_results = [
-            {"content_id": smart_decision_id, "score": 0.9},
+            {"content_id": orchestrator_id, "score": 0.9},
            {"content_id": "normal-block-id", "score": 0.8},
        ]

-        # SmartDecisionMakerBlock has STANDARD type but is excluded by ID
+        # OrchestratorBlock has STANDARD type but is excluded by ID
        smart_block = make_mock_block(
-            smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
+            orchestrator_id, "Orchestrator", BlockType.STANDARD
        )
        normal_block = make_mock_block(
            "normal-block-id", "Normal Block", BlockType.STANDARD
@@ -139,7 +139,7 @@ class TestFindBlockFiltering:

        def mock_get_block(block_id):
            return {
-                smart_decision_id: smart_block,
+                orchestrator_id: smart_block,
                "normal-block-id": normal_block,
            }.get(block_id)

@@ -161,7 +161,7 @@ class TestFindBlockFiltering:
                    user_id=_TEST_USER_ID, session=session, query="decision"
                )

-        # Should only return normal block, not SmartDecisionMakerBlock
+        # Should only return normal block, not OrchestratorBlock
        assert isinstance(response, BlockListResponse)
        assert len(response.blocks) == 1
        assert response.blocks[0].id == "normal-block-id"
@@ -601,10 +601,8 @@ class TestFindBlockDirectLookup:
    async def test_uuid_lookup_excluded_block_id(self):
        """UUID matching an excluded block ID returns NoResultsResponse."""
        session = make_session(user_id=_TEST_USER_ID)
-        smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
-        block = make_mock_block(
-            smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
-        )
+        orchestrator_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
+        block = make_mock_block(orchestrator_id, "Orchestrator", BlockType.STANDARD)

        with patch(
            "backend.copilot.tools.find_block.get_block",
@@ -612,7 +610,7 @@ class TestFindBlockDirectLookup:
        ):
            tool = FindBlockTool()
            response = await tool._execute(
-                user_id=_TEST_USER_ID, session=session, query=smart_decision_id
+                user_id=_TEST_USER_ID, session=session, query=orchestrator_id
            )

        from .models import NoResultsResponse
--- a/autogpt_platform/backend/backend/copilot/tools/find_library_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_library_agent.py
@@ -19,13 +19,8 @@ class FindLibraryAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Search for or list agents in the user's library. Use this to find "
-            "agents the user has already added to their library, including agents "
-            "they created or added from the marketplace. "
-            "When creating agents with sub-agent composition, use this to get "
-            "the agent's graph_id, graph_version, input_schema, and output_schema "
-            "needed for AgentExecutorBlock nodes. "
-            "Omit the query to list all agents."
+            "Search user's library agents. Returns graph_id, schemas for sub-agent composition. "
+            "Omit query to list all."
        )

    @property
@@ -35,10 +30,7 @@ class FindLibraryAgentTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find agents by name or description. "
-                        "Omit to list all agents in the library."
-                    ),
+                    "description": "Search by name/description. Omit to list all.",
                },
            },
            "required": [],
--- a/autogpt_platform/backend/backend/copilot/tools/fix_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/fix_agent.py
@@ -22,20 +22,10 @@ class FixAgentGraphTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Auto-fix common issues in an agent JSON graph. Applies fixes for:\n"
-            "- Missing or invalid UUIDs on nodes and links\n"
-            "- StoreValueBlock prerequisites for ConditionBlock\n"
-            "- Double curly brace escaping in prompt templates\n"
-            "- AddToList/AddToDictionary prerequisite blocks\n"
-            "- CodeExecutionBlock output field naming\n"
-            "- Missing credentials configuration\n"
-            "- Node X coordinate spacing (800+ units apart)\n"
-            "- AI model default parameters\n"
-            "- Link static properties based on input schema\n"
-            "- Type mismatches (inserts conversion blocks)\n\n"
-            "Returns the fixed agent JSON plus a list of fixes applied. "
-            "After fixing, the agent is re-validated. If still invalid, "
-            "the remaining errors are included in the response."
+            "Auto-fix common agent JSON issues: missing/invalid UUIDs, StoreValueBlock prerequisites, "
+            "double curly brace escaping, AddToList/AddToDictionary prerequisites, credentials, "
+            "node spacing, AI model defaults, link static properties, and type mismatches. "
+            "Returns fixed JSON and list of fixes applied."
        )

    @property
--- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
@@ -42,12 +42,7 @@ class GetAgentBuildingGuideTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Returns the complete guide for building agent JSON graphs, including "
-            "block IDs, link structure, AgentInputBlock, AgentOutputBlock, "
-            "AgentExecutorBlock (for sub-agent composition), and MCPToolBlock usage. "
-            "Call this before generating agent JSON to ensure correct structure."
-        )
+        return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."

    @property
    def parameters(self) -> dict[str, Any]:
--- a/autogpt_platform/backend/backend/copilot/tools/get_doc_page.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_doc_page.py
@@ -25,8 +25,7 @@ class GetDocPageTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Get the full content of a documentation page by its path. "
-            "Use this after search_docs to read the complete content of a relevant page."
+            "Read full documentation page content by path (from search_docs results)."
        )

    @property
@@ -36,10 +35,7 @@ class GetDocPageTool(BaseTool):
            "properties": {
                "path": {
                    "type": "string",
-                    "description": (
-                        "The path to the documentation file, as returned by search_docs. "
-                        "Example: 'platform/block-sdk-guide.md'"
-                    ),
+                    "description": "Doc file path (e.g. 'platform/block-sdk-guide.md').",
                },
            },
            "required": ["path"],
--- a/autogpt_platform/backend/backend/copilot/tools/get_mcp_guide.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_mcp_guide.py
@@ -38,11 +38,7 @@ class GetMCPGuideTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Returns the MCP tool guide: known hosted server URLs (Notion, Linear, "
-            "Stripe, Intercom, Cloudflare, Atlassian) and authentication workflow. "
-            "Call before using run_mcp_tool if you need a server URL or auth info."
-        )
+        return "Get MCP server URLs and auth guide. Call before run_mcp_tool if you need a server URL or auth info."

    @property
    def parameters(self) -> dict[str, Any]:
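The `prepare_block_for_execution` helper added to `helpers.py` below separates input keys with plain set arithmetic. Credential fields are removed from both the required and the provided key sets, and any provided key that is not a schema property is rejected before the block runs. Below is a minimal sketch of only that step. A hypothetical hand-written schema stands in for `block.input_schema.jsonschema()`, and `partition_input_keys` is a hypothetical function name.

```python
from typing import Any


def partition_input_keys(
    input_schema: dict[str, Any],
    input_data: dict[str, Any],
    credentials_fields: set[str],
) -> tuple[set[str], set[str], set[str]]:
    """Return (required_non_credential, provided, unrecognized) key sets."""
    required = set(input_schema.get("required", []))
    required_non_credential = required - credentials_fields
    provided = set(input_data) - credentials_fields
    valid = set(input_schema.get("properties", {})) - credentials_fields
    unrecognized = provided - valid
    return required_non_credential, provided, unrecognized


# Hypothetical schema: one credential field plus two ordinary inputs.
schema = {
    "properties": {"credentials": {}, "url": {}, "method": {}},
    "required": ["credentials", "url"],
}
required, provided, unknown = partition_input_keys(
    schema, {"url": "https://example.com", "methd": "GET"}, {"credentials"}
)
print(sorted(required), sorted(provided), sorted(unknown))
# ['url'] ['methd', 'url'] ['methd']
```

In the helper itself, a non-empty unrecognized set returns an `InputValidationErrorResponse` and the block is not executed. Missing required fields are deliberately left for each tool to handle.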
--- a/autogpt_platform/backend/backend/copilot/tools/helpers.py
+++ b/autogpt_platform/backend/backend/copilot/tools/helpers.py
@@ -1,15 +1,24 @@
 """Shared helpers for chat tools."""

 import logging
+import uuid
 from collections import defaultdict
+from dataclasses import dataclass
 from typing import Any

 from pydantic_core import PydanticUndefined

+from backend.blocks import BlockType, get_block
 from backend.blocks._base import AnyBlockSchema
-from backend.copilot.constants import COPILOT_NODE_PREFIX, COPILOT_SESSION_PREFIX
+from backend.copilot.constants import (
+    COPILOT_NODE_EXEC_ID_SEPARATOR,
+    COPILOT_NODE_PREFIX,
+    COPILOT_SESSION_PREFIX,
+)
+from backend.copilot.model import ChatSession
+from backend.copilot.sdk.file_ref import FileRefExpansionError, expand_file_refs_in_args
 from backend.data.credit import UsageTransactionMetadata
-from backend.data.db_accessors import credit_db, workspace_db
+from backend.data.db_accessors import credit_db, review_db, workspace_db
 from backend.data.execution import ExecutionContext
 from backend.data.model import CredentialsFieldInfo, CredentialsMetaInput
 from backend.executor.utils import block_usage_cost
@@ -17,8 +26,20 @@ from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.util.exceptions import BlockError, InsufficientBalanceError
 from backend.util.type import coerce_inputs_to_schema

-from .models import BlockOutputResponse, ErrorResponse, ToolResponseBase
-from .utils import match_credentials_to_requirements
+from .models import (
+    BlockOutputResponse,
+    ErrorResponse,
+    InputValidationErrorResponse,
+    ReviewRequiredResponse,
+    SetupInfo,
+    SetupRequirementsResponse,
+    ToolResponseBase,
+    UserReadiness,
+)
+from .utils import (
+    build_missing_credentials_from_field_info,
+    match_credentials_to_requirements,
+)

 logger = logging.getLogger(__name__)

@@ -231,6 +252,286 @@ async def resolve_block_credentials(
    return await match_credentials_to_requirements(user_id, requirements)


+@dataclass
+class BlockPreparation:
+    """Result of successful block validation, ready for execution or task creation.
+
+    Attributes:
+        block: The resolved block instance (schema definition + execute method).
+        block_id: UUID of the block being prepared.
+        input_data: User-supplied input values after file-ref expansion.
+        matched_credentials: Credential field name -> resolved credential metadata.
+        input_schema: JSON Schema for the block's input, with credential
+            discriminators resolved for the user's available providers.
+        credentials_fields: Set of field names in the schema that are credential
+            inputs (e.g. ``{"credentials", "api_key"}``).
+        required_non_credential_keys: Schema-required fields minus credential
+            fields — the fields the user must supply directly.
+        provided_input_keys: Keys the user actually provided in ``input_data``.
+        synthetic_graph_id: Auto-generated graph UUID used for CoPilot
+            single-block executions (no real graph exists in the DB).
+        synthetic_node_id: Auto-generated node UUID paired with
+            ``synthetic_graph_id`` to form the execution context for the block.
+    """
+
+    block: AnyBlockSchema
+    block_id: str
+    input_data: dict[str, Any]
+    matched_credentials: dict[str, CredentialsMetaInput]
+    input_schema: dict[str, Any]
+    credentials_fields: set[str]
+    required_non_credential_keys: set[str]
+    provided_input_keys: set[str]
+    synthetic_graph_id: str
+    synthetic_node_id: str
+
+
+async def prepare_block_for_execution(
+    block_id: str,
+    input_data: dict[str, Any],
+    user_id: str,
+    session: ChatSession,
+    session_id: str,
+) -> "BlockPreparation | ToolResponseBase":
+    """Validate and prepare a block for execution.
+
+    Performs: block lookup, disabled/excluded-type checks, credential resolution,
+    input schema generation, file-ref expansion, missing-credentials check, and
+    unrecognized-field validation.
+
+    Does NOT check for missing required fields (tools differ: run_block shows a
+    schema preview) and does NOT run the HITL review check (use check_hitl_review
+    separately).
+
+    Args:
+        block_id: Block UUID to prepare.
+        input_data: Input values provided by the caller.
+        user_id: Authenticated user ID.
+        session: Current chat session (needed for file-ref expansion).
+        session_id: Chat session ID (used in error responses).
+
+    Returns:
+        BlockPreparation on success, or a ToolResponseBase error/setup response.
+    """
+    # Lazy import: find_block imports from .base and .models (siblings), not
+    # from helpers — no actual circular dependency exists today.  Kept lazy as a
+    # precaution since find_block is the block-registry module and future changes
+    # could introduce a cycle.
+    from .find_block import COPILOT_EXCLUDED_BLOCK_IDS, COPILOT_EXCLUDED_BLOCK_TYPES
+
+    block = get_block(block_id)
+    if not block:
+        return ErrorResponse(
+            message=f"Block '{block_id}' not found", session_id=session_id
+        )
+    if block.disabled:
+        return ErrorResponse(
+            message=f"Block '{block_id}' is disabled", session_id=session_id
+        )
+
+    if (
+        block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
+        or block.id in COPILOT_EXCLUDED_BLOCK_IDS
+    ):
+        if block.block_type == BlockType.MCP_TOOL:
+            hint = (
+                " Use the `run_mcp_tool` tool instead — it handles "
+                "MCP server discovery, authentication, and execution."
+            )
+        elif block.block_type == BlockType.AGENT:
+            hint = " Use the `run_agent` tool instead."
+        else:
+            hint = " This block is designed for use within graphs only."
+        return ErrorResponse(
+            message=f"Block '{block.name}' cannot be run directly.{hint}",
+            session_id=session_id,
+        )
+
+    matched_credentials, missing_credentials = await resolve_block_credentials(
+        user_id, block, input_data
+    )
+
+    try:
+        input_schema: dict[str, Any] = block.input_schema.jsonschema()
+    except Exception as e:
+        logger.warning("Failed to generate input schema for block %s: %s", block_id, e)
+        return ErrorResponse(
+            message=f"Block '{block.name}' has an invalid input schema",
+            error=str(e),
+            session_id=session_id,
+        )
+
+    # Expand @@agptfile: refs using the block's input schema so string/list
+    # fields get the correct deserialization.
+    if input_data:
+        try:
+            input_data = await expand_file_refs_in_args(
+                input_data, user_id, session, input_schema=input_schema
+            )
+        except FileRefExpansionError as exc:
+            return ErrorResponse(
+                message=(
+                    f"Failed to resolve file reference: {exc}. "
+                    "Ensure the file exists before referencing it."
+                ),
+                session_id=session_id,
+            )
+
+    credentials_fields = set(block.input_schema.get_credentials_fields().keys())
+
+    if missing_credentials:
+        credentials_fields_info = _resolve_discriminated_credentials(block, input_data)
+        missing_creds_dict = build_missing_credentials_from_field_info(
+            credentials_fields_info, set(matched_credentials.keys())
+        )
+        missing_creds_list = list(missing_creds_dict.values())
+        return SetupRequirementsResponse(
+            message=(
+                f"Block '{block.name}' requires credentials that are not configured. "
+                "Please set up the required credentials before running this block."
+            ),
+            session_id=session_id,
+            setup_info=SetupInfo(
+                agent_id=block_id,
+                agent_name=block.name,
+                user_readiness=UserReadiness(
+                    has_all_credentials=False,
+                    missing_credentials=missing_creds_dict,
+                    ready_to_run=False,
+                ),
+                requirements={
+                    "credentials": missing_creds_list,
+                    "inputs": get_inputs_from_schema(
+                        input_schema, exclude_fields=credentials_fields
+                    ),
+                    "execution_modes": ["immediate"],
+                },
+            ),
+            graph_id=None,
+            graph_version=None,
+        )
+    required_keys = set(input_schema.get("required", []))
+    required_non_credential_keys = required_keys - credentials_fields
+    provided_input_keys = set(input_data.keys()) - credentials_fields
+
+    valid_fields = set(input_schema.get("properties", {}).keys()) - credentials_fields
+    unrecognized_fields = provided_input_keys - valid_fields
+    if unrecognized_fields:
+        return InputValidationErrorResponse(
+            message=(
+                f"Unknown input field(s) provided: {', '.join(sorted(unrecognized_fields))}. "
+                "Block was not executed. Please use the correct field names from the schema."
+            ),
+            session_id=session_id,
+            unrecognized_fields=sorted(unrecognized_fields),
+            inputs=input_schema,
+        )
+
+    synthetic_graph_id = f"{COPILOT_SESSION_PREFIX}{session_id}"
+    synthetic_node_id = f"{COPILOT_NODE_PREFIX}{block_id}"
+
+    return BlockPreparation(
+        block=block,
+        block_id=block_id,
+        input_data=input_data,
+        matched_credentials=matched_credentials,
+        input_schema=input_schema,
+        credentials_fields=credentials_fields,
+        required_non_credential_keys=required_non_credential_keys,
+        provided_input_keys=provided_input_keys,
+        synthetic_graph_id=synthetic_graph_id,
+        synthetic_node_id=synthetic_node_id,
+    )
+
+
+async def check_hitl_review(
+    prep: BlockPreparation,
+    user_id: str,
+    session_id: str,
+) -> "tuple[str, dict[str, Any]] | ToolResponseBase":
+    """Check for an existing or new HITL review requirement.
+
+    Reuses a WAITING review with identical input (LLM retry guard); otherwise
+    stores a new review record and returns ReviewRequiredResponse if needed.
+    If not, returns ``(synthetic_node_exec_id, input_data)`` for execute_block.
+    """
+    block = prep.block
+    block_id = prep.block_id
+    synthetic_graph_id = prep.synthetic_graph_id
+    synthetic_node_id = prep.synthetic_node_id
+    input_data = prep.input_data
+
+    # Reuse an existing WAITING review for identical input (LLM retry guard)
+    existing_reviews = await review_db().get_pending_reviews_for_execution(
+        synthetic_graph_id, user_id
+    )
+    existing_review = next(
+        (
+            r
+            for r in existing_reviews
+            if r.node_id == synthetic_node_id
+            and r.status.value == "WAITING"
+            and r.payload == input_data
+        ),
+        None,
+    )
+    if existing_review:
+        return ReviewRequiredResponse(
+            message=(
+                f"Block '{block.name}' requires human review. "
+                f"After the user approves, call continue_run_block with "
+                f"review_id='{existing_review.node_exec_id}' to execute."
+            ),
+            session_id=session_id,
+            block_id=block_id,
+            block_name=block.name,
+            review_id=existing_review.node_exec_id,
+            graph_exec_id=synthetic_graph_id,
+            input_data=input_data,
+        )
+
+    synthetic_node_exec_id = (
+        f"{synthetic_node_id}{COPILOT_NODE_EXEC_ID_SEPARATOR}{uuid.uuid4().hex[:8]}"
+    )
+
+    review_context = ExecutionContext(
+        user_id=user_id,
+        graph_id=synthetic_graph_id,
+        graph_exec_id=synthetic_graph_id,
+        graph_version=1,
+        node_id=synthetic_node_id,
+        node_exec_id=synthetic_node_exec_id,
+        sensitive_action_safe_mode=True,
+    )
+    should_pause, input_data = await block.is_block_exec_need_review(
+        input_data,
+        user_id=user_id,
+        node_id=synthetic_node_id,
+        node_exec_id=synthetic_node_exec_id,
+        graph_exec_id=synthetic_graph_id,
+        graph_id=synthetic_graph_id,
+        graph_version=1,
+        execution_context=review_context,
+        is_graph_execution=False,
+    )
+    if should_pause:
+        return ReviewRequiredResponse(
+            message=(
+                f"Block '{block.name}' requires human review. "
+                f"After the user approves, call continue_run_block with "
+                f"review_id='{synthetic_node_exec_id}' to execute."
+            ),
+            session_id=session_id,
+            block_id=block_id,
+            block_name=block.name,
+            review_id=synthetic_node_exec_id,
+            graph_exec_id=synthetic_graph_id,
+            input_data=input_data,
+        )
+
+    return synthetic_node_exec_id, input_data
+
+
 def _resolve_discriminated_credentials(
    block: AnyBlockSchema,
    input_data: dict[str, Any],
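The input checks in `prepare_block_for_execution` reduce to set arithmetic over the block's JSON schema. Credential fields are removed from every set before comparison, so a credentials key passed in `input_data` is never reported as unrecognized. A minimal standalone sketch of that logic, assuming a hypothetical helper named `split_input_keys` (not a function in the codebase) and a made-up example schema:

```python
from typing import Any


def split_input_keys(
    input_schema: dict[str, Any],
    input_data: dict[str, Any],
    credentials_fields: set[str],
) -> tuple[set[str], set[str], set[str]]:
    """Return (required_non_credential_keys, provided_input_keys, unrecognized_fields).

    Hypothetical helper mirroring the set operations in
    prepare_block_for_execution; credential fields are excluded everywhere.
    """
    required_keys = set(input_schema.get("required", []))
    required_non_credential_keys = required_keys - credentials_fields
    provided_input_keys = set(input_data.keys()) - credentials_fields
    valid_fields = set(input_schema.get("properties", {}).keys()) - credentials_fields
    unrecognized_fields = provided_input_keys - valid_fields
    return required_non_credential_keys, provided_input_keys, unrecognized_fields


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"text": {"type": "string"}, "credentials": {"type": "object"}},
        "required": ["text", "credentials"],
    }
    required, provided, unknown = split_input_keys(
        schema,
        {"text": "hi", "credentials": {}, "unknown_field": "oops"},
        credentials_fields={"credentials"},
    )
    print(sorted(required), sorted(provided), sorted(unknown))
    # → ['text'] ['text', 'unknown_field'] ['unknown_field']
```

Any non-empty third set corresponds to the `InputValidationErrorResponse` path, where the block is not executed.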
--- a/autogpt_platform/backend/backend/copilot/tools/helpers_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/helpers_test.py
@@ -1,4 +1,4 @@
-"""Tests for execute_block — credit charging and type coercion."""
+"""Tests for execute_block, prepare_block_for_execution, and check_hitl_review."""

 from collections.abc import AsyncIterator
 from typing import Any
@@ -7,8 +7,20 @@ from unittest.mock import AsyncMock, MagicMock, patch
 import pytest

 from backend.blocks._base import BlockType
-from backend.copilot.tools.helpers import execute_block
-from backend.copilot.tools.models import BlockOutputResponse, ErrorResponse
+from backend.copilot.constants import COPILOT_NODE_PREFIX, COPILOT_SESSION_PREFIX
+from backend.copilot.tools.helpers import (
+    BlockPreparation,
+    check_hitl_review,
+    execute_block,
+    prepare_block_for_execution,
+)
+from backend.copilot.tools.models import (
+    BlockOutputResponse,
+    ErrorResponse,
+    InputValidationErrorResponse,
+    ReviewRequiredResponse,
+    SetupRequirementsResponse,
+)

 _USER = "test-user-helpers"
 _SESSION = "test-session-helpers"
@@ -510,3 +522,341 @@ async def test_coerce_inner_elements_of_generic():
    # Inner elements should be coerced from int to str
    assert block._captured_inputs["values"] == ["1", "2", "3"]
    assert all(isinstance(v, str) for v in block._captured_inputs["values"])
+
+
+# ---------------------------------------------------------------------------
+# prepare_block_for_execution tests
+# ---------------------------------------------------------------------------
+
+_PREP_USER = "prep-user"
+_PREP_SESSION = "prep-session"
+
+
+def _make_prep_session(session_id: str = _PREP_SESSION) -> MagicMock:
+    session = MagicMock()
+    session.session_id = session_id
+    return session
+
+
+def _make_simple_block(
+    block_id: str = "blk-1",
+    name: str = "Simple Block",
+    disabled: bool = False,
+    required: list[str] | None = None,
+    properties: dict[str, Any] | None = None,
+) -> MagicMock:
+    block = MagicMock()
+    block.id = block_id
+    block.name = name
+    block.disabled = disabled
+    block.description = ""
+    block.block_type = MagicMock()
+
+    schema = {
+        "type": "object",
+        "properties": properties or {"text": {"type": "string"}},
+        "required": required or [],
+    }
+    block.input_schema.jsonschema.return_value = schema
+    block.input_schema.get_credentials_fields.return_value = {}
+    block.input_schema.get_credentials_fields_info.return_value = {}
+    return block
+
+
+def _patch_excluded(block_ids: set | None = None, block_types: set | None = None):
+    return (
+        patch(
+            "backend.copilot.tools.find_block.COPILOT_EXCLUDED_BLOCK_IDS",
+            new=block_ids or set(),
+            create=True,
+        ),
+        patch(
+            "backend.copilot.tools.find_block.COPILOT_EXCLUDED_BLOCK_TYPES",
+            new=block_types or set(),
+            create=True,
+        ),
+    )
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_not_found() -> None:
+    excl_ids, excl_types = _patch_excluded()
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=None),
+        excl_ids,
+        excl_types,
+    ):
+        result = await prepare_block_for_execution(
+            block_id="missing",
+            input_data={},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, ErrorResponse)
+    assert "not found" in result.message
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_disabled() -> None:
+    block = _make_simple_block(disabled=True)
+    excl_ids, excl_types = _patch_excluded()
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-1",
+            input_data={},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, ErrorResponse)
+    assert "disabled" in result.message
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_unrecognized_fields() -> None:
+    block = _make_simple_block(properties={"text": {"type": "string"}})
+    excl_ids, excl_types = _patch_excluded()
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+        patch(
+            "backend.copilot.tools.helpers.resolve_block_credentials",
+            AsyncMock(return_value=({}, [])),
+        ),
+        patch(
+            "backend.copilot.tools.helpers.expand_file_refs_in_args",
+            AsyncMock(side_effect=lambda d, *a, **kw: d),
+        ),
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-1",
+            input_data={"text": "hi", "unknown_field": "oops"},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, InputValidationErrorResponse)
+    assert "unknown_field" in result.unrecognized_fields
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_missing_credentials() -> None:
+    block = _make_simple_block()
+    mock_cred = MagicMock()
+    excl_ids, excl_types = _patch_excluded()
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+        patch(
+            "backend.copilot.tools.helpers.resolve_block_credentials",
+            AsyncMock(return_value=({}, [mock_cred])),
+        ),
+        patch(
+            "backend.copilot.tools.helpers.build_missing_credentials_from_field_info",
+            return_value={"cred_key": mock_cred},
+        ),
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-1",
+            input_data={},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, SetupRequirementsResponse)
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_success_returns_preparation() -> None:
+    block = _make_simple_block(
+        required=["text"], properties={"text": {"type": "string"}}
+    )
+    excl_ids, excl_types = _patch_excluded()
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+        patch(
+            "backend.copilot.tools.helpers.resolve_block_credentials",
+            AsyncMock(return_value=({}, [])),
+        ),
+        patch(
+            "backend.copilot.tools.helpers.expand_file_refs_in_args",
+            AsyncMock(side_effect=lambda d, *a, **kw: d),
+        ),
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-1",
+            input_data={"text": "hello"},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, BlockPreparation)
+    assert result.required_non_credential_keys == {"text"}
+    assert result.provided_input_keys == {"text"}
+
+
+# ---------------------------------------------------------------------------
+# check_hitl_review tests
+# ---------------------------------------------------------------------------
+
+
+def _make_hitl_prep(
+    block_id: str = "blk-hitl",
+    input_data: dict | None = None,
+    session_id: str = "hitl-sess",
+    needs_review: bool = False,
+) -> BlockPreparation:
+    block = MagicMock()
+    block.id = block_id
+    block.name = "HITL Block"
+    data = input_data if input_data is not None else {"action": "delete"}
+    block.is_block_exec_need_review = AsyncMock(return_value=(needs_review, data))
+    return BlockPreparation(
+        block=block,
+        block_id=block_id,
+        input_data=data,
+        matched_credentials={},
+        input_schema={},
+        credentials_fields=set(),
+        required_non_credential_keys=set(),
+        provided_input_keys=set(),
+        synthetic_graph_id=f"{COPILOT_SESSION_PREFIX}{session_id}",
+        synthetic_node_id=f"{COPILOT_NODE_PREFIX}{block_id}",
+    )
+
+
+@pytest.mark.asyncio
+async def test_check_hitl_no_review_needed() -> None:
+    prep = _make_hitl_prep(input_data={"action": "read"}, needs_review=False)
+    mock_rdb = MagicMock()
+    mock_rdb.get_pending_reviews_for_execution = AsyncMock(return_value=[])
+
+    with patch("backend.copilot.tools.helpers.review_db", return_value=mock_rdb):
+        result = await check_hitl_review(prep, "user1", "hitl-sess")
+
+    assert isinstance(result, tuple)
+    node_exec_id, returned_data = result
+    assert node_exec_id.startswith(f"{COPILOT_NODE_PREFIX}blk-hitl")
+    assert returned_data == {"action": "read"}
+
+
+@pytest.mark.asyncio
+async def test_check_hitl_review_required() -> None:
+    prep = _make_hitl_prep(input_data={"action": "delete"}, needs_review=True)
+    mock_rdb = MagicMock()
+    mock_rdb.get_pending_reviews_for_execution = AsyncMock(return_value=[])
+
+    with patch("backend.copilot.tools.helpers.review_db", return_value=mock_rdb):
+        result = await check_hitl_review(prep, "user1", "hitl-sess")
+
+    assert isinstance(result, ReviewRequiredResponse)
+    assert result.block_id == "blk-hitl"
+
+
+@pytest.mark.asyncio
+async def test_check_hitl_reuses_existing_waiting_review() -> None:
+    prep = _make_hitl_prep(input_data={"action": "delete"}, needs_review=False)
+
+    existing = MagicMock()
+    existing.node_id = prep.synthetic_node_id
+    existing.status.value = "WAITING"
+    existing.payload = {"action": "delete"}
+    existing.node_exec_id = "existing-review-42"
+
+    mock_rdb = MagicMock()
+    mock_rdb.get_pending_reviews_for_execution = AsyncMock(return_value=[existing])
+
+    with patch("backend.copilot.tools.helpers.review_db", return_value=mock_rdb):
+        result = await check_hitl_review(prep, "user1", "hitl-sess")
+
+    assert isinstance(result, ReviewRequiredResponse)
+    assert result.review_id == "existing-review-42"
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_excluded_by_type() -> None:
+    """prepare_block_for_execution returns ErrorResponse for excluded block types."""
+    from backend.blocks import BlockType
+
+    block = _make_simple_block()
+    block.block_type = BlockType.AGENT
+
+    excl_ids, excl_types = _patch_excluded(block_types={BlockType.AGENT})
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-agent",
+            input_data={},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, ErrorResponse)
+    assert "cannot be run directly" in result.message
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_excluded_by_id() -> None:
+    """prepare_block_for_execution returns ErrorResponse for excluded block IDs."""
+    block = _make_simple_block(block_id="blk-excluded")
+
+    excl_ids, excl_types = _patch_excluded(block_ids={"blk-excluded"})
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-excluded",
+            input_data={},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, ErrorResponse)
+    assert "cannot be run directly" in result.message
+
+
+@pytest.mark.asyncio
+async def test_prepare_block_file_ref_expansion_error() -> None:
+    """prepare_block_for_execution returns ErrorResponse when file-ref expansion fails."""
+    from backend.copilot.sdk.file_ref import FileRefExpansionError
+
+    block = _make_simple_block(properties={"text": {"type": "string"}})
+    excl_ids, excl_types = _patch_excluded()
+    with (
+        patch("backend.copilot.tools.helpers.get_block", return_value=block),
+        excl_ids,
+        excl_types,
+        patch(
+            "backend.copilot.tools.helpers.resolve_block_credentials",
+            AsyncMock(return_value=({}, [])),
+        ),
+        patch(
+            "backend.copilot.tools.helpers.expand_file_refs_in_args",
+            AsyncMock(
+                side_effect=FileRefExpansionError("@@agptfile:missing.txt not found")
+            ),
+        ),
+    ):
+        result = await prepare_block_for_execution(
+            block_id="blk-1",
+            input_data={"text": "@@agptfile:missing.txt"},
+            user_id=_PREP_USER,
+            session=_make_prep_session(),
+            session_id=_PREP_SESSION,
+        )
+    assert isinstance(result, ErrorResponse)
+    assert "file reference" in result.message.lower()
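`check_hitl_review` treats a retried call as a duplicate only when an existing review matches the synthetic node ID, is still `WAITING`, and carries an identical payload; otherwise a fresh node-exec ID is minted. The sketch below isolates that lookup and the ID shape. `PendingReview`, `find_reusable_review`, and `new_node_exec_id` are hypothetical names, the status is a plain string rather than the enum whose `.value` the real code reads, and the separator is a parameter because the value of `COPILOT_NODE_EXEC_ID_SEPARATOR` is not shown here.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Any


@dataclass
class PendingReview:
    # Hypothetical stand-in for the review rows returned by review_db().
    node_id: str
    status: str
    payload: dict[str, Any]
    node_exec_id: str


def find_reusable_review(
    reviews: list[PendingReview], node_id: str, input_data: dict[str, Any]
) -> PendingReview | None:
    """Return the first WAITING review for this node with an identical payload."""
    return next(
        (
            r
            for r in reviews
            if r.node_id == node_id
            and r.status == "WAITING"
            and r.payload == input_data
        ),
        None,
    )


def new_node_exec_id(node_id: str, separator: str) -> str:
    # Same shape as in check_hitl_review: node id + separator + 8 hex chars.
    return f"{node_id}{separator}{uuid.uuid4().hex[:8]}"


if __name__ == "__main__":
    reviews = [PendingReview("node-a", "WAITING", {"action": "delete"}, "rev-1")]
    hit = find_reusable_review(reviews, "node-a", {"action": "delete"})
    miss = find_reusable_review(reviews, "node-a", {"action": "read"})
    print(hit.node_exec_id if hit else None, miss)  # → rev-1 None
```

Matching on the full payload means a retry with even one changed argument gets a new review rather than silently reusing an approval meant for different input.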
--- a/autogpt_platform/backend/backend/copilot/tools/manage_folders.py
+++ b/autogpt_platform/backend/backend/copilot/tools/manage_folders.py
@@ -88,10 +88,7 @@ class CreateFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Create a new folder in the user's library to organize agents. "
-            "Optionally nest it inside an existing folder using parent_id."
-        )
+        return "Create a library folder. Use parent_id to nest inside another folder."

    @property
    def requires_auth(self) -> bool:
@@ -104,22 +101,19 @@ class CreateFolderTool(BaseTool):
            "properties": {
                "name": {
                    "type": "string",
-                    "description": "Name for the new folder (max 100 chars).",
+                    "description": "Folder name (max 100 chars).",
                },
                "parent_id": {
                    "type": "string",
-                    "description": (
-                        "ID of the parent folder to nest inside. "
-                        "Omit to create at root level."
-                    ),
+                    "description": "Parent folder ID (omit for root).",
                },
                "icon": {
                    "type": "string",
-                    "description": "Optional icon identifier for the folder.",
+                    "description": "Icon identifier.",
                },
                "color": {
                    "type": "string",
-                    "description": "Optional hex color code (#RRGGBB).",
+                    "description": "Hex color (#RRGGBB).",
                },
            },
            "required": ["name"],
@@ -175,13 +169,9 @@ class ListFoldersTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "List the user's library folders. "
-            "Omit parent_id to get the full folder tree. "
-            "Provide parent_id to list only direct children of that folder. "
-            "Set include_agents=true to also return the agents inside each folder "
-            "and root-level agents not in any folder. Always set include_agents=true "
-            "when the user asks about agents, wants to see what's in their folders, "
-            "or mentions agents alongside folders."
+            "List library folders. Omit parent_id for full tree. "
+            "Set include_agents=true when the user asks about agents, wants to see "
+            "what's in their folders, or mentions agents alongside folders."
        )

    @property
@@ -195,17 +185,11 @@ class ListFoldersTool(BaseTool):
            "properties": {
                "parent_id": {
                    "type": "string",
-                    "description": (
-                        "List children of this folder. "
-                        "Omit to get the full folder tree."
-                    ),
+                    "description": "List children of this folder (omit for full tree).",
                },
                "include_agents": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to include the list of agents inside each folder. "
-                        "Defaults to false."
-                    ),
+                    "description": "Include agents in each folder (default: false).",
                },
            },
            "required": [],
@@ -357,10 +341,7 @@ class MoveFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Move a folder to a different parent folder. "
-            "Set target_parent_id to null to move to root level."
-        )
+        return "Move a folder. Set target_parent_id to null for root."

    @property
    def requires_auth(self) -> bool:
@@ -373,14 +354,11 @@ class MoveFolderTool(BaseTool):
            "properties": {
                "folder_id": {
                    "type": "string",
-                    "description": "ID of the folder to move.",
+                    "description": "Folder ID.",
                },
                "target_parent_id": {
                    "type": ["string", "null"],
-                    "description": (
-                        "ID of the new parent folder. "
-                        "Use null to move to root level."
-                    ),
+                    "description": "New parent folder ID (null for root).",
                },
            },
            "required": ["folder_id"],
@@ -433,10 +411,7 @@ class DeleteFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Delete a folder from the user's library. "
-            "Agents inside the folder are moved to root level (not deleted)."
-        )
+        return "Delete a folder. Agents inside move to root (not deleted)."

    @property
    def requires_auth(self) -> bool:
@@ -499,10 +474,7 @@ class MoveAgentsToFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Move one or more agents to a folder. "
-            "Set folder_id to null to move agents to root level."
-        )
+        return "Move agents to a folder. Set folder_id to null for root."

    @property
    def requires_auth(self) -> bool:
@@ -516,13 +488,11 @@ class MoveAgentsToFolderTool(BaseTool):
                "agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": "List of library agent IDs to move.",
+                    "description": "Library agent IDs to move.",
                },
                "folder_id": {
                    "type": ["string", "null"],
-                    "description": (
-                        "Target folder ID. Use null to move to root level."
-                    ),
+                    "description": "Target folder ID (null for root).",
                },
            },
            "required": ["agent_ids"],
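The shortened folder-tool descriptions still encode three root-level rules: a `null` `target_parent_id` moves a folder to root, a `null` `folder_id` moves agents to root, and deleting a folder re-homes its agents to root instead of deleting them. An in-memory sketch of just those rules, assuming a hypothetical `Library` class with dict-based parent pointers. It does not model persistence, cycle checks, or what happens to subfolders of a deleted folder, none of which the descriptions specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Library:
    # Hypothetical model. folder_id -> parent folder_id (None means root level).
    folder_parent: dict[str, str | None] = field(default_factory=dict)
    # agent_id -> folder_id (None means root level).
    agent_folder: dict[str, str | None] = field(default_factory=dict)

    def move_folder(self, folder_id: str, target_parent_id: str | None) -> None:
        # None plays the role of target_parent_id=null: move to root.
        self.folder_parent[folder_id] = target_parent_id

    def move_agents(self, agent_ids: list[str], folder_id: str | None) -> None:
        # None plays the role of folder_id=null: move agents to root.
        for agent_id in agent_ids:
            self.agent_folder[agent_id] = folder_id

    def delete_folder(self, folder_id: str) -> None:
        # Agents inside are moved to root, not deleted.
        for agent_id, fid in list(self.agent_folder.items()):
            if fid == folder_id:
                self.agent_folder[agent_id] = None
        del self.folder_parent[folder_id]
```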
--- a/autogpt_platform/backend/backend/copilot/tools/run_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_agent.py
@@ -104,19 +104,11 @@ class RunAgentTool(BaseTool):

    @property
    def description(self) -> str:
-        return """Run or schedule an agent from the marketplace or user's library.
-
-        The tool automatically handles the setup flow:
-        - Returns missing inputs if required fields are not provided
-        - Returns missing credentials if user needs to configure them
-        - Executes immediately if all requirements are met
-        - Schedules execution if cron expression is provided
-
-        Identify the agent using either:
-        - username_agent_slug: Marketplace format 'username/agent-name'
-        - library_agent_id: ID of an agent in the user's library
-
-        For scheduled execution, provide: schedule_name, cron, and optionally timezone."""
+        return (
+            "Run or schedule an agent. Automatically checks inputs and credentials. "
+            "Identify by username_agent_slug ('user/agent') or library_agent_id. "
+            "For scheduling, provide schedule_name + cron."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
@@ -125,40 +117,38 @@ class RunAgentTool(BaseTool):
            "properties": {
                "username_agent_slug": {
                    "type": "string",
-                    "description": "Agent identifier in format 'username/agent-name'",
+                    "description": "Marketplace format 'username/agent-name'.",
                },
                "library_agent_id": {
                    "type": "string",
-                    "description": "Library agent ID from user's library",
+                    "description": "Library agent ID.",
                },
                "inputs": {
                    "type": "object",
-                    "description": "Input values for the agent",
+                    "description": "Input values for the agent.",
                    "additionalProperties": True,
                },
                "use_defaults": {
                    "type": "boolean",
-                    "description": "Set to true to run with default values (user must confirm)",
+                    "description": "Run with default values (confirm with user first).",
                },
                "schedule_name": {
                    "type": "string",
-                    "description": "Name for scheduled execution (triggers scheduling mode)",
+                    "description": "Name for scheduled execution. Providing this triggers scheduling mode (also requires cron).",
                },
                "cron": {
                    "type": "string",
-                    "description": "Cron expression (5 fields: min hour day month weekday)",
+                    "description": "Cron expression (min hour day month weekday).",
                },
                "timezone": {
                    "type": "string",
-                    "description": "IANA timezone for schedule (default: UTC)",
+                    "description": "IANA timezone (default: UTC).",
                },
                "wait_for_result": {
                    "type": "integer",
-                    "description": (
-                        "Max seconds to wait for execution to complete (0-300). "
-                        "If >0, blocks until the execution finishes or times out. "
-                        "Returns execution outputs when complete."
-                    ),
+                    "description": "Max seconds to wait for completion (0-300).",
+                    "minimum": 0,
+                    "maximum": 300,
                },
            },
            "required": [],
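The `run_agent` schema now carries its constraints partly in `minimum`/`maximum` and partly in prose: `wait_for_result` is bounded to 0-300, `schedule_name` switches to scheduling mode and also needs `cron`, and `cron` has five fields. A hedged sketch of a client-side pre-check for those rules follows. `check_run_agent_args` is hypothetical; requiring one of the two agent identifiers is an assumption drawn from the description (the schema's `required` list is empty), and the cron check only counts fields rather than parsing cron syntax.

```python
from typing import Any


def check_run_agent_args(args: dict[str, Any]) -> list[str]:
    """Return problems with run_agent arguments (empty list if none).

    Hypothetical pre-check; the real tool performs its own validation.
    """
    errors: list[str] = []

    # JSON Schema "integer" excludes booleans, so reject bool explicitly.
    wait = args.get("wait_for_result", 0)
    if not isinstance(wait, int) or isinstance(wait, bool) or not 0 <= wait <= 300:
        errors.append("wait_for_result must be an integer in [0, 300]")

    # Assumption: one identifier is needed to locate the agent.
    if not args.get("username_agent_slug") and not args.get("library_agent_id"):
        errors.append("provide username_agent_slug or library_agent_id")

    # Providing schedule_name triggers scheduling mode, which also requires cron.
    if "schedule_name" in args:
        cron = args.get("cron")
        if not cron:
            errors.append("scheduling requires cron")
        elif len(cron.split()) != 5:
            errors.append("cron must have 5 fields: min hour day month weekday")
    return errors
```

Keeping the numeric bound in `minimum`/`maximum` rather than only in the description lets any JSON Schema-aware caller enforce it without reading prose.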
--- a/autogpt_platform/backend/backend/copilot/tools/run_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_block.py
@@ -1,36 +1,19 @@
 """Tool for executing blocks directly."""

 import logging
-import uuid
 from typing import Any

-from backend.blocks import BlockType, get_block
-from backend.blocks._base import AnyBlockSchema
-from backend.copilot.constants import (
-    COPILOT_NODE_EXEC_ID_SEPARATOR,
-    COPILOT_NODE_PREFIX,
-    COPILOT_SESSION_PREFIX,
-)
+from backend.copilot.context import get_current_permissions
 from backend.copilot.model import ChatSession
-from backend.copilot.sdk.file_ref import FileRefExpansionError, expand_file_refs_in_args
-from backend.data.db_accessors import review_db
-from backend.data.execution import ExecutionContext

 from .base import BaseTool
-from .find_block import COPILOT_EXCLUDED_BLOCK_IDS, COPILOT_EXCLUDED_BLOCK_TYPES
-from .helpers import execute_block, get_inputs_from_schema, resolve_block_credentials
-from .models import (
-    BlockDetails,
-    BlockDetailsResponse,
-    ErrorResponse,
-    InputValidationErrorResponse,
-    ReviewRequiredResponse,
-    SetupInfo,
-    SetupRequirementsResponse,
-    ToolResponseBase,
-    UserReadiness,
+from .helpers import (
+    BlockPreparation,
+    check_hitl_review,
+    execute_block,
+    prepare_block_for_execution,
 )
-from .utils import build_missing_credentials_from_field_info
+from .models import BlockDetails, BlockDetailsResponse, ErrorResponse, ToolResponseBase

 logger = logging.getLogger(__name__)

@@ -45,13 +28,10 @@ class RunBlockTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Execute a specific block with the provided input data. "
-            "IMPORTANT: You MUST call find_block first to get the block's 'id' - "
-            "do NOT guess or make up block IDs. "
-            "On first attempt (without input_data), returns detailed schema showing "
-            "required inputs and outputs. Then call again with proper input_data to execute. "
-            "If a block requires human review, use continue_run_block with the "
-            "review_id after the user approves."
+            "Execute a block. IMPORTANT: Always get block_id from find_block first "
+            "— do NOT guess or fabricate IDs. "
+            "Call with empty input_data to see schema, then with data to execute. "
+            "If review_required, use continue_run_block."
        )

    @property
@@ -61,28 +41,14 @@ class RunBlockTool(BaseTool):
            "properties": {
                "block_id": {
                    "type": "string",
-                    "description": (
-                        "The block's 'id' field from find_block results. "
-                        "NEVER guess this - always get it from find_block first."
-                    ),
-                },
-                "block_name": {
-                    "type": "string",
-                    "description": (
-                        "The block's human-readable name from find_block results. "
-                        "Used for display purposes in the UI."
-                    ),
+                    "description": "Block ID from find_block results.",
                },
                "input_data": {
                    "type": "object",
-                    "description": (
-                        "Input values for the block. "
-                        "First call with empty {} to see the block's schema, "
-                        "then call again with proper values to execute."
-                    ),
+                    "description": "Input values. Use {} first to see schema.",
                },
            },
-            "required": ["block_id", "block_name", "input_data"],
+            "required": ["block_id", "input_data"],
        }

    @property
@@ -130,267 +96,85 @@ class RunBlockTool(BaseTool):
                session_id=session_id,
            )

-        # Get the block
-        block = get_block(block_id)
-        if not block:
-            return ErrorResponse(
-                message=f"Block '{block_id}' not found",
-                session_id=session_id,
-            )
-        if block.disabled:
-            return ErrorResponse(
-                message=f"Block '{block_id}' is disabled",
-                session_id=session_id,
-            )
+        logger.info("Preparing block %s for user %s", block_id, user_id)

-        # Check if block is excluded from CoPilot (graph-only blocks)
-        if (
-            block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
-            or block.id in COPILOT_EXCLUDED_BLOCK_IDS
-        ):
-            # Provide actionable guidance for blocks with dedicated tools
-            if block.block_type == BlockType.MCP_TOOL:
-                hint = (
-                    " Use the `run_mcp_tool` tool instead — it handles "
-                    "MCP server discovery, authentication, and execution."
+        prep_or_err = await prepare_block_for_execution(
+            block_id=block_id,
+            input_data=input_data,
+            user_id=user_id,
+            session=session,
+            session_id=session_id,
+        )
+        if isinstance(prep_or_err, ToolResponseBase):
+            return prep_or_err
+        prep: BlockPreparation = prep_or_err
+
+        # Check block-level permissions before execution.
+        perms = get_current_permissions()
+        if perms is not None and not perms.is_block_allowed(block_id, prep.block.name):
+            available_hint = (
+                f"Allowed identifiers: {perms.blocks!r}. "
+                if not perms.blocks_exclude and perms.blocks
+                else (
+                    f"Blocked identifiers: {perms.blocks!r}. "
+                    if perms.blocks_exclude and perms.blocks
+                    else ""
                )
-            elif block.block_type == BlockType.AGENT:
-                hint = " Use the `run_agent` tool instead."
-            else:
-                hint = " This block is designed for use within graphs only."
+            )
            return ErrorResponse(
-                message=f"Block '{block.name}' cannot be run directly.{hint}",
+                message=(
+                    f"Block '{prep.block.name}' ({block_id}) is not permitted "
+                    f"by the current execution permissions. {available_hint}"
+                    "Use find_block to discover blocks that are allowed."
+                ),
                session_id=session_id,
            )

-        logger.info(f"Executing block {block.name} ({block_id}) for user {user_id}")
-
-        (
-            matched_credentials,
-            missing_credentials,
-        ) = await resolve_block_credentials(user_id, block, input_data)
-
-        # Get block schemas for details/validation
-        try:
-            input_schema: dict[str, Any] = block.input_schema.jsonschema()
-        except Exception as e:
-            logger.warning(
-                "Failed to generate input schema for block %s: %s",
-                block_id,
-                e,
-            )
-            return ErrorResponse(
-                message=f"Block '{block.name}' has an invalid input schema",
-                error=str(e),
-                session_id=session_id,
-            )
-        try:
-            output_schema: dict[str, Any] = block.output_schema.jsonschema()
-        except Exception as e:
-            logger.warning(
-                "Failed to generate output schema for block %s: %s",
-                block_id,
-                e,
-            )
-            return ErrorResponse(
-                message=f"Block '{block.name}' has an invalid output schema",
-                error=str(e),
-                session_id=session_id,
-            )
-
-        # Expand @@agptfile: refs in input_data with the block's input
-        # schema.  The generic _truncating wrapper skips opaque object
-        # properties (input_data has no declared inner properties in the
-        # tool schema), so file ref tokens are still intact here.
-        # Using the block's schema lets us return raw text for string-typed
-        # fields and parsed structures for list/dict-typed fields.
-        if input_data:
+        # Show block details when required inputs are not yet provided.
+        # This is run_block's two-step UX: first call returns the schema,
+        # second call (with inputs) actually executes.
+        if not (prep.required_non_credential_keys <= prep.provided_input_keys):
            try:
-                input_data = await expand_file_refs_in_args(
-                    input_data,
-                    user_id,
-                    session,
-                    input_schema=input_schema,
+                output_schema: dict[str, Any] = prep.block.output_schema.jsonschema()
+            except Exception as e:
+                logger.warning(
+                    "Failed to generate output schema for block %s: %s", block_id, e
                )
-            except FileRefExpansionError as exc:
                return ErrorResponse(
-                    message=(
-                        f"Failed to resolve file reference: {exc}. "
-                        "Ensure the file exists before referencing it."
-                    ),
+                    message=f"Block '{prep.block.name}' has an invalid output schema",
+                    error=str(e),
                    session_id=session_id,
                )

-        if missing_credentials:
-            # Return setup requirements response with missing credentials
-            credentials_fields_info = block.input_schema.get_credentials_fields_info()
-            missing_creds_dict = build_missing_credentials_from_field_info(
-                credentials_fields_info, set(matched_credentials.keys())
-            )
-            missing_creds_list = list(missing_creds_dict.values())
-
-            return SetupRequirementsResponse(
-                message=(
-                    f"Block '{block.name}' requires credentials that are not configured. "
-                    "Please set up the required credentials before running this block."
-                ),
-                session_id=session_id,
-                setup_info=SetupInfo(
-                    agent_id=block_id,
-                    agent_name=block.name,
-                    user_readiness=UserReadiness(
-                        has_all_credentials=False,
-                        missing_credentials=missing_creds_dict,
-                        ready_to_run=False,
-                    ),
-                    requirements={
-                        "credentials": missing_creds_list,
-                        "inputs": self._get_inputs_list(block),
-                        "execution_modes": ["immediate"],
-                    },
-                ),
-                graph_id=None,
-                graph_version=None,
-            )
-
-        # Check if this is a first attempt (required inputs missing)
-        # Return block details so user can see what inputs are needed
-        credentials_fields = set(block.input_schema.get_credentials_fields().keys())
-        required_keys = set(input_schema.get("required", []))
-        required_non_credential_keys = required_keys - credentials_fields
-        provided_input_keys = set(input_data.keys()) - credentials_fields
-
-        # Check for unknown input fields
-        valid_fields = (
-            set(input_schema.get("properties", {}).keys()) - credentials_fields
-        )
-        unrecognized_fields = provided_input_keys - valid_fields
-        if unrecognized_fields:
-            return InputValidationErrorResponse(
-                message=(
-                    f"Unknown input field(s) provided: {', '.join(sorted(unrecognized_fields))}. "
-                    f"Block was not executed. Please use the correct field names from the schema."
-                ),
-                session_id=session_id,
-                unrecognized_fields=sorted(unrecognized_fields),
-                inputs=input_schema,
-            )
-
-        # Show details when not all required non-credential inputs are provided
-        if not (required_non_credential_keys <= provided_input_keys):
-            # Get credentials info for the response
-            credentials_meta = []
-            for field_name, cred_meta in matched_credentials.items():
-                credentials_meta.append(cred_meta)
-
+            credentials_meta = list(prep.matched_credentials.values())
            return BlockDetailsResponse(
                message=(
-                    f"Block '{block.name}' details. "
+                    f"Block '{prep.block.name}' details. "
                    "Provide input_data matching the inputs schema to execute the block."
                ),
                session_id=session_id,
                block=BlockDetails(
                    id=block_id,
-                    name=block.name,
-                    description=block.description or "",
-                    inputs=input_schema,
+                    name=prep.block.name,
+                    description=prep.block.description or "",
+                    inputs=prep.input_schema,
                    outputs=output_schema,
                    credentials=credentials_meta,
                ),
                user_authenticated=True,
            )

-        # Generate synthetic IDs for CoPilot context.
-        # Encode node_id in node_exec_id so it can be extracted later
-        # (e.g. for auto-approve, where we need node_id but have no NodeExecution row).
-        synthetic_graph_id = f"{COPILOT_SESSION_PREFIX}{session.session_id}"
-        synthetic_node_id = f"{COPILOT_NODE_PREFIX}{block_id}"
-
-        # Check for an existing WAITING review for this block with the same input.
-        # If the LLM retries run_block with identical input, we reuse the existing
-        # review instead of creating duplicates. Different inputs = new execution.
-        existing_reviews = await review_db().get_pending_reviews_for_execution(
-            synthetic_graph_id, user_id
-        )
-        existing_review = next(
-            (
-                r
-                for r in existing_reviews
-                if r.node_id == synthetic_node_id
-                and r.status.value == "WAITING"
-                and r.payload == input_data
-            ),
-            None,
-        )
-        if existing_review:
-            return ReviewRequiredResponse(
-                message=(
-                    f"Block '{block.name}' requires human review. "
-                    f"After the user approves, call continue_run_block with "
-                    f"review_id='{existing_review.node_exec_id}' to execute."
-                ),
-                session_id=session_id,
-                block_id=block_id,
-                block_name=block.name,
-                review_id=existing_review.node_exec_id,
-                graph_exec_id=synthetic_graph_id,
-                input_data=input_data,
-            )
-
-        synthetic_node_exec_id = (
-            f"{synthetic_node_id}{COPILOT_NODE_EXEC_ID_SEPARATOR}"
-            f"{uuid.uuid4().hex[:8]}"
-        )
-
-        # Check for HITL review before execution.
-        # This creates the review record in the DB for CoPilot flows.
-        review_context = ExecutionContext(
-            user_id=user_id,
-            graph_id=synthetic_graph_id,
-            graph_exec_id=synthetic_graph_id,
-            graph_version=1,
-            node_id=synthetic_node_id,
-            node_exec_id=synthetic_node_exec_id,
-            sensitive_action_safe_mode=True,
-        )
-        should_pause, input_data = await block.is_block_exec_need_review(
-            input_data,
-            user_id=user_id,
-            node_id=synthetic_node_id,
-            node_exec_id=synthetic_node_exec_id,
-            graph_exec_id=synthetic_graph_id,
-            graph_id=synthetic_graph_id,
-            graph_version=1,
-            execution_context=review_context,
-            is_graph_execution=False,
-        )
-        if should_pause:
-            return ReviewRequiredResponse(
-                message=(
-                    f"Block '{block.name}' requires human review. "
-                    f"After the user approves, call continue_run_block with "
-                    f"review_id='{synthetic_node_exec_id}' to execute."
-                ),
-                session_id=session_id,
-                block_id=block_id,
-                block_name=block.name,
-                review_id=synthetic_node_exec_id,
-                graph_exec_id=synthetic_graph_id,
-                input_data=input_data,
-            )
+        hitl_or_err = await check_hitl_review(prep, user_id, session_id)
+        if isinstance(hitl_or_err, ToolResponseBase):
+            return hitl_or_err
+        synthetic_node_exec_id, input_data = hitl_or_err

        return await execute_block(
-            block=block,
+            block=prep.block,
            block_id=block_id,
            input_data=input_data,
            user_id=user_id,
            session_id=session_id,
            node_exec_id=synthetic_node_exec_id,
-            matched_credentials=matched_credentials,
+            matched_credentials=prep.matched_credentials,
        )
-
-    def _get_inputs_list(self, block: AnyBlockSchema) -> list[dict[str, Any]]:
-        """Extract non-credential inputs from block schema."""
-        schema = block.input_schema.jsonschema()
-        credentials_fields = set(block.input_schema.get_credentials_fields().keys())
-        return get_inputs_from_schema(schema, exclude_fields=credentials_fields)
--- a/autogpt_platform/backend/backend/copilot/tools/run_block_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_block_test.py
@@ -5,6 +5,8 @@ from unittest.mock import AsyncMock, MagicMock, patch
 import pytest

 from backend.blocks._base import BlockType
+from backend.copilot.context import _current_permissions
+from backend.copilot.permissions import CopilotPermissions

 from ._test_data import make_session
 from .models import (
@@ -92,7 +94,7 @@ class TestRunBlockFiltering:
        input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.copilot.tools.helpers.get_block",
            return_value=input_block,
        ):
            tool = RunBlockTool()
@@ -109,29 +111,92 @@ class TestRunBlockFiltering:

    @pytest.mark.asyncio(loop_scope="session")
    async def test_excluded_block_id_returns_error(self):
-        """Attempting to execute SmartDecisionMakerBlock returns error."""
+        """Attempting to execute OrchestratorBlock returns error."""
        session = make_session(user_id=_TEST_USER_ID)

-        smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
+        orchestrator_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
        smart_block = make_mock_block(
-            smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
+            orchestrator_id, "Orchestrator", BlockType.STANDARD
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.copilot.tools.helpers.get_block",
            return_value=smart_block,
        ):
            tool = RunBlockTool()
            response = await tool._execute(
                user_id=_TEST_USER_ID,
                session=session,
-                block_id=smart_decision_id,
+                block_id=orchestrator_id,
                input_data={},
            )

        assert isinstance(response, ErrorResponse)
        assert "cannot be run directly" in response.message

+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_block_denied_by_permissions_returns_error(self):
+        """A block denied by CopilotPermissions returns an ErrorResponse."""
+        session = make_session(user_id=_TEST_USER_ID)
+        block_id = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
+        standard_block = make_mock_block(block_id, "HTTP Request", BlockType.STANDARD)
+
+        perms = CopilotPermissions(blocks=[block_id], blocks_exclude=True)
+        token = _current_permissions.set(perms)
+        try:
+            with patch(
+                "backend.copilot.tools.helpers.get_block",
+                return_value=standard_block,
+            ):
+                tool = RunBlockTool()
+                response = await tool._execute(
+                    user_id=_TEST_USER_ID,
+                    session=session,
+                    block_id=block_id,
+                    input_data={},
+                )
+        finally:
+            _current_permissions.reset(token)
+
+        assert isinstance(response, ErrorResponse)
+        assert "not permitted" in response.message
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_allowed_by_permissions_passes_guard(self):
+        """A block explicitly allowed by a whitelist CopilotPermissions passes the guard."""
+        session = make_session(user_id=_TEST_USER_ID)
+        block_id = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
+        standard_block = make_mock_block(block_id, "HTTP Request", BlockType.STANDARD)
+
+        perms = CopilotPermissions(blocks=[block_id], blocks_exclude=False)
+        token = _current_permissions.set(perms)
+        try:
+            with (
+                patch(
+                    "backend.copilot.tools.helpers.get_block",
+                    return_value=standard_block,
+                ),
+                patch(
+                    "backend.copilot.tools.helpers.match_credentials_to_requirements",
+                    return_value=({}, []),
+                ),
+            ):
+                tool = RunBlockTool()
+                response = await tool._execute(
+                    user_id=_TEST_USER_ID,
+                    session=session,
+                    block_id=block_id,
+                    input_data={},
+                )
+        finally:
+            _current_permissions.reset(token)
+
+        # Must NOT be blocked by permissions — assert it's not a permission error
+        assert (
+            not isinstance(response, ErrorResponse)
+            or "not permitted" not in response.message
+        )
+
    @pytest.mark.asyncio(loop_scope="session")
    async def test_non_excluded_block_passes_guard(self):
        """Non-excluded blocks pass the filtering guard (may fail later for other reasons)."""
@@ -143,7 +208,7 @@ class TestRunBlockFiltering:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=standard_block,
            ),
            patch(
@@ -200,7 +265,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -243,7 +308,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -289,7 +354,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -337,7 +402,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -381,7 +446,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -435,7 +500,7 @@ class TestRunBlockSensitiveAction:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -491,7 +556,7 @@ class TestRunBlockSensitiveAction:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -545,7 +610,7 @@ class TestRunBlockSensitiveAction:

        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.copilot.tools.helpers.get_block",
                return_value=mock_block,
            ),
            patch(
--- a/autogpt_platform/backend/backend/copilot/tools/run_mcp_tool.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_mcp_tool.py
@@ -57,10 +57,9 @@ class RunMCPToolTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Connect to an MCP (Model Context Protocol) server to discover and execute its tools. "
-            "Two-step: (1) call with server_url to list available tools, "
-            "(2) call again with server_url + tool_name + tool_arguments to execute. "
-            "Call get_mcp_guide for known server URLs and auth details."
+            "Discover and execute MCP server tools. "
+            "Call with server_url only to list tools, then with tool_name + tool_arguments to execute. "
+            "Call get_mcp_guide first for server URLs and auth."
        )

    @property
@@ -70,24 +69,15 @@ class RunMCPToolTool(BaseTool):
            "properties": {
                "server_url": {
                    "type": "string",
-                    "description": (
-                        "URL of the MCP server (Streamable HTTP endpoint), "
-                        "e.g. https://mcp.example.com/mcp"
-                    ),
+                    "description": "MCP server URL (Streamable HTTP endpoint).",
                },
                "tool_name": {
                    "type": "string",
-                    "description": (
-                        "Name of the MCP tool to execute. "
-                        "Omit on first call to discover available tools."
-                    ),
+                    "description": "Tool to execute. Omit to discover available tools.",
                },
                "tool_arguments": {
                    "type": "object",
-                    "description": (
-                        "Arguments to pass to the selected tool. "
-                        "Must match the tool's input schema returned during discovery."
-                    ),
+                    "description": "Arguments matching the tool's input schema.",
                },
            },
            "required": ["server_url"],
--- a/autogpt_platform/backend/backend/copilot/tools/search_docs.py
+++ b/autogpt_platform/backend/backend/copilot/tools/search_docs.py
@@ -38,11 +38,7 @@ class SearchDocsTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Search the AutoGPT platform documentation for information about "
-            "how to use the platform, build agents, configure blocks, and more. "
-            "Returns relevant documentation sections. Use get_doc_page to read full content."
-        )
+        return "Search platform documentation by keyword. Use get_doc_page to read full results."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -51,10 +47,7 @@ class SearchDocsTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find relevant documentation. "
-                        "Use natural language to describe what you're looking for."
-                    ),
+                    "description": "Documentation search query.",
                },
            },
            "required": ["query"],
--- a/autogpt_platform/backend/backend/copilot/tools/test_run_block_details.py
+++ b/autogpt_platform/backend/backend/copilot/tools/test_run_block_details.py
@@ -61,12 +61,12 @@ async def test_run_block_returns_details_when_no_input_provided():
    )

    with patch(
-        "backend.copilot.tools.run_block.get_block",
+        "backend.copilot.tools.helpers.get_block",
        return_value=http_block,
    ):
        # Mock credentials check to return no missing credentials
        with patch(
-            "backend.copilot.tools.run_block.resolve_block_credentials",
+            "backend.copilot.tools.helpers.resolve_block_credentials",
            new_callable=AsyncMock,
            return_value=({}, []),  # (matched_credentials, missing_credentials)
        ):
@@ -119,11 +119,11 @@ async def test_run_block_returns_details_when_only_credentials_provided():
    }

    with patch(
-        "backend.copilot.tools.run_block.get_block",
+        "backend.copilot.tools.helpers.get_block",
        return_value=mock,
    ):
        with patch(
-            "backend.copilot.tools.run_block.resolve_block_credentials",
+            "backend.copilot.tools.helpers.resolve_block_credentials",
            new_callable=AsyncMock,
            return_value=(
                {
--- a/autogpt_platform/backend/backend/copilot/tools/tool_schema_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/tool_schema_test.py
@@ -0,0 +1,119 @@
+"""Schema regression tests for all registered CoPilot tools.
+
+Validates that every tool in TOOL_REGISTRY produces a well-formed schema:
+- description is non-empty
+- all `required` fields exist in `properties`
+- every property has a `type` and `description`
+- total schema character budget does not regress past threshold
+"""
+
+import json
+from typing import Any, cast
+
+import pytest
+
+from backend.copilot.tools import TOOL_REGISTRY
+
+# Character budget (~4 chars/token heuristic, targeting ~8000 tokens)
+_CHAR_BUDGET = 32_000
+
+
+@pytest.fixture(scope="module")
+def all_tool_schemas() -> list[tuple[str, Any]]:
+    """Return (tool_name, openai_schema) pairs for every registered tool."""
+    return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
+
+
+def _get_parametrize_data() -> list[tuple[str, Any]]:
+    """Build parametrize data at collection time."""
+    return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
+
+
+@pytest.mark.parametrize(
+    "tool_name,schema",
+    _get_parametrize_data(),
+    ids=list(TOOL_REGISTRY.keys()),
+)
+class TestToolSchema:
+    """Validate schema invariants for every registered tool."""
+
+    def test_description_non_empty(self, tool_name: str, schema: dict) -> None:
+        desc = schema["function"].get("description", "")
+        assert desc, f"Tool '{tool_name}' has an empty description"
+
+    def test_required_fields_exist_in_properties(
+        self, tool_name: str, schema: dict
+    ) -> None:
+        params = schema["function"].get("parameters", {})
+        properties = params.get("properties", {})
+        required = params.get("required", [])
+        for field in required:
+            assert field in properties, (
+                f"Tool '{tool_name}': required field '{field}' "
+                f"not found in properties {list(properties.keys())}"
+            )
+
+    def test_every_property_has_type_and_description(
+        self, tool_name: str, schema: dict
+    ) -> None:
+        params = schema["function"].get("parameters", {})
+        properties = params.get("properties", {})
+        for prop_name, prop_def in properties.items():
+            assert (
+                "type" in prop_def
+            ), f"Tool '{tool_name}', property '{prop_name}' is missing 'type'"
+            assert (
+                "description" in prop_def
+            ), f"Tool '{tool_name}', property '{prop_name}' is missing 'description'"
+
+
+def test_browser_act_action_enum_complete() -> None:
+    """Assert browser_act action enum still contains all 14 supported actions.
+
+    This prevents future PRs from accidentally dropping actions during description
+    trimming. The enum is the authoritative list — this locks it at 14 values.
+    """
+    tool = TOOL_REGISTRY["browser_act"]
+    schema = tool.as_openai_tool()
+    fn_def = schema["function"]
+    params = cast(dict[str, Any], fn_def.get("parameters", {}))
+    actions = params["properties"]["action"]["enum"]
+    expected = {
+        "click",
+        "dblclick",
+        "fill",
+        "type",
+        "scroll",
+        "hover",
+        "press",
+        "check",
+        "uncheck",
+        "select",
+        "wait",
+        "back",
+        "forward",
+        "reload",
+    }
+    assert set(actions) == expected, (
+        f"browser_act action enum changed. Got {set(actions)}, expected {expected}. "
+        "If you added/removed an action, update this test intentionally."
+    )
+
+
+def test_total_schema_char_budget() -> None:
+    """Assert total tool schema size stays under the character budget.
+
+    This locks in the 34% token reduction from #12398 and prevents future
+    description bloat from eroding the gains. Uses character count with a
+    ~4 chars/token heuristic (budget of 32000 chars ≈ 8000 tokens).
+    Character count is tokenizer-agnostic — no dependency on GPT or Claude
+    tokenizers — while still providing a stable regression gate.
+    """
+    schemas = [tool.as_openai_tool() for tool in TOOL_REGISTRY.values()]
+    serialized = json.dumps(schemas)
+    total_chars = len(serialized)
+    assert total_chars < _CHAR_BUDGET, (
+        f"Tool schemas use {total_chars} chars (~{total_chars // 4} tokens), "
+        f"exceeding budget of {_CHAR_BUDGET} chars (~{_CHAR_BUDGET // 4} tokens). "
+        "Description bloat detected — trim descriptions or raise the budget intentionally."
+    )
--- a/autogpt_platform/backend/backend/copilot/tools/validate_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/validate_agent.py
@@ -22,17 +22,9 @@ class ValidateAgentGraphTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Validate an agent JSON graph for correctness. Checks:\n"
-            "- All block_ids reference real blocks\n"
-            "- All links reference valid source/sink nodes and fields\n"
-            "- Required input fields are wired or have defaults\n"
-            "- Data types are compatible across links\n"
-            "- Nested sink links use correct notation\n"
-            "- Prompt templates use proper curly brace escaping\n"
-            "- AgentExecutorBlock configurations are valid\n\n"
-            "Call this after generating agent JSON to verify correctness. "
-            "If validation fails, either fix issues manually based on the error "
-            "descriptions, or call fix_agent_graph to auto-fix common problems."
+            "Validate agent JSON for correctness: block_ids, links, required fields, "
+            "type compatibility, nested sink notation, prompt brace escaping, "
+            "and AgentExecutorBlock configs. On failure, use fix_agent_graph to auto-fix."
        )

    @property
@@ -46,11 +38,7 @@ class ValidateAgentGraphTool(BaseTool):
            "properties": {
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "The agent JSON to validate. Must contain 'nodes' and 'links' arrays. "
-                        "Each node needs: id (UUID), block_id, input_default, metadata. "
-                        "Each link needs: id (UUID), source_id, source_name, sink_id, sink_name."
-                    ),
+                    "description": "Agent JSON with 'nodes' and 'links' arrays.",
                },
            },
            "required": ["agent_json"],
--- a/autogpt_platform/backend/backend/copilot/tools/web_fetch.py
+++ b/autogpt_platform/backend/backend/copilot/tools/web_fetch.py
@@ -59,13 +59,7 @@ class WebFetchTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Fetch the content of a public web page by URL. "
-            "Returns readable text extracted from HTML by default. "
-            "Useful for reading documentation, articles, and API responses. "
-            "Only supports HTTP/HTTPS GET requests to public URLs "
-            "(private/internal network addresses are blocked)."
-        )
+        return "Fetch a public web page. Public URLs only — internal addresses blocked. Returns readable text from HTML by default."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -74,14 +68,11 @@ class WebFetchTool(BaseTool):
            "properties": {
                "url": {
                    "type": "string",
-                    "description": "The public HTTP/HTTPS URL to fetch.",
+                    "description": "Public HTTP/HTTPS URL.",
                },
                "extract_text": {
                    "type": "boolean",
-                    "description": (
-                        "If true (default), extract readable text from HTML. "
-                        "If false, return raw content."
-                    ),
+                    "description": "Extract text from HTML (default: true).",
                    "default": True,
                },
            },
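The new WebFetchTool description keeps the constraint "Public URLs only, internal addresses blocked" but does not say how that is decided. The sketch below shows one common way to implement such a guard with only the standard library. `is_public_http_url` is hypothetical and is not taken from this codebase. It accepts only `http`/`https`, resolves the hostname, and rejects the URL if any resolved address is not globally routable. Resolving before fetching still leaves a DNS-rebinding window, so a production fetcher would also pin the resolved address.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url: str) -> bool:
    """Hypothetical pre-flight check: HTTP(S) scheme and only global IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        # parts.port raises ValueError for malformed ports; keep it inside the try.
        infos = socket.getaddrinfo(parts.hostname, parts.port or None)
    except (socket.gaierror, ValueError):
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        # is_global is False for loopback, RFC 1918 ranges, link-local
        # (e.g. the 169.254.169.254 cloud metadata address), and similar.
        if not ip.is_global:
            return False
    return True
```

Rejecting the URL when *any* resolved address is private is deliberately conservative, because a hostname with mixed public and private records could otherwise be used to reach internal services.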
--- a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
+++ b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
@@ -27,6 +27,8 @@ from .models import ErrorResponse, ResponseType, ToolResponseBase

 logger = logging.getLogger(__name__)

+_MAX_FILE_SIZE_MB = Config().max_file_size_mb
+
 # Sentinel file_id used when a tool-result file is read directly from the local
 # host filesystem (rather than from workspace storage).
 _LOCAL_TOOL_RESULT_FILE_ID = "local"
@@ -415,13 +417,7 @@ class ListWorkspaceFilesTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "List files in the user's persistent workspace (cloud storage). "
-            "These files survive across sessions. "
-            "For ephemeral session files, use the SDK Read/Glob tools instead. "
-            "Returns file names, paths, sizes, and metadata. "
-            "Optionally filter by path prefix."
-        )
+        return "List persistent workspace files. For ephemeral session files, use SDK Glob/Read instead. Optionally filter by path prefix."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -430,24 +426,17 @@ class ListWorkspaceFilesTool(BaseTool):
            "properties": {
                "path_prefix": {
                    "type": "string",
-                    "description": (
-                        "Optional path prefix to filter files "
-                        "(e.g., '/documents/' to list only files in documents folder). "
-                        "By default, only files from the current session are listed."
-                    ),
+                    "description": "Filter by path prefix (e.g. '/documents/').",
                },
                "limit": {
                    "type": "integer",
-                    "description": "Maximum number of files to return (default 50, max 100)",
+                    "description": "Max files to return (default 50, max 100).",
                    "minimum": 1,
                    "maximum": 100,
                },
                "include_all_sessions": {
                    "type": "boolean",
-                    "description": (
-                        "If true, list files from all sessions. "
-                        "Default is false (only current session's files)."
-                    ),
+                    "description": "Include files from all sessions (default: false).",
                },
            },
            "required": [],
@@ -530,18 +519,11 @@ class ReadWorkspaceFileTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Read a file from the user's persistent workspace (cloud storage). "
-            "These files survive across sessions. "
-            "For ephemeral session files, use the SDK Read tool instead. "
-            "Specify either file_id or path to identify the file. "
-            "For small text files, returns content directly. "
-            "For large or binary files, returns metadata and a download URL. "
-            "Use 'save_to_path' to copy the file to the working directory "
-            "(sandbox or ephemeral) for processing with bash_exec or file tools. "
-            "Use 'offset' and 'length' for paginated reads of large files "
-            "(e.g., persisted tool outputs). "
-            "Paths are scoped to the current session by default. "
-            "Use /sessions/<session_id>/... for cross-session access."
+            "Read a file from persistent workspace. Specify file_id or path. "
+            "Small text/image files return inline; large/binary return metadata+URL. "
+            "Use save_to_path to copy to working dir for processing. "
+            "Use offset/length for paginated reads. "
+            "Paths scoped to current session; use /sessions/<id>/... for cross-session access."
        )

    @property
@@ -551,48 +533,30 @@ class ReadWorkspaceFileTool(BaseTool):
            "properties": {
                "file_id": {
                    "type": "string",
-                    "description": "The file's unique ID (from list_workspace_files)",
+                    "description": "File ID from list_workspace_files.",
                },
                "path": {
                    "type": "string",
-                    "description": (
-                        "The virtual file path (e.g., '/documents/report.pdf'). "
-                        "Scoped to current session by default."
-                    ),
+                    "description": "Virtual file path (e.g. '/documents/report.pdf').",
                },
                "save_to_path": {
                    "type": "string",
-                    "description": (
-                        "If provided, save the file to this path in the working "
-                        "directory (cloud sandbox when E2B is active, or "
-                        "ephemeral dir otherwise) so it can be processed with "
-                        "bash_exec or file tools. "
-                        "The file content is still returned in the response."
-                    ),
+                    "description": "Copy file to this working directory path for processing.",
                },
                "force_download_url": {
                    "type": "boolean",
-                    "description": (
-                        "If true, always return metadata+URL instead of inline content. "
-                        "Default is false (auto-selects based on file size/type)."
-                    ),
+                    "description": "Always return metadata+URL instead of inline content.",
                },
                "offset": {
                    "type": "integer",
-                    "description": (
-                        "Character offset to start reading from (0-based). "
-                        "Use with 'length' for paginated reads of large files."
-                    ),
+                    "description": "Character offset for paginated reads (0-based).",
                },
                "length": {
                    "type": "integer",
-                    "description": (
-                        "Maximum number of characters to return. "
-                        "Defaults to full file. Use with 'offset' for paginated reads."
-                    ),
+                    "description": "Max characters to return for paginated reads.",
                },
            },
-            "required": [],  # At least one must be provided
+            "required": [],  # At least one of file_id or path must be provided
        }

    @property
@@ -755,15 +719,10 @@ class WriteWorkspaceFileTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Write or create a file in the user's persistent workspace (cloud storage). "
-            "These files survive across sessions. "
-            "For ephemeral session files, use the SDK Write tool instead. "
-            "Provide content as plain text via 'content', OR base64-encoded via "
-            "'content_base64', OR copy a file from the ephemeral working directory "
-            "via 'source_path'. Exactly one of these three is required. "
-            f"Maximum file size is {Config().max_file_size_mb}MB. "
-            "Files are saved to the current session's folder by default. "
-            "Use /sessions/<session_id>/... for cross-session access."
+            "Write a file to persistent workspace (survives across sessions). "
+            "Provide exactly one of: content (text), content_base64 (binary), "
+            f"or source_path (copy from working dir). Max {_MAX_FILE_SIZE_MB}MB. "
+            "Paths scoped to current session; use /sessions/<id>/... for cross-session access."
        )

    @property
@@ -773,51 +732,31 @@ class WriteWorkspaceFileTool(BaseTool):
            "properties": {
                "filename": {
                    "type": "string",
-                    "description": "Name for the file (e.g., 'report.pdf')",
+                    "description": "Filename (e.g. 'report.pdf').",
                },
                "content": {
                    "type": "string",
-                    "description": (
-                        "Plain text content to write. Use this for text files "
-                        "(code, configs, documents, etc.). "
-                        "Mutually exclusive with content_base64 and source_path."
-                    ),
+                    "description": "Plain text content. Mutually exclusive with content_base64/source_path.",
                },
                "content_base64": {
                    "type": "string",
-                    "description": (
-                        "Base64-encoded file content. Use this for binary files "
-                        "(images, PDFs, etc.). "
-                        "Mutually exclusive with content and source_path."
-                    ),
+                    "description": "Base64-encoded binary content. Mutually exclusive with content/source_path.",
                },
                "source_path": {
                    "type": "string",
-                    "description": (
-                        "Path to a file in the ephemeral working directory to "
-                        "copy to workspace (e.g., '/tmp/copilot-.../output.csv'). "
-                        "Use this to persist files created by bash_exec or SDK Write. "
-                        "Mutually exclusive with content and content_base64."
-                    ),
+                    "description": "Working directory path to copy to workspace. Mutually exclusive with content/content_base64.",
                },
                "path": {
                    "type": "string",
-                    "description": (
-                        "Optional virtual path where to save the file "
-                        "(e.g., '/documents/report.pdf'). "
-                        "Defaults to '/{filename}'. Scoped to current session."
-                    ),
+                    "description": "Virtual path (e.g. '/documents/report.pdf'). Defaults to '/{filename}'.",
                },
                "mime_type": {
                    "type": "string",
-                    "description": (
-                        "Optional MIME type of the file. "
-                        "Auto-detected from filename if not provided."
-                    ),
+                    "description": "MIME type. Auto-detected from filename if omitted.",
                },
                "overwrite": {
                    "type": "boolean",
-                    "description": "Whether to overwrite if file exists at path (default: false)",
+                    "description": "Overwrite if file exists (default: false).",
                },
            },
            "required": ["filename"],
@@ -859,10 +798,10 @@ class WriteWorkspaceFileTool(BaseTool):
            return resolved
        content: bytes = resolved

-        max_size = Config().max_file_size_mb * 1024 * 1024
+        max_size = _MAX_FILE_SIZE_MB * 1024 * 1024
        if len(content) > max_size:
            return ErrorResponse(
-                message=f"File too large. Maximum size is {Config().max_file_size_mb}MB",
+                message=f"File too large. Maximum size is {_MAX_FILE_SIZE_MB}MB",
                session_id=session_id,
            )

@@ -944,12 +883,7 @@ class DeleteWorkspaceFileTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Delete a file from the user's persistent workspace (cloud storage). "
-            "Specify either file_id or path to identify the file. "
-            "Paths are scoped to the current session by default. "
-            "Use /sessions/<session_id>/... for cross-session access."
-        )
+        return "Delete a file from persistent workspace. Specify file_id or path. Paths scoped to current session; use /sessions/<id>/... for cross-session access."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -958,17 +892,14 @@ class DeleteWorkspaceFileTool(BaseTool):
            "properties": {
                "file_id": {
                    "type": "string",
-                    "description": "The file's unique ID (from list_workspace_files)",
+                    "description": "File ID from list_workspace_files.",
                },
                "path": {
                    "type": "string",
-                    "description": (
-                        "The virtual file path (e.g., '/documents/report.pdf'). "
-                        "Scoped to current session by default."
-                    ),
+                    "description": "Virtual file path.",
                },
            },
-            "required": [],  # At least one must be provided
+            "required": [],  # At least one of file_id or path must be provided
        }

    @property
--- a/autogpt_platform/backend/backend/data/block_cost_config.py
+++ b/autogpt_platform/backend/backend/data/block_cost_config.py
@@ -32,9 +32,9 @@ from backend.blocks.llm import (
    AITextSummarizerBlock,
    LlmModel,
 )
+from backend.blocks.orchestrator import OrchestratorBlock
 from backend.blocks.replicate.flux_advanced import ReplicateFluxAdvancedModelBlock
 from backend.blocks.replicate.replicate_block import ReplicateModelBlock
-from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
 from backend.blocks.talking_head import CreateTalkingAvatarVideoBlock
 from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock
 from backend.blocks.video.narration import VideoNarrationBlock
@@ -548,7 +548,6 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
            },
        )
    ],
-    SmartDecisionMakerBlock: LLM_COST,
    SearchOrganizationsBlock: [
        BlockCost(
            cost_amount=2,
@@ -700,6 +699,7 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
            },
        ),
    ],
+    OrchestratorBlock: LLM_COST,
    VideoNarrationBlock: [
        BlockCost(
            cost_amount=5,  # ElevenLabs TTS cost
--- a/autogpt_platform/backend/backend/data/db.py
+++ b/autogpt_platform/backend/backend/data/db.py
@@ -38,6 +38,10 @@ POOL_TIMEOUT = os.getenv("DB_POOL_TIMEOUT")
 if POOL_TIMEOUT:
    DATABASE_URL = add_param(DATABASE_URL, "pool_timeout", POOL_TIMEOUT)

+STMT_CACHE_SIZE = os.getenv("DB_STATEMENT_CACHE_SIZE")
+if STMT_CACHE_SIZE:
+    DATABASE_URL = add_param(DATABASE_URL, "statement_cache_size", STMT_CACHE_SIZE)
+
 HTTP_TIMEOUT = int(POOL_TIMEOUT) if POOL_TIMEOUT else None

 prisma = Prisma(
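The new `DB_STATEMENT_CACHE_SIZE` variable is appended to `DATABASE_URL` as `statement_cache_size`. According to Prisma's connection-URL documentation, this parameter sizes the prepared-statement cache, and `0` disables it. Because `if STMT_CACHE_SIZE:` tests the raw string, the value `"0"` is non-empty and truthy, so it is forwarded. The sketch below shows that behaviour. `add_param` here is a hypothetical stand-in for the project's helper, whose implementation is not part of this diff. The example URL is also made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_param(url: str, key: str, value: str) -> str:
    """Hypothetical stand-in for db.py's add_param: set one query parameter."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[key] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_database_url(base_url: str, env: dict[str, str]) -> str:
    url = base_url
    stmt_cache_size = env.get("DB_STATEMENT_CACHE_SIZE")
    # A non-empty string is truthy, so "0" is forwarded as well.
    if stmt_cache_size:
        url = add_param(url, "statement_cache_size", stmt_cache_size)
    return url


base = "postgresql://user:pw@localhost:5432/app?schema=platform"
print(build_database_url(base, {"DB_STATEMENT_CACHE_SIZE": "0"}))
# postgresql://user:pw@localhost:5432/app?schema=platform&statement_cache_size=0
print(build_database_url(base, {}))
# postgresql://user:pw@localhost:5432/app?schema=platform
```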
--- a/autogpt_platform/backend/backend/data/execution_outputs_test.py
+++ b/autogpt_platform/backend/backend/data/execution_outputs_test.py
@@ -7,7 +7,7 @@ the function returns plain values instead of lists, it causes:
    1 validation error for dict[str,list[any]] response
    Input should be a valid list [type=list_type, input_value='', input_type=str]

-This breaks SmartDecisionMakerBlock agent mode tool execution.
+This breaks OrchestratorBlock agent mode tool execution.
 """

 from unittest.mock import AsyncMock, MagicMock, patch
--- a/autogpt_platform/backend/backend/data/graph.py
+++ b/autogpt_platform/backend/backend/data/graph.py
@@ -737,7 +737,7 @@ class GraphModel(Graph, GraphMeta):
        # Collect errors per node
        node_errors: dict[str, dict[str, str]] = defaultdict(dict)

-        # Validate smart decision maker nodes
+        # Validate tool orchestrator nodes
        nodes_block = {
            node.id: block
            for node in graph.nodes
@@ -1207,13 +1207,9 @@ async def get_graph_as_admin(
        order={"version": "desc"},
    )

-    # For access, the graph must be owned by the user or listed in the store
-    if graph is None or (
-        graph.userId != user_id
-        and not await is_graph_published_in_marketplace(
-            graph_id, version or graph.version
-        )
-    ):
+    # Admin access bypasses ownership and marketplace checks — route-level
+    # auth already ensures only admins can call this function.
+    if graph is None:
        return None

    if for_export:
--- a/autogpt_platform/backend/backend/data/graph_test.py
+++ b/autogpt_platform/backend/backend/data/graph_test.py
@@ -1,6 +1,6 @@
 import json
 from typing import Any
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock, MagicMock, patch
 from uuid import UUID

 import fastapi.exceptions
@@ -13,7 +13,7 @@ from backend.api.model import CreateGraph
 from backend.blocks._base import BlockSchema, BlockSchemaInput
 from backend.blocks.basic import StoreValueBlock
 from backend.blocks.io import AgentInputBlock, AgentOutputBlock
-from backend.data.graph import Graph, Link, Node
+from backend.data.graph import Graph, Link, Node, get_graph
 from backend.data.model import SchemaField
 from backend.data.user import DEFAULT_USER_ID
 from backend.usecases.sample import create_test_user
@@ -595,3 +595,82 @@ def test_mcp_credential_combine_no_discriminator_values():
        f"Expected 1 credential entry for MCP blocks without discriminator_values, "
        f"got {len(combined)}: {list(combined.keys())}"
    )
+
+
+# --------------- get_graph access-control regression tests --------------- #
+# These protect the behavior introduced in PR #11323 (Reinier, 2025-11-05):
+# non-owners can access APPROVED marketplace agents but NOT pending ones.
+
+
+def _make_mock_db_graph(user_id: str = "owner-user-id") -> MagicMock:
+    graph = MagicMock()
+    graph.userId = user_id
+    graph.id = "graph-id"
+    graph.version = 1
+    graph.Nodes = []
+    return graph
+
+
+@pytest.mark.asyncio
+async def test_get_graph_non_owner_approved_marketplace_agent() -> None:
+    """A non-owner should be able to access a graph that has an APPROVED
+    marketplace listing.  This is the normal marketplace download flow."""
+    owner_id = "owner-user-id"
+    requester_id = "different-user-id"
+    graph_id = "graph-id"
+    mock_graph = _make_mock_db_graph(owner_id)
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    mock_listing = MagicMock()
+    mock_listing.AgentGraph = mock_graph
+
+    with (
+        patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
+        patch(
+            "backend.data.graph.StoreListingVersion.prisma",
+        ) as mock_slv_prisma,
+        patch(
+            "backend.data.graph.GraphModel.from_db",
+            return_value=mock_graph_model,
+        ),
+    ):
+        # First lookup (owned graph) returns None — requester != owner
+        mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
+        # Marketplace fallback finds an APPROVED listing
+        mock_slv_prisma.return_value.find_first = AsyncMock(return_value=mock_listing)
+
+        result = await get_graph(
+            graph_id=graph_id,
+            version=1,
+            user_id=requester_id,
+        )
+
+    assert result is not None, "Non-owner should access APPROVED marketplace agent"
+
+
+@pytest.mark.asyncio
+async def test_get_graph_non_owner_pending_marketplace_agent_denied() -> None:
+    """A non-owner must NOT be able to access a graph that only has a PENDING
+    (not APPROVED) marketplace listing.  The marketplace fallback filters on
+    submissionStatus=APPROVED, so pending agents should be invisible."""
+    requester_id = "different-user-id"
+    graph_id = "graph-id"
+
+    with (
+        patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
+        patch(
+            "backend.data.graph.StoreListingVersion.prisma",
+        ) as mock_slv_prisma,
+    ):
+        # First lookup (owned graph) returns None
+        mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
+        # Marketplace fallback finds nothing (not APPROVED)
+        mock_slv_prisma.return_value.find_first = AsyncMock(return_value=None)
+
+        result = await get_graph(
+            graph_id=graph_id,
+            version=1,
+            user_id=requester_id,
+        )
+
+    assert result is None, "Non-owner must not access a pending marketplace agent"
--- a/autogpt_platform/backend/backend/data/understanding.py
+++ b/autogpt_platform/backend/backend/data/understanding.py
@@ -23,11 +23,29 @@ def _cache_key(user_id: str) -> str:


 def _json_to_list(value: Any) -> list[str]:
-    """Convert Json field to list[str], handling None."""
+    """Convert Json field to list[str], handling None.
+
+    Also handles legacy dict-format rows (e.g. ``{"Learn": [...], "Create": [...]}``
+    from the reverted themed-prompts feature) by flattening all values into a single
+    list so existing personalised data isn't silently lost.
+    """
    if value is None:
        return []
    if isinstance(value, list):
        return cast(list[str], value)
+    if isinstance(value, dict):
+        # Legacy themed-prompt format: flatten all string values from all categories.
+        logger.debug(
+            "_json_to_list: flattening legacy dict-format value (keys=%s)",
+            list(value.keys()),
+        )
+        return [
+            item
+            for vals in value.values()
+            if isinstance(vals, list)
+            for item in vals
+            if isinstance(item, str)
+        ]
    return []


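The updated `_json_to_list` now handles legacy dict-format rows. The standalone copy below (logging removed) shows three things. Dict insertion order determines the flattened order. Non-list category values and non-string items are dropped. Plain lists are still returned unvalidated, because `cast` performs no runtime check.

```python
from typing import Any, cast


def json_to_list(value: Any) -> list[str]:
    """Standalone copy of the updated _json_to_list logic, minus logging."""
    if value is None:
        return []
    if isinstance(value, list):
        return cast(list[str], value)  # no per-item check
    if isinstance(value, dict):
        # Legacy themed-prompt rows: {"Learn": [...], "Create": [...]}
        return [
            item
            for vals in value.values()
            if isinstance(vals, list)
            for item in vals
            if isinstance(item, str)
        ]
    return []


legacy_row = {"Learn": ["Summarise a paper"], "Create": ["Draft a tweet", 42], "note": "x"}
print(json_to_list(legacy_row))  # ['Summarise a paper', 'Draft a tweet']
print(json_to_list(["a", "b"]))  # ['a', 'b']
print(json_to_list("oops"))  # []
```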
--- a/autogpt_platform/backend/backend/executor/manager.py
+++ b/autogpt_platform/backend/backend/executor/manager.py
@@ -224,7 +224,7 @@ async def execute_node(
    # Sanity check: validate the execution input.
    input_data, error = validate_exec(node, data.inputs, resolve_input=False)
    if input_data is None:
-        log_metadata.error(f"Skip execution, input validation error: {error}")
+        log_metadata.warning(f"Skip execution, input validation error: {error}")
        yield "error", error
        return

--- a/autogpt_platform/backend/backend/util/prompt_test.py
+++ b/autogpt_platform/backend/backend/util/prompt_test.py
@@ -612,7 +612,7 @@ class TestEnsureToolPairsIntact:
    # ---- Mixed/Edge Case Tests ----

    def test_anthropic_with_type_message_field(self):
-        """Test Anthropic format with 'type': 'message' field (smart_decision_maker style)."""
+        """Test Anthropic format with 'type': 'message' field (orchestrator style)."""
        all_msgs = [
            {"role": "system", "content": "You are helpful."},
            {
@@ -628,7 +628,7 @@ class TestEnsureToolPairsIntact:
            },
            {
                "role": "user",
-                "type": "message",  # Extra field from smart_decision_maker
+                "type": "message",  # Extra field from orchestrator
                "content": [
                    {
                        "type": "tool_result",
--- a/autogpt_platform/backend/backend/util/service.py
+++ b/autogpt_platform/backend/backend/util/service.py
@@ -704,8 +704,19 @@ def get_service_client(
            return kwargs

        def _get_return(self, expected_return: TypeAdapter | None, result: Any) -> Any:
+            """Validate and coerce the RPC result to the expected return type.
+
+            Falls back to the raw result with a warning if validation fails.
+            """
            if expected_return:
-                return expected_return.validate_python(result)
+                try:
+                    return expected_return.validate_python(result)
+                except Exception as e:
+                    logger.warning(
+                        "RPC return type validation failed, using raw result: %s",
+                        type(e).__name__,
+                    )
+                    return result
            return result

        def __getattr__(self, name: str) -> Callable[..., Any]:
--- a/autogpt_platform/backend/backend/util/type.py
+++ b/autogpt_platform/backend/backend/util/type.py
@@ -302,7 +302,14 @@ def _value_satisfies_type(value: Any, target: Any) -> bool:

    # Simple type (e.g. str, int)
    if isinstance(target, type):
-        return isinstance(value, target)
+        try:
+            return isinstance(value, target)
+        except TypeError:
+            # TypedDict and some typing constructs don't support isinstance checks.
+            # For TypedDict, check if value is a dict with the required keys.
+            if isinstance(value, dict) and hasattr(target, "__required_keys__"):
+                return all(k in value for k in target.__required_keys__)
+            return False

    return False

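The new `except TypeError` branch in `_value_satisfies_type` is needed because a TypedDict class passes the `isinstance(target, type)` guard, but `isinstance(value, SomeTypedDict)` raises `TypeError`. The sketch below reproduces just that branch with an example `Movie` TypedDict, which is an illustrative assumption. The fallback checks only that the required keys are present, not their value types. It also ignores keys excluded from `__required_keys__`, such as `total=False` or `NotRequired` fields.

```python
from typing import Any, TypedDict


class Movie(TypedDict):
    title: str
    year: int


def value_satisfies_simple_type(value: Any, target: type) -> bool:
    """Mirror of the simple-type branch in _value_satisfies_type (sketch)."""
    try:
        return isinstance(value, target)
    except TypeError:
        # TypedDict classes reject isinstance(); fall back to a structural
        # check that every required key is present in a dict value.
        if isinstance(value, dict) and hasattr(target, "__required_keys__"):
            return all(k in value for k in target.__required_keys__)
        return False


print(isinstance(Movie, type))  # True
print(value_satisfies_simple_type({"title": "Up", "year": 2009}, Movie))  # True
print(value_satisfies_simple_type({"title": "Up"}, Movie))  # False
print(value_satisfies_simple_type("Up", str))  # True
```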
--- a/autogpt_platform/backend/poetry.lock
+++ b/autogpt_platform/backend/poetry.lock
@@ -594,26 +594,6 @@ files = [
    {file = "bracex-2.6.tar.gz", hash = "sha256:98f1347cd77e22ee8d967a30ad4e310b233f7754dbf31ff3fceb76145ba47dc7"},
 ]

-[[package]]
-name = "browserbase"
-version = "1.4.0"
-description = "The official Python library for the Browserbase API"
-optional = false
-python-versions = ">=3.8"
-groups = ["main"]
-files = [
-    {file = "browserbase-1.4.0-py3-none-any.whl", hash = "sha256:ea9f1fb4a88921975b8b9606835c441a59d8ce82ce00313a6d48bbe8e30f79fb"},
-    {file = "browserbase-1.4.0.tar.gz", hash = "sha256:e2ed36f513c8630b94b826042c4bb9f497c333f3bd28e5b76cb708c65b4318a0"},
-]
-
-[package.dependencies]
-anyio = ">=3.5.0,<5"
-distro = ">=1.7.0,<2"
-httpx = ">=0.23.0,<1"
-pydantic = ">=1.9.0,<3"
-sniffio = "*"
-typing-extensions = ">=4.10,<5"
-
 [[package]]
 name = "build"
 version = "1.4.0"
@@ -1488,94 +1468,6 @@ files = [
 [package.extras]
 devel = ["colorama", "json-spec", "jsonschema", "pylint", "pytest", "pytest-benchmark", "pytest-cache", "validictory"]

-[[package]]
-name = "fastuuid"
-version = "0.14.0"
-description = "Python bindings to Rust's UUID library."
-optional = false
-python-versions = ">=3.8"
-groups = ["main"]
-files = [
-    {file = "fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:6e6243d40f6c793c3e2ee14c13769e341b90be5ef0c23c82fa6515a96145181a"},
-    {file = "fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:13ec4f2c3b04271f62be2e1ce7e95ad2dd1cf97e94503a3760db739afbd48f00"},
-    {file = "fastuuid-0.14.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b2fdd48b5e4236df145a149d7125badb28e0a383372add3fbaac9a6b7a394470"},
-    {file = "fastuuid-0.14.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f74631b8322d2780ebcf2d2d75d58045c3e9378625ec51865fe0b5620800c39d"},
-    {file = "fastuuid-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:83cffc144dc93eb604b87b179837f2ce2af44871a7b323f2bfed40e8acb40ba8"},
-    {file = "fastuuid-0.14.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a771f135ab4523eb786e95493803942a5d1fc1610915f131b363f55af53b219"},
-    {file = "fastuuid-0.14.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:4edc56b877d960b4eda2c4232f953a61490c3134da94f3c28af129fb9c62a4f6"},
-    {file = "fastuuid-0.14.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:bcc96ee819c282e7c09b2eed2b9bd13084e3b749fdb2faf58c318d498df2efbe"},
-    {file = "fastuuid-0.14.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:7a3c0bca61eacc1843ea97b288d6789fbad7400d16db24e36a66c28c268cfe3d"},
-    {file = "fastuuid-0.14.0-cp310-cp310-win32.whl", hash = "sha256:7f2f3efade4937fae4e77efae1af571902263de7b78a0aee1a1653795a093b2a"},
-    {file = "fastuuid-0.14.0-cp310-cp310-win_amd64.whl", hash = "sha256:ae64ba730d179f439b0736208b4c279b8bc9c089b102aec23f86512ea458c8a4"},
-    {file = "fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:73946cb950c8caf65127d4e9a325e2b6be0442a224fd51ba3b6ac44e1912ce34"},
-    {file = "fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:12ac85024637586a5b69645e7ed986f7535106ed3013640a393a03e461740cb7"},
-    {file = "fastuuid-0.14.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:05a8dde1f395e0c9b4be515b7a521403d1e8349443e7641761af07c7ad1624b1"},
-    {file = "fastuuid-0.14.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09378a05020e3e4883dfdab438926f31fea15fd17604908f3d39cbeb22a0b4dc"},
-    {file = "fastuuid-0.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bbb0c4b15d66b435d2538f3827f05e44e2baafcc003dd7d8472dc67807ab8fd8"},
-    {file = "fastuuid-0.14.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cd5a7f648d4365b41dbf0e38fe8da4884e57bed4e77c83598e076ac0c93995e7"},
-    {file = "fastuuid-0.14.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:c0a94245afae4d7af8c43b3159d5e3934c53f47140be0be624b96acd672ceb73"},
-    {file = "fastuuid-0.14.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:2b29e23c97e77c3a9514d70ce343571e469098ac7f5a269320a0f0b3e193ab36"},
-    {file = "fastuuid-0.14.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1e690d48f923c253f28151b3a6b4e335f2b06bf669c68a02665bc150b7839e94"},
-    {file = "fastuuid-0.14.0-cp311-cp311-win32.whl", hash = "sha256:a6f46790d59ab38c6aa0e35c681c0484b50dc0acf9e2679c005d61e019313c24"},
-    {file = "fastuuid-0.14.0-cp311-cp311-win_amd64.whl", hash = "sha256:e150eab56c95dc9e3fefc234a0eedb342fac433dacc273cd4d150a5b0871e1fa"},
-    {file = "fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77e94728324b63660ebf8adb27055e92d2e4611645bf12ed9d88d30486471d0a"},
-    {file = "fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:caa1f14d2102cb8d353096bc6ef6c13b2c81f347e6ab9d6fbd48b9dea41c153d"},
-    {file = "fastuuid-0.14.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d23ef06f9e67163be38cece704170486715b177f6baae338110983f99a72c070"},
-    {file = "fastuuid-0.14.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0c9ec605ace243b6dbe3bd27ebdd5d33b00d8d1d3f580b39fdd15cd96fd71796"},
-    {file = "fastuuid-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:808527f2407f58a76c916d6aa15d58692a4a019fdf8d4c32ac7ff303b7d7af09"},
-    {file = "fastuuid-0.14.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2fb3c0d7fef6674bbeacdd6dbd386924a7b60b26de849266d1ff6602937675c8"},
-    {file = "fastuuid-0.14.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:ab3f5d36e4393e628a4df337c2c039069344db5f4b9d2a3c9cea48284f1dd741"},
-    {file = "fastuuid-0.14.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:b9a0ca4f03b7e0b01425281ffd44e99d360e15c895f1907ca105854ed85e2057"},
-    {file = "fastuuid-0.14.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:3acdf655684cc09e60fb7e4cf524e8f42ea760031945aa8086c7eae2eeeabeb8"},
-    {file = "fastuuid-0.14.0-cp312-cp312-win32.whl", hash = "sha256:9579618be6280700ae36ac42c3efd157049fe4dd40ca49b021280481c78c3176"},
-    {file = "fastuuid-0.14.0-cp312-cp312-win_amd64.whl", hash = "sha256:d9e4332dc4ba054434a9594cbfaf7823b57993d7d8e7267831c3e059857cf397"},
-    {file = "fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77a09cb7427e7af74c594e409f7731a0cf887221de2f698e1ca0ebf0f3139021"},
-    {file = "fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:9bd57289daf7b153bfa3e8013446aa144ce5e8c825e9e366d455155ede5ea2dc"},
-    {file = "fastuuid-0.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:ac60fc860cdf3c3f327374db87ab8e064c86566ca8c49d2e30df15eda1b0c2d5"},
-    {file = "fastuuid-0.14.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ab32f74bd56565b186f036e33129da77db8be09178cd2f5206a5d4035fb2a23f"},
-    {file = "fastuuid-0.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33e678459cf4addaedd9936bbb038e35b3f6b2061330fd8f2f6a1d80414c0f87"},
-    {file = "fastuuid-0.14.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1e3cc56742f76cd25ecb98e4b82a25f978ccffba02e4bdce8aba857b6d85d87b"},
-    {file = "fastuuid-0.14.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:cb9a030f609194b679e1660f7e32733b7a0f332d519c5d5a6a0a580991290022"},
-    {file = "fastuuid-0.14.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:09098762aad4f8da3a888eb9ae01c84430c907a297b97166b8abc07b640f2995"},
-    {file = "fastuuid-0.14.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:1383fff584fa249b16329a059c68ad45d030d5a4b70fb7c73a08d98fd53bcdab"},
-    {file = "fastuuid-0.14.0-cp313-cp313-win32.whl", hash = "sha256:a0809f8cc5731c066c909047f9a314d5f536c871a7a22e815cc4967c110ac9ad"},
-    {file = "fastuuid-0.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:0df14e92e7ad3276327631c9e7cec09e32572ce82089c55cb1bb8df71cf394ed"},
-    {file = "fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:b852a870a61cfc26c884af205d502881a2e59cc07076b60ab4a951cc0c94d1ad"},
-    {file = "fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:c7502d6f54cd08024c3ea9b3514e2d6f190feb2f46e6dbcd3747882264bb5f7b"},
-    {file = "fastuuid-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1ca61b592120cf314cfd66e662a5b54a578c5a15b26305e1b8b618a6f22df714"},
-    {file = "fastuuid-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa75b6657ec129d0abded3bec745e6f7ab642e6dba3a5272a68247e85f5f316f"},
-    {file = "fastuuid-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a8a0dfea3972200f72d4c7df02c8ac70bad1bb4c58d7e0ec1e6f341679073a7f"},
-    {file = "fastuuid-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1bf539a7a95f35b419f9ad105d5a8a35036df35fdafae48fb2fd2e5f318f0d75"},
-    {file = "fastuuid-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:9a133bf9cc78fdbd1179cb58a59ad0100aa32d8675508150f3658814aeefeaa4"},
-    {file = "fastuuid-0.14.0-cp314-cp314-musllinux_1_1_i686.whl", hash = "sha256:f54d5b36c56a2d5e1a31e73b950b28a0d83eb0c37b91d10408875a5a29494bad"},
-    {file = "fastuuid-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:ec27778c6ca3393ef662e2762dba8af13f4ec1aaa32d08d77f71f2a70ae9feb8"},
-    {file = "fastuuid-0.14.0-cp314-cp314-win32.whl", hash = "sha256:e23fc6a83f112de4be0cc1990e5b127c27663ae43f866353166f87df58e73d06"},
-    {file = "fastuuid-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:df61342889d0f5e7a32f7284e55ef95103f2110fee433c2ae7c2c0956d76ac8a"},
-    {file = "fastuuid-0.14.0-cp38-cp38-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:47c821f2dfe95909ead0085d4cb18d5149bca704a2b03e03fb3f81a5202d8cea"},
-    {file = "fastuuid-0.14.0-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:3964bab460c528692c70ab6b2e469dd7a7b152fbe8c18616c58d34c93a6cf8d4"},
-    {file = "fastuuid-0.14.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:c501561e025b7aea3508719c5801c360c711d5218fc4ad5d77bf1c37c1a75779"},
-    {file = "fastuuid-0.14.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2dce5d0756f046fa792a40763f36accd7e466525c5710d2195a038f93ff96346"},
-    {file = "fastuuid-0.14.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:193ca10ff553cf3cc461572da83b5780fc0e3eea28659c16f89ae5202f3958d4"},
-    {file = "fastuuid-0.14.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0737606764b29785566f968bd8005eace73d3666bd0862f33a760796e26d1ede"},
-    {file = "fastuuid-0.14.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:e0976c0dff7e222513d206e06341503f07423aceb1db0b83ff6851c008ceee06"},
-    {file = "fastuuid-0.14.0-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:6fbc49a86173e7f074b1a9ec8cf12ca0d54d8070a85a06ebf0e76c309b84f0d0"},
-    {file = "fastuuid-0.14.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:de01280eabcd82f7542828ecd67ebf1551d37203ecdfd7ab1f2e534edb78d505"},
-    {file = "fastuuid-0.14.0-cp38-cp38-win32.whl", hash = "sha256:af5967c666b7d6a377098849b07f83462c4fedbafcf8eb8bc8ff05dcbe8aa209"},
-    {file = "fastuuid-0.14.0-cp38-cp38-win_amd64.whl", hash = "sha256:c3091e63acf42f56a6f74dc65cfdb6f99bfc79b5913c8a9ac498eb7ca09770a8"},
-    {file = "fastuuid-0.14.0-cp39-cp39-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:2ec3d94e13712a133137b2805073b65ecef4a47217d5bac15d8ac62376cefdb4"},
-    {file = "fastuuid-0.14.0-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:139d7ff12bb400b4a0c76be64c28cbe2e2edf60b09826cbfd85f33ed3d0bbe8b"},
-    {file = "fastuuid-0.14.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:d55b7e96531216fc4f071909e33e35e5bfa47962ae67d9e84b00a04d6e8b7173"},
-    {file = "fastuuid-0.14.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c0eb25f0fd935e376ac4334927a59e7c823b36062080e2e13acbaf2af15db836"},
-    {file = "fastuuid-0.14.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:089c18018fdbdda88a6dafd7d139f8703a1e7c799618e33ea25eb52503d28a11"},
-    {file = "fastuuid-0.14.0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2fc37479517d4d70c08696960fad85494a8a7a0af4e93e9a00af04d74c59f9e3"},
-    {file = "fastuuid-0.14.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:73657c9f778aba530bc96a943d30e1a7c80edb8278df77894fe9457540df4f85"},
-    {file = "fastuuid-0.14.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:d31f8c257046b5617fc6af9c69be066d2412bdef1edaa4bdf6a214cf57806105"},
-    {file = "fastuuid-0.14.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:5816d41f81782b209843e52fdef757a361b448d782452d96abedc53d545da722"},
-    {file = "fastuuid-0.14.0-cp39-cp39-win32.whl", hash = "sha256:448aa6833f7a84bfe37dd47e33df83250f404d591eb83527fa2cac8d1e57d7f3"},
-    {file = "fastuuid-0.14.0-cp39-cp39-win_amd64.whl", hash = "sha256:84b0779c5abbdec2a9511d5ffbfcd2e53079bf889824b32be170c0d8ef5fc74c"},
-    {file = "fastuuid-0.14.0.tar.gz", hash = "sha256:178947fc2f995b38497a74172adee64fdeb8b7ec18f2a5934d037641ba265d26"},
-]
-
 [[package]]
 name = "feedparser"
 version = "6.0.12"
@@ -2038,7 +1930,6 @@ files = [
 [package.dependencies]
 cryptography = ">=38.0.3"
 pyasn1-modules = ">=0.2.1"
-requests = {version = ">=2.20.0,<3.0.0", optional = true, markers = "extra == \"requests\""}
 rsa = ">=3.1.4,<5"

 [package.extras]
@@ -2240,34 +2131,6 @@ files = [
    {file = "google_crc32c-1.8.0.tar.gz", hash = "sha256:a428e25fb7691024de47fecfbff7ff957214da51eddded0da0ae0e0f03a2cf79"},
 ]

-[[package]]
-name = "google-genai"
-version = "1.62.0"
-description = "GenAI Python SDK"
-optional = false
-python-versions = ">=3.10"
-groups = ["main"]
-files = [
-    {file = "google_genai-1.62.0-py3-none-any.whl", hash = "sha256:4c3daeff3d05fafee4b9a1a31f9c07f01bc22051081aa58b4d61f58d16d1bcc0"},
-    {file = "google_genai-1.62.0.tar.gz", hash = "sha256:709468a14c739a080bc240a4f3191df597bf64485b1ca3728e0fb67517774c18"},
-]
-
-[package.dependencies]
-anyio = ">=4.8.0,<5.0.0"
-distro = ">=1.7.0,<2"
-google-auth = {version = ">=2.47.0,<3.0.0", extras = ["requests"]}
-httpx = ">=0.28.1,<1.0.0"
-pydantic = ">=2.9.0,<3.0.0"
-requests = ">=2.28.1,<3.0.0"
-sniffio = "*"
-tenacity = ">=8.2.3,<9.2.0"
-typing-extensions = ">=4.11.0,<5.0.0"
-websockets = ">=13.0.0,<15.1.0"
-
-[package.extras]
-aiohttp = ["aiohttp (<3.13.3)"]
-local-tokenizer = ["protobuf", "sentencepiece (>=0.2.0)"]
-
 [[package]]
 name = "google-resumable-media"
 version = "2.8.0"
@@ -2360,6 +2223,7 @@ description = "Lightweight in-process concurrent programming"
 optional = false
 python-versions = ">=3.10"
 groups = ["main"]
+markers = "platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\""
 files = [
    {file = "greenlet-3.3.1-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:04bee4775f40ecefcdaa9d115ab44736cd4b9c5fba733575bfe9379419582e13"},
    {file = "greenlet-3.3.1-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:50e1457f4fed12a50e427988a07f0f9df53cf0ee8da23fab16e6732c2ec909d4"},
@@ -2582,42 +2446,6 @@ files = [
 hpack = ">=4.1,<5"
 hyperframe = ">=6.1,<7"

-[[package]]
-name = "hf-xet"
-version = "1.2.0"
-description = "Fast transfer of large files with the Hugging Face Hub."
-optional = false
-python-versions = ">=3.8"
-groups = ["main"]
-markers = "platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"arm64\" or platform_machine == \"aarch64\""
-files = [
-    {file = "hf_xet-1.2.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:ceeefcd1b7aed4956ae8499e2199607765fbd1c60510752003b6cc0b8413b649"},
-    {file = "hf_xet-1.2.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b70218dd548e9840224df5638fdc94bd033552963cfa97f9170829381179c813"},
-    {file = "hf_xet-1.2.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7d40b18769bb9a8bc82a9ede575ce1a44c75eb80e7375a01d76259089529b5dc"},
-    {file = "hf_xet-1.2.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:cd3a6027d59cfb60177c12d6424e31f4b5ff13d8e3a1247b3a584bf8977e6df5"},
-    {file = "hf_xet-1.2.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6de1fc44f58f6dd937956c8d304d8c2dea264c80680bcfa61ca4a15e7b76780f"},
-    {file = "hf_xet-1.2.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:f182f264ed2acd566c514e45da9f2119110e48a87a327ca271027904c70c5832"},
-    {file = "hf_xet-1.2.0-cp313-cp313t-win_amd64.whl", hash = "sha256:293a7a3787e5c95d7be1857358a9130694a9c6021de3f27fa233f37267174382"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:10bfab528b968c70e062607f663e21e34e2bba349e8038db546646875495179e"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:2a212e842647b02eb6a911187dc878e79c4aa0aa397e88dd3b26761676e8c1f8"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:30e06daccb3a7d4c065f34fc26c14c74f4653069bb2b194e7f18f17cbe9939c0"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:29c8fc913a529ec0a91867ce3d119ac1aac966e098cf49501800c870328cc090"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e159cbfcfbb29f920db2c09ed8b660eb894640d284f102ada929b6e3dc410a"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9c91d5ae931510107f148874e9e2de8a16052b6f1b3ca3c1b12f15ccb491390f"},
-    {file = "hf_xet-1.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:210d577732b519ac6ede149d2f2f34049d44e8622bf14eb3d63bbcd2d4b332dc"},
-    {file = "hf_xet-1.2.0-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:46740d4ac024a7ca9b22bebf77460ff43332868b661186a8e46c227fdae01848"},
-    {file = "hf_xet-1.2.0-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:27df617a076420d8845bea087f59303da8be17ed7ec0cd7ee3b9b9f579dff0e4"},
-    {file = "hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3651fd5bfe0281951b988c0facbe726aa5e347b103a675f49a3fa8144c7968fd"},
-    {file = "hf_xet-1.2.0-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:d06fa97c8562fb3ee7a378dd9b51e343bc5bc8190254202c9771029152f5e08c"},
-    {file = "hf_xet-1.2.0-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:4c1428c9ae73ec0939410ec73023c4f842927f39db09b063b9482dac5a3bb737"},
-    {file = "hf_xet-1.2.0-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a55558084c16b09b5ed32ab9ed38421e2d87cf3f1f89815764d1177081b99865"},
-    {file = "hf_xet-1.2.0-cp37-abi3-win_amd64.whl", hash = "sha256:e6584a52253f72c9f52f9e549d5895ca7a471608495c4ecaa6cc73dba2b24d69"},
-    {file = "hf_xet-1.2.0.tar.gz", hash = "sha256:a8c27070ca547293b6890c4bf389f713f80e8c478631432962bb7f4bc0bd7d7f"},
-]
-
-[package.extras]
-tests = ["pytest"]
-
 [[package]]
 name = "hpack"
 version = "4.1.0"
@@ -2769,42 +2597,6 @@ files = [
    {file = "httpx_sse-0.4.3.tar.gz", hash = "sha256:9b1ed0127459a66014aec3c56bebd93da3c1bc8bb6618c8082039a44889a755d"},
 ]

-[[package]]
-name = "huggingface-hub"
-version = "1.4.1"
-description = "Client library to download and publish models, datasets and other repos on the huggingface.co hub"
-optional = false
-python-versions = ">=3.9.0"
-groups = ["main"]
-files = [
-    {file = "huggingface_hub-1.4.1-py3-none-any.whl", hash = "sha256:9931d075fb7a79af5abc487106414ec5fba2c0ae86104c0c62fd6cae38873d18"},
-    {file = "huggingface_hub-1.4.1.tar.gz", hash = "sha256:b41131ec35e631e7383ab26d6146b8d8972abc8b6309b963b306fbcca87f5ed5"},
-]
-
-[package.dependencies]
-filelock = "*"
-fsspec = ">=2023.5.0"
-hf-xet = {version = ">=1.2.0,<2.0.0", markers = "platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"arm64\" or platform_machine == \"aarch64\""}
-httpx = ">=0.23.0,<1"
-packaging = ">=20.9"
-pyyaml = ">=5.1"
-shellingham = "*"
-tqdm = ">=4.42.1"
-typer-slim = "*"
-typing-extensions = ">=4.1.0"
-
-[package.extras]
-all = ["Jinja2", "Pillow", "authlib (>=1.3.2)", "fastapi", "fastapi", "httpx", "itsdangerous", "jedi", "libcst (>=1.4.0)", "mypy (==1.15.0)", "numpy", "pytest (>=8.4.2)", "pytest-asyncio", "pytest-cov", "pytest-env", "pytest-mock", "pytest-rerunfailures (<16.0)", "pytest-vcr", "pytest-xdist", "ruff (>=0.9.0)", "soundfile", "ty", "types-PyYAML", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)", "urllib3 (<2.0)"]
-dev = ["Jinja2", "Pillow", "authlib (>=1.3.2)", "fastapi", "fastapi", "httpx", "itsdangerous", "jedi", "libcst (>=1.4.0)", "mypy (==1.15.0)", "numpy", "pytest (>=8.4.2)", "pytest-asyncio", "pytest-cov", "pytest-env", "pytest-mock", "pytest-rerunfailures (<16.0)", "pytest-vcr", "pytest-xdist", "ruff (>=0.9.0)", "soundfile", "ty", "types-PyYAML", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)", "urllib3 (<2.0)"]
-fastai = ["fastai (>=2.4)", "fastcore (>=1.3.27)", "toml"]
-hf-xet = ["hf-xet (>=1.2.0,<2.0.0)"]
-mcp = ["mcp (>=1.8.0)"]
-oauth = ["authlib (>=1.3.2)", "fastapi", "httpx", "itsdangerous"]
-quality = ["libcst (>=1.4.0)", "mypy (==1.15.0)", "ruff (>=0.9.0)", "ty"]
-testing = ["Jinja2", "Pillow", "authlib (>=1.3.2)", "fastapi", "fastapi", "httpx", "itsdangerous", "jedi", "numpy", "pytest (>=8.4.2)", "pytest-asyncio", "pytest-cov", "pytest-env", "pytest-mock", "pytest-rerunfailures (<16.0)", "pytest-vcr", "pytest-xdist", "soundfile", "urllib3 (<2.0)"]
-torch = ["safetensors[torch]", "torch"]
-typing = ["types-PyYAML", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)"]
-
 [[package]]
 name = "hyperframe"
 version = "6.1.0"
@@ -3350,40 +3142,6 @@ dynamodb = ["boto3 (>=1.9.71)"]
 redis = ["redis (>=2.10.5)"]
 test-filesource = ["pyyaml (>=5.3.1)", "watchdog (>=3.0.0)"]

-[[package]]
-name = "litellm"
-version = "1.80.0"
-description = "Library to easily interface with LLM API providers"
-optional = false
-python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
-groups = ["main"]
-files = [
-    {file = "litellm-1.80.0-py3-none-any.whl", hash = "sha256:fd0009758f4772257048d74bf79bb64318859adb4ea49a8b66fdbc718cd80b6e"},
-    {file = "litellm-1.80.0.tar.gz", hash = "sha256:eeac733eb6b226f9e5fb020f72fe13a32b3354b001dc62bcf1bc4d9b526d6231"},
-]
-
-[package.dependencies]
-aiohttp = ">=3.10"
-click = "*"
-fastuuid = ">=0.13.0"
-httpx = ">=0.23.0"
-importlib-metadata = ">=6.8.0"
-jinja2 = ">=3.1.2,<4.0.0"
-jsonschema = ">=4.22.0,<5.0.0"
-openai = ">=1.99.5"
-pydantic = ">=2.5.0,<3.0.0"
-python-dotenv = ">=0.2.0"
-tiktoken = ">=0.7.0"
-tokenizers = "*"
-
-[package.extras]
-caching = ["diskcache (>=5.6.1,<6.0.0)"]
-extra-proxy = ["azure-identity (>=1.15.0,<2.0.0)", "azure-keyvault-secrets (>=4.8.0,<5.0.0)", "google-cloud-iam (>=2.19.1,<3.0.0)", "google-cloud-kms (>=2.21.3,<3.0.0)", "prisma (==0.11.0)", "redisvl (>=0.4.1,<0.5.0) ; python_version >= \"3.9\" and python_version < \"3.14\"", "resend (>=0.8.0,<0.9.0)"]
-mlflow = ["mlflow (>3.1.4) ; python_version >= \"3.10\""]
-proxy = ["PyJWT (>=2.8.0,<3.0.0)", "apscheduler (>=3.10.4,<4.0.0)", "azure-identity (>=1.15.0,<2.0.0)", "azure-storage-blob (>=12.25.1,<13.0.0)", "backoff", "boto3 (==1.36.0)", "cryptography", "fastapi (>=0.120.1)", "fastapi-sso (>=0.16.0,<0.17.0)", "gunicorn (>=23.0.0,<24.0.0)", "litellm-enterprise (==0.1.21)", "litellm-proxy-extras (==0.4.5)", "mcp (>=1.10.0,<2.0.0) ; python_version >= \"3.10\"", "orjson (>=3.9.7,<4.0.0)", "polars (>=1.31.0,<2.0.0) ; python_version >= \"3.10\"", "pynacl (>=1.5.0,<2.0.0)", "python-multipart (>=0.0.18,<0.0.19)", "pyyaml (>=6.0.1,<7.0.0)", "rich (==13.7.1)", "rq", "soundfile (>=0.12.1,<0.13.0)", "uvicorn (>=0.29.0,<0.30.0)", "uvloop (>=0.21.0,<0.22.0) ; sys_platform != \"win32\"", "websockets (>=13.1.0,<14.0.0)"]
-semantic-router = ["semantic-router ; python_version >= \"3.9\""]
-utils = ["numpydoc"]
-
 [[package]]
 name = "markdown-it-py"
 version = "4.0.0"
@@ -4857,28 +4615,6 @@ docs = ["furo (>=2025.9.25)", "proselint (>=0.14)", "sphinx (>=8.2.3)", "sphinx-
 test = ["appdirs (==1.4.4)", "covdefaults (>=2.3)", "pytest (>=8.4.2)", "pytest-cov (>=7)", "pytest-mock (>=3.15.1)"]
 type = ["mypy (>=1.18.2)"]

-[[package]]
-name = "playwright"
-version = "1.58.0"
-description = "A high-level API to automate web browsers"
-optional = false
-python-versions = ">=3.9"
-groups = ["main"]
-files = [
-    {file = "playwright-1.58.0-py3-none-macosx_10_13_x86_64.whl", hash = "sha256:96e3204aac292ee639edbfdef6298b4be2ea0a55a16b7068df91adac077cc606"},
-    {file = "playwright-1.58.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:70c763694739d28df71ed578b9c8202bb83e8fe8fb9268c04dd13afe36301f71"},
-    {file = "playwright-1.58.0-py3-none-macosx_11_0_universal2.whl", hash = "sha256:185e0132578733d02802dfddfbbc35f42be23a45ff49ccae5081f25952238117"},
-    {file = "playwright-1.58.0-py3-none-manylinux1_x86_64.whl", hash = "sha256:c95568ba1eda83812598c1dc9be60b4406dffd60b149bc1536180ad108723d6b"},
-    {file = "playwright-1.58.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8f9999948f1ab541d98812de25e3a8c410776aa516d948807140aff797b4bffa"},
-    {file = "playwright-1.58.0-py3-none-win32.whl", hash = "sha256:1e03be090e75a0fabbdaeab65ce17c308c425d879fa48bb1d7986f96bfad0b99"},
-    {file = "playwright-1.58.0-py3-none-win_amd64.whl", hash = "sha256:a2bf639d0ce33b3ba38de777e08697b0d8f3dc07ab6802e4ac53fb65e3907af8"},
-    {file = "playwright-1.58.0-py3-none-win_arm64.whl", hash = "sha256:32ffe5c303901a13a0ecab91d1c3f74baf73b84f4bedbb6b935f5bc11cc98e1b"},
-]
-
-[package.dependencies]
-greenlet = ">=3.1.1,<4.0.0"
-pyee = ">=13,<14"
-
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -5865,24 +5601,6 @@ gcp-secret-manager = ["google-cloud-secret-manager (>=2.23.1)"]
 toml = ["tomli (>=2.0.1)"]
 yaml = ["pyyaml (>=6.0.1)"]

-[[package]]
-name = "pyee"
-version = "13.0.0"
-description = "A rough port of Node.js's EventEmitter to Python with a few tricks of its own"
-optional = false
-python-versions = ">=3.8"
-groups = ["main"]
-files = [
-    {file = "pyee-13.0.0-py3-none-any.whl", hash = "sha256:48195a3cddb3b1515ce0695ed76036b5ccc2ef3a9f963ff9f77aec0139845498"},
-    {file = "pyee-13.0.0.tar.gz", hash = "sha256:b391e3c5a434d1f5118a25615001dbc8f669cf410ab67d04c4d4e07c55481c37"},
-]
-
-[package.dependencies]
-typing-extensions = "*"
-
-[package.extras]
-dev = ["black", "build", "flake8", "flake8-black", "isort", "jupyter-console", "mkdocs", "mkdocs-include-markdown-plugin", "mkdocstrings[python]", "mypy", "pytest", "pytest-asyncio ; python_version >= \"3.4\"", "pytest-trio ; python_version >= \"3.7\"", "sphinx", "toml", "tox", "trio", "trio ; python_version > \"3.6\"", "trio-typing ; python_version > \"3.6\"", "twine", "twisted", "validate-pyproject[all]"]
-
 [[package]]
 name = "pyflakes"
 version = "3.4.0"
@@ -7315,32 +7033,29 @@ uvicorn = ["uvicorn (>=0.34.0)"]

 [[package]]
 name = "stagehand"
-version = "0.5.9"
-description = "Python SDK for Stagehand"
+version = "3.7.0"
+description = "The official Python library for the stagehand API"
 optional = false
 python-versions = ">=3.9"
 groups = ["main"]
 files = [
-    {file = "stagehand-0.5.9-py3-none-any.whl", hash = "sha256:cc8d2a114799ea1c3d6f199e86abd6479a8b338a101fffa6824d85b542ed9071"},
-    {file = "stagehand-0.5.9.tar.gz", hash = "sha256:068a2825b02fbc949ab9d1cf59b80d2c17caba0259e759d807f38d0e9ab236b0"},
+    {file = "stagehand-3.7.0-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:4918068e6c02717c09766f1df41d5a41ac2ad9b610a30bb584a7d5d359f8d654"},
+    {file = "stagehand-3.7.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:cedb940ebbd47930227f5ef82077080aeb1f77480913382183187aba98e3cca5"},
+    {file = "stagehand-3.7.0-py3-none-manylinux2014_x86_64.whl", hash = "sha256:87df69bca9a611c4acae7383333f1e0cf67cc5b92be91639c65772aa59f8e6ea"},
+    {file = "stagehand-3.7.0-py3-none-win_amd64.whl", hash = "sha256:09d809f3b35389b2ed0b879e8909a8ed01e1ba9330f39c08cfcefe1699197585"},
+    {file = "stagehand-3.7.0.tar.gz", hash = "sha256:53cdd79111147a4c6fedcf17ef92427472beaf11ad3fcd800736ae3475a5cc54"},
 ]

 [package.dependencies]
-anthropic = ">=0.51.0"
-browserbase = ">=1.4.0"
-google-genai = ">=1.40.0"
-httpx = ">=0.24.0"
-litellm = ">=1.72.0,<=1.80.0"
-nest-asyncio = ">=1.6.0"
-openai = ">=1.99.6"
-playwright = ">=1.42.1"
-pydantic = ">=1.10.0"
-python-dotenv = ">=1.0.0"
-requests = ">=2.31.0"
-rich = ">=13.7.0"
+anyio = ">=3.5.0,<5"
+distro = ">=1.7.0,<2"
+httpx = ">=0.23.0,<1"
+pydantic = ">=1.9.0,<3"
+sniffio = "*"
+typing-extensions = ">=4.14,<5"

 [package.extras]
-dev = ["black (>=23.3.0)", "isort (>=5.12.0)", "mypy (>=1.3.0)", "psutil (>=5.9.0)", "pytest (>=7.3.1)", "pytest-asyncio (>=0.21.0)", "pytest-cov (>=4.1.0)", "pytest-mock (>=3.10.0)", "ruff"]
+aiohttp = ["aiohttp", "httpx-aiohttp (>=0.1.9)"]

 [[package]]
 name = "starlette"
@@ -7607,48 +7322,6 @@ files = [
 [package.dependencies]
 requests = ">=2.32.3,<3.0.0"

-[[package]]
-name = "tokenizers"
-version = "0.22.2"
-description = ""
-optional = false
-python-versions = ">=3.9"
-groups = ["main"]
-files = [
-    {file = "tokenizers-0.22.2-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:544dd704ae7238755d790de45ba8da072e9af3eea688f698b137915ae959281c"},
-    {file = "tokenizers-0.22.2-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1e418a55456beedca4621dbab65a318981467a2b188e982a23e117f115ce5001"},
-    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2249487018adec45d6e3554c71d46eb39fa8ea67156c640f7513eb26f318cec7"},
-    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:25b85325d0815e86e0bac263506dd114578953b7b53d7de09a6485e4a160a7dd"},
-    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bfb88f22a209ff7b40a576d5324bf8286b519d7358663db21d6246fb17eea2d5"},
-    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1c774b1276f71e1ef716e5486f21e76333464f47bece56bbd554485982a9e03e"},
-    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:df6c4265b289083bf710dff49bc51ef252f9d5be33a45ee2bed151114a56207b"},
-    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:369cc9fc8cc10cb24143873a0d95438bb8ee257bb80c71989e3ee290e8d72c67"},
-    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:29c30b83d8dcd061078b05ae0cb94d3c710555fbb44861139f9f83dcca3dc3e4"},
-    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:37ae80a28c1d3265bb1f22464c856bd23c02a05bb211e56d0c5301a435be6c1a"},
-    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:791135ee325f2336f498590eb2f11dc5c295232f288e75c99a36c5dbce63088a"},
-    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:38337540fbbddff8e999d59970f3c6f35a82de10053206a7562f1ea02d046fa5"},
-    {file = "tokenizers-0.22.2-cp39-abi3-win32.whl", hash = "sha256:a6bf3f88c554a2b653af81f3204491c818ae2ac6fbc09e76ef4773351292bc92"},
-    {file = "tokenizers-0.22.2-cp39-abi3-win_amd64.whl", hash = "sha256:c9ea31edff2968b44a88f97d784c2f16dc0729b8b143ed004699ebca91f05c48"},
-    {file = "tokenizers-0.22.2-cp39-abi3-win_arm64.whl", hash = "sha256:9ce725d22864a1e965217204946f830c37876eee3b2ba6fc6255e8e903d5fcbc"},
-    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:753d47ebd4542742ef9261d9da92cd545b2cacbb48349a1225466745bb866ec4"},
-    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e10bf9113d209be7cd046d40fbabbaf3278ff6d18eb4da4c500443185dc1896c"},
-    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:64d94e84f6660764e64e7e0b22baa72f6cd942279fdbb21d46abd70d179f0195"},
-    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f01a9c019878532f98927d2bacb79bbb404b43d3437455522a00a30718cdedb5"},
-    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:319f659ee992222f04e58f84cbf407cfa66a65fe3a8de44e8ad2bc53e7d99012"},
-    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:1e50f8554d504f617d9e9d6e4c2c2884a12b388a97c5c77f0bc6cf4cd032feee"},
-    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1a62ba2c5faa2dd175aaeed7b15abf18d20266189fb3406c5d0550dd34dd5f37"},
-    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:143b999bdc46d10febb15cbffb4207ddd1f410e2c755857b5a0797961bbdc113"},
-    {file = "tokenizers-0.22.2.tar.gz", hash = "sha256:473b83b915e547aa366d1eee11806deaf419e17be16310ac0a14077f1e28f917"},
-]
-
-[package.dependencies]
-huggingface-hub = ">=0.16.4,<2.0"
-
-[package.extras]
-dev = ["tokenizers[testing]"]
-docs = ["setuptools-rust", "sphinx", "sphinx-rtd-theme"]
-testing = ["datasets", "numpy", "pytest", "pytest-asyncio", "requests", "ruff", "ty"]
-
 [[package]]
 name = "tomli"
 version = "2.4.0"
@@ -7775,25 +7448,6 @@ async = ["aiohttp (>=3.7.3,<4)", "async-lru (>=1.0.3,<3)"]
 dev = ["coverage (>=4.4.2)", "coveralls (>=2.1.0)", "tox (>=3.21.0)"]
 test = ["urllib3 (<2)", "vcrpy (>=1.10.3)"]

-[[package]]
-name = "typer-slim"
-version = "0.21.1"
-description = "Typer, build great CLIs. Easy to code. Based on Python type hints."
-optional = false
-python-versions = ">=3.9"
-groups = ["main"]
-files = [
-    {file = "typer_slim-0.21.1-py3-none-any.whl", hash = "sha256:6e6c31047f171ac93cc5a973c9e617dbc5ab2bddc4d0a3135dc161b4e2020e0d"},
-    {file = "typer_slim-0.21.1.tar.gz", hash = "sha256:73495dd08c2d0940d611c5a8c04e91c2a0a98600cbd4ee19192255a233b6dbfd"},
-]
-
-[package.dependencies]
-click = ">=8.0.0"
-typing-extensions = ">=3.7.4.3"
-
-[package.extras]
-standard = ["rich (>=10.11.0)", "shellingham (>=1.3.0)"]
-
 [[package]]
 name = "typing-extensions"
 version = "4.15.0"
@@ -8976,4 +8630,4 @@ cffi = ["cffi (>=1.17,<2.0) ; platform_python_implementation != \"PyPy\" and pyt
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.10,<3.14"
-content-hash = "938e93b7de4005bdd60ce5fb542a63df79115f9e21b1cb9940a19605f00d354a"
+content-hash = "1dd10577184ebff0d10997f4c6ba49484de79b7fa090946e8e5ce5c5bac3cdeb"
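The lock diff removes litellm along with the packages that were pulled in only through it or through stagehand 0.5.x: `tokenizers`, `huggingface-hub`, `hf-xet`, `fastuuid`, `google-genai`, `playwright`, `pyee` and `typer-slim`. A quick regression check is to scan a regenerated `poetry.lock` for those names. `poetry show --tree` shows the same information interactively.

The sketch below makes a few assumptions. It assumes Poetry's lock layout as seen in the hunks above, where each entry is a `[[package]]` header immediately followed by a `name = "..."` line. It also assumes it is run from `autogpt_platform/backend`. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches the name line that directly follows each [[package]] header.
_PACKAGE_NAME = re.compile(r'^\[\[package\]\]\r?\nname = "([^"]+)"', re.MULTILINE)

# Packages this change drops from the lock (taken from the diff above).
REMOVED = {
    "litellm", "tokenizers", "huggingface-hub", "hf-xet", "fastuuid",
    "google-genai", "playwright", "pyee", "typer-slim",
}


def locked_packages(lock_text: str) -> set[str]:
    """Return the names of all packages pinned in a poetry.lock document."""
    return set(_PACKAGE_NAME.findall(lock_text))


def reintroduced(lock_text: str) -> list[str]:
    """Return, sorted, any removed package that has crept back into the lock."""
    return sorted(REMOVED & locked_packages(lock_text))


if __name__ == "__main__":
    # Assumed location: run from autogpt_platform/backend.
    lock = Path("poetry.lock")
    if lock.exists():
        leftovers = reintroduced(lock.read_text(encoding="utf-8"))
        print("leftovers:", leftovers or "none")
```

Matching only the line right after `[[package]]` avoids false hits from `[package.dependencies]` or `[package.extras]` tables, which mention the same names as dependency keys.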
--- a/autogpt_platform/backend/pyproject.toml
+++ b/autogpt_platform/backend/pyproject.toml
@@ -88,7 +88,7 @@ pandas = "^2.3.1"
 firecrawl-py = "^4.3.6"
 exa-py = "^1.14.20"
 croniter = "^6.0.0"
-stagehand = "^0.5.1"
+stagehand = "^3.4.0"
 gravitas-md2gdocs = "^0.1.0"
 posthog = "^7.6.0"
 fpdf2 = "^2.8.6"
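Poetry's caret constraint `^X.Y.Z` allows any release that keeps the left-most non-zero component unchanged. The old `^0.5.1` pin therefore meant `>=0.5.1,<0.6.0` and could never resolve to a 3.x release. The new `^3.4.0` means `>=3.4.0,<4.0.0`, which the locked 3.7.0 satisfies.

Below is a minimal pure-Python sketch of that rule. It handles plain numeric versions only and ignores pre-releases and other specifier forms. `caret_bounds` and `satisfies_caret` are hypothetical helpers, not part of Poetry's API.

```python
def caret_bounds(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Expand a caret constraint (written without the '^') into an
    inclusive lower bound and an exclusive upper bound."""
    parts = [int(p) for p in constraint.split(".")]
    # Bump the left-most non-zero component; if every component is zero,
    # bump the last component that was actually written.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:bump] + [parts[bump] + 1] + [0] * (len(parts) - bump - 1)

    def pad(xs: list[int]) -> tuple[int, ...]:
        # Pad to three components so tuples compare like X.Y.Z versions.
        return tuple(xs + [0] * (3 - len(xs)))

    return pad(parts), pad(upper)


def satisfies_caret(version: str, constraint: str) -> bool:
    """True if a plain numeric X.Y.Z version falls inside ^constraint."""
    lower, upper = caret_bounds(constraint)
    v = [int(p) for p in version.split(".")]
    v_t = tuple(v + [0] * (3 - len(v)))
    return lower <= v_t < upper


print(caret_bounds("0.5.1"))              # ((0, 5, 1), (0, 6, 0))
print(caret_bounds("3.4.0"))              # ((3, 4, 0), (4, 0, 0))
print(satisfies_caret("3.7.0", "0.5.1"))  # False
print(satisfies_caret("3.7.0", "3.4.0"))  # True
```

This is why the fix needed an explicit major-version bump in `pyproject.toml` rather than just a re-lock.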
--- a/autogpt_platform/backend/scripts/refresh_claude_token.sh
+++ b/autogpt_platform/backend/scripts/refresh_claude_token.sh
@@ -0,0 +1,123 @@
+#!/usr/bin/env bash
+# refresh_claude_token.sh — Extract Claude OAuth tokens and update backend/.env
+#
+# Works on macOS (keychain), Linux/WSL (~/.claude/.credentials.json),
+# and Windows Git Bash/MSYS2/Cygwin (%APPDATA%/claude or ~/.claude fallback).
+#
+# Usage:
+#   ./scripts/refresh_claude_token.sh              # auto-detect OS
+#   ./scripts/refresh_claude_token.sh --env-file /path/to/.env  # custom .env path
+#
+# Prerequisite: You must have run `claude login` at least once on the host.
+
+set -euo pipefail
+
+# --- Parse arguments ---
+ENV_FILE=""
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --env-file) ENV_FILE="${2:?--env-file requires a path}"; shift 2 ;;
+    *) echo "Unknown option: $1" >&2; exit 1 ;;
+  esac
+done
+
+# Default .env path: relative to this script's location
+if [[ -z "$ENV_FILE" ]]; then
+  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+  ENV_FILE="$SCRIPT_DIR/../.env"
+fi
+
+# --- Extract tokens by platform ---
+ACCESS_TOKEN=""
+REFRESH_TOKEN=""
+
+extract_from_credentials_file() {
+  local creds_file="$1"
+  if [[ -f "$creds_file" ]]; then
+    ACCESS_TOKEN=$(jq -r '.claudeAiOauth.accessToken // ""' "$creds_file" 2>/dev/null)
+    REFRESH_TOKEN=$(jq -r '.claudeAiOauth.refreshToken // ""' "$creds_file" 2>/dev/null)
+  fi
+}
+
+case "$(uname -s)" in
+  Darwin)
+    # macOS: extract from system keychain
+    CREDS_JSON=$(security find-generic-password -s "Claude Code-credentials" -w 2>/dev/null || true)
+    if [[ -n "$CREDS_JSON" ]]; then
+      ACCESS_TOKEN=$(echo "$CREDS_JSON" | jq -r '.claudeAiOauth.accessToken // ""' 2>/dev/null)
+      REFRESH_TOKEN=$(echo "$CREDS_JSON" | jq -r '.claudeAiOauth.refreshToken // ""' 2>/dev/null)
+    else
+      # Fallback to credentials file (e.g. if keychain access denied)
+      extract_from_credentials_file "$HOME/.claude/.credentials.json"
+    fi
+    ;;
+  Linux)
+    # Linux (including WSL): read from credentials file
+    extract_from_credentials_file "$HOME/.claude/.credentials.json"
+    ;;
+  MINGW*|MSYS*|CYGWIN*)
+    # Windows Git Bash / MSYS2 / Cygwin
+    APPDATA_PATH="${APPDATA:-$USERPROFILE/AppData/Roaming}"
+    extract_from_credentials_file "$APPDATA_PATH/claude/.credentials.json"
+    # Fallback to home dir
+    if [[ -z "$ACCESS_TOKEN" ]]; then
+      extract_from_credentials_file "$HOME/.claude/.credentials.json"
+    fi
+    ;;
+  *)
+    echo "Unsupported platform: $(uname -s)"
+    exit 1
+    ;;
+esac
+
+# --- Validate ---
+if [[ -z "$ACCESS_TOKEN" ]]; then
+  echo "ERROR: Could not extract Claude OAuth token."
+  echo ""
+  echo "Make sure you have run 'claude login' at least once."
+  echo ""
+  echo "Locations checked:"
+  echo "  macOS:   Keychain ('Claude Code-credentials')"
+  echo "  Linux:   ~/.claude/.credentials.json"
+  echo "  Windows: %APPDATA%/claude/.credentials.json"
+  exit 1
+fi
+
+echo "Found Claude OAuth token: ${ACCESS_TOKEN:0:20}..."
+[[ -n "$REFRESH_TOKEN" ]] && echo "Found refresh token:  ${REFRESH_TOKEN:0:20}..."
+
+# --- Update .env file ---
+update_env_var() {
+  local key="$1" value="$2" file="$3"
+  if grep -q "^${key}=" "$file" 2>/dev/null; then
+    # Replace existing value (works on both macOS and Linux sed)
+    if [[ "$(uname -s)" == "Darwin" ]]; then
+      sed -i '' "s|^${key}=.*|${key}=${value}|" "$file"
+    else
+      sed -i "s|^${key}=.*|${key}=${value}|" "$file"
+    fi
+  elif grep -q "^# *${key}=" "$file" 2>/dev/null; then
+    # Uncomment and set
+    if [[ "$(uname -s)" == "Darwin" ]]; then
+      sed -i '' "s|^# *${key}=.*|${key}=${value}|" "$file"
+    else
+      sed -i "s|^# *${key}=.*|${key}=${value}|" "$file"
+    fi
+  else
+    # Append
+    echo "${key}=${value}" >> "$file"
+  fi
+}
+
+if [[ ! -f "$ENV_FILE" ]]; then
+  echo "WARNING: $ENV_FILE does not exist, creating it."
+  touch "$ENV_FILE"
+fi
+
+update_env_var "CLAUDE_CODE_OAUTH_TOKEN" "$ACCESS_TOKEN" "$ENV_FILE"
+[[ -n "$REFRESH_TOKEN" ]] && update_env_var "CLAUDE_CODE_REFRESH_TOKEN" "$REFRESH_TOKEN" "$ENV_FILE"
+update_env_var "CHAT_USE_CLAUDE_CODE_SUBSCRIPTION" "true" "$ENV_FILE"
+
+echo ""
+echo "Updated $ENV_FILE with Claude subscription tokens."
+echo "Run 'docker compose up -d copilot_executor' to apply."
--- a/autogpt_platform/backend/test/agent_generator/test_smart_decision_maker.py
+++ b/autogpt_platform/backend/test/agent_generator/test_smart_decision_maker.py
@@ -1,10 +1,10 @@
 """
-Tests for SmartDecisionMakerBlock support in agent generator.
+Tests for OrchestratorBlock support in agent generator.

 Covers:
- AgentFixer.fix_smart_decision_maker_blocks()
- AgentValidator.validate_smart_decision_maker_blocks()
- End-to-end fix → validate → pipeline for SmartDecisionMaker agents
+- AgentFixer.fix_orchestrator_blocks()
+- AgentValidator.validate_orchestrator_blocks()
+- End-to-end fix → validate → pipeline for Orchestrator agents
 """

 import uuid
@@ -14,7 +14,7 @@ from backend.copilot.tools.agent_generator.helpers import (
    AGENT_EXECUTOR_BLOCK_ID,
    AGENT_INPUT_BLOCK_ID,
    AGENT_OUTPUT_BLOCK_ID,
-    SMART_DECISION_MAKER_BLOCK_ID,
+    TOOL_ORCHESTRATOR_BLOCK_ID,
 )
 from backend.copilot.tools.agent_generator.validator import AgentValidator

@@ -28,10 +28,10 @@ def _make_sdm_node(
    input_default: dict | None = None,
    metadata: dict | None = None,
 ) -> dict:
-    """Create a SmartDecisionMakerBlock node dict."""
+    """Create an OrchestratorBlock node dict."""
    return {
        "id": node_id or _uid(),
-        "block_id": SMART_DECISION_MAKER_BLOCK_ID,
+        "block_id": TOOL_ORCHESTRATOR_BLOCK_ID,
        "input_default": input_default or {},
        "metadata": metadata or {"position": {"x": 0, "y": 0}},
    }
@@ -125,15 +125,15 @@ def _make_orchestrator_agent() -> dict:
 # ---------------------------------------------------------------------------


-class TestFixSmartDecisionMakerBlocks:
-    """Tests for AgentFixer.fix_smart_decision_maker_blocks()."""
+class TestFixOrchestratorBlocks:
+    """Tests for AgentFixer.fix_orchestrator_blocks()."""

    def test_fills_defaults_when_missing(self):
        """All agent-mode defaults are populated for a bare SDM node."""
        fixer = AgentFixer()
        agent = {"nodes": [_make_sdm_node()], "links": []}

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10
@@ -159,7 +159,7 @@ class TestFixSmartDecisionMakerBlocks:
            "links": [],
        }

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 5
@@ -182,7 +182,7 @@ class TestFixSmartDecisionMakerBlocks:
            "links": [],
        }

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10  # kept
@@ -192,7 +192,7 @@ class TestFixSmartDecisionMakerBlocks:
        assert len(fixer.fixes_applied) == 3

    def test_skips_non_sdm_nodes(self):
-        """Non-SmartDecisionMaker nodes are untouched."""
+        """Non-Orchestrator nodes are untouched."""
        fixer = AgentFixer()
        other_node = {
            "id": _uid(),
@@ -202,7 +202,7 @@ class TestFixSmartDecisionMakerBlocks:
        }
        agent = {"nodes": [other_node], "links": []}

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        assert "agent_mode_max_iterations" not in result["nodes"][0]["input_default"]
        assert len(fixer.fixes_applied) == 0
@@ -212,12 +212,12 @@ class TestFixSmartDecisionMakerBlocks:
        fixer = AgentFixer()
        node = {
            "id": _uid(),
-            "block_id": SMART_DECISION_MAKER_BLOCK_ID,
+            "block_id": TOOL_ORCHESTRATOR_BLOCK_ID,
            "metadata": {},
        }
        agent = {"nodes": [node], "links": []}

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        assert "input_default" in result["nodes"][0]
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 10
@@ -227,13 +227,13 @@ class TestFixSmartDecisionMakerBlocks:
        fixer = AgentFixer()
        node = {
            "id": _uid(),
-            "block_id": SMART_DECISION_MAKER_BLOCK_ID,
+            "block_id": TOOL_ORCHESTRATOR_BLOCK_ID,
            "input_default": None,
            "metadata": {},
        }
        agent = {"nodes": [node], "links": []}

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        assert isinstance(result["nodes"][0]["input_default"], dict)
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 10
@@ -255,7 +255,7 @@ class TestFixSmartDecisionMakerBlocks:
            "links": [],
        }

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10  # None → default
@@ -275,7 +275,7 @@ class TestFixSmartDecisionMakerBlocks:
            "links": [],
        }

-        result = fixer.fix_smart_decision_maker_blocks(agent)
+        result = fixer.fix_orchestrator_blocks(agent)

        # First node: 3 defaults filled (agent_mode was already set)
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
@@ -284,7 +284,7 @@ class TestFixSmartDecisionMakerBlocks:
        assert len(fixer.fixes_applied) == 7  # 3 + 4

    def test_registered_in_apply_all_fixes(self):
-        """fix_smart_decision_maker_blocks runs as part of apply_all_fixes."""
+        """fix_orchestrator_blocks runs as part of apply_all_fixes."""
        fixer = AgentFixer()
        agent = {
            "nodes": [_make_sdm_node()],
@@ -295,7 +295,7 @@ class TestFixSmartDecisionMakerBlocks:

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10
-        assert any("SmartDecisionMakerBlock" in fix for fix in fixer.fixes_applied)
+        assert any("OrchestratorBlock" in fix for fix in fixer.fixes_applied)


 # ---------------------------------------------------------------------------
@@ -303,15 +303,15 @@ class TestFixSmartDecisionMakerBlocks:
 # ---------------------------------------------------------------------------


-class TestValidateSmartDecisionMakerBlocks:
-    """Tests for AgentValidator.validate_smart_decision_maker_blocks()."""
+class TestValidateOrchestratorBlocks:
+    """Tests for AgentValidator.validate_orchestrator_blocks()."""

    def test_valid_sdm_with_tools(self):
        """SDM with downstream tool links passes validation."""
        validator = AgentValidator()
        agent = _make_orchestrator_agent()

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is True
        assert len(validator.errors) == 0
@@ -325,7 +325,7 @@ class TestValidateSmartDecisionMakerBlocks:
            "links": [],  # no tool links
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1
@@ -344,20 +344,20 @@ class TestValidateSmartDecisionMakerBlocks:
            ],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1

    def test_no_sdm_nodes_passes(self):
-        """Agent without SmartDecisionMaker nodes passes trivially."""
+        """Agent without Orchestrator nodes passes trivially."""
        validator = AgentValidator()
        agent = {
            "nodes": [_make_input_node(), _make_output_node()],
            "links": [],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is True
        assert len(validator.errors) == 0
@@ -373,7 +373,7 @@ class TestValidateSmartDecisionMakerBlocks:
        )
        agent = {"nodes": [sdm], "links": []}

-        validator.validate_smart_decision_maker_blocks(agent)
+        validator.validate_orchestrator_blocks(agent)

        assert "My Orchestrator" in validator.errors[0]

@@ -392,7 +392,7 @@ class TestValidateSmartDecisionMakerBlocks:
            ],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1
@@ -408,7 +408,7 @@ class TestValidateSmartDecisionMakerBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert any("agent_mode_max_iterations=0" in e for e in validator.errors)
@@ -423,7 +423,7 @@ class TestValidateSmartDecisionMakerBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is True
        assert len(validator.errors) == 0
@@ -438,7 +438,7 @@ class TestValidateSmartDecisionMakerBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert any("unusually high" in e for e in validator.errors)
@@ -453,7 +453,7 @@ class TestValidateSmartDecisionMakerBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert any("non-integer" in e for e in validator.errors)
@@ -468,7 +468,7 @@ class TestValidateSmartDecisionMakerBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert any("invalid" in e and "-5" in e for e in validator.errors)
@@ -488,14 +488,14 @@ class TestValidateSmartDecisionMakerBlocks:
            ],
        }

-        result = validator.validate_smart_decision_maker_blocks(agent)
+        result = validator.validate_orchestrator_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1
        assert "no downstream tool blocks" in validator.errors[0]

    def test_registered_in_validate(self):
-        """validate_smart_decision_maker_blocks runs as part of validate()."""
+        """validate_orchestrator_blocks runs as part of validate()."""
        validator = AgentValidator()
        sdm = _make_sdm_node()
        agent = {
@@ -511,8 +511,8 @@ class TestValidateSmartDecisionMakerBlocks:
        # Build a minimal blocks list with the SDM block info
        blocks = [
            {
-                "id": SMART_DECISION_MAKER_BLOCK_ID,
-                "name": "SmartDecisionMakerBlock",
+                "id": TOOL_ORCHESTRATOR_BLOCK_ID,
+                "name": "OrchestratorBlock",
                "inputSchema": {"properties": {"prompt": {"type": "string"}}},
                "outputSchema": {
                    "properties": {
@@ -557,7 +557,7 @@ class TestValidateSmartDecisionMakerBlocks:
 # ---------------------------------------------------------------------------


-class TestSmartDecisionMakerE2EPipeline:
+class TestOrchestratorE2EPipeline:
    """End-to-end tests: build agent JSON → fix → validate."""

    def test_orchestrator_agent_fix_then_validate(self):
@@ -570,7 +570,7 @@ class TestSmartDecisionMakerE2EPipeline:

        # Verify defaults were applied
        sdm_nodes = [
-            n for n in fixed["nodes"] if n["block_id"] == SMART_DECISION_MAKER_BLOCK_ID
+            n for n in fixed["nodes"] if n["block_id"] == TOOL_ORCHESTRATOR_BLOCK_ID
        ]
        assert len(sdm_nodes) == 1
        assert sdm_nodes[0]["input_default"]["agent_mode_max_iterations"] == 10
@@ -578,7 +578,7 @@ class TestSmartDecisionMakerE2EPipeline:

        # Validate (standalone SDM check)
        validator = AgentValidator()
-        assert validator.validate_smart_decision_maker_blocks(fixed) is True
+        assert validator.validate_orchestrator_blocks(fixed) is True

    def test_bare_sdm_no_tools_fix_then_validate(self):
        """SDM without tools: fixer fills defaults, validator catches error."""
@@ -606,7 +606,7 @@ class TestSmartDecisionMakerE2EPipeline:

        # Validate catches missing tools
        validator = AgentValidator()
-        assert validator.validate_smart_decision_maker_blocks(fixed) is False
+        assert validator.validate_orchestrator_blocks(fixed) is False
        assert any("no downstream tool blocks" in e for e in validator.errors)

    def test_sdm_with_user_set_bounded_iterations(self):
@@ -614,7 +614,7 @@ class TestSmartDecisionMakerE2EPipeline:
        agent = _make_orchestrator_agent()
        # Simulate user setting bounded iterations
        for node in agent["nodes"]:
-            if node["block_id"] == SMART_DECISION_MAKER_BLOCK_ID:
+            if node["block_id"] == TOOL_ORCHESTRATOR_BLOCK_ID:
                node["input_default"]["agent_mode_max_iterations"] = 5
                node["input_default"]["sys_prompt"] = "You are a helpful orchestrator"

@@ -622,7 +622,7 @@ class TestSmartDecisionMakerE2EPipeline:
        fixed = fixer.apply_all_fixes(agent)

        sdm = next(
-            n for n in fixed["nodes"] if n["block_id"] == SMART_DECISION_MAKER_BLOCK_ID
+            n for n in fixed["nodes"] if n["block_id"] == TOOL_ORCHESTRATOR_BLOCK_ID
        )
        assert sdm["input_default"]["agent_mode_max_iterations"] == 5
        assert sdm["input_default"]["sys_prompt"] == "You are a helpful orchestrator"
@@ -638,8 +638,8 @@ class TestSmartDecisionMakerE2EPipeline:

        blocks = [
            {
-                "id": SMART_DECISION_MAKER_BLOCK_ID,
-                "name": "SmartDecisionMakerBlock",
+                "id": TOOL_ORCHESTRATOR_BLOCK_ID,
+                "name": "OrchestratorBlock",
                "inputSchema": {
                    "properties": {
                        "prompt": {"type": "string"},
@@ -709,5 +709,5 @@ class TestSmartDecisionMakerE2EPipeline:
        assert is_valid, f"Validation failed: {error_msg}"

        # SDM-specific validation should pass (has tool links)
-        sdm_errors = [e for e in validator.errors if "SmartDecisionMakerBlock" in e]
+        sdm_errors = [e for e in validator.errors if "OrchestratorBlock" in e]
        assert len(sdm_errors) == 0, f"Unexpected SDM errors: {sdm_errors}"
--- a/autogpt_platform/db/docker/docker-compose.yml
+++ b/autogpt_platform/db/docker/docker-compose.yml
@@ -66,6 +66,9 @@ services:
    container_name: supabase-kong
    image: kong:2.8.1
    restart: unless-stopped
+    networks:
+      - default
+      - shared-network
    ports:
      - 8000:8000/tcp
      - 8443:8443/tcp
@@ -407,6 +410,9 @@ services:
    container_name: supabase-db
    image: supabase/postgres:15.8.1.049
    restart: unless-stopped
+    networks:
+      - default
+      - app-network
    volumes:
      - ./volumes/db/realtime.sql:/docker-entrypoint-initdb.d/migrations/99-realtime.sql:Z
      # Must be superuser to create event trigger
@@ -538,5 +544,11 @@ services:
        "/app/bin/migrate && /app/bin/supavisor eval \"$$(cat /etc/pooler/pooler.exs)\" && /app/bin/server"
      ]

+networks:
+  shared-network:
+    name: shared-network
+  app-network:
+    name: app-network
+
 volumes:
  supabase-config:
--- a/autogpt_platform/db/docker/reset.sh
+++ b/autogpt_platform/db/docker/reset.sh
@@ -10,6 +10,12 @@ then
 fi

 echo "Stopping and removing all containers..."
+# Use the platform compose to tear everything down so no orphan containers remain
+# (the platform compose manages supabase containers via `extends`, using the
+# standalone supabase compose here would leave orphans that conflict on next start)
+if [ -f "../../docker-compose.yml" ]; then
+  docker compose -f ../../docker-compose.yml down -v --remove-orphans
+fi
 docker compose -f docker-compose.yml -f ./dev/docker-compose.dev.yml down -v --remove-orphans

 echo "Cleaning up bind-mounted directories..."
--- a/autogpt_platform/docker-compose.platform.yml
+++ b/autogpt_platform/docker-compose.platform.yml
@@ -114,6 +114,8 @@ services:
      <<: *backend-env
    ports:
      - "8006:8006"
+    volumes:
+      - workspace-data:/app/autogpt_platform/backend/workspaces
    networks:
      - app-network
    logging:
@@ -185,6 +187,8 @@ services:
      PYTHONUNBUFFERED: "1"
    ports:
      - "8008:8008"
+    volumes:
+      - workspace-data:/app/autogpt_platform/backend/workspaces
    networks:
      - app-network
    logging:
@@ -368,6 +372,9 @@ services:
      SUPABASE_URL: http://kong:8000
      AGPT_SERVER_URL: http://rest_server:8006/api
      AGPT_WS_SERVER_URL: ws://websocket_server:8001/ws
+volumes:
+  workspace-data:
+
 networks:
  app-network:
    driver: bridge
--- a/autogpt_platform/docker-compose.yml
+++ b/autogpt_platform/docker-compose.yml
@@ -7,6 +7,7 @@ networks:
 volumes:
  supabase-config:
  clamav-data:
+  workspace-data:

 x-agpt-services:
  &agpt-services
--- a/autogpt_platform/frontend/package.json
+++ b/autogpt_platform/frontend/package.json
@@ -73,7 +73,7 @@
    "@vercel/analytics": "1.5.0",
    "@vercel/speed-insights": "1.2.0",
    "@xyflow/react": "12.9.2",
-    "ai": "6.0.59",
+    "ai": "6.0.134",
    "boring-avatars": "1.11.2",
    "canvas-confetti": "1.9.4",
    "class-variance-authority": "0.7.1",
--- a/autogpt_platform/frontend/pnpm-lock.yaml
+++ b/autogpt_platform/frontend/pnpm-lock.yaml
@@ -142,8 +142,8 @@ importers:
        specifier: 12.9.2
        version: 12.9.2(@types/react@18.3.17)(immer@11.1.3)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
      ai:
-        specifier: 6.0.59
-        version: 6.0.59(zod@3.25.76)
+        specifier: 6.0.134
+        version: 6.0.134(zod@3.25.76)
      boring-avatars:
        specifier: 1.11.2
        version: 1.11.2
@@ -448,16 +448,32 @@ packages:
    peerDependencies:
      zod: ^3.25.76 || ^4.1.8

+  '@ai-sdk/gateway@3.0.77':
+    resolution: {integrity: sha512-UdwIG2H2YMuntJQ5L+EmED5XiwnlvDT3HOmKfVFxR4Nq/RSLFA/HcchhwfNXHZ5UJjyuL2VO0huLbWSZ9ijemQ==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
  '@ai-sdk/provider-utils@4.0.10':
    resolution: {integrity: sha512-VeDAiCH+ZK8Xs4hb9Cw7pHlujWNL52RKe8TExOkrw6Ir1AmfajBZTb9XUdKOZO08RwQElIKA8+Ltm+Gqfo8djQ==}
    engines: {node: '>=18'}
    peerDependencies:
      zod: ^3.25.76 || ^4.1.8

+  '@ai-sdk/provider-utils@4.0.21':
+    resolution: {integrity: sha512-MtFUYI1/8mgDvRmaBDjbLJPFFrMG777AvSgyIFQtZHIMzm88R/12vYBBpnk7pfiWLFE1DSZzY4WDYzGbKAcmiw==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
  '@ai-sdk/provider@3.0.5':
    resolution: {integrity: sha512-2Xmoq6DBJqmSl80U6V9z5jJSJP7ehaJJQMy2iFUqTay06wdCqTnPVBBQbtEL8RCChenL+q5DC5H5WzU3vV3v8w==}
    engines: {node: '>=18'}

+  '@ai-sdk/provider@3.0.8':
+    resolution: {integrity: sha512-oGMAgGoQdBXbZqNG0Ze56CHjDZ1IDYOwGYxYjO5KLSlz5HiNQ9udIXsPZ61VWaHGZ5XW/jyjmr6t2xz2jGVwbQ==}
+    engines: {node: '>=18'}
+
  '@ai-sdk/react@3.0.61':
    resolution: {integrity: sha512-vCjZBnY2+TawFBXamSKt6elAt9n1MXMfcjSd9DSgT9peCJN27qNGVSXgaGNh/B3cUgeOktFfhB2GVmIqOjvmLQ==}
    engines: {node: '>=18'}
@@ -4053,6 +4069,12 @@ packages:
    resolution: {integrity: sha512-MnA+YT8fwfJPgBx3m60MNqakm30XOkyIoH1y6huTQvC0PwZG7ki8NacLBcrPbNoo8vEZy7Jpuk7+jMO+CUovTQ==}
    engines: {node: '>= 14'}

+  ai@6.0.134:
+    resolution: {integrity: sha512-YalNEaavld/kE444gOcsMKXdVVRGEe0SK77fAFcWYcqLg+a7xKnEet8bdfrEAJTfnMjj01rhgrIL10903w1a5Q==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      zod: ^3.25.76 || ^4.1.8
+
  ai@6.0.59:
    resolution: {integrity: sha512-9SfCvcr4kVk4t8ZzIuyHpuL1hFYKsYMQfBSbBq3dipXPa+MphARvI8wHEjNaRqYl3JOsJbWxEBIMqHL0L92mUA==}
    engines: {node: '>=18'}
@@ -8718,6 +8740,13 @@ snapshots:
      '@vercel/oidc': 3.1.0
      zod: 3.25.76

+  '@ai-sdk/gateway@3.0.77(zod@3.25.76)':
+    dependencies:
+      '@ai-sdk/provider': 3.0.8
+      '@ai-sdk/provider-utils': 4.0.21(zod@3.25.76)
+      '@vercel/oidc': 3.1.0
+      zod: 3.25.76
+
  '@ai-sdk/provider-utils@4.0.10(zod@3.25.76)':
    dependencies:
      '@ai-sdk/provider': 3.0.5
@@ -8725,10 +8754,21 @@ snapshots:
      eventsource-parser: 3.0.6
      zod: 3.25.76

+  '@ai-sdk/provider-utils@4.0.21(zod@3.25.76)':
+    dependencies:
+      '@ai-sdk/provider': 3.0.8
+      '@standard-schema/spec': 1.1.0
+      eventsource-parser: 3.0.6
+      zod: 3.25.76
+
  '@ai-sdk/provider@3.0.5':
    dependencies:
      json-schema: 0.4.0

+  '@ai-sdk/provider@3.0.8':
+    dependencies:
+      json-schema: 0.4.0
+
  '@ai-sdk/react@3.0.61(react@18.3.1)(zod@3.25.76)':
    dependencies:
      '@ai-sdk/provider-utils': 4.0.10(zod@3.25.76)
@@ -12798,6 +12838,14 @@ snapshots:
  agent-base@7.1.4:
    optional: true

+  ai@6.0.134(zod@3.25.76):
+    dependencies:
+      '@ai-sdk/gateway': 3.0.77(zod@3.25.76)
+      '@ai-sdk/provider': 3.0.8
+      '@ai-sdk/provider-utils': 4.0.21(zod@3.25.76)
+      '@opentelemetry/api': 1.9.0
+      zod: 3.25.76
+
  ai@6.0.59(zod@3.25.76):
    dependencies:
      '@ai-sdk/gateway': 3.0.27(zod@3.25.76)
@@ -14066,8 +14114,8 @@ snapshots:
      '@typescript-eslint/parser': 8.52.0(eslint@8.57.1)(typescript@5.9.3)
      eslint: 8.57.1
      eslint-import-resolver-node: 0.3.9
-      eslint-import-resolver-typescript: 3.10.1(eslint-plugin-import@2.32.0)(eslint@8.57.1)
-      eslint-plugin-import: 2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-typescript@3.10.1)(eslint@8.57.1)
+      eslint-import-resolver-typescript: 3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1)
+      eslint-plugin-import: 2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1))(eslint@8.57.1)
      eslint-plugin-jsx-a11y: 6.10.2(eslint@8.57.1)
      eslint-plugin-react: 7.37.5(eslint@8.57.1)
      eslint-plugin-react-hooks: 5.2.0(eslint@8.57.1)
@@ -14086,7 +14134,7 @@ snapshots:
    transitivePeerDependencies:
      - supports-color

-  eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0)(eslint@8.57.1):
+  eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1):
    dependencies:
      '@nolyfill/is-core-module': 1.0.39
      debug: 4.4.3
@@ -14097,22 +14145,22 @@ snapshots:
      tinyglobby: 0.2.15
      unrs-resolver: 1.11.1
    optionalDependencies:
-      eslint-plugin-import: 2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-typescript@3.10.1)(eslint@8.57.1)
+      eslint-plugin-import: 2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1))(eslint@8.57.1)
    transitivePeerDependencies:
      - supports-color

-  eslint-module-utils@2.12.1(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-node@0.3.9)(eslint-import-resolver-typescript@3.10.1)(eslint@8.57.1):
+  eslint-module-utils@2.12.1(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-node@0.3.9)(eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1))(eslint@8.57.1):
    dependencies:
      debug: 3.2.7
    optionalDependencies:
      '@typescript-eslint/parser': 8.52.0(eslint@8.57.1)(typescript@5.9.3)
      eslint: 8.57.1
      eslint-import-resolver-node: 0.3.9
-      eslint-import-resolver-typescript: 3.10.1(eslint-plugin-import@2.32.0)(eslint@8.57.1)
+      eslint-import-resolver-typescript: 3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1)
    transitivePeerDependencies:
      - supports-color

-  eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-typescript@3.10.1)(eslint@8.57.1):
+  eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1))(eslint@8.57.1):
    dependencies:
      '@rtsao/scc': 1.1.0
      array-includes: 3.1.9
@@ -14123,7 +14171,7 @@ snapshots:
      doctrine: 2.1.0
      eslint: 8.57.1
      eslint-import-resolver-node: 0.3.9
-      eslint-module-utils: 2.12.1(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-node@0.3.9)(eslint-import-resolver-typescript@3.10.1)(eslint@8.57.1)
+      eslint-module-utils: 2.12.1(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint-import-resolver-node@0.3.9)(eslint-import-resolver-typescript@3.10.1(eslint-plugin-import@2.32.0(@typescript-eslint/parser@8.52.0(eslint@8.57.1)(typescript@5.9.3))(eslint@8.57.1))(eslint@8.57.1))(eslint@8.57.1)
      hasown: 2.0.2
      is-core-module: 2.16.1
      is-glob: 4.0.3
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotPage.ts
@@ -15,46 +15,11 @@ import { useCopilotUIStore } from "./store";
 import { useChatSession } from "./useChatSession";
 import { useCopilotNotifications } from "./useCopilotNotifications";
 import { useCopilotStream } from "./useCopilotStream";
+import { useWorkflowImportAutoSubmit } from "./useWorkflowImportAutoSubmit";

 const TITLE_POLL_INTERVAL_MS = 2_000;
 const TITLE_POLL_MAX_ATTEMPTS = 5;

-/**
- * Extract a prompt from the URL hash fragment.
- * Supports: /copilot#prompt=URL-encoded-text
- * Optionally auto-submits if ?autosubmit=true is in the query string.
- * Returns null if no prompt is present.
- */
-function extractPromptFromUrl(): {
-  prompt: string;
-  autosubmit: boolean;
-} | null {
-  if (typeof window === "undefined") return null;
-
-  const hash = window.location.hash;
-  if (!hash) return null;
-
-  const hashParams = new URLSearchParams(hash.slice(1));
-  const prompt = hashParams.get("prompt");
-
-  if (!prompt || !prompt.trim()) return null;
-
-  const searchParams = new URLSearchParams(window.location.search);
-  const autosubmit = searchParams.get("autosubmit") === "true";
-
-  // Clean up hash + autosubmit param only (preserve other query params)
-  const cleanURL = new URL(window.location.href);
-  cleanURL.hash = "";
-  cleanURL.searchParams.delete("autosubmit");
-  window.history.replaceState(
-    null,
-    "",
-    `${cleanURL.pathname}${cleanURL.search}`,
-  );
-
-  return { prompt: prompt.trim(), autosubmit };
-}
-
 interface UploadedFile {
  file_id: string;
  name: string;
@@ -130,16 +95,23 @@ export function useCopilotPage() {
    breakpoint === "base" || breakpoint === "sm" || breakpoint === "md";

  const pendingFilesRef = useRef<File[]>([]);
+  // Pre-built file parts from workflow import (already uploaded, skip re-upload)
+  const pendingFilePartsRef = useRef<FileUIPart[]>([]);

  // --- Send pending message after session creation ---
  useEffect(() => {
    if (!sessionId || pendingMessage === null) return;
    const msg = pendingMessage;
    const files = pendingFilesRef.current;
+    const prebuiltParts = pendingFilePartsRef.current;
    setPendingMessage(null);
    pendingFilesRef.current = [];
+    pendingFilePartsRef.current = [];

-    if (files.length > 0) {
+    if (prebuiltParts.length > 0) {
+      // Files already uploaded (e.g. workflow import) — send directly
+      sendMessage({ text: msg, files: prebuiltParts });
+    } else if (files.length > 0) {
      setIsUploadingFiles(true);
      void uploadFiles(files, sessionId)
        .then((uploaded) => {
@@ -164,26 +136,11 @@ export function useCopilotPage() {
  }, [sessionId, pendingMessage, sendMessage]);

  // --- Extract prompt from URL hash on mount (e.g. /copilot#prompt=Hello) ---
-  const { setInitialPrompt } = useCopilotUIStore();
-  const hasProcessedUrlPrompt = useRef(false);
-  useEffect(() => {
-    if (hasProcessedUrlPrompt.current) return;
-
-    const urlPrompt = extractPromptFromUrl();
-    if (!urlPrompt) return;
-
-    hasProcessedUrlPrompt.current = true;
-
-    if (urlPrompt.autosubmit) {
-      setPendingMessage(urlPrompt.prompt);
-      void createSession().catch(() => {
-        setPendingMessage(null);
-        setInitialPrompt(urlPrompt.prompt);
-      });
-    } else {
-      setInitialPrompt(urlPrompt.prompt);
-    }
-  }, [createSession, setInitialPrompt]);
+  useWorkflowImportAutoSubmit({
+    createSession,
+    setPendingMessage,
+    pendingFilePartsRef,
+  });

  async function uploadFiles(
    files: File[],
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/useWorkflowImportAutoSubmit.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/useWorkflowImportAutoSubmit.ts
@@ -0,0 +1,122 @@
+import type { FileUIPart } from "ai";
+import { useEffect, useRef } from "react";
+import { useCopilotUIStore } from "./store";
+
+/**
+ * Extract a prompt from the URL hash fragment.
+ * Supports: /copilot#prompt=URL-encoded-text
+ * Optionally auto-submits if ?autosubmit=true is in the query string.
+ * Returns null if no prompt is present.
+ */
+function extractPromptFromUrl(): {
+  prompt: string;
+  autosubmit: boolean;
+  filePart?: FileUIPart;
+} | null {
+  if (typeof window === "undefined") return null;
+
+  const searchParams = new URLSearchParams(window.location.search);
+  const autosubmit = searchParams.get("autosubmit") === "true";
+
+  // Check sessionStorage first (used by workflow import for large prompts)
+  const storedPrompt = sessionStorage.getItem("importWorkflowPrompt");
+  if (storedPrompt) {
+    sessionStorage.removeItem("importWorkflowPrompt");
+
+    // Check for a pre-uploaded workflow file attached to this import
+    let filePart: FileUIPart | undefined;
+    const storedFile = sessionStorage.getItem("importWorkflowFile");
+    if (storedFile) {
+      sessionStorage.removeItem("importWorkflowFile");
+      try {
+        const { fileId, fileName, mimeType } = JSON.parse(storedFile);
+        // Validate fileId is a UUID to prevent path traversal
+        const UUID_RE =
+          /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
+        if (typeof fileId === "string" && UUID_RE.test(fileId)) {
+          filePart = {
+            type: "file",
+            mediaType: mimeType ?? "application/json",
+            filename: fileName ?? "workflow.json",
+            url: `/api/proxy/api/workspace/files/${fileId}/download`,
+          };
+        }
+      } catch {
+        // ignore malformed stored data
+      }
+    }
+
+    // Clean up query params
+    const cleanURL = new URL(window.location.href);
+    cleanURL.searchParams.delete("autosubmit");
+    cleanURL.searchParams.delete("source");
+    window.history.replaceState(
+      null,
+      "",
+      `${cleanURL.pathname}${cleanURL.search}`,
+    );
+    return { prompt: storedPrompt.trim(), autosubmit, filePart };
+  }
+
+  // Fall back to URL hash (e.g. /copilot#prompt=...)
+  const hash = window.location.hash;
+  if (!hash) return null;
+
+  const hashParams = new URLSearchParams(hash.slice(1));
+  const prompt = hashParams.get("prompt");
+
+  if (!prompt || !prompt.trim()) return null;
+
+  // Clean up hash + autosubmit param only (preserve other query params)
+  const cleanURL = new URL(window.location.href);
+  cleanURL.hash = "";
+  cleanURL.searchParams.delete("autosubmit");
+  window.history.replaceState(
+    null,
+    "",
+    `${cleanURL.pathname}${cleanURL.search}`,
+  );
+
+  return { prompt: prompt.trim(), autosubmit };
+}
+
+/**
+ * Hook that checks for workflow import data in sessionStorage / URL on mount,
+ * and auto-submits a new CoPilot session when `autosubmit=true`.
+ *
+ * Extracted from useCopilotPage to keep that hook focused on page-level concerns.
+ */
+export function useWorkflowImportAutoSubmit({
+  createSession,
+  setPendingMessage,
+  pendingFilePartsRef,
+}: {
+  createSession: () => Promise<string | undefined>;
+  setPendingMessage: (msg: string | null) => void;
+  pendingFilePartsRef: React.MutableRefObject<FileUIPart[]>;
+}) {
+  const { setInitialPrompt } = useCopilotUIStore();
+  const hasProcessedUrlPrompt = useRef(false);
+
+  useEffect(() => {
+    if (hasProcessedUrlPrompt.current) return;
+
+    const urlPrompt = extractPromptFromUrl();
+    if (!urlPrompt) return;
+
+    hasProcessedUrlPrompt.current = true;
+
+    if (urlPrompt.autosubmit) {
+      if (urlPrompt.filePart) {
+        pendingFilePartsRef.current = [urlPrompt.filePart];
+      }
+      setPendingMessage(urlPrompt.prompt);
+      void createSession().catch(() => {
+        setPendingMessage(null);
+        setInitialPrompt(urlPrompt.prompt);
+      });
+    } else {
+      setInitialPrompt(urlPrompt.prompt);
+    }
+  }, [createSession, setInitialPrompt, setPendingMessage, pendingFilePartsRef]);
+}
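The two input paths handled by `extractPromptFromUrl` (the `#prompt=` hash and the JSON the import flow leaves in `sessionStorage`) can be exercised without a browser. The sketch below is a hypothetical refactor into pure helpers, `parseHashPrompt` and `parseStoredFile`, that take strings instead of reading `window` or `sessionStorage`. The UUID regex and download URL are copied from the diff; the helper names and the `StoredFilePart` type are assumptions.

```typescript
// Hypothetical pure helpers mirroring extractPromptFromUrl; they take strings
// instead of touching window/sessionStorage, so they run outside a browser.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface StoredFilePart {
  type: "file";
  mediaType: string;
  filename: string;
  url: string;
}

/** Parse a "#prompt=..." fragment; returns the trimmed prompt or null. */
function parseHashPrompt(hash: string): string | null {
  if (!hash) return null;
  // URLSearchParams percent-decodes values (and treats "+" as a space).
  const prompt = new URLSearchParams(hash.replace(/^#/, "")).get("prompt");
  if (!prompt || !prompt.trim()) return null;
  return prompt.trim();
}

/** Build a download-URL file part from stored JSON; rejects non-UUID ids. */
function parseStoredFile(raw: string): StoredFilePart | undefined {
  try {
    const { fileId, fileName, mimeType } = JSON.parse(raw);
    // Only a UUID may be interpolated into the path (path-traversal guard).
    if (typeof fileId !== "string" || !UUID_RE.test(fileId)) return undefined;
    return {
      type: "file",
      mediaType: mimeType ?? "application/json",
      filename: fileName ?? "workflow.json",
      url: `/api/proxy/api/workspace/files/${fileId}/download`,
    };
  } catch {
    // Malformed JSON (or a JSON null, which cannot be destructured) is ignored.
    return undefined;
  }
}

console.log(parseHashPrompt("#prompt=Hello%20world")); // → Hello world
console.log(parseStoredFile('{"fileId":"../../etc"}')); // → undefined
```

Keeping the parsing pure like this would let the UUID guard be unit-tested directly; the caller would then only be responsible for reading and clearing storage and calling `history.replaceState`.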
--- a/autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/selected-views/OutputRenderers/renderers/MarkdownRenderer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/selected-views/OutputRenderers/renderers/MarkdownRenderer.tsx
@@ -169,7 +169,7 @@ function renderMarkdown(
          [remarkMath, { singleDollarTextMath: false }], // Math support for LaTeX
        ]}
        rehypePlugins={[
-          rehypeKatex, // Render math with KaTeX
+          [rehypeKatex, { strict: false }], // Render math with KaTeX
          rehypeHighlight, // Syntax highlighting for code blocks
          rehypeSlug, // Add IDs to headings
          [rehypeAutolinkHeadings, { behavior: "wrap" }], // Make headings clickable
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryActionHeader/LibraryActionHeader.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryActionHeader/LibraryActionHeader.tsx
@@ -1,5 +1,5 @@
+import LibraryImportDialog from "../LibraryImportDialog/LibraryImportDialog";
 import { LibrarySearchBar } from "../LibrarySearchBar/LibrarySearchBar";
-import LibraryUploadAgentDialog from "../LibraryUploadAgentDialog/LibraryUploadAgentDialog";

 interface Props {
  setSearchTerm: (value: string) => void;
@@ -10,13 +10,13 @@ export function LibraryActionHeader({ setSearchTerm }: Props) {
    <>
      <div className="mb-[32px] hidden items-center justify-center gap-4 md:flex">
        <LibrarySearchBar setSearchTerm={setSearchTerm} />
-        <LibraryUploadAgentDialog />
+        <LibraryImportDialog />
      </div>

      {/* Mobile and tablet */}
      <div className="flex flex-col gap-4 p-4 pt-[52px] md:hidden">
-        <div className="flex w-full justify-between">
-          <LibraryUploadAgentDialog />
+        <div className="flex w-full justify-between gap-2">
+          <LibraryImportDialog />
        </div>

        <div className="flex items-center justify-center">
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/LibraryImportDialog.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/LibraryImportDialog.tsx
@@ -0,0 +1,66 @@
+"use client";
+import { Button } from "@/components/atoms/Button/Button";
+import { Dialog } from "@/components/molecules/Dialog/Dialog";
+import {
+  TabsLine,
+  TabsLineList,
+  TabsLineTrigger,
+} from "@/components/molecules/TabsLine/TabsLine";
+import { UploadSimpleIcon } from "@phosphor-icons/react";
+import { useState } from "react";
+import { useLibraryUploadAgentDialog } from "../LibraryUploadAgentDialog/useLibraryUploadAgentDialog";
+import AgentUploadTab from "./components/AgentUploadTab/AgentUploadTab";
+import ExternalWorkflowTab from "./components/ExternalWorkflowTab/ExternalWorkflowTab";
+import { useExternalWorkflowTab } from "./components/ExternalWorkflowTab/useExternalWorkflowTab";
+
+export default function LibraryImportDialog() {
+  const [isOpen, setIsOpen] = useState(false);
+
+  const importWorkflow = useExternalWorkflowTab();
+
+  function handleClose() {
+    setIsOpen(false);
+    importWorkflow.setFileValue("");
+    importWorkflow.setUrlValue("");
+  }
+
+  const upload = useLibraryUploadAgentDialog({ onSuccess: handleClose });
+
+  return (
+    <Dialog
+      title="Import"
+      styling={{ maxWidth: "32rem" }}
+      controlled={{
+        isOpen,
+        set: setIsOpen,
+      }}
+      onClose={handleClose}
+    >
+      <Dialog.Trigger>
+        <Button
+          data-testid="import-button"
+          variant="primary"
+          className="h-[2.78rem] w-full md:w-[10rem]"
+          size="small"
+        >
+          <UploadSimpleIcon width={18} height={18} />
+          <span>Import</span>
+        </Button>
+      </Dialog.Trigger>
+      <Dialog.Content>
+        <TabsLine defaultValue="agent">
+          <TabsLineList>
+            <TabsLineTrigger value="agent">AutoGPT agent</TabsLineTrigger>
+            <TabsLineTrigger value="platform">Another platform</TabsLineTrigger>
+          </TabsLineList>
+
+          {/* Tab: Import from any platform (file upload + n8n URL) */}
+          <ExternalWorkflowTab importWorkflow={importWorkflow} />
+
+          {/* Tab: Upload AutoGPT agent JSON */}
+          <AgentUploadTab upload={upload} />
+        </TabsLine>
+      </Dialog.Content>
+    </Dialog>
+  );
+}
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/AgentUploadTab/AgentUploadTab.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/AgentUploadTab/AgentUploadTab.tsx
@@ -0,0 +1,105 @@
+"use client";
+import { Button } from "@/components/atoms/Button/Button";
+import { FileInput } from "@/components/atoms/FileInput/FileInput";
+import { Input } from "@/components/atoms/Input/Input";
+import { LoadingSpinner } from "@/components/atoms/LoadingSpinner/LoadingSpinner";
+import {
+  Form,
+  FormControl,
+  FormField,
+  FormItem,
+  FormMessage,
+} from "@/components/molecules/Form/Form";
+import { TabsLineContent } from "@/components/molecules/TabsLine/TabsLine";
+import { useLibraryUploadAgentDialog } from "../../../LibraryUploadAgentDialog/useLibraryUploadAgentDialog";
+
+type AgentUploadTabProps = {
+  upload: ReturnType<typeof useLibraryUploadAgentDialog>;
+};
+
+export default function AgentUploadTab({ upload }: AgentUploadTabProps) {
+  return (
+    <TabsLineContent value="agent">
+      <p className="mb-4 text-sm text-neutral-500">
+        Upload a previously exported AutoGPT agent file (.json).
+      </p>
+      <Form
+        form={upload.form}
+        onSubmit={upload.onSubmit}
+        className="flex flex-col justify-center gap-0 px-1"
+      >
+        <FormField
+          control={upload.form.control}
+          name="agentName"
+          render={({ field }) => (
+            <FormItem>
+              <FormControl>
+                <Input
+                  {...field}
+                  id={field.name}
+                  label="Agent name"
+                  className="w-full rounded-[10px]"
+                />
+              </FormControl>
+              <FormMessage />
+            </FormItem>
+          )}
+        />
+        <FormField
+          control={upload.form.control}
+          name="agentDescription"
+          render={({ field }) => (
+            <FormItem>
+              <FormControl>
+                <Input
+                  {...field}
+                  id={field.name}
+                  label="Agent description"
+                  type="textarea"
+                  className="w-full rounded-[10px]"
+                />
+              </FormControl>
+              <FormMessage />
+            </FormItem>
+          )}
+        />
+        <FormField
+          control={upload.form.control}
+          name="agentFile"
+          render={({ field }) => (
+            <FormItem>
+              <FormControl>
+                <FileInput
+                  mode="base64"
+                  value={field.value}
+                  onChange={field.onChange}
+                  accept=".json,application/json"
+                  placeholder="Agent file"
+                  maxFileSize={10 * 1024 * 1024}
+                  showStorageNote={false}
+                  className="mb-8 mt-4"
+                />
+              </FormControl>
+              <FormMessage />
+            </FormItem>
+          )}
+        />
+        <Button
+          type="submit"
+          variant="primary"
+          className="w-full"
+          disabled={!upload.agentObject || upload.isUploading}
+        >
+          {upload.isUploading ? (
+            <div className="flex items-center gap-2">
+              <LoadingSpinner size="small" className="text-white" />
+              <span>Uploading...</span>
+            </div>
+          ) : (
+            "Upload"
+          )}
+        </Button>
+      </Form>
+    </TabsLineContent>
+  );
+}
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/ExternalWorkflowTab/ExternalWorkflowTab.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/ExternalWorkflowTab/ExternalWorkflowTab.tsx
@@ -0,0 +1,99 @@
+"use client";
+import { Button } from "@/components/atoms/Button/Button";
+import { FileInput } from "@/components/atoms/FileInput/FileInput";
+import { Input } from "@/components/atoms/Input/Input";
+import { LoadingSpinner } from "@/components/atoms/LoadingSpinner/LoadingSpinner";
+import { TabsLineContent } from "@/components/molecules/TabsLine/TabsLine";
+import { useExternalWorkflowTab } from "./useExternalWorkflowTab";
+
+const N8N_EXAMPLES = [
+  { label: "Build Your First AI Agent", url: "https://n8n.io/workflows/6270" },
+  { label: "Interactive AI Chat Agent", url: "https://n8n.io/workflows/5819" },
+];
+
+type ExternalWorkflowTabProps = {
+  importWorkflow: ReturnType<typeof useExternalWorkflowTab>;
+};
+
+export default function ExternalWorkflowTab({
+  importWorkflow,
+}: ExternalWorkflowTabProps) {
+  return (
+    <TabsLineContent value="platform">
+      <p className="mb-4 text-sm text-neutral-500">
+        Upload a workflow exported from n8n, Make.com, Zapier, or any other
+        platform. AutoPilot will convert it into an AutoGPT agent for you.
+      </p>
+      <FileInput
+        mode="base64"
+        value={importWorkflow.fileValue}
+        onChange={importWorkflow.setFileValue}
+        accept=".json,application/json"
+        placeholder="Workflow file (n8n, Make.com, Zapier, ...)"
+        maxFileSize={10 * 1024 * 1024}
+        showStorageNote={false}
+        className="mb-4 mt-2"
+      />
+      <Button
+        type="button"
+        variant="primary"
+        className="w-full"
+        disabled={!importWorkflow.fileValue || importWorkflow.isSubmitting}
+        onClick={() => importWorkflow.submitWithMode("file")}
+      >
+        {importWorkflow.submittingMode === "file" ? (
+          <div className="flex items-center gap-2">
+            <LoadingSpinner size="small" className="text-white" />
+            <span>Importing...</span>
+          </div>
+        ) : (
+          "Import to AutoPilot"
+        )}
+      </Button>
+
+      <div className="my-5 flex items-center gap-3">
+        <div className="h-px flex-1 bg-neutral-200" />
+        <span className="text-xs text-neutral-400">or import from URL</span>
+        <div className="h-px flex-1 bg-neutral-200" />
+      </div>
+
+      <div className="mb-3 flex flex-wrap gap-2">
+        {N8N_EXAMPLES.map((p) => (
+          <button
+            key={p.label}
+            type="button"
+            disabled={importWorkflow.isSubmitting}
+            onClick={() => importWorkflow.setUrlValue(p.url)}
+            className="rounded-full border border-neutral-200 px-3 py-1 text-xs text-neutral-600 hover:border-purple-400 hover:text-purple-600 disabled:opacity-50"
+          >
+            {p.label}
+          </button>
+        ))}
+      </div>
+      <Input
+        id="template-url"
+        value={importWorkflow.urlValue}
+        onChange={(e) => importWorkflow.setUrlValue(e.target.value)}
+        label="Workflow URL"
+        placeholder="https://n8n.io/workflows/1234"
+        className="mb-4 w-full rounded-[10px]"
+      />
+      <Button
+        type="button"
+        variant="primary"
+        className="w-full"
+        disabled={!importWorkflow.urlValue || importWorkflow.isSubmitting}
+        onClick={() => importWorkflow.submitWithMode("url")}
+      >
+        {importWorkflow.submittingMode === "url" ? (
+          <div className="flex items-center gap-2">
+            <LoadingSpinner size="small" className="text-white" />
+            <span>Importing...</span>
+          </div>
+        ) : (
+          "Import from URL"
+        )}
+      </Button>
+    </TabsLineContent>
+  );
+}
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/ExternalWorkflowTab/fetchWorkflowFromUrl.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/ExternalWorkflowTab/fetchWorkflowFromUrl.ts
@@ -0,0 +1,85 @@
+"use server";
+
+/**
+ * Regex to extract the numeric template ID from various n8n URL formats:
+ *   - https://n8n.io/workflows/1234
+ *   - https://n8n.io/workflows/1234-some-slug
+ *   - https://api.n8n.io/api/templates/workflows/1234
+ */
+const N8N_TEMPLATE_ID_RE = /n8n\.io\/(?:api\/templates\/)?workflows\/(\d+)/i;
+
+/** Hardcoded n8n templates API base — the only URL we ever fetch. */
+const N8N_TEMPLATES_API = "https://api.n8n.io/api/templates/workflows";
+
+/** Max response body size (10 MB) to prevent memory exhaustion. */
+const MAX_RESPONSE_BYTES = 10 * 1024 * 1024;
+
+export type FetchWorkflowResult =
+  | { ok: true; json: string }
+  | { ok: false; error: string };
+
+/**
+ * Server action that fetches a workflow JSON from an n8n template URL.
+ * Runs server-side so there are no CORS restrictions.
+ *
+ * Returns a result object instead of throwing because Next.js
+ * server actions do not propagate error messages to the client.
+ *
+ * Only n8n.io workflow URLs are accepted. The template ID is extracted
+ * and used to call the hardcoded n8n API — the user-supplied URL is
+ * never passed to fetch() directly (SSRF prevention).
+ */
+export async function fetchWorkflowFromUrl(
+  url: string,
+): Promise<FetchWorkflowResult> {
+  const match = url.match(N8N_TEMPLATE_ID_RE);
+  if (!match) {
+    return {
+      ok: false,
+      error:
+        "Invalid or unsupported URL. " +
+        "URL import is supported for n8n.io workflow templates " +
+        "(e.g. https://n8n.io/workflows/1234). " +
+        "For other platforms, use file upload.",
+    };
+  }
+
+  const templateId = match[1]; // purely numeric, safe to interpolate
+
+  try {
+    const json = await fetchN8nWorkflow(templateId);
+    return { ok: true, json };
+  } catch (err) {
+    return {
+      ok: false,
+      error: err instanceof Error ? err.message : "Failed to fetch workflow.",
+    };
+  }
+}
+
+async function fetchN8nWorkflow(templateId: string): Promise<string> {
+  // Only ever fetch from the hardcoded API base + numeric ID.
+  // parseInt + toString round-trips to guarantee the value is purely numeric,
+  // preventing any path-traversal or SSRF via the interpolated segment.
+  const safeId = parseInt(templateId, 10);
+  if (!Number.isFinite(safeId) || safeId <= 0) {
+    throw new Error("Invalid template ID");
+  }
+  const res = await fetch(`${N8N_TEMPLATES_API}/${safeId.toString()}`);
+  if (!res.ok) throw new Error(`n8n template not found (${res.status})`);
+
+  const contentLength = res.headers.get("content-length");
+  if (contentLength && parseInt(contentLength, 10) > MAX_RESPONSE_BYTES) {
+    throw new Error("Response too large.");
+  }
+
+  const text = await res.text();
+  if (text.length > MAX_RESPONSE_BYTES) throw new Error("Response too large.");
+
+  const data = JSON.parse(text);
+  const template = data?.workflow ?? data;
+  const workflow = template?.workflow ?? template;
+  if (!workflow?.nodes) throw new Error("Unexpected n8n API response format");
+  if (!workflow.name) workflow.name = template?.name ?? data?.name ?? "";
+  return JSON.stringify(workflow);
+}
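`fetchWorkflowFromUrl` never passes the user's URL to `fetch()`; it reduces the input to one positive integer and interpolates that into the hardcoded `N8N_TEMPLATES_API` base. A minimal sketch of that reduction, plus the response unwrapping from `fetchN8nWorkflow`, follows. The regex is copied from the diff; `extractTemplateId` and `unwrapWorkflow` are hypothetical names, and the nested `workflow.workflow` response shape is assumed only from how the diff unwraps it.

```typescript
// Regex copied from the diff; helper names are hypothetical.
const N8N_TEMPLATE_ID_RE = /n8n\.io\/(?:api\/templates\/)?workflows\/(\d+)/i;

/** Reduce user input to a positive integer template ID, or null. */
function extractTemplateId(url: string): number | null {
  const match = url.match(N8N_TEMPLATE_ID_RE);
  if (!match) return null;
  // Round-trip through parseInt so only a number can reach the request path.
  const id = parseInt(match[1], 10);
  return Number.isFinite(id) && id > 0 ? id : null;
}

/** Unwrap the workflow the same way fetchN8nWorkflow does. */
function unwrapWorkflow(data: any): { name: string; nodes: unknown[] } {
  const template = data?.workflow ?? data; // template record, or the workflow itself
  const workflow = template?.workflow ?? template; // nested workflow, if present
  if (!workflow?.nodes) throw new Error("Unexpected n8n API response format");
  // Fall back to the template-level name when the workflow has none.
  if (!workflow.name) workflow.name = template?.name ?? data?.name ?? "";
  return workflow;
}

const apiBase = "https://api.n8n.io/api/templates/workflows";
const exampleId = extractTemplateId(
  "https://n8n.io/workflows/6270-build-your-first-ai-agent",
);
console.log(exampleId === null ? "rejected" : `${apiBase}/${exampleId}`);
// → https://api.n8n.io/api/templates/workflows/6270
```

The regex is unanchored, so a URL on another host that merely contains `n8n.io/workflows/42` still yields `42`. That is harmless here because only the ID reaches `fetch()`; the SSRF protection comes from the hardcoded base, not from the pattern match.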
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/ExternalWorkflowTab/useExternalWorkflowTab.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryImportDialog/components/ExternalWorkflowTab/useExternalWorkflowTab.ts
@@ -0,0 +1,114 @@
+import { useToast } from "@/components/molecules/Toast/use-toast";
+import { uploadFileDirect } from "@/lib/direct-upload";
+import { useRouter } from "next/navigation";
+import { useState } from "react";
+import { fetchWorkflowFromUrl } from "./fetchWorkflowFromUrl";
+
+function decodeBase64Json(dataUrl: string): string {
+  const match = dataUrl.match(/^data:[^;]+;base64,(.+)$/);
+  if (!match) throw new Error("Could not read the uploaded file.");
+  const binary = atob(match[1]);
+  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
+  const json = new TextDecoder().decode(bytes);
+  JSON.parse(json); // validate — throws SyntaxError if invalid
+  return json;
+}
+
+async function uploadJsonAsFile(
+  jsonString: string,
+): Promise<{ fileId: string; fileName: string; mimeType: string }> {
+  const file = new File(
+    [new Blob([jsonString], { type: "application/json" })],
+    `workflow-${crypto.randomUUID()}.json`,
+    { type: "application/json" },
+  );
+  const uploaded = await uploadFileDirect(file);
+  return {
+    fileId: uploaded.file_id,
+    fileName: uploaded.name,
+    mimeType: uploaded.mime_type,
+  };
+}
+
+function storeAndRedirect(
+  fileInfo: { fileId: string; fileName: string; mimeType: string },
+  router: ReturnType<typeof useRouter>,
+) {
+  sessionStorage.setItem(
+    "importWorkflowPrompt",
+    "Import this workflow and recreate it as an AutoGPT agent",
+  );
+  sessionStorage.setItem("importWorkflowFile", JSON.stringify(fileInfo));
+  router.push("/copilot?source=import&autosubmit=true");
+}
+
+export function useExternalWorkflowTab() {
+  const { toast } = useToast();
+  const router = useRouter();
+  const [fileValue, setFileValue] = useState("");
+  const [urlValue, setUrlValue] = useState("");
+  const [submittingMode, setSubmittingMode] = useState<"url" | "file" | null>(
+    null,
+  );
+  const isSubmitting = submittingMode !== null;
+
+  async function submitWithMode(mode: "url" | "file") {
+    setSubmittingMode(mode);
+    try {
+      const jsonString = await resolveJson(mode);
+      if (!jsonString) return;
+      storeAndRedirect(await uploadJsonAsFile(jsonString), router);
+    } catch (err) {
+      toast({
+        title: "Upload failed",
+        description:
+          err instanceof Error ? err.message : "Could not upload the file.",
+        variant: "destructive",
+      });
+    } finally {
+      setSubmittingMode(null);
+    }
+  }
+
+  async function resolveJson(mode: "url" | "file"): Promise<string | null> {
+    if (mode === "url") {
+      const result = await fetchWorkflowFromUrl(urlValue);
+      if (!result.ok) {
+        toast({
+          title: "Could not fetch workflow",
+          description: result.error,
+          variant: "destructive",
+        });
+        return null;
+      }
+      setUrlValue("");
+      return result.json;
+    }
+
+    try {
+      const json = decodeBase64Json(fileValue);
+      setFileValue("");
+      return json;
+    } catch (err) {
+      const isParseError = err instanceof SyntaxError;
+      toast({
+        title: isParseError ? "Invalid JSON" : "Invalid file",
+        description: isParseError
+          ? "The uploaded file is not valid JSON."
+          : "Could not read the uploaded file.",
+        variant: "destructive",
+      });
+      return null;
+    }
+  }
+
+  return {
+    submitWithMode,
+    fileValue,
+    setFileValue,
+    urlValue,
+    setUrlValue,
+    isSubmitting,
+    submittingMode,
+  };
+}
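`decodeBase64Json` goes through `TextDecoder` rather than returning `atob()` directly because `atob()` produces one character per byte, which garbles any non-ASCII text (node names, descriptions) in an uploaded workflow. The sketch below reproduces the helper from the diff and adds a hypothetical `toJsonDataUrl` encoder so the round trip can be checked; it assumes `FileInput` in `base64` mode yields a `data:<mime>;base64,<payload>` string, as the helper's regex expects.

```typescript
// decodeBase64Json copied from the diff; toJsonDataUrl is a hypothetical
// encoder standing in for FileInput's base64 mode.
function decodeBase64Json(dataUrl: string): string {
  const match = dataUrl.match(/^data:[^;]+;base64,(.+)$/);
  if (!match) throw new Error("Could not read the uploaded file.");
  const binary = atob(match[1]); // one char per byte (a "binary string")
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  const json = new TextDecoder().decode(bytes); // reassemble UTF-8 sequences
  JSON.parse(json); // validate; throws SyntaxError if invalid
  return json;
}

function toJsonDataUrl(text: string): string {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return `data:application/json;base64,${btoa(binary)}`;
}

const original = '{"name":"Café ☕","nodes":[]}';
const dataUrl = toJsonDataUrl(original);
console.log(decodeBase64Json(dataUrl) === original); // → true
console.log(atob(dataUrl.split(",")[1]) === original); // → false (mojibake)
```

Because validation happens inside the helper, the caller can distinguish a `SyntaxError` (bad JSON) from a plain `Error` (unreadable data URL), which is exactly how `resolveJson` picks its toast title.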
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryUploadAgentDialog/useLibraryUploadAgentDialog.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/LibraryUploadAgentDialog/useLibraryUploadAgentDialog.ts
@@ -9,7 +9,9 @@ import { useForm } from "react-hook-form";
 import { z } from "zod";
 import { uploadAgentFormSchema } from "./LibraryUploadAgentDialog";

-export function useLibraryUploadAgentDialog() {
+export function useLibraryUploadAgentDialog(options?: {
+  onSuccess?: () => void;
+}) {
  const [isOpen, setIsOpen] = useState(false);
  const { toast } = useToast();
  const [agentObject, setAgentObject] = useState<Graph | null>(null);
@@ -19,6 +21,7 @@ export function useLibraryUploadAgentDialog() {
      mutation: {
        onSuccess: ({ data }) => {
          setIsOpen(false);
+          options?.onSuccess?.();
          toast({
            title: "Success",
            description: "Agent uploaded successfully",
@@ -114,7 +117,7 @@ export function useLibraryUploadAgentDialog() {
    }
  }, [agentFileValue, form, toast]);

-  const onSubmit = async (values: z.infer<typeof uploadAgentFormSchema>) => {
+  async function onSubmit(values: z.infer<typeof uploadAgentFormSchema>) {
    if (!agentObject) {
      form.setError("root", { message: "No Agent object to save" });
      return;
@@ -133,7 +136,7 @@ export function useLibraryUploadAgentDialog() {
        source: "upload",
      },
    });
-  };
+  }

  return {
    onSubmit,