feat(backend/llm-registry): wire refresh_runtime_caches to Redis invalidation and pub/sub

After any admin DB mutation, clear the shared Redis cache, refresh this process's in-memory state, then publish a notification so all other workers reload from Redis without hitting the database.
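
A minimal sketch of the flow described above, not the actual implementation. `FakeRedis`, `FakeDatabase`, `Worker`, the key `llm_registry:models` and the channel `llm_registry:refresh` are all hypothetical stand-ins; only the name `refresh_runtime_caches` comes from this commit. It also assumes the refreshing process is the one that repopulates Redis from the database, which is what lets the other workers reload without a DB read.

```python
from typing import Callable, Dict, List, Optional

CACHE_KEY = "llm_registry:models"   # hypothetical shared cache key
CHANNEL = "llm_registry:refresh"    # hypothetical pub/sub channel


class FakeRedis:
    """In-memory stand-in for a Redis client (illustration only)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.handlers: Dict[str, List[Callable[[str], None]]] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def set(self, key: str, value: str) -> None:
        self.store[key] = value

    def delete(self, key: str) -> None:
        self.store.pop(key, None)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self.handlers.setdefault(channel, []).append(handler)

    def publish(self, channel: str, message: str) -> None:
        # Deliver synchronously to every subscriber, including the sender.
        for handler in list(self.handlers.get(channel, [])):
            handler(message)


class FakeDatabase:
    """Counts reads so the demo can show which workers touch the database."""

    def __init__(self, value: str) -> None:
        self.value = value
        self.reads = 0

    def load_models(self) -> str:
        self.reads += 1
        return self.value


class Worker:
    def __init__(self, name: str, redis: FakeRedis, db: FakeDatabase) -> None:
        self.name = name
        self.redis = redis
        self.db = db
        self.models: Optional[str] = None
        redis.subscribe(CHANNEL, self.on_refresh)

    def _reload(self) -> None:
        # Prefer the shared cache; fall back to the database on a miss.
        cached = self.redis.get(CACHE_KEY)
        if cached is None:
            cached = self.db.load_models()
            self.redis.set(CACHE_KEY, cached)
        self.models = cached

    def refresh_runtime_caches(self) -> None:
        self.redis.delete(CACHE_KEY)            # 1. clear the shared Redis cache
        self._reload()                          # 2. refresh this process (refills Redis)
        self.redis.publish(CHANNEL, self.name)  # 3. notify the other workers

    def on_refresh(self, sender: str) -> None:
        if sender != self.name:                 # the sender has already refreshed
            self._reload()                      # cache hit, so no database read


redis = FakeRedis()
db = FakeDatabase("model-list-v1")
worker_a = Worker("a", redis, db)
worker_b = Worker("b", redis, db)

db.value = "model-list-v2"          # an admin mutation lands in the database
worker_a.refresh_runtime_caches()   # called by the worker that handled it
```

In this sketch only `worker_a` reads the database; `worker_b` picks up the new value from the shared cache when the notification arrives.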
fix(backend/llm-registry): enforce single recommended model in update_model
2026-04-08 03:00:28 -04:00 · 2026-04-07 18:35:41 +01:00 · 2026-04-07 18:35:08 +01:00 · 2026-04-07 18:35:08 +01:00 · 2026-04-07 18:35:08 +01:00 · 2026-04-07 18:35:08 +01:00
78 changed files with 5561 additions and 5270 deletions
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -17,14 +17,6 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
 gh pr view {N}
 ```

-## Read the PR description
-
-Understand the **Why / What / How** before addressing comments — you need context to make good fixes:
-
-```bash
-gh pr view {N} --json body --jq '.body'
-```
-
 ## Fetch comments (all sources)

 ### 1. Inline review threads — GraphQL (primary source of actionable items)
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -17,16 +17,6 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
 gh pr view {N}
 ```

-## Read the PR description
-
-Before reading code, understand the **why**, **what**, and **how** from the PR description:
-
-```bash
-gh pr view {N} --json body --jq '.body'
-```
-
-Every PR should have a Why / What / How structure. If any of these are missing, note it as feedback.
-
 ## Read the diff

 ```bash
@@ -44,8 +34,6 @@ gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews

 ## What to check

-**Description quality:** Does the PR description cover Why (motivation/problem), What (summary of changes), and How (approach/implementation details)? If any are missing, request them — you can't judge the approach without understanding the problem and intent.
-
 **Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).

 **Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -5,96 +5,13 @@ user-invocable: true
 argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
 metadata:
  author: autogpt-team
-  version: "2.0.0"
+  version: "1.0.0"
 ---

 # Manual E2E Test

 Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.

-## Critical Requirements
-
-These are NON-NEGOTIABLE. Every test run MUST satisfy ALL the following:
-
-### 1. Screenshots at Every Step
-- Take a screenshot at EVERY significant test step — not just at the end
-- Every test scenario MUST have at least one BEFORE and one AFTER screenshot
-- Name screenshots sequentially: `{NN}-{action}-{state}.png` (e.g., `01-credits-before.png`, `02-credits-after.png`)
-- If a screenshot is missing for a scenario, the test is INCOMPLETE — go back and take it
-
-### 2. Screenshots MUST Be Posted to PR
-- Push ALL screenshots to a temp branch `test-screenshots/pr-{N}`
-- Post a PR comment with ALL screenshots embedded inline using GitHub raw URLs
-- This is NOT optional — every test run MUST end with a PR comment containing screenshots
-- If screenshot upload fails, retry. If it still fails, list failed files and require manual drag-and-drop/paste attachment in the PR comment
-
-### 3. State Verification with Before/After Evidence
-- For EVERY state-changing operation (API call, user action), capture the state BEFORE and AFTER
-- Log the actual API response values (e.g., `credits_before=100, credits_after=95`)
-- Screenshot MUST show the relevant UI state change
-- Compare expected vs actual values explicitly — do not just eyeball it
-
-### 4. Negative Test Cases Are Mandatory
-- Test at least ONE negative case per feature (e.g., insufficient credits, invalid input, unauthorized access)
-- Verify error messages are user-friendly and accurate
-- Verify the system state did NOT change after a rejected operation
-
-### 5. Test Report Must Include Full Evidence
-Each test scenario in the report MUST have:
-- **Steps**: What was done (exact commands or UI actions)
-- **Expected**: What should happen
-- **Actual**: What actually happened
-- **API Evidence**: Before/after API response values for state-changing operations
-- **Screenshot Evidence**: Before/after screenshots with explanations
-
-## State Manipulation for Realistic Testing
-
-When testing features that depend on specific states (rate limits, credits, quotas):
-
-1. **Use Redis CLI to set counters directly:**
-   ```bash
-   # Find the Redis container
-   REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
-   # Set a key with expiry
-   docker exec $REDIS_CONTAINER redis-cli SET key value EX ttl
-   # Example: Set rate limit counter to near-limit
-   docker exec $REDIS_CONTAINER redis-cli SET "rate_limit:user:test@test.com" 99 EX 3600
-   # Example: Check current value
-   docker exec $REDIS_CONTAINER redis-cli GET "rate_limit:user:test@test.com"
-   ```
-
-2. **Use API calls to check before/after state:**
-   ```bash
-   # BEFORE: Record current state
-   BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
-   echo "Credits BEFORE: $BEFORE"
-
-   # Perform the action...
-
-   # AFTER: Record new state and compare
-   AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
-   echo "Credits AFTER: $AFTER"
-   echo "Delta: $(( BEFORE - AFTER ))"
-   ```
-
-3. **Take screenshots BEFORE and AFTER state changes** — the UI must reflect the backend state change
-
-4. **Never rely on mocked/injected browser state** — always use real backend state. Do NOT use `agent-browser eval` to fake UI state. The backend must be the source of truth.
-
-5. **Use direct DB queries when needed:**
-   ```bash
-   # Query via Supabase's PostgREST or docker exec into the DB
-   docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"
-   ```
-
-6. **After every API test, verify the state change actually persisted:**
-   ```bash
-   # Example: After a credits purchase, verify DB matches API
-   API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
-   DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
-   [ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"
-   ```
-
 ## Arguments

 - `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
@@ -136,20 +53,14 @@ Before testing, understand what changed:

 ```bash
 cd $WORKTREE_PATH
-
-# Read PR description to understand the WHY
-gh pr view {N} --json body --jq '.body'
-
 git log --oneline dev..HEAD | head -20
 git diff dev --stat
 ```

-Read the PR description (Why / What / How) and changed files to understand:
-0. **Why** does this PR exist? What problem does it solve?
-1. **What** feature/fix does this PR implement?
-2. **How** does it work? What's the approach?
-3. What components are affected? (backend, frontend, copilot, executor, etc.)
-4. What are the key user-facing behaviors to test?
+Read the changed files to understand:
+1. What feature/fix does this PR implement?
+2. What components are affected? (backend, frontend, copilot, executor, etc.)
+3. What are the key user-facing behaviors to test?

 ## Step 2: Write test scenarios

@@ -164,21 +75,15 @@ Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:

 ## API Tests (if applicable)
 1. [Endpoint] — [expected behavior]
-   - Before state: [what to check before]
-   - After state: [what to verify changed]

 ## UI Tests (if applicable)
 1. [Page/component] — [interaction to test]
-   - Screenshot before: [what to capture]
-   - Screenshot after: [what to capture]

-## Negative Tests (REQUIRED — at least one per feature)
-1. [What should NOT happen] — [how to trigger it]
-   - Expected error: [what error message/code]
-   - State unchanged: [what to verify did NOT change]
+## Negative Tests
+1. [What should NOT happen]
 ```

-**Be critical** — include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.
+**Be critical** — include edge cases, error paths, and security checks.

 ## Step 3: Environment setup

@@ -328,7 +233,7 @@ curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...

 ### API testing

-Use `curl` with the auth token for backend API tests. **For EVERY API call that changes state, record before/after values:**
+Use `curl` with the auth token for backend API tests:

 ```bash
 # Example: List agents
@@ -351,27 +256,6 @@ curl -s -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
 ```

-**State verification pattern (use for EVERY state-changing API call):**
-```bash
-# 1. Record BEFORE state
-BEFORE_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
-echo "BEFORE: $BEFORE_STATE"
-
-# 2. Perform the action
-ACTION_RESULT=$(curl -s -X POST ... | jq .)
-echo "ACTION RESULT: $ACTION_RESULT"
-
-# 3. Record AFTER state
-AFTER_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
-echo "AFTER: $AFTER_STATE"
-
-# 4. Log the comparison
-echo "=== STATE CHANGE VERIFICATION ==="
-echo "Before: $BEFORE_STATE"
-echo "After: $AFTER_STATE"
-echo "Expected change: {describe what should have changed}"
-```
-
 ### Browser testing with agent-browser

 ```bash
@@ -462,90 +346,59 @@ agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --time
 # ... fill chat input and press Enter, wait 20-30s for response
 ```

-## Step 5: Record results and take screenshots
+## Step 5: Record results

-**Take a screenshot at EVERY significant test step** — before and after interactions, on success, and on failure. This is NON-NEGOTIABLE.
-
-**Required screenshot pattern for each test scenario:**
-```bash
-# BEFORE the action
-agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-before.png
-
-# Perform the action...
-
-# AFTER the action
-agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-after.png
-```
-
-**Naming convention:**
-```bash
-# Examples:
-# $RESULTS_DIR/01-login-page-before.png
-# $RESULTS_DIR/02-login-page-after.png
-# $RESULTS_DIR/03-credits-page-before.png
-# $RESULTS_DIR/04-credits-purchase-after.png
-# $RESULTS_DIR/05-negative-insufficient-credits.png
-# $RESULTS_DIR/06-error-state.png
-```
-
-**Minimum requirements:**
-- At least TWO screenshots per test scenario (before + after)
-- At least ONE screenshot for each negative test case showing the error state
-- If a test fails, screenshot the failure state AND any error logs visible in the UI
-
-## Step 6: Show results to user with screenshots
-
-**CRITICAL: After all tests complete, you MUST show every screenshot to the user using the Read tool, with an explanation of what each screenshot shows.** This is the most important part of the test report — the user needs to visually verify the results.
-
-For each screenshot:
-1. Use the `Read` tool to display the PNG file (Claude can read images)
-2. Write a 1-2 sentence explanation below it describing:
-   - What page/state is being shown
-   - What the screenshot proves (which test scenario it validates)
-   - Any notable details visible in the UI
-
-Format the output like this:
+For each test scenario, record in `$RESULTS_DIR/test-report.md`:

 ```markdown
-### Screenshot 1: {descriptive title}
-[Read the PNG file here]
+# E2E Test Report: PR #{N} — {title}
+Date: {date}
+Branch: {branch}
+Worktree: {path}

-**What it shows:** {1-2 sentence explanation of what this screenshot proves}
+## Environment
+- Docker services: [list running containers]
+- API keys: OpenRouter={present/missing}, E2B={present/missing}

----
+## Test Results
+
+### Scenario 1: {name}
+**Steps:**
+1. ...
+2. ...
+**Expected:** ...
+**Actual:** ...
+**Result:** PASS / FAIL
+**Screenshot:** {filename}.png
+**Logs:** (if relevant)
+
+### Scenario 2: {name}
+...
+
+## Summary
+- Total: X scenarios
+- Passed: Y
+- Failed: Z
+- Bugs found: [list]
 ```

-After showing all screenshots, output a **detailed** summary table:
-
-| # | Scenario | Result | API Evidence | Screenshot Evidence |
-|---|----------|--------|-------------|-------------------|
-| 1 | {name} | PASS/FAIL | Before: X, After: Y | 01-before.png, 02-after.png |
-| 2 | ... | ... | ... | ... |
-
-**IMPORTANT:** As you show each screenshot and record test results, persist them in shell variables for Step 7:
-
+Take screenshots at each significant step:
 ```bash
-# Build these variables during Step 6 — they are required by Step 7's script
-# NOTE: declare -A requires Bash 4.0+. This is standard on modern systems (macOS ships zsh
-# but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
-# plain variable with a lookup function instead.
-declare -A SCREENSHOT_EXPLANATIONS=(
-  ["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
-  ["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
-  # ... one entry per screenshot, using the same explanations you showed the user above
-)
-
-TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
-| 2 | Credits purchase | PASS | Before: 100, After: 95 | 03-credits-before.png, 04-credits-after.png |
-| 3 | Insufficient credits (negative) | PASS | Credits: 0, rejected | 05-insufficient-credits-error.png |"
-# ... one row per test scenario with actual results
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{description}.png
 ```

-## Step 7: Post test report as PR comment with screenshots
+## Step 6: Report results

-Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees), then post a comment with inline images and per-screenshot explanations.
+After all tests complete, output a summary to the user:

-**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
+1. Table of all scenarios with PASS/FAIL
+2. Screenshots of failures (read the PNG files to show them)
+3. Any bugs found with details
+4. Recommendations
+
+### Post test results as PR comment with screenshots
+
+Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees).

 ```bash
 # Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
@@ -553,166 +406,93 @@ REPO="Significant-Gravitas/AutoGPT"
 SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
 SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"

-# Step 1: Create blobs for each screenshot and build tree JSON
-# Retry each blob upload up to 3 times. If still failing, list them at end of report.
-shopt -s nullglob
-SCREENSHOT_FILES=("$RESULTS_DIR"/*.png)
-if [ ${#SCREENSHOT_FILES[@]} -eq 0 ]; then
-  echo "ERROR: No screenshots found in $RESULTS_DIR. Test run is incomplete."
-  exit 1
-fi
-TREE_JSON='['
-FIRST=true
-FAILED_UPLOADS=()
-for img in "${SCREENSHOT_FILES[@]}"; do
+# Step 1: Create blobs for each screenshot
+declare -a TREE_ENTRIES
+for img in $RESULTS_DIR/*.png; do
  BASENAME=$(basename "$img")
  B64=$(base64 < "$img")
-  BLOB_SHA=""
-  for attempt in 1 2 3; do
-    BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha' 2>/dev/null || true)
-    [ -n "$BLOB_SHA" ] && break
-    sleep 1
-  done
-  if [ -z "$BLOB_SHA" ]; then
-    FAILED_UPLOADS+=("$img")
-    continue
-  fi
+  BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
+  TREE_ENTRIES+=("-f" "tree[][path]=${SCREENSHOTS_DIR}/${BASENAME}" "-f" "tree[][mode]=100644" "-f" "tree[][type]=blob" "-f" "tree[][sha]=${BLOB_SHA}")
+done
+
+# Step 2: Create a tree with all screenshot blobs
+# Build the tree JSON manually since gh api doesn't handle arrays well
+TREE_JSON='['
+FIRST=true
+for img in $RESULTS_DIR/*.png; do
+  BASENAME=$(basename "$img")
+  B64=$(base64 < "$img")
+  BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
  if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
  TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
 done
 TREE_JSON+=']'

-# Step 2: Create tree, commit, and branch ref
-TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
+TREE_SHA=$(echo "$TREE_JSON" | gh api "repos/${REPO}/git/trees" --input - -f base_tree="" --jq '.sha' 2>/dev/null \
+  || echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
+
+# Step 3: Create a commit pointing to that tree
 COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
  -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
  -f tree="$TREE_SHA" \
  --jq '.sha')
+
+# Step 4: Create or update the ref (branch) — no local checkout needed
 gh api "repos/${REPO}/git/refs" \
  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
  -f sha="$COMMIT_SHA" 2>/dev/null \
  || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
    -X PATCH -f sha="$COMMIT_SHA" -f force=true
-```

-Then post the comment with **inline images AND explanations for each screenshot**:
-
-```bash
+# Step 5: Build image markdown and post the comment
 REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
-
-# Build image markdown using uploaded image URLs; skip FAILED_UPLOADS (listed separately)
-
 IMAGE_MARKDOWN=""
-for img in "${SCREENSHOT_FILES[@]}"; do
+for img in $RESULTS_DIR/*.png; do
  BASENAME=$(basename "$img")
-  TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
-  # Skip images that failed to upload — they will be listed at the end
-  IS_FAILED=false
-  for failed in "${FAILED_UPLOADS[@]}"; do
-    [ "$(basename "$failed")" = "$BASENAME" ] && IS_FAILED=true && break
-  done
-  if [ "$IS_FAILED" = true ]; then
-    continue
-  fi
-  EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
-  if [ -z "$EXPLANATION" ]; then
-    echo "ERROR: Missing screenshot explanation for $BASENAME. Add it to SCREENSHOT_EXPLANATIONS in Step 6."
-    exit 1
-  fi
-  IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
-### ${TITLE}
-![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
-${EXPLANATION}
-"
+  IMAGE_MARKDOWN="$IMAGE_MARKDOWN
+![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})"
 done

-# Write comment body to file to avoid shell interpretation issues with special characters
-COMMENT_FILE=$(mktemp)
-# If any uploads failed, append a section listing them with instructions
-FAILED_SECTION=""
-if [ ${#FAILED_UPLOADS[@]} -gt 0 ]; then
-  FAILED_SECTION="
-## ⚠️ Failed Screenshot Uploads
-The following screenshots could not be uploaded via the GitHub API after 3 retries.
-**To add them:** drag-and-drop or paste these files into a PR comment manually:
-"
-  for failed in "${FAILED_UPLOADS[@]}"; do
-    FAILED_SECTION="${FAILED_SECTION}
-- \`$(basename "$failed")\` (local path: \`$failed\`)"
-  done
-  FAILED_SECTION="${FAILED_SECTION}
+gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -f body="$(cat <<EOF
+## 🧪 E2E Test Report

-**Run status:** INCOMPLETE until the files above are manually attached and visible inline in the PR."
-fi
-
-cat > "$COMMENT_FILE" <<INNEREOF
-## E2E Test Report
-
-| # | Scenario | Result | API Evidence | Screenshot Evidence |
-|---|----------|--------|-------------|-------------------|
-${TEST_RESULTS_TABLE}
+$(cat $RESULTS_DIR/test-report.md)

+### Screenshots
 ${IMAGE_MARKDOWN}
-${FAILED_SECTION}
-INNEREOF
-
-gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
-rm -f "$COMMENT_FILE"
+EOF
+)"
 ```

-**The PR comment MUST include:**
-1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
-2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
-3. A 1-2 sentence explanation below each screenshot describing what it proves
-
 This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.

 ## Fix mode (--fix flag)

-When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
+When `--fix` is present, after finding a bug:

-### Fix protocol for EVERY issue found (including UX issues):
-
-1. **Identify** the root cause in the code — read the relevant source files
-2. **Write a failing test first** (TDD): For backend bugs, write a test marked with `pytest.mark.xfail(reason="...")`. For frontend/Playwright bugs, write a test with `.fixme` annotation. Run it to confirm it fails as expected.
-3. **Screenshot** the broken state: `agent-browser screenshot $RESULTS_DIR/{NN}-broken-{description}.png`
-4. **Fix** the code in the worktree
-5. **Rebuild** ONLY the affected service (not the whole stack):
-   ```bash
-   cd $PLATFORM_DIR && docker compose up --build -d {service_name}
-   # e.g., docker compose up --build -d rest_server
-   # e.g., docker compose up --build -d frontend
-   ```
-6. **Wait** for the service to be ready (poll health endpoint)
-7. **Re-test** the same scenario
-8. **Screenshot** the fixed state: `agent-browser screenshot $RESULTS_DIR/{NN}-fixed-{description}.png`
-9. **Remove the xfail/fixme marker** from the test written in step 2, and verify it passes
-10. **Verify** the fix did not break other scenarios (run a quick smoke test)
-11. **Commit and push** immediately:
+1. Identify the root cause in the code
+2. Fix it in the worktree
+3. Rebuild the affected service: `cd $PLATFORM_DIR && docker compose up --build -d {service_name}`
+4. Re-test the scenario
+5. If fix works, commit and push:
   ```bash
   cd $WORKTREE_PATH
   git add -A
   git commit -m "fix: {description of fix}"
   git push
   ```
-12. **Continue** to the next test scenario
+6. Continue testing remaining scenarios
+7. After all fixes, run the full test suite again to ensure no regressions

 ### Fix loop (like pr-address)

 ```text
-test scenario → find issue (bug OR UX problem) → screenshot broken state
-→ fix code → rebuild affected service only → re-test → screenshot fixed state
-→ verify no regressions → commit + push
-→ repeat for next scenario
-→ after ALL scenarios pass, run full re-test to verify everything together
+test scenario → find bug → fix code → rebuild service → re-test
+→ repeat until all scenarios pass
+→ commit + push all fixes
+→ run full re-test to verify
 ```

-**Key differences from non-fix mode:**
-- UX issues count as bugs — fix them (bad alignment, confusing labels, missing loading states)
-- Every fix MUST have a before/after screenshot pair proving it works
-- Commit after EACH fix, not in a batch at the end
-- The final re-test must produce a clean set of all-passing screenshots
-
 ## Known issues and workarounds

 ### Problem: "Database error finding user" on signup
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,12 +1,8 @@
-### Why / What / How
-
-<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
-<!-- What: What does this PR change? Summarize the changes at a high level. -->
-<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->
+<!-- Clearly explain the need for these changes: -->

 ### Changes 🏗️

-<!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. -->
+<!-- Concisely describe all of the changes made in this pull request: -->

 ### Checklist 📋

--- a/autogpt_platform/CLAUDE.md
+++ b/autogpt_platform/CLAUDE.md
@@ -55,7 +55,6 @@ AutoGPT Platform is a monorepo containing:
 - Create the PR against the `dev` branch of the repository.
 - Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
 - Use conventional commit messages (see below)
-- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
 - Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
 - Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
  ```bash
--- a/autogpt_platform/autogpt_libs/poetry.lock
+++ b/autogpt_platform/autogpt_libs/poetry.lock
@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 2.2.1 and should not be changed by hand.
+# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.

 [[package]]
 name = "annotated-doc"
@@ -67,7 +67,7 @@ description = "Backport of asyncio.Runner, a context manager that controls event
 optional = false
 python-versions = "<3.11,>=3.8"
 groups = ["dev"]
-markers = "python_version == \"3.10\""
+markers = "python_version < \"3.11\""
 files = [
    {file = "backports_asyncio_runner-1.2.0-py3-none-any.whl", hash = "sha256:0da0a936a8aeb554eccb426dc55af3ba63bcdc69fa1a600b5bb305413a4477b5"},
    {file = "backports_asyncio_runner-1.2.0.tar.gz", hash = "sha256:a5aa7b2b7d8f8bfcaa2b57313f70792df84e32a2a746f585213373f900b42162"},
@@ -541,7 +541,7 @@ description = "Backport of PEP 654 (exception groups)"
 optional = false
 python-versions = ">=3.7"
 groups = ["main", "dev"]
-markers = "python_version == \"3.10\""
+markers = "python_version < \"3.11\""
 files = [
    {file = "exceptiongroup-1.3.0-py3-none-any.whl", hash = "sha256:4d111e6e0c13d0644cad6ddaa7ed0261a0b36971f6d23e7ec9b4b9097da78a10"},
    {file = "exceptiongroup-1.3.0.tar.gz", hash = "sha256:b241f5885f560bc56a59ee63ca4c6a8bfa46ae4ad651af316d4e81817bb9fd88"},
@@ -2181,14 +2181,14 @@ testing = ["coverage (>=6.2)", "hypothesis (>=5.7.1)"]

 [[package]]
 name = "pytest-cov"
-version = "7.1.0"
+version = "7.0.0"
 description = "Pytest plugin for measuring coverage."
 optional = false
 python-versions = ">=3.9"
 groups = ["dev"]
 files = [
-    {file = "pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678"},
-    {file = "pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2"},
+    {file = "pytest_cov-7.0.0-py3-none-any.whl", hash = "sha256:3b8e9558b16cc1479da72058bdecf8073661c7f57f7d3c5f22a1c23507f2d861"},
+    {file = "pytest_cov-7.0.0.tar.gz", hash = "sha256:33c97eda2e049a0c5298e91f519302a1334c26ac65c1a483d6206fd458361af1"},
 ]

 [package.dependencies]
@@ -2342,30 +2342,30 @@ pyasn1 = ">=0.1.3"

 [[package]]
 name = "ruff"
-version = "0.15.7"
+version = "0.15.0"
 description = "An extremely fast Python linter and code formatter, written in Rust."
 optional = false
 python-versions = ">=3.7"
 groups = ["dev"]
 files = [
-    {file = "ruff-0.15.7-py3-none-linux_armv6l.whl", hash = "sha256:a81cc5b6910fb7dfc7c32d20652e50fa05963f6e13ead3c5915c41ac5d16668e"},
-    {file = "ruff-0.15.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:722d165bd52403f3bdabc0ce9e41fc47070ac56d7a91b4e0d097b516a53a3477"},
-    {file = "ruff-0.15.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:7fbc2448094262552146cbe1b9643a92f66559d3761f1ad0656d4991491af49e"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b39329b60eba44156d138275323cc726bbfbddcec3063da57caa8a8b1d50adf"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:87768c151808505f2bfc93ae44e5f9e7c8518943e5074f76ac21558ef5627c85"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fb0511670002c6c529ec66c0e30641c976c8963de26a113f3a30456b702468b0"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0d19644f801849229db8345180a71bee5407b429dd217f853ec515e968a6912"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4806d8e09ef5e84eb19ba833d0442f7e300b23fe3f0981cae159a248a10f0036"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dce0896488562f09a27b9c91b1f58a097457143931f3c4d519690dea54e624c5"},
-    {file = "ruff-0.15.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:1852ce241d2bc89e5dc823e03cff4ce73d816b5c6cdadd27dbfe7b03217d2a12"},
-    {file = "ruff-0.15.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5f3e4b221fb4bd293f79912fc5e93a9063ebd6d0dcbd528f91b89172a9b8436c"},
-    {file = "ruff-0.15.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:b15e48602c9c1d9bdc504b472e90b90c97dc7d46c7028011ae67f3861ceba7b4"},
-    {file = "ruff-0.15.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1b4705e0e85cedc74b0a23cf6a179dbb3df184cb227761979cc76c0440b5ab0d"},
-    {file = "ruff-0.15.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:112c1fa316a558bb34319282c1200a8bf0495f1b735aeb78bfcb2991e6087580"},
-    {file = "ruff-0.15.7-py3-none-win32.whl", hash = "sha256:6d39e2d3505b082323352f733599f28169d12e891f7dd407f2d4f54b4c2886de"},
-    {file = "ruff-0.15.7-py3-none-win_amd64.whl", hash = "sha256:4d53d712ddebcd7dace1bc395367aec12c057aacfe9adbb6d832302575f4d3a1"},
-    {file = "ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2"},
-    {file = "ruff-0.15.7.tar.gz", hash = "sha256:04f1ae61fc20fe0b148617c324d9d009b5f63412c0b16474f3d5f1a1a665f7ac"},
+    {file = "ruff-0.15.0-py3-none-linux_armv6l.whl", hash = "sha256:aac4ebaa612a82b23d45964586f24ae9bc23ca101919f5590bdb368d74ad5455"},
+    {file = "ruff-0.15.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:dcd4be7cc75cfbbca24a98d04d0b9b36a270d0833241f776b788d59f4142b14d"},
+    {file = "ruff-0.15.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d747e3319b2bce179c7c1eaad3d884dc0a199b5f4d5187620530adf9105268ce"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:650bd9c56ae03102c51a5e4b554d74d825ff3abe4db22b90fd32d816c2e90621"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a6664b7eac559e3048223a2da77769c2f92b43a6dfd4720cef42654299a599c9"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6f811f97b0f092b35320d1556f3353bf238763420ade5d9e62ebd2b73f2ff179"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:761ec0a66680fab6454236635a39abaf14198818c8cdf691e036f4bc0f406b2d"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:940f11c2604d317e797b289f4f9f3fa5555ffe4fb574b55ed006c3d9b6f0eb78"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcbca3d40558789126da91d7ef9a7c87772ee107033db7191edefa34e2c7f1b4"},
+    {file = "ruff-0.15.0-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9a121a96db1d75fa3eb39c4539e607f628920dd72ff1f7c5ee4f1b768ac62d6e"},
+    {file = "ruff-0.15.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5298d518e493061f2eabd4abd067c7e4fb89e2f63291c94332e35631c07c3662"},
+    {file = "ruff-0.15.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:afb6e603d6375ff0d6b0cee563fa21ab570fd15e65c852cb24922cef25050cf1"},
+    {file = "ruff-0.15.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:77e515f6b15f828b94dc17d2b4ace334c9ddb7d9468c54b2f9ed2b9c1593ef16"},
+    {file = "ruff-0.15.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6f6e80850a01eb13b3e42ee0ebdf6e4497151b48c35051aab51c101266d187a3"},
+    {file = "ruff-0.15.0-py3-none-win32.whl", hash = "sha256:238a717ef803e501b6d51e0bdd0d2c6e8513fe9eec14002445134d3907cd46c3"},
+    {file = "ruff-0.15.0-py3-none-win_amd64.whl", hash = "sha256:dd5e4d3301dc01de614da3cdffc33d4b1b96fb89e45721f1598e5532ccf78b18"},
+    {file = "ruff-0.15.0-py3-none-win_arm64.whl", hash = "sha256:c480d632cc0ca3f0727acac8b7d053542d9e114a462a145d0b00e7cd658c515a"},
+    {file = "ruff-0.15.0.tar.gz", hash = "sha256:6bdea47cdbea30d40f8f8d7d69c0854ba7c15420ec75a26f463290949d7f7e9a"},
 ]

 [[package]]
@@ -2564,7 +2564,7 @@ description = "A lil' TOML parser"
 optional = false
 python-versions = ">=3.8"
 groups = ["dev"]
-markers = "python_version == \"3.10\""
+markers = "python_version < \"3.11\""
 files = [
    {file = "tomli-2.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678e4fa69e4575eb77d103de3df8a895e1591b48e740211bd1067378c69e8249"},
    {file = "tomli-2.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:023aa114dd824ade0100497eb2318602af309e5a55595f76b626d6d9f3b7b0a6"},
@@ -2912,4 +2912,4 @@ type = ["pytest-mypy"]
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.10,<4.0"
-content-hash = "e0936a065565550afed18f6298b7e04e814b44100def7049f1a0d68662624a39"
+content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"
--- a/autogpt_platform/autogpt_libs/pyproject.toml
+++ b/autogpt_platform/autogpt_libs/pyproject.toml
@@ -26,8 +26,8 @@ pyright = "^1.1.408"
 pytest = "^8.4.1"
 pytest-asyncio = "^1.3.0"
 pytest-mock = "^3.15.1"
-pytest-cov = "^7.1.0"
-ruff = "^0.15.7"
+pytest-cov = "^7.0.0"
+ruff = "^0.15.0"

 [build-system]
 requires = ["poetry-core"]
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -121,20 +121,36 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

-# Install agent-browser (Copilot browser tool) using the system chromium package.
-# Chrome for Testing (the binary agent-browser downloads via `agent-browser install`)
-# has no ARM64 builds, so we use the distro-packaged chromium instead — verified to
-# work with agent-browser via Docker tests on arm64; amd64 is validated in CI.
-# Note: system chromium tracks the Debian package schedule rather than a pinned
-# Chrome for Testing release. If agent-browser requires a specific Chrome version,
-# verify compatibility against the chromium package version in the base image.
+# Install agent-browser (Copilot browser tool) + Chromium.
+# On amd64: install runtime libs + run `agent-browser install` to download
+#   Chrome for Testing (pinned version, tested with Playwright).
+# On arm64: install system chromium package — Chrome for Testing has no ARM64
+#   binary. AGENT_BROWSER_EXECUTABLE_PATH is set at runtime by the entrypoint
+#   script (below) to redirect agent-browser to the system binary.
+ARG TARGETARCH
 RUN apt-get update \
-    && apt-get install -y --no-install-recommends chromium fonts-liberation \
+    && if [ "$TARGETARCH" = "arm64" ]; then \
+         apt-get install -y --no-install-recommends chromium fonts-liberation; \
+       else \
+         apt-get install -y --no-install-recommends \
+           libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
+           libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
+           libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
+           libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
+           fonts-liberation libfontconfig1; \
+       fi \
    && rm -rf /var/lib/apt/lists/* \
    && npm install -g agent-browser \
+    && ([ "$TARGETARCH" = "arm64" ] || agent-browser install) \
    && rm -rf /tmp/* /root/.npm

-ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
+# On arm64 the system chromium is at /usr/bin/chromium; set
+# AGENT_BROWSER_EXECUTABLE_PATH so agent-browser's daemon uses it instead of
+# Chrome for Testing (which has no ARM64 binary). On amd64 the variable is left
+# unset so agent-browser uses the Chrome for Testing binary it downloaded above.
+RUN printf '#!/bin/sh\n[ -x /usr/bin/chromium ] && export AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium\nexec "$@"\n' \
+    > /usr/local/bin/entrypoint.sh \
+    && chmod +x /usr/local/bin/entrypoint.sh

 WORKDIR /app/autogpt_platform/backend

@@ -157,4 +173,5 @@ RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \

 ENV PORT=8000

+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
 CMD ["rest"]
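
Review note: the generated `entrypoint.sh` makes its choice at container start, not at build time. If `/usr/bin/chromium` is executable it exports `AGENT_BROWSER_EXECUTABLE_PATH`, otherwise it leaves the environment alone, and in both cases it then execs the command. A minimal Python model of that decision follows as a sketch only. `entrypoint_env` and its `is_executable` parameter are hypothetical names introduced for illustration and are not part of the image.

```python
import os

CHROMIUM_PATH = "/usr/bin/chromium"
ENV_VAR = "AGENT_BROWSER_EXECUTABLE_PATH"


def entrypoint_env(env, chromium_path=CHROMIUM_PATH, is_executable=None):
    """Return the environment the wrapped command would see.

    Models `[ -x /usr/bin/chromium ] && export ...`: the variable is set
    only when the system chromium exists and is executable.
    """
    if is_executable is None:
        # Default check, analogous to the shell's `[ -x path ]`.
        def is_executable(path):
            return os.access(path, os.X_OK)

    result = dict(env)  # do not mutate the caller's mapping
    if is_executable(chromium_path):
        result[ENV_VAR] = chromium_path
    return result


# arm64-like image: system chromium is installed.
arm64_env = entrypoint_env({"PORT": "8000"}, is_executable=lambda p: True)
# amd64-like image: no system chromium, so the variable stays unset.
amd64_env = entrypoint_env({"PORT": "8000"}, is_executable=lambda p: False)
```

One consequence of this design: if an amd64 image ever gains a system chromium package, the wrapper would start redirecting agent-browser to it, because the check looks at the file rather than at `TARGETARCH`.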
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
@@ -1,93 +0,0 @@
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import pytest
-
-from backend.data.graph import get_graph_as_admin
-
-# Shared constants
-ADMIN_USER_ID = "admin-user-id"
-CREATOR_USER_ID = "other-creator-id"
-GRAPH_ID = "test-graph-id"
-GRAPH_VERSION = 3
-
-
-def _make_mock_graph(user_id: str = CREATOR_USER_ID) -> MagicMock:
-    graph = MagicMock()
-    graph.userId = user_id
-    graph.id = GRAPH_ID
-    graph.version = GRAPH_VERSION
-    graph.Nodes = []
-    return graph
-
-
-@pytest.mark.asyncio
-async def test_admin_can_access_pending_agent_not_owned() -> None:
-    """Admin must be able to access a graph they don't own even if it's not
-    APPROVED in the marketplace. This is the core use case: reviewing a
-    submitted-but-pending agent from the admin dashboard."""
-    mock_graph = _make_mock_graph()
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    with (
-        patch(
-            "backend.data.graph.AgentGraph.prisma",
-        ) as mock_prisma,
-        patch(
-            "backend.data.graph.GraphModel.from_db",
-            return_value=mock_graph_model,
-        ),
-    ):
-        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
-
-        result = await get_graph_as_admin(
-            graph_id=GRAPH_ID,
-            version=GRAPH_VERSION,
-            user_id=ADMIN_USER_ID,
-            for_export=False,
-        )
-
-    assert (
-        result is not None
-    ), "Admin should be able to access a pending agent they don't own"
-    assert result is mock_graph_model
-
-
-@pytest.mark.asyncio
-async def test_admin_download_pending_agent_with_subagents() -> None:
-    """Admin export (for_export=True) of a pending agent must include
-    sub-graphs. This exercises the full export code path that the Download
-    button uses."""
-    mock_graph = _make_mock_graph()
-    mock_sub_graph = MagicMock(name="SubGraph")
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    with (
-        patch(
-            "backend.data.graph.AgentGraph.prisma",
-        ) as mock_prisma,
-        patch(
-            "backend.data.graph.get_sub_graphs",
-            new_callable=AsyncMock,
-            return_value=[mock_sub_graph],
-        ) as mock_get_sub,
-        patch(
-            "backend.data.graph.GraphModel.from_db",
-            return_value=mock_graph_model,
-        ) as mock_from_db,
-    ):
-        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
-
-        result = await get_graph_as_admin(
-            graph_id=GRAPH_ID,
-            version=GRAPH_VERSION,
-            user_id=ADMIN_USER_ID,
-            for_export=True,
-        )
-
-    assert result is not None, "Admin export of pending agent must succeed"
-    mock_get_sub.assert_awaited_once_with(mock_graph)
-    mock_from_db.assert_called_once_with(
-        graph=mock_graph,
-        sub_graphs=[mock_sub_graph],
-        for_export=True,
-    )
--- a/autogpt_platform/backend/backend/api/rest_api.py
+++ b/autogpt_platform/backend/backend/api/rest_api.py
@@ -1,3 +1,4 @@
+import asyncio
 import contextlib
 import logging
 import platform
@@ -37,8 +38,10 @@ import backend.api.features.workspace.routes as workspace_routes
 import backend.data.block
 import backend.data.db
 import backend.data.graph
+import backend.data.llm_registry
 import backend.data.user
 import backend.integrations.webhooks.utils
+import backend.server.v2.llm
 import backend.util.service
 import backend.util.settings
 from backend.api.features.library.exceptions import (
@@ -117,16 +120,56 @@ async def lifespan_context(app: fastapi.FastAPI):

    AutoRegistry.patch_integrations()

+    # Load the LLM registry before initializing blocks so blocks can use registry data.
+    # Tries Redis first (fast path on warm restart), then falls back to the DB.
+    # Note: a load failure is tolerated for now because no blocks consume the registry yet (PR #5).
+    try:
+        await backend.data.llm_registry.refresh_llm_registry()
+        logger.info("LLM registry loaded successfully at startup")
+    except Exception as e:
+        logger.warning(
+            f"Failed to load LLM registry at startup: {e}. "
+            "Blocks will initialize with empty registry."
+        )
+
+    # Start background task so this worker reloads its in-process cache whenever
+    # another worker (e.g. the admin API) refreshes the registry.
+    _registry_subscription_task = asyncio.create_task(
+        backend.data.llm_registry.subscribe_to_registry_refresh(
+            backend.data.llm_registry.refresh_llm_registry
+        )
+    )
+
    await backend.data.block.initialize_blocks()

    await backend.data.user.migrate_and_encrypt_user_integrations()
    await backend.data.graph.fix_llm_provider_credentials()
-    await backend.data.graph.migrate_llm_models(DEFAULT_LLM_MODEL)
+    try:
+        await backend.data.graph.migrate_llm_models(DEFAULT_LLM_MODEL)
+    except Exception as e:
+        err_str = str(e)
+        if "AgentNode" in err_str or "does not exist" in err_str:
+            logger.warning(
+                f"migrate_llm_models skipped: AgentNode table not found ({e}). "
+                "This is expected in test environments."
+            )
+        else:
+            logger.error(
+                f"migrate_llm_models failed unexpectedly: {e}",
+                exc_info=True,
+            )
+
    await backend.integrations.webhooks.utils.migrate_legacy_triggered_graphs()

    with launch_darkly_context():
        yield

+    _registry_subscription_task.cancel()
+    try:
+        await _registry_subscription_task
+    except asyncio.CancelledError:
+        pass
+
    try:
        await shutdown_cloud_storage_handler()
    except Exception as e:
@@ -355,6 +398,16 @@ app.include_router(
    tags=["oauth"],
    prefix="/api/oauth",
 )
+app.include_router(
+    backend.server.v2.llm.router,
+    tags=["v2", "llm"],
+    prefix="/api",
+)
+app.include_router(
+    backend.server.v2.llm.admin_router,
+    tags=["v2", "llm", "admin"],
+    prefix="/api",
+)

 app.mount("/external-api", external_api)
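
Review note: the lifespan change above starts a background subscription task before the app serves traffic, then cancels and awaits it on shutdown. The sketch below reproduces that pattern in isolation, assuming an `asyncio.Queue` as a stand-in for the Redis pub/sub channel. `listen_for_refresh`, `lifespan`, and `demo` are hypothetical names, not the functions in `backend.data.llm_registry`.

```python
import asyncio
import contextlib


async def listen_for_refresh(messages, on_refresh):
    """Stand-in for a pub/sub listener: run on_refresh once per message."""
    while True:
        await messages.get()
        await on_refresh()


@contextlib.asynccontextmanager
async def lifespan(messages, on_refresh):
    # Start the listener in the background so startup is not blocked.
    task = asyncio.create_task(listen_for_refresh(messages, on_refresh))
    try:
        yield task
    finally:
        # Cancel, then await, so the task has finished before shutdown continues.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task


async def demo():
    reloaded = asyncio.Event()
    calls = []

    async def on_refresh():
        calls.append("reloaded")
        reloaded.set()

    messages = asyncio.Queue()
    async with lifespan(messages, on_refresh) as task:
        await messages.put("refresh")  # simulate another worker publishing
        await asyncio.wait_for(reloaded.wait(), timeout=1)
    return calls, task.cancelled()
```

The sketch wraps the cancellation in `try`/`finally`, which differs from the diff: there the cancel runs only if control returns normally from the `yield`. Whether that difference matters depends on how the framework resumes the lifespan generator on error.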

--- a/autogpt_platform/backend/backend/blocks/autopilot.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot.py
@@ -15,12 +15,6 @@ from backend.blocks._base import (
    BlockSchemaInput,
    BlockSchemaOutput,
 )
-from backend.copilot.permissions import (
-    CopilotPermissions,
-    ToolName,
-    all_known_tool_names,
-    validate_block_identifiers,
-)
 from backend.data.model import SchemaField

 if TYPE_CHECKING:
@@ -102,50 +96,6 @@ class AutoPilotBlock(Block):
            advanced=True,
        )

-        tools: list[ToolName] = SchemaField(
-            description=(
-                "Tool names to filter. Works with tools_exclude to form an "
-                "allow-list or deny-list. "
-                "Leave empty to apply no tool filter."
-            ),
-            default=[],
-            advanced=True,
-        )
-
-        tools_exclude: bool = SchemaField(
-            description=(
-                "Controls how the 'tools' list is interpreted. "
-                "True (default): 'tools' is a deny-list — listed tools are blocked, "
-                "all others are allowed. An empty 'tools' list means allow everything. "
-                "False: 'tools' is an allow-list — only listed tools are permitted."
-            ),
-            default=True,
-            advanced=True,
-        )
-
-        blocks: list[str] = SchemaField(
-            description=(
-                "Block identifiers to filter when the copilot uses run_block. "
-                "Each entry can be: a block name (e.g. 'HTTP Request'), "
-                "a full block UUID, or the first 8 hex characters of the UUID "
-                "(e.g. 'c069dc6b'). Works with blocks_exclude. "
-                "Leave empty to apply no block filter."
-            ),
-            default=[],
-            advanced=True,
-        )
-
-        blocks_exclude: bool = SchemaField(
-            description=(
-                "Controls how the 'blocks' list is interpreted. "
-                "True (default): 'blocks' is a deny-list — listed blocks are blocked, "
-                "all others are allowed. An empty 'blocks' list means allow everything. "
-                "False: 'blocks' is an allow-list — only listed blocks are permitted."
-            ),
-            default=True,
-            advanced=True,
-        )
-
        # timeout_seconds removed: the SDK manages its own heartbeat-based
        # timeouts internally; wrapping with asyncio.timeout corrupts the
        # SDK's internal stream (see service.py CRITICAL comment).
@@ -234,7 +184,7 @@ class AutoPilotBlock(Block):

    async def create_session(self, user_id: str) -> str:
        """Create a new chat session and return its ID (mockable for tests)."""
-        from backend.copilot.model import create_chat_session  # avoid circular import
+        from backend.copilot.model import create_chat_session

        session = await create_chat_session(user_id)
        return session.session_id
@@ -246,7 +196,6 @@ class AutoPilotBlock(Block):
        session_id: str,
        max_recursion_depth: int,
        user_id: str,
-        permissions: "CopilotPermissions | None" = None,
    ) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
        """Invoke the copilot and collect all stream results.

@@ -260,21 +209,14 @@ class AutoPilotBlock(Block):
            session_id: Chat session to use.
            max_recursion_depth: Maximum allowed recursion nesting.
            user_id: Authenticated user ID.
-            permissions: Optional capability filter restricting tools/blocks.

        Returns:
            A tuple of (response_text, tool_calls, history_json, session_id, usage).
        """
-        from backend.copilot.sdk.collect import (
-            collect_copilot_response,  # avoid circular import
-        )
+        from backend.copilot.sdk.collect import collect_copilot_response

        tokens = _check_recursion(max_recursion_depth)
-        perm_token = None
        try:
-            effective_permissions, perm_token = _merge_inherited_permissions(
-                permissions
-            )
            effective_prompt = prompt
            if system_context:
                effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
@@ -283,7 +225,6 @@ class AutoPilotBlock(Block):
                session_id=session_id,
                message=effective_prompt,
                user_id=user_id,
-                permissions=effective_permissions,
            )

            # Build a lightweight conversation summary from streamed data.
@@ -330,8 +271,6 @@ class AutoPilotBlock(Block):
            )
        finally:
            _reset_recursion(tokens)
-            if perm_token is not None:
-                _inherited_permissions.reset(perm_token)

    async def run(
        self,
@@ -356,13 +295,6 @@ class AutoPilotBlock(Block):
            yield "error", "max_recursion_depth must be at least 1."
            return

-        # Validate and build permissions eagerly — fail before creating a session.
-        permissions = await _build_and_validate_permissions(input_data)
-        if isinstance(permissions, str):
-            # Validation error returned as a string message.
-            yield "error", permissions
-            return
-
        # Create session eagerly so the user always gets the session_id,
        # even if the downstream stream fails (avoids orphaned sessions).
        sid = input_data.session_id
@@ -380,7 +312,6 @@ class AutoPilotBlock(Block):
                session_id=sid,
                max_recursion_depth=input_data.max_recursion_depth,
                user_id=execution_context.user_id,
-                permissions=permissions,
            )

            yield "response", response
@@ -443,78 +374,3 @@ def _reset_recursion(
    """Restore recursion depth and limit to their previous values."""
    _autopilot_recursion_depth.reset(tokens[0])
    _autopilot_recursion_limit.reset(tokens[1])
-
-
-# ---------------------------------------------------------------------------
-# Permission helpers
-# ---------------------------------------------------------------------------
-
-# Inherited permissions from a parent AutoPilotBlock execution.
-# This acts as a ceiling: child executions can only be more restrictive.
-_inherited_permissions: contextvars.ContextVar["CopilotPermissions | None"] = (
-    contextvars.ContextVar("_inherited_permissions", default=None)
-)
-
-
-async def _build_and_validate_permissions(
-    input_data: "AutoPilotBlock.Input",
-) -> "CopilotPermissions | str":
-    """Build a :class:`CopilotPermissions` from block input and validate it.
-
-    Returns a :class:`CopilotPermissions` on success or a human-readable
-    error string if validation fails.
-    """
-    # Tool names are validated by Pydantic via the ToolName Literal type
-    # at model construction time — no runtime check needed here.
-    # Validate block identifiers against live block registry.
-    if input_data.blocks:
-        invalid_blocks = await validate_block_identifiers(input_data.blocks)
-        if invalid_blocks:
-            return (
-                f"Unknown block identifier(s) in 'blocks': {invalid_blocks}. "
-                "Use find_block to discover valid block names and IDs. "
-                "You may also use the first 8 characters of a block UUID."
-            )
-
-    return CopilotPermissions(
-        tools=list(input_data.tools),
-        tools_exclude=input_data.tools_exclude,
-        blocks=input_data.blocks,
-        blocks_exclude=input_data.blocks_exclude,
-    )
-
-
-def _merge_inherited_permissions(
-    permissions: "CopilotPermissions | None",
-) -> "tuple[CopilotPermissions | None, contextvars.Token[CopilotPermissions | None] | None]":
-    """Merge *permissions* with any inherited parent permissions.
-
-    The merged result is stored back into the contextvar so that any nested
-    AutoPilotBlock invocation (sub-agent) inherits the merged ceiling.
-
-    Returns a tuple of (merged_permissions, reset_token).  The caller MUST
-    reset the contextvar via ``_inherited_permissions.reset(token)`` in a
-    ``finally`` block when ``reset_token`` is not None — this prevents
-    permission leakage between sequential independent executions in the same
-    asyncio task.
-    """
-    parent = _inherited_permissions.get()
-
-    if permissions is None and parent is None:
-        return None, None
-
-    all_tools = all_known_tool_names()
-
-    if permissions is None:
-        permissions = CopilotPermissions()  # allow-all; will be narrowed by parent
-
-    merged = (
-        permissions.merged_with_parent(parent, all_tools)
-        if parent is not None
-        else permissions
-    )
-
-    # Store merged permissions as the new inherited ceiling for nested calls.
-    # Return the token so the caller can restore the previous value in finally.
-    token = _inherited_permissions.set(merged)
-    return merged, token
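
Review note: the removed helper relied on the `ContextVar` set/reset token pattern so that a nested execution inherits a narrowed ceiling and the previous value is restored afterwards. The sketch below shows that pattern with plain sets and allow-lists only. It is a simplification under that assumption and does not reproduce `CopilotPermissions.merged_with_parent`. `run_with_ceiling` and `current_ceiling` are hypothetical names.

```python
import contextvars

# None means "no ceiling inherited from a parent execution".
_ceiling = contextvars.ContextVar("_ceiling", default=None)


def current_ceiling():
    return _ceiling.get()


def run_with_ceiling(allowed, body):
    """Run body() with `allowed` narrowed by any inherited ceiling."""
    parent = _ceiling.get()
    # A child can only be as permissive as its parent: intersect allow-lists.
    merged = set(allowed) if parent is None else set(allowed) & parent
    token = _ceiling.set(merged)
    try:
        return body()
    finally:
        # Restore the previous value so nothing leaks to later executions.
        _ceiling.reset(token)


nested = run_with_ceiling(
    {"run_block", "web_fetch"},
    lambda: run_with_ceiling({"run_block", "bash_exec"}, current_ceiling),
)
after = current_ceiling()
```

With this commit the block no longer passes `permissions` to `collect_copilot_response`, so no ceiling of this kind is applied by `AutoPilotBlock` itself.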
--- a/autogpt_platform/backend/backend/blocks/autopilot_permissions_test.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot_permissions_test.py
@@ -1,265 +0,0 @@
-"""Tests for AutoPilotBlock permission fields and validation."""
-
-from __future__ import annotations
-
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import pytest
-from pydantic import ValidationError
-
-from backend.blocks.autopilot import (
-    AutoPilotBlock,
-    _build_and_validate_permissions,
-    _inherited_permissions,
-    _merge_inherited_permissions,
-)
-from backend.copilot.permissions import CopilotPermissions, all_known_tool_names
-from backend.data.execution import ExecutionContext
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-
-def _make_input(**kwargs) -> AutoPilotBlock.Input:
-    defaults = {
-        "prompt": "Do something",
-        "system_context": "",
-        "session_id": "",
-        "max_recursion_depth": 3,
-        "tools": [],
-        "tools_exclude": True,
-        "blocks": [],
-        "blocks_exclude": True,
-    }
-    defaults.update(kwargs)
-    return AutoPilotBlock.Input(**defaults)
-
-
-# ---------------------------------------------------------------------------
-# _build_and_validate_permissions
-# ---------------------------------------------------------------------------
-
-
-@pytest.mark.asyncio
-class TestBuildAndValidatePermissions:
-    async def test_empty_inputs_returns_empty_permissions(self):
-        inp = _make_input()
-        result = await _build_and_validate_permissions(inp)
-        assert isinstance(result, CopilotPermissions)
-        assert result.is_empty()
-
-    async def test_valid_tool_names_accepted(self):
-        inp = _make_input(tools=["run_block", "web_fetch"], tools_exclude=True)
-        result = await _build_and_validate_permissions(inp)
-        assert isinstance(result, CopilotPermissions)
-        assert result.tools == ["run_block", "web_fetch"]
-        assert result.tools_exclude is True
-
-    async def test_invalid_tool_rejected_by_pydantic(self):
-        """Invalid tool names are now caught at Pydantic validation time
-        (Literal type), before ``_build_and_validate_permissions`` is called."""
-        with pytest.raises(ValidationError, match="not_a_real_tool"):
-            _make_input(tools=["not_a_real_tool"])
-
-    async def test_valid_block_name_accepted(self):
-        mock_block_cls = MagicMock()
-        mock_block_cls.return_value.name = "HTTP Request"
-        with patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
-        ):
-            inp = _make_input(blocks=["HTTP Request"], blocks_exclude=True)
-            result = await _build_and_validate_permissions(inp)
-        assert isinstance(result, CopilotPermissions)
-        assert result.blocks == ["HTTP Request"]
-
-    async def test_valid_partial_uuid_accepted(self):
-        mock_block_cls = MagicMock()
-        mock_block_cls.return_value.name = "HTTP Request"
-        with patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
-        ):
-            inp = _make_input(blocks=["c069dc6b"], blocks_exclude=False)
-            result = await _build_and_validate_permissions(inp)
-        assert isinstance(result, CopilotPermissions)
-
-    async def test_invalid_block_identifier_returns_error(self):
-        mock_block_cls = MagicMock()
-        mock_block_cls.return_value.name = "HTTP Request"
-        with patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
-        ):
-            inp = _make_input(blocks=["totally_fake_block"])
-            result = await _build_and_validate_permissions(inp)
-        assert isinstance(result, str)
-        assert "totally_fake_block" in result
-        assert "Unknown block identifier" in result
-
-    async def test_sdk_builtin_tool_names_accepted(self):
-        inp = _make_input(tools=["Read", "Task", "WebSearch"], tools_exclude=False)
-        result = await _build_and_validate_permissions(inp)
-        assert isinstance(result, CopilotPermissions)
-        assert not result.tools_exclude
-
-    async def test_empty_blocks_skips_validation(self):
-        # Should not call validate_block_identifiers at all when blocks=[].
-        with patch(
-            "backend.copilot.permissions.validate_block_identifiers"
-        ) as mock_validate:
-            inp = _make_input(blocks=[])
-            await _build_and_validate_permissions(inp)
-            mock_validate.assert_not_called()
-
-
-# ---------------------------------------------------------------------------
-# _merge_inherited_permissions
-# ---------------------------------------------------------------------------
-
-
-class TestMergeInheritedPermissions:
-    def test_no_permissions_no_parent_returns_none(self):
-        merged, token = _merge_inherited_permissions(None)
-        assert merged is None
-        assert token is None
-
-    def test_permissions_no_parent_returned_unchanged(self):
-        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
-        merged, token = _merge_inherited_permissions(perms)
-        try:
-            assert merged is perms
-            assert token is not None
-        finally:
-            if token is not None:
-                _inherited_permissions.reset(token)
-
-    def test_child_narrows_parent(self):
-        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
-        # Set parent as inherited
-        outer_token = _inherited_permissions.set(parent)
-        try:
-            child = CopilotPermissions(tools=["web_fetch"], tools_exclude=True)
-            merged, inner_token = _merge_inherited_permissions(child)
-            try:
-                assert merged is not None
-                all_t = all_known_tool_names()
-                effective = merged.effective_allowed_tools(all_t)
-                assert "bash_exec" not in effective
-                assert "web_fetch" not in effective
-            finally:
-                if inner_token is not None:
-                    _inherited_permissions.reset(inner_token)
-        finally:
-            _inherited_permissions.reset(outer_token)
-
-    def test_none_permissions_with_parent_uses_parent(self):
-        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
-        outer_token = _inherited_permissions.set(parent)
-        try:
-            merged, inner_token = _merge_inherited_permissions(None)
-            try:
-                assert merged is not None
-                # Merged should have parent's restrictions
-                effective = merged.effective_allowed_tools(all_known_tool_names())
-                assert "bash_exec" not in effective
-            finally:
-                if inner_token is not None:
-                    _inherited_permissions.reset(inner_token)
-        finally:
-            _inherited_permissions.reset(outer_token)
-
-    def test_child_cannot_expand_parent_whitelist(self):
-        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
-        outer_token = _inherited_permissions.set(parent)
-        try:
-            # Child tries to allow more tools
-            child = CopilotPermissions(
-                tools=["run_block", "bash_exec"], tools_exclude=False
-            )
-            merged, inner_token = _merge_inherited_permissions(child)
-            try:
-                assert merged is not None
-                effective = merged.effective_allowed_tools(all_known_tool_names())
-                assert "bash_exec" not in effective
-                assert "run_block" in effective
-            finally:
-                if inner_token is not None:
-                    _inherited_permissions.reset(inner_token)
-        finally:
-            _inherited_permissions.reset(outer_token)
-
-
-# ---------------------------------------------------------------------------
-# AutoPilotBlock.run — validation integration
-# ---------------------------------------------------------------------------
-
-
-@pytest.mark.asyncio
-class TestAutoPilotBlockRunPermissions:
-    async def _collect_outputs(self, block, input_data, user_id="test-user"):
-        """Helper to collect all yields from block.run()."""
-        ctx = ExecutionContext(
-            user_id=user_id,
-            graph_id="g1",
-            graph_exec_id="ge1",
-            node_exec_id="ne1",
-            node_id="n1",
-        )
-        outputs = {}
-        async for key, val in block.run(input_data, execution_context=ctx):
-            outputs[key] = val
-        return outputs
-
-    async def test_invalid_tool_rejected_by_pydantic(self):
-        """Invalid tool names are caught at Pydantic validation (Literal type)."""
-        with pytest.raises(ValidationError, match="not_a_tool"):
-            _make_input(tools=["not_a_tool"])
-
-    async def test_invalid_block_yields_error(self):
-        mock_block_cls = MagicMock()
-        mock_block_cls.return_value.name = "HTTP Request"
-        with patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
-        ):
-            block = AutoPilotBlock()
-            inp = _make_input(blocks=["nonexistent_block"])
-            outputs = await self._collect_outputs(block, inp)
-        assert "error" in outputs
-        assert "nonexistent_block" in outputs["error"]
-
-    async def test_empty_prompt_yields_error_before_permission_check(self):
-        block = AutoPilotBlock()
-        inp = _make_input(prompt="   ", tools=["run_block"])
-        outputs = await self._collect_outputs(block, inp)
-        assert "error" in outputs
-        assert "Prompt cannot be empty" in outputs["error"]
-
-    async def test_valid_permissions_passed_to_execute(self):
-        """Permissions are forwarded to execute_copilot when valid."""
-        block = AutoPilotBlock()
-        captured: dict = {}
-
-        async def fake_execute_copilot(self_inner, **kwargs):
-            captured["permissions"] = kwargs.get("permissions")
-            return (
-                "ok",
-                [],
-                '[{"role":"user","content":"hi"}]',
-                "test-sid",
-                {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
-            )
-
-        with patch.object(
-            AutoPilotBlock, "create_session", new=AsyncMock(return_value="test-sid")
-        ), patch.object(AutoPilotBlock, "execute_copilot", new=fake_execute_copilot):
-            inp = _make_input(tools=["run_block"], tools_exclude=False)
-            outputs = await self._collect_outputs(block, inp)
-
-        assert "error" not in outputs
-        perms = captured.get("permissions")
-        assert isinstance(perms, CopilotPermissions)
-        assert perms.tools == ["run_block"]
-        assert perms.tools_exclude is False
--- a/autogpt_platform/backend/backend/blocks/llm.py
+++ b/autogpt_platform/backend/backend/blocks/llm.py
@@ -49,9 +49,6 @@ settings = Settings()
 logger = TruncatedLogger(logging.getLogger(__name__), "[LLM-Block]")
 fmt = TextFormatter(autoescape=False)

-# HTTP status codes for user-caused errors that should not be reported to Sentry.
-USER_ERROR_STATUS_CODES = (401, 403, 429)
-
 LLMProviderName = Literal[
    ProviderName.AIML_API,
    ProviderName.ANTHROPIC,
@@ -894,60 +891,65 @@ async def llm_call(
        client = anthropic.AsyncAnthropic(
            api_key=credentials.api_key.get_secret_value()
        )
-        resp = await client.messages.create(
-            model=llm_model.value,
-            system=sysprompt,
-            messages=messages,
-            max_tokens=max_tokens,
-            tools=an_tools,
-            timeout=600,
-        )
-
-        if not resp.content:
-            raise ValueError("No content returned from Anthropic.")
-
-        tool_calls = None
-        for content_block in resp.content:
-            # Antropic is different to openai, need to iterate through
-            # the content blocks to find the tool calls
-            if content_block.type == "tool_use":
-                if tool_calls is None:
-                    tool_calls = []
-                tool_calls.append(
-                    ToolContentBlock(
-                        id=content_block.id,
-                        type=content_block.type,
-                        function=ToolCall(
-                            name=content_block.name,
-                            arguments=json.dumps(content_block.input),
-                        ),
-                    )
-                )
-
-        if not tool_calls and resp.stop_reason == "tool_use":
-            logger.warning(
-                f"Tool use stop reason but no tool calls found in content. {resp}"
+        try:
+            resp = await client.messages.create(
+                model=llm_model.value,
+                system=sysprompt,
+                messages=messages,
+                max_tokens=max_tokens,
+                tools=an_tools,
+                timeout=600,
            )

-        reasoning = None
-        for content_block in resp.content:
-            if hasattr(content_block, "type") and content_block.type == "thinking":
-                reasoning = content_block.thinking
-                break
+            if not resp.content:
+                raise ValueError("No content returned from Anthropic.")

-        return LLMResponse(
-            raw_response=resp,
-            prompt=prompt,
-            response=(
-                resp.content[0].name
-                if isinstance(resp.content[0], anthropic.types.ToolUseBlock)
-                else getattr(resp.content[0], "text", "")
-            ),
-            tool_calls=tool_calls,
-            prompt_tokens=resp.usage.input_tokens,
-            completion_tokens=resp.usage.output_tokens,
-            reasoning=reasoning,
-        )
+            tool_calls = None
+            for content_block in resp.content:
+                # Anthropic is different to OpenAI, need to iterate through
+                # the content blocks to find the tool calls
+                if content_block.type == "tool_use":
+                    if tool_calls is None:
+                        tool_calls = []
+                    tool_calls.append(
+                        ToolContentBlock(
+                            id=content_block.id,
+                            type=content_block.type,
+                            function=ToolCall(
+                                name=content_block.name,
+                                arguments=json.dumps(content_block.input),
+                            ),
+                        )
+                    )
+
+            if not tool_calls and resp.stop_reason == "tool_use":
+                logger.warning(
+                    f"Tool use stop reason but no tool calls found in content. {resp}"
+                )
+
+            reasoning = None
+            for content_block in resp.content:
+                if hasattr(content_block, "type") and content_block.type == "thinking":
+                    reasoning = content_block.thinking
+                    break
+
+            return LLMResponse(
+                raw_response=resp,
+                prompt=prompt,
+                response=(
+                    resp.content[0].name
+                    if isinstance(resp.content[0], anthropic.types.ToolUseBlock)
+                    else getattr(resp.content[0], "text", "")
+                ),
+                tool_calls=tool_calls,
+                prompt_tokens=resp.usage.input_tokens,
+                completion_tokens=resp.usage.output_tokens,
+                reasoning=reasoning,
+            )
+        except anthropic.APIError as e:
+            error_message = f"Anthropic API error: {e}"
+            logger.error(error_message)
+            raise ValueError(error_message) from e
    elif provider == "groq":
        if tools:
            raise ValueError("Groq does not support tools.")
@@ -1460,16 +1462,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
                    yield "prompt", self.prompt
                    return
            except Exception as e:
-                is_user_error = (
-                    isinstance(e, (anthropic.APIStatusError, openai.APIStatusError))
-                    and e.status_code in USER_ERROR_STATUS_CODES
-                )
-                if is_user_error:
-                    logger.warning(f"Error calling LLM: {e}")
-                    error_feedback_message = f"Error calling LLM: {e}"
-                    break
-                else:
-                    logger.exception(f"Error calling LLM: {e}")
+                logger.exception(f"Error calling LLM: {e}")
                if (
                    "maximum context length" in str(e).lower()
                    or "token limit" in str(e).lower()
--- a/autogpt_platform/backend/backend/blocks/smart_decision_maker.py
+++ b/autogpt_platform/backend/backend/blocks/smart_decision_maker.py
@@ -258,10 +258,9 @@ def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str,
    return {call_id: count for call_id, count in pending_calls.items() if count > 0}


-class OrchestratorBlock(Block):
+class SmartDecisionMakerBlock(Block):
    """
-    A block that uses a language model to orchestrate tool calls, supporting both
-    single-shot and iterative agent mode execution.
+    A block that uses a language model to make smart decisions based on a given prompt.
    """

    class Input(BlockSchemaInput):
@@ -402,8 +401,8 @@ class OrchestratorBlock(Block):
            description="Uses AI to intelligently decide what tool to use.",
            categories={BlockCategory.AI},
            block_type=BlockType.AI,
-            input_schema=OrchestratorBlock.Input,
-            output_schema=OrchestratorBlock.Output,
+            input_schema=SmartDecisionMakerBlock.Input,
+            output_schema=SmartDecisionMakerBlock.Output,
            test_input={
                "prompt": "Hello, World!",
                "credentials": llm.TEST_CREDENTIALS_INPUT,
@@ -441,7 +440,7 @@ class OrchestratorBlock(Block):
        tool_name = custom_name if custom_name else block.name

        tool_function: dict[str, Any] = {
-            "name": OrchestratorBlock.cleanup(tool_name),
+            "name": SmartDecisionMakerBlock.cleanup(tool_name),
            "description": block.description,
        }
        sink_block_input_schema = block.input_schema
@@ -452,7 +451,7 @@ class OrchestratorBlock(Block):
            field_name = link.sink_name
            is_dynamic = is_dynamic_field(field_name)
            # Clean property key to ensure Anthropic API compatibility for ALL fields
-            clean_field_name = OrchestratorBlock.cleanup(field_name)
+            clean_field_name = SmartDecisionMakerBlock.cleanup(field_name)
            field_mapping[clean_field_name] = field_name

            if is_dynamic:
@@ -486,7 +485,7 @@ class OrchestratorBlock(Block):
            field_name = link.sink_name
            is_dynamic = is_dynamic_field(field_name)
            # Always use cleaned field name for property key (Anthropic API compliance)
-            clean_field_name = OrchestratorBlock.cleanup(field_name)
+            clean_field_name = SmartDecisionMakerBlock.cleanup(field_name)

            if is_dynamic:
                base_name = extract_base_field_name(field_name)
@@ -543,7 +542,7 @@ class OrchestratorBlock(Block):
        tool_name = custom_name if custom_name else sink_graph_meta.name

        tool_function: dict[str, Any] = {
-            "name": OrchestratorBlock.cleanup(tool_name),
+            "name": SmartDecisionMakerBlock.cleanup(tool_name),
            "description": sink_graph_meta.description,
        }

@@ -553,7 +552,7 @@ class OrchestratorBlock(Block):
        for link in links:
            field_name = link.sink_name

-            clean_field_name = OrchestratorBlock.cleanup(field_name)
+            clean_field_name = SmartDecisionMakerBlock.cleanup(field_name)
            field_mapping[clean_field_name] = field_name

            sink_block_input_schema = sink_node.input_default["input_schema"]
@@ -619,13 +618,17 @@ class OrchestratorBlock(Block):
                raise ValueError(f"Sink node not found: {links[0].sink_id}")

            if sink_node.block_id == AgentExecutorBlock().id:
-                tool_func = await OrchestratorBlock._create_agent_function_signature(
-                    sink_node, links
+                tool_func = (
+                    await SmartDecisionMakerBlock._create_agent_function_signature(
+                        sink_node, links
+                    )
                )
                return_tool_functions.append(tool_func)
            else:
-                tool_func = await OrchestratorBlock._create_block_function_signature(
-                    sink_node, links
+                tool_func = (
+                    await SmartDecisionMakerBlock._create_block_function_signature(
+                        sink_node, links
+                    )
                )
                return_tool_functions.append(tool_func)

@@ -905,7 +908,7 @@ class OrchestratorBlock(Block):
                task=node_exec_future,
            )

-            # Execute the node directly since we're in the Orchestrator context
+            # Execute the node directly since we're in the SmartDecisionMaker context
            node_exec_future.set_result(
                await execution_processor.on_node_execution(
                    node_exec=node_exec_entry,
@@ -1109,7 +1112,7 @@ class OrchestratorBlock(Block):
                return
        elif input_data.last_tool_output:
            logger.error(
-                f"[OrchestratorBlock-node_exec_id={node_exec_id}] "
+                f"[SmartDecisionMakerBlock-node_exec_id={node_exec_id}] "
                f"No pending tool calls found. This may indicate an issue with the "
                f"conversation history, or the tool giving response more than once."
                f"This should not happen! Please check the conversation history for any inconsistencies."
@@ -1246,7 +1249,7 @@ class OrchestratorBlock(Block):
                emit_key = f"tools_^_{sink_node_id}_~_{original_field_name}"

                logger.debug(
-                    "[OrchestratorBlock|geid:%s|neid:%s] emit %s",
+                    "[SmartDecisionMakerBlock|geid:%s|neid:%s] emit %s",
                    graph_exec_id,
                    node_exec_id,
                    emit_key,
--- a/autogpt_platform/backend/backend/blocks/stagehand/blocks.py
+++ b/autogpt_platform/backend/backend/blocks/stagehand/blocks.py
@@ -1,8 +1,13 @@
 import logging
+import signal
+import threading
+import warnings
+from contextlib import contextmanager
 from enum import Enum

-from stagehand import AsyncStagehand
-from stagehand.types.session_act_params import Options as ActOptions
+# Monkey patch Stagehand to prevent signal handling in worker threads
+import stagehand.main
+from stagehand import Stagehand

 from backend.blocks.llm import (
    MODEL_METADATA,
@@ -23,6 +28,46 @@ from backend.sdk import (
    SchemaField,
 )

+# Suppress false positive cleanup warning of litellm (a dependency of stagehand)
+warnings.filterwarnings("ignore", module="litellm.llms.custom_httpx")
+
+# Store the original method
+original_register_signal_handlers = stagehand.main.Stagehand._register_signal_handlers
+
+
+def safe_register_signal_handlers(self):
+    """Only register signal handlers in the main thread"""
+    if threading.current_thread() is threading.main_thread():
+        original_register_signal_handlers(self)
+    else:
+        # Skip signal handling in worker threads
+        pass
+
+
+# Replace the method
+stagehand.main.Stagehand._register_signal_handlers = safe_register_signal_handlers
+
+
+@contextmanager
+def disable_signal_handling():
+    """Context manager to temporarily disable signal handling"""
+    if threading.current_thread() is not threading.main_thread():
+        # In worker threads, temporarily replace signal.signal with a no-op
+        original_signal = signal.signal
+
+        def noop_signal(*args, **kwargs):
+            pass
+
+        signal.signal = noop_signal
+        try:
+            yield
+        finally:
+            signal.signal = original_signal
+    else:
+        # In main thread, don't modify anything
+        yield
+
+
 logger = logging.getLogger(__name__)


@@ -103,10 +148,13 @@ class StagehandObserveBlock(Block):
        instruction: str = SchemaField(
            description="Natural language description of elements or actions to discover.",
        )
-        dom_settle_timeout_ms: int = SchemaField(
-            description="Timeout in ms to wait for the DOM to settle after navigation.",
-            default=30000,
-            advanced=True,
+        iframes: bool = SchemaField(
+            description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
+            default=True,
+        )
+        domSettleTimeoutMs: int = SchemaField(
+            description="Timeout in milliseconds for DOM settlement. Wait longer for dynamic content.",
+            default=45000,
        )

    class Output(BlockSchemaOutput):
@@ -137,28 +185,32 @@ class StagehandObserveBlock(Block):

        logger.debug(f"OBSERVE: Using model provider {model_credentials.provider}")

-        async with AsyncStagehand(
-            browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
-            browserbase_project_id=input_data.browserbase_project_id,
-            model_api_key=model_credentials.api_key.get_secret_value(),
-        ) as client:
-            session = await client.sessions.start(
+        with disable_signal_handling():
+            stagehand = Stagehand(
+                api_key=stagehand_credentials.api_key.get_secret_value(),
+                project_id=input_data.browserbase_project_id,
                model_name=input_data.model.provider_name,
-                dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
+                model_api_key=model_credentials.api_key.get_secret_value(),
            )
-            try:
-                await session.navigate(url=input_data.url)

-                observe_response = await session.observe(
-                    instruction=input_data.instruction,
-                )
-                for result in observe_response.data.result:
-                    yield "selector", result.selector
-                    yield "description", result.description
-                    yield "method", result.method
-                    yield "arguments", result.arguments
-            finally:
-                await session.end()
+            await stagehand.init()
+
+        page = stagehand.page
+
+        assert page is not None, "Stagehand page is not initialized"
+
+        await page.goto(input_data.url)
+
+        observe_results = await page.observe(
+            input_data.instruction,
+            iframes=input_data.iframes,
+            domSettleTimeoutMs=input_data.domSettleTimeoutMs,
+        )
+        for result in observe_results:
+            yield "selector", result.selector
+            yield "description", result.description
+            yield "method", result.method
+            yield "arguments", result.arguments


 class StagehandActBlock(Block):
@@ -190,22 +242,24 @@ class StagehandActBlock(Block):
            description="Variables to use in the action. Variables contains data you want the action to use.",
            default_factory=dict,
        )
-        dom_settle_timeout_ms: int = SchemaField(
-            description="Timeout in ms to wait for the DOM to settle after navigation.",
-            default=30000,
-            advanced=True,
+        iframes: bool = SchemaField(
+            description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
+            default=True,
        )
-        timeout_ms: int = SchemaField(
-            description="Timeout in ms for each action.",
-            default=30000,
-            advanced=True,
+        domSettleTimeoutMs: int = SchemaField(
+            description="Timeout in milliseconds for DOM settlement. Wait longer for dynamic content.",
+            default=45000,
+        )
+        timeoutMs: int = SchemaField(
+            description="Timeout in milliseconds for DOM ready. Extended timeout for slow-loading forms",
+            default=60000,
        )

    class Output(BlockSchemaOutput):
        success: bool = SchemaField(
            description="Whether the action was completed successfully"
        )
-        message: str = SchemaField(description="Details about the action's execution.")
+        message: str = SchemaField(description="Details about the action’s execution.")
        action: str = SchemaField(description="Action performed")

    def __init__(self):
@@ -228,33 +282,32 @@ class StagehandActBlock(Block):

        logger.debug(f"ACT: Using model provider {model_credentials.provider}")

-        async with AsyncStagehand(
-            browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
-            browserbase_project_id=input_data.browserbase_project_id,
-            model_api_key=model_credentials.api_key.get_secret_value(),
-        ) as client:
-            session = await client.sessions.start(
+        with disable_signal_handling():
+            stagehand = Stagehand(
+                api_key=stagehand_credentials.api_key.get_secret_value(),
+                project_id=input_data.browserbase_project_id,
                model_name=input_data.model.provider_name,
-                dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
+                model_api_key=model_credentials.api_key.get_secret_value(),
            )
-            try:
-                await session.navigate(url=input_data.url)

-                for action in input_data.action:
-                    act_options = ActOptions(
-                        variables={k: v for k, v in input_data.variables.items()},
-                        timeout=input_data.timeout_ms,
-                    )
-                    act_response = await session.act(
-                        input=action,
-                        options=act_options,
-                    )
-                    result = act_response.data.result
-                    yield "success", result.success
-                    yield "message", result.message
-                    yield "action", result.action_description
-            finally:
-                await session.end()
+            await stagehand.init()
+
+        page = stagehand.page
+
+        assert page is not None, "Stagehand page is not initialized"
+
+        await page.goto(input_data.url)
+        for action in input_data.action:
+            action_results = await page.act(
+                action,
+                variables=input_data.variables,
+                iframes=input_data.iframes,
+                domSettleTimeoutMs=input_data.domSettleTimeoutMs,
+                timeoutMs=input_data.timeoutMs,
+            )
+            yield "success", action_results.success
+            yield "message", action_results.message
+            yield "action", action_results.action


 class StagehandExtractBlock(Block):
@@ -282,10 +335,13 @@ class StagehandExtractBlock(Block):
        instruction: str = SchemaField(
            description="Natural language description of elements or actions to discover.",
        )
-        dom_settle_timeout_ms: int = SchemaField(
-            description="Timeout in ms to wait for the DOM to settle after navigation.",
-            default=30000,
-            advanced=True,
+        iframes: bool = SchemaField(
+            description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
+            default=True,
+        )
+        domSettleTimeoutMs: int = SchemaField(
+            description="Timeout in milliseconds for DOM settlement. Wait longer for dynamic content.",
+            default=45000,
        )

    class Output(BlockSchemaOutput):
@@ -311,21 +367,24 @@ class StagehandExtractBlock(Block):

        logger.debug(f"EXTRACT: Using model provider {model_credentials.provider}")

-        async with AsyncStagehand(
-            browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
-            browserbase_project_id=input_data.browserbase_project_id,
-            model_api_key=model_credentials.api_key.get_secret_value(),
-        ) as client:
-            session = await client.sessions.start(
+        with disable_signal_handling():
+            stagehand = Stagehand(
+                api_key=stagehand_credentials.api_key.get_secret_value(),
+                project_id=input_data.browserbase_project_id,
                model_name=input_data.model.provider_name,
-                dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
+                model_api_key=model_credentials.api_key.get_secret_value(),
            )
-            try:
-                await session.navigate(url=input_data.url)

-                extract_response = await session.extract(
-                    instruction=input_data.instruction,
-                )
-                yield "extraction", str(extract_response.data.result)
-            finally:
-                await session.end()
+            await stagehand.init()
+
+        page = stagehand.page
+
+        assert page is not None, "Stagehand page is not initialized"
+
+        await page.goto(input_data.url)
+        extraction = await page.extract(
+            input_data.instruction,
+            iframes=input_data.iframes,
+            domSettleTimeoutMs=input_data.domSettleTimeoutMs,
+        )
+        yield "extraction", str(extraction.model_dump()["extraction"])
--- a/autogpt_platform/backend/backend/blocks/test/test_llm.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_llm.py
@@ -1,18 +1,9 @@
-from typing import cast
 from unittest.mock import AsyncMock, MagicMock, patch

-import anthropic
-import httpx
-import openai
 import pytest

-import backend.blocks.llm as llm
 from backend.data.model import NodeExecutionStats

-# TEST_CREDENTIALS_INPUT is a plain dict that satisfies AICredentials at runtime
-# but not at the type level. Cast once here to avoid per-test suppressors.
-_TEST_AI_CREDENTIALS = cast(llm.AICredentials, llm.TEST_CREDENTIALS_INPUT)
-

 class TestLLMStatsTracking:
    """Test that LLM blocks correctly track token usage statistics."""
@@ -664,148 +655,3 @@ class TestAITextSummarizerValidation:
        error_message = str(exc_info.value)
        assert "Expected a string summary" in error_message
        assert "received dict" in error_message
-
-
-def _make_anthropic_status_error(status_code: int) -> anthropic.APIStatusError:
-    """Create an anthropic.APIStatusError with the given status code."""
-    request = httpx.Request("POST", "https://api.anthropic.com/v1/messages")
-    response = httpx.Response(status_code, request=request)
-    return anthropic.APIStatusError(
-        f"Error code: {status_code}", response=response, body=None
-    )
-
-
-def _make_openai_status_error(status_code: int) -> openai.APIStatusError:
-    """Create an openai.APIStatusError with the given status code."""
-    response = httpx.Response(
-        status_code, request=httpx.Request("POST", "https://api.openai.com/v1/chat")
-    )
-    return openai.APIStatusError(
-        f"Error code: {status_code}", response=response, body=None
-    )
-
-
-class TestUserErrorStatusCodeHandling:
-    """Test that user-caused LLM API errors (401/403/429) break the retry loop
-    and are logged as warnings, while server errors (500) trigger retries."""
-
-    @pytest.mark.asyncio
-    @pytest.mark.parametrize("status_code", [401, 403, 429])
-    async def test_anthropic_user_error_breaks_retry_loop(self, status_code: int):
-        """401/403/429 Anthropic errors should break immediately, not retry."""
-        import backend.blocks.llm as llm
-
-        block = llm.AIStructuredResponseGeneratorBlock()
-        call_count = 0
-
-        async def mock_llm_call(*args, **kwargs):
-            nonlocal call_count
-            call_count += 1
-            raise _make_anthropic_status_error(status_code)
-
-        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
-            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
-                prompt="Test",
-                expected_format={"key": "desc"},
-                model=llm.DEFAULT_LLM_MODEL,
-                credentials=_TEST_AI_CREDENTIALS,
-                retry=3,
-            )
-
-            with pytest.raises(RuntimeError):
-                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
-                    pass
-
-        assert (
-            call_count == 1
-        ), f"Expected exactly 1 call for status {status_code}, got {call_count}"
-
-    @pytest.mark.asyncio
-    @pytest.mark.parametrize("status_code", [401, 403, 429])
-    async def test_openai_user_error_breaks_retry_loop(self, status_code: int):
-        """401/403/429 OpenAI errors should break immediately, not retry."""
-        import backend.blocks.llm as llm
-
-        block = llm.AIStructuredResponseGeneratorBlock()
-        call_count = 0
-
-        async def mock_llm_call(*args, **kwargs):
-            nonlocal call_count
-            call_count += 1
-            raise _make_openai_status_error(status_code)
-
-        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
-            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
-                prompt="Test",
-                expected_format={"key": "desc"},
-                model=llm.DEFAULT_LLM_MODEL,
-                credentials=_TEST_AI_CREDENTIALS,
-                retry=3,
-            )
-
-            with pytest.raises(RuntimeError):
-                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
-                    pass
-
-        assert (
-            call_count == 1
-        ), f"Expected exactly 1 call for status {status_code}, got {call_count}"
-
-    @pytest.mark.asyncio
-    async def test_server_error_retries(self):
-        """500 errors should be retried (not break immediately)."""
-        import backend.blocks.llm as llm
-
-        block = llm.AIStructuredResponseGeneratorBlock()
-        call_count = 0
-
-        async def mock_llm_call(*args, **kwargs):
-            nonlocal call_count
-            call_count += 1
-            raise _make_anthropic_status_error(500)
-
-        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
-            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
-                prompt="Test",
-                expected_format={"key": "desc"},
-                model=llm.DEFAULT_LLM_MODEL,
-                credentials=_TEST_AI_CREDENTIALS,
-                retry=3,
-            )
-
-            with pytest.raises(RuntimeError):
-                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
-                    pass
-
-        assert (
-            call_count > 1
-        ), f"Expected multiple retry attempts for 500, got {call_count}"
-
-    @pytest.mark.asyncio
-    async def test_user_error_logs_warning_not_exception(self):
-        """User-caused errors should log with logger.warning, not logger.exception."""
-        import backend.blocks.llm as llm
-
-        block = llm.AIStructuredResponseGeneratorBlock()
-
-        async def mock_llm_call(*args, **kwargs):
-            raise _make_anthropic_status_error(401)
-
-        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
-            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
-                prompt="Test",
-                expected_format={"key": "desc"},
-                model=llm.DEFAULT_LLM_MODEL,
-                credentials=_TEST_AI_CREDENTIALS,
-            )
-
-            with (
-                patch.object(llm.logger, "warning") as mock_warning,
-                patch.object(llm.logger, "exception") as mock_exception,
-                pytest.raises(RuntimeError),
-            ):
-                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
-                    pass
-
-        mock_warning.assert_called_once()
-        mock_exception.assert_not_called()
--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker.py
@@ -57,7 +57,7 @@ async def execute_graph(
 @pytest.mark.asyncio(loop_scope="session")
 async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):
    from backend.blocks.agent import AgentExecutorBlock
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
    from backend.data import graph

    test_user = await create_test_user()
@@ -66,7 +66,7 @@ async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):

    nodes = [
        graph.Node(
-            block_id=OrchestratorBlock().id,
+            block_id=SmartDecisionMakerBlock().id,
            input_default={
                "prompt": "Hello, World!",
                "credentials": creds,
@@ -108,10 +108,10 @@ async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):


 @pytest.mark.asyncio(loop_scope="session")
-async def test_orchestrator_function_signature(server: SpinTestServer):
+async def test_smart_decision_maker_function_signature(server: SpinTestServer):
    from backend.blocks.agent import AgentExecutorBlock
    from backend.blocks.basic import StoreValueBlock
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
    from backend.data import graph

    test_user = await create_test_user()
@@ -120,7 +120,7 @@ async def test_orchestrator_function_signature(server: SpinTestServer):

    nodes = [
        graph.Node(
-            block_id=OrchestratorBlock().id,
+            block_id=SmartDecisionMakerBlock().id,
            input_default={
                "prompt": "Hello, World!",
                "credentials": creds,
@@ -169,7 +169,7 @@ async def test_orchestrator_function_signature(server: SpinTestServer):
    )
    test_graph = await create_graph(server, test_graph, test_user)

-    tool_functions = await OrchestratorBlock._create_tool_node_signatures(
+    tool_functions = await SmartDecisionMakerBlock._create_tool_node_signatures(
        test_graph.nodes[0].id
    )
    assert tool_functions is not None, "Tool functions should not be None"
@@ -198,12 +198,12 @@ async def test_orchestrator_function_signature(server: SpinTestServer):


 @pytest.mark.asyncio
-async def test_orchestrator_tracks_llm_stats():
-    """Test that OrchestratorBlock correctly tracks LLM usage stats."""
+async def test_smart_decision_maker_tracks_llm_stats():
+    """Test that SmartDecisionMakerBlock correctly tracks LLM usage stats."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Mock the llm.llm_call function to return controlled data
    mock_response = MagicMock()
@@ -224,14 +224,14 @@ async def test_orchestrator_tracks_llm_stats():
        new_callable=AsyncMock,
        return_value=mock_response,
    ), patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=[],
    ):

        # Create test input
-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Should I continue with this task?",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -274,12 +274,12 @@ async def test_orchestrator_tracks_llm_stats():


@pytest.mark.asyncio
-async def test_orchestrator_parameter_validation():
-    """Test that OrchestratorBlock correctly validates tool call parameters."""
+async def test_smart_decision_maker_parameter_validation():
+    """Test that SmartDecisionMakerBlock correctly validates tool call parameters."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Mock tool functions with specific parameter schema
    mock_tool_functions = [
@@ -327,13 +327,13 @@ async def test_orchestrator_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_with_typo,
    ) as mock_llm_call, patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -394,13 +394,13 @@ async def test_orchestrator_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_missing_required,
    ), patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -454,13 +454,13 @@ async def test_orchestrator_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_valid,
    ), patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -518,13 +518,13 @@ async def test_orchestrator_parameter_validation():
        new_callable=AsyncMock,
        return_value=mock_response_all_params,
    ), patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
    ):

-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Search for keywords",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -562,12 +562,12 @@ async def test_orchestrator_parameter_validation():


@pytest.mark.asyncio
-async def test_orchestrator_raw_response_conversion():
-    """Test that Orchestrator correctly handles different raw_response types with retry mechanism."""
+async def test_smart_decision_maker_raw_response_conversion():
+    """Test that SmartDecisionMaker correctly handles different raw_response types with retry mechanism."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Mock tool functions
    mock_tool_functions = [
@@ -637,7 +637,7 @@ async def test_orchestrator_raw_response_conversion():
    with patch(
        "backend.blocks.llm.llm_call", new_callable=AsyncMock
    ) as mock_llm_call, patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=mock_tool_functions,
@@ -646,7 +646,7 @@ async def test_orchestrator_raw_response_conversion():
        # Second call returns successful response
        mock_llm_call.side_effect = [mock_response_retry, mock_response_success]

-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Test prompt",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -715,12 +715,12 @@ async def test_orchestrator_raw_response_conversion():
        new_callable=AsyncMock,
        return_value=mock_response_ollama,
    ), patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=[],  # No tools for this test
    ):
-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Simple prompt",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -771,12 +771,12 @@ async def test_orchestrator_raw_response_conversion():
        new_callable=AsyncMock,
        return_value=mock_response_dict,
    ), patch.object(
-        OrchestratorBlock,
+        SmartDecisionMakerBlock,
        "_create_tool_node_signatures",
        new_callable=AsyncMock,
        return_value=[],
    ):
-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Another test",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -811,12 +811,12 @@ async def test_orchestrator_raw_response_conversion():


@pytest.mark.asyncio
-async def test_orchestrator_agent_mode():
+async def test_smart_decision_maker_agent_mode():
    """Test that agent mode executes tools directly and loops until finished."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Mock tool call that requires multiple iterations
    mock_tool_call_1 = MagicMock()
@@ -893,7 +893,7 @@ async def test_orchestrator_agent_mode():
    with patch("backend.blocks.llm.llm_call", llm_call_mock), patch.object(
        block, "_create_tool_node_signatures", return_value=mock_tool_signatures
    ), patch(
-        "backend.blocks.orchestrator.get_database_manager_async_client",
+        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
        return_value=mock_db_client,
    ), patch(
        "backend.executor.manager.async_update_node_execution_status",
@@ -929,7 +929,7 @@ async def test_orchestrator_agent_mode():
        }

        # Test agent mode with max_iterations = 3
-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Complete this task using tools",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -969,12 +969,12 @@ async def test_orchestrator_agent_mode():


@pytest.mark.asyncio
-async def test_orchestrator_traditional_mode_default():
+async def test_smart_decision_maker_traditional_mode_default():
    """Test that default behavior (agent_mode_max_iterations=0) works as traditional mode."""
    import backend.blocks.llm as llm_module
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Mock tool call
    mock_tool_call = MagicMock()
@@ -1018,7 +1018,7 @@ async def test_orchestrator_traditional_mode_default():
    ):

        # Test default behavior (traditional mode)
-        input_data = OrchestratorBlock.Input(
+        input_data = SmartDecisionMakerBlock.Input(
            prompt="Test prompt",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -1060,12 +1060,12 @@ async def test_orchestrator_traditional_mode_default():


@pytest.mark.asyncio
-async def test_orchestrator_uses_customized_name_for_blocks():
-    """Test that OrchestratorBlock uses customized_name from node metadata for tool names."""
+async def test_smart_decision_maker_uses_customized_name_for_blocks():
+    """Test that SmartDecisionMakerBlock uses customized_name from node metadata for tool names."""
    from unittest.mock import MagicMock

    from backend.blocks.basic import StoreValueBlock
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
    from backend.data.graph import Link, Node

    # Create a mock node with customized_name in metadata
@@ -1080,7 +1080,7 @@ async def test_orchestrator_uses_customized_name_for_blocks():
    mock_link.sink_name = "input"

    # Call the function directly
-    result = await OrchestratorBlock._create_block_function_signature(
+    result = await SmartDecisionMakerBlock._create_block_function_signature(
        mock_node, [mock_link]
    )

@@ -1091,12 +1091,12 @@ async def test_orchestrator_uses_customized_name_for_blocks():


@pytest.mark.asyncio
-async def test_orchestrator_falls_back_to_block_name():
-    """Test that OrchestratorBlock falls back to block.name when no customized_name."""
+async def test_smart_decision_maker_falls_back_to_block_name():
+    """Test that SmartDecisionMakerBlock falls back to block.name when no customized_name."""
    from unittest.mock import MagicMock

    from backend.blocks.basic import StoreValueBlock
-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
    from backend.data.graph import Link, Node

    # Create a mock node without customized_name
@@ -1111,7 +1111,7 @@ async def test_orchestrator_falls_back_to_block_name():
    mock_link.sink_name = "input"

    # Call the function directly
-    result = await OrchestratorBlock._create_block_function_signature(
+    result = await SmartDecisionMakerBlock._create_block_function_signature(
        mock_node, [mock_link]
    )

@@ -1122,11 +1122,11 @@ async def test_orchestrator_falls_back_to_block_name():


@pytest.mark.asyncio
-async def test_orchestrator_uses_customized_name_for_agents():
-    """Test that OrchestratorBlock uses customized_name from metadata for agent nodes."""
+async def test_smart_decision_maker_uses_customized_name_for_agents():
+    """Test that SmartDecisionMakerBlock uses customized_name from metadata for agent nodes."""
    from unittest.mock import AsyncMock, MagicMock, patch

-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
    from backend.data.graph import Link, Node

    # Create a mock node with customized_name in metadata
@@ -1152,10 +1152,10 @@ async def test_orchestrator_uses_customized_name_for_agents():
    mock_db_client.get_graph_metadata.return_value = mock_graph_meta

    with patch(
-        "backend.blocks.orchestrator.get_database_manager_async_client",
+        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
        return_value=mock_db_client,
    ):
-        result = await OrchestratorBlock._create_agent_function_signature(
+        result = await SmartDecisionMakerBlock._create_agent_function_signature(
            mock_node, [mock_link]
        )

@@ -1166,11 +1166,11 @@ async def test_orchestrator_uses_customized_name_for_agents():


@pytest.mark.asyncio
-async def test_orchestrator_agent_falls_back_to_graph_name():
+async def test_smart_decision_maker_agent_falls_back_to_graph_name():
    """Test that agent node falls back to graph name when no customized_name."""
    from unittest.mock import AsyncMock, MagicMock, patch

-    from backend.blocks.orchestrator import OrchestratorBlock
+    from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
    from backend.data.graph import Link, Node

    # Create a mock node without customized_name
@@ -1196,10 +1196,10 @@ async def test_orchestrator_agent_falls_back_to_graph_name():
    mock_db_client.get_graph_metadata.return_value = mock_graph_meta

    with patch(
-        "backend.blocks.orchestrator.get_database_manager_async_client",
+        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
        return_value=mock_db_client,
    ):
-        result = await OrchestratorBlock._create_agent_function_signature(
+        result = await SmartDecisionMakerBlock._create_agent_function_signature(
            mock_node, [mock_link]
        )

--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dict.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dict.py
@@ -3,12 +3,12 @@ from unittest.mock import Mock
 import pytest

 from backend.blocks.data_manipulation import AddToListBlock, CreateDictionaryBlock
-from backend.blocks.orchestrator import OrchestratorBlock
+from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock


@pytest.mark.asyncio
-async def test_orchestrator_handles_dynamic_dict_fields():
-    """Test Orchestrator can handle dynamic dictionary fields (_#_) for any block"""
+async def test_smart_decision_maker_handles_dynamic_dict_fields():
+    """Test Smart Decision Maker can handle dynamic dictionary fields (_#_) for any block"""

    # Create a mock node for CreateDictionaryBlock
    mock_node = Mock()
@@ -23,24 +23,24 @@ async def test_orchestrator_handles_dynamic_dict_fields():
            source_name="tools_^_create_dict_~_name",
            sink_name="values_#_name",  # Dynamic dict field
            sink_id="dict_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_age",
            sink_name="values_#_age",  # Dynamic dict field
            sink_id="dict_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_city",
            sink_name="values_#_city",  # Dynamic dict field
            sink_id="dict_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
    ]

    # Generate function signature
-    signature = await OrchestratorBlock._create_block_function_signature(
+    signature = await SmartDecisionMakerBlock._create_block_function_signature(
        mock_node, mock_links  # type: ignore
    )

@@ -70,8 +70,8 @@ async def test_orchestrator_handles_dynamic_dict_fields():


@pytest.mark.asyncio
-async def test_orchestrator_handles_dynamic_list_fields():
-    """Test Orchestrator can handle dynamic list fields (_$_) for any block"""
+async def test_smart_decision_maker_handles_dynamic_list_fields():
+    """Test Smart Decision Maker can handle dynamic list fields (_$_) for any block"""

    # Create a mock node for AddToListBlock
    mock_node = Mock()
@@ -86,18 +86,18 @@ async def test_orchestrator_handles_dynamic_list_fields():
            source_name="tools_^_add_to_list_~_0",
            sink_name="entries_$_0",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_add_to_list_~_1",
            sink_name="entries_$_1",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
    ]

    # Generate function signature
-    signature = await OrchestratorBlock._create_block_function_signature(
+    signature = await SmartDecisionMakerBlock._create_block_function_signature(
        mock_node, mock_links  # type: ignore
    )

--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dynamic_fields.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_dynamic_fields.py
@@ -1,4 +1,4 @@
-"""Comprehensive tests for OrchestratorBlock dynamic field handling."""
+"""Comprehensive tests for SmartDecisionMakerBlock dynamic field handling."""

 import json
 from unittest.mock import AsyncMock, MagicMock, Mock, patch
@@ -6,7 +6,7 @@ from unittest.mock import AsyncMock, MagicMock, Mock, patch
 import pytest

 from backend.blocks.data_manipulation import AddToListBlock, CreateDictionaryBlock
-from backend.blocks.orchestrator import OrchestratorBlock
+from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
 from backend.blocks.text import MatchTextPatternBlock
 from backend.data.dynamic_fields import get_dynamic_field_description

@@ -37,7 +37,7 @@ async def test_dynamic_field_description_generation():
@pytest.mark.asyncio
 async def test_create_block_function_signature_with_dict_fields():
    """Test that function signatures are created correctly for dictionary dynamic fields."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Create a mock node for CreateDictionaryBlock
    mock_node = Mock()
@@ -52,19 +52,19 @@ async def test_create_block_function_signature_with_dict_fields():
            source_name="tools_^_create_dict_~_values___name",  # Sanitized source
            sink_name="values_#_name",  # Original sink
            sink_id="dict_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_values___age",  # Sanitized source
            sink_name="values_#_age",  # Original sink
            sink_id="dict_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_create_dict_~_values___email",  # Sanitized source
            sink_name="values_#_email",  # Original sink
            sink_id="dict_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
    ]

@@ -100,7 +100,7 @@ async def test_create_block_function_signature_with_dict_fields():
@pytest.mark.asyncio
 async def test_create_block_function_signature_with_list_fields():
    """Test that function signatures are created correctly for list dynamic fields."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Create a mock node for AddToListBlock
    mock_node = Mock()
@@ -115,19 +115,19 @@ async def test_create_block_function_signature_with_list_fields():
            source_name="tools_^_add_list_~_0",
            sink_name="entries_$_0",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_add_list_~_1",
            sink_name="entries_$_1",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_add_list_~_2",
            sink_name="entries_$_2",  # Dynamic list field
            sink_id="list_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
    ]

@@ -154,7 +154,7 @@ async def test_create_block_function_signature_with_list_fields():
@pytest.mark.asyncio
 async def test_create_block_function_signature_with_object_fields():
    """Test that function signatures are created correctly for object dynamic fields."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Create a mock node for MatchTextPatternBlock (simulating object fields)
    mock_node = Mock()
@@ -169,13 +169,13 @@ async def test_create_block_function_signature_with_object_fields():
            source_name="tools_^_extract_~_user_name",
            sink_name="data_@_user_name",  # Dynamic object field
            sink_id="extract_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_extract_~_user_email",
            sink_name="data_@_user_email",  # Dynamic object field
            sink_id="extract_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
    ]

@@ -197,11 +197,11 @@ async def test_create_block_function_signature_with_object_fields():
@pytest.mark.asyncio
 async def test_create_tool_node_signatures():
    """Test that the mapping between sanitized and original field names is built correctly."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Mock the database client and connected nodes
    with patch(
-        "backend.blocks.orchestrator.get_database_manager_async_client"
+        "backend.blocks.smart_decision_maker.get_database_manager_async_client"
    ) as mock_db:
        mock_client = AsyncMock()
        mock_db.return_value = mock_client
@@ -281,7 +281,7 @@ async def test_create_tool_node_signatures():
@pytest.mark.asyncio
 async def test_output_yielding_with_dynamic_fields():
    """Test that outputs are yielded correctly with dynamic field names mapped back."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # No more sanitized mapping needed since we removed sanitization

@@ -309,13 +309,13 @@ async def test_output_yielding_with_dynamic_fields():

    # Mock the LLM call
    with patch(
-        "backend.blocks.orchestrator.llm.llm_call", new_callable=AsyncMock
+        "backend.blocks.smart_decision_maker.llm.llm_call", new_callable=AsyncMock
    ) as mock_llm:
        mock_llm.return_value = mock_response

        # Mock the database manager to avoid HTTP calls during tool execution
        with patch(
-            "backend.blocks.orchestrator.get_database_manager_async_client"
+            "backend.blocks.smart_decision_maker.get_database_manager_async_client"
        ) as mock_db_manager, patch.object(
            block, "_create_tool_node_signatures", new_callable=AsyncMock
        ) as mock_sig:
@@ -420,7 +420,7 @@ async def test_output_yielding_with_dynamic_fields():
@pytest.mark.asyncio
 async def test_mixed_regular_and_dynamic_fields():
    """Test handling of blocks with both regular and dynamic fields."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Create a mock node
    mock_node = Mock()
@@ -450,19 +450,19 @@ async def test_mixed_regular_and_dynamic_fields():
            source_name="tools_^_test_~_regular",
            sink_name="regular_field",  # Regular field
            sink_id="test_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_test_~_dict_key",
            sink_name="values_#_key1",  # Dynamic dict field
            sink_id="test_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
        Mock(
            source_name="tools_^_test_~_dict_key2",
            sink_name="values_#_key2",  # Dynamic dict field
            sink_id="test_node_id",
-            source_id="orchestrator_node_id",
+            source_id="smart_decision_node_id",
        ),
    ]

@@ -488,7 +488,7 @@ async def test_mixed_regular_and_dynamic_fields():
@pytest.mark.asyncio
 async def test_validation_errors_dont_pollute_conversation():
    """Test that validation errors are only used during retries and don't pollute the conversation."""
-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # Track conversation history changes
    conversation_snapshots = []
@@ -535,7 +535,7 @@ async def test_validation_errors_dont_pollute_conversation():

    # Mock the LLM call
    with patch(
-        "backend.blocks.orchestrator.llm.llm_call", new_callable=AsyncMock
+        "backend.blocks.smart_decision_maker.llm.llm_call", new_callable=AsyncMock
    ) as mock_llm:
        mock_llm.side_effect = mock_llm_call

@@ -565,7 +565,7 @@ async def test_validation_errors_dont_pollute_conversation():

            # Mock the database manager to avoid HTTP calls during tool execution
            with patch(
-                "backend.blocks.orchestrator.get_database_manager_async_client"
+                "backend.blocks.smart_decision_maker.get_database_manager_async_client"
            ) as mock_db_manager:
                # Set up the mock database manager for agent mode
                mock_db_client = AsyncMock()
--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_responses_api.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_responses_api.py
@@ -1,6 +1,6 @@
-"""Tests for OrchestratorBlock compatibility with the OpenAI Responses API.
+"""Tests for SmartDecisionMakerBlock compatibility with the OpenAI Responses API.

-The OrchestratorBlock manages conversation history in the Chat Completions
+The SmartDecisionMakerBlock manages conversation history in the Chat Completions
 format, but OpenAI models now use the Responses API which has a fundamentally
 different conversation structure.  These tests document:

@@ -27,8 +27,8 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from backend.blocks.orchestrator import (
-    OrchestratorBlock,
+from backend.blocks.smart_decision_maker import (
+    SmartDecisionMakerBlock,
    _combine_tool_responses,
    _convert_raw_response_to_dict,
    _create_tool_response,
@@ -733,7 +733,7 @@ class TestUpdateConversation:

    def test_dict_raw_response_no_reasoning_no_tools(self):
        """Dict raw_response, no reasoning → appends assistant dict."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        resp = self._make_response({"role": "assistant", "content": "hi"})
        block._update_conversation(prompt, resp)
@@ -741,7 +741,7 @@ class TestUpdateConversation:

    def test_dict_raw_response_with_reasoning_no_tool_calls(self):
        """Reasoning present, no tool calls → reasoning prepended."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        resp = self._make_response(
            {"role": "assistant", "content": "answer"},
@@ -757,7 +757,7 @@ class TestUpdateConversation:

    def test_dict_raw_response_with_reasoning_and_anthropic_tool_calls(self):
        """Reasoning + Anthropic tool_use in content → reasoning skipped."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        raw = {
            "role": "assistant",
@@ -772,7 +772,7 @@ class TestUpdateConversation:

    def test_with_tool_outputs(self):
        """Tool outputs → extended onto prompt."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        resp = self._make_response({"role": "assistant", "content": None})
        outputs = [{"role": "tool", "tool_call_id": "call_1", "content": "r"}]
@@ -782,7 +782,7 @@ class TestUpdateConversation:

    def test_without_tool_outputs(self):
        """No tool outputs → only assistant message appended."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        resp = self._make_response({"role": "assistant", "content": "done"})
        block._update_conversation(prompt, resp, None)
@@ -790,7 +790,7 @@ class TestUpdateConversation:

    def test_string_raw_response(self):
        """Ollama string → wrapped as assistant dict."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        resp = self._make_response("hello from ollama")
        block._update_conversation(prompt, resp)
@@ -800,7 +800,7 @@ class TestUpdateConversation:

    def test_responses_api_text_response_produces_valid_items(self):
        """Responses API text response → conversation items must have valid role."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = [
            {"role": "system", "content": "sys"},
            {"role": "user", "content": "user"},
@@ -820,7 +820,7 @@ class TestUpdateConversation:

    def test_responses_api_function_call_produces_valid_items(self):
        """Responses API function_call → conversation items must have valid type."""
-        block = OrchestratorBlock()
+        block = SmartDecisionMakerBlock()
        prompt: list[dict] = []
        resp = self._make_response(
            _MockResponse(output=[_MockFunctionCall("tool", "{}", call_id="call_1")])
@@ -856,7 +856,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
    """
    import backend.blocks.llm as llm_module

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    # First response: tool call
    mock_tc = MagicMock()
@@ -936,7 +936,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
    with patch("backend.blocks.llm.llm_call", llm_mock), patch.object(
        block, "_create_tool_node_signatures", return_value=tool_sigs
    ), patch(
-        "backend.blocks.orchestrator.get_database_manager_async_client",
+        "backend.blocks.smart_decision_maker.get_database_manager_async_client",
        return_value=mock_db,
    ), patch(
        "backend.executor.manager.async_update_node_execution_status",
@@ -945,7 +945,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
        "backend.integrations.creds_manager.IntegrationCredentialsManager"
    ):

-        inp = OrchestratorBlock.Input(
+        inp = SmartDecisionMakerBlock.Input(
            prompt="Improve this",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
@@ -992,7 +992,7 @@ async def test_traditional_mode_conversation_valid_for_responses_api():
    """Traditional mode: the yielded conversation must contain only valid items."""
    import backend.blocks.llm as llm_module

-    block = OrchestratorBlock()
+    block = SmartDecisionMakerBlock()

    mock_tc = MagicMock()
    mock_tc.function.name = "my_tool"
@@ -1028,7 +1028,7 @@ async def test_traditional_mode_conversation_valid_for_responses_api():
        "backend.blocks.llm.llm_call", new_callable=AsyncMock, return_value=resp
    ), patch.object(block, "_create_tool_node_signatures", return_value=tool_sigs):

-        inp = OrchestratorBlock.Input(
+        inp = SmartDecisionMakerBlock.Input(
            prompt="Do it",
            model=llm_module.DEFAULT_LLM_MODEL,
            credentials=llm_module.TEST_CREDENTIALS_INPUT,  # type: ignore
--- a/autogpt_platform/backend/backend/copilot/context.py
+++ b/autogpt_platform/backend/backend/copilot/context.py
@@ -17,9 +17,6 @@ from backend.util.workspace import WorkspaceManager
 if TYPE_CHECKING:
    from e2b import AsyncSandbox

-    from backend.copilot.permissions import CopilotPermissions
-
-
 # Allowed base directory for the Read tool.  Public so service.py can use it
 # for sweep operations without depending on a private implementation detail.
 # Respects CLAUDE_CONFIG_DIR env var, consistent with transcript.py's
@@ -46,12 +43,6 @@ _current_sandbox: ContextVar["AsyncSandbox | None"] = ContextVar(
 )
 _current_sdk_cwd: ContextVar[str] = ContextVar("_current_sdk_cwd", default="")

-# Current execution's capability filter.  None means "no restrictions".
-# Set by set_execution_context(); read by run_block and service.py.
-_current_permissions: "ContextVar[CopilotPermissions | None]" = ContextVar(
-    "_current_permissions", default=None
-)
-

 def encode_cwd_for_cli(cwd: str) -> str:
    """Encode a working directory path the same way the Claude CLI does.
@@ -72,7 +63,6 @@ def set_execution_context(
    session: ChatSession,
    sandbox: "AsyncSandbox | None" = None,
    sdk_cwd: str | None = None,
-    permissions: "CopilotPermissions | None" = None,
 ) -> None:
    """Set per-turn context variables used by file-resolution tool handlers."""
    _current_user_id.set(user_id)
@@ -80,7 +70,6 @@ def set_execution_context(
    _current_sandbox.set(sandbox)
    _current_sdk_cwd.set(sdk_cwd or "")
    _current_project_dir.set(_encode_cwd_for_cli(sdk_cwd) if sdk_cwd else "")
-    _current_permissions.set(permissions)


 def get_execution_context() -> tuple[str | None, ChatSession | None]:
@@ -88,11 +77,6 @@ def get_execution_context() -> tuple[str | None, ChatSession | None]:
    return _current_user_id.get(), _current_session.get()


-def get_current_permissions() -> "CopilotPermissions | None":
-    """Return the capability filter for the current execution, or None if unrestricted."""
-    return _current_permissions.get()
-
-
 def get_current_sandbox() -> "AsyncSandbox | None":
    """Return the E2B sandbox for the current session, or None if not active."""
    return _current_sandbox.get()
--- a/autogpt_platform/backend/backend/copilot/context_test.py
+++ b/autogpt_platform/backend/backend/copilot/context_test.py
@@ -11,7 +11,6 @@ import pytest
 from backend.copilot.context import (
    SDK_PROJECTS_DIR,
    _current_project_dir,
-    get_current_permissions,
    get_current_sandbox,
    get_execution_context,
    get_sdk_cwd,
@@ -19,7 +18,6 @@ from backend.copilot.context import (
    resolve_sandbox_path,
    set_execution_context,
 )
-from backend.copilot.permissions import CopilotPermissions


 def _make_session() -> MagicMock:
@@ -63,19 +61,6 @@ def test_get_current_sandbox_returns_set_value():
    assert get_current_sandbox() is mock_sandbox


-def test_set_and_get_current_permissions():
-    """set_execution_context stores permissions; get_current_permissions returns it."""
-    perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
-    set_execution_context("u1", _make_session(), permissions=perms)
-    assert get_current_permissions() is perms
-
-
-def test_get_current_permissions_defaults_to_none():
-    """get_current_permissions returns None when no permissions have been set."""
-    set_execution_context("u1", _make_session())
-    assert get_current_permissions() is None
-
-
 def test_get_sdk_cwd_empty_when_not_set():
    """get_sdk_cwd returns empty string when sdk_cwd is not set."""
    set_execution_context("u1", _make_session(), sdk_cwd=None)
--- a/autogpt_platform/backend/backend/copilot/permissions.py
+++ b/autogpt_platform/backend/backend/copilot/permissions.py
@@ -1,430 +0,0 @@
-"""Copilot execution permissions — tool and block allow/deny filtering.
-
-:class:`CopilotPermissions` is the single model used everywhere:
-
-- ``AutoPilotBlock`` reads four block-input fields and builds one instance.
-- ``stream_chat_completion_sdk`` applies it when constructing
-  ``ClaudeAgentOptions.allowed_tools`` / ``disallowed_tools``.
-- ``run_block`` reads it from the contextvar to gate block execution.
-- Recursive (sub-agent) invocations merge parent and child so children
-  can only be *more* restrictive, never more permissive.
-
-Tool names
-----------
-Users specify the **short name** as it appears in ``TOOL_REGISTRY`` (e.g.
-``run_block``, ``web_fetch``) or as an SDK built-in (e.g. ``Read``,
-``Task``, ``WebSearch``).  Internally these are mapped to the full SDK
-format (``mcp__copilot__run_block``, ``Read``, …) by
-:func:`apply_tool_permissions`.
-
-Block identifiers
------------------
-Each entry in ``blocks`` may be one of:
-
-- A **full UUID** (``c069dc6b-c3ed-4c12-b6e5-d47361e64ce6``)
-- A **partial UUID** — the first 8-character hex segment (``c069dc6b``)
-- A **block name** (case-insensitive, e.g. ``"HTTP Request"``)
-
-:func:`validate_block_identifiers` resolves all entries against the live
-block registry and returns any that could not be matched.
-
-Semantics
----------
-``tools_exclude=True``  (default) — ``tools`` is a **blacklist**; listed
-tools are denied and everything else is allowed.  An empty list means
-"allow all" (no filtering).
-
-``tools_exclude=False`` — ``tools`` is a **whitelist**; only listed tools
-are allowed.
-
-``blocks_exclude`` follows the same pattern for ``blocks``.
-
-Recursion inheritance
----------------------
-:meth:`CopilotPermissions.merged_with_parent` produces a new instance that
-is at most as permissive as the parent:
-
-- Tools: effective-allowed sets are intersected then stored as a whitelist.
-- Blocks: the parent is stored in ``_parent`` and consulted during every
-  :meth:`is_block_allowed` call so both constraints must pass.
-"""
-
-from __future__ import annotations
-
-import re
-from typing import Literal, get_args
-
-from pydantic import BaseModel, PrivateAttr
-
-# ---------------------------------------------------------------------------
-# Constants — single source of truth for all accepted tool names
-# ---------------------------------------------------------------------------
-
-# Literal type combining all valid tool names — used by AutoPilotBlock.Input
-# so the frontend renders a multi-select dropdown.
-# This is the SINGLE SOURCE OF TRUTH.  All other name sets are derived from it.
-ToolName = Literal[
-    # Platform tools (must match keys in TOOL_REGISTRY)
-    "add_understanding",
-    "bash_exec",
-    "browser_act",
-    "browser_navigate",
-    "browser_screenshot",
-    "connect_integration",
-    "continue_run_block",
-    "create_agent",
-    "create_feature_request",
-    "create_folder",
-    "customize_agent",
-    "delete_folder",
-    "delete_workspace_file",
-    "edit_agent",
-    "find_agent",
-    "find_block",
-    "find_library_agent",
-    "fix_agent_graph",
-    "get_agent_building_guide",
-    "get_doc_page",
-    "get_mcp_guide",
-    "list_folders",
-    "list_workspace_files",
-    "move_agents_to_folder",
-    "move_folder",
-    "read_workspace_file",
-    "run_agent",
-    "run_block",
-    "run_mcp_tool",
-    "search_docs",
-    "search_feature_requests",
-    "update_folder",
-    "validate_agent_graph",
-    "view_agent_output",
-    "web_fetch",
-    "write_workspace_file",
-    # SDK built-ins
-    "Edit",
-    "Glob",
-    "Grep",
-    "Read",
-    "Task",
-    "TodoWrite",
-    "WebSearch",
-    "Write",
-]
-
-# Frozen set of all valid tool names — derived from the Literal.
-ALL_TOOL_NAMES: frozenset[str] = frozenset(get_args(ToolName))
-
-# SDK built-in tool names — uppercase-initial names are SDK built-ins.
-SDK_BUILTIN_TOOL_NAMES: frozenset[str] = frozenset(
-    n for n in ALL_TOOL_NAMES if n[0].isupper()
-)
-
-# Platform tool names — everything that isn't an SDK built-in.
-PLATFORM_TOOL_NAMES: frozenset[str] = ALL_TOOL_NAMES - SDK_BUILTIN_TOOL_NAMES
-
-# Compiled regex patterns for block identifier classification.
-_FULL_UUID_RE = re.compile(
-    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
-    re.IGNORECASE,
-)
-_PARTIAL_UUID_RE = re.compile(r"^[0-9a-f]{8}$", re.IGNORECASE)
-
-
-# ---------------------------------------------------------------------------
-# Helper — block identifier matching
-# ---------------------------------------------------------------------------
-
-
-def _block_matches(identifier: str, block_id: str, block_name: str) -> bool:
-    """Return True if *identifier* resolves to the given block.
-
-    Resolution order:
-    1. Full UUID — exact case-insensitive match against *block_id*.
-    2. Partial UUID (8 hex chars, first segment) — prefix match.
-    3. Name — case-insensitive equality against *block_name*.
-    """
-    ident = identifier.strip()
-    if _FULL_UUID_RE.match(ident):
-        return ident.lower() == block_id.lower()
-    if _PARTIAL_UUID_RE.match(ident):
-        return block_id.lower().startswith(ident.lower())
-    return ident.lower() == block_name.lower()
-
-
-# ---------------------------------------------------------------------------
-# Model
-# ---------------------------------------------------------------------------
-
-
-class CopilotPermissions(BaseModel):
-    """Capability filter for a single copilot execution.
-
-    Attributes:
-        tools: Tool names to filter (short names, e.g. ``run_block``).
-        tools_exclude: When True (default) ``tools`` is a blacklist;
-            when False it is a whitelist.  Ignored when *tools* is empty.
-        blocks: Block identifiers (name, full UUID, or 8-char partial UUID).
-        blocks_exclude: Same semantics as *tools_exclude* but for blocks.
-    """
-
-    tools: list[str] = []
-    tools_exclude: bool = True
-    blocks: list[str] = []
-    blocks_exclude: bool = True
-
-    # Private: parent permissions for recursion inheritance.
-    # Set only by merged_with_parent(); never exposed in block input schema.
-    _parent: CopilotPermissions | None = PrivateAttr(default=None)
-
-    # ------------------------------------------------------------------
-    # Tool helpers
-    # ------------------------------------------------------------------
-
-    def effective_allowed_tools(self, all_tools: frozenset[str]) -> frozenset[str]:
-        """Compute the set of short tool names that are permitted.
-
-        Args:
-            all_tools: Universe of valid short tool names.
-
-        Returns:
-            Subset of *all_tools* that pass the filter.
-        """
-        if not self.tools:
-            return frozenset(all_tools)
-        tool_set = frozenset(self.tools)
-        if self.tools_exclude:
-            return all_tools - tool_set
-        return all_tools & tool_set
-
-    # ------------------------------------------------------------------
-    # Block helpers
-    # ------------------------------------------------------------------
-
-    def is_block_allowed(self, block_id: str, block_name: str) -> bool:
-        """Return True if the block may be executed under these permissions.
-
-        Checks this instance first, then consults the parent (if any) so
-        the entire inheritance chain is respected.
-        """
-        if not self._check_block_locally(block_id, block_name):
-            return False
-        if self._parent is not None:
-            return self._parent.is_block_allowed(block_id, block_name)
-        return True
-
-    def _check_block_locally(self, block_id: str, block_name: str) -> bool:
-        """Check *only* this instance's block filter (ignores parent)."""
-        if not self.blocks:
-            return True  # No filter → allow all
-        matched = any(
-            _block_matches(identifier, block_id, block_name)
-            for identifier in self.blocks
-        )
-        return not matched if self.blocks_exclude else matched
-
-    # ------------------------------------------------------------------
-    # Recursion / merging
-    # ------------------------------------------------------------------
-
-    def merged_with_parent(
-        self,
-        parent: CopilotPermissions,
-        all_tools: frozenset[str],
-    ) -> CopilotPermissions:
-        """Return a new instance that is at most as permissive as *parent*.
-
-        - Tools: intersection of effective-allowed sets, stored as a whitelist.
-        - Blocks: parent is stored internally; both constraints are applied
-          during :meth:`is_block_allowed`.
-        """
-        merged_tools = self.effective_allowed_tools(
-            all_tools
-        ) & parent.effective_allowed_tools(all_tools)
-        result = CopilotPermissions(
-            tools=sorted(merged_tools),
-            tools_exclude=False,
-            blocks=self.blocks,
-            blocks_exclude=self.blocks_exclude,
-        )
-        result._parent = parent
-        return result
-
-    # ------------------------------------------------------------------
-    # Convenience
-    # ------------------------------------------------------------------
-
-    def is_empty(self) -> bool:
-        """Return True when no filtering is configured (allow-all passthrough)."""
-        return not self.tools and not self.blocks and self._parent is None
-
-
-# ---------------------------------------------------------------------------
-# Validation helpers
-# ---------------------------------------------------------------------------
-
-
-def all_known_tool_names() -> frozenset[str]:
-    """Return all short tool names accepted in *tools*.
-
-    Returns the pre-computed ``ALL_TOOL_NAMES`` set (derived from the
-    ``ToolName`` Literal).  On first call, also verifies consistency with
-    the live ``TOOL_REGISTRY``.
-    """
-    _assert_tool_names_consistent()
-    return ALL_TOOL_NAMES
-
-
-def validate_tool_names(tools: list[str]) -> list[str]:
-    """Return entries in *tools* that are not valid tool names.
-
-    Args:
-        tools: List of short tool name strings to validate.
-
-    Returns:
-        List of invalid names (empty if all are valid).
-    """
-    return [t for t in tools if t not in ALL_TOOL_NAMES]
-
-
-_tool_names_checked = False
-
-
-def _assert_tool_names_consistent() -> None:
-    """Verify that ``PLATFORM_TOOL_NAMES`` matches ``TOOL_REGISTRY`` keys.
-
-    Called once lazily (TOOL_REGISTRY has heavy imports).  Raises
-    ``AssertionError`` with a helpful diff if they diverge.
-    """
-    global _tool_names_checked
-    if _tool_names_checked:
-        return
-    _tool_names_checked = True
-
-    from backend.copilot.tools import TOOL_REGISTRY
-
-    registry_keys: frozenset[str] = frozenset(TOOL_REGISTRY.keys())
-    declared: frozenset[str] = PLATFORM_TOOL_NAMES
-    if registry_keys != declared:
-        missing = registry_keys - declared
-        extra = declared - registry_keys
-        parts: list[str] = [
-            "PLATFORM_TOOL_NAMES in permissions.py is out of sync with TOOL_REGISTRY."
-        ]
-        if missing:
-            parts.append(f"  Missing from PLATFORM_TOOL_NAMES: {sorted(missing)}")
-        if extra:
-            parts.append(f"  Extra in PLATFORM_TOOL_NAMES: {sorted(extra)}")
-        parts.append("  Update the ToolName Literal to match.")
-        raise AssertionError("\n".join(parts))
-
-
-async def validate_block_identifiers(
-    identifiers: list[str],
-) -> list[str]:
-    """Resolve each block identifier and return those that could not be matched.
-
-    Args:
-        identifiers: List of block identifiers (name, full UUID, or partial UUID).
-
-    Returns:
-        List of identifiers that matched no known block.
-    """
-    from backend.blocks import get_blocks
-
-    # get_blocks() returns dict[block_id_str, BlockClass]; instantiate once to get names.
-    block_registry = get_blocks()
-    block_info = {bid: cls().name for bid, cls in block_registry.items()}
-    invalid: list[str] = []
-    for ident in identifiers:
-        matched = any(
-            _block_matches(ident, bid, bname) for bid, bname in block_info.items()
-        )
-        if not matched:
-            invalid.append(ident)
-    return invalid
-
-
-# ---------------------------------------------------------------------------
-# SDK tool-list application
-# ---------------------------------------------------------------------------
-
-
-def apply_tool_permissions(
-    permissions: CopilotPermissions,
-    *,
-    use_e2b: bool = False,
-) -> tuple[list[str], list[str]]:
-    """Compute (allowed_tools, extra_disallowed) for :class:`ClaudeAgentOptions`.
-
-    Takes the base allowed/disallowed lists from
-    :func:`~backend.copilot.sdk.tool_adapter.get_copilot_tool_names` /
-    :func:`~backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools` and
-    applies *permissions* on top.
-
-    Returns:
-        ``(allowed_tools, extra_disallowed)`` where *allowed_tools* is the
-        possibly-narrowed list to pass to ``ClaudeAgentOptions.allowed_tools``
-        and *extra_disallowed* is the list to pass to
-        ``ClaudeAgentOptions.disallowed_tools``.
-    """
-    from backend.copilot.sdk.tool_adapter import (
-        _READ_TOOL_NAME,
-        MCP_TOOL_PREFIX,
-        get_copilot_tool_names,
-        get_sdk_disallowed_tools,
-    )
-    from backend.copilot.tools import TOOL_REGISTRY
-
-    base_allowed = get_copilot_tool_names(use_e2b=use_e2b)
-    base_disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)
-
-    if permissions.is_empty():
-        return base_allowed, base_disallowed
-
-    all_tools = all_known_tool_names()
-    effective = permissions.effective_allowed_tools(all_tools)
-
-    # In E2B mode, SDK built-in file tools (Read, Write, Edit, Glob, Grep)
-    # are replaced by MCP equivalents (read_file, write_file, ...).
-    # Map each SDK built-in name to its E2B MCP name so users can use the
-    # familiar names in their permissions and the E2B tools are included.
-    _SDK_TO_E2B: dict[str, str] = {}
-    if use_e2b:
-        from backend.copilot.sdk.e2b_file_tools import E2B_FILE_TOOL_NAMES
-
-        _SDK_TO_E2B = dict(
-            zip(
-                ["Read", "Write", "Edit", "Glob", "Grep"],
-                E2B_FILE_TOOL_NAMES,
-                strict=False,
-            )
-        )
-
-    # Build an updated allowed list by mapping short names → SDK names and
-    # keeping only those present in the original base_allowed list.
-    def to_sdk_names(short: str) -> list[str]:
-        names: list[str] = []
-        if short in TOOL_REGISTRY:
-            names.append(f"{MCP_TOOL_PREFIX}{short}")
-        elif short in _SDK_TO_E2B:
-            # E2B mode: map SDK built-in file tool to its MCP equivalent.
-            names.append(f"{MCP_TOOL_PREFIX}{_SDK_TO_E2B[short]}")
-        else:
-            names.append(short)  # SDK built-in — used as-is
-        return names
-
-    # short names permitted by permissions
-    permitted_sdk: set[str] = set()
-    for s in effective:
-        permitted_sdk.update(to_sdk_names(s))
-    # Always include the internal Read tool (used by SDK for large/truncated outputs)
-    permitted_sdk.add(f"{MCP_TOOL_PREFIX}{_READ_TOOL_NAME}")
-
-    filtered_allowed = [t for t in base_allowed if t in permitted_sdk]
-
-    # Extra disallowed = tools that were in base_allowed but are now removed
-    removed = set(base_allowed) - set(filtered_allowed)
-    extra_disallowed = list(set(base_disallowed) | removed)
-
-    return filtered_allowed, extra_disallowed
--- a/autogpt_platform/backend/backend/copilot/permissions_test.py
+++ b/autogpt_platform/backend/backend/copilot/permissions_test.py
@@ -1,579 +0,0 @@
-"""Tests for CopilotPermissions — tool/block capability filtering."""
-
-from __future__ import annotations
-
-import pytest
-
-from backend.copilot.permissions import (
-    ALL_TOOL_NAMES,
-    PLATFORM_TOOL_NAMES,
-    SDK_BUILTIN_TOOL_NAMES,
-    CopilotPermissions,
-    _block_matches,
-    all_known_tool_names,
-    apply_tool_permissions,
-    validate_block_identifiers,
-    validate_tool_names,
-)
-from backend.copilot.tools import TOOL_REGISTRY
-
-# ---------------------------------------------------------------------------
-# _block_matches
-# ---------------------------------------------------------------------------
-
-
-class TestBlockMatches:
-    BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
-    BLOCK_NAME = "HTTP Request"
-
-    def test_full_uuid_match(self):
-        assert _block_matches(self.BLOCK_ID, self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_full_uuid_case_insensitive(self):
-        assert _block_matches(self.BLOCK_ID.upper(), self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_full_uuid_no_match(self):
-        other = "aaaaaaaa-0000-0000-0000-000000000000"
-        assert not _block_matches(other, self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_partial_uuid_match(self):
-        assert _block_matches("c069dc6b", self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_partial_uuid_case_insensitive(self):
-        assert _block_matches("C069DC6B", self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_partial_uuid_no_match(self):
-        assert not _block_matches("deadbeef", self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_name_match(self):
-        assert _block_matches("HTTP Request", self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_name_case_insensitive(self):
-        assert _block_matches("http request", self.BLOCK_ID, self.BLOCK_NAME)
-        assert _block_matches("HTTP REQUEST", self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_name_no_match(self):
-        assert not _block_matches("Unknown Block", self.BLOCK_ID, self.BLOCK_NAME)
-
-    def test_partial_uuid_not_matching_as_name(self):
-        # "c069dc6b" is 8 hex chars → treated as partial UUID, NOT name match
-        assert not _block_matches(
-            "c069dc6b", "ffffffff-0000-0000-0000-000000000000", "c069dc6b"
-        )
-
-
-# ---------------------------------------------------------------------------
-# CopilotPermissions.effective_allowed_tools
-# ---------------------------------------------------------------------------
-
-
-ALL_TOOLS = frozenset(
-    ["run_block", "web_fetch", "bash_exec", "find_agent", "Task", "Read"]
-)
-
-
-class TestEffectiveAllowedTools:
-    def test_empty_list_allows_all(self):
-        perms = CopilotPermissions(tools=[], tools_exclude=True)
-        assert perms.effective_allowed_tools(ALL_TOOLS) == ALL_TOOLS
-
-    def test_empty_whitelist_allows_all(self):
-        # edge: tools_exclude=False but empty list → allow all
-        perms = CopilotPermissions(tools=[], tools_exclude=False)
-        assert perms.effective_allowed_tools(ALL_TOOLS) == ALL_TOOLS
-
-    def test_blacklist_removes_listed(self):
-        perms = CopilotPermissions(tools=["bash_exec", "web_fetch"], tools_exclude=True)
-        result = perms.effective_allowed_tools(ALL_TOOLS)
-        assert "bash_exec" not in result
-        assert "web_fetch" not in result
-        assert "run_block" in result
-        assert "Task" in result
-
-    def test_whitelist_keeps_only_listed(self):
-        perms = CopilotPermissions(tools=["run_block", "Task"], tools_exclude=False)
-        result = perms.effective_allowed_tools(ALL_TOOLS)
-        assert result == frozenset(["run_block", "Task"])
-
-    def test_whitelist_unknown_tool_yields_empty(self):
-        perms = CopilotPermissions(tools=["nonexistent"], tools_exclude=False)
-        result = perms.effective_allowed_tools(ALL_TOOLS)
-        assert result == frozenset()
-
-    def test_blacklist_unknown_tool_ignored(self):
-        perms = CopilotPermissions(tools=["nonexistent"], tools_exclude=True)
-        result = perms.effective_allowed_tools(ALL_TOOLS)
-        assert result == ALL_TOOLS
-
-
-# ---------------------------------------------------------------------------
-# CopilotPermissions.is_block_allowed
-# ---------------------------------------------------------------------------
-
-
-BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
-BLOCK_NAME = "HTTP Request"
-
-
-class TestIsBlockAllowed:
-    def test_empty_allows_everything(self):
-        perms = CopilotPermissions(blocks=[], blocks_exclude=True)
-        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_blacklist_blocks_listed(self):
-        perms = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
-        assert not perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_blacklist_allows_unlisted(self):
-        perms = CopilotPermissions(blocks=["Other Block"], blocks_exclude=True)
-        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_whitelist_allows_listed(self):
-        perms = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=False)
-        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_whitelist_blocks_unlisted(self):
-        perms = CopilotPermissions(blocks=["Other Block"], blocks_exclude=False)
-        assert not perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_partial_uuid_blacklist(self):
-        perms = CopilotPermissions(blocks=["c069dc6b"], blocks_exclude=True)
-        assert not perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_full_uuid_whitelist(self):
-        perms = CopilotPermissions(blocks=[BLOCK_ID], blocks_exclude=False)
-        assert perms.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_parent_blocks_when_child_allows(self):
-        parent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
-        child = CopilotPermissions(blocks=[], blocks_exclude=True)
-        child._parent = parent
-        assert not child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_parent_allows_when_child_blocks(self):
-        parent = CopilotPermissions(blocks=[], blocks_exclude=True)
-        child = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
-        child._parent = parent
-        assert not child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_both_must_allow(self):
-        parent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=False)
-        child = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=False)
-        child._parent = parent
-        assert child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-    def test_grandparent_blocks_propagate(self):
-        grandparent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
-        parent = CopilotPermissions(blocks=[], blocks_exclude=True)
-        parent._parent = grandparent
-        child = CopilotPermissions(blocks=[], blocks_exclude=True)
-        child._parent = parent
-        assert not child.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-
-# ---------------------------------------------------------------------------
-# CopilotPermissions.merged_with_parent
-# ---------------------------------------------------------------------------
-
-
-class TestMergedWithParent:
-    def test_tool_intersection(self):
-        all_t = frozenset(["run_block", "web_fetch", "bash_exec"])
-        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
-        child = CopilotPermissions(tools=["web_fetch"], tools_exclude=True)
-        merged = child.merged_with_parent(parent, all_t)
-        effective = merged.effective_allowed_tools(all_t)
-        assert "bash_exec" not in effective
-        assert "web_fetch" not in effective
-        assert "run_block" in effective
-
-    def test_parent_whitelist_narrows_child(self):
-        all_t = frozenset(["run_block", "web_fetch", "bash_exec"])
-        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
-        child = CopilotPermissions(tools=[], tools_exclude=True)  # allow all
-        merged = child.merged_with_parent(parent, all_t)
-        effective = merged.effective_allowed_tools(all_t)
-        assert effective == frozenset(["run_block"])
-
-    def test_child_cannot_expand_parent_whitelist(self):
-        all_t = frozenset(["run_block", "web_fetch", "bash_exec"])
-        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
-        child = CopilotPermissions(
-            tools=["run_block", "bash_exec"], tools_exclude=False
-        )
-        merged = child.merged_with_parent(parent, all_t)
-        effective = merged.effective_allowed_tools(all_t)
-        # bash_exec was not in parent's whitelist → must not appear
-        assert "bash_exec" not in effective
-        assert "run_block" in effective
-
-    def test_merged_stored_as_whitelist(self):
-        all_t = frozenset(["run_block", "web_fetch"])
-        parent = CopilotPermissions(tools=[], tools_exclude=True)
-        child = CopilotPermissions(tools=[], tools_exclude=True)
-        merged = child.merged_with_parent(parent, all_t)
-        assert not merged.tools_exclude  # stored as whitelist
-        assert set(merged.tools) == {"run_block", "web_fetch"}
-
-    def test_block_parent_stored(self):
-        all_t = frozenset(["run_block"])
-        parent = CopilotPermissions(blocks=["HTTP Request"], blocks_exclude=True)
-        child = CopilotPermissions(blocks=[], blocks_exclude=True)
-        merged = child.merged_with_parent(parent, all_t)
-        # Parent restriction is preserved via _parent
-        assert not merged.is_block_allowed(BLOCK_ID, BLOCK_NAME)
-
-
-# ---------------------------------------------------------------------------
-# CopilotPermissions.is_empty
-# ---------------------------------------------------------------------------
-
-
-class TestIsEmpty:
-    def test_default_is_empty(self):
-        assert CopilotPermissions().is_empty()
-
-    def test_with_tools_not_empty(self):
-        assert not CopilotPermissions(tools=["bash_exec"]).is_empty()
-
-    def test_with_blocks_not_empty(self):
-        assert not CopilotPermissions(blocks=["HTTP Request"]).is_empty()
-
-    def test_with_parent_not_empty(self):
-        perms = CopilotPermissions()
-        perms._parent = CopilotPermissions(tools=["bash_exec"])
-        assert not perms.is_empty()
-
-
-# ---------------------------------------------------------------------------
-# validate_tool_names
-# ---------------------------------------------------------------------------
-
-
-class TestValidateToolNames:
-    def test_valid_registry_tool(self):
-        assert validate_tool_names(["run_block", "web_fetch"]) == []
-
-    def test_valid_sdk_builtin(self):
-        assert validate_tool_names(["Read", "Task", "WebSearch"]) == []
-
-    def test_invalid_tool(self):
-        result = validate_tool_names(["nonexistent_tool"])
-        assert "nonexistent_tool" in result
-
-    def test_mixed(self):
-        result = validate_tool_names(["run_block", "fake_tool"])
-        assert "fake_tool" in result
-        assert "run_block" not in result
-
-    def test_empty_list(self):
-        assert validate_tool_names([]) == []
-
-
-# ---------------------------------------------------------------------------
-# validate_block_identifiers (async)
-# ---------------------------------------------------------------------------
-
-
-@pytest.mark.asyncio
-class TestValidateBlockIdentifiers:
-    async def test_empty_list(self):
-        result = await validate_block_identifiers([])
-        assert result == []
-
-    async def test_valid_full_uuid(self, mocker):
-        mock_block = mocker.MagicMock()
-        mock_block.return_value.name = "HTTP Request"
-        mocker.patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
-        )
-        result = await validate_block_identifiers(
-            ["c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"]
-        )
-        assert result == []
-
-    async def test_invalid_identifier(self, mocker):
-        mock_block = mocker.MagicMock()
-        mock_block.return_value.name = "HTTP Request"
-        mocker.patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
-        )
-        result = await validate_block_identifiers(["totally_unknown"])
-        assert "totally_unknown" in result
-
-    async def test_partial_uuid_match(self, mocker):
-        mock_block = mocker.MagicMock()
-        mock_block.return_value.name = "HTTP Request"
-        mocker.patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
-        )
-        result = await validate_block_identifiers(["c069dc6b"])
-        assert result == []
-
-    async def test_name_match(self, mocker):
-        mock_block = mocker.MagicMock()
-        mock_block.return_value.name = "HTTP Request"
-        mocker.patch(
-            "backend.blocks.get_blocks",
-            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block},
-        )
-        result = await validate_block_identifiers(["http request"])
-        assert result == []
-
-
-# ---------------------------------------------------------------------------
-# apply_tool_permissions
-# ---------------------------------------------------------------------------
-
-
-class TestApplyToolPermissions:
-    def test_empty_permissions_returns_base_unchanged(self, mocker):
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=["mcp__copilot__run_block", "mcp__copilot__web_fetch", "Task"],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=["Bash"],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": object(), "web_fetch": object()},
-        )
-        perms = CopilotPermissions()
-        allowed, disallowed = apply_tool_permissions(perms, use_e2b=False)
-        assert "mcp__copilot__run_block" in allowed
-        assert "mcp__copilot__web_fetch" in allowed
-
-    def test_blacklist_removes_tool(self, mocker):
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=[
-                "mcp__copilot__run_block",
-                "mcp__copilot__web_fetch",
-                "mcp__copilot__bash_exec",
-                "Task",
-            ],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=["Bash"],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {
-                "run_block": object(),
-                "web_fetch": object(),
-                "bash_exec": object(),
-            },
-        )
-        mocker.patch(
-            "backend.copilot.permissions.all_known_tool_names",
-            return_value=frozenset(["run_block", "web_fetch", "bash_exec", "Task"]),
-        )
-        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
-        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
-        assert "mcp__copilot__bash_exec" not in allowed
-        assert "mcp__copilot__run_block" in allowed
-
-    def test_whitelist_keeps_only_listed(self, mocker):
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=[
-                "mcp__copilot__run_block",
-                "mcp__copilot__web_fetch",
-                "Task",
-                "WebSearch",
-            ],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=["Bash"],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": object(), "web_fetch": object()},
-        )
-        mocker.patch(
-            "backend.copilot.permissions.all_known_tool_names",
-            return_value=frozenset(["run_block", "web_fetch", "Task", "WebSearch"]),
-        )
-        perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
-        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
-        assert "mcp__copilot__run_block" in allowed
-        assert "mcp__copilot__web_fetch" not in allowed
-        assert "Task" not in allowed
-
-    def test_read_tool_always_included_even_when_blacklisted(self, mocker):
-        """mcp__copilot__Read must stay in allowed even if Read is explicitly blacklisted."""
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=[
-                "mcp__copilot__run_block",
-                "mcp__copilot__Read",
-                "Task",
-            ],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=[],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": object()},
-        )
-        mocker.patch(
-            "backend.copilot.permissions.all_known_tool_names",
-            return_value=frozenset(["run_block", "Read", "Task"]),
-        )
-        # Explicitly blacklist Read
-        perms = CopilotPermissions(tools=["Read"], tools_exclude=True)
-        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
-        assert "mcp__copilot__Read" in allowed  # always preserved for SDK internals
-        assert "mcp__copilot__run_block" in allowed
-        assert "Task" in allowed
-
-    def test_read_tool_always_included_with_narrow_whitelist(self, mocker):
-        """mcp__copilot__Read must stay in allowed even when not in a whitelist."""
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=[
-                "mcp__copilot__run_block",
-                "mcp__copilot__Read",
-                "Task",
-            ],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=[],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": object()},
-        )
-        mocker.patch(
-            "backend.copilot.permissions.all_known_tool_names",
-            return_value=frozenset(["run_block", "Read", "Task"]),
-        )
-        # Whitelist only run_block — Read not listed
-        perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
-        allowed, _ = apply_tool_permissions(perms, use_e2b=False)
-        assert "mcp__copilot__Read" in allowed  # always preserved for SDK internals
-        assert "mcp__copilot__run_block" in allowed
-
-    def test_e2b_file_tools_included_when_sdk_builtin_whitelisted(self, mocker):
-        """In E2B mode, whitelisting 'Read' must include mcp__copilot__read_file."""
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=[
-                "mcp__copilot__run_block",
-                "mcp__copilot__Read",
-                "mcp__copilot__read_file",
-                "mcp__copilot__write_file",
-                "Task",
-            ],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=["Bash", "Read", "Write", "Edit", "Glob", "Grep"],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": object()},
-        )
-        mocker.patch(
-            "backend.copilot.permissions.all_known_tool_names",
-            return_value=frozenset(["run_block", "Read", "Write", "Task"]),
-        )
-        mocker.patch(
-            "backend.copilot.sdk.e2b_file_tools.E2B_FILE_TOOL_NAMES",
-            ["read_file", "write_file", "edit_file", "glob", "grep"],
-        )
-        # Whitelist Read and run_block — E2B read_file should be included
-        perms = CopilotPermissions(tools=["Read", "run_block"], tools_exclude=False)
-        allowed, _ = apply_tool_permissions(perms, use_e2b=True)
-        assert "mcp__copilot__read_file" in allowed
-        assert "mcp__copilot__run_block" in allowed
-        # Write not whitelisted — write_file should NOT be included
-        assert "mcp__copilot__write_file" not in allowed
-
-    def test_e2b_file_tools_excluded_when_sdk_builtin_blacklisted(self, mocker):
-        """In E2B mode, blacklisting 'Read' must also remove mcp__copilot__read_file."""
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_copilot_tool_names",
-            return_value=[
-                "mcp__copilot__run_block",
-                "mcp__copilot__Read",
-                "mcp__copilot__read_file",
-                "Task",
-            ],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.get_sdk_disallowed_tools",
-            return_value=["Bash", "Read", "Write", "Edit", "Glob", "Grep"],
-        )
-        mocker.patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": object()},
-        )
-        mocker.patch(
-            "backend.copilot.permissions.all_known_tool_names",
-            return_value=frozenset(["run_block", "Read", "Task"]),
-        )
-        mocker.patch(
-            "backend.copilot.sdk.e2b_file_tools.E2B_FILE_TOOL_NAMES",
-            ["read_file", "write_file", "edit_file", "glob", "grep"],
-        )
-        # Blacklist Read — E2B read_file should also be removed
-        perms = CopilotPermissions(tools=["Read"], tools_exclude=True)
-        allowed, _ = apply_tool_permissions(perms, use_e2b=True)
-        assert "mcp__copilot__read_file" not in allowed
-        assert "mcp__copilot__run_block" in allowed
-        # mcp__copilot__Read is always preserved for SDK internals
-        assert "mcp__copilot__Read" in allowed
-
-
-# ---------------------------------------------------------------------------
-# SDK_BUILTIN_TOOL_NAMES sanity check
-# ---------------------------------------------------------------------------
-
-
-class TestSdkBuiltinToolNames:
-    def test_expected_builtins_present(self):
-        expected = {
-            "Read",
-            "Write",
-            "Edit",
-            "Glob",
-            "Grep",
-            "Task",
-            "WebSearch",
-            "TodoWrite",
-        }
-        assert expected.issubset(SDK_BUILTIN_TOOL_NAMES)
-
-    def test_platform_names_match_tool_registry(self):
-        """PLATFORM_TOOL_NAMES (derived from ToolName Literal) must match TOOL_REGISTRY keys."""
-        registry_keys = frozenset(TOOL_REGISTRY.keys())
-        assert PLATFORM_TOOL_NAMES == registry_keys, (
-            f"ToolName Literal is out of sync with TOOL_REGISTRY. "
-            f"Missing: {registry_keys - PLATFORM_TOOL_NAMES}, "
-            f"Extra: {PLATFORM_TOOL_NAMES - registry_keys}"
-        )
-
-    def test_all_tool_names_is_union(self):
-        """ALL_TOOL_NAMES must equal PLATFORM_TOOL_NAMES | SDK_BUILTIN_TOOL_NAMES."""
-        assert ALL_TOOL_NAMES == PLATFORM_TOOL_NAMES | SDK_BUILTIN_TOOL_NAMES
-
-    def test_no_overlap_between_platform_and_sdk(self):
-        """Platform and SDK built-in names must not overlap."""
-        assert PLATFORM_TOOL_NAMES.isdisjoint(SDK_BUILTIN_TOOL_NAMES)
-
-    def test_known_tools_includes_registry_and_builtins(self):
-        known = all_known_tool_names()
-        assert "run_block" in known
-        assert "Read" in known
-        assert "Task" in known
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -63,50 +63,6 @@ Example — committing an image file to GitHub:
 }}
 ```

-### Writing large files — CRITICAL
-**Never write an entire large document in a single tool call.**  When the
-content you want to write exceeds ~2000 words the tool call's output token
-limit will silently truncate the arguments, producing an empty `{{}}` input
-that fails repeatedly.
-
-**Preferred: compose from file references.**  If the data is already in
-files (tool outputs, workspace files), compose the report in one call
-using `@@agptfile:` references — the system expands them inline:
-
-```bash
-cat > report.md << 'EOF'
-# Research Report
-## Data from web research
-@@agptfile:/home/user/web_results.txt
-## Block execution output
-@@agptfile:workspace://<file_id>
-## Conclusion
-<brief synthesis>
-EOF
-```
-
-**Fallback: write section-by-section.**  When you must generate content
-from conversation context (no files to reference), split into multiple
-`bash_exec` calls — one section per call:
-
-```bash
-cat > report.md << 'EOF'
-# Section 1
-<content from your earlier tool call results>
-EOF
-```
-```bash
-cat >> report.md << 'EOF'
-# Section 2
-<content from your earlier tool call results>
-EOF
-```
-Use `cat >` for the first chunk and `cat >>` to append subsequent chunks.
-Do not re-fetch or re-generate data you already have from prior tool calls.
-
-After building the file, reference it with `@@agptfile:` in other tools:
-`@@agptfile:/home/user/report.md`
-
 ### Sub-agent tasks
 - When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
--- a/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
+++ b/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
@@ -143,11 +143,11 @@ To use an MCP (Model Context Protocol) tool as a node in the agent:
   tool_arguments.
 6. Output: `result` (the tool's return value) and `error` (error message)

-### Using OrchestratorBlock (AI Orchestrator with Agent Mode)
+### Using SmartDecisionMakerBlock (AI Orchestrator with Agent Mode)

 To create an agent where AI autonomously decides which tools or sub-agents to
 call in a loop until the task is complete:
-1. Create a `OrchestratorBlock` node
+1. Create a `SmartDecisionMakerBlock` node
   (ID: `3b191d9f-356f-482d-8238-ba04b6d18381`)
 2. Set `input_default`:
   - `agent_mode_max_iterations`: Choose based on task complexity:
@@ -169,8 +169,8 @@ call in a loop until the task is complete:
 3. Wire the `prompt` input from an `AgentInputBlock` (the user's task)
 4. Create downstream tool blocks — regular blocks **or** `AgentExecutorBlock`
   nodes that call sub-agents
-5. Link each tool to the Orchestrator: set `source_name: "tools"` on
-   the Orchestrator side and `sink_name: <input_field>` on each tool
+5. Link each tool to the SmartDecisionMaker: set `source_name: "tools"` on
+   the SmartDecisionMaker side and `sink_name: <input_field>` on each tool
   block's input. Create one link per input field the tool needs.
 6. Wire the `finished` output to an `AgentOutputBlock` for the final result
 7. Credentials (LLM API key) are configured by the user in the platform UI
@@ -178,35 +178,35 @@ call in a loop until the task is complete:

 **Example — Orchestrator calling two sub-agents:**
 - Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
-- Node 2: `OrchestratorBlock` (input_default:
+- Node 2: `SmartDecisionMakerBlock` (input_default:
  `{"agent_mode_max_iterations": 10, "conversation_compaction": true}`)
 - Node 3: `AgentExecutorBlock` (sub-agent A — set `graph_id`, `graph_version`,
  `input_schema`, `output_schema` from library agent)
 - Node 4: `AgentExecutorBlock` (sub-agent B — same pattern)
 - Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
 - Links:
-  - Input→Orchestrator: `source_name: "result"`, `sink_name: "prompt"`
-  - Orchestrator→Agent A (per input field): `source_name: "tools"`,
+  - Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
+  - SDM→Agent A (per input field): `source_name: "tools"`,
    `sink_name: "<agent_a_input_field>"`
-  - Orchestrator→Agent B (per input field): `source_name: "tools"`,
+  - SDM→Agent B (per input field): `source_name: "tools"`,
    `sink_name: "<agent_b_input_field>"`
-  - Orchestrator→Output: `source_name: "finished"`, `sink_name: "value"`
+  - SDM→Output: `source_name: "finished"`, `sink_name: "value"`

 **Example — Orchestrator calling regular blocks as tools:**
 - Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
-- Node 2: `OrchestratorBlock` (input_default:
+- Node 2: `SmartDecisionMakerBlock` (input_default:
  `{"agent_mode_max_iterations": 5, "conversation_compaction": true}`)
 - Node 3: `GetWebpageBlock` (regular block — the AI calls it as a tool)
 - Node 4: `AITextGeneratorBlock` (another regular block as a tool)
 - Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
 - Links:
-  - Input→Orchestrator: `source_name: "result"`, `sink_name: "prompt"`
-  - Orchestrator→GetWebpage: `source_name: "tools"`, `sink_name: "url"`
-  - Orchestrator→AITextGenerator: `source_name: "tools"`, `sink_name: "prompt"`
-  - Orchestrator→Output: `source_name: "finished"`, `sink_name: "value"`
+  - Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
+  - SDM→GetWebpage: `source_name: "tools"`, `sink_name: "url"`
+  - SDM→AITextGenerator: `source_name: "tools"`, `sink_name: "prompt"`
+  - SDM→Output: `source_name: "finished"`, `sink_name: "value"`

 Regular blocks work exactly like sub-agents as tools — wire each input
-field from `source_name: "tools"` on the Orchestrator side.
+field from `source_name: "tools"` on the SmartDecisionMaker side.

 ### Example: Simple AI Text Processor

--- a/autogpt_platform/backend/backend/copilot/sdk/collect.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/collect.py
@@ -7,20 +7,7 @@ without implementing their own event loop.

 from __future__ import annotations

-from typing import TYPE_CHECKING, Any
-
-from backend.copilot.response_model import (
-    StreamError,
-    StreamTextDelta,
-    StreamToolInputAvailable,
-    StreamToolOutputAvailable,
-    StreamUsage,
-)
-
-from .service import stream_chat_completion_sdk
-
-if TYPE_CHECKING:
-    from backend.copilot.permissions import CopilotPermissions
+from typing import Any


 class CopilotResult:
@@ -52,7 +39,6 @@ async def collect_copilot_response(
    message: str,
    user_id: str,
    is_user_message: bool = True,
-    permissions: "CopilotPermissions | None" = None,
 ) -> CopilotResult:
    """Consume :func:`stream_chat_completion_sdk` and return aggregated results.

@@ -67,8 +53,6 @@ async def collect_copilot_response(
        message: The user message / prompt.
        user_id: Authenticated user ID.
        is_user_message: Whether this is a user-initiated message.
-        permissions: Optional capability filter.  When provided, restricts
-            which tools and blocks the copilot may use during this execution.

    Returns:
        A :class:`CopilotResult` with the aggregated response text,
@@ -77,6 +61,16 @@ async def collect_copilot_response(
    Raises:
        RuntimeError: If the stream yields a ``StreamError`` event.
    """
+    from backend.copilot.response_model import (
+        StreamError,
+        StreamTextDelta,
+        StreamToolInputAvailable,
+        StreamToolOutputAvailable,
+        StreamUsage,
+    )
+
+    from .service import stream_chat_completion_sdk
+
    result = CopilotResult()
    response_parts: list[str] = []
    tool_calls_by_id: dict[str, dict[str, Any]] = {}
@@ -86,7 +80,6 @@ async def collect_copilot_response(
        message=message,
        is_user_message=is_user_message,
        user_id=user_id,
-        permissions=permissions,
    ):
        if isinstance(event, StreamTextDelta):
            response_parts.append(event.delta)
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -12,10 +12,7 @@ import time
 import uuid
 from collections.abc import AsyncGenerator, AsyncIterator
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, NamedTuple, cast
-
-if TYPE_CHECKING:
-    from backend.copilot.permissions import CopilotPermissions
+from typing import Any, NamedTuple, cast

 from claude_agent_sdk import (
    AssistantMessage,
@@ -32,7 +29,6 @@ from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
 from pydantic import BaseModel

 from backend.copilot.context import get_workspace_manager
-from backend.copilot.permissions import apply_tool_permissions
 from backend.data.redis_client import get_redis_async
 from backend.executor.cluster_lock import AsyncClusterLock
 from backend.util.exceptions import NotFoundError
@@ -81,13 +77,9 @@ from .response_adapter import SDKResponseAdapter
 from .security_hooks import create_security_hooks
 from .subscription import validate_subscription as _validate_claude_code_subscription
 from .tool_adapter import (
-    cancel_pending_tool_tasks,
    create_copilot_mcp_server,
    get_copilot_tool_names,
    get_sdk_disallowed_tools,
-    pre_launch_tool_call,
-    reset_stash_event,
-    reset_tool_failure_counters,
    set_execution_context,
    wait_for_stash,
 )
@@ -113,20 +105,6 @@ config = ChatConfig()
 # Non-context errors (network, auth, rate-limit) are NOT retried.
 _MAX_STREAM_ATTEMPTS = 3

-# Hard circuit breaker: abort the stream if the model sends this many
-# consecutive tool calls with empty parameters (a sign of context
-# saturation or serialization failure).  Empty input ({}) is never
-# legitimate — even one is suspicious, three is conclusive.
-_EMPTY_TOOL_CALL_LIMIT = 3
-
-# User-facing error shown when the empty-tool-call circuit breaker trips.
-_CIRCUIT_BREAKER_ERROR_MSG = (
-    "AutoPilot was unable to complete the tool call "
-    "— this usually happens when the response is "
-    "too large to fit in a single tool call. "
-    "Try breaking your request into smaller parts."
-)
-
 # Patterns that indicate the prompt/request exceeds the model's context limit.
 # Matched case-insensitively against the full exception chain.
 _PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -185,19 +163,6 @@ def _is_prompt_too_long(err: BaseException) -> bool:
    return False


-def _is_tool_only_message(sdk_msg: object) -> bool:
-    """Return True if *sdk_msg* is an AssistantMessage containing only ToolUseBlocks.
-
-    Such a message represents a parallel tool-call batch (no text output yet).
-    The ``bool(…content)`` guard prevents vacuous-truth evaluation on an empty list.
-    """
-    return (
-        isinstance(sdk_msg, AssistantMessage)
-        and bool(sdk_msg.content)
-        and all(isinstance(b, ToolUseBlock) for b in sdk_msg.content)
-    )
-
-
 class ReducedContext(NamedTuple):
    builder: TranscriptBuilder
    use_resume: bool
@@ -1031,122 +996,15 @@ def _dispatch_response(
    return response


-class _HandledStreamError(Exception):
+class _TransientErrorHandled(Exception):
    """Raised by `_run_stream_attempt` after it has already yielded a
-    `StreamError` to the client (e.g. transient API error, circuit breaker).
+    `StreamError` for a transient API error.

    This signals the outer retry loop that the attempt failed so it can
    perform session-message rollback and set the `ended_with_stream_error`
    flag, **without** yielding a duplicate `StreamError` to the client.
-
-    Attributes:
-        error_msg: The user-facing error message to persist.
-        code: Machine-readable error code (e.g. ``circuit_breaker_empty_tool_calls``).
-        retryable: Whether the frontend should offer a retry button.
    """

-    def __init__(
-        self,
-        message: str,
-        error_msg: str | None = None,
-        code: str | None = None,
-        retryable: bool = True,
-    ):
-        super().__init__(message)
-        self.error_msg = error_msg
-        self.code = code
-        self.retryable = retryable
-
-
-@dataclass
-class _EmptyToolBreakResult:
-    """Result of checking for empty tool calls in a single AssistantMessage."""
-
-    count: int  # Updated consecutive counter
-    tripped: bool  # Whether the circuit breaker fired
-    error: StreamError | None  # StreamError to yield (if tripped)
-    error_msg: str | None  # Error message (if tripped)
-    error_code: str | None  # Error code (if tripped)
-
-
-def _check_empty_tool_breaker(
-    sdk_msg: object,
-    consecutive: int,
-    ctx: _StreamContext,
-    state: _RetryState,
-) -> _EmptyToolBreakResult:
-    """Detect consecutive empty tool calls and trip the circuit breaker.
-
-    Returns an ``_EmptyToolBreakResult`` with the updated counter and, if the
-    breaker tripped, the ``StreamError`` to yield plus the error metadata.
-    """
-    if not isinstance(sdk_msg, AssistantMessage):
-        return _EmptyToolBreakResult(consecutive, False, None, None, None)
-
-    empty_tools = [
-        b.name for b in sdk_msg.content if isinstance(b, ToolUseBlock) and not b.input
-    ]
-    if not empty_tools:
-        # Reset on any non-empty-tool AssistantMessage (including text-only
-        # messages — any() over empty content is False).
-        return _EmptyToolBreakResult(0, False, None, None, None)
-
-    consecutive += 1
-
-    # Log full diagnostics on first occurrence only; subsequent hits just
-    # log the counter to reduce noise.
-    if consecutive == 1:
-        logger.warning(
-            "%s Empty tool call detected (%d/%d): "
-            "tools=%s, model=%s, error=%s, "
-            "block_types=%s, cumulative_usage=%s",
-            ctx.log_prefix,
-            consecutive,
-            _EMPTY_TOOL_CALL_LIMIT,
-            empty_tools,
-            sdk_msg.model,
-            sdk_msg.error,
-            [type(b).__name__ for b in sdk_msg.content],
-            {
-                "prompt": state.usage.prompt_tokens,
-                "completion": state.usage.completion_tokens,
-                "cache_read": state.usage.cache_read_tokens,
-            },
-        )
-    else:
-        logger.warning(
-            "%s Empty tool call detected (%d/%d): tools=%s",
-            ctx.log_prefix,
-            consecutive,
-            _EMPTY_TOOL_CALL_LIMIT,
-            empty_tools,
-        )
-
-    if consecutive < _EMPTY_TOOL_CALL_LIMIT:
-        return _EmptyToolBreakResult(consecutive, False, None, None, None)
-
-    logger.error(
-        "%s Circuit breaker: aborting stream after %d "
-        "consecutive empty tool calls. "
-        "This is likely caused by the model attempting "
-        "to write content too large for a single tool "
-        "call's output token limit. The model should "
-        "write large files in chunks using bash_exec "
-        "with cat >> (append).",
-        ctx.log_prefix,
-        consecutive,
-    )
-    error_msg = _CIRCUIT_BREAKER_ERROR_MSG
-    error_code = "circuit_breaker_empty_tool_calls"
-    _append_error_marker(ctx.session, error_msg, retryable=True)
-    return _EmptyToolBreakResult(
-        count=consecutive,
-        tripped=True,
-        error=StreamError(errorText=error_msg, code=error_code),
-        error_msg=error_msg,
-        error_code=error_code,
-    )
-

 async def _run_stream_attempt(
    ctx: _StreamContext,
@@ -1181,12 +1039,6 @@ async def _run_stream_attempt(
        accumulated_tool_calls=[],
    )
    ended_with_stream_error = False
-    # Stores the error message used by _append_error_marker so the outer
-    # retry loop can re-append the correct message after session rollback.
-    stream_error_msg: str | None = None
-    stream_error_code: str | None = None
-
-    consecutive_empty_tool_calls = 0

    async with ClaudeSDKClient(options=state.options) as client:
        logger.info(
@@ -1277,43 +1129,18 @@ async def _run_stream_attempt(
                        "suppressing raw error text",
                        ctx.log_prefix,
                    )
-                    stream_error_msg = FRIENDLY_TRANSIENT_MSG
-                    stream_error_code = "transient_api_error"
                    _append_error_marker(
                        ctx.session,
-                        stream_error_msg,
+                        FRIENDLY_TRANSIENT_MSG,
                        retryable=True,
                    )
                    yield StreamError(
-                        errorText=stream_error_msg,
-                        code=stream_error_code,
+                        errorText=FRIENDLY_TRANSIENT_MSG,
+                        code="transient_api_error",
                    )
                    ended_with_stream_error = True
                    break

-            # Parallel tool execution: pre-launch every ToolUseBlock as an
-            # asyncio.Task the moment its AssistantMessage arrives.  The SDK
-            # sends one AssistantMessage per tool call when issuing parallel
-            # calls, so each message is pre-launched independently.  The MCP
-            # handlers will await the already-running task instead of executing
-            # fresh, making all concurrent tool calls run in parallel.
-            #
-            # Also determine if the message is a tool-only batch (all content
-            # items are ToolUseBlocks) — such messages have no text output yet,
-            # so we skip the wait_for_stash flush below.
-            is_tool_only = False
-            if isinstance(sdk_msg, AssistantMessage) and sdk_msg.content:
-                is_tool_only = True
-                # NOTE: Pre-launches are sequential (each await completes
-                # file-ref expansion before the next starts).  This is fine
-                # since expansion is typically sub-ms; a future optimisation
-                # could gather all pre-launches concurrently.
-                for tool_use in sdk_msg.content:
-                    if isinstance(tool_use, ToolUseBlock):
-                        await pre_launch_tool_call(tool_use.name, tool_use.input)
-                    else:
-                        is_tool_only = False
-
            # Race-condition fix: SDK hooks (PostToolUse) are
            # executed asynchronously via start_soon() — the next
            # message can arrive before the hook stashes output.
@@ -1327,12 +1154,15 @@ async def _run_stream_attempt(
            # AssistantMessages (each containing only
            # ToolUseBlocks), we must NOT wait/flush — the prior
            # tools are still executing concurrently.
+            is_parallel_continuation = isinstance(sdk_msg, AssistantMessage) and all(
+                isinstance(b, ToolUseBlock) for b in sdk_msg.content
+            )
            if (
                state.adapter.has_unresolved_tool_calls
                and isinstance(sdk_msg, (AssistantMessage, ResultMessage))
-                and not is_tool_only
+                and not is_parallel_continuation
            ):
-                if await wait_for_stash():
+                if await wait_for_stash(timeout=0.5):
                    await asyncio.sleep(0)
                else:
                    logger.warning(
@@ -1347,17 +1177,13 @@ async def _run_stream_attempt(
            if isinstance(sdk_msg, ResultMessage):
                logger.info(
                    "%s Received: ResultMessage %s "
-                    "(unresolved=%d, current=%d, resolved=%d, "
-                    "num_turns=%d, cost_usd=%s, result=%s)",
+                    "(unresolved=%d, current=%d, resolved=%d)",
                    ctx.log_prefix,
                    sdk_msg.subtype,
                    len(state.adapter.current_tool_calls)
                    - len(state.adapter.resolved_tool_calls),
                    len(state.adapter.current_tool_calls),
                    len(state.adapter.resolved_tool_calls),
-                    sdk_msg.num_turns,
-                    sdk_msg.total_cost_usd,
-                    (sdk_msg.result or "")[:200],
                )
                if sdk_msg.subtype in (
                    "error",
@@ -1414,18 +1240,6 @@ async def _run_stream_attempt(
                    )
                    entries_replaced = True

-            # --- Hard circuit breaker for empty tool calls ---
-            breaker = _check_empty_tool_breaker(
-                sdk_msg, consecutive_empty_tool_calls, ctx, state
-            )
-            consecutive_empty_tool_calls = breaker.count
-            if breaker.tripped and breaker.error is not None:
-                stream_error_msg = breaker.error_msg
-                stream_error_code = breaker.error_code
-                yield breaker.error
-                ended_with_stream_error = True
-                break
-
            # --- Dispatch adapter responses ---
            for response in state.adapter.convert_message(sdk_msg):
                dispatched = _dispatch_response(
@@ -1506,10 +1320,8 @@ async def _run_stream_attempt(
    # to the client (StreamError yielded above), raise so the outer retry
    # loop can rollback session messages and set its error flags properly.
    if ended_with_stream_error:
-        raise _HandledStreamError(
-            "Stream error handled — StreamError already yielded",
-            error_msg=stream_error_msg,
-            code=stream_error_code,
+        raise _TransientErrorHandled(
+            "Transient API error handled — StreamError already yielded"
        )


@@ -1520,7 +1332,6 @@ async def stream_chat_completion_sdk(
    user_id: str | None = None,
    session: ChatSession | None = None,
    file_ids: list[str] | None = None,
-    permissions: "CopilotPermissions | None" = None,
    **_kwargs: Any,
 ) -> AsyncIterator[StreamBaseResponse]:
    """Stream chat completion using Claude Agent SDK.
@@ -1766,13 +1577,7 @@ async def stream_chat_completion_sdk(

        yield StreamStart(messageId=message_id, sessionId=session_id)

-        set_execution_context(
-            user_id,
-            session,
-            sandbox=e2b_sandbox,
-            sdk_cwd=sdk_cwd,
-            permissions=permissions,
-        )
+        set_execution_context(user_id, session, sandbox=e2b_sandbox, sdk_cwd=sdk_cwd)

        # Fail fast when no API credentials are available at all.
        sdk_env = _build_sdk_env(session_id=session_id, user_id=user_id)
@@ -1798,11 +1603,8 @@ async def stream_chat_completion_sdk(
            on_compact=compaction.on_compact,
        )

-        if permissions is not None:
-            allowed, disallowed = apply_tool_permissions(permissions, use_e2b=use_e2b)
-        else:
-            allowed = get_copilot_tool_names(use_e2b=use_e2b)
-            disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)
+        allowed = get_copilot_tool_names(use_e2b=use_e2b)
+        disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)

        def _on_stderr(line: str) -> None:
            """Log a stderr line emitted by the Claude CLI subprocess."""
@@ -1912,12 +1714,6 @@ async def stream_chat_completion_sdk(
        )

        for attempt in range(_MAX_STREAM_ATTEMPTS):
-            # Clear any stale stash signal from the previous attempt so
-            # wait_for_stash() doesn't fire prematurely on a leftover event.
-            reset_stash_event()
-            # Reset tool-level circuit breaker so failures from a previous
-            # (rolled-back) attempt don't carry over to the fresh attempt.
-            reset_tool_failure_counters()
            if attempt > 0:
                logger.info(
                    "%s Retrying with reduced context (%d/%d)",
@@ -1973,10 +1769,6 @@ async def stream_chat_completion_sdk(
                    if not isinstance(event, StreamHeartbeat):
                        events_yielded += 1
                    yield event
-                # Cancel any pre-launched tasks that were never dispatched
-                # by the SDK (e.g. edge-case SDK behaviour changes). Symmetric
-                # with the three error-path await cancel_pending_tool_tasks() calls.
-                await cancel_pending_tool_tasks()
                break  # Stream completed — exit retry loop
            except asyncio.CancelledError:
                logger.warning(
@@ -1985,42 +1777,26 @@ async def stream_chat_completion_sdk(
                    attempt + 1,
                    _MAX_STREAM_ATTEMPTS,
                )
-                # Cancel any pre-launched tasks so they don't continue executing
-                # against a rolled-back or abandoned session.
-                await cancel_pending_tool_tasks()
                raise
-            except _HandledStreamError as exc:
+            except _TransientErrorHandled:
                # _run_stream_attempt already yielded a StreamError and
                # appended an error marker.  We only need to rollback
                # session messages and set the error flag — do NOT set
                # stream_err so the post-loop code won't emit a
                # duplicate StreamError.
                logger.warning(
-                    "%s Stream error handled in attempt "
-                    "(attempt %d/%d, code=%s, events_yielded=%d)",
+                    "%s Transient error handled in stream attempt "
+                    "(attempt %d/%d, events_yielded=%d)",
                    log_prefix,
                    attempt + 1,
                    _MAX_STREAM_ATTEMPTS,
-                    exc.code or "transient",
                    events_yielded,
                )
                session.messages = session.messages[:pre_attempt_msg_count]
-                # transcript_builder still contains entries from the aborted
-                # attempt that no longer match session.messages.  Skip upload
-                # so a future --resume doesn't replay rolled-back content.
-                skip_transcript_upload = True
                # Re-append the error marker so it survives the rollback
                # and is persisted by the finally block (see #2947655365).
-                # Use the specific error message from the attempt (e.g.
-                # circuit breaker msg) rather than always the generic one.
-                _append_error_marker(
-                    session,
-                    exc.error_msg or FRIENDLY_TRANSIENT_MSG,
-                    retryable=True,
-                )
+                _append_error_marker(session, FRIENDLY_TRANSIENT_MSG, retryable=True)
                ended_with_stream_error = True
-                # Cancel any pre-launched tasks from the failed attempt.
-                await cancel_pending_tool_tasks()
                break
            except Exception as e:
                stream_err = e
@@ -2037,9 +1813,6 @@ async def stream_chat_completion_sdk(
                    exc_info=True,
                )
                session.messages = session.messages[:pre_attempt_msg_count]
-                # Cancel any pre-launched tasks from the failed attempt so they
-                # don't continue executing against the rolled-back session.
-                await cancel_pending_tool_tasks()
                if events_yielded > 0:
                    # Events were already sent to the frontend and cannot be
                    # unsent.  Retrying would produce duplicate/inconsistent
@@ -2049,13 +1822,11 @@ async def stream_chat_completion_sdk(
                        log_prefix,
                        events_yielded,
                    )
-                    skip_transcript_upload = True
                    ended_with_stream_error = True
                    break
                if not is_context_error:
                    # Non-context errors (network, auth, rate-limit) should
                    # not trigger compaction — surface the error immediately.
-                    skip_transcript_upload = True
                    ended_with_stream_error = True
                    break
                continue
--- a/autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py
@@ -1,23 +1,21 @@
 """Unit tests for extracted service helpers.

 Covers ``_is_prompt_too_long``, ``_reduce_context``, ``_iter_sdk_messages``,
-``ReducedContext``, and the ``is_parallel_continuation`` logic.
+and the ``ReducedContext`` named tuple.
 """

 from __future__ import annotations

 import asyncio
 from collections.abc import AsyncGenerator
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import AsyncMock, patch

 import pytest
-from claude_agent_sdk import AssistantMessage, TextBlock, ToolUseBlock

 from .conftest import build_test_transcript as _build_transcript
 from .service import (
    ReducedContext,
    _is_prompt_too_long,
-    _is_tool_only_message,
    _iter_sdk_messages,
    _reduce_context,
 )
@@ -283,55 +281,3 @@ class TestIterSdkMessages:
        first = await gen.__anext__()
        assert first == "first"
        await gen.aclose()  # should cancel pending task cleanly
-
-
-# ---------------------------------------------------------------------------
-# is_parallel_continuation logic
-# ---------------------------------------------------------------------------
-
-
-class TestIsParallelContinuation:
-    """Unit tests for the is_parallel_continuation expression in the streaming loop.
-
-    Verifies the vacuous-truth guard (empty content must return False) and the
-    boundary cases for mixed TextBlock+ToolUseBlock messages.
-    """
-
-    def _make_tool_block(self) -> MagicMock:
-        block = MagicMock(spec=ToolUseBlock)
-        return block
-
-    def test_all_tool_use_blocks_is_parallel(self):
-        """AssistantMessage with only ToolUseBlocks is a parallel continuation."""
-        msg = MagicMock(spec=AssistantMessage)
-        msg.content = [self._make_tool_block(), self._make_tool_block()]
-        assert _is_tool_only_message(msg) is True
-
-    def test_empty_content_is_not_parallel(self):
-        """AssistantMessage with empty content must NOT be treated as parallel.
-
-        Without the bool(sdk_msg.content) guard, all() on an empty iterable
-        returns True via vacuous truth — this test ensures the guard is present.
-        """
-        msg = MagicMock(spec=AssistantMessage)
-        msg.content = []
-        assert _is_tool_only_message(msg) is False
-
-    def test_mixed_text_and_tool_blocks_not_parallel(self):
-        """AssistantMessage with text + tool blocks is NOT a parallel continuation."""
-        msg = MagicMock(spec=AssistantMessage)
-        text_block = MagicMock(spec=TextBlock)
-        msg.content = [text_block, self._make_tool_block()]
-        assert _is_tool_only_message(msg) is False
-
-    def test_non_assistant_message_not_parallel(self):
-        """Non-AssistantMessage types are never parallel continuations."""
-        assert _is_tool_only_message("not a message") is False
-        assert _is_tool_only_message(None) is False
-        assert _is_tool_only_message(42) is False
-
-    def test_single_tool_block_is_parallel(self):
-        """Single ToolUseBlock AssistantMessage is a parallel continuation."""
-        msg = MagicMock(spec=AssistantMessage)
-        msg.content = [self._make_tool_block()]
-        assert _is_tool_only_message(msg) is True
--- a/autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
@@ -1,96 +0,0 @@
-"""Tests for the tool call circuit breaker in tool_adapter.py."""
-
-import pytest
-
-from backend.copilot.sdk.tool_adapter import (
-    _MAX_CONSECUTIVE_TOOL_FAILURES,
-    _check_circuit_breaker,
-    _clear_tool_failures,
-    _consecutive_tool_failures,
-    _record_tool_failure,
-)
-
-
-@pytest.fixture(autouse=True)
-def _reset_tracker():
-    """Reset the circuit breaker tracker for each test."""
-    token = _consecutive_tool_failures.set({})
-    yield
-    _consecutive_tool_failures.reset(token)
-
-
-class TestCircuitBreaker:
-    def test_no_trip_below_threshold(self):
-        """Circuit breaker should not trip before reaching the limit."""
-        args = {"file_path": "/tmp/test.txt"}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES - 1):
-            assert _check_circuit_breaker("write_file", args) is None
-            _record_tool_failure("write_file", args)
-        # Still under the limit
-        assert _check_circuit_breaker("write_file", args) is None
-
-    def test_trips_at_threshold(self):
-        """Circuit breaker should trip after reaching the failure limit."""
-        args = {"file_path": "/tmp/test.txt"}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
-            assert _check_circuit_breaker("write_file", args) is None
-            _record_tool_failure("write_file", args)
-        # Now it should trip
-        result = _check_circuit_breaker("write_file", args)
-        assert result is not None
-        assert "STOP" in result
-        assert "write_file" in result
-
-    def test_different_args_tracked_separately(self):
-        """Different args should have separate failure counters."""
-        args_a = {"file_path": "/tmp/a.txt"}
-        args_b = {"file_path": "/tmp/b.txt"}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
-            _record_tool_failure("write_file", args_a)
-        # args_a should trip
-        assert _check_circuit_breaker("write_file", args_a) is not None
-        # args_b should NOT trip
-        assert _check_circuit_breaker("write_file", args_b) is None
-
-    def test_different_tools_tracked_separately(self):
-        """Different tools should have separate failure counters."""
-        args = {"file_path": "/tmp/test.txt"}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
-            _record_tool_failure("tool_a", args)
-        # tool_a should trip
-        assert _check_circuit_breaker("tool_a", args) is not None
-        # tool_b with same args should NOT trip
-        assert _check_circuit_breaker("tool_b", args) is None
-
-    def test_empty_args_tracked(self):
-        """Empty args ({}) — the exact failure pattern from the bug — should be tracked."""
-        args = {}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
-            _record_tool_failure("write_file", args)
-        assert _check_circuit_breaker("write_file", args) is not None
-
-    def test_clear_resets_counter(self):
-        """Clearing failures should reset the counter."""
-        args = {}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
-            _record_tool_failure("write_file", args)
-        _clear_tool_failures("write_file")
-        assert _check_circuit_breaker("write_file", args) is None
-
-    def test_success_clears_failures(self):
-        """A successful call should reset the failure counter."""
-        args = {}
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES - 1):
-            _record_tool_failure("write_file", args)
-        # Success clears failures
-        _clear_tool_failures("write_file")
-        # Should be able to fail again without tripping
-        for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES - 1):
-            _record_tool_failure("write_file", args)
-        assert _check_circuit_breaker("write_file", args) is None
-
-    def test_no_tracker_returns_none(self):
-        """If tracker is not initialized, circuit breaker should not trip."""
-        _consecutive_tool_failures.set(None)  # type: ignore[arg-type]
-        _record_tool_failure("write_file", {})  # should not raise
-        assert _check_circuit_breaker("write_file", {}) is None
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
@@ -16,7 +16,6 @@ from typing import TYPE_CHECKING, Any
 from claude_agent_sdk import create_sdk_mcp_server, tool

 from backend.copilot.context import (
-    _current_permissions,
    _current_project_dir,
    _current_sandbox,
    _current_sdk_cwd,
@@ -42,8 +41,6 @@ from .e2b_file_tools import E2B_FILE_TOOL_NAMES, E2B_FILE_TOOLS
 if TYPE_CHECKING:
    from e2b import AsyncSandbox

-    from backend.copilot.permissions import CopilotPermissions
-
 logger = logging.getLogger(__name__)

 # Max MCP response size in chars — keeps tool output under the SDK's 10 MB JSON buffer.
@@ -53,14 +50,6 @@ _MCP_MAX_CHARS = 500_000
 MCP_SERVER_NAME = "copilot"
 MCP_TOOL_PREFIX = f"mcp__{MCP_SERVER_NAME}__"

-# Map from tool_name -> Queue of pre-launched (task, args) pairs.
-# Initialised per-session in set_execution_context() so concurrent sessions
-# never share the same dict.
-_TaskQueueItem = tuple[asyncio.Task[dict[str, Any]], dict[str, Any]]
-_tool_task_queues: ContextVar[dict[str, asyncio.Queue[_TaskQueueItem]] | None] = (
-    ContextVar("_tool_task_queues", default=None)
-)
-
 # Stash for MCP tool outputs before the SDK potentially truncates them.
 # Keyed by tool_name → full output string. Consumed (popped) by the
 # response adapter when it builds StreamToolOutputAvailable.
@@ -77,23 +66,12 @@ _stash_event: ContextVar[asyncio.Event | None] = ContextVar(
    "_stash_event", default=None
 )

-# Circuit breaker: tracks consecutive tool failures to detect infinite retry loops.
-# When a tool is called repeatedly with empty/identical args and keeps failing,
-# this counter is incremented.  After _MAX_CONSECUTIVE_TOOL_FAILURES identical
-# failures the tool handler returns a hard-stop message instead of the raw error.
-_MAX_CONSECUTIVE_TOOL_FAILURES = 3
-_consecutive_tool_failures: ContextVar[dict[str, int]] = ContextVar(
-    "_consecutive_tool_failures",
-    default=None,  # type: ignore[arg-type]
-)
-

 def set_execution_context(
    user_id: str | None,
    session: ChatSession,
    sandbox: "AsyncSandbox | None" = None,
    sdk_cwd: str | None = None,
-    permissions: "CopilotPermissions | None" = None,
 ) -> None:
    """Set the execution context for tool calls.

@@ -105,83 +83,14 @@ def set_execution_context(
        session: Current chat session.
        sandbox: Optional E2B sandbox; when set, bash_exec routes commands there.
        sdk_cwd: SDK working directory; used to scope tool-results reads.
-        permissions: Optional capability filter restricting tools/blocks.
    """
    _current_user_id.set(user_id)
    _current_session.set(session)
    _current_sandbox.set(sandbox)
    _current_sdk_cwd.set(sdk_cwd or "")
    _current_project_dir.set(_encode_cwd_for_cli(sdk_cwd) if sdk_cwd else "")
-    _current_permissions.set(permissions)
    _pending_tool_outputs.set({})
    _stash_event.set(asyncio.Event())
-    _tool_task_queues.set({})
-    _consecutive_tool_failures.set({})
-
-
-def reset_stash_event() -> None:
-    """Clear any stale stash signal left over from a previous stream attempt.
-
-    ``_stash_event`` is set once per session in ``set_execution_context`` and
-    reused across retry attempts.  A PostToolUse hook from a failed attempt may
-    leave the event set; calling this at the start of each retry prevents
-    ``wait_for_stash`` from returning prematurely on a stale signal.
-    """
-    event = _stash_event.get(None)
-    if event is not None:
-        event.clear()
-
-
-async def cancel_pending_tool_tasks() -> None:
-    """Cancel all queued pre-launched tasks for the current execution context.
-
-    Call this when a stream attempt aborts (error, cancellation) to prevent
-    pre-launched tasks from continuing to execute against a rolled-back session.
-    Tasks that are already done are skipped; in-flight tasks are cancelled and
-    awaited so that any cleanup (``finally`` blocks, DB rollbacks) completes
-    before the next retry starts.
-    """
-    queues = _tool_task_queues.get()
-    if not queues:
-        return
-    cancelled_tasks: list[asyncio.Task] = []
-    for tool_name, queue in list(queues.items()):
-        cancelled = 0
-        while not queue.empty():
-            task, _args = queue.get_nowait()
-            if not task.done():
-                task.cancel()
-                cancelled_tasks.append(task)
-                cancelled += 1
-        if cancelled:
-            logger.debug(
-                "Cancelled %d pre-launched task(s) for tool '%s'", cancelled, tool_name
-            )
-    queues.clear()
-    # Await all cancelled tasks so their cleanup (finally blocks, DB rollbacks)
-    # completes before the next retry attempt starts new pre-launches.
-    # Use a timeout to prevent hanging indefinitely if a task's cleanup is stuck.
-    if cancelled_tasks:
-        try:
-            await asyncio.wait_for(
-                asyncio.gather(*cancelled_tasks, return_exceptions=True),
-                timeout=5.0,
-            )
-        except TimeoutError:
-            logger.warning(
-                "Timed out waiting for %d cancelled task(s) to clean up",
-                len(cancelled_tasks),
-            )
-
-
-def reset_tool_failure_counters() -> None:
-    """Reset all tool-level circuit breaker counters.
-
-    Called at the start of each SDK retry attempt so that failure counts
-    from a previous (rolled-back) attempt do not carry over and prematurely
-    trip the breaker on a fresh attempt with different context.
-    """
-    _consecutive_tool_failures.set({})


 def pop_pending_tool_output(tool_name: str) -> str | None:
@@ -246,13 +155,12 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
    by waiting on the ``_stash_event``, which is signaled by
    :func:`stash_pending_tool_output`.

-    Uses ``asyncio.Event.wait()`` so it returns the instant the hook signals —
-    the timeout is purely a safety net for the case where the hook never fires.
-    Returns ``True`` if the stash signal was received, ``False`` on timeout.
+    Returns ``True`` if a stash signal was received, ``False`` on timeout.

-    The 2.0 s default was chosen to accommodate slower tool startup in cloud
-    sandboxes while still failing fast when the hook genuinely will not fire.
-    With the parallel pre-launch path, hooks typically fire well under 1 ms.
+    The 2.0 s default was chosen based on production metrics: the original
+    0.5 s caused frequent timeouts under load (parallel tool calls, large
+    outputs).  2.0 s gives a comfortable margin while still failing fast
+    when the hook genuinely will not fire.
    """
    event = _stash_event.get(None)
    if event is None:
@@ -261,7 +169,7 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
    if event.is_set():
        event.clear()
        return True
-    # Slow path: block until the hook signals or the safety timeout expires.
+    # Slow path: wait for the hook to signal.
    try:
        async with asyncio.timeout(timeout):
            await event.wait()
@@ -271,82 +179,6 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
        return False


-async def pre_launch_tool_call(tool_name: str, args: dict[str, Any]) -> None:
-    """Pre-launch a tool as a background task so parallel calls run concurrently.
-
-    Called when an AssistantMessage with ToolUseBlocks is received, before the
-    SDK dispatches the MCP tool/call requests. The tool_handler will await the
-    pre-launched task instead of executing fresh.
-
-    The tool_name may include an MCP prefix (e.g. ``mcp__copilot__run_block``);
-    the prefix is stripped automatically before looking up the tool.
-
-    Ordering guarantee: the Claude Agent SDK dispatches MCP ``tools/call`` requests
-    in the same order as the ToolUseBlocks appear in the AssistantMessage.
-    Pre-launched tasks are queued FIFO per tool name, so the N-th handler for a
-    given tool name dequeues the N-th pre-launched task — result and args always
-    correspond when the SDK preserves order (which it does in the current SDK).
-    """
-    queues = _tool_task_queues.get()
-    if queues is None:
-        return
-
-    # Strip the MCP server prefix (e.g. "mcp__copilot__") to get the bare tool name.
-    # Use removeprefix so tool names that themselves contain "__" are handled correctly.
-    bare_name = tool_name.removeprefix(MCP_TOOL_PREFIX)
-
-    base_tool = TOOL_REGISTRY.get(bare_name)
-    if base_tool is None:
-        return
-
-    user_id, session = get_execution_context()
-    if session is None:
-        return
-
-    # Expand @@agptfile: references before launching the task.
-    # The _truncating wrapper (which normally handles expansion) runs AFTER
-    # pre_launch_tool_call — the pre-launched task would otherwise receive raw
-    # @@agptfile: tokens and fail to resolve them inside _execute_tool_sync.
-    # Use _build_input_schema (same path as _truncating) for schema-aware expansion.
-    input_schema: dict[str, Any] | None
-    try:
-        input_schema = _build_input_schema(base_tool)
-    except Exception:
-        input_schema = None  # schema unavailable — skip schema-aware expansion
-    try:
-        args = await expand_file_refs_in_args(
-            args, user_id, session, input_schema=input_schema
-        )
-    except FileRefExpansionError as exc:
-        logger.warning(
-            "pre_launch_tool_call: @@agptfile expansion failed for %s: %s — skipping pre-launch",
-            bare_name,
-            exc,
-        )
-        return
-
-    task = asyncio.create_task(_execute_tool_sync(base_tool, user_id, session, args))
-    # Log unhandled exceptions so "Task exception was never retrieved" warnings
-    # do not pollute stderr when a task is pre-launched but never dequeued.
-    task.add_done_callback(
-        lambda t, name=bare_name: (
-            logger.warning(
-                "Pre-launched task for %s raised unhandled: %s",
-                name,
-                t.exception(),
-            )
-            if not t.cancelled() and t.exception()
-            else None
-        )
-    )
-
-    if bare_name not in queues:
-        queues[bare_name] = asyncio.Queue[_TaskQueueItem]()
-    # Store (task, args) so the handler can log a warning if the SDK dispatches
-    # calls in a different order than the ToolUseBlocks appeared in the message.
-    queues[bare_name].put_nowait((task, args))
-
-
 async def _execute_tool_sync(
    base_tool: BaseTool,
    user_id: str | None,
@@ -355,10 +187,8 @@ async def _execute_tool_sync(
 ) -> dict[str, Any]:
    """Execute a tool synchronously and return MCP-formatted response.

-    Note: ``@@agptfile:`` expansion should be performed by the caller before
-    invoking this function.  For the normal (non-parallel) path it is handled
-    by the ``_truncating`` wrapper; for the pre-launched parallel path it is
-    handled in :func:`pre_launch_tool_call` before the task is created.
+    Note: ``@@agptfile:`` expansion is handled upstream in the ``_truncating`` wrapper
+    so all registered handlers (BaseTool, E2B, Read) expand uniformly.
    """
    effective_id = f"sdk-{uuid.uuid4().hex[:12]}"
    result = await base_tool.execute(
@@ -387,66 +217,6 @@ def _mcp_error(message: str) -> dict[str, Any]:
    }


-def _failure_key(tool_name: str, args: dict[str, Any]) -> str:
-    """Compute a stable fingerprint for (tool_name, args) used by the circuit breaker."""
-    args_key = json.dumps(args, sort_keys=True, default=str)
-    return f"{tool_name}:{args_key}"
-
-
-def _check_circuit_breaker(tool_name: str, args: dict[str, Any]) -> str | None:
-    """Check if a tool has hit the consecutive failure limit.
-
-    Tracks failures keyed by (tool_name, args_fingerprint). Returns an error
-    message if the circuit breaker has tripped, or None if the call should proceed.
-    """
-    tracker = _consecutive_tool_failures.get(None)
-    if tracker is None:
-        return None
-
-    key = _failure_key(tool_name, args)
-    count = tracker.get(key, 0)
-    if count >= _MAX_CONSECUTIVE_TOOL_FAILURES:
-        logger.warning(
-            "Circuit breaker tripped for tool %s after %d consecutive "
-            "identical failures (args=%s)",
-            tool_name,
-            count,
-            key[len(tool_name) + 1 :][:200],
-        )
-        return (
-            f"STOP: Tool '{tool_name}' has failed {count} consecutive times with "
-            f"the same arguments. Do NOT retry this tool call. "
-            f"If you were trying to write content to a file, instead respond with "
-            f"the content directly as a text message to the user."
-        )
-    return None
-
-
-def _record_tool_failure(tool_name: str, args: dict[str, Any]) -> None:
-    """Record a tool failure for circuit breaker tracking."""
-    tracker = _consecutive_tool_failures.get(None)
-    if tracker is None:
-        return
-    key = _failure_key(tool_name, args)
-    tracker[key] = tracker.get(key, 0) + 1
-
-
-def _clear_tool_failures(tool_name: str) -> None:
-    """Clear failure tracking for a tool on success.
-
-    Clears ALL args variants for the tool, not just the successful call's args.
-    This gives the tool a "fresh start" on any success, which is appropriate for
-    the primary use case (detecting infinite loops with identical failing args).
-    """
-    tracker = _consecutive_tool_failures.get(None)
-    if tracker is None:
-        return
-    # Clear all entries for this tool name
-    keys_to_remove = [k for k in tracker if k.startswith(f"{tool_name}:")]
-    for k in keys_to_remove:
-        del tracker[k]
-
-
 def create_tool_handler(base_tool: BaseTool):
    """Create an async handler function for a BaseTool.

@@ -455,83 +225,7 @@ def create_tool_handler(base_tool: BaseTool):
    """

    async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
-        """Execute the wrapped tool and return MCP-formatted response.
-
-        If a pre-launched task exists (from parallel tool pre-launch in the
-        message loop), await it instead of executing fresh.
-        """
-        queues = _tool_task_queues.get()
-        if queues and base_tool.name in queues:
-            queue = queues[base_tool.name]
-            if not queue.empty():
-                task, launch_args = queue.get_nowait()
-                # Sanity-check: warn if the args don't match — this can happen
-                # if the SDK dispatches tool calls in a different order than the
-                # ToolUseBlocks appeared in the AssistantMessage (unlikely but
-                # could occur in future SDK versions or with SDK bugs).
-                # We compare full values (not just keys) so that two run_block
-                # calls with different block_id values are caught even though
-                # both have the same key set.
-                if launch_args != args:
-                    logger.warning(
-                        "Pre-launched task for %s: arg mismatch "
-                        "(launch_keys=%s, call_keys=%s) — cancelling "
-                        "pre-launched task and falling back to direct execution",
-                        base_tool.name,
-                        (
-                            sorted(launch_args.keys())
-                            if isinstance(launch_args, dict)
-                            else type(launch_args).__name__
-                        ),
-                        (
-                            sorted(args.keys())
-                            if isinstance(args, dict)
-                            else type(args).__name__
-                        ),
-                    )
-                    if not task.done():
-                        task.cancel()
-                        # Await cancellation to prevent duplicate concurrent
-                        # execution for blocks with side effects.
-                        try:
-                            await task
-                        except (asyncio.CancelledError, Exception):
-                            pass
-                    # Fall through to the direct-execution path below.
-                else:
-                    # Args match — await the pre-launched task.
-                    try:
-                        result = await task
-                    except asyncio.CancelledError:
-                        # Re-raise: CancelledError may be propagating from the
-                        # outer streaming loop being cancelled — swallowing it
-                        # would mask the cancellation and prevent proper cleanup.
-                        logger.warning(
-                            "Pre-launched tool %s was cancelled — re-raising",
-                            base_tool.name,
-                        )
-                        raise
-                    except Exception as e:
-                        logger.error(
-                            "Pre-launched tool %s failed: %s",
-                            base_tool.name,
-                            e,
-                            exc_info=True,
-                        )
-                        return _mcp_error(
-                            f"Failed to execute {base_tool.name}. "
-                            "Check server logs for details."
-                        )
-
-                    # Pre-truncate the result so the _truncating wrapper (which
-                    # wraps this handler) receives an already-within-budget
-                    # value. _truncating handles stashing — we must NOT stash
-                    # here or the output will be appended twice to the FIFO
-                    # queue and pop_pending_tool_output would return a duplicate
-                    # entry on the second call for the same tool.
-                    return truncate(result, _MCP_MAX_CHARS)
-
-        # No pre-launched task — execute directly (fallback for non-parallel calls).
+        """Execute the wrapped tool and return MCP-formatted response."""
        user_id, session = get_execution_context()

        if session is None:
@@ -540,12 +234,8 @@ def create_tool_handler(base_tool: BaseTool):
        try:
            return await _execute_tool_sync(base_tool, user_id, session, args)
        except Exception as e:
-            logger.error(
-                "Error executing tool %s: %s", base_tool.name, e, exc_info=True
-            )
-            return _mcp_error(
-                f"Failed to execute {base_tool.name}. Check server logs for details."
-            )
+            logger.error(f"Error executing tool {base_tool.name}: {e}", exc_info=True)
+            return _mcp_error(f"Failed to execute {base_tool.name}: {e}")

    return tool_handler

@@ -668,15 +358,6 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
        Applied once to every registered tool."""

        async def wrapper(args: dict[str, Any]) -> dict[str, Any]:
-            # Circuit breaker: stop infinite retry loops with identical args.
-            # Use the original (pre-expansion) args for fingerprinting so
-            # check and record always use the same key — @@agptfile:
-            # expansion mutates args, which would cause a key mismatch.
-            original_args = args
-            stop_msg = _check_circuit_breaker(tool_name, original_args)
-            if stop_msg:
-                return _mcp_error(stop_msg)
-
            user_id, session = get_execution_context()
            if session is not None:
                try:
@@ -684,7 +365,6 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
                        args, user_id, session, input_schema=input_schema
                    )
                except FileRefExpansionError as exc:
-                    _record_tool_failure(tool_name, original_args)
                    return _mcp_error(
                        f"@@agptfile: reference could not be resolved: {exc}. "
                        "Ensure the file exists before referencing it. "
@@ -694,12 +374,6 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
            result = await fn(args)
            truncated = truncate(result, _MCP_MAX_CHARS)

-            # Track consecutive failures for circuit breaker
-            if truncated.get("isError"):
-                _record_tool_failure(tool_name, original_args)
-            else:
-                _clear_tool_failures(tool_name)
-
            # Stash the text so the response adapter can forward our
            # middle-out truncated version to the frontend instead of the
            # SDK's head-truncated version (for outputs >~100 KB the SDK
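Reviewer note: the hunks above remove a circuit breaker from the `_truncating` wrapper. The removed comments say it fingerprinted the original (pre-expansion) args, recorded consecutive failures, and cleared them on success. The helper bodies are not shown in this diff, so the following is only a minimal sketch of that idea. The threshold, the JSON-based fingerprint, and the un-prefixed function names are assumptions for illustration, not the removed implementation.

```python
import json
from typing import Any, Optional

# Hypothetical threshold; the value used by the removed code is not visible here.
MAX_CONSECUTIVE_FAILURES = 3

# Maps "tool_name:serialized_args" to a consecutive-failure count.
_failures: dict[str, int] = {}


def _fingerprint(tool_name: str, args: dict[str, Any]) -> str:
    """Build a stable key from the tool name and the *original* args."""
    return f"{tool_name}:{json.dumps(args, sort_keys=True, default=str)}"


def check_circuit_breaker(tool_name: str, args: dict[str, Any]) -> Optional[str]:
    """Return a stop message once identical calls have failed too often."""
    if _failures.get(_fingerprint(tool_name, args), 0) >= MAX_CONSECUTIVE_FAILURES:
        return (
            f"{tool_name} failed {MAX_CONSECUTIVE_FAILURES} times with identical "
            "arguments. Change the arguments or try a different approach."
        )
    return None


def record_tool_failure(tool_name: str, args: dict[str, Any]) -> None:
    """Count one more failure for this exact (tool, args) pair."""
    key = _fingerprint(tool_name, args)
    _failures[key] = _failures.get(key, 0) + 1


def clear_tool_failures(tool_name: str) -> None:
    """Forget all failure counts for a tool after it succeeds."""
    prefix = f"{tool_name}:"
    for key in [k for k in _failures if k.startswith(prefix)]:
        del _failures[key]
```

Checking and recording with the same pre-expansion args matters because, per the removed comment, `@@agptfile:` expansion mutates the args and would otherwise produce two different keys for one call.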
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py
@@ -1,26 +1,16 @@
-"""Tests for tool_adapter helpers: truncation, stash, context vars, parallel pre-launch."""
-
-import asyncio
-from unittest.mock import AsyncMock, MagicMock, patch
+"""Tests for tool_adapter helpers: truncation, stash, context vars."""

 import pytest

 from backend.copilot.context import get_sdk_cwd
-from backend.copilot.response_model import StreamToolOutputAvailable
-from backend.copilot.sdk.file_ref import FileRefExpansionError
 from backend.util.truncate import truncate

 from .tool_adapter import (
    _MCP_MAX_CHARS,
    _text_from_mcp_result,
-    cancel_pending_tool_tasks,
-    create_tool_handler,
    pop_pending_tool_output,
-    pre_launch_tool_call,
-    reset_stash_event,
    set_execution_context,
    stash_pending_tool_output,
-    wait_for_stash,
 )

 # ---------------------------------------------------------------------------
@@ -130,69 +120,6 @@ class TestToolOutputStash:
        assert pop_pending_tool_output("a") == "alpha"


-# ---------------------------------------------------------------------------
-# reset_stash_event / wait_for_stash
-# ---------------------------------------------------------------------------
-
-
-class TestResetStashEvent:
-    """Tests for reset_stash_event — the stale-signal fix for retry attempts."""
-
-    @pytest.fixture(autouse=True)
-    def _init_context(self):
-        set_execution_context(
-            user_id="test",
-            session=None,  # type: ignore[arg-type]
-            sandbox=None,
-        )
-
-    @pytest.mark.asyncio
-    async def test_reset_clears_stale_signal(self):
-        """After reset, wait_for_stash does NOT return immediately (blocks until timeout)."""
-        # Simulate a stale signal left by a failed attempt's PostToolUse hook.
-        stash_pending_tool_output("some_tool", "stale output")
-        # The stash_pending_tool_output call sets the event.
-        # Now reset it — simulating start of a new retry attempt.
-        reset_stash_event()
-        # wait_for_stash should block and time out since the event was cleared.
-        result = await wait_for_stash(timeout=0.05)
-        assert result is False, (
-            "wait_for_stash should have timed out after reset_stash_event, "
-            "but it returned True — stale signal was not cleared"
-        )
-
-    @pytest.mark.asyncio
-    async def test_wait_returns_true_when_signaled_after_reset(self):
-        """After reset, a new stash signal is correctly detected."""
-        reset_stash_event()
-
-        async def _signal_after_delay():
-            await asyncio.sleep(0.01)
-            stash_pending_tool_output("tool", "fresh output")
-
-        asyncio.create_task(_signal_after_delay())
-        result = await wait_for_stash(timeout=1.0)
-        assert result is True
-
-    @pytest.mark.asyncio
-    async def test_retry_scenario_stale_event_does_not_fire_prematurely(self):
-        """Simulates: attempt 1 leaves event set → reset → attempt 2 waits correctly."""
-        # Attempt 1: hook fires and sets the event
-        stash_pending_tool_output("t", "attempt-1-output")
-        # Pop it so the stash is empty (simulating normal consumption)
-        pop_pending_tool_output("t")
-
-        # Between attempts: reset (as service.py does before each retry)
-        reset_stash_event()
-
-        # Attempt 2: wait_for_stash should NOT return True immediately
-        result = await wait_for_stash(timeout=0.05)
-        assert result is False, (
-            "Stale event from attempt 1 caused wait_for_stash to return "
-            "prematurely in attempt 2"
-        )
-
-
 # ---------------------------------------------------------------------------
 # _truncating wrapper (integration via create_copilot_mcp_server)
 # ---------------------------------------------------------------------------
@@ -241,534 +168,3 @@ class TestTruncationAndStashIntegration:
        text = _text_from_mcp_result(truncated)
        assert len(text) < len(big_text)
        assert len(str(truncated)) <= _MCP_MAX_CHARS
-
-
-# ---------------------------------------------------------------------------
-# Parallel pre-launch infrastructure
-# ---------------------------------------------------------------------------
-
-
-def _make_mock_tool(name: str, output: str = "result") -> MagicMock:
-    """Return a BaseTool mock that returns a successful StreamToolOutputAvailable."""
-    tool = MagicMock()
-    tool.name = name
-    tool.parameters = {"properties": {}, "required": []}
-    tool.execute = AsyncMock(
-        return_value=StreamToolOutputAvailable(
-            toolCallId="test-id",
-            output=output,
-            toolName=name,
-            success=True,
-        )
-    )
-    return tool
-
-
-def _make_mock_session() -> MagicMock:
-    """Return a minimal ChatSession mock."""
-    return MagicMock()
-
-
-def _init_ctx(session=None):
-    set_execution_context(
-        user_id="user-1",
-        session=session,  # type: ignore[arg-type]
-        sandbox=None,
-    )
-
-
-class TestPreLaunchToolCall:
-    """Tests for pre_launch_tool_call and the queue-based parallel dispatch."""
-
-    @pytest.fixture(autouse=True)
-    def _init(self):
-        _init_ctx(session=_make_mock_session())
-
-    @pytest.mark.asyncio
-    async def test_unknown_tool_is_silently_ignored(self):
-        """pre_launch_tool_call does nothing for tools not in TOOL_REGISTRY."""
-        # Should not raise even if the tool name is completely unknown
-        await pre_launch_tool_call("nonexistent_tool", {})
-
-    @pytest.mark.asyncio
-    async def test_mcp_prefix_stripped_before_registry_lookup(self):
-        """mcp__copilot__run_block is looked up as 'run_block'."""
-        mock_tool = _make_mock_tool("run_block")
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("mcp__copilot__run_block", {"block_id": "b1"})
-
-        # The task was enqueued — mock_tool.execute should be called once
-        # (may not complete immediately but should start)
-        await asyncio.sleep(0)  # yield to event loop
-        mock_tool.execute.assert_awaited_once()
-
-    @pytest.mark.asyncio
-    async def test_bare_tool_name_without_prefix(self):
-        """Tool names without __ separator are looked up as-is."""
-        mock_tool = _make_mock_tool("run_block")
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-
-        await asyncio.sleep(0)
-        mock_tool.execute.assert_awaited_once()
-
-    @pytest.mark.asyncio
-    async def test_task_enqueued_fifo_for_same_tool(self):
-        """Two pre-launched calls for the same tool name are enqueued FIFO."""
-        results = []
-
-        async def slow_execute(*args, **kwargs):
-            results.append(len(results))
-            return StreamToolOutputAvailable(
-                toolCallId="id",
-                output=str(len(results) - 1),
-                toolName="t",
-                success=True,
-            )
-
-        mock_tool = _make_mock_tool("t")
-        mock_tool.execute = AsyncMock(side_effect=slow_execute)
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"t": mock_tool},
-        ):
-            await pre_launch_tool_call("t", {"n": 1})
-            await pre_launch_tool_call("t", {"n": 2})
-            await asyncio.sleep(0)
-
-        assert mock_tool.execute.await_count == 2
-
-    @pytest.mark.asyncio
-    async def test_file_ref_expansion_failure_skips_pre_launch(self):
-        """When @@agptfile: expansion fails, pre_launch_tool_call skips the task.
-
-        The handler should then fall back to direct execution (which will also
-        fail with a proper MCP error via _truncating's own expansion).
-        """
-        mock_tool = _make_mock_tool("run_block", output="should-not-execute")
-
-        with (
-            patch(
-                "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-                {"run_block": mock_tool},
-            ),
-            patch(
-                "backend.copilot.sdk.tool_adapter.expand_file_refs_in_args",
-                AsyncMock(side_effect=FileRefExpansionError("@@agptfile:missing.txt")),
-            ),
-        ):
-            # Should not raise — expansion failure is handled gracefully
-            await pre_launch_tool_call("run_block", {"text": "@@agptfile:missing.txt"})
-            await asyncio.sleep(0)
-
-        # No task was pre-launched — execute was not called
-        mock_tool.execute.assert_not_awaited()
-
-
-class TestCreateToolHandlerParallel:
-    """Tests for create_tool_handler using pre-launched tasks."""
-
-    @pytest.fixture(autouse=True)
-    def _init(self):
-        _init_ctx(session=_make_mock_session())
-
-    @pytest.mark.asyncio
-    async def test_handler_uses_prelaunched_task(self):
-        """Handler pops and awaits the pre-launched task rather than re-executing."""
-        mock_tool = _make_mock_tool("run_block", output="pre-launched result")
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-            await asyncio.sleep(0)  # let task start
-
-            handler = create_tool_handler(mock_tool)
-            result = await handler({"block_id": "b1"})
-
-        assert result["isError"] is False
-        text = result["content"][0]["text"]
-        assert "pre-launched result" in text
-        # Should only have been called once (the pre-launched task), not twice
-        mock_tool.execute.assert_awaited_once()
-
-    @pytest.mark.asyncio
-    async def test_handler_does_not_double_stash_for_prelaunched_task(self):
-        """Pre-launched task result must NOT be stashed by tool_handler directly.
-
-        The _truncating wrapper wraps tool_handler and handles stashing after
-        tool_handler returns.  If tool_handler also stashed, the output would be
-        appended twice to the FIFO queue and pop_pending_tool_output would return
-        a duplicate on the second call.
-
-        This test calls tool_handler directly (without _truncating) and asserts
-        that nothing was stashed — confirming stashing is deferred to _truncating.
-        """
-        mock_tool = _make_mock_tool("run_block", output="stash-me")
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-            await asyncio.sleep(0)
-
-            handler = create_tool_handler(mock_tool)
-            result = await handler({"block_id": "b1"})
-
-        assert result["isError"] is False
-        assert "stash-me" in result["content"][0]["text"]
-        # tool_handler must NOT stash — _truncating (which wraps handler) does it.
-        # Calling pop here (without going through _truncating) should return None.
-        not_stashed = pop_pending_tool_output("run_block")
-        assert not_stashed is None, (
-            "tool_handler must not stash directly — _truncating handles stashing "
-            "to prevent double-stash in the FIFO queue"
-        )
-
-    @pytest.mark.asyncio
-    async def test_handler_falls_back_when_queue_empty(self):
-        """When no pre-launched task exists, handler executes directly."""
-        mock_tool = _make_mock_tool("run_block", output="direct result")
-
-        # Don't call pre_launch_tool_call — queue is empty
-        handler = create_tool_handler(mock_tool)
-        result = await handler({"block_id": "b1"})
-
-        assert result["isError"] is False
-        text = result["content"][0]["text"]
-        assert "direct result" in text
-        mock_tool.execute.assert_awaited_once()
-
-    @pytest.mark.asyncio
-    async def test_handler_cancelled_error_propagates(self):
-        """CancelledError from a pre-launched task is re-raised to preserve cancellation semantics."""
-        mock_tool = _make_mock_tool("run_block")
-        mock_tool.execute = AsyncMock(side_effect=asyncio.CancelledError())
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-            await asyncio.sleep(0)
-
-            handler = create_tool_handler(mock_tool)
-            with pytest.raises(asyncio.CancelledError):
-                await handler({"block_id": "b1"})
-
-    @pytest.mark.asyncio
-    async def test_handler_exception_returns_mcp_error(self):
-        """Exception from a pre-launched task is caught and returned as MCP error."""
-        mock_tool = _make_mock_tool("run_block")
-        mock_tool.execute = AsyncMock(side_effect=RuntimeError("block exploded"))
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-            await asyncio.sleep(0)
-
-            handler = create_tool_handler(mock_tool)
-            result = await handler({"block_id": "b1"})
-
-        assert result["isError"] is True
-        assert "Failed to execute run_block" in result["content"][0]["text"]
-
-    @pytest.mark.asyncio
-    async def test_two_same_tool_calls_dispatched_in_order(self):
-        """Two pre-launched tasks for the same tool are consumed in FIFO order."""
-        call_order = []
-
-        async def execute_with_tag(*args, **kwargs):
-            tag = kwargs.get("block_id", "?")
-            call_order.append(tag)
-            return StreamToolOutputAvailable(
-                toolCallId="id", output=f"out-{tag}", toolName="run_block", success=True
-            )
-
-        mock_tool = _make_mock_tool("run_block")
-        mock_tool.execute = AsyncMock(side_effect=execute_with_tag)
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "first"})
-            await pre_launch_tool_call("run_block", {"block_id": "second"})
-            await asyncio.sleep(0)
-
-            handler = create_tool_handler(mock_tool)
-            r1 = await handler({"block_id": "first"})
-            r2 = await handler({"block_id": "second"})
-
-        assert "out-first" in r1["content"][0]["text"]
-        assert "out-second" in r2["content"][0]["text"]
-        assert call_order == [
-            "first",
-            "second",
-        ], f"Expected FIFO dispatch order but got {call_order}"
-
-    @pytest.mark.asyncio
-    async def test_arg_mismatch_falls_back_to_direct_execution(self):
-        """When pre-launched args differ from SDK args, handler cancels pre-launched
-        task and falls back to direct execution with the correct args."""
-        mock_tool = _make_mock_tool("run_block", output="direct-result")
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            # Pre-launch with args {"block_id": "wrong"}
-            await pre_launch_tool_call("run_block", {"block_id": "wrong"})
-            await asyncio.sleep(0)
-
-            # SDK dispatches with different args
-            handler = create_tool_handler(mock_tool)
-            result = await handler({"block_id": "correct"})
-
-        assert result["isError"] is False
-        # The tool was called twice: once by pre-launch (wrong args), once by
-        # direct fallback (correct args). The result should come from the
-        # direct execution path.
-        assert mock_tool.execute.await_count == 2
-
-    @pytest.mark.asyncio
-    async def test_no_session_falls_back_gracefully(self):
-        """When session is None and no pre-launched task, handler returns MCP error."""
-        mock_tool = _make_mock_tool("run_block")
-        # session=None means get_execution_context returns (user_id, None)
-        set_execution_context(user_id="u", session=None, sandbox=None)  # type: ignore[arg-type]
-
-        handler = create_tool_handler(mock_tool)
-        result = await handler({"block_id": "b1"})
-
-        assert result["isError"] is True
-        assert "session" in result["content"][0]["text"].lower()
-
-
-# ---------------------------------------------------------------------------
-# cancel_pending_tool_tasks
-# ---------------------------------------------------------------------------
-
-
-class TestCancelPendingToolTasks:
-    """Tests for cancel_pending_tool_tasks — the stream-abort cleanup helper."""
-
-    @pytest.fixture(autouse=True)
-    def _init(self):
-        _init_ctx(session=_make_mock_session())
-
-    @pytest.mark.asyncio
-    async def test_cancels_queued_tasks(self):
-        """Queued tasks are cancelled and the queue is cleared."""
-        ran = False
-
-        async def never_run(*_args, **_kwargs):
-            nonlocal ran
-            await asyncio.sleep(10)  # long enough to still be pending
-            ran = True
-
-        mock_tool = _make_mock_tool("run_block")
-        mock_tool.execute = AsyncMock(side_effect=never_run)
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-            await asyncio.sleep(0)  # let task start
-            await cancel_pending_tool_tasks()
-            await asyncio.sleep(0)  # let cancellation propagate
-
-        assert not ran, "Task should have been cancelled before completing"
-
-    @pytest.mark.asyncio
-    async def test_noop_when_no_tasks_queued(self):
-        """cancel_pending_tool_tasks does not raise when queues are empty."""
-        await cancel_pending_tool_tasks()  # should not raise
-
-    @pytest.mark.asyncio
-    async def test_handler_does_not_find_cancelled_task(self):
-        """After cancel, tool_handler falls back to direct execution."""
-        mock_tool = _make_mock_tool("run_block", output="direct-fallback")
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"run_block": mock_tool},
-        ):
-            await pre_launch_tool_call("run_block", {"block_id": "b1"})
-            await asyncio.sleep(0)
-            await cancel_pending_tool_tasks()
-
-            # Queue is now empty — handler should execute directly
-            handler = create_tool_handler(mock_tool)
-            result = await handler({"block_id": "b1"})
-
-        assert result["isError"] is False
-        assert "direct-fallback" in result["content"][0]["text"]
-
-
-# ---------------------------------------------------------------------------
-# Concurrent / parallel pre-launch scenarios
-# ---------------------------------------------------------------------------
-
-
-class TestAllParallelToolsPrelaunchedIndependently:
-    """Simulate SDK sending N separate AssistantMessages for the same tool concurrently."""
-
-    @pytest.fixture(autouse=True)
-    def _init(self):
-        _init_ctx(session=_make_mock_session())
-
-    @pytest.mark.asyncio
-    async def test_all_parallel_tools_prelaunched_independently(self):
-        """5 pre-launches for the same tool all enqueue independently and run concurrently.
-
-        Each task sleeps for PER_TASK_S seconds. If they ran sequentially the total
-        wall time would be ~5*PER_TASK_S. Running concurrently it should finish in
-        roughly PER_TASK_S (plus scheduling overhead).
-        """
-        PER_TASK_S = 0.05
-        N = 5
-        started: list[int] = []
-        finished: list[int] = []
-
-        async def slow_execute(*args, **kwargs):
-            idx = len(started)
-            started.append(idx)
-            await asyncio.sleep(PER_TASK_S)
-            finished.append(idx)
-            return StreamToolOutputAvailable(
-                toolCallId=f"id-{idx}",
-                output=f"result-{idx}",
-                toolName="bash_exec",
-                success=True,
-            )
-
-        mock_tool = _make_mock_tool("bash_exec")
-        mock_tool.execute = AsyncMock(side_effect=slow_execute)
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"bash_exec": mock_tool},
-        ):
-            for i in range(N):
-                await pre_launch_tool_call("bash_exec", {"cmd": f"echo {i}"})
-
-            # Measure only the concurrent execution window, not pre-launch overhead.
-            # Starting the timer here avoids false failures on slow CI runners where
-            # the pre_launch_tool_call setup takes longer than the concurrent sleep.
-            t0 = asyncio.get_running_loop().time()
-            await asyncio.sleep(PER_TASK_S * 2)
-            elapsed = asyncio.get_running_loop().time() - t0
-
-        assert mock_tool.execute.await_count == N
-        assert len(finished) == N
-        # Wall time of the sleep window should be well under N * PER_TASK_S
-        # (sequential would be ~0.25s; concurrent finishes in ~PER_TASK_S = 0.05s)
-        assert elapsed < N * PER_TASK_S, (
-            f"Expected concurrent execution (<{N * PER_TASK_S:.2f}s) "
-            f"but sleep window took {elapsed:.2f}s"
-        )
-
-
-class TestHandlerReturnsResultFromCorrectPrelaunchedTask:
-    """Pop pre-launched tasks in order and verify each returns its own result."""
-
-    @pytest.fixture(autouse=True)
-    def _init(self):
-        _init_ctx(session=_make_mock_session())
-
-    @pytest.mark.asyncio
-    async def test_handler_returns_result_from_correct_prelaunched_task(self):
-        """Two pre-launches for the same tool: first handler gets first result, second gets second."""
-
-        async def execute_with_cmd(*args, **kwargs):
-            cmd = kwargs.get("cmd", "?")
-            return StreamToolOutputAvailable(
-                toolCallId="id",
-                output=f"output-for-{cmd}",
-                toolName="bash_exec",
-                success=True,
-            )
-
-        mock_tool = _make_mock_tool("bash_exec")
-        mock_tool.execute = AsyncMock(side_effect=execute_with_cmd)
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"bash_exec": mock_tool},
-        ):
-            await pre_launch_tool_call("bash_exec", {"cmd": "alpha"})
-            await pre_launch_tool_call("bash_exec", {"cmd": "beta"})
-            await asyncio.sleep(0)  # let both tasks start
-
-            handler = create_tool_handler(mock_tool)
-            r1 = await handler({"cmd": "alpha"})
-            r2 = await handler({"cmd": "beta"})
-
-        text1 = r1["content"][0]["text"]
-        text2 = r2["content"][0]["text"]
-        assert "output-for-alpha" in text1, f"Expected alpha result, got: {text1}"
-        assert "output-for-beta" in text2, f"Expected beta result, got: {text2}"
-        assert mock_tool.execute.await_count == 2
-
-
-class TestFiveConcurrentPrelaunchAllComplete:
-    """Pre-launch 5 tasks; consume all 5 via handlers; assert all succeed."""
-
-    @pytest.fixture(autouse=True)
-    def _init(self):
-        _init_ctx(session=_make_mock_session())
-
-    @pytest.mark.asyncio
-    async def test_five_concurrent_prelaunch_all_complete(self):
-        """All 5 pre-launched tasks complete and return successful results."""
-        N = 5
-        call_count = 0
-
-        async def counting_execute(*args, **kwargs):
-            nonlocal call_count
-            call_count += 1
-            n = call_count
-            return StreamToolOutputAvailable(
-                toolCallId=f"id-{n}",
-                output=f"done-{n}",
-                toolName="bash_exec",
-                success=True,
-            )
-
-        mock_tool = _make_mock_tool("bash_exec")
-        mock_tool.execute = AsyncMock(side_effect=counting_execute)
-
-        with patch(
-            "backend.copilot.sdk.tool_adapter.TOOL_REGISTRY",
-            {"bash_exec": mock_tool},
-        ):
-            for i in range(N):
-                await pre_launch_tool_call("bash_exec", {"cmd": f"task-{i}"})
-
-            await asyncio.sleep(0)  # let all tasks start
-
-            handler = create_tool_handler(mock_tool)
-            results = []
-            for i in range(N):
-                results.append(await handler({"cmd": f"task-{i}"}))
-
-        assert (
-            mock_tool.execute.await_count == N
-        ), f"Expected {N} execute calls, got {mock_tool.execute.await_count}"
-        for i, result in enumerate(results):
-            assert result["isError"] is False, f"Result {i} should not be an error"
-            text = result["content"][0]["text"]
-            assert "done-" in text, f"Result {i} missing expected output: {text}"
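Reviewer note: the deleted tests above exercised a per-tool FIFO queue of pre-launched tasks. The handler popped the oldest task, awaited it when the args matched, and otherwise cancelled it and executed directly. The sketch below restates that flow in isolation. `pre_launch`, `handle`, and the `execute` callback are hypothetical stand-ins, and a plain module-level dict replaces whatever context-scoped storage the removed `_tool_task_queues` used.

```python
import asyncio
from collections import defaultdict, deque
from typing import Any, Awaitable, Callable

# Hypothetical executor signature: takes the tool args, returns the output text.
Executor = Callable[[dict[str, Any]], Awaitable[str]]

# One FIFO queue of (task, launch_args) per tool name.
_queues: dict[str, deque] = defaultdict(deque)


def pre_launch(name: str, args: dict[str, Any], execute: Executor) -> None:
    """Start the tool call early and remember which args it was launched with."""
    task = asyncio.create_task(execute(args))
    _queues[name].append((task, args))


async def handle(name: str, args: dict[str, Any], execute: Executor) -> str:
    """Reuse the oldest pre-launched task if its args match, else run directly."""
    queue = _queues.get(name)
    if queue:
        task, launch_args = queue.popleft()
        if launch_args == args:
            return await task
        # Arg mismatch: cancel and wait for the task to stop, so a tool with
        # side effects does not run concurrently with the fallback call.
        task.cancel()
        try:
            await task
        except (asyncio.CancelledError, Exception):
            pass
    return await execute(args)
```

Comparing full argument values rather than only keys is what lets two calls to the same tool with different `block_id` values be told apart, as the removed handler comment points out.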
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
@@ -20,9 +20,9 @@ SSRF protection:

 Requires:
  npm install -g agent-browser
-  In Docker: system chromium package with AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
-             (set automatically — no `agent-browser install` needed).
-  Locally: run `agent-browser install` to download Chromium.
+  agent-browser install   (downloads Chromium, one-time — skipped in Docker
+                           where system chromium is pre-installed and
+                           AGENT_BROWSER_EXECUTABLE_PATH is set)
 """

 import asyncio
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser_integration_test.py
@@ -1,351 +0,0 @@
-"""Integration tests for agent-browser + system chromium.
-
-These tests actually invoke the agent-browser binary via subprocess and require:
-  - agent-browser installed (npm install -g agent-browser)
-  - AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium (set in Docker)
-
-Run with:
-    poetry run test
-
-Or to run only this file:
-    poetry run pytest backend/copilot/tools/agent_browser_integration_test.py -v -p no:autogpt_platform
-
-Skipped automatically when agent-browser binary is not found.
-Tests that hit external sites are marked ``integration`` and skipped by default
-in CI (use ``-m integration`` to include them).
-
-Two test tiers:
-  - CLI tests: call agent-browser subprocess directly (no backend imports needed)
-  - Tool class tests: call BrowserNavigateTool/BrowserActTool._execute() directly
-    with user_id=None (skips workspace/DB interactions — no Postgres/RabbitMQ needed)
-"""
-
-import concurrent.futures
-import os
-import shutil
-import subprocess
-import tempfile
-from datetime import datetime, timezone
-from urllib.parse import urlparse
-
-import pytest
-
-from backend.copilot.model import ChatSession
-from backend.copilot.tools.agent_browser import BrowserActTool, BrowserNavigateTool
-from backend.copilot.tools.models import (
-    BrowserActResponse,
-    BrowserNavigateResponse,
-    ErrorResponse,
-)
-
-pytestmark = pytest.mark.skipif(
-    shutil.which("agent-browser") is None,
-    reason="agent-browser binary not found",
-)
-
-_SESSION = "integration-test-session"
-
-
-def _agent_browser(
-    *args: str, session: str = _SESSION, timeout: int = 30
-) -> tuple[int, str, str]:
-    """Run agent-browser for the given session, return (rc, stdout, stderr)."""
-    result = subprocess.run(
-        ["agent-browser", "--session", session, "--session-name", session, *args],
-        capture_output=True,
-        text=True,
-        timeout=timeout,
-    )
-    return result.returncode, result.stdout, result.stderr
-
-
-def _close_session(session: str, timeout: int = 5) -> None:
-    """Best-effort close for a browser session; never raises on failure."""
-    try:
-        subprocess.run(
-            ["agent-browser", "--session", session, "--session-name", session, "close"],
-            capture_output=True,
-            timeout=timeout,
-        )
-    except (subprocess.TimeoutExpired, OSError):
-        pass
-
-
-@pytest.fixture(autouse=True)
-def _teardown():
-    """Close the shared test session after each test (best-effort)."""
-    yield
-    _close_session(_SESSION)
-
-
-# ---------------------------------------------------------------------------
-# Tests
-# ---------------------------------------------------------------------------
-
-
-def test_chromium_executable_env_is_set():
-    """AGENT_BROWSER_EXECUTABLE_PATH must be set and point to an executable binary."""
-    exe = os.environ.get("AGENT_BROWSER_EXECUTABLE_PATH", "")
-    assert exe, "AGENT_BROWSER_EXECUTABLE_PATH is not set"
-    assert os.path.isfile(exe), f"Chromium binary not found at {exe}"
-    assert os.access(exe, os.X_OK), f"Chromium binary at {exe} is not executable"
-
-
-@pytest.mark.integration
-def test_navigate_returns_success():
-    """agent-browser can open a public URL using system chromium."""
-    rc, _, stderr = _agent_browser("open", "https://example.com")
-    assert rc == 0, f"open failed (rc={rc}): {stderr}"
-
-
-@pytest.mark.integration
-def test_get_title_after_navigate():
-    """get title returns the page title after navigation."""
-    rc, _, _ = _agent_browser("open", "https://example.com")
-    assert rc == 0
-
-    rc, stdout, stderr = _agent_browser("get", "title", timeout=10)
-    assert rc == 0, f"get title failed: {stderr}"
-    assert "example" in stdout.lower()
-
-
-@pytest.mark.integration
-def test_get_url_after_navigate():
-    """get url returns the navigated URL."""
-    rc, _, _ = _agent_browser("open", "https://example.com")
-    assert rc == 0
-
-    rc, stdout, stderr = _agent_browser("get", "url", timeout=10)
-    assert rc == 0, f"get url failed: {stderr}"
-    assert urlparse(stdout.strip()).netloc == "example.com"
-
-
-@pytest.mark.integration
-def test_snapshot_returns_interactive_elements():
-    """snapshot -i -c lists interactive elements on the page."""
-    rc, _, _ = _agent_browser("open", "https://example.com")
-    assert rc == 0
-
-    rc, stdout, stderr = _agent_browser("snapshot", "-i", "-c", timeout=15)
-    assert rc == 0, f"snapshot failed: {stderr}"
-    assert len(stdout.strip()) > 0, "snapshot returned empty output"
-
-
-@pytest.mark.integration
-def test_screenshot_produces_valid_png():
-    """screenshot saves a non-empty, valid PNG file."""
-    rc, _, _ = _agent_browser("open", "https://example.com")
-    assert rc == 0
-
-    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
-        tmp = f.name
-    try:
-        rc, _, stderr = _agent_browser("screenshot", tmp, timeout=15)
-        assert rc == 0, f"screenshot failed: {stderr}"
-        size = os.path.getsize(tmp)
-        assert size > 1000, f"PNG too small ({size} bytes) — likely blank or corrupt"
-        with open(tmp, "rb") as f:
-            assert f.read(4) == b"\x89PNG", "Output is not a valid PNG"
-    finally:
-        os.unlink(tmp)
-
-
-@pytest.mark.integration
-def test_scroll_down():
-    """scroll down succeeds without error."""
-    rc, _, _ = _agent_browser("open", "https://example.com")
-    assert rc == 0
-
-    rc, _, stderr = _agent_browser("scroll", "down", timeout=10)
-    assert rc == 0, f"scroll failed: {stderr}"
-
-
-@pytest.mark.integration
-def test_fill_form_field():
-    """fill writes text into an input field."""
-    rc, _, _ = _agent_browser("open", "https://httpbin.org/forms/post")
-    assert rc == 0
-
-    rc, _, stderr = _agent_browser(
-        "fill", "input[name=custname]", "IntegrationTestUser", timeout=10
-    )
-    assert rc == 0, f"fill failed: {stderr}"
-
-
-@pytest.mark.integration
-def test_concurrent_independent_sessions():
-    """Two independent sessions can navigate in parallel without interference."""
-    session_a = "integration-concurrent-a"
-    session_b = "integration-concurrent-b"
-
-    try:
-        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
-            fut_a = pool.submit(
-                _agent_browser, "open", "https://example.com", session=session_a
-            )
-            fut_b = pool.submit(
-                _agent_browser, "open", "https://httpbin.org/html", session=session_b
-            )
-            rc_a, _, err_a = fut_a.result(timeout=40)
-            rc_b, _, err_b = fut_b.result(timeout=40)
-        assert rc_a == 0, f"session_a open failed: {err_a}"
-        assert rc_b == 0, f"session_b open failed: {err_b}"
-
-        rc_ua, url_a, err_ua = _agent_browser(
-            "get", "url", session=session_a, timeout=10
-        )
-        rc_ub, url_b, err_ub = _agent_browser(
-            "get", "url", session=session_b, timeout=10
-        )
-        assert rc_ua == 0, f"session_a get url failed: {err_ua}"
-        assert rc_ub == 0, f"session_b get url failed: {err_ub}"
-        assert urlparse(url_a.strip()).netloc == "example.com"
-        assert urlparse(url_b.strip()).netloc == "httpbin.org"
-    finally:
-        _close_session(session_a)
-        _close_session(session_b)
-
-
-@pytest.mark.integration
-def test_close_session():
-    """close shuts down the browser daemon cleanly."""
-    rc, _, _ = _agent_browser("open", "https://example.com")
-    assert rc == 0
-
-    rc, _, stderr = _agent_browser("close", timeout=10)
-    assert rc == 0, f"close failed: {stderr}"
-
-
-# ---------------------------------------------------------------------------
-# Python tool class integration tests
-#
-# These tests exercise the actual BrowserNavigateTool / BrowserActTool Python
-# classes (not just the CLI binary) to verify the full call path — URL
-# validation, subprocess dispatch, response parsing — works with system
-# chromium.  user_id=None skips workspace/DB interactions so no Postgres or
-# RabbitMQ is needed.
-# ---------------------------------------------------------------------------
-
-_TOOL_SESSION_ID = "integration-tool-test-session"
-_TEST_SESSION = ChatSession(
-    session_id=_TOOL_SESSION_ID,
-    user_id="test-user",
-    messages=[],
-    usage=[],
-    started_at=datetime.now(timezone.utc),
-    updated_at=datetime.now(timezone.utc),
-)
-
-
-@pytest.fixture(autouse=False)
-def _close_tool_session():
-    """Tear down the tool-test browser session after each tool test."""
-    yield
-    _close_session(_TOOL_SESSION_ID)
-
-
-@pytest.mark.integration
-@pytest.mark.asyncio
-async def test_tool_navigate_returns_response(_close_tool_session):
-    """BrowserNavigateTool._execute returns a BrowserNavigateResponse with real content."""
-    tool = BrowserNavigateTool()
-    resp = await tool._execute(
-        user_id=None, session=_TEST_SESSION, url="https://example.com"
-    )
-    assert isinstance(
-        resp, BrowserNavigateResponse
-    ), f"Expected BrowserNavigateResponse, got: {resp}"
-    assert urlparse(resp.url).netloc == "example.com"
-    assert resp.title, "Expected non-empty page title"
-    assert resp.snapshot, "Expected non-empty accessibility snapshot"
-
-
-@pytest.mark.asyncio
-@pytest.mark.parametrize(
-    "ssrf_url",
-    [
-        "http://169.254.169.254/",  # AWS/GCP/Azure metadata endpoint
-        "http://127.0.0.1/",  # IPv4 loopback
-        "http://10.0.0.1/",  # RFC-1918 private range
-        "http://[::1]/",  # IPv6 loopback
-        "http://0.0.0.0/",  # Wildcard / INADDR_ANY
-    ],
-)
-async def test_tool_navigate_blocked_url(ssrf_url: str, _close_tool_session):
-    """BrowserNavigateTool._execute rejects internal/private URLs (SSRF guard)."""
-    tool = BrowserNavigateTool()
-    resp = await tool._execute(user_id=None, session=_TEST_SESSION, url=ssrf_url)
-    assert isinstance(
-        resp, ErrorResponse
-    ), f"Expected ErrorResponse for SSRF URL {ssrf_url!r}, got: {resp}"
-    assert resp.error == "blocked_url"
-
-
-@pytest.mark.asyncio
-async def test_tool_navigate_missing_url(_close_tool_session):
-    """BrowserNavigateTool._execute returns an error when url is empty."""
-    tool = BrowserNavigateTool()
-    resp = await tool._execute(user_id=None, session=_TEST_SESSION, url="")
-    assert isinstance(resp, ErrorResponse)
-    assert resp.error == "missing_url"
-
-
-@pytest.mark.integration
-@pytest.mark.asyncio
-async def test_tool_act_scroll(_close_tool_session):
-    """BrowserActTool._execute can scroll after a navigate."""
-    nav = BrowserNavigateTool()
-    nav_resp = await nav._execute(
-        user_id=None, session=_TEST_SESSION, url="https://example.com"
-    )
-    assert isinstance(nav_resp, BrowserNavigateResponse)
-
-    act = BrowserActTool()
-    resp = await act._execute(
-        user_id=None, session=_TEST_SESSION, action="scroll", direction="down"
-    )
-    assert isinstance(
-        resp, BrowserActResponse
-    ), f"Expected BrowserActResponse, got: {resp}"
-    assert resp.action == "scroll"
-
-
-@pytest.mark.integration
-@pytest.mark.asyncio
-async def test_tool_act_fill_and_click(_close_tool_session):
-    """BrowserActTool._execute can fill a form field."""
-    nav = BrowserNavigateTool()
-    nav_resp = await nav._execute(
-        user_id=None, session=_TEST_SESSION, url="https://httpbin.org/forms/post"
-    )
-    assert isinstance(nav_resp, BrowserNavigateResponse)
-
-    act = BrowserActTool()
-    resp = await act._execute(
-        user_id=None,
-        session=_TEST_SESSION,
-        action="fill",
-        target="input[name=custname]",
-        value="ToolIntegrationTest",
-    )
-    assert isinstance(resp, BrowserActResponse), f"fill failed: {resp}"
-
-
-@pytest.mark.asyncio
-async def test_tool_act_missing_action(_close_tool_session):
-    """BrowserActTool._execute returns an error when action is missing."""
-    act = BrowserActTool()
-    resp = await act._execute(user_id=None, session=_TEST_SESSION, action="")
-    assert isinstance(resp, ErrorResponse)
-    assert resp.error == "missing_action"
-
-
-@pytest.mark.asyncio
-async def test_tool_act_missing_target(_close_tool_session):
-    """BrowserActTool._execute returns an error when click target is missing."""
-    act = BrowserActTool()
-    resp = await act._execute(
-        user_id=None, session=_TEST_SESSION, action="click", target=""
-    )
-    assert isinstance(resp, ErrorResponse)
-    assert resp.error == "missing_target"
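Review note: the removed `test_tool_navigate_blocked_url` cases listed five internal addresses the SSRF guard had to reject. The sketch below shows one way such literal-IP URLs could be classified using only the standard library. It is an illustration and not the backend's actual validator: the helper name `is_blocked_literal_ip_url` is hypothetical, and it deliberately ignores hostnames that resolve to internal addresses, which a real guard would also need to handle.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_literal_ip_url(url: str) -> bool:
    """Hypothetical helper: True if the URL host is a literal internal IP."""
    host = urlparse(url).hostname
    if not host:
        # No host to validate; this sketch treats that as blocked.
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than a literal IP; DNS checks are out of scope here.
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified


# The five URLs come from the removed parametrized test.
SSRF_URLS = [
    "http://169.254.169.254/",
    "http://127.0.0.1/",
    "http://10.0.0.1/",
    "http://[::1]/",
    "http://0.0.0.0/",
]
blocked = [is_blocked_literal_ip_url(u) for u in SSRF_URLS]
```

With this file deleted, any coverage of those five cases would have to live elsewhere in the suite.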
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
@@ -7,7 +7,7 @@ from typing import Any
 from .helpers import (
    AGENT_EXECUTOR_BLOCK_ID,
    MCP_TOOL_BLOCK_ID,
-    TOOL_ORCHESTRATOR_BLOCK_ID,
+    SMART_DECISION_MAKER_BLOCK_ID,
    AgentDict,
    are_types_compatible,
    generate_uuid,
@@ -31,7 +31,7 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
 _GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
 _TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"

-# Defaults applied to OrchestratorBlock nodes by the fixer.
+# Defaults applied to SmartDecisionMakerBlock nodes by the fixer.
 _SDM_DEFAULTS: dict[str, int | bool] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
@@ -1639,8 +1639,8 @@ class AgentFixer:

        return agent

-    def fix_orchestrator_blocks(self, agent: AgentDict) -> AgentDict:
-        """Fix OrchestratorBlock nodes to ensure agent-mode defaults.
+    def fix_smart_decision_maker_blocks(self, agent: AgentDict) -> AgentDict:
+        """Fix SmartDecisionMakerBlock nodes to ensure agent-mode defaults.

        Ensures:
        1. ``agent_mode_max_iterations`` defaults to ``10`` (bounded agent mode)
@@ -1657,7 +1657,7 @@ class AgentFixer:
        nodes = agent.get("nodes", [])

        for node in nodes:
-            if node.get("block_id") != TOOL_ORCHESTRATOR_BLOCK_ID:
+            if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
                continue

            node_id = node.get("id", "unknown")
@@ -1670,7 +1670,7 @@ class AgentFixer:
                if field not in input_default or input_default[field] is None:
                    input_default[field] = default_value
                    self.add_fix_log(
-                        f"OrchestratorBlock {node_id}: "
+                        f"SmartDecisionMakerBlock {node_id}: "
                        f"Set {field}={default_value!r}"
                    )

@@ -1763,8 +1763,8 @@ class AgentFixer:
        # Apply fixes for MCPToolBlock nodes
        agent = self.fix_mcp_tool_blocks(agent)

-        # Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
-        agent = self.fix_orchestrator_blocks(agent)
+        # Apply fixes for SmartDecisionMakerBlock nodes (agent-mode defaults)
+        agent = self.fix_smart_decision_maker_blocks(agent)

        # Apply fixes for AgentExecutorBlock nodes (sub-agents)
        if library_agents:
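Review note: the renamed `fix_smart_decision_maker_blocks` fills in a default only when a field is missing or `None`, so explicit values are left alone. A minimal standalone sketch of that rule follows. It assumes the node stores its inputs under an `input_default` key (that lookup is outside the visible hunks), includes only the two defaults visible in the diff, and uses a hypothetical function name `apply_sdm_defaults` that returns the log lines instead of calling `add_fix_log`.

```python
from typing import Any

SMART_DECISION_MAKER_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"

# Only the two entries visible in the diff; the real dict may contain more.
SDM_DEFAULTS: dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
}


def apply_sdm_defaults(agent: dict[str, Any]) -> list[str]:
    """Set missing or None fields on SmartDecisionMakerBlock nodes; return log lines."""
    log: list[str] = []
    for node in agent.get("nodes", []):
        if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
            continue
        node_id = node.get("id", "unknown")
        # Assumption: inputs live under "input_default" on the node dict.
        input_default = node.setdefault("input_default", {})
        for field, default_value in SDM_DEFAULTS.items():
            if field not in input_default or input_default[field] is None:
                input_default[field] = default_value
                log.append(
                    f"SmartDecisionMakerBlock {node_id}: Set {field}={default_value!r}"
                )
    return log


agent = {
    "nodes": [
        {
            "id": "n1",
            "block_id": SMART_DECISION_MAKER_BLOCK_ID,
            "input_default": {"agent_mode_max_iterations": None},
        },
        {"id": "n2", "block_id": "some-other-block"},
    ]
}
fix_log = apply_sdm_defaults(agent)
```

Because the check is "missing or `None`", an explicit `0` survives the fixer and is only caught later by the validator.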
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/helpers.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/helpers.py
@@ -12,7 +12,7 @@ __all__ = [
    "AGENT_OUTPUT_BLOCK_ID",
    "AgentDict",
    "MCP_TOOL_BLOCK_ID",
-    "TOOL_ORCHESTRATOR_BLOCK_ID",
+    "SMART_DECISION_MAKER_BLOCK_ID",
    "UUID_REGEX",
    "are_types_compatible",
    "generate_uuid",
@@ -34,7 +34,7 @@ UUID_REGEX = re.compile(r"^" + UUID_RE_STR + r"$")

 AGENT_EXECUTOR_BLOCK_ID = "e189baac-8c20-45a1-94a7-55177ea42565"
 MCP_TOOL_BLOCK_ID = "a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4"
-TOOL_ORCHESTRATOR_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
+SMART_DECISION_MAKER_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
 AGENT_INPUT_BLOCK_ID = "c0a8e994-ebf1-4a9c-a4d8-89d09c86741b"
 AGENT_OUTPUT_BLOCK_ID = "363ae599-353e-4804-937e-b2ee3cef3da4"

--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/validator.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/validator.py
@@ -10,7 +10,7 @@ from .helpers import (
    AGENT_INPUT_BLOCK_ID,
    AGENT_OUTPUT_BLOCK_ID,
    MCP_TOOL_BLOCK_ID,
-    TOOL_ORCHESTRATOR_BLOCK_ID,
+    SMART_DECISION_MAKER_BLOCK_ID,
    AgentDict,
    are_types_compatible,
    get_defined_property_type,
@@ -827,18 +827,18 @@ class AgentValidator:

        return valid

-    def validate_orchestrator_blocks(
+    def validate_smart_decision_maker_blocks(
        self,
        agent: AgentDict,
        node_lookup: dict[str, dict[str, Any]] | None = None,
    ) -> bool:
-        """Validate that OrchestratorBlock nodes have downstream tools.
+        """Validate that SmartDecisionMakerBlock nodes have downstream tools.

-        Checks that each OrchestratorBlock node has at least one link
+        Checks that each SmartDecisionMakerBlock node has at least one link
        with ``source_name == "tools"`` connecting to a downstream block.
        Without tools, the block has nothing to call and will error at runtime.

-        Returns True if all OrchestratorBlock nodes are valid.
+        Returns True if all SmartDecisionMakerBlock nodes are valid.
        """
        valid = True
        nodes = agent.get("nodes", [])
@@ -848,7 +848,7 @@ class AgentValidator:
        non_tool_block_ids = {AGENT_INPUT_BLOCK_ID, AGENT_OUTPUT_BLOCK_ID}

        for node in nodes:
-            if node.get("block_id") != TOOL_ORCHESTRATOR_BLOCK_ID:
+            if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
                continue

            node_id = node.get("id", "unknown")
@@ -863,7 +863,7 @@ class AgentValidator:
            max_iter = input_default.get("agent_mode_max_iterations")
            if max_iter is not None and not isinstance(max_iter, int):
                self.add_error(
-                    f"OrchestratorBlock node '{customized_name}' "
+                    f"SmartDecisionMakerBlock node '{customized_name}' "
                    f"({node_id}) has non-integer "
                    f"agent_mode_max_iterations={max_iter!r}. "
                    f"This field must be an integer."
@@ -871,7 +871,7 @@ class AgentValidator:
                valid = False
            elif isinstance(max_iter, int) and max_iter < -1:
                self.add_error(
-                    f"OrchestratorBlock node '{customized_name}' "
+                    f"SmartDecisionMakerBlock node '{customized_name}' "
                    f"({node_id}) has invalid "
                    f"agent_mode_max_iterations={max_iter}. "
                    f"Use -1 for infinite or a positive number for "
@@ -880,7 +880,7 @@ class AgentValidator:
                valid = False
            elif isinstance(max_iter, int) and max_iter > 100:
                self.add_error(
-                    f"OrchestratorBlock node '{customized_name}' "
+                    f"SmartDecisionMakerBlock node '{customized_name}' "
                    f"({node_id}) has agent_mode_max_iterations="
                    f"{max_iter} which is unusually high. Values above "
                    f"100 risk excessive cost and long execution times. "
@@ -890,7 +890,7 @@ class AgentValidator:
                valid = False
            elif max_iter == 0:
                self.add_error(
-                    f"OrchestratorBlock node '{customized_name}' "
+                    f"SmartDecisionMakerBlock node '{customized_name}' "
                    f"({node_id}) has agent_mode_max_iterations=0 "
                    f"(traditional mode). The agent generator only supports "
                    f"agent mode (set to -1 for infinite or a positive "
@@ -908,7 +908,7 @@ class AgentValidator:

            if not has_tools:
                self.add_error(
-                    f"OrchestratorBlock node '{customized_name}' "
+                    f"SmartDecisionMakerBlock node '{customized_name}' "
                    f"({node_id}) has no downstream tool blocks connected. "
                    f"Connect at least one block to its 'tools' output so "
                    f"the AI has tools to call."
@@ -1025,8 +1025,8 @@ class AgentValidator:
                self.validate_mcp_tool_blocks(agent),
            ),
            (
-                "Orchestrator blocks",
-                self.validate_orchestrator_blocks(agent, node_lookup),
+                "SmartDecisionMaker blocks",
+                self.validate_smart_decision_maker_blocks(agent, node_lookup),
            ),
        ]

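Review note: the `agent_mode_max_iterations` checks in `validate_smart_decision_maker_blocks` form an ordered chain (non-integer, below -1, above 100, exactly 0). The sketch below restates that chain as a standalone function so the accepted range is easy to see. The function name `check_max_iterations` and the shortened messages are hypothetical; the conditions and their order follow the hunks above.

```python
from typing import Optional


def check_max_iterations(max_iter: object) -> Optional[str]:
    """Return an error message for an invalid value, or None if it is accepted."""
    if max_iter is None:
        return None  # unset is not an error in this check
    if not isinstance(max_iter, int):
        return "agent_mode_max_iterations must be an integer"
    # Note: bool is a subclass of int in Python, so True/False pass the check above.
    if max_iter < -1:
        return "use -1 for infinite or a positive number"
    if max_iter > 100:
        return "values above 100 risk excessive cost and long execution times"
    if max_iter == 0:
        return "0 is traditional mode; the agent generator only supports agent mode"
    return None


accepted = [v for v in (-1, 1, 10, 100) if check_max_iterations(v) is None]
rejected = [v for v in ("10", -2, 101, 0) if check_max_iterations(v) is not None]
```

Under these rules the accepted values are `-1` and the integers from 1 to 100 inclusive.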
--- a/autogpt_platform/backend/backend/copilot/tools/conftest.py
+++ b/autogpt_platform/backend/backend/copilot/tools/conftest.py
@@ -1,20 +0,0 @@
-"""Local conftest for copilot/tools tests.
-
-Overrides the session-scoped `server` and `graph_cleanup` autouse fixtures from
-backend/conftest.py so that integration tests in this directory do not trigger
-the full SpinTestServer startup (which requires Postgres + RabbitMQ).
-"""
-
-import pytest_asyncio
-
-
-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def server():  # type: ignore[override]
-    """No-op server stub — tools tests don't need the full backend."""
-    return None
-
-
-@pytest_asyncio.fixture(scope="session", loop_scope="session", autouse=True)
-async def graph_cleanup():  # type: ignore[override]
-    """No-op graph cleanup stub."""
-    yield
--- a/autogpt_platform/backend/backend/copilot/tools/find_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_block.py
@@ -5,7 +5,6 @@ from prisma.enums import ContentType

 from backend.blocks import get_block
 from backend.blocks._base import BlockType
-from backend.copilot.context import get_current_permissions
 from backend.copilot.model import ChatSession
 from backend.data.db_accessors import search

@@ -39,7 +38,7 @@ COPILOT_EXCLUDED_BLOCK_TYPES = {

 # Specific block IDs excluded from CoPilot (STANDARD type but still require graph context)
 COPILOT_EXCLUDED_BLOCK_IDS = {
-    # OrchestratorBlock - dynamically discovers downstream blocks via graph topology;
+    # SmartDecisionMakerBlock - dynamically discovers downstream blocks via graph topology;
    # usable in agent graphs (guide hardcodes its ID) but cannot run standalone.
    "3b191d9f-356f-482d-8238-ba04b6d18381",
 }
@@ -150,19 +149,6 @@ class FindBlockTool(BaseTool):
                            session_id=session_id,
                        )

-                    # Check block-level permissions — hide denied blocks entirely
-                    perms = get_current_permissions()
-                    if perms is not None and not perms.is_block_allowed(
-                        block.id, block.name
-                    ):
-                        return NoResultsResponse(
-                            message=f"No blocks found for '{query}'",
-                            suggestions=[
-                                "Search for an alternative block by name",
-                            ],
-                            session_id=session_id,
-                        )
-
                    summary = BlockInfoSummary(
                        id=block.id,
                        name=block.name,
@@ -209,7 +195,6 @@ class FindBlockTool(BaseTool):
                )

            # Enrich results with block information
-            perms = get_current_permissions()
            blocks: list[BlockInfoSummary] = []
            for result in results:
                block_id = result["content_id"]
@@ -226,12 +211,6 @@ class FindBlockTool(BaseTool):
                ):
                    continue

-                # Skip blocks denied by execution permissions
-                if perms is not None and not perms.is_block_allowed(
-                    block.id, block.name
-                ):
-                    continue
-
                summary = BlockInfoSummary(
                    id=block_id,
                    name=block.name,
--- a/autogpt_platform/backend/backend/copilot/tools/find_block_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_block_test.py
@@ -69,8 +69,8 @@ class TestFindBlockFiltering:
        assert BlockType.HUMAN_IN_THE_LOOP in COPILOT_EXCLUDED_BLOCK_TYPES
        assert BlockType.AGENT in COPILOT_EXCLUDED_BLOCK_TYPES

-    def test_excluded_block_ids_contains_orchestrator(self):
-        """Verify OrchestratorBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
+    def test_excluded_block_ids_contains_smart_decision_maker(self):
+        """Verify SmartDecisionMakerBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
        assert "3b191d9f-356f-482d-8238-ba04b6d18381" in COPILOT_EXCLUDED_BLOCK_IDS

    @pytest.mark.asyncio(loop_scope="session")
@@ -120,18 +120,18 @@ class TestFindBlockFiltering:

    @pytest.mark.asyncio(loop_scope="session")
    async def test_excluded_block_id_filtered_from_results(self):
-        """Verify OrchestratorBlock is filtered from search results."""
+        """Verify SmartDecisionMakerBlock is filtered from search results."""
        session = make_session(user_id=_TEST_USER_ID)

-        orchestrator_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
+        smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
        search_results = [
-            {"content_id": orchestrator_id, "score": 0.9},
+            {"content_id": smart_decision_id, "score": 0.9},
            {"content_id": "normal-block-id", "score": 0.8},
        ]

-        # OrchestratorBlock has STANDARD type but is excluded by ID
+        # SmartDecisionMakerBlock has STANDARD type but is excluded by ID
        smart_block = make_mock_block(
-            orchestrator_id, "Orchestrator", BlockType.STANDARD
+            smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
        )
        normal_block = make_mock_block(
            "normal-block-id", "Normal Block", BlockType.STANDARD
@@ -139,7 +139,7 @@ class TestFindBlockFiltering:

        def mock_get_block(block_id):
            return {
-                orchestrator_id: smart_block,
+                smart_decision_id: smart_block,
                "normal-block-id": normal_block,
            }.get(block_id)

@@ -161,7 +161,7 @@ class TestFindBlockFiltering:
                    user_id=_TEST_USER_ID, session=session, query="decision"
                )

-        # Should only return normal block, not OrchestratorBlock
+        # Should only return normal block, not SmartDecisionMakerBlock
        assert isinstance(response, BlockListResponse)
        assert len(response.blocks) == 1
        assert response.blocks[0].id == "normal-block-id"
@@ -601,8 +601,10 @@ class TestFindBlockDirectLookup:
    async def test_uuid_lookup_excluded_block_id(self):
        """UUID matching an excluded block ID returns NoResultsResponse."""
        session = make_session(user_id=_TEST_USER_ID)
-        orchestrator_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
-        block = make_mock_block(orchestrator_id, "Orchestrator", BlockType.STANDARD)
+        smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
+        block = make_mock_block(
+            smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
+        )

        with patch(
            "backend.copilot.tools.find_block.get_block",
@@ -610,7 +612,7 @@ class TestFindBlockDirectLookup:
        ):
            tool = FindBlockTool()
            response = await tool._execute(
-                user_id=_TEST_USER_ID, session=session, query=orchestrator_id
+                user_id=_TEST_USER_ID, session=session, query=smart_decision_id
            )

        from .models import NoResultsResponse
--- a/autogpt_platform/backend/backend/copilot/tools/helpers.py
+++ b/autogpt_platform/backend/backend/copilot/tools/helpers.py
@@ -1,24 +1,15 @@
 """Shared helpers for chat tools."""

 import logging
-import uuid
 from collections import defaultdict
-from dataclasses import dataclass
 from typing import Any

 from pydantic_core import PydanticUndefined

-from backend.blocks import BlockType, get_block
 from backend.blocks._base import AnyBlockSchema
-from backend.copilot.constants import (
-    COPILOT_NODE_EXEC_ID_SEPARATOR,
-    COPILOT_NODE_PREFIX,
-    COPILOT_SESSION_PREFIX,
-)
-from backend.copilot.model import ChatSession
-from backend.copilot.sdk.file_ref import FileRefExpansionError, expand_file_refs_in_args
+from backend.copilot.constants import COPILOT_NODE_PREFIX, COPILOT_SESSION_PREFIX
 from backend.data.credit import UsageTransactionMetadata
-from backend.data.db_accessors import credit_db, review_db, workspace_db
+from backend.data.db_accessors import credit_db, workspace_db
 from backend.data.execution import ExecutionContext
 from backend.data.model import CredentialsFieldInfo, CredentialsMetaInput
 from backend.executor.utils import block_usage_cost
@@ -26,20 +17,8 @@ from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.util.exceptions import BlockError, InsufficientBalanceError
 from backend.util.type import coerce_inputs_to_schema

-from .models import (
-    BlockOutputResponse,
-    ErrorResponse,
-    InputValidationErrorResponse,
-    ReviewRequiredResponse,
-    SetupInfo,
-    SetupRequirementsResponse,
-    ToolResponseBase,
-    UserReadiness,
-)
-from .utils import (
-    build_missing_credentials_from_field_info,
-    match_credentials_to_requirements,
-)
+from .models import BlockOutputResponse, ErrorResponse, ToolResponseBase
+from .utils import match_credentials_to_requirements

 logger = logging.getLogger(__name__)

@@ -252,286 +231,6 @@ async def resolve_block_credentials(
    return await match_credentials_to_requirements(user_id, requirements)


-@dataclass
-class BlockPreparation:
-    """Result of successful block validation, ready for execution or task creation.
-
-    Attributes:
-        block: The resolved block instance (schema definition + execute method).
-        block_id: UUID of the block being prepared.
-        input_data: User-supplied input values after file-ref expansion.
-        matched_credentials: Credential field name -> resolved credential metadata.
-        input_schema: JSON Schema for the block's input, with credential
-            discriminators resolved for the user's available providers.
-        credentials_fields: Set of field names in the schema that are credential
-            inputs (e.g. ``{"credentials", "api_key"}``).
-        required_non_credential_keys: Schema-required fields minus credential
-            fields — the fields the user must supply directly.
-        provided_input_keys: Keys the user actually provided in ``input_data``.
-        synthetic_graph_id: Auto-generated graph UUID used for CoPilot
-            single-block executions (no real graph exists in the DB).
-        synthetic_node_id: Auto-generated node UUID paired with
-            ``synthetic_graph_id`` to form the execution context for the block.
-    """
-
-    block: AnyBlockSchema
-    block_id: str
-    input_data: dict[str, Any]
-    matched_credentials: dict[str, CredentialsMetaInput]
-    input_schema: dict[str, Any]
-    credentials_fields: set[str]
-    required_non_credential_keys: set[str]
-    provided_input_keys: set[str]
-    synthetic_graph_id: str
-    synthetic_node_id: str
-
-
-async def prepare_block_for_execution(
-    block_id: str,
-    input_data: dict[str, Any],
-    user_id: str,
-    session: ChatSession,
-    session_id: str,
-) -> "BlockPreparation | ToolResponseBase":
-    """Validate and prepare a block for execution.
-
-    Performs: block lookup, disabled/excluded-type checks, credential resolution,
-    input schema generation, file-ref expansion, missing-credentials check, and
-    unrecognized-field validation.
-
-    Does NOT check for missing required fields (tools differ: run_block shows a
-    schema preview) and does NOT run the HITL review check (use check_hitl_review
-    separately).
-
-    Args:
-        block_id: Block UUID to prepare.
-        input_data: Input values provided by the caller.
-        user_id: Authenticated user ID.
-        session: Current chat session (needed for file-ref expansion).
-        session_id: Chat session ID (used in error responses).
-
-    Returns:
-        BlockPreparation on success, or a ToolResponseBase error/setup response.
-    """
-    # Lazy import: find_block imports from .base and .models (siblings), not
-    # from helpers — no actual circular dependency exists today.  Kept lazy as a
-    # precaution since find_block is the block-registry module and future changes
-    # could introduce a cycle.
-    from .find_block import COPILOT_EXCLUDED_BLOCK_IDS, COPILOT_EXCLUDED_BLOCK_TYPES
-
-    block = get_block(block_id)
-    if not block:
-        return ErrorResponse(
-            message=f"Block '{block_id}' not found", session_id=session_id
-        )
-    if block.disabled:
-        return ErrorResponse(
-            message=f"Block '{block_id}' is disabled", session_id=session_id
-        )
-
-    if (
-        block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
-        or block.id in COPILOT_EXCLUDED_BLOCK_IDS
-    ):
-        if block.block_type == BlockType.MCP_TOOL:
-            hint = (
-                " Use the `run_mcp_tool` tool instead — it handles "
-                "MCP server discovery, authentication, and execution."
-            )
-        elif block.block_type == BlockType.AGENT:
-            hint = " Use the `run_agent` tool instead."
-        else:
-            hint = " This block is designed for use within graphs only."
-        return ErrorResponse(
-            message=f"Block '{block.name}' cannot be run directly.{hint}",
-            session_id=session_id,
-        )
-
-    matched_credentials, missing_credentials = await resolve_block_credentials(
-        user_id, block, input_data
-    )
-
-    try:
-        input_schema: dict[str, Any] = block.input_schema.jsonschema()
-    except Exception as e:
-        logger.warning("Failed to generate input schema for block %s: %s", block_id, e)
-        return ErrorResponse(
-            message=f"Block '{block.name}' has an invalid input schema",
-            error=str(e),
-            session_id=session_id,
-        )
-
-    # Expand @@agptfile: refs using the block's input schema so string/list
-    # fields get the correct deserialization.
-    if input_data:
-        try:
-            input_data = await expand_file_refs_in_args(
-                input_data, user_id, session, input_schema=input_schema
-            )
-        except FileRefExpansionError as exc:
-            return ErrorResponse(
-                message=(
-                    f"Failed to resolve file reference: {exc}. "
-                    "Ensure the file exists before referencing it."
-                ),
-                session_id=session_id,
-            )
-
-    credentials_fields = set(block.input_schema.get_credentials_fields().keys())
-
-    if missing_credentials:
-        credentials_fields_info = _resolve_discriminated_credentials(block, input_data)
-        missing_creds_dict = build_missing_credentials_from_field_info(
-            credentials_fields_info, set(matched_credentials.keys())
-        )
-        missing_creds_list = list(missing_creds_dict.values())
-        return SetupRequirementsResponse(
-            message=(
-                f"Block '{block.name}' requires credentials that are not configured. "
-                "Please set up the required credentials before running this block."
-            ),
-            session_id=session_id,
-            setup_info=SetupInfo(
-                agent_id=block_id,
-                agent_name=block.name,
-                user_readiness=UserReadiness(
-                    has_all_credentials=False,
-                    missing_credentials=missing_creds_dict,
-                    ready_to_run=False,
-                ),
-                requirements={
-                    "credentials": missing_creds_list,
-                    "inputs": get_inputs_from_schema(
-                        input_schema, exclude_fields=credentials_fields
-                    ),
-                    "execution_modes": ["immediate"],
-                },
-            ),
-            graph_id=None,
-            graph_version=None,
-        )
-    required_keys = set(input_schema.get("required", []))
-    required_non_credential_keys = required_keys - credentials_fields
-    provided_input_keys = set(input_data.keys()) - credentials_fields
-
-    valid_fields = set(input_schema.get("properties", {}).keys()) - credentials_fields
-    unrecognized_fields = provided_input_keys - valid_fields
-    if unrecognized_fields:
-        return InputValidationErrorResponse(
-            message=(
-                f"Unknown input field(s) provided: {', '.join(sorted(unrecognized_fields))}. "
-                "Block was not executed. Please use the correct field names from the schema."
-            ),
-            session_id=session_id,
-            unrecognized_fields=sorted(unrecognized_fields),
-            inputs=input_schema,
-        )
-
-    synthetic_graph_id = f"{COPILOT_SESSION_PREFIX}{session_id}"
-    synthetic_node_id = f"{COPILOT_NODE_PREFIX}{block_id}"
-
-    return BlockPreparation(
-        block=block,
-        block_id=block_id,
-        input_data=input_data,
-        matched_credentials=matched_credentials,
-        input_schema=input_schema,
-        credentials_fields=credentials_fields,
-        required_non_credential_keys=required_non_credential_keys,
-        provided_input_keys=provided_input_keys,
-        synthetic_graph_id=synthetic_graph_id,
-        synthetic_node_id=synthetic_node_id,
-    )
-
-
-async def check_hitl_review(
-    prep: BlockPreparation,
-    user_id: str,
-    session_id: str,
-) -> "tuple[str, dict[str, Any]] | ToolResponseBase":
-    """Check for an existing or new HITL review requirement.
-
-    If a review is needed, stores the review record and returns a
-    ReviewRequiredResponse.  Otherwise returns
-    ``(synthetic_node_exec_id, input_data)`` ready for execute_block.
-    """
-    block = prep.block
-    block_id = prep.block_id
-    synthetic_graph_id = prep.synthetic_graph_id
-    synthetic_node_id = prep.synthetic_node_id
-    input_data = prep.input_data
-
-    # Reuse an existing WAITING review for identical input (LLM retry guard)
-    existing_reviews = await review_db().get_pending_reviews_for_execution(
-        synthetic_graph_id, user_id
-    )
-    existing_review = next(
-        (
-            r
-            for r in existing_reviews
-            if r.node_id == synthetic_node_id
-            and r.status.value == "WAITING"
-            and r.payload == input_data
-        ),
-        None,
-    )
-    if existing_review:
-        return ReviewRequiredResponse(
-            message=(
-                f"Block '{block.name}' requires human review. "
-                f"After the user approves, call continue_run_block with "
-                f"review_id='{existing_review.node_exec_id}' to execute."
-            ),
-            session_id=session_id,
-            block_id=block_id,
-            block_name=block.name,
-            review_id=existing_review.node_exec_id,
-            graph_exec_id=synthetic_graph_id,
-            input_data=input_data,
-        )
-
-    synthetic_node_exec_id = (
-        f"{synthetic_node_id}{COPILOT_NODE_EXEC_ID_SEPARATOR}" f"{uuid.uuid4().hex[:8]}"
-    )
-
-    review_context = ExecutionContext(
-        user_id=user_id,
-        graph_id=synthetic_graph_id,
-        graph_exec_id=synthetic_graph_id,
-        graph_version=1,
-        node_id=synthetic_node_id,
-        node_exec_id=synthetic_node_exec_id,
-        sensitive_action_safe_mode=True,
-    )
-    should_pause, input_data = await block.is_block_exec_need_review(
-        input_data,
-        user_id=user_id,
-        node_id=synthetic_node_id,
-        node_exec_id=synthetic_node_exec_id,
-        graph_exec_id=synthetic_graph_id,
-        graph_id=synthetic_graph_id,
-        graph_version=1,
-        execution_context=review_context,
-        is_graph_execution=False,
-    )
-    if should_pause:
-        return ReviewRequiredResponse(
-            message=(
-                f"Block '{block.name}' requires human review. "
-                f"After the user approves, call continue_run_block with "
-                f"review_id='{synthetic_node_exec_id}' to execute."
-            ),
-            session_id=session_id,
-            block_id=block_id,
-            block_name=block.name,
-            review_id=synthetic_node_exec_id,
-            graph_exec_id=synthetic_graph_id,
-            input_data=input_data,
-        )
-
-    return synthetic_node_exec_id, input_data
-
-
 def _resolve_discriminated_credentials(
     block: AnyBlockSchema,
     input_data: dict[str, Any],
--- a/autogpt_platform/backend/backend/copilot/tools/helpers_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/helpers_test.py
@@ -1,4 +1,4 @@
-"""Tests for execute_block, prepare_block_for_execution, and check_hitl_review."""
+"""Tests for execute_block — credit charging and type coercion."""

 from collections.abc import AsyncIterator
 from typing import Any
@@ -7,20 +7,8 @@ from unittest.mock import AsyncMock, MagicMock, patch
 import pytest

 from backend.blocks._base import BlockType
-from backend.copilot.constants import COPILOT_NODE_PREFIX, COPILOT_SESSION_PREFIX
-from backend.copilot.tools.helpers import (
-    BlockPreparation,
-    check_hitl_review,
-    execute_block,
-    prepare_block_for_execution,
-)
-from backend.copilot.tools.models import (
-    BlockOutputResponse,
-    ErrorResponse,
-    InputValidationErrorResponse,
-    ReviewRequiredResponse,
-    SetupRequirementsResponse,
-)
+from backend.copilot.tools.helpers import execute_block
+from backend.copilot.tools.models import BlockOutputResponse, ErrorResponse

 _USER = "test-user-helpers"
 _SESSION = "test-session-helpers"
@@ -522,341 +510,3 @@ async def test_coerce_inner_elements_of_generic():
     # Inner elements should be coerced from int to str
     assert block._captured_inputs["values"] == ["1", "2", "3"]
     assert all(isinstance(v, str) for v in block._captured_inputs["values"])
-
-
-# ---------------------------------------------------------------------------
-# prepare_block_for_execution tests
-# ---------------------------------------------------------------------------
-
-_PREP_USER = "prep-user"
-_PREP_SESSION = "prep-session"
-
-
-def _make_prep_session(session_id: str = _PREP_SESSION) -> MagicMock:
-    session = MagicMock()
-    session.session_id = session_id
-    return session
-
-
-def _make_simple_block(
-    block_id: str = "blk-1",
-    name: str = "Simple Block",
-    disabled: bool = False,
-    required: list[str] | None = None,
-    properties: dict[str, Any] | None = None,
-) -> MagicMock:
-    block = MagicMock()
-    block.id = block_id
-    block.name = name
-    block.disabled = disabled
-    block.description = ""
-    block.block_type = MagicMock()
-
-    schema = {
-        "type": "object",
-        "properties": properties or {"text": {"type": "string"}},
-        "required": required or [],
-    }
-    block.input_schema.jsonschema.return_value = schema
-    block.input_schema.get_credentials_fields.return_value = {}
-    block.input_schema.get_credentials_fields_info.return_value = {}
-    return block
-
-
-def _patch_excluded(block_ids: set | None = None, block_types: set | None = None):
-    return (
-        patch(
-            "backend.copilot.tools.find_block.COPILOT_EXCLUDED_BLOCK_IDS",
-            new=block_ids or set(),
-            create=True,
-        ),
-        patch(
-            "backend.copilot.tools.find_block.COPILOT_EXCLUDED_BLOCK_TYPES",
-            new=block_types or set(),
-            create=True,
-        ),
-    )
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_not_found() -> None:
-    excl_ids, excl_types = _patch_excluded()
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=None),
-        excl_ids,
-        excl_types,
-    ):
-        result = await prepare_block_for_execution(
-            block_id="missing",
-            input_data={},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, ErrorResponse)
-    assert "not found" in result.message
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_disabled() -> None:
-    block = _make_simple_block(disabled=True)
-    excl_ids, excl_types = _patch_excluded()
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-1",
-            input_data={},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, ErrorResponse)
-    assert "disabled" in result.message
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_unrecognized_fields() -> None:
-    block = _make_simple_block(properties={"text": {"type": "string"}})
-    excl_ids, excl_types = _patch_excluded()
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-        patch(
-            "backend.copilot.tools.helpers.resolve_block_credentials",
-            AsyncMock(return_value=({}, [])),
-        ),
-        patch(
-            "backend.copilot.tools.helpers.expand_file_refs_in_args",
-            AsyncMock(side_effect=lambda d, *a, **kw: d),
-        ),
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-1",
-            input_data={"text": "hi", "unknown_field": "oops"},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, InputValidationErrorResponse)
-    assert "unknown_field" in result.unrecognized_fields
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_missing_credentials() -> None:
-    block = _make_simple_block()
-    mock_cred = MagicMock()
-    excl_ids, excl_types = _patch_excluded()
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-        patch(
-            "backend.copilot.tools.helpers.resolve_block_credentials",
-            AsyncMock(return_value=({}, [mock_cred])),
-        ),
-        patch(
-            "backend.copilot.tools.helpers.build_missing_credentials_from_field_info",
-            return_value={"cred_key": mock_cred},
-        ),
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-1",
-            input_data={},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, SetupRequirementsResponse)
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_success_returns_preparation() -> None:
-    block = _make_simple_block(
-        required=["text"], properties={"text": {"type": "string"}}
-    )
-    excl_ids, excl_types = _patch_excluded()
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-        patch(
-            "backend.copilot.tools.helpers.resolve_block_credentials",
-            AsyncMock(return_value=({}, [])),
-        ),
-        patch(
-            "backend.copilot.tools.helpers.expand_file_refs_in_args",
-            AsyncMock(side_effect=lambda d, *a, **kw: d),
-        ),
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-1",
-            input_data={"text": "hello"},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, BlockPreparation)
-    assert result.required_non_credential_keys == {"text"}
-    assert result.provided_input_keys == {"text"}
-
-
-# ---------------------------------------------------------------------------
-# check_hitl_review tests
-# ---------------------------------------------------------------------------
-
-
-def _make_hitl_prep(
-    block_id: str = "blk-hitl",
-    input_data: dict | None = None,
-    session_id: str = "hitl-sess",
-    needs_review: bool = False,
-) -> BlockPreparation:
-    block = MagicMock()
-    block.id = block_id
-    block.name = "HITL Block"
-    data = input_data if input_data is not None else {"action": "delete"}
-    block.is_block_exec_need_review = AsyncMock(return_value=(needs_review, data))
-    return BlockPreparation(
-        block=block,
-        block_id=block_id,
-        input_data=data,
-        matched_credentials={},
-        input_schema={},
-        credentials_fields=set(),
-        required_non_credential_keys=set(),
-        provided_input_keys=set(),
-        synthetic_graph_id=f"{COPILOT_SESSION_PREFIX}{session_id}",
-        synthetic_node_id=f"{COPILOT_NODE_PREFIX}{block_id}",
-    )
-
-
-@pytest.mark.asyncio
-async def test_check_hitl_no_review_needed() -> None:
-    prep = _make_hitl_prep(input_data={"action": "read"}, needs_review=False)
-    mock_rdb = MagicMock()
-    mock_rdb.get_pending_reviews_for_execution = AsyncMock(return_value=[])
-
-    with patch("backend.copilot.tools.helpers.review_db", return_value=mock_rdb):
-        result = await check_hitl_review(prep, "user1", "hitl-sess")
-
-    assert isinstance(result, tuple)
-    node_exec_id, returned_data = result
-    assert node_exec_id.startswith(f"{COPILOT_NODE_PREFIX}blk-hitl")
-    assert returned_data == {"action": "read"}
-
-
-@pytest.mark.asyncio
-async def test_check_hitl_review_required() -> None:
-    prep = _make_hitl_prep(input_data={"action": "delete"}, needs_review=True)
-    mock_rdb = MagicMock()
-    mock_rdb.get_pending_reviews_for_execution = AsyncMock(return_value=[])
-
-    with patch("backend.copilot.tools.helpers.review_db", return_value=mock_rdb):
-        result = await check_hitl_review(prep, "user1", "hitl-sess")
-
-    assert isinstance(result, ReviewRequiredResponse)
-    assert result.block_id == "blk-hitl"
-
-
-@pytest.mark.asyncio
-async def test_check_hitl_reuses_existing_waiting_review() -> None:
-    prep = _make_hitl_prep(input_data={"action": "delete"}, needs_review=False)
-
-    existing = MagicMock()
-    existing.node_id = prep.synthetic_node_id
-    existing.status.value = "WAITING"
-    existing.payload = {"action": "delete"}
-    existing.node_exec_id = "existing-review-42"
-
-    mock_rdb = MagicMock()
-    mock_rdb.get_pending_reviews_for_execution = AsyncMock(return_value=[existing])
-
-    with patch("backend.copilot.tools.helpers.review_db", return_value=mock_rdb):
-        result = await check_hitl_review(prep, "user1", "hitl-sess")
-
-    assert isinstance(result, ReviewRequiredResponse)
-    assert result.review_id == "existing-review-42"
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_excluded_by_type() -> None:
-    """prepare_block_for_execution returns ErrorResponse for excluded block types."""
-    from backend.blocks import BlockType
-
-    block = _make_simple_block()
-    block.block_type = BlockType.AGENT
-
-    excl_ids, excl_types = _patch_excluded(block_types={BlockType.AGENT})
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-agent",
-            input_data={},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, ErrorResponse)
-    assert "cannot be run directly" in result.message
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_excluded_by_id() -> None:
-    """prepare_block_for_execution returns ErrorResponse for excluded block IDs."""
-    block = _make_simple_block(block_id="blk-excluded")
-
-    excl_ids, excl_types = _patch_excluded(block_ids={"blk-excluded"})
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-excluded",
-            input_data={},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, ErrorResponse)
-    assert "cannot be run directly" in result.message
-
-
-@pytest.mark.asyncio
-async def test_prepare_block_file_ref_expansion_error() -> None:
-    """prepare_block_for_execution returns ErrorResponse when file-ref expansion fails."""
-    from backend.copilot.sdk.file_ref import FileRefExpansionError
-
-    block = _make_simple_block(properties={"text": {"type": "string"}})
-    excl_ids, excl_types = _patch_excluded()
-    with (
-        patch("backend.copilot.tools.helpers.get_block", return_value=block),
-        excl_ids,
-        excl_types,
-        patch(
-            "backend.copilot.tools.helpers.resolve_block_credentials",
-            AsyncMock(return_value=({}, [])),
-        ),
-        patch(
-            "backend.copilot.tools.helpers.expand_file_refs_in_args",
-            AsyncMock(
-                side_effect=FileRefExpansionError("@@agptfile:missing.txt not found")
-            ),
-        ),
-    ):
-        result = await prepare_block_for_execution(
-            block_id="blk-1",
-            input_data={"text": "@@agptfile:missing.txt"},
-            user_id=_PREP_USER,
-            session=_make_prep_session(),
-            session_id=_PREP_SESSION,
-        )
-    assert isinstance(result, ErrorResponse)
-    assert "file reference" in result.message.lower()
--- a/autogpt_platform/backend/backend/copilot/tools/run_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_block.py
@@ -1,19 +1,36 @@
 """Tool for executing blocks directly."""

 import logging
+import uuid
 from typing import Any

-from backend.copilot.context import get_current_permissions
+from backend.blocks import BlockType, get_block
+from backend.blocks._base import AnyBlockSchema
+from backend.copilot.constants import (
+    COPILOT_NODE_EXEC_ID_SEPARATOR,
+    COPILOT_NODE_PREFIX,
+    COPILOT_SESSION_PREFIX,
+)
 from backend.copilot.model import ChatSession
+from backend.copilot.sdk.file_ref import FileRefExpansionError, expand_file_refs_in_args
+from backend.data.db_accessors import review_db
+from backend.data.execution import ExecutionContext

 from .base import BaseTool
-from .helpers import (
-    BlockPreparation,
-    check_hitl_review,
-    execute_block,
-    prepare_block_for_execution,
+from .find_block import COPILOT_EXCLUDED_BLOCK_IDS, COPILOT_EXCLUDED_BLOCK_TYPES
+from .helpers import execute_block, get_inputs_from_schema, resolve_block_credentials
+from .models import (
+    BlockDetails,
+    BlockDetailsResponse,
+    ErrorResponse,
+    InputValidationErrorResponse,
+    ReviewRequiredResponse,
+    SetupInfo,
+    SetupRequirementsResponse,
+    ToolResponseBase,
+    UserReadiness,
 )
-from .models import BlockDetails, BlockDetailsResponse, ErrorResponse, ToolResponseBase
+from .utils import build_missing_credentials_from_field_info

 logger = logging.getLogger(__name__)

@@ -96,85 +113,267 @@ class RunBlockTool(BaseTool):
                 session_id=session_id,
             )

-        logger.info("Preparing block %s for user %s", block_id, user_id)
-
-        prep_or_err = await prepare_block_for_execution(
-            block_id=block_id,
-            input_data=input_data,
-            user_id=user_id,
-            session=session,
-            session_id=session_id,
-        )
-        if isinstance(prep_or_err, ToolResponseBase):
-            return prep_or_err
-        prep: BlockPreparation = prep_or_err
-
-        # Check block-level permissions before execution.
-        perms = get_current_permissions()
-        if perms is not None and not perms.is_block_allowed(block_id, prep.block.name):
-            available_hint = (
-                f"Allowed identifiers: {perms.blocks!r}. "
-                if not perms.blocks_exclude and perms.blocks
-                else (
-                    f"Blocked identifiers: {perms.blocks!r}. "
-                    if perms.blocks_exclude and perms.blocks
-                    else ""
-                )
-            )
+        # Get the block
+        block = get_block(block_id)
+        if not block:
             return ErrorResponse(
-                message=(
-                    f"Block '{prep.block.name}' ({block_id}) is not permitted "
-                    f"by the current execution permissions. {available_hint}"
-                    "Use find_block to discover blocks that are allowed."
-                ),
+                message=f"Block '{block_id}' not found",
+                session_id=session_id,
+            )
+        if block.disabled:
+            return ErrorResponse(
+                message=f"Block '{block_id}' is disabled",
                 session_id=session_id,
             )

-        # Show block details when required inputs are not yet provided.
-        # This is run_block's two-step UX: first call returns the schema,
-        # second call (with inputs) actually executes.
-        if not (prep.required_non_credential_keys <= prep.provided_input_keys):
-            try:
-                output_schema: dict[str, Any] = prep.block.output_schema.jsonschema()
-            except Exception as e:
-                logger.warning(
-                    "Failed to generate output schema for block %s: %s", block_id, e
+        # Check if block is excluded from CoPilot (graph-only blocks)
+        if (
+            block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
+            or block.id in COPILOT_EXCLUDED_BLOCK_IDS
+        ):
+            # Provide actionable guidance for blocks with dedicated tools
+            if block.block_type == BlockType.MCP_TOOL:
+                hint = (
+                    " Use the `run_mcp_tool` tool instead — it handles "
+                    "MCP server discovery, authentication, and execution."
                 )
+            elif block.block_type == BlockType.AGENT:
+                hint = " Use the `run_agent` tool instead."
+            else:
+                hint = " This block is designed for use within graphs only."
+            return ErrorResponse(
+                message=f"Block '{block.name}' cannot be run directly.{hint}",
+                session_id=session_id,
+            )
+
+        logger.info(f"Executing block {block.name} ({block_id}) for user {user_id}")
+
+        (
+            matched_credentials,
+            missing_credentials,
+        ) = await resolve_block_credentials(user_id, block, input_data)
+
+        # Get block schemas for details/validation
+        try:
+            input_schema: dict[str, Any] = block.input_schema.jsonschema()
+        except Exception as e:
+            logger.warning(
+                "Failed to generate input schema for block %s: %s",
+                block_id,
+                e,
+            )
+            return ErrorResponse(
+                message=f"Block '{block.name}' has an invalid input schema",
+                error=str(e),
+                session_id=session_id,
+            )
+        try:
+            output_schema: dict[str, Any] = block.output_schema.jsonschema()
+        except Exception as e:
+            logger.warning(
+                "Failed to generate output schema for block %s: %s",
+                block_id,
+                e,
+            )
+            return ErrorResponse(
+                message=f"Block '{block.name}' has an invalid output schema",
+                error=str(e),
+                session_id=session_id,
+            )
+
+        # Expand @@agptfile: refs in input_data with the block's input
+        # schema.  The generic _truncating wrapper skips opaque object
+        # properties (input_data has no declared inner properties in the
+        # tool schema), so file ref tokens are still intact here.
+        # Using the block's schema lets us return raw text for string-typed
+        # fields and parsed structures for list/dict-typed fields.
+        if input_data:
+            try:
+                input_data = await expand_file_refs_in_args(
+                    input_data,
+                    user_id,
+                    session,
+                    input_schema=input_schema,
+                )
+            except FileRefExpansionError as exc:
                 return ErrorResponse(
-                    message=f"Block '{prep.block.name}' has an invalid output schema",
-                    error=str(e),
+                    message=(
+                        f"Failed to resolve file reference: {exc}. "
+                        "Ensure the file exists before referencing it."
+                    ),
                     session_id=session_id,
                 )

-            credentials_meta = list(prep.matched_credentials.values())
+        if missing_credentials:
+            # Return setup requirements response with missing credentials
+            credentials_fields_info = block.input_schema.get_credentials_fields_info()
+            missing_creds_dict = build_missing_credentials_from_field_info(
+                credentials_fields_info, set(matched_credentials.keys())
+            )
+            missing_creds_list = list(missing_creds_dict.values())
+
+            return SetupRequirementsResponse(
+                message=(
+                    f"Block '{block.name}' requires credentials that are not configured. "
+                    "Please set up the required credentials before running this block."
+                ),
+                session_id=session_id,
+                setup_info=SetupInfo(
+                    agent_id=block_id,
+                    agent_name=block.name,
+                    user_readiness=UserReadiness(
+                        has_all_credentials=False,
+                        missing_credentials=missing_creds_dict,
+                        ready_to_run=False,
+                    ),
+                    requirements={
+                        "credentials": missing_creds_list,
+                        "inputs": self._get_inputs_list(block),
+                        "execution_modes": ["immediate"],
+                    },
+                ),
+                graph_id=None,
+                graph_version=None,
+            )
+
+        # Check if this is a first attempt (required inputs missing)
+        # Return block details so user can see what inputs are needed
+        credentials_fields = set(block.input_schema.get_credentials_fields().keys())
+        required_keys = set(input_schema.get("required", []))
+        required_non_credential_keys = required_keys - credentials_fields
+        provided_input_keys = set(input_data.keys()) - credentials_fields
+
+        # Check for unknown input fields
+        valid_fields = (
+            set(input_schema.get("properties", {}).keys()) - credentials_fields
+        )
+        unrecognized_fields = provided_input_keys - valid_fields
+        if unrecognized_fields:
+            return InputValidationErrorResponse(
+                message=(
+                    f"Unknown input field(s) provided: {', '.join(sorted(unrecognized_fields))}. "
+                    f"Block was not executed. Please use the correct field names from the schema."
+                ),
+                session_id=session_id,
+                unrecognized_fields=sorted(unrecognized_fields),
+                inputs=input_schema,
+            )
+
+        # Show details when not all required non-credential inputs are provided
+        if not (required_non_credential_keys <= provided_input_keys):
+            # Get credentials info for the response
+            credentials_meta = []
+            for field_name, cred_meta in matched_credentials.items():
+                credentials_meta.append(cred_meta)
+
             return BlockDetailsResponse(
                 message=(
-                    f"Block '{prep.block.name}' details. "
+                    f"Block '{block.name}' details. "
                     "Provide input_data matching the inputs schema to execute the block."
                 ),
                 session_id=session_id,
                 block=BlockDetails(
                     id=block_id,
-                    name=prep.block.name,
-                    description=prep.block.description or "",
-                    inputs=prep.input_schema,
+                    name=block.name,
+                    description=block.description or "",
+                    inputs=input_schema,
                     outputs=output_schema,
                     credentials=credentials_meta,
                 ),
                 user_authenticated=True,
             )

-        hitl_or_err = await check_hitl_review(prep, user_id, session_id)
-        if isinstance(hitl_or_err, ToolResponseBase):
-            return hitl_or_err
-        synthetic_node_exec_id, input_data = hitl_or_err
+        # Generate synthetic IDs for CoPilot context.
+        # Encode node_id in node_exec_id so it can be extracted later
+        # (e.g. for auto-approve, where we need node_id but have no NodeExecution row).
+        synthetic_graph_id = f"{COPILOT_SESSION_PREFIX}{session.session_id}"
+        synthetic_node_id = f"{COPILOT_NODE_PREFIX}{block_id}"
+
+        # Check for an existing WAITING review for this block with the same input.
+        # If the LLM retries run_block with identical input, we reuse the existing
+        # review instead of creating duplicates. Different inputs = new execution.
+        existing_reviews = await review_db().get_pending_reviews_for_execution(
+            synthetic_graph_id, user_id
+        )
+        existing_review = next(
+            (
+                r
+                for r in existing_reviews
+                if r.node_id == synthetic_node_id
+                and r.status.value == "WAITING"
+                and r.payload == input_data
+            ),
+            None,
+        )
+        if existing_review:
+            return ReviewRequiredResponse(
+                message=(
+                    f"Block '{block.name}' requires human review. "
+                    "After the user approves, call continue_run_block with "
+                    f"review_id='{existing_review.node_exec_id}' to execute."
+                ),
+                session_id=session_id,
+                block_id=block_id,
+                block_name=block.name,
+                review_id=existing_review.node_exec_id,
+                graph_exec_id=synthetic_graph_id,
+                input_data=input_data,
+            )
+
+        synthetic_node_exec_id = (
+            f"{synthetic_node_id}{COPILOT_NODE_EXEC_ID_SEPARATOR}"
+            f"{uuid.uuid4().hex[:8]}"
+        )
+
+        # Check for HITL review before execution.
+        # This creates the review record in the DB for CoPilot flows.
+        review_context = ExecutionContext(
+            user_id=user_id,
+            graph_id=synthetic_graph_id,
+            graph_exec_id=synthetic_graph_id,
+            graph_version=1,
+            node_id=synthetic_node_id,
+            node_exec_id=synthetic_node_exec_id,
+            sensitive_action_safe_mode=True,
+        )
+        should_pause, input_data = await block.is_block_exec_need_review(
+            input_data,
+            user_id=user_id,
+            node_id=synthetic_node_id,
+            node_exec_id=synthetic_node_exec_id,
+            graph_exec_id=synthetic_graph_id,
+            graph_id=synthetic_graph_id,
+            graph_version=1,
+            execution_context=review_context,
+            is_graph_execution=False,
+        )
+        if should_pause:
+            return ReviewRequiredResponse(
+                message=(
+                    f"Block '{block.name}' requires human review. "
+                    "After the user approves, call continue_run_block with "
+                    f"review_id='{synthetic_node_exec_id}' to execute."
+                ),
+                session_id=session_id,
+                block_id=block_id,
+                block_name=block.name,
+                review_id=synthetic_node_exec_id,
+                graph_exec_id=synthetic_graph_id,
+                input_data=input_data,
+            )

        return await execute_block(
-            block=prep.block,
+            block=block,
            block_id=block_id,
            input_data=input_data,
            user_id=user_id,
            session_id=session_id,
            node_exec_id=synthetic_node_exec_id,
-            matched_credentials=prep.matched_credentials,
+            matched_credentials=matched_credentials,
        )
+
+    def _get_inputs_list(self, block: AnyBlockSchema) -> list[dict[str, Any]]:
+        """Extract non-credential inputs from block schema."""
+        schema = block.input_schema.jsonschema()
+        credentials_fields = set(block.input_schema.get_credentials_fields().keys())
+        return get_inputs_from_schema(schema, exclude_fields=credentials_fields)
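Review note on the synthetic ID scheme in the hunk above: the block ID is encoded into node_exec_id so that node_id can be recovered later without a NodeExecution row. The round trip can be sketched as follows. This is only an illustration under stated assumptions: the values given here for COPILOT_NODE_PREFIX and COPILOT_NODE_EXEC_ID_SEPARATOR are made up (the real constants are not shown in this diff), and make_node_exec_id and extract_node_id are hypothetical helper names.

```python
import uuid

# Assumed values for illustration only; the real constants are defined
# elsewhere in the codebase and may differ.
COPILOT_NODE_PREFIX = "copilot-node-"
COPILOT_NODE_EXEC_ID_SEPARATOR = ":"


def make_node_exec_id(block_id: str) -> str:
    """Build a synthetic node_exec_id that embeds the synthetic node_id."""
    node_id = f"{COPILOT_NODE_PREFIX}{block_id}"
    return f"{node_id}{COPILOT_NODE_EXEC_ID_SEPARATOR}{uuid.uuid4().hex[:8]}"


def extract_node_id(node_exec_id: str) -> str:
    """Recover the synthetic node_id by splitting off the random suffix."""
    node_id, sep, _suffix = node_exec_id.rpartition(COPILOT_NODE_EXEC_ID_SEPARATOR)
    if not sep:
        # No separator means this is not a synthetic CoPilot execution ID.
        raise ValueError(f"not a synthetic node_exec_id: {node_exec_id!r}")
    return node_id
```

The sketch splits from the right so that a separator character appearing inside the block ID would not truncate the recovered node_id. Whether the real extraction code does the same is not visible in this diff.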
--- a/autogpt_platform/backend/backend/copilot/tools/run_block_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_block_test.py
@@ -5,8 +5,6 @@ from unittest.mock import AsyncMock, MagicMock, patch
 import pytest

 from backend.blocks._base import BlockType
-from backend.copilot.context import _current_permissions
-from backend.copilot.permissions import CopilotPermissions

 from ._test_data import make_session
 from .models import (
@@ -94,7 +92,7 @@ class TestRunBlockFiltering:
        input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)

        with patch(
-            "backend.copilot.tools.helpers.get_block",
+            "backend.copilot.tools.run_block.get_block",
            return_value=input_block,
        ):
            tool = RunBlockTool()
@@ -111,92 +109,29 @@ class TestRunBlockFiltering:

    @pytest.mark.asyncio(loop_scope="session")
    async def test_excluded_block_id_returns_error(self):
-        """Attempting to execute OrchestratorBlock returns error."""
+        """Attempting to execute SmartDecisionMakerBlock returns error."""
        session = make_session(user_id=_TEST_USER_ID)

-        orchestrator_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
+        smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
        smart_block = make_mock_block(
-            orchestrator_id, "Orchestrator", BlockType.STANDARD
+            smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
        )

        with patch(
-            "backend.copilot.tools.helpers.get_block",
+            "backend.copilot.tools.run_block.get_block",
            return_value=smart_block,
        ):
            tool = RunBlockTool()
            response = await tool._execute(
                user_id=_TEST_USER_ID,
                session=session,
-                block_id=orchestrator_id,
+                block_id=smart_decision_id,
                input_data={},
            )

        assert isinstance(response, ErrorResponse)
        assert "cannot be run directly" in response.message

-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_block_denied_by_permissions_returns_error(self):
-        """A block denied by CopilotPermissions returns an ErrorResponse."""
-        session = make_session(user_id=_TEST_USER_ID)
-        block_id = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
-        standard_block = make_mock_block(block_id, "HTTP Request", BlockType.STANDARD)
-
-        perms = CopilotPermissions(blocks=[block_id], blocks_exclude=True)
-        token = _current_permissions.set(perms)
-        try:
-            with patch(
-                "backend.copilot.tools.helpers.get_block",
-                return_value=standard_block,
-            ):
-                tool = RunBlockTool()
-                response = await tool._execute(
-                    user_id=_TEST_USER_ID,
-                    session=session,
-                    block_id=block_id,
-                    input_data={},
-                )
-        finally:
-            _current_permissions.reset(token)
-
-        assert isinstance(response, ErrorResponse)
-        assert "not permitted" in response.message
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_allowed_by_permissions_passes_guard(self):
-        """A block explicitly allowed by a whitelist CopilotPermissions passes the guard."""
-        session = make_session(user_id=_TEST_USER_ID)
-        block_id = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
-        standard_block = make_mock_block(block_id, "HTTP Request", BlockType.STANDARD)
-
-        perms = CopilotPermissions(blocks=[block_id], blocks_exclude=False)
-        token = _current_permissions.set(perms)
-        try:
-            with (
-                patch(
-                    "backend.copilot.tools.helpers.get_block",
-                    return_value=standard_block,
-                ),
-                patch(
-                    "backend.copilot.tools.helpers.match_credentials_to_requirements",
-                    return_value=({}, []),
-                ),
-            ):
-                tool = RunBlockTool()
-                response = await tool._execute(
-                    user_id=_TEST_USER_ID,
-                    session=session,
-                    block_id=block_id,
-                    input_data={},
-                )
-        finally:
-            _current_permissions.reset(token)
-
-        # Must NOT be blocked by permissions — assert it's not a permission error
-        assert (
-            not isinstance(response, ErrorResponse)
-            or "not permitted" not in response.message
-        )
-
    @pytest.mark.asyncio(loop_scope="session")
    async def test_non_excluded_block_passes_guard(self):
        """Non-excluded blocks pass the filtering guard (may fail later for other reasons)."""
@@ -208,7 +143,7 @@ class TestRunBlockFiltering:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=standard_block,
            ),
            patch(
@@ -265,7 +200,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -308,7 +243,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -354,7 +289,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -402,7 +337,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -446,7 +381,7 @@ class TestRunBlockInputValidation:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -500,7 +435,7 @@ class TestRunBlockSensitiveAction:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -556,7 +491,7 @@ class TestRunBlockSensitiveAction:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
@@ -610,7 +545,7 @@ class TestRunBlockSensitiveAction:

        with (
            patch(
-                "backend.copilot.tools.helpers.get_block",
+                "backend.copilot.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
--- a/autogpt_platform/backend/backend/copilot/tools/test_run_block_details.py
+++ b/autogpt_platform/backend/backend/copilot/tools/test_run_block_details.py
@@ -61,12 +61,12 @@ async def test_run_block_returns_details_when_no_input_provided():
    )

    with patch(
-        "backend.copilot.tools.helpers.get_block",
+        "backend.copilot.tools.run_block.get_block",
        return_value=http_block,
    ):
        # Mock credentials check to return no missing credentials
        with patch(
-            "backend.copilot.tools.helpers.resolve_block_credentials",
+            "backend.copilot.tools.run_block.resolve_block_credentials",
            new_callable=AsyncMock,
            return_value=({}, []),  # (matched_credentials, missing_credentials)
        ):
@@ -119,11 +119,11 @@ async def test_run_block_returns_details_when_only_credentials_provided():
    }

    with patch(
-        "backend.copilot.tools.helpers.get_block",
+        "backend.copilot.tools.run_block.get_block",
        return_value=mock,
    ):
        with patch(
-            "backend.copilot.tools.helpers.resolve_block_credentials",
+            "backend.copilot.tools.run_block.resolve_block_credentials",
            new_callable=AsyncMock,
            return_value=(
                {
--- a/autogpt_platform/backend/backend/data/block_cost_config.py
+++ b/autogpt_platform/backend/backend/data/block_cost_config.py
@@ -32,9 +32,9 @@ from backend.blocks.llm import (
    AITextSummarizerBlock,
    LlmModel,
 )
-from backend.blocks.orchestrator import OrchestratorBlock
 from backend.blocks.replicate.flux_advanced import ReplicateFluxAdvancedModelBlock
 from backend.blocks.replicate.replicate_block import ReplicateModelBlock
+from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
 from backend.blocks.talking_head import CreateTalkingAvatarVideoBlock
 from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock
 from backend.blocks.video.narration import VideoNarrationBlock
@@ -548,6 +548,7 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
            },
        )
    ],
+    SmartDecisionMakerBlock: LLM_COST,
    SearchOrganizationsBlock: [
        BlockCost(
            cost_amount=2,
@@ -699,7 +700,6 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
            },
        ),
    ],
-    OrchestratorBlock: LLM_COST,
    VideoNarrationBlock: [
        BlockCost(
            cost_amount=5,  # ElevenLabs TTS cost
--- a/autogpt_platform/backend/backend/data/db.py
+++ b/autogpt_platform/backend/backend/data/db.py
@@ -38,10 +38,6 @@ POOL_TIMEOUT = os.getenv("DB_POOL_TIMEOUT")
 if POOL_TIMEOUT:
    DATABASE_URL = add_param(DATABASE_URL, "pool_timeout", POOL_TIMEOUT)

-STMT_CACHE_SIZE = os.getenv("DB_STATEMENT_CACHE_SIZE")
-if STMT_CACHE_SIZE:
-    DATABASE_URL = add_param(DATABASE_URL, "statement_cache_size", STMT_CACHE_SIZE)
-
 HTTP_TIMEOUT = int(POOL_TIMEOUT) if POOL_TIMEOUT else None

 prisma = Prisma(
--- a/autogpt_platform/backend/backend/data/execution_outputs_test.py
+++ b/autogpt_platform/backend/backend/data/execution_outputs_test.py
@@ -7,7 +7,7 @@ the function returns plain values instead of lists, it causes:
    1 validation error for dict[str,list[any]] response
    Input should be a valid list [type=list_type, input_value='', input_type=str]

-This breaks OrchestratorBlock agent mode tool execution.
+This breaks SmartDecisionMakerBlock agent mode tool execution.
 """

 from unittest.mock import AsyncMock, MagicMock, patch
--- a/autogpt_platform/backend/backend/data/graph.py
+++ b/autogpt_platform/backend/backend/data/graph.py
@@ -38,7 +38,7 @@ from backend.util.request import parse_url
 from .block import BlockInput
 from .db import BaseDbModel
 from .db import prisma as db
-from .db import query_raw_with_schema, transaction
+from .db import execute_raw_with_schema, query_raw_with_schema, transaction
 from .dynamic_fields import is_tool_pin, sanitize_pin_name
 from .includes import AGENT_GRAPH_INCLUDE, AGENT_NODE_INCLUDE, MAX_GRAPH_VERSIONS_FETCH
 from .model import CredentialsFieldInfo, CredentialsMetaInput, is_credentials_field_name
@@ -737,7 +737,7 @@ class GraphModel(Graph, GraphMeta):
        # Collect errors per node
        node_errors: dict[str, dict[str, str]] = defaultdict(dict)

-        # Validate tool orchestrator nodes
+        # Validate smart decision maker nodes
        nodes_block = {
            node.id: block
            for node in graph.nodes
@@ -1207,9 +1207,13 @@ async def get_graph_as_admin(
        order={"version": "desc"},
    )

-    # Admin access bypasses ownership and marketplace checks — route-level
-    # auth already ensures only admins can call this function.
-    if graph is None:
+    # For access, the graph must be owned by the user or listed in the store
+    if graph is None or (
+        graph.userId != user_id
+        and not await is_graph_published_in_marketplace(
+            graph_id, version or graph.version
+        )
+    ):
        return None

    if for_export:
@@ -1665,16 +1669,15 @@ async def migrate_llm_models(migrate_to: LlmModel):

    # Update each block
    for id, path in llm_model_fields.items():
-        query = f"""
-            UPDATE platform."AgentNode"
+        query = """
+            UPDATE {schema_prefix}"AgentNode"
            SET "constantInput" = jsonb_set("constantInput", $1, to_jsonb($2), true)
            WHERE "agentBlockId" = $3
            AND "constantInput" ? ($4)::text
-            AND "constantInput"->>($4)::text NOT IN {escaped_enum_values}
-            """
+            AND "constantInput"->>($4)::text NOT IN """ + escaped_enum_values

-        await db.execute_raw(
-            query,  # type: ignore - is supposed to be LiteralString
+        await execute_raw_with_schema(
+            query,
            [path],
            migrate_to.value,
            id,
--- a/autogpt_platform/backend/backend/data/graph_test.py
+++ b/autogpt_platform/backend/backend/data/graph_test.py
@@ -1,6 +1,6 @@
 import json
 from typing import Any
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import AsyncMock, patch
 from uuid import UUID

 import fastapi.exceptions
@@ -13,7 +13,7 @@ from backend.api.model import CreateGraph
 from backend.blocks._base import BlockSchema, BlockSchemaInput
 from backend.blocks.basic import StoreValueBlock
 from backend.blocks.io import AgentInputBlock, AgentOutputBlock
-from backend.data.graph import Graph, Link, Node, get_graph
+from backend.data.graph import Graph, Link, Node
 from backend.data.model import SchemaField
 from backend.data.user import DEFAULT_USER_ID
 from backend.usecases.sample import create_test_user
@@ -595,82 +595,3 @@ def test_mcp_credential_combine_no_discriminator_values():
        f"Expected 1 credential entry for MCP blocks without discriminator_values, "
        f"got {len(combined)}: {list(combined.keys())}"
    )
-
-
-# --------------- get_graph access-control regression tests --------------- #
-# These protect the behavior introduced in PR #11323 (Reinier, 2025-11-05):
-# non-owners can access APPROVED marketplace agents but NOT pending ones.
-
-
-def _make_mock_db_graph(user_id: str = "owner-user-id") -> MagicMock:
-    graph = MagicMock()
-    graph.userId = user_id
-    graph.id = "graph-id"
-    graph.version = 1
-    graph.Nodes = []
-    return graph
-
-
-@pytest.mark.asyncio
-async def test_get_graph_non_owner_approved_marketplace_agent() -> None:
-    """A non-owner should be able to access a graph that has an APPROVED
-    marketplace listing.  This is the normal marketplace download flow."""
-    owner_id = "owner-user-id"
-    requester_id = "different-user-id"
-    graph_id = "graph-id"
-    mock_graph = _make_mock_db_graph(owner_id)
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    mock_listing = MagicMock()
-    mock_listing.AgentGraph = mock_graph
-
-    with (
-        patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
-        patch(
-            "backend.data.graph.StoreListingVersion.prisma",
-        ) as mock_slv_prisma,
-        patch(
-            "backend.data.graph.GraphModel.from_db",
-            return_value=mock_graph_model,
-        ),
-    ):
-        # First lookup (owned graph) returns None — requester != owner
-        mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
-        # Marketplace fallback finds an APPROVED listing
-        mock_slv_prisma.return_value.find_first = AsyncMock(return_value=mock_listing)
-
-        result = await get_graph(
-            graph_id=graph_id,
-            version=1,
-            user_id=requester_id,
-        )
-
-    assert result is not None, "Non-owner should access APPROVED marketplace agent"
-
-
-@pytest.mark.asyncio
-async def test_get_graph_non_owner_pending_marketplace_agent_denied() -> None:
-    """A non-owner must NOT be able to access a graph that only has a PENDING
-    (not APPROVED) marketplace listing.  The marketplace fallback filters on
-    submissionStatus=APPROVED, so pending agents should be invisible."""
-    requester_id = "different-user-id"
-    graph_id = "graph-id"
-
-    with (
-        patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
-        patch(
-            "backend.data.graph.StoreListingVersion.prisma",
-        ) as mock_slv_prisma,
-    ):
-        # First lookup (owned graph) returns None
-        mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
-        # Marketplace fallback finds nothing (not APPROVED)
-        mock_slv_prisma.return_value.find_first = AsyncMock(return_value=None)
-
-        result = await get_graph(
-            graph_id=graph_id,
-            version=1,
-            user_id=requester_id,
-        )
-
-    assert result is None, "Non-owner must not access a pending marketplace agent"
--- a/autogpt_platform/backend/backend/data/llm_registry/init.py
+++ b/autogpt_platform/backend/backend/data/llm_registry/init.py
@@ -0,0 +1,40 @@
+"""LLM Registry - Dynamic model management system."""
+
+from backend.blocks.llm import ModelMetadata
+from .notifications import (
+    publish_registry_refresh_notification,
+    subscribe_to_registry_refresh,
+)
+from .registry import (
+    RegistryModel,
+    RegistryModelCost,
+    RegistryModelCreator,
+    clear_registry_cache,
+    get_all_model_slugs_for_validation,
+    get_all_models,
+    get_default_model_slug,
+    get_enabled_models,
+    get_model,
+    get_schema_options,
+    refresh_llm_registry,
+)
+
+__all__ = [
+    # Models
+    "ModelMetadata",
+    "RegistryModel",
+    "RegistryModelCost",
+    "RegistryModelCreator",
+    # Cache management
+    "clear_registry_cache",
+    "refresh_llm_registry",
+    "publish_registry_refresh_notification",
+    "subscribe_to_registry_refresh",
+    # Read functions
+    "get_model",
+    "get_all_models",
+    "get_enabled_models",
+    "get_schema_options",
+    "get_default_model_slug",
+    "get_all_model_slugs_for_validation",
+]
--- a/autogpt_platform/backend/backend/data/llm_registry/notifications.py
+++ b/autogpt_platform/backend/backend/data/llm_registry/notifications.py
@@ -0,0 +1,84 @@
+"""Pub/sub notifications for LLM registry cross-process synchronisation."""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from typing import Awaitable, Callable
+
+logger = logging.getLogger(__name__)
+
+REGISTRY_REFRESH_CHANNEL = "llm_registry:refresh"
+
+
+async def publish_registry_refresh_notification() -> None:
+    """Publish a refresh signal so all other workers reload their in-process cache."""
+    from backend.data.redis_client import get_redis_async
+
+    try:
+        redis = await get_redis_async()
+        await redis.publish(REGISTRY_REFRESH_CHANNEL, "refresh")
+        logger.debug("Published LLM registry refresh notification")
+    except Exception as e:
+        logger.warning("Failed to publish registry refresh notification: %s", e)
+
+
+async def subscribe_to_registry_refresh(
+    on_refresh: Callable[[], Awaitable[None]],
+) -> None:
+    """Listen for registry refresh signals and call on_refresh each time one arrives.
+
+    Designed to run as a long-lived background asyncio.Task.  Automatically
+    reconnects if the Redis connection drops.
+
+    Args:
+        on_refresh: Async callable invoked on each refresh signal.
+                    Typically ``llm_registry.refresh_llm_registry``.
+    """
+    from backend.data.redis_client import HOST, PASSWORD, PORT
+    from redis.asyncio import Redis as AsyncRedis
+
+    while True:
+        try:
+            # Dedicated connection — pub/sub must not share a connection used
+            # for regular commands.
+            redis_sub = AsyncRedis(
+                host=HOST, port=PORT, password=PASSWORD, decode_responses=True
+            )
+            pubsub = redis_sub.pubsub()
+            await pubsub.subscribe(REGISTRY_REFRESH_CHANNEL)
+            logger.info("Subscribed to LLM registry refresh channel")
+
+            while True:
+                try:
+                    message = await pubsub.get_message(
+                        ignore_subscribe_messages=True, timeout=1.0
+                    )
+                    if (
+                        message
+                        and message["type"] == "message"
+                        and message["channel"] == REGISTRY_REFRESH_CHANNEL
+                    ):
+                        logger.debug("LLM registry refresh signal received")
+                        try:
+                            await on_refresh()
+                        except Exception as e:
+                            logger.error(
+                                "Error in registry on_refresh callback: %s", e
+                            )
+                except asyncio.CancelledError:
+                    raise
+                except Exception as e:
+                    logger.warning(
+                        "Error processing registry refresh message: %s", e
+                    )
+                    await asyncio.sleep(1)
+
+        except asyncio.CancelledError:
+            logger.info("LLM registry subscription task cancelled")
+            break
+        except Exception as e:
+            logger.warning(
+                "LLM registry subscription error: %s. Retrying in 5s...", e
+            )
+            await asyncio.sleep(5)
--- a/autogpt_platform/backend/backend/data/llm_registry/registry.py
+++ b/autogpt_platform/backend/backend/data/llm_registry/registry.py
@@ -0,0 +1,254 @@
+"""Core LLM registry implementation for managing models dynamically."""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from typing import Any
+
+import prisma.models
+from pydantic import BaseModel, ConfigDict
+
+from backend.blocks.llm import ModelMetadata
+from backend.util.cache import cached
+
+logger = logging.getLogger(__name__)
+
+
+class RegistryModelCost(BaseModel):
+    """Cost configuration for an LLM model."""
+
+    model_config = ConfigDict(frozen=True)
+
+    unit: str  # "RUN" or "TOKENS"
+    credit_cost: int
+    credential_provider: str
+    credential_id: str | None = None
+    credential_type: str | None = None
+    currency: str | None = None
+    metadata: dict[str, Any] = {}
+
+
+class RegistryModelCreator(BaseModel):
+    """Creator information for an LLM model."""
+
+    model_config = ConfigDict(frozen=True)
+
+    id: str
+    name: str
+    display_name: str
+    description: str | None = None
+    website_url: str | None = None
+    logo_url: str | None = None
+
+
+class RegistryModel(BaseModel):
+    """Represents a model in the LLM registry."""
+
+    model_config = ConfigDict(frozen=True)
+
+    slug: str
+    display_name: str
+    description: str | None = None
+    metadata: ModelMetadata
+    capabilities: dict[str, Any] = {}
+    extra_metadata: dict[str, Any] = {}
+    provider_display_name: str
+    is_enabled: bool
+    is_recommended: bool = False
+    costs: tuple[RegistryModelCost, ...] = ()
+    creator: RegistryModelCreator | None = None
+
+    # Typed capability fields from DB schema
+    supports_tools: bool = False
+    supports_json_output: bool = False
+    supports_reasoning: bool = False
+    supports_parallel_tool_calls: bool = False
+
+
+# L1 in-process cache — Redis is the shared L2 via @cached(shared_cache=True)
+_dynamic_models: dict[str, RegistryModel] = {}
+_schema_options: list[dict[str, str]] = []
+_lock = asyncio.Lock()
+
+
+def _record_to_registry_model(record: prisma.models.LlmModel) -> RegistryModel:  # type: ignore[name-defined]
+    """Transform a raw Prisma LlmModel record into a RegistryModel instance."""
+    costs = tuple(
+        RegistryModelCost(
+            unit=str(cost.unit),
+            credit_cost=cost.creditCost,
+            credential_provider=cost.credentialProvider,
+            credential_id=cost.credentialId,
+            credential_type=cost.credentialType,
+            currency=cost.currency,
+            metadata=dict(cost.metadata or {}),
+        )
+        for cost in (record.Costs or [])
+    )
+
+    creator = None
+    if record.Creator:
+        creator = RegistryModelCreator(
+            id=record.Creator.id,
+            name=record.Creator.name,
+            display_name=record.Creator.displayName,
+            description=record.Creator.description,
+            website_url=record.Creator.websiteUrl,
+            logo_url=record.Creator.logoUrl,
+        )
+
+    capabilities = dict(record.capabilities or {})
+
+    if not record.Provider:
+        logger.warning(
+            "LlmModel %s has no Provider despite NOT NULL FK - "
+            "falling back to providerId %s",
+            record.slug,
+            record.providerId,
+        )
+    provider_name = record.Provider.name if record.Provider else record.providerId
+    provider_display = (
+        record.Provider.displayName if record.Provider else record.providerId
+    )
+    creator_name = record.Creator.displayName if record.Creator else "Unknown"
+
+    if record.priceTier not in (1, 2, 3):
+        logger.warning(
+            "LlmModel %s has out-of-range priceTier=%s, defaulting to 1",
+            record.slug,
+            record.priceTier,
+        )
+    price_tier = record.priceTier if record.priceTier in (1, 2, 3) else 1
+
+    metadata = ModelMetadata(
+        provider=provider_name,
+        context_window=record.contextWindow,
+        max_output_tokens=(
+            record.maxOutputTokens
+            if record.maxOutputTokens is not None
+            else record.contextWindow
+        ),
+        display_name=record.displayName,
+        provider_name=provider_display,
+        creator_name=creator_name,
+        price_tier=price_tier,
+    )
+
+    return RegistryModel(
+        slug=record.slug,
+        display_name=record.displayName,
+        description=record.description,
+        metadata=metadata,
+        capabilities=capabilities,
+        extra_metadata=dict(record.metadata or {}),
+        provider_display_name=provider_display,
+        is_enabled=record.isEnabled,
+        is_recommended=record.isRecommended,
+        costs=costs,
+        creator=creator,
+        supports_tools=record.supportsTools,
+        supports_json_output=record.supportsJsonOutput,
+        supports_reasoning=record.supportsReasoning,
+        supports_parallel_tool_calls=record.supportsParallelToolCalls,
+    )
+
+
+@cached(maxsize=1, ttl_seconds=300, shared_cache=True, refresh_ttl_on_get=True)
+async def _fetch_registry_from_db() -> list[RegistryModel]:
+    """Fetch all LLM models from the database.
+
+    Results are cached in Redis (shared_cache=True) so subsequent calls within
+    the TTL window skip the DB entirely — both within this process and across
+    all other workers that share the same Redis instance.
+    """
+    records = await prisma.models.LlmModel.prisma().find_many(  # type: ignore[attr-defined]
+        include={"Provider": True, "Costs": True, "Creator": True}
+    )
+    logger.info("Fetched %d LLM models from database", len(records))
+    return [_record_to_registry_model(r) for r in records]
+
+
+def clear_registry_cache() -> None:
+    """Invalidate the shared Redis cache for the registry DB fetch.
+
+    Call this before refresh_llm_registry() after any admin DB mutation so the
+    next fetch hits the database rather than serving the now-stale cached data.
+    """
+    _fetch_registry_from_db.cache_clear()
+
+
+async def refresh_llm_registry() -> None:
+    """Refresh the in-process L1 cache from Redis/DB.
+
+    On the first call (or after clear_registry_cache()), fetches fresh data
+    from the database and stores it in Redis.  Subsequent calls by other
+    workers hit the Redis cache instead of the DB.
+    """
+    async with _lock:
+        try:
+            models = await _fetch_registry_from_db()
+            new_models = {m.slug: m for m in models}
+
+            global _dynamic_models, _schema_options
+            _dynamic_models = new_models
+            _schema_options = _build_schema_options()
+
+            logger.info(
+                "LLM registry refreshed: %d models, %d schema options",
+                len(_dynamic_models),
+                len(_schema_options),
+            )
+        except Exception as e:
+            logger.error("Failed to refresh LLM registry: %s", e, exc_info=True)
+            raise
+
+
+def _build_schema_options() -> list[dict[str, str]]:
+    """Build schema options for model selection dropdown. Only includes enabled models."""
+    return [
+        {
+            "label": model.display_name,
+            "value": model.slug,
+            "group": model.metadata.provider,
+            "description": model.description or "",
+        }
+        for model in sorted(
+            _dynamic_models.values(), key=lambda m: m.display_name.lower()
+        )
+        if model.is_enabled
+    ]
+
+
+def get_model(slug: str) -> RegistryModel | None:
+    """Get a model by slug from the registry."""
+    return _dynamic_models.get(slug)
+
+
+def get_all_models() -> list[RegistryModel]:
+    """Get all models from the registry (including disabled)."""
+    return list(_dynamic_models.values())
+
+
+def get_enabled_models() -> list[RegistryModel]:
+    """Get only enabled models from the registry."""
+    return [model for model in _dynamic_models.values() if model.is_enabled]
+
+
+def get_schema_options() -> list[dict[str, str]]:
+    """Get schema options for model selection dropdown (enabled models only)."""
+    return list(_schema_options)
+
+
+def get_default_model_slug() -> str | None:
+    """Get the default model slug (first recommended, else first enabled, by display name)."""
+    models = sorted(_dynamic_models.values(), key=lambda m: m.display_name.lower())
+    recommended = next(
+        (m.slug for m in models if m.is_recommended and m.is_enabled), None
+    )
+    return recommended or next((m.slug for m in models if m.is_enabled), None)
+
+
+def get_all_model_slugs_for_validation() -> list[str]:
+    """Get all model slugs for validation (enabled models only)."""
+    return [model.slug for model in _dynamic_models.values() if model.is_enabled]
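The invalidate-then-refresh ordering that `clear_registry_cache()` and `refresh_llm_registry()` rely on can be sketched without Redis or Prisma. This is a minimal illustration under stated assumptions, not the registry's implementation: a plain dict stands in for the shared cache, and `fetch_from_db`, `fetch_cached`, `clear_cache`, `refresh`, and `db_reads` are hypothetical names introduced only for this example.

```python
import asyncio

# Hypothetical stand-ins: a dict plays the role of the shared (Redis) cache,
# and a counter records how often the "database" is actually read.
shared_cache: dict[str, list[str]] = {}
db_reads = 0
local_models: dict[str, str] = {}


async def fetch_from_db() -> list[str]:
    """Pretend DB read; counts calls so cache hits are observable."""
    global db_reads
    db_reads += 1
    return ["openai/gpt-4o", "openai/gpt-3.5"]


async def fetch_cached() -> list[str]:
    """Return the shared-cache entry, falling back to the DB on a miss."""
    if "registry" not in shared_cache:
        shared_cache["registry"] = await fetch_from_db()
    return shared_cache["registry"]


def clear_cache() -> None:
    """Drop the shared entry so the next fetch goes back to the DB."""
    shared_cache.pop("registry", None)


async def refresh() -> None:
    """Rebuild the per-process view from the shared cache (or DB on a miss)."""
    global local_models
    local_models = {slug: slug for slug in await fetch_cached()}


async def demo() -> int:
    await refresh()  # miss: first DB read
    await refresh()  # hit: served from the shared cache
    clear_cache()    # an admin mutation invalidates the shared entry
    await refresh()  # miss again: second DB read
    return db_reads


reads = asyncio.run(demo())
```

Skipping `clear_cache()` before the last `refresh()` would leave `reads` at 1, which is the stale-data case the `clear_registry_cache()` docstring warns about.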
--- a/autogpt_platform/backend/backend/data/llm_registry/registry_test.py
+++ b/autogpt_platform/backend/backend/data/llm_registry/registry_test.py
@@ -0,0 +1,357 @@
+"""Unit tests for the LLM registry module."""
+
+from __future__ import annotations
+
+from unittest.mock import AsyncMock, Mock, patch
+
+import pydantic
+import pytest
+
+from backend.data.llm_registry.registry import (
+    RegistryModel,
+    RegistryModelCost,
+    RegistryModelCreator,
+    _build_schema_options,
+    _record_to_registry_model,
+    get_default_model_slug,
+    get_schema_options,
+    refresh_llm_registry,
+)
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_mock_record(**overrides):
+    """Build a realistic mock Prisma LlmModel record."""
+    provider = Mock()
+    provider.name = "openai"
+    provider.displayName = "OpenAI"
+
+    record = Mock()
+    record.slug = "openai/gpt-4o"
+    record.displayName = "GPT-4o"
+    record.description = "Latest GPT model"
+    record.providerId = "provider-uuid"
+    record.Provider = provider
+    record.creatorId = "creator-uuid"
+    record.Creator = None
+    record.contextWindow = 128000
+    record.maxOutputTokens = 16384
+    record.priceTier = 2
+    record.isEnabled = True
+    record.isRecommended = False
+    record.supportsTools = True
+    record.supportsJsonOutput = True
+    record.supportsReasoning = False
+    record.supportsParallelToolCalls = True
+    record.capabilities = {}
+    record.metadata = {}
+    record.Costs = []
+
+    for key, value in overrides.items():
+        setattr(record, key, value)
+    return record
+
+
+def _make_registry_model(**kwargs) -> RegistryModel:
+    """Build a minimal RegistryModel for testing registry-level functions."""
+    from backend.blocks.llm import ModelMetadata
+
+    defaults = dict(
+        slug="openai/gpt-4o",
+        display_name="GPT-4o",
+        description=None,
+        metadata=ModelMetadata(
+            provider="openai",
+            context_window=128000,
+            max_output_tokens=16384,
+            display_name="GPT-4o",
+            provider_name="OpenAI",
+            creator_name="Unknown",
+            price_tier=2,
+        ),
+        capabilities={},
+        extra_metadata={},
+        provider_display_name="OpenAI",
+        is_enabled=True,
+        is_recommended=False,
+    )
+    defaults.update(kwargs)
+    return RegistryModel(**defaults)
+
+
+# ---------------------------------------------------------------------------
+# _record_to_registry_model tests
+# ---------------------------------------------------------------------------
+
+
+def test_record_to_registry_model():
+    """Happy-path: well-formed record produces a correct RegistryModel."""
+    record = _make_mock_record()
+    model = _record_to_registry_model(record)
+
+    assert model.slug == "openai/gpt-4o"
+    assert model.display_name == "GPT-4o"
+    assert model.description == "Latest GPT model"
+    assert model.provider_display_name == "OpenAI"
+    assert model.is_enabled is True
+    assert model.is_recommended is False
+    assert model.supports_tools is True
+    assert model.supports_json_output is True
+    assert model.supports_reasoning is False
+    assert model.supports_parallel_tool_calls is True
+    assert model.metadata.provider == "openai"
+    assert model.metadata.context_window == 128000
+    assert model.metadata.max_output_tokens == 16384
+    assert model.metadata.price_tier == 2
+    assert model.creator is None
+    assert model.costs == ()
+
+
+def test_record_to_registry_model_missing_provider(caplog):
+    """Record with no Provider relation falls back to providerId and logs a warning."""
+    record = _make_mock_record(Provider=None, providerId="provider-uuid")
+    with caplog.at_level("WARNING"):
+        model = _record_to_registry_model(record)
+
+    assert "no Provider" in caplog.text
+    assert model.metadata.provider == "provider-uuid"
+    assert model.provider_display_name == "provider-uuid"
+
+
+def test_record_to_registry_model_missing_creator():
+    """When Creator is None, creator_name defaults to 'Unknown' and creator field is None."""
+    record = _make_mock_record(Creator=None)
+    model = _record_to_registry_model(record)
+
+    assert model.creator is None
+    assert model.metadata.creator_name == "Unknown"
+
+
+def test_record_to_registry_model_with_creator():
+    """When Creator is present, it is parsed into RegistryModelCreator."""
+    creator_mock = Mock()
+    creator_mock.id = "creator-uuid"
+    creator_mock.name = "openai"
+    creator_mock.displayName = "OpenAI"
+    creator_mock.description = "AI company"
+    creator_mock.websiteUrl = "https://openai.com"
+    creator_mock.logoUrl = "https://openai.com/logo.png"
+
+    record = _make_mock_record(Creator=creator_mock)
+    model = _record_to_registry_model(record)
+
+    assert model.creator is not None
+    assert isinstance(model.creator, RegistryModelCreator)
+    assert model.creator.id == "creator-uuid"
+    assert model.creator.display_name == "OpenAI"
+    assert model.metadata.creator_name == "OpenAI"
+
+
+def test_record_to_registry_model_null_max_output_tokens():
+    """maxOutputTokens=None falls back to contextWindow."""
+    record = _make_mock_record(maxOutputTokens=None, contextWindow=64000)
+    model = _record_to_registry_model(record)
+
+    assert model.metadata.max_output_tokens == 64000
+
+
+def test_record_to_registry_model_invalid_price_tier(caplog):
+    """Out-of-range priceTier is coerced to 1 and a warning is logged."""
+    record = _make_mock_record(priceTier=99)
+    with caplog.at_level("WARNING"):
+        model = _record_to_registry_model(record)
+
+    assert "out-of-range priceTier" in caplog.text
+    assert model.metadata.price_tier == 1
+
+
+def test_record_to_registry_model_with_costs():
+    """Costs are parsed into RegistryModelCost tuples."""
+    cost_mock = Mock()
+    cost_mock.unit = "TOKENS"
+    cost_mock.creditCost = 10
+    cost_mock.credentialProvider = "openai"
+    cost_mock.credentialId = None
+    cost_mock.credentialType = None
+    cost_mock.currency = "USD"
+    cost_mock.metadata = {}
+
+    record = _make_mock_record(Costs=[cost_mock])
+    model = _record_to_registry_model(record)
+
+    assert len(model.costs) == 1
+    cost = model.costs[0]
+    assert isinstance(cost, RegistryModelCost)
+    assert cost.unit == "TOKENS"
+    assert cost.credit_cost == 10
+    assert cost.credential_provider == "openai"
+
+
+# ---------------------------------------------------------------------------
+# get_default_model_slug tests
+# ---------------------------------------------------------------------------
+
+
+def test_get_default_model_slug_recommended():
+    """Recommended model is preferred over non-recommended enabled models."""
+    import backend.data.llm_registry.registry as reg
+
+    reg._dynamic_models = {
+        "openai/gpt-4o": _make_registry_model(
+            slug="openai/gpt-4o", display_name="GPT-4o", is_recommended=False
+        ),
+        "openai/gpt-4o-recommended": _make_registry_model(
+            slug="openai/gpt-4o-recommended",
+            display_name="GPT-4o Recommended",
+            is_recommended=True,
+        ),
+    }
+
+    result = get_default_model_slug()
+    assert result == "openai/gpt-4o-recommended"
+
+
+def test_get_default_model_slug_fallback():
+    """With no recommended model, falls back to first enabled (alphabetical)."""
+    import backend.data.llm_registry.registry as reg
+
+    reg._dynamic_models = {
+        "openai/gpt-4o": _make_registry_model(
+            slug="openai/gpt-4o", display_name="GPT-4o", is_recommended=False
+        ),
+        "openai/gpt-3.5": _make_registry_model(
+            slug="openai/gpt-3.5", display_name="GPT-3.5", is_recommended=False
+        ),
+    }
+
+    result = get_default_model_slug()
+    # Sorted alphabetically: GPT-3.5 < GPT-4o
+    assert result == "openai/gpt-3.5"
+
+
+def test_get_default_model_slug_empty():
+    """Empty registry returns None."""
+    import backend.data.llm_registry.registry as reg
+
+    reg._dynamic_models = {}
+
+    result = get_default_model_slug()
+    assert result is None
+
+
+# ---------------------------------------------------------------------------
+# _build_schema_options / get_schema_options tests
+# ---------------------------------------------------------------------------
+
+
+def test_build_schema_options():
+    """Only enabled models appear, sorted case-insensitively."""
+    import backend.data.llm_registry.registry as reg
+
+    reg._dynamic_models = {
+        "openai/gpt-4o": _make_registry_model(
+            slug="openai/gpt-4o", display_name="GPT-4o", is_enabled=True
+        ),
+        "openai/disabled": _make_registry_model(
+            slug="openai/disabled", display_name="Disabled Model", is_enabled=False
+        ),
+        "openai/gpt-3.5": _make_registry_model(
+            slug="openai/gpt-3.5", display_name="gpt-3.5", is_enabled=True
+        ),
+    }
+
+    options = _build_schema_options()
+    slugs = [o["value"] for o in options]
+
+    # disabled model should be excluded
+    assert "openai/disabled" not in slugs
+    # only enabled models
+    assert "openai/gpt-4o" in slugs
+    assert "openai/gpt-3.5" in slugs
+    # case-insensitive sort: "gpt-3.5" < "GPT-4o" (both lowercase: "gpt-3.5" < "gpt-4o")
+    assert slugs.index("openai/gpt-3.5") < slugs.index("openai/gpt-4o")
+
+    # Verify structure
+    for option in options:
+        assert "label" in option
+        assert "value" in option
+        assert "group" in option
+        assert "description" in option
+
+
+def test_get_schema_options_returns_copy():
+    """Mutating the returned list does not affect the internal cache."""
+    import backend.data.llm_registry.registry as reg
+
+    reg._dynamic_models = {
+        "openai/gpt-4o": _make_registry_model(slug="openai/gpt-4o", display_name="GPT-4o"),
+    }
+    reg._schema_options = _build_schema_options()
+
+    options = get_schema_options()
+    original_length = len(options)
+    options.append({"label": "Injected", "value": "evil/model", "group": "evil", "description": ""})
+
+    # Internal state should be unchanged
+    assert len(get_schema_options()) == original_length
+
+
+# ---------------------------------------------------------------------------
+# Pydantic frozen model tests
+# ---------------------------------------------------------------------------
+
+
+def test_registry_model_frozen():
+    """Pydantic frozen=True should reject attribute assignment."""
+    model = _make_registry_model()
+
+    with pytest.raises((pydantic.ValidationError, TypeError)):
+        model.slug = "changed/slug"  # type: ignore[misc]
+
+
+def test_registry_model_cost_frozen():
+    """RegistryModelCost is also frozen."""
+    cost = RegistryModelCost(
+        unit="TOKENS",
+        credit_cost=5,
+        credential_provider="openai",
+    )
+    with pytest.raises((pydantic.ValidationError, TypeError)):
+        cost.unit = "RUN"  # type: ignore[misc]
+
+
+# ---------------------------------------------------------------------------
+# refresh_llm_registry tests
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_refresh_llm_registry():
+    """Mock prisma find_many, verify cache is populated after refresh."""
+    import backend.data.llm_registry.registry as reg
+
+    record = _make_mock_record()
+    mock_find_many = AsyncMock(return_value=[record])
+
+    with patch("prisma.models.LlmModel.prisma") as mock_prisma_cls:
+        mock_prisma_instance = Mock()
+        mock_prisma_instance.find_many = mock_find_many
+        mock_prisma_cls.return_value = mock_prisma_instance
+
+        # Clear state first
+        reg._dynamic_models = {}
+        reg._schema_options = []
+
+        await refresh_llm_registry()
+
+    assert "openai/gpt-4o" in reg._dynamic_models
+    model = reg._dynamic_models["openai/gpt-4o"]
+    assert isinstance(model, RegistryModel)
+    assert model.slug == "openai/gpt-4o"
+    # Schema options should be populated too
+    assert len(reg._schema_options) == 1
+    assert reg._schema_options[0]["value"] == "openai/gpt-4o"
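Several tests above assign to `reg._dynamic_models` directly, which leaves the module-level state changed for whichever test runs next. One possible way to scope such overrides is `unittest.mock.patch.object`, which restores the original attribute when the block exits. The sketch below uses a `SimpleNamespace` as a stand-in for the registry module; `reg` and `default_slug` are illustrative assumptions, not names from the codebase.

```python
from types import SimpleNamespace
from typing import Optional
from unittest.mock import patch

# Hypothetical stand-in for the registry module and its module-level state.
reg = SimpleNamespace(_dynamic_models={"openai/gpt-4o": "original"})


def default_slug() -> Optional[str]:
    """Toy lookup that reads the shared state, as the registry helpers do."""
    return next(iter(sorted(reg._dynamic_models)), None)


with patch.object(reg, "_dynamic_models", {"test/model": "fake"}):
    inside = default_slug()  # sees the patched state

after = default_slug()  # the original state is back once the block exits
```

The same effect could be had with pytest's `monkeypatch.setattr`; either way the override cannot leak into later tests.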
--- a/autogpt_platform/backend/backend/data/understanding.py
+++ b/autogpt_platform/backend/backend/data/understanding.py
@@ -23,29 +23,11 @@ def _cache_key(user_id: str) -> str:


 def _json_to_list(value: Any) -> list[str]:
-    """Convert Json field to list[str], handling None.
-
-    Also handles legacy dict-format rows (e.g. ``{"Learn": [...], "Create": [...]}``
-    from the reverted themed-prompts feature) by flattening all values into a single
-    list so existing personalised data isn't silently lost.
-    """
+    """Convert Json field to list[str], handling None."""
    if value is None:
        return []
    if isinstance(value, list):
        return cast(list[str], value)
-    if isinstance(value, dict):
-        # Legacy themed-prompt format: flatten all string values from all categories.
-        logger.debug(
-            "_json_to_list: flattening legacy dict-format value (keys=%s)",
-            list(value.keys()),
-        )
-        return [
-            item
-            for vals in value.values()
-            if isinstance(vals, list)
-            for item in vals
-            if isinstance(item, str)
-        ]
    return []


--- a/autogpt_platform/backend/backend/server/v2/llm/__init__.py
+++ b/autogpt_platform/backend/backend/server/v2/llm/__init__.py
@@ -0,0 +1,6 @@
+"""LLM registry API (public + admin)."""
+
+from .admin_routes import router as admin_router
+from .routes import router
+
+__all__ = ["router", "admin_router"]
--- a/autogpt_platform/backend/backend/server/v2/llm/admin_model.py
+++ b/autogpt_platform/backend/backend/server/v2/llm/admin_model.py
@@ -0,0 +1,115 @@
+"""Request/response models for LLM registry admin API."""
+
+from typing import Any
+
+from pydantic import BaseModel, Field
+
+
+class CreateLlmProviderRequest(BaseModel):
+    """Request model for creating an LLM provider."""
+
+    name: str = Field(
+        ..., description="Provider identifier (e.g., 'openai', 'anthropic')"
+    )
+    display_name: str = Field(..., description="Human-readable provider name")
+    description: str | None = Field(None, description="Provider description")
+    default_credential_provider: str | None = Field(
+        None, description="Default credential system identifier"
+    )
+    default_credential_id: str | None = Field(None, description="Default credential ID")
+    default_credential_type: str | None = Field(
+        None, description="Default credential type"
+    )
+    metadata: dict[str, Any] = Field(
+        default_factory=dict, description="Additional metadata"
+    )
+
+
+class UpdateLlmProviderRequest(BaseModel):
+    """Request model for updating an LLM provider."""
+
+    display_name: str | None = Field(None, description="Human-readable provider name")
+    description: str | None = Field(None, description="Provider description")
+    default_credential_provider: str | None = Field(
+        None, description="Default credential system identifier"
+    )
+    default_credential_id: str | None = Field(None, description="Default credential ID")
+    default_credential_type: str | None = Field(
+        None, description="Default credential type"
+    )
+    metadata: dict[str, Any] | None = Field(None, description="Additional metadata")
+
+
+class CreateLlmModelRequest(BaseModel):
+    """Request model for creating an LLM model."""
+
+    slug: str = Field(..., description="Model slug (e.g., 'gpt-4', 'claude-3-opus')")
+    display_name: str = Field(..., description="Human-readable model name")
+    description: str | None = Field(None, description="Model description")
+    provider_id: str = Field(..., description="Provider ID (UUID)")
+    creator_id: str | None = Field(None, description="Creator ID (UUID)")
+    context_window: int = Field(
+        ..., description="Maximum context window in tokens", gt=0
+    )
+    max_output_tokens: int | None = Field(
+        None, description="Maximum output tokens (None if unlimited)", gt=0
+    )
+    price_tier: int = Field(
+        ..., description="Price tier (1=cheapest, 2=medium, 3=expensive)", ge=1, le=3
+    )
+    is_enabled: bool = Field(default=True, description="Whether the model is enabled")
+    is_recommended: bool = Field(
+        default=False, description="Whether the model is recommended"
+    )
+    supports_tools: bool = Field(default=False, description="Supports function calling")
+    supports_json_output: bool = Field(
+        default=False, description="Supports JSON output mode"
+    )
+    supports_reasoning: bool = Field(
+        default=False, description="Supports reasoning mode"
+    )
+    supports_parallel_tool_calls: bool = Field(
+        default=False, description="Supports parallel tool calls"
+    )
+    capabilities: dict[str, Any] = Field(
+        default_factory=dict, description="Additional capabilities"
+    )
+    metadata: dict[str, Any] = Field(
+        default_factory=dict, description="Additional metadata"
+    )
+    costs: list[dict[str, Any]] = Field(
+        default_factory=list, description="Cost entries for the model"
+    )
+
+
+class UpdateLlmModelRequest(BaseModel):
+    """Request model for updating an LLM model."""
+
+    display_name: str | None = Field(None, description="Human-readable model name")
+    description: str | None = Field(None, description="Model description")
+    creator_id: str | None = Field(None, description="Creator ID (UUID)")
+    context_window: int | None = Field(
+        None, description="Maximum context window in tokens", gt=0
+    )
+    max_output_tokens: int | None = Field(
+        None, description="Maximum output tokens (None if unlimited)", gt=0
+    )
+    price_tier: int | None = Field(
+        None, description="Price tier (1=cheapest, 2=medium, 3=expensive)", ge=1, le=3
+    )
+    is_enabled: bool | None = Field(None, description="Whether the model is enabled")
+    is_recommended: bool | None = Field(
+        None, description="Whether the model is recommended"
+    )
+    supports_tools: bool | None = Field(None, description="Supports function calling")
+    supports_json_output: bool | None = Field(
+        None, description="Supports JSON output mode"
+    )
+    supports_reasoning: bool | None = Field(None, description="Supports reasoning mode")
+    supports_parallel_tool_calls: bool | None = Field(
+        None, description="Supports parallel tool calls"
+    )
+    capabilities: dict[str, Any] | None = Field(
+        None, description="Additional capabilities"
+    )
+    metadata: dict[str, Any] | None = Field(None, description="Additional metadata")
--- a/autogpt_platform/backend/backend/server/v2/llm/admin_routes.py
+++ b/autogpt_platform/backend/backend/server/v2/llm/admin_routes.py
@@ -0,0 +1,689 @@
+"""Admin API for LLM registry management.
+
+Provides endpoints for:
+- Reading creators (GET)
+- Creating, updating, and deleting models
+- Creating, updating, and deleting providers
+
+All endpoints require admin authentication. Mutations refresh the registry cache.
+"""
+
+import logging
+from typing import Any
+
+import prisma
+import autogpt_libs.auth
+from fastapi import APIRouter, HTTPException, Security, status
+
+from backend.server.v2.llm import db_write
+from backend.server.v2.llm.admin_model import (
+    CreateLlmModelRequest,
+    CreateLlmProviderRequest,
+    UpdateLlmModelRequest,
+    UpdateLlmProviderRequest,
+)
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter()
+
+
+def _map_provider_response(provider: Any) -> dict[str, Any]:
+    """Map Prisma provider model to response dict."""
+    return {
+        "id": provider.id,
+        "name": provider.name,
+        "display_name": provider.displayName,
+        "description": provider.description,
+        "default_credential_provider": provider.defaultCredentialProvider,
+        "default_credential_id": provider.defaultCredentialId,
+        "default_credential_type": provider.defaultCredentialType,
+        "metadata": dict(provider.metadata or {}),
+        "created_at": provider.createdAt.isoformat() if provider.createdAt else None,
+        "updated_at": provider.updatedAt.isoformat() if provider.updatedAt else None,
+    }
+
+
+def _map_model_response(model: Any) -> dict[str, Any]:
+    """Map Prisma model to response dict."""
+    return {
+        "id": model.id,
+        "slug": model.slug,
+        "display_name": model.displayName,
+        "description": model.description,
+        "provider_id": model.providerId,
+        "creator_id": model.creatorId,
+        "context_window": model.contextWindow,
+        "max_output_tokens": model.maxOutputTokens,
+        "price_tier": model.priceTier,
+        "is_enabled": model.isEnabled,
+        "is_recommended": model.isRecommended,
+        "supports_tools": model.supportsTools,
+        "supports_json_output": model.supportsJsonOutput,
+        "supports_reasoning": model.supportsReasoning,
+        "supports_parallel_tool_calls": model.supportsParallelToolCalls,
+        "capabilities": dict(model.capabilities or {}),
+        "metadata": dict(model.metadata or {}),
+        "created_at": model.createdAt.isoformat() if model.createdAt else None,
+        "updated_at": model.updatedAt.isoformat() if model.updatedAt else None,
+    }
+
+
+def _map_creator_response(creator: Any) -> dict[str, Any]:
+    """Map Prisma creator model to response dict."""
+    return {
+        "id": creator.id,
+        "name": creator.name,
+        "display_name": creator.displayName,
+        "description": creator.description,
+        "website_url": creator.websiteUrl,
+        "logo_url": creator.logoUrl,
+        "metadata": dict(creator.metadata or {}),
+        "created_at": creator.createdAt.isoformat() if creator.createdAt else None,
+        "updated_at": creator.updatedAt.isoformat() if creator.updatedAt else None,
+    }
+
+
+@router.post(
+    "/llm/models",
+    status_code=status.HTTP_201_CREATED,
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def create_model(
+    request: CreateLlmModelRequest,
+) -> dict[str, Any]:
+    """Create a new LLM model.
+
+    Requires admin authentication.
+    """
+    try:
+        import prisma.models as pm
+
+        # provider_id may hold a provider name or a UUID; try the name first
+        provider = await pm.LlmProvider.prisma().find_unique(
+            where={"name": request.provider_id}
+        )
+        if not provider:
+            # Try as UUID fallback
+            provider = await pm.LlmProvider.prisma().find_unique(
+                where={"id": request.provider_id}
+            )
+        if not provider:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Provider '{request.provider_id}' not found",
+            )
+
+        model = await db_write.create_model(
+            slug=request.slug,
+            display_name=request.display_name,
+            provider_id=provider.id,
+            context_window=request.context_window,
+            price_tier=request.price_tier,
+            description=request.description,
+            creator_id=request.creator_id,
+            max_output_tokens=request.max_output_tokens,
+            is_enabled=request.is_enabled,
+            is_recommended=request.is_recommended,
+            supports_tools=request.supports_tools,
+            supports_json_output=request.supports_json_output,
+            supports_reasoning=request.supports_reasoning,
+            supports_parallel_tool_calls=request.supports_parallel_tool_calls,
+            capabilities=request.capabilities,
+            metadata=request.metadata,
+        )
+        # Create cost entries, if any were supplied (costs defaults to an empty list)
+        if request.costs:
+            for cost_input in request.costs:
+                await pm.LlmModelCost.prisma().create(
+                    data={
+                        "unit": cost_input.get("unit", "RUN"),
+                        "creditCost": int(cost_input.get("credit_cost", 1)),
+                        "credentialProvider": provider.name,
+                        "metadata": prisma.Json(cost_input.get("metadata", {})),
+                        "Model": {"connect": {"id": model.id}},
+                    }
+                )
+
+        await db_write.refresh_runtime_caches()
+        logger.info(f"Created model '{request.slug}' (id: {model.id})")
+
+        # Re-fetch with costs included
+        model = await pm.LlmModel.prisma().find_unique(
+            where={"id": model.id},
+            include={"Costs": True, "Creator": True},
+        )
+        return _map_model_response(model)
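+    except HTTPException:
+        # Re-raise deliberate HTTP errors (e.g. the 404 above) instead of masking them as 500
+        raise
+    except ValueError as e: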
+        logger.warning(f"Model creation validation failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to create model: {e}")
+        raise HTTPException(status_code=500, detail="Failed to create model")
+
+
+@router.patch(
+    "/llm/models/{slug:path}",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def update_model(
+    slug: str,
+    request: UpdateLlmModelRequest,
+) -> dict[str, Any]:
+    """Update an existing LLM model.
+
+    Requires admin authentication.
+    """
+    try:
+        # Find model by slug first to get ID
+        import prisma.models
+
+        existing = await prisma.models.LlmModel.prisma().find_unique(
+            where={"slug": slug}
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Model with slug '{slug}' not found"
+            )
+
+        model = await db_write.update_model(
+            model_id=existing.id,
+            display_name=request.display_name,
+            description=request.description,
+            creator_id=request.creator_id,
+            context_window=request.context_window,
+            max_output_tokens=request.max_output_tokens,
+            price_tier=request.price_tier,
+            is_enabled=request.is_enabled,
+            is_recommended=request.is_recommended,
+            supports_tools=request.supports_tools,
+            supports_json_output=request.supports_json_output,
+            supports_reasoning=request.supports_reasoning,
+            supports_parallel_tool_calls=request.supports_parallel_tool_calls,
+            capabilities=request.capabilities,
+            metadata=request.metadata,
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(f"Updated model '{slug}' (id: {model.id})")
+        return _map_model_response(model)
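+    except HTTPException:
+        # Re-raise deliberate HTTP errors (e.g. the 404 above) instead of masking them as 500
+        raise
+    except ValueError as e: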
+        logger.warning(f"Model update validation failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to update model: {e}")
+        raise HTTPException(status_code=500, detail="Failed to update model")
+
+
+@router.delete(
+    "/llm/models/{slug:path}",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def delete_model(
+    slug: str,
+    replacement_model_slug: str | None = None,
+) -> dict[str, Any]:
+    """Delete an LLM model with optional migration.
+
+    If workflows are using this model and no replacement_model_slug is given,
+    returns 400 with the node count. Provide replacement_model_slug to migrate
+    affected nodes before deletion.
+    """
+    try:
+        import prisma.models
+
+        existing = await prisma.models.LlmModel.prisma().find_unique(
+            where={"slug": slug}
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Model with slug '{slug}' not found"
+            )
+
+        result = await db_write.delete_model(
+            model_id=existing.id,
+            replacement_model_slug=replacement_model_slug,
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(
+            f"Deleted model '{slug}' (migrated {result['nodes_migrated']} nodes)"
+        )
+        return result
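+    except HTTPException:
+        # Re-raise deliberate HTTP errors (e.g. the 404 above) instead of masking them as 500
+        raise
+    except ValueError as e: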
+        logger.warning(f"Model deletion validation failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to delete model: {e}")
+        raise HTTPException(status_code=500, detail="Failed to delete model")
+
+
+@router.get(
+    "/llm/models/{slug:path}/usage",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def get_model_usage(slug: str) -> dict[str, Any]:
+    """Get usage count for a model — how many workflow nodes reference it."""
+    try:
+        return await db_write.get_model_usage(slug)
+    except Exception as e:
+        logger.exception(f"Failed to get model usage: {e}")
+        raise HTTPException(status_code=500, detail="Failed to get model usage")
+
+
+@router.post(
+    "/llm/models/{slug:path}/toggle",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def toggle_model(
+    slug: str,
+    request: dict[str, Any],
+) -> dict[str, Any]:
+    """Toggle a model's enabled status with optional migration when disabling.
+
+    Body params:
+        is_enabled: bool (defaults to true when omitted)
+        migrate_to_slug: optional str
+        migration_reason: optional str
+        custom_credit_cost: optional int
+    """
+    try:
+        import prisma.models
+
+        existing = await prisma.models.LlmModel.prisma().find_unique(
+            where={"slug": slug}
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Model with slug '{slug}' not found"
+            )
+
+        result = await db_write.toggle_model_with_migration(
+            model_id=existing.id,
+            is_enabled=request.get("is_enabled", True),
+            migrate_to_slug=request.get("migrate_to_slug"),
+            migration_reason=request.get("migration_reason"),
+            custom_credit_cost=request.get("custom_credit_cost"),
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(
+            f"Toggled model '{slug}' enabled={request.get('is_enabled', True)} "
+            f"(migrated {result['nodes_migrated']} nodes)"
+        )
+        return result
+    except HTTPException:
+        # Let the 404 raised above propagate instead of becoming a 500.
+        raise
+    except ValueError as e:
+        logger.warning(f"Model toggle failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to toggle model: {e}")
+        raise HTTPException(status_code=500, detail="Failed to toggle model")
+
+
+@router.get(
+    "/llm/migrations",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def list_migrations(
+    include_reverted: bool = False,
+) -> dict[str, Any]:
+    """List model migrations."""
+    try:
+        migrations = await db_write.list_migrations(
+            include_reverted=include_reverted
+        )
+        return {"migrations": migrations}
+    except Exception as e:
+        logger.exception(f"Failed to list migrations: {e}")
+        raise HTTPException(
+            status_code=500, detail="Failed to list migrations"
+        )
+
+
+@router.post(
+    "/llm/migrations/{migration_id}/revert",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def revert_migration(
+    migration_id: str,
+    re_enable_source_model: bool = True,
+) -> dict[str, Any]:
+    """Revert a model migration, restoring affected nodes."""
+    try:
+        result = await db_write.revert_migration(
+            migration_id=migration_id,
+            re_enable_source_model=re_enable_source_model,
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(
+            f"Reverted migration {migration_id}: "
+            f"{result['nodes_reverted']} nodes restored"
+        )
+        return result
+    except ValueError as e:
+        logger.warning(f"Migration revert failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to revert migration: {e}")
+        raise HTTPException(
+            status_code=500, detail="Failed to revert migration"
+        )
+
+
+@router.post(
+    "/llm/providers",
+    status_code=status.HTTP_201_CREATED,
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def create_provider(
+    request: CreateLlmProviderRequest,
+) -> dict[str, Any]:
+    """Create a new LLM provider.
+
+    Requires admin authentication.
+    """
+    try:
+        provider = await db_write.create_provider(
+            name=request.name,
+            display_name=request.display_name,
+            description=request.description,
+            default_credential_provider=request.default_credential_provider,
+            default_credential_id=request.default_credential_id,
+            default_credential_type=request.default_credential_type,
+            metadata=request.metadata,
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(f"Created provider '{request.name}' (id: {provider.id})")
+        return _map_provider_response(provider)
+    except ValueError as e:
+        logger.warning(f"Provider creation validation failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to create provider: {e}")
+        raise HTTPException(status_code=500, detail="Failed to create provider")
+
+
+@router.patch(
+    "/llm/providers/{name}",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def update_provider(
+    name: str,
+    request: UpdateLlmProviderRequest,
+) -> dict[str, Any]:
+    """Update an existing LLM provider.
+
+    Requires admin authentication.
+    """
+    try:
+        # Find provider by name first to get ID
+        import prisma.models
+
+        existing = await prisma.models.LlmProvider.prisma().find_unique(
+            where={"name": name}
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Provider with name '{name}' not found"
+            )
+
+        provider = await db_write.update_provider(
+            provider_id=existing.id,
+            display_name=request.display_name,
+            description=request.description,
+            default_credential_provider=request.default_credential_provider,
+            default_credential_id=request.default_credential_id,
+            default_credential_type=request.default_credential_type,
+            metadata=request.metadata,
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(f"Updated provider '{name}' (id: {provider.id})")
+        return _map_provider_response(provider)
+    except HTTPException:
+        # Let the 404 raised above propagate instead of becoming a 500.
+        raise
+    except ValueError as e:
+        logger.warning(f"Provider update validation failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to update provider: {e}")
+        raise HTTPException(status_code=500, detail="Failed to update provider")
+
+
+@router.delete(
+    "/llm/providers/{name}",
+    status_code=status.HTTP_204_NO_CONTENT,
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def delete_provider(
+    name: str,
+) -> None:
+    """Delete an LLM provider.
+
+    Requires admin authentication.
+    A provider can only be deleted if it has no associated models.
+    """
+    try:
+        # Find provider by name first to get ID
+        import prisma.models
+
+        existing = await prisma.models.LlmProvider.prisma().find_unique(
+            where={"name": name}
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Provider with name '{name}' not found"
+            )
+
+        await db_write.delete_provider(provider_id=existing.id)
+        await db_write.refresh_runtime_caches()
+        logger.info(f"Deleted provider '{name}' (id: {existing.id})")
+    except HTTPException:
+        # Let the 404 raised above propagate instead of becoming a 500.
+        raise
+    except ValueError as e:
+        logger.warning(f"Provider deletion validation failed: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.exception(f"Failed to delete provider: {e}")
+        raise HTTPException(status_code=500, detail="Failed to delete provider")
+
+
+@router.get(
+    "/llm/admin/providers",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def admin_list_providers() -> dict[str, Any]:
+    """List all LLM providers from the database.
+
+    Unlike the public endpoint, this returns ALL providers including
+    those with no models. Requires admin authentication.
+    """
+    try:
+        import prisma.models
+
+        providers = await prisma.models.LlmProvider.prisma().find_many(
+            order={"name": "asc"},
+            include={"Models": True},
+        )
+        return {
+            "providers": [
+                {
+                    **_map_provider_response(p),
+                    "model_count": len(p.Models or []),
+                }
+                for p in providers
+            ]
+        }
+    except Exception as e:
+        logger.exception(f"Failed to list providers: {e}")
+        raise HTTPException(status_code=500, detail="Failed to list providers")
+
+
+@router.get(
+    "/llm/admin/models",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def admin_list_models(
+    page: int = 1,
+    page_size: int = 100,
+    enabled_only: bool = False,
+) -> dict[str, Any]:
+    """List all LLM models from the database.
+
+    Unlike the public endpoint, this returns full model data including
+    costs and creator info. Requires admin authentication.
+    """
+    try:
+        import prisma.models
+
+        where = {"isEnabled": True} if enabled_only else {}
+        models = await prisma.models.LlmModel.prisma().find_many(
+            where=where,
+            skip=(page - 1) * page_size,
+            take=page_size,
+            order={"displayName": "asc"},
+            include={"Costs": True, "Creator": True},
+        )
+        return {
+            "models": [
+                {
+                    **_map_model_response(m),
+                    "creator": _map_creator_response(m.Creator) if m.Creator else None,
+                    "costs": [
+                        {
+                            "unit": c.unit,
+                            "credit_cost": float(c.creditCost),
+                            "credential_provider": c.credentialProvider,
+                            "credential_type": c.credentialType,
+                            "metadata": dict(c.metadata or {}),
+                        }
+                        for c in (m.Costs or [])
+                    ],
+                }
+                for m in models
+            ]
+        }
+    except Exception as e:
+        logger.exception(f"Failed to list models: {e}")
+        raise HTTPException(status_code=500, detail="Failed to list models")
+
+
+@router.get(
+    "/llm/creators",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def list_creators() -> dict[str, Any]:
+    """List all LLM model creators.
+
+    Requires admin authentication.
+    """
+    try:
+        import prisma.models
+
+        creators = await prisma.models.LlmModelCreator.prisma().find_many(
+            order={"name": "asc"}
+        )
+        logger.info(f"Retrieved {len(creators)} creators")
+        return {"creators": [_map_creator_response(c) for c in creators]}
+    except Exception as e:
+        logger.exception(f"Failed to list creators: {e}")
+        raise HTTPException(status_code=500, detail="Failed to list creators")
+
+
+@router.post(
+    "/llm/creators",
+    status_code=status.HTTP_201_CREATED,
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def create_creator(
+    request: dict[str, Any],
+) -> dict[str, Any]:
+    """Create a new LLM model creator."""
+    try:
+        import prisma.models
+
+        creator = await prisma.models.LlmModelCreator.prisma().create(
+            data={
+                "name": request["name"],
+                "displayName": request["display_name"],
+                "description": request.get("description"),
+                "websiteUrl": request.get("website_url"),
+                "logoUrl": request.get("logo_url"),
+                "metadata": prisma.Json(request.get("metadata", {})),
+            }
+        )
+        logger.info(f"Created creator '{creator.name}' (id: {creator.id})")
+        return _map_creator_response(creator)
+    except Exception as e:
+        logger.exception(f"Failed to create creator: {e}")
+        raise HTTPException(status_code=500, detail="Failed to create creator")
+
+
+@router.patch(
+    "/llm/creators/{name}",
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def update_creator(
+    name: str,
+    request: dict[str, Any],
+) -> dict[str, Any]:
+    """Update an existing LLM model creator."""
+    try:
+        import prisma.models
+
+        existing = await prisma.models.LlmModelCreator.prisma().find_unique(
+            where={"name": name}
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Creator '{name}' not found"
+            )
+
+        data: dict[str, Any] = {}
+        if "display_name" in request:
+            data["displayName"] = request["display_name"]
+        if "description" in request:
+            data["description"] = request["description"]
+        if "website_url" in request:
+            data["websiteUrl"] = request["website_url"]
+        if "logo_url" in request:
+            data["logoUrl"] = request["logo_url"]
+
+        creator = await prisma.models.LlmModelCreator.prisma().update(
+            where={"id": existing.id},
+            data=data,
+        )
+        await db_write.refresh_runtime_caches()
+        logger.info(f"Updated creator '{name}' (id: {creator.id})")
+        return _map_creator_response(creator)
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.exception(f"Failed to update creator: {e}")
+        raise HTTPException(status_code=500, detail="Failed to update creator")
+
+
+@router.delete(
+    "/llm/creators/{name}",
+    status_code=status.HTTP_204_NO_CONTENT,
+    dependencies=[Security(autogpt_libs.auth.requires_admin_user)],
+)
+async def delete_creator(
+    name: str,
+) -> None:
+    """Delete an LLM model creator."""
+    try:
+        import prisma.models
+
+        existing = await prisma.models.LlmModelCreator.prisma().find_unique(
+            where={"name": name},
+            include={"Models": True},
+        )
+        if not existing:
+            raise HTTPException(
+                status_code=404, detail=f"Creator '{name}' not found"
+            )
+
+        if existing.Models:
+            raise HTTPException(
+                status_code=400,
+                detail=(
+                    f"Cannot delete creator '{name}': it has "
+                    f"{len(existing.Models)} associated model(s)"
+                ),
+            )
+
+        await prisma.models.LlmModelCreator.prisma().delete(
+            where={"id": existing.id}
+        )
+        logger.info(f"Deleted creator '{name}' (id: {existing.id})")
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.exception(f"Failed to delete creator: {e}")
+        raise HTTPException(status_code=500, detail="Failed to delete creator")
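
The creator endpoints above re-raise `HTTPException` before their catch-all `except Exception` handler. A minimal sketch of why the handler order matters, assuming a stand-in exception class (`HTTPError` and the helper names below are hypothetical and only for illustration; they are not FastAPI's API):

```python
class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP exception carrying a status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def lookup_without_passthrough(found: bool) -> str:
    try:
        if not found:
            raise HTTPError(404, "not found")
        return "ok"
    except Exception:
        # The broad handler also catches the 404 raised above,
        # so the caller sees a 500 instead.
        raise HTTPError(500, "internal error")


def lookup_with_passthrough(found: bool) -> str:
    try:
        if not found:
            raise HTTPError(404, "not found")
        return "ok"
    except HTTPError:
        # Deliberate HTTP errors propagate unchanged.
        raise
    except Exception:
        raise HTTPError(500, "internal error")


def status_of(fn, found: bool) -> int:
    """Return the status code a caller would observe."""
    try:
        fn(found)
        return 200
    except HTTPError as e:
        return e.status_code
```

With this sketch, `status_of(lookup_without_passthrough, False)` yields 500 while `status_of(lookup_with_passthrough, False)` yields 404; both return 200 when the record is found.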
--- a/autogpt_platform/backend/backend/server/v2/llm/db_write.py
+++ b/autogpt_platform/backend/backend/server/v2/llm/db_write.py
@@ -0,0 +1,588 @@
+"""Database write operations for LLM registry admin API."""
+
+import json
+import logging
+from datetime import datetime, timezone
+from typing import Any
+
+import prisma
+import prisma.models
+
+from backend.data import llm_registry
+from backend.data.db import transaction
+
+logger = logging.getLogger(__name__)
+
+
+def _build_provider_data(
+    name: str,
+    display_name: str,
+    description: str | None = None,
+    default_credential_provider: str | None = None,
+    default_credential_id: str | None = None,
+    default_credential_type: str | None = None,
+    metadata: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    """Build provider data dict for Prisma operations."""
+    return {
+        "name": name,
+        "displayName": display_name,
+        "description": description,
+        "defaultCredentialProvider": default_credential_provider,
+        "defaultCredentialId": default_credential_id,
+        "defaultCredentialType": default_credential_type,
+        "metadata": prisma.Json(metadata or {}),
+    }
+
+
+def _build_model_data(
+    slug: str,
+    display_name: str,
+    provider_id: str,
+    context_window: int,
+    price_tier: int,
+    description: str | None = None,
+    creator_id: str | None = None,
+    max_output_tokens: int | None = None,
+    is_enabled: bool = True,
+    is_recommended: bool = False,
+    supports_tools: bool = False,
+    supports_json_output: bool = False,
+    supports_reasoning: bool = False,
+    supports_parallel_tool_calls: bool = False,
+    capabilities: dict[str, Any] | None = None,
+    metadata: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    """Build model data dict for Prisma operations."""
+    data: dict[str, Any] = {
+        "slug": slug,
+        "displayName": display_name,
+        "description": description,
+        "Provider": {"connect": {"id": provider_id}},
+        "contextWindow": context_window,
+        "maxOutputTokens": max_output_tokens,
+        "priceTier": price_tier,
+        "isEnabled": is_enabled,
+        "isRecommended": is_recommended,
+        "supportsTools": supports_tools,
+        "supportsJsonOutput": supports_json_output,
+        "supportsReasoning": supports_reasoning,
+        "supportsParallelToolCalls": supports_parallel_tool_calls,
+        "capabilities": prisma.Json(capabilities or {}),
+        "metadata": prisma.Json(metadata or {}),
+    }
+    if creator_id:
+        data["Creator"] = {"connect": {"id": creator_id}}
+    return data
+
+
+async def create_provider(
+    name: str,
+    display_name: str,
+    description: str | None = None,
+    default_credential_provider: str | None = None,
+    default_credential_id: str | None = None,
+    default_credential_type: str | None = None,
+    metadata: dict[str, Any] | None = None,
+) -> prisma.models.LlmProvider:
+    """Create a new LLM provider."""
+    data = _build_provider_data(
+        name=name,
+        display_name=display_name,
+        description=description,
+        default_credential_provider=default_credential_provider,
+        default_credential_id=default_credential_id,
+        default_credential_type=default_credential_type,
+        metadata=metadata,
+    )
+    provider = await prisma.models.LlmProvider.prisma().create(
+        data=data,
+        include={"Models": True},
+    )
+    if not provider:
+        raise ValueError("Failed to create provider")
+    return provider
+
+
+async def update_provider(
+    provider_id: str,
+    display_name: str | None = None,
+    description: str | None = None,
+    default_credential_provider: str | None = None,
+    default_credential_id: str | None = None,
+    default_credential_type: str | None = None,
+    metadata: dict[str, Any] | None = None,
+) -> prisma.models.LlmProvider:
+    """Update an existing LLM provider."""
+    # Verify the provider exists before building the update
+    provider = await prisma.models.LlmProvider.prisma().find_unique(
+        where={"id": provider_id}
+    )
+    if not provider:
+        raise ValueError(f"Provider with id '{provider_id}' not found")
+
+    # Build update data (only include fields that are provided)
+    data: dict[str, Any] = {}
+    if display_name is not None:
+        data["displayName"] = display_name
+    if description is not None:
+        data["description"] = description
+    if default_credential_provider is not None:
+        data["defaultCredentialProvider"] = default_credential_provider
+    if default_credential_id is not None:
+        data["defaultCredentialId"] = default_credential_id
+    if default_credential_type is not None:
+        data["defaultCredentialType"] = default_credential_type
+    if metadata is not None:
+        data["metadata"] = prisma.Json(metadata)
+
+    updated = await prisma.models.LlmProvider.prisma().update(
+        where={"id": provider_id},
+        data=data,
+        include={"Models": True},
+    )
+    if not updated:
+        raise ValueError("Failed to update provider")
+    return updated
+
+
+async def delete_provider(provider_id: str) -> bool:
+    """Delete an LLM provider.
+
+    A provider can only be deleted if it has no associated models.
+    """
+    # Check if provider exists
+    provider = await prisma.models.LlmProvider.prisma().find_unique(
+        where={"id": provider_id},
+        include={"Models": True},
+    )
+    if not provider:
+        raise ValueError(f"Provider with id '{provider_id}' not found")
+
+    # Check if provider has any models
+    model_count = len(provider.Models) if provider.Models else 0
+    if model_count > 0:
+        raise ValueError(
+            f"Cannot delete provider '{provider.displayName}' because it has "
+            f"{model_count} model(s). Delete all models first."
+        )
+
+    await prisma.models.LlmProvider.prisma().delete(where={"id": provider_id})
+    return True
+
+
+async def create_model(
+    slug: str,
+    display_name: str,
+    provider_id: str,
+    context_window: int,
+    price_tier: int,
+    description: str | None = None,
+    creator_id: str | None = None,
+    max_output_tokens: int | None = None,
+    is_enabled: bool = True,
+    is_recommended: bool = False,
+    supports_tools: bool = False,
+    supports_json_output: bool = False,
+    supports_reasoning: bool = False,
+    supports_parallel_tool_calls: bool = False,
+    capabilities: dict[str, Any] | None = None,
+    metadata: dict[str, Any] | None = None,
+) -> prisma.models.LlmModel:
+    """Create a new LLM model."""
+    data = _build_model_data(
+        slug=slug,
+        display_name=display_name,
+        provider_id=provider_id,
+        context_window=context_window,
+        price_tier=price_tier,
+        description=description,
+        creator_id=creator_id,
+        max_output_tokens=max_output_tokens,
+        is_enabled=is_enabled,
+        is_recommended=is_recommended,
+        supports_tools=supports_tools,
+        supports_json_output=supports_json_output,
+        supports_reasoning=supports_reasoning,
+        supports_parallel_tool_calls=supports_parallel_tool_calls,
+        capabilities=capabilities,
+        metadata=metadata,
+    )
+    model = await prisma.models.LlmModel.prisma().create(
+        data=data,
+        include={"Costs": True, "Creator": True, "Provider": True},
+    )
+    if not model:
+        raise ValueError("Failed to create model")
+    return model
+
+
+async def update_model(
+    model_id: str,
+    display_name: str | None = None,
+    description: str | None = None,
+    creator_id: str | None = None,
+    context_window: int | None = None,
+    max_output_tokens: int | None = None,
+    price_tier: int | None = None,
+    is_enabled: bool | None = None,
+    is_recommended: bool | None = None,
+    supports_tools: bool | None = None,
+    supports_json_output: bool | None = None,
+    supports_reasoning: bool | None = None,
+    supports_parallel_tool_calls: bool | None = None,
+    capabilities: dict[str, Any] | None = None,
+    metadata: dict[str, Any] | None = None,
+) -> prisma.models.LlmModel:
+    """Update an existing LLM model.
+
+    When is_recommended=True, clears the flag on all other models first so
+    only one model can be recommended at a time.
+    """
+    # Build update data (only include fields that are provided)
+    data: dict[str, Any] = {}
+    if display_name is not None:
+        data["displayName"] = display_name
+    if description is not None:
+        data["description"] = description
+    if context_window is not None:
+        data["contextWindow"] = context_window
+    if max_output_tokens is not None:
+        data["maxOutputTokens"] = max_output_tokens
+    if price_tier is not None:
+        data["priceTier"] = price_tier
+    if is_enabled is not None:
+        data["isEnabled"] = is_enabled
+    if is_recommended is not None:
+        data["isRecommended"] = is_recommended
+    if supports_tools is not None:
+        data["supportsTools"] = supports_tools
+    if supports_json_output is not None:
+        data["supportsJsonOutput"] = supports_json_output
+    if supports_reasoning is not None:
+        data["supportsReasoning"] = supports_reasoning
+    if supports_parallel_tool_calls is not None:
+        data["supportsParallelToolCalls"] = supports_parallel_tool_calls
+    if capabilities is not None:
+        data["capabilities"] = prisma.Json(capabilities)
+    if metadata is not None:
+        data["metadata"] = prisma.Json(metadata)
+    if creator_id is not None:
+        data["creatorId"] = creator_id or None  # "" clears the creator
+
+    async with transaction() as tx:
+        # Enforce single recommended model: unset all others first.
+        if is_recommended is True:
+            await tx.llmmodel.update_many(
+                where={"id": {"not": model_id}, "isRecommended": True},
+                data={"isRecommended": False},
+            )
+
+        model = await tx.llmmodel.update(
+            where={"id": model_id},
+            data=data,
+            include={"Costs": True, "Creator": True, "Provider": True},
+        )
+
+    if not model:
+        raise ValueError(f"Model with id '{model_id}' not found")
+    return model
+
+
+async def get_model_usage(slug: str) -> dict[str, Any]:
+    """Get usage count for a model — how many AgentNodes reference it."""
+    import prisma as prisma_module
+
+    count_result = await prisma_module.get_client().query_raw(
+        """
+        SELECT COUNT(*) as count
+        FROM "AgentNode"
+        WHERE "constantInput"::jsonb->>'model' = $1
+        """,
+        slug,
+    )
+    node_count = int(count_result[0]["count"]) if count_result else 0
+    return {"model_slug": slug, "node_count": node_count}
+
+
+async def toggle_model_with_migration(
+    model_id: str,
+    is_enabled: bool,
+    migrate_to_slug: str | None = None,
+    migration_reason: str | None = None,
+    custom_credit_cost: int | None = None,
+) -> dict[str, Any]:
+    """Toggle a model's enabled status, optionally migrating workflows when disabling."""
+    model = await prisma.models.LlmModel.prisma().find_unique(
+        where={"id": model_id}, include={"Costs": True}
+    )
+    if not model:
+        raise ValueError(f"Model with id '{model_id}' not found")
+
+    nodes_migrated = 0
+    migration_id: str | None = None
+
+    if not is_enabled and migrate_to_slug:
+        async with transaction() as tx:
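+            if migrate_to_slug == model.slug:
+                raise ValueError(
+                    "Replacement model must differ from the model being disabled."
+                )
+            replacement = await tx.llmmodel.find_unique(
+                where={"slug": migrate_to_slug}
+            )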
+            if not replacement:
+                raise ValueError(
+                    f"Replacement model '{migrate_to_slug}' not found"
+                )
+            if not replacement.isEnabled:
+                raise ValueError(
+                    f"Replacement model '{migrate_to_slug}' is disabled. "
+                    f"Please enable it before using it as a replacement."
+                )
+
+            node_ids_result = await tx.query_raw(
+                """
+                SELECT id
+                FROM "AgentNode"
+                WHERE "constantInput"::jsonb->>'model' = $1
+                FOR UPDATE
+                """,
+                model.slug,
+            )
+            migrated_node_ids = (
+                [row["id"] for row in node_ids_result] if node_ids_result else []
+            )
+            nodes_migrated = len(migrated_node_ids)
+
+            if nodes_migrated > 0:
+                node_ids_json = json.dumps(migrated_node_ids)
+                await tx.execute_raw(
+                    """
+                    UPDATE "AgentNode"
+                    SET "constantInput" = JSONB_SET(
+                        "constantInput"::jsonb,
+                        '{model}',
+                        to_jsonb($1::text)
+                    )
+                    WHERE id::text IN (
+                        SELECT jsonb_array_elements_text($2::jsonb)
+                    )
+                    """,
+                    migrate_to_slug,
+                    node_ids_json,
+                )
+
+            await tx.llmmodel.update(
+                where={"id": model_id},
+                data={"isEnabled": is_enabled},
+            )
+
+            if nodes_migrated > 0:
+                migration_record = await tx.llmmodelmigration.create(
+                    data={
+                        "sourceModelSlug": model.slug,
+                        "targetModelSlug": migrate_to_slug,
+                        "reason": migration_reason,
+                        "migratedNodeIds": json.dumps(migrated_node_ids),
+                        "nodeCount": nodes_migrated,
+                        "customCreditCost": custom_credit_cost,
+                    }
+                )
+                migration_id = migration_record.id
+    else:
+        await prisma.models.LlmModel.prisma().update(
+            where={"id": model_id},
+            data={"isEnabled": is_enabled},
+        )
+
+    return {
+        "nodes_migrated": nodes_migrated,
+        "migrated_to_slug": migrate_to_slug if nodes_migrated > 0 else None,
+        "migration_id": migration_id,
+    }
+
+
+async def delete_model(
+    model_id: str, replacement_model_slug: str | None = None
+) -> dict[str, Any]:
+    """Delete an LLM model, optionally migrating affected AgentNodes first.
+
+    If workflows are using this model and no replacement is given, raises ValueError.
+    If replacement is given, atomically migrates all affected nodes then deletes.
+    """
+    model = await prisma.models.LlmModel.prisma().find_unique(
+        where={"id": model_id}, include={"Costs": True}
+    )
+    if not model:
+        raise ValueError(f"Model with id '{model_id}' not found")
+
+    deleted_slug = model.slug
+    deleted_display_name = model.displayName
+
+    async with transaction() as tx:
+        count_result = await tx.query_raw(
+            """
+            SELECT COUNT(*) as count
+            FROM "AgentNode"
+            WHERE "constantInput"::jsonb->>'model' = $1
+            """,
+            deleted_slug,
+        )
+        nodes_to_migrate = int(count_result[0]["count"]) if count_result else 0
+
+        if nodes_to_migrate > 0:
+            if not replacement_model_slug:
+                raise ValueError(
+                    f"Cannot delete model '{deleted_slug}': {nodes_to_migrate} workflow node(s) "
+                    f"are using it. Please provide a replacement_model_slug to migrate them."
+                )
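+            if replacement_model_slug == deleted_slug:
+                raise ValueError(
+                    "Replacement model must differ from the model being deleted."
+                )
+            replacement = await tx.llmmodel.find_unique(
+                where={"slug": replacement_model_slug}
+            )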
+            if not replacement:
+                raise ValueError(
+                    f"Replacement model '{replacement_model_slug}' not found"
+                )
+            if not replacement.isEnabled:
+                raise ValueError(
+                    f"Replacement model '{replacement_model_slug}' is disabled."
+                )
+
+            await tx.execute_raw(
+                """
+                UPDATE "AgentNode"
+                SET "constantInput" = JSONB_SET(
+                    "constantInput"::jsonb,
+                    '{model}',
+                    to_jsonb($1::text)
+                )
+                WHERE "constantInput"::jsonb->>'model' = $2
+                """,
+                replacement_model_slug,
+                deleted_slug,
+            )
+
+        await tx.llmmodel.delete(where={"id": model_id})
+
+    return {
+        "deleted_model_slug": deleted_slug,
+        "deleted_model_display_name": deleted_display_name,
+        "replacement_model_slug": replacement_model_slug,
+        "nodes_migrated": nodes_to_migrate,
+    }
+
+
+async def list_migrations(
+    include_reverted: bool = False,
+) -> list[dict[str, Any]]:
+    """List model migrations."""
+    where: Any = None if include_reverted else {"isReverted": False}
+    records = await prisma.models.LlmModelMigration.prisma().find_many(
+        where=where,
+        order={"createdAt": "desc"},
+    )
+    return [
+        {
+            "id": r.id,
+            "source_model_slug": r.sourceModelSlug,
+            "target_model_slug": r.targetModelSlug,
+            "reason": r.reason,
+            "node_count": r.nodeCount,
+            "custom_credit_cost": r.customCreditCost,
+            "is_reverted": r.isReverted,
+            "reverted_at": r.revertedAt.isoformat() if r.revertedAt else None,
+            "created_at": r.createdAt.isoformat(),
+        }
+        for r in records
+    ]
+
+
+async def revert_migration(
+    migration_id: str,
+    re_enable_source_model: bool = True,
+) -> dict[str, Any]:
+    """Revert a model migration, restoring affected nodes to their original model."""
+    migration = await prisma.models.LlmModelMigration.prisma().find_unique(
+        where={"id": migration_id}
+    )
+    if not migration:
+        raise ValueError(f"Migration with id '{migration_id}' not found")
+
+    if migration.isReverted:
+        raise ValueError(
+            f"Migration '{migration_id}' has already been reverted"
+        )
+
+    source_model = await prisma.models.LlmModel.prisma().find_unique(
+        where={"slug": migration.sourceModelSlug}
+    )
+    if not source_model:
+        raise ValueError(
+            f"Source model '{migration.sourceModelSlug}' no longer exists."
+        )
+
+    migrated_node_ids: list[str] = (
+        migration.migratedNodeIds
+        if isinstance(migration.migratedNodeIds, list)
+        else json.loads(migration.migratedNodeIds)  # type: ignore
+    )
+    if not migrated_node_ids:
+        raise ValueError("No nodes to revert in this migration")
+
+    source_model_re_enabled = False
+
+    async with transaction() as tx:
+        if not source_model.isEnabled and re_enable_source_model:
+            await tx.llmmodel.update(
+                where={"id": source_model.id},
+                data={"isEnabled": True},
+            )
+            source_model_re_enabled = True
+
+        node_ids_json = json.dumps(migrated_node_ids)
+        result = await tx.execute_raw(
+            """
+            UPDATE "AgentNode"
+            SET "constantInput" = JSONB_SET(
+                "constantInput"::jsonb,
+                '{model}',
+                to_jsonb($1::text)
+            )
+            WHERE id::text IN (
+                SELECT jsonb_array_elements_text($2::jsonb)
+            )
+            AND "constantInput"::jsonb->>'model' = $3
+            """,
+            migration.sourceModelSlug,
+            node_ids_json,
+            migration.targetModelSlug,
+        )
+        nodes_reverted = result if isinstance(result, int) else 0
+
+        await tx.llmmodelmigration.update(
+            where={"id": migration_id},
+            data={
+                "isReverted": True,
+                "revertedAt": datetime.now(timezone.utc),
+            },
+        )
+
+    return {
+        "migration_id": migration_id,
+        "source_model_slug": migration.sourceModelSlug,
+        "target_model_slug": migration.targetModelSlug,
+        "nodes_reverted": nodes_reverted,
+        "nodes_already_changed": len(migrated_node_ids) - nodes_reverted,
+        "source_model_re_enabled": source_model_re_enabled,
+    }
+
+
+async def refresh_runtime_caches() -> None:
+    """Invalidate the shared Redis cache, refresh this process, notify other workers."""
+    from backend.data.llm_registry.notifications import (
+        publish_registry_refresh_notification,
+    )
+
+    # Invalidate Redis so the next fetch hits the DB.
+    llm_registry.clear_registry_cache()
+    # Refresh this process (also repopulates Redis via @cached(shared_cache=True)).
+    await llm_registry.refresh_llm_registry()
+    # Tell other workers to reload their in-process cache from the fresh Redis data.
+    await publish_registry_refresh_notification()
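Review note: the ordering in `refresh_runtime_caches` (invalidate, refresh locally, then notify) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `SharedCache`, `Worker`, and `demo` are hypothetical stand-ins for Redis, a backend process, and an admin mutation, and the "notification" is modelled as a direct call rather than real pub/sub.

```python
import asyncio


class SharedCache:
    """In-memory stand-in for the shared Redis cache (hypothetical)."""

    def __init__(self) -> None:
        self.data: dict[str, list[str]] = {}
        self.db_reads = 0  # counts how often the "database" was queried


class Worker:
    """One backend process holding its own in-memory registry copy."""

    def __init__(self, shared: SharedCache, db: dict[str, list[str]]) -> None:
        self.shared = shared
        self.db = db
        self.local: list[str] = []

    async def load(self) -> None:
        # Read-through: only query the "database" when the shared cache is empty.
        if "models" not in self.shared.data:
            self.shared.db_reads += 1
            self.shared.data["models"] = list(self.db["models"])
        self.local = list(self.shared.data["models"])


async def refresh_runtime_caches(origin: Worker, others: list[Worker]) -> None:
    origin.shared.data.pop("models", None)  # 1. invalidate the shared cache
    await origin.load()  # 2. refresh this process; repopulates the shared cache
    await asyncio.gather(*(w.load() for w in others))  # 3. "notify" the other workers


async def demo() -> tuple[list[str], list[str], int]:
    db = {"models": ["gpt-4o"]}
    shared = SharedCache()
    admin, peer = Worker(shared, db), Worker(shared, db)
    await admin.load()
    await peer.load()
    db["models"].append("claude-sonnet-4-6")  # simulated admin mutation
    await refresh_runtime_caches(admin, [peer])
    return admin.local, peer.local, shared.db_reads
```

In this simulation the database is read once at startup and once after the mutation; the peer worker picks up the new model from the shared cache without a database read of its own, which is the property the commit message describes.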
--- a/autogpt_platform/backend/backend/server/v2/llm/model.py
+++ b/autogpt_platform/backend/backend/server/v2/llm/model.py
@@ -0,0 +1,68 @@
+"""Pydantic models for LLM registry public API."""
+
+from __future__ import annotations
+
+from typing import Any
+
+import pydantic
+
+
+class LlmModelCost(pydantic.BaseModel):
+    """Cost configuration for an LLM model."""
+
+    unit: str  # "RUN" or "TOKENS"
+    credit_cost: int = pydantic.Field(ge=0)
+    credential_provider: str
+    credential_id: str | None = None
+    credential_type: str | None = None
+    currency: str | None = None
+    metadata: dict[str, Any] = pydantic.Field(default_factory=dict)
+
+
+class LlmModelCreator(pydantic.BaseModel):
+    """Represents the organization that created/trained the model."""
+
+    id: str
+    name: str
+    display_name: str
+    description: str | None = None
+    website_url: str | None = None
+    logo_url: str | None = None
+
+
+class LlmModel(pydantic.BaseModel):
+    """Public-facing LLM model information."""
+
+    slug: str
+    display_name: str
+    description: str | None = None
+    provider_name: str
+    creator: LlmModelCreator | None = None
+    context_window: int
+    max_output_tokens: int | None = None
+    price_tier: int  # 1=cheapest, 2=medium, 3=expensive
+    is_enabled: bool = True
+    is_recommended: bool = False
+    capabilities: dict[str, Any] = pydantic.Field(default_factory=dict)
+    costs: list[LlmModelCost] = pydantic.Field(default_factory=list)
+
+
+class LlmProvider(pydantic.BaseModel):
+    """Provider with its enabled models."""
+
+    name: str
+    display_name: str
+    models: list[LlmModel] = pydantic.Field(default_factory=list)
+
+
+class LlmModelsResponse(pydantic.BaseModel):
+    """Response for GET /llm/models."""
+
+    models: list[LlmModel]
+    total: int
+
+
+class LlmProvidersResponse(pydantic.BaseModel):
+    """Response for GET /llm/providers."""
+
+    providers: list[LlmProvider]
--- a/autogpt_platform/backend/backend/server/v2/llm/routes.py
+++ b/autogpt_platform/backend/backend/server/v2/llm/routes.py
@@ -0,0 +1,143 @@
+"""Public read-only API for LLM registry."""
+
+import autogpt_libs.auth
+import fastapi
+
+from backend.data.llm_registry import (
+    RegistryModelCreator,
+    get_all_models,
+    get_enabled_models,
+)
+from backend.server.v2.llm import model as llm_model
+
+router = fastapi.APIRouter(
+    prefix="/llm",
+    tags=["llm"],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+)
+
+
+def _map_creator(
+    creator: RegistryModelCreator | None,
+) -> llm_model.LlmModelCreator | None:
+    """Convert registry creator to API model."""
+    if not creator:
+        return None
+    return llm_model.LlmModelCreator(
+        id=creator.id,
+        name=creator.name,
+        display_name=creator.display_name,
+        description=creator.description,
+        website_url=creator.website_url,
+        logo_url=creator.logo_url,
+    )
+
+
+@router.get("/models", response_model=llm_model.LlmModelsResponse)
+async def list_models(
+    enabled_only: bool = fastapi.Query(
+        default=True, description="Only return enabled models"
+    ),
+):
+    """
+    List all LLM models available to users.
+
+    Returns models from the in-memory registry cache.
+    Only enabled models are returned unless enabled_only=false is passed.
+    """
+    # Get models from in-memory registry
+    registry_models = get_enabled_models() if enabled_only else get_all_models()
+
+    # Map to API response models
+    models = [
+        llm_model.LlmModel(
+            slug=model.slug,
+            display_name=model.display_name,
+            description=model.description,
+            provider_name=model.provider_display_name,
+            creator=_map_creator(model.creator),
+            context_window=model.metadata.context_window,
+            max_output_tokens=model.metadata.max_output_tokens,
+            price_tier=model.metadata.price_tier,
+            is_enabled=model.is_enabled,
+            is_recommended=model.is_recommended,
+            capabilities=model.capabilities,
+            costs=[
+                llm_model.LlmModelCost(
+                    unit=cost.unit,
+                    credit_cost=cost.credit_cost,
+                    credential_provider=cost.credential_provider,
+                    credential_id=cost.credential_id,
+                    credential_type=cost.credential_type,
+                    currency=cost.currency,
+                    metadata=cost.metadata,
+                )
+                for cost in model.costs
+            ],
+        )
+        for model in registry_models
+    ]
+
+    return llm_model.LlmModelsResponse(models=models, total=len(models))
+
+
+@router.get("/providers", response_model=llm_model.LlmProvidersResponse)
+async def list_providers():
+    """
+    List all LLM providers with their enabled models.
+
+    Groups enabled models by provider from the in-memory registry.
+    """
+    # Get all enabled models from the in-memory registry
+    registry_models = get_enabled_models()
+
+    # Group models by provider
+    provider_map: dict[str, list] = {}
+    for model in registry_models:
+        provider_key = model.metadata.provider
+        if provider_key not in provider_map:
+            provider_map[provider_key] = []
+        provider_map[provider_key].append(model)
+
+    # Build provider responses
+    providers = []
+    for provider_key, models in sorted(provider_map.items()):
+        # Use the first model's provider display name
+        display_name = models[0].provider_display_name if models else provider_key
+
+        providers.append(
+            llm_model.LlmProvider(
+                name=provider_key,
+                display_name=display_name,
+                models=[
+                    llm_model.LlmModel(
+                        slug=model.slug,
+                        display_name=model.display_name,
+                        description=model.description,
+                        provider_name=model.provider_display_name,
+                        creator=_map_creator(model.creator),
+                        context_window=model.metadata.context_window,
+                        max_output_tokens=model.metadata.max_output_tokens,
+                        price_tier=model.metadata.price_tier,
+                        is_enabled=model.is_enabled,
+                        is_recommended=model.is_recommended,
+                        capabilities=model.capabilities,
+                        costs=[
+                            llm_model.LlmModelCost(
+                                unit=cost.unit,
+                                credit_cost=cost.credit_cost,
+                                credential_provider=cost.credential_provider,
+                                credential_id=cost.credential_id,
+                                credential_type=cost.credential_type,
+                                currency=cost.currency,
+                                metadata=cost.metadata,
+                            )
+                            for cost in model.costs
+                        ],
+                    )
+                    for model in sorted(models, key=lambda m: m.display_name)
+                ],
+            )
+        )
+
+    return llm_model.LlmProvidersResponse(providers=providers)
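Review note: the grouping done in `list_providers` (bucket by provider key, sort providers by key, sort each provider's models by display name) can be sketched in isolation as follows. The `Model` dataclass and `group_by_provider` helper are hypothetical, trimmed-down illustrations and not part of this PR; the real route builds `llm_model.LlmProvider` objects instead of plain dicts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical, trimmed-down stand-in for a registry model."""

    slug: str
    display_name: str
    provider: str
    provider_display_name: str


def group_by_provider(models: list[Model]) -> list[dict]:
    # Bucket models by their provider key.
    grouped: dict[str, list[Model]] = defaultdict(list)
    for m in models:
        grouped[m.provider].append(m)

    # Providers are emitted in key order; models in display-name order.
    return [
        {
            "name": key,
            "display_name": members[0].provider_display_name,
            "models": [
                m.slug for m in sorted(members, key=lambda m: m.display_name)
            ],
        }
        for key, members in sorted(grouped.items())
    ]
```

Using `defaultdict(list)` removes the explicit "key not in map" check, and because every bucket is created by an append, the first-element lookup for the provider display name is always safe.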
--- a/autogpt_platform/backend/backend/util/prompt_test.py
+++ b/autogpt_platform/backend/backend/util/prompt_test.py
@@ -612,7 +612,7 @@ class TestEnsureToolPairsIntact:
    # ---- Mixed/Edge Case Tests ----

    def test_anthropic_with_type_message_field(self):
-        """Test Anthropic format with 'type': 'message' field (orchestrator style)."""
+        """Test Anthropic format with 'type': 'message' field (smart_decision_maker style)."""
        all_msgs = [
            {"role": "system", "content": "You are helpful."},
            {
@@ -628,7 +628,7 @@ class TestEnsureToolPairsIntact:
            },
            {
                "role": "user",
-                "type": "message",  # Extra field from orchestrator
+                "type": "message",  # Extra field from smart_decision_maker
                "content": [
                    {
                        "type": "tool_result",
--- a/autogpt_platform/backend/migrations/20260310_add_llm_registry_schema/migration.sql
+++ b/autogpt_platform/backend/migrations/20260310_add_llm_registry_schema/migration.sql
@@ -0,0 +1,148 @@
+-- CreateEnum
+CREATE TYPE "LlmCostUnit" AS ENUM ('RUN', 'TOKENS');
+
+-- CreateTable
+CREATE TABLE "LlmProvider" (
+    "id" TEXT NOT NULL,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "updatedAt" TIMESTAMP(3) NOT NULL,
+    "name" TEXT NOT NULL,
+    "displayName" TEXT NOT NULL,
+    "description" TEXT,
+    "defaultCredentialProvider" TEXT,
+    "defaultCredentialId" TEXT,
+    "defaultCredentialType" TEXT,
+    "metadata" JSONB NOT NULL DEFAULT '{}',
+
+    CONSTRAINT "LlmProvider_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateTable
+CREATE TABLE "LlmModelCreator" (
+    "id" TEXT NOT NULL,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "updatedAt" TIMESTAMP(3) NOT NULL,
+    "name" TEXT NOT NULL,
+    "displayName" TEXT NOT NULL,
+    "description" TEXT,
+    "websiteUrl" TEXT,
+    "logoUrl" TEXT,
+    "metadata" JSONB NOT NULL DEFAULT '{}',
+
+    CONSTRAINT "LlmModelCreator_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateTable
+CREATE TABLE "LlmModel" (
+    "id" TEXT NOT NULL,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "updatedAt" TIMESTAMP(3) NOT NULL,
+    "slug" TEXT NOT NULL,
+    "displayName" TEXT NOT NULL,
+    "description" TEXT,
+    "providerId" TEXT NOT NULL,
+    "creatorId" TEXT,
+    "contextWindow" INTEGER NOT NULL,
+    "maxOutputTokens" INTEGER,
+    "priceTier" INTEGER NOT NULL DEFAULT 1,
+    "isEnabled" BOOLEAN NOT NULL DEFAULT true,
+    "isRecommended" BOOLEAN NOT NULL DEFAULT false,
+    "supportsTools" BOOLEAN NOT NULL DEFAULT false,
+    "supportsJsonOutput" BOOLEAN NOT NULL DEFAULT false,
+    "supportsReasoning" BOOLEAN NOT NULL DEFAULT false,
+    "supportsParallelToolCalls" BOOLEAN NOT NULL DEFAULT false,
+    "capabilities" JSONB NOT NULL DEFAULT '{}',
+    "metadata" JSONB NOT NULL DEFAULT '{}',
+
+    CONSTRAINT "LlmModel_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateTable
+CREATE TABLE "LlmModelCost" (
+    "id" TEXT NOT NULL,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "updatedAt" TIMESTAMP(3) NOT NULL,
+    "unit" "LlmCostUnit" NOT NULL DEFAULT 'RUN',
+    "creditCost" INTEGER NOT NULL,
+    "credentialProvider" TEXT NOT NULL,
+    "credentialId" TEXT,
+    "credentialType" TEXT,
+    "currency" TEXT,
+    "metadata" JSONB NOT NULL DEFAULT '{}',
+    "llmModelId" TEXT NOT NULL,
+
+    CONSTRAINT "LlmModelCost_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateTable
+CREATE TABLE "LlmModelMigration" (
+    "id" TEXT NOT NULL,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "updatedAt" TIMESTAMP(3) NOT NULL,
+    "sourceModelSlug" TEXT NOT NULL,
+    "targetModelSlug" TEXT NOT NULL,
+    "reason" TEXT,
+    "migratedNodeIds" JSONB NOT NULL DEFAULT '[]',
+    "nodeCount" INTEGER NOT NULL,
+    "customCreditCost" INTEGER,
+    "isReverted" BOOLEAN NOT NULL DEFAULT false,
+    "revertedAt" TIMESTAMP(3),
+
+    CONSTRAINT "LlmModelMigration_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateIndex
+CREATE UNIQUE INDEX "LlmProvider_name_key" ON "LlmProvider"("name");
+
+-- CreateIndex
+CREATE UNIQUE INDEX "LlmModelCreator_name_key" ON "LlmModelCreator"("name");
+
+-- CreateIndex
+CREATE UNIQUE INDEX "LlmModel_slug_key" ON "LlmModel"("slug");
+
+-- CreateIndex
+CREATE INDEX "LlmModel_providerId_isEnabled_idx" ON "LlmModel"("providerId", "isEnabled");
+
+-- CreateIndex
+CREATE INDEX "LlmModel_creatorId_idx" ON "LlmModel"("creatorId");
+
+-- CreateIndex (partial unique for default costs - no specific credential)
+CREATE UNIQUE INDEX "LlmModelCost_default_cost_key" ON "LlmModelCost"("llmModelId", "credentialProvider", "unit") WHERE "credentialId" IS NULL;
+
+-- CreateIndex (partial unique for credential-specific costs)
+CREATE UNIQUE INDEX "LlmModelCost_credential_cost_key" ON "LlmModelCost"("llmModelId", "credentialProvider", "credentialId", "unit") WHERE "credentialId" IS NOT NULL;
+
+-- CreateIndex
+CREATE INDEX "LlmModelMigration_targetModelSlug_idx" ON "LlmModelMigration"("targetModelSlug");
+
+-- CreateIndex
+CREATE INDEX "LlmModelMigration_sourceModelSlug_isReverted_idx" ON "LlmModelMigration"("sourceModelSlug", "isReverted");
+
+-- CreateIndex (partial unique to prevent multiple active migrations per source)
+CREATE UNIQUE INDEX "LlmModelMigration_active_source_key" ON "LlmModelMigration"("sourceModelSlug") WHERE "isReverted" = false;
+
+-- AddForeignKey
+ALTER TABLE "LlmModel" ADD CONSTRAINT "LlmModel_providerId_fkey" FOREIGN KEY ("providerId") REFERENCES "LlmProvider"("id") ON DELETE RESTRICT ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "LlmModel" ADD CONSTRAINT "LlmModel_creatorId_fkey" FOREIGN KEY ("creatorId") REFERENCES "LlmModelCreator"("id") ON DELETE SET NULL ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "LlmModelCost" ADD CONSTRAINT "LlmModelCost_llmModelId_fkey" FOREIGN KEY ("llmModelId") REFERENCES "LlmModel"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "LlmModelMigration" ADD CONSTRAINT "LlmModelMigration_sourceModelSlug_fkey" FOREIGN KEY ("sourceModelSlug") REFERENCES "LlmModel"("slug") ON DELETE RESTRICT ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "LlmModelMigration" ADD CONSTRAINT "LlmModelMigration_targetModelSlug_fkey" FOREIGN KEY ("targetModelSlug") REFERENCES "LlmModel"("slug") ON DELETE RESTRICT ON UPDATE CASCADE;
+
+-- AddCheckConstraints (enforce data integrity)
+ALTER TABLE "LlmModel"
+    ADD CONSTRAINT "LlmModel_priceTier_check" CHECK ("priceTier" BETWEEN 1 AND 3);
+
+ALTER TABLE "LlmModelCost"
+    ADD CONSTRAINT "LlmModelCost_creditCost_check" CHECK ("creditCost" >= 0);
+
+ALTER TABLE "LlmModelMigration"
+    ADD CONSTRAINT "LlmModelMigration_nodeCount_check" CHECK ("nodeCount" >= 0),
+    ADD CONSTRAINT "LlmModelMigration_customCreditCost_check" CHECK ("customCreditCost" IS NULL OR "customCreditCost" >= 0);
--- a/autogpt_platform/backend/migrations/20260310_seed_llm_registry/migration.sql
+++ b/autogpt_platform/backend/migrations/20260310_seed_llm_registry/migration.sql
@@ -0,0 +1,287 @@
+-- Seed LLM Registry from existing hard-coded data
+-- This migration populates the LlmProvider, LlmModelCreator, LlmModel, and LlmModelCost tables
+-- with data from the existing MODEL_METADATA and MODEL_COST dictionaries
+
+-- Insert Providers
+INSERT INTO "LlmProvider" ("id", "createdAt", "updatedAt", "name", "displayName", "description", "defaultCredentialProvider", "defaultCredentialType", "metadata")
+VALUES
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'openai', 'OpenAI', 'OpenAI language models', 'openai', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'anthropic', 'Anthropic', 'Anthropic Claude models', 'anthropic', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'groq', 'Groq', 'Groq inference API', 'groq', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'open_router', 'OpenRouter', 'OpenRouter unified API', 'open_router', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'aiml_api', 'AI/ML API', 'AI/ML API models', 'aiml_api', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'ollama', 'Ollama', 'Ollama local models', 'ollama', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'llama_api', 'Llama API', 'Llama API models', 'llama_api', 'api_key', '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'v0', 'v0', 'v0 by Vercel models', 'v0', 'api_key', '{}'::jsonb)
+ON CONFLICT ("name") DO NOTHING;
+
+-- Insert Model Creators
+INSERT INTO "LlmModelCreator" ("id", "createdAt", "updatedAt", "name", "displayName", "description", "websiteUrl", "logoUrl", "metadata")
+VALUES
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'openai', 'OpenAI', 'Creator of GPT, O1, O3, and DALL-E models', 'https://openai.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'anthropic', 'Anthropic', 'Creator of Claude AI models', 'https://anthropic.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'meta', 'Meta', 'Creator of Llama foundation models', 'https://llama.meta.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'google', 'Google', 'Creator of Gemini and PaLM models', 'https://deepmind.google', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'mistralai', 'Mistral AI', 'Creator of Mistral and Codestral models', 'https://mistral.ai', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'cohere', 'Cohere', 'Creator of Command language models', 'https://cohere.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'deepseek', 'DeepSeek', 'Creator of DeepSeek reasoning models', 'https://deepseek.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'alibaba', 'Alibaba', 'Creator of Qwen language models', 'https://qwenlm.github.io', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'nvidia', 'NVIDIA', 'Creator of Nemotron models', 'https://nvidia.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'vercel', 'Vercel', 'Creator of v0 AI models', 'https://v0.dev', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'microsoft', 'Microsoft', 'Creator of Phi models', 'https://microsoft.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'xai', 'xAI', 'Creator of Grok models', 'https://x.ai', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'perplexity', 'Perplexity AI', 'Creator of Sonar search models', 'https://perplexity.ai', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'nousresearch', 'Nous Research', 'Creator of Hermes language models', 'https://nousresearch.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'amazon', 'Amazon', 'Creator of Nova language models', 'https://aws.amazon.com', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'gryphe', 'Gryphe', 'Creator of MythoMax models', 'https://huggingface.co/Gryphe', NULL, '{}'::jsonb),
+    (gen_random_uuid(), CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 'moonshotai', 'Moonshot AI', 'Creator of Kimi language models', 'https://moonshot.ai', NULL, '{}'::jsonb)
+ON CONFLICT ("name") DO NOTHING;
+
+-- Insert Models (using CTEs to reference provider and creator IDs)
+WITH provider_ids AS (
+    SELECT "id", "name" FROM "LlmProvider"
+),
+creator_ids AS (
+    SELECT "id", "name" FROM "LlmModelCreator"
+)
+INSERT INTO "LlmModel" ("id", "createdAt", "updatedAt", "slug", "displayName", "description", "providerId", "creatorId", "contextWindow", "maxOutputTokens", "isEnabled", "capabilities", "metadata")
+SELECT
+    gen_random_uuid(),
+    CURRENT_TIMESTAMP,
+    CURRENT_TIMESTAMP,
+    model_slug,
+    model_display_name,
+    NULL,
+    p."id",
+    c."id",
+    context_window,
+    max_output_tokens,
+    true,
+    '{}'::jsonb,
+    '{}'::jsonb
+FROM (VALUES
+    -- OpenAI models (creator: openai)
+    ('o3-2025-04-16', 'O3', 'openai', 'openai', 200000, 100000),
+    ('o3-mini', 'O3 Mini', 'openai', 'openai', 200000, 100000),
+    ('o1', 'O1', 'openai', 'openai', 200000, 100000),
+    ('o1-mini', 'O1 Mini', 'openai', 'openai', 128000, 65536),
+    ('gpt-5.2-2025-12-11', 'GPT-5.2', 'openai', 'openai', 400000, 128000),
+    ('gpt-5-2025-08-07', 'GPT 5', 'openai', 'openai', 400000, 128000),
+    ('gpt-5.1-2025-11-13', 'GPT 5.1', 'openai', 'openai', 400000, 128000),
+    ('gpt-5-mini-2025-08-07', 'GPT 5 Mini', 'openai', 'openai', 400000, 128000),
+    ('gpt-5-nano-2025-08-07', 'GPT 5 Nano', 'openai', 'openai', 400000, 128000),
+    ('gpt-5-chat-latest', 'GPT 5 Chat', 'openai', 'openai', 400000, 16384),
+    ('gpt-4.1-2025-04-14', 'GPT 4.1', 'openai', 'openai', 1000000, 32768),
+    ('gpt-4.1-mini-2025-04-14', 'GPT 4.1 Mini', 'openai', 'openai', 1047576, 32768),
+    ('gpt-4o-mini', 'GPT 4o Mini', 'openai', 'openai', 128000, 16384),
+    ('gpt-4o', 'GPT 4o', 'openai', 'openai', 128000, 16384),
+    ('gpt-4-turbo', 'GPT 4 Turbo', 'openai', 'openai', 128000, 4096),
+    -- Anthropic models (creator: anthropic)
+    ('claude-opus-4-6', 'Claude Opus 4.6', 'anthropic', 'anthropic', 200000, 128000),
+    ('claude-sonnet-4-6', 'Claude Sonnet 4.6', 'anthropic', 'anthropic', 200000, 64000),
+    ('claude-opus-4-1-20250805', 'Claude 4.1 Opus', 'anthropic', 'anthropic', 200000, 32000),
+    ('claude-opus-4-20250514', 'Claude 4 Opus', 'anthropic', 'anthropic', 200000, 32000),
+    ('claude-sonnet-4-20250514', 'Claude 4 Sonnet', 'anthropic', 'anthropic', 200000, 64000),
+    ('claude-opus-4-5-20251101', 'Claude 4.5 Opus', 'anthropic', 'anthropic', 200000, 64000),
+    ('claude-sonnet-4-5-20250929', 'Claude 4.5 Sonnet', 'anthropic', 'anthropic', 200000, 64000),
+    ('claude-haiku-4-5-20251001', 'Claude 4.5 Haiku', 'anthropic', 'anthropic', 200000, 64000),
+    ('claude-3-haiku-20240307', 'Claude 3 Haiku', 'anthropic', 'anthropic', 200000, 4096),
+    -- AI/ML API models (creators: alibaba, nvidia, meta)
+    ('Qwen/Qwen2.5-72B-Instruct-Turbo', 'Qwen 2.5 72B', 'aiml_api', 'alibaba', 32000, 8000),
+    ('nvidia/llama-3.1-nemotron-70b-instruct', 'Llama 3.1 Nemotron 70B', 'aiml_api', 'nvidia', 128000, 40000),
+    ('meta-llama/Llama-3.3-70B-Instruct-Turbo', 'Llama 3.3 70B', 'aiml_api', 'meta', 128000, NULL),
+    ('meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo', 'Meta Llama 3.1 70B', 'aiml_api', 'meta', 131000, 2000),
+    ('meta-llama/Llama-3.2-3B-Instruct-Turbo', 'Llama 3.2 3B', 'aiml_api', 'meta', 128000, NULL),
+    -- Groq models (creator: meta for Llama)
+    ('llama-3.3-70b-versatile', 'Llama 3.3 70B', 'groq', 'meta', 128000, 32768),
+    ('llama-3.1-8b-instant', 'Llama 3.1 8B', 'groq', 'meta', 128000, 8192),
+    -- Ollama models (creators: meta for Llama, mistralai for Mistral)
+    ('llama3.3', 'Llama 3.3', 'ollama', 'meta', 8192, NULL),
+    ('llama3.2', 'Llama 3.2', 'ollama', 'meta', 8192, NULL),
+    ('llama3', 'Llama 3', 'ollama', 'meta', 8192, NULL),
+    ('llama3.1:405b', 'Llama 3.1 405B', 'ollama', 'meta', 8192, NULL),
+    ('dolphin-mistral:latest', 'Dolphin Mistral', 'ollama', 'mistralai', 32768, NULL),
+    -- OpenRouter models (creators: google, mistralai, cohere, deepseek, perplexity, nousresearch, openai, amazon, microsoft, gryphe, meta, xai, moonshotai, alibaba)
+    ('google/gemini-2.5-pro-preview-03-25', 'Gemini 2.5 Pro', 'open_router', 'google', 1050000, 8192),
+    ('google/gemini-2.5-pro', 'Gemini 2.5 Pro', 'open_router', 'google', 1048576, 65536),
+    ('google/gemini-3.1-pro-preview', 'Gemini 3.1 Pro Preview', 'open_router', 'google', 1048576, 65536),
+    ('google/gemini-3-flash-preview', 'Gemini 3 Flash Preview', 'open_router', 'google', 1048576, 65536),
+    ('google/gemini-2.5-flash', 'Gemini 2.5 Flash', 'open_router', 'google', 1048576, 65535),
+    ('google/gemini-2.0-flash-001', 'Gemini 2.0 Flash', 'open_router', 'google', 1048576, 8192),
+    ('google/gemini-3.1-flash-lite-preview', 'Gemini 3.1 Flash Lite Preview', 'open_router', 'google', 1048576, 65536),
+    ('google/gemini-2.5-flash-lite-preview-06-17', 'Gemini 2.5 Flash Lite Preview', 'open_router', 'google', 1048576, 65535),
+    ('google/gemini-2.0-flash-lite-001', 'Gemini 2.0 Flash Lite', 'open_router', 'google', 1048576, 8192),
+    ('mistralai/mistral-nemo', 'Mistral Nemo', 'open_router', 'mistralai', 128000, 4096),
+    ('mistralai/mistral-large-2512', 'Mistral Large 3 2512', 'open_router', 'mistralai', 262144, NULL),
+    ('mistralai/mistral-medium-3.1', 'Mistral Medium 3.1', 'open_router', 'mistralai', 131072, NULL),
+    ('mistralai/mistral-small-3.2-24b-instruct', 'Mistral Small 3.2 24B', 'open_router', 'mistralai', 131072, 131072),
+    ('mistralai/codestral-2508', 'Codestral 2508', 'open_router', 'mistralai', 256000, NULL),
+    ('cohere/command-r-08-2024', 'Command R', 'open_router', 'cohere', 128000, 4096),
+    ('cohere/command-r-plus-08-2024', 'Command R Plus', 'open_router', 'cohere', 128000, 4096),
+    ('cohere/command-a-03-2025', 'Command A 03.2025', 'open_router', 'cohere', 256000, 8192),
+    ('cohere/command-a-reasoning-08-2025', 'Command A Reasoning 08.2025', 'open_router', 'cohere', 256000, 32768),
+    ('cohere/command-a-translate-08-2025', 'Command A Translate 08.2025', 'open_router', 'cohere', 128000, 8192),
+    ('cohere/command-a-vision-07-2025', 'Command A Vision 07.2025', 'open_router', 'cohere', 128000, 8192),
+    ('deepseek/deepseek-chat', 'DeepSeek Chat', 'open_router', 'deepseek', 64000, 2048),
+    ('deepseek/deepseek-r1-0528', 'DeepSeek R1', 'open_router', 'deepseek', 163840, 163840),
+    ('perplexity/sonar', 'Perplexity Sonar', 'open_router', 'perplexity', 127000, 8000),
+    ('perplexity/sonar-pro', 'Perplexity Sonar Pro', 'open_router', 'perplexity', 200000, 8000),
+    ('perplexity/sonar-deep-research', 'Perplexity Sonar Deep Research', 'open_router', 'perplexity', 128000, 16000),
+    ('perplexity/sonar-reasoning-pro', 'Sonar Reasoning Pro', 'open_router', 'perplexity', 128000, 8000),
+    ('nousresearch/hermes-3-llama-3.1-405b', 'Hermes 3 Llama 3.1 405B', 'open_router', 'nousresearch', 131000, 4096),
+    ('nousresearch/hermes-3-llama-3.1-70b', 'Hermes 3 Llama 3.1 70B', 'open_router', 'nousresearch', 12288, 12288),
+    ('openai/gpt-oss-120b', 'GPT OSS 120B', 'open_router', 'openai', 131072, 131072),
+    ('openai/gpt-oss-20b', 'GPT OSS 20B', 'open_router', 'openai', 131072, 32768),
+    ('amazon/nova-lite-v1', 'Amazon Nova Lite', 'open_router', 'amazon', 300000, 5120),
+    ('amazon/nova-micro-v1', 'Amazon Nova Micro', 'open_router', 'amazon', 128000, 5120),
+    ('amazon/nova-pro-v1', 'Amazon Nova Pro', 'open_router', 'amazon', 300000, 5120),
+    ('microsoft/wizardlm-2-8x22b', 'WizardLM 2 8x22B', 'open_router', 'microsoft', 65536, 4096),
+    ('microsoft/phi-4', 'Phi-4', 'open_router', 'microsoft', 16384, 16384),
+    ('gryphe/mythomax-l2-13b', 'MythoMax L2 13B', 'open_router', 'gryphe', 4096, 4096),
+    ('meta-llama/llama-4-scout', 'Llama 4 Scout', 'open_router', 'meta', 131072, 131072),
+    ('meta-llama/llama-4-maverick', 'Llama 4 Maverick', 'open_router', 'meta', 1048576, 1000000),
+    ('x-ai/grok-3', 'Grok 3', 'open_router', 'xai', 131072, 131072),
+    ('x-ai/grok-4', 'Grok 4', 'open_router', 'xai', 256000, 256000),
+    ('x-ai/grok-4-fast', 'Grok 4 Fast', 'open_router', 'xai', 2000000, 30000),
+    ('x-ai/grok-4.1-fast', 'Grok 4.1 Fast', 'open_router', 'xai', 2000000, 30000),
+    ('x-ai/grok-code-fast-1', 'Grok Code Fast 1', 'open_router', 'xai', 256000, 10000),
+    ('moonshotai/kimi-k2', 'Kimi K2', 'open_router', 'moonshotai', 131000, 131000),
+    ('qwen/qwen3-235b-a22b-thinking-2507', 'Qwen 3 235B Thinking', 'open_router', 'alibaba', 262144, 262144),
+    ('qwen/qwen3-coder', 'Qwen 3 Coder', 'open_router', 'alibaba', 262144, 262144),
+    -- Llama API models (creator: meta)
+    ('Llama-4-Scout-17B-16E-Instruct-FP8', 'Llama 4 Scout', 'llama_api', 'meta', 128000, 4028),
+    ('Llama-4-Maverick-17B-128E-Instruct-FP8', 'Llama 4 Maverick', 'llama_api', 'meta', 128000, 4028),
+    ('Llama-3.3-8B-Instruct', 'Llama 3.3 8B', 'llama_api', 'meta', 128000, 4028),
+    ('Llama-3.3-70B-Instruct', 'Llama 3.3 70B', 'llama_api', 'meta', 128000, 4028),
+    -- v0 models (creator: vercel)
+    ('v0-1.5-md', 'v0 1.5 MD', 'v0', 'vercel', 128000, 64000),
+    ('v0-1.5-lg', 'v0 1.5 LG', 'v0', 'vercel', 512000, 64000),
+    ('v0-1.0-md', 'v0 1.0 MD', 'v0', 'vercel', 128000, 64000)
+) AS models(model_slug, model_display_name, provider_name, creator_name, context_window, max_output_tokens)
+JOIN provider_ids p ON p."name" = models.provider_name
+JOIN creator_ids c ON c."name" = models.creator_name
+ON CONFLICT ("slug") DO NOTHING;
+
+-- Insert Costs (using CTEs to reference model IDs)
+WITH model_ids AS (
+    SELECT "id", "slug", "providerId" FROM "LlmModel"
+),
+provider_ids AS (
+    SELECT "id", "name" FROM "LlmProvider"
+)
+INSERT INTO "LlmModelCost" ("id", "createdAt", "updatedAt", "unit", "creditCost", "credentialProvider", "credentialId", "credentialType", "currency", "metadata", "llmModelId")
+SELECT
+    gen_random_uuid(),
+    CURRENT_TIMESTAMP,
+    CURRENT_TIMESTAMP,
+    'RUN'::"LlmCostUnit",
+    cost,
+    p."name",
+    NULL,
+    'api_key',
+    NULL,
+    '{}'::jsonb,
+    m."id"
+FROM (VALUES
+    -- OpenAI costs
+    ('o3-2025-04-16', 4),
+    ('o3-mini', 2),
+    ('o1', 16),
+    ('o1-mini', 4),
+    ('gpt-5.2-2025-12-11', 5),
+    ('gpt-5-2025-08-07', 2),
+    ('gpt-5.1-2025-11-13', 5),
+    ('gpt-5-mini-2025-08-07', 1),
+    ('gpt-5-nano-2025-08-07', 1),
+    ('gpt-5-chat-latest', 5),
+    ('gpt-4.1-2025-04-14', 2),
+    ('gpt-4.1-mini-2025-04-14', 1),
+    ('gpt-4o-mini', 1),
+    ('gpt-4o', 3),
+    ('gpt-4-turbo', 10),
+    -- Anthropic costs
+    ('claude-opus-4-6', 21),
+    ('claude-sonnet-4-6', 5),
+    ('claude-opus-4-1-20250805', 21),
+    ('claude-opus-4-20250514', 21),
+    ('claude-sonnet-4-20250514', 5),
+    ('claude-haiku-4-5-20251001', 4),
+    ('claude-opus-4-5-20251101', 14),
+    ('claude-sonnet-4-5-20250929', 9),
+    ('claude-3-haiku-20240307', 1),
+    -- AI/ML API costs
+    ('Qwen/Qwen2.5-72B-Instruct-Turbo', 1),
+    ('nvidia/llama-3.1-nemotron-70b-instruct', 1),
+    ('meta-llama/Llama-3.3-70B-Instruct-Turbo', 1),
+    ('meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo', 1),
+    ('meta-llama/Llama-3.2-3B-Instruct-Turbo', 1),
+    -- Groq costs
+    ('llama-3.3-70b-versatile', 1),
+    ('llama-3.1-8b-instant', 1),
+    -- Ollama costs
+    ('llama3.3', 1),
+    ('llama3.2', 1),
+    ('llama3', 1),
+    ('llama3.1:405b', 1),
+    ('dolphin-mistral:latest', 1),
+    -- OpenRouter costs
+    ('google/gemini-2.5-pro-preview-03-25', 4),
+    ('google/gemini-2.5-pro', 4),
+    ('google/gemini-3.1-pro-preview', 5),
+    ('google/gemini-3-flash-preview', 3),
+    ('google/gemini-3.1-flash-lite-preview', 1),
+    ('mistralai/mistral-nemo', 1),
+    ('mistralai/mistral-large-2512', 3),
+    ('mistralai/mistral-medium-3.1', 2),
+    ('mistralai/mistral-small-3.2-24b-instruct', 1),
+    ('mistralai/codestral-2508', 2),
+    ('cohere/command-r-08-2024', 1),
+    ('cohere/command-r-plus-08-2024', 3),
+    ('cohere/command-a-03-2025', 2),
+    ('cohere/command-a-reasoning-08-2025', 3),
+    ('cohere/command-a-translate-08-2025', 1),
+    ('cohere/command-a-vision-07-2025', 2),
+    ('deepseek/deepseek-chat', 2),
+    ('perplexity/sonar', 1),
+    ('perplexity/sonar-pro', 5),
+    ('perplexity/sonar-deep-research', 10),
+    ('perplexity/sonar-reasoning-pro', 5),
+    ('nousresearch/hermes-3-llama-3.1-405b', 1),
+    ('nousresearch/hermes-3-llama-3.1-70b', 1),
+    ('amazon/nova-lite-v1', 1),
+    ('amazon/nova-micro-v1', 1),
+    ('amazon/nova-pro-v1', 1),
+    ('microsoft/wizardlm-2-8x22b', 1),
+    ('microsoft/phi-4', 1),
+    ('gryphe/mythomax-l2-13b', 1),
+    ('meta-llama/llama-4-scout', 1),
+    ('meta-llama/llama-4-maverick', 1),
+    ('x-ai/grok-3', 5),
+    ('x-ai/grok-4', 9),
+    ('x-ai/grok-4-fast', 1),
+    ('x-ai/grok-4.1-fast', 1),
+    ('x-ai/grok-code-fast-1', 1),
+    ('moonshotai/kimi-k2', 1),
+    ('qwen/qwen3-235b-a22b-thinking-2507', 1),
+    ('qwen/qwen3-coder', 9),
+    ('google/gemini-2.5-flash', 1),
+    ('google/gemini-2.0-flash-001', 1),
+    ('google/gemini-2.5-flash-lite-preview-06-17', 1),
+    ('google/gemini-2.0-flash-lite-001', 1),
+    ('deepseek/deepseek-r1-0528', 1),
+    ('openai/gpt-oss-120b', 1),
+    ('openai/gpt-oss-20b', 1),
+    -- Llama API costs
+    ('Llama-4-Scout-17B-16E-Instruct-FP8', 1),
+    ('Llama-4-Maverick-17B-128E-Instruct-FP8', 1),
+    ('Llama-3.3-8B-Instruct', 1),
+    ('Llama-3.3-70B-Instruct', 1),
+    -- v0 costs
+    ('v0-1.5-md', 1),
+    ('v0-1.5-lg', 2),
+    ('v0-1.0-md', 1)
+) AS costs(model_slug, cost)
+JOIN model_ids m ON m."slug" = costs.model_slug
+JOIN provider_ids p ON p."id" = m."providerId"
+ON CONFLICT ("llmModelId", "credentialProvider", "unit") WHERE "credentialId" IS NULL DO NOTHING;
+
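Review note on the seed above: both INSERT statements rely on `ON CONFLICT ... DO NOTHING`, and the cost insert targets a partial unique index (`WHERE "credentialId" IS NULL`), so re-running the migration should not duplicate rows. Because the cost rows are attached through an inner JOIN on slug, a cost entry whose slug has no matching model row is dropped silently. The sketch below illustrates both behaviours under loud assumptions: it uses SQLite's in-memory engine as a stand-in for PostgreSQL, and the table names (`model`, `model_cost`), column names, and helpers (`seed_costs`, `count_costs`) are hypothetical, not the project's schema or code.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables are "LlmModel" and "LlmModelCost".
SCHEMA = """
CREATE TABLE model (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE,
    provider TEXT NOT NULL
);
CREATE TABLE model_cost (
    id INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES model(id),
    credential_provider TEXT NOT NULL,
    unit TEXT NOT NULL,
    credential_id TEXT,
    credit_cost INTEGER NOT NULL
);
-- Partial unique index: only "default" costs (no credential) must be unique.
CREATE UNIQUE INDEX model_cost_default
    ON model_cost (model_id, credential_provider, unit)
    WHERE credential_id IS NULL;
"""

# Two slugs taken from the seed list, plus one deliberately unknown slug.
COSTS = [("gpt-4o", 3), ("o1", 16), ("not-a-seeded-model", 7)]


def seed_costs(conn, costs):
    """Insert default costs; conflicting rows and unknown slugs are skipped."""
    conn.executemany(
        """
        INSERT INTO model_cost
            (model_id, credential_provider, unit, credential_id, credit_cost)
        SELECT m.id, m.provider, 'RUN', NULL, ?
        FROM model AS m
        WHERE m.slug = ?
        ON CONFLICT (model_id, credential_provider, unit)
            WHERE credential_id IS NULL DO NOTHING
        """,
        [(cost, slug) for slug, cost in costs],
    )
    conn.commit()


def count_costs(conn):
    return conn.execute("SELECT COUNT(*) FROM model_cost").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO model (slug, provider) VALUES (?, ?)",
    [("gpt-4o", "openai"), ("o1", "openai")],
)

seed_costs(conn, COSTS)
rows_after_first_run = count_costs(conn)
seed_costs(conn, COSTS)  # second run simulates re-applying the migration
rows_after_second_run = count_costs(conn)
print(rows_after_first_run, rows_after_second_run)
```

If the silent drop is a concern, one option is a follow-up query that lists seeded models without a cost row; whether that check belongs in the migration or in a test is a judgment call for the author.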
--- a/autogpt_platform/backend/poetry.lock
+++ b/autogpt_platform/backend/poetry.lock
@@ -594,6 +594,26 @@ files = [
    {file = "bracex-2.6.tar.gz", hash = "sha256:98f1347cd77e22ee8d967a30ad4e310b233f7754dbf31ff3fceb76145ba47dc7"},
 ]

+[[package]]
+name = "browserbase"
+version = "1.4.0"
+description = "The official Python library for the Browserbase API"
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "browserbase-1.4.0-py3-none-any.whl", hash = "sha256:ea9f1fb4a88921975b8b9606835c441a59d8ce82ce00313a6d48bbe8e30f79fb"},
+    {file = "browserbase-1.4.0.tar.gz", hash = "sha256:e2ed36f513c8630b94b826042c4bb9f497c333f3bd28e5b76cb708c65b4318a0"},
+]
+
+[package.dependencies]
+anyio = ">=3.5.0,<5"
+distro = ">=1.7.0,<2"
+httpx = ">=0.23.0,<1"
+pydantic = ">=1.9.0,<3"
+sniffio = "*"
+typing-extensions = ">=4.10,<5"
+
 [[package]]
 name = "build"
 version = "1.4.0"
@@ -1468,6 +1488,94 @@ files = [
 [package.extras]
 devel = ["colorama", "json-spec", "jsonschema", "pylint", "pytest", "pytest-benchmark", "pytest-cache", "validictory"]

+[[package]]
+name = "fastuuid"
+version = "0.14.0"
+description = "Python bindings to Rust's UUID library."
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:6e6243d40f6c793c3e2ee14c13769e341b90be5ef0c23c82fa6515a96145181a"},
+    {file = "fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:13ec4f2c3b04271f62be2e1ce7e95ad2dd1cf97e94503a3760db739afbd48f00"},
+    {file = "fastuuid-0.14.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b2fdd48b5e4236df145a149d7125badb28e0a383372add3fbaac9a6b7a394470"},
+    {file = "fastuuid-0.14.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f74631b8322d2780ebcf2d2d75d58045c3e9378625ec51865fe0b5620800c39d"},
+    {file = "fastuuid-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:83cffc144dc93eb604b87b179837f2ce2af44871a7b323f2bfed40e8acb40ba8"},
+    {file = "fastuuid-0.14.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a771f135ab4523eb786e95493803942a5d1fc1610915f131b363f55af53b219"},
+    {file = "fastuuid-0.14.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:4edc56b877d960b4eda2c4232f953a61490c3134da94f3c28af129fb9c62a4f6"},
+    {file = "fastuuid-0.14.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:bcc96ee819c282e7c09b2eed2b9bd13084e3b749fdb2faf58c318d498df2efbe"},
+    {file = "fastuuid-0.14.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:7a3c0bca61eacc1843ea97b288d6789fbad7400d16db24e36a66c28c268cfe3d"},
+    {file = "fastuuid-0.14.0-cp310-cp310-win32.whl", hash = "sha256:7f2f3efade4937fae4e77efae1af571902263de7b78a0aee1a1653795a093b2a"},
+    {file = "fastuuid-0.14.0-cp310-cp310-win_amd64.whl", hash = "sha256:ae64ba730d179f439b0736208b4c279b8bc9c089b102aec23f86512ea458c8a4"},
+    {file = "fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:73946cb950c8caf65127d4e9a325e2b6be0442a224fd51ba3b6ac44e1912ce34"},
+    {file = "fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:12ac85024637586a5b69645e7ed986f7535106ed3013640a393a03e461740cb7"},
+    {file = "fastuuid-0.14.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:05a8dde1f395e0c9b4be515b7a521403d1e8349443e7641761af07c7ad1624b1"},
+    {file = "fastuuid-0.14.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09378a05020e3e4883dfdab438926f31fea15fd17604908f3d39cbeb22a0b4dc"},
+    {file = "fastuuid-0.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bbb0c4b15d66b435d2538f3827f05e44e2baafcc003dd7d8472dc67807ab8fd8"},
+    {file = "fastuuid-0.14.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cd5a7f648d4365b41dbf0e38fe8da4884e57bed4e77c83598e076ac0c93995e7"},
+    {file = "fastuuid-0.14.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:c0a94245afae4d7af8c43b3159d5e3934c53f47140be0be624b96acd672ceb73"},
+    {file = "fastuuid-0.14.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:2b29e23c97e77c3a9514d70ce343571e469098ac7f5a269320a0f0b3e193ab36"},
+    {file = "fastuuid-0.14.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1e690d48f923c253f28151b3a6b4e335f2b06bf669c68a02665bc150b7839e94"},
+    {file = "fastuuid-0.14.0-cp311-cp311-win32.whl", hash = "sha256:a6f46790d59ab38c6aa0e35c681c0484b50dc0acf9e2679c005d61e019313c24"},
+    {file = "fastuuid-0.14.0-cp311-cp311-win_amd64.whl", hash = "sha256:e150eab56c95dc9e3fefc234a0eedb342fac433dacc273cd4d150a5b0871e1fa"},
+    {file = "fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77e94728324b63660ebf8adb27055e92d2e4611645bf12ed9d88d30486471d0a"},
+    {file = "fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:caa1f14d2102cb8d353096bc6ef6c13b2c81f347e6ab9d6fbd48b9dea41c153d"},
+    {file = "fastuuid-0.14.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d23ef06f9e67163be38cece704170486715b177f6baae338110983f99a72c070"},
+    {file = "fastuuid-0.14.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0c9ec605ace243b6dbe3bd27ebdd5d33b00d8d1d3f580b39fdd15cd96fd71796"},
+    {file = "fastuuid-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:808527f2407f58a76c916d6aa15d58692a4a019fdf8d4c32ac7ff303b7d7af09"},
+    {file = "fastuuid-0.14.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2fb3c0d7fef6674bbeacdd6dbd386924a7b60b26de849266d1ff6602937675c8"},
+    {file = "fastuuid-0.14.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:ab3f5d36e4393e628a4df337c2c039069344db5f4b9d2a3c9cea48284f1dd741"},
+    {file = "fastuuid-0.14.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:b9a0ca4f03b7e0b01425281ffd44e99d360e15c895f1907ca105854ed85e2057"},
+    {file = "fastuuid-0.14.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:3acdf655684cc09e60fb7e4cf524e8f42ea760031945aa8086c7eae2eeeabeb8"},
+    {file = "fastuuid-0.14.0-cp312-cp312-win32.whl", hash = "sha256:9579618be6280700ae36ac42c3efd157049fe4dd40ca49b021280481c78c3176"},
+    {file = "fastuuid-0.14.0-cp312-cp312-win_amd64.whl", hash = "sha256:d9e4332dc4ba054434a9594cbfaf7823b57993d7d8e7267831c3e059857cf397"},
+    {file = "fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77a09cb7427e7af74c594e409f7731a0cf887221de2f698e1ca0ebf0f3139021"},
+    {file = "fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:9bd57289daf7b153bfa3e8013446aa144ce5e8c825e9e366d455155ede5ea2dc"},
+    {file = "fastuuid-0.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:ac60fc860cdf3c3f327374db87ab8e064c86566ca8c49d2e30df15eda1b0c2d5"},
+    {file = "fastuuid-0.14.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ab32f74bd56565b186f036e33129da77db8be09178cd2f5206a5d4035fb2a23f"},
+    {file = "fastuuid-0.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33e678459cf4addaedd9936bbb038e35b3f6b2061330fd8f2f6a1d80414c0f87"},
+    {file = "fastuuid-0.14.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1e3cc56742f76cd25ecb98e4b82a25f978ccffba02e4bdce8aba857b6d85d87b"},
+    {file = "fastuuid-0.14.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:cb9a030f609194b679e1660f7e32733b7a0f332d519c5d5a6a0a580991290022"},
+    {file = "fastuuid-0.14.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:09098762aad4f8da3a888eb9ae01c84430c907a297b97166b8abc07b640f2995"},
+    {file = "fastuuid-0.14.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:1383fff584fa249b16329a059c68ad45d030d5a4b70fb7c73a08d98fd53bcdab"},
+    {file = "fastuuid-0.14.0-cp313-cp313-win32.whl", hash = "sha256:a0809f8cc5731c066c909047f9a314d5f536c871a7a22e815cc4967c110ac9ad"},
+    {file = "fastuuid-0.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:0df14e92e7ad3276327631c9e7cec09e32572ce82089c55cb1bb8df71cf394ed"},
+    {file = "fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:b852a870a61cfc26c884af205d502881a2e59cc07076b60ab4a951cc0c94d1ad"},
+    {file = "fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:c7502d6f54cd08024c3ea9b3514e2d6f190feb2f46e6dbcd3747882264bb5f7b"},
+    {file = "fastuuid-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1ca61b592120cf314cfd66e662a5b54a578c5a15b26305e1b8b618a6f22df714"},
+    {file = "fastuuid-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa75b6657ec129d0abded3bec745e6f7ab642e6dba3a5272a68247e85f5f316f"},
+    {file = "fastuuid-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a8a0dfea3972200f72d4c7df02c8ac70bad1bb4c58d7e0ec1e6f341679073a7f"},
+    {file = "fastuuid-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1bf539a7a95f35b419f9ad105d5a8a35036df35fdafae48fb2fd2e5f318f0d75"},
+    {file = "fastuuid-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:9a133bf9cc78fdbd1179cb58a59ad0100aa32d8675508150f3658814aeefeaa4"},
+    {file = "fastuuid-0.14.0-cp314-cp314-musllinux_1_1_i686.whl", hash = "sha256:f54d5b36c56a2d5e1a31e73b950b28a0d83eb0c37b91d10408875a5a29494bad"},
+    {file = "fastuuid-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:ec27778c6ca3393ef662e2762dba8af13f4ec1aaa32d08d77f71f2a70ae9feb8"},
+    {file = "fastuuid-0.14.0-cp314-cp314-win32.whl", hash = "sha256:e23fc6a83f112de4be0cc1990e5b127c27663ae43f866353166f87df58e73d06"},
+    {file = "fastuuid-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:df61342889d0f5e7a32f7284e55ef95103f2110fee433c2ae7c2c0956d76ac8a"},
+    {file = "fastuuid-0.14.0-cp38-cp38-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:47c821f2dfe95909ead0085d4cb18d5149bca704a2b03e03fb3f81a5202d8cea"},
+    {file = "fastuuid-0.14.0-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:3964bab460c528692c70ab6b2e469dd7a7b152fbe8c18616c58d34c93a6cf8d4"},
+    {file = "fastuuid-0.14.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:c501561e025b7aea3508719c5801c360c711d5218fc4ad5d77bf1c37c1a75779"},
+    {file = "fastuuid-0.14.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2dce5d0756f046fa792a40763f36accd7e466525c5710d2195a038f93ff96346"},
+    {file = "fastuuid-0.14.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:193ca10ff553cf3cc461572da83b5780fc0e3eea28659c16f89ae5202f3958d4"},
+    {file = "fastuuid-0.14.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:0737606764b29785566f968bd8005eace73d3666bd0862f33a760796e26d1ede"},
+    {file = "fastuuid-0.14.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:e0976c0dff7e222513d206e06341503f07423aceb1db0b83ff6851c008ceee06"},
+    {file = "fastuuid-0.14.0-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:6fbc49a86173e7f074b1a9ec8cf12ca0d54d8070a85a06ebf0e76c309b84f0d0"},
+    {file = "fastuuid-0.14.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:de01280eabcd82f7542828ecd67ebf1551d37203ecdfd7ab1f2e534edb78d505"},
+    {file = "fastuuid-0.14.0-cp38-cp38-win32.whl", hash = "sha256:af5967c666b7d6a377098849b07f83462c4fedbafcf8eb8bc8ff05dcbe8aa209"},
+    {file = "fastuuid-0.14.0-cp38-cp38-win_amd64.whl", hash = "sha256:c3091e63acf42f56a6f74dc65cfdb6f99bfc79b5913c8a9ac498eb7ca09770a8"},
+    {file = "fastuuid-0.14.0-cp39-cp39-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:2ec3d94e13712a133137b2805073b65ecef4a47217d5bac15d8ac62376cefdb4"},
+    {file = "fastuuid-0.14.0-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:139d7ff12bb400b4a0c76be64c28cbe2e2edf60b09826cbfd85f33ed3d0bbe8b"},
+    {file = "fastuuid-0.14.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:d55b7e96531216fc4f071909e33e35e5bfa47962ae67d9e84b00a04d6e8b7173"},
+    {file = "fastuuid-0.14.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c0eb25f0fd935e376ac4334927a59e7c823b36062080e2e13acbaf2af15db836"},
+    {file = "fastuuid-0.14.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:089c18018fdbdda88a6dafd7d139f8703a1e7c799618e33ea25eb52503d28a11"},
+    {file = "fastuuid-0.14.0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2fc37479517d4d70c08696960fad85494a8a7a0af4e93e9a00af04d74c59f9e3"},
+    {file = "fastuuid-0.14.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:73657c9f778aba530bc96a943d30e1a7c80edb8278df77894fe9457540df4f85"},
+    {file = "fastuuid-0.14.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:d31f8c257046b5617fc6af9c69be066d2412bdef1edaa4bdf6a214cf57806105"},
+    {file = "fastuuid-0.14.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:5816d41f81782b209843e52fdef757a361b448d782452d96abedc53d545da722"},
+    {file = "fastuuid-0.14.0-cp39-cp39-win32.whl", hash = "sha256:448aa6833f7a84bfe37dd47e33df83250f404d591eb83527fa2cac8d1e57d7f3"},
+    {file = "fastuuid-0.14.0-cp39-cp39-win_amd64.whl", hash = "sha256:84b0779c5abbdec2a9511d5ffbfcd2e53079bf889824b32be170c0d8ef5fc74c"},
+    {file = "fastuuid-0.14.0.tar.gz", hash = "sha256:178947fc2f995b38497a74172adee64fdeb8b7ec18f2a5934d037641ba265d26"},
+]
+
 [[package]]
 name = "feedparser"
 version = "6.0.12"
@@ -1930,6 +2038,7 @@ files = [
 [package.dependencies]
 cryptography = ">=38.0.3"
 pyasn1-modules = ">=0.2.1"
+requests = {version = ">=2.20.0,<3.0.0", optional = true, markers = "extra == \"requests\""}
 rsa = ">=3.1.4,<5"

 [package.extras]
@@ -2131,6 +2240,34 @@ files = [
    {file = "google_crc32c-1.8.0.tar.gz", hash = "sha256:a428e25fb7691024de47fecfbff7ff957214da51eddded0da0ae0e0f03a2cf79"},
 ]

+[[package]]
+name = "google-genai"
+version = "1.62.0"
+description = "GenAI Python SDK"
+optional = false
+python-versions = ">=3.10"
+groups = ["main"]
+files = [
+    {file = "google_genai-1.62.0-py3-none-any.whl", hash = "sha256:4c3daeff3d05fafee4b9a1a31f9c07f01bc22051081aa58b4d61f58d16d1bcc0"},
+    {file = "google_genai-1.62.0.tar.gz", hash = "sha256:709468a14c739a080bc240a4f3191df597bf64485b1ca3728e0fb67517774c18"},
+]
+
+[package.dependencies]
+anyio = ">=4.8.0,<5.0.0"
+distro = ">=1.7.0,<2"
+google-auth = {version = ">=2.47.0,<3.0.0", extras = ["requests"]}
+httpx = ">=0.28.1,<1.0.0"
+pydantic = ">=2.9.0,<3.0.0"
+requests = ">=2.28.1,<3.0.0"
+sniffio = "*"
+tenacity = ">=8.2.3,<9.2.0"
+typing-extensions = ">=4.11.0,<5.0.0"
+websockets = ">=13.0.0,<15.1.0"
+
+[package.extras]
+aiohttp = ["aiohttp (<3.13.3)"]
+local-tokenizer = ["protobuf", "sentencepiece (>=0.2.0)"]
+
 [[package]]
 name = "google-resumable-media"
 version = "2.8.0"
@@ -2223,7 +2360,6 @@ description = "Lightweight in-process concurrent programming"
 optional = false
 python-versions = ">=3.10"
 groups = ["main"]
-markers = "platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\""
 files = [
    {file = "greenlet-3.3.1-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:04bee4775f40ecefcdaa9d115ab44736cd4b9c5fba733575bfe9379419582e13"},
    {file = "greenlet-3.3.1-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:50e1457f4fed12a50e427988a07f0f9df53cf0ee8da23fab16e6732c2ec909d4"},
@@ -2446,6 +2582,42 @@ files = [
 hpack = ">=4.1,<5"
 hyperframe = ">=6.1,<7"

+[[package]]
+name = "hf-xet"
+version = "1.2.0"
+description = "Fast transfer of large files with the Hugging Face Hub."
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+markers = "platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"arm64\" or platform_machine == \"aarch64\""
+files = [
+    {file = "hf_xet-1.2.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:ceeefcd1b7aed4956ae8499e2199607765fbd1c60510752003b6cc0b8413b649"},
+    {file = "hf_xet-1.2.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b70218dd548e9840224df5638fdc94bd033552963cfa97f9170829381179c813"},
+    {file = "hf_xet-1.2.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7d40b18769bb9a8bc82a9ede575ce1a44c75eb80e7375a01d76259089529b5dc"},
+    {file = "hf_xet-1.2.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:cd3a6027d59cfb60177c12d6424e31f4b5ff13d8e3a1247b3a584bf8977e6df5"},
+    {file = "hf_xet-1.2.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6de1fc44f58f6dd937956c8d304d8c2dea264c80680bcfa61ca4a15e7b76780f"},
+    {file = "hf_xet-1.2.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:f182f264ed2acd566c514e45da9f2119110e48a87a327ca271027904c70c5832"},
+    {file = "hf_xet-1.2.0-cp313-cp313t-win_amd64.whl", hash = "sha256:293a7a3787e5c95d7be1857358a9130694a9c6021de3f27fa233f37267174382"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:10bfab528b968c70e062607f663e21e34e2bba349e8038db546646875495179e"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:2a212e842647b02eb6a911187dc878e79c4aa0aa397e88dd3b26761676e8c1f8"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:30e06daccb3a7d4c065f34fc26c14c74f4653069bb2b194e7f18f17cbe9939c0"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:29c8fc913a529ec0a91867ce3d119ac1aac966e098cf49501800c870328cc090"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:66e159cbfcfbb29f920db2c09ed8b660eb894640d284f102ada929b6e3dc410a"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9c91d5ae931510107f148874e9e2de8a16052b6f1b3ca3c1b12f15ccb491390f"},
+    {file = "hf_xet-1.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:210d577732b519ac6ede149d2f2f34049d44e8622bf14eb3d63bbcd2d4b332dc"},
+    {file = "hf_xet-1.2.0-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:46740d4ac024a7ca9b22bebf77460ff43332868b661186a8e46c227fdae01848"},
+    {file = "hf_xet-1.2.0-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:27df617a076420d8845bea087f59303da8be17ed7ec0cd7ee3b9b9f579dff0e4"},
+    {file = "hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3651fd5bfe0281951b988c0facbe726aa5e347b103a675f49a3fa8144c7968fd"},
+    {file = "hf_xet-1.2.0-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:d06fa97c8562fb3ee7a378dd9b51e343bc5bc8190254202c9771029152f5e08c"},
+    {file = "hf_xet-1.2.0-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:4c1428c9ae73ec0939410ec73023c4f842927f39db09b063b9482dac5a3bb737"},
+    {file = "hf_xet-1.2.0-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a55558084c16b09b5ed32ab9ed38421e2d87cf3f1f89815764d1177081b99865"},
+    {file = "hf_xet-1.2.0-cp37-abi3-win_amd64.whl", hash = "sha256:e6584a52253f72c9f52f9e549d5895ca7a471608495c4ecaa6cc73dba2b24d69"},
+    {file = "hf_xet-1.2.0.tar.gz", hash = "sha256:a8c27070ca547293b6890c4bf389f713f80e8c478631432962bb7f4bc0bd7d7f"},
+]
+
+[package.extras]
+tests = ["pytest"]
+
 [[package]]
 name = "hpack"
 version = "4.1.0"
@@ -2597,6 +2769,42 @@ files = [
    {file = "httpx_sse-0.4.3.tar.gz", hash = "sha256:9b1ed0127459a66014aec3c56bebd93da3c1bc8bb6618c8082039a44889a755d"},
 ]

+[[package]]
+name = "huggingface-hub"
+version = "1.4.1"
+description = "Client library to download and publish models, datasets and other repos on the huggingface.co hub"
+optional = false
+python-versions = ">=3.9.0"
+groups = ["main"]
+files = [
+    {file = "huggingface_hub-1.4.1-py3-none-any.whl", hash = "sha256:9931d075fb7a79af5abc487106414ec5fba2c0ae86104c0c62fd6cae38873d18"},
+    {file = "huggingface_hub-1.4.1.tar.gz", hash = "sha256:b41131ec35e631e7383ab26d6146b8d8972abc8b6309b963b306fbcca87f5ed5"},
+]
+
+[package.dependencies]
+filelock = "*"
+fsspec = ">=2023.5.0"
+hf-xet = {version = ">=1.2.0,<2.0.0", markers = "platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"arm64\" or platform_machine == \"aarch64\""}
+httpx = ">=0.23.0,<1"
+packaging = ">=20.9"
+pyyaml = ">=5.1"
+shellingham = "*"
+tqdm = ">=4.42.1"
+typer-slim = "*"
+typing-extensions = ">=4.1.0"
+
+[package.extras]
+all = ["Jinja2", "Pillow", "authlib (>=1.3.2)", "fastapi", "fastapi", "httpx", "itsdangerous", "jedi", "libcst (>=1.4.0)", "mypy (==1.15.0)", "numpy", "pytest (>=8.4.2)", "pytest-asyncio", "pytest-cov", "pytest-env", "pytest-mock", "pytest-rerunfailures (<16.0)", "pytest-vcr", "pytest-xdist", "ruff (>=0.9.0)", "soundfile", "ty", "types-PyYAML", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)", "urllib3 (<2.0)"]
+dev = ["Jinja2", "Pillow", "authlib (>=1.3.2)", "fastapi", "fastapi", "httpx", "itsdangerous", "jedi", "libcst (>=1.4.0)", "mypy (==1.15.0)", "numpy", "pytest (>=8.4.2)", "pytest-asyncio", "pytest-cov", "pytest-env", "pytest-mock", "pytest-rerunfailures (<16.0)", "pytest-vcr", "pytest-xdist", "ruff (>=0.9.0)", "soundfile", "ty", "types-PyYAML", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)", "urllib3 (<2.0)"]
+fastai = ["fastai (>=2.4)", "fastcore (>=1.3.27)", "toml"]
+hf-xet = ["hf-xet (>=1.2.0,<2.0.0)"]
+mcp = ["mcp (>=1.8.0)"]
+oauth = ["authlib (>=1.3.2)", "fastapi", "httpx", "itsdangerous"]
+quality = ["libcst (>=1.4.0)", "mypy (==1.15.0)", "ruff (>=0.9.0)", "ty"]
+testing = ["Jinja2", "Pillow", "authlib (>=1.3.2)", "fastapi", "fastapi", "httpx", "itsdangerous", "jedi", "numpy", "pytest (>=8.4.2)", "pytest-asyncio", "pytest-cov", "pytest-env", "pytest-mock", "pytest-rerunfailures (<16.0)", "pytest-vcr", "pytest-xdist", "soundfile", "urllib3 (<2.0)"]
+torch = ["safetensors[torch]", "torch"]
+typing = ["types-PyYAML", "types-simplejson", "types-toml", "types-tqdm", "types-urllib3", "typing-extensions (>=4.8.0)"]
+
 [[package]]
 name = "hyperframe"
 version = "6.1.0"
@@ -3142,6 +3350,40 @@ dynamodb = ["boto3 (>=1.9.71)"]
 redis = ["redis (>=2.10.5)"]
 test-filesource = ["pyyaml (>=5.3.1)", "watchdog (>=3.0.0)"]

+[[package]]
+name = "litellm"
+version = "1.80.0"
+description = "Library to easily interface with LLM API providers"
+optional = false
+python-versions = "!=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,!=3.7.*,>=3.8"
+groups = ["main"]
+files = [
+    {file = "litellm-1.80.0-py3-none-any.whl", hash = "sha256:fd0009758f4772257048d74bf79bb64318859adb4ea49a8b66fdbc718cd80b6e"},
+    {file = "litellm-1.80.0.tar.gz", hash = "sha256:eeac733eb6b226f9e5fb020f72fe13a32b3354b001dc62bcf1bc4d9b526d6231"},
+]
+
+[package.dependencies]
+aiohttp = ">=3.10"
+click = "*"
+fastuuid = ">=0.13.0"
+httpx = ">=0.23.0"
+importlib-metadata = ">=6.8.0"
+jinja2 = ">=3.1.2,<4.0.0"
+jsonschema = ">=4.22.0,<5.0.0"
+openai = ">=1.99.5"
+pydantic = ">=2.5.0,<3.0.0"
+python-dotenv = ">=0.2.0"
+tiktoken = ">=0.7.0"
+tokenizers = "*"
+
+[package.extras]
+caching = ["diskcache (>=5.6.1,<6.0.0)"]
+extra-proxy = ["azure-identity (>=1.15.0,<2.0.0)", "azure-keyvault-secrets (>=4.8.0,<5.0.0)", "google-cloud-iam (>=2.19.1,<3.0.0)", "google-cloud-kms (>=2.21.3,<3.0.0)", "prisma (==0.11.0)", "redisvl (>=0.4.1,<0.5.0) ; python_version >= \"3.9\" and python_version < \"3.14\"", "resend (>=0.8.0,<0.9.0)"]
+mlflow = ["mlflow (>3.1.4) ; python_version >= \"3.10\""]
+proxy = ["PyJWT (>=2.8.0,<3.0.0)", "apscheduler (>=3.10.4,<4.0.0)", "azure-identity (>=1.15.0,<2.0.0)", "azure-storage-blob (>=12.25.1,<13.0.0)", "backoff", "boto3 (==1.36.0)", "cryptography", "fastapi (>=0.120.1)", "fastapi-sso (>=0.16.0,<0.17.0)", "gunicorn (>=23.0.0,<24.0.0)", "litellm-enterprise (==0.1.21)", "litellm-proxy-extras (==0.4.5)", "mcp (>=1.10.0,<2.0.0) ; python_version >= \"3.10\"", "orjson (>=3.9.7,<4.0.0)", "polars (>=1.31.0,<2.0.0) ; python_version >= \"3.10\"", "pynacl (>=1.5.0,<2.0.0)", "python-multipart (>=0.0.18,<0.0.19)", "pyyaml (>=6.0.1,<7.0.0)", "rich (==13.7.1)", "rq", "soundfile (>=0.12.1,<0.13.0)", "uvicorn (>=0.29.0,<0.30.0)", "uvloop (>=0.21.0,<0.22.0) ; sys_platform != \"win32\"", "websockets (>=13.1.0,<14.0.0)"]
+semantic-router = ["semantic-router ; python_version >= \"3.9\""]
+utils = ["numpydoc"]
+
 [[package]]
 name = "markdown-it-py"
 version = "4.0.0"
@@ -4615,6 +4857,28 @@ docs = ["furo (>=2025.9.25)", "proselint (>=0.14)", "sphinx (>=8.2.3)", "sphinx-
 test = ["appdirs (==1.4.4)", "covdefaults (>=2.3)", "pytest (>=8.4.2)", "pytest-cov (>=7)", "pytest-mock (>=3.15.1)"]
 type = ["mypy (>=1.18.2)"]

+[[package]]
+name = "playwright"
+version = "1.58.0"
+description = "A high-level API to automate web browsers"
+optional = false
+python-versions = ">=3.9"
+groups = ["main"]
+files = [
+    {file = "playwright-1.58.0-py3-none-macosx_10_13_x86_64.whl", hash = "sha256:96e3204aac292ee639edbfdef6298b4be2ea0a55a16b7068df91adac077cc606"},
+    {file = "playwright-1.58.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:70c763694739d28df71ed578b9c8202bb83e8fe8fb9268c04dd13afe36301f71"},
+    {file = "playwright-1.58.0-py3-none-macosx_11_0_universal2.whl", hash = "sha256:185e0132578733d02802dfddfbbc35f42be23a45ff49ccae5081f25952238117"},
+    {file = "playwright-1.58.0-py3-none-manylinux1_x86_64.whl", hash = "sha256:c95568ba1eda83812598c1dc9be60b4406dffd60b149bc1536180ad108723d6b"},
+    {file = "playwright-1.58.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8f9999948f1ab541d98812de25e3a8c410776aa516d948807140aff797b4bffa"},
+    {file = "playwright-1.58.0-py3-none-win32.whl", hash = "sha256:1e03be090e75a0fabbdaeab65ce17c308c425d879fa48bb1d7986f96bfad0b99"},
+    {file = "playwright-1.58.0-py3-none-win_amd64.whl", hash = "sha256:a2bf639d0ce33b3ba38de777e08697b0d8f3dc07ab6802e4ac53fb65e3907af8"},
+    {file = "playwright-1.58.0-py3-none-win_arm64.whl", hash = "sha256:32ffe5c303901a13a0ecab91d1c3f74baf73b84f4bedbb6b935f5bc11cc98e1b"},
+]
+
+[package.dependencies]
+greenlet = ">=3.1.1,<4.0.0"
+pyee = ">=13,<14"
+
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -5601,6 +5865,24 @@ gcp-secret-manager = ["google-cloud-secret-manager (>=2.23.1)"]
 toml = ["tomli (>=2.0.1)"]
 yaml = ["pyyaml (>=6.0.1)"]

+[[package]]
+name = "pyee"
+version = "13.0.0"
+description = "A rough port of Node.js's EventEmitter to Python with a few tricks of its own"
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "pyee-13.0.0-py3-none-any.whl", hash = "sha256:48195a3cddb3b1515ce0695ed76036b5ccc2ef3a9f963ff9f77aec0139845498"},
+    {file = "pyee-13.0.0.tar.gz", hash = "sha256:b391e3c5a434d1f5118a25615001dbc8f669cf410ab67d04c4d4e07c55481c37"},
+]
+
+[package.dependencies]
+typing-extensions = "*"
+
+[package.extras]
+dev = ["black", "build", "flake8", "flake8-black", "isort", "jupyter-console", "mkdocs", "mkdocs-include-markdown-plugin", "mkdocstrings[python]", "mypy", "pytest", "pytest-asyncio ; python_version >= \"3.4\"", "pytest-trio ; python_version >= \"3.7\"", "sphinx", "toml", "tox", "trio", "trio ; python_version > \"3.6\"", "trio-typing ; python_version > \"3.6\"", "twine", "twisted", "validate-pyproject[all]"]
+
 [[package]]
 name = "pyflakes"
 version = "3.4.0"
@@ -7033,29 +7315,32 @@ uvicorn = ["uvicorn (>=0.34.0)"]

 [[package]]
 name = "stagehand"
-version = "3.7.0"
-description = "The official Python library for the stagehand API"
+version = "0.5.9"
+description = "Python SDK for Stagehand"
 optional = false
 python-versions = ">=3.9"
 groups = ["main"]
 files = [
-    {file = "stagehand-3.7.0-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:4918068e6c02717c09766f1df41d5a41ac2ad9b610a30bb584a7d5d359f8d654"},
-    {file = "stagehand-3.7.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:cedb940ebbd47930227f5ef82077080aeb1f77480913382183187aba98e3cca5"},
-    {file = "stagehand-3.7.0-py3-none-manylinux2014_x86_64.whl", hash = "sha256:87df69bca9a611c4acae7383333f1e0cf67cc5b92be91639c65772aa59f8e6ea"},
-    {file = "stagehand-3.7.0-py3-none-win_amd64.whl", hash = "sha256:09d809f3b35389b2ed0b879e8909a8ed01e1ba9330f39c08cfcefe1699197585"},
-    {file = "stagehand-3.7.0.tar.gz", hash = "sha256:53cdd79111147a4c6fedcf17ef92427472beaf11ad3fcd800736ae3475a5cc54"},
+    {file = "stagehand-0.5.9-py3-none-any.whl", hash = "sha256:cc8d2a114799ea1c3d6f199e86abd6479a8b338a101fffa6824d85b542ed9071"},
+    {file = "stagehand-0.5.9.tar.gz", hash = "sha256:068a2825b02fbc949ab9d1cf59b80d2c17caba0259e759d807f38d0e9ab236b0"},
 ]

 [package.dependencies]
-anyio = ">=3.5.0,<5"
-distro = ">=1.7.0,<2"
-httpx = ">=0.23.0,<1"
-pydantic = ">=1.9.0,<3"
-sniffio = "*"
-typing-extensions = ">=4.14,<5"
+anthropic = ">=0.51.0"
+browserbase = ">=1.4.0"
+google-genai = ">=1.40.0"
+httpx = ">=0.24.0"
+litellm = ">=1.72.0,<=1.80.0"
+nest-asyncio = ">=1.6.0"
+openai = ">=1.99.6"
+playwright = ">=1.42.1"
+pydantic = ">=1.10.0"
+python-dotenv = ">=1.0.0"
+requests = ">=2.31.0"
+rich = ">=13.7.0"

 [package.extras]
-aiohttp = ["aiohttp", "httpx-aiohttp (>=0.1.9)"]
+dev = ["black (>=23.3.0)", "isort (>=5.12.0)", "mypy (>=1.3.0)", "psutil (>=5.9.0)", "pytest (>=7.3.1)", "pytest-asyncio (>=0.21.0)", "pytest-cov (>=4.1.0)", "pytest-mock (>=3.10.0)", "ruff"]

 [[package]]
 name = "starlette"
@@ -7322,6 +7607,48 @@ files = [
 [package.dependencies]
 requests = ">=2.32.3,<3.0.0"

+[[package]]
+name = "tokenizers"
+version = "0.22.2"
+description = ""
+optional = false
+python-versions = ">=3.9"
+groups = ["main"]
+files = [
+    {file = "tokenizers-0.22.2-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:544dd704ae7238755d790de45ba8da072e9af3eea688f698b137915ae959281c"},
+    {file = "tokenizers-0.22.2-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1e418a55456beedca4621dbab65a318981467a2b188e982a23e117f115ce5001"},
+    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2249487018adec45d6e3554c71d46eb39fa8ea67156c640f7513eb26f318cec7"},
+    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:25b85325d0815e86e0bac263506dd114578953b7b53d7de09a6485e4a160a7dd"},
+    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bfb88f22a209ff7b40a576d5324bf8286b519d7358663db21d6246fb17eea2d5"},
+    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1c774b1276f71e1ef716e5486f21e76333464f47bece56bbd554485982a9e03e"},
+    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:df6c4265b289083bf710dff49bc51ef252f9d5be33a45ee2bed151114a56207b"},
+    {file = "tokenizers-0.22.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:369cc9fc8cc10cb24143873a0d95438bb8ee257bb80c71989e3ee290e8d72c67"},
+    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:29c30b83d8dcd061078b05ae0cb94d3c710555fbb44861139f9f83dcca3dc3e4"},
+    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:37ae80a28c1d3265bb1f22464c856bd23c02a05bb211e56d0c5301a435be6c1a"},
+    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:791135ee325f2336f498590eb2f11dc5c295232f288e75c99a36c5dbce63088a"},
+    {file = "tokenizers-0.22.2-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:38337540fbbddff8e999d59970f3c6f35a82de10053206a7562f1ea02d046fa5"},
+    {file = "tokenizers-0.22.2-cp39-abi3-win32.whl", hash = "sha256:a6bf3f88c554a2b653af81f3204491c818ae2ac6fbc09e76ef4773351292bc92"},
+    {file = "tokenizers-0.22.2-cp39-abi3-win_amd64.whl", hash = "sha256:c9ea31edff2968b44a88f97d784c2f16dc0729b8b143ed004699ebca91f05c48"},
+    {file = "tokenizers-0.22.2-cp39-abi3-win_arm64.whl", hash = "sha256:9ce725d22864a1e965217204946f830c37876eee3b2ba6fc6255e8e903d5fcbc"},
+    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:753d47ebd4542742ef9261d9da92cd545b2cacbb48349a1225466745bb866ec4"},
+    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e10bf9113d209be7cd046d40fbabbaf3278ff6d18eb4da4c500443185dc1896c"},
+    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:64d94e84f6660764e64e7e0b22baa72f6cd942279fdbb21d46abd70d179f0195"},
+    {file = "tokenizers-0.22.2-pp310-pypy310_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f01a9c019878532f98927d2bacb79bbb404b43d3437455522a00a30718cdedb5"},
+    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:319f659ee992222f04e58f84cbf407cfa66a65fe3a8de44e8ad2bc53e7d99012"},
+    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:1e50f8554d504f617d9e9d6e4c2c2884a12b388a97c5c77f0bc6cf4cd032feee"},
+    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1a62ba2c5faa2dd175aaeed7b15abf18d20266189fb3406c5d0550dd34dd5f37"},
+    {file = "tokenizers-0.22.2-pp39-pypy39_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:143b999bdc46d10febb15cbffb4207ddd1f410e2c755857b5a0797961bbdc113"},
+    {file = "tokenizers-0.22.2.tar.gz", hash = "sha256:473b83b915e547aa366d1eee11806deaf419e17be16310ac0a14077f1e28f917"},
+]
+
+[package.dependencies]
+huggingface-hub = ">=0.16.4,<2.0"
+
+[package.extras]
+dev = ["tokenizers[testing]"]
+docs = ["setuptools-rust", "sphinx", "sphinx-rtd-theme"]
+testing = ["datasets", "numpy", "pytest", "pytest-asyncio", "requests", "ruff", "ty"]
+
 [[package]]
 name = "tomli"
 version = "2.4.0"
@@ -7448,6 +7775,25 @@ async = ["aiohttp (>=3.7.3,<4)", "async-lru (>=1.0.3,<3)"]
 dev = ["coverage (>=4.4.2)", "coveralls (>=2.1.0)", "tox (>=3.21.0)"]
 test = ["urllib3 (<2)", "vcrpy (>=1.10.3)"]

+[[package]]
+name = "typer-slim"
+version = "0.21.1"
+description = "Typer, build great CLIs. Easy to code. Based on Python type hints."
+optional = false
+python-versions = ">=3.9"
+groups = ["main"]
+files = [
+    {file = "typer_slim-0.21.1-py3-none-any.whl", hash = "sha256:6e6c31047f171ac93cc5a973c9e617dbc5ab2bddc4d0a3135dc161b4e2020e0d"},
+    {file = "typer_slim-0.21.1.tar.gz", hash = "sha256:73495dd08c2d0940d611c5a8c04e91c2a0a98600cbd4ee19192255a233b6dbfd"},
+]
+
+[package.dependencies]
+click = ">=8.0.0"
+typing-extensions = ">=3.7.4.3"
+
+[package.extras]
+standard = ["rich (>=10.11.0)", "shellingham (>=1.3.0)"]
+
 [[package]]
 name = "typing-extensions"
 version = "4.15.0"
@@ -8630,4 +8976,4 @@ cffi = ["cffi (>=1.17,<2.0) ; platform_python_implementation != \"PyPy\" and pyt
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.10,<3.14"
-content-hash = "1dd10577184ebff0d10997f4c6ba49484de79b7fa090946e8e5ce5c5bac3cdeb"
+content-hash = "938e93b7de4005bdd60ce5fb542a63df79115f9e21b1cb9940a19605f00d354a"
--- a/autogpt_platform/backend/pyproject.toml
+++ b/autogpt_platform/backend/pyproject.toml
@@ -88,7 +88,7 @@ pandas = "^2.3.1"
 firecrawl-py = "^4.3.6"
 exa-py = "^1.14.20"
 croniter = "^6.0.0"
-stagehand = "^3.4.0"
+stagehand = "^0.5.1"
 gravitas-md2gdocs = "^0.1.0"
 posthog = "^7.6.0"
 fpdf2 = "^2.8.6"
--- a/autogpt_platform/backend/schema.prisma
+++ b/autogpt_platform/backend/schema.prisma
@@ -1301,3 +1301,164 @@ model OAuthRefreshToken {
  @@index([userId, applicationId])
  @@index([expiresAt]) // For cleanup
 }
+
+// ============================================================================
+// LLM Registry Models
+// ============================================================================
+
+enum LlmCostUnit {
+  RUN
+  TOKENS
+}
+
+model LlmProvider {
+  id        String   @id @default(uuid())
+  createdAt DateTime @default(now())
+  updatedAt DateTime @updatedAt
+
+  name        String @unique
+  displayName String
+  description String?
+
+  defaultCredentialProvider String?
+  defaultCredentialId       String?
+  defaultCredentialType     String?
+
+  metadata Json @default("{}")
+
+  Models LlmModel[]
+
+}
+
+model LlmModel {
+  id        String   @id @default(uuid())
+  createdAt DateTime @default(now())
+  updatedAt DateTime @updatedAt
+
+  slug        String @unique
+  displayName String
+  description String?
+
+  providerId String
+  Provider   LlmProvider @relation(fields: [providerId], references: [id], onDelete: Restrict)
+
+  // Creator is the organization that created/trained the model (e.g., OpenAI, Meta)
+  // This is distinct from the provider who hosts/serves the model (e.g., OpenRouter)
+  creatorId String?
+  Creator   LlmModelCreator? @relation(fields: [creatorId], references: [id], onDelete: SetNull)
+
+  contextWindow   Int
+  maxOutputTokens Int?
+  priceTier       Int     @default(1) // 1=cheapest, 2=medium, 3=expensive (DB constraint: 1-3)
+  isEnabled       Boolean @default(true)
+  isRecommended   Boolean @default(false)
+
+  // Model-specific capabilities
+  // These vary per model even within the same provider (e.g., Hugging Face)
+  // Default to false for safety - partially-seeded rows should not be assumed capable
+  supportsTools            Boolean @default(false)
+  supportsJsonOutput       Boolean @default(false)
+  supportsReasoning        Boolean @default(false)
+  supportsParallelToolCalls Boolean @default(false)
+
+  capabilities Json @default("{}")
+  metadata     Json @default("{}")
+
+  Costs             LlmModelCost[]
+  SourceMigrations  LlmModelMigration[] @relation("SourceMigrations")
+  TargetMigrations  LlmModelMigration[] @relation("TargetMigrations")
+
+  @@index([providerId, isEnabled])
+  @@index([creatorId])
+  // Note: slug already has @unique which creates an implicit index
+}
+
+model LlmModelCost {
+  id        String      @id @default(uuid())
+  createdAt DateTime    @default(now())
+  updatedAt DateTime    @updatedAt
+  unit      LlmCostUnit @default(RUN)
+
+  creditCost Int // DB constraint: >= 0
+
+  // Provider identifier (e.g., "openai", "anthropic", "openrouter")
+  // Used to determine which credential system provides the API key.
+  // Allows different pricing for:
+  // - Default provider costs (WHERE credentialId IS NULL)
+  // - User's own API key costs (WHERE credentialId IS NOT NULL)
+  credentialProvider String
+  credentialId       String?
+  credentialType     String?
+  currency           String?
+
+  metadata Json @default("{}")
+
+  llmModelId String
+  Model      LlmModel @relation(fields: [llmModelId], references: [id], onDelete: Cascade)
+
+  // Note: Unique constraints are implemented as partial indexes in migration SQL:
+  // - One for default costs (WHERE credentialId IS NULL)
+  // - One for credential-specific costs (WHERE credentialId IS NOT NULL)
+  // This allows both provider-level defaults and credential-specific overrides
+}
+
+model LlmModelCreator {
+  id        String   @id @default(uuid())
+  createdAt DateTime @default(now())
+  updatedAt DateTime @updatedAt
+
+  name        String  @unique // e.g., "openai", "anthropic", "meta"
+  displayName String  // e.g., "OpenAI", "Anthropic", "Meta"
+  description String?
+  websiteUrl  String? // Link to creator's website
+  logoUrl     String? // URL to creator's logo
+
+  metadata Json @default("{}")
+
+  Models LlmModel[]
+
+}
+
+model LlmModelMigration {
+  id        String   @id @default(uuid())
+  createdAt DateTime @default(now())
+  updatedAt DateTime @updatedAt
+
+  sourceModelSlug String // The original model that was disabled
+  targetModelSlug String // The model workflows were migrated to
+  reason          String? // Why the migration happened (e.g., "Provider outage")
+
+  // FK constraints ensure slugs reference valid models
+  SourceModel LlmModel @relation("SourceMigrations", fields: [sourceModelSlug], references: [slug], onDelete: Restrict)
+  TargetModel LlmModel @relation("TargetMigrations", fields: [targetModelSlug], references: [slug], onDelete: Restrict)
+
+  // Track affected nodes as JSON array of node IDs
+  // Format: ["node-uuid-1", "node-uuid-2", ...]
+  migratedNodeIds Json @default("[]")
+  nodeCount       Int  // Number of nodes migrated (DB constraint: >= 0)
+
+  // Custom pricing override for migrated workflows during the migration period.
+  // Use case: When migrating users from an expensive model (e.g., GPT-4) to a cheaper
+  // one (e.g., GPT-3.5), you may want to temporarily maintain the original pricing
+  // to avoid billing surprises, or offer a discount during the transition.
+  //
+  // IMPORTANT: This field is intended for integration with the billing system.
+  // When billing calculates costs for nodes affected by this migration, it should
+  // check if customCreditCost is set and use it instead of the target model's cost.
+  // If null, the target model's normal cost applies.
+  //
+  // TODO: Integrate with billing system to apply this override during cost calculation.
+  // LIMITATION: This is a simple Int and doesn't distinguish RUN vs TOKENS pricing.
+  // For token-priced models, this may be ambiguous. Consider migrating to a relation
+  // with LlmModelCost or a dedicated override model in a follow-up PR.
+  customCreditCost Int? // DB constraint: >= 0 when not null
+
+  // Revert tracking
+  isReverted Boolean   @default(false)
+  revertedAt DateTime?
+
+  // Note: Partial unique index in migration SQL prevents multiple active migrations per source:
+  // UNIQUE (sourceModelSlug) WHERE isReverted = false
+  @@index([targetModelSlug])
+  @@index([sourceModelSlug, isReverted]) // Composite index for active migration queries
+}
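Review note on `customCreditCost`: the schema comment above says billing should use the override when it is set and fall back to the target model's normal cost when it is null. A minimal sketch of that lookup follows. `MigrationOverride` and `effective_credit_cost` are hypothetical names for illustration and are not part of this PR. Ignoring the override once a migration is reverted is an assumption; the schema only records `isReverted` and does not say how billing should treat it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigrationOverride:
    """Hypothetical in-memory view of an LlmModelMigration row (not from this PR)."""

    custom_credit_cost: Optional[int]  # mirrors customCreditCost (>= 0 when set)
    is_reverted: bool = False  # mirrors isReverted


def effective_credit_cost(
    target_model_cost: int, migration: Optional[MigrationOverride]
) -> int:
    """Pick the credit cost for a node affected by a model migration."""
    # No migration on record: the target model's normal cost applies.
    if migration is None:
        return target_model_cost
    # Assumption: a reverted migration no longer overrides pricing.
    if migration.is_reverted:
        return target_model_cost
    # Null override: the schema comment says the normal cost applies.
    if migration.custom_credit_cost is None:
        return target_model_cost
    # Compare against None rather than truthiness so that 0 is a valid override.
    return migration.custom_credit_cost
```

As the LIMITATION comment in the schema points out, a single integer cannot distinguish `RUN` from `TOKENS` pricing, so a sketch like this is only unambiguous for per-run costs.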
--- a/autogpt_platform/backend/test/agent_generator/test_smart_decision_maker.py
+++ b/autogpt_platform/backend/test/agent_generator/test_smart_decision_maker.py
@@ -1,10 +1,10 @@
 """
-Tests for OrchestratorBlock support in agent generator.
+Tests for SmartDecisionMakerBlock support in agent generator.

 Covers:
- AgentFixer.fix_orchestrator_blocks()
- AgentValidator.validate_orchestrator_blocks()
- End-to-end fix → validate → pipeline for Orchestrator agents
+- AgentFixer.fix_smart_decision_maker_blocks()
+- AgentValidator.validate_smart_decision_maker_blocks()
+- End-to-end fix → validate → pipeline for SmartDecisionMaker agents
 """

 import uuid
@@ -14,7 +14,7 @@ from backend.copilot.tools.agent_generator.helpers import (
    AGENT_EXECUTOR_BLOCK_ID,
    AGENT_INPUT_BLOCK_ID,
    AGENT_OUTPUT_BLOCK_ID,
-    TOOL_ORCHESTRATOR_BLOCK_ID,
+    SMART_DECISION_MAKER_BLOCK_ID,
 )
 from backend.copilot.tools.agent_generator.validator import AgentValidator

@@ -28,10 +28,10 @@ def _make_sdm_node(
    input_default: dict | None = None,
    metadata: dict | None = None,
 ) -> dict:
-    """Create a OrchestratorBlock node dict."""
+    """Create a SmartDecisionMakerBlock node dict."""
    return {
        "id": node_id or _uid(),
-        "block_id": TOOL_ORCHESTRATOR_BLOCK_ID,
+        "block_id": SMART_DECISION_MAKER_BLOCK_ID,
        "input_default": input_default or {},
        "metadata": metadata or {"position": {"x": 0, "y": 0}},
    }
@@ -125,15 +125,15 @@ def _make_orchestrator_agent() -> dict:
 # ---------------------------------------------------------------------------


-class TestFixOrchestratorBlocks:
-    """Tests for AgentFixer.fix_orchestrator_blocks()."""
+class TestFixSmartDecisionMakerBlocks:
+    """Tests for AgentFixer.fix_smart_decision_maker_blocks()."""

    def test_fills_defaults_when_missing(self):
        """All agent-mode defaults are populated for a bare SDM node."""
        fixer = AgentFixer()
        agent = {"nodes": [_make_sdm_node()], "links": []}

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10
@@ -159,7 +159,7 @@ class TestFixOrchestratorBlocks:
            "links": [],
        }

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 5
@@ -182,7 +182,7 @@ class TestFixOrchestratorBlocks:
            "links": [],
        }

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10  # kept
@@ -192,7 +192,7 @@ class TestFixOrchestratorBlocks:
        assert len(fixer.fixes_applied) == 3

    def test_skips_non_sdm_nodes(self):
-        """Non-Orchestrator nodes are untouched."""
+        """Non-SmartDecisionMaker nodes are untouched."""
        fixer = AgentFixer()
        other_node = {
            "id": _uid(),
@@ -202,7 +202,7 @@ class TestFixOrchestratorBlocks:
        }
        agent = {"nodes": [other_node], "links": []}

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        assert "agent_mode_max_iterations" not in result["nodes"][0]["input_default"]
        assert len(fixer.fixes_applied) == 0
@@ -212,12 +212,12 @@ class TestFixOrchestratorBlocks:
        fixer = AgentFixer()
        node = {
            "id": _uid(),
-            "block_id": TOOL_ORCHESTRATOR_BLOCK_ID,
+            "block_id": SMART_DECISION_MAKER_BLOCK_ID,
            "metadata": {},
        }
        agent = {"nodes": [node], "links": []}

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        assert "input_default" in result["nodes"][0]
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 10
@@ -227,13 +227,13 @@ class TestFixOrchestratorBlocks:
        fixer = AgentFixer()
        node = {
            "id": _uid(),
-            "block_id": TOOL_ORCHESTRATOR_BLOCK_ID,
+            "block_id": SMART_DECISION_MAKER_BLOCK_ID,
            "input_default": None,
            "metadata": {},
        }
        agent = {"nodes": [node], "links": []}

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        assert isinstance(result["nodes"][0]["input_default"], dict)
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 10
@@ -255,7 +255,7 @@ class TestFixOrchestratorBlocks:
            "links": [],
        }

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10  # None → default
@@ -275,7 +275,7 @@ class TestFixOrchestratorBlocks:
            "links": [],
        }

-        result = fixer.fix_orchestrator_blocks(agent)
+        result = fixer.fix_smart_decision_maker_blocks(agent)

        # First node: 3 defaults filled (agent_mode was already set)
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
@@ -284,7 +284,7 @@ class TestFixOrchestratorBlocks:
        assert len(fixer.fixes_applied) == 7  # 3 + 4

    def test_registered_in_apply_all_fixes(self):
-        """fix_orchestrator_blocks runs as part of apply_all_fixes."""
+        """fix_smart_decision_maker_blocks runs as part of apply_all_fixes."""
        fixer = AgentFixer()
        agent = {
            "nodes": [_make_sdm_node()],
@@ -295,7 +295,7 @@ class TestFixOrchestratorBlocks:

        defaults = result["nodes"][0]["input_default"]
        assert defaults["agent_mode_max_iterations"] == 10
-        assert any("OrchestratorBlock" in fix for fix in fixer.fixes_applied)
+        assert any("SmartDecisionMakerBlock" in fix for fix in fixer.fixes_applied)


 # ---------------------------------------------------------------------------
@@ -303,15 +303,15 @@ class TestFixOrchestratorBlocks:
 # ---------------------------------------------------------------------------


-class TestValidateOrchestratorBlocks:
-    """Tests for AgentValidator.validate_orchestrator_blocks()."""
+class TestValidateSmartDecisionMakerBlocks:
+    """Tests for AgentValidator.validate_smart_decision_maker_blocks()."""

    def test_valid_sdm_with_tools(self):
        """SDM with downstream tool links passes validation."""
        validator = AgentValidator()
        agent = _make_orchestrator_agent()

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is True
        assert len(validator.errors) == 0
@@ -325,7 +325,7 @@ class TestValidateOrchestratorBlocks:
            "links": [],  # no tool links
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1
@@ -344,20 +344,20 @@ class TestValidateOrchestratorBlocks:
            ],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1

    def test_no_sdm_nodes_passes(self):
-        """Agent without Orchestrator nodes passes trivially."""
+        """Agent without SmartDecisionMaker nodes passes trivially."""
        validator = AgentValidator()
        agent = {
            "nodes": [_make_input_node(), _make_output_node()],
            "links": [],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is True
        assert len(validator.errors) == 0
@@ -373,7 +373,7 @@ class TestValidateOrchestratorBlocks:
        )
        agent = {"nodes": [sdm], "links": []}

-        validator.validate_orchestrator_blocks(agent)
+        validator.validate_smart_decision_maker_blocks(agent)

        assert "My Orchestrator" in validator.errors[0]

@@ -392,7 +392,7 @@ class TestValidateOrchestratorBlocks:
            ],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1
@@ -408,7 +408,7 @@ class TestValidateOrchestratorBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert any("agent_mode_max_iterations=0" in e for e in validator.errors)
@@ -423,7 +423,7 @@ class TestValidateOrchestratorBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is True
        assert len(validator.errors) == 0
@@ -438,7 +438,7 @@ class TestValidateOrchestratorBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert any("unusually high" in e for e in validator.errors)
@@ -453,7 +453,7 @@ class TestValidateOrchestratorBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert any("non-integer" in e for e in validator.errors)
@@ -468,7 +468,7 @@ class TestValidateOrchestratorBlocks:
            "links": [_link(sdm["id"], "tools", tool["id"], "query")],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert any("invalid" in e and "-5" in e for e in validator.errors)
@@ -488,14 +488,14 @@ class TestValidateOrchestratorBlocks:
            ],
        }

-        result = validator.validate_orchestrator_blocks(agent)
+        result = validator.validate_smart_decision_maker_blocks(agent)

        assert result is False
        assert len(validator.errors) == 1
        assert "no downstream tool blocks" in validator.errors[0]

    def test_registered_in_validate(self):
-        """validate_orchestrator_blocks runs as part of validate()."""
+        """validate_smart_decision_maker_blocks runs as part of validate()."""
        validator = AgentValidator()
        sdm = _make_sdm_node()
        agent = {
@@ -511,8 +511,8 @@ class TestValidateOrchestratorBlocks:
        # Build a minimal blocks list with the SDM block info
        blocks = [
            {
-                "id": TOOL_ORCHESTRATOR_BLOCK_ID,
-                "name": "OrchestratorBlock",
+                "id": SMART_DECISION_MAKER_BLOCK_ID,
+                "name": "SmartDecisionMakerBlock",
                "inputSchema": {"properties": {"prompt": {"type": "string"}}},
                "outputSchema": {
                    "properties": {
@@ -557,7 +557,7 @@ class TestValidateOrchestratorBlocks:
 # ---------------------------------------------------------------------------


-class TestOrchestratorE2EPipeline:
+class TestSmartDecisionMakerE2EPipeline:
    """End-to-end tests: build agent JSON → fix → validate."""

    def test_orchestrator_agent_fix_then_validate(self):
@@ -570,7 +570,7 @@ class TestOrchestratorE2EPipeline:

        # Verify defaults were applied
        sdm_nodes = [
-            n for n in fixed["nodes"] if n["block_id"] == TOOL_ORCHESTRATOR_BLOCK_ID
+            n for n in fixed["nodes"] if n["block_id"] == SMART_DECISION_MAKER_BLOCK_ID
        ]
        assert len(sdm_nodes) == 1
        assert sdm_nodes[0]["input_default"]["agent_mode_max_iterations"] == 10
@@ -578,7 +578,7 @@ class TestOrchestratorE2EPipeline:

        # Validate (standalone SDM check)
        validator = AgentValidator()
-        assert validator.validate_orchestrator_blocks(fixed) is True
+        assert validator.validate_smart_decision_maker_blocks(fixed) is True

    def test_bare_sdm_no_tools_fix_then_validate(self):
        """SDM without tools: fixer fills defaults, validator catches error."""
@@ -606,7 +606,7 @@ class TestOrchestratorE2EPipeline:

        # Validate catches missing tools
        validator = AgentValidator()
-        assert validator.validate_orchestrator_blocks(fixed) is False
+        assert validator.validate_smart_decision_maker_blocks(fixed) is False
        assert any("no downstream tool blocks" in e for e in validator.errors)

    def test_sdm_with_user_set_bounded_iterations(self):
@@ -614,7 +614,7 @@ class TestOrchestratorE2EPipeline:
        agent = _make_orchestrator_agent()
        # Simulate user setting bounded iterations
        for node in agent["nodes"]:
-            if node["block_id"] == TOOL_ORCHESTRATOR_BLOCK_ID:
+            if node["block_id"] == SMART_DECISION_MAKER_BLOCK_ID:
                node["input_default"]["agent_mode_max_iterations"] = 5
                node["input_default"]["sys_prompt"] = "You are a helpful orchestrator"

@@ -622,7 +622,7 @@ class TestOrchestratorE2EPipeline:
        fixed = fixer.apply_all_fixes(agent)

        sdm = next(
-            n for n in fixed["nodes"] if n["block_id"] == TOOL_ORCHESTRATOR_BLOCK_ID
+            n for n in fixed["nodes"] if n["block_id"] == SMART_DECISION_MAKER_BLOCK_ID
        )
        assert sdm["input_default"]["agent_mode_max_iterations"] == 5
        assert sdm["input_default"]["sys_prompt"] == "You are a helpful orchestrator"
@@ -638,8 +638,8 @@ class TestOrchestratorE2EPipeline:

        blocks = [
            {
-                "id": TOOL_ORCHESTRATOR_BLOCK_ID,
-                "name": "OrchestratorBlock",
+                "id": SMART_DECISION_MAKER_BLOCK_ID,
+                "name": "SmartDecisionMakerBlock",
                "inputSchema": {
                    "properties": {
                        "prompt": {"type": "string"},
@@ -709,5 +709,5 @@ class TestOrchestratorE2EPipeline:
        assert is_valid, f"Validation failed: {error_msg}"

        # SDM-specific validation should pass (has tool links)
-        sdm_errors = [e for e in validator.errors if "OrchestratorBlock" in e]
+        sdm_errors = [e for e in validator.errors if "SmartDecisionMakerBlock" in e]
        assert len(sdm_errors) == 0, f"Unexpected SDM errors: {sdm_errors}"
--- a/autogpt_platform/db/docker/docker-compose.yml
+++ b/autogpt_platform/db/docker/docker-compose.yml
@@ -66,9 +66,6 @@ services:
    container_name: supabase-kong
    image: kong:2.8.1
    restart: unless-stopped
-    networks:
-      - default
-      - shared-network
    ports:
      - 8000:8000/tcp
      - 8443:8443/tcp
@@ -410,9 +407,6 @@ services:
    container_name: supabase-db
    image: supabase/postgres:15.8.1.049
    restart: unless-stopped
-    networks:
-      - default
-      - app-network
    volumes:
      - ./volumes/db/realtime.sql:/docker-entrypoint-initdb.d/migrations/99-realtime.sql:Z
      # Must be superuser to create event trigger
@@ -544,11 +538,5 @@ services:
        "/app/bin/migrate && /app/bin/supavisor eval \"$$(cat /etc/pooler/pooler.exs)\" && /app/bin/server"
      ]

-networks:
-  shared-network:
-    name: shared-network
-  app-network:
-    name: app-network
-
 volumes:
  supabase-config:
--- a/autogpt_platform/db/docker/reset.sh
+++ b/autogpt_platform/db/docker/reset.sh
@@ -10,12 +10,6 @@ then
 fi

 echo "Stopping and removing all containers..."
-# Use the platform compose to tear everything down so no orphan containers remain
-# (the platform compose manages supabase containers via `extends`, using the
-# standalone supabase compose here would leave orphans that conflict on next start)
-if [ -f "../../docker-compose.yml" ]; then
-  docker compose -f ../../docker-compose.yml down -v --remove-orphans
-fi
 docker compose -f docker-compose.yml -f ./dev/docker-compose.dev.yml down -v --remove-orphans

 echo "Cleaning up bind-mounted directories..."
--- a/autogpt_platform/docker-compose.platform.yml
+++ b/autogpt_platform/docker-compose.platform.yml
@@ -114,8 +114,6 @@ services:
      <<: *backend-env
    ports:
      - "8006:8006"
-    volumes:
-      - workspace-data:/app/autogpt_platform/backend/workspaces
    networks:
      - app-network
    logging:
@@ -187,8 +185,6 @@ services:
      PYTHONUNBUFFERED: "1"
    ports:
      - "8008:8008"
-    volumes:
-      - workspace-data:/app/autogpt_platform/backend/workspaces
    networks:
      - app-network
    logging:
@@ -372,9 +368,6 @@ services:
      SUPABASE_URL: http://kong:8000
      AGPT_SERVER_URL: http://rest_server:8006/api
      AGPT_WS_SERVER_URL: ws://websocket_server:8001/ws
-volumes:
-  workspace-data:
-
 networks:
  app-network:
    driver: bridge
--- a/autogpt_platform/docker-compose.yml
+++ b/autogpt_platform/docker-compose.yml
@@ -7,7 +7,6 @@ networks:
 volumes:
  supabase-config:
  clamav-data:
-  workspace-data:

 x-agpt-services:
  &agpt-services
--- a/autogpt_platform/frontend/src/app/api/openapi.json
+++ b/autogpt_platform/frontend/src/app/api/openapi.json
--- a/autogpt_platform/frontend/src/lib/autogpt-server-api/types.ts
+++ b/autogpt_platform/frontend/src/lib/autogpt-server-api/types.ts
@@ -743,7 +743,7 @@ export enum BlockUIType {
 export enum SpecialBlockID {
  AGENT = "e189baac-8c20-45a1-94a7-55177ea42565",
  MCP_TOOL = "a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4",
-  TOOL_ORCHESTRATOR = "3b191d9f-356f-482d-8238-ba04b6d18381",
+  SMART_DECISION = "3b191d9f-356f-482d-8238-ba04b6d18381",
  OUTPUT = "363ae599-353e-4804-937e-b2ee3cef3da4",
 }

--- a/docs/integrations/README.md
+++ b/docs/integrations/README.md
@@ -230,10 +230,10 @@ Below is a comprehensive list of all available blocks, categorized by their prim
 | [Ideogram Model](block-integrations/llm.md#ideogram-model) | This block runs Ideogram models with both simple and advanced settings |
 | [Jina Chunking](block-integrations/jina/chunking.md#jina-chunking) | Chunks texts using Jina AI's segmentation service |
 | [Jina Embedding](block-integrations/jina/embeddings.md#jina-embedding) | Generates embeddings using Jina AI |
-| [Orchestrator](block-integrations/llm.md#orchestrator) | Uses AI to intelligently decide what tool to use |
 | [Perplexity](block-integrations/llm.md#perplexity) | Query Perplexity's sonar models with real-time web search capabilities and receive annotated responses with source citations |
 | [Replicate Flux Advanced Model](block-integrations/replicate/flux_advanced.md#replicate-flux-advanced-model) | This block runs Flux models on Replicate with advanced settings |
 | [Replicate Model](block-integrations/replicate/replicate_block.md#replicate-model) | Run Replicate models synchronously |
+| [Smart Decision Maker](block-integrations/llm.md#smart-decision-maker) | Uses AI to intelligently decide what tool to use |
 | [Stagehand Act](block-integrations/stagehand/blocks.md#stagehand-act) | Interact with a web page by performing actions on a web page |
 | [Stagehand Extract](block-integrations/stagehand/blocks.md#stagehand-extract) | Extract structured data from a webpage |
 | [Stagehand Observe](block-integrations/stagehand/blocks.md#stagehand-observe) | Find suggested actions for your workflows |
--- a/docs/integrations/block-integrations/llm.md
+++ b/docs/integrations/block-integrations/llm.md
@@ -706,49 +706,6 @@ Advanced options include upscaling, custom color palettes, and negative prompts

 ---

-## Orchestrator
-
-### What it is
-Uses AI to intelligently decide what tool to use.
-
-### How it works
-<!-- MANUAL: how_it_works -->
-_Add technical explanation here._
-<!-- END MANUAL -->
-
-### Inputs
-
-| Input | Description | Type | Required |
-|-------|-------------|------|----------|
-| prompt | The prompt to send to the language model. | str | Yes |
-| model | The language model to use for answering the prompt. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-sonnet-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-2.5-pro" \| "google/gemini-3.1-pro-preview" \| "google/gemini-3-flash-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-3.1-flash-lite-preview" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "mistralai/mistral-large-2512" \| "mistralai/mistral-medium-3.1" \| "mistralai/mistral-small-3.2-24b-instruct" \| "mistralai/codestral-2508" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "cohere/command-a-03-2025" \| "cohere/command-a-translate-08-2025" \| "cohere/command-a-reasoning-08-2025" \| "cohere/command-a-vision-07-2025" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-reasoning-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "microsoft/phi-4" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-3" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
-| multiple_tool_calls | Whether to allow multiple tool calls in a single response. | bool | No |
-| sys_prompt | The system prompt to provide additional context to the model. | str | No |
-| conversation_history | The conversation history to provide context for the prompt. | List[Dict[str, Any]] | No |
-| last_tool_output | The output of the last tool that was called. | Last Tool Output | No |
-| retry | Number of times to retry the LLM call if the response does not match the expected format. | int | No |
-| prompt_values | Values used to fill in the prompt. The values can be used in the prompt by putting them in a double curly braces, e.g. {{variable_name}}. | Dict[str, str] | No |
-| max_tokens | The maximum number of tokens to generate in the chat completion. | int | No |
-| ollama_host | Ollama host for local  models | str | No |
-| agent_mode_max_iterations | Maximum iterations for agent mode. 0 = traditional mode (single LLM call, yield tool calls for external execution), -1 = infinite agent mode (loop until finished), 1+ = agent mode with max iterations limit. | int | No |
-| conversation_compaction | Automatically compact the context window once it hits the limit | bool | No |
-
-### Outputs
-
-| Output | Description | Type |
-|--------|-------------|------|
-| error | Error message if the operation failed | str |
-| tools | The tools that are available to use. | Tools |
-| finished | The finished message to display to the user. | str |
-| conversations | The conversation history to provide context for the prompt. | List[Any] |
-
-### Possible use case
-<!-- MANUAL: use_case -->
-_Add practical use case examples here._
-<!-- END MANUAL -->
-
---
-
 ## Perplexity

 ### What it is
@@ -789,6 +746,55 @@ Choose from different sonar model variants including deep-research for comprehen

 ---

+## Smart Decision Maker
+
+### What it is
+Uses AI to intelligently decide what tool to use.
+
+### How it works
+<!-- MANUAL: how_it_works -->
+This block enables agentic behavior by letting an LLM decide which tools to use based on the prompt. Connect tool outputs to feed back results, creating autonomous reasoning loops.
+
+Configure agent_mode_max_iterations to control loop behavior: 0 for single decisions, -1 for infinite looping, or a positive number for max iterations. The block outputs tool calls or a finished message.
+<!-- END MANUAL -->
+
+### Inputs
+
+| Input | Description | Type | Required |
+|-------|-------------|------|----------|
+| prompt | The prompt to send to the language model. | str | Yes |
+| model | The language model to use for answering the prompt. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-sonnet-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-2.5-pro" \| "google/gemini-3.1-pro-preview" \| "google/gemini-3-flash-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-3.1-flash-lite-preview" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "mistralai/mistral-large-2512" \| "mistralai/mistral-medium-3.1" \| "mistralai/mistral-small-3.2-24b-instruct" \| "mistralai/codestral-2508" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "cohere/command-a-03-2025" \| "cohere/command-a-translate-08-2025" \| "cohere/command-a-reasoning-08-2025" \| "cohere/command-a-vision-07-2025" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-reasoning-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "microsoft/phi-4" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-3" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
+| multiple_tool_calls | Whether to allow multiple tool calls in a single response. | bool | No |
+| sys_prompt | The system prompt to provide additional context to the model. | str | No |
+| conversation_history | The conversation history to provide context for the prompt. | List[Dict[str, Any]] | No |
+| last_tool_output | The output of the last tool that was called. | Last Tool Output | No |
+| retry | Number of times to retry the LLM call if the response does not match the expected format. | int | No |
+| prompt_values | Values used to fill in the prompt. The values can be used in the prompt by putting them in double curly braces, e.g. {{variable_name}}. | Dict[str, str] | No |
+| max_tokens | The maximum number of tokens to generate in the chat completion. | int | No |
+| ollama_host | Ollama host for local models | str | No |
+| agent_mode_max_iterations | Maximum iterations for agent mode. 0 = traditional mode (single LLM call, yield tool calls for external execution), -1 = infinite agent mode (loop until finished), 1+ = agent mode with max iterations limit. | int | No |
+| conversation_compaction | Automatically compact the context window once it hits the limit | bool | No |
+
+### Outputs
+
+| Output | Description | Type |
+|--------|-------------|------|
+| error | Error message if the operation failed | str |
+| tools | The tools that are available to use. | Tools |
+| finished | The finished message to display to the user. | str |
+| conversations | The conversation history to provide context for the prompt. | List[Any] |
+
+### Possible use case
+<!-- MANUAL: use_case -->
+**Autonomous Agents**: Build agents that can independently decide which tools to use for tasks.
+
+**Dynamic Workflows**: Create workflows that adapt their execution path based on AI decisions.
+
+**Multi-Tool Orchestration**: Let AI coordinate multiple tools to accomplish complex goals.
+<!-- END MANUAL -->
+
+---
+
 ## Unreal Text To Speech

 ### What it is
--- a/docs/integrations/block-integrations/misc.md
+++ b/docs/integrations/block-integrations/misc.md
@@ -46,8 +46,6 @@ Execute tasks using AutoGPT AutoPilot with full access to platform tools (agent
 ### How it works
 <!-- MANUAL: how_it_works -->
 This block invokes the platform's copilot system directly via `stream_chat_completion_sdk`. It creates (or resumes) a chat session, streams the autopilot's response collecting text deltas, tool call details, and token usage, then returns the aggregated results. A recursion depth guard prevents infinite loops when the autopilot calls this block as a sub-agent.
-
-Tool and block identifiers provided in `tools` and `blocks` are validated at run time before any execution begins — unknown names or UUIDs produce an error output immediately. When valid, the permissions object is passed into the SDK layer, which narrows the `allowed_tools` list sent to Claude; blocks are enforced at the `run_block` tool call site so the copilot cannot circumvent the filter by calling `run_block` directly. For sub-agent patterns (where an autopilot invokes another AutoPilot block), permissions are inherited: the child's effective allowed set is intersected with the parent's, so a sub-agent can only be *more* restrictive than its parent, never more permissive.
 <!-- END MANUAL -->

 ### Inputs
@@ -58,10 +56,6 @@ Tool and block identifiers provided in `tools` and `blocks` are validated at run
 | system_context | Optional additional context prepended to the prompt. Use this to constrain autopilot behavior, provide domain context, or set output format requirements. | str | No |
 | session_id | Session ID to continue an existing autopilot conversation. Leave empty to start a new session. Use the session_id output from a previous run to continue. | str | No |
 | max_recursion_depth | Maximum nesting depth when the autopilot calls this block recursively (sub-agent pattern). Prevents infinite loops. | int | No |
-| tools | Tool names to filter. Works with tools_exclude to form an allow-list or deny-list. Leave empty to apply no tool filter. | List["add_understanding" \| "bash_exec" \| "browser_act" \| "browser_navigate" \| "browser_screenshot" \| "connect_integration" \| "continue_run_block" \| "create_agent" \| "create_feature_request" \| "create_folder" \| "customize_agent" \| "delete_folder" \| "delete_workspace_file" \| "edit_agent" \| "find_agent" \| "find_block" \| "find_library_agent" \| "fix_agent_graph" \| "get_agent_building_guide" \| "get_doc_page" \| "get_mcp_guide" \| "list_folders" \| "list_workspace_files" \| "move_agents_to_folder" \| "move_folder" \| "read_workspace_file" \| "run_agent" \| "run_block" \| "run_mcp_tool" \| "search_docs" \| "search_feature_requests" \| "update_folder" \| "validate_agent_graph" \| "view_agent_output" \| "web_fetch" \| "write_workspace_file" \| "Edit" \| "Glob" \| "Grep" \| "Read" \| "Task" \| "TodoWrite" \| "WebSearch" \| "Write"] | No |
-| tools_exclude | Controls how the 'tools' list is interpreted. True (default): 'tools' is a deny-list — listed tools are blocked, all others are allowed. An empty 'tools' list means allow everything. False: 'tools' is an allow-list — only listed tools are permitted. | bool | No |
-| blocks | Block identifiers to filter when the copilot uses run_block. Each entry can be: a block name (e.g. 'HTTP Request'), a full block UUID, or the first 8 hex characters of the UUID (e.g. 'c069dc6b'). Works with blocks_exclude. Leave empty to apply no block filter. | List[str] | No |
-| blocks_exclude | Controls how the 'blocks' list is interpreted. True (default): 'blocks' is a deny-list — listed blocks are blocked, all others are allowed. An empty 'blocks' list means allow everything. False: 'blocks' is an allow-list — only listed blocks are permitted. | bool | No |

 ### Outputs

--- a/docs/integrations/block-integrations/stagehand/blocks.md
+++ b/docs/integrations/block-integrations/stagehand/blocks.md
@@ -24,8 +24,9 @@ Configure timeouts for DOM settlement and page loading. Variables can be passed
 | url | URL to navigate to. | str | Yes |
 | action | Action to perform. Suggested actions are: click, fill, type, press, scroll, select from dropdown. For multi-step actions, add an entry for each step. | List[str] | Yes |
 | variables | Variables to use in the action. Variables contains data you want the action to use. | Dict[str, str] | No |
-| dom_settle_timeout_ms | Timeout in ms to wait for the DOM to settle after navigation. | int | No |
-| timeout_ms | Timeout in ms for each action. | int | No |
+| iframes | Whether to search within iframes. If True, Stagehand will search for actions within iframes. | bool | No |
+| domSettleTimeoutMs | Timeout in milliseconds for DOM settlement. Wait longer for dynamic content | int | No |
+| timeoutMs | Timeout in milliseconds for DOM ready. Extended timeout for slow-loading forms | int | No |

 ### Outputs

@@ -33,7 +34,7 @@ Configure timeouts for DOM settlement and page loading. Variables can be passed
 |--------|-------------|------|
 | error | Error message if the operation failed | str |
 | success | Whether the action was completed successfully | bool |
-| message | Details about the action's execution. | str |
+| message | Details about the action’s execution. | str |
 | action | Action performed | str |

 ### Possible use case
@@ -67,7 +68,8 @@ Supports searching within iframes and configurable timeouts for dynamic content
 | model | LLM to use for Stagehand (provider is inferred) | "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "claude-sonnet-4-5-20250929" \| "claude-sonnet-4-6" | No |
 | url | URL to navigate to. | str | Yes |
 | instruction | Natural language description of elements or actions to discover. | str | Yes |
-| dom_settle_timeout_ms | Timeout in ms to wait for the DOM to settle after navigation. | int | No |
+| iframes | Whether to search within iframes. If True, Stagehand will search for actions within iframes. | bool | No |
+| domSettleTimeoutMs | Timeout in milliseconds for DOM settlement. Wait longer for dynamic content | int | No |

 ### Outputs

@@ -107,7 +109,8 @@ Use this to explore a page's interactive elements before building automated work
 | model | LLM to use for Stagehand (provider is inferred) | "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "claude-sonnet-4-5-20250929" \| "claude-sonnet-4-6" | No |
 | url | URL to navigate to. | str | Yes |
 | instruction | Natural language description of elements or actions to discover. | str | Yes |
-| dom_settle_timeout_ms | Timeout in ms to wait for the DOM to settle after navigation. | int | No |
+| iframes | Whether to search within iframes. If True, Stagehand will search for actions within iframes. | bool | No |
+| domSettleTimeoutMs | Timeout in milliseconds for DOM settlement. Wait longer for dynamic content | int | No |

 ### Outputs