Compare commits

...

459 Commits

Author SHA1 Message Date
Zamil Majdy
64d9d6d880 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 18:32:06 +02:00
Zamil Majdy
9fc324e28a Merge remote-tracking branch 'origin/fix/copilot-tool-output-e2b-bridging' into combined-preview-test 2026-04-02 18:32:06 +02:00
Zamil Majdy
adf66bdd24 Merge origin/fix/copilot-subagent-security (resolved conflicts) 2026-04-02 18:32:06 +02:00
Zamil Majdy
dc10ad715a Merge remote-tracking branch 'origin/feat/rate-limit-tiering' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
493c91e0dd Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
b278e66f4d Merge remote-tracking branch 'origin/dev' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
3e183ed2a3 fix(copilot): address 8 should-fix items from review 4051661771
1. Rewrite tautological env_test.py TestClaudeCodeTmpdir tests to call
   build_sdk_env(sdk_cwd=...) directly instead of copy-pasting the
   if-sdk_cwd pattern. Moved CLAUDE_CODE_TMPDIR logic into build_sdk_env().
2. Add DEL (\x7f), C1 (\x80-\x9f), BiDi, and zero-width chars to
   security_hooks_test.py sanitization test inputs.
3. Promote _sanitize() from closure to module-level pure function.
4. Fix GenericTool.tsx "model may poll again" -> user-friendly message.
5. Replace `as never` with @ts-expect-error + comment in useChatSession.ts.
6. Extract "Agent"/"Task"/"TaskOutput" string literals to named constants
   in helpers.ts, imported in GenericTool.tsx.
7. Extend _sanitize() to strip Unicode BiDi overrides (U+202A-U+202E,
   U+2066-U+2069) and zero-width characters (U+200B-U+200F, U+FEFF).
8. Document background agent slot lifecycle limitation in security_hooks.py
   (SubagentStop doesn't fire reliably for background agents).
2026-04-02 18:23:42 +02:00
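A minimal sketch of the module-level `_sanitize()` described in items 3 and 7, assuming it strips C0/C1 control characters, DEL, BiDi controls, and zero-width characters before truncating; the ranges come from the commit message, while the truncation length and exact policy in security_hooks.py may differ:

```python
import re

# Characters that can forge, hide, or reorder log output (ranges per the
# commit message; exact policy in security_hooks.py may differ).
_UNSAFE_CHARS = re.compile(
    "[\x00-\x1f\x7f-\x9f"   # C0 controls, DEL, C1 controls
    "\u202a-\u202e"         # BiDi embeddings/overrides (LRE..RLO)
    "\u2066-\u2069"         # BiDi isolates (LRI..PDI)
    "\u200b-\u200f"         # zero-width chars, joiners, LRM/RLM
    "\ufeff]"               # BOM / zero-width no-break space
)

def _sanitize(value: str, max_len: int = 200) -> str:
    """Pure, module-level sanitizer: strip unsafe chars, then truncate."""
    return _UNSAFE_CHARS.sub("", value)[:max_len]

# A BiDi override that would make "evil\u202egnp.exe" render misleadingly
# is stripped before the value reaches any log line:
assert _sanitize("evil\u202egnp.exe") == "evilgnp.exe"
```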
Zamil Majdy
82887a2d92 fix(backend/copilot): address reviewer feedback on E2B bridge API surface
- Rename _bridge_to_sandbox to bridge_to_sandbox (public) since it is
  imported cross-module from tool_adapter.py (item 4)
- Extract duplicated bridge+append-annotation pattern into shared
  bridge_and_annotate() helper used by both e2b_file_tools and
  tool_adapter (item 5)
- Add tests verifying bridge_and_annotate is called from
  _read_file_handler in tool_adapter when a sandbox is active (item 2)
- Add unit tests for bridge_and_annotate helper itself
2026-04-02 18:22:13 +02:00
Zamil Majdy
993c43b623 feat(platform): add merge_stats to remaining blocks (FAL, Revid, D-ID, E2B, YouTube, Weather, TTS, Enrichlayer)
Every system credential block now has explicit merge_stats tracking.
No block relies on the generic fallback anymore.
2026-04-02 18:22:02 +02:00
Zamil Majdy
13fcc62a31 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-tool-output-e2b-bridging 2026-04-02 18:16:25 +02:00
Zamil Majdy
8fefa23468 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-subagent-security 2026-04-02 18:15:58 +02:00
Zamil Majdy
749a56ca20 fix(backend): make email lookup non-blocking in set_user_tier endpoint 2026-04-02 18:14:34 +02:00
Zamil Majdy
a8a62eeefc feat(platform): add merge_stats tracking to all system credential blocks
Every block that uses system credentials now calls merge_stats with
meaningful data after the API response:
- Google Maps: output_size = number of places returned (= detail API calls)
- Apollo people/org: output_size = results count
- Apollo person: output_size = 1 per enrichment
- SmartLead: output_size = leads added or 1 per operation
- Ideogram: output_size = 1 per image
- Replicate: output_size = 1 per prediction
- Nvidia: output_size = 1 per inference
- ScreenshotOne: output_size = 1 per screenshot
- ZeroBounce: output_size = 1 per email validated
- Mem0: output_size = 1 per memory operation
2026-04-02 18:13:15 +02:00
Zamil Majdy
173614bcc5 fix(platform): audit and fix per-provider tracking accuracy
- Fix ElevenLabs/D-ID field name: script -> script_input
- Remove incorrect Google Maps api_calls formula, use per_run instead
- Remove D-ID from generation_seconds (walltime includes polling)
- Jina embeddings: extract total_tokens from response.usage
- Simplify tracking types: cost_usd, tokens, characters,
  sandbox_seconds, walltime_seconds, per_run
2026-04-02 17:58:24 +02:00
Zamil Majdy
3396cb3f4c fix(frontend): show advanced fields toggle when all input fields are advanced
When every input field was marked as advanced, `buildExpectedInputsSchema`
returned null (no visible fields), causing the entire inputs card—including
the "Show advanced fields" toggle—to not render. This made the fields
completely inaccessible.

Two changes:
- Render the inputs card when `hasAdvancedFields` is true, even if
  `inputSchema` is null, so the toggle is always accessible.
- Base `needsInputs` on `expectedInputs.length > 0` instead of
  `inputSchema !== null` so the Proceed button and input message logic
  work correctly with advanced-only fields.
2026-04-02 17:58:15 +02:00
Zamil Majdy
0c5d628b74 fix(frontend): sync inputValues state when output prop updates in SetupRequirementsCard
The inputValues state was initialized from the output prop via useState,
which only runs on mount. When the output prop updated via streaming, the
form would show stale data. Added a useEffect that merges new initial
values from the prop while preserving user-edited fields.
2026-04-02 17:44:11 +02:00
Zamil Majdy
ed40549499 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-02 17:43:17 +02:00
Zamil Majdy
fbe634fb19 fix(platform): handle null user_id in cost logs and fix 0.0 cost stored as NULL
- Add null-safe optional chaining for user_id.slice() in LogsTable, displaying
  "Deleted user" when user_id is null to prevent frontend crash
- Change `if cost_float` to `if cost_float is not None` in token_tracking.py
  so that a legitimate $0.00 cost is stored as 0 instead of NULL
2026-04-02 17:38:59 +02:00
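The token_tracking.py half of this fix is the classic Python truthiness pitfall; a minimal illustration (function name hypothetical):

```python
def to_microdollars(cost_float: float | None) -> int | None:
    # Buggy version used `if cost_float:`, which treats a legitimate
    # $0.00 cost as "missing", so zero costs were stored as NULL.
    # Fixed version tests identity against None explicitly.
    if cost_float is not None:
        return int(round(cost_float * 1_000_000))
    return None

assert to_microdollars(0.0) == 0     # $0.00 stored as 0, not NULL
assert to_microdollars(None) is None
```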
Zamil Majdy
a338c72c42 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-02 17:36:14 +02:00
Zamil Majdy
a9d13f0cbf ci: retrigger CI (flaky event loop test) 2026-04-02 17:33:23 +02:00
Zamil Majdy
e83e50a8f1 fix(frontend): wrap handleDeleteConfirm to prevent MouseEvent as force param 2026-04-02 17:32:33 +02:00
Zamil Majdy
7f4398efa3 feat(platform): provider-specific tracking types for accurate cost metrics
Replace one-size-fits-all tracking cascade with provider-aware logic:
- cost_usd: OpenRouter (x-total-cost header), Exa (cost_dollars)
- tokens: OpenAI, Anthropic, Groq, Ollama (token counts)
- characters: Unreal Speech, ElevenLabs (input text length)
- api_calls: Google Maps (1 nearby + N detail calls)
- sandbox_seconds: E2B (sandbox execution time)
- generation_seconds: FAL, Revid, D-ID, Replicate (video/image gen time)
- per_run: Apollo, SmartLead, ZeroBounce, Jina, etc.
2026-04-02 17:30:15 +02:00
Zamil Majdy
c2a054c511 fix(backend): prevent provider_cost loss on stats merge and widen costMicrodollars to BigInt
- NodeExecutionStats.__iadd__ was overwriting accumulated provider_cost
  with None when merging stats that lacked provider_cost (e.g. the final
  llm_call_count/llm_retry_count merge). Skip None values in __iadd__
  so existing data is never erased.
- Widen PlatformCostLog.costMicrodollars from Int (max ~$2,147) to
  BigInt to prevent theoretical overflow for high-cost aggregated
  node executions.
2026-04-02 17:28:27 +02:00
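A sketch of the `__iadd__` fix, with a stripped-down stand-in for `NodeExecutionStats` (the real class has many more fields):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeExecutionStats:
    provider_cost: Optional[float] = None
    llm_call_count: int = 0

    def __iadd__(self, other: "NodeExecutionStats") -> "NodeExecutionStats":
        # Skip None so a partial merge (e.g. the final llm_call_count-only
        # merge) never erases an already-accumulated provider_cost.
        if other.provider_cost is not None:
            self.provider_cost = (self.provider_cost or 0.0) + other.provider_cost
        self.llm_call_count += other.llm_call_count
        return self

stats = NodeExecutionStats(provider_cost=0.42)
stats += NodeExecutionStats(llm_call_count=1)  # provider_cost is None here
assert stats.provider_cost == 0.42             # preserved, not overwritten
```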
Zamil Majdy
b256560619 fix(frontend): add force-delete flow and try/catch for credential operations
- DeleteConfirmationModal now shows backend warning message and offers
  "Force Delete" when API returns need_confirmation instead of just a
  toast (mirrors integrations page pattern)
- HostScopedCredentialsModal onSubmit delete-then-create is now wrapped
  in try/catch to prevent silent credential loss on creation failure
2026-04-02 17:25:04 +02:00
Zamil Majdy
c63d5f538b Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 17:18:50 +02:00
Zamil Majdy
eeba884671 fix(platform): fix ClamAV connectivity in Docker containers
clamd was only listening on 127.0.0.1 inside its container, so
container-to-container connections on the Docker network were refused.

- Add CLAMD_CONF_TCPAddr=0.0.0.0 to docker-compose so clamd binds
  to all interfaces
- Change default clamav_service_host from "localhost" to "clamav"
  (the docker-compose service name), matching how other services
  like redis, rabbitmq, supabase-db are referenced
2026-04-02 17:18:13 +02:00
Zamil Majdy
90822e3f37 fix(frontend+backend): prefill block inputs and hide advanced in CoPilot setup card
Backend:
- get_inputs_from_schema() now accepts input_data to populate each field's
  value with what CoPilot already provided, and includes the advanced flag
  from the schema so the frontend can hide non-essential fields.

Frontend:
- SetupRequirementsCard prefills form inputs from backend-provided values
  instead of showing empty forms
- Advanced fields hidden by default with "Show advanced fields" toggle
  (matching builder behaviour)
- siblingInputs built from both input values and discriminator_values
  so the host pattern modal can extract the host from the URL
- extractInitialValues() populates form state from prefilled values
2026-04-02 17:18:06 +02:00
Zamil Majdy
a8bb6b5544 fix(frontend): prefill host pattern in CoPilot credential setup modal
The SetupRequirementsCard passed inputValues={{}} to CredentialsGroupedView,
which meant the HostScopedCredentialsModal never received the target URL
from the backend's discriminator_values. The "Host Pattern" field was always
empty even though the CoPilot knew the exact host (e.g. api.openai.com).

Add buildSiblingInputsFromCredentials() to extract the discriminator value
(URL) from the missing_credentials setup_info and pass it as siblingInputs
so the modal can prefill the host pattern.
2026-04-02 17:17:59 +02:00
Zamil Majdy
83b00f4789 feat(platform): add copilot/autopilot cost tracking via token_tracking.py
Copilot uses OpenRouter via a separate code path (not through the block
executor). This integrates PlatformCostLog into the shared
persist_and_record_usage() function which is called by both SDK and
baseline copilot paths, capturing:
- Every LLM turn (main conversation, title gen, context compression)
- Tokens (prompt + completion + cache)
- Actual USD cost when available (SDK path provides cost_usd)
- Session ID for correlation
2026-04-02 17:17:53 +02:00
Zamil Majdy
4cd53bb7f6 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
96d83e9bbd Merge remote-tracking branch 'origin/fix/copilot-p0-cli-internals' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
e99f4ac767 Merge remote-tracking branch 'origin/feat/rate-limit-tiering' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
67c2540177 Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 17:14:29 +02:00
Nicholas Tindle
0da949ba42 feat(e2b): set git committer identity from user's GitHub profile (#12650)
## Summary

Sets git author/committer identity in E2B sandboxes using the user's
connected GitHub account profile, so commits are properly attributed.

## Changes

### `integration_creds.py`
- Added `get_github_user_git_identity(user_id)` that fetches the user's
name and email from the GitHub `/user` API
- Uses TTL cache (10 min) to avoid repeated API calls
- Falls back to GitHub noreply email
(`{id}+{login}@users.noreply.github.com`) when user has a private email
- Falls back to `login` if `name` is not set

### `bash_exec.py`
- After injecting integration env vars, calls
`get_github_user_git_identity()` and sets `GIT_AUTHOR_NAME`,
`GIT_AUTHOR_EMAIL`, `GIT_COMMITTER_NAME`, `GIT_COMMITTER_EMAIL`
- Only sets these if the user has a connected GitHub account

### `bash_exec_test.py`
- Added tests covering: identity set from GitHub profile, no identity
when GitHub not connected, no injection when no user_id

## Why
Previously, commits made inside E2B sandboxes had no author identity
set, leading to unattributed commits. This dynamically resolves identity
from the user's actual GitHub account rather than hardcoding a default.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds outbound calls to GitHub’s `/user` API during `bash_exec` runs
and injects returned identity into the sandbox environment, which could
impact reliability (network/timeouts) and attribution behavior. Caching
mitigates repeated calls but incorrect/expired tokens or API failures
may lead to missing identity in commits.
> 
> **Overview**
> Sets git author/committer environment variables in the E2B `bash_exec`
path by fetching the connected user’s GitHub profile and injecting
`GIT_AUTHOR_*`/`GIT_COMMITTER_*` into the sandbox env.
> 
> Introduces `get_github_user_git_identity()` with TTL caching
(including a short-lived null cache), fallback to GitHub noreply email
when needed, and ensures `invalidate_user_provider_cache()` also clears
identity caches for the `github` provider. Updates tests to cover
identity injection behavior and the new cache invalidation semantics.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 955ec81efe. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: AutoGPT <autopilot@agpt.co>
2026-04-02 15:07:22 +00:00
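The fallback chain in `get_github_user_git_identity()` reduces to a small amount of logic; a hedged sketch over a GitHub `/user` payload (the real helper also handles tokens, caching, and error paths):

```python
def git_identity(profile: dict) -> tuple[str, str]:
    """Resolve (name, email) from a GitHub /user API response dict."""
    name = profile.get("name") or profile["login"]  # fall back to login
    email = profile.get("email") or (
        # Private email: GitHub's noreply address keeps commits attributable
        f"{profile['id']}+{profile['login']}@users.noreply.github.com"
    )
    return name, email

name, email = git_identity({"id": 1, "login": "octocat", "name": None, "email": None})
assert email == "1+octocat@users.noreply.github.com"
# These values are then exported as GIT_AUTHOR_NAME/EMAIL and
# GIT_COMMITTER_NAME/EMAIL in the sandbox environment.
```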
Zamil Majdy
95524e94b3 feat(platform): add tracking_type and tracking_amount to cost log metadata
Standardize cost tracking across providers:
- cost_usd: actual dollar cost (OpenRouter, Exa)
- tokens: total token count (LLM blocks)
- duration_seconds: execution time (video gen, sandboxes)
- per_run: flat per-request (all others)
2026-04-02 17:04:50 +02:00
Zamil Majdy
eda02f9ce6 fix(backend/copilot): remove duplicate StreamError in _HandledStreamError handler
The _HandledStreamError exception is only raised by _run_stream_attempt
*after* it has already yielded a StreamError to the client. The handler
in the retry loop was yielding a second StreamError for non-transient
errors (e.g. circuit breaker trips) and when transient retries were
exhausted, causing the client to receive duplicate error events.

Remove the redundant yield since the StreamError was already sent.
2026-04-02 17:03:40 +02:00
Zamil Majdy
9ab6082a23 fix(frontend): handle credential deletion errors with toast feedback
handleDeleteConfirm now catches API errors and shows a destructive
toast instead of silently failing. It also checks for
need_confirmation responses when the credential is still in use.
2026-04-02 17:03:00 +02:00
Zamil Majdy
2c517ff9a1 feat(platform): add per-provider cost extraction
- OpenRouter: Extract actual USD cost from x-total-cost response header
- Exa (search, contents): Write cost_dollars.total to execution_stats
- LLM blocks: Store provider_cost in stats when available
- Add provider_cost field to NodeExecutionStats
- Hook now converts provider_cost to costMicrodollars in PlatformCostLog
- Metadata includes both credit_cost and provider_cost_usd when available
2026-04-02 16:57:34 +02:00
Zamil Majdy
7020ae2189 fix(backend): handle NULL userId in platform cost models and queries
Make user_id Optional[str] in UserCostSummary and CostLogRow to handle
cases where the referenced user has been deleted. Use .get() for safe
access to user_id from query result rows. Regenerate OpenAPI schema.
2026-04-02 16:54:09 +02:00
Zamil Majdy
a49ac5ba13 fix(frontend): update CredentialsProvidersContext state on credential deletion
The delete mutation was using useDeleteV1DeleteCredentials which only
invalidated React Query caches but did not update the context's own
useState-managed credential list. Switch to the context's
deleteCredentials method which both calls the API and removes the
credential from the provider state, so the UI updates immediately.
2026-04-02 16:54:01 +02:00
Zamil Majdy
2a969e5018 fix(backend/copilot): yield final StreamError after transient retry exhaustion for _HandledStreamError
When _run_stream_attempt raises a _HandledStreamError and all transient
retries are exhausted, the outer retry loop sets ended_with_stream_error
but stream_err remains None.  The post-loop code only emits a StreamError
when stream_err is not None, so the SSE stream closes silently and the
frontend never learns the request failed.

Yield a StreamError with the attempt's error message and code just before
breaking out of the retry loop, ensuring clients always receive an error
notification.
2026-04-02 16:49:18 +02:00
Zamil Majdy
79005b1be5 fix(backend): move audit log after user existence check in set_user_rate_limit_tier
The tier-change audit log was written before verifying the user exists,
creating misleading log entries for non-existent users. Move the user
existence check (via get_user_email_by_id) before the audit log and
remove the now-redundant prisma.errors.RecordNotFoundError catch.
2026-04-02 16:48:48 +02:00
Zamil Majdy
4f8cdbee47 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
3ed444dd60 Merge remote-tracking branch 'origin/fix/copilot-credential-setup-ui' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
83e747ebcd Merge remote-tracking branch 'origin/fix/copilot-tool-output-e2b-bridging' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
827f2b0f87 Merge origin/fix/copilot-p0-cli-internals (resolved conflicts) 2026-04-02 16:42:12 +02:00
Zamil Majdy
b0d5d3b95e Merge origin/fix/copilot-subagent-security (resolved conflicts) 2026-04-02 16:42:12 +02:00
Zamil Majdy
eb9244be1a Merge origin/feat/copilot-mode-toggle (resolved conflicts) 2026-04-02 16:42:11 +02:00
Zamil Majdy
dd17e83299 Merge remote-tracking branch 'origin/feat/copilot-include-graph-option' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
74009bedac Merge origin/feat/rate-limit-tiering (resolved conflicts) 2026-04-02 16:42:11 +02:00
Zamil Majdy
72d0c8dad8 Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
e860f164e4 Merge remote-tracking branch 'origin/fix/dry-run-special-blocks' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
b9336984be fix(platform): re-add credit_cost to platform cost log metadata
Include the block's credit cost (from block_cost_config) in the log
metadata so every entry has a known cost proxy even when the provider
doesn't expose actual dollar costs.
2026-04-02 16:37:28 +02:00
Zamil Majdy
9924dedddc fix(platform): address bot review comments (sentry + coderabbit)
- CRITICAL: Use execute_raw_with_schema for INSERT (not query_raw)
- Remove accidentally committed transcripts/
- Add dry_run guard to skip cost logging for simulated executions
- Change onDelete: Cascade → SetNull to preserve cost history
- Add standalone createdAt index for date-only queries
- Add deterministic tiebreaker (id) to pagination ORDER BY
- Update migration SQL to match schema changes
2026-04-02 16:26:01 +02:00
Zamil Majdy
c054799b4f fix: regenerate API schema and block docs 2026-04-02 16:23:12 +02:00
Zamil Majdy
004d3957b3 docs: regenerate misc.md block docs after dev merge 2026-04-02 16:20:51 +02:00
Zamil Majdy
f3b5d584a3 fix(platform): address PR review round 5
- Replace ServerCrash icon with Receipt for Platform Costs sidebar
2026-04-02 16:02:00 +02:00
Zamil Majdy
476d9dcf80 fix(platform): address PR review round 4
- Add tests for query parameter forwarding and pagination
2026-04-02 16:00:08 +02:00
Zamil Majdy
072b623f8b fix(platform): address PR review round 3
- Remove duplicate block_usage_cost call from cost logging
- Add case-insensitive provider filter using LOWER()
- Add platform_cost_routes_test.py with basic endpoint tests
2026-04-02 15:58:00 +02:00
Zamil Majdy
a68f48e6b7 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 15:55:59 +02:00
Zamil Majdy
60e2474640 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-02 15:55:58 +02:00
Zamil Majdy
a892bbd4dd Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-02 15:55:56 +02:00
Zamil Majdy
538e8619da Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-02 15:55:54 +02:00
Zamil Majdy
4edb1f6e4a Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-02 15:55:50 +02:00
Zamil Majdy
480d58607d Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-subagent-security 2026-04-02 15:55:49 +02:00
Zamil Majdy
8561eb35f2 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-include-graph-option 2026-04-02 15:55:47 +02:00
Zamil Majdy
0b4acd73f4 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-tool-output-e2b-bridging 2026-04-02 15:55:45 +02:00
Zamil Majdy
e9fe2991d6 chore: remove accidentally committed test screenshots 2026-04-02 15:55:23 +02:00
Zamil Majdy
26b0c95936 fix(platform): address PR review round 2
- Parallelize dashboard queries with asyncio.gather for ~3x speedup
- Move json import to top-level
- Use consistent p. table alias across all dashboard queries
2026-04-02 15:55:03 +02:00
Zamil Majdy
735965bbe5 docs: regenerate misc.md block docs after dev merge 2026-04-02 15:54:14 +02:00
Zamil Majdy
a8f9ed0f60 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into zamilmajdy/secrt-2171-sql-query-block-for-copilotautopilot-analytics-access 2026-04-02 15:53:49 +02:00
Zamil Majdy
308357de84 fix(platform): address PR review round 1
- Parameterize LIMIT/OFFSET in SQL queries to prevent injection
- Only log platform cost on successful block execution
- Convert model enum values to strings for proper logging
- Add error handling with try/catch/finally in frontend useEffect
- Drive filter state from URL params to prevent desync
- Add dark mode support using design tokens
- Return total_users count in dashboard for accurate reporting
- Add credit_cost to metadata as cost proxy until per-token pricing
2026-04-02 15:51:28 +02:00
Zamil Majdy
1a6c50c6cc feat(platform): add platform cost tracking for system credentials
Track real API costs incurred when users consume system-managed credentials.
Captures provider, tokens, duration, and model per block execution and
surfaces an admin dashboard with provider/user aggregation and raw logs.
2026-04-02 15:42:18 +02:00
Zamil Majdy
9391dfa4b2 docs: regenerate block documentation to sync with code 2026-04-02 15:39:07 +02:00
Zamil Majdy
6b031085bd feat(platform): add generic ask_question copilot tool (#12647)
### Why / What / How

**Why:** The copilot can ask clarifying questions in plain text, but
that text gets collapsed into hidden "reasoning" UI when the LLM also
calls tools in the same turn. This makes clarification questions
invisible to users. The existing `ClarificationNeededResponse` model and
`ClarificationQuestionsCard` UI component were built for this purpose
but had no tool wiring them up.

**What:** Adds a generic `ask_question` tool that produces a visible,
interactive clarification card instead of collapsible plain text. Unlike
the agent-generation-specific `clarify_agent_request` proposed in
#12601, this tool is workflow-agnostic — usable for agent building,
editing, troubleshooting, or any flow needing user input.

**How:** 
- Backend: New `AskQuestionTool` reuses existing
`ClarificationNeededResponse` model. Registered in `TOOL_REGISTRY` and
`ToolName` permissions.
- Frontend: New `AskQuestion/` renderer reuses
`ClarificationQuestionsCard` from CreateAgent. Registered in
`CUSTOM_TOOL_TYPES` (prevents collapse into reasoning) and
`MessagePartRenderer`.
- Guide: `agent_generation_guide.md` updated to reference `ask_question`
for the clarification step.

### Changes 🏗️

- **`copilot/tools/ask_question.py`** — New generic tool: takes
`question`, optional `options[]` and `keyword`, returns
`ClarificationNeededResponse`
- **`copilot/tools/__init__.py`** — Register `ask_question` in
`TOOL_REGISTRY`
- **`copilot/permissions.py`** — Add `ask_question` to `ToolName`
literal
- **`copilot/sdk/agent_generation_guide.md`** — Reference `ask_question`
tool in clarification step
- **`ChatMessagesContainer/helpers.ts`** — Add `tool-ask_question` to
`CUSTOM_TOOL_TYPES`
- **`MessagePartRenderer.tsx`** — Add switch case for
`tool-ask_question`
- **`AskQuestion/AskQuestion.tsx`** — Renderer reusing
`ClarificationQuestionsCard`
- **`AskQuestion/helpers.ts`** — Output parsing and animation text

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Backend format + pyright pass
  - [x] Frontend lint + types pass
  - [x] Pre-commit hooks pass
- [ ] Manual test: copilot uses `ask_question` and card renders visibly
(not collapsed)
2026-04-02 12:56:48 +00:00
Zamil Majdy
6a69d7c68d fix(backend): hoist _COMMON_CRED_KEYS to module level, conditional source-code instruction
- Move _COMMON_CRED_KEYS to a module-level frozenset to avoid recreating
  it on every call to build_simulation_prompt
- Make the "Study the block's run() source code" instruction conditional
  on source code actually being available, falling back to a generic
  description-based instruction
2026-04-02 14:51:09 +02:00
Zamil Majdy
ad77e881c9 fix(backend/copilot): strip stale thinking blocks in upload_transcript
Add strip_stale_thinking_blocks() call to upload_transcript() alongside
the existing strip_progress_entries(). When a user switches from SDK
(extended_thinking) to baseline (fast) mode and back, the re-downloaded
transcript may contain stale thinking blocks from the SDK session.
Without stripping, these blocks consume significant tokens and trigger
unnecessary compaction cycles.
2026-04-02 14:50:50 +02:00
Zamil Majdy
f1aedfeedd fix(backend): guard against None name in _default_for_input_result
When name key exists with explicit None value, _default_for_input_result
would return None for string-typed pins instead of a string. Add
fallback to "sample input" and fix the type hint to reflect nullable.
2026-04-02 14:48:06 +02:00
Zamil Majdy
49c7ab4011 fix(backend/copilot): set correct stop_reason in baseline transcript entries
Set stop_reason="tool_use" for assistant messages with tool calls and
stop_reason="end_turn" for final text responses. This ensures the
transcript format is compatible with the SDK's --resume flag when a
user switches from fast to extended_thinking mode mid-conversation.
2026-04-02 14:39:47 +02:00
Zamil Majdy
2d04584c84 fix(backend/copilot): correct outdated E2B bridge threshold in system prompt
The prompt said files >5 MB go to /home/user/ but the actual threshold
was lowered to 32 KB. Replace with a generic description that avoids
hardcoding the threshold and directs the model to the [Sandbox copy
available at ...] annotation instead.
2026-04-02 14:39:35 +02:00
Zamil Majdy
2578f61abb fix(backend): remove dead simulation_context param, fix options rename, dedupe constant
- Remove unused simulation_context parameter from simulate_block, RunAgentInput, and _run_agent
- Update placeholder_values references to options (renamed in #12595), with fallback for legacy data
- Remove duplicate _THINKING_BLOCK_TYPES definition in transcript.py
- Update tests to use options field name
2026-04-02 14:38:28 +02:00
Zamil Majdy
927c6e7db0 fix(frontend): add aria-label and disabled state to mode toggle button
- Add aria-label for screen reader accessibility
- Disable button during streaming to prevent confusing mode switches mid-turn
- Add opacity/cursor styling when disabled
2026-04-02 14:38:00 +02:00
Zamil Majdy
f753e6162f fix(backend): consolidate test_agent_search.py into agent_search_test.py
The test file used prefix naming (test_*.py) which is inconsistent with
the codebase convention (*_test.py). Moved all tests into the existing
agent_search_test.py file and removed the duplicate.
2026-04-02 14:37:38 +02:00
Zamil Majdy
b996bc556b fix(backend): clamp search_users limit to [1, 50] to prevent negative take values
A negative limit query parameter would pass through min(limit, 50) as
a negative value to Prisma's take parameter, causing unexpected behavior.
Added max(1, ...) clamping and test coverage for the edge case.
2026-04-02 14:37:02 +02:00
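The clamp itself is one line; shown here because `min()` alone is a recurring trap with user-supplied pagination parameters:

```python
def clamp_limit(limit: int, upper: int = 50) -> int:
    # min(limit, 50) alone passes a negative limit straight through to
    # Prisma's `take`; wrapping in max() enforces the lower bound.
    return max(1, min(limit, upper))

assert clamp_limit(-5) == 1
assert clamp_limit(10) == 10
assert clamp_limit(999) == 50
```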
Zamil Majdy
e4f79261c1 fix(docs): correct host field type from "str (password)" to "str (secret)"
The host field is marked as secret=True (hidden in UI) but is not a password.
The "(password)" label was misleading.
2026-04-02 14:36:12 +02:00
Zamil Majdy
09bc939498 chore: remove accidentally committed test screenshots
These binary images and log files inflate the repository and are not
needed for CI or code review.
2026-04-02 14:35:26 +02:00
Zamil Majdy
79c5a10f75 fix(backend/copilot): add missing security test for tool-outputs path allowlist
The allowlist was expanded to accept tool-outputs/ in addition to
tool-results/, but security_hooks_test.py only verified tool-results.
Add test_read_tool_outputs_allowed to close the security test coverage gap.
2026-04-02 14:35:18 +02:00
Zamil Majdy
2bf5a37646 fix(backend): add ge/le bounds to claude_agent_max_transient_retries config field
The field lacked validation bounds unlike max_turns and max_budget_usd,
allowing negative or excessively large values to be configured.
2026-04-02 14:35:09 +02:00
Zamil Majdy
d5d24e6e66 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-02 14:34:50 +02:00
Zamil Majdy
c9cbd7531e fix(copilot): sanitize tool_use_id and resp_preview in post_tool_use_hook, remove test-results
- post_tool_use_hook logged tool_use_id with only truncation ([:12]) while
  post_tool_failure_hook properly sanitized it via _sanitize(). Now both hooks
  use _sanitize() consistently to strip control characters before logging.
- resp_preview from tool_response was also logged without sanitization.
- Remove test-results/ directory that should not ship in a production PR.
2026-04-02 14:34:46 +02:00
Zamil Majdy
289a19d402 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 14:34:33 +02:00
Zamil Majdy
7800af1835 fix(backend): remove duplicate _THINKING_BLOCK_TYPES definition in transcript.py
The constant was already defined at module level (line 48) and used by both
_strip_thinking_from_non_last_assistant and _flatten_assistant_content. The
duplicate added at line 692 was redundant.
2026-04-02 14:34:03 +02:00
Zamil Majdy
114f91ff53 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-02 14:32:47 +02:00
Zamil Majdy
63a0153e4f fix(platform): fix ClamAV connectivity in Docker containers
clamd was only listening on 127.0.0.1 inside its container, so
container-to-container connections on the Docker network were refused.

- Add CLAMD_CONF_TCPAddr=0.0.0.0 to docker-compose so clamd binds
  to all interfaces
- Change default clamav_service_host from "localhost" to "clamav"
  (the docker-compose service name), matching how other services
  like redis, rabbitmq, supabase-db are referenced
2026-04-02 13:34:50 +02:00
Zamil Majdy
1364616ff1 fix(frontend+backend): prefill block inputs and hide advanced in CoPilot setup card
Backend:
- get_inputs_from_schema() now accepts input_data to populate each field's
  value with what CoPilot already provided, and includes the advanced flag
  from the schema so the frontend can hide non-essential fields.

Frontend:
- SetupRequirementsCard prefills form inputs from backend-provided values
  instead of showing empty forms
- Advanced fields hidden by default with "Show advanced fields" toggle
  (matching builder behaviour)
- siblingInputs built from both input values and discriminator_values
  so the host pattern modal can extract the host from the URL
- extractInitialValues() populates form state from prefilled values
2026-04-02 13:24:49 +02:00
Zamil Majdy
4e9169c1a2 fix(frontend): prefill host pattern in CoPilot credential setup modal
The SetupRequirementsCard passed inputValues={{}} to CredentialsGroupedView,
which meant the HostScopedCredentialsModal never received the target URL
from the backend's discriminator_values. The "Host Pattern" field was always
empty even though the CoPilot knew the exact host (e.g. api.openai.com).

Add buildSiblingInputsFromCredentials() to extract the discriminator value
(URL) from the missing_credentials setup_info and pass it as siblingInputs
so the modal can prefill the host pattern.
2026-04-02 12:40:45 +02:00
Zamil Majdy
705e97ec46 fix(backend/copilot): don't cache DEFAULT_TIER for non-existent users
When `_fetch_user_tier` is called for a user that doesn't exist yet, it
was returning `DEFAULT_TIER` (FREE) which the `@cached` decorator would
store for 5 minutes. If the user was then created with a higher tier
(e.g. PRO), they'd receive the stale cached FREE tier until TTL expiry.

Fix: raise `_UserNotFoundError` instead of returning `DEFAULT_TIER` when
the user record is missing or has no subscription tier. The `@cached`
decorator only caches successful returns, not exceptions. The outer
`get_user_tier` wrapper catches the exception and returns `DEFAULT_TIER`
without caching, so the next call re-queries the database.

Adds a regression test verifying that a not-found result is not cached
and a subsequent lookup after user creation returns the correct tier.
2026-04-02 11:33:10 +02:00
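The mechanism relies on the memoizing decorator caching returns but not exceptions; a self-contained sketch with a toy `cached` stand-in (the platform's real decorator is Redis-backed and TTL-based):

```python
import asyncio, functools

DEFAULT_TIER = "FREE"
_users: dict[str, str] = {}  # toy user store: user_id -> tier

class _UserNotFoundError(Exception):
    pass

def cached(fn):
    """Toy stand-in: memoizes successful returns only; exceptions leave no entry."""
    memo = {}
    @functools.wraps(fn)
    async def wrapper(key):
        if key not in memo:
            memo[key] = await fn(key)  # raising here skips the cache write
        return memo[key]
    return wrapper

@cached
async def _fetch_user_tier(user_id: str) -> str:
    tier = _users.get(user_id)
    if tier is None:
        raise _UserNotFoundError(user_id)  # not cached: next call re-queries
    return tier

async def get_user_tier(user_id: str) -> str:
    try:
        return await _fetch_user_tier(user_id)
    except _UserNotFoundError:
        return DEFAULT_TIER  # returned to the caller, never cached

async def main():
    assert await get_user_tier("u1") == "FREE"  # user doesn't exist yet
    _users["u1"] = "PRO"                        # user created with PRO
    assert await get_user_tier("u1") == "PRO"   # no stale FREE from the cache

asyncio.run(main())
```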
Zamil Majdy
8cea0bede0 fix(backend): generate type-appropriate dry-run fallback for typed AgentInputBlock subclasses
The simulator's AgentInputBlock passthrough always generated a string
fallback when no user value was provided. Typed subclasses like
AgentNumberInputBlock (int), AgentDateInputBlock (date), and
AgentToggleInputBlock (bool) then failed downstream validation.

Inspect the block's output schema `result` pin to determine the expected
type and generate an appropriate default (0 for int, today's date for
date, false for bool, etc.) instead of a plain string.
2026-04-02 11:32:51 +02:00
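A sketch of the type-to-default mapping, assuming the `result` pin's schema type drives selection (the real implementation inspects the block's output schema, and the mapping shown is illustrative):

```python
import datetime

def default_for_result_pin(pin_type: str):
    """Type-appropriate dry-run fallback instead of a plain string."""
    defaults = {
        "integer": 0,
        "number": 0.0,
        "boolean": False,
        "string": "sample input",
        "date": datetime.date.today().isoformat(),
    }
    return defaults.get(pin_type, "sample input")

assert default_for_result_pin("integer") == 0      # AgentNumberInputBlock
assert default_for_result_pin("boolean") is False  # AgentToggleInputBlock
```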
Zamil Majdy
5e52050788 fix(backend): re-enable dry-run input validation and fill missing simulator output pins
1. _base.py: Instead of blanket-skipping input validation in dry-run mode,
   validate non-credential fields so blocks executing for real (e.g.
   AgentExecutorBlock) still get proper input validation. Credential fields
   are excluded since they contain sentinel None values.

2. simulator.py: After yielding LLM-simulated outputs, fill in
   type-appropriate defaults for any required output pins the LLM omitted.
   This prevents downstream nodes from stalling in INCOMPLETE state when
   the simulation response is incomplete.
2026-04-02 11:11:57 +02:00
Zamil Majdy
0b77af29aa fix(platform): address PR reviewer blockers and should-fix items
- Add schema.prisma comment documenting intentional @default(PRO) for beta
- Validate tier value in RateLimitDisplay.tsx to prevent undefined renders
- Add user existence check (404) in get_user_rate_limit_tier endpoint
- Add auth test for search_users endpoint
- Add tier downgrade test (PRO -> FREE)
- Add test for get_user_rate_limit_tier with non-existent user
2026-04-02 11:10:31 +02:00
Zamil Majdy
bd4cc21fc6 fix(backend/copilot): preserve successful graph fetches on timeout and use py3.10-compat wait_for
Remove the blanket `a.graph = None` loop in the TimeoutError handler that
was wiping already-fetched graphs. Agents that completed before the timeout
keep their results; agents still pending already have graph=None from the
model default.

Also replace `asyncio.timeout()` (Python 3.11+) with `asyncio.wait_for()`
which is available since Python 3.4, matching the `python >= 3.10`
requirement in pyproject.toml.

Add tests for the timeout path, success path, and skip-no-graph-id path.
2026-04-02 11:08:20 +02:00
Zamil Majdy
19ea753639 fix(backend/copilot): address review feedback on _bridge_to_sandbox
- Use asyncio.to_thread for synchronous file read in async context
- Promote bridge failure logging from DEBUG to WARNING
- Extract magic number 2000 to _DEFAULT_READ_LIMIT named constant
2026-04-02 11:07:48 +02:00
Zamil Majdy
07fd734fa1 style: reformat _base.py (black) 2026-04-02 10:56:09 +02:00
Zamil Majdy
8a4a16ec5c fix(backend): don't null credentials in dry-run prepare_dry_run
validate_data strips None values from input_data before JSON schema
validation. Setting credentials=None caused the field to be absent,
failing the required check. Keep original credentials in input_data
(actual platform creds injected via extra_exec_kwargs in manager.py).

This fixes OrchestratorBlock failing with "credentials is a required
property" when executed as part of a child graph in dry-run mode.
2026-04-02 10:35:08 +02:00
Zamil Majdy
5551da674e fix(blocks): skip input validation in dry-run mode for blocks with sentinel credentials
Two fixes for dry-run execution of nested agents:

1. _base.py: Skip validate_data() when execution_context.dry_run is True.
   prepare_dry_run() sets credentials=None for OrchestratorBlock (platform
   key injected separately), but the block's own JSON schema validation
   rejected None as "required property". This caused any dry-run execution
   of graphs containing OrchestratorBlock to fail with BlockInputError.

2. agent.py: Check required inner-agent inputs against data["inputs"]
   instead of top-level data keys (previous commit 6f03ceeb88).
2026-04-02 10:33:22 +02:00
Zamil Majdy
e57e48272a security: remove test artifacts containing leaked API keys and OAuth tokens 2026-04-02 10:23:21 +02:00
Zamil Majdy
6f03ceeb88 fix(blocks): validate AgentExecutorBlock inputs against nested inputs dict
get_missing_input() and get_mismatch_error() were checking required fields
from the inner agent's input_schema against the top-level node data keys
(inputs, user_id, graph_id, etc.) instead of against data["inputs"] where
the actual field values live. This caused any AgentExecutorBlock with
required inner-agent inputs to fail validation with "This field is required"
even when the values were correctly provided in the inputs dict.
2026-04-02 10:10:06 +02:00
Zamil Majdy
554ff0b20b dx(backend/copilot): add live execution test evidence for subagent security hooks
Test results from live execution showing SubagentStart/SubagentStop hooks
firing correctly for two parallel Agent tool invocations with proper
slot tracking (active=N/10) and JSONL transcript persistence.
2026-04-02 10:06:56 +02:00
Zamil Majdy
c2f421cb42 dx(backend/copilot): add live execution guardrail verification for PR #12636
Programmatic verification from running container proving all P0 guardrails
are deployed and active: max_turns=50, max_budget_usd=5.0,
fallback_model=claude-sonnet-4-20250514, max_transient_retries=3,
security env vars, and _last_reset_attempt infinite-loop fix.
2026-04-02 10:01:46 +02:00
Zamil Majdy
dd228de17d fix(backend/copilot): preserve binary files when bridging to E2B sandbox
_bridge_to_sandbox was decoding all file content with
`errors='replace'`, silently corrupting non-UTF-8 bytes (images, PDFs,
etc.) by replacing them with U+FFFD.

Now attempts strict UTF-8 decode first; on failure writes raw bytes
via sandbox.files.write() (which accepts Union[str, bytes, IO]) or
base64-encoded shell pipe for /tmp paths.

Also updates _sandbox_write to accept str | bytes and adds tests for
both small and large binary file bridging.
2026-04-02 09:56:52 +02:00
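The core of the fix is decode-strictly-then-fall-back; a sketch (per the message, `sandbox.files.write()` accepts `Union[str, bytes, IO]`, so raw bytes can be passed through unchanged):

```python
def bridge_payload(data: bytes) -> str | bytes:
    # Old behavior: data.decode("utf-8", errors="replace") silently replaced
    # every non-UTF-8 byte with U+FFFD, corrupting images and PDFs.
    try:
        return data.decode("utf-8")  # strict: raises on binary content
    except UnicodeDecodeError:
        return data                  # keep raw bytes intact

assert bridge_payload(b"hello") == "hello"
assert bridge_payload(b"\x89PNG\r\n") == b"\x89PNG\r\n"  # PNG magic preserved
```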
Zamil Majdy
c26ff22f9c fix(blocks): allow MySQL SELECT INTO @variable syntax in SQL query validation
The INTO keyword was blanket-blocked in _DISALLOWED_KEYWORDS, which
incorrectly rejected the valid read-only MySQL syntax
`SELECT ... INTO @variable` for session variable assignment.

Replace the blanket INTO ban with a contextual check that allows
INTO followed by @-prefixed user variables while still blocking:
- SELECT INTO table_name (PG/MSSQL table creation)
- SELECT INTO OUTFILE/DUMPFILE (MySQL filesystem writes)
- INSERT INTO (already caught by INSERT, but defense-in-depth)

Also remove dead OUTFILE/DUMPFILE entries from _DISALLOWED_KEYWORDS
since sqlparse classifies them as Name tokens, not Keywords, so
they were never matched by the keyword extraction logic.
2026-04-02 09:56:51 +02:00
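A hedged sketch of the contextual INTO check; the real validator works on sqlparse tokens, and the regex below is for illustration only:

```python
import re

_BAD_INTO = re.compile(
    r"\bINTO\s+(?:OUTFILE|DUMPFILE)\b"  # MySQL filesystem writes
    r"|\bINSERT\s+INTO\b"               # writes (defense-in-depth)
    r"|\bINTO\s+(?!@)[A-Za-z_]",        # SELECT INTO table_name
    re.IGNORECASE,
)

def into_is_allowed(sql: str) -> bool:
    # INTO followed by an @-prefixed user variable is read-only and allowed.
    return not _BAD_INTO.search(sql)

assert into_is_allowed("SELECT COUNT(*) INTO @n FROM users")     # session var: OK
assert not into_is_allowed("SELECT * INTO new_table FROM t")     # table write
assert not into_is_allowed("SELECT x INTO OUTFILE '/tmp/f' FROM t")
```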
Zamil Majdy
760360fbe9 fix(backend): use deterministic SHA-256 hash for Redis cache keys
Python's built-in `hash()` is randomised per-process via PYTHONHASHSEED.
In a multi-pod deployment each pod computes a different hash for the same
arguments, causing Redis cache lookups and invalidations (e.g.
`cache_delete`) to silently miss across pods.

Replace `hash()` with `hashlib.sha256` over the `repr()` of the key
tuple, which is deterministic across processes and machines.
2026-04-02 09:56:35 +02:00
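The replacement pattern in sketch form; the sorted-kwargs detail is an assumption for illustration:

```python
import hashlib

def cache_key(*args, **kwargs) -> str:
    # Python's hash() is salted per process (PYTHONHASHSEED), so each pod
    # would compute a different key for the same call. sha256 over repr()
    # is stable across processes and machines, for argument types whose
    # reprs are deterministic (str, int, tuples, ...).
    raw = repr((args, tuple(sorted(kwargs.items()))))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

assert cache_key("user", 42) == cache_key("user", 42)  # stable across pods
```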
Zamil Majdy
e3d589b180 fix(backend/copilot): exclude StreamError/StreamStatus from events_yielded counter
StreamError and StreamStatus are ephemeral notifications, not content
events. When _run_stream_attempt yields a StreamError for a transient
API error before raising _HandledStreamError, the events_yielded counter
was incremented, causing _next_transient_backoff() to return None and
bypassing the retry logic entirely. Exclude these event types from the
counter so transient errors are properly retried with exponential backoff.
2026-04-02 09:56:34 +02:00
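A sketch of the counter rule, with placeholder event classes standing in for the real stream types:

```python
class StreamText: ...
class StreamError: ...
class StreamStatus: ...

_EPHEMERAL = (StreamError, StreamStatus)  # notifications, not content

def count_content_events(events) -> int:
    # Retry is only safe if the client received no *content* yet; an
    # ephemeral error/status notification must not block the retry path.
    return sum(1 for e in events if not isinstance(e, _EPHEMERAL))

assert count_content_events([StreamError(), StreamStatus()]) == 0  # retry OK
assert count_content_events([StreamText()]) == 1                   # no retry
```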
Zamil Majdy
913d93f47c test: add E2E dry-run loop validation screenshots (round 4)
Unit tests (37/37 pass) and browser E2E test confirm the full
create -> dry-run -> inspect -> fix -> dry-run loop is working.
2026-04-02 09:49:26 +02:00
Zamil Majdy
03e5d37dc4 test(backend/copilot): add E2E test screenshots for PR #12646 round 1 2026-04-02 09:38:04 +02:00
Zamil Majdy
6e2dab413e test(backend): add E2E test screenshots for SQL Query block PR #12569
Screenshots from round 3 E2E testing:
- Block search results showing SQL Query block
- Basic fields: DatabaseType, Host, Database, Query, Credentials
- Advanced fields: Port, ReadOnly, Timeout, MaxRows
- Credential modal with Username & Password labels
2026-04-02 09:35:25 +02:00
Zamil Majdy
b10dc7c2d5 ci(backend/copilot): add E2E test evidence for rate-limit tiering (round 4) 2026-04-02 09:25:26 +02:00
Zamil Majdy
8de935c84b dx(backend/copilot): add round 3 E2E test screenshots for PR #12636 2026-04-02 09:20:32 +02:00
Zamil Majdy
dd34b0dc48 fix(backend): lower bridge shell threshold and add collision-free sandbox paths
- Lower _BRIDGE_SHELL_MAX_BYTES from 5 MB to 32 KB to stay within
  ARG_MAX when base64-encoding content for shell transfer.
- Prefix bridged sandbox filenames with a 12-char SHA-256 hash of the
  full source path to prevent collisions when different source files
  share the same basename (e.g. multiple result.json files).
- Fix potential NameError in exception handler when basename is not yet
  assigned.
2026-04-02 09:07:43 +02:00
Zamil Majdy
015e0d591e fix(backend/copilot): remove type: ignore from conftest, use named fixtures
Address CodeRabbit review: remove # type: ignore[override] from SDK
conftest fixtures per AGENTS.md no-suppressor rule. Use name= parameter
in pytest_asyncio.fixture decorator with private function names instead.
2026-04-02 08:29:06 +02:00
Zamil Majdy
2cb65f5c34 fix(backend/copilot): use working_dir in prompt examples instead of hardcoded /home/user
The storage supplement template and _persist_and_summarize had hardcoded
/home/user/ paths in save_to_path examples. In local (bubblewrap) mode
the working dir is /tmp/copilot-<session>/, not /home/user/. Use the
{working_dir} template variable in prompting.py and a generic
<working_dir> placeholder in base.py so the model gets correct paths
regardless of execution mode.
2026-04-02 08:26:18 +02:00
Zamil Majdy
3a49086c3d fix(backend/copilot): use resolved path for bridging, explicit return None
- Pass `resolved` (realpath-expanded) to `_bridge_to_sandbox` in
  `_read_file_handler` so the bridge target matches the file that was
  actually read (addresses review comment).
- Replace bare `return` with explicit `return None` in
  `_bridge_to_sandbox` large-file skip path for consistency with the
  declared `str | None` return type.
2026-04-02 08:19:25 +02:00
Zamil Majdy
0e567df1da fix(backend/copilot): add concrete tool examples to file copy prompting
The "Moving files between storages" section only had direction labels
("Sandbox → Persistent") with no tool examples. Model didn't know HOW
to copy. Now shows write_workspace_file(source_path=...) for upload and
read_workspace_file(save_to_path=...) for download.
2026-04-02 08:15:59 +02:00
Zamil Majdy
b5b754d5eb fix(backend/copilot): return sandbox path from bridge, inform model of copy location
Address CodeRabbit review: _bridge_to_sandbox now returns the sandbox
path (or None on failure) so callers can append "[Sandbox copy available
at /tmp/file.json]" to the Read result. This gives the model explicit
feedback about where to find the file in the sandbox, instead of
silently bridging with no indication.
2026-04-02 08:03:36 +02:00
Zamil Majdy
456bb1c4d0 fix(frontend): use unfiltered credentials for host-scoped deduplication
The useCredentials hook pre-filters savedCredentials by discriminatorValue.
When no URL is entered yet, the filtered list is empty, causing the
deduplication logic to miss existing credentials and create duplicates.

Fix: access the full unfiltered credential list from CredentialsProvidersContext
for both the hasExistingForHost check and the delete-before-create logic.
2026-04-02 08:02:23 +02:00
Zamil Majdy
263cd0ecac fix(backend/copilot): add bridging to Read tool, size limits, prompting for images
- Add _bridge_to_sandbox call in _read_file_handler (tool_adapter.py)
  so the MCP Read tool (which the model actually uses) also bridges
  SDK-internal files into the E2B sandbox — not just the E2B read_file
- Move E2B-specific bridging text to _E2B_TOOL_NOTES (not shown in
  local bubblewrap mode)
- Add size-tiered bridging: shell base64 for <=5MB, files API for
  5-50MB, skip for >50MB
- Add CRITICAL prompting sections for binary/image data handling
  (use workspace, not inline) and @@agptfile references
- Add 7 unit tests for _bridge_to_sandbox
- Fix comment accuracy in context.py, update docstring
2026-04-02 08:00:05 +02:00
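The size-tiered bridging reads naturally as a small dispatcher; a sketch with illustrative route names (thresholds per this commit; the shell threshold was later lowered to 32 KB in dd34b0dc48):

```python
MB = 1024 * 1024

def bridge_route(size_bytes: int) -> str | None:
    """Pick a transfer mechanism for bridging a file into the sandbox."""
    if size_bytes <= 5 * MB:
        return "shell-base64"  # base64 pipe into /tmp inside the sandbox
    if size_bytes <= 50 * MB:
        return "files-api"     # sandbox.files.write to /home/user
    return None                # too large: skip bridging

assert bridge_route(10_000) == "shell-base64"
assert bridge_route(20 * MB) == "files-api"
assert bridge_route(60 * MB) is None
```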
Zamil Majdy
66afca6e0c fix(backend/copilot): address review feedback - size limits, prompting, tests
- Move E2B-specific bridging text from shared prompt section to E2B
  supplement's extra_notes (MAJOR 1)
- Add size cap to _bridge_to_sandbox: <=5MB uses shell base64 to /tmp,
  5-50MB uses sandbox.files.write to /home/user, >50MB skipped (MAJOR 2)
- Add 7 unit tests for _bridge_to_sandbox covering happy path, skip
  conditions, error handling, and size-based routing (MINOR 3)
- Fix inaccurate comment about tool-outputs name origin (NIT 7)
- Update is_allowed_local_path docstring to mention tool-outputs (NIT 9)
- Add prompting guidance for handling base64 images in tool outputs
  (save to workspace, show via download URL)
- Add prompting guidance for using @@agptfile: references instead of
  copy-pasting large data between tools
- Add no-op server/graph_cleanup fixtures to sdk/conftest.py so SDK
  unit tests don't require Postgres
2026-04-02 07:56:49 +02:00
Toran Bruce Richards
11b846dd49 fix(blocks): rename placeholder_values to options on AgentDropdownInputBlock (#12595)
## Summary

Resolves [REQ-78](https://linear.app/autogpt/issue/REQ-78): The
`placeholder_values` field on `AgentDropdownInputBlock` is misleadingly
named. In every major UI framework "placeholder" means non-binding hint
text that disappears on focus, but this field actually creates a
dropdown selector that restricts the user to only those values.

## Changes

### Core rename (`autogpt_platform/backend/backend/blocks/io.py`)
- Renamed `placeholder_values` → `options` on
`AgentDropdownInputBlock.Input`
- Added clear field description: *"If provided, renders the input as a
dropdown selector restricted to these values. Leave empty for free-text
input."*
- Updated class docstring to describe actual behavior
- Overrode `model_construct()` to remap legacy `placeholder_values` →
`options` for **backward compatibility** with existing persisted agent
JSON

### Tests (`autogpt_platform/backend/backend/blocks/test/test_block.py`)
- Updated existing tests to use canonical `options` field name
- Added 2 new backward-compat tests verifying legacy
`placeholder_values` still works through both `model_construct()` and
`Graph._generate_schema()` paths

### Documentation
- Updated
`autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md`
— changed field name in CoPilot SDK guide
- Updated `docs/integrations/block-integrations/basic.md` — changed
field name and description in public docs

### Load tests
(`autogpt_platform/backend/load-tests/tests/api/graph-execution-test.js`)
- Removed spurious `placeholder_values: {}` from AgentInputBlock node
(this field never existed on AgentInputBlock)
- Fixed execution input to use `value` instead of `placeholder_values`

## Backward Compatibility

Existing agents with `placeholder_values` in their persisted
`input_default` JSON will continue to work — the `model_construct()`
override transparently remaps the old key to `options`. No database
migration needed since the field is stored inside a JSON blob, not as a
dedicated column.

## Testing

- All existing tests updated and passing
- 2 new backward-compat tests added
- No frontend changes needed (frontend reads `enum` from generated JSON
Schema, not the field name directly)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-02 05:56:17 +00:00
Zamil Majdy
a71396ee48 fix(backend): update dry-run tests for platform key + fix falsy value filter
- Mock `_get_platform_openrouter_key` in `test_prepare_dry_run_orchestrator_block`
  so the test doesn't depend on a real OpenRouter key being present in CI.
  Also fix incorrect assertion that model is preserved (it's overridden to
  the simulation model).

- Fix output filter in `simulate_block` that incorrectly dropped valid falsy
  values like `False`, `0`, and `[]`. Now only `None` and empty strings are
  skipped.

- Add `test_generic_block_preserves_falsy_values` test to cover the fix.
2026-04-02 07:52:09 +02:00
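The output-filter fix in sketch form:

```python
def keep_output(value) -> bool:
    # Buggy: `if value:` dropped valid falsy outputs like False, 0, and [].
    # Fixed: only None and empty strings are skipped.
    return value is not None and value != ""

assert keep_output(False) and keep_output(0) and keep_output([])
assert not keep_output(None) and not keep_output("")
```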
Zamil Majdy
beb43bb847 fix(frontend): replace duplicate host-scoped credentials and add delete support
- HostScopedCredentialsModal now deletes existing credentials for the same
  host before creating new ones, preventing duplicates
- Wire up delete flow: CredentialsFlatView passes onDelete to CredentialRow,
  CredentialsInput renders DeleteConfirmationModal
- Update button text to "Update headers" when credentials already exist
- Dynamic modal title/button: "Update" vs "Add" based on existing creds
2026-04-02 07:51:59 +02:00
Zamil Majdy
a55653f8c1 fix(backend): tighten fallback model detection and reset flag on retry
- Remove "overloaded" from the fallback detection pattern in _on_stderr;
  only "fallback" reliably indicates the SDK switched models. An
  "overloaded" stderr line may just be a transient 529 error that gets
  retried without activating the fallback.

- Reset fallback_model_activated = False at the start of each retry
  iteration (alongside fallback_notified) so a flag set during a failed
  attempt does not leak into the next attempt as a spurious notification.
2026-04-02 07:50:34 +02:00
Zamil Majdy
f3dd708cf6 fix(backend/copilot): fix tool output file reading between E2B and host
Three issues prevented the copilot agent from processing large tool
outputs (e.g. base64 images) in the E2B sandbox:

1. _persist_and_summarize used path= attribute in the truncation tag,
   which the model confused with a local filesystem path. Changed to
   workspace_path= and added save_to_path guidance for E2B processing.

2. is_allowed_local_path only accepted "tool-results" directory but the
   SDK may also use "tool-outputs". Now accepts both.

3. When E2B is active and the Read tool accesses an SDK-internal file,
   the content was returned to the conversation but not available in
   the sandbox for bash processing. Added automatic bridging that copies
   the file into /tmp/<filename> in the sandbox.
2026-04-02 07:47:39 +02:00
Zamil Majdy
c4ff31c79c fix(backend/copilot): remove duplicate test and narrow exception assertion
- Remove duplicate `test_dry_run_accepts_explicit_false` (identical to
  `test_dry_run_accepts_false`)
- Use `pydantic.ValidationError` instead of broad `Exception` in
  `test_wait_for_result_upper_bound`
2026-04-02 07:25:44 +02:00
Zamil Majdy
9f2257daaa refactor(backend): move dry-run credential logic from manager.py to simulator.py
- OrchestratorBlock now uses platform simulation model + OpenRouter key
  instead of user's model/credentials during dry-run
- Credential restore + fallback-to-simulation logic moved into
  prepare_dry_run() and get_dry_run_credentials() in simulator.py
- manager.py reduced by ~30 lines of business logic
- Falls back to LLM simulation if platform OpenRouter key unavailable
2026-04-02 07:10:28 +02:00
Zamil Majdy
925e9a047c fix(platform): address remaining should-fix items for rate-limit tiering
- Add docstring noting SubscriptionTier mirrors schema.prisma enum and
  can be replaced with prisma.enums import after prisma generate
- Remove unnecessary JSDoc comments from useRateLimitManager helpers
  per frontend code convention (avoid comments unless complex)
- Add audit trail: log old tier when admin changes a user's tier
- Fix stale test assertion (DEFAULT_TIER is FREE, not PRO)
- Show tier label ("Pro plan") in UsagePanelContent for end users
- Add formatResetTime unit tests (UsagePanelContent.test.ts)
- Add tier label display test in UsageLimits.test.tsx
- Fix pre-existing pyright errors from prisma stubs not having
  subscriptionTier (type: ignore until prisma generate is run)
2026-04-02 06:56:57 +02:00
Zamil Majdy
3e6faf2de7 fix(copilot): address remaining should-fix items from reviewer
- Extract _normalize_model_name() to deduplicate provider-prefix
  stripping and dot-to-hyphen normalization shared by _resolve_sdk_model
  and _resolve_fallback_model.
- Emit a StreamStatus notification when the SDK activates the fallback
  model (detected via CLI stderr lines containing "fallback" or
  "overloaded").
- Item 5 (transcript rollback) was already addressed — both
  _HandledStreamError and generic Exception handlers snapshot and
  restore transcript_builder._entries on retry.
2026-04-02 06:53:55 +02:00
Zamil Majdy
40a1f504c0 fix(copilot): address 6 should-fix items from reviewer
- Add CLAUDE_CODE_TMPDIR unit tests for build_sdk_env
- Strengthen _sanitize() tests with caplog assertions
- Fix user-facing text (no internal tool names)
- Rename task_tool_use_ids → subagent_tool_use_ids
- Standardize 'Starting agent' terminology
- Fix denial messages: sub-tasks → sub-agents
2026-04-02 06:49:24 +02:00
Zamil Majdy
22e8c5c353 fix(copilot): update response_adapter test for expanded transient patterns
"API rate limited" is now correctly caught by is_transient_api_error
after adding 429/rate-limit patterns. Use a non-transient error
("Invalid API key provided") to test the raw error pass-through path.
2026-04-02 06:31:24 +02:00
Zamil Majdy
1de2a7fb09 fix(platform): address PR review items for rate-limit tiering
- Change DEFAULT_TIER from PRO to FREE (fail-closed on DB errors)
- Use shared_cache=True (Redis-backed) for _fetch_user_tier so tier
  changes propagate across pods immediately
- Use TIER_MULTIPLIERS.get(tier, 1) to avoid KeyError on unknown tiers
- Rename _tier to tier in routes.py where the variable is used, and
  to _ where it is truly unused
- Add minimum 3-char query length for search_users to prevent user
  table enumeration
- Use generated API client (getV2SearchUsersByNameOrEmail) instead of
  raw fetch() in useRateLimitManager
- Remove unnecessary cast and fallback in RateLimitDisplay
- Fix fragile call-count-based _ld_side_effect in tests to use
  flag_key matching pattern
- Update test assertion for DEFAULT_TIER change (FREE not PRO)
2026-04-02 06:28:36 +02:00
Zamil Majdy
b3d9e9e856 fix(backend): add 429/5xx patterns to is_transient_api_error and add config validators
- Add rate-limit (429) and server error (5xx) string patterns to
  is_transient_api_error() so the fallback retry path catches these
  in addition to connection-level errors (ECONNRESET).
- Add ge/le validators on max_turns (1-500) and max_budget_usd
  (0.01-100.0) to prevent misconfiguration.
- Rename max_transient -> max_transient_retries and
  _can_retry_transient() -> _next_transient_backoff() for clarity.
- Add comprehensive tests for all new transient patterns and config
  boundary validation.
2026-04-02 06:21:51 +02:00
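A minimal sketch of the bounds this commit describes, using pydantic `Field(ge=..., le=...)`; the containing model is a stand-in (not the repo's actual config class), with defaults taken from the P0-guardrails commit elsewhere in this log:

```python
from pydantic import BaseModel, Field

class SDKLimits(BaseModel):
    # 1-500: rejects zero/negative turn counts and runaway values at load time
    max_turns: int = Field(default=50, ge=1, le=500)
    # $0.01-$100.00: per-query budget cap must be positive and bounded
    max_budget_usd: float = Field(default=5.0, ge=0.01, le=100.0)

try:
    SDKLimits(max_turns=0)
except ValueError as exc:  # pydantic's ValidationError subclasses ValueError
    print(exc)
```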
Zamil Majdy
48b166a82c fix(backend): address PR review items for include_graph feature
- Surface truncation notice to copilot via response message when
  >_MAX_GRAPH_FETCHES agents are skipped, instead of only logging
- Add guidance in agent_generation_guide to use include_graph only
  after narrowing to a specific agent by UUID
- Add tests for truncation, mixed graph_id presence, partial
  success/failure across multiple agents, and keyword-search
  enrichment path
2026-04-02 06:21:27 +02:00
Zamil Majdy
697b15ce81 fix(backend/copilot): always append user message to transcript on retries
When a duplicate user message was suppressed (e.g. network retry), the
user turn was not added to the transcript builder while the assistant
reply still was, creating a malformed assistant-after-assistant structure
that broke conversation resumption. Now the user message is always
appended to the transcript when it is present and marked is_user_message,
regardless
of whether the session-level dedup suppressed it.
2026-04-02 06:18:26 +02:00
Zamil Majdy
5beabf936c fix(frontend): revert useChatSession mutation call to match generated API
The generated mutateAsync requires an argument even for void mutations
due to react-query typing. Use `as never` cast to satisfy both the
generated type and the void constraint.
2026-04-02 06:12:24 +02:00
Zamil Majdy
b9e29c96bd fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype (#12642)
## Why

PR #12625 fixed the prompt-too-long retry mechanism for most paths, but
two SDK-specific paths were still broken. The dev session `d2f7cba3`
kept accumulating synthetic "Prompt is too long" error entries on every
turn, growing the transcript from 2.5 MB → 3.2 MB, making recovery
impossible.

Root causes identified from production logs (`[T25]`, `[T28]`):

**Path 1 — AssistantMessage content check:**
When the Claude API rejects a prompt, the SDK surfaces it as
`AssistantMessage(error="invalid_request", content=[TextBlock("Prompt is
too long")])`. Our check only inspected `error_text = str(sdk_error)`
which is `"invalid_request"` — not a prompt-too-long pattern. The
content was then streamed out as `StreamText`, setting `events_yielded =
1`, which blocked retry even when the ResultMessage fired.

**Path 2 — ResultMessage success subtype:**
After the SDK auto-compacts internally (via `PreCompact` hook) and the
compacted transcript is _still_ too long, the SDK returns
`ResultMessage(subtype="success", result="Prompt is too long")`. Our
check only ran for `subtype="error"`. With `subtype="success"`, the
stream "completed normally", appended the synthetic error entry to the
transcript via `transcript_builder`, and uploaded it to GCS — causing
the transcript to grow on each failed turn.

## What

- **AssistantMessage handler**: when `sdk_error` is set, also check the
content text. `sdk_error` being non-`None` confirms this is an API error
message (not user-generated content), so content inspection is safe.
- **ResultMessage handler**: check `result` for prompt-too-long patterns
regardless of `subtype`, covering the SDK auto-compact path where
`subtype="success"` with `result="Prompt is too long"`.

## How

Two targeted one-line condition expansions in `_run_stream_attempt`,
plus two new integration tests in `retry_scenarios_test.py` that
reproduce each broken path and verify retry fires correctly.

## Changes

- `backend/copilot/sdk/service.py`: fix AssistantMessage content check +
ResultMessage subtype-independent check
- `backend/copilot/sdk/retry_scenarios_test.py`: add 2 integration tests
for the new scenarios

## Checklist

- [x] Tests added for both new scenarios (45 total, all pass)
- [x] Formatted (`poetry run format`)
- [x] No false-positive risk: AssistantMessage check gated behind
`sdk_error is not None`
- [x] Root cause verified from production pod logs
2026-04-01 22:32:09 +00:00
Zamil Majdy
32bfe1b209 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-01 20:52:00 +02:00
Zamil Majdy
62302db470 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-01 20:51:58 +02:00
Zamil Majdy
89c7f34d26 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-01 20:51:54 +02:00
Zamil Majdy
543fc2da70 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into zamilmajdy/secrt-2171-sql-query-block-for-copilotautopilot-analytics-access 2026-04-01 20:51:52 +02:00
Zamil Majdy
7f986bc565 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-01 20:51:50 +02:00
Zamil Majdy
f4571cb9e1 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 20:51:48 +02:00
Zamil Majdy
5f41afe748 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-include-graph-option 2026-04-01 20:51:46 +02:00
Zamil Majdy
d046c01a65 feat(copilot): allow background sub-agents and add Agent tool UI
- Remove run_in_background deny block — SDK handles async lifecycle
  (returns isAsync:true, model polls via TaskOutput)
- Keep max_subtasks concurrency limit (background agents count too)
- Add "agent" tool category to frontend GenericTool with RobotIcon
- Detect isAsync output to show "Agent started" not "Agent completed"
- Add TaskOutput renderer showing retrieval status and results
- Fix pre-existing TS error in useChatSession (void mutation body)
- Update tests: background allowed, limit still enforced
2026-04-01 20:50:49 +02:00
Zamil Majdy
b220fe4347 test(copilot): add build_sdk_env tests for all 3 auth modes
Cover subscription, direct Anthropic, and OpenRouter auth modes in
build_sdk_env(). Also verifies that all modes return a mutable dict
that can accept security env vars like CLAUDE_CODE_TMPDIR.
2026-04-01 20:31:32 +02:00
Zamil Majdy
7af138adba fix(backend): use word-boundary regex for database name sanitization
Replaces naive str.replace() with re.sub() using \b word boundaries
when scrubbing database names from error messages. Prevents mangling
unrelated words when the database name is a common substring like
"test", "data", or "on".
2026-04-01 20:30:39 +02:00
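A sketch of the word-boundary scrub described here; the helper name and replacement token are illustrative, not the block's actual `_sanitize_error` internals:

```python
import re

def scrub_db_name(message: str, db_name: str) -> str:
    # \b anchors keep a name like "test" from matching inside "latest"
    return re.sub(rf"\b{re.escape(db_name)}\b", "<database>", message)

msg = 'FATAL: database "test" does not exist (latest attempt)'
print(scrub_db_name(msg, "test"))
# -> FATAL: database "<database>" does not exist (latest attempt)
```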
Zamil Majdy
5c406a20ba fix(backend): handle AgentOutputBlock format field in dry-run simulation
Mirror the real AgentOutputBlock.run() behavior: when a format string
is provided, apply Jinja2 formatting and yield only the "output" pin;
when no format is provided, yield both "output" and "name" pins.
2026-04-01 20:29:40 +02:00
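A hedged sketch of the mirrored behavior, using jinja2 directly; the pin names match the commit, but the variables exposed to the format template are an assumption:

```python
from jinja2 import Template

def simulate_output_block(name: str, value, fmt: str | None):
    if fmt:
        # format provided: render it and yield only the "output" pin
        yield "output", Template(fmt).render(output=value, name=name)
    else:
        # no format: yield both "output" and "name" pins
        yield "output", value
        yield "name", name

print(dict(simulate_output_block("result", 42, "Value is {{ output }}")))
print(dict(simulate_output_block("result", 42, None)))
```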
Zamil Majdy
61513b9dad fix(copilot): mock build_sdk_env to return {} instead of None in retry tests
The tests were mocking build_sdk_env to return None, but the service
code now assigns security env vars (CLAUDE_CODE_TMPDIR, etc.) to the
returned dict. This caused TypeError: 'NoneType' object does not
support item assignment in all 6 retry scenario tests.
2026-04-01 20:27:51 +02:00
Zamil Majdy
6f679a0e32 fix(backend/copilot): preserve tool_calls and tool_call_id through context compression 2026-04-01 20:27:33 +02:00
Zamil Majdy
b8065212b1 chore: remove accidentally committed test screenshots 2026-04-01 19:16:25 +02:00
Zamil Majdy
d5281a9a13 chore: remove accidentally committed test screenshots 2026-04-01 19:16:22 +02:00
Zamil Majdy
05495d8478 chore: remove accidentally committed test screenshots 2026-04-01 19:16:18 +02:00
Zamil Majdy
bae409d04e chore: remove accidentally committed test screenshots 2026-04-01 19:16:14 +02:00
Zamil Majdy
e11eb2caaa chore: remove accidentally committed test screenshots 2026-04-01 19:16:10 +02:00
Zamil Majdy
2c04768711 chore: remove accidentally committed test screenshots 2026-04-01 19:15:35 +02:00
Zamil Majdy
c9bf3aa339 fix(backend/copilot): clear partial graphs on timeout for consistent state 2026-04-01 19:13:10 +02:00
Zamil Majdy
4ac0ba570a fix(backend): fix copilot credential loading across event loops (#12628)
## Why

CoPilot autopilot sessions are inconsistently failing to load user
credentials (specifically GitHub OAuth). Some sessions proceed normally,
some show "provide credentials" prompts despite the user having valid
creds, and some are completely blocked.

Production logs confirmed the root cause: `RuntimeError: Task got Future
<Future pending> attached to a different loop` in the credential refresh
path, cascading into null-cache poisoning that blocks credential lookups
for 60 seconds.

## What

Three interrelated bugs in the credential system:

1. **`refresh_if_needed` always acquired Redis locks even with
`lock=False`** — The `lock` parameter only controlled the inner
credential lock, but the outer "refresh" scope lock was always acquired.
The copilot executor uses multiple worker threads with separate event
loops; the `asyncio.Lock` inside `AsyncRedisKeyedMutex` was bound to one
loop and failed on others.

2. **Stale event loop in `locks()` singleton** — Both
`IntegrationCredentialsManager` and `IntegrationCredentialsStore` cached
their `AsyncRedisKeyedMutex` without tracking which event loop created
it. When a different worker thread (with a different loop) reused the
singleton, it got the "Future attached to different loop" error.

3. **Null-cache poisoning on refresh failure** — When OAuth refresh
failed (due to the event loop error), the code fell through to cache "no
credentials found" for 60 seconds via `_null_cache`. This blocked ALL
subsequent credential lookups for that user+provider, even though the
credentials existed and could refresh fine on retry.

## How

- Split `refresh_if_needed` into `_refresh_locked` / `_refresh_unlocked`
so `lock=False` truly skips ALL Redis locking (safe for copilot's
best-effort background injection)
- Added event loop tracking to `locks()` in both
`IntegrationCredentialsManager` and `IntegrationCredentialsStore` —
recreates the mutex when the running loop changes
- Only populate `_null_cache` when the user genuinely has no
credentials; skip caching when OAuth refresh failed transiently
- Updated existing test to verify null-cache is not poisoned on refresh
failure

## Test plan

- [x] All 14 existing `integration_creds_test.py` tests pass
- [x] Updated
`test_oauth2_refresh_failure_returns_none_without_null_cache` verifies
null-cache is not populated on refresh failure
- [x] Format, lint, and typecheck pass
- [ ] Deploy to staging and verify copilot sessions consistently load
GitHub credentials
2026-04-02 00:11:38 +07:00
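A sketch (under assumptions) of the loop-tracking fix from item 2: cache the mutex per event loop and rebuild it whenever the running loop changes. `make_mutex` stands in for constructing `AsyncRedisKeyedMutex`:

```python
import asyncio

class LoopAwareLocks:
    """Recreate the cached keyed mutex when the running event loop changes."""

    def __init__(self, make_mutex):
        self._make_mutex = make_mutex  # e.g. lambda: AsyncRedisKeyedMutex(...)
        self._mutex = None
        self._loop = None

    def locks(self):
        loop = asyncio.get_running_loop()
        if self._mutex is None or self._loop is not loop:
            # a mutex bound to another worker thread's loop would fail with
            # "Task got Future attached to a different loop" -- rebuild it
            self._mutex = self._make_mutex()
            self._loop = loop
        return self._mutex
```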
Zamil Majdy
d61a2c6cd0 Revert "fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype"
This reverts commit 1c301b4b61.
2026-04-01 18:59:38 +02:00
Zamil Majdy
1c301b4b61 fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype
The SDK returns AssistantMessage(error="invalid_request", content=[TextBlock("Prompt is too long")])
followed by ResultMessage(subtype="success", result="Prompt is too long") when the transcript is
rejected after internal auto-compaction. Both paths bypassed the retry mechanism:

- AssistantMessage handler only checked error_text ("invalid_request"), not the content which
  holds the actual error description. The content was then streamed as text, setting events_yielded=1,
  which blocked retry even when ResultMessage fired.
- ResultMessage handler only triggered prompt-too-long detection for subtype="error", not
  subtype="success". The stream "completed normally", stored the synthetic error entry in the
  transcript, and uploaded it — causing the transcript to grow unboundedly on each failed turn.

Fixes:
1. AssistantMessage handler: when sdk_error is set (confirmed error message), also check content
   text. sdk_error being set guarantees this is an API error, not user-generated content, so
   content inspection is safe.
2. ResultMessage handler: check result for prompt-too-long regardless of subtype, covering the
   case where the SDK auto-compacts internally but the result is still too long.

Adds integration tests for both new scenarios.
2026-04-01 18:28:46 +02:00
Zamil Majdy
e753aee7a0 fix(copilot): prevent infinite transient retry loop
The transient_retries counter was reset to 0 at the top of the while
loop on every iteration, including after transient retry `continue`
statements.  Since transient retries don't increment `attempt`, the
counter reset every time, creating an infinite retry loop that could
never exhaust the max_transient budget.

Fix: only reset transient_retries when the context-level `attempt`
actually changes, using a _last_reset_attempt sentinel.
2026-04-01 18:21:50 +02:00
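A runnable sketch of the sentinel pattern; `run_stream_attempt` and the outcome strings are stand-ins for the real SDK stream call:

```python
import random

MAX_ATTEMPTS = 3
MAX_TRANSIENT_RETRIES = 3

def run_stream_attempt() -> str:
    return random.choice(["ok", "transient", "context"])  # stand-in SDK call

attempt = 0
transient_retries = 0
_last_reset_attempt = -1  # sentinel: which attempt we last reset for

while attempt < MAX_ATTEMPTS:
    if _last_reset_attempt != attempt:
        # reset once per context-level attempt; previously this ran on every
        # iteration, so each transient `continue` wiped the counter
        transient_retries = 0
        _last_reset_attempt = attempt
    outcome = run_stream_attempt()
    if outcome == "transient" and transient_retries < MAX_TRANSIENT_RETRIES:
        transient_retries += 1
        continue        # replay the same attempt; the counter now survives
    if outcome == "context":
        attempt += 1    # only context errors advance the attempt
        continue
    break               # success, or a retry budget is exhausted
```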
Zamil Majdy
f76566c834 fix(test): update dry-run param test to match deduplicated description
The run_agent dry_run description was updated during deduplication to
reference the agent_generation_guide instead of saying "preview mode".
Update the test assertion to match.
2026-04-01 18:18:20 +02:00
Zamil Majdy
a58b997141 fix(test): align simulation prompt test with error pin exclusion from required list
The test expected "error" in "Available output pins" but the prompt now
correctly excludes error from the required output pins list to match the
instruction telling the LLM to omit it.
2026-04-01 18:15:42 +02:00
Zamil Majdy
3f24a003ad fix(copilot): add None guard to fix pyright reportOperatorIssue
_resolve_fallback_model returns str | None, so pyright flags the
`"." not in result` assertion.  Add an explicit `is not None` check
before the containment test to narrow the type.
2026-04-01 18:15:16 +02:00
Zamil Majdy
1a645e1e37 fix(backend/copilot): align _flatten_assistant_content with master (drop tool_use blocks)
The merge conflict resolution copied the pre-#12625 version of
_flatten_assistant_content which converts tool_use blocks to
[tool_use: name] placeholders. Master's #12625 changed this to
drop tool_use blocks entirely to prevent the model from mimicking
them as plain text. Align the canonical transcript.py with master.
2026-04-01 18:14:59 +02:00
Zamil Majdy
bee76962b0 fix(backend): rollback write transaction on error in SQL query block
Use explicit except/else instead of finally to ensure write transactions
are rolled back when an exception occurs, rather than committed.
2026-04-01 18:13:37 +02:00
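A minimal sketch of the except/else shape on a generic DBAPI connection; the block's actual transaction helpers are not reproduced here:

```python
def run_write_query(conn, sql: str):
    cur = conn.cursor()
    try:
        cur.execute(sql)
    except Exception:
        conn.rollback()  # a `finally: commit()` here would persist bad writes
        raise
    else:
        conn.commit()    # commit only when no exception occurred
    finally:
        cur.close()
```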
Zamil Majdy
864e68bed1 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 18:09:58 +02:00
Zamil Majdy
7c6201110c test: add E2E screenshots for PR #12578 2026-04-01 18:06:51 +02:00
Zamil Majdy
bded680b77 docs(backend): add cross-cutting test location explanation to dry_run_loop_test.py 2026-04-01 18:06:51 +02:00
Zamil Majdy
1e008dc172 fix(copilot): align dry_run_loop_test with #12582's required dry_run field
After merging dev, #12582 made dry_run a required field with description
"Execute in preview mode." — update tests to match:
- Assert dry_run is in required (not optional) for both run_agent/run_block
- Match "preview mode" instead of "simulation"/"guide" in descriptions
- Pass dry_run=False explicitly in RunAgentInput constructor tests
- Lower description length threshold to 10 (was 20) for the shorter text
2026-04-01 18:06:51 +02:00
Zamil Majdy
9966e122ab test(copilot): add functional tests for dry-run loop beyond substring checks
Add 23 new tests covering:
- run_agent and run_block OpenAI tool schema validation (type, optionality,
  description quality, coexistence of dry_run + wait_for_result)
- RunAgentInput Pydantic model behavior (default value, bool coercion,
  combined parameters, validation bounds, string stripping)
- Guide workflow ordering (create before dry-run, dry-run before inspect,
  fix before repeat, numbered step sequence)
2026-04-01 18:06:51 +02:00
Zamil Majdy
65108c31dc fix(copilot): reference SendAuthenticatedWebRequestBlock in tool discovery + fix CI 2026-04-01 18:06:51 +02:00
Zamil Majdy
7767c97f50 fix(copilot): deduplicate dry-run instructions, keep only in guide
Remove duplicated dry-run workflow text from prompting.py shared notes,
service.py DEFAULT_SYSTEM_PROMPT, run_agent.py tool/param descriptions,
create_agent.py, and edit_agent.py. The agent_generation_guide.md is the
single source of truth, loaded on-demand via get_agent_building_guide.
2026-04-01 18:06:51 +02:00
Zamil Majdy
69ab21ebe7 fix(copilot): address review round 2 — remove internal Python refs from guide, format system prompt
- Replace `_SHARED_TOOL_NOTES`, `prompting.py` reference in mcp_tool_guide.md
  with LLM-friendly wording ("described in the tool notes") since the guide
  is shown to the model, not to developers.
- Break the long single-line dry-run instruction in DEFAULT_SYSTEM_PROMPT
  into bullet points matching the surrounding prompt style for readability.
2026-04-01 18:06:31 +02:00
Zamil Majdy
6fe4e1b774 fix(copilot): address review round 1 — deduplicate prompts, relocate tests
- Slim down the duplicate error-pattern list in _SHARED_TOOL_NOTES
  (prompting.py) to a concise summary that references the guide for details,
  reducing maintenance surface from 5+ near-identical copies to one.
- Move dry_run_loop_test.py from backend/copilot/ (production package) to
  test/copilot/ to match the project's test directory convention.
- Route supplement tests through the public get_sdk_supplement() API instead
  of importing the private _SHARED_TOOL_NOTES symbol.
- Loosen overly-brittle assertions (exact step numbers, exact spacing around
  '/' in error pattern names) while preserving intent as prompt regression
  tests.  Add module-level docstring documenting the deliberate brittleness.
2026-04-01 18:06:31 +02:00
Zamil Majdy
c778cc9849 fix(platform): remove hardcoded 3-iteration cap from dry-run loop
Instead of capping at 3 iterations, let the copilot repeat the
dry-run -> fix cycle until the simulation passes or the problems
are clearly unfixable. This gives the copilot flexibility to keep
going if it's making progress, or stop early if issues are not
resolvable.
2026-04-01 18:06:31 +02:00
Zamil Majdy
50b635da6d fix(copilot): remove redundant "3 iterations" repetition in supplement
De-duplicate "after 3 iterations" from the same sentence that already
says "up to 3 iterations" — now reads "If issues persist, report..."
2026-04-01 18:06:31 +02:00
Zamil Majdy
08e254143b fix(copilot): standardize iteration wording, add test for tool discovery priority, fix cross-reference
- Standardize max-iteration wording to "3 iterations" everywhere (prompting.py,
  agent_generation_guide.md, tests) instead of mixed "3 times"/"3 iterations"
- Replace loose `or` fallback in test_shared_tool_notes_include_max_iterations
  with exact "3 iterations" assertion
- Add test_shared_tool_notes_include_tool_discovery_priority test
- Make mcp_tool_guide.md cross-reference explicit: point to `_SHARED_TOOL_NOTES`
  in `prompting.py` instead of vague "see shared supplement"
2026-04-01 18:06:31 +02:00
Zamil Majdy
89fcfc4e0a refactor(copilot): move tool/action search priority to shared supplement
Move the "check blocks first" strategy from `mcp_tool_guide.md` (only
loaded for MCP) into `_SHARED_TOOL_NOTES` so it applies to every
session. The MCP guide now references the shared strategy instead of
duplicating it.
2026-04-01 18:06:31 +02:00
Zamil Majdy
e7ca07f4bf fix(copilot): align dry-run prompt wording and tighten test assertion
- Align guide heading to "create -> dry-run -> fix" matching supplement
- Align error pattern names between guide and supplement to canonical form
- Drop loose "max " fallback in test assertion for precision
2026-04-01 18:06:31 +02:00
Zamil Majdy
c564ac7277 fix(copilot): address PR review - reduce prompt redundancy, tighten tests
- Slim down DEFAULT_SYSTEM_PROMPT to a brief one-liner referencing the
  supplement for detailed workflow (avoids ~300 token duplication)
- Tighten test assertions to use specific substring checks (e.g. section
  headers, exact phrases) instead of loose single-word matches
- Restore view_agent_output reference in the agent generation guide for
  node-by-node execution trace inspection
- Add test for view_agent_output mention in guide (22 tests total)
2026-04-01 18:06:31 +02:00
Zamil Majdy
ac3a826ad0 feat(copilot): add create -> dry-run -> fix loop to agent generation prompts
Instruct the copilot LLM to automatically dry-run agents after creating
or editing them, inspect the output for wiring issues, and fix iteratively
(up to 3 attempts) before presenting the agent as ready to the user.

Changes:
- System prompt: add "Agent Development: Create -> Dry-Run -> Fix Loop" section
- Tool descriptions: create_agent, edit_agent, run_agent, get_agent_building_guide
  now reference the dry-run verification workflow
- Prompting supplement: add "Iterative agent development" section with error
  pattern guidance (failed nodes, null outputs, unexecuted nodes)
- Agent generation guide: replace "Testing with Dry Run" with comprehensive
  "REQUIRED: Dry-Run Verification Loop" section including good/bad output
  examples and workflow steps 8-9
- Tests: 21 new tests verifying prompt content across all layers
2026-04-01 18:06:31 +02:00
Zamil Majdy
6f32184019 test: add E2E screenshots for PR #12575 2026-04-01 18:06:02 +02:00
Zamil Majdy
6d0eedae83 fix(backend): truncate large run() source code in simulation prompt
Prevent prompt blowup for blocks with very large run() implementations
by applying the same _MAX_INPUT_VALUE_CHARS limit used for input values.
2026-04-01 18:06:02 +02:00
Zamil Majdy
fb328f9d74 fix(backend): move os import to top-level, remove getattr duck typing, use schema-based credential stripping in simulator
- Move `import os` from function body to top-level (stdlib, no startup cost)
- Replace `getattr(ChatConfig(), "simulation_model", "")` with direct
  attribute access since the field has a default value
- Use `block.input_schema.get_credentials_fields()` to detect credential
  fields programmatically, falling back to common names
2026-04-01 18:06:02 +02:00
Zamil Majdy
a369fbe169 fix(copilot): replace tautological env-var tests with source assertions
The TestSecurityEnvVars tests were testing Python dict assignment rather
than verifying the actual production code. Replace with source-level
assertions that grep service.py for the required env var names, catching
accidental removals without duplicating production logic.
2026-04-01 18:05:50 +02:00
Zamil Majdy
2a0b74cae4 fix(backend): update test for new prompt format (Available output pins)
The build_simulation_prompt now uses "Available output pins" instead of
"MUST include" — update the test from dev to match the new prompt.
2026-04-01 18:05:46 +02:00
Zamil Majdy
b08f9fc02a fix(platform): regenerate openapi.json and fix flaky test teardown
- Regenerate openapi.json to include Pydantic v2 ValidationError fields
  (input, ctx) that were added after the Gemini Flash commit
- Wrap oauth_test.py session fixture teardown in try/except to handle
  RuntimeError when event loop is already closed during session shutdown
2026-04-01 18:05:46 +02:00
Zamil Majdy
857acb2bbc feat(backend): use Gemini Flash for dry-run simulation, make model configurable 2026-04-01 18:05:26 +02:00
Zamil Majdy
0cb230c4f0 test(backend): add dry-run tests for AgentExecutorBlock child graph spawning
Verify prepare_dry_run returns an unmodified shallow copy for
AgentExecutorBlock (identity, equality, mutation isolation).

Also cover simulator edge cases: AgentInputBlock with all-None/missing
fields, and generic blocks yielding zero meaningful outputs.
2026-04-01 18:05:26 +02:00
Zamil Majdy
2cd5c0eab8 refactor(backend): unify MCP block simulation into generic path
Remove the MCP-specific simulation function and prompt builder.
MCPToolBlock now uses the same generic LLM simulation as all other
blocks, grounded by the block's run() source code. This eliminates
code duplication and ensures MCP blocks benefit from the same
improvements (e.g., source code grounding) as other blocks.

Also removes corresponding MCP-specific tests since the generic
simulate_block path covers the same functionality.
2026-04-01 18:05:26 +02:00
Zamil Majdy
7bf8e460ea fix(backend): add folder assignment to library agent upsert update path
The upsert's update path was missing the folder connection logic that
the create path had, causing folder changes to be silently ignored when
re-adding a previously deleted library agent.
2026-04-01 18:05:13 +02:00
Zamil Majdy
84d328517a fix(backend): always yield result pin in MCP simulation success path
The success path now explicitly yields ("result", ...) from the parsed
response rather than iterating all pins with a None/empty filter.
This prevents downstream starvation when the LLM legitimately returns
null for side-effect-only tool results.
2026-04-01 18:05:13 +02:00
Zamil Majdy
842ff6c600 fix(backend): yield result pin in MCP simulation error path
When simulate_mcp_block catches a RuntimeError/ValueError, it now yields
a ("result", None) before ("error", ...) so downstream nodes connected
to the result pin are not starved during dry-run error paths.
2026-04-01 18:05:13 +02:00
Zamil Majdy
b510fbee2a docs: fix stale iteration cap (5 → 1) in agent generation guide 2026-04-01 18:05:13 +02:00
Zamil Majdy
bb7f0ad1f2 test(simulator): align tests with dynamic pin yielding behavior
Update test assertions to match the simulator's current behavior where
empty/missing output pins are omitted rather than yielded. Also fix
prompt assertion strings to match the actual prompt text.
2026-04-01 18:05:13 +02:00
Zamil Majdy
3f8af89b63 fix(frontend): only show error styling when error output is non-empty 2026-04-01 18:04:47 +02:00
Zamil Majdy
375e5e1f10 fix(simulator): clean up error handling + dynamic pin yielding
- Don't force empty error pin — only yield error when there's a real error
- Yield all pins dynamically from LLM response (not just result+error)
- Allow logical error simulation (invalid input etc.) but not auth errors
- Omit pins with no meaningful value
2026-04-01 18:04:47 +02:00
Zamil Majdy
fd1d706315 fix(frontend): replace lucide-react icons with Phosphor equivalents in mode toggle
Use Brain and Lightning from @phosphor-icons/react instead of Brain and
Zap from lucide-react to comply with the project icon guidelines.
2026-04-01 18:04:44 +02:00
Zamil Majdy
faf2f43f6a test(simulator): add unit tests for prompt building and passthrough logic
Covers credential stripping, realistic-output instructions, input/output
block passthrough, prepare_dry_run routing, missing-pin filling, and
LLM failure handling.
2026-04-01 18:04:17 +02:00
Zamil Majdy
eea230d37f fix(simulator): produce realistic output + strip credentials from prompt
- Strip credential fields from input before sending to LLM so it never
  sees null/empty credentials and incorrectly simulates auth failures.
- Strengthen prompt: NEVER return empty/null, always generate realistic
  URLs, text, and data structures. Error pin always empty string.
- Input blocks: generate default value when no user input provided
  (first dropdown option or block name).
2026-04-01 18:03:59 +02:00
Zamil Majdy
76965429f1 fix(simulator): restore input/output block passthrough in dry-run
Re-add the passthrough logic for AgentInputBlock and AgentOutputBlock
in simulate_block. These blocks are trivial passthroughs that don't
need LLM simulation -- forwarding input values directly is faster,
deterministic, and doesn't require API keys (which aren't available
in CI).
2026-04-01 18:03:39 +02:00
Zamil Majdy
eefa60368f test(simulator): remove input/output block passthrough tests
These tests asserted passthrough behavior for AgentInputBlock and
AgentOutputBlock which was removed in the preceding refactor commit.
The simulator now LLM-simulates these blocks using their run() source
code, so the old passthrough assertions are invalid and require an
API key not available in CI.
2026-04-01 18:03:39 +02:00
Zamil Majdy
88fe1e9b5e refactor(simulator): remove special-casing for input/output blocks
The simulator now has the block's run() source code via inspect.getsource(),
so it can figure out what any block does by reading the code. No need for
special isinstance checks for AgentInputBlock/AgentOutputBlock.
2026-04-01 18:03:39 +02:00
Zamil Majdy
93264b1177 fix(simulator): generate default values for input blocks in dry-run
When users click Simulate without providing input values,
AgentInputBlock.value is None and nothing gets yielded. This leaves
downstream blocks (like OrchestratorBlock) with unpopulated links,
causing them to be skipped entirely.

Fix: generate a sensible default — first dropdown option for
AgentDropdownInputBlock, or "sample {name}" for text inputs.
2026-04-01 18:03:39 +02:00
Zamil Majdy
3269d17880 fix(simulator): use Python 3.11-compatible f-string in build_simulation_prompt
The nested f-string on line 224 used triple double-quotes inside a
triple double-quoted f-string, which is only valid from Python 3.12.
Extract the implementation section to a separate variable to fix the
SyntaxError on Python 3.11 CI.
2026-04-01 18:03:39 +02:00
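The 3.11-compatible shape, with invented prompt text: hoist the inner triple-quoted section into its own variable instead of reusing the same quote style inside the outer f-string (quote reuse inside f-strings only parses from Python 3.12 / PEP 701 onward):

```python
source_code = "def run(self): ..."  # stands in for inspect.getsource() output

implementation_section = f"""Implementation:
{source_code}
"""

prompt = f"""Simulate this block's outputs.
{implementation_section}
Return JSON only."""
print(prompt)
```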
Zamil Majdy
1e5788f2cf feat(simulator): include block run() source code in simulation prompt
The LLM simulator now receives the block's actual run() function source
via inspect.getsource(). This gives the LLM exact knowledge of how
inputs transform to outputs, producing far more accurate simulations.
2026-04-01 18:03:39 +02:00
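A minimal illustration of the grounding mechanism; the block class and prompt wording are invented, and only `inspect.getsource()` is the documented mechanism:

```python
import inspect

class ExampleBlock:
    def run(self, text: str):
        yield "output", text.upper()

# feed the actual implementation to the simulator prompt
source = inspect.getsource(ExampleBlock.run)
prompt = f"Given this block's run() implementation, predict its outputs:\n{source}"
print(prompt)
```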
Zamil Majdy
ca8214d95f fix(frontend): refetch execution details after websocket subscription to close race-condition gap
Dry-run executions can complete before the WebSocket subscription is
established, causing the frontend to miss realtime updates.  After the
subscription is confirmed, immediately invalidate the execution-details
query so react-query refetches the latest state from the REST API.

Also reduce the polling interval from 2s to 1s for more responsive
feedback during fast-completing executions.
2026-04-01 18:03:39 +02:00
Zamil Majdy
f58ce5cc70 fix(backend): passthrough input/output blocks and preserve user model in dry-run
Input blocks (AgentInputBlock and all subclasses) and AgentOutputBlock are
pure passthrough -- they just forward their input values. Previously they
went through the LLM simulator which produced verbose generated text
instead of the raw value.

Also stop swapping the OrchestratorBlock model to gpt-4o-mini during
dry-run. The user's own model and credentials are now preserved, which
avoids credential mismatches (e.g. Anthropic key vs OpenAI model).
Iterations are still capped to 1.
2026-04-01 18:03:39 +02:00
Zamil Majdy
bf29801b07 fix(backend): restore AgentExecutorBlock as dry-run passthrough block
In commit f2546b31, AgentExecutorBlock was inadvertently removed from the
passthrough list when can_simulate() was replaced with prepare_dry_run().
Since AgentExecutorBlock.Output has no properties, LLM simulation yields
zero outputs -- causing the block to "complete without output" during
dry-run.

Restore AgentExecutorBlock in prepare_dry_run() so it executes for real
during dry-run, spawning a child graph execution whose blocks are then
simulated (dry_run=True is inherited via execution context).
2026-04-01 18:03:39 +02:00
Zamil Majdy
dcc2bdd8ab fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.

- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before implementation)
  - [x] `_find_last_assistant_entry` splits correctly at last assistant, handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
  - [x] `_flatten_assistant_content` strips thinking/redacted_thinking blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
  - [x] `compact_transcript` strips thinking from older assistant messages
  - [x] Edge cases: trailing user message, single assistant, no thinking blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
  - [x] `_format_sdk_content_blocks` preserves thinking block format and raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-04-01 18:03:22 +02:00
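A sketch of the prefix/tail split, assuming JSONL entries shaped like `{"type": "assistant" | "user", ...}`; the real `_find_last_assistant_entry` / `_rechain_tail` helpers are not reproduced:

```python
def split_for_compaction(entries: list[dict]) -> tuple[list[dict], list[dict]]:
    """Prefix (compressible) vs tail (preserved verbatim)."""
    last_assistant = -1
    for i, entry in enumerate(entries):
        if entry.get("type") == "assistant":
            last_assistant = i
    if last_assistant == -1:
        return entries, []  # no assistant entry: everything is compressible
    # tail = last assistant + trailing entries; re-appended byte-for-byte so
    # its thinking/redacted_thinking blocks keep their signatures intact
    return entries[:last_assistant], entries[last_assistant:]
```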
Zamil Majdy
e74a918c4a debug(backend): add info-level logging to AgentExecutorBlock event listener
Logs event receipt, skip reasons, and final output count to investigate
why sub-agent outputs are not reaching the parent during dry-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
ff05b5b8d5 revert(backend): remove unnecessary DB fallback from AgentExecutorBlock
The DB fallback was added based on wrong analysis. The actual fix is
passing dry_run=True to add_graph_execution (previous commit) so
credential validation is skipped during dry-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
56090f870c fix(backend): pass dry_run to add_graph_execution in AgentExecutorBlock
The sub-agent's graph validation rejects missing credentials. During
dry-run, credential errors should be stripped — but the dry_run flag
wasn't being passed to add_graph_execution, so validation always
enforced credentials even in dry-run mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
a3e3d3ff6b fix(backend): fallback to DB query for AgentExecutorBlock output in dry-run
During dry-run, the sub-agent's output events may not reach the
event_bus listener before the GRAPH_EXEC_UPDATE arrives (the simulated
execution completes faster than events propagate). This causes the
AgentExecutorBlock to complete with 0 outputs.

Adds a DB fallback: after the event loop breaks on graph COMPLETED,
if no outputs were yielded, query get_node_executions for the
sub-agent's OUTPUT block results and yield them.

Evidence: normal run produces 1 output, ALL dry-runs produce 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
cfaa1ff0d4 fix(backend): execute AgentExecutorBlock for real in dry-run mode
Previously, AgentExecutorBlock was LLM-simulated during dry-run,
producing no meaningful output and making executions INCOMPLETE.

Now prepare_dry_run returns the input unchanged for AgentExecutorBlock,
letting it execute the sub-agent graph. The sub-agent's blocks are
individually simulated via the propagated dry_run execution_context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
6ab9a3285f fix(simulator): preserve traditional mode in dry-run preparation
prepare_dry_run now respects agent_mode_max_iterations=0 (traditional
mode) instead of unconditionally forcing agent mode. Only overrides
to 1 when the user configured agent mode (non-zero).
2026-04-01 18:02:45 +02:00
Zamil Majdy
390324d5a1 fix(platform): fall back to simulation when dry-run credentials missing + poll execution details
Bug 1: OrchestratorBlock in dry-run fails with "credentials is a required
property" when the user hasn't configured any LLM credentials.  After
prepare_dry_run overrides the model to gpt-4o-mini, the block still
requires credentials. Now we check if required credentials fields are
still empty after restoring from node defaults and fall back to LLM
simulation instead of attempting real execution.

Bug 2: WebSocket not showing real-time updates for dry-run executions
due to a race condition — the execution can start and complete before
the frontend subscribes to WebSocket events.  Add refetchInterval
polling (2s) on execution details while the graph is running so the
frontend catches up on any missed events.
2026-04-01 18:02:45 +02:00
Zamil Majdy
9f51796dbe fix(backend): remove dry-run markers from simulated block output text
The [DRY RUN] prefix and "simulated successfully — no real execution
occurred" message was being fed back to the LLM, causing the copilot to
become aware it was in dry-run mode and change its behavior. The output
text now looks identical to real execution output. The UI still shows
the "Simulated" badge via the is_dry_run=True flag on the response.
2026-04-01 18:02:45 +02:00
Zamil Majdy
f42f0013df fix(platform): run OrchestratorBlock with cheap model in dry-run instead of skipping
Replace `can_simulate(block)` with `prepare_dry_run(block, input_data)` which
returns modified input_data (model=gpt-4o-mini, agent_mode_max_iterations=1) for
OrchestratorBlock so it executes for real with a cheap model during dry-run,
instead of being skipped entirely.
2026-04-01 18:02:19 +02:00
Zamil Majdy
3154e5b87a test(backend): mock get_global_rate_limits in reset_usage tests for determinism
All reset_copilot_usage tests that reach the get_global_rate_limits
call path now explicitly mock it, preventing LaunchDarkly flag
evaluation from interfering with test assertions.
2026-04-01 18:02:19 +02:00
Zamil Majdy
78cd14d501 fix(platform): address review R1 - fix docstring, stale closures, shared formatCents
- Fix misleading "Fails open" docstring in reset_daily_usage (it's
  fail-closed for billed operations).
- Use refs in useResetRateLimit to avoid stale closure in mutation
  callbacks.
- Replace eslint-disable with useRef pattern in RateLimitResetDialog.
- Export and share formatCents between dialog and panel components.
- Add clarifying comment for omitted rate_limit_reset_cost in inner
  get_usage_status call.
2026-04-01 18:02:19 +02:00
Zamil Majdy
137edb3e6e fix(backend): address review nits - fix docstring and hoist constant
- Fix misleading simulate_block docstring that claimed "Returns None"
  for passthrough blocks (it never does; callers use can_simulate())
- Hoist _DRY_RUN_MAX_ITERATIONS to module-level in manager.py
2026-04-01 18:02:19 +02:00
Zamil Majdy
449e9b17f1 fix(backend): simplify dry-run special block handling per review feedback
Remove overengineered simulation_context, dry_run_passthrough flag,
credential redaction/URL sanitization, and excessive utils validation.
The simulator now decides which blocks to handle via can_simulate() and
delegates MCPToolBlock to a specialized prompt internally. Manager
changes are minimal: try simulator, fall back to normal execution.

-573 lines removed, 18 tests still pass.
2026-04-01 18:02:18 +02:00
Zamil Majdy
5b3f87d7c7 fix(backend): use exact URL equality assertions to silence CodeQL false positives
Replace substring `in` checks with exact equality assertions in
simulator_test.py. CodeQL flagged 4 instances of "Incomplete URL
substring sanitization" on test assertions like `assert "example.com"
in result`. Using `==` against the expected sanitized URL both silences
the CodeQL alert and makes the tests stricter.
2026-04-01 18:02:01 +02:00
Zamil Majdy
ee7209a575 test(backend): add simulator_test.py for redaction, URL sanitization, regex, and simulation_context
Cover test scenarios missing from test_dry_run.py:
- Secret field redaction (api_key, password, secret, tokens, credentials)
- URL sanitization (strip userinfo, query params, fragments)
- Non-secret field preservation
- simulation_context validation and 16KB size limit
- Regex false-positive guard (author, authority, token_count)
- Underscore-aware boundaries (api_secret, client_secret)
2026-04-01 18:02:01 +02:00
Zamil Majdy
7ea89b07ce fix(backend): address review R2 - use underscore-aware boundaries in secret regex
Replace \b word boundaries with (?:^|_)...(?:$|_) to treat underscores
as segment separators. This correctly catches compound keys like
api_secret, client_secret, secret_key, credentials while still avoiding
false positives on author, authority, token_count, etc.
2026-04-01 18:02:01 +02:00
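A small comparison of the two boundary styles for one of the listed keywords; `secretion` is an invented stand-in for the kind of false positive the segment anchors avoid:

```python
import re

WB = re.compile(r"\bsecret\b", re.IGNORECASE)             # old: \b boundaries
SEG = re.compile(r"(?:^|_)secret(?:$|_)", re.IGNORECASE)  # new: "_"-aware

for key in ["api_secret", "secret_key", "secret", "secretion"]:
    print(f"{key}: wb={bool(WB.search(key))} seg={bool(SEG.search(key))}")
# api_secret / secret_key: \b misses them ("_" is a word character), while
# the segment-aware pattern catches both; "secretion" matches neither.
```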
Zamil Majdy
5324e0cc2f fix(backend): address review R1 - tighten secret regex, hoist constant, unconditional iteration cap
- _SECRET_KEY_PATTERN: use word boundaries to avoid false positives on
  keys like "author", "authority", "token_count"
- _SIMULATION_CONTEXT_MAX_BYTES: hoist to module level in utils.py
- agent_mode_max_iterations: apply cap unconditionally for passthrough
  blocks in dry-run mode (not only when key already exists in input_data)
2026-04-01 18:02:01 +02:00
Zamil Majdy
c7cbb8b02e fix(backend): apply simulation_context to execution_context when resuming dry-run
When resuming a graph execution with an already-provided execution_context,
the computed safe_simulation_context was never applied to it, causing
simulation hints to be silently ignored for resumed dry-run executions.
2026-04-01 18:02:01 +02:00
Zamil Majdy
d66ffb1ee4 fix(backend): fall back to simulation when passthrough block lacks credentials
When dry_run_passthrough is true but the block's required credentials
weren't acquired (e.g. user hasn't configured LLM credentials), fall
back to LLM simulation instead of failing with a credentials error.
This makes dry-run robust for agents created without credentials.
2026-04-01 18:02:01 +02:00
Zamil Majdy
5d489c72b5 fix(backend): inherit dry_run from execution_context in child graph validation
When AgentExecutorBlock spawns a child graph execution, it passes
execution_context (with dry_run=True) but the dry_run parameter
defaults to False. This caused validate_and_construct_node_execution_input
to reject missing credentials in the sub-agent, even though dry-run
should skip credential validation.

Fix: derive dry_run from execution_context.dry_run when an execution_context
is provided. Also propagate simulation_context to child graphs.
2026-04-01 18:02:01 +02:00
Zamil Majdy
646ffe1693 fix(backend): address review - move simulation_context to user prompt, redact credentials, validate before DB write
- Move simulation_context validation before create_graph_execution to
  prevent orphaned INCOMPLETE records on validation failure (sentry)
- Move simulation_context from system prompt to user prompt to prevent
  prompt injection from caller-supplied data (coderabbitai)
- Add credential redaction (_redact_inputs) that masks secret-bearing
  fields (api_key, token, password, etc.) and sanitizes URLs by
  stripping userinfo/query/fragment before serializing to LLM prompts
- Sanitize MCP server_url in system prompt
- Update tests to assert simulation_context is in user_prompt not system_prompt
2026-04-01 18:02:01 +02:00
Zamil Majdy
59b1811e8b fix(backend): validate simulation_context size and gate behind dry_run
- Only attach simulation_context when dry_run=True (ignored otherwise)
- Validate JSON-serializability and enforce 16KB size limit to prevent
  oversized queue payloads
2026-04-01 18:01:30 +02:00
Zamil Majdy
6404e58fb1 refactor(backend): address coderabbitai review - typed dry_run_passthrough, truncate MCP schema
- Add dry_run_passthrough property to Block base class; set on
  OrchestratorBlock and AgentExecutorBlock. Removes isinstance() dispatch
  from manager.py for dry-run routing.
- Truncate tool_input_schema in MCP simulation prompt to prevent oversized
  LLM payloads (reuses _MAX_INPUT_VALUE_CHARS limit).
- Replace isinstance(OrchestratorBlock) iteration cap check with generic
  field-presence check.
2026-04-01 18:01:30 +02:00
Zamil Majdy
e0bfa1524e feat(backend): add simulation_context for dry-run scenario hints
Thread an optional simulation_context dict through the execution pipeline
so users can provide scenario hints (expected emails, tickets, customer
data, etc.) that guide the LLM simulator to produce realistic outputs.

- Add simulation_context to ExecutionContext (propagates to child graphs)
- Accept simulation_context in REST API, copilot run_agent, and
  add_graph_execution
- Inject context into both block and MCP simulation prompts
- Add tool_description hidden field to MCPToolBlock for richer simulation
- Add 4 new tests for simulation_context and tool_description
2026-04-01 18:01:30 +02:00
Zamil Majdy
ac947a0c11 fix(backend): address CodeQL false positive - use full URL in test assertion
Use the complete URL variable instead of a substring to avoid CodeQL's
"Incomplete URL substring sanitization" alert in test code.
2026-04-01 18:00:43 +02:00
Zamil Majdy
c9f45f056a refactor(backend): address PR review - extract shared LLM retry loop, cap dry-run iterations
- Extract _call_llm_for_simulation() helper to deduplicate retry/error
  logic between simulate_block and simulate_mcp_block
- Cap OrchestratorBlock agent_mode_max_iterations to 5 in dry-run mode
  to prevent unbounded loops of real LLM calls
- Document LLM API cost implications in agent generation guide
- Update module docstring to reflect new dry-run behaviour
2026-04-01 18:00:43 +02:00
Zamil Majdy
89264091ad fix(backend/copilot): add missing strip_stale_thinking_blocks to canonical transcript module
The merge conflict resolution moved transcript.py to a re-export wrapper
but failed to copy strip_stale_thinking_blocks into the canonical
backend.copilot.transcript module. This caused an ImportError in
transcript_test.py which imports from the sdk wrapper.
2026-04-01 18:00:41 +02:00
Zamil Majdy
e3183f1955 test: add test screenshots for PR #12569 SQL query block testing round 2 2026-04-01 18:00:26 +02:00
Zamil Majdy
3ea243c760 fix(backend): resolve pyright type errors in SQL query block error handling
Replace dict(**kwargs) pattern with a local closure to preserve type
information for _sanitize_error parameters. Rename _format_operational_error
to _classify_operational_error since it now takes pre-sanitized input.
2026-04-01 18:00:26 +02:00
Zamil Majdy
991969612c refactor(backend): split SQL query block into block + helpers module
- Extract validation, sanitization, serialization, and query execution
  into sql_query_helpers.py to meet ~300-line file guideline
- Fix duck typing in _serialize_value: replace hasattr(value, "isoformat")
  with explicit isinstance(value, (datetime, date, time))
- Extract _configure_session and _run_in_transaction helpers to bring
  execute_query under ~40-line function guideline
- Extract _validate_query, _resolve_host, _format_operational_error
  helpers to simplify the run method
- Add database name scrubbing to _sanitize_error
2026-04-01 18:00:26 +02:00
Zamil Majdy
8de9880f43 fix(docs): revert host type to 'str (password)' to match block docs generator output 2026-04-01 18:00:26 +02:00
Zamil Majdy
86d8efe697 fix(docs): correct host type from 'str (password)' to 'str (secret)' in SQL Query docs 2026-04-01 18:00:26 +02:00
Zamil Majdy
10ec6c7215 test(blocks): add SQL injection and URL.create() tests for SQL query block
Add tests documenting that single-statement SQL injection patterns (e.g.,
tautology, UNION-based, blind boolean) pass through validation by design,
since the block uses raw SQL via text(query) for trusted admin/analytics
use. Also add tests verifying URL.create() correctly handles special
characters in credentials (passwords with @, #, :, spaces, etc.) -- the
existing test_special_chars_in_password mocked execute_query and never
exercised the actual URL construction path.
2026-04-01 18:00:26 +02:00
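A sketch of the construction path those tests exercise; the driver, host, and database names are invented:

```python
from sqlalchemy.engine import URL

url = URL.create(
    drivername="postgresql+psycopg2",
    username="analyst",
    password="p@ss:word #1",  # would corrupt a naively concatenated DSN
    host="db.example.com",
    port=5432,
    database="analytics",
)
# Reserved characters in the password are percent-encoded ('@' -> %40,
# ':' -> %3A, etc.) rather than being parsed as URL structure.
print(url.render_as_string(hide_password=False))
```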
Zamil Majdy
51e5371362 style(backend): replace Optional[int] with int | None in SQL query block
Use modern union syntax consistent with the rest of the codebase.
Remove unused Optional import.
2026-04-01 18:00:26 +02:00
Zamil Majdy
cdd14726ce fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.

- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before implementation)
  - [x] `_find_last_assistant_entry` splits correctly at last assistant, handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
  - [x] `_flatten_assistant_content` strips thinking/redacted_thinking blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
  - [x] `compact_transcript` strips thinking from older assistant messages
  - [x] Edge cases: trailing user message, single assistant, no thinking blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
  - [x] `_format_sdk_content_blocks` preserves thinking block format and raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-04-01 17:59:53 +02:00
Zamil Majdy
1ebd5635f6 fix(backend/copilot): make include_graph an explicit parameter in _execute
Use an explicit keyword argument instead of extracting from **kwargs
for better discoverability and type safety.
2026-04-01 17:59:52 +02:00
Zamil Majdy
349b6c63de fix(backend): handle TimeoutError in graph enrichment to prevent tool crash 2026-04-01 17:59:52 +02:00
Zamil Majdy
2f7cfa6f1b fix(backend/copilot): strip secrets from graph data in _enrich_agents_with_graph
- Pass `for_export=True` to `get_graph()` so `stripped_for_export()`
  filters credentials, api_key, password, token, secret fields from
  `input_default` before the graph reaches the LLM context
- Use `agent.graph_version` (active version) instead of `version=None`
  to avoid exposing draft/unpublished graph versions
- Add `asyncio.timeout(15)` around `asyncio.gather` to prevent
  indefinite blocking on hung DB connections
- Resolve `graph_db()` once before the gather instead of per-coroutine
- Drop `get_graph_db` alias in favor of `graph_db` to match codebase

Fixes the CRITICAL security finding from autogpt-pr-reviewer.
2026-04-01 17:59:52 +02:00
Zamil Majdy
049aa1ad7d fix(backend/copilot): use f-strings for warning logs per CLAUDE.md style
CLAUDE.md says: use %s for debug, f-strings elsewhere for readability.
Reverts the incorrect change to printf-style for warning-level logs.
2026-04-01 17:59:52 +02:00
Zamil Majdy
a16be2675b style: use lazy formatting in logger.warning calls
Replace f-strings with %-style lazy formatting in _enrich_agents_with_graph
warning logs to follow standard logging conventions.
2026-04-01 17:59:52 +02:00
Zamil Majdy
ac416a561e fix(backend/copilot): remove type: ignore by adding explicit graph_id guard in _fetch 2026-04-01 17:59:52 +02:00
Zamil Majdy
c47fcc1925 refactor(backend/copilot): use BaseGraph type for graph field
Use BaseGraph instead of Graph to get typed nodes+links without causing
the Pydantic OpenAPI schema split. BaseGraph-Input/Output already exist
on dev so no frontend imports break. Fetches via graph_db().get_graph().
2026-04-01 17:59:52 +02:00
Zamil Majdy
77fd8648a7 fix(frontend): regenerate openapi.json to sync Graph schema
The backend Graph model no longer uses separate Input/Output variants,
so the openapi.json was out of sync causing the generated `graph.ts`
type to be missing and failing CI type checks + e2e builds.
2026-04-01 17:59:52 +02:00
Zamil Majdy
4842599bec fix(backend/copilot): remove redundant graph_id guard in _fetch 2026-04-01 17:59:52 +02:00
Zamil Majdy
339e155823 fix(backend): log truncation when include_graph skips agents
When include_graph=true and more agents have graph_ids than
_MAX_GRAPH_FETCHES, log a warning indicating how many agents
were skipped. This makes the silent truncation visible.
2026-04-01 17:59:52 +02:00
Zamil Majdy
9344e62d66 fix: remove type: ignore with proper guard clause in _enrich_agents_with_graph
Narrow agent.graph_id from str | None to str with an early return,
eliminating the type: ignore[arg-type] suppressor.
2026-04-01 17:59:52 +02:00
Zamil Majdy
ee6cc20cbc fix(backend/copilot): address review — parallel fetch, None logging, failure tests
- Use asyncio.gather for parallel graph fetching instead of sequential loop
- Cap graph fetches at 10 to prevent excessive DB calls on broad searches
- Log warning when get_agent_as_json returns None (graph not found)
- Add tests for exception and None return failure paths
2026-04-01 17:59:52 +02:00
Zamil Majdy
eb96b019c5 refactor(backend/copilot): merge create/edit workflows in agent guide 2026-04-01 17:59:52 +02:00
Zamil Majdy
9cf6ac9ad9 feat(backend/copilot): add include_graph option to find_library_agent for agent debugging/editing
The copilot's edit_agent tool required the LLM to provide a complete agent
JSON (nodes + links) without ever seeing the current graph structure — it was
editing blindly. This adds an `include_graph` boolean parameter to the
existing `find_library_agent` tool so the copilot can fetch the full graph
before making modifications.

Also updates the agent generation guide to split creating vs editing
workflows, instructing the LLM to always fetch the current graph first.
2026-04-01 17:59:36 +02:00
Zamil Majdy
d3173605eb test(copilot): add unit tests for P0 guardrails
Tests for _resolve_fallback_model (5 tests), security env vars (4 tests),
and ChatConfig defaults (4 tests). All 13 tests pass.
2026-04-01 17:59:09 +02:00
Zamil Majdy
98c27653f2 fix(copilot): snapshot/restore TranscriptBuilder on transient retry
TranscriptBuilder._entries is independent from session.messages.
Rolling back session.messages alone left duplicate entries in the
uploaded --resume transcript. Now snapshot _entries + _last_uuid
before each attempt and restore them in both rollback paths on failure.
2026-04-01 17:59:09 +02:00
Zamil Majdy
dced534df3 fix(copilot): review round 3 — fix transient error code check, add SDK compat fields
- Fix exc.code check: "transient" -> "transient_api_error" to match
  the actual code set in _run_stream_attempt (line 1343)
- Add fallback_model, max_turns, max_budget_usd, stderr to SDK compat
  test so field renames in the SDK are caught early
2026-04-01 17:59:09 +02:00
Zamil Majdy
4ebe294707 fix(copilot): review round 2 — fix transient retry consuming context-level attempt
Convert for-loop to while-loop so transient retries (continue) replay
the same context-level attempt instead of advancing to the next one.
Previously, `continue` in a `for attempt in range(...)` loop advanced
to the next `attempt`, causing transient retries to wastefully trigger
context reduction and reset the transient retry counter.

Now: transient retries stay at the same attempt (no attempt++), while
context-error retries explicitly increment attempt before continue.
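A minimal sketch of the resulting loop shape; every name here is an
assumption standing in for the actual service code:

```python
class TransientError(Exception): ...
class ContextTooLongError(Exception): ...

MAX_CONTEXT_ATTEMPTS = 3

async def stream_with_retries(context, run_attempt, compress, can_retry_transient):
    attempt = 0
    while attempt < MAX_CONTEXT_ATTEMPTS:
        try:
            await run_attempt(context)
            return
        except TransientError:
            if not can_retry_transient():  # bounded by its own counter
                raise
            continue                       # replay the SAME attempt
        except ContextTooLongError:
            context = compress(context)
            attempt += 1                   # only context errors advance
```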
2026-04-01 17:59:09 +02:00
Zamil Majdy
2e8e115cd1 fix(copilot): review round 1 — fix transient retry count, strip fallback model prefix
- Fix _can_retry_transient off-by-one: >= should be > so max_retries=3
  actually performs 3 retries instead of 2 (see the sketch after this list)
- Move events_yielded check before counter increment to avoid wasting
  a retry slot when events were already sent
- Strip OpenRouter provider prefix from fallback model name (mirrors
  _resolve_sdk_model logic) to prevent model-not-found errors
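A hedged illustration of the off-by-one; the counter name and the
increment-before-check placement are assumptions:

```python
MAX_RETRIES = 3
transient_retries = 0

def can_retry_transient() -> bool:
    global transient_retries
    transient_retries += 1
    # With the old `>=` check, attempts 1 and 2 passed but 3 was
    # refused, so max_retries=3 only ever performed 2 retries.
    return transient_retries <= MAX_RETRIES  # i.e. not (count > max)
```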
2026-04-01 17:59:09 +02:00
Zamil Majdy
5ca49a8ec9 fix(copilot): P0 guardrails — SDK limits, security env vars, transient retry
Based on analysis of the Claude Code CLI internals, adds critical
guardrails rebased on the current dev architecture (env.py extraction):

1. SDK guardrails: fallback_model (auto-retry on 529), max_turns=50
   (runaway prevention), max_budget_usd=5.0 (per-query cost cap)

2. TMPDIR redirect: sets CLAUDE_CODE_TMPDIR to sdk_cwd so CLI output
   is routed into the per-session workspace for isolation/cleanup

3. Security env vars: DISABLE_CLAUDE_MDS, SKIP_PROMPT_HISTORY,
   DISABLE_AUTO_MEMORY, DISABLE_NONESSENTIAL_TRAFFIC

4. Transient error retry: 429/5xx/ECONNRESET errors now retry with
   exponential backoff (1s, 2s, 4s) in both _HandledStreamError and
   generic Exception handlers. Skips retry if events already yielded
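The backoff schedule from item 4, as a one-line sketch:

```python
import asyncio

async def transient_backoff(retry_index: int) -> None:
    await asyncio.sleep(2 ** retry_index)  # retries 0, 1, 2 -> 1s, 2s, 4s
```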
2026-04-01 17:59:09 +02:00
Zamil Majdy
a9db5af0fa fix(tests): mock build_sdk_env to return {} instead of None
The CLAUDE_CODE_TMPDIR assignment requires sdk_env to be a dict,
not None. Fixes TypeError in retry scenario tests.
2026-04-01 17:59:07 +02:00
Zamil Majdy
dcbfcfb158 fix(copilot): review round 3 — add Agent to ToolName Literal for permissions
Add "Agent" to the ToolName Literal and test expected set so permission
filtering does not incorrectly block the Agent tool in permissioned
sessions. Without this, apply_tool_permissions would strip "Agent" from
the allowed_tools list.
2026-04-01 17:59:07 +02:00
Zamil Majdy
723b852ba4 fix(copilot): review round 2 — sanitize all untrusted hook inputs for logging
- Sanitize error message and tool_use_id in post_tool_failure_hook
  to prevent log injection via crafted error strings
- Sanitize trigger field in pre_compact_hook
- Use %-style formatting in failure hook for consistency with other hooks
2026-04-01 17:59:07 +02:00
Zamil Majdy
c7e0f8169a fix(copilot): review round 1 — hoist subagent constant, strip C1 chars, guard tmpdir
- Move _SUBAGENT_TOOLS frozenset to module level to avoid per-session allocation
- Extend _sanitize to strip C1 control characters (U+0080-U+009F) for
  defense against log injection via non-ASCII control sequences (sketched after this list)
- Guard CLAUDE_CODE_TMPDIR assignment with `if sdk_cwd:` for defensive
  consistency (matches PR #12636 approach)
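A minimal sketch of the extended strip; the real helper lives in
security_hooks.py, and the `max_len` default here is an assumption:

```python
import re

_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f\x80-\x9f]")  # C0 + DEL + C1

def _sanitize(value: str, max_len: int = 200) -> str:
    return _CONTROL_CHARS.sub("", value)[:max_len]
```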
2026-04-01 17:59:07 +02:00
Zamil Majdy
ce1555c07a fix(copilot): address review round 2 — transcript path max_len, subagent tests
- SubagentStop: use max_len=500 for transcript path (consistent with
  pre_compact_hook)
- Add test coverage for SubagentStart/SubagentStop hooks including
  control character sanitization
2026-04-01 17:59:07 +02:00
Zamil Majdy
403a36a3fc fix(copilot): address review — robust sanitize, drop redundant None guard
- _sanitize: strip all C0 control chars + DEL, not just \n/\r
- Remove unnecessary `sdk_env is None` guard (build_sdk_env always returns dict)
2026-04-01 17:59:07 +02:00
Zamil Majdy
490643d65a refactor(copilot): hoist _sanitize helper and use it in pre_compact_hook
Move _sanitize() above all hooks so it can be reused. Refactor
pre_compact_hook to use _sanitize(max_len=500) instead of inline
.replace() calls for consistency across all hooks.
2026-04-01 17:59:07 +02:00
Zamil Majdy
2b14ecf5ee fix(copilot): sanitize hook inputs, rename constant, add Agent failure test
- Rename _SUBAGENT_TOOLS to _subagent_tools (frozenset, function-local)
- Extract _sanitize() helper for consistent log injection prevention
  across subagent_start_hook and subagent_stop_hook
- Add test_agent_slot_released_on_failure for coverage parity with
  the existing Task failure test
2026-04-01 17:59:07 +02:00
Zamil Majdy
14d6d66bdc refactor(copilot): use frozenset and extract _sanitize helper in hooks 2026-04-01 17:59:07 +02:00
Zamil Majdy
28443e2e33 fix(copilot): guard against None sdk_env from build_sdk_env
build_sdk_env can return None in test mocks. Guard with fallback
to empty dict before setting CLAUDE_CODE_TMPDIR.
2026-04-01 17:59:07 +02:00
Zamil Majdy
611a20d7df fix(copilot): sanitize transcript path in subagent stop hook
Strip control characters from agent_transcript_path before logging
to prevent log injection, matching the existing pattern in pre_compact_hook.
2026-04-01 17:59:07 +02:00
Zamil Majdy
ce201cd19c fix(copilot): remove HOME override to preserve subscription auth
Sentry correctly flagged that overriding HOME breaks subscription mode
(claude login) — the CLI looks for credentials at $HOME/.claude/.
Keep only CLAUDE_CODE_TMPDIR which fixes the sub-agent output path.
2026-04-01 17:59:07 +02:00
Zamil Majdy
0c76852768 fix(copilot): address self-review nits in security hooks logging 2026-04-01 17:59:07 +02:00
Zamil Majdy
414b8bbaac fix(copilot): recognize Agent tool name and route CLI state into workspace
The Claude Agent SDK CLI renamed the sub-agent tool from "Task" to "Agent"
in v2.x. Our security hooks only checked for "Task", so all sub-agent
security controls were silently bypassed: background execution was unblocked,
concurrency limiting didn't apply, and slot tracking was broken.

Additionally, the CLI writes sub-agent output to /tmp/claude-<uid>/ and
project state to $HOME/.claude/ — both outside the per-session workspace
(/tmp/copilot-<session>/). This caused PermissionError in E2B and silently
lost sub-agent results via failed @@agptfile: expansion.

Changes:
- Handle both "Task" and "Agent" tool names in security hooks
- Add "Agent" to _SDK_BUILTIN_ALWAYS allowed tools list
- Set CLAUDE_CODE_TMPDIR and HOME to sdk_cwd so CLI state lands in workspace
- Register SubagentStart/SubagentStop hooks for lifecycle visibility
- Add 5 new tests for Agent tool name handling and mixed slot sharing
2026-04-01 17:59:07 +02:00
Zamil Majdy
4c85f2399a fix(backend): propagate dry-run mode to special blocks (Orchestrator, AgentExecutor, MCP)
Previously dry-run mode simulated ALL blocks via LLM, but this didn't work
well for OrchestratorBlock, AgentExecutorBlock, and MCPToolBlock:

- OrchestratorBlock & AgentExecutorBlock now execute for real in dry-run
  mode so the orchestrator can make LLM calls and agent executors can
  spawn child graphs. Their downstream tool blocks and child-graph blocks
  are still simulated. Credential fields from node defaults are restored
  since validate_exec wipes them in dry-run mode.

- MCPToolBlock gets a specialised simulate_mcp_block() that builds an
  LLM prompt grounded in the selected tool's name and JSON Schema,
  producing more realistic mock responses than the generic simulator.
2026-04-01 17:58:51 +02:00
Zamil Majdy
db0e5a1b0b style(test): format SQL query block tests with ruff
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
22a5e76af9 fix(test): replace real-looking connection strings with test.invalid hosts
GitHub secret scanner flagged test connection strings as leaked secrets.
Replaced all real-looking IPs, hostnames, and Supabase URLs with
RFC 2606 reserved .invalid domains and RFC 5737 documentation IPs
(198.51.100.x).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
7919da16b4 test(backend): add security-focused tests for SQL query block
Adds 92 test cases covering:
- Single-statement validation (multi-statement injection blocked)
- Read-only enforcement (INSERT/UPDATE/DELETE/DROP rejected)
- Writable CTE detection (WITH...DELETE RETURNING blocked)
- SSRF protection: IPv4 private ranges, IPv6 loopback (::1),
  link-local (fe80::), Unix socket paths
- Error sanitization: passwords scrubbed, usernames scrubbed,
  IP addresses scrubbed from error messages
- Value serialization edge cases (datetime, Decimal, bytes)
- URL validation for all database types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
052f953afb fix(backend): replace f-string interpolation with str(int()) for SET timeout commands
Use explicit str(int(timeout * 1000)) instead of f-string interpolation
for SET statement_timeout / MAX_EXECUTION_TIME / LOCK_TIMEOUT commands.
SET commands don't support bind parameters in most databases, so we use
string concatenation with an int-cast value as defense-in-depth.
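A hedged sketch of the cast-then-concatenate pattern (SQLAlchemy-style
call; the statement name varies per database):

```python
def apply_statement_timeout(conn, timeout_seconds: float) -> None:
    # int() rejects non-numeric input, so only digits reach the SQL text
    timeout_ms = str(int(timeout_seconds * 1000))
    conn.exec_driver_sql("SET statement_timeout = " + timeout_ms)
```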
2026-04-01 17:58:49 +02:00
Zamil Majdy
abd9fbe08a docs(backend): regenerate block docs to fix check-docs-sync 2026-04-01 17:58:49 +02:00
Zamil Majdy
81308af770 fix(backend): fix remaining stale pyodbc comment in MSSQL section 2026-04-01 17:58:49 +02:00
Zamil Majdy
a726c1d1d5 fix(backend): address round 2 review — port validation, comment fix, dead fallback
- Add ge=1, le=65535 port validation to Input schema
- Fix inaccurate comment: pymssql not pyodbc
- Replace _DATABASE_TYPE_DEFAULT_PORT.get() with direct dict access
  (all types have entries after SQLite removal)
- Update default port tests to use port=None instead of port=0
2026-04-01 17:58:49 +02:00
Zamil Majdy
a015bf9e1c fix(backend): address review round — remove SQLite, hide password, cleanup dead code
- Remove DatabaseType.SQLITE from enum (rejected at runtime, confusing UX)
- Remove all SQLite dead code paths (driver map, connect_args, runtime check)
- Change render_as_string(hide_password=False) to hide_password=True to avoid
  materializing plaintext credentials in local variable
- Simplify pinned_host assignment (remove unreachable fallback branch)
- Remove SQLite-related test cases
- Add doc comment to _make_input noting read_only default deviation
2026-04-01 17:58:49 +02:00
Zamil Majdy
d99278a40d fix(backend): update _sanitize_error docstring to mention IPv6 scrubbing 2026-04-01 17:58:49 +02:00
Zamil Majdy
bd7d9a5697 fix(backend): address round 1 review findings for SQL query block
- Fix database name injection: pass URL object to create_engine() instead
  of rendered string to prevent query parameter injection via database name
- Refactor _validate_query_is_read_only to accept parsed Statement object,
  eliminating duplicate sqlparse.parse() call
- Add IPv6 address scrubbing to _sanitize_error
- Fix docs: remove sqlite from valid types, correct host type annotation
2026-04-01 17:58:48 +02:00
Zamil Majdy
9cfa53a2ff fix(backend): document MySQL MAX_EXECUTION_TIME limitation for write queries
Add code comment noting that MySQL's MAX_EXECUTION_TIME only applies to
SELECT statements; write operations rely on the database's wait_timeout.
2026-04-01 17:58:48 +02:00
Zamil Majdy
e6cf899a6d fix(docs): regenerate block docs to sync with code schema
Ran generate_block_docs.py to fix check-docs-sync CI failure.
The Inputs table is auto-generated from the block schema.
2026-04-01 17:58:48 +02:00
Zamil Majdy
b655b30aeb fix: address review findings on SQL query block PR
- Remove unnecessary pool_pre_ping/pool_recycle (engine disposed per-query)
- Fix _extract_keyword_tokens docstring to match implementation
- Move DATABASE enum entry to alphabetical position in ProviderName
- Add database entry to frontend providerIcons map
- Revert no-op string-literal extraction in API key modals
- Revert unused _provider param in getCredentialTypeLabel
2026-04-01 17:58:48 +02:00
Zamil Majdy
5b8daf5d4c fix(docs): correct SQL query block documentation to match code
- Fix "How it works" to say read-only by default (not write-enabled)
- Replace "connection URL" with "discrete host/port/database fields"
- Remove sqlite from database_type options (disabled in code)
- Fix host type from "str (password)" to "str (secret)"
2026-04-01 17:58:48 +02:00
Zamil Majdy
9b74b7bb41 fix(backend): handle single-quoted usernames in SQL error sanitization
MySQL and MSSQL error messages use single quotes around usernames (e.g.
"Access denied for user 'myuser'@'host'"), but _sanitize_error only
handled double-quoted usernames. This could leak usernames to the LLM.

Now handles both quote styles in the regex and bare replacement.
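An assumed shape for the broadened pattern; the real `_sanitize_error`
scrubs far more than usernames:

```python
import re

# matches both user 'myuser' and user "myuser"
_USER_RE = re.compile(r"""user ['"]([^'"]+)['"]""", re.IGNORECASE)

def scrub_username(message: str) -> str:
    return _USER_RE.sub("user '<redacted>'", message)

# "Access denied for user 'myuser'@'host'" -> "... user '<redacted>'@'host'"
```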
2026-04-01 17:58:48 +02:00
Zamil Majdy
a1578984cc fix(backend): update poetry.lock and regenerate block docs
- Run `poetry lock` to include pymssql and pymysql in the lock file
- Regenerate block docs to reflect the Optional[int] port field change
2026-04-01 17:58:48 +02:00
Zamil Majdy
c0869e9168 fix(backend): fix port default for MySQL/MSSQL and add missing DB drivers
- Make port field Optional[int] with default=None so the `or` fallback
  correctly picks the database-specific default port (3306 for MySQL,
  1433 for MSSQL) instead of always using 5432 (see the sketch after this list)
- Add pymysql and pymssql dependencies and update driver names to
  mysql+pymysql and mssql+pymssql so SQLAlchemy can connect to these DBs
- Handle ModuleNotFoundError gracefully if a driver is unavailable
- Update pymssql connect_args to use login_timeout (pymssql API)
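A sketch of the fallback this enables; the dict contents come from the
message, the surrounding names are assumed:

```python
from typing import Optional

_DATABASE_TYPE_DEFAULT_PORT = {"postgres": 5432, "mysql": 3306, "mssql": 1433}

def resolve_port(port: Optional[int], database_type: str) -> int:
    # With default=None the `or` fires for MySQL/MSSQL; the previous
    # default of 5432 was truthy, so the fallback was never reached.
    return port or _DATABASE_TYPE_DEFAULT_PORT[database_type]
```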
2026-04-01 17:58:48 +02:00
Zamil Majdy
0db5a6ff9a docs: regenerate block docs after SQL block Input schema changes 2026-04-01 17:58:48 +02:00
Zamil Majdy
3664624445 fix(backend): improve SQL block Input schema UX
- Mark host field as secret=True so it renders as masked text in the UI
- Change default port from 0 to 5432 (PostgreSQL default) to avoid a confusing "0"
- Set database_type to advanced=False so it shows as a prominent dropdown
- Reorder fields: database_type -> host -> port -> database -> query -> read_only
- Improve descriptions and placeholders for clarity
2026-04-01 17:58:48 +02:00
Zamil Majdy
f1e2ce0703 fix(backend): add MSSQL timeout enforcement and document read-only gap
Address review feedback: add SET LOCK_TIMEOUT for MSSQL connections to
enforce query timeout at the database level, consistent with the
PostgreSQL/MySQL implementations. Document that MSSQL lacks a
session-level read-only mode, with defense-in-depth handled by the SQL
validation layer and ROLLBACK in the finally block.
2026-04-01 17:58:48 +02:00
Zamil Majdy
c226cf0925 fix(backend): address Sentry review comments on SQL query block
- Use database_type enum instead of substring-checking connection string
  to determine driver-specific connect_args (fixes false match when db
  name/user/password contains "mssql" or "sqlite")
- Use BEGIN TRANSACTION for MSSQL instead of BEGIN (T-SQL syntax)
- Extend port sanitization regex to also match :port format (host:5432)
- Add test for colon-format port sanitization
2026-04-01 17:58:48 +02:00
Zamil Majdy
dade634b4a fix(backend): use plain string for host in test_input to fix JSON schema validation
The test_input host value should be a plain string (Pydantic coerces it
to SecretStr), not a SecretStr object which serializes as '**********'
and fails JSON schema validation in the block test framework.
2026-04-01 17:58:48 +02:00
Zamil Majdy
34101c4389 docs: regenerate block docs after host field type change to SecretStr 2026-04-01 17:58:48 +02:00
Zamil Majdy
2218254c8a fix(backend): make SQL block host a SecretStr and harden error sanitization
- Change `host` field from `str` to `SecretStr` so it is hidden from
  repr/logs but still stored in graph JSON (sketched after this list).
- Expand `_sanitize_error` to strip hostnames, IP addresses, usernames,
  and port numbers from error messages exposed to the LLM.
- Add tests for hostname/IP/username/port scrubbing and an integration
  test verifying no infrastructure details leak through run() errors.
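A minimal Pydantic sketch of the field change (the real Input schema has
many more fields):

```python
from pydantic import BaseModel, SecretStr

class Input(BaseModel):
    host: SecretStr  # masked in repr/logs; raw value via .get_secret_value()
    port: int = 5432

inp = Input(host="db.internal.example")  # plain str is coerced to SecretStr
print(inp)                               # host=SecretStr('**********') port=5432
```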
2026-04-01 17:58:48 +02:00
Zamil Majdy
4d63cffa7a docs: regenerate block docs after SQL query block port field change 2026-04-01 17:58:48 +02:00
Zamil Majdy
ebf3b920d8 fix(backend): address 3 review findings in SQL query block
1. Fix DNS rebinding TOCTOU: pin connection to resolved IP from
   check_host_allowed instead of re-resolving the hostname, preventing
   SSRF via DNS rebinding attacks (sketched after this list).

2. Fix default port per database type: use _DATABASE_TYPE_DEFAULT_PORT
   lookup instead of hard-coded 5432, so MySQL (3306) and MSSQL (1433)
   work without manually specifying the port.

3. Fix MSSQL connect_timeout: use pyodbc's "timeout" key instead of
   "connect_timeout" which is silently ignored, preventing indefinite
   hangs on unreachable MSSQL servers.
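A hedged sketch of fix 1; `check_host_allowed` returning the resolved IP
and the SQLAlchemy `URL.set()` call are assumptions consistent with the
text above:

```python
def pin_resolved_host(url, check_host_allowed):
    # Resolve + blocklist-check exactly once, then connect to that IP so
    # a second DNS lookup can't be rebound to a private address.
    pinned_ip = check_host_allowed(url.host)
    return url.set(host=pinned_ip)
```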
2026-04-01 17:58:48 +02:00
Zamil Majdy
9bd579b041 fix(platform): clean up Pyright warnings, fix comment-only query test, sync docs
- Use _DATABASE_TYPE_DEFAULT_PORT for port fallback in SQLQueryBlock.run()
- Rename **kwargs to **_kwargs and *args to *_args to silence not-accessed warnings
- Fix _validate_single_statement to reject comment-only queries as empty
- Fix test_comment_only_query to assert specific error message
- Prefix unused `provider` param with _ in getCredentialTypeLabel (frontend lint)
- Regenerate block docs to fix check-docs-sync CI
2026-04-01 17:58:48 +02:00
Zamil Majdy
41601cbb5c fix(platform): switch SQL block to user_password credentials to fix special char passwords
The SQL block previously used api_key credential type, stuffing the entire
connection URL (including password) into one field. This broke when passwords
contained special characters (@, #, !) that conflict with URL syntax.

Switch to user_password credential type with separate username/password fields.
Build the SQLAlchemy URL internally via URL.create() which accepts raw passwords
without URL encoding. Also restore accidentally deleted _validate_query_is_read_only
function, remove unused _encode_password_in_url/quote/unquote imports, and clean up
database-specific UI overrides in the frontend credential modals.
2026-04-01 17:58:48 +02:00
Zamil Majdy
c636b6f310 test(backend): add integration tests for SQLQueryBlock SSRF, SQLite, and error handling
Add run()-level tests covering SSRF private IP rejection (127.0.0.1,
10.x, 172.16.x, 192.168.x), Unix socket blocking, missing hostname
rejection, SQLite disabled error, credential sanitization on connection
failure, query timeout clean error, URL type mismatch rejection, happy
path, and EXECUTE keyword rejection. Also adds time serialization test.
2026-04-01 17:58:48 +02:00
Zamil Majdy
292be77b86 fix(platform): show "Connection URL" instead of "API Key" for database credentials
The SQL query block's credential dialog was misleadingly labeled since a
database connection URL is not an API key. This updates both backend and
frontend:

- Shorten the DatabaseCredentialsField description so it no longer
  truncates in the UI
- Make credential labels provider-aware so the database provider shows
  "Connection URL" instead of "API Key" in tab labels, input fields,
  placeholders, and action buttons
2026-04-01 17:58:48 +02:00
Zamil Majdy
dd3349e6bc fix(backend): block SELECT INTO, disable SQLite, fix read-only transaction ordering
- Add INTO, OUTFILE, DUMPFILE to disallowed SQL keywords to prevent
  SELECT...INTO table creation and file writes
- Disable SQLite database type (lacks path sandboxing and read-only
  enforcement) until proper restrictions are implemented
- Fix read-only transaction enforcement: use AUTOCOMMIT to issue SET
  commands, then open explicit BEGIN/ROLLBACK transaction for the user
  query so read-only constraints apply to it (not the next transaction); see the sketch after this list
- Add regression tests for SELECT INTO variants
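A hedged sketch of the ordering fix (PostgreSQL flavour; the exact SET
commands differ per database):

```python
def run_read_only(engine, query: str):
    conn = engine.connect().execution_options(isolation_level="AUTOCOMMIT")
    try:
        # session-level setting, applied immediately under AUTOCOMMIT
        conn.exec_driver_sql("SET default_transaction_read_only = on")
        conn.exec_driver_sql("BEGIN")  # explicit txn picks up the setting
        try:
            return conn.exec_driver_sql(query).fetchall()
        finally:
            conn.exec_driver_sql("ROLLBACK")  # never commit user queries
    finally:
        conn.close()
```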
2026-04-01 17:58:48 +02:00
Zamil Majdy
bfdf4b99db fix(backend): make SSRF host check mockable for block test framework
Extract resolve_and_check_blocked into a check_host_allowed method on
SQLQueryBlock so the block test framework can mock it alongside
execute_query. Without this, test credentials pointing to localhost
trigger the SSRF blocklist in CI.
2026-04-01 17:58:48 +02:00
Zamil Majdy
aba78b0fdd refactor(backend): replace psycopg2 with SQLAlchemy for multi-database support
Refactor SQLQueryBlock to use SQLAlchemy instead of psycopg2, enabling
support for PostgreSQL, MySQL, SQLite, and MSSQL. Add a database_type
enum field to Input for selecting the target database. Connection
credentials now accept any SQLAlchemy connection URL format.

- Replace psycopg2 with sqlalchemy.create_engine + connection.execute(text())
- Add DatabaseType enum (postgres, mysql, sqlite, mssql)
- Add _validate_connection_url to ensure URL matches selected db type
- Rename ProviderName.POSTGRES to ProviderName.DATABASE
- Update SSRF protection to use SQLAlchemy URL parsing (make_url)
- Add urlparse import for SQLite network connection check
- Handle bytes serialization alongside memoryview
- Update tests with TestValidateConnectionUrl class and bytes test
- Update docs to reflect multi-database support
2026-04-01 17:58:48 +02:00
Zamil Majdy
12934dfd72 docs: regenerate block documentation for SQLQueryBlock 2026-04-01 17:58:48 +02:00
Zamil Majdy
c5507415fd fix(backend): harden SQL query block against injection, SSRF bypass, and precision loss
- Replace regex-based SQL validation with sqlparse tokenizer to prevent
  multi-statement injection via quoted comment bypass (e.g. SET LOCAL
  statement_timeout = 0). Keywords in string literals no longer cause
  false positives (see the sketch after this list).
- Replace urlparse with psycopg2.extensions.parse_dsn for SSRF protection,
  handling both URI and libpq DSN formats. Reject missing hostname and
  Unix socket paths.
- Use a server-side named cursor to enforce max_rows at the database level
  instead of fetching the entire result set into client memory.
- Serialize fractional Decimal values as str instead of float to preserve
  exact precision for analytics data.
- Add sqlparse dependency.
- Add tests for multi-statement injection, string literal keywords, and
  high-precision Decimal serialization.
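A minimal sketch of the tokenizer-based single-statement check; the real
validation also walks tokens for disallowed keywords:

```python
import sqlparse

def validate_single_statement(query: str) -> sqlparse.sql.Statement:
    statements = [s for s in sqlparse.parse(query) if str(s).strip()]
    if len(statements) != 1:  # blocks "SELECT 1; DROP TABLE users"
        raise ValueError("exactly one SQL statement is allowed")
    return statements[0]
```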
2026-04-01 17:58:48 +02:00
Zamil Majdy
7ff096afd9 style(backend): extract sanitize_error to local vars for readability 2026-04-01 17:58:48 +02:00
Zamil Majdy
38fb504063 fix(backend): reduce keyword false positives, broaden SSRF handling, add tests
- Remove ambiguous keywords (COMMENT, ANALYZE, LOCK, CLUSTER, REINDEX,
  VACUUM) from disallowed list — they're harmless on readonly connections
  and cause false positives on common column names
- Add NOTE documenting intentional string-literal matching behavior
- Broaden SSRF exception handling to catch OSError (DNS failures)
- Add _serialize_value tests (Decimal, datetime, date, memoryview)
- Add tests for column names that look like keywords
2026-04-01 17:58:48 +02:00
Zamil Majdy
b4388a9c93 fix(backend): address PR review - security, async, SSRF, tests
- Add _sanitize_error() to scrub connection strings from error messages
- Wrap execute_query in asyncio.to_thread() to avoid blocking event loop
- Add SSRF protection via resolve_and_check_blocked() on database host
- Document intentional string-literal false positives in comment stripping
- Add sql_query_block_test.py with 36 tests for query validation and
  error sanitization
2026-04-01 17:58:48 +02:00
Zamil Majdy
a7a68e585a feat(backend): add SQL query block for CoPilot analytics access
Add a new SQLQueryBlock that allows CoPilot and user-built agents to
execute read-only SQL queries against PostgreSQL databases. This enables
data-driven answers for analytics (user metrics, retention, onboarding
funnels, execution stats) via the existing run_block tool.

- New POSTGRES provider in ProviderName enum
- APIKeyCredentials with connection string for MVP credential storage
- SELECT-only query validation with defense-in-depth keyword blocking
- Configurable query timeout (max 120s) and row limit (max 10000)
- Read-only connection mode + statement_timeout for safety
- JSON-safe serialization for Decimal, datetime, and binary types

Resolves: SECRT-2171
2026-04-01 17:58:48 +02:00
Zamil Majdy
14ad37b0c7 fix: resolve merge conflict in transcript.py re-export module 2026-04-01 17:53:57 +02:00
Zamil Majdy
24d0c35ed3 fix(backend/copilot): prompt-too-long retry, compaction churn, model-aware compression, and truncated tool call recovery (#12625)
## Why

CoPilot has several context management issues that degrade long
sessions:
1. "Prompt is too long" errors crash the session instead of triggering
retry/compaction
2. Stale thinking blocks bloat transcripts, causing unnecessary
compaction every turn
3. Compression target is hardcoded regardless of model context window
size
4. Truncated tool calls (empty `{}` args from max_tokens) kill the
session instead of guiding the model to self-correct

## What

**Fix 1: Prompt-too-long retry bypass (SENTRY-1207)**
The SDK surfaces "prompt too long" via `AssistantMessage.error` and
`ResultMessage.result` — neither triggered the retry/compaction loop
(only Python exceptions did). Now both paths are intercepted and
re-raised.

**Fix 2: Strip stale thinking blocks before upload**
Thinking/redacted_thinking blocks in non-last assistant entries are
10-50K tokens each but only needed for API signature verification in the
*last* message. Stripping before upload reduces transcript size and
prevents per-turn compaction.

**Fix 3: Model-aware compression target**
`compress_context()` now computes `target_tokens` from the model's
context window (e.g. 140K for Opus 200K) instead of a hardcoded 120K
default. Larger models retain more history; smaller models compress more
aggressively.

**Fix 4: Self-correcting truncated tool calls**
When the model's response exceeds max_tokens, tool call inputs get
silently truncated to `{}`. Previously this tripped a circuit breaker
after 3 attempts. Now the MCP wrapper detects empty args and returns
guidance: "write in chunks with `cat >>`, pass via
`@@agptfile:filename`". The model can self-correct instead of the
session dying.

## How

- **service.py**: `_is_prompt_too_long` checks in both
`AssistantMessage.error` and `ResultMessage` error handlers. Circuit
breaker limit raised from 3→5.
- **transcript.py**: `strip_stale_thinking_blocks()` reverse-scans for
last assistant `message.id`, strips thinking blocks from all others.
Called in `upload_transcript()`.
- **prompt.py**: `get_compression_target(model)` computes
`context_window - 60K overhead` (sketched after this list).
`compress_context()` uses it when `target_tokens` is None.
- **tool_adapter.py**: `_truncating` wrapper intercepts empty args on
tools with required params, returns actionable guidance instead of
failing.
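A hedged sketch of the prompt.py change referenced in the list above;
the window table and the 120K fallback are assumptions:

```python
_COMPRESSION_OVERHEAD = 60_000
_FALLBACK_TARGET = 120_000
_CONTEXT_WINDOWS = {"claude-opus-4": 200_000}  # illustrative entry

def get_compression_target(model: str) -> int:
    window = _CONTEXT_WINDOWS.get(model)
    if window is None:
        return _FALLBACK_TARGET
    return window - _COMPRESSION_OVERHEAD  # e.g. 140K for a 200K window
```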

## Related

- Fixes SENTRY-1207
- Sessions: `d2f7cba3` (repeated compaction), `08b807d4` (prompt too
long), `130d527c` (truncated tool calls)
- Extends #12413, consolidates #12626

## Test plan

- [x] 6 unit tests for `strip_stale_thinking_blocks`
- [x] 1 integration test for ResultMessage prompt-too-long → compaction
retry
- [x] Pyright clean (0 errors), all pre-commit hooks pass
- [ ] E2E: Load transcripts from affected sessions and verify behavior
2026-04-01 15:10:57 +00:00
Zamil Majdy
389cd28879 test: add round 3 E2E screenshots for PR #12623 2026-04-01 17:01:10 +02:00
Zamil Majdy
656858eba1 test: add E2E screenshots for PR #12581 round 3 2026-04-01 16:58:11 +02:00
Zamil Majdy
8aae7751dc fix(backend/copilot): prevent duplicate block execution from pre-launch arg mismatch (#12632)
## Why

CoPilot sessions are duplicating Linear tickets and GitHub PRs.
Investigation of 5 production sessions (March 31st) found that 3/5
created duplicate Linear issues — each with consecutive IDs at the exact
same timestamp, but only one visible in Langfuse traces.

Production gcloud logs confirm: **279 arg mismatch warnings per day**,
**37 duplicate block execution pairs**, and all LinearCreateIssueBlock
failures in pairs.

Related: SECRT-2204

## What

Replace the speculative pre-launch mechanism with the SDK's native
parallel dispatch via `readOnlyHint` tool annotations. Remove ~580 lines
of pre-launch infrastructure code.

## How

### Root cause
The pre-launch mechanism had three compounding bugs:
1. **Arg mismatch**: The SDK CLI normalises args between the
`AssistantMessage` (used for pre-launch) and the MCP `tools/call`
dispatch, causing frequent mismatches (279/day in prod)
2. **FIFO desync on denial**: Security hooks can deny tool calls,
causing the CLI to skip the MCP dispatch — but the pre-launched task
stays in the FIFO queue, misaligning all subsequent matches
3. **Cancel race**: `task.cancel()` is best-effort in asyncio — if the
HTTP call to Linear/GitHub already completed, the side effect is
irreversible

### Fix
- **Removed** `pre_launch_tool_call()`, `cancel_pending_tool_tasks()`,
`_tool_task_queues` ContextVar, all FIFO queue logic, and all 4
`cancel_pending_tool_tasks()` calls in `service.py`
- **Added** `readOnlyHint=True` annotations on 15+ read-only tools
(`find_block`, `search_docs`, `list_workspace_files`, etc.) — the SDK
CLI natively dispatches these in parallel ([ref:
anthropics/claude-code#14353](https://github.com/anthropics/claude-code/issues/14353)); a sketch follows this list
- Side-effect tools (`run_block`, `bash_exec`, `create_agent`, etc.)
have no annotation → CLI runs them sequentially → no duplicate execution
risk
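A hedged sketch of what such an annotation can look like on an MCP tool
definition (the registration API itself is not shown in this PR):

```python
find_block_tool = {
    "name": "find_block",
    "description": "Search available blocks",
    "annotations": {"readOnlyHint": True},  # CLI may dispatch in parallel
}
# side-effect tools like run_block simply omit the annotation
```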

### Net change: -578 lines, +105 lines
2026-04-01 13:42:54 +00:00
An Vy Le
725da7e887 dx(backend/copilot): clarify ambiguous agent goals using find_block before generation (#12601)
### Why / What / How

**Why:** When a user asks CoPilot to build an agent with an ambiguous
goal (output format, delivery channel, data source, or trigger
unspecified), the agent generator previously made assumptions and jumped
straight into JSON generation. This produced agents that didn't match
what the user actually wanted, requiring multiple correction cycles.

**What:** Adds a "Clarifying Before Building" section to the agent
generation guide. When the goal is ambiguous, CoPilot first calls
`find_block` to discover what the platform actually supports for the
ambiguous dimension, then asks the user one concrete question grounded
in real platform options (e.g. "The platform supports Gmail, Slack, and
Google Docs — which should the agent use for delivery?"). Only after the
user answers does the full agent generation workflow proceed.

**How:** The clarification instruction is added to
`agent_generation_guide.md` — the guide loaded on-demand via
`get_agent_building_guide` when the LLM is about to build an agent. This
avoids polluting the system prompt supplement (which loads for every
CoPilot conversation, not just agent building). No dedicated tool is
needed — the LLM asks naturally in conversation text after discovering
real platform options via `find_block`.

### Changes 🏗️

- `backend/copilot/sdk/agent_generation_guide.md`: Adds "Clarifying
Before Building" section before the workflow steps. Instructs the model
to call `find_block` for the ambiguous dimension, ask the user one
grounded question, wait for the answer, then proceed to generation.
- `backend/copilot/prompting_test.py`: New test file verifying the guide
contains the clarification section and references `find_block`.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Ask CoPilot to "build an agent to send a report" (ambiguous
output) — verify it calls `find_block` for delivery options and asks one
grounded question before generating JSON
- [ ] Ask CoPilot to "build an agent to scrape prices from Amazon and
email me daily" (specific goal) — verify it skips clarification and
proceeds directly to agent generation
- [ ] Verify the clarification question lists real block options (e.g.
Gmail, Slack, Google Docs) rather than abstract options

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-01 13:32:12 +00:00
seer-by-sentry[bot]
bd9e9ec614 fix(frontend): remove LaunchDarkly local storage bootstrapping (#12606)
### Why / What / How

This PR fixes
[BUILDER-7HD](https://sentry.io/organizations/significant-gravitas/issues/7374387984/).
The issue: the LaunchDarkly SDK failed to construct its streaming URL
due to a non-string `_url` read from malformed `localStorage` bootstrap
data.
Removed the `bootstrap: "localStorage"` option from the LaunchDarkly
provider configuration.
This change ensures that LaunchDarkly no longer attempts to load initial
feature flag values from local storage. Flag values will now always be
fetched directly from the LaunchDarkly service, preventing potential
issues with stale local storage data.

### Changes 🏗️

- Removed the `bootstrap: "localStorage"` option from the LaunchDarkly
provider configuration.
- LaunchDarkly will now always fetch flag values directly from its
service, bypassing local storage.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Verify that LaunchDarkly flags are loaded correctly without
issues.
- [ ] Ensure no errors related to `localStorage` or streaming URL
construction appear in the console.

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
2026-04-01 19:12:54 +07:00
Nicholas Tindle
88589764b5 dx(platform): normalize agent instructions for Claude and Codex (#12592)
### Why / What / How

Why: repo guidance was split between Claude-specific `CLAUDE.md` files
and Codex-specific `AGENTS.md` files, which duplicated instruction
content and made the same repository behave differently across agents.
The repo also had Claude skills under `.claude/skills` but no
Codex-visible repo skill path.

What: this PR bridges the repo's Claude skills into Codex and normalizes
shared instruction files so `AGENTS.md` becomes the canonical source
while each `CLAUDE.md` imports its sibling `AGENTS.md`.

How: add a repo-local `.agents/skills` symlink pointing to
`../.claude/skills`; move nested `CLAUDE.md` content into sibling
`AGENTS.md` files; replace each repo `CLAUDE.md` with a one-line
`@AGENTS.md` shim so Claude and Codex read the same scoped guidance
without duplicating text. The root `CLAUDE.md` now imports the root
`AGENTS.md` rather than symlinking to it.

Note: the instruction-file normalization commit was created with
`--no-verify` because the repo's frontend pre-commit `tsc` hook
currently fails on unrelated existing errors, largely missing
`autogpt_platform/frontend/src/app/api/__generated__/*` modules.

### Changes 🏗️

- Add `.agents/skills` as a repo-local symlink to `../.claude/skills` so
Codex discovers the existing Claude repo skills.
- Add a real root `CLAUDE.md` shim that imports the canonical root
`AGENTS.md`.
- Promote nested scoped instruction content into sibling `AGENTS.md`
files under `autogpt_platform/`, `autogpt_platform/backend/`,
`autogpt_platform/frontend/`, `autogpt_platform/frontend/src/tests/`,
and `docs/`.
- Replace the corresponding nested `CLAUDE.md` files with one-line
`@AGENTS.md` shims.
- Preserve the existing scoped instruction hierarchy while making the
shared content cross-compatible between Claude and Codex.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `.agents/skills` resolves to `../.claude/skills`
  - [x] Verified each repo `CLAUDE.md` now contains only `@AGENTS.md`
- [x] Verified the expected `AGENTS.md` files exist at the root and
nested scoped directories
- [x] Verified the branch contains only the intended agent-guidance
commits relative to `dev` and the working tree is clean

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No runtime configuration changes are included in this PR.

2026-04-01 09:08:51 +00:00
Zamil Majdy
f0a3afda7d Add test screenshots for PR #12623 2026-04-01 08:49:33 +02:00
Zamil Majdy
a9cbb3ee2f test: add screenshots from PR #12581 round 2 testing 2026-04-01 08:47:38 +02:00
Zamil Majdy
1810452920 fix(frontend): use type-safe any cast for createSessionMutation call
The generated mutation type differs between local (void) and CI
(requires CreateSessionRequest) due to export-api-schema regeneration.
Use an explicit any cast to handle both generated type variants.
2026-04-01 08:46:17 +02:00
Zamil Majdy
4f6f3ca240 fix(frontend): remove redundant tier fetch and add empty-query guard
The backend get_user_rate_limit endpoint already returns tier in the
response — remove the separate fetchTier() calls that were duplicating
the request. Also guard search_users against empty queries to prevent
returning the entire user table. Fix pre-existing TS error in
useChatSession where createSessionMutation was called with an argument
the generated client no longer expects.
2026-04-01 08:13:15 +02:00
Zamil Majdy
9ffecbac02 fix(backend/copilot): add missing mode param to enqueue_copilot_turn docstring 2026-04-01 08:03:35 +02:00
Zamil Majdy
eb22cf4483 fix(frontend): remove duplicate JSDoc and simplify tier access in rate-limit admin UI 2026-04-01 06:33:52 +02:00
Zamil Majdy
16636b64c6 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 06:15:37 +02:00
Zamil Majdy
c2709fbc28 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-01 06:14:49 +02:00
Zamil Majdy
c659f3b058 fix(copilot): fix dry-run simulation showing INCOMPLETE/error status (#12580)
## Summary
- **Backend**: Strip empty `error` pins from dry-run simulation outputs
that the simulator always includes (set to `""` meaning "no error").
This was causing the LLM to misinterpret successful simulations as
failures and report "INCOMPLETE" status to users
- **Backend**: Add explicit "Status: COMPLETED" to dry-run response
message to prevent LLM misinterpretation
- **Backend**: Update simulation prompt to exclude `error` from the
"MUST include" keys list, and instruct LLM to omit error unless
simulating a logical failure
- **Frontend**: Fix `isRunBlockErrorOutput()` type guard that was too
broad (`"error" in output` matched BlockOutputResponse objects, not just
ErrorResponse), causing dry-run results to be displayed as errors
- **Frontend**: Fix `parseOutput()` fallback matching to not classify
BlockOutputResponse as ErrorResponse
- **Frontend**: Filter out empty error pins from `BlockOutputCard`
display and accordion metadata output key counting
- **Frontend**: Clear stale execution results before dry-run/no-input
runs so the UI shows fresh output
- **Frontend**: Fix first-click simulate race condition by invalidating
execution details query after WebSocket subscription confirms

## Test plan
- [x] All 12 existing + 5 new dry-run tests pass (`poetry run pytest
backend/copilot/tools/test_dry_run.py -x -v`)
- [x] All 23 helpers tests pass (`poetry run pytest
backend/copilot/tools/helpers_test.py -x -v`)
- [x] All 13 run_block tests pass (`poetry run pytest
backend/copilot/tools/run_block_test.py -x -v`)
- [x] Backend linting passes (ruff check + format)
- [x] Frontend linting passes (next lint)
- [ ] Manual: trigger dry-run on a block with error output pin (e.g.
Komodo Image Generator) — should show "Simulated" status with clean
output, no misleading "error" section
- [ ] Manual: first click on Simulate button should immediately show
results (no race condition)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-31 21:03:00 +00:00
Zamil Majdy
80581a8364 fix(copilot): add tool call circuit breakers and intermediate persistence (#12604)
## Why

CoPilot session `d2f7cba3` took **82 minutes** and cost **$20.66** for a
single user message. Root causes:
1. Redis session meta key expired after 1h, making the session invisible
to the resume endpoint, causing an empty page on reload
2. Redis stream key also expired during sub-agent gaps (task_progress
events produced no chunks)
3. No intermediate persistence — session messages only saved to DB after
the entire turn completes
4. Sub-agents retried similar WebSearch queries (addressed via prompt
guidance)

## What

### Redis TTL fixes (root cause of empty session on reload)
- `publish_chunk()` now periodically refreshes **both** the session meta
key AND stream key TTL (every 60s); a sketch follows this list.
- `task_progress` SDK events now emit `StreamHeartbeat` chunks, ensuring
`publish_chunk` is called even during long sub-agent gaps where no real
chunks are produced.
- Without this fix, turns exceeding the 1h `stream_ttl` lose their
"running" status and stream data, making `get_active_session()` return
False.
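A hedged sketch of the 60s refresh cadence; the key names and the redis
client shape are assumptions:

```python
import time

_last_ttl_refresh: dict[str, float] = {}  # module-level keepalive tracker

async def maybe_refresh_ttls(redis, session_id: str, ttl_seconds: int) -> None:
    now = time.monotonic()
    if now - _last_ttl_refresh.get(session_id, 0.0) < 60:
        return                                 # at most once per minute
    _last_ttl_refresh[session_id] = now
    await redis.expire(f"meta:{session_id}", ttl_seconds)    # session meta key
    await redis.expire(f"stream:{session_id}", ttl_seconds)  # stream key
```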

### Intermediate DB persistence
- Session messages flushed to DB every **30 seconds** or **10 new
messages** during the stream loop.
- Uses `asyncio.shield(upsert_chat_session())` matching the existing
`finally` block pattern.

### Orphaned message cleanup on rollback
- On stream attempt rollback, orphaned messages persisted by
intermediate flushes are now cleaned up from the DB via
`delete_messages_from_sequence`.
- Prevents stale messages from resurfacing on page reload after a failed
retry.

### Prompt guidance
- Added web search best practices to code supplement (search efficiency,
sub-agent scope separation).

### Approach: root cause fixes, not capability limits
- **No tool call caps** — artificial limits on WebSearch or total tool
calls would reduce autopilot capability without addressing why searches
were redundant.
- **Task tool remains enabled** — sub-agent delegation via Task is a
core capability. The existing `max_subtasks` concurrency guard is
sufficient.
- The real fixes (TTL refresh, persistence, prompt guidance) address the
underlying bugs and behavioral issues.

## How

### Files changed
- `stream_registry.py` — Redis meta + stream key TTL refresh in
`publish_chunk()`, module-level keepalive tracker
- `response_adapter.py` — `task_progress` SystemMessage →
StreamHeartbeat emission
- `service.py` — Intermediate DB persistence in `_run_stream_attempt`
stream loop, orphan cleanup on rollback
- `db.py` — `delete_messages_from_sequence` for rollback cleanup
- `prompting.py` — Web search best practices

### GCP log evidence
```
# Meta key expired during 82-min turn:
09:49 — GET_SESSION: active_session=False, msg_count=1  ← meta gone
10:18 — Session persisted in finally with 189 messages   ← turn completed

# T13 (1h45min) same bug reproduced live:
16:20 — task_progress events still arriving, but active_session=False

# Actual cost:
Turn usage: cache_read=347916, cache_create=212472, output=12375, cost_usd=20.66
```

### Test plan
- [x] task_progress emits StreamHeartbeat
- [x] Task background blocked, foreground allowed, slot release on
completion/failure
- [x] CI green (lint, type-check, tests, e2e, CodeQL)

---------

Co-authored-by: Zamil Majdy <majdy.zamil@gmail.com>
2026-03-31 21:01:56 +00:00
lif
3c046eb291 fix(frontend): show all agent outputs instead of only the last one (#12504)
Fixes #9175

### Changes 🏗️

The Agent Outputs panel only displayed the last execution result per
output node, discarding all prior outputs during a run.

**Root cause:** In `AgentOutputs.tsx`, the `outputs` useMemo extracted
only the last element from `nodeExecutionResults`:
```tsx
const latestResult = executionResults[executionResults.length - 1];
```

**Fix:** Changed `.map()` to `.flatMap()` over output nodes, iterating
through all `executionResults` for each node. Each execution result now
gets its own renderer lookup and metadata entry, so the panel shows
every output produced during the run.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified TypeScript compiles without errors
- [x] Confirmed the flatMap logic correctly iterates all execution
results
  - [x] Verified existing filter for null renderers is preserved
- [x] Run an agent with multiple outputs and confirm all show in the
panel

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 20:31:12 +00:00
Zamil Majdy
3adbaacc0e Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-03-31 19:07:34 +02:00
Zamil Majdy
4da3535a9c Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-31 19:07:23 +02:00
Zamil Majdy
3e25488b2d feat(copilot): add session-level dry_run flag to autopilot sessions (#12582)
## Summary
- Adds a session-level `dry_run` flag that forces ALL tool calls
(`run_block`, `run_agent`) in a copilot/autopilot session to use dry-run
simulation mode
- Stores the flag in a typed `ChatSessionMetadata` JSON model on the
`ChatSession` DB row, accessed via a `session.dry_run` property (sketched below)
- Adds `dry_run` to the AutoPilot block Input schema so graph builders
can create dry-run autopilot nodes
- Refactors multiple copilot tools from `**kwargs` to explicit
parameters for type safety
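A minimal sketch of the typed metadata model and the property accessor,
reduced to the one flag named above:

```python
from pydantic import BaseModel, Field

class ChatSessionMetadata(BaseModel):
    dry_run: bool = False

class ChatSession(BaseModel):
    metadata: ChatSessionMetadata = Field(default_factory=ChatSessionMetadata)

    @property
    def dry_run(self) -> bool:
        return self.metadata.dry_run
```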

## Changes
- **Prisma schema**: Added `metadata` JSON column to `ChatSession` model
with migration
- **Python models**: Added `ChatSessionMetadata` model with `dry_run`
field, added `metadata` field to `ChatSessionInfo` and `ChatSession`,
updated `from_db()`, `new()`, and `create_chat_session()`
- **Session propagation**: `set_execution_context(user_id, session)`
called from `baseline/service.py` so tool handlers can read
session-level flags via `session.dry_run`
- **Tool enforcement**: `run_block` and `run_agent` check
`session.dry_run` and force `dry_run=True` when set; `run_agent` blocks
scheduling in dry-run sessions
- **AutoPilot block**: Added `dry_run` input field, passes it when
creating sessions
- **Chat API**: Added `CreateSessionRequest` model with `dry_run` field
to `POST /sessions` endpoint; added `metadata` to session responses
- **Frontend**: Updated `useChatSession.ts` to pass body to the create
session mutation
- **Tool refactoring**: Multiple copilot tools refactored from
`**kwargs` to explicit named parameters (agent_browser, manage_folders,
workspace_files, connect_integration, agent_output, bash_exec, etc.) for
better type safety

## Test plan
- [x] Unit tests for `ChatSession.new()` with dry_run parameter
- [x] Unit tests for `RunBlockTool` session dry_run override
- [x] Unit tests for `RunAgentTool` session dry_run override
- [x] Unit tests for session dry_run blocks scheduling
- [x] Existing dry_run tests still pass (12/12)
- [x] Existing permissions tests still pass
- [x] All pre-commit hooks pass (ruff, isort, pyright, tsc)
- [ ] Manual: Create autopilot session with `dry_run=True`, verify
run_block/run_agent calls use simulation

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:27:36 +00:00
Zamil Majdy
56e0b568a4 fix(backend): update tests for transcript module move and new fixer defaults
- Update patch targets in transcript tests from
  backend.copilot.sdk.transcript to backend.copilot.transcript since
  the re-export shim only re-exports public symbols; private names
  like _projects_base and get_openai_client live in the canonical module.
- Update orchestrator fixer test assertions to account for 2 new
  _SDM_DEFAULTS (execution_mode, model) and add execution_mode to the
  E2E test's mock block inputSchema.
2026-03-31 18:26:18 +02:00
Zamil Majdy
4acac9ff5b fix: remove accidentally committed files and fix duplicate comment
- Remove .application.logs (local debug artifact)
- Remove test-results/ directory with PNG screenshots
- Remove duplicated JSDoc comment in useRateLimitManager.ts
2026-03-31 18:18:35 +02:00
Zamil Majdy
0b0777ac87 fix(copilot): update fix_orchestrator_blocks docstring to list all 6 defaults
The docstring only listed 4 defaults but _SDM_DEFAULTS has 6 entries
including execution_mode and model. Updated to reflect the actual behavior.
2026-03-31 17:49:54 +02:00
Zamil Majdy
698b1599cb fix(copilot): reject stale transcripts in baseline service 2026-03-31 17:41:06 +02:00
Zamil Majdy
a2f94f08d9 fix(copilot): address review comments round 3 2026-03-31 17:35:11 +02:00
Zamil Majdy
0c6f20f728 feat(copilot): set extended_thinking + Opus as OrchestratorBlock defaults
Update the agent generator fixer defaults so generated agents inherit
the copilot's default reasoning mode (extended_thinking with Opus).
User-set values are preserved — the fixer only fills in missing fields.
2026-03-31 17:23:06 +02:00
Zamil Majdy
d100b2515b fix(copilot): include tool messages in baseline conversation context
The baseline was only including user/assistant text messages when
building the OpenAI message list, dropping all tool_calls and tool
results. This meant the model had no memory of previous tool
invocations or their outputs in multi-turn conversations.

Now includes assistant messages with tool_calls and tool-role messages
with tool_call_id, giving the model full conversation context.
2026-03-31 17:12:37 +02:00
Zamil Majdy
14113f96a9 feat(copilot): use Sonnet for fast mode, Opus for extended thinking
Add `fast_model` config field (default: anthropic/claude-sonnet-4) so
fast mode uses a faster/cheaper model while extended thinking keeps
using Opus. The baseline service now uses config.fast_model for all
LLM calls.
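A sketch of the config split; only `fast_model` and its default come
from this message, the class shape is assumed:

```python
from pydantic_settings import BaseSettings

class ChatConfig(BaseSettings):
    # fast mode uses the cheaper model; extended thinking keeps Opus
    fast_model: str = "anthropic/claude-sonnet-4"
```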
2026-03-31 17:07:04 +02:00
Zamil Majdy
ee40a4b9a8 refactor(copilot): move transcript modules to shared location 2026-03-31 16:29:48 +02:00
Zamil Majdy
0008cafc3b fix(copilot): fix transcript ordering and mode toggle mid-session
- Fix transcript ordering: move append_tool_result from tool executor
  to conversation updater so entries follow correct API order
  (assistant tool_use → user tool_result)
- Fix mode toggle mid-session: use useRef for copilotMode so transport
  closure reads latest value without recreating DefaultChatTransport
- Use Literal type for mode in CoPilotExecutionEntry for type safety
2026-03-31 16:02:36 +02:00
Zamil Majdy
f55bc84fe7 fix(copilot): address PR review comments
- Use Literal["fast", "extended_thinking"] for mode validation (blocker)
- Wrap transcript upload in asyncio.shield() (should fix)
- Restore top-level estimate_token_count imports (nice to have)
- Guard localStorage copilotMode read against invalid values (should fix)
- Replace inline SVGs with lucide-react Brain/Zap icons (nice to have)
2026-03-31 15:52:06 +02:00
Zamil Majdy
3cfee4c4b5 feat(copilot): add mode toggle and baseline transcript support
- Add transcript support to baseline autopilot (download/upload/build)
  for feature parity with SDK path, enabling seamless mode switching
- Thread `mode` field through full stack: StreamChatRequest → queue →
  executor → service selection (fast=baseline, extended_thinking=SDK)
- Add mode toggle button in ChatInput UI with brain/lightning icons
- Persist mode preference in localStorage via Zustand store
2026-03-31 15:46:23 +02:00
Zamil Majdy
c48b5239b9 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-31 15:17:31 +02:00
goingforstudying-ctrl
c410be890e fix: add empty choices guard in extract_openai_tool_calls() (#12540)
## Summary

`extract_openai_tool_calls()` in `llm.py` crashes with `IndexError` when
the LLM provider returns a response with an empty `choices` list.

### Changes 🏗️

- Added a guard check `if not response.choices: return None` before
accessing `response.choices[0]`
- This is consistent with the function's existing pattern of returning
`None` when no tool calls are found

### Bug Details

When an LLM provider returns a response with an empty choices list
(e.g., due to content filtering, rate limiting, or API errors),
`response.choices[0]` raises `IndexError`. This can crash the entire
agent execution pipeline.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- Verified that the function returns `None` when `response.choices` is
empty
- Verified existing behavior is unchanged when `response.choices` is
non-empty

---------

Co-authored-by: goingforstudying-ctrl <forgithubuse@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 20:10:27 +07:00
Zamil Majdy
37d9863552 feat(platform): add extended thinking execution mode to OrchestratorBlock (#12512)
## Summary
- Adds `ExecutionMode` enum with `BUILT_IN` (default built-in tool-call
loop) and `EXTENDED_THINKING` (delegates to Claude Agent SDK for richer
reasoning)
- Extracts shared `tool_call_loop` into `backend/util/tool_call_loop.py`
— reusable by both OrchestratorBlock agent mode and copilot baseline
- Refactors copilot baseline to use the shared `tool_call_loop` with
callback-driven iteration

## ExecutionMode enum
`ExecutionMode` (`backend/blocks/orchestrator.py`) controls how
OrchestratorBlock executes tool calls:
- **`BUILT_IN`** — Default mode. Runs the built-in tool-call loop
(supports all LLM providers).
- **`EXTENDED_THINKING`** — Delegates to the Claude Agent SDK for
extended thinking and multi-step planning. Requires Anthropic-compatible
providers (`anthropic` / `open_router`) and direct API credentials
(subscription mode not supported). Validates both provider and model
name at runtime.

## Shared tool_call_loop
`backend/util/tool_call_loop.py` provides a generic, provider-agnostic
conversation loop:
1. Call LLM with tools → 2. Extract tool calls → 3. Execute tools → 4.
Update conversation → 5. Repeat

Callers provide three callbacks:
- `llm_call`: wraps any LLM provider (OpenAI streaming, Anthropic,
llm.llm_call, etc.)
- `execute_tool`: wraps any tool execution (TOOL_REGISTRY, graph block
execution, etc.)
- `update_conversation`: formats messages for the specific protocol
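
A condensed sketch of the loop shape; the callback signatures are assumptions for illustration, and the real module is `backend/util/tool_call_loop.py`:

```python
from typing import Any, Awaitable, Callable

async def tool_call_loop(
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    *,
    llm_call: Callable[..., Awaitable[Any]],
    execute_tool: Callable[..., Awaitable[Any]],
    update_conversation: Callable[..., list[dict[str, Any]]],
    max_iterations: int = 10,
) -> Any:
    response = None
    for _ in range(max_iterations):
        response = await llm_call(messages, tools)          # 1. call LLM with tools
        tool_calls = getattr(response, "tool_calls", None)  # 2. extract tool calls
        if not tool_calls:
            break                     # model produced a final answer; stop
        results = [await execute_tool(call) for call in tool_calls]  # 3. execute
        messages = update_conversation(messages, response, results)  # 4. update
    return response                   # 5. loop repeats until the cap
```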

## OrchestratorBlock EXTENDED_THINKING mode
- `_create_graph_mcp_server()` converts graph-connected blocks to MCP
tools
- `_execute_tools_sdk_mode()` runs `ClaudeSDKClient` with those MCP
tools
- Agent mode refactored to use shared `tool_call_loop`

## Copilot baseline refactored
- Streaming callbacks buffer `Stream*` events during loop execution
- Events are drained after `tool_call_loop` returns
- Same conversation logic, less code duplication

## SDK environment builder extraction
- `build_sdk_env()` extracted to `backend/copilot/sdk/env.py` for reuse
by both copilot SDK service and OrchestratorBlock

## Provider validation
EXTENDED_THINKING mode validates `provider in ('anthropic',
'open_router')` and `model_name.startswith('claude')` because the Claude
Agent SDK requires an Anthropic API key or OpenRouter key. Subscription
mode is not supported — it uses the platform's internal credit system
which doesn't provide raw API keys needed by the SDK. The validation
raises a clear `ValueError` if an unsupported provider or model is used.
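
Roughly, the check described above (function name and error wording are illustrative):

```python
def validate_extended_thinking(provider: str, model_name: str) -> None:
    # The Claude Agent SDK needs a raw Anthropic or OpenRouter API key,
    # so only those providers (and Claude models) are accepted.
    if provider not in ("anthropic", "open_router"):
        raise ValueError(
            f"EXTENDED_THINKING requires provider 'anthropic' or "
            f"'open_router', got {provider!r}"
        )
    if not model_name.startswith("claude"):
        raise ValueError(
            f"EXTENDED_THINKING requires a Claude model, got {model_name!r}"
        )
```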

## PR Dependencies
This PR builds on #12511 (Claude SDK client). It can be reviewed
independently — #12511 only adds the SDK client module which this PR
imports. If #12511 merges first, this PR will have no conflicts.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All pre-commit hooks pass (typecheck, lint, format)
  - [x] Existing OrchestratorBlock tests still pass
- [x] Copilot baseline behavior unchanged (same stream events, same tool
execution)
- [x] Manual: OrchestratorBlock with execution_mode=EXTENDED_THINKING +
downstream blocks → SDK calls tools
  - [x] Agent mode regression test (non-SDK path works as before)
  - [x] SDK mode error handling (invalid provider raises ValueError)
2026-03-31 20:04:13 +07:00
Abhimanyu Yadav
57b17dc8e1 feat(platform): generic managed credential system with AgentMail auto-provisioning (#12537)
### Why / What / How

**Why:** We need a third credential type: **system-provided but unique
per user** (managed credentials). Currently we have system credentials
(same for all users) and user credentials (user provides their own
keys). Managed credentials bridge the gap — the platform provisions them
automatically, one per user, for integrations like AgentMail where each
user needs their own pod-scoped API key.

**What:**
- Generic **managed credential provider registry** — any integration can
register a provider that auto-provisions per-user credentials
- **AgentMail** is the first consumer: creates a pod + pod-scoped API
key using the org-level API key
- Managed credentials appear in the credential dropdown like normal API
keys but with `autogpt_managed=True` — users **cannot update or delete**
them
- **Auto-provisioning** on `GET /credentials` — lazily creates managed
credentials when users browse their credential list
- **Account deletion cleanup** utility — revokes external resources
(pods, API keys) before user deletion
- **Frontend UX** — hides the delete button for managed credentials on
the integrations page

**How:**

### Backend

**New files:**
- `backend/integrations/managed_credentials.py` —
`ManagedCredentialProvider` ABC, global registry,
`ensure_managed_credentials()` (with per-user asyncio lock +
`asyncio.gather` for concurrency), `cleanup_managed_credentials()`
- `backend/integrations/managed_providers/__init__.py` —
`register_all()` called at startup
- `backend/integrations/managed_providers/agentmail.py` —
`AgentMailManagedProvider` with `provision()` (creates pod + API key via
agentmail SDK) and `deprovision()` (deletes pod)

**Modified files:**
- `credentials_store.py` — `autogpt_managed` guards on update/delete,
`has_managed_credential()` / `add_managed_credential()` helpers
- `model.py` — `autogpt_managed: bool` + `metadata: dict` on
`_BaseCredentials`
- `router.py` — calls `ensure_managed_credentials()` in list endpoints,
removed explicit `/agentmail/connect` endpoint
- `user.py` — `cleanup_user_managed_credentials()` for account deletion
- `rest_api.py` — registers managed providers at startup
- `settings.py` — `agentmail_api_key` setting

### Frontend
- Added `autogpt_managed` to `CredentialsMetaResponse` type
- Conditionally hides delete button on integrations page for managed
credentials

### Key design decisions
- **Auto-provision in API layer, not data layer** — keeps
`get_all_creds()` side-effect-free
- **Race-safe** — per-(user, provider) asyncio lock with double-check
pattern prevents duplicate pods (see the sketch after this list)
- **Idempotent** — AgentMail SDK `client_id` ensures pod creation is
idempotent; `add_managed_credential()` uses upsert under Redis lock
- **Error-resilient** — provisioning failures are logged but never block
credential listing
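
A minimal sketch of that race-safe pattern, assuming the helper signatures (`provision()` actually lives on the registered provider):

```python
import asyncio
from collections import defaultdict

_locks: dict[tuple[str, str], asyncio.Lock] = defaultdict(asyncio.Lock)

async def ensure_managed_credential(user_id: str, provider: str) -> None:
    # Fast path: no lock needed if the credential already exists.
    if await has_managed_credential(user_id, provider):
        return
    async with _locks[(user_id, provider)]:
        # Double-check inside the lock: a concurrent request may have
        # provisioned while we waited, so we never create a second pod.
        if await has_managed_credential(user_id, provider):
            return
        credential = await provision(user_id, provider)
        await add_managed_credential(user_id, credential)
```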

### Changes 🏗️

| File | Action | Description |
|------|--------|-------------|
| `backend/integrations/managed_credentials.py` | NEW | ABC, registry, ensure/cleanup |
| `backend/integrations/managed_providers/__init__.py` | NEW | Registers all providers at startup |
| `backend/integrations/managed_providers/agentmail.py` | NEW | AgentMail provisioning/deprovisioning |
| `backend/integrations/credentials_store.py` | MODIFY | Guards + managed credential helpers |
| `backend/data/model.py` | MODIFY | `autogpt_managed` + `metadata` fields |
| `backend/api/features/integrations/router.py` | MODIFY | Auto-provision on list, removed `/agentmail/connect` |
| `backend/data/user.py` | MODIFY | Account deletion cleanup |
| `backend/api/rest_api.py` | MODIFY | Provider registration at startup |
| `backend/util/settings.py` | MODIFY | `agentmail_api_key` setting |
| `frontend/.../integrations/page.tsx` | MODIFY | Hide delete for managed creds |
| `frontend/.../types.ts` | MODIFY | `autogpt_managed` field |

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] 23 tests pass in `router_test.py` (9 new tests for
ensure/cleanup/auto-provisioning)
  - [x] `poetry run format && poetry run lint` — clean
  - [x] OpenAPI schema regenerated
- [x] Manual: verify managed credential appears in AgentMail block
dropdown
  - [x] Manual: verify delete button hidden for managed credentials
- [x] Manual: verify managed credential cannot be deleted via API (403)

#### For configuration changes:
- [x] `.env.default` is updated with `AGENTMAIL_API_KEY=`

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 12:56:18 +00:00
Krishna Chaitanya
a20188ae59 fix(blocks): validate non-empty input in AIConversationBlock before LLM call (#12545)
### Why / What / How

**Why:** When `AIConversationBlock` receives an empty messages list and
an empty prompt, the block blindly forwards the empty array to the
downstream LLM API, which returns a cryptic `400 Bad Request` error:
`"Invalid 'messages': empty array. Expected an array with minimum length
1."` This is confusing for users who don't understand why their agent
failed.

**What:** Add early input validation in `AIConversationBlock.run()` that
raises a clear `ValueError` when both `messages` and `prompt` are empty.
Also add three unit tests covering the validation logic.

**How:** A simple guard clause at the top of the `run` method checks `if
not input_data.messages and not input_data.prompt` before the LLM call
is made. If both are empty, a descriptive `ValueError` is raised. If
either one has content, the block proceeds normally.
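
The guard clause, roughly (pulled out as a helper here for illustration; the PR puts it inline at the top of `run()`):

```python
def _validate_conversation_input(messages: list, prompt: str) -> None:
    # Reject empty input before the LLM call, which would otherwise fail
    # with an opaque 400 "Invalid 'messages': empty array" error.
    if not messages and not prompt:
        raise ValueError(
            "AIConversationBlock requires at least one message or a "
            "non-empty prompt; both were empty."
        )
```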

### Changes

- `autogpt_platform/backend/backend/blocks/llm.py`: Add validation guard
in `AIConversationBlock.run()` to reject empty messages + empty prompt
before calling the LLM
- `autogpt_platform/backend/backend/blocks/test/test_llm.py`: Add
`TestAIConversationBlockValidation` with three tests:
- `test_empty_messages_and_empty_prompt_raises_error` — validates the
guard clause
- `test_empty_messages_with_prompt_succeeds` — ensures prompt-only usage
still works
- `test_nonempty_messages_with_empty_prompt_succeeds` — ensures
messages-only usage still works

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Lint passes (`ruff check`)
  - [x] Formatting passes (`ruff format`)
- [x] New unit tests validate the empty-input guard and the happy paths

Closes #11875

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 12:43:42 +00:00
Krishna Chaitanya
2f42ff9b47 fix(blocks): validate email recipients in Gmail blocks before API call (#12546)
### Why / What / How

**Why:** When a user or LLM supplies a malformed recipient string (e.g.
a bare username, a JSON blob, or an empty value) to `GmailSendBlock`,
`GmailCreateDraftBlock`, or any reply block, the Gmail API returns an
opaque `HttpError 400: "Invalid To header"`. This surfaces as a
`BlockUnknownError` with no actionable guidance, making it impossible
for the LLM to self-correct. (Fixes #11954)

**What:** Adds a lightweight `validate_email_recipients()` function that
checks every recipient against a simplified RFC 5322 pattern
(`local@domain.tld`) and raises a clear `ValueError` listing all invalid
entries before any API call is made.

**How:** The validation is called in two shared code paths —
`create_mime_message()` (used by send and draft blocks) and
`_build_reply_message()` (used by reply blocks) — so all Gmail blocks
that compose outgoing email benefit from it with zero per-block changes.
The regex is intentionally permissive (any `x@y.z` passes) to avoid
false positives on unusual but valid addresses.
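
A sketch of the validator under the stated "any `x@y.z` passes" policy; the exact regex in `gmail.py` may differ:

```python
import re

# Simplified RFC 5322: local@domain.tld, where the domain contains a dot.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_email_recipients(recipients: list[str], field_name: str) -> None:
    # Empty lists pass without error; invalid entries are reported together.
    invalid = [r for r in recipients if not _EMAIL_RE.match(r or "")]
    if invalid:
        raise ValueError(
            f"Invalid {field_name} recipient(s): {', '.join(map(repr, invalid))}"
        )
```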

### Changes 🏗️

- Added `validate_email_recipients()` helper in `gmail.py` with a
compiled regex
- Hooked validation into `create_mime_message()` for `to`, `cc`, and
`bcc` fields
- Hooked validation into `_build_reply_message()` for reply/draft-reply
blocks
- Added `TestValidateEmailRecipients` test class covering valid,
invalid, mixed, empty, JSON-string, and field-name scenarios

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `validate_email_recipients` correctly accepts valid
emails (`user@example.com`, `a@b.com`, `test@sub.domain.co`)
- [x] Verified it rejects malformed entries (bare names, missing domain
dot, empty strings, JSON strings)
- [x] Verified error messages include the field name and all invalid
entries
  - [x] Verified empty recipient lists pass without error
  - [x] Confirmed `gmail.py` and test file parse correctly (AST check)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 12:37:33 +00:00
Zamil Majdy
914efc53e5 fix(backend): disambiguate duplicate tool names in OrchestratorBlock (#12555)
## Why
The OrchestratorBlock fails with `Tool names must be unique` when
multiple nodes use the same block type (e.g., two "Web Search" blocks
connected as tools). The Anthropic API rejects the request because
duplicate tool names are sent.

## What
- Detect duplicate tool names after building tool signatures
- Append `_1`, `_2`, etc. suffixes to disambiguate
- Enrich descriptions of duplicate tools with their hardcoded default
values so the LLM can distinguish between them
- Clean up internal `_hardcoded_defaults` metadata before sending to API
- Exclude sensitive/credential fields from default value descriptions

## How
- After `_create_tool_node_signatures` builds all tool functions, count
name occurrences
- For duplicates: rename with suffix and append `[Pre-configured:
key=value]` to description using the node's `input_default` (excluding
linked fields that the LLM provides)
- Added defensive `isinstance(defaults, dict)` check for compatibility
with test mocks
- Suffix collision avoidance: skips candidates that collide with
existing tool names
- Long tool names truncated to fit within the 64-character API limit (see
the sketch after this list)
- 47 unit tests covering: basic dedup, description enrichment, unique
names unchanged, no metadata leaks, single tool, triple duplicates,
linked field exclusion, mixed unique/duplicate scenarios, sensitive
field exclusion, long name truncation, suffix collision, malformed
tools, missing description, empty list, 10-tool all-same-name, multiple
distinct groups, large default truncation, suffix collision cascade,
parameter preservation, boundary name lengths, nested dict/list
defaults, null defaults, customized name priority, required fields
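
A simplified sketch of the renaming pass (collision and truncation handling condensed; the real implementation also enriches descriptions):

```python
from collections import Counter

MAX_TOOL_NAME_LEN = 64  # Anthropic API limit

def dedupe_tool_names(tools: list[dict]) -> list[dict]:
    counts = Counter(t["name"] for t in tools)
    taken = {t["name"] for t in tools}
    seen: Counter = Counter()
    for tool in tools:
        name = tool["name"]
        if counts[name] <= 1:
            continue                      # unique names pass through unchanged
        seen[name] += 1
        candidate = f"{name}_{seen[name]}"
        while candidate in taken:         # skip suffixes that hit real names
            seen[name] += 1
            candidate = f"{name}_{seen[name]}"
        if len(candidate) > MAX_TOOL_NAME_LEN:
            suffix = f"_{seen[name]}"     # truncate the base, keep the suffix
            candidate = name[: MAX_TOOL_NAME_LEN - len(suffix)] + suffix
        taken.add(candidate)
        tool["name"] = candidate
    return tools
```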

## Test plan
- [x] All 47 tests in `test_orchestrator_tool_dedup.py` pass
- [x] All 11 existing orchestrator unit tests pass (dict, dynamic
fields, responses API)
- [x] Pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] Manual test: connect two same-type blocks to an orchestrator and
verify the LLM call succeeds

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 11:54:10 +00:00
Carson Kahn
17e78ca382 fix(docs): remove extraneous whitespace in README (#12587)
### Why / What / How

Remove extraneous whitespace in README.md:
- "Workflow Management" description: extra spaces between "block" and
"performs"
- "Agent Interaction" description: extra spaces between "user-friendly"
and "interface"

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 08:38:45 +00:00
Ubbe
7ba05366ed feat(platform/copilot): live timer stats with persisted duration (#12583)
## Why

The copilot chat had no indication of how long the AI spent "thinking"
on a response. Users couldn't tell whether a long wait was normal or
something was stuck. Additionally, the thinking duration was lost on
page reload since it was only tracked client-side.

## What

- **Live elapsed timer**: Shows elapsed time ("23s", "1m 5s") in the
ThinkingIndicator while the AI is processing (appears after 20s to avoid
spam on quick responses)
- **Frozen "Thought for Xm Ys"**: Displays the final thinking duration
in TurnStatsBar after the response completes
- **Persisted duration**: Saves `durationMs` on the last assistant
message in the DB so the timer survives page reloads

## How

**Backend:**
- Added `durationMs Int?` column to `ChatMessage` (Prisma migration)
- `mark_session_completed` in `stream_registry.py` computes wall-clock
duration from Redis session `created_at` and saves it via
`DatabaseManager.set_turn_duration()`
- Invalidates Redis session cache after writing so GET returns fresh
data

**Frontend:**
- `useElapsedTimer` hook tracks client-side elapsed seconds during
streaming
- `ThinkingIndicator` shows only the elapsed time (no phrases) after
20s, with `font-mono text-sm` styling
- `TurnStatsBar` displays "Thought for Xs" after completion, preferring
live `elapsedSeconds` and falling back to persisted `durationMs`
- `convertChatSessionToUiMessages` extracts `duration_ms` from
historical messages into a `Map<string, number>` threaded through to
`ChatMessagesContainer`

## Test plan

- [ ] Send a message in copilot — verify ThinkingIndicator shows elapsed
time after 20s
- [ ] After response completes — verify "Thought for Xs" appears below
the response
- [ ] Refresh the page — verify "Thought for Xs" still appears
(persisted from DB)
- [ ] Check older conversations — they should NOT show a timer (no
historical data)
- [ ] Verify no Zod/SSE validation errors in browser console

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 16:46:31 +07:00
Zamil Majdy
e44615f8b8 fix(frontend): merge tier into refreshed data after tier change 2026-03-30 06:00:51 +02:00
Zamil Majdy
22f0da0a03 fix(backend): correct ENTERPRISE multiplier comment (50x → 60x) 2026-03-29 20:55:40 +02:00
Zamil Majdy
9264b42050 fix(frontend): fetch user tier on admin rate-limits page
The Subscription Tier dropdown showed "PRO" for all users because
the tier was never fetched from the backend. Now fetches the tier
via getV2GetUserRateLimitTier after loading rate limits, and uses
postV2SetUserRateLimitTier (generated client) instead of raw fetch
for tier changes.
2026-03-29 13:55:49 +02:00
Zamil Majdy
3a40188024 test(backend): add end-to-end tests for tier-adjusted rate limits
Add TestTierLimitsRespected class that verifies the full flow:
get_global_rate_limits (with tier multiplier) -> check_rate_limit.

- PRO user with 3M usage is allowed (below 12.5M PRO limit)
- FREE user at 2.5M is blocked (at FREE limit)
- ENTERPRISE user with 100M usage is allowed (below 150M limit)

Addresses reviewer feedback requesting tests that verify limits are
actually respected end-to-end.
2026-03-29 11:56:05 +02:00
Zamil Majdy
8d6433c1a5 Merge branch 'feat/rate-limit-tiering' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-29 06:42:40 +02:00
Zamil Majdy
c7430eaffb fix(platform): use lazy logger formatting in rate limit admin routes
Replace f-string interpolation in logger.info() calls with %s-style
lazy formatting to avoid unnecessary string construction when the log
level is above INFO.
2026-03-29 06:42:03 +02:00
Zamil Majdy
dc272559c6 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-29 04:19:35 +02:00
Zamil Majdy
a98b0aee95 style(frontend): format useRateLimitManager.ts with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:56:54 +00:00
Zamil Majdy
264869cab9 fix(frontend): correct proxy path to /api/proxy/api/ for fetch calls
The Next.js proxy at /api/proxy/[...path] forwards the path to
AGPT_SERVER_URL which already includes /api. So the path needs
/api/proxy/api/... (double api — one for proxy route, one for backend).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:47:29 +00:00
Zamil Majdy
a85ba9e36d fix(frontend): use /api/proxy/ prefix for search_users and tier fetch calls
The generated API hooks use /api/proxy/ as baseUrl. Raw fetch() calls
must use the same proxy path to reach the backend through Next.js.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:32:52 +00:00
Zamil Majdy
18c5f67107 fix(frontend): use search_users only, remove credit-history fallback
The getV2GetAllUsersHistory searches transactions, not users — useless
for user search. Only use the search_users endpoint which queries
the User table directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:31:26 +00:00
Zamil Majdy
0348e7b228 fix(frontend): add fallback to credit-history search when search_users unavailable
The search_users endpoint may not be deployed in preview environments
(Docker cache). Falls back to getV2GetAllUsersHistory (credit
transactions) which at least returns users with transaction history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:26:25 +00:00
Zamil Majdy
e35376d3ec fix(frontend): regenerate openapi.json from backend export-api-schema
Generated using `poetry run export-api-schema` + prettier, matching
the exact CI pipeline. Includes all new endpoints: search_users,
tier management, SubscriptionTier enum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:24:35 +00:00
Zamil Majdy
687af1bdc3 fix(frontend): propagate fetchRateLimit errors in handleTierChange
Use direct getV2GetUserRateLimit call instead of fetchRateLimit
(which swallows errors internally). This ensures the caller's
success/error toast is accurate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 12:26:35 +00:00
Zamil Majdy
694032e45f revert(frontend): restore PR-specific openapi.json
The dev server spec doesn't include this PR's changes (tier endpoints,
SubscriptionTier enum). Reverting to the PR-specific version.

The check API types CI requires a local backend run to generate the
exact matching spec. This is a limitation for endpoint-adding PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:10:15 +00:00
Zamil Majdy
231a4b6f51 fix(frontend): use dev server spec as base for openapi.json
Uses the actual backend-generated spec from dev server as the base,
adds search_users endpoint, sorts alphabetically, and runs prettier.
This matches the exact CI pipeline: export → prettier → diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:05:38 +00:00
Zamil Majdy
da6f77da47 fix(frontend): sort openapi.json paths alphabetically to match backend
The backend generates paths in alphabetical order. Our manually added
endpoint was at the end. Also fix unicode em-dash encoding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:47:47 +00:00
Zamil Majdy
1747f4e6f3 fix(frontend): add search_users endpoint to openapi.json in CI format
Uses exact format from CI-generated spec (tags, operationId, security).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:40:39 +00:00
Zamil Majdy
0d6d8e820c style(frontend): format openapi.json with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:25:55 +00:00
Zamil Majdy
24c286fbed fix(frontend): remove manual OpenAPI additions, let CI generate
The check API types CI job generates openapi.json from the running
backend. Manual additions don't match the auto-generated format.
Removing them so CI can generate the correct spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:17:19 +00:00
Zamil Majdy
c75f1ff749 fix(frontend): add search_users to OpenAPI spec and regenerate types
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:04:47 +00:00
Zamil Majdy
cfc6d3538c fix(backend): format user.py and add search_users endpoint tests
Fixes ruff formatting in search_users function. Adds tests for:
- Search returning multiple matching users
- Search with no results returning empty list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:54:32 +00:00
Zamil Majdy
e9540041d6 fix(platform): search users from User table instead of credit history
The admin rate-limits user search was querying the CreditTransaction table,
which only returns users with transaction history. Users without any
credit transactions (e.g. new accounts) were missing from results.

Adds search_users() to data/user.py that queries the User table directly
with case-insensitive partial matching on email and name. Adds a new
GET /api/copilot/admin/rate_limit/search_users endpoint. Updates the
frontend to use this instead of the spending-history search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:23:14 +00:00
Zamil Majdy
ca74f980c1 fix(copilot): resolve host-scoped credentials for authenticated web requests (#12579)
## Summary
- Fixed `_resolve_discriminated_credentials()` in `helpers.py` to handle
URL/host-based credential discrimination (used by
`SendAuthenticatedWebRequestBlock`)
- Previously, only provider-based discrimination (with
`discriminator_mapping`) was handled; URL-based discrimination (with
`discriminator` set but no `discriminator_mapping`) was silently skipped
- This caused host-scoped credentials to either match the wrong host or
fail to match at all when the CoPilot called `run_block` for
authenticated HTTP requests
- Added 14 targeted tests covering discriminator resolution, host
matching, credential resolution integration, and RunBlockTool end-to-end
flows

## Root Cause
`_resolve_discriminated_credentials()` checked `if
field_info.discriminator and field_info.discriminator_mapping:` which
excluded host-scoped credentials where `discriminator="url"` but
`discriminator_mapping=None`. The URL from `input_data` was never added
to `discriminator_values`, so `_credential_is_for_host()` received empty
`discriminator_values` and returned `True` for **any** host-scoped
credential regardless of URL match.

## Fix
When `discriminator` is set without `discriminator_mapping`, the URL
value from `input_data` is now copied into `discriminator_values` on a
shallow copy of the field info (to avoid mutating the cached schema).
This enables `_credential_is_for_host()` to properly match the
credential's host against the target URL.
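
Approximately (attribute names follow the PR; `discriminator_values` is shown as a list for illustration):

```python
import copy

def _resolve_url_discriminator(field_info, input_data: dict):
    # Host-scoped case: discriminator names an input field (e.g. "url")
    # and there is no provider mapping. Copy the runtime value onto a
    # shallow copy so the cached schema is never mutated.
    if field_info.discriminator and not field_info.discriminator_mapping:
        value = input_data.get(field_info.discriminator)
        if value is not None:
            field_info = copy.copy(field_info)
            field_info.discriminator_values = [value]
    return field_info
```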

## Test plan
- [x] `TestResolveDiscriminatedCredentials` - 4 tests verifying URL
discriminator populates values, handles missing URL, doesn't mutate
original, preserves provider/type
- [x] `TestFindMatchingHostScopedCredential` - 5 tests verifying
correct/wrong host matching, wildcard hosts, multiple credential
selection
- [x] `TestResolveBlockCredentials` - 3 integration tests verifying full
credential resolution with matching/wrong/missing hosts
- [x] `TestRunBlockToolAuthenticatedHttp` - 2 end-to-end tests verifying
SetupRequirementsResponse when creds missing and BlockDetailsResponse
when creds matched
- [x] All 28 existing + new tests pass
- [x] Ruff lint, isort, Black formatting, pyright typecheck all pass

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:12:33 +00:00
Zamil Majdy
8ac86a03b5 fix(platform): correct tier multiplier labels and add tier validation tests
Fix TIER_MULTIPLIERS mismatch in RateLimitDisplay.tsx where PRO showed
"10x" (should be "5x") and BUSINESS showed "30x" (should be "20x"),
not matching backend rate_limit.py values.

Add tests for invalid tier API input (uppercase "INVALID"), FREE-tier
bypass prevention (negative test), and tier-change limit propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:00:55 +00:00
Zamil Majdy
2aac78eae4 fix(frontend): fix lint and type errors in tier selector
- Replace template literal with regular string for static URL
- Fix TypeScript cast via intermediate `unknown` for tier field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 23:43:28 +00:00
Zamil Majdy
dbfc791357 feat(frontend): add subscription tier selector to admin rate-limits page
Adds tier badge display and dropdown selector to the admin rate-limits
page. Admins can now view and change a user's subscription tier
(FREE/PRO/BUSINESS/ENTERPRISE) with multiplier info. The dropdown calls
POST /api/copilot/admin/rate_limit/tier and re-fetches limits to reflect
the new tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:18:12 +00:00
Zamil Majdy
68f5d2ad08 fix(blocks): raise AIConditionBlock errors instead of swallowing them (#12593)
## Why

Sentry alert
[AUTOGPT-SERVER-8C8](https://significant-gravitas.sentry.io/issues/7367978095/)
— `AIConditionBlock` failing in prod with:

```
Invalid 'max_output_tokens': integer below minimum value.
Expected a value >= 16, but got 10 instead.
```

Two problems:
1. `max_tokens=10` is below OpenAI's new minimum of 16
2. The `except Exception` handler called `logger.error()`, which
triggered Sentry for known block errors, AND silently defaulted to
`result=False`, making the block appear to succeed with an incorrect
answer

## What

- Bump `max_tokens` from 10 to 16 (fixes the root cause)
- Remove the `try/except` entirely — the executor already handles
exceptions correctly (`ValueError` = known/no Sentry, everything else =
unknown/Sentry). The old handler was just swallowing errors and
producing wrong results.

## Test plan

- [x] Existing `AIConditionBlock` tests pass (block only expects
"true"/"false", 16 tokens is plenty)
- [x] No more silent `result=False` on errors
- [x] No more spurious Sentry alerts from `logger.error()`

Fixes AUTOGPT-SERVER-8C8

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 10:28:14 +00:00
Nicholas Tindle
2b3d730ca9 dx(skills): add /open-pr and /setup-repo skills (#12591)
### Why / What / How

**Why:** Agents working in worktrees lack guidance on two of the most
common workflows: properly opening PRs (using the repo template,
validating test coverage, triggering the review bot) and bootstrapping
the repo from scratch with a worktree-based layout. Without these
skills, agents either skip steps (no test plan, wrong template) or
require manual hand-holding for setup.

**What:** Adds two new Claude Code skills under `.claude/skills/`:
- `/open-pr` — A structured PR creation workflow that enforces the
canonical `.github/PULL_REQUEST_TEMPLATE.md`, validates test coverage
for existing and new behaviors, supports a configurable base branch, and
integrates the `/review` bot workflow for agents without local testing
capability. Cross-references `/pr-test`, `/pr-review`, and `/pr-address`
for the full PR lifecycle.
- `/setup-repo` — An interactive repo bootstrapping skill that creates a
worktree-based layout (main + reviews + N numbered work branches).
Handles .env file provisioning with graceful fallbacks (.env.default,
.env.example), copies branchlet config, installs dependencies, and is
fully idempotent (safe to re-run).

**How:** Markdown-based SKILL.md files following the existing skill
conventions. Both skills use proper bash patterns (seq-based loops
instead of brace expansion with variables, existence checks before
branch/worktree creation, error reporting on install failures).
`/open-pr` delegates to AskUserQuestion-style prompts for base branch
selection. `/setup-repo` uses AskUserQuestion for interactive branch
count and base branch selection.

### Changes 🏗️

- Added `.claude/skills/open-pr/SKILL.md` — PR creation workflow with:
  - Pre-flight checks (committed, pushed, formatted)
- Test coverage validation (existing behavior not broken, new behavior
covered)
- Canonical PR template enforcement (read and fill verbatim, no
pre-checked boxes)
  - Configurable base branch (defaults to dev)
- Review bot workflow (`/review` comment + 30min wait) for agents
without local testing
  - Related skills table linking `/pr-test`, `/pr-review`, `/pr-address`

- Added `.claude/skills/setup-repo/SKILL.md` — Repo bootstrap workflow
with:
- Interactive setup (branch count: 4/8/16/custom, base branch selection)
- Idempotent branch creation (skips existing branches with info message)
  - Idempotent worktree creation (skips existing directories)
- .env provisioning with fallback chain (.env → .env.default →
.env.example → warning)
  - Branchlet config propagation
  - Dependency installation with success/failure reporting per worktree

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified SKILL.md frontmatter follows existing skill conventions
  - [x] Verified trigger conditions match expected user intents
  - [x] Verified cross-references to existing skills are accurate
- [x] Verified PR template section matches
`.github/PULL_REQUEST_TEMPLATE.md`
- [x] Verified bash snippets use correct patterns (seq, show-ref, quoted
vars)
  - [x] Pre-commit hooks pass on all commits
  - [x] Addressed all CodeRabbit, Sentry, and Cursor review comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-27 10:22:03 +00:00
Zamil Majdy
f28628e34b fix(backend): preserve thinking blocks during transcript compaction (#12574)
## Why

AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

## What

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

## How

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.
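
In outline (JSONL entries shown as dicts; `_compress` is a placeholder for the existing flatten-and-summarize step):

```python
def compact_transcript(entries: list[dict]) -> list[dict]:
    split = _find_last_assistant_entry(entries)
    if split is None:
        return _compress(entries)       # no assistant entry: compress all
    prefix, tail = entries[:split], entries[split:]
    compressed = _compress(prefix)      # thinking blocks may be dropped here
    # The tail (last assistant + trailing entries) is kept verbatim so the
    # API sees the original thinking blocks, then re-chained so parentUuid
    # links point at the new predecessor entries.
    parent = compressed[-1] if compressed else None
    return compressed + _rechain_tail(tail, parent)
```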

## Test plan
- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before
implementation)
- [x] `_find_last_assistant_entry` splits correctly at last assistant,
handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
- [x] `_flatten_assistant_content` strips thinking/redacted_thinking
blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
- [x] `compact_transcript` strips thinking from older assistant messages
- [x] Edge cases: trailing user message, single assistant, no thinking
blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
- [x] `_format_sdk_content_blocks` preserves thinking block format and
raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-03-27 06:36:52 +00:00
Zamil Majdy
880c957c86 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-27 13:29:04 +07:00
Zamil Majdy
857a8ef0aa test(rate-limit): add tier-limit enforcement integration tests
Verifies that tier-multiplied limits are actually respected: usage
within allowance passes, usage at/above limit is rejected, and
higher tiers tolerate usage that would exceed lower tiers.
2026-03-27 13:15:22 +07:00
Zamil Majdy
1008f9fcd4 merge: resolve conflicts with dev, keep tier changes
Merge origin/dev into feat/rate-limit-tiering. Conflicts arose from
the admin-routes refactor (resolved_id rename, _patch_rate_limit_deps
helper) colliding with our 3-tuple get_global_rate_limits and tier
field additions. Resolution keeps our SubscriptionTier enum, 3-tuple
returns, and tier fields while adopting the incoming resolved_id
variable and DRY test helper. Snapshots now include both tier and
user_email fields.
2026-03-27 13:12:38 +07:00
Zamil Majdy
b6a027fd2b fix(platform): fix prod Sentry errors and reduce on-call alert noise (#12565)
## Why

Multiple Sentry issues paging on-call in prod:

1. **AUTOGPT-SERVER-8BP**: `ConversionError: Failed to convert
anthropic/claude-sonnet-4-6 to <enum 'LlmModel'>` — the copilot passes
OpenRouter-style provider-prefixed model names
(`anthropic/claude-sonnet-4-6`) to blocks, but the `LlmModel` enum only
recognizes the bare model ID (`claude-sonnet-4-6`).

2. **BUILDER-7GF**: `Error invoking postEvent: Method not found` —
Sentry SDK internal error on Chrome Mobile Android, not a platform bug.

3. **XMLParserBlock**: `BlockUnknownError raised by XMLParserBlock with
message: Error in input xml syntax` — user sent bad XML but the block
raised `SyntaxError`, which gets wrapped as `BlockUnknownError`
(unexpected) instead of `BlockExecutionError` (expected).

4. **AUTOGPT-SERVER-8BS**: `Virus scanning failed for Screenshot
2026-03-26 091900.png: range() arg 3 must not be zero` — empty (0-byte)
file upload causes `range(0, 0, 0)` in the virus scanner chunking loop,
and the failure is logged at `error` level which pages on-call.

5. **AUTOGPT-SERVER-8BT**: `ValueError: <Token var=<ContextVar
name='current_context'>> was created in a different Context` —
OpenTelemetry `context.detach()` fails when the SDK streaming async
generator is garbage-collected in a different context than where it was
created (client disconnect mid-stream).

6. **AUTOGPT-SERVER-8BW**: `RuntimeError: Attempted to exit cancel scope
in a different task than it was entered in` — anyio's
`TaskGroup.__aexit__` detects cancel scope entered in one task but
exited in another when `GeneratorExit` interrupts the SDK cleanup during
client disconnect.

7. **Workspace UniqueViolationError**: `UniqueViolationError: Unique
constraint failed on (workspaceId, path)` — race condition during
concurrent file uploads handled by `WorkspaceManager._persist_db_record`
retry logic, but Sentry still captures the exception at the raise site.

8. **Library UniqueViolationError**: `UniqueViolationError` on
`LibraryAgent (userId, agentGraphId, agentGraphVersion)` — race
conditions in `add_graph_to_library` and `create_library_agent` caused
crashes or silent data loss.

9. **Graph version collision**: `UniqueViolationError` on `AgentGraph
(id, version)` — copilot re-saving an agent at an existing version
collides with the primary key.

## What

### Backend: `LlmModel._missing_()` for provider-prefixed model names
- Adds `_missing_` classmethod to `LlmModel` enum that strips the
provider prefix (e.g., `anthropic/`) when direct lookup fails
- Self-contained in the enum — no changes to the generic type conversion
system
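
The enum hook, sketched (member list illustrative; `_missing_` is the standard `enum` fallback for failed value lookups):

```python
from enum import Enum

class LlmModel(str, Enum):
    CLAUDE_SONNET_4_6 = "claude-sonnet-4-6"  # illustrative member

    @classmethod
    def _missing_(cls, value: object):
        # Accept OpenRouter-style names like "anthropic/claude-sonnet-4-6"
        # by retrying the lookup with the provider prefix stripped.
        if isinstance(value, str) and "/" in value:
            bare = value.split("/", 1)[1]
            for member in cls:
                if member.value == bare:
                    return member
        return None  # unknown models still raise ValueError as before
```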

### Frontend: Filter Sentry SDK noise
- Adds `postEvent: Method not found` to `ignoreErrors` — a known Sentry
SDK issue on certain mobile browsers

### Backend: XMLParserBlock — raise ValueError instead of SyntaxError
- Changed `_validate_tokens()` to raise `ValueError` instead of
`SyntaxError`
- Changed the `except SyntaxError` handler in `run()` to re-raise as
`ValueError`
- This ensures `Block.execute()` wraps XML parsing failures as
`BlockExecutionError` (expected/user-caused) instead of
`BlockUnknownError` (unexpected/alerts Sentry)

### Backend: Virus scanner — handle empty files + reduce alert noise
- Added early return for empty (0-byte) files in `scan_file()` to avoid
`range() arg 3 must not be zero` when `chunk_size` is 0
- Added `max(1, len(content))` guard on `chunk_size` as defense-in-depth
- Downgraded `scan_content_safe` failure log from `error` to `warning`
so single-file scan failures don't page on-call via Sentry
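
The two guards, sketched (`_scan_chunk` is a hypothetical stand-in for the ClamAV call, and the return convention is illustrative):

```python
def scan_file(content: bytes, chunk_size: int | None = None) -> bool:
    # Empty uploads previously drove the loop into range(0, 0, 0).
    if not content:
        return True  # 0-byte file: report clean without hitting ClamAV
    # Defense-in-depth: the range() step must never be zero.
    step = max(1, chunk_size or len(content))
    for offset in range(0, len(content), step):
        _scan_chunk(content[offset : offset + step])
    return True
```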

### Backend: Suppress SDK client cleanup errors on SSE disconnect
- Replaced `async with ClaudeSDKClient` in `_run_stream_attempt` with
manual `__aenter__`/`__aexit__` wrapped in new
`_safe_close_sdk_client()` helper
- `_safe_close_sdk_client()` catches `ValueError` (OTEL context token
mismatch) and `RuntimeError` (anyio cancel scope in wrong task) during
`__aexit__` and logs at `debug` level — these are expected when SSE
client disconnects mid-stream
- Added `_is_sdk_disconnect_error()` helper for defense-in-depth at the
outer `except BaseException` handler in `stream_chat_completion_sdk`
- Both Sentry errors (8BT and 8BW) are now suppressed without affecting
normal cleanup flow

### Backend: Filter workspace UniqueViolationError from Sentry alerts
- Added `before_send` filter in `_before_send()` to drop
`UniqueViolationError` events where the message contains `workspaceId`
and `path`
- The error is already handled by `WorkspaceManager._persist_db_record`
retry logic — it must propagate for the retry logic to work, so the fix
is at the Sentry filter level rather than catching/suppressing at source

### Backend: Library agent race condition fixes
- **`add_graph_to_library`**: Replaced check-then-create pattern with
create-then-catch-`UniqueViolationError`-then-update. On collision,
updates the existing row (restoring soft-deleted/archived agents)
instead of crashing.
- **`create_library_agent`**: Replaced `create` with `upsert` on the
`(userId, agentGraphId, agentGraphVersion)` composite unique constraint,
so concurrent adds restore soft-deleted entries instead of throwing.

### Backend: Graph version auto-increment on collision
- `__create_graph` now checks if the `(id, version)` already exists
before `create_many`, and auto-increments the version to `max_existing +
1` to avoid `UniqueViolationError` when the copilot re-saves an agent.

### Backend: Workspace `get_or_create_workspace` upsert
- Changed from find-then-create to `upsert` to atomically handle
concurrent workspace creation.

## Test plan

- [x] `LlmModel("anthropic/claude-sonnet-4-6")` resolves correctly
- [x] `LlmModel("claude-sonnet-4-6")` still works (no regression)
- [x] `LlmModel("invalid/nonexistent-model")` still raises `ValueError`
- [x] XMLParserBlock: unclosed tags, extra closing tags, empty XML all
raise `ValueError`
- [x] XMLParserBlock: `SyntaxError` from gravitasml library is caught
and re-raised as `ValueError`
- [x] Virus scanner: empty file (0 bytes) returns clean without hitting
ClamAV
- [x] Virus scanner: single-byte file scans normally (regression test)
- [x] Virus scanner: `scan_content_safe` logs at WARNING not ERROR on
failure
- [x] SDK disconnect: `_is_sdk_disconnect_error` correctly identifies
cancel scope and context var errors
- [x] SDK disconnect: `_is_sdk_disconnect_error` rejects unrelated
errors
- [x] SDK disconnect: `_safe_close_sdk_client` suppresses ValueError,
RuntimeError, and unexpected exceptions
- [x] SDK disconnect: `_safe_close_sdk_client` calls `__aexit__` on
clean exit
- [x] Library: `add_graph_to_library` creates new agent on first call
- [x] Library: `add_graph_to_library` updates existing on
UniqueViolationError
- [x] Library: `create_library_agent` uses upsert to handle concurrent
adds
- [x] All existing workspace overwrite tests still pass
- [x] All tests passing (existing + 4 XML syntax + 3 virus scanner + 10
SDK disconnect + library tests)
2026-03-27 06:09:42 +00:00
Zamil Majdy
fb74fcf4a4 feat(platform): add shared admin user search + rate-limit modal on spending page (#12577)
## Why
Admin rate-limit management required manually entering user UUIDs. The
spending page already had user search but it wasn't reusable.

## What
- Extract `AdminUserSearch` as shared component from spending page
search
- Add rate-limit modal (usage bars + reset) to spending page user rows
- Add email/name/UUID search to standalone rate-limits page
- Backend: add email query parameter to rate-limit endpoint

## How
- `AdminUserSearch` in `admin/components/` — reused by both spending and
rate-limits
- `RateLimitModal` opens from spending page "Rate Limits" button
- Backend `_resolve_user_id()` accepts email or user_id
- Smart routing: exact email → direct lookup, UUID → direct, partial →
fuzzy search
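
A sketch of that routing (`get_user_by_email` is a hypothetical helper; `search_users` is the endpoint added earlier on this branch):

```python
import uuid

async def _resolve_user_id(query: str) -> str | None:
    try:
        return str(uuid.UUID(query))     # UUID: use it as the user id directly
    except ValueError:
        pass
    if "@" in query and " " not in query:
        user = await get_user_by_email(query)  # exact email: direct lookup
        if user:
            return user.id
    matches = await search_users(query)  # partial input: fuzzy search
    return matches[0].id if matches else None
```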

### Follow-up
- `AdminUserSearch` is a plain text input with no typeahead/fuzzy
suggestions — consider adding autocomplete dropdown with debounced
search

### Checklist 📋
- [x] Shared search component extracted and reused
- [x] Tests pass
- [x] Type-checked
2026-03-27 05:53:04 +00:00
Zamil Majdy
c26791e6ae fix(test): mock get_global_rate_limits in reset_usage tests
The reset_copilot_usage endpoint now calls get_global_rate_limits()
which applies the tier multiplier. Tests were not mocking this, so
the daily_limit was inflated by the PRO 5x multiplier, making the
"at limit" check fail. Mock get_global_rate_limits to return base
limits directly.
2026-03-27 12:19:19 +07:00
Zamil Majdy
cf66c08125 fix(platform): rewrite migration to create enum before referencing it
The migration assumed a pre-existing SubscriptionTier enum from an
intermediate commit that was squashed. On a fresh DB the ALTER TYPE
fails with "type SubscriptionTier does not exist". Replace the
alter/rename/recreate sequence with a simple CREATE TYPE + ADD COLUMN.
2026-03-27 12:05:21 +07:00
Zamil Majdy
b4362785e4 fix(platform): update enterprise tier multiplier from 50x to 60x 2026-03-27 11:31:24 +07:00
Zamil Majdy
f38fa96df4 refactor(platform): update tier structure — remove STANDARD, add BUSINESS, default to PRO
Product decision: simplify tiers for beta testing.
- Tiers: FREE(1x), PRO(5x, default on sign-up), BUSINESS(20x), ENTERPRISE(50x)
- Remove STANDARD tier, rename existing STANDARD users to PRO in migration
- Default sign-up tier changed from FREE to PRO during beta
- Migration: recreate enum without STANDARD, add BUSINESS, update default
2026-03-27 11:25:50 +07:00
Zamil Majdy
98c8f94ef2 fix(platform): address round 1 review findings for rate-limit tiering
- Document _fetch_user_tier caching behavior for None tier values
- Add clarifying comment that TIER_MULTIPLIERS uses int intentionally
- Add 3 unit tests for set_user_tier (happy path, RecordNotFoundError,
  cache invalidation)
- Fix test isolation: mock get_global_rate_limits in chat routes usage
  tests to avoid implicit LD/Prisma fallback dependency
2026-03-27 11:07:50 +07:00
Zamil Majdy
7b0111d9b5 test(copilot): add missing PRO tier 10x multiplier test
Complete the tier multiplier coverage matrix by adding a test case
for the PRO tier (10x). Previously only FREE (1x), STANDARD (5x),
and ENTERPRISE (25x) were tested.
2026-03-27 10:48:53 +07:00
Zamil Majdy
85e9e4c5b7 refactor(copilot): rename RateLimitTier to SubscriptionTier with Prisma enum
Rename `rateLimitTier` (String) to `subscriptionTier` (Prisma enum) across
the entire stack:

- schema.prisma: Add `SubscriptionTier` enum (FREE, STANDARD, PRO,
  ENTERPRISE), change User field from `rateLimitTier String` to
  `subscriptionTier SubscriptionTier`.
- migration.sql: CREATE TYPE + ALTER TABLE for the new enum column.
- rate_limit.py: Rename Python enum and update DB field references.
- All test files, admin routes, snapshots, and openapi.json updated to
  match the new naming.

Addresses PR feedback asking for a generic name and proper Prisma enum
instead of a free-form string.
2026-03-27 10:17:21 +07:00
Zamil Majdy
e900ee615a fix(copilot): move get_user_tier import to top-level and expose cache via public API
- sdk/service.py: Move `get_user_tier` import from local (inside function)
  to module-level — no circular dependency exists.
- rate_limit.py: Expose `cache_clear`/`cache_delete` as attributes on the
  public `get_user_tier` function so callers never need to import the
  private `_fetch_user_tier`.
- rate_limit_test.py: Remove `_fetch_user_tier` import; use
  `get_user_tier.cache_clear()` instead.
2026-03-27 09:52:59 +07:00
Zamil Majdy
e1d5113051 fix(platform): pass tier to get_usage_status() in admin rate limit endpoints
For consistency, pass tier=tier to get_usage_status() in the admin
get_user_rate_limit and reset_user_rate_limit endpoints as well.
2026-03-27 01:40:14 +07:00
Zamil Majdy
4963d227ea fix(platform): pass tier to get_usage_status() in reset_copilot_usage endpoint
The reset_copilot_usage endpoint was calling get_usage_status() without
the tier parameter, causing the response to always report STANDARD tier
regardless of the user's actual tier. Pass _tier from get_global_rate_limits()
to both get_usage_status() calls in the endpoint.
2026-03-27 01:37:01 +07:00
Zamil Majdy
19dea0e4ca fix(test): update usage test assertions to include tier parameter
Update test_usage_returns_daily_and_weekly and test_usage_uses_config_limits
to include tier=RateLimitTier.STANDARD in the expected call kwargs, matching
the new tier parameter added to get_usage_status().
2026-03-27 01:24:52 +07:00
Zamil Majdy
87d5a39267 fix(platform): use direct dict indexing for tier multiplier lookup
Use TIER_MULTIPLIERS[tier] instead of .get(tier, 1) to fail fast
if a new tier is added to the enum without a corresponding multiplier.
2026-03-27 01:12:37 +07:00
Zamil Majdy
87ac8148e3 refactor(platform): pass tier to get_usage_status() instead of post-mutation
Add tier parameter to get_usage_status() so callers can set the tier
at construction time rather than mutating the model after creation.
This is safer if the model ever becomes frozen.
2026-03-27 01:01:44 +07:00
Zamil Majdy
491132f62f Merge dev: resolve conflicts + fix transient DB error caching default tier
Resolve merge conflicts between rate-limit tiering and reset-daily-usage
features (both additive). Fix Sentry-flagged bug where a transient DB
error in get_user_tier cached DEFAULT_TIER for 5 minutes, incorrectly
downgrading higher-tier users. Split into _fetch_user_tier (cached, raises
on error) and get_user_tier (uncached wrapper with fallback). Added
regression test test_db_error_is_not_cached.
2026-03-26 23:50:10 +07:00
Zamil Majdy
55815a3207 chore: trigger CI 2026-03-26 21:45:07 +07:00
Zamil Majdy
5c3aa11600 fix(test): add rateLimitTier to User mock in store db_test
The new rateLimitTier field on User is NOT NULL with a DB default,
so Prisma's Pydantic model requires it at construction time.
2026-03-26 21:06:38 +07:00
Zamil Majdy
28b26dde94 feat(platform): spend credits to reset CoPilot daily rate limit (#12526)
## Summary
- When users hit their daily CoPilot token limit, they can now spend
credits ($2.00 default) to reset it and continue working
- Adds a dialog prompt when rate limit error occurs, offering the
credit-based reset option
- Adds a "Reset daily limit" button in the usage limits panel when the
daily limit is reached
- Backend: new `POST /api/chat/usage/reset` endpoint,
`reset_daily_usage()` Redis helper, `rate_limit_reset_cost` config
- Frontend: `RateLimitResetDialog` component, updated
`UsagePanelContent` with reset button, `useCopilotStream` exposes rate
limit state
- **NEW: Resetting the daily limit also reduces weekly usage by the
daily limit amount**, effectively granting 1 extra day's worth of weekly
capacity (e.g., daily_limit=10000 → weekly usage reduced by 10000,
clamped to 0)

## Context
Users have been confused about having credits available but being
blocked by rate limits (REQ-63, REQ-61). This provides a short-term
solution allowing users to spend credits to bypass their daily limit.

The weekly usage reduction ensures that a paid daily reset doesn't just
move the bottleneck to the weekly limit — users get genuine additional
capacity for the day they paid to unlock.
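
The counter arithmetic, as a pure function (the real helper mutates Redis keys rather than returning values):

```python
def apply_daily_reset(weekly_used: int, daily_limit: int) -> tuple[int, int]:
    # Daily counter goes to zero; weekly usage gives back one day's worth
    # of allowance, clamped at zero, unless the daily limit is disabled.
    new_weekly = max(0, weekly_used - daily_limit) if daily_limit else weekly_used
    return 0, new_weekly

# e.g. daily_limit=10000: weekly_used=3000 -> (0, 0); weekly_used=25000 -> (0, 15000)
```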

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Hit daily rate limit → dialog appears with reset option
- [x] Click "Reset for $2.00" → credits charged, daily counter reset,
dialog closes
- [x] Usage panel shows "Reset daily limit" button when at 100% daily
usage
- [x] When `rate_limit_reset_cost=0` (disabled), rate limit shows toast
instead of dialog
  - [x] Insufficient credits → error toast shown
  - [x] Verify existing rate limit tests pass
  - [x] Unit tests: weekly counter reduced by daily_limit on reset
  - [x] Unit tests: weekly counter clamped to 0 when usage < daily_limit
  - [x] Unit tests: no weekly reduction when daily_token_limit=0

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(new config fields `rate_limit_reset_cost` and `max_daily_resets` have
defaults in code)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (no Docker changes needed)
2026-03-26 13:52:08 +00:00
Zamil Majdy
b5cbf8505b fix(backend): remove platform schema prefix from migration SQL
CI test database doesn't have the "platform" schema. Use unqualified
table name so the migration works in all environments.
2026-03-26 20:50:40 +07:00
Zamil Majdy
f49f63de76 fix: mock PrismaUser where it is used, not where it is defined
Change mock target from prisma.models.User.prisma to
backend.copilot.rate_limit.PrismaUser.prisma to follow the
coding guideline of mocking at the import boundary.
2026-03-26 20:48:57 +07:00
Zamil Majdy
8f76384942 fix: invalidate get_user_tier cache when tier is updated via set_user_tier
Call get_user_tier.cache_delete(user_id) after DB update so that
subsequent rate-limit checks immediately see the new tier instead
of using a stale cached value for up to 5 minutes.
2026-03-26 20:47:01 +07:00
Zamil Majdy
ffb8d366d6 fix: address PR review - cache tier lookups, return tier from get_global_rate_limits, fix error handling
- Add @cached(ttl_seconds=300) to get_user_tier() to avoid DB hit on every chat turn
- Change get_global_rate_limits() to return 3-tuple (daily, weekly, tier) so callers
  don't need redundant get_user_tier() calls
- Remove redundant get_user_tier() calls from admin routes and chat /usage endpoint
- Simplify `except (ValueError, Exception)` to `except Exception`
- Handle prisma.errors.RecordNotFoundError in set_user_tier admin endpoint (404 vs 500)
- Add test for user-not-found case on set_user_tier endpoint
- Clear tier cache between tests to prevent stale cached results
2026-03-26 20:42:01 +07:00
Zamil Majdy
432ef5ab5e feat(platform): add rate-limit tiering system for CoPilot
Add a three-tier rate-limiting system (standard/pro/max) that allows
assigning different token limits to users. Tier multipliers are applied
on top of the base limits from LaunchDarkly/config.

Changes:
- Add RateLimitTier enum with standard (1x), pro (5x), max (25x) multipliers
- Add rateLimitTier column to User model in Prisma schema
- Add get_user_tier/set_user_tier DB functions in rate_limit.py
- Update get_global_rate_limits to apply tier multiplier to base limits
- Add admin endpoints: GET/POST /admin/rate_limit/tier for tier management
- Include tier info in UserRateLimitResponse and CoPilotUsageStatus
- Send user tier as metadata in OTEL/Langfuse traces
- Add comprehensive tests (43 total, all passing)
- Add Prisma migration for the new column
2026-03-26 20:31:53 +07:00
Zamil Majdy
d677978c90 feat(platform): admin rate limit check and reset with LD-configurable global limits (#12566)
## Why
Admins need visibility into per-user CoPilot rate limit usage and the
ability to reset a user's counters when needed (e.g., after a false
positive or for debugging). Additionally, the global rate limits were
hardcoded deploy-time constants with no way to adjust without
redeploying.

## What
- Admin endpoints to **check** a user's current rate limit usage and
**reset** their daily/weekly counters to zero
- Global rate limits are now **LaunchDarkly-configurable** via
`copilot-daily-token-limit` and `copilot-weekly-token-limit` flags,
falling back to existing `ChatConfig` values
- Frontend admin page at `/admin/rate-limits` with user lookup, usage
visualization, and reset capability
- Chat routes updated to source global limits from LD flags

## How
- **Backend**: Added `reset_user_usage()` to `rate_limit.py` that
deletes Redis usage keys. New admin routes in
`rate_limit_admin_routes.py` (GET `/api/copilot/admin/rate_limit` and
POST `/api/copilot/admin/rate_limit/reset`). Added
`COPILOT_DAILY_TOKEN_LIMIT` and `COPILOT_WEEKLY_TOKEN_LIMIT` to the
`Flag` enum. Chat routes use `_get_global_rate_limits()` helper that
checks LD first.
- **Frontend**: New `/admin/rate-limits` page with `RateLimitManager`
(user lookup) and `RateLimitDisplay` (usage bars + reset button). Added
`getUserRateLimit` and `resetUserRateLimit` to `BackendAPI` client.

## Test plan
- [x] Backend: 4 tests covering get, reset, redis failure, and
admin-only access
- [ ] Manual: Look up a user's rate limits in the admin UI
- [ ] Manual: Reset a user's usage counters
- [ ] Manual: Verify LD flag overrides are respected for global limits
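
A sketch of the LD-first lookup in the `_get_global_rate_limits()`
helper described under **How** (the `chat_config` attribute names are
illustrative; the flag keys are the real ones above):

```python
import ldclient
from ldclient import Context

def _get_global_rate_limits(user_id: str) -> tuple[int, int]:
    client = ldclient.get()  # shared LaunchDarkly client
    ctx = Context.builder(user_id).build()
    # LD flag value wins; the existing ChatConfig value is the fallback.
    daily = client.variation(
        "copilot-daily-token-limit", ctx, chat_config.copilot_daily_token_limit
    )
    weekly = client.variation(
        "copilot-weekly-token-limit", ctx, chat_config.copilot_weekly_token_limit
    )
    return daily, weekly
```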
2026-03-26 08:29:40 +00:00
Otto
a347c274b7 fix(frontend): replace unrealistic CoPilot suggestion prompt (#12564)
Replaces "Sort my bookmarks into categories" with "Summarize my unread
emails" in the Organize suggestion category. CoPilot has no access to
browser bookmarks or local files, so the original prompt was misleading.

---
Co-authored-by: Toran Bruce Richards (@Torantulino)
<Torantulino@users.noreply.github.com>
2026-03-26 08:10:28 +00:00
Zamil Majdy
f79d8f0449 fix(backend): move placeholder_values exclusively to AgentDropdownInputBlock (#12551)
## Why

`AgentInputBlock` has a `placeholder_values` field whose
`generate_schema()` converts it into a JSON schema `enum`. The frontend
renders any field with `enum` as a dropdown/select. This means
AI-generated agents that populate `placeholder_values` with example
values (e.g. URLs) on regular `AgentInputBlock` nodes end up with
dropdowns instead of free-text inputs — users can't type custom values.

Only `AgentDropdownInputBlock` should produce dropdown behavior.

## What

- Removed `placeholder_values` field from `AgentInputBlock.Input`
- Moved the `enum` generation logic to
`AgentDropdownInputBlock.Input.generate_schema()`
- Cleaned up test data for non-dropdown input blocks
- Updated copilot agent generation guide to stop suggesting
`placeholder_values` for `AgentInputBlock`

## How

The base `AgentInputBlock.Input.generate_schema()` no longer converts
`placeholder_values` → `enum`. Only `AgentDropdownInputBlock.Input`
defines `placeholder_values` and overrides `generate_schema()` to
produce the `enum`.
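
The enum-generation step in isolation, as a hypothetical free function
(the repo implements it as `generate_schema()` overrides on the nested
`Input` classes):

```python
def add_dropdown_enum(field_schema: dict, placeholder_values: list[str]) -> dict:
    # Only the dropdown block performs this step. A field with "enum" in
    # its JSON schema is rendered by the frontend as a select; without
    # it, the same field renders as a free-text input.
    if placeholder_values:
        field_schema = {**field_schema, "enum": placeholder_values}
    return field_schema
```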

**Backward compatibility**: Existing agents with `placeholder_values` on
`AgentInputBlock` nodes load fine — `model_construct()` silently ignores
extra fields not defined on the model. Those inputs will now render as
text fields (desired behavior).

## Test plan
- [x] `poetry run pytest backend/blocks/test/test_block.py -xvs` — all
block tests pass
- [x] `poetry run format && poetry run lint` — clean
- [ ] Import an agent JSON with `placeholder_values` on an
`AgentInputBlock` — verify it loads and renders as text input
- [ ] Create an agent with `AgentDropdownInputBlock` — verify dropdown
still works
2026-03-26 08:09:38 +00:00
Otto
1bc48c55d5 feat(copilot): add copy button to user prompt messages [SECRT-2172] (#12571)
Requested by @itsababseh

Users can copy assistant output messages but not their own prompts. This
adds the same copy button to user messages — appears on hover,
right-aligned, using the existing `CopyButton` component.

## Why

Users write long prompts and need to copy them to reuse or share.
Currently requires manual text selection. ChatGPT shows copy on hover
for user messages — this matches that pattern.

## What

- Added `CopyButton` to user prompt messages in
`ChatMessagesContainer.tsx`
- Shows on hover (`group-hover:opacity-100`), positioned right-aligned
below the message
- Reuses the existing `CopyButton` and `MessageActions` components —
zero new code

## How

One file changed, 11 lines added:
1. Import `MessageActions` and `CopyButton`
2. Render them after user `MessageContent`, gated on `message.role ===
"user"` and having text parts

---
Co-authored-by: itsababseh (@itsababseh)
<36419647+itsababseh@users.noreply.github.com>
2026-03-26 08:02:28 +00:00
Abhimanyu Yadav
9d0a31c0f1 fix(frontend/builder): fix array field item layout and add FormRenderer stories (#12532)
Fix broken UI when selecting nodes with array fields (list[str],
list[Enum]) in the builder. The select/input inside array items was
squeezed by the Remove button instead of taking full width.
<img width="2559" height="1077" alt="Screenshot 2026-03-26 at 10 23
34 AM"
src="https://github.com/user-attachments/assets/2ffc28a2-8d6c-428c-897c-021b1575723c"
/>

### Changes 🏗️

- **ArrayFieldItemTemplate**: Changed layout from horizontal flex-row to
vertical flex-col so the input takes full width and the Remove button
sits below it, left-aligned, with tighter spacing between them
- **Storybook config**: Added `renderers/**` glob to
`.storybook/main.ts` so renderer stories are discoverable
- **FormRenderer stories**: Added comprehensive Storybook stories
covering all backend field types (string, int, float, bool, enum,
date/time, list[str], list[int], list[Enum], list[bool], nested objects,
Optional, anyOf unions, oneOf discriminated unions, multi-select, list
of objects, and a kitchen sink). Includes exact Twitter GetUserBlock
schema for realistic oneOf + multi-select testing.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified array field items render with full-width input and Remove
button below in Storybook
  - [x] Verified list[Enum] select dropdown takes full width
  - [x] Verified list[str] text input takes full width
- [x] Verified all FormRenderer stories render without errors in
Storybook
- [x] Verified multi-select and oneOf discriminated union stories match
real backend schemas
2026-03-26 06:15:30 +00:00
Abhimanyu Yadav
9b086e39c6 fix(frontend): hide placeholder text when copilot voice recording is active (#12534)
### Why / What / How

**Why:** When voice recording is active in the CoPilot chat input, the
recording UI (waveform + timer) overlays on top of the placeholder/hint
text, creating a visually broken appearance. Reported by a user via
SECRT-2163.

**What:** Hide the textarea placeholder text while voice recording is
active so it doesn't bleed through the `RecordingIndicator` overlay.

**How:** When `isRecording` is true, the placeholder is set to an empty
string. The existing `RecordingIndicator` overlay (waveform animation +
elapsed time) then displays cleanly without the hint text showing
underneath.

### Changes 🏗️

- Clear the `PromptInputTextarea` placeholder to `""` when voice
recording is active, preventing it from rendering behind the
`RecordingIndicator` overlay

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Open CoPilot chat at /copilot
- [x] Click the microphone button or press Space to start voice
recording
- [x] Verify the placeholder text ("Type your message..." / "What else
can I help with?") is hidden during recording
- [x] Verify the RecordingIndicator (waveform + timer) displays cleanly
without overlapping text
  - [x] Stop recording and verify placeholder text reappears
  - [x] Verify "Transcribing..." placeholder shows during transcription
2026-03-26 05:41:09 +00:00
Zamil Majdy
5867e4d613 Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-26 07:30:56 +07:00
Zamil Majdy
85f0d8353a fix(platform): fix prod Sentry errors and reduce on-call alert noise (#12560)
## Summary
Hotfix targeting master for production Sentry errors that are triggering
on-call pages. Fixes actual bugs and expands Sentry filters to suppress
user-caused errors that are not platform issues.

### Bug Fixes
- **Workspace race condition** (`get_or_create_workspace`): Replaced
Prisma's non-atomic `upsert` with a find-then-create pattern (sketched
after this list). Prisma's upsert translates to SELECT + INSERT (not
PostgreSQL's native `INSERT ... ON CONFLICT`), causing
`UniqueViolationError` when concurrent requests arrive for the same user
(e.g. copilot + file upload simultaneously).
- **ChatSidebar crash**: Added null-safe `?.` for `sessions` which can
be `undefined` during error/loading states, preventing `TypeError:
Cannot read properties of undefined (reading 'length')`.
- **UsageLimits crash**: Added null-safe `?.` for
`usage.daily`/`usage.weekly` which can be `undefined` when the API
returns partial data, preventing `TypeError: Cannot read properties of
undefined (reading 'limit')`.
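
A sketch of the find-then-create pattern with the race handled (model
and field names assumed from the description):

```python
import prisma.errors
from prisma.models import Workspace

async def get_or_create_workspace(user_id: str) -> Workspace:
    ws = await Workspace.prisma().find_unique(where={"userId": user_id})
    if ws:
        return ws  # common case: workspace already exists
    try:
        return await Workspace.prisma().create(data={"userId": user_id})
    except prisma.errors.UniqueViolationError:
        # A concurrent request (copilot + file upload, say) created the
        # row between our SELECT and INSERT; fetch the winner's row.
        ws = await Workspace.prisma().find_unique(where={"userId": user_id})
        assert ws is not None
        return ws
```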

### Sentry Filter Improvements
Expanded backend `_before_send` to stop user-caused errors from reaching
Sentry and triggering on-call alerts:
- **Consolidated auth keywords** into a shared `_USER_AUTH_KEYWORDS`
list used by both exception-based and log-based filters (previously
duplicated).
- **Added missing auth keywords**: `"unauthorized"`, `"bad
credentials"`, `"insufficient authentication scopes"` — these were
leaking through.
- **Added user integration HTTP error filter**: `"http 401 error"`,
`"http 403 error"`, `"http 404 error"` — catches `BlockUnknownError` and
`HTTPClientError` from user integrations (expired GitHub tokens, wrong
Airtable IDs, etc.).
- **Fixed log-based event gap**: User auth errors logged via
`logger.error()` (not raised as exceptions) were bypassing the
`exc_info` filter. Now the same `_USER_AUTH_KEYWORDS` list is checked
against log messages too.

## On-Call Alerts Addressed

### Fixed (actual bugs)
| Alert | Issue | Root Cause |
|-------|-------|------------|
| `Unique constraint failed on the fields: (userId)` |
[AUTOGPT-SERVER-8BM](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BM)
| Prisma upsert race condition |
| `Unique constraint failed on the fields: (userId)` |
[AUTOGPT-SERVER-8BK](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BK)
| Same — via `/api/workspace/files/upload` |
| `Unique constraint failed on the fields: (userId)` |
[AUTOGPT-SERVER-8BN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BN)
| Same — via `tools/call run_block` |
| `Upload failed (500): Unique constraint failed` |
[BUILDER-7GA](https://significant-gravitas.sentry.io/issues/BUILDER-7GA)
| Frontend surface of same workspace bug |
| `Cannot read properties of undefined (reading 'length')` |
[BUILDER-7GD](https://significant-gravitas.sentry.io/issues/BUILDER-7GD)
| `sessions` undefined in ChatSidebar |
| `Cannot read properties of undefined (reading 'limit')` |
[BUILDER-7GB](https://significant-gravitas.sentry.io/issues/BUILDER-7GB)
| `usage.daily` undefined in UsageLimits |

### Filtered (user-caused, not platform bugs)
| Alert | Issue | Why it's not a platform bug |
|-------|-------|-----------------------------|
| `Anthropic API error: invalid x-api-key` |
[AUTOGPT-SERVER-8B6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8B6),
8B7, 8B8 | User provided invalid Anthropic API key |
| `AI condition evaluation failed: Incorrect API key` |
[AUTOGPT-SERVER-83Y](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83Y)
| User's OpenAI key is wrong (4.5K events, 1 user) |
| `GithubListIssuesBlock: HTTP 401 Bad credentials` |
[AUTOGPT-SERVER-8BF](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BF)
| User's GitHub token expired |
| `HTTPClientError: HTTP 401 Unauthorized` |
[AUTOGPT-SERVER-8BG](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BG)
| Same — credential check endpoint |
| `GithubReadIssueBlock: HTTP 401 Bad credentials` |
[AUTOGPT-SERVER-8BH](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BH)
| Same — different block |
| `AirtableCreateBaseBlock: HTTP 404 MODEL_ID_NOT_FOUND` |
[AUTOGPT-SERVER-8BC](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-8BC)
| User's Airtable model ID is wrong |

### Not addressed in this PR
| Alert | Issue | Reason |
|-------|-------|--------|
| `Unexpected token '<', "<html><hea"...` |
[BUILDER-7GC](https://significant-gravitas.sentry.io/issues/BUILDER-7GC)
| Transient — backend briefly returned HTML error page |
| `undefined is not an object (activeResponse.state)` |
[BUILDER-71J](https://significant-gravitas.sentry.io/issues/BUILDER-71J)
| Bug in Vercel AI SDK `ai@6.0.59`, already resolved |
| `Last Tool Output is needed` |
[AUTOGPT-SERVER-72T](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-72T)
| User graph misconfiguration (1 user, 21 events) |
| `Cannot set property ethereum` |
[BUILDER-7G6](https://significant-gravitas.sentry.io/issues/BUILDER-7G6)
| Browser wallet extension conflict |
| `File already exists at path` |
[BUILDER-7FS](https://significant-gravitas.sentry.io/issues/BUILDER-7FS)
| Expected 409 conflict |

## Test plan
- [ ] Verify workspace creation works for new users
- [ ] Verify concurrent workspace access (e.g. copilot + file upload)
doesn't error
- [ ] Verify copilot ChatSidebar and UsageLimits load correctly when API
returns partial/error data
- [ ] Verify user auth errors (invalid API keys, expired tokens) no
longer appear in Sentry after deployment
2026-03-25 23:25:32 +07:00
An Vy Le
f871717f68 fix(backend): add sink input validation to AgentValidator (#12514)
## Summary

- Added `validate_sink_input_existence` method to `AgentValidator` to
ensure all sink names in links and input defaults reference valid input
schema fields in the corresponding block
- Added comprehensive tests covering valid/invalid sink names, nested
inputs, and default key handling
- Updated `ReadDiscordMessagesBlock` description to clarify it reads new
messages and triggers on new posts
- Removed leftover test function file

## Test plan

- [ ] Run `pytest` on `validator_test.py` to verify all sink input
validation cases pass
- [ ] Verify existing agent validation flow is unaffected
- [ ] Confirm `ReadDiscordMessagesBlock` description update is accurate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-25 16:08:17 +00:00
Ubbe
f08e52dc86 fix(frontend): marketplace card description 3 lines + fallback color (#12557)
## Summary
- Increase the marketplace StoreCard description from 2 lines to 3 lines
for better readability
- Change fallback background colour for missing agent images from
`bg-violet-50` to `rgb(216, 208, 255)`

<img width="933" height="458" alt="Screenshot 2026-03-25 at 20 25 41"
src="https://github.com/user-attachments/assets/ea433741-1397-4585-b64c-c7c3b8109584"
/>
<img width="350" height="457" alt="Screenshot 2026-03-25 at 20 25 55"
src="https://github.com/user-attachments/assets/e2029c09-518a-4404-aa95-e202b4064d0b"
/>


## Test plan
- [x] Verified `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Visually confirmed description shows 3 lines on marketplace cards
- [x] Visually confirmed fallback color renders correctly for cards
without images

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:58:45 +08:00
Ubbe
500b345b3b fix(frontend): auto-reconnect copilot chat after device sleep/wake (#12519)
## Summary

- Adds `visibilitychange`-based sleep/wake detection to the copilot chat
— when the page becomes visible after >30s hidden, automatically refetch
the session and either resume an active stream or hydrate completed
messages
- Blocks chat input during re-sync (`isSyncing` state) to prevent users
from accidentally sending a message that overwrites the agent's
completed work
- Replaces `PulseLoader` with a spinning `CircleNotch` icon on sidebar
session names for background streaming sessions (closer to ChatGPT's UX)

## How it works

1. When the page goes hidden, we record a timestamp
2. When the page becomes visible, we check elapsed time
3. If >30s elapsed (indicating sleep or long background), we refetch the
session from the API
4. If backend still has `active_stream=true` → remove stale assistant
message and resume SSE
5. If backend is done → the refetch triggers React Query invalidation
which hydrates the completed messages
6. Chat input stays disabled (`isSyncing=true`) until re-sync completes

## Test plan

- [ ] Open copilot, start a long-running agent task
- [ ] Close laptop lid / lock screen for >30 seconds
- [ ] Wake device — verify chat shows the agent's completed response (or
resumes streaming)
- [ ] Verify chat input is temporarily disabled during re-sync, then
re-enables
- [ ] Verify sidebar shows spinning icon (not pulse loader) for
background sessions
- [ ] Verify no duplicate messages appear after wake
- [ ] Verify normal streaming (no sleep) still works as expected

Resolves: [SECRT-2159](https://linear.app/autogpt/issue/SECRT-2159)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:15:33 +08:00
Ubbe
995dd1b5f3 feat(platform): replace suggestion pills with themed prompt categories (#12515)
## Summary

<img width="700" height="575" alt="Screenshot 2026-03-23 at 21 40 07"
src="https://github.com/user-attachments/assets/f6138c63-dd5e-4bde-a2e4-7434d0d3ec72"
/>

Re-applies #12452 which was reverted as collateral in #12485 (invite
system revert).

Replaces the flat list of suggestion pills in the CoPilot empty session
with themed prompt categories (Learn, Create, Automate, Organize), each
shown as a popover with contextual prompts.

- **Backend**: Adds `suggested_prompts` as a themed `dict[str,
list[str]]` keyed by category. Updates Tally extraction LLM prompt to
generate prompts per theme, and the `/suggested-prompts` API to return
grouped themes. Legacy `list[str]` rows are preserved under a
`"General"` key for backward compatibility.
- **Frontend**: Replaces inline pill buttons with a `SuggestionThemes`
popover component. Each theme button (with icon) opens a dropdown of 5
relevant prompts. Falls back to hardcoded defaults when the API has no
personalized prompts. Normalizes partial API responses by padding
missing themes with defaults. Legacy `"General"` prompts are distributed
round-robin across themes.

### Changes 🏗️

- `backend/data/understanding.py`: `suggested_prompts` field added as
`dict[str, list[str]]`; legacy list rows preserved under `"General"` key
via `_json_to_themed_prompts`
- `backend/data/tally.py`: LLM prompt updated to generate themed
prompts; validation now per-theme with blank-string rejection
- `backend/api/features/chat/routes.py`: New `SuggestedTheme` model;
endpoint returns `themes[]`
- `frontend/copilot/components/EmptySession/EmptySession.tsx`: Uses
generated API hooks for suggested prompts
- `frontend/copilot/components/EmptySession/helpers.ts`:
`DEFAULT_THEMES` replaces `DEFAULT_QUICK_ACTIONS`; `getSuggestionThemes`
normalizes partial API responses
-
`frontend/copilot/components/EmptySession/components/SuggestionThemes/`:
New popover component with theme icons and loading states
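
A sketch of the legacy normalization in `_json_to_themed_prompts`
(signature assumed):

```python
def _json_to_themed_prompts(raw: object) -> dict[str, list[str]]:
    # New rows store dict[str, list[str]] keyed by theme; legacy rows
    # stored a flat list[str], preserved under a "General" key so
    # existing users keep their personalized prompts.
    if isinstance(raw, dict):
        return {theme: list(prompts) for theme, prompts in raw.items()}
    if isinstance(raw, list):
        return {"General": list(raw)}
    return {}
```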

### Checklist 📋

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verify themed suggestion buttons render on CoPilot empty session
  - [x] Click each theme button and confirm popover opens with prompts
  - [x] Click a prompt and confirm it sends the message
- [x] Verify fallback to default themes when API returns no custom
prompts
- [x] Verify legacy users' personalized prompts are preserved and
visible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:32:49 +08:00
Zamil Majdy
336114f217 fix(backend): prevent graph execution stuck + steer SDK away from bash_exec (#12548)
## Summary

Two backend fixes for CoPilot stability:

1. **Steer model away from bash_exec for SDK tool-result files** — When
the SDK returns tool results as file paths, the copilot model was
attempting to use `bash_exec` to read them instead of treating the
content directly. Added system prompt guidance to prevent this.

2. **Guard against missing 'name' in execution input_data** —
`GraphExecution.from_db()` assumed all INPUT/OUTPUT block node
executions have a `name` field in `input_data`. This crashes with
`KeyError: 'name'` when non-standard blocks (e.g., OrchestratorBlock)
produce node executions without this field. Added `"name" in
exec.input_data` guards.

## Why

- The bash_exec issue causes copilot to fail when processing SDK tool
outputs
- The KeyError crashes the `update_graph_execution_stats` endpoint,
causing graph executions to appear stuck (retries 35+ times, never
completes)

## How

- Added system prompt instruction to treat tool result file contents
directly
- Added `"name" in exec.input_data` guard in both input extraction (line
340) and output extraction (line 365) in `execution.py`
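
The guard in context (loop shape is illustrative):

```python
inputs: dict[str, object] = {}
for node_exec in input_node_executions:
    # OrchestratorBlock can produce INPUT/OUTPUT node executions without
    # a "name" key; skip them instead of raising KeyError and crashing
    # update_graph_execution_stats.
    if "name" not in node_exec.input_data:
        continue
    inputs[node_exec.input_data["name"]] = node_exec.input_data.get("value")
```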

### Changes
- `backend/copilot/sdk/service.py` — system prompt guidance
- `backend/data/execution.py` — KeyError guard for missing `name` field

### Checklist 📋
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] OrchestratorBlock graph execution no longer gets stuck
- [x] Standard Agent Input/Output blocks still work correctly
- [x] Copilot SDK tool results are processed without bash_exec
2026-03-25 13:58:24 +07:00
Nicholas Tindle
866563ad25 feat(platform): admin preview marketplace submissions before approving (#12536)
## Why

Admins reviewing marketplace submissions currently approve blindly —
they can see raw metadata in the admin table but cannot see what the
listing actually looks like (images, video, branding, layout). This
risks approving inappropriate content. With full-scale production
approaching, this is critical.

Additionally, when a creator un-publishes an agent, users who already
added it to their library lose access — breaking their workflows.
Product decided on a "you added it, you keep it" model.

## What

- **Admin preview page** at `/admin/marketplace/preview/[id]` — renders
the listing exactly as it would appear on the public marketplace
- **Add to Library** for admins to test-run pending agents before
approving
- **Library membership grants graph access** — if you added an agent to
your library, you keep access even if it's un-published or rejected
- **Preview button** on every submission row in the admin marketplace
table
- **Cross-reference comments** on original functions to prevent
SECRT-2162-style regressions

## How

### Backend

**Admin preview (`store/db.py`):**
- `get_store_agent_details_as_admin()` queries `StoreListingVersion`
directly, bypassing the APPROVED-only `StoreAgent` DB view
- Validates `CreatorProfile` FK integrity, reads all fields including
`recommendedScheduleCron`

**Admin add-to-library (`library/_add_to_library.py`):**
- Extracted shared logic into `resolve_graph_for_library()` +
`add_graph_to_library()` — eliminates duplication between public and
admin paths
- Admin path uses `get_graph_as_admin()` to bypass marketplace status
checks
- Handles concurrent double-click race via `UniqueViolationError` catch

**Library membership grants graph access (`data/graph.py`):**
- `get_graph()` now falls back to `LibraryAgent` lookup if ownership and
marketplace checks fail
- Only for authenticated users with non-deleted, non-archived library
records
- `validate_graph_execution_permissions()` updated to match — library
membership grants execution access too
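
The resulting access chain in `get_graph()`, sketched with illustrative
helper names:

```python
async def get_graph(graph_id: str, version: int, user_id: str | None):
    graph = await _fetch_graph(graph_id, version)  # hypothetical DB fetch
    if graph is None:
        return None
    if user_id is not None and graph.user_id == user_id:
        return graph  # owner
    if await is_graph_published_in_marketplace(graph_id, version):
        return graph  # APPROVED marketplace listing
    if user_id is not None and await _has_library_agent(
        user_id, graph_id, version  # non-deleted, non-archived records only
    ):
        return graph  # "you added it, you keep it"
    return None
```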

**New endpoints (`store_admin_routes.py`):**
- `GET /admin/submissions/{id}/preview` — returns `StoreAgentDetails`
- `POST /admin/submissions/{id}/add-to-library` — creates `LibraryAgent`
via admin path

### Frontend

- Preview page reuses `AgentInfo` + `AgentImages` with admin banner
- Shows instructions, recommended schedule, and slug
- "Add to My Library" button wired to admin endpoint
- Preview button added to `ExpandableRow` (header + version history)
- Categories column uncommented in version history table

### Testing (19 tests)

**Graph access control (9 in `graph_test.py`):** Owner access,
marketplace access, library member access (unpublished),
deleted/archived/anonymous denied, null FK denied, efficiency checks

**Admin bypass (5 in `store_admin_routes_test.py`):** Preview uses
StoreListingVersion not StoreAgent, admin path uses get_graph_as_admin,
regular path uses get_graph, library member can view in builder

**Security (3):** Non-admin 403 on preview, non-admin 403 on
add-to-library, nonexistent 404

**SECRT-2162 regression (2):** Admin access to pending agent, export
with sub-graphs

### Checklist
- [x] Changes clearly listed
- [x] Test plan made
- [x] 19 backend tests pass
- [x] Frontend lints and types clean

## Test plan
- [x] Navigate to `/admin/marketplace`, click Preview on a PENDING
submission
- [x] Verify images, video, description, categories, instructions,
schedule render correctly
- [x] Click "Add to My Library", verify agent appears in library and
opens in builder
- [x] Verify non-admin users get 403
- [x] Verify un-publishing doesn't break access for users who already
added it

🤖 Generated with [Claude Code](https://claude.com/claude-code)


> [!NOTE]
> **High Risk**
> Adds new admin-only endpoints that bypass marketplace
approval/ownership checks and changes `get_graph`/execution
authorization to grant access via library membership, which impacts
security-sensitive access control paths.
> 
> **Overview**
> Adds **admin preview + review workflow support** for marketplace
submissions: new admin routes to `GET /admin/submissions/{id}/preview`
(querying `StoreListingVersion` directly) and `POST
/admin/submissions/{id}/add-to-library` (admin bypass to pull pending
graphs into an admin’s library).
> 
> Refactors library add-from-store logic into shared helpers
(`resolve_graph_for_library`, `add_graph_to_library`) and introduces an
admin variant `add_store_agent_to_library_as_admin`, including restore
of archived/deleted entries and dedup/race handling.
> 
> Changes core graph access rules: `get_graph()` now falls back to
**library membership** (non-deleted/non-archived, version-specific) when
ownership and marketplace approval don’t apply, and
`validate_graph_execution_permissions()` is updated accordingly.
Frontend adds a preview link and a dedicated admin preview page with
“Add to My Library”; tests expand significantly to lock in the new
bypass and access-control behavior.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 04:26:36 +00:00
Zamil Majdy
e79928a815 fix(backend): prevent logging sensitive data in SafeJson fallback (#12547)
### Why / What / How

**Why:** GitHub's code scanning detected a HIGH severity security
vulnerability in `/autogpt_platform/backend/backend/util/json.py:172`.
The error handler in `sanitize_json()` was logging sensitive data
(potentially including secrets, API keys, credentials) as clear text
when serialization fails.

**What:** This PR removes the logging of actual data content from the
error handler while preserving useful debugging metadata (error type,
error message, and data type).

**How:** Removed the `"Data preview: %s"` format parameter and the
corresponding `truncate(str(data), 100)` argument from the
logger.error() call. The error handler now logs only safe metadata that
helps debugging without exposing sensitive information.
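
Condensed, the safe handler looks like this (function name and the
inline `truncate` stand-in are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def truncate(text: str, limit: int) -> str:  # stand-in for the repo's helper
    return text if len(text) <= limit else text[:limit] + "..."

def log_sanitize_failure(e: Exception, data: object) -> None:
    # Safe metadata only: error type, truncated error message, data type.
    # The removed "Data preview: %s" argument is gone, so raw data (which
    # may contain secrets or API keys) never reaches the log.
    logger.error(
        "sanitize_json failed: %s: %s (data type: %s)",
        type(e).__name__,
        truncate(str(e), 200),
        type(data).__name__,
    )
```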

### Changes 🏗️

- **Security Fix**: Modified `sanitize_json()` function in
`backend/util/json.py`
- Removed logging of data content (`truncate(str(data), 100)`) from the
error handler
  - Retained logging of error type (`type(e).__name__`)
- Retained logging of truncated error message (`truncate(str(e), 200)`)
  - Retained logging of data type (`type(data).__name__`)
- Error handler still provides useful debugging information without
exposing secrets

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the code passes type checking (`poetry run pyright
backend/util/json.py`)
- [x] Verified the code passes linting (`poetry run ruff check
backend/util/json.py`)
  - [x] Verified all pre-commit hooks pass
- [x] Reviewed the diff to ensure only the sensitive data logging was
removed
- [x] Confirmed that useful debugging information (error type, error
message, data type) is still logged

#### For configuration changes:
- N/A - No configuration changes required
2026-03-25 04:21:21 +00:00
Zamil Majdy
1771ed3bef dx(skills): codify PR workflow rules in skill docs and CLAUDE.md (#12531)
## Summary

- **pr-address skill**: Add explicit rule against empty commits for CI
re-triggers, and strengthen push-immediately guidance with rationale
- **Platform CLAUDE.md**: Add "split PRs by concern" guideline under
Creating Pull Requests

### Changes
- Updated `.claude/skills/pr-address/SKILL.md`
- Updated `autogpt_platform/CLAUDE.md`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] Documentation-only changes — no functional tests needed
- [x] Verified markdown renders correctly
2026-03-25 10:19:30 +07:00
Zamil Majdy
550fa5a319 fix(backend): register AutoPilot sessions with stream registry for SSE updates (#12500)
### Changes 🏗️
- When the AutoPilot block executes a copilot session via
`collect_copilot_response`, it calls `stream_chat_completion_sdk`
directly, bypassing the copilot executor and stream registry. This means
the frontend sees no `active_stream` on the session and cannot connect
via SSE — users see a frozen chat with no updates until the turn fully
completes.
- Fix: register a `stream_registry` session in
`collect_copilot_response` and publish each chunk to Redis as events are
consumed. This allows the frontend to detect `active_stream=true` and
connect via the SSE reconnect endpoint for live streaming updates during
AutoPilot execution.
- Error handling is graceful — if stream registry fails, AutoPilot still
works normally, just without real-time frontend updates.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Trigger an AutoPilot block execution that creates a new chat
session
- [x] Verify the new session appears in the sidebar with streaming
indicator
- [x] Click on the session while AutoPilot is still executing — verify
SSE connects and messages stream in real-time
- [x] Verify that after AutoPilot completes, the session shows as
complete (no active_stream)
- [x] Test reconnection: disconnect and reconnect while AutoPilot is
running — verify stream resumes (found and fixed GeneratorExit bug that
caused stuck sessions)
- [x] E2E: 10 stream events published to Redis (StreamStart,
3×ToolInput, 3×ToolOutput, TextStart, TextEnd, StreamFinish)
  - [x] E2E: Redis xadd latency 0.2–3.4ms per chunk
  - [x] E2E: Chat sessions registered in Redis (confirmed via redis-cli)
2026-03-25 01:08:49 +00:00
Zamil Majdy
8528dffbf2 fix(backend): allow /tmp as valid path in E2B sandbox file tools (#12501)
## Summary
- Allow `/tmp` as a valid writable directory in E2B sandbox file tools
(`write_file`, `read_file`, `edit_file`, `glob`, `grep`)
- The E2B sandbox is already fully isolated, so restricting writes to
only `/home/user` was unnecessarily limiting — scripts and tools
commonly use `/tmp` for temporary files
- Extract `is_within_allowed_dirs()` helper in `context.py` to
centralize the allowed-directory check for both path resolution and
symlink escape detection

## Changes
- `context.py`: Add `E2B_ALLOWED_DIRS` tuple and `E2B_ALLOWED_DIRS_STR`,
introduce `is_within_allowed_dirs()`, update `resolve_sandbox_path()` to
use it
- `e2b_file_tools.py`: Update `_check_sandbox_symlink_escape()` to use
`is_within_allowed_dirs()`, update tool descriptions
- Tests: Add coverage for `/tmp` paths in both `context_test.py` and
`e2b_file_tools_test.py`
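
A sketch of the centralized check (names match the Changes list;
internals assumed):

```python
from pathlib import PurePosixPath

E2B_ALLOWED_DIRS: tuple[str, ...] = ("/home/user", "/tmp")
E2B_ALLOWED_DIRS_STR = " or ".join(E2B_ALLOWED_DIRS)

def is_within_allowed_dirs(path: str) -> bool:
    # Sandbox paths are POSIX. A path is allowed if it *is* an allowed
    # dir or sits anywhere beneath one. Callers resolve the path first
    # (resolve_sandbox_path) and check symlink escapes separately.
    candidate = PurePosixPath(path)
    return any(
        candidate == PurePosixPath(base) or PurePosixPath(base) in candidate.parents
        for base in E2B_ALLOWED_DIRS
    )
```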

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All 59 existing + new tests pass (`poetry run pytest
backend/copilot/context_test.py
backend/copilot/sdk/e2b_file_tools_test.py`)
  - [x] `poetry run format` and `poetry run lint` pass clean
  - [x] Verify `/tmp` write works in live E2B sandbox
  - [x] E2E: Write file to /tmp/test.py in E2B sandbox via copilot
  - [x] E2E: Execute script from /tmp — output "Hello, World!"
  - [x] E2E: E2B sandbox lifecycle (create, use, pause) works correctly
2026-03-25 00:52:58 +00:00
Zamil Majdy
8fbf6a4b09 Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-25 06:55:47 +07:00
Zamil Majdy
239148596c fix(backend): filter SDK default credentials from credentials API responses (#12544)
## Summary

- Filter SDK-provisioned default credentials from credentials API list
endpoints
- Reuse `CredentialsMetaResponse` model from internal router in external
API (removes duplicate `CredentialSummary`)
- Add `is_sdk_default()` helper for identifying platform-provisioned
credentials
- Add `provider_matches()` to credential store for consistent provider
filtering
- Add tests for credential filtering behavior

### Changes
- `backend/data/model.py` — add `is_sdk_default()` helper
- `backend/api/features/integrations/router.py` — filter SDK defaults
from list endpoints
- `backend/api/external/v1/integrations.py` — reuse
`CredentialsMetaResponse`, filter SDK defaults
- `backend/integrations/credentials_store.py` — add `provider_matches()`
- `backend/sdk/registry.py` — update credential registration
- `backend/api/features/integrations/router_test.py` — new tests
- `backend/api/features/integrations/conftest.py` — test fixtures

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] Unit tests for credential filtering (`router_test.py`)
- [x] Verify SDK default credentials excluded from API responses
- [x] Verify user-created credentials still returned normally
2026-03-25 06:54:54 +07:00
Zamil Majdy
a880d73481 feat(platform): dry-run execution mode with LLM block simulation (#12483)
## Why

Agent generation and building needs a way to test-run agents without
requiring real credentials or producing side effects. Currently, every
execution hits real APIs, consumes credits, and requires valid
credentials — making it impossible to debug or validate agent graphs
during the build phase without real consequences.

## Summary

Adds a `dry_run` execution mode to the copilot's `run_block` and
`run_agent` tools. When `dry_run=True`, every block execution is
simulated by an LLM instead of calling the real service — no real API
calls, no credentials consumed, no side effects.

Inspired by
[Significant-Gravitas/agent-simulator](https://github.com/Significant-Gravitas/agent-simulator).

### How it works

- **`backend/executor/simulator.py`** (new): `simulate_block()` builds a
prompt from the block's name, description, input/output schemas, and
actual input values, then calls `gpt-4o-mini` via the existing
OpenRouter client with JSON mode. Retries up to 5 times on JSON parse
failures. Missing output pins are filled with `None` (or `""` for the
`error` pin). Long inputs (>20k chars) are truncated before sending to
the LLM.
- **`ExecutionContext`**: Added `dry_run: bool = False` field; threaded
through `add_graph_execution()` so graph-level dry runs propagate to
every block execution.
- **`execute_block()` helper**: When `dry_run=True`, the function
short-circuits before any credential injection or credit checks, calls
`simulate_block()`, and returns a `[DRY RUN]`-prefixed
`BlockOutputResponse`.
- **`RunBlockTool`**: New `dry_run` boolean parameter.
- **`RunAgentTool`**: New `dry_run` boolean parameter; passes
`ExecutionContext(dry_run=True)` to graph execution.

### Tests

11 tests in `backend/copilot/tools/test_dry_run.py`:
- Correct output tuples from LLM response
- JSON retry logic (3 total calls when first 2 fail)
- All-retries-exhausted yields `SIMULATOR ERROR`
- Missing output pins filled with `None`/`""`
- No-client case
- Input truncation at 20k chars
- `execute_block(dry_run=True)` skips real `block.execute()`
- Response format: `[DRY RUN]` message, `success=True`
- `dry_run=False` unchanged (real path)
- `RunBlockTool` parameter presence
- `dry_run` kwarg forwarding

## Test plan
- [x] Run `pytest backend/copilot/tools/test_dry_run.py -v` — all 11
pass
- [x] Call `run_block` with `dry_run=true` in copilot; verify no real
API calls occur and output contains `[DRY RUN]`
- [x] Call `run_agent` with `dry_run=true`; verify execution is created
with `dry_run=True` in context
- [x] E2E: Simulate button (flask icon) present in builder alongside
play button
- [x] E2E: Simulated run labeled with "(Simulated)" suffix and badge in
Library
- [x] E2E: No credits consumed during dry-run
2026-03-24 22:36:47 +00:00
Zamil Majdy
80bfd64ffa Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-24 21:18:11 +07:00
Zamil Majdy
0076ad2a1a hotfix(blocks): bump stagehand ^0.5.1 → ^3.4.0 to fix yanked litellm (#12539)
## Summary

**Critical CI fix** — litellm was compromised in a supply chain attack
(versions 1.82.7/1.82.8 contained infostealer malware) and PyPI
subsequently yanked many litellm versions including the 1.7x range that
stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all
PRs.

- Bump `stagehand` from `^0.5.1` to `^3.4.0` — Stagehand v3 is a
Stainless-generated HTTP API client that **no longer depends on
litellm**, completely removing litellm from our dependency tree
- Migrate stagehand blocks to use `AsyncStagehand` + session-based API
(`sessions.start`, `session.navigate/act/observe/extract`)
- Net reduction of ~430 lines in `poetry.lock` from dropping litellm and
its transitive dependencies

## Why

All CI pipelines are blocked because `poetry lock` fails to resolve
yanked litellm versions that stagehand 0.5.x required.

## Test plan

- [x] CI passes (poetry lock resolves, backend tests green)
- [ ] Verify stagehand blocks still function with the new session-based
API
2026-03-24 21:17:19 +07:00
Zamil Majdy
edb3d322f0 feat(backend/copilot): parallel block execution via infrastructure-level pre-launch (#12472)
## Summary

- Implements **infrastructure-level parallel tool execution** for
CoPilot: all tools called in a single LLM turn now execute concurrently
with zero changes to individual tool implementations or LLM prompts.
- Adds `pre_launch_tool_call()` to `tool_adapter.py`: when an
`AssistantMessage` with `ToolUseBlock`s arrives, all tools are
immediately fired as `asyncio.Task`s before the SDK dispatches MCP
handlers. Each MCP handler then awaits its pre-launched task instead of
executing fresh.
- Adds a `_tool_task_queues` `ContextVar` (initialized per-session in
`set_execution_context()`) so concurrent sessions never share task
queues.
- DRY refactor: extracts `prepare_block_for_execution()`,
`check_hitl_review()`, and `BlockPreparation` dataclass into
`helpers.py` so the execution pipeline is reusable.
- 10 unit tests for the parallel pre-launch infrastructure (queue
enqueue/dequeue, MCP prefix stripping, fallback path, `CancelledError`
handling, multi-same-tool FIFO ordering).

## Root cause

The Claude Agent SDK CLI sends MCP tool calls as sequential
request-response pairs: it waits for each `control_response` before
issuing the next `mcp_message`. Even though Python dispatches handlers
with `start_soon`, the CLI never issues call B until call A's response
is sent — blocks always ran sequentially. The pre-launch pattern fixes
this at the infrastructure level by starting all tasks before the SDK
even dispatches the first handler.
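
The pre-launch pattern, condensed (names from this PR; the arg-mismatch
fallback and cancellation handling are omitted):

```python
import asyncio
from collections import deque
from contextvars import ContextVar

# Per-session queues of pre-launched tasks keyed by tool name; initialized
# in set_execution_context() with a defaultdict(deque) so concurrent
# sessions never share queues.
_tool_task_queues: ContextVar[dict[str, deque]] = ContextVar("_tool_task_queues")

def pre_launch_tool_call(tool_name: str, args: dict, execute) -> None:
    # Called once per ToolUseBlock as soon as the AssistantMessage
    # arrives, before the SDK dispatches any MCP handler.
    _tool_task_queues.get()[tool_name].append(asyncio.create_task(execute(args)))

async def handle_tool_call(tool_name: str, args: dict, execute):
    # MCP handler: await the pre-launched task (FIFO per tool name)
    # instead of executing fresh; fall back to direct execution if none
    # was queued.
    queue = _tool_task_queues.get().get(tool_name)
    if queue:
        return await queue.popleft()
    return await execute(args)
```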

## Test plan

- [x] `poetry run pytest backend/copilot/sdk/tool_adapter_test.py` — 27
tests pass (10 new parallel infra tests)
- [x] `poetry run pytest backend/copilot/tools/helpers_test.py` — 20
tests pass
- [x] `poetry run pytest backend/copilot/tools/run_block_test.py
backend/copilot/tools/test_run_block_details.py` — all pass
- [x] Manually test in CoPilot: ask the agent to run two blocks
simultaneously — verify both start executing before either completes
- [x] E2E: Both GetCurrentTimeBlock and CalculatorBlock executed
concurrently (time=09:35:42, 42×7=294)
- [x] E2E: Pre-launch mechanism active — two run_block events at same
timestamp (3ms apart)
- [x] E2E: Arg-mismatch fallback tested — system correctly cancels and
falls back to direct execution
2026-03-24 20:27:46 +07:00
Zamil Majdy
9381057079 refactor(platform): rename SmartDecisionMakerBlock to OrchestratorBlock (#12511)
## Summary
- Renames `SmartDecisionMakerBlock` to `OrchestratorBlock` across the
entire codebase
- The block supports iteration/agent mode and general tool
orchestration, so "Smart Decision Maker" no longer accurately describes
its capabilities
- Block UUID (`3b191d9f-356f-482d-8238-ba04b6d18381`) remains unchanged
— fully backward compatible with existing graphs

## Changes
- Renamed block class, constants, file names, test files, docs, and
frontend enum
- Updated copilot agent generator (helpers, validator, fixer) references
- Updated agent generation guide documentation
- No functional changes — pure rename refactor

### For code changes
- [x] I have clearly listed my changes in the PR description
- [x] I have made corresponding changes to the documentation
- [x] My changes do not generate new warnings or errors
- [x] New and existing unit tests pass locally with my changes

## Test plan
- [x] All pre-commit hooks pass (typecheck, lint, format)
- [x] Existing graphs with this block continue to load and execute (same
UUID)
- [x] Agent mode / iteration mode works as before
- [x] Copilot agent generator correctly references the renamed block
2026-03-24 19:16:42 +07:00
Otto
f21a36ca37 fix(backend): downgrade user-caused LLM API errors to warning level (#12516)
Requested by @majdyz

Follow-up to #12513. Anthropic/OpenAI 401, 403, and 429 errors are
user-caused (bad API keys, forbidden, rate limits) and should not hit
Sentry as exceptions.

### Changes

**Changes in `blocks/llm.py`:**
- Anthropic `APIError` handler (line ~950): check `status_code` — use
`logger.warning()` for 401/403/429, keep `logger.error()` for server
errors
- Generic `Exception` handler in LLM block `run()` (line ~1467): same
pattern — `logger.warning()` for user-caused status codes,
`logger.exception()` for everything else
- Extracted `USER_ERROR_STATUS_CODES = (401, 403, 429)` module-level
constant
- Added `break` to short-circuit retry loop for user-caused errors
- Removed double-logging from inner Anthropic handler
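
The resulting handler pattern, condensed (`do_call` stands in for the
provider call):

```python
import logging

logger = logging.getLogger(__name__)

USER_ERROR_STATUS_CODES = (401, 403, 429)  # module-level, shared by both handlers

async def call_with_retries(do_call, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            return await do_call()
        except Exception as e:
            status = getattr(e, "status_code", None)
            if status in USER_ERROR_STATUS_CODES:
                # Bad key / forbidden / rate-limited: user-caused, so log
                # at warning (kept out of Sentry) and break out of the
                # retry loop.
                logger.warning("LLM API error (user-caused, HTTP %s): %s", status, e)
                break
            # Server-side failure: full traceback, then retry.
            logger.exception("LLM API error on attempt %d", attempt + 1)
    return None
```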

**Changes in `blocks/test/test_llm.py`:**
- Added 8 regression tests covering 401/403/429 fast-exit and 500 retry
behavior

**Sentry issues addressed:**
- AUTOGPT-SERVER-8B6, 8B7, 8B8 — `[LLM-Block] Anthropic API error: Error
code: 401 - invalid x-api-key`
- Any OpenAI 401/403/429 errors hitting the generic exception handler

Part of SECRT-2166

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] Unit tests for 401/403/429 Anthropic errors → warning log, no
retry
- [x] Unit tests for 500 Anthropic errors → error log, retry
- [x] Unit tests for 401/403/429 OpenAI errors → warning log, no retry
- [x] Unit tests for 500 OpenAI errors → error log, retry
- [x] Verified USER_ERROR_STATUS_CODES constant is used consistently
- [x] Verified no double-logging in Anthropic handler path

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>

---------

Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-24 10:59:04 +00:00
Zamil Majdy
ee5382a064 feat(copilot): add tool/block capability filtering to AutoPilotBlock (#12482)
## Summary

- Adds `CopilotPermissions` model (`copilot/permissions.py`) — a
capability filter that restricts which tools and blocks the
AutoPilot/Copilot may use during a single execution
- Exposes 4 new `advanced=True` fields on `AutoPilotBlock`: `tools`,
`tools_exclude`, `blocks`, `blocks_exclude`
- Threads permissions through the full execution path: `AutoPilotBlock`
→ `collect_copilot_response` → `stream_chat_completion_sdk` →
`run_block`
- Implements recursion inheritance via contextvar: sub-agent executions
can only be *more* restrictive than their parent

## Design

**Tool filtering** (`tools` + `tools_exclude`):
- `tools_exclude=True` (default): `tools` is a **blacklist** — listed
tools denied, all others allowed. Empty list = allow all.
- `tools_exclude=False`: `tools` is a **whitelist** — only listed tools
are allowed.
- Users specify short names (`run_block`, `web_fetch`, `Read`, `Task`,
…) — mapped to full SDK format internally.
- Validated eagerly at block-run time with a clear error listing valid
names.
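
Condensed, the tool check is (short-name mapping and validation
omitted):

```python
def is_tool_allowed(tool: str, tools: list[str], tools_exclude: bool) -> bool:
    if tools_exclude:
        # Blacklist mode (default): listed tools denied, everything else
        # allowed; an empty list allows all tools.
        return tool not in tools
    # Whitelist mode: only listed tools are allowed.
    return tool in tools
```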

**Block filtering** (`blocks` + `blocks_exclude`):
- Same semantics as tool filtering, applied inside `run_block` via
contextvar.
- Each entry can be a full UUID, an 8-char partial UUID (first segment),
or a case-insensitive block name.
- Validated against the live block registry; invalid identifiers surface
a helpful error before the session is created.

**Recursion inheritance**:
- `_inherited_permissions` contextvar stores the parent execution's
permissions.
- On each `AutoPilotBlock.run()`, the child's permissions are merged
with the parent via `merged_with_parent()` — effective allowed sets are
intersected (tools) and the parent chain is kept for block checks.
- Sub-agents can never expand what the parent allowed.

## Test plan

- [x] 68 new unit tests in `copilot/permissions_test.py` and
`blocks/autopilot_permissions_test.py`
- [x] Block identifier matching: full UUID, partial UUID, name,
case-insensitivity
- [x] Tool allow/deny list semantics including edge cases (empty list,
unknown tool)
- [x] Parent/child merging and recursion ceiling correctness
- [x] `validate_tool_names` / `validate_block_identifiers` with mock
block registry
- [x] `apply_tool_permissions` SDK tool-list integration
- [x] `AutoPilotBlock.run()` — invalid tool/block yields error before
session creation
- [x] `AutoPilotBlock.run()` — valid permissions forwarded to
`execute_copilot`
- [x] Existing `AutoPilotBlock` block tests still pass (2/2)
- [x] All hooks pass (pyright, ruff, black, isort)
- [x] E2E: CoPilot chat works end-to-end with E2B sandbox (12s stream)
- [x] E2E: Permission fields render in Builder UI (Tools combobox,
exclude toggles)
- [x] E2E: Agent with restricted permissions (whitelist web_fetch only)
executes correctly
- [x] E2E: Permission values preserved through API round-trip
2026-03-24 07:49:58 +00:00
Nicholas Tindle
b80e5ea987 fix(backend): allow admins to download submitted agents pending review (#12535)
## Why

Admins cannot download submitted-but-not-yet-approved agents from
`/admin/marketplace`. Clicking "Download" fails silently with a Server
Components render error. This blocks admins from reviewing agents that
companies have submitted.

## What

Remove the redundant ownership/marketplace check from
`get_graph_as_admin()` that was silently tightened in PR #11323 (Nov
2025). Add regression tests for both the admin download path and the
non-admin marketplace access control.

## How

**Root cause:** In PR #11323, Reinier refactored an inline
`StoreListingVersion` query (which had no status filter) into a call to
`is_graph_published_in_marketplace()` (which requires `submissionStatus:
APPROVED`). This was collateral cleanup — his PR focused on sub-agent
execution permissions — but it broke admin download of pending agents.

**Fix:** Remove the ownership/marketplace check from
`get_graph_as_admin()`, keeping only the null guard. This is safe
because `get_graph_as_admin` is only callable through admin-protected
routes (`requires_admin_user` at router level).

**Tests added:**
- `test_admin_can_access_pending_agent_not_owned` — admin can access a
graph they don't own that isn't APPROVED
- `test_admin_download_pending_agent_with_subagents` — admin export
includes sub-graphs
- `test_get_graph_non_owner_approved_marketplace_agent` — protects PR
#11323: non-owners CAN access APPROVED agents
- `test_get_graph_non_owner_pending_marketplace_agent_denied` — protects
PR #11323: non-owners CANNOT access PENDING agents

### Checklist

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] 4 regression tests pass locally
  - [x] Admin can download pending agents (verified via unit test)
  - [x] Non-admin marketplace access control preserved

## Test plan
- [ ] Verify admin can download a submitted-but-not-approved agent from
`/admin/marketplace`
- [ ] Verify non-admin users still cannot access admin endpoints
- [ ] Verify the download succeeds without console errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)


> [!NOTE]
> **Medium Risk**
> Changes access-control behavior for admin graph retrieval; risk is
mitigated by route-level admin auth but misuse of `get_graph_as_admin()`
outside admin-protected routes would expose non-approved graphs.
> 
> **Overview**
> Admins can now download/review **submitted-but-not-approved**
marketplace agents: `get_graph_as_admin()` no longer enforces ownership
or *marketplace APPROVED* checks, only returning `None` when the graph
doesn’t exist.
> 
> Adds regression tests covering the admin download/export path
(including sub-graphs) and confirming non-admin behavior is unchanged:
non-owners can fetch **APPROVED** marketplace graphs but cannot access
**pending** ones.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 07:40:17 +00:00
Zamil Majdy
3d4fcfacb6 fix(backend): add circuit breaker for infinite tool call retry loops (#12499)
## Summary
- Adds a two-layer circuit breaker to prevent AutoPilot from looping
infinitely when tool calls fail with empty parameters
- **Tool-level**: After 3 consecutive identical failures per tool,
returns a hard-stop message instructing the model to output content as
text instead of retrying
- **Stream-level**: After 6 consecutive empty tool calls (`input: {}`),
aborts the stream entirely with a user-visible error and retry button

## Background
In session `c5548b48`, the model completed all research successfully but
then spent 51+ minutes in an infinite loop trying to write output: every
tool call was sent with `input: {}` (likely due to context saturation
preventing argument serialization), producing 21+ identical failing tool
calls with no circuit breaker to stop them.

## Changes
- `tool_adapter.py`: Added `_check_circuit_breaker`,
`_record_tool_failure`, `_clear_tool_failures` functions with a
`ContextVar`-based tracker. Integrated into both `create_tool_handler`
(BaseTool) and the `_truncating` wrapper (all tools).
- `service.py`: Added empty-tool-call detection in the main stream loop
that counts consecutive `AssistantMessage`s with empty
`ToolUseBlock.input` and aborts after the limit.
- `test_circuit_breaker.py`: 7 unit tests covering threshold behavior,
per-args tracking, reset on success, and uninitialized tracker safety.
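
The tool-level tracker, sketched (function names from the Changes list;
internals assumed):

```python
from contextvars import ContextVar

MAX_CONSECUTIVE_TOOL_FAILURES = 3

# (tool name, serialized args) -> consecutive identical failures, per session.
_tool_failures: ContextVar[dict | None] = ContextVar("_tool_failures", default=None)

def _record_tool_failure(tool: str, args_key: str) -> None:
    tracker = _tool_failures.get()
    if tracker is not None:  # uninitialized tracker: fail open
        key = (tool, args_key)
        tracker[key] = tracker.get(key, 0) + 1

def _clear_tool_failures(tool: str, args_key: str) -> None:
    tracker = _tool_failures.get()
    if tracker is not None:
        tracker.pop((tool, args_key), None)  # success resets the count

def _check_circuit_breaker(tool: str, args_key: str) -> bool:
    # True -> return the hard-stop message instead of executing the tool.
    tracker = _tool_failures.get()
    return (
        tracker is not None
        and tracker.get((tool, args_key), 0) >= MAX_CONSECUTIVE_TOOL_FAILURES
    )
```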

## Test plan
- [x] Unit tests pass (`pytest
backend/copilot/sdk/test_circuit_breaker.py` — 8/8 passing)
- [x] Pre-commit hooks pass (Ruff, Black, isort, typecheck all pass)
- [x] E2E: CoPilot tool calls work normally (GetCurrentTimeBlock
returned 09:16:39 UTC)
- [x] E2E: Circuit breaker pass-through verified (successful calls don't
trigger breaker)
- [x] E2E: Circuit breaker code integrated into tool_adapter truncating
wrapper
2026-03-24 05:45:12 +00:00
Zamil Majdy
32eac6d52e dx(skills): improve /pr-test to require screenshots, state verification, and fix accountability (#12527)
## Summary
- Add "Critical Requirements" section making screenshots at every step,
PR comment posting, state verification, negative tests, and full
evidence reports non-negotiable
- Add "State Manipulation for Realistic Testing" section with Redis CLI,
DB query, and API before/after patterns
- Strengthen fix mode to require before/after screenshot pairs, rebuild
only affected services, and commit after each fix
- Expand test report format to include API evidence and screenshot
evidence columns
- Bump version to 2.0.0

## Test plan
- [x] Run `/pr-test` on an existing PR and verify it follows the new
critical requirements
- [x] Verify screenshots are posted to PR comment
- [x] Verify fix mode produces before/after screenshot pairs
2026-03-24 12:35:05 +07:00
dependabot[bot]
9762f4cde7 chore(libs/deps-dev): bump the development-dependencies group across 1 directory with 2 updates (#12523)
Bumps the development-dependencies group with 2 updates in the
/autogpt_platform/autogpt_libs directory:
[pytest-cov](https://github.com/pytest-dev/pytest-cov) and
[ruff](https://github.com/astral-sh/ruff).

Updates `pytest-cov` from 7.0.0 to 7.1.0
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst">pytest-cov's
changelog</a>.</em></p>
<blockquote>
<h2>7.1.0 (2026-03-21)</h2>
<ul>
<li>
<p>Fixed total coverage computation to always be consistent, regardless
of reporting settings.
Previously some reports could produce different total counts, and
consequently can make --cov-fail-under behave different depending on
reporting options.
See <code>[#641](https://github.com/pytest-dev/pytest-cov/issues/641)
&lt;https://github.com/pytest-dev/pytest-cov/issues/641&gt;</code>_.</p>
</li>
<li>
<p>Improve handling of ResourceWarning from sqlite3.</p>
<p>The plugin adds warning filter for sqlite3
<code>ResourceWarning</code> unclosed database (since 6.2.0).
It checks if there is already existing plugin for this message by
comparing filter regular expression.
When filter is specified on command line the message is escaped and does
not match an expected message.
A check for an escaped regular expression is added to handle this
case.</p>
<p>With this fix one can suppress <code>ResourceWarning</code> from
sqlite3 from command line::</p>
<p>pytest -W &quot;ignore:unclosed database in &lt;sqlite3.Connection
object at:ResourceWarning&quot; ...</p>
</li>
<li>
<p>Various improvements to documentation.
Contributed by Art Pelling in
<code>[#718](https://github.com/pytest-dev/pytest-cov/issues/718)
&lt;https://github.com/pytest-dev/pytest-cov/pull/718&gt;</code>_ and
&quot;vivodi&quot; in
<code>[#738](https://github.com/pytest-dev/pytest-cov/issues/738)
&lt;https://github.com/pytest-dev/pytest-cov/pull/738&gt;</code><em>.
Also closed
<code>[#736](https://github.com/pytest-dev/pytest-cov/issues/736)
&lt;https://github.com/pytest-dev/pytest-cov/issues/736&gt;</code></em>.</p>
</li>
<li>
<p>Fixed some assertions in tests.
Contributed by in Markéta Machová in
<code>[#722](https://github.com/pytest-dev/pytest-cov/issues/722)
&lt;https://github.com/pytest-dev/pytest-cov/pull/722&gt;</code>_.</p>
</li>
<li>
<p>Removed unnecessary coverage configuration copying (meant as a backup
because reporting commands had configuration side-effects before
coverage 5.0).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="66c8a526b1"><code>66c8a52</code></a>
Bump version: 7.0.0 → 7.1.0</li>
<li><a
href="f707662478"><code>f707662</code></a>
Make the examples use pypy 3.11.</li>
<li><a
href="6049a78478"><code>6049a78</code></a>
Make context test use the old ctracer (seems the new sysmon tracer
behaves di...</li>
<li><a
href="8ebf20bbbc"><code>8ebf20b</code></a>
Update changelog.</li>
<li><a
href="861d30e60d"><code>861d30e</code></a>
Remove the backup context manager - shouldn't be needed since coverage
5.0, ...</li>
<li><a
href="fd4c956014"><code>fd4c956</code></a>
Pass the precision on the nulled total (seems that there's some caching
goion...</li>
<li><a
href="78c9c4ecb0"><code>78c9c4e</code></a>
Only run the 3.9 on older deps.</li>
<li><a
href="4849a922e8"><code>4849a92</code></a>
Punctuation.</li>
<li><a
href="197c35e2f3"><code>197c35e</code></a>
Update changelog and hopefully I don't forget to publish release again
:))</li>
<li><a
href="14dc1c92d4"><code>14dc1c9</code></a>
Update examples to use 3.11 and make the adhoc layout example look a bit
more...</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest-cov/compare/v7.0.0...v7.1.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `ruff` from 0.15.0 to 0.15.7
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.15.7</h2>
<h2>Release Notes</h2>
<p>Released on 2026-03-19.</p>
<h3>Preview features</h3>
<ul>
<li>Display output severity in preview (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23845">#23845</a>)</li>
<li>Don't show <code>noqa</code> hover for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24040">#24040</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>pycodestyle</code>] Recognize <code>pyrefly:</code> as a
pragma comment (<code>E501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24019">#24019</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Don't return code actions for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23905">#23905</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Add company AI policy to contributing guide (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24021">#24021</a>)</li>
<li>Document editor features for Markdown code formatting (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23924">#23924</a>)</li>
<li>[<code>pylint</code>] Improve phrasing (<code>PLC0208</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24033">#24033</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Use PEP 639 license information (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19661">#19661</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a
href="https://github.com/tmimmanuel"><code>@​tmimmanuel</code></a></li>
<li><a
href="https://github.com/DimitriPapadopoulos"><code>@​DimitriPapadopoulos</code></a></li>
<li><a
href="https://github.com/amyreese"><code>@​amyreese</code></a></li>
<li><a href="https://github.com/statxc"><code>@​statxc</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@​dylwil3</code></a></li>
<li><a
href="https://github.com/hunterhogan"><code>@​hunterhogan</code></a></li>
<li><a
href="https://github.com/renovate"><code>@​renovate</code></a></li>
</ul>
<h2>Install ruff 0.15.7</h2>
<h3>Install prebuilt binaries via shell script</h3>
<pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf
https://releases.astral.sh/github/ruff/releases/download/0.15.7/ruff-installer.sh
| sh
</code></pre>
<h3>Install prebuilt binaries via powershell script</h3>
<pre lang="sh"><code>powershell -ExecutionPolicy Bypass -c &quot;irm
https://releases.astral.sh/github/ruff/releases/download/0.15.7/ruff-installer.ps1
| iex&quot;
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.15.7</h2>
<p>Released on 2026-03-19.</p>
<h3>Preview features</h3>
<ul>
<li>Display output severity in preview (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23845">#23845</a>)</li>
<li>Don't show <code>noqa</code> hover for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24040">#24040</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>pycodestyle</code>] Recognize <code>pyrefly:</code> as a
pragma comment (<code>E501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24019">#24019</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Don't return code actions for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23905">#23905</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Add company AI policy to contributing guide (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24021">#24021</a>)</li>
<li>Document editor features for Markdown code formatting (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23924">#23924</a>)</li>
<li>[<code>pylint</code>] Improve phrasing (<code>PLC0208</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24033">#24033</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Use PEP 639 license information (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19661">#19661</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a
href="https://github.com/tmimmanuel"><code>@​tmimmanuel</code></a></li>
<li><a
href="https://github.com/DimitriPapadopoulos"><code>@​DimitriPapadopoulos</code></a></li>
<li><a
href="https://github.com/amyreese"><code>@​amyreese</code></a></li>
<li><a href="https://github.com/statxc"><code>@​statxc</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@​dylwil3</code></a></li>
<li><a
href="https://github.com/hunterhogan"><code>@​hunterhogan</code></a></li>
<li><a
href="https://github.com/renovate"><code>@​renovate</code></a></li>
</ul>
<h2>0.15.6</h2>
<p>Released on 2026-03-12.</p>
<h3>Preview features</h3>
<ul>
<li>Add support for <code>lazy</code> import parsing (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23755">#23755</a>)</li>
<li>Add support for star-unpacking of comprehensions (PEP 798) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23788">#23788</a>)</li>
<li>Reject semantic syntax errors for lazy imports (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23757">#23757</a>)</li>
<li>Drop a few rules from the preview default set (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23879">#23879</a>)</li>
<li>[<code>airflow</code>] Flag <code>Variable.get()</code> calls
outside of task execution context (<code>AIR003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23584">#23584</a>)</li>
<li>[<code>airflow</code>] Flag runtime-varying values in DAG/task
constructor arguments (<code>AIR304</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23631">#23631</a>)</li>
<li>[<code>flake8-bugbear</code>] Implement
<code>delattr-with-constant</code> (<code>B043</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23737">#23737</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0ef39de46c"><code>0ef39de</code></a>
Bump 0.15.7 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24049">#24049</a>)</li>
<li><a
href="beb543b5c6"><code>beb543b</code></a>
[ty] ecosystem-analyzer: Fail on newly panicking projects (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24043">#24043</a>)</li>
<li><a
href="378fe73092"><code>378fe73</code></a>
Don't show noqa hover for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24040">#24040</a>)</li>
<li><a
href="b5665bd18e"><code>b5665bd</code></a>
[<code>pylint</code>] Improve phrasing (<code>PLC0208</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24033">#24033</a>)</li>
<li><a
href="6e20f22190"><code>6e20f22</code></a>
test: migrate <code>show_settings</code> and <code>version</code> tests
to use <code>CliTest</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23702">#23702</a>)</li>
<li><a
href="f99b284c1f"><code>f99b284</code></a>
Drain file watcher events during test setup (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24030">#24030</a>)</li>
<li><a
href="744c996c35"><code>744c996</code></a>
[ty] Filter out unsatisfiable inference attempts during generic call
narrowin...</li>
<li><a
href="16160958bd"><code>1616095</code></a>
[ty] Avoid inferring intersection types for call arguments (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23933">#23933</a>)</li>
<li><a
href="7f275f431b"><code>7f275f4</code></a>
[ty] Pin mypy_primer in <code>setup_primer_project.py</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24020">#24020</a>)</li>
<li><a
href="7255e362e4"><code>7255e36</code></a>
[<code>pycodestyle</code>] Recognize <code>pyrefly:</code> as a pragma
comment (<code>E501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24019">#24019</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.15.0...0.15.7">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 01:36:45 +00:00
Otto
76901ba22f docs: add Why/What/How structure to PR template, CLAUDE.md, and PR skills (#12525)
Requested by @majdyz

### Why / What / How

**Why:** PR descriptions currently explain the *what* and *how* but not
the *why*. Without motivation context, reviewers can't judge whether an
approach fits the problem. Nick flagged this in standup: "The PR
descriptions you use are explaining the what not the why."

**What:** Adds a consistent Why / What / How structure to PR
descriptions across the entire workflow — template, CLAUDE.md guidance,
and all PR-related skills (`/pr-review`, `/pr-test`, `/pr-address`).

**How:**
- **`.github/PULL_REQUEST_TEMPLATE.md`**: Replaced the old vague
`Changes` heading with a single `Why / What / How` section with guiding
comments
- **`autogpt_platform/CLAUDE.md`**: Added bullet under "Creating Pull
Requests" requiring the Why/What/How structure
- **`.claude/skills/pr-review/SKILL.md`**: Added "Read the PR
description" step before reading the diff, and "Description quality" to
the review checklist
- **`.claude/skills/pr-test/SKILL.md`**: Updated Step 1 to read the PR
description and understand Why/What/How before testing
- **`.claude/skills/pr-address/SKILL.md`**: Added "Read the PR
description" step before fetching comments

## Test plan
- [x] All five files reviewed for correct formatting and consistency

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-24 01:35:39 +00:00
363 changed files with 37443 additions and 5461 deletions

1
.agents/skills Symbolic link
View File

@@ -0,0 +1 @@
../.claude/skills

View File

@@ -0,0 +1,106 @@
---
name: open-pr
description: Open a pull request with proper PR template, test coverage, and review workflow. Guides agents through creating a PR that follows repo conventions, ensures existing behaviors aren't broken, covers new behaviors with tests, and handles review via bot when local testing isn't possible. TRIGGER when user asks to "open a PR", "create a PR", "make a PR", "submit a PR", "open pull request", "push and create PR", or any variation of opening/submitting a pull request.
user-invocable: true
args: "[base-branch] — optional target branch (defaults to dev)."
metadata:
author: autogpt-team
version: "1.0.0"
---
# Open a Pull Request
## Step 1: Pre-flight checks
Before opening the PR:
1. Ensure all changes are committed
2. Ensure the branch is pushed to the remote (`git push -u origin <branch>`)
3. Run linters/formatters across the whole repo (not just changed files) and commit any fixes
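A minimal sketch of step 3, assuming the standard Poetry/pnpm layouts (the exact lint commands are assumptions; check each package's scripts):
```bash
# Illustrative pre-flight lint pass; command names are assumptions
(cd autogpt_platform/backend && poetry run ruff format . && poetry run ruff check --fix .)
(cd autogpt_platform/frontend && pnpm format && pnpm lint)
# Commit fixes only if the linters actually changed tracked files
git diff --quiet || { git add -A && git commit -m "style: apply linter/formatter fixes"; }
```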
## Step 2: Test coverage
**This is critical.** Before opening the PR, verify:
### Existing behavior is not broken
- Identify which modules/components your changes touch
- Run the existing test suites for those areas
- If tests fail, fix them before opening the PR — do not open a PR with known regressions
### New behavior has test coverage
- Every new feature, endpoint, or behavior change needs tests
- If you added a new block, add tests for that block
- If you changed API behavior, add or update API tests
- If you changed frontend behavior, verify it doesn't break existing flows
If you cannot run the full test suite locally, note which tests you ran and which you couldn't in the test plan.
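For example (a sketch; the test paths are illustrative, run whatever suites cover the modules you touched):
```bash
# Backend: run only the suites for the areas your changes touch
(cd autogpt_platform/backend && poetry run pytest backend/blocks/ -q)
# Frontend: run the project's test script for the affected flows
(cd autogpt_platform/frontend && pnpm test)
```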
## Step 3: Create the PR using the repo template
Read the canonical PR template at `.github/PULL_REQUEST_TEMPLATE.md` and use it **verbatim** as your PR body:
1. Read the template: `cat .github/PULL_REQUEST_TEMPLATE.md`
2. Preserve the exact section titles and formatting, including:
- `### Why / What / How`
- `### Changes 🏗️`
- `### Checklist 📋`
3. Replace HTML comment prompts (`<!-- ... -->`) with actual content; do not leave them in
4. **Do not pre-check boxes** — leave all checkboxes as `- [ ]` until each step is actually completed
5. Do not alter the template structure, rename sections, or remove any checklist items
**PR title must use conventional commit format** (e.g., `feat(backend): add new block`, `fix(frontend): resolve routing bug`, `dx(skills): update PR workflow`). See CLAUDE.md for the full list of scopes.
Use `gh pr create` with the base branch (defaults to `dev` if no `[base-branch]` was provided). Use `--body-file` to avoid shell interpretation of backticks and special characters:
```bash
BASE_BRANCH="${BASE_BRANCH:-dev}"
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
<filled-in template from .github/PULL_REQUEST_TEMPLATE.md>
PREOF
gh pr create --base "$BASE_BRANCH" --title "<type>(scope): short description" --body-file "$PR_BODY"
rm "$PR_BODY"
```
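As a quick sanity check (illustrative), confirm the created PR kept the template's section headings:
```bash
# Should print all three template headings; if any is missing, fix the PR body
gh pr view --json body --jq '.body' | grep -E '^### (Why / What / How|Changes|Checklist)'
```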
## Step 4: Review workflow
### If you have a workspace that allows testing (docker, running backend, etc.)
- Run `/pr-test` to do E2E manual testing of the PR using docker compose, agent-browser, and API calls. This is the most thorough way to validate your changes before review.
- After testing, run `/pr-review` to self-review the PR for correctness, security, code quality, and testing gaps before requesting human review.
### If you do NOT have a workspace that allows testing
This is common for agents running in worktrees without a full stack. In this case:
1. Run `/pr-review` locally to catch obvious issues before pushing
2. **Comment `/review` on the PR** after creating it to trigger the review bot
3. **Poll for the review** rather than blindly waiting — check for new review comments every 30 seconds using `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` and the GraphQL inline threads query. The bot typically responds within 30 minutes, but polling lets the agent react as soon as it arrives.
4. Do NOT proceed or merge until the bot review comes back
5. Address any issues the bot raises — use `/pr-address` which has a full polling loop with CI + comment tracking
```bash
# After creating the PR:
PR_NUMBER=$(gh pr view --json number -q .number)
gh pr comment "$PR_NUMBER" --body "/review"
# Then use /pr-address to poll for and address the review when it arrives
```
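A minimal polling sketch for step 3 (checks top-level reviews only; the exit condition is an assumption, and inline threads still need the GraphQL query):
```bash
# Poll every 30s until at least one review appears on the PR
while true; do
  COUNT=$(gh api "repos/Significant-Gravitas/AutoGPT/pulls/${PR_NUMBER}/reviews" --jq 'length')
  [ "${COUNT:-0}" -gt 0 ] && { echo "Review arrived (${COUNT} review(s))"; break; }
  sleep 30
done
```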
## Step 5: Address review feedback
Once the review bot or human reviewers leave comments:
- Run `/pr-address` to address review comments. It will loop until CI is green and all comments are resolved.
- Do not merge without human approval.
## Related skills
| Skill | When to use |
|---|---|
| `/pr-test` | E2E testing with docker compose, agent-browser, API calls — use when you have a running workspace |
| `/pr-review` | Review for correctness, security, code quality — use before requesting human review |
| `/pr-address` | Address reviewer comments and loop until CI green — use after reviews come in |
## Step 6: Post-creation
After the PR is created and review is triggered:
- Share the PR URL with the user
- If waiting on the review bot, let the user know the expected wait time (~30 min)
- Do not merge without human approval

View File

@@ -17,6 +17,14 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
gh pr view {N}
```
## Read the PR description
Understand the **Why / What / How** before addressing comments — you need context to make good fixes:
```bash
gh pr view {N} --json body --jq '.body'
```
## Fetch comments (all sources)
### 1. Inline review threads — GraphQL (primary source of actionable items)
@@ -105,7 +113,9 @@ kill $REST_PID 2>/dev/null; trap - EXIT
```
Never manually edit files in `src/app/api/__generated__/`.
Then commit and **push immediately** — never batch commits without pushing.
Then commit and **push immediately** — never batch commits without pushing. Each fix should be visible on GitHub right away so CI can start and reviewers can see progress.
**Never push empty commits** (`git commit --allow-empty`) to re-trigger CI or bot checks. When a check fails, investigate the root cause (unchecked PR checklist, unaddressed review comments, code issues) and fix those directly. Empty commits add noise to git history.
For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).

View File

@@ -17,6 +17,16 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
gh pr view {N}
```
## Read the PR description
Before reading code, understand the **why**, **what**, and **how** from the PR description:
```bash
gh pr view {N} --json body --jq '.body'
```
Every PR should have a Why / What / How structure. If any of these are missing, note it as feedback.
## Read the diff
```bash
@@ -34,6 +44,8 @@ gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
## What to check
**Description quality:** Does the PR description cover Why (motivation/problem), What (summary of changes), and How (approach/implementation details)? If any are missing, request them — you can't judge the approach without understanding the problem and intent.
**Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).
**Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).

View File

@@ -5,13 +5,96 @@ user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
author: autogpt-team
version: "1.0.0"
version: "2.0.0"
---
# Manual E2E Test
Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
## Critical Requirements
These are NON-NEGOTIABLE. Every test run MUST satisfy ALL the following:
### 1. Screenshots at Every Step
- Take a screenshot at EVERY significant test step — not just at the end
- Every test scenario MUST have at least one BEFORE and one AFTER screenshot
- Name screenshots sequentially: `{NN}-{action}-{state}.png` (e.g., `01-credits-before.png`, `02-credits-after.png`)
- If a screenshot is missing for a scenario, the test is INCOMPLETE — go back and take it
### 2. Screenshots MUST Be Posted to PR
- Push ALL screenshots to a temp branch `test-screenshots/pr-{N}`
- Post a PR comment with ALL screenshots embedded inline using GitHub raw URLs
- This is NOT optional — every test run MUST end with a PR comment containing screenshots
- If screenshot upload fails, retry. If it still fails, list failed files and require manual drag-and-drop/paste attachment in the PR comment
### 3. State Verification with Before/After Evidence
- For EVERY state-changing operation (API call, user action), capture the state BEFORE and AFTER
- Log the actual API response values (e.g., `credits_before=100, credits_after=95`)
- Screenshot MUST show the relevant UI state change
- Compare expected vs actual values explicitly — do not just eyeball it
### 4. Negative Test Cases Are Mandatory
- Test at least ONE negative case per feature (e.g., insufficient credits, invalid input, unauthorized access)
- Verify error messages are user-friendly and accurate
- Verify the system state did NOT change after a rejected operation
### 5. Test Report Must Include Full Evidence
Each test scenario in the report MUST have:
- **Steps**: What was done (exact commands or UI actions)
- **Expected**: What should happen
- **Actual**: What actually happened
- **API Evidence**: Before/after API response values for state-changing operations
- **Screenshot Evidence**: Before/after screenshots with explanations
## State Manipulation for Realistic Testing
When testing features that depend on specific states (rate limits, credits, quotas):
1. **Use Redis CLI to set counters directly:**
```bash
# Find the Redis container
REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
# Set a key with expiry
docker exec $REDIS_CONTAINER redis-cli SET key value EX ttl
# Example: Set rate limit counter to near-limit
docker exec $REDIS_CONTAINER redis-cli SET "rate_limit:user:test@test.com" 99 EX 3600
# Example: Check current value
docker exec $REDIS_CONTAINER redis-cli GET "rate_limit:user:test@test.com"
```
2. **Use API calls to check before/after state:**
```bash
# BEFORE: Record current state
BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
echo "Credits BEFORE: $BEFORE"
# Perform the action...
# AFTER: Record new state and compare
AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
echo "Credits AFTER: $AFTER"
echo "Delta: $(( BEFORE - AFTER ))"
```
3. **Take screenshots BEFORE and AFTER state changes** — the UI must reflect the backend state change
4. **Never rely on mocked/injected browser state** — always use real backend state. Do NOT use `agent-browser eval` to fake UI state. The backend must be the source of truth.
5. **Use direct DB queries when needed:**
```bash
# Query via Supabase's PostgREST or docker exec into the DB
docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"
```
6. **After every API test, verify the state change actually persisted:**
```bash
# Example: After a credits purchase, verify DB matches API
API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
[ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"
```
## Arguments
- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
@@ -53,14 +136,20 @@ Before testing, understand what changed:
```bash
cd $WORKTREE_PATH
# Read PR description to understand the WHY
gh pr view {N} --json body --jq '.body'
git log --oneline dev..HEAD | head -20
git diff dev --stat
```
Read the changed files to understand:
1. What feature/fix does this PR implement?
2. What components are affected? (backend, frontend, copilot, executor, etc.)
3. What are the key user-facing behaviors to test?
Read the PR description (Why / What / How) and changed files to understand:
0. **Why** does this PR exist? What problem does it solve?
1. **What** feature/fix does this PR implement?
2. **How** does it work? What's the approach?
3. What components are affected? (backend, frontend, copilot, executor, etc.)
4. What are the key user-facing behaviors to test?
## Step 2: Write test scenarios
@@ -75,15 +164,21 @@ Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
## API Tests (if applicable)
1. [Endpoint] — [expected behavior]
- Before state: [what to check before]
- After state: [what to verify changed]
## UI Tests (if applicable)
1. [Page/component] — [interaction to test]
- Screenshot before: [what to capture]
- Screenshot after: [what to capture]
## Negative Tests
1. [What should NOT happen]
## Negative Tests (REQUIRED — at least one per feature)
1. [What should NOT happen] — [how to trigger it]
- Expected error: [what error message/code]
- State unchanged: [what to verify did NOT change]
```
**Be critical** — include edge cases, error paths, and security checks.
**Be critical** — include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.
## Step 3: Environment setup
@@ -233,7 +328,7 @@ curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
### API testing
Use `curl` with the auth token for backend API tests:
Use `curl` with the auth token for backend API tests. **For EVERY API call that changes state, record before/after values:**
```bash
# Example: List agents
@@ -256,6 +351,27 @@ curl -s -H "Authorization: Bearer $TOKEN" \
"http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
```
**State verification pattern (use for EVERY state-changing API call):**
```bash
# 1. Record BEFORE state
BEFORE_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
echo "BEFORE: $BEFORE_STATE"
# 2. Perform the action
ACTION_RESULT=$(curl -s -X POST ... | jq .)
echo "ACTION RESULT: $ACTION_RESULT"
# 3. Record AFTER state
AFTER_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
echo "AFTER: $AFTER_STATE"
# 4. Log the comparison
echo "=== STATE CHANGE VERIFICATION ==="
echo "Before: $BEFORE_STATE"
echo "After: $AFTER_STATE"
echo "Expected change: {describe what should have changed}"
```
### Browser testing with agent-browser
```bash
@@ -348,19 +464,34 @@ agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --time
## Step 5: Record results and take screenshots
**Take a screenshot at every significant test step** — before and after interactions, on success, and on failure. Name them sequentially with descriptive names:
**Take a screenshot at EVERY significant test step** — before and after interactions, on success, and on failure. This is NON-NEGOTIABLE.
**Required screenshot pattern for each test scenario:**
```bash
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{description}.png
# Examples:
# $RESULTS_DIR/01-login-page.png
# $RESULTS_DIR/02-builder-with-block.png
# $RESULTS_DIR/03-copilot-response.png
# $RESULTS_DIR/04-agent-execution-result.png
# $RESULTS_DIR/05-error-state.png
# BEFORE the action
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-before.png
# Perform the action...
# AFTER the action
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-after.png
```
**Aim for at least one screenshot per test scenario.** More is better — screenshots are the primary evidence that tests were actually run.
**Naming convention:**
```bash
# Examples:
# $RESULTS_DIR/01-login-page-before.png
# $RESULTS_DIR/02-login-page-after.png
# $RESULTS_DIR/03-credits-page-before.png
# $RESULTS_DIR/04-credits-purchase-after.png
# $RESULTS_DIR/05-negative-insufficient-credits.png
# $RESULTS_DIR/06-error-state.png
```
**Minimum requirements:**
- At least TWO screenshots per test scenario (before + after)
- At least ONE screenshot for each negative test case showing the error state
- If a test fails, screenshot the failure state AND any error logs visible in the UI
## Step 6: Show results to user with screenshots
@@ -384,26 +515,29 @@ Format the output like this:
---
```
After showing all screenshots, output a summary table:
After showing all screenshots, output a **detailed** summary table:
| # | Scenario | Result |
|---|----------|--------|
| 1 | {name} | PASS/FAIL |
| 2 | ... | ... |
| # | Scenario | Result | API Evidence | Screenshot Evidence |
|---|----------|--------|-------------|-------------------|
| 1 | {name} | PASS/FAIL | Before: X, After: Y | 01-before.png, 02-after.png |
| 2 | ... | ... | ... | ... |
**IMPORTANT:** As you show each screenshot and record test results, persist them in shell variables for Step 7:
```bash
# Build these variables during Step 6 — they are required by Step 7's script
# NOTE: declare -A requires Bash 4.0+. This is standard on modern systems (macOS ships zsh
# but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
# plain variable with a lookup function instead.
declare -A SCREENSHOT_EXPLANATIONS=(
["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
# ... one entry per screenshot, using the same explanations you showed the user above
)
TEST_RESULTS_TABLE="| 1 | Login flow | PASS |
| 2 | Builder block addition | PASS |
| 3 | Copilot chat | FAIL |"
TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
| 2 | Credits purchase | PASS | Before: 100, After: 95 | 03-credits-before.png, 04-credits-after.png |
| 3 | Insufficient credits (negative) | PASS | Credits: 0, rejected | 05-insufficient-credits-error.png |"
# ... one row per test scenario with actual results
```
@@ -411,6 +545,8 @@ TEST_RESULTS_TABLE="| 1 | Login flow | PASS |
Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees), then post a comment with inline images and per-screenshot explanations.
**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
@@ -418,12 +554,29 @@ SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
# Step 1: Create blobs for each screenshot and build tree JSON
# Retry each blob upload up to 3 times. If still failing, list them at end of report.
shopt -s nullglob
SCREENSHOT_FILES=("$RESULTS_DIR"/*.png)
if [ ${#SCREENSHOT_FILES[@]} -eq 0 ]; then
echo "ERROR: No screenshots found in $RESULTS_DIR. Test run is incomplete."
exit 1
fi
TREE_JSON='['
FIRST=true
for img in $RESULTS_DIR/*.png; do
FAILED_UPLOADS=()
for img in "${SCREENSHOT_FILES[@]}"; do
BASENAME=$(basename "$img")
B64=$(base64 < "$img")
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
BLOB_SHA=""
for attempt in 1 2 3; do
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha' 2>/dev/null || true)
[ -n "$BLOB_SHA" ] && break
sleep 1
done
if [ -z "$BLOB_SHA" ]; then
FAILED_UPLOADS+=("$img")
continue
fi
if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
done
@@ -447,13 +600,25 @@ Then post the comment with **inline images AND explanations for each screenshot*
```bash
REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
# Build image markdown using SCREENSHOT_EXPLANATIONS and TEST_RESULTS_TABLE from Step 6
# Build image markdown using uploaded image URLs; skip FAILED_UPLOADS (listed separately)
IMAGE_MARKDOWN=""
for img in $RESULTS_DIR/*.png; do
for img in "${SCREENSHOT_FILES[@]}"; do
BASENAME=$(basename "$img")
TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
# Skip images that failed to upload — they will be listed at the end
IS_FAILED=false
for failed in "${FAILED_UPLOADS[@]}"; do
[ "$(basename "$failed")" = "$BASENAME" ] && IS_FAILED=true && break
done
if [ "$IS_FAILED" = true ]; then
continue
fi
EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
if [ -z "$EXPLANATION" ]; then
echo "ERROR: Missing screenshot explanation for $BASENAME. Add it to SCREENSHOT_EXPLANATIONS in Step 6."
exit 1
fi
IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
### ${TITLE}
![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
@@ -463,14 +628,32 @@ done
# Write comment body to file to avoid shell interpretation issues with special characters
COMMENT_FILE=$(mktemp)
cat > "$COMMENT_FILE" <<INNEREOF
## 🧪 E2E Test Report
# If any uploads failed, append a section listing them with instructions
FAILED_SECTION=""
if [ ${#FAILED_UPLOADS[@]} -gt 0 ]; then
FAILED_SECTION="
## ⚠️ Failed Screenshot Uploads
The following screenshots could not be uploaded via the GitHub API after 3 retries.
**To add them:** drag-and-drop or paste these files into a PR comment manually:
"
for failed in "${FAILED_UPLOADS[@]}"; do
FAILED_SECTION="${FAILED_SECTION}
- \`$(basename "$failed")\` (local path: \`$failed\`)"
done
FAILED_SECTION="${FAILED_SECTION}
| # | Scenario | Result |
|---|----------|--------|
**Run status:** INCOMPLETE until the files above are manually attached and visible inline in the PR."
fi
cat > "$COMMENT_FILE" <<INNEREOF
## E2E Test Report
| # | Scenario | Result | API Evidence | Screenshot Evidence |
|---|----------|--------|-------------|-------------------|
${TEST_RESULTS_TABLE}
${IMAGE_MARKDOWN}
${FAILED_SECTION}
INNEREOF
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
@@ -478,39 +661,58 @@ rm -f "$COMMENT_FILE"
```
**The PR comment MUST include:**
1. A summary table of all scenarios with PASS/FAIL
2. Every screenshot rendered inline (not just linked)
1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
3. A 1-2 sentence explanation below each screenshot describing what it proves
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
## Fix mode (--fix flag)
When `--fix` is present, after finding a bug:
When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
1. Identify the root cause in the code
2. Fix it in the worktree
3. Rebuild the affected service: `cd $PLATFORM_DIR && docker compose up --build -d {service_name}`
4. Re-test the scenario
5. If fix works, commit and push:
### Fix protocol for EVERY issue found (including UX issues):
1. **Identify** the root cause in the code — read the relevant source files
2. **Write a failing test first** (TDD): For backend bugs, write a test marked with `pytest.mark.xfail(reason="...")`. For frontend/Playwright bugs, write a test with `.fixme` annotation. Run it to confirm it fails as expected.
3. **Screenshot** the broken state: `agent-browser screenshot $RESULTS_DIR/{NN}-broken-{description}.png`
4. **Fix** the code in the worktree
5. **Rebuild** ONLY the affected service (not the whole stack):
```bash
cd $PLATFORM_DIR && docker compose up --build -d {service_name}
# e.g., docker compose up --build -d rest_server
# e.g., docker compose up --build -d frontend
```
6. **Wait** for the service to be ready (poll a health endpoint; see the sketch after this list)
7. **Re-test** the same scenario
8. **Screenshot** the fixed state: `agent-browser screenshot $RESULTS_DIR/{NN}-fixed-{description}.png`
9. **Remove the xfail/fixme marker** from the test written in step 2, and verify it passes
10. **Verify** the fix did not break other scenarios (run a quick smoke test)
11. **Commit and push** immediately:
```bash
cd $WORKTREE_PATH
git add -A
git commit -m "fix: {description of fix}"
git push
```
6. Continue testing remaining scenarios
7. After all fixes, run the full test suite again to ensure no regressions
12. **Continue** to the next test scenario
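A minimal readiness poll for step 6 (the endpoint and port are assumptions; use whatever health check the rebuilt service exposes):
```bash
# Wait up to ~2 minutes for the rebuilt service to answer its health endpoint
for i in $(seq 1 60); do
  curl -fsS http://localhost:8006/health >/dev/null 2>&1 && { echo "Service ready"; break; }
  sleep 2
done
```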
### Fix loop (like pr-address)
```text
test scenario → find bug → fix code → rebuild service → re-test
→ repeat until all scenarios pass
→ commit + push all fixes
→ run full re-test to verify
test scenario → find issue (bug OR UX problem) → screenshot broken state
→ fix code → rebuild affected service only → re-test → screenshot fixed state
→ verify no regressions → commit + push
→ repeat for next scenario
→ after ALL scenarios pass, run full re-test to verify everything together
```
**Key differences from non-fix mode:**
- UX issues count as bugs — fix them (bad alignment, confusing labels, missing loading states)
- Every fix MUST have a before/after screenshot pair proving it works
- Commit after EACH fix, not in a batch at the end
- The final re-test must produce a clean set of all-passing screenshots
## Known issues and workarounds
### Problem: "Database error finding user" on signup

View File

@@ -0,0 +1,195 @@
---
name: setup-repo
description: Initialize a worktree-based repo layout for parallel development. Creates a main worktree, a reviews worktree for PR reviews, and N numbered work branches. Handles .env creation, dependency installation, and branchlet config. TRIGGER when user asks to set up the repo from scratch, initialize worktrees, bootstrap their dev environment, "setup repo", "setup worktrees", "initialize dev environment", "set up branches", or when a freshly cloned repo has no sibling worktrees.
user-invocable: true
args: "No arguments — interactive setup via prompts."
metadata:
author: autogpt-team
version: "1.0.0"
---
# Repository Setup
This skill sets up a worktree-based development layout from a freshly cloned repo. It creates:
- A **main** worktree (the primary checkout)
- A **reviews** worktree (for PR reviews)
- **N work branches** (branch1..branchN) for parallel development
## Step 1: Identify the repo
Determine the repo root and parent directory:
```bash
ROOT=$(git rev-parse --show-toplevel)
REPO_NAME=$(basename "$ROOT")
PARENT=$(dirname "$ROOT")
```
Detect if the repo is already inside a worktree layout by counting sibling worktrees (not just checking the directory name, which could be anything):
```bash
# Count worktrees that are siblings (live under $PARENT but aren't $ROOT itself)
SIBLING_COUNT=$(git worktree list --porcelain 2>/dev/null | grep "^worktree " | grep -c "$PARENT/" || true)
if [ "$SIBLING_COUNT" -gt 1 ]; then
echo "INFO: Existing worktree layout detected at $PARENT ($SIBLING_COUNT worktrees)"
# Use $ROOT as-is; skip renaming/restructuring
else
echo "INFO: Fresh clone detected, proceeding with setup"
fi
```
## Step 2: Ask the user questions
Use AskUserQuestion to gather setup preferences:
1. **How many parallel work branches do you need?** (Options: 4, 8, 16, or custom)
- These become `branch1` through `branchN`
2. **Which branch should be the base?** (Options: origin/master, origin/dev, or custom)
- All work branches and reviews will start from this
## Step 3: Fetch and set up branches
```bash
cd "$ROOT"
git fetch origin
# Create the reviews branch from base (skip if already exists)
if git show-ref --verify --quiet refs/heads/reviews; then
echo "INFO: Branch 'reviews' already exists, skipping"
else
git branch reviews <base-branch>
fi
# Create numbered work branches from base (skip if already exists)
for i in $(seq 1 "$COUNT"); do
if git show-ref --verify --quiet "refs/heads/branch$i"; then
echo "INFO: Branch 'branch$i' already exists, skipping"
else
git branch "branch$i" <base-branch>
fi
done
```
## Step 4: Create worktrees
Create worktrees as siblings to the main checkout:
```bash
if [ -d "$PARENT/reviews" ]; then
echo "INFO: Worktree '$PARENT/reviews' already exists, skipping"
else
git worktree add "$PARENT/reviews" reviews
fi
for i in $(seq 1 "$COUNT"); do
if [ -d "$PARENT/branch$i" ]; then
echo "INFO: Worktree '$PARENT/branch$i' already exists, skipping"
else
git worktree add "$PARENT/branch$i" "branch$i"
fi
done
```
## Step 5: Set up environment files
**Do NOT assume .env files exist.** For each worktree (including main if needed):
1. Check if `.env` exists in the source worktree for each path
2. If `.env` exists, copy it
3. If only `.env.default` or `.env.example` exists, copy that as `.env`
4. If neither exists, warn the user and list which env files are missing
Env file locations to check (same as the `/worktree` skill — keep these in sync):
- `autogpt_platform/.env`
- `autogpt_platform/backend/.env`
- `autogpt_platform/frontend/.env`
> **Note:** This env copying logic intentionally mirrors the `/worktree` skill's approach. If you update the path list or fallback logic here, update `/worktree` as well.
```bash
SOURCE="$ROOT"
WORKTREES="reviews"
for i in $(seq 1 "$COUNT"); do WORKTREES="$WORKTREES branch$i"; done
FOUND_ANY_ENV=0
for wt in $WORKTREES; do
TARGET="$PARENT/$wt"
for envpath in autogpt_platform autogpt_platform/backend autogpt_platform/frontend; do
if [ -f "$SOURCE/$envpath/.env" ]; then
FOUND_ANY_ENV=1
cp "$SOURCE/$envpath/.env" "$TARGET/$envpath/.env"
elif [ -f "$SOURCE/$envpath/.env.default" ]; then
FOUND_ANY_ENV=1
cp "$SOURCE/$envpath/.env.default" "$TARGET/$envpath/.env"
echo "NOTE: $wt/$envpath/.env was created from .env.default — you may need to edit it"
elif [ -f "$SOURCE/$envpath/.env.example" ]; then
FOUND_ANY_ENV=1
cp "$SOURCE/$envpath/.env.example" "$TARGET/$envpath/.env"
echo "NOTE: $wt/$envpath/.env was created from .env.example — you may need to edit it"
else
echo "WARNING: No .env, .env.default, or .env.example found at $SOURCE/$envpath/"
fi
done
done
if [ "$FOUND_ANY_ENV" -eq 0 ]; then
echo "WARNING: No environment files or templates were found in the source worktree."
# Use AskUserQuestion to confirm: "Continue setup without env files?"
# If the user declines, stop here and let them set up .env files first.
fi
```
## Step 6: Copy branchlet config
Copy `.branchlet.json` from main to each worktree so branchlet can manage sub-worktrees:
```bash
if [ -f "$ROOT/.branchlet.json" ]; then
for wt in $WORKTREES; do
cp "$ROOT/.branchlet.json" "$PARENT/$wt/.branchlet.json"
done
fi
```
## Step 7: Install dependencies
Install deps in all worktrees. Run these sequentially per worktree:
```bash
for wt in $WORKTREES; do
TARGET="$PARENT/$wt"
echo "=== Installing deps for $wt ==="
(cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install) &&
(cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate) &&
(cd "$TARGET/autogpt_platform/frontend" && pnpm install) &&
echo "=== Done: $wt ===" ||
echo "=== FAILED: $wt ==="
done
```
This is slow. Run in background if possible and notify when complete.
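One way to background it (a sketch; the log path is arbitrary):
```bash
# Wrap the Step 7 loop in a function, background it, and log output
install_all() {
  for wt in $WORKTREES; do
    TARGET="$PARENT/$wt"
    (cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install)
    (cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate)
    (cd "$TARGET/autogpt_platform/frontend" && pnpm install)
  done
}
install_all > /tmp/worktree-install.log 2>&1 &
echo "Installing in background; follow with: tail -f /tmp/worktree-install.log"
```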
## Step 8: Verify and report
After setup, verify and report to the user:
```bash
git worktree list
```
Summarize:
- Number of worktrees created
- Which env files were copied vs created from defaults vs missing
- Any warnings or errors encountered
## Final directory layout
```
parent/
main/ # Primary checkout (already exists)
reviews/ # PR review worktree
branch1/ # Work branch 1
branch2/ # Work branch 2
...
branchN/ # Work branch N
```

View File

@@ -1,8 +1,12 @@
<!-- Clearly explain the need for these changes: -->
### Why / What / How
<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
<!-- What: What does this PR change? Summarize the changes at a high level. -->
<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request: -->
<!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. -->
### Checklist 📋

View File

@@ -1,6 +1,6 @@
# AutoGPT Platform Contribution Guide
This guide provides context for Codex when updating the **autogpt_platform** folder.
This guide provides context for coding agents when updating the **autogpt_platform** folder.
## Directory overview

1
CLAUDE.md Normal file
View File

@@ -0,0 +1 @@
@AGENTS.md

View File

@@ -83,13 +83,13 @@ The AutoGPT frontend is where users interact with our powerful AI automation pla
**Agent Builder:** For those who want to customize, our intuitive, low-code interface allows you to design and configure your own AI agents.
**Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.
**Deployment Controls:** Manage the lifecycle of your agents, from testing to production.
**Ready-to-Use Agents:** Don't want to build? Simply select from our library of pre-configured agents and put them to work immediately.
**Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.
**Monitoring and Analytics:** Keep track of your agents' performance and gain insights to continually improve your automation processes.

120
autogpt_platform/AGENTS.md Normal file
View File

@@ -0,0 +1,120 @@
# AutoGPT Platform
This file provides guidance to coding agents when working with code in this repository.
## Repository Overview
AutoGPT Platform is a monorepo containing:
- **Backend** (`backend`): Python FastAPI server with async support
- **Frontend** (`frontend`): Next.js React application
- **Shared Libraries** (`autogpt_libs`): Common Python utilities
## Component Documentation
- **Backend**: See @backend/AGENTS.md for backend-specific commands, architecture, and development tasks
- **Frontend**: See @frontend/AGENTS.md for frontend-specific commands, architecture, and development patterns
## Key Concepts
1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
3. **Integrations**: OAuth and API connections stored per user
4. **Store**: Marketplace for sharing agent templates
5. **Virus Scanning**: ClamAV integration for file upload security
### Environment Configuration
#### Configuration Files
- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
#### Docker Environment Loading Order
1. `.env.default` files provide base configuration (tracked in git)
2. `.env` files provide user-specific overrides (gitignored)
3. Docker Compose `environment:` sections provide service-specific overrides
4. Shell environment variables have highest precedence
#### Key Points
- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
- The `env_file` directive loads variables INTO containers at runtime
- Backend/Frontend services use YAML anchors for consistent configuration
- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
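To verify the effective result of this precedence for a running container (illustrative; `rest_server` is one example service name):
```bash
# Dump the environment a container actually received after all override layers
docker compose exec rest_server env | sort | head -40
```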
### Branching Strategy
- **`dev`** is the main development branch. All PRs should target `dev`.
- **`master`** is the production branch. Only used for production releases.
### Creating Pull Requests
- Create the PR against the `dev` branch of the repository.
- **Split PRs by concern** — each PR should have a single clear purpose. For example, "usage tracking" and "credit charging" should be separate PRs even if related. Combining multiple concerns makes it harder for reviewers to understand what belongs to what.
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
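A sketch of the backend flow (the test path is illustrative):
```bash
# 1. Write the xfail-marked test and confirm it XFAILs (fails for the right reason)
poetry run pytest path/to/test_new_behavior.py -q
# 2. Implement the fix, then remove the xfail marker
# 3. Confirm the test passes and the wider suite still does
poetry run pytest path/to/test_new_behavior.py -q && poetry run pytest -q
```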
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
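For inline review threads grouped by resolution state, a GraphQL sketch (the field selection is an assumption; trim to what you need):
```bash
gh api graphql -F owner=Significant-Gravitas -F repo=AutoGPT -F pr={N} -f query='
  query($owner: String!, $repo: String!, $pr: Int!) {
    repository(owner: $owner, name: $repo) {
      pullRequest(number: $pr) {
        reviewThreads(first: 50) {
          nodes { isResolved path comments(first: 10) { nodes { body } } }
        }
      }
    }
  }'
```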
### Conventional Commits
Use this format for commit messages and Pull Request titles:
**Conventional Commit Types:**
- `feat`: Introduces a new feature to the codebase
- `fix`: Patches a bug in the codebase
- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
- `ci`: Changes to CI configuration
- `docs`: Documentation-only changes
- `dx`: Improvements to the developer experience
**Recommended Base Scopes:**
- `platform`: Changes affecting both frontend and backend
- `frontend`
- `backend`
- `infra`
- `blocks`: Modifications/additions of individual blocks
**Subscope Examples:**
- `backend/executor`
- `backend/db`
- `frontend/builder` (includes changes to the block UI component)
- `infra/prod`
Use these scopes and subscopes for clarity and consistency in commit messages.

View File

@@ -1,118 +1 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Overview
AutoGPT Platform is a monorepo containing:
- **Backend** (`backend`): Python FastAPI server with async support
- **Frontend** (`frontend`): Next.js React application
- **Shared Libraries** (`autogpt_libs`): Common Python utilities
## Component Documentation
- **Backend**: See @backend/CLAUDE.md for backend-specific commands, architecture, and development tasks
- **Frontend**: See @frontend/CLAUDE.md for frontend-specific commands, architecture, and development patterns
## Key Concepts
1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
3. **Integrations**: OAuth and API connections stored per user
4. **Store**: Marketplace for sharing agent templates
5. **Virus Scanning**: ClamAV integration for file upload security
### Environment Configuration
#### Configuration Files
- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
#### Docker Environment Loading Order
1. `.env.default` files provide base configuration (tracked in git)
2. `.env` files provide user-specific overrides (gitignored)
3. Docker Compose `environment:` sections provide service-specific overrides
4. Shell environment variables have highest precedence
#### Key Points
- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
- The `env_file` directive loads variables INTO containers at runtime
- Backend/Frontend services use YAML anchors for consistent configuration
- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
### Branching Strategy
- **`dev`** is the main development branch. All PRs should target `dev`.
- **`master`** is the production branch. Only used for production releases.
### Creating Pull Requests
- Create the PR against the `dev` branch of the repository.
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits
Use this format for commit messages and Pull Request titles:
**Conventional Commit Types:**
- `feat`: Introduces a new feature to the codebase
- `fix`: Patches a bug in the codebase
- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
- `ci`: Changes to CI configuration
- `docs`: Documentation-only changes
- `dx`: Improvements to the developer experience
**Recommended Base Scopes:**
- `platform`: Changes affecting both frontend and backend
- `frontend`
- `backend`
- `infra`
- `blocks`: Modifications/additions of individual blocks
**Subscope Examples:**
- `backend/executor`
- `backend/db`
- `frontend/builder` (includes changes to the block UI component)
- `infra/prod`
Use these scopes and subscopes for clarity and consistency in commit messages.
@AGENTS.md

View File

@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.2.1 and should not be changed by hand.
[[package]]
name = "annotated-doc"
@@ -67,7 +67,7 @@ description = "Backport of asyncio.Runner, a context manager that controls event
optional = false
python-versions = "<3.11,>=3.8"
groups = ["dev"]
markers = "python_version < \"3.11\""
markers = "python_version == \"3.10\""
files = [
{file = "backports_asyncio_runner-1.2.0-py3-none-any.whl", hash = "sha256:0da0a936a8aeb554eccb426dc55af3ba63bcdc69fa1a600b5bb305413a4477b5"},
{file = "backports_asyncio_runner-1.2.0.tar.gz", hash = "sha256:a5aa7b2b7d8f8bfcaa2b57313f70792df84e32a2a746f585213373f900b42162"},
@@ -541,7 +541,7 @@ description = "Backport of PEP 654 (exception groups)"
optional = false
python-versions = ">=3.7"
groups = ["main", "dev"]
markers = "python_version < \"3.11\""
markers = "python_version == \"3.10\""
files = [
{file = "exceptiongroup-1.3.0-py3-none-any.whl", hash = "sha256:4d111e6e0c13d0644cad6ddaa7ed0261a0b36971f6d23e7ec9b4b9097da78a10"},
{file = "exceptiongroup-1.3.0.tar.gz", hash = "sha256:b241f5885f560bc56a59ee63ca4c6a8bfa46ae4ad651af316d4e81817bb9fd88"},
@@ -2181,14 +2181,14 @@ testing = ["coverage (>=6.2)", "hypothesis (>=5.7.1)"]
[[package]]
name = "pytest-cov"
version = "7.0.0"
version = "7.1.0"
description = "Pytest plugin for measuring coverage."
optional = false
python-versions = ">=3.9"
groups = ["dev"]
files = [
{file = "pytest_cov-7.0.0-py3-none-any.whl", hash = "sha256:3b8e9558b16cc1479da72058bdecf8073661c7f57f7d3c5f22a1c23507f2d861"},
{file = "pytest_cov-7.0.0.tar.gz", hash = "sha256:33c97eda2e049a0c5298e91f519302a1334c26ac65c1a483d6206fd458361af1"},
{file = "pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678"},
{file = "pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2"},
]
[package.dependencies]
@@ -2342,30 +2342,30 @@ pyasn1 = ">=0.1.3"
[[package]]
name = "ruff"
version = "0.15.0"
version = "0.15.7"
description = "An extremely fast Python linter and code formatter, written in Rust."
optional = false
python-versions = ">=3.7"
groups = ["dev"]
files = [
{file = "ruff-0.15.0-py3-none-linux_armv6l.whl", hash = "sha256:aac4ebaa612a82b23d45964586f24ae9bc23ca101919f5590bdb368d74ad5455"},
{file = "ruff-0.15.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:dcd4be7cc75cfbbca24a98d04d0b9b36a270d0833241f776b788d59f4142b14d"},
{file = "ruff-0.15.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d747e3319b2bce179c7c1eaad3d884dc0a199b5f4d5187620530adf9105268ce"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:650bd9c56ae03102c51a5e4b554d74d825ff3abe4db22b90fd32d816c2e90621"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a6664b7eac559e3048223a2da77769c2f92b43a6dfd4720cef42654299a599c9"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6f811f97b0f092b35320d1556f3353bf238763420ade5d9e62ebd2b73f2ff179"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:761ec0a66680fab6454236635a39abaf14198818c8cdf691e036f4bc0f406b2d"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:940f11c2604d317e797b289f4f9f3fa5555ffe4fb574b55ed006c3d9b6f0eb78"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcbca3d40558789126da91d7ef9a7c87772ee107033db7191edefa34e2c7f1b4"},
{file = "ruff-0.15.0-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9a121a96db1d75fa3eb39c4539e607f628920dd72ff1f7c5ee4f1b768ac62d6e"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5298d518e493061f2eabd4abd067c7e4fb89e2f63291c94332e35631c07c3662"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:afb6e603d6375ff0d6b0cee563fa21ab570fd15e65c852cb24922cef25050cf1"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:77e515f6b15f828b94dc17d2b4ace334c9ddb7d9468c54b2f9ed2b9c1593ef16"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6f6e80850a01eb13b3e42ee0ebdf6e4497151b48c35051aab51c101266d187a3"},
{file = "ruff-0.15.0-py3-none-win32.whl", hash = "sha256:238a717ef803e501b6d51e0bdd0d2c6e8513fe9eec14002445134d3907cd46c3"},
{file = "ruff-0.15.0-py3-none-win_amd64.whl", hash = "sha256:dd5e4d3301dc01de614da3cdffc33d4b1b96fb89e45721f1598e5532ccf78b18"},
{file = "ruff-0.15.0-py3-none-win_arm64.whl", hash = "sha256:c480d632cc0ca3f0727acac8b7d053542d9e114a462a145d0b00e7cd658c515a"},
{file = "ruff-0.15.0.tar.gz", hash = "sha256:6bdea47cdbea30d40f8f8d7d69c0854ba7c15420ec75a26f463290949d7f7e9a"},
{file = "ruff-0.15.7-py3-none-linux_armv6l.whl", hash = "sha256:a81cc5b6910fb7dfc7c32d20652e50fa05963f6e13ead3c5915c41ac5d16668e"},
{file = "ruff-0.15.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:722d165bd52403f3bdabc0ce9e41fc47070ac56d7a91b4e0d097b516a53a3477"},
{file = "ruff-0.15.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:7fbc2448094262552146cbe1b9643a92f66559d3761f1ad0656d4991491af49e"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b39329b60eba44156d138275323cc726bbfbddcec3063da57caa8a8b1d50adf"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:87768c151808505f2bfc93ae44e5f9e7c8518943e5074f76ac21558ef5627c85"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fb0511670002c6c529ec66c0e30641c976c8963de26a113f3a30456b702468b0"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0d19644f801849229db8345180a71bee5407b429dd217f853ec515e968a6912"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4806d8e09ef5e84eb19ba833d0442f7e300b23fe3f0981cae159a248a10f0036"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dce0896488562f09a27b9c91b1f58a097457143931f3c4d519690dea54e624c5"},
{file = "ruff-0.15.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:1852ce241d2bc89e5dc823e03cff4ce73d816b5c6cdadd27dbfe7b03217d2a12"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5f3e4b221fb4bd293f79912fc5e93a9063ebd6d0dcbd528f91b89172a9b8436c"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:b15e48602c9c1d9bdc504b472e90b90c97dc7d46c7028011ae67f3861ceba7b4"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1b4705e0e85cedc74b0a23cf6a179dbb3df184cb227761979cc76c0440b5ab0d"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:112c1fa316a558bb34319282c1200a8bf0495f1b735aeb78bfcb2991e6087580"},
{file = "ruff-0.15.7-py3-none-win32.whl", hash = "sha256:6d39e2d3505b082323352f733599f28169d12e891f7dd407f2d4f54b4c2886de"},
{file = "ruff-0.15.7-py3-none-win_amd64.whl", hash = "sha256:4d53d712ddebcd7dace1bc395367aec12c057aacfe9adbb6d832302575f4d3a1"},
{file = "ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2"},
{file = "ruff-0.15.7.tar.gz", hash = "sha256:04f1ae61fc20fe0b148617c324d9d009b5f63412c0b16474f3d5f1a1a665f7ac"},
]
[[package]]
@@ -2564,7 +2564,7 @@ description = "A lil' TOML parser"
optional = false
python-versions = ">=3.8"
groups = ["dev"]
markers = "python_version < \"3.11\""
markers = "python_version == \"3.10\""
files = [
{file = "tomli-2.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678e4fa69e4575eb77d103de3df8a895e1591b48e740211bd1067378c69e8249"},
{file = "tomli-2.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:023aa114dd824ade0100497eb2318602af309e5a55595f76b626d6d9f3b7b0a6"},
@@ -2912,4 +2912,4 @@ type = ["pytest-mypy"]
[metadata]
lock-version = "2.1"
python-versions = ">=3.10,<4.0"
content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"
content-hash = "e0936a065565550afed18f6298b7e04e814b44100def7049f1a0d68662624a39"

View File

@@ -26,8 +26,8 @@ pyright = "^1.1.408"
pytest = "^8.4.1"
pytest-asyncio = "^1.3.0"
pytest-mock = "^3.15.1"
pytest-cov = "^7.0.0"
ruff = "^0.15.0"
pytest-cov = "^7.1.0"
ruff = "^0.15.7"
[build-system]
requires = ["poetry-core"]

View File

@@ -178,6 +178,7 @@ SMTP_USERNAME=
SMTP_PASSWORD=
# Business & Marketing Tools
AGENTMAIL_API_KEY=
APOLLO_API_KEY=
ENRICHLAYER_API_KEY=
AYRSHARE_API_KEY=

View File

@@ -0,0 +1,227 @@
# Backend
This file provides guidance to coding agents when working with the backend.
## Essential Commands
To run something with Python package dependencies you MUST use `poetry run ...`.
```bash
# Install dependencies
poetry install
# Run database migrations
poetry run prisma migrate dev
# Start all services (database, redis, rabbitmq, clamav)
docker compose up -d
# Run the backend as a whole
poetry run app
# Run tests
poetry run test
# Run specific test
poetry run pytest path/to/test_file.py::test_function_name
# Run block tests (tests that validate all blocks work correctly)
poetry run pytest backend/blocks/test/test_block.py -xvs
# Run tests for a specific block (e.g., GetCurrentTimeBlock)
poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
# Lint and format
# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
poetry run format # Black + isort
poetry run lint # ruff
```
More details can be found in @TESTING.md
### Creating/Updating Snapshots
When you first write a test or when the expected output changes:
```bash
poetry run pytest path/to/test.py --snapshot-update
```
⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
## Architecture
- **API Layer**: FastAPI with REST and WebSocket endpoints
- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
- **Queue System**: RabbitMQ for async task processing
- **Execution Engine**: Separate executor service processes agent workflows
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
## Code Style
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **Absolute imports** — use `from backend.module import ...` for cross-package imports. Single-dot relative (`from .sibling import ...`) is acceptable for sibling modules within the same package (e.g., blocks). Avoid double-dot relative imports (`from ..parent import ...`) — use the absolute path instead
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting (see the sketch after this list)
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
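Several of these conventions compose naturally in a single function. A minimal illustrative sketch (hypothetical names, not code from this repo):
```python
import logging

logger = logging.getLogger(__name__)


def remaining_credits(balance: int, cost: int) -> int:
    # Early return: guard clause first, no deep nesting.
    if cost <= 0:
        # Deferred %s interpolation in debug statements.
        logger.debug("Non-positive cost %s, nothing to charge", cost)
        return balance
    # f-string outside debug for readability.
    logger.info(f"Charging {cost} credits")
    # max(0, value) guard: the computed result must never go negative.
    return max(0, balance - cost)
```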
## Testing Approach
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined** (see the sketch after this list)
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
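A self-contained sketch of the `AsyncMock` rule (the `fetch_email`/`handler` names are hypothetical):
```python
import asyncio
from unittest.mock import AsyncMock

# AsyncMock stands in for an async function: calling it returns an awaitable.
fetch_email = AsyncMock(return_value="user@example.com")


async def handler() -> str:
    return await fetch_email("user-123")


assert asyncio.run(handler()) == "user@example.com"
fetch_email.assert_awaited_once_with("user-123")
```
When patching, combine this with the mock-at-boundaries rule: target the module that imported the symbol, e.g. `mocker.patch("<routes_module>.get_user_email_by_id", new_callable=AsyncMock)`, as the admin route tests elsewhere in this diff do.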
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
import pytest

# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT

# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs

# 3. Implement the fix

# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):
- `User`: Authentication and profile data
- `AgentGraph`: Workflow definitions with version control
- `AgentGraphExecution`: Execution history and results
- `AgentNode`: Individual nodes in a workflow
- `StoreListing`: Marketplace listings for sharing agents
## Environment Configuration
- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
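A minimal sketch of that layering, assuming a python-dotenv-style loader (the backend's actual settings loader may differ):
```python
import os

from dotenv import load_dotenv

load_dotenv(".env")          # user overrides
load_dotenv(".env.default")  # defaults fill in anything still unset

# load_dotenv() never overwrites keys already in os.environ (override=False
# by default), so precedence is: shell env > .env > .env.default.
print(os.environ.get("SMTP_USERNAME", ""))
```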
## Common Development Tasks
### Adding a new block
Follow the comprehensive [Block SDK Guide](@../../docs/platform/block-sdk-guide.md) which covers:
- Provider configuration with `ProviderBuilder`
- Block schema definition
- Authentication (API keys, OAuth, webhooks)
- Testing and validation
- File organization
Quick steps:
1. Create new file in `backend/blocks/`
2. Configure provider using `ProviderBuilder` in `_config.py`
3. Inherit from `Block` base class
4. Define input/output schemas using `BlockSchema`
5. Implement async `run` method
6. Generate unique block ID using `uuid.uuid4()`
7. Test with `poetry run pytest backend/blocks/test/test_block.py`
Note: when creating many new blocks, analyze each block's interface and consider whether they would work well together in a graph-based editor or struggle to connect productively (e.g., do the inputs and outputs tie together well?).
If you get pushback or hit complex block conditions, check the new_blocks guide in the docs. A sketch of the resulting block shape follows below.
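A hedged outline of the block shape the quick steps above produce (exact import paths and field helpers should be checked against the Block SDK Guide; the block itself is illustrative):
```python
from backend.data.block import Block, BlockOutput, BlockSchema
from backend.data.model import SchemaField


class GreetBlock(Block):
    """Illustrative block: emits a greeting for a given name."""

    class Input(BlockSchema):
        name: str = SchemaField(description="Who to greet")

    class Output(BlockSchema):
        greeting: str = SchemaField(description="The generated greeting")

    def __init__(self):
        super().__init__(
            # Generate once with uuid.uuid4() and paste the literal here.
            id="00000000-0000-4000-8000-000000000000",
            description="Greets the given name",
            input_schema=GreetBlock.Input,
            output_schema=GreetBlock.Output,
        )

    async def run(self, input_data: Input, **kwargs) -> BlockOutput:
        # Blocks are async generators: yield (output_name, value) pairs.
        yield "greeting", f"Hello, {input_data.name}!"
```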
#### Handling files in blocks with `store_media_file()`
When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
| Format | Use When | Returns |
|--------|----------|---------|
| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
**Examples:**
```python
# INPUT: Need to process file locally with ffmpeg
local_path = await store_media_file(
    file=input_data.video,
    execution_context=execution_context,
    return_format="for_local_processing",
)
# local_path = "video.mp4" - use with Path/ffmpeg/etc

# INPUT: Need to send to external API like Replicate
image_b64 = await store_media_file(
    file=input_data.image,
    execution_context=execution_context,
    return_format="for_external_api",
)
# image_b64 = "data:image/png;base64,iVBORw0..." - send to API

# OUTPUT: Returning result from block
result_url = await store_media_file(
    file=generated_image_url,
    execution_context=execution_context,
    return_format="for_block_output",
)
yield "image_url", result_url
# In CoPilot: result_url = "workspace://abc123"
# In graphs: result_url = "data:image/png;base64,..."
```
**Key points:**
- `for_block_output` is the ONLY format that auto-adapts to execution context
- Always use `for_block_output` for block outputs unless you have a specific reason not to
- Never hardcode workspace checks - let `for_block_output` handle it
### Modifying the API
1. Update route in `backend/api/features/`
2. Add/update Pydantic models in same directory
3. Write tests alongside the route file
4. Run `poetry run test` to verify
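In practice, steps 1-3 look roughly like the sketch below (hypothetical `widgets` feature; real routes in `backend/api/features/` also wire in auth via `Security()` dependencies):
```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/widgets", tags=["widgets"])


class WidgetResponse(BaseModel):
    id: str
    name: str


@router.get("/{widget_id}", response_model=WidgetResponse)
async def get_widget(widget_id: str) -> WidgetResponse:
    # Hypothetical handler; a colocated widgets_test.py would exercise it
    # with fastapi.testclient.TestClient, as the test files in this diff do.
    return WidgetResponse(id=widget_id, name="example")
```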
## Workspace & Media Files
**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
- Working on CoPilot file upload/download features
- Building blocks that handle `MediaFileType` inputs/outputs
- Modifying `WorkspaceManager` or `store_media_file()`
- Debugging file persistence or virus scanning issues
Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
## Security Implementation
### Cache Protection Middleware
- Located in `backend/api/middleware/security.py`
- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
- Uses an allow list approach - only explicitly permitted paths can be cached
- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
- Applied to both main API server and external API applications
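A hedged sketch of how a default-deny allow-list middleware like this can be structured (the real `CACHEABLE_PATHS` and wiring live in `backend/api/middleware/security.py`; this is not a copy of it):
```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

CACHEABLE_PATHS = ("/static/", "/_next/static/", "/health")


class CacheProtectionMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        response = await call_next(request)
        # Default-deny: everything is uncacheable unless explicitly allowed.
        if not request.url.path.startswith(CACHEABLE_PATHS):
            response.headers["Cache-Control"] = (
                "no-store, no-cache, must-revalidate, private"
            )
        return response
```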

View File

@@ -1,226 +1 @@
# CLAUDE.md - Backend
This file provides guidance to Claude Code when working with the backend.
## Essential Commands
To run something with Python package dependencies you MUST use `poetry run ...`.
```bash
# Install dependencies
poetry install
# Run database migrations
poetry run prisma migrate dev
# Start all services (database, redis, rabbitmq, clamav)
docker compose up -d
# Run the backend as a whole
poetry run app
# Run tests
poetry run test
# Run specific test
poetry run pytest path/to/test_file.py::test_function_name
# Run block tests (tests that validate all blocks work correctly)
poetry run pytest backend/blocks/test/test_block.py -xvs
# Run tests for a specific block (e.g., GetCurrentTimeBlock)
poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
# Lint and format
# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
poetry run format # Black + isort
poetry run lint # ruff
```
More details can be found in @TESTING.md
### Creating/Updating Snapshots
When you first write a test or when the expected output changes:
```bash
poetry run pytest path/to/test.py --snapshot-update
```
⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
## Architecture
- **API Layer**: FastAPI with REST and WebSocket endpoints
- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
- **Queue System**: RabbitMQ for async task processing
- **Execution Engine**: Separate executor service processes agent workflows
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
## Code Style
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
## Testing Approach
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT

# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs

# 3. Implement the fix

# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):
- `User`: Authentication and profile data
- `AgentGraph`: Workflow definitions with version control
- `AgentGraphExecution`: Execution history and results
- `AgentNode`: Individual nodes in a workflow
- `StoreListing`: Marketplace listings for sharing agents
## Environment Configuration
- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
## Common Development Tasks
### Adding a new block
Follow the comprehensive [Block SDK Guide](@../../docs/content/platform/block-sdk-guide.md) which covers:
- Provider configuration with `ProviderBuilder`
- Block schema definition
- Authentication (API keys, OAuth, webhooks)
- Testing and validation
- File organization
Quick steps:
1. Create new file in `backend/blocks/`
2. Configure provider using `ProviderBuilder` in `_config.py`
3. Inherit from `Block` base class
4. Define input/output schemas using `BlockSchema`
5. Implement async `run` method
6. Generate unique block ID using `uuid.uuid4()`
7. Test with `poetry run pytest backend/blocks/test/test_block.py`
Note: when creating many new blocks, analyze each block's interface and consider whether they would work well together in a graph-based editor or struggle to connect productively (e.g., do the inputs and outputs tie together well?).
If you get pushback or hit complex block conditions, check the new_blocks guide in the docs.
#### Handling files in blocks with `store_media_file()`
When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
| Format | Use When | Returns |
|--------|----------|---------|
| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
**Examples:**
```python
# INPUT: Need to process file locally with ffmpeg
local_path = await store_media_file(
    file=input_data.video,
    execution_context=execution_context,
    return_format="for_local_processing",
)
# local_path = "video.mp4" - use with Path/ffmpeg/etc

# INPUT: Need to send to external API like Replicate
image_b64 = await store_media_file(
    file=input_data.image,
    execution_context=execution_context,
    return_format="for_external_api",
)
# image_b64 = "data:image/png;base64,iVBORw0..." - send to API

# OUTPUT: Returning result from block
result_url = await store_media_file(
    file=generated_image_url,
    execution_context=execution_context,
    return_format="for_block_output",
)
yield "image_url", result_url
# In CoPilot: result_url = "workspace://abc123"
# In graphs: result_url = "data:image/png;base64,..."
```
**Key points:**
- `for_block_output` is the ONLY format that auto-adapts to execution context
- Always use `for_block_output` for block outputs unless you have a specific reason not to
- Never hardcode workspace checks - let `for_block_output` handle it
### Modifying the API
1. Update route in `backend/api/features/`
2. Add/update Pydantic models in same directory
3. Write tests alongside the route file
4. Run `poetry run test` to verify
## Workspace & Media Files
**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
- Working on CoPilot file upload/download features
- Building blocks that handle `MediaFileType` inputs/outputs
- Modifying `WorkspaceManager` or `store_media_file()`
- Debugging file persistence or virus scanning issues
Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
## Security Implementation
### Cache Protection Middleware
- Located in `backend/api/middleware/security.py`
- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
- Uses an allow list approach - only explicitly permitted paths can be cached
- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
- Applied to both main API server and external API applications
@AGENTS.md

View File

@@ -18,14 +18,22 @@ from pydantic import BaseModel, Field, SecretStr
from backend.api.external.middleware import require_permission
from backend.api.features.integrations.models import get_all_provider_names
from backend.api.features.integrations.router import (
CredentialsMetaResponse,
to_meta_response,
)
from backend.data.auth.base import APIAuthorizationInfo
from backend.data.model import (
APIKeyCredentials,
Credentials,
CredentialsType,
HostScopedCredentials,
OAuth2Credentials,
UserPasswordCredentials,
is_sdk_default,
)
from backend.integrations.credentials_store import (
is_system_credential,
provider_matches,
)
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
@@ -91,18 +99,6 @@ class OAuthCompleteResponse(BaseModel):
)
class CredentialSummary(BaseModel):
"""Summary of a credential without sensitive data."""
id: str
provider: str
type: CredentialsType
title: Optional[str] = None
scopes: Optional[list[str]] = None
username: Optional[str] = None
host: Optional[str] = None
class ProviderInfo(BaseModel):
"""Information about an integration provider."""
@@ -473,12 +469,12 @@ async def complete_oauth(
)
@integrations_router.get("/credentials", response_model=list[CredentialSummary])
@integrations_router.get("/credentials", response_model=list[CredentialsMetaResponse])
async def list_credentials(
auth: APIAuthorizationInfo = Security(
require_permission(APIKeyPermission.READ_INTEGRATIONS)
),
) -> list[CredentialSummary]:
) -> list[CredentialsMetaResponse]:
"""
List all credentials for the authenticated user.
@@ -486,28 +482,19 @@ async def list_credentials(
"""
credentials = await creds_manager.store.get_all_creds(auth.user_id)
return [
CredentialSummary(
id=cred.id,
provider=cred.provider,
type=cred.type,
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=cred.host if isinstance(cred, HostScopedCredentials) else None,
)
for cred in credentials
to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
]
@integrations_router.get(
"/{provider}/credentials", response_model=list[CredentialSummary]
"/{provider}/credentials", response_model=list[CredentialsMetaResponse]
)
async def list_credentials_by_provider(
provider: Annotated[str, Path(title="The provider to list credentials for")],
auth: APIAuthorizationInfo = Security(
require_permission(APIKeyPermission.READ_INTEGRATIONS)
),
) -> list[CredentialSummary]:
) -> list[CredentialsMetaResponse]:
"""
List credentials for a specific provider.
"""
@@ -515,16 +502,7 @@ async def list_credentials_by_provider(
auth.user_id, provider
)
return [
CredentialSummary(
id=cred.id,
provider=cred.provider,
type=cred.type,
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=cred.host if isinstance(cred, HostScopedCredentials) else None,
)
for cred in credentials
to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
]
@@ -597,11 +575,11 @@ async def create_credential(
# Store credentials
try:
await creds_manager.create(auth.user_id, credentials)
except Exception as e:
logger.error(f"Failed to store credentials: {e}")
except Exception:
logger.exception("Failed to store credentials")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to store credentials: {str(e)}",
detail="Failed to store credentials",
)
logger.info(f"Created {request.type} credentials for provider {provider}")
@@ -639,15 +617,23 @@ async def delete_credential(
use the main API's delete endpoint which handles webhook cleanup and
token revocation.
"""
if is_sdk_default(cred_id):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if is_system_credential(cred_id):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="System-managed credentials cannot be deleted",
)
creds = await creds_manager.store.get_creds_by_id(auth.user_id, cred_id)
if not creds:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if creds.provider != provider:
if not provider_matches(creds.provider, provider):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Credentials do not match the specified provider",
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
await creds_manager.delete(auth.user_id, cred_id)

View File

@@ -72,7 +72,7 @@ class RunAgentRequest(BaseModel):
def _create_ephemeral_session(user_id: str) -> ChatSession:
"""Create an ephemeral session for stateless API requests."""
return ChatSession.new(user_id)
return ChatSession.new(user_id, dry_run=False)
@tools_router.post(

View File

@@ -0,0 +1,85 @@
import logging
import typing
from datetime import datetime
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, Query, Security
from pydantic import BaseModel
from backend.data.platform_cost import (
CostLogRow,
PlatformCostDashboard,
get_platform_cost_dashboard,
get_platform_cost_logs,
)
from backend.util.models import Pagination
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["platform-cost", "admin"],
dependencies=[Security(requires_admin_user)],
)
class PlatformCostLogsResponse(BaseModel):
logs: list[CostLogRow]
pagination: Pagination
@router.get(
"/platform_costs/dashboard",
response_model=PlatformCostDashboard,
summary="Get Platform Cost Dashboard",
)
async def get_cost_dashboard(
admin_user_id: str = Security(get_user_id),
start: typing.Optional[datetime] = Query(None),
end: typing.Optional[datetime] = Query(None),
provider: typing.Optional[str] = Query(None),
user_id: typing.Optional[str] = Query(None),
):
logger.info(f"Admin {admin_user_id} fetching platform cost dashboard")
return await get_platform_cost_dashboard(
start=start,
end=end,
provider=provider,
user_id=user_id,
)
@router.get(
"/platform_costs/logs",
response_model=PlatformCostLogsResponse,
summary="Get Platform Cost Logs",
)
async def get_cost_logs(
admin_user_id: str = Security(get_user_id),
start: typing.Optional[datetime] = Query(None),
end: typing.Optional[datetime] = Query(None),
provider: typing.Optional[str] = Query(None),
user_id: typing.Optional[str] = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
logger.info(f"Admin {admin_user_id} fetching platform cost logs")
logs, total = await get_platform_cost_logs(
start=start,
end=end,
provider=provider,
user_id=user_id,
page=page,
page_size=page_size,
)
total_pages = (total + page_size - 1) // page_size
return PlatformCostLogsResponse(
logs=logs,
pagination=Pagination(
total_items=total,
total_pages=total_pages,
current_page=page,
page_size=page_size,
),
)

View File

@@ -0,0 +1,135 @@
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from .platform_cost_routes import router as platform_cost_router
app = fastapi.FastAPI()
app.include_router(platform_cost_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def test_get_dashboard_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mock_dashboard = AsyncMock(
return_value=AsyncMock(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
model_dump=lambda **_: {
"by_provider": [],
"by_user": [],
"total_cost_microdollars": 0,
"total_requests": 0,
"total_users": 0,
},
)
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get("/admin/platform_costs/dashboard")
assert response.status_code == 200
data = response.json()
assert "by_provider" in data
assert "by_user" in data
assert data["total_cost_microdollars"] == 0
def test_get_logs_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get("/admin/platform_costs/logs")
assert response.status_code == 200
data = response.json()
assert data["logs"] == []
assert data["pagination"]["total_items"] == 0
def test_get_dashboard_with_filters(
mocker: pytest_mock.MockerFixture,
) -> None:
mock_dashboard = AsyncMock(
return_value=AsyncMock(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
model_dump=lambda **_: {
"by_provider": [],
"by_user": [],
"total_cost_microdollars": 0,
"total_requests": 0,
"total_users": 0,
},
)
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get(
"/admin/platform_costs/dashboard",
params={
"start": "2026-01-01T00:00:00",
"end": "2026-04-01T00:00:00",
"provider": "openai",
"user_id": "test-user-123",
},
)
assert response.status_code == 200
mock_dashboard.assert_called_once()
call_kwargs = mock_dashboard.call_args.kwargs
assert call_kwargs["provider"] == "openai"
assert call_kwargs["user_id"] == "test-user-123"
assert call_kwargs["start"] is not None
assert call_kwargs["end"] is not None
def test_get_logs_with_pagination(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get(
"/admin/platform_costs/logs",
params={"page": 2, "page_size": 25, "provider": "anthropic"},
)
assert response.status_code == 200
data = response.json()
assert data["pagination"]["current_page"] == 2
assert data["pagination"]["page_size"] == 25
def test_get_dashboard_requires_admin() -> None:
app.dependency_overrides.clear()
response = client.get("/admin/platform_costs/dashboard")
assert response.status_code in (401, 403)

View File

@@ -0,0 +1,253 @@
"""Admin endpoints for checking and resetting user CoPilot rate limit usage."""
import logging
from typing import Optional
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, Body, HTTPException, Security
from pydantic import BaseModel
from backend.copilot.config import ChatConfig
from backend.copilot.rate_limit import (
SubscriptionTier,
get_global_rate_limits,
get_usage_status,
get_user_tier,
reset_user_usage,
set_user_tier,
)
from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
logger = logging.getLogger(__name__)
config = ChatConfig()
router = APIRouter(
prefix="/admin",
tags=["copilot", "admin"],
dependencies=[Security(requires_admin_user)],
)
class UserRateLimitResponse(BaseModel):
user_id: str
user_email: Optional[str] = None
daily_token_limit: int
weekly_token_limit: int
daily_tokens_used: int
weekly_tokens_used: int
tier: SubscriptionTier
class UserTierResponse(BaseModel):
user_id: str
tier: SubscriptionTier
class SetUserTierRequest(BaseModel):
user_id: str
tier: SubscriptionTier
async def _resolve_user_id(
user_id: Optional[str], email: Optional[str]
) -> tuple[str, Optional[str]]:
"""Resolve a user_id and email from the provided parameters.
Returns (user_id, email). Accepts either user_id or email; at least one
must be provided. When both are provided, ``email`` takes precedence.
"""
if email:
user = await get_user_by_email(email)
if not user:
raise HTTPException(
status_code=404, detail="No user found with the provided email."
)
return user.id, email
if not user_id:
raise HTTPException(
status_code=400,
detail="Either user_id or email query parameter is required.",
)
# We have a user_id; try to look up their email for display purposes.
# This is non-critical -- a failure should not block the response.
try:
resolved_email = await get_user_email_by_id(user_id)
except Exception:
logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
resolved_email = None
return user_id, resolved_email
@router.get(
"/rate_limit",
response_model=UserRateLimitResponse,
summary="Get User Rate Limit",
)
async def get_user_rate_limit(
user_id: Optional[str] = None,
email: Optional[str] = None,
admin_user_id: str = Security(get_user_id),
) -> UserRateLimitResponse:
"""Get a user's current usage and effective rate limits. Admin-only.
Accepts either ``user_id`` or ``email`` as a query parameter.
When ``email`` is provided the user is looked up by email first.
"""
resolved_id, resolved_email = await _resolve_user_id(user_id, email)
logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
daily_limit, weekly_limit, tier = await get_global_rate_limits(
resolved_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
return UserRateLimitResponse(
user_id=resolved_id,
user_email=resolved_email,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@router.post(
"/rate_limit/reset",
response_model=UserRateLimitResponse,
summary="Reset User Rate Limit Usage",
)
async def reset_user_rate_limit(
user_id: str = Body(embed=True),
reset_weekly: bool = Body(False, embed=True),
admin_user_id: str = Security(get_user_id),
) -> UserRateLimitResponse:
"""Reset a user's daily usage counter (and optionally weekly). Admin-only."""
logger.info(
"Admin %s resetting rate limit for user %s (reset_weekly=%s)",
admin_user_id,
user_id,
reset_weekly,
)
try:
await reset_user_usage(user_id, reset_weekly=reset_weekly)
except Exception as e:
logger.exception("Failed to reset user usage")
raise HTTPException(status_code=500, detail="Failed to reset usage") from e
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
try:
resolved_email = await get_user_email_by_id(user_id)
except Exception:
logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
resolved_email = None
return UserRateLimitResponse(
user_id=user_id,
user_email=resolved_email,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@router.get(
"/rate_limit/tier",
response_model=UserTierResponse,
summary="Get User Rate Limit Tier",
)
async def get_user_rate_limit_tier(
user_id: str,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Get a user's current rate-limit tier. Admin-only.
Returns 404 if the user does not exist in the database.
"""
logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
resolved_email = await get_user_email_by_id(user_id)
if resolved_email is None:
raise HTTPException(status_code=404, detail=f"User {user_id} not found")
tier = await get_user_tier(user_id)
return UserTierResponse(user_id=user_id, tier=tier)
@router.post(
"/rate_limit/tier",
response_model=UserTierResponse,
summary="Set User Rate Limit Tier",
)
async def set_user_rate_limit_tier(
request: SetUserTierRequest,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Set a user's rate-limit tier. Admin-only."""
old_tier = await get_user_tier(request.user_id)
# Resolve email for audit logging (non-blocking — don't fail the
# tier change if email lookup fails).
try:
resolved_email = await get_user_email_by_id(request.user_id)
except Exception:
logger.warning(
"Failed to resolve email for user %s", request.user_id, exc_info=True
)
resolved_email = None
logger.info(
"Admin %s changing tier for user %s (%s): %s -> %s",
admin_user_id,
request.user_id,
resolved_email or "unknown",
old_tier.value,
request.tier.value,
)
try:
await set_user_tier(request.user_id, request.tier)
except Exception as e:
logger.exception("Failed to set user tier")
raise HTTPException(status_code=500, detail="Failed to set tier") from e
return UserTierResponse(user_id=request.user_id, tier=request.tier)
class UserSearchResult(BaseModel):
user_id: str
user_email: Optional[str] = None
@router.get(
"/rate_limit/search_users",
response_model=list[UserSearchResult],
summary="Search Users by Name or Email",
)
async def admin_search_users(
query: str,
limit: int = 20,
admin_user_id: str = Security(get_user_id),
) -> list[UserSearchResult]:
"""Search users by partial email or name. Admin-only.
Queries the User table directly — returns results even for users
without credit transaction history.
"""
if len(query.strip()) < 3:
raise HTTPException(
status_code=400,
detail="Search query must be at least 3 characters.",
)
logger.info("Admin %s searching users with query=%r", admin_user_id, query)
results = await search_users(query, limit=max(1, min(limit, 50)))
return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]

View File

@@ -0,0 +1,557 @@
import json
from types import SimpleNamespace
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from pytest_snapshot.plugin import Snapshot
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from .rate_limit_admin_routes import router as rate_limit_admin_router
app = fastapi.FastAPI()
app.include_router(rate_limit_admin_router)
client = fastapi.testclient.TestClient(app)
_MOCK_MODULE = "backend.api.features.admin.rate_limit_admin_routes"
_TARGET_EMAIL = "target@example.com"
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _mock_usage_status(
daily_used: int = 500_000, weekly_used: int = 3_000_000
) -> CoPilotUsageStatus:
from datetime import UTC, datetime, timedelta
now = datetime.now(UTC)
return CoPilotUsageStatus(
daily=UsageWindow(
used=daily_used, limit=2_500_000, resets_at=now + timedelta(hours=6)
),
weekly=UsageWindow(
used=weekly_used, limit=12_500_000, resets_at=now + timedelta(days=3)
),
)
def _patch_rate_limit_deps(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
daily_used: int = 500_000,
weekly_used: int = 3_000_000,
):
"""Patch the common rate-limit + user-lookup dependencies."""
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
new_callable=AsyncMock,
return_value=_mock_usage_status(daily_used=daily_used, weekly_used=weekly_used),
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
def test_get_rate_limit(
mocker: pytest_mock.MockerFixture,
configured_snapshot: Snapshot,
target_user_id: str,
) -> None:
"""Test getting rate limit and usage for a user."""
_patch_rate_limit_deps(mocker, target_user_id)
response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] == _TARGET_EMAIL
assert data["daily_token_limit"] == 2_500_000
assert data["weekly_token_limit"] == 12_500_000
assert data["daily_tokens_used"] == 500_000
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
"get_rate_limit",
)
def test_get_rate_limit_by_email(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test looking up rate limits via email instead of user_id."""
_patch_rate_limit_deps(mocker, target_user_id)
mock_user = SimpleNamespace(id=target_user_id, email=_TARGET_EMAIL)
mocker.patch(
f"{_MOCK_MODULE}.get_user_by_email",
new_callable=AsyncMock,
return_value=mock_user,
)
response = client.get("/admin/rate_limit", params={"email": _TARGET_EMAIL})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] == _TARGET_EMAIL
assert data["daily_token_limit"] == 2_500_000
def test_get_rate_limit_by_email_not_found(
mocker: pytest_mock.MockerFixture,
) -> None:
"""Test that looking up a non-existent email returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_by_email",
new_callable=AsyncMock,
return_value=None,
)
response = client.get("/admin/rate_limit", params={"email": "nobody@example.com"})
assert response.status_code == 404
def test_get_rate_limit_no_params() -> None:
"""Test that omitting both user_id and email returns 400."""
response = client.get("/admin/rate_limit")
assert response.status_code == 400
def test_reset_user_usage_daily_only(
mocker: pytest_mock.MockerFixture,
configured_snapshot: Snapshot,
target_user_id: str,
) -> None:
"""Test resetting only daily usage (default behaviour)."""
mock_reset = mocker.patch(
f"{_MOCK_MODULE}.reset_user_usage",
new_callable=AsyncMock,
)
_patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=3_000_000)
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": target_user_id},
)
assert response.status_code == 200
data = response.json()
assert data["daily_tokens_used"] == 0
# Weekly is untouched
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
"reset_user_usage_daily_only",
)
def test_reset_user_usage_daily_and_weekly(
mocker: pytest_mock.MockerFixture,
configured_snapshot: Snapshot,
target_user_id: str,
) -> None:
"""Test resetting both daily and weekly usage."""
mock_reset = mocker.patch(
f"{_MOCK_MODULE}.reset_user_usage",
new_callable=AsyncMock,
)
_patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=0)
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": target_user_id, "reset_weekly": True},
)
assert response.status_code == 200
data = response.json()
assert data["daily_tokens_used"] == 0
assert data["weekly_tokens_used"] == 0
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
"reset_user_usage_daily_and_weekly",
)
def test_reset_user_usage_redis_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that Redis failure on reset returns 500."""
mocker.patch(
f"{_MOCK_MODULE}.reset_user_usage",
new_callable=AsyncMock,
side_effect=Exception("Redis connection refused"),
)
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": target_user_id},
)
assert response.status_code == 500
def test_get_rate_limit_email_lookup_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that failing to resolve a user email degrades gracefully."""
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
new_callable=AsyncMock,
return_value=_mock_usage_status(),
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
side_effect=Exception("DB connection lost"),
)
response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] is None
def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
"""Test that rate limit admin endpoints require admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit", params={"user_id": "test"})
assert response.status_code == 403
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": "test"},
)
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Tier management endpoints
# ---------------------------------------------------------------------------
def test_get_user_tier(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test getting a user's rate-limit tier."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
)
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "PRO"
def test_get_user_tier_user_not_found(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that getting tier for a non-existent user returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=None,
)
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
assert response.status_code == 404
def test_set_user_tier(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test setting a user's rate-limit tier (upgrade)."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "ENTERPRISE"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "ENTERPRISE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
def test_set_user_tier_downgrade(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test downgrading a user's tier from PRO to FREE."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "FREE"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "FREE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
def test_set_user_tier_invalid_tier(
target_user_id: str,
) -> None:
"""Test that setting an invalid tier returns 422."""
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "invalid"},
)
assert response.status_code == 422
def test_set_user_tier_invalid_tier_uppercase(
target_user_id: str,
) -> None:
"""Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
Regression: ensures Pydantic enum validation rejects values that are not
members of SubscriptionTier, even when they look like valid enum names.
"""
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "INVALID"},
)
assert response.status_code == 422
body = response.json()
assert "detail" in body
def test_set_user_tier_email_lookup_failure_non_blocking(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that email lookup failure doesn't block tier change."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
side_effect=Exception("DB connection failed"),
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 200
mock_set.assert_awaited_once()
def test_set_user_tier_db_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that DB failure on set tier returns 500."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
side_effect=Exception("DB connection refused"),
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 500
def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
"""Test that tier admin endpoints require admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
assert response.status_code == 403
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": "test", "tier": "PRO"},
)
assert response.status_code == 403
# ─── search_users endpoint ──────────────────────────────────────────
def test_search_users_returns_matching_users(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Partial search should return all matching users from the User table."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[
("user-1", "zamil.majdy@gmail.com"),
("user-2", "zamil.majdy@agpt.co"),
],
)
response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
assert response.status_code == 200
results = response.json()
assert len(results) == 2
assert results[0]["user_email"] == "zamil.majdy@gmail.com"
assert results[1]["user_email"] == "zamil.majdy@agpt.co"
def test_search_users_empty_results(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Search with no matches returns empty list."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "nonexistent"}
)
assert response.status_code == 200
assert response.json() == []
def test_search_users_short_query_rejected(
admin_user_id: str,
) -> None:
"""Query shorter than 3 characters should return 400."""
response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
assert response.status_code == 400
def test_search_users_negative_limit_clamped(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Negative limit should be clamped to 1, not passed through."""
mock_search = mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
)
assert response.status_code == 200
mock_search.assert_awaited_once_with("test", limit=1)
def test_search_users_requires_admin_role(mock_jwt_user) -> None:
"""Test that the search_users endpoint requires admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
assert response.status_code == 403


@@ -7,6 +7,8 @@ import fastapi
import fastapi.responses
import prisma.enums
import backend.api.features.library.db as library_db
import backend.api.features.library.model as library_model
import backend.api.features.store.cache as store_cache
import backend.api.features.store.db as store_db
import backend.api.features.store.model as store_model
@@ -132,3 +134,40 @@ async def admin_download_agent_file(
return fastapi.responses.FileResponse(
tmp_file.name, filename=file_name, media_type="application/json"
)
@router.get(
"/submissions/{store_listing_version_id}/preview",
summary="Admin Preview Submission Listing",
)
async def admin_preview_submission(
store_listing_version_id: str,
) -> store_model.StoreAgentDetails:
"""
Preview a marketplace submission as it would appear on the listing page.
Bypasses the APPROVED-only StoreAgent view so admins can preview pending
submissions before approving.
"""
return await store_db.get_store_agent_details_as_admin(store_listing_version_id)
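# Illustrative call (hypothetical ID): GET /admin/submissions/<slv-id>/preview
# returns the same StoreAgentDetails shape the public listing page renders,
# regardless of the submission's approval status.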
@router.post(
"/submissions/{store_listing_version_id}/add-to-library",
summary="Admin Add Pending Agent to Library",
status_code=201,
)
async def admin_add_agent_to_library(
store_listing_version_id: str,
user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
) -> library_model.LibraryAgent:
"""
Add a pending marketplace agent to the admin's library for review.
Uses admin-level access to bypass marketplace APPROVED-only checks.
The builder can load the graph because get_graph() checks library
membership as a fallback: "you added it, you keep it."
"""
return await library_db.add_store_agent_to_library_as_admin(
store_listing_version_id=store_listing_version_id,
user_id=user_id,
)
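# Illustrative call (hypothetical ID): POST /admin/submissions/<slv-id>/add-to-library
# with an admin JWT responds 201 with the new LibraryAgent; subsequent builder
# loads then succeed via the library-membership fallback in get_graph().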


@@ -0,0 +1,335 @@
"""Tests for admin store routes and the bypass logic they depend on.
Tests are organized by what they protect:
- SECRT-2162: get_graph_as_admin bypasses ownership/marketplace checks
- SECRT-2167 security: admin endpoints reject non-admin users
- SECRT-2167 bypass: preview queries StoreListingVersion (not StoreAgent view),
and add-to-library uses get_graph_as_admin (not get_graph)
"""
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock, patch
import fastapi
import fastapi.responses
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from backend.data.graph import get_graph_as_admin
from backend.util.exceptions import NotFoundError
from .store_admin_routes import router as store_admin_router
# Shared constants
ADMIN_USER_ID = "admin-user-id"
CREATOR_USER_ID = "other-creator-id"
GRAPH_ID = "test-graph-id"
GRAPH_VERSION = 3
SLV_ID = "test-store-listing-version-id"
def _make_mock_graph(user_id: str = CREATOR_USER_ID) -> MagicMock:
graph = MagicMock()
graph.userId = user_id
graph.id = GRAPH_ID
graph.version = GRAPH_VERSION
graph.Nodes = []
return graph
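# The camelCase attributes (userId, Nodes) are meant to stand in for the Prisma
# AgentGraph row that get_graph_as_admin reads, so no database is required.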
# ---- SECRT-2162: get_graph_as_admin bypasses ownership checks ---- #
@pytest.mark.asyncio
async def test_admin_can_access_pending_agent_not_owned() -> None:
"""get_graph_as_admin must return a graph even when the admin doesn't own
it and it's not APPROVED in the marketplace."""
mock_graph = _make_mock_graph()
mock_graph_model = MagicMock(name="GraphModel")
with (
patch("backend.data.graph.AgentGraph.prisma") as mock_prisma,
patch(
"backend.data.graph.GraphModel.from_db",
return_value=mock_graph_model,
),
):
mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
result = await get_graph_as_admin(
graph_id=GRAPH_ID,
version=GRAPH_VERSION,
user_id=ADMIN_USER_ID,
for_export=False,
)
assert result is mock_graph_model
@pytest.mark.asyncio
async def test_admin_download_pending_agent_with_subagents() -> None:
"""get_graph_as_admin with for_export=True must call get_sub_graphs
and pass sub_graphs to GraphModel.from_db."""
mock_graph = _make_mock_graph()
mock_sub_graph = MagicMock(name="SubGraph")
mock_graph_model = MagicMock(name="GraphModel")
with (
patch("backend.data.graph.AgentGraph.prisma") as mock_prisma,
patch(
"backend.data.graph.get_sub_graphs",
new_callable=AsyncMock,
return_value=[mock_sub_graph],
) as mock_get_sub,
patch(
"backend.data.graph.GraphModel.from_db",
return_value=mock_graph_model,
) as mock_from_db,
):
mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
result = await get_graph_as_admin(
graph_id=GRAPH_ID,
version=GRAPH_VERSION,
user_id=ADMIN_USER_ID,
for_export=True,
)
assert result is mock_graph_model
mock_get_sub.assert_awaited_once_with(mock_graph)
mock_from_db.assert_called_once_with(
graph=mock_graph,
sub_graphs=[mock_sub_graph],
for_export=True,
)
# ---- SECRT-2167 security: admin endpoints reject non-admin users ---- #
app = fastapi.FastAPI()
app.include_router(store_admin_router)
@app.exception_handler(NotFoundError)
async def _not_found_handler(
request: fastapi.Request, exc: NotFoundError
) -> fastapi.responses.JSONResponse:
return fastapi.responses.JSONResponse(status_code=404, content={"detail": str(exc)})
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all route tests in this module."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def test_preview_requires_admin(mock_jwt_user) -> None:
"""Non-admin users must get 403 on the preview endpoint."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get(f"/admin/submissions/{SLV_ID}/preview")
assert response.status_code == 403
def test_add_to_library_requires_admin(mock_jwt_user) -> None:
"""Non-admin users must get 403 on the add-to-library endpoint."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.post(f"/admin/submissions/{SLV_ID}/add-to-library")
assert response.status_code == 403
def test_preview_nonexistent_submission(
mocker: pytest_mock.MockerFixture,
) -> None:
"""Preview of a nonexistent submission returns 404."""
mocker.patch(
"backend.api.features.admin.store_admin_routes.store_db"
".get_store_agent_details_as_admin",
side_effect=NotFoundError("not found"),
)
response = client.get(f"/admin/submissions/{SLV_ID}/preview")
assert response.status_code == 404
# ---- SECRT-2167 bypass: verify the right data sources are used ---- #
@pytest.mark.asyncio
async def test_preview_queries_store_listing_version_not_store_agent() -> None:
"""get_store_agent_details_as_admin must query StoreListingVersion
directly (not the APPROVED-only StoreAgent view). This is THE test that
prevents the bypass from being accidentally reverted."""
from backend.api.features.store.db import get_store_agent_details_as_admin
mock_slv = MagicMock()
mock_slv.id = SLV_ID
mock_slv.name = "Test Agent"
mock_slv.subHeading = "Short desc"
mock_slv.description = "Long desc"
mock_slv.videoUrl = None
mock_slv.agentOutputDemoUrl = None
mock_slv.imageUrls = ["https://example.com/img.png"]
mock_slv.instructions = None
mock_slv.categories = ["productivity"]
mock_slv.version = 1
mock_slv.agentGraphId = GRAPH_ID
mock_slv.agentGraphVersion = GRAPH_VERSION
mock_slv.updatedAt = datetime(2026, 3, 24, tzinfo=timezone.utc)
mock_slv.recommendedScheduleCron = "0 9 * * *"
mock_listing = MagicMock()
mock_listing.id = "listing-id"
mock_listing.slug = "test-agent"
mock_listing.activeVersionId = SLV_ID
mock_listing.hasApprovedVersion = False
mock_listing.CreatorProfile = MagicMock(username="creator", avatarUrl="")
mock_slv.StoreListing = mock_listing
with (
patch(
"backend.api.features.store.db.prisma.models" ".StoreListingVersion.prisma",
) as mock_slv_prisma,
patch(
"backend.api.features.store.db.prisma.models.StoreAgent.prisma",
) as mock_store_agent_prisma,
):
mock_slv_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
result = await get_store_agent_details_as_admin(SLV_ID)
# Verify it queried StoreListingVersion (not the APPROVED-only StoreAgent)
mock_slv_prisma.return_value.find_unique.assert_awaited_once()
await_args = mock_slv_prisma.return_value.find_unique.await_args
assert await_args is not None
assert await_args.kwargs["where"] == {"id": SLV_ID}
# Verify the APPROVED-only StoreAgent view was NOT touched
mock_store_agent_prisma.assert_not_called()
# Verify the result has the right data
assert result.agent_name == "Test Agent"
assert result.agent_image == ["https://example.com/img.png"]
assert result.has_approved_version is False
assert result.runs == 0
assert result.rating == 0.0
@pytest.mark.asyncio
async def test_resolve_graph_admin_uses_get_graph_as_admin() -> None:
"""resolve_graph_for_library(admin=True) must call get_graph_as_admin,
not get_graph. This is THE test that prevents the add-to-library bypass
from being accidentally reverted."""
from backend.api.features.library._add_to_library import resolve_graph_for_library
mock_slv = MagicMock()
mock_slv.AgentGraph = MagicMock(id=GRAPH_ID, version=GRAPH_VERSION)
mock_graph_model = MagicMock(name="GraphModel")
with (
patch(
"backend.api.features.library._add_to_library.prisma.models"
".StoreListingVersion.prisma",
) as mock_prisma,
patch(
"backend.api.features.library._add_to_library.graph_db"
".get_graph_as_admin",
new_callable=AsyncMock,
return_value=mock_graph_model,
) as mock_admin,
patch(
"backend.api.features.library._add_to_library.graph_db.get_graph",
new_callable=AsyncMock,
) as mock_regular,
):
mock_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
result = await resolve_graph_for_library(SLV_ID, ADMIN_USER_ID, admin=True)
assert result is mock_graph_model
mock_admin.assert_awaited_once_with(
graph_id=GRAPH_ID, version=GRAPH_VERSION, user_id=ADMIN_USER_ID
)
mock_regular.assert_not_awaited()
@pytest.mark.asyncio
async def test_resolve_graph_regular_uses_get_graph() -> None:
"""resolve_graph_for_library(admin=False) must call get_graph,
not get_graph_as_admin. Ensures the non-admin path is preserved."""
from backend.api.features.library._add_to_library import resolve_graph_for_library
mock_slv = MagicMock()
mock_slv.AgentGraph = MagicMock(id=GRAPH_ID, version=GRAPH_VERSION)
mock_graph_model = MagicMock(name="GraphModel")
with (
patch(
"backend.api.features.library._add_to_library.prisma.models"
".StoreListingVersion.prisma",
) as mock_prisma,
patch(
"backend.api.features.library._add_to_library.graph_db"
".get_graph_as_admin",
new_callable=AsyncMock,
) as mock_admin,
patch(
"backend.api.features.library._add_to_library.graph_db.get_graph",
new_callable=AsyncMock,
return_value=mock_graph_model,
) as mock_regular,
):
mock_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
result = await resolve_graph_for_library(SLV_ID, "regular-user-id", admin=False)
assert result is mock_graph_model
mock_regular.assert_awaited_once_with(
graph_id=GRAPH_ID, version=GRAPH_VERSION, user_id="regular-user-id"
)
mock_admin.assert_not_awaited()
# ---- Library membership grants graph access (product decision) ---- #
@pytest.mark.asyncio
async def test_library_member_can_view_pending_agent_in_builder() -> None:
"""After adding a pending agent to their library, the user should be
able to load the graph in the builder via get_graph()."""
mock_graph = _make_mock_graph()
mock_graph_model = MagicMock(name="GraphModel")
mock_library_agent = MagicMock()
mock_library_agent.AgentGraph = mock_graph
with (
patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
patch(
"backend.data.graph.StoreListingVersion.prisma",
) as mock_slv_prisma,
patch("backend.data.graph.LibraryAgent.prisma") as mock_lib_prisma,
patch(
"backend.data.graph.GraphModel.from_db",
return_value=mock_graph_model,
),
):
mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
mock_slv_prisma.return_value.find_first = AsyncMock(return_value=None)
mock_lib_prisma.return_value.find_first = AsyncMock(
return_value=mock_library_agent
)
from backend.data.graph import get_graph
result = await get_graph(
graph_id=GRAPH_ID,
version=GRAPH_VERSION,
user_id=ADMIN_USER_ID,
)
assert result is mock_graph_model, "Library membership should grant graph access"


@@ -4,14 +4,14 @@ import asyncio
import logging
import re
from collections.abc import AsyncGenerator
from typing import Annotated
from typing import Annotated, Literal
from uuid import uuid4
from autogpt_libs import auth
from fastapi import APIRouter, HTTPException, Query, Response, Security
from fastapi.responses import StreamingResponse
from prisma.models import UserWorkspaceFile
from pydantic import BaseModel, Field, field_validator
from pydantic import BaseModel, ConfigDict, Field, field_validator
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
@@ -20,6 +20,7 @@ from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_
from backend.copilot.model import (
ChatMessage,
ChatSession,
ChatSessionMetadata,
append_and_save_message,
create_chat_session,
delete_chat_session,
@@ -30,8 +31,14 @@ from backend.copilot.model import (
from backend.copilot.rate_limit import (
CoPilotUsageStatus,
RateLimitExceeded,
acquire_reset_lock,
check_rate_limit,
get_daily_reset_count,
get_global_rate_limits,
get_usage_status,
increment_daily_reset_count,
release_reset_lock,
reset_daily_usage,
)
from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
from backend.copilot.tools.e2b_sandbox import kill_sandbox
@@ -59,9 +66,16 @@ from backend.copilot.tools.models import (
UnderstandingUpdatedResponse,
)
from backend.copilot.tracking import track_user_message
from backend.data.credit import UsageTransactionMetadata, get_user_credit_model
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
from backend.util.exceptions import NotFoundError
from backend.util.exceptions import InsufficientBalanceError, NotFoundError
from backend.util.settings import Settings
settings = Settings()
logger = logging.getLogger(__name__)
config = ChatConfig()
@@ -69,8 +83,6 @@ _UUID_RE = re.compile(
r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
logger = logging.getLogger(__name__)
async def _validate_and_get_session(
session_id: str,
@@ -99,6 +111,23 @@ class StreamChatRequest(BaseModel):
file_ids: list[str] | None = Field(
default=None, max_length=20
) # Workspace file IDs attached to this message
mode: Literal["fast", "extended_thinking"] | None = Field(
default=None,
description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
"If None, uses the server default (extended_thinking).",
)
class CreateSessionRequest(BaseModel):
"""Request model for creating a new chat session.
``dry_run`` is a **top-level** field — do not nest it inside ``metadata``.
Extra/unknown fields are rejected (422) to prevent silent misuse.
"""
model_config = ConfigDict(extra="forbid")
dry_run: bool = False
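# Illustrative request bodies (hypothetical client): {"dry_run": true} is
# accepted, while {"metadata": {"dry_run": true}} is rejected with 422 by the
# extra="forbid" config above.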
class CreateSessionResponse(BaseModel):
@@ -107,6 +136,7 @@ class CreateSessionResponse(BaseModel):
id: str
created_at: str
user_id: str | None
metadata: ChatSessionMetadata = ChatSessionMetadata()
class ActiveStreamInfo(BaseModel):
@@ -127,6 +157,7 @@ class SessionDetailResponse(BaseModel):
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
total_prompt_tokens: int = 0
total_completion_tokens: int = 0
metadata: ChatSessionMetadata = ChatSessionMetadata()
class SessionSummaryResponse(BaseModel):
@@ -237,6 +268,7 @@ async def list_sessions(
)
async def create_session(
user_id: Annotated[str, Security(auth.get_user_id)],
request: CreateSessionRequest | None = None,
) -> CreateSessionResponse:
"""
Create a new chat session.
@@ -245,22 +277,28 @@ async def create_session(
Args:
user_id: The authenticated user ID parsed from the JWT (required).
request: Optional request body. When provided, ``dry_run=True``
forces run_block and run_agent calls to use dry-run simulation.
Returns:
CreateSessionResponse: Details of the created session.
"""
dry_run = request.dry_run if request else False
logger.info(
f"Creating session with user_id: "
f"...{user_id[-8:] if len(user_id) > 8 else '<redacted>'}"
f"{', dry_run=True' if dry_run else ''}"
)
session = await create_chat_session(user_id)
session = await create_chat_session(user_id, dry_run=dry_run)
return CreateSessionResponse(
id=session.session_id,
created_at=session.started_at.isoformat(),
user_id=session.user_id,
metadata=session.metadata,
)
@@ -409,6 +447,7 @@ async def get_session(
active_stream=active_stream_info,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
metadata=session.metadata,
)
@@ -421,11 +460,193 @@ async def get_copilot_usage(
"""Get CoPilot usage status for the authenticated user.
Returns current token usage vs limits for daily and weekly windows.
Global defaults are sourced from LaunchDarkly (falling back to config).
Includes the user's rate-limit tier.
"""
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
return await get_usage_status(
user_id=user_id,
daily_token_limit=config.daily_token_limit,
weekly_token_limit=config.weekly_token_limit,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
class RateLimitResetResponse(BaseModel):
"""Response from resetting the daily rate limit."""
success: bool
credits_charged: int = Field(description="Credits charged (in cents)")
remaining_balance: int = Field(description="Credit balance after charge (in cents)")
usage: CoPilotUsageStatus = Field(description="Updated usage status after reset")
@router.post(
"/usage/reset",
status_code=200,
responses={
400: {
"description": "Bad Request (feature disabled or daily limit not reached)"
},
402: {"description": "Payment Required (insufficient credits)"},
429: {
"description": "Too Many Requests (max daily resets exceeded or reset in progress)"
},
503: {
"description": "Service Unavailable (Redis reset failed; credits refunded or support needed)"
},
},
)
async def reset_copilot_usage(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> RateLimitResetResponse:
"""Reset the daily CoPilot rate limit by spending credits.
Allows users who have hit their daily token limit to spend credits
to reset their daily usage counter and continue working.
Returns 400 if the feature is disabled or the user is not over the limit.
Returns 402 if the user has insufficient credits.
"""
cost = config.rate_limit_reset_cost
if cost <= 0:
raise HTTPException(
status_code=400,
detail="Rate limit reset is not available.",
)
if not settings.config.enable_credit:
raise HTTPException(
status_code=400,
detail="Rate limit reset is not available (credit system is disabled).",
)
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
if daily_limit <= 0:
raise HTTPException(
status_code=400,
detail="No daily limit is configured — nothing to reset.",
)
# Check max daily resets. get_daily_reset_count returns None when Redis
# is unavailable; reject the reset in that case to prevent unlimited
# free resets when the counter store is down.
reset_count = await get_daily_reset_count(user_id)
if reset_count is None:
raise HTTPException(
status_code=503,
detail="Unable to verify reset eligibility — please try again later.",
)
if config.max_daily_resets > 0 and reset_count >= config.max_daily_resets:
raise HTTPException(
status_code=429,
detail=f"You've used all {config.max_daily_resets} resets for today.",
)
# Acquire a per-user lock to prevent TOCTOU races (concurrent resets).
if not await acquire_reset_lock(user_id):
raise HTTPException(
status_code=429,
detail="A reset is already in progress. Please try again.",
)
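# Sketch of the race the lock closes: two concurrent POST /usage/reset calls
# could both pass the reset-count check above and both charge credits; the
# per-user lock serializes them, so the second caller sees 429 instead.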
try:
# Verify the user is actually at or over their daily limit.
# (rate_limit_reset_cost intentionally omitted — this object is only
# used for limit checks, not returned to the client.)
usage_status = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
tier=tier,
)
if daily_limit > 0 and usage_status.daily.used < daily_limit:
raise HTTPException(
status_code=400,
detail="You have not reached your daily limit yet.",
)
# If the weekly limit is also exhausted, resetting the daily counter
# won't help — the user would still be blocked by the weekly limit.
if weekly_limit > 0 and usage_status.weekly.used >= weekly_limit:
raise HTTPException(
status_code=400,
detail="Your weekly limit is also reached. Resetting the daily limit won't help.",
)
# Charge credits.
credit_model = await get_user_credit_model(user_id)
try:
remaining = await credit_model.spend_credits(
user_id=user_id,
cost=cost,
metadata=UsageTransactionMetadata(
reason="CoPilot daily rate limit reset",
),
)
except InsufficientBalanceError as e:
raise HTTPException(
status_code=402,
detail="Insufficient credits to reset your rate limit.",
) from e
# Reset daily usage in Redis. If this fails, refund the credits
# so the user is not charged for a service they did not receive.
if not await reset_daily_usage(user_id, daily_token_limit=daily_limit):
# Compensate: refund the charged credits.
refunded = False
try:
await credit_model.top_up_credits(user_id, cost)
refunded = True
logger.warning(
"Refunded %d credits to user %s after Redis reset failure",
cost,
user_id[:8],
)
except Exception:
logger.error(
"CRITICAL: Failed to refund %d credits to user %s "
"after Redis reset failure — manual intervention required",
cost,
user_id[:8],
exc_info=True,
)
if refunded:
raise HTTPException(
status_code=503,
detail="Rate limit reset failed — please try again later. "
"Your credits have not been charged.",
)
raise HTTPException(
status_code=503,
detail="Rate limit reset failed and the automatic refund "
"also failed. Please contact support for assistance.",
)
# Track the reset count for daily cap enforcement.
await increment_daily_reset_count(user_id)
finally:
await release_reset_lock(user_id)
# Return updated usage status.
updated_usage = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
return RateLimitResetResponse(
success=True,
credits_charged=cost,
remaining_balance=remaining,
usage=updated_usage,
)
@@ -526,12 +747,16 @@ async def stream_chat_post(
# Pre-turn rate limit check (token-based).
# check_rate_limit short-circuits internally when both limits are 0.
# Global defaults sourced from LaunchDarkly, falling back to config.
if user_id:
try:
daily_limit, weekly_limit, _ = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
await check_rate_limit(
user_id=user_id,
daily_token_limit=config.daily_token_limit,
weekly_token_limit=config.weekly_token_limit,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
)
except RateLimitExceeded as e:
raise HTTPException(status_code=429, detail=str(e)) from e
@@ -620,6 +845,7 @@ async def stream_chat_post(
is_user_message=request.is_user_message,
context=request.context,
file_ids=sanitized_file_ids,
mode=request.mode,
)
setup_time = (time.perf_counter() - stream_start_time) * 1000
@@ -894,6 +1120,47 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Suggested Prompts ==========
class SuggestedTheme(BaseModel):
"""A themed group of suggested prompts."""
name: str
prompts: list[str]
class SuggestedPromptsResponse(BaseModel):
"""Response model for user-specific suggested prompts grouped by theme."""
themes: list[SuggestedTheme]
@router.get(
"/suggested-prompts",
dependencies=[Security(auth.requires_user)],
)
async def get_suggested_prompts(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> SuggestedPromptsResponse:
"""
Get LLM-generated suggested prompts grouped by theme.
Returns personalized quick-action prompts based on the user's
business understanding. Returns an empty themes list if no custom
prompts are available.
"""
understanding = await get_business_understanding(user_id)
if understanding is None or not understanding.suggested_prompts:
return SuggestedPromptsResponse(themes=[])
themes = [
SuggestedTheme(name=name, prompts=prompts)
for name, prompts in understanding.suggested_prompts.items()
]
return SuggestedPromptsResponse(themes=themes)
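# Illustrative response shape:
#   {"themes": [{"name": "Learn", "prompts": ["L1", "L2"]}, ...]}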
# ========== Configuration ==========
@@ -942,7 +1209,7 @@ async def health_check() -> dict:
)
# Create and retrieve session to verify full data layer
session = await create_chat_session(health_check_user_id)
session = await create_chat_session(health_check_user_id, dry_run=False)
await get_chat_session(session.session_id, health_check_user_id)
return {


@@ -1,7 +1,7 @@
"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock
from unittest.mock import AsyncMock, MagicMock
import fastapi
import fastapi.testclient
@@ -9,6 +9,7 @@ import pytest
import pytest_mock
from backend.api.features.chat import routes as chat_routes
from backend.copilot.rate_limit import SubscriptionTier
app = fastapi.FastAPI()
app.include_router(chat_routes.router)
@@ -331,14 +332,28 @@ def _mock_usage(
*,
daily_used: int = 500,
weekly_used: int = 2000,
daily_limit: int = 10000,
weekly_limit: int = 50000,
tier: "SubscriptionTier" = SubscriptionTier.FREE,
) -> AsyncMock:
"""Mock get_usage_status to return a predictable CoPilotUsageStatus."""
"""Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
``get_usage_status`` so that tests exercise the endpoint without hitting
LaunchDarkly or Prisma.
"""
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
mocker.patch(
"backend.api.features.chat.routes.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(daily_limit, weekly_limit, tier),
)
resets_at = datetime.now(UTC) + timedelta(days=1)
status = CoPilotUsageStatus(
daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
)
return mocker.patch(
"backend.api.features.chat.routes.get_usage_status",
@@ -368,6 +383,8 @@ def test_usage_returns_daily_and_weekly(
user_id=test_user_id,
daily_token_limit=10000,
weekly_token_limit=50000,
rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
tier=SubscriptionTier.FREE,
)
@@ -375,11 +392,10 @@ def test_usage_uses_config_limits(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""The endpoint forwards daily_token_limit and weekly_token_limit from config."""
mock_get = _mock_usage(mocker)
"""The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)
mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
response = client.get("/usage")
@@ -388,6 +404,8 @@ def test_usage_uses_config_limits(
user_id=test_user_id,
daily_token_limit=99999,
weekly_token_limit=77777,
rate_limit_reset_cost=500,
tier=SubscriptionTier.FREE,
)
@@ -400,3 +418,126 @@ def test_usage_rejects_unauthenticated_request() -> None:
response = unauthenticated_client.get("/usage")
assert response.status_code == 401
# ─── Suggested prompts endpoint ──────────────────────────────────────
def _mock_get_business_understanding(
mocker: pytest_mock.MockerFixture,
*,
return_value=None,
):
"""Mock get_business_understanding."""
return mocker.patch(
"backend.api.features.chat.routes.get_business_understanding",
new_callable=AsyncMock,
return_value=return_value,
)
def test_suggested_prompts_returns_themes(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with themed prompts gets them back as themes list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = {
"Learn": ["L1", "L2"],
"Create": ["C1"],
}
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
data = response.json()
assert "themes" in data
themes_by_name = {t["name"]: t["prompts"] for t in data["themes"]}
assert themes_by_name["Learn"] == ["L1", "L2"]
assert themes_by_name["Create"] == ["C1"]
def test_suggested_prompts_no_understanding(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with no understanding gets empty themes list."""
_mock_get_business_understanding(mocker, return_value=None)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"themes": []}
def test_suggested_prompts_empty_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding but empty prompts gets empty themes list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = {}
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"themes": []}
# ─── Create session: dry_run contract ─────────────────────────────────
def _mock_create_chat_session(mocker: pytest_mock.MockerFixture):
"""Mock create_chat_session to return a fake session."""
from backend.copilot.model import ChatSession
async def _fake_create(user_id: str, *, dry_run: bool):
return ChatSession.new(user_id, dry_run=dry_run)
return mocker.patch(
"backend.api.features.chat.routes.create_chat_session",
new_callable=AsyncMock,
side_effect=_fake_create,
)
def test_create_session_dry_run_true(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""Sending ``{"dry_run": true}`` sets metadata.dry_run to True."""
_mock_create_chat_session(mocker)
response = client.post("/sessions", json={"dry_run": True})
assert response.status_code == 200
assert response.json()["metadata"]["dry_run"] is True
def test_create_session_dry_run_default_false(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""Empty body defaults dry_run to False."""
_mock_create_chat_session(mocker)
response = client.post("/sessions")
assert response.status_code == 200
assert response.json()["metadata"]["dry_run"] is False
def test_create_session_rejects_nested_metadata(
test_user_id: str,
) -> None:
"""Sending ``{"metadata": {"dry_run": true}}`` must return 422, not silently
default to ``dry_run=False``. This guards against the common mistake of
nesting dry_run inside metadata instead of providing it at the top level."""
response = client.post(
"/sessions",
json={"metadata": {"dry_run": True}},
)
assert response.status_code == 422


@@ -0,0 +1,13 @@
"""Override session-scoped fixtures so unit tests run without the server."""
import pytest
@pytest.fixture(scope="session")
def server():
yield None
@pytest.fixture(scope="session", autouse=True)
def graph_cleanup():
yield
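# These no-op fixtures shadow the heavier session-scoped versions (assumed to
# be defined in a parent conftest), so this package's unit tests run with no
# live server or graph database.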


@@ -34,16 +34,21 @@ from backend.data.model import (
HostScopedCredentials,
OAuth2Credentials,
UserIntegrations,
is_sdk_default,
)
from backend.data.onboarding import OnboardingStep, complete_onboarding_step
from backend.data.user import get_user_integrations
from backend.executor.utils import add_graph_execution
from backend.integrations.ayrshare import AyrshareClient, SocialPlatform
from backend.integrations.credentials_store import provider_matches
from backend.integrations.credentials_store import (
is_system_credential,
provider_matches,
)
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
create_mcp_oauth_handler,
)
from backend.integrations.managed_credentials import ensure_managed_credentials
from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
from backend.integrations.providers import ProviderName
from backend.integrations.webhooks import get_webhook_manager
@@ -109,6 +114,7 @@ class CredentialsMetaResponse(BaseModel):
default=None,
description="Host pattern for host-scoped or MCP server URL for MCP credentials",
)
is_managed: bool = False
@model_validator(mode="before")
@classmethod
@@ -138,6 +144,19 @@ class CredentialsMetaResponse(BaseModel):
return None
def to_meta_response(cred: Credentials) -> CredentialsMetaResponse:
return CredentialsMetaResponse(
id=cred.id,
provider=cred.provider,
type=cred.type,
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=CredentialsMetaResponse.get_host(cred),
is_managed=cred.is_managed,
)
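# Only non-secret metadata crosses this boundary: secret fields such as
# api_key or access_token exist on `cred` but are never copied into the
# response model, which is what the credential-leakage tests assert.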
@router.post("/{provider}/callback", summary="Exchange OAuth code for tokens")
async def callback(
provider: Annotated[
@@ -204,34 +223,20 @@ async def callback(
f"and provider {provider.value}"
)
return CredentialsMetaResponse(
id=credentials.id,
provider=credentials.provider,
type=credentials.type,
title=credentials.title,
scopes=credentials.scopes,
username=credentials.username,
host=(CredentialsMetaResponse.get_host(credentials)),
)
return to_meta_response(credentials)
@router.get("/credentials", summary="List Credentials")
async def list_credentials(
user_id: Annotated[str, Security(get_user_id)],
) -> list[CredentialsMetaResponse]:
# Fire-and-forget: provision missing managed credentials in the background.
# The credential appears on the next page load; listing is never blocked.
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
credentials = await creds_manager.store.get_all_creds(user_id)
return [
CredentialsMetaResponse(
id=cred.id,
provider=cred.provider,
type=cred.type,
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=CredentialsMetaResponse.get_host(cred),
)
for cred in credentials
to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
]
@@ -242,19 +247,11 @@ async def list_credentials_by_provider(
],
user_id: Annotated[str, Security(get_user_id)],
) -> list[CredentialsMetaResponse]:
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
credentials = await creds_manager.store.get_creds_by_provider(user_id, provider)
return [
CredentialsMetaResponse(
id=cred.id,
provider=cred.provider,
type=cred.type,
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=CredentialsMetaResponse.get_host(cred),
)
for cred in credentials
to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
]
@@ -267,18 +264,21 @@ async def get_credential(
],
cred_id: Annotated[str, Path(title="The ID of the credentials to retrieve")],
user_id: Annotated[str, Security(get_user_id)],
) -> Credentials:
) -> CredentialsMetaResponse:
if is_sdk_default(cred_id):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
credential = await creds_manager.get(user_id, cred_id)
if not credential:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if credential.provider != provider:
if not provider_matches(credential.provider, provider):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Credentials do not match the specified provider",
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
return credential
return to_meta_response(credential)
@router.post("/{provider}/credentials", status_code=201, summary="Create Credentials")
@@ -288,16 +288,22 @@ async def create_credentials(
ProviderName, Path(title="The provider to create credentials for")
],
credentials: Credentials,
) -> Credentials:
) -> CredentialsMetaResponse:
if is_sdk_default(credentials.id):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Cannot create credentials with a reserved ID",
)
credentials.provider = provider
try:
await creds_manager.create(user_id, credentials)
except Exception as e:
except Exception:
logger.exception("Failed to store credentials")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to store credentials: {str(e)}",
detail="Failed to store credentials",
)
return credentials
return to_meta_response(credentials)
class CredentialsDeletionResponse(BaseModel):
@@ -332,15 +338,29 @@ async def delete_credentials(
bool, Query(title="Whether to proceed if any linked webhooks are still in use")
] = False,
) -> CredentialsDeletionResponse | CredentialsDeletionNeedsConfirmationResponse:
if is_sdk_default(cred_id):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if is_system_credential(cred_id):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="System-managed credentials cannot be deleted",
)
creds = await creds_manager.store.get_creds_by_id(user_id, cred_id)
if not creds:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if creds.provider != provider:
if not provider_matches(creds.provider, provider):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Credentials do not match the specified provider",
detail="Credentials not found",
)
if creds.is_managed:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="AutoGPT-managed credentials cannot be deleted",
)
try:


@@ -0,0 +1,570 @@
"""Tests for credentials API security: no secret leakage, SDK defaults filtered."""
from contextlib import asynccontextmanager
from unittest.mock import AsyncMock, MagicMock, patch
import fastapi
import fastapi.testclient
import pytest
from pydantic import SecretStr
from backend.api.features.integrations.router import router
from backend.data.model import (
APIKeyCredentials,
HostScopedCredentials,
OAuth2Credentials,
UserPasswordCredentials,
)
app = fastapi.FastAPI()
app.include_router(router)
client = fastapi.testclient.TestClient(app)
TEST_USER_ID = "test-user-id"
def _make_api_key_cred(cred_id: str = "cred-123", provider: str = "openai"):
return APIKeyCredentials(
id=cred_id,
provider=provider,
title="My API Key",
api_key=SecretStr("sk-secret-key-value"),
)
def _make_oauth2_cred(cred_id: str = "cred-456", provider: str = "github"):
return OAuth2Credentials(
id=cred_id,
provider=provider,
title="My OAuth",
access_token=SecretStr("ghp_secret_token"),
refresh_token=SecretStr("ghp_refresh_secret"),
scopes=["repo", "user"],
username="testuser",
)
def _make_user_password_cred(cred_id: str = "cred-789", provider: str = "openai"):
return UserPasswordCredentials(
id=cred_id,
provider=provider,
title="My Login",
username=SecretStr("admin"),
password=SecretStr("s3cret-pass"),
)
def _make_host_scoped_cred(cred_id: str = "cred-host", provider: str = "openai"):
return HostScopedCredentials(
id=cred_id,
provider=provider,
title="Host Cred",
host="https://api.example.com",
headers={"Authorization": SecretStr("Bearer top-secret")},
)
def _make_sdk_default_cred(provider: str = "openai"):
return APIKeyCredentials(
id=f"{provider}-default",
provider=provider,
title=f"{provider} (default)",
api_key=SecretStr("sk-platform-secret-key"),
)
@pytest.fixture(autouse=True)
def setup_auth(mock_jwt_user):
from autogpt_libs.auth.jwt_utils import get_jwt_payload
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
yield
app.dependency_overrides.clear()
class TestGetCredentialReturnsMetaOnly:
"""GET /{provider}/credentials/{cred_id} must not return secrets."""
def test_api_key_credential_no_secret(self):
cred = _make_api_key_cred()
with (
patch.object(router, "dependencies", []),
patch("backend.api.features.integrations.router.creds_manager") as mock_mgr,
):
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.get("/openai/credentials/cred-123")
assert resp.status_code == 200
data = resp.json()
assert data["id"] == "cred-123"
assert data["provider"] == "openai"
assert data["type"] == "api_key"
assert "api_key" not in data
assert "sk-secret-key-value" not in str(data)
def test_oauth2_credential_no_secret(self):
cred = _make_oauth2_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.get("/github/credentials/cred-456")
assert resp.status_code == 200
data = resp.json()
assert data["id"] == "cred-456"
assert data["scopes"] == ["repo", "user"]
assert data["username"] == "testuser"
assert "access_token" not in data
assert "refresh_token" not in data
assert "ghp_" not in str(data)
def test_user_password_credential_no_secret(self):
cred = _make_user_password_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.get("/openai/credentials/cred-789")
assert resp.status_code == 200
data = resp.json()
assert data["id"] == "cred-789"
assert "password" not in data
assert "username" not in data or data["username"] is None
assert "s3cret-pass" not in str(data)
assert "admin" not in str(data)
def test_host_scoped_credential_no_secret(self):
cred = _make_host_scoped_cred()
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.get("/openai/credentials/cred-host")
assert resp.status_code == 200
data = resp.json()
assert data["id"] == "cred-host"
assert data["host"] == "https://api.example.com"
assert "headers" not in data
assert "top-secret" not in str(data)
def test_get_credential_wrong_provider_returns_404(self):
"""Provider mismatch should return generic 404, not leak credential existence."""
cred = _make_api_key_cred(provider="openai")
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock(return_value=cred)
resp = client.get("/github/credentials/cred-123")
assert resp.status_code == 404
assert resp.json()["detail"] == "Credentials not found"
def test_list_credentials_no_secrets(self):
"""List endpoint must not leak secrets in any credential."""
creds = [_make_api_key_cred(), _make_oauth2_cred()]
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_all_creds = AsyncMock(return_value=creds)
resp = client.get("/credentials")
assert resp.status_code == 200
raw = str(resp.json())
assert "sk-secret-key-value" not in raw
assert "ghp_secret_token" not in raw
assert "ghp_refresh_secret" not in raw
class TestSdkDefaultCredentialsNotAccessible:
"""SDK default credentials (ID ending in '-default') must be hidden."""
def test_get_sdk_default_returns_404(self):
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.get = AsyncMock()
resp = client.get("/openai/credentials/openai-default")
assert resp.status_code == 404
mock_mgr.get.assert_not_called()
def test_list_credentials_excludes_sdk_defaults(self):
user_cred = _make_api_key_cred()
sdk_cred = _make_sdk_default_cred("openai")
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_all_creds = AsyncMock(return_value=[user_cred, sdk_cred])
resp = client.get("/credentials")
assert resp.status_code == 200
data = resp.json()
ids = [c["id"] for c in data]
assert "cred-123" in ids
assert "openai-default" not in ids
def test_list_by_provider_excludes_sdk_defaults(self):
user_cred = _make_api_key_cred()
sdk_cred = _make_sdk_default_cred("openai")
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_creds_by_provider = AsyncMock(
return_value=[user_cred, sdk_cred]
)
resp = client.get("/openai/credentials")
assert resp.status_code == 200
data = resp.json()
ids = [c["id"] for c in data]
assert "cred-123" in ids
assert "openai-default" not in ids
def test_delete_sdk_default_returns_404(self):
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_creds_by_id = AsyncMock()
resp = client.request("DELETE", "/openai/credentials/openai-default")
assert resp.status_code == 404
mock_mgr.store.get_creds_by_id.assert_not_called()
class TestCreateCredentialNoSecretInResponse:
"""POST /{provider}/credentials must not return secrets."""
def test_create_api_key_no_secret_in_response(self):
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.create = AsyncMock()
resp = client.post(
"/openai/credentials",
json={
"id": "new-cred",
"provider": "openai",
"type": "api_key",
"title": "New Key",
"api_key": "sk-newsecret",
},
)
assert resp.status_code == 201
data = resp.json()
assert data["id"] == "new-cred"
assert "api_key" not in data
assert "sk-newsecret" not in str(data)
def test_create_with_sdk_default_id_rejected(self):
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.create = AsyncMock()
resp = client.post(
"/openai/credentials",
json={
"id": "openai-default",
"provider": "openai",
"type": "api_key",
"title": "Sneaky",
"api_key": "sk-evil",
},
)
assert resp.status_code == 403
mock_mgr.create.assert_not_called()
class TestManagedCredentials:
"""AutoGPT-managed credentials cannot be deleted by users."""
def test_delete_is_managed_returns_403(self):
cred = APIKeyCredentials(
id="managed-cred-1",
provider="agent_mail",
title="AgentMail (managed by AutoGPT)",
api_key=SecretStr("sk-managed-key"),
is_managed=True,
)
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_creds_by_id = AsyncMock(return_value=cred)
resp = client.request("DELETE", "/agent_mail/credentials/managed-cred-1")
assert resp.status_code == 403
assert "AutoGPT-managed" in resp.json()["detail"]
def test_list_credentials_includes_is_managed_field(self):
managed = APIKeyCredentials(
id="managed-1",
provider="agent_mail",
title="AgentMail (managed)",
api_key=SecretStr("sk-key"),
is_managed=True,
)
regular = APIKeyCredentials(
id="regular-1",
provider="openai",
title="My Key",
api_key=SecretStr("sk-key"),
)
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_all_creds = AsyncMock(return_value=[managed, regular])
resp = client.get("/credentials")
assert resp.status_code == 200
data = resp.json()
managed_cred = next(c for c in data if c["id"] == "managed-1")
regular_cred = next(c for c in data if c["id"] == "regular-1")
assert managed_cred["is_managed"] is True
assert regular_cred["is_managed"] is False
# ---------------------------------------------------------------------------
# Managed credential provisioning infrastructure
# ---------------------------------------------------------------------------
def _make_managed_cred(
provider: str = "agent_mail", pod_id: str = "pod-abc"
) -> APIKeyCredentials:
return APIKeyCredentials(
id="managed-auto",
provider=provider,
title="AgentMail (managed by AutoGPT)",
api_key=SecretStr("sk-pod-key"),
is_managed=True,
metadata={"pod_id": pod_id},
)
def _make_store_mock(**kwargs) -> MagicMock:
"""Create a store mock with a working async ``locks()`` context manager."""
@asynccontextmanager
async def _noop_locked(key):
yield
locks_obj = MagicMock()
locks_obj.locked = _noop_locked
store = MagicMock(**kwargs)
store.locks = AsyncMock(return_value=locks_obj)
return store
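# store.locks is an AsyncMock returning locks_obj, suggesting the production
# helper does `locks = await store.locks()` and then enters
# `async with locks.locked(key):`; _noop_locked supplies that context manager
# without touching Redis.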
class TestEnsureManagedCredentials:
"""Unit tests for the ensure/cleanup helpers in managed_credentials.py."""
@pytest.mark.asyncio
async def test_provisions_when_missing(self):
"""Provider.provision() is called when no managed credential exists."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
cred = _make_managed_cred()
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=True)
provider.provision = AsyncMock(return_value=cred)
store = _make_store_mock()
store.has_managed_credential = AsyncMock(return_value=False)
store.add_managed_credential = AsyncMock()
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_awaited_once_with("user-1")
store.add_managed_credential.assert_awaited_once_with("user-1", cred)
@pytest.mark.asyncio
async def test_skips_when_already_exists(self):
"""Provider.provision() is NOT called when managed credential exists."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=True)
provider.provision = AsyncMock()
store = _make_store_mock()
store.has_managed_credential = AsyncMock(return_value=True)
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_not_awaited()
@pytest.mark.asyncio
async def test_skips_when_unavailable(self):
"""Provider.provision() is NOT called when provider is not available."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=False)
provider.provision = AsyncMock()
store = _make_store_mock()
store.has_managed_credential = AsyncMock()
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_not_awaited()
store.has_managed_credential.assert_not_awaited()
@pytest.mark.asyncio
async def test_provision_failure_does_not_propagate(self):
"""A failed provision is logged but does not raise."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=True)
provider.provision = AsyncMock(side_effect=RuntimeError("boom"))
store = _make_store_mock()
store.has_managed_credential = AsyncMock(return_value=False)
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
# No exception raised — provisioning failure is swallowed.
class TestCleanupManagedCredentials:
"""Unit tests for cleanup_managed_credentials."""
@pytest.mark.asyncio
async def test_calls_deprovision_for_managed_creds(self):
from backend.integrations.managed_credentials import (
_PROVIDERS,
cleanup_managed_credentials,
)
cred = _make_managed_cred()
provider = MagicMock()
provider.provider_name = "agent_mail"
provider.deprovision = AsyncMock()
store = MagicMock()
store.get_all_creds = AsyncMock(return_value=[cred])
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["agent_mail"] = provider
try:
await cleanup_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
provider.deprovision.assert_awaited_once_with("user-1", cred)
@pytest.mark.asyncio
async def test_skips_non_managed_creds(self):
from backend.integrations.managed_credentials import (
_PROVIDERS,
cleanup_managed_credentials,
)
regular = _make_api_key_cred()
provider = MagicMock()
provider.provider_name = "openai"
provider.deprovision = AsyncMock()
store = MagicMock()
store.get_all_creds = AsyncMock(return_value=[regular])
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["openai"] = provider
try:
await cleanup_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
provider.deprovision.assert_not_awaited()
@pytest.mark.asyncio
async def test_deprovision_failure_does_not_propagate(self):
from backend.integrations.managed_credentials import (
_PROVIDERS,
cleanup_managed_credentials,
)
cred = _make_managed_cred()
provider = MagicMock()
provider.provider_name = "agent_mail"
provider.deprovision = AsyncMock(side_effect=RuntimeError("boom"))
store = MagicMock()
store.get_all_creds = AsyncMock(return_value=[cred])
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["agent_mail"] = provider
try:
await cleanup_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
# No exception raised — cleanup failure is swallowed.


@@ -0,0 +1,120 @@
"""Shared logic for adding store agents to a user's library.
Both `add_store_agent_to_library` and `add_store_agent_to_library_as_admin`
delegate to these helpers so the duplication-prone create/restore/dedup
logic lives in exactly one place.
"""
import logging
import prisma.errors
import prisma.models
import backend.api.features.library.model as library_model
import backend.data.graph as graph_db
from backend.data.graph import GraphModel, GraphSettings
from backend.data.includes import library_agent_include
from backend.util.exceptions import NotFoundError
from backend.util.json import SafeJson
logger = logging.getLogger(__name__)
async def resolve_graph_for_library(
store_listing_version_id: str,
user_id: str,
*,
admin: bool,
) -> GraphModel:
"""Look up a StoreListingVersion and resolve its graph.
When ``admin=True``, uses ``get_graph_as_admin`` to bypass the marketplace
APPROVED-only check. Otherwise uses the regular ``get_graph``.
"""
slv = await prisma.models.StoreListingVersion.prisma().find_unique(
where={"id": store_listing_version_id}, include={"AgentGraph": True}
)
if not slv or not slv.AgentGraph:
raise NotFoundError(
f"Store listing version {store_listing_version_id} not found or invalid"
)
ag = slv.AgentGraph
if admin:
graph_model = await graph_db.get_graph_as_admin(
graph_id=ag.id, version=ag.version, user_id=user_id
)
else:
graph_model = await graph_db.get_graph(
graph_id=ag.id, version=ag.version, user_id=user_id
)
if not graph_model:
raise NotFoundError(f"Graph #{ag.id} v{ag.version} not found or accessible")
return graph_model
async def add_graph_to_library(
store_listing_version_id: str,
graph_model: GraphModel,
user_id: str,
) -> library_model.LibraryAgent:
"""Check existing / restore soft-deleted / create new LibraryAgent.
Uses a create-then-catch-UniqueViolationError-then-update pattern on
the (userId, agentGraphId, agentGraphVersion) composite unique constraint.
This is more robust than ``upsert`` because Prisma's upsert atomicity
guarantees are not well-documented for all versions.
"""
settings_json = SafeJson(GraphSettings.from_graph(graph_model).model_dump())
_include = library_agent_include(
user_id, include_nodes=False, include_executions=False
)
try:
added_agent = await prisma.models.LibraryAgent.prisma().create(
data={
"User": {"connect": {"id": user_id}},
"AgentGraph": {
"connect": {
"graphVersionId": {
"id": graph_model.id,
"version": graph_model.version,
}
}
},
"isCreatedByUser": False,
"useGraphIsActiveVersion": False,
"settings": settings_json,
},
include=_include,
)
except prisma.errors.UniqueViolationError:
# Already exists — update to restore if previously soft-deleted/archived
added_agent = await prisma.models.LibraryAgent.prisma().update(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": user_id,
"agentGraphId": graph_model.id,
"agentGraphVersion": graph_model.version,
}
},
data={
"isDeleted": False,
"isArchived": False,
"settings": settings_json,
},
include=_include,
)
if added_agent is None:
raise NotFoundError(
f"LibraryAgent for graph #{graph_model.id} "
f"v{graph_model.version} not found after UniqueViolationError"
)
logger.debug(
f"Added graph #{graph_model.id} v{graph_model.version} "
f"for store listing version #{store_listing_version_id} "
f"to library for user #{user_id}"
)
return library_model.LibraryAgent.from_db(added_agent)
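Stripped of the Prisma specifics, the create-then-catch pattern above reduces to this self-contained sketch (the in-memory dict stands in for the composite unique constraint; all names here are illustrative):
import asyncio
class UniqueViolationError(Exception):
    """Stand-in for prisma.errors.UniqueViolationError."""
_rows: dict[tuple[str, str, int], dict] = {}  # (userId, graphId, version) -> row
async def _create(key: tuple[str, str, int]) -> dict:
    if key in _rows:
        raise UniqueViolationError(key)  # the unique constraint fires
    _rows[key] = {"isDeleted": False, "isArchived": False}
    return _rows[key]
async def create_or_restore(key: tuple[str, str, int]) -> dict:
    try:
        return await _create(key)  # optimistic insert first
    except UniqueViolationError:
        # The row already exists, possibly soft-deleted: restore it in place.
        row = _rows[key]
        row.update(isDeleted=False, isArchived=False)
        return row
asyncio.run(create_or_restore(("user-1", "graph-1", 1)))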

View File

@@ -0,0 +1,80 @@
from unittest.mock import AsyncMock, MagicMock, patch
import prisma.errors
import pytest
from ._add_to_library import add_graph_to_library
@pytest.mark.asyncio
async def test_add_graph_to_library_create_new_agent() -> None:
"""When no matching LibraryAgent exists, create inserts a new one."""
graph_model = MagicMock(id="graph-id", version=2, nodes=[])
created_agent = MagicMock(name="CreatedLibraryAgent")
converted_agent = MagicMock(name="ConvertedLibraryAgent")
with (
patch(
"backend.api.features.library._add_to_library.prisma.models.LibraryAgent.prisma"
) as mock_prisma,
patch(
"backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
return_value=converted_agent,
) as mock_from_db,
):
mock_prisma.return_value.create = AsyncMock(return_value=created_agent)
result = await add_graph_to_library("slv-id", graph_model, "user-id")
assert result is converted_agent
mock_from_db.assert_called_once_with(created_agent)
# Verify create was called with correct data
create_call = mock_prisma.return_value.create.call_args
create_data = create_call.kwargs["data"]
assert create_data["User"] == {"connect": {"id": "user-id"}}
assert create_data["AgentGraph"] == {
"connect": {"graphVersionId": {"id": "graph-id", "version": 2}}
}
assert create_data["isCreatedByUser"] is False
assert create_data["useGraphIsActiveVersion"] is False
@pytest.mark.asyncio
async def test_add_graph_to_library_unique_violation_updates_existing() -> None:
"""UniqueViolationError on create falls back to update."""
graph_model = MagicMock(id="graph-id", version=2, nodes=[])
updated_agent = MagicMock(name="UpdatedLibraryAgent")
converted_agent = MagicMock(name="ConvertedLibraryAgent")
with (
patch(
"backend.api.features.library._add_to_library.prisma.models.LibraryAgent.prisma"
) as mock_prisma,
patch(
"backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
return_value=converted_agent,
) as mock_from_db,
):
mock_prisma.return_value.create = AsyncMock(
side_effect=prisma.errors.UniqueViolationError(
MagicMock(), message="unique constraint"
)
)
mock_prisma.return_value.update = AsyncMock(return_value=updated_agent)
result = await add_graph_to_library("slv-id", graph_model, "user-id")
assert result is converted_agent
mock_from_db.assert_called_once_with(updated_agent)
# Verify update was called with correct where and data
update_call = mock_prisma.return_value.update.call_args
assert update_call.kwargs["where"] == {
"userId_agentGraphId_agentGraphVersion": {
"userId": "user-id",
"agentGraphId": "graph-id",
"agentGraphVersion": 2,
}
}
update_data = update_call.kwargs["data"]
assert update_data["isDeleted"] is False
assert update_data["isArchived"] is False

View File

@@ -336,12 +336,15 @@ async def get_library_agent_by_graph_id(
user_id: str,
graph_id: str,
graph_version: Optional[int] = None,
include_archived: bool = False,
) -> library_model.LibraryAgent | None:
filter: prisma.types.LibraryAgentWhereInput = {
"agentGraphId": graph_id,
"userId": user_id,
"isDeleted": False,
}
if not include_archived:
filter["isArchived"] = False
if graph_version is not None:
filter["agentGraphVersion"] = graph_version
@@ -433,32 +436,58 @@ async def create_library_agent(
async with transaction() as tx:
library_agents = await asyncio.gather(
*(
prisma.models.LibraryAgent.prisma(tx).create(
data=prisma.types.LibraryAgentCreateInput(
isCreatedByUser=(user_id == user_id),
useGraphIsActiveVersion=True,
User={"connect": {"id": user_id}},
AgentGraph={
"connect": {
"graphVersionId": {
"id": graph_entry.id,
"version": graph_entry.version,
prisma.models.LibraryAgent.prisma(tx).upsert(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": user_id,
"agentGraphId": graph_entry.id,
"agentGraphVersion": graph_entry.version,
}
},
data={
"create": prisma.types.LibraryAgentCreateInput(
isCreatedByUser=(user_id == graph.user_id),
useGraphIsActiveVersion=True,
User={"connect": {"id": user_id}},
AgentGraph={
"connect": {
"graphVersionId": {
"id": graph_entry.id,
"version": graph_entry.version,
}
}
}
},
settings=SafeJson(
GraphSettings.from_graph(
graph_entry,
hitl_safe_mode=hitl_safe_mode,
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
else {}
),
),
"update": {
"isDeleted": False,
"isArchived": False,
"useGraphIsActiveVersion": True,
"settings": SafeJson(
GraphSettings.from_graph(
graph_entry,
hitl_safe_mode=hitl_safe_mode,
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
else {}
),
},
settings=SafeJson(
GraphSettings.from_graph(
graph_entry,
hitl_safe_mode=hitl_safe_mode,
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
else {}
),
),
},
include=library_agent_include(
user_id, include_nodes=False, include_executions=False
),
@@ -582,7 +611,9 @@ async def update_graph_in_library(
created_graph = await graph_db.create_graph(graph_model, user_id)
library_agent = await get_library_agent_by_graph_id(user_id, created_graph.id)
library_agent = await get_library_agent_by_graph_id(
user_id, created_graph.id, include_archived=True
)
if not library_agent:
raise NotFoundError(f"Library agent not found for graph {created_graph.id}")
@@ -818,92 +849,38 @@ async def delete_library_agent_by_graph_id(graph_id: str, user_id: str) -> None:
async def add_store_agent_to_library(
store_listing_version_id: str, user_id: str
) -> library_model.LibraryAgent:
"""Adds a marketplace agent to the users library.
See also: `add_store_agent_to_library_as_admin()` which uses
`get_graph_as_admin` to bypass marketplace status checks for admin review.
"""
Adds an agent from a store listing version to the user's library if they don't already have it.
from ._add_to_library import add_graph_to_library, resolve_graph_for_library
Args:
store_listing_version_id: The ID of the store listing version containing the agent.
user_id: The ID of the user to whose library the agent is being added.
Returns:
The newly created LibraryAgent if successfully added, the existing corresponding one if any.
Raises:
NotFoundError: If the store listing or associated agent is not found.
DatabaseError: If there's an issue creating the LibraryAgent record.
"""
logger.debug(
f"Adding agent from store listing version #{store_listing_version_id} "
f"to library for user #{user_id}"
)
store_listing_version = (
await prisma.models.StoreListingVersion.prisma().find_unique(
where={"id": store_listing_version_id}, include={"AgentGraph": True}
)
graph_model = await resolve_graph_for_library(
store_listing_version_id, user_id, admin=False
)
if not store_listing_version or not store_listing_version.AgentGraph:
logger.warning(f"Store listing version not found: {store_listing_version_id}")
raise NotFoundError(
f"Store listing version {store_listing_version_id} not found or invalid"
)
return await add_graph_to_library(store_listing_version_id, graph_model, user_id)
graph = store_listing_version.AgentGraph
# Convert to GraphModel to check for HITL blocks
graph_model = await graph_db.get_graph(
graph_id=graph.id,
version=graph.version,
user_id=user_id,
include_subgraphs=False,
async def add_store_agent_to_library_as_admin(
store_listing_version_id: str, user_id: str
) -> library_model.LibraryAgent:
"""Admin variant that uses `get_graph_as_admin` to bypass marketplace
APPROVED-only checks, allowing admins to add pending agents for review."""
from ._add_to_library import add_graph_to_library, resolve_graph_for_library
logger.warning(
f"ADMIN adding agent from store listing version "
f"#{store_listing_version_id} to library for user #{user_id}"
)
if not graph_model:
raise NotFoundError(
f"Graph #{graph.id} v{graph.version} not found or accessible"
)
# Check if user already has this agent (non-deleted)
if existing := await get_library_agent_by_graph_id(
user_id, graph.id, graph.version
):
return existing
# Check for soft-deleted version and restore it
deleted_agent = await prisma.models.LibraryAgent.prisma().find_unique(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": user_id,
"agentGraphId": graph.id,
"agentGraphVersion": graph.version,
}
},
graph_model = await resolve_graph_for_library(
store_listing_version_id, user_id, admin=True
)
if deleted_agent and deleted_agent.isDeleted:
return await update_library_agent(deleted_agent.id, user_id, is_deleted=False)
# Create LibraryAgent entry
added_agent = await prisma.models.LibraryAgent.prisma().create(
data={
"User": {"connect": {"id": user_id}},
"AgentGraph": {
"connect": {
"graphVersionId": {"id": graph.id, "version": graph.version}
}
},
"isCreatedByUser": False,
"useGraphIsActiveVersion": False,
"settings": SafeJson(GraphSettings.from_graph(graph_model).model_dump()),
},
include=library_agent_include(
user_id, include_nodes=False, include_executions=False
),
)
logger.debug(
f"Added graph #{graph.id} v{graph.version}"
f"for store listing version #{store_listing_version.id} "
f"to library for user #{user_id}"
)
return library_model.LibraryAgent.from_db(added_agent)
return await add_graph_to_library(store_listing_version_id, graph_model, user_id)
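After the refactor both entry points collapse to the same two calls and differ only in the admin flag; schematically (a sketch, not the literal code):
graph = await resolve_graph_for_library(slv_id, user_id, admin=is_admin)
return await add_graph_to_library(slv_id, graph, user_id)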
##############################################

View File

@@ -1,4 +1,6 @@
from contextlib import asynccontextmanager
from datetime import datetime
from unittest.mock import AsyncMock, MagicMock, patch
import prisma.enums
import prisma.models
@@ -85,10 +87,6 @@ async def test_get_library_agents(mocker):
async def test_add_agent_to_library(mocker):
await connect()
# Mock the transaction context
mock_transaction = mocker.patch("backend.api.features.library.db.transaction")
mock_transaction.return_value.__aenter__ = mocker.AsyncMock(return_value=None)
mock_transaction.return_value.__aexit__ = mocker.AsyncMock(return_value=None)
# Mock data
mock_store_listing_data = prisma.models.StoreListingVersion(
id="version123",
@@ -143,15 +141,18 @@ async def test_add_agent_to_library(mocker):
)
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
mock_library_agent.return_value.find_unique = mocker.AsyncMock(return_value=None)
mock_library_agent.return_value.create = mocker.AsyncMock(
return_value=mock_library_agent_data
)
# Mock graph_db.get_graph function that's called to check for HITL blocks
mock_graph_db = mocker.patch("backend.api.features.library.db.graph_db")
# Mock graph_db.get_graph function that's called in resolve_graph_for_library
# (lives in _add_to_library.py after refactor, not db.py)
mock_graph_db = mocker.patch(
"backend.api.features.library._add_to_library.graph_db"
)
mock_graph_model = mocker.Mock()
mock_graph_model.id = "agent1"
mock_graph_model.version = 1
mock_graph_model.nodes = (
[]
) # Empty list so _has_human_in_the_loop_blocks returns False
@@ -170,37 +171,27 @@ async def test_add_agent_to_library(mocker):
mock_store_listing_version.return_value.find_unique.assert_called_once_with(
where={"id": "version123"}, include={"AgentGraph": True}
)
mock_library_agent.return_value.find_unique.assert_called_once_with(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": "test-user",
"agentGraphId": "agent1",
"agentGraphVersion": 1,
}
},
)
# Check that create was called with the expected data including settings
create_call_args = mock_library_agent.return_value.create.call_args
assert create_call_args is not None
# Verify the main structure
expected_data = {
# Verify the create data structure
create_data = create_call_args.kwargs["data"]
expected_create = {
"User": {"connect": {"id": "test-user"}},
"AgentGraph": {"connect": {"graphVersionId": {"id": "agent1", "version": 1}}},
"isCreatedByUser": False,
"useGraphIsActiveVersion": False,
}
actual_data = create_call_args[1]["data"]
# Check that all expected fields are present
for key, value in expected_data.items():
assert actual_data[key] == value
for key, value in expected_create.items():
assert create_data[key] == value
# Check that settings field is present and is a SafeJson object
assert "settings" in actual_data
assert hasattr(actual_data["settings"], "__class__") # Should be a SafeJson object
assert "settings" in create_data
assert hasattr(create_data["settings"], "__class__") # Should be a SafeJson object
# Check include parameter
assert create_call_args[1]["include"] == library_agent_include(
assert create_call_args.kwargs["include"] == library_agent_include(
"test-user", include_nodes=False, include_executions=False
)
@@ -224,3 +215,141 @@ async def test_add_agent_to_library_not_found(mocker):
mock_store_listing_version.return_value.find_unique.assert_called_once_with(
where={"id": "version123"}, include={"AgentGraph": True}
)
@pytest.mark.asyncio
async def test_get_library_agent_by_graph_id_excludes_archived(mocker):
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
result = await db.get_library_agent_by_graph_id("test-user", "agent1", 7)
assert result is None
mock_library_agent.return_value.find_first.assert_called_once()
where = mock_library_agent.return_value.find_first.call_args.kwargs["where"]
assert where == {
"agentGraphId": "agent1",
"userId": "test-user",
"isDeleted": False,
"isArchived": False,
"agentGraphVersion": 7,
}
@pytest.mark.asyncio
async def test_get_library_agent_by_graph_id_can_include_archived(mocker):
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
result = await db.get_library_agent_by_graph_id(
"test-user",
"agent1",
7,
include_archived=True,
)
assert result is None
mock_library_agent.return_value.find_first.assert_called_once()
where = mock_library_agent.return_value.find_first.call_args.kwargs["where"]
assert where == {
"agentGraphId": "agent1",
"userId": "test-user",
"isDeleted": False,
"agentGraphVersion": 7,
}
@pytest.mark.asyncio
async def test_update_graph_in_library_allows_archived_library_agent(mocker):
graph = mocker.Mock(id="graph-id")
existing_version = mocker.Mock(version=1, is_active=True)
graph_model = mocker.Mock()
created_graph = mocker.Mock(id="graph-id", version=2, is_active=False)
current_library_agent = mocker.Mock()
updated_library_agent = mocker.Mock()
mocker.patch(
"backend.api.features.library.db.graph_db.get_graph_all_versions",
new=mocker.AsyncMock(return_value=[existing_version]),
)
mocker.patch(
"backend.api.features.library.db.graph_db.make_graph_model",
return_value=graph_model,
)
mocker.patch(
"backend.api.features.library.db.graph_db.create_graph",
new=mocker.AsyncMock(return_value=created_graph),
)
mock_get_library_agent = mocker.patch(
"backend.api.features.library.db.get_library_agent_by_graph_id",
new=mocker.AsyncMock(return_value=current_library_agent),
)
mock_update_library_agent = mocker.patch(
"backend.api.features.library.db.update_library_agent_version_and_settings",
new=mocker.AsyncMock(return_value=updated_library_agent),
)
result_graph, result_library_agent = await db.update_graph_in_library(
graph,
"test-user",
)
assert result_graph is created_graph
assert result_library_agent is updated_library_agent
assert graph.version == 2
graph_model.reassign_ids.assert_called_once_with(
user_id="test-user", reassign_graph_id=False
)
mock_get_library_agent.assert_awaited_once_with(
"test-user",
"graph-id",
include_archived=True,
)
mock_update_library_agent.assert_awaited_once_with("test-user", created_graph)
@pytest.mark.asyncio
async def test_create_library_agent_uses_upsert():
"""create_library_agent should use upsert (not create) to handle duplicates."""
mock_graph = MagicMock()
mock_graph.id = "graph-1"
mock_graph.version = 1
mock_graph.user_id = "user-1"
mock_graph.nodes = []
mock_graph.sub_graphs = []
mock_upserted = MagicMock(name="UpsertedLibraryAgent")
@asynccontextmanager
async def fake_tx():
yield None
with (
patch("backend.api.features.library.db.transaction", fake_tx),
patch("prisma.models.LibraryAgent.prisma") as mock_prisma,
patch(
"backend.api.features.library.db.add_generated_agent_image",
new=AsyncMock(),
),
patch(
"backend.api.features.library.model.LibraryAgent.from_db",
return_value=MagicMock(),
),
):
mock_prisma.return_value.upsert = AsyncMock(return_value=mock_upserted)
result = await db.create_library_agent(mock_graph, "user-1")
assert len(result) == 1
upsert_call = mock_prisma.return_value.upsert.call_args
assert upsert_call is not None
# Verify the upsert where clause uses the composite unique key
where = upsert_call.kwargs["where"]
assert "userId_agentGraphId_agentGraphVersion" in where
# Verify the upsert data has both create and update branches
data = upsert_call.kwargs["data"]
assert "create" in data
assert "update" in data
# Verify update branch restores soft-deleted/archived agents
assert data["update"]["isDeleted"] is False
assert data["update"]["isArchived"] is False

View File

@@ -12,6 +12,7 @@ Tests cover:
5. Complete OAuth flow end-to-end
"""
import asyncio
import base64
import hashlib
import secrets
@@ -58,14 +59,27 @@ async def test_user(server, test_user_id: str):
yield test_user_id
# Cleanup - delete in correct order due to foreign key constraints
await PrismaOAuthAccessToken.prisma().delete_many(where={"userId": test_user_id})
await PrismaOAuthRefreshToken.prisma().delete_many(where={"userId": test_user_id})
await PrismaOAuthAuthorizationCode.prisma().delete_many(
where={"userId": test_user_id}
)
await PrismaOAuthApplication.prisma().delete_many(where={"ownerId": test_user_id})
await PrismaUser.prisma().delete(where={"id": test_user_id})
# Cleanup - delete in correct order due to foreign key constraints.
# Wrap in try/except because the event loop or Prisma engine may already
# be closed during session teardown on Python 3.12+.
try:
await asyncio.gather(
PrismaOAuthAccessToken.prisma().delete_many(where={"userId": test_user_id}),
PrismaOAuthRefreshToken.prisma().delete_many(
where={"userId": test_user_id}
),
PrismaOAuthAuthorizationCode.prisma().delete_many(
where={"userId": test_user_id}
),
)
await asyncio.gather(
PrismaOAuthApplication.prisma().delete_many(
where={"ownerId": test_user_id}
),
PrismaUser.prisma().delete(where={"id": test_user_id}),
)
except RuntimeError:
pass
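The two-phase gather above is the general FK-safe teardown shape: delete all child rows concurrently, then the parents they reference. A self-contained sketch (delete_rows is a hypothetical stand-in for the Prisma delete_many calls):
import asyncio
async def delete_rows(table: str, user_id: str) -> None:
    """Hypothetical stand-in for a Prisma delete_many call."""
    print(f"DELETE FROM {table} WHERE userId = {user_id!r}")
async def teardown(user_id: str) -> None:
    # Phase 1: children holding foreign keys to User / OAuthApplication.
    await asyncio.gather(
        delete_rows("OAuthAccessToken", user_id),
        delete_rows("OAuthRefreshToken", user_id),
        delete_rows("OAuthAuthorizationCode", user_id),
    )
    # Phase 2: the referenced parents, only once the children are gone.
    await asyncio.gather(
        delete_rows("OAuthApplication", user_id),
        delete_rows("User", user_id),
    )
asyncio.run(teardown("user-1"))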
@pytest_asyncio.fixture

View File

@@ -391,6 +391,11 @@ async def get_available_graph(
async def get_store_agent_by_version_id(
store_listing_version_id: str,
) -> store_model.StoreAgentDetails:
"""Get agent details from the StoreAgent view (APPROVED agents only).
See also: `get_store_agent_details_as_admin()` which bypasses the
APPROVED-only StoreAgent view for admin preview of pending submissions.
"""
logger.debug(f"Getting store agent details for {store_listing_version_id}")
try:
@@ -411,6 +416,57 @@ async def get_store_agent_by_version_id(
raise DatabaseError("Failed to fetch agent details") from e
async def get_store_agent_details_as_admin(
store_listing_version_id: str,
) -> store_model.StoreAgentDetails:
"""Get agent details for admin preview, bypassing the APPROVED-only
StoreAgent view. Queries StoreListingVersion directly so pending
submissions are visible."""
slv = await prisma.models.StoreListingVersion.prisma().find_unique(
where={"id": store_listing_version_id},
include={
"StoreListing": {"include": {"CreatorProfile": True}},
},
)
if not slv or not slv.StoreListing:
raise NotFoundError(
f"Store listing version {store_listing_version_id} not found"
)
listing = slv.StoreListing
# CreatorProfile is a required FK relation — should always exist.
# If it's None, the DB is in a bad state.
profile = listing.CreatorProfile
if not profile:
raise DatabaseError(
f"StoreListing {listing.id} has no CreatorProfile — FK violated"
)
return store_model.StoreAgentDetails(
store_listing_version_id=slv.id,
slug=listing.slug,
agent_name=slv.name,
agent_video=slv.videoUrl or "",
agent_output_demo=slv.agentOutputDemoUrl or "",
agent_image=slv.imageUrls,
creator=profile.username,
creator_avatar=profile.avatarUrl or "",
sub_heading=slv.subHeading,
description=slv.description,
instructions=slv.instructions,
categories=slv.categories,
runs=0,
rating=0.0,
versions=[str(slv.version)],
graph_id=slv.agentGraphId,
graph_versions=[str(slv.agentGraphVersion)],
last_updated=slv.updatedAt,
recommended_schedule_cron=slv.recommendedScheduleCron,
active_version_id=listing.activeVersionId or slv.id,
has_approved_version=listing.hasApprovedVersion,
)
class StoreCreatorsSortOptions(Enum):
# NOTE: values correspond 1:1 to columns of the Creator view
AGENT_RATING = "agent_rating"

View File

@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="Europe/Delft",
subscriptionTier=prisma.enums.SubscriptionTier.FREE, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
)
mock_agent = prisma.models.AgentGraph(
id="agent-id",

View File

@@ -980,14 +980,16 @@ async def execute_graph(
source: Annotated[GraphExecutionSource | None, Body(embed=True)] = None,
graph_version: Optional[int] = None,
preset_id: Optional[str] = None,
dry_run: Annotated[bool, Body(embed=True)] = False,
) -> execution_db.GraphExecutionMeta:
user_credit_model = await get_user_credit_model(user_id)
current_balance = await user_credit_model.get_credits(user_id)
if current_balance <= 0:
raise HTTPException(
status_code=402,
detail="Insufficient balance to execute the agent. Please top up your account.",
)
if not dry_run:
user_credit_model = await get_user_credit_model(user_id)
current_balance = await user_credit_model.get_credits(user_id)
if current_balance <= 0:
raise HTTPException(
status_code=402,
detail="Insufficient balance to execute the agent. Please top up your account.",
)
try:
result = await execution_utils.add_graph_execution(
@@ -997,6 +999,7 @@ async def execute_graph(
preset_id=preset_id,
graph_version=graph_version,
graph_credentials_inputs=credentials_inputs,
dry_run=dry_run,
)
# Record successful graph execution
record_graph_execution(graph_id=graph_id, status="success", user_id=user_id)

View File

@@ -18,6 +18,8 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.platform_cost_routes
import backend.api.features.admin.rate_limit_admin_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.builder
import backend.api.features.builder.routes
@@ -117,6 +119,11 @@ async def lifespan_context(app: fastapi.FastAPI):
AutoRegistry.patch_integrations()
# Register managed credential providers (e.g. AgentMail)
from backend.integrations.managed_providers import register_all
register_all()
await backend.data.block.initialize_blocks()
await backend.data.user.migrate_and_encrypt_user_integrations()
@@ -318,6 +325,16 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/executions",
)
app.include_router(
backend.api.features.admin.rate_limit_admin_routes.router,
tags=["v2", "admin"],
prefix="/api/copilot",
)
app.include_router(
backend.api.features.admin.platform_cost_routes.router,
tags=["v2", "admin"],
prefix="/api/platform-costs",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],
@@ -528,8 +545,11 @@ class AgentServer(backend.util.service.AppProcess):
user_id: str,
provider: ProviderName,
credentials: Credentials,
) -> Credentials:
from .features.integrations.router import create_credentials, get_credential
):
from backend.api.features.integrations.router import (
create_credentials,
get_credential,
)
try:
return await create_credentials(

View File

@@ -698,13 +698,30 @@ class Block(ABC, Generic[BlockSchemaInputType, BlockSchemaOutputType]):
if should_pause:
return
# Validate the input data (original or reviewer-modified) once
if error := self.input_schema.validate_data(input_data):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
block_id=self.id,
)
# Validate the input data (original or reviewer-modified) once.
# In dry-run mode, credential fields may contain sentinel None values
# that would fail JSON schema required checks. We still validate the
# non-credential fields so blocks that execute for real during dry-run
# (e.g. AgentExecutorBlock) get proper input validation.
is_dry_run = getattr(kwargs.get("execution_context"), "dry_run", False)
if is_dry_run:
cred_field_names = set(self.input_schema.get_credentials_fields().keys())
non_cred_data = {
k: v for k, v in input_data.items() if k not in cred_field_names
}
if error := self.input_schema.validate_data(non_cred_data):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
block_id=self.id,
)
else:
if error := self.input_schema.validate_data(input_data):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
block_id=self.id,
)
# Use the validated input data
async for output_name, output_data in self.run(
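The credential-stripping step in isolation, as a minimal illustration (field names are illustrative; get_credentials_fields() is assumed to return a mapping keyed by field name, which is how the code above uses it):
def strip_credential_fields(input_data: dict, cred_field_names: set[str]) -> dict:
    """Drop credential fields so dry-run sentinels don't fail required checks."""
    return {k: v for k, v in input_data.items() if k not in cred_field_names}
data = {"prompt": "hi", "credentials": None}  # None sentinel under dry-run
assert strip_credential_fields(data, {"credentials"}) == {"prompt": "hi"}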

View File

@@ -49,11 +49,17 @@ class AgentExecutorBlock(Block):
@classmethod
def get_missing_input(cls, data: BlockInput) -> set[str]:
required_fields = cls.get_input_schema(data).get("required", [])
return set(required_fields) - set(data)
# Check against the nested `inputs` dict, not the top-level node
# data — required fields like "topic" live inside data["inputs"],
# not at data["topic"].
provided = data.get("inputs", {})
return set(required_fields) - set(provided)
@classmethod
def get_mismatch_error(cls, data: BlockInput) -> str | None:
return validate_with_jsonschema(cls.get_input_schema(data), data)
return validate_with_jsonschema(
cls.get_input_schema(data), data.get("inputs", {})
)
class Output(BlockSchema):
# Use BlockSchema to avoid automatic error field that could clash with graph outputs
@@ -88,6 +94,7 @@ class AgentExecutorBlock(Block):
execution_context=execution_context.model_copy(
update={"parent_execution_id": graph_exec_id},
),
dry_run=execution_context.dry_run,
)
logger = execution_utils.LogMetadata(
@@ -149,14 +156,19 @@ class AgentExecutorBlock(Block):
ExecutionStatus.TERMINATED,
ExecutionStatus.FAILED,
]:
logger.debug(
f"Execution {log_id} received event {event.event_type} with status {event.status}"
logger.info(
f"Execution {log_id} skipping event {event.event_type} status={event.status} "
f"node={getattr(event, 'node_exec_id', '?')}"
)
continue
if event.event_type == ExecutionEventType.GRAPH_EXEC_UPDATE:
# If the graph execution is COMPLETED, TERMINATED, or FAILED,
# we can stop listening for further events.
logger.info(
f"Execution {log_id} graph completed with status {event.status}, "
f"yielded {len(yielded_node_exec_ids)} outputs"
)
self.merge_stats(
NodeExecutionStats(
extra_cost=event.stats.cost if event.stats else 0,
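Why the nested lookup in get_missing_input matters, as a worked example (values illustrative):
data = {
    "graph_id": "g1",
    "graph_version": 1,
    "inputs": {"topic": "cats"},  # user-supplied graph inputs live here
}
required = {"topic", "tone"}
# Old behaviour compared against the top-level keys, so "topic" looked missing:
assert required - set(data) == {"topic", "tone"}
# Fixed behaviour compares against the nested inputs dict:
assert required - set(data.get("inputs", {})) == {"tone"}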

View File

@@ -1,3 +1,4 @@
import re
from typing import Any
from backend.blocks._base import (
@@ -19,6 +20,33 @@ from backend.blocks.llm import (
)
from backend.data.model import APIKeyCredentials, NodeExecutionStats, SchemaField
# Minimum max_output_tokens accepted by OpenAI-compatible APIs.
# A true/false answer fits comfortably within this budget.
MIN_LLM_OUTPUT_TOKENS = 16
def _parse_boolean_response(response_text: str) -> tuple[bool, str | None]:
"""Parse an LLM response into a boolean result.
Returns a ``(result, error)`` tuple. *error* is ``None`` when the
response is unambiguous; otherwise it contains a diagnostic message
and *result* defaults to ``False``.
"""
text = response_text.strip().lower()
if text == "true":
return True, None
if text == "false":
return False, None
# Fuzzy matching uses word boundaries to avoid false positives like "untrue".
tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", text))
if tokens == {"true"} or tokens == {"yes"} or tokens == {"1"}:
return True, None
if tokens == {"false"} or tokens == {"no"} or tokens == {"0"}:
return False, None
return False, f"Unclear AI response: '{response_text}'"
class AIConditionBlock(AIBlockBase):
"""
@@ -162,54 +190,26 @@ class AIConditionBlock(AIBlockBase):
]
# Call the LLM
try:
response = await self.llm_call(
credentials=credentials,
llm_model=input_data.model,
prompt=prompt,
max_tokens=10, # We only expect a true/false response
response = await self.llm_call(
credentials=credentials,
llm_model=input_data.model,
prompt=prompt,
max_tokens=MIN_LLM_OUTPUT_TOKENS,
)
# Extract the boolean result from the response
result, error = _parse_boolean_response(response.response)
if error:
yield "error", error
# Update internal stats
self.merge_stats(
NodeExecutionStats(
input_token_count=response.prompt_tokens,
output_token_count=response.completion_tokens,
)
# Extract the boolean result from the response
response_text = response.response.strip().lower()
if response_text == "true":
result = True
elif response_text == "false":
result = False
else:
# If the response is not clear, try to interpret it using word boundaries
import re
# Use word boundaries to avoid false positives like 'untrue' or '10'
tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", response_text))
if tokens == {"true"} or tokens == {"yes"} or tokens == {"1"}:
result = True
elif tokens == {"false"} or tokens == {"no"} or tokens == {"0"}:
result = False
else:
# Unclear or conflicting response - default to False and yield error
result = False
yield "error", f"Unclear AI response: '{response.response}'"
# Update internal stats
self.merge_stats(
NodeExecutionStats(
input_token_count=response.prompt_tokens,
output_token_count=response.completion_tokens,
)
)
self.prompt = response.prompt
except Exception as e:
# In case of any error, default to False to be safe
result = False
# Log the error but don't fail the block execution
import logging
logger = logging.getLogger(__name__)
logger.error(f"AI condition evaluation failed: {str(e)}")
yield "error", f"AI evaluation failed: {str(e)}"
)
self.prompt = response.prompt
# Yield results
yield "result", result

View File

@@ -0,0 +1,147 @@
"""Tests for AIConditionBlock regression coverage for max_tokens and error propagation."""
from __future__ import annotations
from typing import cast
import pytest
from backend.blocks.ai_condition import (
MIN_LLM_OUTPUT_TOKENS,
AIConditionBlock,
_parse_boolean_response,
)
from backend.blocks.llm import (
DEFAULT_LLM_MODEL,
TEST_CREDENTIALS,
TEST_CREDENTIALS_INPUT,
AICredentials,
LLMResponse,
)
_TEST_AI_CREDENTIALS = cast(AICredentials, TEST_CREDENTIALS_INPUT)
# ---------------------------------------------------------------------------
# Helper to collect all yields from the async generator
# ---------------------------------------------------------------------------
async def _collect_outputs(block: AIConditionBlock, input_data, credentials):
outputs: dict[str, object] = {}
async for name, value in block.run(input_data, credentials=credentials):
outputs[name] = value
return outputs
def _make_input(**overrides) -> AIConditionBlock.Input:
defaults: dict = {
"input_value": "hello@example.com",
"condition": "the input is an email address",
"yes_value": "yes!",
"no_value": "no!",
"model": DEFAULT_LLM_MODEL,
"credentials": TEST_CREDENTIALS_INPUT,
}
defaults.update(overrides)
return AIConditionBlock.Input(**defaults)
def _mock_llm_response(response_text: str) -> LLMResponse:
return LLMResponse(
raw_response="",
prompt=[],
response=response_text,
tool_calls=None,
prompt_tokens=10,
completion_tokens=5,
reasoning=None,
)
# ---------------------------------------------------------------------------
# _parse_boolean_response unit tests
# ---------------------------------------------------------------------------
class TestParseBooleanResponse:
def test_true_exact(self):
assert _parse_boolean_response("true") == (True, None)
def test_false_exact(self):
assert _parse_boolean_response("false") == (False, None)
def test_true_with_whitespace(self):
assert _parse_boolean_response(" True ") == (True, None)
def test_yes_fuzzy(self):
assert _parse_boolean_response("Yes") == (True, None)
def test_no_fuzzy(self):
assert _parse_boolean_response("no") == (False, None)
def test_one_fuzzy(self):
assert _parse_boolean_response("1") == (True, None)
def test_zero_fuzzy(self):
assert _parse_boolean_response("0") == (False, None)
def test_unclear_response(self):
result, error = _parse_boolean_response("I'm not sure")
assert result is False
assert error is not None
assert "Unclear" in error
def test_conflicting_tokens(self):
result, error = _parse_boolean_response("true and false")
assert result is False
assert error is not None
# ---------------------------------------------------------------------------
# Regression: max_tokens is set to MIN_LLM_OUTPUT_TOKENS
# ---------------------------------------------------------------------------
class TestMaxTokensRegression:
@pytest.mark.asyncio
async def test_llm_call_receives_min_output_tokens(self):
"""max_tokens must be MIN_LLM_OUTPUT_TOKENS (16) the previous value
of 1 was too low and caused OpenAI to reject the request."""
block = AIConditionBlock()
captured_kwargs: dict = {}
async def spy_llm_call(**kwargs):
captured_kwargs.update(kwargs)
return _mock_llm_response("true")
block.llm_call = spy_llm_call # type: ignore[assignment]
input_data = _make_input()
await _collect_outputs(block, input_data, credentials=TEST_CREDENTIALS)
assert captured_kwargs["max_tokens"] == MIN_LLM_OUTPUT_TOKENS
assert captured_kwargs["max_tokens"] == 16
# ---------------------------------------------------------------------------
# Regression: exceptions from llm_call must propagate
# ---------------------------------------------------------------------------
class TestExceptionPropagation:
@pytest.mark.asyncio
async def test_llm_call_exception_propagates(self):
"""If llm_call raises, the exception must NOT be swallowed.
Previously the block caught all exceptions and silently returned
result=False."""
block = AIConditionBlock()
async def boom(**kwargs):
raise RuntimeError("LLM provider error")
block.llm_call = boom # type: ignore[assignment]
input_data = _make_input()
with pytest.raises(RuntimeError, match="LLM provider error"):
await _collect_outputs(block, input_data, credentials=TEST_CREDENTIALS)

View File

@@ -18,6 +18,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -358,6 +359,7 @@ class AIShortformVideoCreatorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
@@ -565,6 +567,7 @@ class AIAdMakerVideoCreatorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
@@ -760,4 +763,5 @@ class AIScreenshotToVideoAdBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url

View File

@@ -17,7 +17,7 @@ from backend.blocks.apollo.models import (
PrimaryPhone,
SearchOrganizationsRequest,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchOrganizationsBlock(Block):
@@ -218,6 +218,7 @@ To find IDs, identify the values for organization_id when you call this endpoint
) -> BlockOutput:
query = SearchOrganizationsRequest(**input_data.model_dump())
organizations = await self.search_organizations(query, credentials)
self.merge_stats(NodeExecutionStats(output_size=len(organizations)))
for organization in organizations:
yield "organization", organization
yield "organizations", organizations

View File

@@ -21,7 +21,7 @@ from backend.blocks.apollo.models import (
SearchPeopleRequest,
SenorityLevels,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchPeopleBlock(Block):
@@ -366,4 +366,5 @@ class SearchPeopleBlock(Block):
*(enrich_or_fallback(person) for person in people)
)
self.merge_stats(NodeExecutionStats(output_size=len(people)))
yield "people", people

View File

@@ -13,7 +13,7 @@ from backend.blocks.apollo._auth import (
ApolloCredentialsInput,
)
from backend.blocks.apollo.models import Contact, EnrichPersonRequest
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class GetPersonDetailBlock(Block):
@@ -141,4 +141,5 @@ class GetPersonDetailBlock(Block):
**kwargs,
) -> BlockOutput:
query = EnrichPersonRequest(**input_data.model_dump())
self.merge_stats(NodeExecutionStats(output_size=1))
yield "contact", await self.enrich_person(query, credentials)

View File

@@ -15,6 +15,12 @@ from backend.blocks._base import (
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.copilot.permissions import (
CopilotPermissions,
ToolName,
all_known_tool_names,
validate_block_identifiers,
)
from backend.data.model import SchemaField
if TYPE_CHECKING:
@@ -96,6 +102,65 @@ class AutoPilotBlock(Block):
advanced=True,
)
tools: list[ToolName] = SchemaField(
description=(
"Tool names to filter. Works with tools_exclude to form an "
"allow-list or deny-list. "
"Leave empty to apply no tool filter."
),
default=[],
advanced=True,
)
tools_exclude: bool = SchemaField(
description=(
"Controls how the 'tools' list is interpreted. "
"True (default): 'tools' is a deny-list — listed tools are blocked, "
"all others are allowed. An empty 'tools' list means allow everything. "
"False: 'tools' is an allow-list — only listed tools are permitted."
),
default=True,
advanced=True,
)
blocks: list[str] = SchemaField(
description=(
"Block identifiers to filter when the copilot uses run_block. "
"Each entry can be: a block name (e.g. 'HTTP Request'), "
"a full block UUID, or the first 8 hex characters of the UUID "
"(e.g. 'c069dc6b'). Works with blocks_exclude. "
"Leave empty to apply no block filter."
),
default=[],
advanced=True,
)
blocks_exclude: bool = SchemaField(
description=(
"Controls how the 'blocks' list is interpreted. "
"True (default): 'blocks' is a deny-list — listed blocks are blocked, "
"all others are allowed. An empty 'blocks' list means allow everything. "
"False: 'blocks' is an allow-list — only listed blocks are permitted."
),
default=True,
advanced=True,
)
dry_run: bool = SchemaField(
description=(
"When enabled, run_block and run_agent tool calls in this "
"autopilot session are forced to use dry-run simulation mode. "
"No real API calls, side effects, or credits are consumed "
"by those tools. Useful for testing agent wiring and "
"previewing outputs. "
"Only applies when creating a new session (session_id is empty). "
"When reusing an existing session_id, the session's original "
"dry_run setting is preserved."
),
default=False,
advanced=True,
)
# timeout_seconds removed: the SDK manages its own heartbeat-based
# timeouts internally; wrapping with asyncio.timeout corrupts the
# SDK's internal stream (see service.py CRITICAL comment).
@@ -182,11 +247,11 @@ class AutoPilotBlock(Block):
},
)
async def create_session(self, user_id: str) -> str:
async def create_session(self, user_id: str, *, dry_run: bool) -> str:
"""Create a new chat session and return its ID (mockable for tests)."""
from backend.copilot.model import create_chat_session
from backend.copilot.model import create_chat_session # avoid circular import
session = await create_chat_session(user_id)
session = await create_chat_session(user_id, dry_run=dry_run)
return session.session_id
async def execute_copilot(
@@ -196,6 +261,7 @@ class AutoPilotBlock(Block):
session_id: str,
max_recursion_depth: int,
user_id: str,
permissions: "CopilotPermissions | None" = None,
) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
"""Invoke the copilot and collect all stream results.
@@ -209,14 +275,21 @@ class AutoPilotBlock(Block):
session_id: Chat session to use.
max_recursion_depth: Maximum allowed recursion nesting.
user_id: Authenticated user ID.
permissions: Optional capability filter restricting tools/blocks.
Returns:
A tuple of (response_text, tool_calls, history_json, session_id, usage).
"""
from backend.copilot.sdk.collect import collect_copilot_response
from backend.copilot.sdk.collect import (
collect_copilot_response, # avoid circular import
)
tokens = _check_recursion(max_recursion_depth)
perm_token = None
try:
effective_permissions, perm_token = _merge_inherited_permissions(
permissions
)
effective_prompt = prompt
if system_context:
effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
@@ -225,6 +298,7 @@ class AutoPilotBlock(Block):
session_id=session_id,
message=effective_prompt,
user_id=user_id,
permissions=effective_permissions,
)
# Build a lightweight conversation summary from streamed data.
@@ -271,6 +345,8 @@ class AutoPilotBlock(Block):
)
finally:
_reset_recursion(tokens)
if perm_token is not None:
_inherited_permissions.reset(perm_token)
async def run(
self,
@@ -295,11 +371,20 @@ class AutoPilotBlock(Block):
yield "error", "max_recursion_depth must be at least 1."
return
# Validate and build permissions eagerly — fail before creating a session.
permissions = await _build_and_validate_permissions(input_data)
if isinstance(permissions, str):
# Validation error returned as a string message.
yield "error", permissions
return
# Create session eagerly so the user always gets the session_id,
# even if the downstream stream fails (avoids orphaned sessions).
sid = input_data.session_id
if not sid:
sid = await self.create_session(execution_context.user_id)
sid = await self.create_session(
execution_context.user_id, dry_run=input_data.dry_run
)
# NOTE: No asyncio.timeout() here — the SDK manages its own
# heartbeat-based timeouts internally. Wrapping with asyncio.timeout
@@ -312,6 +397,7 @@ class AutoPilotBlock(Block):
session_id=sid,
max_recursion_depth=input_data.max_recursion_depth,
user_id=execution_context.user_id,
permissions=permissions,
)
yield "response", response
@@ -374,3 +460,78 @@ def _reset_recursion(
"""Restore recursion depth and limit to their previous values."""
_autopilot_recursion_depth.reset(tokens[0])
_autopilot_recursion_limit.reset(tokens[1])
# ---------------------------------------------------------------------------
# Permission helpers
# ---------------------------------------------------------------------------
# Inherited permissions from a parent AutoPilotBlock execution.
# This acts as a ceiling: child executions can only be more restrictive.
_inherited_permissions: contextvars.ContextVar["CopilotPermissions | None"] = (
contextvars.ContextVar("_inherited_permissions", default=None)
)
async def _build_and_validate_permissions(
input_data: "AutoPilotBlock.Input",
) -> "CopilotPermissions | str":
"""Build a :class:`CopilotPermissions` from block input and validate it.
Returns a :class:`CopilotPermissions` on success or a human-readable
error string if validation fails.
"""
# Tool names are validated by Pydantic via the ToolName Literal type
# at model construction time — no runtime check needed here.
# Validate block identifiers against live block registry.
if input_data.blocks:
invalid_blocks = await validate_block_identifiers(input_data.blocks)
if invalid_blocks:
return (
f"Unknown block identifier(s) in 'blocks': {invalid_blocks}. "
"Use find_block to discover valid block names and IDs. "
"You may also use the first 8 characters of a block UUID."
)
return CopilotPermissions(
tools=list(input_data.tools),
tools_exclude=input_data.tools_exclude,
blocks=input_data.blocks,
blocks_exclude=input_data.blocks_exclude,
)
def _merge_inherited_permissions(
permissions: "CopilotPermissions | None",
) -> "tuple[CopilotPermissions | None, contextvars.Token[CopilotPermissions | None] | None]":
"""Merge *permissions* with any inherited parent permissions.
The merged result is stored back into the contextvar so that any nested
AutoPilotBlock invocation (sub-agent) inherits the merged ceiling.
Returns a tuple of (merged_permissions, reset_token). The caller MUST
reset the contextvar via ``_inherited_permissions.reset(token)`` in a
``finally`` block when ``reset_token`` is not None — this prevents
permission leakage between sequential independent executions in the same
asyncio task.
"""
parent = _inherited_permissions.get()
if permissions is None and parent is None:
return None, None
all_tools = all_known_tool_names()
if permissions is None:
permissions = CopilotPermissions() # allow-all; will be narrowed by parent
merged = (
permissions.merged_with_parent(parent, all_tools)
if parent is not None
else permissions
)
# Store merged permissions as the new inherited ceiling for nested calls.
# Return the token so the caller can restore the previous value in finally.
token = _inherited_permissions.set(merged)
return merged, token
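A quick illustration of the ceiling semantics, mirroring what the tests in the next file pin down (tool names are illustrative; the API is the one defined above):
from backend.copilot.permissions import CopilotPermissions, all_known_tool_names
parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)  # allow-list
child = CopilotPermissions(tools=["run_block", "bash_exec"], tools_exclude=False)
merged = child.merged_with_parent(parent, all_known_tool_names())
allowed = merged.effective_allowed_tools(all_known_tool_names())
assert "run_block" in allowed      # the intersection with the parent survives
assert "bash_exec" not in allowed  # the child cannot exceed the parent ceiling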

View File

@@ -0,0 +1,265 @@
"""Tests for AutoPilotBlock permission fields and validation."""
from __future__ import annotations
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import ValidationError
from backend.blocks.autopilot import (
AutoPilotBlock,
_build_and_validate_permissions,
_inherited_permissions,
_merge_inherited_permissions,
)
from backend.copilot.permissions import CopilotPermissions, all_known_tool_names
from backend.data.execution import ExecutionContext
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_input(**kwargs) -> AutoPilotBlock.Input:
defaults = {
"prompt": "Do something",
"system_context": "",
"session_id": "",
"max_recursion_depth": 3,
"tools": [],
"tools_exclude": True,
"blocks": [],
"blocks_exclude": True,
}
defaults.update(kwargs)
return AutoPilotBlock.Input(**defaults)
# ---------------------------------------------------------------------------
# _build_and_validate_permissions
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
class TestBuildAndValidatePermissions:
async def test_empty_inputs_returns_empty_permissions(self):
inp = _make_input()
result = await _build_and_validate_permissions(inp)
assert isinstance(result, CopilotPermissions)
assert result.is_empty()
async def test_valid_tool_names_accepted(self):
inp = _make_input(tools=["run_block", "web_fetch"], tools_exclude=True)
result = await _build_and_validate_permissions(inp)
assert isinstance(result, CopilotPermissions)
assert result.tools == ["run_block", "web_fetch"]
assert result.tools_exclude is True
async def test_invalid_tool_rejected_by_pydantic(self):
"""Invalid tool names are now caught at Pydantic validation time
(Literal type), before ``_build_and_validate_permissions`` is called."""
with pytest.raises(ValidationError, match="not_a_real_tool"):
_make_input(tools=["not_a_real_tool"])
async def test_valid_block_name_accepted(self):
mock_block_cls = MagicMock()
mock_block_cls.return_value.name = "HTTP Request"
with patch(
"backend.blocks.get_blocks",
return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
):
inp = _make_input(blocks=["HTTP Request"], blocks_exclude=True)
result = await _build_and_validate_permissions(inp)
assert isinstance(result, CopilotPermissions)
assert result.blocks == ["HTTP Request"]
async def test_valid_partial_uuid_accepted(self):
mock_block_cls = MagicMock()
mock_block_cls.return_value.name = "HTTP Request"
with patch(
"backend.blocks.get_blocks",
return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
):
inp = _make_input(blocks=["c069dc6b"], blocks_exclude=False)
result = await _build_and_validate_permissions(inp)
assert isinstance(result, CopilotPermissions)
async def test_invalid_block_identifier_returns_error(self):
mock_block_cls = MagicMock()
mock_block_cls.return_value.name = "HTTP Request"
with patch(
"backend.blocks.get_blocks",
return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
):
inp = _make_input(blocks=["totally_fake_block"])
result = await _build_and_validate_permissions(inp)
assert isinstance(result, str)
assert "totally_fake_block" in result
assert "Unknown block identifier" in result
async def test_sdk_builtin_tool_names_accepted(self):
inp = _make_input(tools=["Read", "Task", "WebSearch"], tools_exclude=False)
result = await _build_and_validate_permissions(inp)
assert isinstance(result, CopilotPermissions)
assert not result.tools_exclude
async def test_empty_blocks_skips_validation(self):
# Should not call validate_block_identifiers at all when blocks=[].
with patch(
"backend.copilot.permissions.validate_block_identifiers"
) as mock_validate:
inp = _make_input(blocks=[])
await _build_and_validate_permissions(inp)
mock_validate.assert_not_called()
# ---------------------------------------------------------------------------
# _merge_inherited_permissions
# ---------------------------------------------------------------------------
class TestMergeInheritedPermissions:
def test_no_permissions_no_parent_returns_none(self):
merged, token = _merge_inherited_permissions(None)
assert merged is None
assert token is None
def test_permissions_no_parent_returned_unchanged(self):
perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
merged, token = _merge_inherited_permissions(perms)
try:
assert merged is perms
assert token is not None
finally:
if token is not None:
_inherited_permissions.reset(token)
def test_child_narrows_parent(self):
parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
# Set parent as inherited
outer_token = _inherited_permissions.set(parent)
try:
child = CopilotPermissions(tools=["web_fetch"], tools_exclude=True)
merged, inner_token = _merge_inherited_permissions(child)
try:
assert merged is not None
all_t = all_known_tool_names()
effective = merged.effective_allowed_tools(all_t)
assert "bash_exec" not in effective
assert "web_fetch" not in effective
finally:
if inner_token is not None:
_inherited_permissions.reset(inner_token)
finally:
_inherited_permissions.reset(outer_token)
def test_none_permissions_with_parent_uses_parent(self):
parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
outer_token = _inherited_permissions.set(parent)
try:
merged, inner_token = _merge_inherited_permissions(None)
try:
assert merged is not None
# Merged should have parent's restrictions
effective = merged.effective_allowed_tools(all_known_tool_names())
assert "bash_exec" not in effective
finally:
if inner_token is not None:
_inherited_permissions.reset(inner_token)
finally:
_inherited_permissions.reset(outer_token)
def test_child_cannot_expand_parent_whitelist(self):
parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
outer_token = _inherited_permissions.set(parent)
try:
# Child tries to allow more tools
child = CopilotPermissions(
tools=["run_block", "bash_exec"], tools_exclude=False
)
merged, inner_token = _merge_inherited_permissions(child)
try:
assert merged is not None
effective = merged.effective_allowed_tools(all_known_tool_names())
assert "bash_exec" not in effective
assert "run_block" in effective
finally:
if inner_token is not None:
_inherited_permissions.reset(inner_token)
finally:
_inherited_permissions.reset(outer_token)
# ---------------------------------------------------------------------------
# AutoPilotBlock.run — validation integration
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
class TestAutoPilotBlockRunPermissions:
async def _collect_outputs(self, block, input_data, user_id="test-user"):
"""Helper to collect all yields from block.run()."""
ctx = ExecutionContext(
user_id=user_id,
graph_id="g1",
graph_exec_id="ge1",
node_exec_id="ne1",
node_id="n1",
)
outputs = {}
async for key, val in block.run(input_data, execution_context=ctx):
outputs[key] = val
return outputs
async def test_invalid_tool_rejected_by_pydantic(self):
"""Invalid tool names are caught at Pydantic validation (Literal type)."""
with pytest.raises(ValidationError, match="not_a_tool"):
_make_input(tools=["not_a_tool"])
async def test_invalid_block_yields_error(self):
mock_block_cls = MagicMock()
mock_block_cls.return_value.name = "HTTP Request"
with patch(
"backend.blocks.get_blocks",
return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
):
block = AutoPilotBlock()
inp = _make_input(blocks=["nonexistent_block"])
outputs = await self._collect_outputs(block, inp)
assert "error" in outputs
assert "nonexistent_block" in outputs["error"]
async def test_empty_prompt_yields_error_before_permission_check(self):
block = AutoPilotBlock()
inp = _make_input(prompt=" ", tools=["run_block"])
outputs = await self._collect_outputs(block, inp)
assert "error" in outputs
assert "Prompt cannot be empty" in outputs["error"]
async def test_valid_permissions_passed_to_execute(self):
"""Permissions are forwarded to execute_copilot when valid."""
block = AutoPilotBlock()
captured: dict = {}
async def fake_execute_copilot(self_inner, **kwargs):
captured["permissions"] = kwargs.get("permissions")
return (
"ok",
[],
'[{"role":"user","content":"hi"}]',
"test-sid",
{"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
)
with patch.object(
AutoPilotBlock, "create_session", new=AsyncMock(return_value="test-sid")
), patch.object(AutoPilotBlock, "execute_copilot", new=fake_execute_copilot):
inp = _make_input(tools=["run_block"], tools_exclude=False)
outputs = await self._collect_outputs(block, inp)
assert "error" not in outputs
perms = captured.get("permissions")
assert isinstance(perms, CopilotPermissions)
assert perms.tools == ["run_block"]
assert perms.tools_exclude is False

View File

@@ -17,6 +17,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -342,6 +343,7 @@ class ExecuteCodeBlock(Block, BaseE2BExecutorMixin):
# Determine result object shape & filter out empty formats
main_result, results = self.process_execution_results(results)
self.merge_stats(NodeExecutionStats(output_size=1))
if main_result:
yield "main_result", main_result
yield "results", results
@@ -467,6 +469,7 @@ class InstantiateCodeSandboxBlock(Block, BaseE2BExecutorMixin):
setup_commands=input_data.setup_commands,
timeout=input_data.timeout,
)
self.merge_stats(NodeExecutionStats(output_size=1))
if sandbox_id:
yield "sandbox_id", sandbox_id
else:
@@ -577,6 +580,7 @@ class ExecuteCodeStepBlock(Block, BaseE2BExecutorMixin):
# Determine result object shape & filter out empty formats
main_result, results = self.process_execution_results(results)
self.merge_stats(NodeExecutionStats(output_size=1))
if main_result:
yield "main_result", main_result
yield "results", results

View File

@@ -73,7 +73,7 @@ class ReadDiscordMessagesBlock(Block):
id="df06086a-d5ac-4abb-9996-2ad0acb2eff7",
input_schema=ReadDiscordMessagesBlock.Input, # Assign input schema
output_schema=ReadDiscordMessagesBlock.Output, # Assign output schema
description="Reads messages from a Discord channel using a bot token.",
description="Reads new messages from a Discord channel using a bot token and triggers when a new message is posted",
categories={BlockCategory.SOCIAL},
test_input={
"continuous_read": False,

View File

@@ -15,7 +15,12 @@ from backend.blocks._base import (
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.model import APIKeyCredentials, CredentialsField, SchemaField
from backend.data.model import (
APIKeyCredentials,
CredentialsField,
NodeExecutionStats,
SchemaField,
)
from backend.util.type import MediaFileType
from ._api import (
@@ -195,6 +200,7 @@ class GetLinkedinProfileBlock(Block):
include_social_media=input_data.include_social_media,
include_extra=input_data.include_extra,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "profile", profile
except Exception as e:
logger.error(f"Error fetching LinkedIn profile: {str(e)}")
@@ -341,6 +347,7 @@ class LinkedinPersonLookupBlock(Block):
include_similarity_checks=input_data.include_similarity_checks,
enrich_profile=input_data.enrich_profile,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "lookup_result", lookup_result
except Exception as e:
logger.error(f"Error looking up LinkedIn profile: {str(e)}")
@@ -443,6 +450,7 @@ class LinkedinRoleLookupBlock(Block):
company_name=input_data.company_name,
enrich_profile=input_data.enrich_profile,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "role_lookup_result", role_lookup_result
except Exception as e:
logger.error(f"Error looking up role in company: {str(e)}")
@@ -523,6 +531,7 @@ class GetLinkedinProfilePictureBlock(Block):
credentials=credentials,
linkedin_profile_url=input_data.linkedin_profile_url,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "profile_picture_url", profile_picture
except Exception as e:
logger.error(f"Error getting profile picture: {str(e)}")

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -223,3 +224,6 @@ class ExaContentsBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -206,3 +207,6 @@ class ExaSearchBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -18,7 +18,7 @@ from backend.blocks.fal._auth import (
FalCredentialsInput,
)
from backend.data.execution import ExecutionContext
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.file import store_media_file
from backend.util.request import ClientResponseError, Requests
from backend.util.type import MediaFileType
@@ -230,6 +230,7 @@ class AIVideoGeneratorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
except Exception as e:
error_message = str(e)

View File

@@ -1,5 +1,6 @@
import asyncio
import base64
import re
from abc import ABC
from email import encoders
from email.mime.base import MIMEBase
@@ -8,7 +9,7 @@ from email.mime.text import MIMEText
from email.policy import SMTP
from email.utils import getaddresses, parseaddr
from pathlib import Path
from typing import List, Literal, Optional
from typing import List, Literal, Optional, Protocol, runtime_checkable
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
@@ -42,8 +43,52 @@ NO_WRAP_POLICY = SMTP.clone(max_line_length=0)
def serialize_email_recipients(recipients: list[str]) -> str:
"""Serialize recipients list to comma-separated string."""
return ", ".join(recipients)
"""Serialize recipients list to comma-separated string.
Strips leading/trailing whitespace from each address to keep MIME
headers clean (mirrors the strip done in ``validate_email_recipients``).
"""
return ", ".join(addr.strip() for addr in recipients)
# RFC 5322 simplified pattern: local@domain where domain has at least one dot
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
def validate_email_recipients(recipients: list[str], field_name: str = "to") -> None:
"""Validate that all recipients are plausible email addresses.
Raises ``ValueError`` with a user-friendly message listing every
invalid entry so the caller (or LLM) can correct them in one pass.
"""
invalid = [addr for addr in recipients if not _EMAIL_RE.match(addr.strip())]
if invalid:
formatted = ", ".join(f"'{a}'" for a in invalid)
raise ValueError(
f"Invalid email address(es) in '{field_name}': {formatted}. "
f"Each entry must be a valid email address (e.g. user@example.com)."
)
@runtime_checkable
class HasRecipients(Protocol):
to: list[str]
cc: list[str]
bcc: list[str]
def validate_all_recipients(input_data: HasRecipients) -> None:
"""Validate to/cc/bcc recipient fields on an input namespace.
Calls ``validate_email_recipients`` for ``to`` (required) and
``cc``/``bcc`` (when non-empty), raising ``ValueError`` on the
first field that contains an invalid address.
"""
validate_email_recipients(input_data.to, "to")
if input_data.cc:
validate_email_recipients(input_data.cc, "cc")
if input_data.bcc:
validate_email_recipients(input_data.bcc, "bcc")
def _make_mime_text(
@@ -100,14 +145,16 @@ async def create_mime_message(
) -> str:
"""Create a MIME message with attachments and return base64-encoded raw message."""
validate_all_recipients(input_data)
message = MIMEMultipart()
message["to"] = serialize_email_recipients(input_data.to)
message["subject"] = input_data.subject
if input_data.cc:
message["cc"] = ", ".join(input_data.cc)
message["cc"] = serialize_email_recipients(input_data.cc)
if input_data.bcc:
message["bcc"] = ", ".join(input_data.bcc)
message["bcc"] = serialize_email_recipients(input_data.bcc)
# Use the new helper function with content_type if available
content_type = getattr(input_data, "content_type", None)
@@ -1167,13 +1214,15 @@ async def _build_reply_message(
references.append(headers["message-id"])
# Create MIME message
validate_all_recipients(input_data)
msg = MIMEMultipart()
if input_data.to:
msg["To"] = ", ".join(input_data.to)
msg["To"] = serialize_email_recipients(input_data.to)
if input_data.cc:
msg["Cc"] = ", ".join(input_data.cc)
msg["Cc"] = serialize_email_recipients(input_data.cc)
if input_data.bcc:
msg["Bcc"] = ", ".join(input_data.bcc)
msg["Bcc"] = serialize_email_recipients(input_data.bcc)
msg["Subject"] = subject
if headers.get("message-id"):
msg["In-Reply-To"] = headers["message-id"]
@@ -1685,13 +1734,16 @@ To: {original_to}
else:
body = f"{forward_header}\n\n{original_body}"
# Validate all recipient lists before building the MIME message
validate_all_recipients(input_data)
# Create MIME message
msg = MIMEMultipart()
msg["To"] = ", ".join(input_data.to)
msg["To"] = serialize_email_recipients(input_data.to)
if input_data.cc:
msg["Cc"] = ", ".join(input_data.cc)
msg["Cc"] = serialize_email_recipients(input_data.cc)
if input_data.bcc:
msg["Bcc"] = ", ".join(input_data.bcc)
msg["Bcc"] = serialize_email_recipients(input_data.bcc)
msg["Subject"] = subject
# Add body with proper content type

View File

@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -117,6 +118,7 @@ class GoogleMapsSearchBlock(Block):
input_data.radius,
input_data.max_results,
)
self.merge_stats(NodeExecutionStats(output_size=len(places)))
for place in places:
yield "place", place

View File

@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -227,6 +228,7 @@ class IdeogramModelBlock(Block):
image_url=result,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "result", result
async def run_model(

View File

@@ -2,6 +2,8 @@ import copy
from datetime import date, time
from typing import Any, Optional
from pydantic import AliasChoices, Field
from backend.blocks._base import (
Block,
BlockCategory,
@@ -28,9 +30,9 @@ class AgentInputBlock(Block):
"""
This block is used to provide input to the graph.
It takes in a value, name, description, default values list and bool to limit selection to default values.
It takes in a value, name, and description.
It Outputs the value passed as input.
It outputs the value passed as input.
"""
class Input(BlockSchemaInput):
@@ -47,12 +49,6 @@ class AgentInputBlock(Block):
default=None,
advanced=True,
)
placeholder_values: list = SchemaField(
description="The placeholder values to be passed as input.",
default_factory=list,
advanced=True,
hidden=True,
)
advanced: bool = SchemaField(
description="Whether to show the input in the advanced section, if the field is not required.",
default=False,
@@ -65,10 +61,7 @@ class AgentInputBlock(Block):
)
def generate_schema(self):
schema = copy.deepcopy(self.get_field_schema("value"))
if possible_values := self.placeholder_values:
schema["enum"] = possible_values
return schema
return copy.deepcopy(self.get_field_schema("value"))
class Output(BlockSchema):
# Use BlockSchema to avoid automatic error field for interface definition
@@ -86,18 +79,16 @@ class AgentInputBlock(Block):
"value": "Hello, World!",
"name": "input_1",
"description": "Example test input.",
"placeholder_values": [],
},
{
"value": "Hello, World!",
"value": 42,
"name": "input_2",
"description": "Example test input with placeholders.",
"placeholder_values": ["Hello, World!"],
"description": "Example numeric input.",
},
],
"test_output": [
("result", "Hello, World!"),
("result", "Hello, World!"),
("result", 42),
],
"categories": {BlockCategory.INPUT, BlockCategory.BASIC},
"block_type": BlockType.INPUT,
@@ -245,13 +236,11 @@ class AgentShortTextInputBlock(AgentInputBlock):
"value": "Hello",
"name": "short_text_1",
"description": "Short text example 1",
"placeholder_values": [],
},
{
"value": "Quick test",
"name": "short_text_2",
"description": "Short text example 2",
"placeholder_values": ["Quick test", "Another option"],
},
],
test_output=[
@@ -285,13 +274,11 @@ class AgentLongTextInputBlock(AgentInputBlock):
"value": "Lorem ipsum dolor sit amet...",
"name": "long_text_1",
"description": "Long text example 1",
"placeholder_values": [],
},
{
"value": "Another multiline text input.",
"name": "long_text_2",
"description": "Long text example 2",
"placeholder_values": ["Another multiline text input."],
},
],
test_output=[
@@ -325,13 +312,11 @@ class AgentNumberInputBlock(AgentInputBlock):
"value": 42,
"name": "number_input_1",
"description": "Number example 1",
"placeholder_values": [],
},
{
"value": 314,
"name": "number_input_2",
"description": "Number example 2",
"placeholder_values": [314, 2718],
},
],
test_output=[
@@ -484,7 +469,8 @@ class AgentFileInputBlock(AgentInputBlock):
class AgentDropdownInputBlock(AgentInputBlock):
"""
A specialized text input block that relies on placeholder_values to present a dropdown.
A specialized text input block that presents a dropdown selector
restricted to a fixed set of values.
"""
class Input(AgentInputBlock.Input):
@@ -494,13 +480,26 @@ class AgentDropdownInputBlock(AgentInputBlock):
advanced=False,
title="Default Value",
)
placeholder_values: list = SchemaField(
description="Possible values for the dropdown.",
# Use Field() directly (not SchemaField) to pass validation_alias,
# which handles backward compat for legacy "placeholder_values" across
# all construction paths (model_construct, __init__, model_validate).
options: list = Field(
default_factory=list,
advanced=False,
title="Dropdown Options",
description=(
"If provided, renders the input as a dropdown selector "
"restricted to these values. Leave empty for free-text input."
),
validation_alias=AliasChoices("options", "placeholder_values"),
json_schema_extra={"advanced": False, "secret": False},
)
def generate_schema(self):
schema = super().generate_schema()
if possible_values := self.options:
schema["enum"] = possible_values
return schema
class Output(AgentInputBlock.Output):
result: str = SchemaField(description="Selected dropdown value.")
@@ -515,13 +514,13 @@ class AgentDropdownInputBlock(AgentInputBlock):
{
"value": "Option A",
"name": "dropdown_1",
"placeholder_values": ["Option A", "Option B", "Option C"],
"options": ["Option A", "Option B", "Option C"],
"description": "Dropdown example 1",
},
{
"value": "Option C",
"name": "dropdown_2",
"placeholder_values": ["Option A", "Option B", "Option C"],
"options": ["Option A", "Option B", "Option C"],
"description": "Dropdown example 2",
},
],
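A minimal sketch of the backward-compat alias wiring used above, with the SchemaField extras omitted (DropdownInput is a local stand-in for AgentDropdownInputBlock.Input):

from pydantic import AliasChoices, BaseModel, Field

class DropdownInput(BaseModel):
    # Mirrors the validation_alias in the diff: both the canonical key and
    # the legacy key populate the same field during validation.
    options: list = Field(
        default_factory=list,
        validation_alias=AliasChoices("options", "placeholder_values"),
    )

assert DropdownInput.model_validate({"options": ["A", "B"]}).options == ["A", "B"]
# Legacy agent JSON still validates into the same field:
assert DropdownInput.model_validate({"placeholder_values": ["A", "B"]}).options == ["A", "B"]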

View File

@@ -10,7 +10,7 @@ from backend.blocks.jina._auth import (
JinaCredentialsField,
JinaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
@@ -45,5 +45,13 @@ class JinaEmbeddingBlock(Block):
}
data = {"input": input_data.texts, "model": input_data.model}
response = await Requests().post(url, headers=headers, json=data)
embeddings = [e["embedding"] for e in response.json()["data"]]
resp_json = response.json()
embeddings = [e["embedding"] for e in resp_json["data"]]
usage = resp_json.get("usage", {})
if usage.get("total_tokens"):
self.merge_stats(
NodeExecutionStats(
input_token_count=usage.get("total_tokens", 0),
)
)
yield "embeddings", embeddings

View File

@@ -49,6 +49,9 @@ settings = Settings()
logger = TruncatedLogger(logging.getLogger(__name__), "[LLM-Block]")
fmt = TextFormatter(autoescape=False)
# HTTP status codes for user-caused errors that should not be reported to Sentry.
USER_ERROR_STATUS_CODES = (401, 403, 429)
LLMProviderName = Literal[
ProviderName.AIML_API,
ProviderName.ANTHROPIC,
@@ -101,6 +104,18 @@ class LlmModelMeta(EnumMeta):
class LlmModel(str, Enum, metaclass=LlmModelMeta):
@classmethod
def _missing_(cls, value: object) -> "LlmModel | None":
"""Handle provider-prefixed model names like 'anthropic/claude-sonnet-4-6'."""
if isinstance(value, str) and "/" in value:
stripped = value.split("/", 1)[1]
try:
return cls(stripped)
except ValueError:
return None
return None
# OpenAI models
O3_MINI = "o3-mini"
O3 = "o3-2025-04-16"
@@ -672,6 +687,7 @@ class LLMResponse(BaseModel):
prompt_tokens: int
completion_tokens: int
reasoning: Optional[str] = None
provider_cost: float | None = None
def convert_openai_tool_fmt_to_anthropic(
@@ -709,6 +725,9 @@ def convert_openai_tool_fmt_to_anthropic(
def extract_openai_reasoning(response) -> str | None:
"""Extract reasoning from OpenAI-compatible response if available."""
"""Note: This will likely not working since the reasoning is not present in another Response API"""
if not response.choices:
logger.warning("LLM response has empty choices in extract_openai_reasoning")
return None
reasoning = None
choice = response.choices[0]
if hasattr(choice, "reasoning") and getattr(choice, "reasoning", None):
@@ -724,6 +743,9 @@ def extract_openai_reasoning(response) -> str | None:
def extract_openai_tool_calls(response) -> list[ToolContentBlock] | None:
"""Extract tool calls from OpenAI-compatible response."""
if not response.choices:
logger.warning("LLM response has empty choices in extract_openai_tool_calls")
return None
if response.choices[0].message.tool_calls:
return [
ToolContentBlock(
@@ -891,65 +913,60 @@ async def llm_call(
client = anthropic.AsyncAnthropic(
api_key=credentials.api_key.get_secret_value()
)
try:
resp = await client.messages.create(
model=llm_model.value,
system=sysprompt,
messages=messages,
max_tokens=max_tokens,
tools=an_tools,
timeout=600,
)
resp = await client.messages.create(
model=llm_model.value,
system=sysprompt,
messages=messages,
max_tokens=max_tokens,
tools=an_tools,
timeout=600,
)
if not resp.content:
raise ValueError("No content returned from Anthropic.")
if not resp.content:
raise ValueError("No content returned from Anthropic.")
tool_calls = None
for content_block in resp.content:
# Anthropic differs from OpenAI; we need to iterate through
# the content blocks to find the tool calls
if content_block.type == "tool_use":
if tool_calls is None:
tool_calls = []
tool_calls.append(
ToolContentBlock(
id=content_block.id,
type=content_block.type,
function=ToolCall(
name=content_block.name,
arguments=json.dumps(content_block.input),
),
)
tool_calls = None
for content_block in resp.content:
# Anthropic differs from OpenAI; we need to iterate through
# the content blocks to find the tool calls
if content_block.type == "tool_use":
if tool_calls is None:
tool_calls = []
tool_calls.append(
ToolContentBlock(
id=content_block.id,
type=content_block.type,
function=ToolCall(
name=content_block.name,
arguments=json.dumps(content_block.input),
),
)
if not tool_calls and resp.stop_reason == "tool_use":
logger.warning(
f"Tool use stop reason but no tool calls found in content. {resp}"
)
reasoning = None
for content_block in resp.content:
if hasattr(content_block, "type") and content_block.type == "thinking":
reasoning = content_block.thinking
break
return LLMResponse(
raw_response=resp,
prompt=prompt,
response=(
resp.content[0].name
if isinstance(resp.content[0], anthropic.types.ToolUseBlock)
else getattr(resp.content[0], "text", "")
),
tool_calls=tool_calls,
prompt_tokens=resp.usage.input_tokens,
completion_tokens=resp.usage.output_tokens,
reasoning=reasoning,
if not tool_calls and resp.stop_reason == "tool_use":
logger.warning(
f"Tool use stop reason but no tool calls found in content. {resp}"
)
except anthropic.APIError as e:
error_message = f"Anthropic API error: {str(e)}"
logger.error(error_message)
raise ValueError(error_message)
reasoning = None
for content_block in resp.content:
if hasattr(content_block, "type") and content_block.type == "thinking":
reasoning = content_block.thinking
break
return LLMResponse(
raw_response=resp,
prompt=prompt,
response=(
resp.content[0].name
if isinstance(resp.content[0], anthropic.types.ToolUseBlock)
else getattr(resp.content[0], "text", "")
),
tool_calls=tool_calls,
prompt_tokens=resp.usage.input_tokens,
completion_tokens=resp.usage.output_tokens,
reasoning=reasoning,
)
elif provider == "groq":
if tools:
raise ValueError("Groq does not support tools.")
@@ -962,6 +979,8 @@ async def llm_call(
response_format=response_format, # type: ignore
max_tokens=max_tokens,
)
if not response.choices:
raise ValueError("Groq returned empty choices in response")
return LLMResponse(
raw_response=response.choices[0].message,
prompt=prompt,
@@ -1021,16 +1040,22 @@ async def llm_call(
parallel_tool_calls=parallel_tool_calls_param,
)
# If there's no response, raise an error
if not response.choices:
if response:
raise ValueError(f"OpenRouter error: {response}")
else:
raise ValueError("No response from OpenRouter.")
raise ValueError(f"OpenRouter returned empty choices: {response}")
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
cost = None
try:
raw_resp = getattr(response, "_response", None)
if raw_resp and hasattr(raw_resp, "headers"):
cost_header = raw_resp.headers.get("x-total-cost")
if cost_header:
cost = float(cost_header)
except (ValueError, AttributeError):
pass
return LLMResponse(
raw_response=response.choices[0].message,
prompt=prompt,
@@ -1039,6 +1064,7 @@ async def llm_call(
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
reasoning=reasoning,
provider_cost=cost,
)
elif provider == "llama_api":
tools_param = tools if tools else openai.NOT_GIVEN
@@ -1063,12 +1089,8 @@ async def llm_call(
parallel_tool_calls=parallel_tool_calls_param,
)
# If there's no response, raise an error
if not response.choices:
if response:
raise ValueError(f"Llama API error: {response}")
else:
raise ValueError("No response from Llama API.")
raise ValueError(f"Llama API returned empty choices: {response}")
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
@@ -1098,6 +1120,8 @@ async def llm_call(
messages=prompt, # type: ignore
max_tokens=max_tokens,
)
if not completion.choices:
raise ValueError("AI/ML API returned empty choices in response")
return LLMResponse(
raw_response=completion.choices[0].message,
@@ -1134,6 +1158,9 @@ async def llm_call(
parallel_tool_calls=parallel_tool_calls_param,
)
if not response.choices:
raise ValueError(f"v0 API returned empty choices: {response}")
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
@@ -1362,12 +1389,13 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
max_tokens=input_data.max_tokens,
)
response_text = llm_response.response
self.merge_stats(
NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
cost_stats = NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
if llm_response.provider_cost is not None:
cost_stats.provider_cost = llm_response.provider_cost
self.merge_stats(cost_stats)
logger.debug(f"LLM attempt-{retry_count} response: {response_text}")
if input_data.expected_format:
@@ -1462,7 +1490,16 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
yield "prompt", self.prompt
return
except Exception as e:
logger.exception(f"Error calling LLM: {e}")
is_user_error = (
isinstance(e, (anthropic.APIStatusError, openai.APIStatusError))
and e.status_code in USER_ERROR_STATUS_CODES
)
if is_user_error:
logger.warning(f"Error calling LLM: {e}")
error_feedback_message = f"Error calling LLM: {e}"
break
else:
logger.exception(f"Error calling LLM: {e}")
if (
"maximum context length" in str(e).lower()
or "token limit" in str(e).lower()
@@ -1992,6 +2029,19 @@ class AIConversationBlock(AIBlockBase):
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
has_messages = any(
isinstance(m, dict)
and isinstance(m.get("content"), str)
and bool(m["content"].strip())
for m in (input_data.messages or [])
)
has_prompt = bool(input_data.prompt and input_data.prompt.strip())
if not has_messages and not has_prompt:
raise ValueError(
"Cannot call LLM with no messages and no prompt. "
"Provide at least one message or a non-empty prompt."
)
response = await self.llm_call(
AIStructuredResponseGeneratorBlock.Input(
prompt=input_data.prompt,
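A self-contained sketch of the empty-input guard added to AIConversationBlock.run above (the function name is local to the sketch; messages are assumed to be OpenAI-style chat dicts):

def has_usable_input(messages, prompt) -> bool:
    # A message counts only if it is a dict with non-blank string content.
    has_messages = any(
        isinstance(m, dict)
        and isinstance(m.get("content"), str)
        and bool(m["content"].strip())
        for m in (messages or [])
    )
    return has_messages or bool(prompt and prompt.strip())

assert not has_usable_input([{"role": "user", "content": "   "}], "")
assert has_usable_input([], "Summarize this document")

Failing fast here avoids burning an LLM call (and provider cost) on a request that can only error out.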

View File

@@ -89,6 +89,12 @@ class MCPToolBlock(Block):
default={},
hidden=True,
)
tool_description: str = SchemaField(
description="Description of the selected MCP tool. "
"Populated automatically when a tool is selected.",
default="",
hidden=True,
)
tool_arguments: dict[str, Any] = SchemaField(
description="Arguments to pass to the selected MCP tool. "

View File

@@ -8,6 +8,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -153,6 +154,7 @@ class AddMemoryBlock(Block, Mem0Base):
messages,
**params,
)
self.merge_stats(NodeExecutionStats(output_size=1))
results = result.get("results", [])
yield "results", results
@@ -255,6 +257,7 @@ class SearchMemoryBlock(Block, Mem0Base):
result: list[dict[str, Any]] = client.search(
input_data.query, version="v2", filters=filters
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "memories", result
except Exception as e:
@@ -340,6 +343,7 @@ class GetAllMemoriesBlock(Block, Mem0Base):
filters=filters,
version="v2",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "memories", memories
@@ -434,6 +438,7 @@ class GetLatestMemoryBlock(Block, Mem0Base):
filters=filters,
version="v2",
)
self.merge_stats(NodeExecutionStats(output_size=1))
if memories:
# Return the latest memory (first in the list as they're sorted by recency)

View File

@@ -10,7 +10,7 @@ from backend.blocks.nvidia._auth import (
NvidiaCredentialsField,
NvidiaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
from backend.util.type import MediaFileType
@@ -69,6 +69,7 @@ class NvidiaDeepfakeDetectBlock(Block):
data = response.json()
result = data.get("data", [{}])[0]
self.merge_stats(NodeExecutionStats(output_size=1))
# Get deepfake probability from first bounding box if any
deepfake_prob = 0.0

View File

@@ -17,7 +17,12 @@ from backend.blocks.replicate._auth import (
ReplicateCredentialsInput,
)
from backend.blocks.replicate._helper import ReplicateOutputs, extract_result
from backend.data.model import APIKeyCredentials, CredentialsField, SchemaField
from backend.data.model import (
APIKeyCredentials,
CredentialsField,
NodeExecutionStats,
SchemaField,
)
from backend.util.exceptions import BlockExecutionError, BlockInputError
logger = logging.getLogger(__name__)
@@ -108,6 +113,7 @@ class ReplicateModelBlock(Block):
result = await self.run_model(
model_ref, input_data.model_inputs, credentials.api_key
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "result", result
yield "status", "succeeded"
yield "model_name", input_data.model_name

View File

@@ -16,6 +16,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -185,6 +186,7 @@ class ScreenshotWebPageBlock(Block):
block_chats=input_data.block_chats,
cache=input_data.cache,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "image", screenshot_data["image"]
except Exception as e:
yield "error", str(e)

View File

@@ -15,6 +15,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -146,6 +147,7 @@ class GetWeatherInformationBlock(Block, GetRequest):
weather_data = await self.get_request(url, json=True)
if "main" in weather_data and "weather" in weather_data:
self.merge_stats(NodeExecutionStats(output_size=1))
yield "temperature", str(weather_data["main"]["temp"])
yield "humidity", str(weather_data["main"]["humidity"])
yield "condition", weather_data["weather"][0]["description"]

View File

@@ -23,7 +23,7 @@ from backend.blocks.smartlead.models import (
SaveSequencesResponse,
Sequence,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class CreateCampaignBlock(Block):
@@ -100,6 +100,7 @@ class CreateCampaignBlock(Block):
**kwargs,
) -> BlockOutput:
response = await self.create_campaign(input_data.name, credentials)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "id", response.id
yield "name", response.name
@@ -226,6 +227,7 @@ class AddLeadToCampaignBlock(Block):
response = await self.add_leads_to_campaign(
input_data.campaign_id, input_data.lead_list, credentials
)
self.merge_stats(NodeExecutionStats(output_size=len(input_data.lead_list)))
yield "campaign_id", input_data.campaign_id
yield "upload_count", response.upload_count
@@ -321,6 +323,7 @@ class SaveCampaignSequencesBlock(Block):
response = await self.save_campaign_sequences(
input_data.campaign_id, input_data.sequences, credentials
)
self.merge_stats(NodeExecutionStats(output_size=1))
if response.data:
yield "data", response.data

View File

@@ -0,0 +1,304 @@
import asyncio
from typing import Any, Literal
from pydantic import SecretStr
from sqlalchemy.engine.url import URL
from sqlalchemy.exc import DBAPIError, OperationalError, ProgrammingError
from backend.blocks._base import (
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.blocks.sql_query_helpers import (
_DATABASE_TYPE_DEFAULT_PORT,
_DATABASE_TYPE_TO_DRIVER,
DatabaseType,
_execute_query,
_sanitize_error,
_validate_query_is_read_only,
_validate_single_statement,
)
from backend.data.model import (
CredentialsField,
CredentialsMetaInput,
SchemaField,
UserPasswordCredentials,
)
from backend.integrations.providers import ProviderName
from backend.util.request import resolve_and_check_blocked
TEST_CREDENTIALS = UserPasswordCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="database",
username=SecretStr("test_user"),
password=SecretStr("test_pass"),
title="Mock Database credentials",
)
TEST_CREDENTIALS_INPUT = {
"provider": TEST_CREDENTIALS.provider,
"id": TEST_CREDENTIALS.id,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}
DatabaseCredentials = UserPasswordCredentials
DatabaseCredentialsInput = CredentialsMetaInput[
Literal[ProviderName.DATABASE],
Literal["user_password"],
]
def DatabaseCredentialsField() -> DatabaseCredentialsInput:
return CredentialsField(
description="Database username and password",
)
class SQLQueryBlock(Block):
class Input(BlockSchemaInput):
database_type: DatabaseType = SchemaField(
default=DatabaseType.POSTGRES,
description="Database engine",
advanced=False,
)
host: SecretStr = SchemaField(
description="Database hostname or IP address",
placeholder="db.example.com",
secret=True,
)
port: int | None = SchemaField(
default=None,
description=(
"Database port (leave empty for default: "
"PostgreSQL: 5432, MySQL: 3306, MSSQL: 1433)"
),
ge=1,
le=65535,
)
database: str = SchemaField(
description="Name of the database to connect to",
placeholder="my_database",
)
query: str = SchemaField(
description="SQL query to execute",
placeholder="SELECT * FROM analytics.daily_active_users LIMIT 10",
)
read_only: bool = SchemaField(
default=True,
description=(
"When enabled (default), only SELECT queries are allowed "
"and the database session is set to read-only mode. "
"Disable to allow write operations (INSERT, UPDATE, DELETE, etc.)."
),
)
timeout: int = SchemaField(
default=30,
description="Query timeout in seconds (max 120)",
ge=1,
le=120,
)
max_rows: int = SchemaField(
default=1000,
description="Maximum number of rows to return (max 10000)",
ge=1,
le=10000,
)
credentials: DatabaseCredentialsInput = DatabaseCredentialsField()
class Output(BlockSchemaOutput):
results: list[dict[str, Any]] = SchemaField(
description="Query results as a list of row dictionaries"
)
columns: list[str] = SchemaField(
description="Column names from the query result"
)
row_count: int = SchemaField(description="Number of rows returned")
affected_rows: int = SchemaField(
description="Number of rows affected by a write query (INSERT/UPDATE/DELETE)"
)
error: str = SchemaField(description="Error message if the query failed")
def __init__(self):
super().__init__(
id="4dc35c0f-4fd8-465e-9616-5a216f1ba2bc",
description=(
"Execute a SQL query. Read-only by default for safety "
"-- disable to allow write operations. "
"Supports PostgreSQL, MySQL, and MSSQL via SQLAlchemy."
),
categories={BlockCategory.DATA},
input_schema=SQLQueryBlock.Input,
output_schema=SQLQueryBlock.Output,
test_input={
"query": "SELECT 1 AS test_col",
"database_type": DatabaseType.POSTGRES,
"host": "localhost",
"database": "test_db",
"timeout": 30,
"max_rows": 1000,
"credentials": TEST_CREDENTIALS_INPUT,
},
test_credentials=TEST_CREDENTIALS,
test_output=[
("results", [{"test_col": 1}]),
("columns", ["test_col"]),
("row_count", 1),
],
test_mock={
"execute_query": lambda *_args, **_kwargs: (
[{"test_col": 1}],
["test_col"],
-1,
),
"check_host_allowed": lambda *_args, **_kwargs: ["127.0.0.1"],
},
)
@staticmethod
async def check_host_allowed(host: str) -> list[str]:
"""Validate that the given host is not a private/blocked address.
Returns the list of resolved IP addresses so the caller can pin the
connection to the validated IP (preventing DNS rebinding / TOCTOU).
Raises ValueError or OSError if the host is blocked.
Extracted as a method so it can be mocked during block tests.
"""
return await resolve_and_check_blocked(host)
@staticmethod
def execute_query(
connection_url: URL | str,
query: str,
timeout: int,
max_rows: int,
read_only: bool = True,
database_type: DatabaseType = DatabaseType.POSTGRES,
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a SQL query and return (rows, columns, affected_rows).
Delegates to ``_execute_query`` in ``sql_query_helpers``.
Extracted as a method so it can be mocked during block tests.
"""
return _execute_query(
connection_url=connection_url,
query=query,
timeout=timeout,
max_rows=max_rows,
read_only=read_only,
database_type=database_type,
)
async def run(
self,
input_data: Input,
*,
credentials: DatabaseCredentials,
**_kwargs: Any,
) -> BlockOutput:
# Validate query structure and read-only constraints.
error = self._validate_query(input_data)
if error:
yield "error", error
return
# Validate host and resolve for SSRF protection.
host, pinned_host, error = await self._resolve_host(input_data)
if error:
yield "error", error
return
# Build connection URL and execute.
port = input_data.port or _DATABASE_TYPE_DEFAULT_PORT[input_data.database_type]
username = credentials.username.get_secret_value()
connection_url = URL.create(
drivername=_DATABASE_TYPE_TO_DRIVER[input_data.database_type],
username=username,
password=credentials.password.get_secret_value(),
host=pinned_host,
port=port,
database=input_data.database,
)
conn_str = connection_url.render_as_string(hide_password=True)
db_name = input_data.database
def _sanitize(err: Exception) -> str:
return _sanitize_error(
str(err).strip(),
conn_str,
host=pinned_host,
original_host=host,
username=username,
port=port,
database=db_name,
)
try:
results, columns, affected = await asyncio.to_thread(
self.execute_query,
connection_url=connection_url,
query=input_data.query,
timeout=input_data.timeout,
max_rows=input_data.max_rows,
read_only=input_data.read_only,
database_type=input_data.database_type,
)
yield "results", results
yield "columns", columns
yield "row_count", len(results)
if affected >= 0:
yield "affected_rows", affected
except OperationalError as e:
yield "error", self._classify_operational_error(
_sanitize(e),
input_data.timeout,
)
except ProgrammingError as e:
yield "error", f"SQL error: {_sanitize(e)}"
except DBAPIError as e:
yield "error", f"Database error: {_sanitize(e)}"
except ModuleNotFoundError:
yield "error", (
f"Database driver not available for "
f"{input_data.database_type.value}. "
f"Please contact the platform administrator."
)
@staticmethod
def _validate_query(input_data: "SQLQueryBlock.Input") -> str | None:
"""Validate query structure and read-only constraints."""
stmt_error, parsed_stmt = _validate_single_statement(input_data.query)
if stmt_error:
return stmt_error
assert parsed_stmt is not None
if input_data.read_only:
return _validate_query_is_read_only(parsed_stmt)
return None
async def _resolve_host(
self, input_data: "SQLQueryBlock.Input"
) -> tuple[str, str, str | None]:
"""Validate and resolve the database host. Returns (host, pinned_ip, error)."""
host = input_data.host.get_secret_value().strip()
if not host:
return "", "", "Database host is required."
if host.startswith("/"):
return host, "", "Unix socket connections are not allowed."
try:
resolved_ips = await self.check_host_allowed(host)
except (ValueError, OSError) as e:
return host, "", f"Blocked host: {str(e).strip()}"
return host, resolved_ips[0], None
@staticmethod
def _classify_operational_error(sanitized_msg: str, timeout: int) -> str:
"""Classify an already-sanitized OperationalError for user display."""
lower = sanitized_msg.lower()
if "timeout" in lower or "cancel" in lower:
return f"Query timed out after {timeout}s."
if "connect" in lower:
return f"Failed to connect to database: {sanitized_msg}"
return f"Database error: {sanitized_msg}"

File diff suppressed because it is too large.

View File

@@ -0,0 +1,376 @@
import re
from datetime import date, datetime, time
from decimal import Decimal
from enum import Enum
from typing import Any
import sqlparse
from sqlalchemy import create_engine, text
from sqlalchemy.engine.url import URL
class DatabaseType(str, Enum):
POSTGRES = "postgres"
MYSQL = "mysql"
MSSQL = "mssql"
# Defense-in-depth: reject queries containing data-modifying keywords.
# These are checked against parsed SQL tokens (not raw text) so column names
# and string literals do not cause false positives.
_DISALLOWED_KEYWORDS = {
"INSERT",
"UPDATE",
"DELETE",
"DROP",
"ALTER",
"CREATE",
"TRUNCATE",
"GRANT",
"REVOKE",
"COPY",
"EXECUTE",
"CALL",
"SET",
"RESET",
"DISCARD",
"NOTIFY",
"DO",
}
# Map DatabaseType enum values to the expected SQLAlchemy driver prefix.
_DATABASE_TYPE_TO_DRIVER = {
DatabaseType.POSTGRES: "postgresql",
DatabaseType.MYSQL: "mysql+pymysql",
DatabaseType.MSSQL: "mssql+pymssql",
}
# Default ports for each database type.
_DATABASE_TYPE_DEFAULT_PORT = {
DatabaseType.POSTGRES: 5432,
DatabaseType.MYSQL: 3306,
DatabaseType.MSSQL: 1433,
}
def _sanitize_error(
error_msg: str,
connection_string: str,
*,
host: str = "",
original_host: str = "",
username: str = "",
port: int = 0,
database: str = "",
) -> str:
"""Remove connection string, credentials, and infrastructure details
from error messages so they are safe to expose to the LLM.
Scrubs:
- The full connection string
- URL-embedded credentials (``://user:pass@``)
- ``password=<value>`` key-value pairs
- The database hostname / IP used for the connection
- The original (pre-resolution) hostname provided by the user
- Any IPv4 addresses that appear in the message
- Any bracketed IPv6 addresses (e.g. ``[::1]``, ``[fe80::1%eth0]``)
- The database username
- The database port number
- The database name
"""
sanitized = error_msg.replace(connection_string, "<connection_string>")
sanitized = re.sub(r"password=[^\s&]+", "password=***", sanitized)
sanitized = re.sub(r"://[^@]+@", "://***:***@", sanitized)
# Replace the known host (may be an IP already) before the generic IP pass.
# Also replace the original (pre-DNS-resolution) hostname if it differs.
if original_host and original_host != host:
sanitized = sanitized.replace(original_host, "<host>")
if host:
sanitized = sanitized.replace(host, "<host>")
# Replace any remaining IPv4 addresses (e.g. resolved IPs the driver logs)
sanitized = re.sub(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "<ip>", sanitized)
# Replace bracketed IPv6 addresses (e.g. "[::1]", "[fe80::1%eth0]")
sanitized = re.sub(r"\[[0-9a-fA-F:]+(?:%[^\]]+)?\]", "<ip>", sanitized)
# Replace the database username (handles double-quoted, single-quoted,
# and unquoted formats across PostgreSQL, MySQL, and MSSQL error messages).
if username:
sanitized = re.sub(
r"""for user ["']?""" + re.escape(username) + r"""["']?""",
"for user <user>",
sanitized,
)
# Catch remaining bare occurrences in various quote styles:
# - PostgreSQL: "FATAL: role "myuser" does not exist"
# - MySQL: "Access denied for user 'myuser'@'host'"
# - MSSQL: "Login failed for user 'myuser'"
sanitized = sanitized.replace(f'"{username}"', "<user>")
sanitized = sanitized.replace(f"'{username}'", "<user>")
# Replace the port number (handles "port 5432" and ":5432" formats)
if port:
port_str = re.escape(str(port))
sanitized = re.sub(
r"(?:port |:)" + port_str + r"(?![0-9])",
lambda m: ("port " if m.group().startswith("p") else ":") + "<port>",
sanitized,
)
# Replace the database name to avoid leaking internal infrastructure names.
# Use word-boundary regex to prevent mangling when the database name is a
# common substring (e.g. "test", "data", "on").
if database:
sanitized = re.sub(r"\b" + re.escape(database) + r"\b", "<database>", sanitized)
return sanitized
def _extract_keyword_tokens(parsed: sqlparse.sql.Statement) -> list[str]:
"""Extract keyword tokens from a parsed SQL statement.
Uses sqlparse token type classification to collect Keyword/DML/DDL/DCL
tokens. String literals and identifiers have different token types, so
they are naturally excluded from the result.
"""
return [
token.normalized.upper()
for token in parsed.flatten()
if token.ttype
in (
sqlparse.tokens.Keyword,
sqlparse.tokens.Keyword.DML,
sqlparse.tokens.Keyword.DDL,
sqlparse.tokens.Keyword.DCL,
)
]
def _has_disallowed_into(stmt: sqlparse.sql.Statement) -> bool:
"""Check if a statement contains a disallowed ``INTO`` clause.
``SELECT ... INTO @variable`` is a valid read-only MySQL syntax that stores
a query result into a session-scoped user variable. All other forms of
``INTO`` are data-modifying or file-writing and must be blocked:
* ``SELECT ... INTO new_table`` (PostgreSQL / MSSQL creates a table)
* ``SELECT ... INTO OUTFILE`` (MySQL writes to the filesystem)
* ``SELECT ... INTO DUMPFILE`` (MySQL writes to the filesystem)
* ``INSERT INTO ...`` (already blocked by INSERT being in the
disallowed set, but we reject INTO as well for defense-in-depth)
Returns ``True`` if the statement contains a disallowed ``INTO``.
"""
flat = list(stmt.flatten())
for i, token in enumerate(flat):
if not (
token.ttype in (sqlparse.tokens.Keyword,)
and token.normalized.upper() == "INTO"
):
continue
# Look at the first non-whitespace token after INTO.
j = i + 1
while j < len(flat) and flat[j].ttype is sqlparse.tokens.Text.Whitespace:
j += 1
if j >= len(flat):
# INTO at the very end is malformed; block it.
return True
next_token = flat[j]
# MySQL user variable: either a single Name starting with "@"
# (e.g. ``@total``) or a bare ``@`` Operator token followed by a Name.
if next_token.ttype is sqlparse.tokens.Name and next_token.value.startswith(
"@"
):
continue
if next_token.ttype is sqlparse.tokens.Operator and next_token.value == "@":
continue
# Everything else (table name, OUTFILE, DUMPFILE, etc.) is disallowed.
return True
return False
def _validate_query_is_read_only(stmt: sqlparse.sql.Statement) -> str | None:
"""Validate that a parsed SQL statement is read-only (SELECT/WITH only).
Accepts an already-parsed statement from ``_validate_single_statement``
to avoid re-parsing. Checks:
1. Statement type must be SELECT (sqlparse classifies WITH...SELECT as SELECT)
2. No disallowed keywords (INSERT, UPDATE, DELETE, DROP, etc.)
3. No disallowed INTO clauses (allows MySQL ``SELECT ... INTO @variable``)
Returns an error message if the query is not read-only, None otherwise.
"""
# sqlparse returns 'SELECT' for SELECT and WITH...SELECT queries
if stmt.get_type() != "SELECT":
return "Only SELECT queries are allowed."
# Defense-in-depth: check parsed keyword tokens for disallowed keywords
for kw in _extract_keyword_tokens(stmt):
# Normalize multi-word tokens (e.g. "SET LOCAL" -> "SET")
base_kw = kw.split()[0] if " " in kw else kw
if base_kw in _DISALLOWED_KEYWORDS:
return f"Disallowed SQL keyword: {kw}"
# Contextual check for INTO: allow MySQL @variable syntax, block everything else
if _has_disallowed_into(stmt):
return "Disallowed SQL keyword: INTO"
return None
def _validate_single_statement(
query: str,
) -> tuple[str | None, sqlparse.sql.Statement | None]:
"""Validate that the query contains exactly one non-empty SQL statement.
Returns (error_message, parsed_statement). If error_message is not None,
the query is invalid and parsed_statement will be None.
"""
stripped = query.strip().rstrip(";").strip()
if not stripped:
return "Query is empty.", None
# Parse the SQL using sqlparse for proper tokenization
statements = sqlparse.parse(stripped)
# Filter out empty statements and comment-only statements
statements = [
s
for s in statements
if s.tokens
and str(s).strip()
and not all(
t.is_whitespace or t.ttype in sqlparse.tokens.Comment for t in s.flatten()
)
]
if not statements:
return "Query is empty.", None
# Reject multiple statements -- prevents injection via semicolons
if len(statements) > 1:
return "Only single statements are allowed.", None
return None, statements[0]
def _serialize_value(value: Any) -> Any:
"""Convert database-specific types to JSON-serializable Python types."""
if isinstance(value, Decimal):
# Use int for whole numbers; use str for fractional to preserve exact
# precision (float would silently round high-precision analytics values).
if value == value.to_integral_value():
return int(value)
return str(value)
if isinstance(value, (datetime, date, time)):
return value.isoformat()
if isinstance(value, memoryview):
return bytes(value).hex()
if isinstance(value, bytes):
return value.hex()
return value
def _configure_session(
conn: Any,
dialect_name: str,
timeout_ms: str,
read_only: bool,
) -> None:
"""Set session-level timeout and read-only mode for the given dialect."""
if dialect_name == "postgresql":
conn.execute(text("SET statement_timeout = " + timeout_ms))
if read_only:
conn.execute(text("SET default_transaction_read_only = ON"))
elif dialect_name == "mysql":
# NOTE: MAX_EXECUTION_TIME only applies to SELECT statements.
# Write queries (INSERT/UPDATE/DELETE) are not bounded by this
# setting; they rely on the database's wait_timeout instead.
conn.execute(text("SET SESSION MAX_EXECUTION_TIME = " + timeout_ms))
if read_only:
conn.execute(text("SET SESSION TRANSACTION READ ONLY"))
elif dialect_name == "mssql":
# MSSQL: SET LOCK_TIMEOUT limits lock-wait time (ms).
# pymssql's connect_args "login_timeout" handles the connection
# timeout, but LOCK_TIMEOUT covers in-query lock waits.
conn.execute(text("SET LOCK_TIMEOUT " + timeout_ms))
# MSSQL lacks a session-level read-only mode like
# PostgreSQL/MySQL. Read-only enforcement is handled by
# the SQL validation layer (_validate_query_is_read_only)
# and the ROLLBACK in the finally block.
def _run_in_transaction(
conn: Any,
dialect_name: str,
query: str,
max_rows: int,
read_only: bool,
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a query inside an explicit transaction, returning results."""
# MSSQL uses T-SQL "BEGIN TRANSACTION"; others use "BEGIN".
begin_stmt = "BEGIN TRANSACTION" if dialect_name == "mssql" else "BEGIN"
conn.execute(text(begin_stmt))
try:
result = conn.execute(text(query))
affected = result.rowcount if not result.returns_rows else -1
columns = list(result.keys()) if result.returns_rows else []
rows = result.fetchmany(max_rows) if result.returns_rows else []
results = [
{col: _serialize_value(val) for col, val in zip(columns, row)}
for row in rows
]
except Exception:
conn.execute(text("ROLLBACK"))
raise
else:
conn.execute(text("ROLLBACK" if read_only else "COMMIT"))
return results, columns, affected
def _execute_query(
connection_url: URL | str,
query: str,
timeout: int,
max_rows: int,
read_only: bool = True,
database_type: DatabaseType = DatabaseType.POSTGRES,
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a SQL query and return (rows, columns, affected_rows).
Uses SQLAlchemy to connect to any supported database.
For SELECT queries, rows are limited to ``max_rows`` via DBAPI fetchmany.
For write queries, affected_rows contains the rowcount from the driver.
When ``read_only`` is True, the database session is set to read-only
mode and the transaction is always rolled back.
"""
# Determine driver-specific connection timeout argument.
# pymssql uses "login_timeout", while PostgreSQL/MySQL use "connect_timeout".
timeout_key = (
"login_timeout" if database_type == DatabaseType.MSSQL else "connect_timeout"
)
engine = create_engine(connection_url, connect_args={timeout_key: 10})
try:
with engine.connect() as conn:
# Use AUTOCOMMIT so SET commands take effect immediately.
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
# Compute timeout in milliseconds. The value is Pydantic-validated
# (ge=1, le=120), but we use int() as defense-in-depth.
# NOTE: SET commands do not support bind parameters in most
# databases, so we use str(int(...)) for safe interpolation.
timeout_ms = str(int(timeout * 1000))
_configure_session(conn, engine.dialect.name, timeout_ms, read_only)
return _run_in_transaction(
conn, engine.dialect.name, query, max_rows, read_only
)
finally:
engine.dispose()
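A short exercise of the validators above (a sketch, assuming this module is importable as backend.blocks.sql_query_helpers, the path the block imports from):

from backend.blocks.sql_query_helpers import (
    _validate_query_is_read_only,
    _validate_single_statement,
)

# Semicolon injection: two statements are rejected outright.
err, _ = _validate_single_statement("SELECT id FROM users; DROP TABLE users;")
assert err == "Only single statements are allowed."

# SELECT ... INTO a table is a write in PostgreSQL/MSSQL: blocked.
err, stmt = _validate_single_statement("SELECT * INTO backup FROM users")
assert err is None and stmt is not None
assert _validate_query_is_read_only(stmt) == "Disallowed SQL keyword: INTO"

# MySQL's read-only user-variable form is still allowed.
err, stmt = _validate_single_statement("SELECT COUNT(*) INTO @total FROM users")
assert _validate_query_is_read_only(stmt) is None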

View File

@@ -1,13 +1,8 @@
import logging
import signal
import threading
import warnings
from contextlib import contextmanager
from enum import Enum
# Monkey patch Stagehand to prevent signal handling in worker threads
import stagehand.main
from stagehand import Stagehand
from stagehand import AsyncStagehand
from stagehand.types.session_act_params import Options as ActOptions
from backend.blocks.llm import (
MODEL_METADATA,
@@ -28,46 +23,6 @@ from backend.sdk import (
SchemaField,
)
# Suppress false positive cleanup warning of litellm (a dependency of stagehand)
warnings.filterwarnings("ignore", module="litellm.llms.custom_httpx")
# Store the original method
original_register_signal_handlers = stagehand.main.Stagehand._register_signal_handlers
def safe_register_signal_handlers(self):
"""Only register signal handlers in the main thread"""
if threading.current_thread() is threading.main_thread():
original_register_signal_handlers(self)
else:
# Skip signal handling in worker threads
pass
# Replace the method
stagehand.main.Stagehand._register_signal_handlers = safe_register_signal_handlers
@contextmanager
def disable_signal_handling():
"""Context manager to temporarily disable signal handling"""
if threading.current_thread() is not threading.main_thread():
# In worker threads, temporarily replace signal.signal with a no-op
original_signal = signal.signal
def noop_signal(*args, **kwargs):
pass
signal.signal = noop_signal
try:
yield
finally:
signal.signal = original_signal
else:
# In main thread, don't modify anything
yield
logger = logging.getLogger(__name__)
@@ -148,13 +103,10 @@ class StagehandObserveBlock(Block):
instruction: str = SchemaField(
description="Natural language description of elements or actions to discover.",
)
iframes: bool = SchemaField(
description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
default=True,
)
domSettleTimeoutMs: int = SchemaField(
description="Timeout in milliseconds for DOM settlement.Wait longer for dynamic content",
default=45000,
dom_settle_timeout_ms: int = SchemaField(
description="Timeout in ms to wait for the DOM to settle after navigation.",
default=30000,
advanced=True,
)
class Output(BlockSchemaOutput):
@@ -185,32 +137,28 @@ class StagehandObserveBlock(Block):
logger.debug(f"OBSERVE: Using model provider {model_credentials.provider}")
with disable_signal_handling():
stagehand = Stagehand(
api_key=stagehand_credentials.api_key.get_secret_value(),
project_id=input_data.browserbase_project_id,
async with AsyncStagehand(
browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
browserbase_project_id=input_data.browserbase_project_id,
model_api_key=model_credentials.api_key.get_secret_value(),
) as client:
session = await client.sessions.start(
model_name=input_data.model.provider_name,
model_api_key=model_credentials.api_key.get_secret_value(),
dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
)
try:
await session.navigate(url=input_data.url)
await stagehand.init()
page = stagehand.page
assert page is not None, "Stagehand page is not initialized"
await page.goto(input_data.url)
observe_results = await page.observe(
input_data.instruction,
iframes=input_data.iframes,
domSettleTimeoutMs=input_data.domSettleTimeoutMs,
)
for result in observe_results:
yield "selector", result.selector
yield "description", result.description
yield "method", result.method
yield "arguments", result.arguments
observe_response = await session.observe(
instruction=input_data.instruction,
)
for result in observe_response.data.result:
yield "selector", result.selector
yield "description", result.description
yield "method", result.method
yield "arguments", result.arguments
finally:
await session.end()
class StagehandActBlock(Block):
@@ -242,24 +190,22 @@ class StagehandActBlock(Block):
description="Variables to use in the action. Variables contains data you want the action to use.",
default_factory=dict,
)
iframes: bool = SchemaField(
description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
default=True,
dom_settle_timeout_ms: int = SchemaField(
description="Timeout in ms to wait for the DOM to settle after navigation.",
default=30000,
advanced=True,
)
domSettleTimeoutMs: int = SchemaField(
description="Timeout in milliseconds for DOM settlement.Wait longer for dynamic content",
default=45000,
)
timeoutMs: int = SchemaField(
description="Timeout in milliseconds for DOM ready. Extended timeout for slow-loading forms",
default=60000,
timeout_ms: int = SchemaField(
description="Timeout in ms for each action.",
default=30000,
advanced=True,
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="Whether the action was completed successfully"
)
message: str = SchemaField(description="Details about the actions execution.")
message: str = SchemaField(description="Details about the action's execution.")
action: str = SchemaField(description="Action performed")
def __init__(self):
@@ -282,32 +228,33 @@ class StagehandActBlock(Block):
logger.debug(f"ACT: Using model provider {model_credentials.provider}")
with disable_signal_handling():
stagehand = Stagehand(
api_key=stagehand_credentials.api_key.get_secret_value(),
project_id=input_data.browserbase_project_id,
async with AsyncStagehand(
browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
browserbase_project_id=input_data.browserbase_project_id,
model_api_key=model_credentials.api_key.get_secret_value(),
) as client:
session = await client.sessions.start(
model_name=input_data.model.provider_name,
model_api_key=model_credentials.api_key.get_secret_value(),
dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
)
try:
await session.navigate(url=input_data.url)
await stagehand.init()
page = stagehand.page
assert page is not None, "Stagehand page is not initialized"
await page.goto(input_data.url)
for action in input_data.action:
action_results = await page.act(
action,
variables=input_data.variables,
iframes=input_data.iframes,
domSettleTimeoutMs=input_data.domSettleTimeoutMs,
timeoutMs=input_data.timeoutMs,
)
yield "success", action_results.success
yield "message", action_results.message
yield "action", action_results.action
for action in input_data.action:
act_options = ActOptions(
variables={k: v for k, v in input_data.variables.items()},
timeout=input_data.timeout_ms,
)
act_response = await session.act(
input=action,
options=act_options,
)
result = act_response.data.result
yield "success", result.success
yield "message", result.message
yield "action", result.action_description
finally:
await session.end()
class StagehandExtractBlock(Block):
@@ -335,13 +282,10 @@ class StagehandExtractBlock(Block):
instruction: str = SchemaField(
description="Natural language description of elements or actions to discover.",
)
iframes: bool = SchemaField(
description="Whether to search within iframes. If True, Stagehand will search for actions within iframes.",
default=True,
)
domSettleTimeoutMs: int = SchemaField(
description="Timeout in milliseconds for DOM settlement.Wait longer for dynamic content",
default=45000,
dom_settle_timeout_ms: int = SchemaField(
description="Timeout in ms to wait for the DOM to settle after navigation.",
default=30000,
advanced=True,
)
class Output(BlockSchemaOutput):
@@ -367,24 +311,21 @@ class StagehandExtractBlock(Block):
logger.debug(f"EXTRACT: Using model provider {model_credentials.provider}")
with disable_signal_handling():
stagehand = Stagehand(
api_key=stagehand_credentials.api_key.get_secret_value(),
project_id=input_data.browserbase_project_id,
async with AsyncStagehand(
browserbase_api_key=stagehand_credentials.api_key.get_secret_value(),
browserbase_project_id=input_data.browserbase_project_id,
model_api_key=model_credentials.api_key.get_secret_value(),
) as client:
session = await client.sessions.start(
model_name=input_data.model.provider_name,
model_api_key=model_credentials.api_key.get_secret_value(),
dom_settle_timeout_ms=input_data.dom_settle_timeout_ms,
)
try:
await session.navigate(url=input_data.url)
await stagehand.init()
page = stagehand.page
assert page is not None, "Stagehand page is not initialized"
await page.goto(input_data.url)
extraction = await page.extract(
input_data.instruction,
iframes=input_data.iframes,
domSettleTimeoutMs=input_data.domSettleTimeoutMs,
)
yield "extraction", str(extraction.model_dump()["extraction"])
extract_response = await session.extract(
instruction=input_data.instruction,
)
yield "extraction", str(extract_response.data.result)
finally:
await session.end()
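The new session lifecycle in one place, condensed from the observe diff above (a sketch: the credential strings are placeholders and the model name is illustrative; only calls shown in the diff are used):

import asyncio

from stagehand import AsyncStagehand

async def observe(url: str, instruction: str):
    async with AsyncStagehand(
        browserbase_api_key="bb-key",         # placeholder
        browserbase_project_id="bb-project",  # placeholder
        model_api_key="model-key",            # placeholder
    ) as client:
        session = await client.sessions.start(
            model_name="openai/gpt-4.1",      # illustrative model name
            model_api_key="model-key",
            dom_settle_timeout_ms=30000,
        )
        try:
            await session.navigate(url=url)
            response = await session.observe(instruction=instruction)
            return response.data.result
        finally:
            # Sessions are explicitly ended; no signal-handler monkey patching
            # is needed with the async client.
            await session.end()

# asyncio.run(observe("https://example.com", "find the login button"))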

View File

@@ -15,6 +15,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -181,6 +182,7 @@ class CreateTalkingAvatarVideoBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
return
elif status_response["status"] == "error":


@@ -4,6 +4,8 @@ import pytest
from backend.blocks import get_blocks
from backend.blocks._base import Block, BlockSchemaInput
from backend.blocks.io import AgentDropdownInputBlock, AgentInputBlock
from backend.data.graph import BaseGraph
from backend.data.model import SchemaField
from backend.util.test import execute_block_test
@@ -279,3 +281,113 @@ class TestAutoCredentialsFieldsValidation:
assert "Duplicate auto_credentials kwarg_name 'credentials'" in str(
exc_info.value
)
def test_agent_input_block_ignores_legacy_placeholder_values():
"""Verify AgentInputBlock.Input.model_construct tolerates extra placeholder_values
for backward compatibility with existing agent JSON."""
legacy_data = {
"name": "url",
"value": "",
"description": "Enter a URL",
"placeholder_values": ["https://example.com"],
}
instance = AgentInputBlock.Input.model_construct(**legacy_data)
schema = instance.generate_schema()
assert (
"enum" not in schema
), "AgentInputBlock should not produce enum from legacy placeholder_values"
def test_dropdown_input_block_produces_enum():
"""Verify AgentDropdownInputBlock.Input.generate_schema() produces enum
using the canonical 'options' field name."""
opts = ["Option A", "Option B"]
instance = AgentDropdownInputBlock.Input.model_construct(
name="choice", value=None, options=opts
)
schema = instance.generate_schema()
assert schema.get("enum") == opts
def test_dropdown_input_block_legacy_placeholder_values_produces_enum():
"""Verify backward compat: passing legacy 'placeholder_values' to
AgentDropdownInputBlock still produces enum via model_construct remap."""
opts = ["Option A", "Option B"]
instance = AgentDropdownInputBlock.Input.model_construct(
name="choice", value=None, placeholder_values=opts
)
schema = instance.generate_schema()
assert (
schema.get("enum") == opts
), "Legacy placeholder_values should be remapped to options"
def test_generate_schema_integration_legacy_placeholder_values():
"""Test the full Graph._generate_schema path with legacy placeholder_values
on AgentInputBlock — verifies no enum leaks through the graph loading path."""
legacy_input_default = {
"name": "url",
"value": "",
"description": "Enter a URL",
"placeholder_values": ["https://example.com"],
}
result = BaseGraph._generate_schema(
(AgentInputBlock.Input, legacy_input_default),
)
url_props = result["properties"]["url"]
assert (
"enum" not in url_props
), "Graph schema should not contain enum from AgentInputBlock placeholder_values"
def test_generate_schema_integration_dropdown_produces_enum():
"""Test the full Graph._generate_schema path with AgentDropdownInputBlock
— verifies enum IS produced for dropdown blocks using canonical field name."""
dropdown_input_default = {
"name": "color",
"value": None,
"options": ["Red", "Green", "Blue"],
}
result = BaseGraph._generate_schema(
(AgentDropdownInputBlock.Input, dropdown_input_default),
)
color_props = result["properties"]["color"]
assert color_props.get("enum") == [
"Red",
"Green",
"Blue",
], "Graph schema should contain enum from AgentDropdownInputBlock"
def test_generate_schema_integration_dropdown_legacy_placeholder_values():
"""Test the full Graph._generate_schema path with AgentDropdownInputBlock
using legacy 'placeholder_values' — verifies backward compat produces enum."""
legacy_dropdown_input_default = {
"name": "color",
"value": None,
"placeholder_values": ["Red", "Green", "Blue"],
}
result = BaseGraph._generate_schema(
(AgentDropdownInputBlock.Input, legacy_dropdown_input_default),
)
color_props = result["properties"]["color"]
assert color_props.get("enum") == [
"Red",
"Green",
"Blue",
], "Legacy placeholder_values should still produce enum via model_construct remap"
def test_dropdown_input_block_init_legacy_placeholder_values():
"""Verify backward compat: constructing AgentDropdownInputBlock.Input via
model_validate with legacy 'placeholder_values' correctly maps to 'options'."""
opts = ["Option A", "Option B"]
instance = AgentDropdownInputBlock.Input.model_validate(
{"name": "choice", "value": None, "placeholder_values": opts}
)
assert (
instance.options == opts
), "Legacy placeholder_values should be remapped to options via model_validate"
schema = instance.generate_schema()
assert schema.get("enum") == opts


@@ -207,6 +207,51 @@ class TestXMLParserBlockSecurity:
pass
class TestXMLParserBlockSyntaxErrors:
"""XML syntax errors should raise ValueError (not SyntaxError).
This ensures the base Block.execute() wraps them as BlockExecutionError
(expected / user-caused) instead of BlockUnknownError (unexpected / alerts
Sentry).
"""
async def test_unclosed_tag_raises_value_error(self):
"""Unclosed tags should raise ValueError, not SyntaxError."""
block = XMLParserBlock()
bad_xml = "<root><unclosed>"
with pytest.raises(ValueError, match="Unclosed tag"):
async for _ in block.run(XMLParserBlock.Input(input_xml=bad_xml)):
pass
async def test_unexpected_closing_tag_raises_value_error(self):
"""Extra closing tags should raise ValueError, not SyntaxError."""
block = XMLParserBlock()
bad_xml = "</unexpected>"
with pytest.raises(ValueError):
async for _ in block.run(XMLParserBlock.Input(input_xml=bad_xml)):
pass
async def test_empty_xml_raises_value_error(self):
"""Empty XML input should raise ValueError."""
block = XMLParserBlock()
with pytest.raises(ValueError, match="XML input is empty"):
async for _ in block.run(XMLParserBlock.Input(input_xml="")):
pass
async def test_syntax_error_from_parser_becomes_value_error(self):
"""SyntaxErrors from gravitasml library become ValueError (BlockExecutionError)."""
block = XMLParserBlock()
# Malformed XML that might trigger a SyntaxError from the parser
bad_xml = "<root><child>no closing"
with pytest.raises(ValueError):
async for _ in block.run(XMLParserBlock.Input(input_xml=bad_xml)):
pass
class TestStoreMediaFileSecurity:
"""Test file storage security limits."""


@@ -1,9 +1,18 @@
from typing import cast
from unittest.mock import AsyncMock, MagicMock, patch
import anthropic
import httpx
import openai
import pytest
import backend.blocks.llm as llm
from backend.data.model import NodeExecutionStats
# TEST_CREDENTIALS_INPUT is a plain dict that satisfies AICredentials at runtime
# but not at the type level. Cast once here to avoid per-test type-ignore suppressions.
_TEST_AI_CREDENTIALS = cast(llm.AICredentials, llm.TEST_CREDENTIALS_INPUT)
class TestLLMStatsTracking:
"""Test that LLM blocks correctly track token usage statistics."""
@@ -479,6 +488,154 @@ class TestLLMStatsTracking:
assert outputs["response"] == {"result": "test"}
class TestAIConversationBlockValidation:
"""Test that AIConversationBlock validates inputs before calling the LLM."""
@pytest.mark.asyncio
async def test_empty_messages_and_empty_prompt_raises_error(self):
"""Empty messages with no prompt should raise ValueError, not a cryptic API error."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_empty_messages_with_prompt_succeeds(self):
"""Empty messages but a non-empty prompt should proceed without error."""
block = llm.AIConversationBlock()
async def mock_llm_call(input_data, credentials):
return {"response": "OK"}
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIConversationBlock.Input(
messages=[],
prompt="Hello, how are you?",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
outputs = {}
async for name, data in block.run(
input_data, credentials=llm.TEST_CREDENTIALS
):
outputs[name] = data
assert outputs["response"] == "OK"
@pytest.mark.asyncio
async def test_nonempty_messages_with_empty_prompt_succeeds(self):
"""Non-empty messages with no prompt should proceed without error."""
block = llm.AIConversationBlock()
async def mock_llm_call(input_data, credentials):
return {"response": "response from conversation"}
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": "Hello"}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
outputs = {}
async for name, data in block.run(
input_data, credentials=llm.TEST_CREDENTIALS
):
outputs[name] = data
assert outputs["response"] == "response from conversation"
@pytest.mark.asyncio
async def test_messages_with_empty_content_raises_error(self):
"""Messages with empty content strings should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": ""}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_whitespace_content_raises_error(self):
"""Messages with whitespace-only content should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": " "}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_none_entry_raises_error(self):
"""Messages list containing None should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[None],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_empty_dict_raises_error(self):
"""Messages list containing empty dict should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_none_content_raises_error(self):
"""Messages with content=None should not crash with AttributeError."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": None}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
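All of the failure cases above reduce to one normalization rule: a message only counts if it is a dict with non-blank string content. A sketch of that check, with a hypothetical helper name:

def _has_usable_messages(messages) -> bool:
    # None entries, {}, and whitespace-only content all fail this test,
    # matching the cases exercised above.
    for msg in messages or []:
        if not isinstance(msg, dict):
            continue
        content = msg.get("content")
        if isinstance(content, str) and content.strip():
            return True
    return False

# Inside run(): if neither source yields usable input, fail fast with the
# message the tests match on.
# if not _has_usable_messages(input_data.messages) and not input_data.prompt.strip():
#     raise ValueError("AIConversationBlock received no messages and no prompt")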
class TestAITextSummarizerValidation:
"""Test that AITextSummarizerBlock validates LLM responses are strings."""
@@ -655,3 +812,178 @@ class TestAITextSummarizerValidation:
error_message = str(exc_info.value)
assert "Expected a string summary" in error_message
assert "received dict" in error_message
def _make_anthropic_status_error(status_code: int) -> anthropic.APIStatusError:
"""Create an anthropic.APIStatusError with the given status code."""
request = httpx.Request("POST", "https://api.anthropic.com/v1/messages")
response = httpx.Response(status_code, request=request)
return anthropic.APIStatusError(
f"Error code: {status_code}", response=response, body=None
)
def _make_openai_status_error(status_code: int) -> openai.APIStatusError:
"""Create an openai.APIStatusError with the given status code."""
response = httpx.Response(
status_code, request=httpx.Request("POST", "https://api.openai.com/v1/chat")
)
return openai.APIStatusError(
f"Error code: {status_code}", response=response, body=None
)
class TestUserErrorStatusCodeHandling:
"""Test that user-caused LLM API errors (401/403/429) break the retry loop
and are logged as warnings, while server errors (500) trigger retries."""
@pytest.mark.asyncio
@pytest.mark.parametrize("status_code", [401, 403, 429])
async def test_anthropic_user_error_breaks_retry_loop(self, status_code: int):
"""401/403/429 Anthropic errors should break immediately, not retry."""
import backend.blocks.llm as llm
block = llm.AIStructuredResponseGeneratorBlock()
call_count = 0
async def mock_llm_call(*args, **kwargs):
nonlocal call_count
call_count += 1
raise _make_anthropic_status_error(status_code)
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIStructuredResponseGeneratorBlock.Input(
prompt="Test",
expected_format={"key": "desc"},
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
retry=3,
)
with pytest.raises(RuntimeError):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
assert (
call_count == 1
), f"Expected exactly 1 call for status {status_code}, got {call_count}"
@pytest.mark.asyncio
@pytest.mark.parametrize("status_code", [401, 403, 429])
async def test_openai_user_error_breaks_retry_loop(self, status_code: int):
"""401/403/429 OpenAI errors should break immediately, not retry."""
import backend.blocks.llm as llm
block = llm.AIStructuredResponseGeneratorBlock()
call_count = 0
async def mock_llm_call(*args, **kwargs):
nonlocal call_count
call_count += 1
raise _make_openai_status_error(status_code)
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIStructuredResponseGeneratorBlock.Input(
prompt="Test",
expected_format={"key": "desc"},
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
retry=3,
)
with pytest.raises(RuntimeError):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
assert (
call_count == 1
), f"Expected exactly 1 call for status {status_code}, got {call_count}"
@pytest.mark.asyncio
async def test_server_error_retries(self):
"""500 errors should be retried (not break immediately)."""
import backend.blocks.llm as llm
block = llm.AIStructuredResponseGeneratorBlock()
call_count = 0
async def mock_llm_call(*args, **kwargs):
nonlocal call_count
call_count += 1
raise _make_anthropic_status_error(500)
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIStructuredResponseGeneratorBlock.Input(
prompt="Test",
expected_format={"key": "desc"},
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
retry=3,
)
with pytest.raises(RuntimeError):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
assert (
call_count > 1
), f"Expected multiple retry attempts for 500, got {call_count}"
@pytest.mark.asyncio
async def test_user_error_logs_warning_not_exception(self):
"""User-caused errors should log with logger.warning, not logger.exception."""
import backend.blocks.llm as llm
block = llm.AIStructuredResponseGeneratorBlock()
async def mock_llm_call(*args, **kwargs):
raise _make_anthropic_status_error(401)
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIStructuredResponseGeneratorBlock.Input(
prompt="Test",
expected_format={"key": "desc"},
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with (
patch.object(llm.logger, "warning") as mock_warning,
patch.object(llm.logger, "exception") as mock_exception,
pytest.raises(RuntimeError),
):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
mock_warning.assert_called_once()
mock_exception.assert_not_called()
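The behaviour under test is a status-code split inside the retry loop: 401/403/429 are user-caused and abort immediately with a warning, while anything else retries and eventually surfaces as a RuntimeError. A condensed sketch, assuming errors expose status_code the way anthropic/openai APIStatusError do:

import logging

logger = logging.getLogger(__name__)
USER_ERROR_CODES = {401, 403, 429}  # auth failures and rate limits

async def call_with_retry(llm_call, retries: int = 3):
    last_error = None
    for attempt in range(retries):
        try:
            return await llm_call()
        except Exception as e:
            last_error = e
            if getattr(e, "status_code", None) in USER_ERROR_CODES:
                # User-caused: retrying cannot help, and a warning (not an
                # exception log) keeps Sentry quiet.
                logger.warning("LLM call failed with user error: %s", e)
                break
            logger.exception("LLM call failed on attempt %d", attempt + 1)
    raise RuntimeError("LLM call failed") from last_error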
class TestLlmModelMissing:
"""Test that LlmModel handles provider-prefixed model names."""
def test_provider_prefixed_model_resolves(self):
"""Provider-prefixed model string should resolve to the correct enum member."""
assert (
llm.LlmModel("anthropic/claude-sonnet-4-6")
== llm.LlmModel.CLAUDE_4_6_SONNET
)
def test_bare_model_still_works(self):
"""Bare (non-prefixed) model string should still resolve correctly."""
assert llm.LlmModel("claude-sonnet-4-6") == llm.LlmModel.CLAUDE_4_6_SONNET
def test_invalid_prefixed_model_raises(self):
"""Unknown provider-prefixed model string should raise ValueError."""
with pytest.raises(ValueError):
llm.LlmModel("invalid/nonexistent-model")
def test_slash_containing_value_direct_lookup(self):
"""Enum values with '/' (e.g., OpenRouter models) should resolve via direct lookup, not _missing_."""
assert llm.LlmModel("google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
def test_double_prefixed_slash_model(self):
"""Double-prefixed value should still resolve by stripping first prefix."""
assert (
llm.LlmModel("extra/google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
)
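These tests encode the lookup order: direct enum-value match first (so OpenRouter-style values containing '/' never hit the fallback), then a _missing_ hook that strips one provider prefix and retries. A self-contained sketch with a hypothetical two-member enum:

from enum import Enum

class Model(str, Enum):
    CLAUDE = "claude-sonnet-4-6"
    GEMINI = "google/gemini-2.5-pro"  # value itself contains '/'

    @classmethod
    def _missing_(cls, value):
        # Only reached when direct value lookup failed, so slash-containing
        # enum values like GEMINI never enter this path.
        if isinstance(value, str) and "/" in value:
            _, _, rest = value.partition("/")
            try:
                return cls(rest)  # recurses, stripping one prefix per call
            except ValueError:
                return None
        return None

assert Model("anthropic/claude-sonnet-4-6") is Model.CLAUDE
assert Model("extra/google/gemini-2.5-pro") is Model.GEMINI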


@@ -0,0 +1,87 @@
"""Tests for empty-choices guard in extract_openai_tool_calls() and extract_openai_reasoning()."""
from unittest.mock import MagicMock
from backend.blocks.llm import extract_openai_reasoning, extract_openai_tool_calls
class TestExtractOpenaiToolCallsEmptyChoices:
"""extract_openai_tool_calls() must return None when choices is empty."""
def test_returns_none_for_empty_choices(self):
response = MagicMock()
response.choices = []
assert extract_openai_tool_calls(response) is None
def test_returns_none_for_none_choices(self):
response = MagicMock()
response.choices = None
assert extract_openai_tool_calls(response) is None
def test_returns_tool_calls_when_choices_present(self):
tool = MagicMock()
tool.id = "call_1"
tool.type = "function"
tool.function.name = "my_func"
tool.function.arguments = '{"a": 1}'
message = MagicMock()
message.tool_calls = [tool]
choice = MagicMock()
choice.message = message
response = MagicMock()
response.choices = [choice]
result = extract_openai_tool_calls(response)
assert result is not None
assert len(result) == 1
assert result[0].function.name == "my_func"
def test_returns_none_when_no_tool_calls(self):
message = MagicMock()
message.tool_calls = None
choice = MagicMock()
choice.message = message
response = MagicMock()
response.choices = [choice]
assert extract_openai_tool_calls(response) is None
class TestExtractOpenaiReasoningEmptyChoices:
"""extract_openai_reasoning() must return None when choices is empty."""
def test_returns_none_for_empty_choices(self):
response = MagicMock()
response.choices = []
assert extract_openai_reasoning(response) is None
def test_returns_none_for_none_choices(self):
response = MagicMock()
response.choices = None
assert extract_openai_reasoning(response) is None
def test_returns_reasoning_from_choice(self):
choice = MagicMock()
choice.reasoning = "Step-by-step reasoning"
choice.message = MagicMock(spec=[]) # no 'reasoning' attr on message
response = MagicMock(spec=[]) # no 'reasoning' attr on response
response.choices = [choice]
result = extract_openai_reasoning(response)
assert result == "Step-by-step reasoning"
def test_returns_none_when_no_reasoning(self):
choice = MagicMock(spec=[]) # no 'reasoning' attr
choice.message = MagicMock(spec=[]) # no 'reasoning' attr
response = MagicMock(spec=[]) # no 'reasoning' attr
response.choices = [choice]
result = extract_openai_reasoning(response)
assert result is None
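Both guards amount to the same two-line check before indexing choices[0]. A sketch for the tool-call case (helper name hypothetical):

def extract_tool_calls(response):
    # Providers can return choices=None or [] (e.g. filtered generations);
    # treat both as "no tool calls" instead of raising IndexError.
    choices = getattr(response, "choices", None)
    if not choices:
        return None
    return choices[0].message.tool_calls or None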


@@ -57,7 +57,7 @@ async def execute_graph(
@pytest.mark.asyncio(loop_scope="session")
async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):
from backend.blocks.agent import AgentExecutorBlock
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.data import graph
test_user = await create_test_user()
@@ -66,7 +66,7 @@ async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):
nodes = [
graph.Node(
block_id=SmartDecisionMakerBlock().id,
block_id=OrchestratorBlock().id,
input_default={
"prompt": "Hello, World!",
"credentials": creds,
@@ -108,10 +108,10 @@ async def test_graph_validation_with_tool_nodes_correct(server: SpinTestServer):
@pytest.mark.asyncio(loop_scope="session")
async def test_smart_decision_maker_function_signature(server: SpinTestServer):
async def test_orchestrator_function_signature(server: SpinTestServer):
from backend.blocks.agent import AgentExecutorBlock
from backend.blocks.basic import StoreValueBlock
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.data import graph
test_user = await create_test_user()
@@ -120,7 +120,7 @@ async def test_smart_decision_maker_function_signature(server: SpinTestServer):
nodes = [
graph.Node(
block_id=SmartDecisionMakerBlock().id,
block_id=OrchestratorBlock().id,
input_default={
"prompt": "Hello, World!",
"credentials": creds,
@@ -169,7 +169,7 @@ async def test_smart_decision_maker_function_signature(server: SpinTestServer):
)
test_graph = await create_graph(server, test_graph, test_user)
tool_functions = await SmartDecisionMakerBlock._create_tool_node_signatures(
tool_functions = await OrchestratorBlock._create_tool_node_signatures(
test_graph.nodes[0].id
)
assert tool_functions is not None, "Tool functions should not be None"
@@ -198,12 +198,12 @@ async def test_smart_decision_maker_function_signature(server: SpinTestServer):
@pytest.mark.asyncio
async def test_smart_decision_maker_tracks_llm_stats():
"""Test that SmartDecisionMakerBlock correctly tracks LLM usage stats."""
async def test_orchestrator_tracks_llm_stats():
"""Test that OrchestratorBlock correctly tracks LLM usage stats."""
import backend.blocks.llm as llm_module
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Mock the llm.llm_call function to return controlled data
mock_response = MagicMock()
@@ -224,14 +224,14 @@ async def test_smart_decision_maker_tracks_llm_stats():
new_callable=AsyncMock,
return_value=mock_response,
), patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[],
):
# Create test input
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Should I continue with this task?",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -274,12 +274,12 @@ async def test_smart_decision_maker_tracks_llm_stats():
@pytest.mark.asyncio
async def test_smart_decision_maker_parameter_validation():
"""Test that SmartDecisionMakerBlock correctly validates tool call parameters."""
async def test_orchestrator_parameter_validation():
"""Test that OrchestratorBlock correctly validates tool call parameters."""
import backend.blocks.llm as llm_module
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Mock tool functions with specific parameter schema
mock_tool_functions = [
@@ -327,13 +327,13 @@ async def test_smart_decision_maker_parameter_validation():
new_callable=AsyncMock,
return_value=mock_response_with_typo,
) as mock_llm_call, patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=mock_tool_functions,
):
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Search for keywords",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -394,13 +394,13 @@ async def test_smart_decision_maker_parameter_validation():
new_callable=AsyncMock,
return_value=mock_response_missing_required,
), patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=mock_tool_functions,
):
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Search for keywords",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -454,13 +454,13 @@ async def test_smart_decision_maker_parameter_validation():
new_callable=AsyncMock,
return_value=mock_response_valid,
), patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=mock_tool_functions,
):
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Search for keywords",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -518,13 +518,13 @@ async def test_smart_decision_maker_parameter_validation():
new_callable=AsyncMock,
return_value=mock_response_all_params,
), patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=mock_tool_functions,
):
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Search for keywords",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -562,12 +562,12 @@ async def test_smart_decision_maker_parameter_validation():
@pytest.mark.asyncio
async def test_smart_decision_maker_raw_response_conversion():
"""Test that SmartDecisionMaker correctly handles different raw_response types with retry mechanism."""
async def test_orchestrator_raw_response_conversion():
"""Test that Orchestrator correctly handles different raw_response types with retry mechanism."""
import backend.blocks.llm as llm_module
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Mock tool functions
mock_tool_functions = [
@@ -637,7 +637,7 @@ async def test_smart_decision_maker_raw_response_conversion():
with patch(
"backend.blocks.llm.llm_call", new_callable=AsyncMock
) as mock_llm_call, patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=mock_tool_functions,
@@ -646,7 +646,7 @@ async def test_smart_decision_maker_raw_response_conversion():
# Second call returns successful response
mock_llm_call.side_effect = [mock_response_retry, mock_response_success]
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Test prompt",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -715,12 +715,12 @@ async def test_smart_decision_maker_raw_response_conversion():
new_callable=AsyncMock,
return_value=mock_response_ollama,
), patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[], # No tools for this test
):
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Simple prompt",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -771,12 +771,12 @@ async def test_smart_decision_maker_raw_response_conversion():
new_callable=AsyncMock,
return_value=mock_response_dict,
), patch.object(
SmartDecisionMakerBlock,
OrchestratorBlock,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[],
):
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Another test",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -811,12 +811,12 @@ async def test_smart_decision_maker_raw_response_conversion():
@pytest.mark.asyncio
async def test_smart_decision_maker_agent_mode():
async def test_orchestrator_agent_mode():
"""Test that agent mode executes tools directly and loops until finished."""
import backend.blocks.llm as llm_module
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Mock tool call that requires multiple iterations
mock_tool_call_1 = MagicMock()
@@ -893,7 +893,7 @@ async def test_smart_decision_maker_agent_mode():
with patch("backend.blocks.llm.llm_call", llm_call_mock), patch.object(
block, "_create_tool_node_signatures", return_value=mock_tool_signatures
), patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client",
"backend.blocks.orchestrator.get_database_manager_async_client",
return_value=mock_db_client,
), patch(
"backend.executor.manager.async_update_node_execution_status",
@@ -929,7 +929,7 @@ async def test_smart_decision_maker_agent_mode():
}
# Test agent mode with max_iterations = 3
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Complete this task using tools",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -969,12 +969,12 @@ async def test_smart_decision_maker_agent_mode():
@pytest.mark.asyncio
async def test_smart_decision_maker_traditional_mode_default():
async def test_orchestrator_traditional_mode_default():
"""Test that default behavior (agent_mode_max_iterations=0) works as traditional mode."""
import backend.blocks.llm as llm_module
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Mock tool call
mock_tool_call = MagicMock()
@@ -1018,7 +1018,7 @@ async def test_smart_decision_maker_traditional_mode_default():
):
# Test default behavior (traditional mode)
input_data = SmartDecisionMakerBlock.Input(
input_data = OrchestratorBlock.Input(
prompt="Test prompt",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -1060,12 +1060,12 @@ async def test_smart_decision_maker_traditional_mode_default():
@pytest.mark.asyncio
async def test_smart_decision_maker_uses_customized_name_for_blocks():
"""Test that SmartDecisionMakerBlock uses customized_name from node metadata for tool names."""
async def test_orchestrator_uses_customized_name_for_blocks():
"""Test that OrchestratorBlock uses customized_name from node metadata for tool names."""
from unittest.mock import MagicMock
from backend.blocks.basic import StoreValueBlock
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.data.graph import Link, Node
# Create a mock node with customized_name in metadata
@@ -1074,13 +1074,14 @@ async def test_smart_decision_maker_uses_customized_name_for_blocks():
mock_node.block_id = StoreValueBlock().id
mock_node.metadata = {"customized_name": "My Custom Tool Name"}
mock_node.block = StoreValueBlock()
mock_node.input_default = {}
# Create a mock link
mock_link = MagicMock(spec=Link)
mock_link.sink_name = "input"
# Call the function directly
result = await SmartDecisionMakerBlock._create_block_function_signature(
result = await OrchestratorBlock._create_block_function_signature(
mock_node, [mock_link]
)
@@ -1091,12 +1092,12 @@ async def test_smart_decision_maker_uses_customized_name_for_blocks():
@pytest.mark.asyncio
async def test_smart_decision_maker_falls_back_to_block_name():
"""Test that SmartDecisionMakerBlock falls back to block.name when no customized_name."""
async def test_orchestrator_falls_back_to_block_name():
"""Test that OrchestratorBlock falls back to block.name when no customized_name."""
from unittest.mock import MagicMock
from backend.blocks.basic import StoreValueBlock
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.data.graph import Link, Node
# Create a mock node without customized_name
@@ -1105,13 +1106,14 @@ async def test_smart_decision_maker_falls_back_to_block_name():
mock_node.block_id = StoreValueBlock().id
mock_node.metadata = {} # No customized_name
mock_node.block = StoreValueBlock()
mock_node.input_default = {}
# Create a mock link
mock_link = MagicMock(spec=Link)
mock_link.sink_name = "input"
# Call the function directly
result = await SmartDecisionMakerBlock._create_block_function_signature(
result = await OrchestratorBlock._create_block_function_signature(
mock_node, [mock_link]
)
@@ -1122,11 +1124,11 @@ async def test_smart_decision_maker_falls_back_to_block_name():
@pytest.mark.asyncio
async def test_smart_decision_maker_uses_customized_name_for_agents():
"""Test that SmartDecisionMakerBlock uses customized_name from metadata for agent nodes."""
async def test_orchestrator_uses_customized_name_for_agents():
"""Test that OrchestratorBlock uses customized_name from metadata for agent nodes."""
from unittest.mock import AsyncMock, MagicMock, patch
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.data.graph import Link, Node
# Create a mock node with customized_name in metadata
@@ -1152,10 +1154,10 @@ async def test_smart_decision_maker_uses_customized_name_for_agents():
mock_db_client.get_graph_metadata.return_value = mock_graph_meta
with patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client",
"backend.blocks.orchestrator.get_database_manager_async_client",
return_value=mock_db_client,
):
result = await SmartDecisionMakerBlock._create_agent_function_signature(
result = await OrchestratorBlock._create_agent_function_signature(
mock_node, [mock_link]
)
@@ -1166,11 +1168,11 @@ async def test_smart_decision_maker_uses_customized_name_for_agents():
@pytest.mark.asyncio
async def test_smart_decision_maker_agent_falls_back_to_graph_name():
async def test_orchestrator_agent_falls_back_to_graph_name():
"""Test that agent node falls back to graph name when no customized_name."""
from unittest.mock import AsyncMock, MagicMock, patch
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.data.graph import Link, Node
# Create a mock node without customized_name
@@ -1196,10 +1198,10 @@ async def test_smart_decision_maker_agent_falls_back_to_graph_name():
mock_db_client.get_graph_metadata.return_value = mock_graph_meta
with patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client",
"backend.blocks.orchestrator.get_database_manager_async_client",
return_value=mock_db_client,
):
result = await SmartDecisionMakerBlock._create_agent_function_signature(
result = await OrchestratorBlock._create_agent_function_signature(
mock_node, [mock_link]
)


@@ -3,12 +3,12 @@ from unittest.mock import Mock
import pytest
from backend.blocks.data_manipulation import AddToListBlock, CreateDictionaryBlock
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
@pytest.mark.asyncio
async def test_smart_decision_maker_handles_dynamic_dict_fields():
"""Test Smart Decision Maker can handle dynamic dictionary fields (_#_) for any block"""
async def test_orchestrator_handles_dynamic_dict_fields():
"""Test Orchestrator can handle dynamic dictionary fields (_#_) for any block"""
# Create a mock node for CreateDictionaryBlock
mock_node = Mock()
@@ -23,24 +23,24 @@ async def test_smart_decision_maker_handles_dynamic_dict_fields():
source_name="tools_^_create_dict_~_name",
sink_name="values_#_name", # Dynamic dict field
sink_id="dict_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_create_dict_~_age",
sink_name="values_#_age", # Dynamic dict field
sink_id="dict_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_create_dict_~_city",
sink_name="values_#_city", # Dynamic dict field
sink_id="dict_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
]
# Generate function signature
signature = await SmartDecisionMakerBlock._create_block_function_signature(
signature = await OrchestratorBlock._create_block_function_signature(
mock_node, mock_links # type: ignore
)
@@ -70,8 +70,8 @@ async def test_smart_decision_maker_handles_dynamic_dict_fields():
@pytest.mark.asyncio
async def test_smart_decision_maker_handles_dynamic_list_fields():
"""Test Smart Decision Maker can handle dynamic list fields (_$_) for any block"""
async def test_orchestrator_handles_dynamic_list_fields():
"""Test Orchestrator can handle dynamic list fields (_$_) for any block"""
# Create a mock node for AddToListBlock
mock_node = Mock()
@@ -86,18 +86,18 @@ async def test_smart_decision_maker_handles_dynamic_list_fields():
source_name="tools_^_add_to_list_~_0",
sink_name="entries_$_0", # Dynamic list field
sink_id="list_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_add_to_list_~_1",
sink_name="entries_$_1", # Dynamic list field
sink_id="list_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
]
# Generate function signature
signature = await SmartDecisionMakerBlock._create_block_function_signature(
signature = await OrchestratorBlock._create_block_function_signature(
mock_node, mock_links # type: ignore
)
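
The mock link names follow the platform's dynamic-field convention: values_#_name targets dict key 'name', entries_$_0 targets list index 0, data_@_field targets an object attribute, and tools_^_<tool>_~_<arg> is the orchestrator-side source name. A tiny illustrative parser for the sink side, assuming exactly these separators:

# Hypothetical helper: split "values_#_name" -> ("values", "#", "name").
DYNAMIC_SEPARATORS = ("_#_", "_$_", "_@_")  # dict key, list index, object attr

def split_dynamic_sink(sink_name: str):
    for sep in DYNAMIC_SEPARATORS:
        if sep in sink_name:
            base, _, key = sink_name.partition(sep)
            return base, sep.strip("_"), key
    return sink_name, None, None

assert split_dynamic_sink("values_#_name") == ("values", "#", "name")
assert split_dynamic_sink("entries_$_0") == ("entries", "$", "0")
assert split_dynamic_sink("regular_field") == ("regular_field", None, None)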


@@ -1,4 +1,4 @@
"""Comprehensive tests for SmartDecisionMakerBlock dynamic field handling."""
"""Comprehensive tests for OrchestratorBlock dynamic field handling."""
import json
from unittest.mock import AsyncMock, MagicMock, Mock, patch
@@ -6,7 +6,7 @@ from unittest.mock import AsyncMock, MagicMock, Mock, patch
import pytest
from backend.blocks.data_manipulation import AddToListBlock, CreateDictionaryBlock
from backend.blocks.smart_decision_maker import SmartDecisionMakerBlock
from backend.blocks.orchestrator import OrchestratorBlock
from backend.blocks.text import MatchTextPatternBlock
from backend.data.dynamic_fields import get_dynamic_field_description
@@ -37,7 +37,7 @@ async def test_dynamic_field_description_generation():
@pytest.mark.asyncio
async def test_create_block_function_signature_with_dict_fields():
"""Test that function signatures are created correctly for dictionary dynamic fields."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Create a mock node for CreateDictionaryBlock
mock_node = Mock()
@@ -52,19 +52,19 @@ async def test_create_block_function_signature_with_dict_fields():
source_name="tools_^_create_dict_~_values___name", # Sanitized source
sink_name="values_#_name", # Original sink
sink_id="dict_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_create_dict_~_values___age", # Sanitized source
sink_name="values_#_age", # Original sink
sink_id="dict_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_create_dict_~_values___email", # Sanitized source
sink_name="values_#_email", # Original sink
sink_id="dict_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
]
@@ -100,7 +100,7 @@ async def test_create_block_function_signature_with_dict_fields():
@pytest.mark.asyncio
async def test_create_block_function_signature_with_list_fields():
"""Test that function signatures are created correctly for list dynamic fields."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Create a mock node for AddToListBlock
mock_node = Mock()
@@ -115,19 +115,19 @@ async def test_create_block_function_signature_with_list_fields():
source_name="tools_^_add_list_~_0",
sink_name="entries_$_0", # Dynamic list field
sink_id="list_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_add_list_~_1",
sink_name="entries_$_1", # Dynamic list field
sink_id="list_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_add_list_~_2",
sink_name="entries_$_2", # Dynamic list field
sink_id="list_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
]
@@ -154,7 +154,7 @@ async def test_create_block_function_signature_with_list_fields():
@pytest.mark.asyncio
async def test_create_block_function_signature_with_object_fields():
"""Test that function signatures are created correctly for object dynamic fields."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Create a mock node for MatchTextPatternBlock (simulating object fields)
mock_node = Mock()
@@ -169,13 +169,13 @@ async def test_create_block_function_signature_with_object_fields():
source_name="tools_^_extract_~_user_name",
sink_name="data_@_user_name", # Dynamic object field
sink_id="extract_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_extract_~_user_email",
sink_name="data_@_user_email", # Dynamic object field
sink_id="extract_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
]
@@ -197,11 +197,11 @@ async def test_create_block_function_signature_with_object_fields():
@pytest.mark.asyncio
async def test_create_tool_node_signatures():
"""Test that the mapping between sanitized and original field names is built correctly."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Mock the database client and connected nodes
with patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client"
"backend.blocks.orchestrator.get_database_manager_async_client"
) as mock_db:
mock_client = AsyncMock()
mock_db.return_value = mock_client
@@ -281,7 +281,7 @@ async def test_create_tool_node_signatures():
@pytest.mark.asyncio
async def test_output_yielding_with_dynamic_fields():
"""Test that outputs are yielded correctly with dynamic field names mapped back."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# No more sanitized mapping needed since we removed sanitization
@@ -309,13 +309,13 @@ async def test_output_yielding_with_dynamic_fields():
# Mock the LLM call
with patch(
"backend.blocks.smart_decision_maker.llm.llm_call", new_callable=AsyncMock
"backend.blocks.orchestrator.llm.llm_call", new_callable=AsyncMock
) as mock_llm:
mock_llm.return_value = mock_response
# Mock the database manager to avoid HTTP calls during tool execution
with patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client"
"backend.blocks.orchestrator.get_database_manager_async_client"
) as mock_db_manager, patch.object(
block, "_create_tool_node_signatures", new_callable=AsyncMock
) as mock_sig:
@@ -420,7 +420,7 @@ async def test_output_yielding_with_dynamic_fields():
@pytest.mark.asyncio
async def test_mixed_regular_and_dynamic_fields():
"""Test handling of blocks with both regular and dynamic fields."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Create a mock node
mock_node = Mock()
@@ -450,19 +450,19 @@ async def test_mixed_regular_and_dynamic_fields():
source_name="tools_^_test_~_regular",
sink_name="regular_field", # Regular field
sink_id="test_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_test_~_dict_key",
sink_name="values_#_key1", # Dynamic dict field
sink_id="test_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
Mock(
source_name="tools_^_test_~_dict_key2",
sink_name="values_#_key2", # Dynamic dict field
sink_id="test_node_id",
source_id="smart_decision_node_id",
source_id="orchestrator_node_id",
),
]
@@ -488,7 +488,7 @@ async def test_mixed_regular_and_dynamic_fields():
@pytest.mark.asyncio
async def test_validation_errors_dont_pollute_conversation():
"""Test that validation errors are only used during retries and don't pollute the conversation."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# Track conversation history changes
conversation_snapshots = []
@@ -535,7 +535,7 @@ async def test_validation_errors_dont_pollute_conversation():
# Mock the LLM call
with patch(
"backend.blocks.smart_decision_maker.llm.llm_call", new_callable=AsyncMock
"backend.blocks.orchestrator.llm.llm_call", new_callable=AsyncMock
) as mock_llm:
mock_llm.side_effect = mock_llm_call
@@ -565,7 +565,7 @@ async def test_validation_errors_dont_pollute_conversation():
# Mock the database manager to avoid HTTP calls during tool execution
with patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client"
"backend.blocks.orchestrator.get_database_manager_async_client"
) as mock_db_manager:
# Set up the mock database manager for agent mode
mock_db_client = AsyncMock()


@@ -0,0 +1,202 @@
"""Tests for ExecutionMode enum and provider validation in the orchestrator.
Covers:
- ExecutionMode enum members exist and have stable values
- EXTENDED_THINKING provider validation (anthropic/open_router allowed, others rejected)
- EXTENDED_THINKING model-name validation (must start with "claude")
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.blocks.llm import LlmModel
from backend.blocks.orchestrator import ExecutionMode, OrchestratorBlock
# ---------------------------------------------------------------------------
# ExecutionMode enum integrity
# ---------------------------------------------------------------------------
class TestExecutionModeEnum:
"""Guard against accidental renames or removals of enum members."""
def test_built_in_exists(self):
assert hasattr(ExecutionMode, "BUILT_IN")
assert ExecutionMode.BUILT_IN.value == "built_in"
def test_extended_thinking_exists(self):
assert hasattr(ExecutionMode, "EXTENDED_THINKING")
assert ExecutionMode.EXTENDED_THINKING.value == "extended_thinking"
def test_exactly_two_members(self):
"""If a new mode is added, this test should be updated intentionally."""
assert set(ExecutionMode.__members__.keys()) == {
"BUILT_IN",
"EXTENDED_THINKING",
}
def test_string_enum(self):
"""ExecutionMode is a str enum so it serialises cleanly to JSON."""
assert isinstance(ExecutionMode.BUILT_IN, str)
assert isinstance(ExecutionMode.EXTENDED_THINKING, str)
def test_round_trip_from_value(self):
"""Constructing from the string value should return the same member."""
assert ExecutionMode("built_in") is ExecutionMode.BUILT_IN
assert ExecutionMode("extended_thinking") is ExecutionMode.EXTENDED_THINKING
# ---------------------------------------------------------------------------
# Provider validation (inline in OrchestratorBlock.run)
# ---------------------------------------------------------------------------
def _make_model_stub(provider: str, value: str):
"""Create a lightweight stub that behaves like LlmModel for validation."""
metadata = MagicMock()
metadata.provider = provider
stub = MagicMock()
stub.metadata = metadata
stub.value = value
return stub
class TestExtendedThinkingProviderValidation:
"""The orchestrator rejects EXTENDED_THINKING for non-Anthropic providers."""
def test_anthropic_provider_accepted(self):
"""provider='anthropic' + claude model should not raise."""
model = _make_model_stub("anthropic", "claude-opus-4-6")
provider = model.metadata.provider
model_name = model.value
assert provider in ("anthropic", "open_router")
assert model_name.startswith("claude")
def test_open_router_provider_accepted(self):
"""provider='open_router' + claude model should not raise."""
model = _make_model_stub("open_router", "claude-sonnet-4-6")
provider = model.metadata.provider
model_name = model.value
assert provider in ("anthropic", "open_router")
assert model_name.startswith("claude")
def test_openai_provider_rejected(self):
"""provider='openai' should be rejected for EXTENDED_THINKING."""
model = _make_model_stub("openai", "gpt-4o")
provider = model.metadata.provider
assert provider not in ("anthropic", "open_router")
def test_groq_provider_rejected(self):
model = _make_model_stub("groq", "llama-3.3-70b-versatile")
provider = model.metadata.provider
assert provider not in ("anthropic", "open_router")
def test_non_claude_model_rejected_even_if_anthropic_provider(self):
"""A hypothetical non-Claude model with provider='anthropic' is rejected."""
model = _make_model_stub("anthropic", "not-a-claude-model")
model_name = model.value
assert not model_name.startswith("claude")
def test_real_gpt4o_model_rejected(self):
"""Verify a real LlmModel enum member (GPT4O) fails the provider check."""
model = LlmModel.GPT4O
provider = model.metadata.provider
assert provider not in ("anthropic", "open_router")
def test_real_claude_model_passes(self):
"""Verify a real LlmModel enum member (CLAUDE_4_6_SONNET) passes."""
model = LlmModel.CLAUDE_4_6_SONNET
provider = model.metadata.provider
model_name = model.value
assert provider in ("anthropic", "open_router")
assert model_name.startswith("claude")
# ---------------------------------------------------------------------------
# Integration-style: exercise the validation branch via OrchestratorBlock.run
# ---------------------------------------------------------------------------
def _make_input_data(model, execution_mode=ExecutionMode.EXTENDED_THINKING):
"""Build a minimal MagicMock that satisfies OrchestratorBlock.run's early path."""
inp = MagicMock()
inp.execution_mode = execution_mode
inp.model = model
inp.prompt = "test"
inp.sys_prompt = ""
inp.conversation_history = []
inp.last_tool_output = None
inp.prompt_values = {}
return inp
async def _collect_run_outputs(block, input_data, **kwargs):
"""Exhaust the OrchestratorBlock.run async generator, collecting outputs."""
outputs = []
async for item in block.run(input_data, **kwargs):
outputs.append(item)
return outputs
class TestExtendedThinkingValidationRaisesInBlock:
"""Call OrchestratorBlock.run far enough to trigger the ValueError."""
@pytest.mark.asyncio
async def test_non_anthropic_provider_raises_valueerror(self):
"""EXTENDED_THINKING + openai provider raises ValueError."""
block = OrchestratorBlock()
input_data = _make_input_data(model=LlmModel.GPT4O)
with (
patch.object(
block,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[],
),
pytest.raises(ValueError, match="Anthropic-compatible"),
):
await _collect_run_outputs(
block,
input_data,
credentials=MagicMock(),
graph_id="g",
node_id="n",
graph_exec_id="ge",
node_exec_id="ne",
user_id="u",
graph_version=1,
execution_context=MagicMock(),
execution_processor=MagicMock(),
)
@pytest.mark.asyncio
async def test_non_claude_model_with_anthropic_provider_raises(self):
"""A model with anthropic provider but non-claude name raises ValueError."""
block = OrchestratorBlock()
fake_model = _make_model_stub("anthropic", "not-a-claude-model")
input_data = _make_input_data(model=fake_model)
with (
patch.object(
block,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[],
),
pytest.raises(ValueError, match="only supports Claude models"),
):
await _collect_run_outputs(
block,
input_data,
credentials=MagicMock(),
graph_id="g",
node_id="n",
graph_exec_id="ge",
node_exec_id="ne",
user_id="u",
graph_version=1,
execution_context=MagicMock(),
execution_processor=MagicMock(),
)
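
Condensed, the validation exercised here is two guards evaluated before extended-thinking execution. The sketch below matches the error substrings asserted above, though the block's real wording and placement inside run() may differ:

def validate_extended_thinking(model) -> None:
    # Hypothetical standalone version of the inline checks in run().
    provider = model.metadata.provider
    if provider not in ("anthropic", "open_router"):
        raise ValueError(
            "Extended thinking requires an Anthropic-compatible provider"
        )
    if not model.value.startswith("claude"):
        raise ValueError("Extended thinking only supports Claude models")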


@@ -1,6 +1,6 @@
"""Tests for SmartDecisionMakerBlock compatibility with the OpenAI Responses API.
"""Tests for OrchestratorBlock compatibility with the OpenAI Responses API.
The SmartDecisionMakerBlock manages conversation history in the Chat Completions
The OrchestratorBlock manages conversation history in the Chat Completions
format, but OpenAI models now use the Responses API which has a fundamentally
different conversation structure. These tests document:
@@ -27,8 +27,8 @@ from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.blocks.smart_decision_maker import (
SmartDecisionMakerBlock,
from backend.blocks.orchestrator import (
OrchestratorBlock,
_combine_tool_responses,
_convert_raw_response_to_dict,
_create_tool_response,
@@ -733,7 +733,7 @@ class TestUpdateConversation:
def test_dict_raw_response_no_reasoning_no_tools(self):
"""Dict raw_response, no reasoning → appends assistant dict."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
resp = self._make_response({"role": "assistant", "content": "hi"})
block._update_conversation(prompt, resp)
@@ -741,7 +741,7 @@ class TestUpdateConversation:
def test_dict_raw_response_with_reasoning_no_tool_calls(self):
"""Reasoning present, no tool calls → reasoning prepended."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
resp = self._make_response(
{"role": "assistant", "content": "answer"},
@@ -757,7 +757,7 @@ class TestUpdateConversation:
def test_dict_raw_response_with_reasoning_and_anthropic_tool_calls(self):
"""Reasoning + Anthropic tool_use in content → reasoning skipped."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
raw = {
"role": "assistant",
@@ -772,7 +772,7 @@ class TestUpdateConversation:
def test_with_tool_outputs(self):
"""Tool outputs → extended onto prompt."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
resp = self._make_response({"role": "assistant", "content": None})
outputs = [{"role": "tool", "tool_call_id": "call_1", "content": "r"}]
@@ -782,7 +782,7 @@ class TestUpdateConversation:
def test_without_tool_outputs(self):
"""No tool outputs → only assistant message appended."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
resp = self._make_response({"role": "assistant", "content": "done"})
block._update_conversation(prompt, resp, None)
@@ -790,7 +790,7 @@ class TestUpdateConversation:
def test_string_raw_response(self):
"""Ollama string → wrapped as assistant dict."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
resp = self._make_response("hello from ollama")
block._update_conversation(prompt, resp)
@@ -800,7 +800,7 @@ class TestUpdateConversation:
def test_responses_api_text_response_produces_valid_items(self):
"""Responses API text response → conversation items must have valid role."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = [
{"role": "system", "content": "sys"},
{"role": "user", "content": "user"},
@@ -820,7 +820,7 @@ class TestUpdateConversation:
def test_responses_api_function_call_produces_valid_items(self):
"""Responses API function_call → conversation items must have valid type."""
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
prompt: list[dict] = []
resp = self._make_response(
_MockResponse(output=[_MockFunctionCall("tool", "{}", call_id="call_1")])
@@ -856,7 +856,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
"""
import backend.blocks.llm as llm_module
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
# First response: tool call
mock_tc = MagicMock()
@@ -936,7 +936,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
with patch("backend.blocks.llm.llm_call", llm_mock), patch.object(
block, "_create_tool_node_signatures", return_value=tool_sigs
), patch(
"backend.blocks.smart_decision_maker.get_database_manager_async_client",
"backend.blocks.orchestrator.get_database_manager_async_client",
return_value=mock_db,
), patch(
"backend.executor.manager.async_update_node_execution_status",
@@ -945,7 +945,7 @@ async def test_agent_mode_conversation_valid_for_responses_api():
"backend.integrations.creds_manager.IntegrationCredentialsManager"
):
inp = SmartDecisionMakerBlock.Input(
inp = OrchestratorBlock.Input(
prompt="Improve this",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
@@ -992,7 +992,7 @@ async def test_traditional_mode_conversation_valid_for_responses_api():
"""Traditional mode: the yielded conversation must contain only valid items."""
import backend.blocks.llm as llm_module
block = SmartDecisionMakerBlock()
block = OrchestratorBlock()
mock_tc = MagicMock()
mock_tc.function.name = "my_tool"
@@ -1028,7 +1028,7 @@ async def test_traditional_mode_conversation_valid_for_responses_api():
"backend.blocks.llm.llm_call", new_callable=AsyncMock, return_value=resp
), patch.object(block, "_create_tool_node_signatures", return_value=tool_sigs):
inp = SmartDecisionMakerBlock.Input(
inp = OrchestratorBlock.Input(
prompt="Do it",
model=llm_module.DEFAULT_LLM_MODEL,
credentials=llm_module.TEST_CREDENTIALS_INPUT, # type: ignore
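
For context on the docstring above: the two OpenAI conversation formats differ structurally, roughly as follows (shapes abbreviated and based on the public API docs, not this codebase):

# Chat Completions style: role-based messages, tool calls nested
# inside an assistant message.
chat_completions_item = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "my_tool", "arguments": "{}"}},
    ],
}

# Responses API style: a flat list of typed items; a function call is
# its own top-level item rather than a field on a message.
responses_api_items = [
    {"type": "function_call", "call_id": "call_1",
     "name": "my_tool", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_1", "output": "r"},
]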

File diff suppressed because it is too large


@@ -13,6 +13,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -104,4 +105,5 @@ class UnrealTextToSpeechBlock(Block):
input_data.text,
input_data.voice_id,
)
self.merge_stats(NodeExecutionStats(output_size=len(input_data.text)))
yield "mp3_url", api_response["OutputUri"]

View File

@@ -44,7 +44,7 @@ class XMLParserBlock(Block):
elif token.type == "TAG_CLOSE":
depth -= 1
if depth < 0:
raise SyntaxError("Unexpected closing tag in XML input.")
raise ValueError("Unexpected closing tag in XML input.")
elif token.type in {"TEXT", "ESCAPE"}:
if depth == 0 and token.value:
raise ValueError(
@@ -53,7 +53,7 @@ class XMLParserBlock(Block):
)
if depth != 0:
raise SyntaxError("Unclosed tag detected in XML input.")
raise ValueError("Unclosed tag detected in XML input.")
if not root_seen:
raise ValueError("XML must include a root element.")
@@ -76,4 +76,7 @@ class XMLParserBlock(Block):
except ValueError as val_e:
raise ValueError(f"Validation error for dict:{val_e}") from val_e
except SyntaxError as syn_e:
raise SyntaxError(f"Error in input xml syntax: {syn_e}") from syn_e
# Raise as ValueError so the base Block.execute() wraps it as
# BlockExecutionError (expected user-caused failure) instead of
# BlockUnknownError (unexpected platform error that alerts Sentry).
raise ValueError(f"Error in input xml syntax: {syn_e}") from syn_e

View File

@@ -19,6 +19,7 @@ from backend.blocks._base import (
from backend.data.model import (
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
UserPasswordCredentials,
)
@@ -170,6 +171,7 @@ class TranscribeYoutubeVideoBlock(Block):
transcript = self.get_transcript(video_id, credentials)
transcript_text = self.format_transcript(transcript=transcript)
self.merge_stats(NodeExecutionStats(output_size=1))
# Only yield after all operations succeed
yield "video_id", video_id
yield "transcript", transcript_text

View File

@@ -21,7 +21,7 @@ from backend.blocks.zerobounce._auth import (
ZeroBounceCredentials,
ZeroBounceCredentialsInput,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class Response(BaseModel):
@@ -177,5 +177,6 @@ class ValidateEmailsBlock(Block):
)
response_model = Response(**response.__dict__)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "response", response_model

View File

@@ -9,12 +9,16 @@ shared tool registry as the SDK path.
import asyncio
import logging
import uuid
from collections.abc import AsyncGenerator
from typing import Any
from collections.abc import AsyncGenerator, Sequence
from dataclasses import dataclass, field
from functools import partial
from typing import Any, cast
import orjson
from langfuse import propagate_attributes
from openai.types.chat import ChatCompletionMessageParam, ChatCompletionToolParam
from backend.copilot.context import set_execution_context
from backend.copilot.model import (
ChatMessage,
ChatSession,
@@ -47,8 +51,24 @@ from backend.copilot.service import (
from backend.copilot.token_tracking import persist_and_record_usage
from backend.copilot.tools import execute_tool, get_available_tools
from backend.copilot.tracking import track_user_message
from backend.copilot.transcript import (
download_transcript,
upload_transcript,
validate_transcript,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.exceptions import NotFoundError
from backend.util.prompt import compress_context
from backend.util.prompt import (
compress_context,
estimate_token_count,
estimate_token_count_str,
)
from backend.util.tool_call_loop import (
LLMLoopResponse,
LLMToolCall,
ToolCallResult,
tool_call_loop,
)
logger = logging.getLogger(__name__)
@@ -59,6 +79,286 @@ _background_tasks: set[asyncio.Task[Any]] = set()
_MAX_TOOL_ROUNDS = 30
@dataclass
class _BaselineStreamState:
"""Mutable state shared between the tool-call loop callbacks.
Extracted from ``stream_chat_completion_baseline`` so that the callbacks
can be module-level functions instead of deeply nested closures.
"""
pending_events: list[StreamBaseResponse] = field(default_factory=list)
assistant_text: str = ""
text_block_id: str = field(default_factory=lambda: str(uuid.uuid4()))
text_started: bool = False
turn_prompt_tokens: int = 0
turn_completion_tokens: int = 0
async def _baseline_llm_caller(
messages: list[dict[str, Any]],
tools: Sequence[Any],
*,
state: _BaselineStreamState,
) -> LLMLoopResponse:
"""Stream an OpenAI-compatible response and collect results.
Extracted from ``stream_chat_completion_baseline`` for readability.
"""
state.pending_events.append(StreamStartStep())
round_text = ""
try:
client = _get_openai_client()
typed_messages = cast(list[ChatCompletionMessageParam], messages)
if tools:
typed_tools = cast(list[ChatCompletionToolParam], tools)
response = await client.chat.completions.create(
model=config.fast_model,
messages=typed_messages,
tools=typed_tools,
stream=True,
stream_options={"include_usage": True},
)
else:
response = await client.chat.completions.create(
model=config.fast_model,
messages=typed_messages,
stream=True,
stream_options={"include_usage": True},
)
tool_calls_by_index: dict[int, dict[str, str]] = {}
async for chunk in response:
if chunk.usage:
state.turn_prompt_tokens += chunk.usage.prompt_tokens or 0
state.turn_completion_tokens += chunk.usage.completion_tokens or 0
delta = chunk.choices[0].delta if chunk.choices else None
if not delta:
continue
if delta.content:
if not state.text_started:
state.pending_events.append(StreamTextStart(id=state.text_block_id))
state.text_started = True
round_text += delta.content
state.pending_events.append(
StreamTextDelta(id=state.text_block_id, delta=delta.content)
)
if delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index
if idx not in tool_calls_by_index:
tool_calls_by_index[idx] = {
"id": "",
"name": "",
"arguments": "",
}
entry = tool_calls_by_index[idx]
if tc.id:
entry["id"] = tc.id
if tc.function and tc.function.name:
entry["name"] = tc.function.name
if tc.function and tc.function.arguments:
entry["arguments"] += tc.function.arguments
# Close text block
if state.text_started:
state.pending_events.append(StreamTextEnd(id=state.text_block_id))
state.text_started = False
state.text_block_id = str(uuid.uuid4())
finally:
# Always persist partial text so the session history stays consistent,
# even when the stream is interrupted by an exception.
state.assistant_text += round_text
# Always emit StreamFinishStep to match the StreamStartStep,
# even if an exception occurred during streaming.
state.pending_events.append(StreamFinishStep())
# Convert to shared format
llm_tool_calls = [
LLMToolCall(
id=tc["id"],
name=tc["name"],
arguments=tc["arguments"] or "{}",
)
for tc in tool_calls_by_index.values()
]
return LLMLoopResponse(
response_text=round_text or None,
tool_calls=llm_tool_calls,
raw_response=None, # Not needed for baseline conversation updater
prompt_tokens=0, # Tracked via state accumulators
completion_tokens=0,
)
async def _baseline_tool_executor(
tool_call: LLMToolCall,
tools: Sequence[Any],
*,
state: _BaselineStreamState,
user_id: str | None,
session: ChatSession,
) -> ToolCallResult:
"""Execute a tool via the copilot tool registry.
Extracted from ``stream_chat_completion_baseline`` for readability.
"""
tool_call_id = tool_call.id
tool_name = tool_call.name
raw_args = tool_call.arguments or "{}"
try:
tool_args = orjson.loads(raw_args)
except orjson.JSONDecodeError as parse_err:
parse_error = f"Invalid JSON arguments for tool '{tool_name}': {parse_err}"
logger.warning("[Baseline] %s", parse_error)
state.pending_events.append(
StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=parse_error,
success=False,
)
)
return ToolCallResult(
tool_call_id=tool_call_id,
tool_name=tool_name,
content=parse_error,
is_error=True,
)
state.pending_events.append(
StreamToolInputStart(toolCallId=tool_call_id, toolName=tool_name)
)
state.pending_events.append(
StreamToolInputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
input=tool_args,
)
)
try:
result: StreamToolOutputAvailable = await execute_tool(
tool_name=tool_name,
parameters=tool_args,
user_id=user_id,
session=session,
tool_call_id=tool_call_id,
)
state.pending_events.append(result)
tool_output = (
result.output if isinstance(result.output, str) else str(result.output)
)
return ToolCallResult(
tool_call_id=tool_call_id,
tool_name=tool_name,
content=tool_output,
)
except Exception as e:
error_output = f"Tool execution error: {e}"
logger.error(
"[Baseline] Tool %s failed: %s",
tool_name,
error_output,
exc_info=True,
)
state.pending_events.append(
StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=error_output,
success=False,
)
)
return ToolCallResult(
tool_call_id=tool_call_id,
tool_name=tool_name,
content=error_output,
is_error=True,
)
def _baseline_conversation_updater(
messages: list[dict[str, Any]],
response: LLMLoopResponse,
tool_results: list[ToolCallResult] | None = None,
*,
transcript_builder: TranscriptBuilder,
model: str = "",
) -> None:
"""Update OpenAI message list with assistant response + tool results.
Extracted from ``stream_chat_completion_baseline`` for readability.
"""
if tool_results:
# Build assistant message with tool_calls
assistant_msg: dict[str, Any] = {"role": "assistant"}
if response.response_text:
assistant_msg["content"] = response.response_text
assistant_msg["tool_calls"] = [
{
"id": tc.id,
"type": "function",
"function": {"name": tc.name, "arguments": tc.arguments},
}
for tc in response.tool_calls
]
messages.append(assistant_msg)
# Record assistant message (with tool_calls) to transcript
content_blocks: list[dict[str, Any]] = []
if response.response_text:
content_blocks.append({"type": "text", "text": response.response_text})
for tc in response.tool_calls:
try:
args = orjson.loads(tc.arguments) if tc.arguments else {}
except Exception:
args = {}
content_blocks.append(
{
"type": "tool_use",
"id": tc.id,
"name": tc.name,
"input": args,
}
)
if content_blocks:
transcript_builder.append_assistant(
content_blocks=content_blocks,
model=model,
stop_reason="tool_use",
)
for tr in tool_results:
messages.append(
{
"role": "tool",
"tool_call_id": tr.tool_call_id,
"content": tr.content,
}
)
# Record tool result to transcript AFTER the assistant tool_use
# block to maintain correct Anthropic API ordering:
# assistant(tool_use) → user(tool_result)
transcript_builder.append_tool_result(
tool_use_id=tr.tool_call_id,
content=tr.content,
)
else:
if response.response_text:
messages.append({"role": "assistant", "content": response.response_text})
# Record final text to transcript
transcript_builder.append_assistant(
content_blocks=[{"type": "text", "text": response.response_text}],
model=model,
stop_reason="end_turn",
)
async def _update_title_async(
session_id: str, message: str, user_id: str | None
) -> None:
@@ -85,19 +385,23 @@ async def _compress_session_messages(
msg_dict: dict[str, Any] = {"role": msg.role}
if msg.content:
msg_dict["content"] = msg.content
if msg.tool_calls:
msg_dict["tool_calls"] = msg.tool_calls
if msg.tool_call_id:
msg_dict["tool_call_id"] = msg.tool_call_id
messages_dict.append(msg_dict)
try:
result = await compress_context(
messages=messages_dict,
model=config.model,
model=config.fast_model,
client=_get_openai_client(),
)
except Exception as e:
logger.warning("[Baseline] Context compression with LLM failed: %s", e)
result = await compress_context(
messages=messages_dict,
model=config.model,
model=config.fast_model,
client=None,
)
@@ -111,7 +415,12 @@ async def _compress_session_messages(
result.messages_dropped,
)
return [
ChatMessage(role=m["role"], content=m.get("content"))
ChatMessage(
role=m["role"],
content=m.get("content"),
tool_calls=m.get("tool_calls"),
tool_call_id=m.get("tool_call_id"),
)
for m in result.messages
]
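# Hedged sketch of the round-trip above with a simplified stand-in for
# ChatMessage (the real model lives in backend.copilot.model): dropping
# tool_calls / tool_call_id during compression would orphan tool-role
# messages from the assistant turn that invoked them.
from dataclasses import dataclass

@dataclass
class _Msg:
    role: str
    content: str | None = None
    tool_calls: list[dict] | None = None
    tool_call_id: str | None = None

def _to_dict(m: _Msg) -> dict:
    d: dict = {"role": m.role}
    if m.content:
        d["content"] = m.content
    if m.tool_calls:
        d["tool_calls"] = m.tool_calls
    if m.tool_call_id:
        d["tool_call_id"] = m.tool_call_id
    return d

def _from_dict(d: dict) -> _Msg:
    return _Msg(
        role=d["role"],
        content=d.get("content"),
        tool_calls=d.get("tool_calls"),
        tool_call_id=d.get("tool_call_id"),
    )

_orig = _Msg(role="tool", tool_call_id="call_1", content="result")
assert _from_dict(_to_dict(_orig)) == _orig  # lossless round-trip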
@@ -142,7 +451,8 @@ async def stream_chat_completion_baseline(
f"Session {session_id} not found. Please create a new session first."
)
# Append user message
# Append user message (skip if it's an exact duplicate of the last message,
# e.g. from a network retry)
new_role = "user" if is_user_message else "assistant"
if message and (
len(session.messages) == 0
@@ -161,6 +471,54 @@ async def stream_chat_completion_baseline(
session = await upsert_chat_session(session)
# --- Transcript support (feature parity with SDK path) ---
transcript_builder = TranscriptBuilder()
transcript_covers_prefix = True
if user_id and len(session.messages) > 1:
try:
dl = await download_transcript(user_id, session_id, log_prefix="[Baseline]")
if dl and validate_transcript(dl.content):
# Reject stale transcripts: if msg_count is known and
# doesn't cover the current session, loading it would
# silently drop intermediate turns from the transcript.
session_msg_count = len(session.messages)
if dl.message_count and dl.message_count < session_msg_count - 1:
logger.warning(
"[Baseline] Transcript stale: covers %d of %d messages, skipping",
dl.message_count,
session_msg_count,
)
transcript_covers_prefix = False
else:
transcript_builder.load_previous(
dl.content, log_prefix="[Baseline]"
)
logger.info(
"[Baseline] Loaded transcript: %dB, msg_count=%d",
len(dl.content),
dl.message_count,
)
elif dl:
logger.warning("[Baseline] Downloaded transcript but invalid")
transcript_covers_prefix = False
else:
logger.debug("[Baseline] No transcript available")
transcript_covers_prefix = False
except Exception as e:
logger.warning("[Baseline] Transcript download failed: %s", e)
transcript_covers_prefix = False
# Append user message to transcript.
# Always append when the message is present and is from the user,
# even on duplicate-suppressed retries (is_new_message=False).
# The loaded transcript may be stale (uploaded before the previous
# attempt stored this message), so skipping it would leave the
# transcript without the user turn, creating a malformed
# assistant-after-assistant structure when the LLM reply is added.
if message and is_user_message:
transcript_builder.append_user(content=message)
# Generate title for new sessions
if is_user_message and not session.title:
user_messages = [m for m in session.messages if m.role == "user"]
@@ -193,16 +551,37 @@ async def stream_chat_completion_baseline(
# Compress context if approaching the model's token limit
messages_for_context = await _compress_session_messages(session.messages)
# Build OpenAI message list from session history
# Build OpenAI message list from session history.
# Include tool_calls on assistant messages and tool-role results so the
# model retains full context of what tools were invoked and their outcomes.
openai_messages: list[dict[str, Any]] = [
{"role": "system", "content": system_prompt}
]
for msg in messages_for_context:
if msg.role in ("user", "assistant") and msg.content:
if msg.role == "assistant":
entry: dict[str, Any] = {"role": "assistant"}
if msg.content:
entry["content"] = msg.content
if msg.tool_calls:
entry["tool_calls"] = msg.tool_calls
if msg.content or msg.tool_calls:
openai_messages.append(entry)
elif msg.role == "tool" and msg.tool_call_id:
openai_messages.append(
{
"role": "tool",
"tool_call_id": msg.tool_call_id,
"content": msg.content or "",
}
)
elif msg.role == "user" and msg.content:
openai_messages.append({"role": msg.role, "content": msg.content})
tools = get_available_tools()
# Propagate execution context so tool handlers can read session-level flags.
set_execution_context(user_id, session)
yield StreamStart(messageId=message_id, sessionId=session_id)
# Propagate user/session context to Langfuse so all LLM calls within
@@ -219,191 +598,38 @@ async def stream_chat_completion_baseline(
except Exception:
logger.warning("[Baseline] Langfuse trace context setup failed")
assistant_text = ""
text_block_id = str(uuid.uuid4())
text_started = False
step_open = False
# Token usage accumulators — populated from streaming chunks
turn_prompt_tokens = 0
turn_completion_tokens = 0
_stream_error = False # Track whether an error occurred during streaming
state = _BaselineStreamState()
# Bind extracted module-level callbacks to this request's state/session
# using functools.partial so they satisfy the Protocol signatures.
_bound_llm_caller = partial(_baseline_llm_caller, state=state)
_bound_tool_executor = partial(
_baseline_tool_executor, state=state, user_id=user_id, session=session
)
_bound_conversation_updater = partial(
_baseline_conversation_updater,
transcript_builder=transcript_builder,
model=config.fast_model,
)
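# Illustrative sketch of the partial-binding pattern used above: a
# module-level callback with extra keyword-only parameters is narrowed to
# the signature the loop expects. Names here are hypothetical.
import asyncio
from dataclasses import dataclass, field
from functools import partial

@dataclass
class _State:
    events: list[str] = field(default_factory=list)

async def _caller(messages: list[dict], *, state: _State) -> str:
    state.events.append(f"round with {len(messages)} messages")
    return "ok"

_s = _State()
_bound = partial(_caller, state=_s)  # loop only ever calls _bound(messages)
assert asyncio.run(_bound([{"role": "user", "content": "hi"}])) == "ok"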
try:
for _round in range(_MAX_TOOL_ROUNDS):
# Open a new step for each LLM round
yield StreamStartStep()
step_open = True
loop_result = None
async for loop_result in tool_call_loop(
messages=openai_messages,
tools=tools,
llm_call=_bound_llm_caller,
execute_tool=_bound_tool_executor,
update_conversation=_bound_conversation_updater,
max_iterations=_MAX_TOOL_ROUNDS,
):
# Drain buffered events after each iteration (real-time streaming)
for evt in state.pending_events:
yield evt
state.pending_events.clear()
# Stream a response from the model
create_kwargs: dict[str, Any] = dict(
model=config.model,
messages=openai_messages,
stream=True,
stream_options={"include_usage": True},
)
if tools:
create_kwargs["tools"] = tools
response = await _get_openai_client().chat.completions.create(**create_kwargs) # type: ignore[arg-type] # dynamic kwargs
# Accumulate streamed response (text + tool calls)
round_text = ""
tool_calls_by_index: dict[int, dict[str, str]] = {}
async for chunk in response:
# Capture token usage from the streaming chunk.
# OpenRouter normalises all providers into OpenAI format
# where prompt_tokens already includes cached tokens
# (unlike Anthropic's native API). Use += to sum all
# tool-call rounds since each API call is independent.
# NOTE: stream_options={"include_usage": True} is not
# universally supported — some providers (Mistral, Llama
# via OpenRouter) always return chunk.usage=None. When
# that happens, tokens stay 0 and the tiktoken fallback
# below activates. Fail-open: one round is estimated.
if chunk.usage:
turn_prompt_tokens += chunk.usage.prompt_tokens or 0
turn_completion_tokens += chunk.usage.completion_tokens or 0
delta = chunk.choices[0].delta if chunk.choices else None
if not delta:
continue
# Text content
if delta.content:
if not text_started:
yield StreamTextStart(id=text_block_id)
text_started = True
round_text += delta.content
yield StreamTextDelta(id=text_block_id, delta=delta.content)
# Tool call fragments (streamed incrementally)
if delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index
if idx not in tool_calls_by_index:
tool_calls_by_index[idx] = {
"id": "",
"name": "",
"arguments": "",
}
entry = tool_calls_by_index[idx]
if tc.id:
entry["id"] = tc.id
if tc.function and tc.function.name:
entry["name"] = tc.function.name
if tc.function and tc.function.arguments:
entry["arguments"] += tc.function.arguments
# Close text block if we had one this round
if text_started:
yield StreamTextEnd(id=text_block_id)
text_started = False
text_block_id = str(uuid.uuid4())
# Accumulate text for session persistence
assistant_text += round_text
# No tool calls -> model is done
if not tool_calls_by_index:
yield StreamFinishStep()
step_open = False
break
# Close step before tool execution
yield StreamFinishStep()
step_open = False
# Append the assistant message with tool_calls to context.
assistant_msg: dict[str, Any] = {"role": "assistant"}
if round_text:
assistant_msg["content"] = round_text
assistant_msg["tool_calls"] = [
{
"id": tc["id"],
"type": "function",
"function": {
"name": tc["name"],
"arguments": tc["arguments"] or "{}",
},
}
for tc in tool_calls_by_index.values()
]
openai_messages.append(assistant_msg)
# Execute each tool call and stream events
for tc in tool_calls_by_index.values():
tool_call_id = tc["id"]
tool_name = tc["name"]
raw_args = tc["arguments"] or "{}"
try:
tool_args = orjson.loads(raw_args)
except orjson.JSONDecodeError as parse_err:
parse_error = (
f"Invalid JSON arguments for tool '{tool_name}': {parse_err}"
)
logger.warning("[Baseline] %s", parse_error)
yield StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=parse_error,
success=False,
)
openai_messages.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": parse_error,
}
)
continue
yield StreamToolInputStart(toolCallId=tool_call_id, toolName=tool_name)
yield StreamToolInputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
input=tool_args,
)
# Execute via shared tool registry
try:
result: StreamToolOutputAvailable = await execute_tool(
tool_name=tool_name,
parameters=tool_args,
user_id=user_id,
session=session,
tool_call_id=tool_call_id,
)
yield result
tool_output = (
result.output
if isinstance(result.output, str)
else str(result.output)
)
except Exception as e:
error_output = f"Tool execution error: {e}"
logger.error(
"[Baseline] Tool %s failed: %s",
tool_name,
error_output,
exc_info=True,
)
yield StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=error_output,
success=False,
)
tool_output = error_output
# Append tool result to context for next round
openai_messages.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": tool_output,
}
)
else:
# for-loop exhausted without break -> tool-round limit hit
if loop_result and not loop_result.finished_naturally:
limit_msg = (
f"Exceeded {_MAX_TOOL_ROUNDS} tool-call rounds "
"without a final response."
@@ -418,11 +644,28 @@ async def stream_chat_completion_baseline(
_stream_error = True
error_msg = str(e) or type(e).__name__
logger.error("[Baseline] Streaming error: %s", error_msg, exc_info=True)
# Close any open text/step before emitting error
if text_started:
yield StreamTextEnd(id=text_block_id)
if step_open:
yield StreamFinishStep()
# Close any open text block. The llm_caller's finally block
# already appended StreamFinishStep to pending_events, so we must
# insert StreamTextEnd *before* StreamFinishStep to preserve the
# protocol ordering:
# StreamStartStep -> StreamTextStart -> ...deltas... ->
# StreamTextEnd -> StreamFinishStep
# Appending (or yielding directly) would place it after
# StreamFinishStep, violating the protocol.
if state.text_started:
# Find the last StreamFinishStep and insert before it.
insert_pos = len(state.pending_events)
for i in range(len(state.pending_events) - 1, -1, -1):
if isinstance(state.pending_events[i], StreamFinishStep):
insert_pos = i
break
state.pending_events.insert(
insert_pos, StreamTextEnd(id=state.text_block_id)
)
# Drain pending events in correct order
for evt in state.pending_events:
yield evt
state.pending_events.clear()
yield StreamError(errorText=error_msg, code="baseline_error")
# Still persist whatever we got
finally:
@@ -442,26 +685,21 @@ async def stream_chat_completion_baseline(
# Skip fallback when an error occurred and no output was produced —
# charging rate-limit tokens for completely failed requests is unfair.
if (
turn_prompt_tokens == 0
and turn_completion_tokens == 0
and not (_stream_error and not assistant_text)
state.turn_prompt_tokens == 0
and state.turn_completion_tokens == 0
and not (_stream_error and not state.assistant_text)
):
from backend.util.prompt import (
estimate_token_count,
estimate_token_count_str,
)
turn_prompt_tokens = max(
estimate_token_count(openai_messages, model=config.model), 1
)
turn_completion_tokens = estimate_token_count_str(
assistant_text, model=config.model
)
state.turn_prompt_tokens = max(
estimate_token_count(openai_messages, model=config.fast_model), 1
)
state.turn_completion_tokens = estimate_token_count_str(
state.assistant_text, model=config.fast_model
)
logger.info(
"[Baseline] No streaming usage reported; estimated tokens: "
"prompt=%d, completion=%d",
turn_prompt_tokens,
turn_completion_tokens,
state.turn_prompt_tokens,
state.turn_completion_tokens,
)
# Persist token usage to session and record for rate limiting.
@@ -471,31 +709,50 @@ async def stream_chat_completion_baseline(
await persist_and_record_usage(
session=session,
user_id=user_id,
prompt_tokens=turn_prompt_tokens,
completion_tokens=turn_completion_tokens,
prompt_tokens=state.turn_prompt_tokens,
completion_tokens=state.turn_completion_tokens,
log_prefix="[Baseline]",
)
# Persist assistant response
if assistant_text:
if state.assistant_text:
session.messages.append(
ChatMessage(role="assistant", content=assistant_text)
ChatMessage(role="assistant", content=state.assistant_text)
)
try:
await upsert_chat_session(session)
except Exception as persist_err:
logger.error("[Baseline] Failed to persist session: %s", persist_err)
# --- Upload transcript for next-turn continuity ---
if user_id and transcript_covers_prefix:
try:
_transcript_content = transcript_builder.to_jsonl()
if _transcript_content and validate_transcript(_transcript_content):
await asyncio.shield(
upload_transcript(
user_id=user_id,
session_id=session_id,
content=_transcript_content,
message_count=len(session.messages),
log_prefix="[Baseline]",
)
)
else:
logger.debug("[Baseline] No valid transcript to upload")
except Exception as upload_err:
logger.error("[Baseline] Transcript upload failed: %s", upload_err)
# Yield usage and finish AFTER try/finally (not inside finally).
# PEP 525 prohibits yielding from finally in async generators during
# aclose() — doing so raises RuntimeError on client disconnect.
# On GeneratorExit the client is already gone, so unreachable yields
# are harmless; on normal completion they reach the SSE stream.
if turn_prompt_tokens > 0 or turn_completion_tokens > 0:
if state.turn_prompt_tokens > 0 or state.turn_completion_tokens > 0:
yield StreamUsage(
prompt_tokens=turn_prompt_tokens,
completion_tokens=turn_completion_tokens,
total_tokens=turn_prompt_tokens + turn_completion_tokens,
prompt_tokens=state.turn_prompt_tokens,
completion_tokens=state.turn_completion_tokens,
total_tokens=state.turn_prompt_tokens + state.turn_completion_tokens,
)
yield StreamFinish()
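The PEP 525 note above is easy to verify in isolation: an async generator that yields from a finally block raises RuntimeError when closed early, which is exactly what happens on client disconnect. A minimal, self-contained repro:

import asyncio

async def bad_gen():
    try:
        yield 1
    finally:
        yield 2  # illegal during aclose(): generator ignores GeneratorExit

async def main() -> None:
    gen = bad_gen()
    assert await gen.__anext__() == 1
    try:
        await gen.aclose()  # simulates client disconnect mid-stream
    except RuntimeError as e:
        print("aclose() raised:", e)  # "async generator ignored GeneratorExit"

asyncio.run(main())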

View File

@@ -31,7 +31,7 @@ async def test_baseline_multi_turn(setup_test_user, test_user_id):
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
session = await create_chat_session(test_user_id, dry_run=False)
session = await upsert_chat_session(session)
# --- Turn 1: send a message with a unique keyword ---

View File

@@ -14,12 +14,21 @@ class ChatConfig(BaseSettings):
# OpenAI API Configuration
model: str = Field(
default="anthropic/claude-opus-4.6", description="Default model to use"
default="anthropic/claude-opus-4.6",
description="Default model for extended thinking mode",
)
fast_model: str = Field(
default="anthropic/claude-sonnet-4",
description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
)
title_model: str = Field(
default="openai/gpt-4o-mini",
description="Model to use for generating session titles (should be fast/cheap)",
)
simulation_model: str = Field(
default="google/gemini-2.5-flash",
description="Model for dry-run block simulation (should be fast/cheap with good JSON output)",
)
api_key: str | None = Field(default=None, description="OpenAI API key")
base_url: str | None = Field(
default=OPENROUTER_BASE_URL,
@@ -77,11 +86,11 @@ class ChatConfig(BaseSettings):
# allows ~70-100 turns/day.
# Checked at the HTTP layer (routes.py) before each turn.
#
# TODO: These are deploy-time constants applied identically to every user.
# If per-user or per-plan limits are needed (e.g., free tier vs paid), these
# must move to the database (e.g., a UserPlan table) and get_usage_status /
# check_rate_limit would look up each user's specific limits instead of
# reading config.daily_token_limit / config.weekly_token_limit.
# These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
# ENTERPRISE) multiply these by their tier multiplier (see
# rate_limit.TIER_MULTIPLIERS). User tier is stored in the
# User.subscriptionTier DB column and resolved inside
# get_global_rate_limits().
daily_token_limit: int = Field(
default=2_500_000,
description="Max tokens per day, resets at midnight UTC (0 = unlimited)",
@@ -91,6 +100,20 @@ class ChatConfig(BaseSettings):
description="Max tokens per week, resets Monday 00:00 UTC (0 = unlimited)",
)
# Cost (in credits / cents) to reset the daily rate limit using credits.
# When a user hits their daily limit, they can spend this amount to reset
# the daily counter and keep working. Set to 0 to disable the feature.
rate_limit_reset_cost: int = Field(
default=500,
ge=0,
description="Credit cost (in cents) for resetting the daily rate limit. 0 = disabled.",
)
max_daily_resets: int = Field(
default=5,
ge=0,
description="Maximum number of credit-based rate limit resets per user per day. 0 = unlimited.",
)
# Claude Agent SDK Configuration
use_claude_agent_sdk: bool = Field(
default=True,
@@ -115,6 +138,32 @@ class ChatConfig(BaseSettings):
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
claude_agent_fallback_model: str = Field(
default="claude-sonnet-4-20250514",
description="Fallback model when the primary model is unavailable (e.g. 529 "
"overloaded). The SDK automatically retries with this cheaper model.",
)
claude_agent_max_turns: int = Field(
default=50,
ge=1,
le=500,
description="Maximum number of agentic turns (tool-use loops) per query. "
"Prevents runaway tool loops from burning budget.",
)
claude_agent_max_budget_usd: float = Field(
default=5.0,
ge=0.01,
le=100.0,
description="Maximum spend in USD per SDK query. The CLI aborts the "
"request if this budget is exceeded.",
)
claude_agent_max_transient_retries: int = Field(
default=3,
ge=0,
le=10,
description="Maximum number of retries for transient API errors "
"(429, 5xx, ECONNRESET) before surfacing the error to the user.",
)
use_openrouter: bool = Field(
default=True,
description="Enable routing API calls through the OpenRouter proxy. "
@@ -164,7 +213,7 @@ class ChatConfig(BaseSettings):
Single source of truth for "will the SDK route through OpenRouter?".
Checks the flag *and* that ``api_key`` + a valid ``base_url`` are
present — mirrors the fallback logic in ``_build_sdk_env``.
present — mirrors the fallback logic in ``build_sdk_env``.
"""
if not self.use_openrouter:
return False
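Because ChatConfig is a pydantic BaseSettings subclass, each field above can be overridden through environment variables. A minimal sketch, assuming no env_prefix is configured on ChatConfig (pydantic-settings matches variable names case-insensitively by default):

import os

from pydantic import Field
from pydantic_settings import BaseSettings

class DemoChatConfig(BaseSettings):
    model: str = Field(default="anthropic/claude-opus-4.6")
    fast_model: str = Field(default="anthropic/claude-sonnet-4")

os.environ["FAST_MODEL"] = "openai/gpt-4o-mini"
assert DemoChatConfig().fast_model == "openai/gpt-4o-mini"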

View File

@@ -44,12 +44,32 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
# Transient Anthropic API error detection
# ---------------------------------------------------------------------------
# Patterns in error text that indicate a transient Anthropic API error
# (ECONNRESET / dropped TCP connection) which is retryable.
# which is retryable. Covers:
# - Connection-level: ECONNRESET, dropped TCP connections
# - HTTP 429: rate-limit / too-many-requests
# - HTTP 5xx: server errors, overloaded
_TRANSIENT_ERROR_PATTERNS = (
# Connection-level
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
# 429 rate-limit patterns
"rate limit",
"rate_limit",
"too many requests",
"status code 429",
# 5xx server error patterns
"overloaded",
"internal server error",
"bad gateway",
"service unavailable",
"gateway timeout",
"status code 529",
"status code 500",
"status code 502",
"status code 503",
"status code 504",
)
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
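The pattern table implies a simple case-insensitive substring match. A plausible matcher (the helper name is hypothetical; only the pattern tuple is from the source):

def is_transient_error(error_text: str) -> bool:
    lowered = error_text.lower()
    return any(p.lower() in lowered for p in _TRANSIENT_ERROR_PATTERNS)

assert is_transient_error("HTTP error: status code 529 (overloaded)")
assert not is_transient_error("invalid api key")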

View File

@@ -17,6 +17,9 @@ from backend.util.workspace import WorkspaceManager
if TYPE_CHECKING:
from e2b import AsyncSandbox
from backend.copilot.permissions import CopilotPermissions
# Allowed base directory for the Read tool. Public so service.py can use it
# for sweep operations without depending on a private implementation detail.
# Respects CLAUDE_CONFIG_DIR env var, consistent with transcript.py's
@@ -43,6 +46,12 @@ _current_sandbox: ContextVar["AsyncSandbox | None"] = ContextVar(
)
_current_sdk_cwd: ContextVar[str] = ContextVar("_current_sdk_cwd", default="")
# Current execution's capability filter. None means "no restrictions".
# Set by set_execution_context(); read by run_block and service.py.
_current_permissions: "ContextVar[CopilotPermissions | None]" = ContextVar(
"_current_permissions", default=None
)
def encode_cwd_for_cli(cwd: str) -> str:
"""Encode a working directory path the same way the Claude CLI does.
@@ -63,6 +72,7 @@ def set_execution_context(
session: ChatSession,
sandbox: "AsyncSandbox | None" = None,
sdk_cwd: str | None = None,
permissions: "CopilotPermissions | None" = None,
) -> None:
"""Set per-turn context variables used by file-resolution tool handlers."""
_current_user_id.set(user_id)
@@ -70,6 +80,7 @@ def set_execution_context(
_current_sandbox.set(sandbox)
_current_sdk_cwd.set(sdk_cwd or "")
_current_project_dir.set(encode_cwd_for_cli(sdk_cwd) if sdk_cwd else "")
_current_permissions.set(permissions)
def get_execution_context() -> tuple[str | None, ChatSession | None]:
@@ -77,6 +88,11 @@ def get_execution_context() -> tuple[str | None, ChatSession | None]:
return _current_user_id.get(), _current_session.get()
def get_current_permissions() -> "CopilotPermissions | None":
"""Return the capability filter for the current execution, or None if unrestricted."""
return _current_permissions.get()
def get_current_sandbox() -> "AsyncSandbox | None":
"""Return the E2B sandbox for the current session, or None if not active."""
return _current_sandbox.get()
@@ -88,17 +104,32 @@ def get_sdk_cwd() -> str:
E2B_WORKDIR = "/home/user"
E2B_ALLOWED_DIRS: tuple[str, ...] = (E2B_WORKDIR, "/tmp")
E2B_ALLOWED_DIRS_STR: str = " or ".join(E2B_ALLOWED_DIRS)
def is_within_allowed_dirs(path: str) -> bool:
"""Return True if *path* is within one of the allowed sandbox directories."""
for allowed in E2B_ALLOWED_DIRS:
if path == allowed or path.startswith(allowed + "/"):
return True
return False
def resolve_sandbox_path(path: str) -> str:
"""Normalise *path* to an absolute sandbox path under ``/home/user``.
"""Normalise *path* to an absolute sandbox path under an allowed directory.
Allowed directories: ``/home/user`` and ``/tmp``.
Relative paths are resolved against ``/home/user``.
Raises :class:`ValueError` if the resolved path escapes the sandbox.
"""
candidate = path if os.path.isabs(path) else os.path.join(E2B_WORKDIR, path)
normalized = os.path.normpath(candidate)
if normalized != E2B_WORKDIR and not normalized.startswith(E2B_WORKDIR + "/"):
raise ValueError(f"Path must be within {E2B_WORKDIR}: {path}")
if not is_within_allowed_dirs(normalized):
raise ValueError(
f"Path must be within {E2B_ALLOWED_DIRS_STR}: {os.path.basename(path)}"
)
return normalized
@@ -118,7 +149,8 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
Allowed:
- Files under *sdk_cwd* (``/tmp/copilot-<session>/``)
- Files under ``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/...``.
- Files under ``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/...``
or ``tool-outputs/...``.
The SDK nests tool-results under a conversation UUID directory;
the UUID segment is validated with ``_UUID_RE``.
"""
@@ -143,17 +175,20 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
# Defence-in-depth: ensure project_dir didn't escape the base.
if not project_dir.startswith(SDK_PROJECTS_DIR + os.sep):
return False
# Only allow: <encoded-cwd>/<uuid>/tool-results/<file>
# Only allow: <encoded-cwd>/<uuid>/<tool-dir>/<file>
# The SDK always creates a conversation UUID directory between
# the project dir and tool-results/.
# the project dir and the tool directory.
# Accept both "tool-results" (SDK's persisted outputs) and
# "tool-outputs" (the model sometimes confuses workspace paths
# with filesystem paths and generates this variant).
if resolved.startswith(project_dir + os.sep):
relative = resolved[len(project_dir) + 1 :]
parts = relative.split(os.sep)
# Require exactly: [<uuid>, "tool-results", <file>, ...]
# Require exactly: [<uuid>, "tool-results"|"tool-outputs", <file>, ...]
if (
len(parts) >= 3
and _UUID_RE.match(parts[0])
and parts[1] == "tool-results"
and parts[1] in ("tool-results", "tool-outputs")
):
return True

View File

@@ -11,6 +11,7 @@ import pytest
from backend.copilot.context import (
SDK_PROJECTS_DIR,
_current_project_dir,
get_current_permissions,
get_current_sandbox,
get_execution_context,
get_sdk_cwd,
@@ -18,6 +19,7 @@ from backend.copilot.context import (
resolve_sandbox_path,
set_execution_context,
)
from backend.copilot.permissions import CopilotPermissions
def _make_session() -> MagicMock:
@@ -61,6 +63,19 @@ def test_get_current_sandbox_returns_set_value():
assert get_current_sandbox() is mock_sandbox
def test_set_and_get_current_permissions():
"""set_execution_context stores permissions; get_current_permissions returns it."""
perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
set_execution_context("u1", _make_session(), permissions=perms)
assert get_current_permissions() is perms
def test_get_current_permissions_defaults_to_none():
"""get_current_permissions returns None when no permissions have been set."""
set_execution_context("u1", _make_session())
assert get_current_permissions() is None
def test_get_sdk_cwd_empty_when_not_set():
"""get_sdk_cwd returns empty string when sdk_cwd is not set."""
set_execution_context("u1", _make_session(), sdk_cwd=None)
@@ -119,6 +134,21 @@ def test_is_allowed_local_path_tool_results_with_uuid():
_current_project_dir.set("")
def test_is_allowed_local_path_tool_outputs_with_uuid():
"""Files under <encoded-cwd>/<uuid>/tool-outputs/ are also allowed."""
encoded = "test-encoded-dir"
conv_uuid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
path = os.path.join(
SDK_PROJECTS_DIR, encoded, conv_uuid, "tool-outputs", "output.json"
)
_current_project_dir.set(encoded)
try:
assert is_allowed_local_path(path, sdk_cwd=None)
finally:
_current_project_dir.set("")
def test_is_allowed_local_path_tool_results_without_uuid_rejected():
"""Direct <encoded-cwd>/tool-results/ (no UUID) is rejected."""
encoded = "test-encoded-dir"
@@ -144,7 +174,7 @@ def test_is_allowed_local_path_sibling_of_tool_results_is_rejected():
def test_is_allowed_local_path_valid_uuid_wrong_segment_name_rejected():
"""A valid UUID dir but non-'tool-results' second segment is rejected."""
"""A valid UUID dir but non-'tool-results'/'tool-outputs' second segment is rejected."""
encoded = "test-encoded-dir"
uuid_str = "12345678-1234-5678-9abc-def012345678"
path = os.path.join(
@@ -183,10 +213,32 @@ def test_resolve_sandbox_path_normalizes_dots():
def test_resolve_sandbox_path_escape_raises():
with pytest.raises(ValueError, match="/home/user"):
with pytest.raises(ValueError, match="must be within"):
resolve_sandbox_path("/home/user/../../etc/passwd")
def test_resolve_sandbox_path_absolute_outside_raises():
with pytest.raises(ValueError, match="/home/user"):
with pytest.raises(ValueError):
resolve_sandbox_path("/etc/passwd")
def test_resolve_sandbox_path_tmp_allowed():
assert resolve_sandbox_path("/tmp/data.txt") == "/tmp/data.txt"
def test_resolve_sandbox_path_tmp_nested():
assert resolve_sandbox_path("/tmp/a/b/c.txt") == "/tmp/a/b/c.txt"
def test_resolve_sandbox_path_tmp_itself():
assert resolve_sandbox_path("/tmp") == "/tmp"
def test_resolve_sandbox_path_tmp_escape_raises():
with pytest.raises(ValueError):
resolve_sandbox_path("/tmp/../etc/passwd")
def test_resolve_sandbox_path_tmp_prefix_collision_raises():
with pytest.raises(ValueError):
resolve_sandbox_path("/tmp_evil/malicious.txt")

View File

@@ -18,7 +18,13 @@ from prisma.types import (
from backend.data import db
from backend.util.json import SafeJson, sanitize_string
from .model import ChatMessage, ChatSession, ChatSessionInfo
from .model import (
ChatMessage,
ChatSession,
ChatSessionInfo,
ChatSessionMetadata,
invalidate_session_cache,
)
logger = logging.getLogger(__name__)
@@ -35,6 +41,7 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
async def create_chat_session(
session_id: str,
user_id: str,
metadata: ChatSessionMetadata | None = None,
) -> ChatSessionInfo:
"""Create a new chat session in the database."""
data = ChatSessionCreateInput(
@@ -43,6 +50,7 @@ async def create_chat_session(
credentials=SafeJson({}),
successfulAgentRuns=SafeJson({}),
successfulAgentSchedules=SafeJson({}),
metadata=SafeJson((metadata or ChatSessionMetadata()).model_dump()),
)
prisma_session = await PrismaChatSession.prisma().create(data=data)
return ChatSessionInfo.from_db(prisma_session)
@@ -57,7 +65,12 @@ async def update_chat_session(
total_completion_tokens: int | None = None,
title: str | None = None,
) -> ChatSession | None:
"""Update a chat session's metadata."""
"""Update a chat session's mutable fields.
Note: ``metadata`` (which includes ``dry_run``) is intentionally omitted —
it is set once at creation time and treated as immutable for the lifetime
of the session.
"""
data: ChatSessionUpdateInput = {"updatedAt": datetime.now(UTC)}
if credentials is not None:
@@ -217,6 +230,9 @@ async def add_chat_messages_batch(
if msg.get("function_call") is not None:
data["functionCall"] = SafeJson(msg["function_call"])
if msg.get("duration_ms") is not None:
data["durationMs"] = msg["duration_ms"]
messages_data.append(data)
# Run create_many and session update in parallel within transaction
@@ -359,3 +375,22 @@ async def update_tool_message_content(
f"tool_call_id {tool_call_id}: {e}"
)
return False
async def set_turn_duration(session_id: str, duration_ms: int) -> None:
"""Set durationMs on the last assistant message in a session.
Also invalidates the Redis session cache so the next GET returns
the updated duration.
"""
last_msg = await PrismaChatMessage.prisma().find_first(
where={"sessionId": session_id, "role": "assistant"},
order={"sequence": "desc"},
)
if last_msg:
await PrismaChatMessage.prisma().update(
where={"id": last_msg.id},
data={"durationMs": duration_ms},
)
# Invalidate cache so the session is re-fetched from DB with durationMs
await invalidate_session_cache(session_id)

View File

@@ -14,7 +14,7 @@ import time
from backend.copilot import stream_registry
from backend.copilot.baseline import stream_chat_completion_baseline
from backend.copilot.config import ChatConfig
from backend.copilot.response_model import StreamFinish
from backend.copilot.response_model import StreamError
from backend.copilot.sdk import service as sdk_service
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
from backend.executor.cluster_lock import ClusterLock
@@ -23,6 +23,7 @@ from backend.util.feature_flag import Flag, is_feature_enabled
from backend.util.logging import TruncatedLogger, configure_logging
from backend.util.process import set_service_name
from backend.util.retry import func_retry
from backend.util.workspace_storage import shutdown_workspace_storage
from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
@@ -153,8 +154,6 @@ class CoPilotProcessor:
worker's event loop, ensuring ``aiohttp.ClientSession.close()``
runs on the same loop that created the session.
"""
from backend.util.workspace_storage import shutdown_workspace_storage
coro = shutdown_workspace_storage()
try:
future = asyncio.run_coroutine_threadsafe(coro, self.execution_loop)
@@ -252,51 +251,64 @@ class CoPilotProcessor:
stream_fn = stream_chat_completion_dummy
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
else:
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
)
# Per-request mode override from the frontend takes priority.
# 'fast' → baseline (OpenAI-compatible), 'extended_thinking' → SDK.
if entry.mode == "fast":
use_sdk = False
elif entry.mode == "extended_thinking":
use_sdk = True
else:
# No mode specified — fall back to feature flag / config.
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
log.info(
f"Using {'SDK' if use_sdk else 'baseline'} service "
f"(mode={entry.mode or 'default'})"
)
# Stream chat completion and publish chunks to Redis.
async for chunk in stream_fn(
session_id=entry.session_id,
message=entry.message if entry.message else None,
is_user_message=entry.is_user_message,
user_id=entry.user_id,
context=entry.context,
file_ids=entry.file_ids,
):
# stream_and_publish wraps the raw stream with registry
# publishing (shared with collect_copilot_response).
raw_stream = stream_fn(
session_id=entry.session_id,
message=entry.message if entry.message else None,
is_user_message=entry.is_user_message,
user_id=entry.user_id,
context=entry.context,
file_ids=entry.file_ids,
)
async for chunk in stream_registry.stream_and_publish(
session_id=entry.session_id,
turn_id=entry.turn_id,
stream=raw_stream,
):
if cancel.is_set():
log.info("Cancel requested, breaking stream")
break
# Capture StreamError so mark_session_completed receives
# the error message (stream_and_publish yields but does
# not publish StreamError — that's done by mark_session_completed).
if isinstance(chunk, StreamError):
error_msg = chunk.errorText
break
current_time = time.monotonic()
if current_time - last_refresh >= refresh_interval:
cluster_lock.refresh()
last_refresh = current_time
# Skip StreamFinish — mark_session_completed publishes it.
if isinstance(chunk, StreamFinish):
continue
try:
await stream_registry.publish_chunk(entry.turn_id, chunk)
except Exception as e:
log.error(
f"Error publishing chunk {type(chunk).__name__}: {e}",
exc_info=True,
)
# Stream loop completed
if cancel.is_set():
log.info("Stream cancelled by user")

View File

@@ -6,6 +6,7 @@ Defines two exchanges and queues following the graph executor pattern:
"""
import logging
from typing import Literal
from pydantic import BaseModel
@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
file_ids: list[str] | None = None
"""Workspace file IDs attached to the user's message"""
mode: Literal["fast", "extended_thinking"] | None = None
"""Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
class CancelCoPilotEvent(BaseModel):
"""Event to cancel a CoPilot operation."""
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
is_user_message: bool = True,
context: dict[str, str] | None = None,
file_ids: list[str] | None = None,
mode: Literal["fast", "extended_thinking"] | None = None,
) -> None:
"""Enqueue a CoPilot task for processing by the executor service.
@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
is_user_message: Whether the message is from the user (vs system/assistant)
context: Optional context for the message (e.g., {url: str, content: str})
file_ids: Optional workspace file IDs attached to the user's message
mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
"""
from backend.util.clients import get_async_copilot_queue
@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
is_user_message=is_user_message,
context=context,
file_ids=file_ids,
mode=mode,
)
queue_client = await get_async_copilot_queue()
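A hedged usage sketch of the new mode parameter. The session_id, turn_id, message, and user_id keyword names are inferred from the CoPilotExecutionEntry fields above, and the import path is assumed:

import asyncio

from backend.executor.utils import enqueue_copilot_turn  # path assumed

async def demo() -> None:
    await enqueue_copilot_turn(
        session_id="session-123",
        turn_id="turn-456",
        message="Summarise my last run",
        user_id="user-789",
        mode="fast",  # force the baseline (OpenAI-compatible) path
    )

asyncio.run(demo())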

View File

@@ -59,6 +59,16 @@ _null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
# GitHub user identity caches (keyed by user_id only, not provider tuple).
# Declared here so invalidate_user_provider_cache() can reference them.
_GH_IDENTITY_CACHE_TTL = 600.0 # 10 min — profile data rarely changes
_gh_identity_cache: TTLCache[str, dict[str, str]] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_GH_IDENTITY_CACHE_TTL
)
_gh_identity_null_cache: TTLCache[str, bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
"""Remove the cached entry for *user_id*/*provider* from both caches.
@@ -66,11 +76,19 @@ def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
Call this after storing new credentials so that the next
``get_provider_token()`` call performs a fresh DB lookup instead of
serving a stale TTL-cached result.
For GitHub specifically, also clears the git-identity caches so that
``get_github_user_git_identity()`` re-fetches the user's profile on
the next call instead of serving stale identity data.
"""
key = (user_id, provider)
_token_cache.pop(key, None)
_null_cache.pop(key, None)
if provider == "github":
_gh_identity_cache.pop(user_id, None)
_gh_identity_null_cache.pop(user_id, None)
# Register this module's cache-bust function with the credentials manager so
# that any create/update/delete operation immediately evicts stale cache
@@ -123,6 +141,7 @@ async def get_provider_token(user_id: str, provider: str) -> str | None:
[c for c in creds_list if c.type == "oauth2"],
key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
)
refresh_failed = False
for creds in oauth2_creds:
if creds.type == "oauth2":
try:
@@ -141,6 +160,7 @@ async def get_provider_token(user_id: str, provider: str) -> str | None:
# Do NOT fall back to the stale token — it is likely expired
# or revoked. Returning None forces the caller to re-auth,
# preventing the LLM from receiving a non-functional token.
refresh_failed = True
continue
_token_cache[cache_key] = token
return token
@@ -152,8 +172,12 @@ async def get_provider_token(user_id: str, provider: str) -> str | None:
_token_cache[cache_key] = token
return token
# No credentials found — cache to avoid repeated DB hits.
_null_cache[cache_key] = True
# Only cache "not connected" when the user truly has no credentials for this
# provider. If we had OAuth credentials but refresh failed (e.g. transient
# network error, event-loop mismatch), do NOT cache the negative result —
# the next call should retry the refresh instead of being blocked for 60 s.
if not refresh_failed:
_null_cache[cache_key] = True
return None
@@ -171,3 +195,76 @@ async def get_integration_env_vars(user_id: str) -> dict[str, str]:
for var in var_names:
env[var] = token
return env
# ---------------------------------------------------------------------------
# GitHub user identity (for git committer env vars)
# ---------------------------------------------------------------------------
async def get_github_user_git_identity(user_id: str) -> dict[str, str] | None:
"""Fetch the GitHub user's name and email for git committer env vars.
Uses the ``/user`` GitHub API endpoint with the user's stored token.
Returns a dict with ``GIT_AUTHOR_NAME``, ``GIT_AUTHOR_EMAIL``,
``GIT_COMMITTER_NAME``, and ``GIT_COMMITTER_EMAIL`` if the user has a
connected GitHub account. Returns ``None`` otherwise.
Results are cached for 10 minutes; "not connected" results are cached for
60 s (same as null-token cache).
"""
if user_id in _gh_identity_null_cache:
return None
if cached := _gh_identity_cache.get(user_id):
return cached
token = await get_provider_token(user_id, "github")
if not token:
_gh_identity_null_cache[user_id] = True
return None
import aiohttp
try:
async with aiohttp.ClientSession() as session:
async with session.get(
"https://api.github.com/user",
headers={
"Authorization": f"token {token}",
"Accept": "application/vnd.github+json",
},
timeout=aiohttp.ClientTimeout(total=5),
) as resp:
if resp.status != 200:
logger.warning(
"[git-identity] GitHub /user returned %s for user %s",
resp.status,
user_id,
)
return None
data = await resp.json()
except Exception as exc:
logger.warning(
"[git-identity] Failed to fetch GitHub profile for user %s: %s",
user_id,
exc,
)
return None
name = data.get("name") or data.get("login") or "AutoGPT User"
# GitHub may return email=null if the user has set their email to private.
# Fall back to the noreply address GitHub generates for every account.
email = data.get("email")
if not email:
gh_id = data.get("id", "")
login = data.get("login", "user")
email = f"{gh_id}+{login}@users.noreply.github.com"
identity = {
"GIT_AUTHOR_NAME": name,
"GIT_AUTHOR_EMAIL": email,
"GIT_COMMITTER_NAME": name,
"GIT_COMMITTER_EMAIL": email,
}
_gh_identity_cache[user_id] = identity
return identity
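The returned keys are standard git environment variables, so the dict can be merged straight into a subprocess environment. A hedged usage sketch (this call site is illustrative, not from the source):

import os
import subprocess

async def commit_as_user(user_id: str, repo_dir: str, message: str) -> None:
    identity = await get_github_user_git_identity(user_id)
    # git reads author/committer identity from these env vars.
    env = {**os.environ, **(identity or {})}
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, env=env, check=True)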

View File

@@ -9,6 +9,8 @@ from backend.copilot.integration_creds import (
_NULL_CACHE_TTL,
_TOKEN_CACHE_TTL,
PROVIDER_ENV_VARS,
_gh_identity_cache,
_gh_identity_null_cache,
_null_cache,
_token_cache,
get_integration_env_vars,
@@ -49,9 +51,13 @@ def clear_caches():
"""Ensure clean caches before and after every test."""
_token_cache.clear()
_null_cache.clear()
_gh_identity_cache.clear()
_gh_identity_null_cache.clear()
yield
_token_cache.clear()
_null_cache.clear()
_gh_identity_cache.clear()
_gh_identity_null_cache.clear()
class TestInvalidateUserProviderCache:
@@ -77,6 +83,34 @@ class TestInvalidateUserProviderCache:
invalidate_user_provider_cache(_USER, _PROVIDER)
assert other_key in _token_cache
def test_clears_gh_identity_cache_for_github_provider(self):
"""When provider is 'github', identity caches must also be cleared."""
_gh_identity_cache[_USER] = {
"GIT_AUTHOR_NAME": "Old Name",
"GIT_AUTHOR_EMAIL": "old@example.com",
"GIT_COMMITTER_NAME": "Old Name",
"GIT_COMMITTER_EMAIL": "old@example.com",
}
invalidate_user_provider_cache(_USER, "github")
assert _USER not in _gh_identity_cache
def test_clears_gh_identity_null_cache_for_github_provider(self):
"""When provider is 'github', the identity null-cache must also be cleared."""
_gh_identity_null_cache[_USER] = True
invalidate_user_provider_cache(_USER, "github")
assert _USER not in _gh_identity_null_cache
def test_does_not_clear_gh_identity_cache_for_other_providers(self):
"""When provider is NOT 'github', identity caches must be left alone."""
_gh_identity_cache[_USER] = {
"GIT_AUTHOR_NAME": "Some Name",
"GIT_AUTHOR_EMAIL": "some@example.com",
"GIT_COMMITTER_NAME": "Some Name",
"GIT_COMMITTER_EMAIL": "some@example.com",
}
invalidate_user_provider_cache(_USER, "some-other-provider")
assert _USER in _gh_identity_cache
class TestGetProviderToken:
@pytest.mark.asyncio(loop_scope="session")
@@ -129,8 +163,15 @@ class TestGetProviderToken:
assert result == "oauth-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_refresh_failure_returns_none(self):
"""On refresh failure, return None instead of caching a stale token."""
async def test_oauth2_refresh_failure_returns_none_without_null_cache(self):
"""On refresh failure, return None but do NOT cache in null_cache.
The user has credentials — they just couldn't be refreshed right now
(e.g. transient network error or event-loop mismatch in the copilot
executor). Caching a negative result would block all credential
lookups for 60 s even though the creds exist and may refresh fine
on the next attempt.
"""
oauth_creds = _make_oauth2_creds("stale-oauth-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
@@ -141,6 +182,8 @@ class TestGetProviderToken:
# Stale tokens must NOT be returned — forces re-auth.
assert result is None
# Must NOT cache negative result when refresh failed — next call retries.
assert (_USER, _PROVIDER) not in _null_cache
@pytest.mark.asyncio(loop_scope="session")
async def test_no_credentials_caches_null_entry(self):
@@ -176,6 +219,96 @@ class TestGetProviderToken:
assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
class TestThreadSafetyLocks:
"""Bug reproduction: shared AsyncRedisKeyedMutex across threads caused
'Future attached to a different loop' when copilot workers accessed
credentials from different event loops."""
@pytest.mark.asyncio(loop_scope="session")
async def test_store_locks_returns_per_thread_instance(self):
"""IntegrationCredentialsStore.locks() must return different instances
for different threads (via @thread_cached)."""
import asyncio
import concurrent.futures
from backend.integrations.credentials_store import IntegrationCredentialsStore
store = IntegrationCredentialsStore()
async def get_locks_id():
mock_redis = AsyncMock()
with patch(
"backend.integrations.credentials_store.get_redis_async",
return_value=mock_redis,
):
locks = await store.locks()
return id(locks)
# Get locks from main thread
main_id = await get_locks_id()
# Get locks from a worker thread
def run_in_thread():
loop = asyncio.new_event_loop()
try:
return loop.run_until_complete(get_locks_id())
finally:
loop.close()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
worker_id = await asyncio.get_event_loop().run_in_executor(
pool, run_in_thread
)
assert main_id != worker_id, (
"Store.locks() returned the same instance across threads. "
"This would cause 'Future attached to a different loop' errors."
)
@pytest.mark.asyncio(loop_scope="session")
async def test_manager_delegates_to_store_locks(self):
"""IntegrationCredentialsManager.locks() should delegate to store."""
from backend.integrations.creds_manager import IntegrationCredentialsManager
manager = IntegrationCredentialsManager()
mock_redis = AsyncMock()
with patch(
"backend.integrations.credentials_store.get_redis_async",
return_value=mock_redis,
):
locks = await manager.locks()
# Should have gotten it from the store
assert locks is not None
class TestRefreshUnlockedPath:
"""Bug reproduction: copilot worker threads need lock-free refresh because
a Redis-backed asyncio.Lock created on one event loop can't be used on another."""
@pytest.mark.asyncio(loop_scope="session")
async def test_refresh_if_needed_lock_false_skips_redis(self):
"""refresh_if_needed(lock=False) must not touch Redis locks at all."""
from backend.integrations.creds_manager import IntegrationCredentialsManager
manager = IntegrationCredentialsManager()
creds = _make_oauth2_creds()
mock_handler = MagicMock()
mock_handler.needs_refresh = MagicMock(return_value=False)
with patch(
"backend.integrations.creds_manager._get_provider_oauth_handler",
new_callable=AsyncMock,
return_value=mock_handler,
):
result = await manager.refresh_if_needed(_USER, creds, lock=False)
# Should return credentials without touching locks
assert result.id == creds.id
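# A minimal sketch of what the lock=False fast path might look like; the
# helper names (redis_lock, do_refresh, needs_refresh) are stand-ins, not the
# real creds_manager API. The point the test pins down: with lock=False the
# loop-bound Redis mutex is never created or acquired.
import contextlib

def needs_refresh(creds) -> bool:  # stand-in for the OAuth handler's check
    return False

async def do_refresh(creds):  # stand-in for the real token refresh
    return creds

@contextlib.asynccontextmanager
async def redis_lock(user_id: str, creds_id: str):  # stand-in keyed mutex
    yield  # the real lock is bound to the event loop that created it

async def refresh_if_needed(user_id: str, creds, *, lock: bool = True):
    if not needs_refresh(creds):
        return creds  # nothing to refresh: neither path touches Redis
    if lock:
        async with redis_lock(user_id, creds.id):
            return await do_refresh(creds)
    return await do_refresh(creds)  # lock-free path for copilot worker threads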
class TestGetIntegrationEnvVars:
@pytest.mark.asyncio(loop_scope="session")
async def test_injects_all_env_vars_for_provider(self):


@@ -46,6 +46,16 @@ def _get_session_cache_key(session_id: str) -> str:
# ===================== Chat data models ===================== #
class ChatSessionMetadata(BaseModel):
"""Typed metadata stored in the ``metadata`` JSON column of ChatSession.
Add new session-level flags here instead of adding DB columns —
no migration required for new fields as long as a default is provided.
"""
dry_run: bool = False
class ChatMessage(BaseModel):
role: str
content: str | None = None
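# A minimal sketch of the extension pattern the ChatSessionMetadata docstring
# above describes. "verbose_tools" is a hypothetical future flag: because
# every field has a default, metadata JSON written before the field existed
# still validates, so no DB migration is needed.
from pydantic import BaseModel

class ChatSessionMetadata(BaseModel):
    dry_run: bool = False
    verbose_tools: bool = False  # hypothetical new flag, defaulted

meta = ChatSessionMetadata.model_validate({})  # old row: empty metadata column
assert meta.dry_run is False and meta.verbose_tools is False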
@@ -54,6 +64,7 @@ class ChatMessage(BaseModel):
refusal: str | None = None
tool_calls: list[dict] | None = None
function_call: dict | None = None
duration_ms: int | None = None
@staticmethod
def from_db(prisma_message: PrismaChatMessage) -> "ChatMessage":
@@ -66,6 +77,7 @@ class ChatMessage(BaseModel):
refusal=prisma_message.refusal,
tool_calls=_parse_json_field(prisma_message.toolCalls),
function_call=_parse_json_field(prisma_message.functionCall),
duration_ms=prisma_message.durationMs,
)
@@ -88,6 +100,12 @@ class ChatSessionInfo(BaseModel):
updated_at: datetime
successful_agent_runs: dict[str, int] = {}
successful_agent_schedules: dict[str, int] = {}
metadata: ChatSessionMetadata = ChatSessionMetadata()
@property
def dry_run(self) -> bool:
"""Convenience accessor for ``metadata.dry_run``."""
return self.metadata.dry_run
@classmethod
def from_db(cls, prisma_session: PrismaChatSession) -> Self:
@@ -101,6 +119,10 @@ class ChatSessionInfo(BaseModel):
prisma_session.successfulAgentSchedules, default={}
)
# Parse typed metadata from the JSON column.
raw_metadata = _parse_json_field(prisma_session.metadata, default={})
metadata = ChatSessionMetadata.model_validate(raw_metadata)
# Calculate usage from token counts.
# NOTE: Per-turn cache_read_tokens / cache_creation_tokens breakdown
# is lost after persistence — the DB only stores aggregate prompt and
@@ -126,6 +148,7 @@ class ChatSessionInfo(BaseModel):
updated_at=prisma_session.updatedAt,
successful_agent_runs=successful_agent_runs,
successful_agent_schedules=successful_agent_schedules,
metadata=metadata,
)
@@ -133,7 +156,7 @@ class ChatSession(ChatSessionInfo):
messages: list[ChatMessage]
@classmethod
- def new(cls, user_id: str) -> Self:
+ def new(cls, user_id: str, *, dry_run: bool) -> Self:
return cls(
session_id=str(uuid.uuid4()),
user_id=user_id,
@@ -143,6 +166,7 @@ class ChatSession(ChatSessionInfo):
credentials={},
started_at=datetime.now(UTC),
updated_at=datetime.now(UTC),
metadata=ChatSessionMetadata(dry_run=dry_run),
)
@classmethod
@@ -530,6 +554,7 @@ async def _save_session_to_db(
await db.create_chat_session(
session_id=session.session_id,
user_id=session.user_id,
metadata=session.metadata,
)
existing_message_count = 0
@@ -607,21 +632,27 @@ async def append_and_save_message(session_id: str, message: ChatMessage) -> Chat
return session
- async def create_chat_session(user_id: str) -> ChatSession:
+ async def create_chat_session(user_id: str, *, dry_run: bool) -> ChatSession:
"""Create a new chat session and persist it.
Args:
user_id: The authenticated user ID.
dry_run: When True, run_block and run_agent tool calls in this
session are forced to use dry-run simulation mode.
Raises:
DatabaseError: If the database write fails. We fail fast to ensure
callers never receive a non-persisted session that only exists
in cache (which would be lost when the cache expires).
"""
- session = ChatSession.new(user_id)
+ session = ChatSession.new(user_id, dry_run=dry_run)
# Create in database first - fail fast if this fails
try:
await chat_db().create_chat_session(
session_id=session.session_id,
user_id=user_id,
metadata=session.metadata,
)
except Exception as e:
logger.error(f"Failed to create session {session.session_id} in database: {e}")
