Merge branch 'dev' into ntindle/google-issues-fix

Resolved conflicts: - executor/utils.py: kept auto-credentials validation + dev's comment update - graph_test.py: kept both auto-credentials tests AND MCP deduplication tests
fix: add is_auto_credential and input_field_name to auto credentials schema
2026-03-17 03:00:27 -04:00 · 2026-02-16 00:09:35 -06:00 · 2026-02-12 16:13:23 -06:00 · 2026-02-12 16:12:20 -06:00 · 2026-02-09 13:46:37 -06:00 · 2026-02-08 16:11:47 -06:00
585 changed files with 35668 additions and 61789 deletions
--- a/.claude/skills/backend-check/SKILL.md
+++ b/.claude/skills/backend-check/SKILL.md
@@ -1,17 +0,0 @@
---
-name: backend-check
-description: Run the full backend formatting, linting, and test suite. Ensures code quality before commits and PRs. TRIGGER when backend Python code has been modified and needs validation.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Backend Check
-
-## Steps
-
-1. **Format**: `poetry run format` — runs formatting AND linting. NEVER run ruff/black/isort individually
-2. **Fix** any remaining errors manually, re-run until clean
-3. **Test**: `poetry run test` (runs DB setup + pytest). For specific files: `poetry run pytest -s -vvv <test_files>`
-4. **Snapshots** (if needed): `poetry run pytest path/to/test.py --snapshot-update` — review with `git diff`
--- a/.claude/skills/code-style/SKILL.md
+++ b/.claude/skills/code-style/SKILL.md
@@ -1,35 +0,0 @@
---
-name: code-style
-description: Python code style preferences for the AutoGPT backend. Apply when writing or reviewing Python code. TRIGGER when writing new Python code, reviewing PRs, or refactoring backend code.
-user-invocable: false
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Code Style
-
-## Imports
-
- **Top-level only** — no local/inner imports. Move all imports to the top of the file.
-
-## Typing
-
- **No duck typing** — avoid `hasattr`, `getattr`, `isinstance` for type dispatch. Use proper typed interfaces, unions, or protocols.
- **Pydantic models** over dataclass, namedtuple, or raw dict for structured data.
- **No linter suppressors** — avoid `# type: ignore`, `# noqa`, `# pyright: ignore` etc. 99% of the time the right fix is fixing the type/code, not silencing the tool.
-
-## Code Structure
-
- **List comprehensions** over manual loop-and-append.
- **Early return** — guard clauses first, avoid deep nesting.
- **Flatten inline** — prefer short, concise expressions. Reduce `if/else` chains with direct returns or ternaries when readable.
- **Modular functions** — break complex logic into small, focused functions rather than long blocks with nested conditionals.
-
-## Review Checklist
-
-Before finishing, always ask:
- Can any function be split into smaller pieces?
- Is there unnecessary nesting that an early return would eliminate?
- Can any loop be a comprehension?
- Is there a simpler way to express this logic?
--- a/.claude/skills/frontend-check/SKILL.md
+++ b/.claude/skills/frontend-check/SKILL.md
@@ -1,16 +0,0 @@
---
-name: frontend-check
-description: Run the full frontend formatting, linting, and type checking suite. Ensures code quality before commits and PRs. TRIGGER when frontend TypeScript/React code has been modified and needs validation.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Frontend Check
-
-## Steps (in order)
-
-1. **Format**: `pnpm format` — NEVER run individual formatters
-2. **Lint**: `pnpm lint` — fix errors, re-run until clean
-3. **Types**: `pnpm types` — if it keeps failing after multiple attempts, stop and ask the user
--- a/.claude/skills/new-block/SKILL.md
+++ b/.claude/skills/new-block/SKILL.md
@@ -1,29 +0,0 @@
---
-name: new-block
-description: Create a new backend block following the Block SDK Guide. Guides through provider configuration, schema definition, authentication, and testing. TRIGGER when user asks to create a new block, add a new integration, or build a new node for the graph editor.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# New Block Creation
-
-Read `docs/platform/block-sdk-guide.md` first for the full guide.
-
-## Steps
-
-1. **Provider config** (if external service): create `_config.py` with `ProviderBuilder`
-2. **Block file** in `backend/blocks/` (from `autogpt_platform/backend/`):
-   - Generate a UUID once with `uuid.uuid4()`, then **hard-code that string** as `id` (IDs must be stable across imports)
-   - `Input(BlockSchema)` and `Output(BlockSchema)` classes
-   - `async def run` that `yield`s output fields
-3. **Files**: use `store_media_file()` with `"for_block_output"` for outputs
-4. **Test**: `poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[MyBlock]' -xvs`
-5. **Format**: `poetry run format`
-
-## Rules
-
- Analyze interfaces: do inputs/outputs connect well with other blocks in a graph?
- Use top-level imports, avoid duck typing
- Always use `for_block_output` for block outputs
--- a/.claude/skills/openapi-regen/SKILL.md
+++ b/.claude/skills/openapi-regen/SKILL.md
@@ -1,28 +0,0 @@
---
-name: openapi-regen
-description: Regenerate the OpenAPI spec and frontend API client. Starts the backend REST server, fetches the spec, and regenerates the typed frontend hooks. TRIGGER when API routes change, new endpoints are added, or frontend API types are stale.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# OpenAPI Spec Regeneration
-
-## Steps
-
-1. **Run end-to-end** in a single shell block (so `REST_PID` persists):
-   ```bash
-   cd autogpt_platform/backend && poetry run rest &
-   REST_PID=$!
-   WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && kill $REST_PID && exit 1; done
-   cd ../frontend && pnpm generate:api:force
-   kill $REST_PID
-   pnpm types && pnpm lint && pnpm format
-   ```
-
-## Rules
-
- Always use `pnpm generate:api:force` (not `pnpm generate:api`)
- Don't manually edit files in `src/app/api/__generated__/`
- Generated hooks follow: `use{Method}{Version}{OperationName}`
--- a/.claude/skills/pr-create/SKILL.md
+++ b/.claude/skills/pr-create/SKILL.md
@@ -1,31 +0,0 @@
---
-name: pr-create
-description: Create a pull request for the current branch. TRIGGER when user asks to create a PR, open a pull request, push changes for review, or submit work for merging.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Create Pull Request
-
-## Steps
-
-1. **Check for existing PR**: `gh pr view --json url -q .url 2>/dev/null` — if a PR already exists, output its URL and stop
-2. **Understand changes**: `git status`, `git diff dev...HEAD`, `git log dev..HEAD --oneline`
-3. **Read PR template**: `.github/PULL_REQUEST_TEMPLATE.md`
-4. **Draft PR title**: Use conventional commits format (see CLAUDE.md for types and scopes)
-5. **Fill out PR template** as the body — be thorough in the Changes section
-6. **Format first** (if relevant changes exist):
-   - Backend: `cd autogpt_platform/backend && poetry run format`
-   - Frontend: `cd autogpt_platform/frontend && pnpm format`
-   - Fix any lint errors, then commit formatting changes before pushing
-7. **Push**: `git push -u origin HEAD`
-8. **Create PR**: `gh pr create --base dev`
-9. **Output** the PR URL
-
-## Rules
-
- Always target `dev` branch
- Do NOT run tests — CI will handle that
- Use the PR template from `.github/PULL_REQUEST_TEMPLATE.md`
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -1,51 +0,0 @@
---
-name: pr-review
-description: Address all open PR review comments systematically. Fetches comments, addresses each one, reacts +1/-1, and replies when clarification is needed. Keeps iterating until all comments are addressed and CI is green. TRIGGER when user shares a PR URL, asks to address review comments, fix PR feedback, or respond to reviewer comments.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# PR Review Comment Workflow
-
-## Steps
-
-1. **Find PR**: `gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT`
-2. **Fetch comments** (all three sources):
-   - `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` (top-level reviews)
-   - `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` (inline review comments)
-   - `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` (PR conversation comments)
-3. **Skip** comments already reacted to by PR author
-4. **For each unreacted comment**:
-   - Read referenced code, make the fix (or reply if you disagree/need info)
-   - **Inline review comments** (`pulls/{N}/comments`):
-     - React: `gh api repos/.../pulls/comments/{ID}/reactions -f content="+1"` (or `-1`)
-     - Reply: `gh api repos/.../pulls/{N}/comments/{ID}/replies -f body="..."`
-   - **PR conversation comments** (`issues/{N}/comments`):
-     - React: `gh api repos/.../issues/comments/{ID}/reactions -f content="+1"` (or `-1`)
-     - No threaded replies — post a new issue comment if needed
-   - **Top-level reviews**: no reaction API — address in code, reply via issue comment if needed
-5. **Include autogpt-reviewer bot fixes** too
-6. **Format**: `cd autogpt_platform/backend && poetry run format`, `cd autogpt_platform/frontend && pnpm format`
-7. **Commit & push**
-8. **Re-fetch comments** immediately — address any new unreacted ones before waiting on CI
-9. **Stay productive while CI runs** — don't idle. In priority order:
-   - Run any pending local tests (`poetry run pytest`, e2e, etc.) and fix failures
-   - Address any remaining comments
-   - Only poll `gh pr checks {N}` as the last resort when there's truly nothing left to do
-10. **If CI fails** — fix, go back to step 6
-11. **Re-fetch comments again** after CI is green — address anything that appeared while CI was running
-12. **Done** only when: all comments reacted AND CI is green.
-
-## CRITICAL: Do Not Stop
-
-**Loop is: address → format → commit → push → re-check comments → run local tests → wait CI → re-check comments → repeat.**
-
-Never idle. If CI is running and you have nothing to address, run local tests. Waiting on CI is the last resort.
-
-## Rules
-
- One todo per comment
- For inline review comments: reply on existing threads. For PR conversation comments: post a new issue comment (API doesn't support threaded replies)
- React to every comment: +1 addressed, -1 disagreed (with explanation)
--- a/.claude/skills/worktree-setup/SKILL.md
+++ b/.claude/skills/worktree-setup/SKILL.md
@@ -1,45 +0,0 @@
---
-name: worktree-setup
-description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, generates Prisma client, and optionally starts the app (with port conflict resolution) or runs tests. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Worktree Setup
-
-## Preferred: Use Branchlet
-
-The repo has a `.branchlet.json` config — it handles env file copying, dependency installation, and Prisma generation automatically.
-
-```bash
-npm install -g branchlet                                      # install once
-branchlet create -n <name> -s <source-branch> -b <new-branch>
-branchlet list --json   # list all worktrees
-```
-
-## Manual Fallback
-
-If branchlet isn't available:
-
-1. `git worktree add ../<RepoName><N> <branch-name>`
-2. Copy `.env` files: `backend/.env`, `frontend/.env`, `autogpt_platform/.env`, `db/docker/.env`
-3. Install deps:
-   - `cd autogpt_platform/backend && poetry install && poetry run prisma generate`
-   - `cd autogpt_platform/frontend && pnpm install`
-
-## Running the App
-
-Free ports first — backend uses: 8001, 8002, 8003, 8005, 8006, 8007, 8008.
-
-```bash
-for port in 8001 8002 8003 8005 8006 8007 8008; do
-  lsof -ti :$port | xargs kill -9 2>/dev/null || true
-done
-cd <worktree>/autogpt_platform/backend && poetry run app
-```
-
-## CoPilot Testing Gotcha
-
-SDK mode spawns a Claude subprocess — **won't work inside Claude Code**. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.
--- a/.github/workflows/platform-backend-ci.yml
+++ b/.github/workflows/platform-backend-ci.yml
@@ -41,18 +41,13 @@ jobs:
        ports:
          - 6379:6379
      rabbitmq:
-        image: rabbitmq:4.1.4
+        image: rabbitmq:3.12-management
        ports:
          - 5672:5672
+          - 15672:15672
        env:
          RABBITMQ_DEFAULT_USER: ${{ env.RABBITMQ_DEFAULT_USER }}
          RABBITMQ_DEFAULT_PASS: ${{ env.RABBITMQ_DEFAULT_PASS }}
-        options: >-
-          --health-cmd "rabbitmq-diagnostics -q ping"
-          --health-interval 30s
-          --health-timeout 10s
-          --health-retries 5
-          --health-start-period 10s
      clamav:
        image: clamav/clamav-debian:latest
        ports:
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -6,16 +6,10 @@ on:
    paths:
      - ".github/workflows/platform-frontend-ci.yml"
      - "autogpt_platform/frontend/**"
-      - "autogpt_platform/backend/Dockerfile"
-      - "autogpt_platform/docker-compose.yml"
-      - "autogpt_platform/docker-compose.platform.yml"
  pull_request:
    paths:
      - ".github/workflows/platform-frontend-ci.yml"
      - "autogpt_platform/frontend/**"
-      - "autogpt_platform/backend/Dockerfile"
-      - "autogpt_platform/docker-compose.yml"
-      - "autogpt_platform/docker-compose.platform.yml"
  merge_group:
  workflow_dispatch:

@@ -149,7 +143,7 @@ jobs:
          driver-opts: network=host

      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v4
+        uses: crazy-max/ghaction-github-runtime@v3

      - name: Set up Platform - Build Docker images (with cache)
        working-directory: autogpt_platform
--- a/.gitignore
+++ b/.gitignore
@@ -180,6 +180,4 @@ autogpt_platform/backend/settings.py
 .claude/settings.local.json
 CLAUDE.local.md
 /autogpt_platform/backend/logs
-.next
-# Implementation plans (generated by AI agents)
-plans/
+.next
--- a/.nvmrc
+++ b/.nvmrc
@@ -1 +0,0 @@
-22
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,10 +1,3 @@
-default_install_hook_types:
-  - pre-commit
-  - pre-push
-  - post-checkout
-
-default_stages: [pre-commit]
-
 repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
@@ -24,7 +17,6 @@ repos:
        name: Detect secrets
        description: Detects high entropy strings that are likely to be passwords.
        files: ^autogpt_platform/
-        exclude: pnpm-lock\.yaml$
        stages: [pre-push]

  - repo: local
@@ -34,106 +26,49 @@ repos:
      - id: poetry-install
        name: Check & Install dependencies - AutoGPT Platform - Backend
        alias: poetry-install-platform-backend
+        entry: poetry -C autogpt_platform/backend install
        # include autogpt_libs source (since it's a path dependency)
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$" || exit 0;
-          poetry -C autogpt_platform/backend install
-          '
-        always_run: true
+        files: ^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - AutoGPT Platform - Libs
        alias: poetry-install-platform-libs
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/autogpt_libs/poetry\.lock$" || exit 0;
-          poetry -C autogpt_platform/autogpt_libs install
-          '
-        always_run: true
+        entry: poetry -C autogpt_platform/autogpt_libs install
+        files: ^autogpt_platform/autogpt_libs/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: pnpm-install
-        name: Check & Install dependencies - AutoGPT Platform - Frontend
-        alias: pnpm-install-platform-frontend
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/frontend/pnpm-lock\.yaml$" || exit 0;
-          pnpm --prefix autogpt_platform/frontend install
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - Classic - AutoGPT
        alias: poetry-install-classic-autogpt
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/(original_autogpt|forge)/poetry\.lock$" || exit 0;
-          poetry -C classic/original_autogpt install
-          '
+        entry: poetry -C classic/original_autogpt install
        # include forge source (since it's a path dependency)
-        always_run: true
+        files: ^classic/(original_autogpt|forge)/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - Classic - Forge
        alias: poetry-install-classic-forge
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/forge/poetry\.lock$" || exit 0;
-          poetry -C classic/forge install
-          '
-        always_run: true
+        entry: poetry -C classic/forge install
+        files: ^classic/forge/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - Classic - Benchmark
        alias: poetry-install-classic-benchmark
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/benchmark/poetry\.lock$" || exit 0;
-          poetry -C classic/benchmark install
-          '
-        always_run: true
+        entry: poetry -C classic/benchmark install
+        files: ^classic/benchmark/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

  - repo: local
    # For proper type checking, Prisma client must be up-to-date.
@@ -141,54 +76,12 @@ repos:
      - id: prisma-generate
        name: Prisma Generate - AutoGPT Platform - Backend
        alias: prisma-generate-platform-backend
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema\.prisma)$" || exit 0;
-          cd autogpt_platform/backend
-          && poetry run prisma generate
-          && poetry run gen-prisma-stub
-          '
+        entry: bash -c 'cd autogpt_platform/backend && poetry run prisma generate'
        # include everything that triggers poetry install + the prisma schema
-        always_run: true
+        files: ^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema.prisma)$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: export-api-schema
-        name: Export API schema - AutoGPT Platform - Backend -> Frontend
-        alias: export-api-schema-platform
-        entry: >
-          bash -c '
-          cd autogpt_platform/backend
-          && poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
-          && cd ../frontend
-          && pnpm prettier --write ./src/app/api/openapi.json
-          '
-        files: ^autogpt_platform/backend/
-        language: system
-        pass_filenames: false
-
-      - id: generate-api-client
-        name: Generate API client - AutoGPT Platform - Frontend
-        alias: generate-api-client-platform-frontend
-        entry: >
-          bash -c '
-          SCHEMA=autogpt_platform/frontend/src/app/api/openapi.json;
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --quiet "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF" -- "$SCHEMA" && exit 0
-          else
-            git diff --quiet HEAD -- "$SCHEMA" && exit 0
-          fi;
-          cd autogpt_platform/frontend && pnpm generate:api
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.7.2
--- a/autogpt_platform/.gitignore
+++ b/autogpt_platform/.gitignore
@@ -1,3 +1,2 @@
 *.ignore.*
-*.ign.*
-.application.logs
+*.ign.*
--- a/autogpt_platform/backend/.env.default
+++ b/autogpt_platform/backend/.env.default
@@ -190,8 +190,5 @@ ZEROBOUNCE_API_KEY=
 POSTHOG_API_KEY=
 POSTHOG_HOST=https://eu.i.posthog.com

-# Tally Form Integration (pre-populate business understanding on signup)
-TALLY_API_KEY=
-
 # Other Services
 AUTOMOD_API_KEY=
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -53,6 +53,63 @@ COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/parti
 COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
 RUN poetry run prisma generate && poetry run gen-prisma-stub

+# ============================== BACKEND SERVER ============================== #
+
+FROM debian:13-slim AS server
+
+WORKDIR /app
+
+ENV POETRY_HOME=/opt/poetry \
+    POETRY_NO_INTERACTION=1 \
+    POETRY_VIRTUALENVS_CREATE=true \
+    POETRY_VIRTUALENVS_IN_PROJECT=true \
+    DEBIAN_FRONTEND=noninteractive
+ENV PATH=/opt/poetry/bin:$PATH
+
+# Install Python, FFmpeg, ImageMagick, and CLI tools for agent use.
+# bubblewrap provides OS-level sandbox (whitelist-only FS + no network)
+# for the bash_exec MCP tool.
+# Using --no-install-recommends saves ~650MB by skipping unnecessary deps like llvm, mesa, etc.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3.13 \
+    python3-pip \
+    ffmpeg \
+    imagemagick \
+    jq \
+    ripgrep \
+    tree \
+    bubblewrap \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
+COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
+# Copy Node.js installation for Prisma
+COPY --from=builder /usr/bin/node /usr/bin/node
+COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
+COPY --from=builder /usr/bin/npm /usr/bin/npm
+COPY --from=builder /usr/bin/npx /usr/bin/npx
+COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
+
+WORKDIR /app/autogpt_platform/backend
+
+# Copy only the .venv from builder (not the entire /app directory)
+# The .venv includes the generated Prisma client
+COPY --from=builder /app/autogpt_platform/backend/.venv ./.venv
+ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
+
+# Copy dependency files + autogpt_libs (path dependency)
+COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
+COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml ./
+
+# Copy backend code + docs (for Copilot docs search)
+COPY autogpt_platform/backend ./
+COPY docs /app/docs
+RUN poetry install --no-ansi --only-root
+
+ENV PORT=8000
+
+CMD ["poetry", "run", "rest"]
+
 # =============================== DB MIGRATOR =============================== #

 # Lightweight migrate stage - only needs Prisma CLI, not full Python environment
@@ -84,75 +141,3 @@ COPY autogpt_platform/backend/schema.prisma ./
 COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
 COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
 COPY autogpt_platform/backend/migrations ./migrations
-
-# ============================== BACKEND SERVER ============================== #
-
-FROM debian:13-slim AS server
-
-WORKDIR /app
-
-ENV DEBIAN_FRONTEND=noninteractive
-
-# Install Python, FFmpeg, ImageMagick, and CLI tools for agent use.
-# bubblewrap provides OS-level sandbox (whitelist-only FS + no network)
-# for the bash_exec MCP tool (fallback when E2B is not configured).
-# Using --no-install-recommends saves ~650MB by skipping unnecessary deps like llvm, mesa, etc.
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    python3.13 \
-    python3-pip \
-    ffmpeg \
-    imagemagick \
-    jq \
-    ripgrep \
-    tree \
-    bubblewrap \
-    && rm -rf /var/lib/apt/lists/*
-
-# Copy poetry (build-time only, for `poetry install --only-root` to create entry points)
-COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
-COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
-# Copy Node.js installation for Prisma and agent-browser.
-# npm/npx are symlinks in the builder (-> ../lib/node_modules/npm/bin/*-cli.js);
-# COPY resolves them to regular files, breaking require() paths.  Recreate as
-# proper symlinks so npm/npx can find their modules.
-COPY --from=builder /usr/bin/node /usr/bin/node
-COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
-RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
-    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
-COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
-
-# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
-# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
-    libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
-    libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
-    libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
-    fonts-liberation libfontconfig1 \
-    && rm -rf /var/lib/apt/lists/* \
-    && npm install -g agent-browser \
-    && agent-browser install \
-    && rm -rf /tmp/* /root/.npm
-
-WORKDIR /app/autogpt_platform/backend
-
-# Copy only the .venv from builder (not the entire /app directory)
-# The .venv includes the generated Prisma client
-COPY --from=builder /app/autogpt_platform/backend/.venv ./.venv
-ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
-
-# Copy dependency files + autogpt_libs (path dependency)
-COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
-COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml ./
-
-# Copy backend code + docs (for Copilot docs search)
-COPY autogpt_platform/backend ./
-COPY docs /app/docs
-# Install the project package to create entry point scripts in .venv/bin/
-# (e.g., rest, executor, ws, db, scheduler, notification - see [tool.poetry.scripts])
-RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \
-    poetry install --no-ansi --only-root
-
-ENV PORT=8000
-
-CMD ["rest"]
--- a/autogpt_platform/backend/backend/api/conftest.py
+++ b/autogpt_platform/backend/backend/api/conftest.py
@@ -1,9 +1,4 @@
-"""Common test fixtures for server tests.
-
-Note: Common fixtures like test_user_id, admin_user_id, target_user_id,
-setup_test_user, and setup_admin_user are defined in the parent conftest.py
-(backend/conftest.py) and are available here automatically.
-"""
+"""Common test fixtures for server tests."""

 import pytest
 from pytest_snapshot.plugin import Snapshot
@@ -16,6 +11,54 @@ def configured_snapshot(snapshot: Snapshot) -> Snapshot:
    return snapshot


+@pytest.fixture
+def test_user_id() -> str:
+    """Test user ID fixture."""
+    return "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
+
+
+@pytest.fixture
+def admin_user_id() -> str:
+    """Admin user ID fixture."""
+    return "4e53486c-cf57-477e-ba2a-cb02dc828e1b"
+
+
+@pytest.fixture
+def target_user_id() -> str:
+    """Target user ID fixture."""
+    return "5e53486c-cf57-477e-ba2a-cb02dc828e1c"
+
+
+@pytest.fixture
+async def setup_test_user(test_user_id):
+    """Create test user in database before tests."""
+    from backend.data.user import get_or_create_user
+
+    # Create the test user in the database using JWT token format
+    user_data = {
+        "sub": test_user_id,
+        "email": "test@example.com",
+        "user_metadata": {"name": "Test User"},
+    }
+    await get_or_create_user(user_data)
+    return test_user_id
+
+
+@pytest.fixture
+async def setup_admin_user(admin_user_id):
+    """Create admin user in database before tests."""
+    from backend.data.user import get_or_create_user
+
+    # Create the admin user in the database using JWT token format
+    user_data = {
+        "sub": admin_user_id,
+        "email": "test-admin@example.com",
+        "user_metadata": {"name": "Test Admin"},
+    }
+    await get_or_create_user(user_data)
+    return admin_user_id
+
+
@pytest.fixture
 def mock_jwt_user(test_user_id):
    """Provide mock JWT payload for regular user testing."""
--- a/autogpt_platform/backend/backend/api/external/middleware.py
+++ b/autogpt_platform/backend/backend/api/external/middleware.py
@@ -88,23 +88,20 @@ async def require_auth(
    )


-def require_permission(*permissions: APIKeyPermission):
+def require_permission(permission: APIKeyPermission):
    """
-    Dependency function for checking required permissions.
-    All listed permissions must be present.
+    Dependency function for checking specific permissions
    (works with API keys and OAuth tokens)
    """

-    async def check_permissions(
+    async def check_permission(
        auth: APIAuthorizationInfo = Security(require_auth),
    ) -> APIAuthorizationInfo:
-        missing = [p for p in permissions if p not in auth.scopes]
-        if missing:
+        if permission not in auth.scopes:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
-                detail=f"Missing required permission(s): "
-                f"{', '.join(p.value for p in missing)}",
+                detail=f"Missing required permission: {permission.value}",
            )
        return auth

-    return check_permissions
+    return check_permission
--- a/autogpt_platform/backend/backend/api/external/v1/routes.py
+++ b/autogpt_platform/backend/backend/api/external/v1/routes.py
@@ -1,7 +1,7 @@
 import logging
 import urllib.parse
 from collections import defaultdict
-from typing import Annotated, Any, Optional, Sequence
+from typing import Annotated, Any, Literal, Optional, Sequence

 from fastapi import APIRouter, Body, HTTPException, Security
 from prisma.enums import AgentExecutionStatus, APIKeyPermission
@@ -9,17 +9,15 @@ from pydantic import BaseModel, Field
 from typing_extensions import TypedDict

 import backend.api.features.store.cache as store_cache
-import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
 import backend.blocks
-from backend.api.external.middleware import require_auth, require_permission
+from backend.api.external.middleware import require_permission
 from backend.data import execution as execution_db
 from backend.data import graph as graph_db
 from backend.data import user as user_db
 from backend.data.auth.base import APIAuthorizationInfo
 from backend.data.block import BlockInput, CompletedBlockOutput
 from backend.executor.utils import add_graph_execution
-from backend.integrations.webhooks.graph_lifecycle_hooks import on_graph_activate
 from backend.util.settings import Settings

 from .integrations import integrations_router
@@ -97,43 +95,6 @@ async def execute_graph_block(
    return output


-@v1_router.post(
-    path="/graphs",
-    tags=["graphs"],
-    status_code=201,
-    dependencies=[
-        Security(
-            require_permission(
-                APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY
-            )
-        )
-    ],
-)
-async def create_graph(
-    graph: graph_db.Graph,
-    auth: APIAuthorizationInfo = Security(
-        require_permission(APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY)
-    ),
-) -> graph_db.GraphModel:
-    """
-    Create a new agent graph.
-
-    The graph will be validated and assigned a new ID.
-    It is automatically added to the user's library.
-    """
-    from backend.api.features.library import db as library_db
-
-    graph_model = graph_db.make_graph_model(graph, auth.user_id)
-    graph_model.reassign_ids(user_id=auth.user_id, reassign_graph_id=True)
-    graph_model.validate_graph(for_run=False)
-
-    await graph_db.create_graph(graph_model, user_id=auth.user_id)
-    await library_db.create_library_agent(graph_model, auth.user_id)
-    activated_graph = await on_graph_activate(graph_model, user_id=auth.user_id)
-
-    return activated_graph
-
-
@v1_router.post(
    path="/graphs/{graph_id}/execute/{graph_version}",
    tags=["graphs"],
@@ -231,13 +192,13 @@ async def get_graph_execution_results(
@v1_router.get(
    path="/store/agents",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.StoreAgentsResponse,
 )
 async def get_store_agents(
    featured: bool = False,
    creator: str | None = None,
-    sorted_by: store_db.StoreAgentsSortOptions | None = None,
+    sorted_by: Literal["rating", "runs", "name", "updated_at"] | None = None,
    search_query: str | None = None,
    category: str | None = None,
    page: int = 1,
@@ -279,7 +240,7 @@ async def get_store_agents(
@v1_router.get(
    path="/store/agents/{username}/{agent_name}",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.StoreAgentDetails,
 )
 async def get_store_agent(
@@ -307,13 +268,13 @@ async def get_store_agent(
@v1_router.get(
    path="/store/creators",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.CreatorsResponse,
 )
 async def get_store_creators(
    featured: bool = False,
    search_query: str | None = None,
-    sorted_by: store_db.StoreCreatorsSortOptions | None = None,
+    sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None = None,
    page: int = 1,
    page_size: int = 20,
 ) -> store_model.CreatorsResponse:
@@ -349,7 +310,7 @@ async def get_store_creators(
@v1_router.get(
    path="/store/creators/{username}",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.CreatorDetails,
 )
 async def get_store_creator(
--- a/autogpt_platform/backend/backend/api/external/v1/tools.py
+++ b/autogpt_platform/backend/backend/api/external/v1/tools.py
@@ -15,9 +15,9 @@ from prisma.enums import APIKeyPermission
 from pydantic import BaseModel, Field

 from backend.api.external.middleware import require_permission
-from backend.copilot.model import ChatSession
-from backend.copilot.tools import find_agent_tool, run_agent_tool
-from backend.copilot.tools.models import ToolResponseBase
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools import find_agent_tool, run_agent_tool
+from backend.api.features.chat.tools.models import ToolResponseBase
 from backend.data.auth.base import APIAuthorizationInfo

 logger = logging.getLogger(__name__)
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
@@ -24,13 +24,14 @@ router = fastapi.APIRouter(
@router.get(
    "/listings",
    summary="Get Admin Listings History",
+    response_model=store_model.StoreListingsWithVersionsResponse,
 )
 async def get_admin_listings_with_versions(
    status: typing.Optional[prisma.enums.SubmissionStatus] = None,
    search: typing.Optional[str] = None,
    page: int = 1,
    page_size: int = 20,
-) -> store_model.StoreListingsWithVersionsAdminViewResponse:
+):
    """
    Get store listings with their version history for admins.

@@ -44,26 +45,36 @@ async def get_admin_listings_with_versions(
        page_size: Number of items per page

    Returns:
-        Paginated listings with their versions
+        StoreListingsWithVersionsResponse with listings and their versions
    """
-    listings = await store_db.get_admin_listings_with_versions(
-        status=status,
-        search_query=search,
-        page=page,
-        page_size=page_size,
-    )
-    return listings
+    try:
+        listings = await store_db.get_admin_listings_with_versions(
+            status=status,
+            search_query=search,
+            page=page,
+            page_size=page_size,
+        )
+        return listings
+    except Exception as e:
+        logger.exception("Error getting admin listings with versions: %s", e)
+        return fastapi.responses.JSONResponse(
+            status_code=500,
+            content={
+                "detail": "An error occurred while retrieving listings with versions"
+            },
+        )


@router.post(
    "/submissions/{store_listing_version_id}/review",
    summary="Review Store Submission",
+    response_model=store_model.StoreSubmission,
 )
 async def review_submission(
    store_listing_version_id: str,
    request: store_model.ReviewSubmissionRequest,
    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
-) -> store_model.StoreSubmissionAdminView:
+):
    """
    Review a store listing submission.

@@ -73,24 +84,31 @@ async def review_submission(
        user_id: Authenticated admin user performing the review

    Returns:
-        StoreSubmissionAdminView with updated review information
+        StoreSubmission with updated review information
    """
-    already_approved = await store_db.check_submission_already_approved(
-        store_listing_version_id=store_listing_version_id,
-    )
-    submission = await store_db.review_store_submission(
-        store_listing_version_id=store_listing_version_id,
-        is_approved=request.is_approved,
-        external_comments=request.comments,
-        internal_comments=request.internal_comments or "",
-        reviewer_id=user_id,
-    )
+    try:
+        already_approved = await store_db.check_submission_already_approved(
+            store_listing_version_id=store_listing_version_id,
+        )
+        submission = await store_db.review_store_submission(
+            store_listing_version_id=store_listing_version_id,
+            is_approved=request.is_approved,
+            external_comments=request.comments,
+            internal_comments=request.internal_comments or "",
+            reviewer_id=user_id,
+        )

-    state_changed = already_approved != request.is_approved
-    # Clear caches whenever approval state changes, since store visibility can change
-    if state_changed:
-        store_cache.clear_all_caches()
-    return submission
+        state_changed = already_approved != request.is_approved
+        # Clear caches when the request is approved as it updates what is shown on the store
+        if state_changed:
+            store_cache.clear_all_caches()
+        return submission
+    except Exception as e:
+        logger.exception("Error reviewing submission: %s", e)
+        return fastapi.responses.JSONResponse(
+            status_code=500,
+            content={"detail": "An error occurred while reviewing the submission"},
+        )


@router.get(
--- a/autogpt_platform/backend/backend/api/features/builder/db.py
+++ b/autogpt_platform/backend/backend/api/features/builder/db.py
@@ -1,17 +1,15 @@
 import logging
 from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
 from difflib import SequenceMatcher
-from typing import Any, Sequence, get_args, get_origin
+from typing import Sequence

 import prisma
-from prisma.enums import ContentType
-from prisma.models import mv_suggested_blocks

 import backend.api.features.library.db as library_db
 import backend.api.features.library.model as library_model
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
-from backend.api.features.store.hybrid_search import unified_hybrid_search
 from backend.blocks import load_all_blocks
 from backend.blocks._base import (
    AnyBlockSchema,
@@ -21,6 +19,7 @@ from backend.blocks._base import (
    BlockType,
 )
 from backend.blocks.llm import LlmModel
+from backend.data.db import query_raw_with_schema
 from backend.integrations.providers import ProviderName
 from backend.util.cache import cached
 from backend.util.models import Pagination
@@ -43,16 +42,6 @@ MAX_LIBRARY_AGENT_RESULTS = 100
 MAX_MARKETPLACE_AGENT_RESULTS = 100
 MIN_SCORE_FOR_FILTERED_RESULTS = 10.0

-# Boost blocks over marketplace agents in search results
-BLOCK_SCORE_BOOST = 50.0
-
-# Block IDs to exclude from search results
-EXCLUDED_BLOCK_IDS = frozenset(
-    {
-        "e189baac-8c20-45a1-94a7-55177ea42565",  # AgentExecutorBlock
-    }
-)
-
 SearchResultItem = BlockInfo | library_model.LibraryAgent | store_model.StoreAgent


@@ -75,8 +64,8 @@ def get_block_categories(category_blocks: int = 3) -> list[BlockCategoryResponse

    for block_type in load_all_blocks().values():
        block: AnyBlockSchema = block_type()
-        # Skip disabled and excluded blocks
-        if block.disabled or block.id in EXCLUDED_BLOCK_IDS:
+        # Skip disabled blocks
+        if block.disabled:
            continue
        # Skip blocks that don't have categories (all should have at least one)
        if not block.categories:
@@ -127,9 +116,6 @@ def get_blocks(
        # Skip disabled blocks
        if block.disabled:
            continue
-        # Skip excluded blocks
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
        # Skip blocks that don't match the category
        if category and category not in {c.name.lower() for c in block.categories}:
            continue
@@ -269,25 +255,14 @@ async def _build_cached_search_results(
        "my_agents": 0,
    }

-    # Use hybrid search when query is present, otherwise list all blocks
-    if (include_blocks or include_integrations) and normalized_query:
-        block_results, block_total, integration_total = await _hybrid_search_blocks(
-            query=search_query,
-            include_blocks=include_blocks,
-            include_integrations=include_integrations,
-        )
-        scored_items.extend(block_results)
-        total_items["blocks"] = block_total
-        total_items["integrations"] = integration_total
-    elif include_blocks or include_integrations:
-        # No query - list all blocks using in-memory approach
-        block_results, block_total, integration_total = _collect_block_results(
-            include_blocks=include_blocks,
-            include_integrations=include_integrations,
-        )
-        scored_items.extend(block_results)
-        total_items["blocks"] = block_total
-        total_items["integrations"] = integration_total
+    block_results, block_total, integration_total = _collect_block_results(
+        normalized_query=normalized_query,
+        include_blocks=include_blocks,
+        include_integrations=include_integrations,
+    )
+    scored_items.extend(block_results)
+    total_items["blocks"] = block_total
+    total_items["integrations"] = integration_total

    if include_library_agents:
        library_response = await library_db.list_library_agents(
@@ -332,14 +307,10 @@ async def _build_cached_search_results(

 def _collect_block_results(
    *,
+    normalized_query: str,
    include_blocks: bool,
    include_integrations: bool,
 ) -> tuple[list[_ScoredItem], int, int]:
-    """
-    Collect all blocks for listing (no search query).
-
-    All blocks get BLOCK_SCORE_BOOST to prioritize them over marketplace agents.
-    """
    results: list[_ScoredItem] = []
    block_count = 0
    integration_count = 0
@@ -352,10 +323,6 @@ def _collect_block_results(
        if block.disabled:
            continue

-        # Skip excluded blocks
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
-
        block_info = block.get_info()
        credentials = list(block.input_schema.get_credentials_fields().values())
        is_integration = len(credentials) > 0
@@ -365,6 +332,10 @@ def _collect_block_results(
        if not is_integration and not include_blocks:
            continue

+        score = _score_block(block, block_info, normalized_query)
+        if not _should_include_item(score, normalized_query):
+            continue
+
        filter_type: FilterType = "integrations" if is_integration else "blocks"
        if is_integration:
            integration_count += 1
@@ -375,122 +346,8 @@ def _collect_block_results(
            _ScoredItem(
                item=block_info,
                filter_type=filter_type,
-                score=BLOCK_SCORE_BOOST,
-                sort_key=block_info.name.lower(),
-            )
-        )
-
-    return results, block_count, integration_count
-
-
-async def _hybrid_search_blocks(
-    *,
-    query: str,
-    include_blocks: bool,
-    include_integrations: bool,
-) -> tuple[list[_ScoredItem], int, int]:
-    """
-    Search blocks using hybrid search with builder-specific filtering.
-
-    Uses unified_hybrid_search for semantic + lexical search, then applies
-    post-filtering for block/integration types and scoring adjustments.
-
-    Scoring:
-        - Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
-          to prioritize blocks over marketplace agents in combined results
-        - +30 for exact name match, +15 for prefix name match
-        - +20 if the block has an LlmModel field and the query matches an LLM model name
-
-    Args:
-        query: The search query string
-        include_blocks: Whether to include regular blocks
-        include_integrations: Whether to include integration blocks
-
-    Returns:
-        Tuple of (scored_items, block_count, integration_count)
-    """
-    results: list[_ScoredItem] = []
-    block_count = 0
-    integration_count = 0
-
-    if not include_blocks and not include_integrations:
-        return results, block_count, integration_count
-
-    normalized_query = query.strip().lower()
-
-    # Fetch more results to account for post-filtering
-    search_results, _ = await unified_hybrid_search(
-        query=query,
-        content_types=[ContentType.BLOCK],
-        page=1,
-        page_size=150,
-        min_score=0.10,
-    )
-
-    # Load all blocks for getting BlockInfo
-    all_blocks = load_all_blocks()
-
-    for result in search_results:
-        block_id = result["content_id"]
-
-        # Skip excluded blocks
-        if block_id in EXCLUDED_BLOCK_IDS:
-            continue
-
-        metadata = result.get("metadata", {})
-        hybrid_score = result.get("relevance", 0.0)
-
-        # Get the actual block class
-        if block_id not in all_blocks:
-            continue
-
-        block_cls = all_blocks[block_id]
-        block: AnyBlockSchema = block_cls()
-
-        if block.disabled:
-            continue
-
-        # Check block/integration filter using metadata
-        is_integration = metadata.get("is_integration", False)
-
-        if is_integration and not include_integrations:
-            continue
-        if not is_integration and not include_blocks:
-            continue
-
-        # Get block info
-        block_info = block.get_info()
-
-        # Calculate final score: scale hybrid score and add builder-specific bonuses
-        # Hybrid scores are 0-1, builder scores were 0-200+
-        # Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
-        final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
-
-        # Add LLM model match bonus
-        has_llm_field = metadata.get("has_llm_model_field", False)
-        if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
-            final_score += 20
-
-        # Add exact/prefix match bonus for deterministic tie-breaking
-        name = block_info.name.lower()
-        if name == normalized_query:
-            final_score += 30
-        elif name.startswith(normalized_query):
-            final_score += 15
-
-        # Track counts
-        filter_type: FilterType = "integrations" if is_integration else "blocks"
-        if is_integration:
-            integration_count += 1
-        else:
-            block_count += 1
-
-        results.append(
-            _ScoredItem(
-                item=block_info,
-                filter_type=filter_type,
-                score=final_score,
-                sort_key=name,
+                score=score,
+                sort_key=_get_item_name(block_info),
            )
        )

@@ -615,8 +472,6 @@ async def _get_static_counts():
        block: AnyBlockSchema = block_type()
        if block.disabled:
            continue
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue

        all_blocks += 1

@@ -643,25 +498,47 @@ async def _get_static_counts():
    }


-def _contains_type(annotation: Any, target: type) -> bool:
-    """Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
-    if annotation is target:
-        return True
-    origin = get_origin(annotation)
-    if origin is None:
-        return False
-    return any(_contains_type(arg, target) for arg in get_args(annotation))
-
-
 def _matches_llm_model(schema_cls: type[BlockSchema], query: str) -> bool:
    for field in schema_cls.model_fields.values():
-        if _contains_type(field.annotation, LlmModel):
+        if field.annotation == LlmModel:
            # Check if query matches any value in llm_models
            if any(query in name for name in llm_models):
                return True
    return False


+def _score_block(
+    block: AnyBlockSchema,
+    block_info: BlockInfo,
+    normalized_query: str,
+) -> float:
+    if not normalized_query:
+        return 0.0
+
+    name = block_info.name.lower()
+    description = block_info.description.lower()
+    score = _score_primary_fields(name, description, normalized_query)
+
+    category_text = " ".join(
+        category.get("category", "").lower() for category in block_info.categories
+    )
+    score += _score_additional_field(category_text, normalized_query, 12, 6)
+
+    credentials_info = block.input_schema.get_credentials_fields_info().values()
+    provider_names = [
+        provider.value.lower()
+        for info in credentials_info
+        for provider in info.provider
+    ]
+    provider_text = " ".join(provider_names)
+    score += _score_additional_field(provider_text, normalized_query, 15, 6)
+
+    if _matches_llm_model(block.input_schema, normalized_query):
+        score += 20
+
+    return score
+
+
 def _score_library_agent(
    agent: library_model.LibraryAgent,
    normalized_query: str,
@@ -768,20 +645,31 @@ def _get_all_providers() -> dict[ProviderName, Provider]:
    return providers


-@cached(ttl_seconds=3600, shared_cache=True)
+@cached(ttl_seconds=3600)
 async def get_suggested_blocks(count: int = 5) -> list[BlockInfo]:
-    """Return the most-executed blocks from the last 14 days.
+    suggested_blocks = []
+    # Sum the number of executions for each block type
+    # Prisma cannot group by nested relations, so we do a raw query
+    # Calculate the cutoff timestamp
+    timestamp_threshold = datetime.now(timezone.utc) - timedelta(days=30)

-    Queries the mv_suggested_blocks materialized view (refreshed hourly via pg_cron)
-    and returns the top `count` blocks sorted by execution count, excluding
-    Input/Output/Agent block types and blocks in EXCLUDED_BLOCK_IDS.
-    """
-    results = await mv_suggested_blocks.prisma().find_many()
+    results = await query_raw_with_schema(
+        """
+        SELECT
+            agent_node."agentBlockId" AS block_id,
+            COUNT(execution.id) AS execution_count
+        FROM {schema_prefix}"AgentNodeExecution" execution
+        JOIN {schema_prefix}"AgentNode" agent_node ON execution."agentNodeId" = agent_node.id
+        WHERE execution."endedTime" >= $1::timestamp
+        GROUP BY agent_node."agentBlockId"
+        ORDER BY execution_count DESC;
+        """,
+        timestamp_threshold,
+    )

    # Get the top blocks based on execution count
-    # But ignore Input, Output, Agent, and excluded blocks
+    # But ignore Input and Output blocks
    blocks: list[tuple[BlockInfo, int]] = []
-    execution_counts = {row.block_id: row.execution_count for row in results}

    for block_type in load_all_blocks().values():
        block: AnyBlockSchema = block_type()
@@ -791,9 +679,11 @@ async def get_suggested_blocks(count: int = 5) -> list[BlockInfo]:
            BlockType.AGENT,
        ):
            continue
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
-        execution_count = execution_counts.get(block.id, 0)
+        # Find the execution count for this block
+        execution_count = next(
+            (row["execution_count"] for row in results if row["block_id"] == block.id),
+            0,
+        )
        blocks.append((block.get_info(), execution_count))
    # Sort blocks by execution count
    blocks.sort(key=lambda x: x[1], reverse=True)
--- a/autogpt_platform/backend/backend/api/features/builder/model.py
+++ b/autogpt_platform/backend/backend/api/features/builder/model.py
@@ -27,6 +27,7 @@ class SearchEntry(BaseModel):

 # Suggestions
 class SuggestionsResponse(BaseModel):
+    otto_suggestions: list[str]
    recent_searches: list[SearchEntry]
    providers: list[ProviderName]
    top_blocks: list[BlockInfo]
--- a/autogpt_platform/backend/backend/api/features/builder/routes.py
+++ b/autogpt_platform/backend/backend/api/features/builder/routes.py
@@ -1,5 +1,5 @@
 import logging
-from typing import Annotated, Sequence, cast, get_args
+from typing import Annotated, Sequence

 import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
@@ -10,8 +10,6 @@ from backend.util.models import Pagination
 from . import db as builder_db
 from . import model as builder_model

-VALID_FILTER_VALUES = get_args(builder_model.FilterType)
-
 logger = logging.getLogger(__name__)

 router = fastapi.APIRouter(
@@ -51,6 +49,11 @@ async def get_suggestions(
    Get all suggestions for the Blocks Menu.
    """
    return builder_model.SuggestionsResponse(
+        otto_suggestions=[
+            "What blocks do I need to get started?",
+            "Help me create a list",
+            "Help me feed my data to Google Maps",
+        ],
        recent_searches=await builder_db.get_recent_searches(user_id),
        providers=[
            ProviderName.TWITTER,
@@ -148,7 +151,7 @@ async def get_providers(
 async def search(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
    search_query: Annotated[str | None, fastapi.Query()] = None,
-    filter: Annotated[str | None, fastapi.Query()] = None,
+    filter: Annotated[list[builder_model.FilterType] | None, fastapi.Query()] = None,
    search_id: Annotated[str | None, fastapi.Query()] = None,
    by_creator: Annotated[list[str] | None, fastapi.Query()] = None,
    page: Annotated[int, fastapi.Query()] = 1,
@@ -157,20 +160,9 @@ async def search(
    """
    Search for blocks (including integrations), marketplace agents, and user library agents.
    """
-    # Parse and validate filter parameter
-    filters: list[builder_model.FilterType]
-    if filter:
-        filter_values = [f.strip() for f in filter.split(",")]
-        invalid_filters = [f for f in filter_values if f not in VALID_FILTER_VALUES]
-        if invalid_filters:
-            raise fastapi.HTTPException(
-                status_code=400,
-                detail=f"Invalid filter value(s): {', '.join(invalid_filters)}. "
-                f"Valid values are: {', '.join(VALID_FILTER_VALUES)}",
-            )
-        filters = cast(list[builder_model.FilterType], filter_values)
-    else:
-        filters = [
+    # If no filters are provided, then we will return all types
+    if not filter:
+        filter = [
            "blocks",
            "integrations",
            "marketplace_agents",
@@ -182,7 +174,7 @@ async def search(
    cached_results = await builder_db.get_sorted_search_results(
        user_id=user_id,
        search_query=search_query,
-        filters=filters,
+        filters=filter,
        by_creator=by_creator,
    )

@@ -204,7 +196,7 @@ async def search(
        user_id,
        builder_model.SearchEntry(
            search_query=search_query,
-            filter=filters,
+            filter=filter,
            by_creator=by_creator,
            search_id=search_id,
        ),
--- a/autogpt_platform/backend/backend/api/features/chat/completion_consumer.py
+++ b/autogpt_platform/backend/backend/api/features/chat/completion_consumer.py
@@ -0,0 +1,368 @@
+"""Redis Streams consumer for operation completion messages.
+
+This module provides a consumer (ChatCompletionConsumer) that listens for
+completion notifications (OperationCompleteMessage) from external services
+(like Agent Generator) and triggers the appropriate stream registry and
+chat service updates via process_operation_success/process_operation_failure.
+
+Why Redis Streams instead of RabbitMQ?
+--------------------------------------
+While the project typically uses RabbitMQ for async task queues (e.g., execution
+queue), Redis Streams was chosen for chat completion notifications because:
+
+1. **Unified Infrastructure**: The SSE reconnection feature already uses Redis
+   Streams (via stream_registry) for message persistence and replay. Using Redis
+   Streams for completion notifications keeps all chat streaming infrastructure
+   in one system, simplifying operations and reducing cross-system coordination.
+
+2. **Message Replay**: Redis Streams support XREAD with arbitrary message IDs,
+   allowing consumers to replay missed messages after reconnection. This aligns
+   with the SSE reconnection pattern where clients can resume from last_message_id.
+
+3. **Consumer Groups with XAUTOCLAIM**: Redis consumer groups provide automatic
+   load balancing across pods with explicit message claiming (XAUTOCLAIM) for
+   recovering from dead consumers - ideal for the completion callback pattern.
+
+4. **Lower Latency**: For real-time SSE updates, Redis (already in-memory for
+   stream_registry) provides lower latency than an additional RabbitMQ hop.
+
+5. **Atomicity with Task State**: Completion processing often needs to update
+   task metadata stored in Redis. Keeping both in Redis enables simpler
+   transactional semantics without distributed coordination.
+
+The consumer uses Redis Streams with consumer groups for reliable message
+processing across multiple platform pods, with XAUTOCLAIM for reclaiming
+stale pending messages from dead consumers.
+"""
+
+import asyncio
+import logging
+import os
+import uuid
+from typing import Any
+
+import orjson
+from prisma import Prisma
+from pydantic import BaseModel
+from redis.exceptions import ResponseError
+
+from backend.data.redis_client import get_redis_async
+
+from . import stream_registry
+from .completion_handler import process_operation_failure, process_operation_success
+from .config import ChatConfig
+
+logger = logging.getLogger(__name__)
+config = ChatConfig()
+
+
+class OperationCompleteMessage(BaseModel):
+    """Message format for operation completion notifications."""
+
+    operation_id: str
+    task_id: str
+    success: bool
+    result: dict | str | None = None
+    error: str | None = None
+
+
+class ChatCompletionConsumer:
+    """Consumer for chat operation completion messages from Redis Streams.
+
+    This consumer initializes its own Prisma client in start() to ensure
+    database operations work correctly within this async context.
+
+    Uses Redis consumer groups to allow multiple platform pods to consume
+    messages reliably with automatic redelivery on failure.
+    """
+
+    def __init__(self):
+        self._consumer_task: asyncio.Task | None = None
+        self._running = False
+        self._prisma: Prisma | None = None
+        self._consumer_name = f"consumer-{uuid.uuid4().hex[:8]}"
+
+    async def start(self) -> None:
+        """Start the completion consumer."""
+        if self._running:
+            logger.warning("Completion consumer already running")
+            return
+
+        # Create consumer group if it doesn't exist
+        try:
+            redis = await get_redis_async()
+            await redis.xgroup_create(
+                config.stream_completion_name,
+                config.stream_consumer_group,
+                id="0",
+                mkstream=True,
+            )
+            logger.info(
+                f"Created consumer group '{config.stream_consumer_group}' "
+                f"on stream '{config.stream_completion_name}'"
+            )
+        except ResponseError as e:
+            if "BUSYGROUP" in str(e):
+                logger.debug(
+                    f"Consumer group '{config.stream_consumer_group}' already exists"
+                )
+            else:
+                raise
+
+        self._running = True
+        self._consumer_task = asyncio.create_task(self._consume_messages())
+        logger.info(
+            f"Chat completion consumer started (consumer: {self._consumer_name})"
+        )
+
+    async def _ensure_prisma(self) -> Prisma:
+        """Lazily initialize Prisma client on first use."""
+        if self._prisma is None:
+            database_url = os.getenv("DATABASE_URL", "postgresql://localhost:5432")
+            self._prisma = Prisma(datasource={"url": database_url})
+            await self._prisma.connect()
+            logger.info("[COMPLETION] Consumer Prisma client connected (lazy init)")
+        return self._prisma
+
+    async def stop(self) -> None:
+        """Stop the completion consumer."""
+        self._running = False
+
+        if self._consumer_task:
+            self._consumer_task.cancel()
+            try:
+                await self._consumer_task
+            except asyncio.CancelledError:
+                pass
+            self._consumer_task = None
+
+        if self._prisma:
+            await self._prisma.disconnect()
+            self._prisma = None
+            logger.info("[COMPLETION] Consumer Prisma client disconnected")
+
+        logger.info("Chat completion consumer stopped")
+
+    async def _consume_messages(self) -> None:
+        """Main message consumption loop with retry logic."""
+        max_retries = 10
+        retry_delay = 5  # seconds
+        retry_count = 0
+        block_timeout = 5000  # milliseconds
+
+        while self._running and retry_count < max_retries:
+            try:
+                redis = await get_redis_async()
+
+                # Reset retry count on successful connection
+                retry_count = 0
+
+                while self._running:
+                    # First, claim any stale pending messages from dead consumers
+                    # Redis does NOT auto-redeliver pending messages; we must explicitly
+                    # claim them using XAUTOCLAIM
+                    try:
+                        claimed_result = await redis.xautoclaim(
+                            name=config.stream_completion_name,
+                            groupname=config.stream_consumer_group,
+                            consumername=self._consumer_name,
+                            min_idle_time=config.stream_claim_min_idle_ms,
+                            start_id="0-0",
+                            count=10,
+                        )
+                        # xautoclaim returns: (next_start_id, [(id, data), ...], [deleted_ids])
+                        if claimed_result and len(claimed_result) >= 2:
+                            claimed_entries = claimed_result[1]
+                            if claimed_entries:
+                                logger.info(
+                                    f"Claimed {len(claimed_entries)} stale pending messages"
+                                )
+                                for entry_id, data in claimed_entries:
+                                    if not self._running:
+                                        return
+                                    await self._process_entry(redis, entry_id, data)
+                    except Exception as e:
+                        logger.warning(f"XAUTOCLAIM failed (non-fatal): {e}")
+
+                    # Read new messages from the stream
+                    messages = await redis.xreadgroup(
+                        groupname=config.stream_consumer_group,
+                        consumername=self._consumer_name,
+                        streams={config.stream_completion_name: ">"},
+                        block=block_timeout,
+                        count=10,
+                    )
+
+                    if not messages:
+                        continue
+
+                    for stream_name, entries in messages:
+                        for entry_id, data in entries:
+                            if not self._running:
+                                return
+                            await self._process_entry(redis, entry_id, data)
+
+            except asyncio.CancelledError:
+                logger.info("Consumer cancelled")
+                return
+            except Exception as e:
+                retry_count += 1
+                logger.error(
+                    f"Consumer error (retry {retry_count}/{max_retries}): {e}",
+                    exc_info=True,
+                )
+                if self._running and retry_count < max_retries:
+                    await asyncio.sleep(retry_delay)
+                else:
+                    logger.error("Max retries reached, stopping consumer")
+                    return
+
+    async def _process_entry(
+        self, redis: Any, entry_id: str, data: dict[str, Any]
+    ) -> None:
+        """Process a single stream entry and acknowledge it on success.
+
+        Args:
+            redis: Redis client connection
+            entry_id: The stream entry ID
+            data: The entry data dict
+        """
+        try:
+            # Handle the message
+            message_data = data.get("data")
+            if message_data:
+                await self._handle_message(
+                    message_data.encode()
+                    if isinstance(message_data, str)
+                    else message_data
+                )
+
+            # Acknowledge the message after successful processing
+            await redis.xack(
+                config.stream_completion_name,
+                config.stream_consumer_group,
+                entry_id,
+            )
+        except Exception as e:
+            logger.error(
+                f"Error processing completion message {entry_id}: {e}",
+                exc_info=True,
+            )
+            # Message remains in pending state and will be claimed by
+            # XAUTOCLAIM after min_idle_time expires
+
+    async def _handle_message(self, body: bytes) -> None:
+        """Handle a completion message using our own Prisma client."""
+        try:
+            data = orjson.loads(body)
+            message = OperationCompleteMessage(**data)
+        except Exception as e:
+            logger.error(f"Failed to parse completion message: {e}")
+            return
+
+        logger.info(
+            f"[COMPLETION] Received completion for operation {message.operation_id} "
+            f"(task_id={message.task_id}, success={message.success})"
+        )
+
+        # Find task in registry
+        task = await stream_registry.find_task_by_operation_id(message.operation_id)
+        if task is None:
+            task = await stream_registry.get_task(message.task_id)
+
+        if task is None:
+            logger.warning(
+                f"[COMPLETION] Task not found for operation {message.operation_id} "
+                f"(task_id={message.task_id})"
+            )
+            return
+
+        logger.info(
+            f"[COMPLETION] Found task: task_id={task.task_id}, "
+            f"session_id={task.session_id}, tool_call_id={task.tool_call_id}"
+        )
+
+        # Guard against empty task fields
+        if not task.task_id or not task.session_id or not task.tool_call_id:
+            logger.error(
+                f"[COMPLETION] Task has empty critical fields! "
+                f"task_id={task.task_id!r}, session_id={task.session_id!r}, "
+                f"tool_call_id={task.tool_call_id!r}"
+            )
+            return
+
+        if message.success:
+            await self._handle_success(task, message)
+        else:
+            await self._handle_failure(task, message)
+
+    async def _handle_success(
+        self,
+        task: stream_registry.ActiveTask,
+        message: OperationCompleteMessage,
+    ) -> None:
+        """Handle successful operation completion."""
+        prisma = await self._ensure_prisma()
+        await process_operation_success(task, message.result, prisma)
+
+    async def _handle_failure(
+        self,
+        task: stream_registry.ActiveTask,
+        message: OperationCompleteMessage,
+    ) -> None:
+        """Handle failed operation completion."""
+        prisma = await self._ensure_prisma()
+        await process_operation_failure(task, message.error, prisma)
+
+
+# Module-level consumer instance
+_consumer: ChatCompletionConsumer | None = None
+
+
+async def start_completion_consumer() -> None:
+    """Start the global completion consumer."""
+    global _consumer
+    if _consumer is None:
+        _consumer = ChatCompletionConsumer()
+    await _consumer.start()
+
+
+async def stop_completion_consumer() -> None:
+    """Stop the global completion consumer."""
+    global _consumer
+    if _consumer:
+        await _consumer.stop()
+        _consumer = None
+
+
+async def publish_operation_complete(
+    operation_id: str,
+    task_id: str,
+    success: bool,
+    result: dict | str | None = None,
+    error: str | None = None,
+) -> None:
+    """Publish an operation completion message to Redis Streams.
+
+    Args:
+        operation_id: The operation ID that completed.
+        task_id: The task ID associated with the operation.
+        success: Whether the operation succeeded.
+        result: The result data (for success).
+        error: The error message (for failure).
+    """
+    message = OperationCompleteMessage(
+        operation_id=operation_id,
+        task_id=task_id,
+        success=success,
+        result=result,
+        error=error,
+    )
+
+    redis = await get_redis_async()
+    await redis.xadd(
+        config.stream_completion_name,
+        {"data": message.model_dump_json()},
+        maxlen=config.stream_max_length,
+    )
+    logger.info(f"Published completion for operation {operation_id}")
--- a/autogpt_platform/backend/backend/api/features/chat/completion_handler.py
+++ b/autogpt_platform/backend/backend/api/features/chat/completion_handler.py
@@ -0,0 +1,344 @@
+"""Shared completion handling for operation success and failure.
+
+This module provides common logic for handling operation completion from both:
+- The Redis Streams consumer (completion_consumer.py)
+- The HTTP webhook endpoint (routes.py)
+"""
+
+import logging
+from typing import Any
+
+import orjson
+from prisma import Prisma
+
+from . import service as chat_service
+from . import stream_registry
+from .response_model import StreamError, StreamToolOutputAvailable
+from .tools.models import ErrorResponse
+
+logger = logging.getLogger(__name__)
+
+# Tools that produce agent_json that needs to be saved to library
+AGENT_GENERATION_TOOLS = {"create_agent", "edit_agent"}
+
+# Keys that should be stripped from agent_json when returning in error responses
+SENSITIVE_KEYS = frozenset(
+    {
+        "api_key",
+        "apikey",
+        "api_secret",
+        "password",
+        "secret",
+        "credentials",
+        "credential",
+        "token",
+        "access_token",
+        "refresh_token",
+        "private_key",
+        "privatekey",
+        "auth",
+        "authorization",
+    }
+)
+
+
+def _sanitize_agent_json(obj: Any) -> Any:
+    """Recursively sanitize agent_json by removing sensitive keys.
+
+    Args:
+        obj: The object to sanitize (dict, list, or primitive)
+
+    Returns:
+        Sanitized copy with sensitive keys removed/redacted
+    """
+    if isinstance(obj, dict):
+        return {
+            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else _sanitize_agent_json(v)
+            for k, v in obj.items()
+        }
+    elif isinstance(obj, list):
+        return [_sanitize_agent_json(item) for item in obj]
+    else:
+        return obj
+
+
+class ToolMessageUpdateError(Exception):
+    """Raised when updating a tool message in the database fails."""
+
+    pass
+
+
+async def _update_tool_message(
+    session_id: str,
+    tool_call_id: str,
+    content: str,
+    prisma_client: Prisma | None,
+) -> None:
+    """Update tool message in database.
+
+    Args:
+        session_id: The session ID
+        tool_call_id: The tool call ID to update
+        content: The new content for the message
+        prisma_client: Optional Prisma client. If None, uses chat_service.
+
+    Raises:
+        ToolMessageUpdateError: If the database update fails. The caller should
+            handle this to avoid marking the task as completed with inconsistent state.
+    """
+    try:
+        if prisma_client:
+            # Use provided Prisma client (for consumer with its own connection)
+            updated_count = await prisma_client.chatmessage.update_many(
+                where={
+                    "sessionId": session_id,
+                    "toolCallId": tool_call_id,
+                },
+                data={"content": content},
+            )
+            # Check if any rows were updated - 0 means message not found
+            if updated_count == 0:
+                raise ToolMessageUpdateError(
+                    f"No message found with tool_call_id={tool_call_id} in session {session_id}"
+                )
+        else:
+            # Use service function (for webhook endpoint)
+            await chat_service._update_pending_operation(
+                session_id=session_id,
+                tool_call_id=tool_call_id,
+                result=content,
+            )
+    except ToolMessageUpdateError:
+        raise
+    except Exception as e:
+        logger.error(f"[COMPLETION] Failed to update tool message: {e}", exc_info=True)
+        raise ToolMessageUpdateError(
+            f"Failed to update tool message for tool_call_id={tool_call_id}: {e}"
+        ) from e
+
+
+def serialize_result(result: dict | list | str | int | float | bool | None) -> str:
+    """Serialize result to JSON string with sensible defaults.
+
+    Args:
+        result: The result to serialize. Can be a dict, list, string,
+            number, boolean, or None.
+
+    Returns:
+        JSON string representation of the result. Returns '{"status": "completed"}'
+        only when result is explicitly None.
+    """
+    if isinstance(result, str):
+        return result
+    if result is None:
+        return '{"status": "completed"}'
+    return orjson.dumps(result).decode("utf-8")
+
+
+async def _save_agent_from_result(
+    result: dict[str, Any],
+    user_id: str | None,
+    tool_name: str,
+) -> dict[str, Any]:
+    """Save agent to library if result contains agent_json.
+
+    Args:
+        result: The result dict that may contain agent_json
+        user_id: The user ID to save the agent for
+        tool_name: The tool name (create_agent or edit_agent)
+
+    Returns:
+        Updated result dict with saved agent details, or original result if no agent_json
+    """
+    if not user_id:
+        logger.warning("[COMPLETION] Cannot save agent: no user_id in task")
+        return result
+
+    agent_json = result.get("agent_json")
+    if not agent_json:
+        logger.warning(
+            f"[COMPLETION] {tool_name} completed but no agent_json in result"
+        )
+        return result
+
+    try:
+        from .tools.agent_generator import save_agent_to_library
+
+        is_update = tool_name == "edit_agent"
+        created_graph, library_agent = await save_agent_to_library(
+            agent_json, user_id, is_update=is_update
+        )
+
+        logger.info(
+            f"[COMPLETION] Saved agent '{created_graph.name}' to library "
+            f"(graph_id={created_graph.id}, library_agent_id={library_agent.id})"
+        )
+
+        # Return a response similar to AgentSavedResponse
+        return {
+            "type": "agent_saved",
+            "message": f"Agent '{created_graph.name}' has been saved to your library!",
+            "agent_id": created_graph.id,
+            "agent_name": created_graph.name,
+            "library_agent_id": library_agent.id,
+            "library_agent_link": f"/library/agents/{library_agent.id}",
+            "agent_page_link": f"/build?flowID={created_graph.id}",
+        }
+    except Exception as e:
+        logger.error(
+            f"[COMPLETION] Failed to save agent to library: {e}",
+            exc_info=True,
+        )
+        # Return error but don't fail the whole operation
+        # Sanitize agent_json to remove sensitive keys before returning
+        return {
+            "type": "error",
+            "message": f"Agent was generated but failed to save: {str(e)}",
+            "error": str(e),
+            "agent_json": _sanitize_agent_json(agent_json),
+        }
+
+
+async def process_operation_success(
+    task: stream_registry.ActiveTask,
+    result: dict | str | None,
+    prisma_client: Prisma | None = None,
+) -> None:
+    """Handle successful operation completion.
+
+    Publishes the result to the stream registry, updates the database,
+    generates LLM continuation, and marks the task as completed.
+
+    Args:
+        task: The active task that completed
+        result: The result data from the operation
+        prisma_client: Optional Prisma client for database operations.
+            If None, uses chat_service._update_pending_operation instead.
+
+    Raises:
+        ToolMessageUpdateError: If the database update fails. The task will be
+            marked as failed instead of completed to avoid inconsistent state.
+    """
+    # For agent generation tools, save the agent to library
+    if task.tool_name in AGENT_GENERATION_TOOLS and isinstance(result, dict):
+        result = await _save_agent_from_result(result, task.user_id, task.tool_name)
+
+    # Serialize result for output (only substitute default when result is exactly None)
+    result_output = result if result is not None else {"status": "completed"}
+    output_str = (
+        result_output
+        if isinstance(result_output, str)
+        else orjson.dumps(result_output).decode("utf-8")
+    )
+
+    # Publish result to stream registry
+    await stream_registry.publish_chunk(
+        task.task_id,
+        StreamToolOutputAvailable(
+            toolCallId=task.tool_call_id,
+            toolName=task.tool_name,
+            output=output_str,
+            success=True,
+        ),
+    )
+
+    # Update pending operation in database
+    # If this fails, we must not continue to mark the task as completed
+    result_str = serialize_result(result)
+    try:
+        await _update_tool_message(
+            session_id=task.session_id,
+            tool_call_id=task.tool_call_id,
+            content=result_str,
+            prisma_client=prisma_client,
+        )
+    except ToolMessageUpdateError:
+        # DB update failed - mark task as failed to avoid inconsistent state
+        logger.error(
+            f"[COMPLETION] DB update failed for task {task.task_id}, "
+            "marking as failed instead of completed"
+        )
+        await stream_registry.publish_chunk(
+            task.task_id,
+            StreamError(errorText="Failed to save operation result to database"),
+        )
+        await stream_registry.mark_task_completed(task.task_id, status="failed")
+        raise
+
+    # Generate LLM continuation with streaming
+    try:
+        await chat_service._generate_llm_continuation_with_streaming(
+            session_id=task.session_id,
+            user_id=task.user_id,
+            task_id=task.task_id,
+        )
+    except Exception as e:
+        logger.error(
+            f"[COMPLETION] Failed to generate LLM continuation: {e}",
+            exc_info=True,
+        )
+
+    # Mark task as completed and release Redis lock
+    await stream_registry.mark_task_completed(task.task_id, status="completed")
+    try:
+        await chat_service._mark_operation_completed(task.tool_call_id)
+    except Exception as e:
+        logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
+
+    logger.info(
+        f"[COMPLETION] Successfully processed completion for task {task.task_id}"
+    )
+
+
+async def process_operation_failure(
+    task: stream_registry.ActiveTask,
+    error: str | None,
+    prisma_client: Prisma | None = None,
+) -> None:
+    """Handle failed operation completion.
+
+    Publishes the error to the stream registry, updates the database with
+    the error response, and marks the task as failed.
+
+    Args:
+        task: The active task that failed
+        error: The error message from the operation
+        prisma_client: Optional Prisma client for database operations.
+            If None, uses chat_service._update_pending_operation instead.
+    """
+    error_msg = error or "Operation failed"
+
+    # Publish error to stream registry
+    await stream_registry.publish_chunk(
+        task.task_id,
+        StreamError(errorText=error_msg),
+    )
+
+    # Update pending operation with error
+    # If this fails, we still continue to mark the task as failed
+    error_response = ErrorResponse(
+        message=error_msg,
+        error=error,
+    )
+    try:
+        await _update_tool_message(
+            session_id=task.session_id,
+            tool_call_id=task.tool_call_id,
+            content=error_response.model_dump_json(),
+            prisma_client=prisma_client,
+        )
+    except ToolMessageUpdateError:
+        # DB update failed - log but continue with cleanup
+        logger.error(
+            f"[COMPLETION] DB update failed while processing failure for task {task.task_id}, "
+            "continuing with cleanup"
+        )
+
+    # Mark task as failed and release Redis lock
+    await stream_registry.mark_task_completed(task.task_id, status="failed")
+    try:
+        await chat_service._mark_operation_completed(task.tool_call_id)
+    except Exception as e:
+        logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
+
+    logger.info(f"[COMPLETION] Processed failure for task {task.task_id}: {error_msg}")
--- a/autogpt_platform/backend/backend/api/features/chat/config.py
+++ b/autogpt_platform/backend/backend/api/features/chat/config.py
@@ -1,13 +1,10 @@
 """Configuration management for chat system."""

 import os
-from typing import Literal

 from pydantic import Field, field_validator
 from pydantic_settings import BaseSettings

-from backend.util.clients import OPENROUTER_BASE_URL
-

 class ChatConfig(BaseSettings):
    """Configuration for the chat system."""
@@ -22,41 +19,70 @@ class ChatConfig(BaseSettings):
    )
    api_key: str | None = Field(default=None, description="OpenAI API key")
    base_url: str | None = Field(
-        default=OPENROUTER_BASE_URL,
+        default="https://openrouter.ai/api/v1",
        description="Base URL for API (e.g., for OpenRouter)",
    )

    # Session TTL Configuration - 12 hours
    session_ttl: int = Field(default=43200, description="Session TTL in seconds")

+    # Streaming Configuration
+    stream_timeout: int = Field(default=300, description="Stream timeout in seconds")
+    max_retries: int = Field(
+        default=3,
+        description="Max retries for fallback path (SDK handles retries internally)",
+    )
    max_agent_runs: int = Field(default=30, description="Maximum number of agent runs")
    max_agent_schedules: int = Field(
        default=30, description="Maximum number of agent schedules"
    )

+    # Long-running operation configuration
+    long_running_operation_ttl: int = Field(
+        default=600,
+        description="TTL in seconds for long-running operation tracking in Redis (safety net if pod dies)",
+    )
+
    # Stream registry configuration for SSE reconnection
    stream_ttl: int = Field(
        default=3600,
        description="TTL in seconds for stream data in Redis (1 hour)",
    )
-    stream_lock_ttl: int = Field(
-        default=120,
-        description="TTL in seconds for stream lock (2 minutes). Short timeout allows "
-        "reconnection after refresh/crash without long waits.",
-    )
    stream_max_length: int = Field(
        default=10000,
        description="Maximum number of messages to store per stream",
    )

-    # Redis key prefixes for stream registry
-    session_meta_prefix: str = Field(
-        default="chat:task:meta:",
-        description="Prefix for session metadata hash keys",
+    # Redis Streams configuration for completion consumer
+    stream_completion_name: str = Field(
+        default="chat:completions",
+        description="Redis Stream name for operation completions",
    )
-    turn_stream_prefix: str = Field(
+    stream_consumer_group: str = Field(
+        default="chat_consumers",
+        description="Consumer group name for completion stream",
+    )
+    stream_claim_min_idle_ms: int = Field(
+        default=60000,
+        description="Minimum idle time in milliseconds before claiming pending messages from dead consumers",
+    )
+
+    # Redis key prefixes for stream registry
+    task_meta_prefix: str = Field(
+        default="chat:task:meta:",
+        description="Prefix for task metadata hash keys",
+    )
+    task_stream_prefix: str = Field(
        default="chat:stream:",
-        description="Prefix for turn message stream keys",
+        description="Prefix for task message stream keys",
+    )
+    task_op_prefix: str = Field(
+        default="chat:task:op:",
+        description="Prefix for operation ID to task ID mapping keys",
+    )
+    internal_api_key: str | None = Field(
+        default=None,
+        description="API key for internal webhook callbacks (env: CHAT_INTERNAL_API_KEY)",
    )

    # Langfuse Prompt Management Configuration
@@ -65,15 +91,11 @@ class ChatConfig(BaseSettings):
        default="CoPilot Prompt",
        description="Name of the prompt in Langfuse to fetch",
    )
-    langfuse_prompt_cache_ttl: int = Field(
-        default=300,
-        description="Cache TTL in seconds for Langfuse prompt (0 to disable caching)",
-    )

    # Claude Agent SDK Configuration
    use_claude_agent_sdk: bool = Field(
        default=True,
-        description="Use Claude Agent SDK (True) or OpenAI-compatible LLM baseline (False)",
+        description="Use Claude Agent SDK for chat completions",
    )
    claude_agent_model: str | None = Field(
        default=None,
@@ -87,88 +109,25 @@ class ChatConfig(BaseSettings):
    )
    claude_agent_max_subtasks: int = Field(
        default=10,
-        description="Max number of concurrent sub-agent Tasks the SDK can run per session.",
+        description="Max number of sub-agent Tasks the SDK can spawn per session.",
    )
    claude_agent_use_resume: bool = Field(
        default=True,
        description="Use --resume for multi-turn conversations instead of "
        "history compression. Falls back to compression when unavailable.",
    )
-    use_claude_code_subscription: bool = Field(
-        default=False,
-        description="For personal/dev use: use Claude Code CLI subscription auth instead of API keys. Requires `claude login` on the host. Only works with SDK mode.",
-    )

-    # E2B Sandbox Configuration
-    use_e2b_sandbox: bool = Field(
+    # Extended thinking configuration for Claude models
+    thinking_enabled: bool = Field(
        default=True,
-        description="Use E2B cloud sandboxes for persistent bash/python execution. "
-        "When enabled, bash_exec routes commands to E2B and SDK file tools "
-        "operate directly on the sandbox via E2B's filesystem API.",
+        description="Enable adaptive thinking for Claude models via OpenRouter",
    )
-    e2b_api_key: str | None = Field(
-        default=None,
-        description="E2B API key. Falls back to E2B_API_KEY environment variable.",
-    )
-    e2b_sandbox_template: str = Field(
-        default="base",
-        description="E2B sandbox template to use for copilot sessions.",
-    )
-    e2b_sandbox_timeout: int = Field(
-        default=10800,  # 3 hours — wall-clock timeout, not idle; explicit pause is primary
-        description="E2B sandbox running-time timeout (seconds). "
-        "E2B timeout is wall-clock (not idle). Explicit per-turn pause is the primary "
-        "mechanism; this is the safety net.",
-    )
-    e2b_sandbox_on_timeout: Literal["kill", "pause"] = Field(
-        default="pause",
-        description="E2B lifecycle action on timeout: 'pause' (default, free) or 'kill'.",
-    )
-
-    @property
-    def e2b_active(self) -> bool:
-        """True when E2B is enabled and the API key is present.
-
-        Single source of truth for "should we use E2B right now?".
-        Prefer this over combining ``use_e2b_sandbox`` and ``e2b_api_key``
-        separately at call sites.
-        """
-        return self.use_e2b_sandbox and bool(self.e2b_api_key)
-
-    @property
-    def active_e2b_api_key(self) -> str | None:
-        """Return the E2B API key when E2B is enabled and configured, else None.
-
-        Combines the ``use_e2b_sandbox`` flag check and key presence into one.
-        Use in callers::
-
-            if api_key := config.active_e2b_api_key:
-                # E2B is active; api_key is narrowed to str
-        """
-        return self.e2b_api_key if self.e2b_active else None
-
-    @field_validator("use_e2b_sandbox", mode="before")
-    @classmethod
-    def get_use_e2b_sandbox(cls, v):
-        """Get use_e2b_sandbox from environment if not provided."""
-        env_val = os.getenv("CHAT_USE_E2B_SANDBOX", "").lower()
-        if env_val:
-            return env_val in ("true", "1", "yes", "on")
-        return True if v is None else v
-
-    @field_validator("e2b_api_key", mode="before")
-    @classmethod
-    def get_e2b_api_key(cls, v):
-        """Get E2B API key from environment if not provided."""
-        if not v:
-            v = os.getenv("CHAT_E2B_API_KEY") or os.getenv("E2B_API_KEY")
-        return v

    @field_validator("api_key", mode="before")
    @classmethod
    def get_api_key(cls, v):
        """Get API key from environment if not provided."""
-        if not v:
+        if v is None:
            # Try to get from environment variables
            # First check for CHAT_API_KEY (Pydantic prefix)
            v = os.getenv("CHAT_API_KEY")
@@ -178,16 +137,13 @@ class ChatConfig(BaseSettings):
            if not v:
                # Fall back to OPENAI_API_KEY
                v = os.getenv("OPENAI_API_KEY")
-            # Note: ANTHROPIC_API_KEY is intentionally NOT included here.
-            # The SDK CLI picks it up from the env directly. Including it
-            # would pair it with the OpenRouter base_url, causing auth failures.
        return v

    @field_validator("base_url", mode="before")
    @classmethod
    def get_base_url(cls, v):
        """Get base URL from environment if not provided."""
-        if not v:
+        if v is None:
            # Check for OpenRouter or custom base URL
            v = os.getenv("CHAT_BASE_URL")
            if not v:
@@ -195,7 +151,15 @@ class ChatConfig(BaseSettings):
            if not v:
                v = os.getenv("OPENAI_BASE_URL")
            if not v:
-                v = OPENROUTER_BASE_URL
+                v = "https://openrouter.ai/api/v1"
+        return v
+
+    @field_validator("internal_api_key", mode="before")
+    @classmethod
+    def get_internal_api_key(cls, v):
+        """Get internal API key from environment if not provided."""
+        if v is None:
+            v = os.getenv("CHAT_INTERNAL_API_KEY")
        return v

    @field_validator("use_claude_agent_sdk", mode="before")
@@ -209,15 +173,6 @@ class ChatConfig(BaseSettings):
        # Default to True (SDK enabled by default)
        return True if v is None else v

-    @field_validator("use_claude_code_subscription", mode="before")
-    @classmethod
-    def get_use_claude_code_subscription(cls, v):
-        """Get use_claude_code_subscription from environment if not provided."""
-        env_val = os.getenv("CHAT_USE_CLAUDE_CODE_SUBSCRIPTION", "").lower()
-        if env_val:
-            return env_val in ("true", "1", "yes", "on")
-        return False if v is None else v
-
    # Prompt paths for different contexts
    PROMPT_PATHS: dict[str, str] = {
        "default": "prompts/chat_system.md",
--- a/autogpt_platform/backend/backend/api/features/chat/db.py
+++ b/autogpt_platform/backend/backend/api/features/chat/db.py
@@ -0,0 +1,288 @@
+"""Database operations for chat sessions."""
+
+import asyncio
+import logging
+from datetime import UTC, datetime
+from typing import Any, cast
+
+from prisma.models import ChatMessage as PrismaChatMessage
+from prisma.models import ChatSession as PrismaChatSession
+from prisma.types import (
+    ChatMessageCreateInput,
+    ChatSessionCreateInput,
+    ChatSessionUpdateInput,
+    ChatSessionWhereInput,
+)
+
+from backend.data.db import transaction
+from backend.util.json import SafeJson
+
+logger = logging.getLogger(__name__)
+
+
+async def get_chat_session(session_id: str) -> PrismaChatSession | None:
+    """Get a chat session by ID from the database."""
+    session = await PrismaChatSession.prisma().find_unique(
+        where={"id": session_id},
+        include={"Messages": True},
+    )
+    if session and session.Messages:
+        # Sort messages by sequence in Python - Prisma Python client doesn't support
+        # order_by in include clauses (unlike Prisma JS), so we sort after fetching
+        session.Messages.sort(key=lambda m: m.sequence)
+    return session
+
+
+async def create_chat_session(
+    session_id: str,
+    user_id: str,
+) -> PrismaChatSession:
+    """Create a new chat session in the database."""
+    data = ChatSessionCreateInput(
+        id=session_id,
+        userId=user_id,
+        credentials=SafeJson({}),
+        successfulAgentRuns=SafeJson({}),
+        successfulAgentSchedules=SafeJson({}),
+    )
+    return await PrismaChatSession.prisma().create(data=data)
+
+
+async def update_chat_session(
+    session_id: str,
+    credentials: dict[str, Any] | None = None,
+    successful_agent_runs: dict[str, Any] | None = None,
+    successful_agent_schedules: dict[str, Any] | None = None,
+    total_prompt_tokens: int | None = None,
+    total_completion_tokens: int | None = None,
+    title: str | None = None,
+) -> PrismaChatSession | None:
+    """Update a chat session's metadata."""
+    data: ChatSessionUpdateInput = {"updatedAt": datetime.now(UTC)}
+
+    if credentials is not None:
+        data["credentials"] = SafeJson(credentials)
+    if successful_agent_runs is not None:
+        data["successfulAgentRuns"] = SafeJson(successful_agent_runs)
+    if successful_agent_schedules is not None:
+        data["successfulAgentSchedules"] = SafeJson(successful_agent_schedules)
+    if total_prompt_tokens is not None:
+        data["totalPromptTokens"] = total_prompt_tokens
+    if total_completion_tokens is not None:
+        data["totalCompletionTokens"] = total_completion_tokens
+    if title is not None:
+        data["title"] = title
+
+    session = await PrismaChatSession.prisma().update(
+        where={"id": session_id},
+        data=data,
+        include={"Messages": True},
+    )
+    if session and session.Messages:
+        # Sort in Python - Prisma Python doesn't support order_by in include clauses
+        session.Messages.sort(key=lambda m: m.sequence)
+    return session
+
+
+async def add_chat_message(
+    session_id: str,
+    role: str,
+    sequence: int,
+    content: str | None = None,
+    name: str | None = None,
+    tool_call_id: str | None = None,
+    refusal: str | None = None,
+    tool_calls: list[dict[str, Any]] | None = None,
+    function_call: dict[str, Any] | None = None,
+) -> PrismaChatMessage:
+    """Add a message to a chat session."""
+    # Build input dict dynamically rather than using ChatMessageCreateInput directly
+    # because Prisma's TypedDict validation rejects optional fields set to None.
+    # We only include fields that have values, then cast at the end.
+    data: dict[str, Any] = {
+        "Session": {"connect": {"id": session_id}},
+        "role": role,
+        "sequence": sequence,
+    }
+
+    # Add optional string fields
+    if content is not None:
+        data["content"] = content
+    if name is not None:
+        data["name"] = name
+    if tool_call_id is not None:
+        data["toolCallId"] = tool_call_id
+    if refusal is not None:
+        data["refusal"] = refusal
+
+    # Add optional JSON fields only when they have values
+    if tool_calls is not None:
+        data["toolCalls"] = SafeJson(tool_calls)
+    if function_call is not None:
+        data["functionCall"] = SafeJson(function_call)
+
+    # Run message create and session timestamp update in parallel for lower latency
+    _, message = await asyncio.gather(
+        PrismaChatSession.prisma().update(
+            where={"id": session_id},
+            data={"updatedAt": datetime.now(UTC)},
+        ),
+        PrismaChatMessage.prisma().create(data=cast(ChatMessageCreateInput, data)),
+    )
+    return message
+
+
+async def add_chat_messages_batch(
+    session_id: str,
+    messages: list[dict[str, Any]],
+    start_sequence: int,
+) -> list[PrismaChatMessage]:
+    """Add multiple messages to a chat session in a batch.
+
+    Uses a transaction for atomicity - if any message creation fails,
+    the entire batch is rolled back.
+    """
+    if not messages:
+        return []
+
+    created_messages = []
+
+    async with transaction() as tx:
+        for i, msg in enumerate(messages):
+            # Build input dict dynamically rather than using ChatMessageCreateInput
+            # directly because Prisma's TypedDict validation rejects optional fields
+            # set to None. We only include fields that have values, then cast.
+            data: dict[str, Any] = {
+                "Session": {"connect": {"id": session_id}},
+                "role": msg["role"],
+                "sequence": start_sequence + i,
+            }
+
+            # Add optional string fields
+            if msg.get("content") is not None:
+                data["content"] = msg["content"]
+            if msg.get("name") is not None:
+                data["name"] = msg["name"]
+            if msg.get("tool_call_id") is not None:
+                data["toolCallId"] = msg["tool_call_id"]
+            if msg.get("refusal") is not None:
+                data["refusal"] = msg["refusal"]
+
+            # Add optional JSON fields only when they have values
+            if msg.get("tool_calls") is not None:
+                data["toolCalls"] = SafeJson(msg["tool_calls"])
+            if msg.get("function_call") is not None:
+                data["functionCall"] = SafeJson(msg["function_call"])
+
+            created = await PrismaChatMessage.prisma(tx).create(
+                data=cast(ChatMessageCreateInput, data)
+            )
+            created_messages.append(created)
+
+        # Update session's updatedAt timestamp within the same transaction.
+        # Note: Token usage (total_prompt_tokens, total_completion_tokens) is updated
+        # separately via update_chat_session() after streaming completes.
+        await PrismaChatSession.prisma(tx).update(
+            where={"id": session_id},
+            data={"updatedAt": datetime.now(UTC)},
+        )
+
+    return created_messages
+
+
+async def get_user_chat_sessions(
+    user_id: str,
+    limit: int = 50,
+    offset: int = 0,
+) -> list[PrismaChatSession]:
+    """Get chat sessions for a user, ordered by most recent."""
+    return await PrismaChatSession.prisma().find_many(
+        where={"userId": user_id},
+        order={"updatedAt": "desc"},
+        take=limit,
+        skip=offset,
+    )
+
+
+async def get_user_session_count(user_id: str) -> int:
+    """Get the total number of chat sessions for a user."""
+    return await PrismaChatSession.prisma().count(where={"userId": user_id})
+
+
+async def delete_chat_session(session_id: str, user_id: str | None = None) -> bool:
+    """Delete a chat session and all its messages.
+
+    Args:
+        session_id: The session ID to delete.
+        user_id: If provided, validates that the session belongs to this user
+            before deletion. This prevents unauthorized deletion of other
+            users' sessions.
+
+    Returns:
+        True if deleted successfully, False otherwise.
+    """
+    try:
+        # Build typed where clause with optional user_id validation
+        where_clause: ChatSessionWhereInput = {"id": session_id}
+        if user_id is not None:
+            where_clause["userId"] = user_id
+
+        result = await PrismaChatSession.prisma().delete_many(where=where_clause)
+        if result == 0:
+            logger.warning(
+                f"No session deleted for {session_id} "
+                f"(user_id validation: {user_id is not None})"
+            )
+            return False
+        return True
+    except Exception as e:
+        logger.error(f"Failed to delete chat session {session_id}: {e}")
+        return False
+
+
+async def get_chat_session_message_count(session_id: str) -> int:
+    """Get the number of messages in a chat session."""
+    count = await PrismaChatMessage.prisma().count(where={"sessionId": session_id})
+    return count
+
+
+async def update_tool_message_content(
+    session_id: str,
+    tool_call_id: str,
+    new_content: str,
+) -> bool:
+    """Update the content of a tool message in chat history.
+
+    Used by background tasks to update pending operation messages with final results.
+
+    Args:
+        session_id: The chat session ID.
+        tool_call_id: The tool call ID to find the message.
+        new_content: The new content to set.
+
+    Returns:
+        True if a message was updated, False otherwise.
+    """
+    try:
+        result = await PrismaChatMessage.prisma().update_many(
+            where={
+                "sessionId": session_id,
+                "toolCallId": tool_call_id,
+            },
+            data={
+                "content": new_content,
+            },
+        )
+        if result == 0:
+            logger.warning(
+                f"No message found to update for session {session_id}, "
+                f"tool_call_id {tool_call_id}"
+            )
+            return False
+        return True
+    except Exception as e:
+        logger.error(
+            f"Failed to update tool message for session {session_id}, "
+            f"tool_call_id {tool_call_id}: {e}"
+        )
+        return False
--- a/autogpt_platform/backend/backend/api/features/chat/model.py
+++ b/autogpt_platform/backend/backend/api/features/chat/model.py
@@ -2,7 +2,7 @@ import asyncio
 import logging
 import uuid
 from datetime import UTC, datetime
-from typing import Any, Self, cast
+from typing import Any, cast
 from weakref import WeakValueDictionary

 from openai.types.chat import (
@@ -23,17 +23,26 @@ from prisma.models import ChatMessage as PrismaChatMessage
 from prisma.models import ChatSession as PrismaChatSession
 from pydantic import BaseModel

-from backend.data.db_accessors import chat_db
 from backend.data.redis_client import get_redis_async
 from backend.util import json
 from backend.util.exceptions import DatabaseError, RedisError

+from . import db as chat_db
 from .config import ChatConfig

 logger = logging.getLogger(__name__)
 config = ChatConfig()


+def _parse_json_field(value: str | dict | list | None, default: Any = None) -> Any:
+    """Parse a JSON field that may be stored as string or already parsed."""
+    if value is None:
+        return default
+    if isinstance(value, str):
+        return json.loads(value)
+    return value
+
+
 # Redis cache key prefix for chat sessions
 CHAT_SESSION_CACHE_PREFIX = "chat:session:"

@@ -43,7 +52,28 @@ def _get_session_cache_key(session_id: str) -> str:
    return f"{CHAT_SESSION_CACHE_PREFIX}{session_id}"


-# ===================== Chat data models ===================== #
+# Session-level locks to prevent race conditions during concurrent upserts.
+# Uses WeakValueDictionary to automatically garbage collect locks when no longer referenced,
+# preventing unbounded memory growth while maintaining lock semantics for active sessions.
+# Invalidation: Locks are auto-removed by GC when no coroutine holds a reference (after
+# async with lock: completes). Explicit cleanup also occurs in delete_chat_session().
+_session_locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()
+_session_locks_mutex = asyncio.Lock()
+
+
+async def _get_session_lock(session_id: str) -> asyncio.Lock:
+    """Get or create a lock for a specific session to prevent concurrent upserts.
+
+    Uses WeakValueDictionary for automatic cleanup: locks are garbage collected
+    when no coroutine holds a reference to them, preventing memory leaks from
+    unbounded growth of session locks.
+    """
+    async with _session_locks_mutex:
+        lock = _session_locks.get(session_id)
+        if lock is None:
+            lock = asyncio.Lock()
+            _session_locks[session_id] = lock
+        return lock


 class ChatMessage(BaseModel):
@@ -55,19 +85,6 @@ class ChatMessage(BaseModel):
    tool_calls: list[dict] | None = None
    function_call: dict | None = None

-    @staticmethod
-    def from_db(prisma_message: PrismaChatMessage) -> "ChatMessage":
-        """Convert a Prisma ChatMessage to a Pydantic ChatMessage."""
-        return ChatMessage(
-            role=prisma_message.role,
-            content=prisma_message.content,
-            name=prisma_message.name,
-            tool_call_id=prisma_message.toolCallId,
-            refusal=prisma_message.refusal,
-            tool_calls=_parse_json_field(prisma_message.toolCalls),
-            function_call=_parse_json_field(prisma_message.functionCall),
-        )
-

 class Usage(BaseModel):
    prompt_tokens: int
@@ -75,10 +92,11 @@ class Usage(BaseModel):
    total_tokens: int


-class ChatSessionInfo(BaseModel):
+class ChatSession(BaseModel):
    session_id: str
    user_id: str
    title: str | None = None
+    messages: list[ChatMessage]
    usage: list[Usage]
    credentials: dict[str, dict] = {}  # Map of provider -> credential metadata
    started_at: datetime
@@ -86,9 +104,60 @@ class ChatSessionInfo(BaseModel):
    successful_agent_runs: dict[str, int] = {}
    successful_agent_schedules: dict[str, int] = {}

-    @classmethod
-    def from_db(cls, prisma_session: PrismaChatSession) -> Self:
-        """Convert Prisma ChatSession to Pydantic ChatSession."""
+    def add_tool_call_to_current_turn(self, tool_call: dict) -> None:
+        """Attach a tool_call to the current turn's assistant message.
+
+        Searches backwards for the most recent assistant message (stopping at
+        any user message boundary). If found, appends the tool_call to it.
+        Otherwise creates a new assistant message with the tool_call.
+        """
+        for msg in reversed(self.messages):
+            if msg.role == "user":
+                break
+            if msg.role == "assistant":
+                if not msg.tool_calls:
+                    msg.tool_calls = []
+                msg.tool_calls.append(tool_call)
+                return
+
+        self.messages.append(
+            ChatMessage(role="assistant", content="", tool_calls=[tool_call])
+        )
+
+    @staticmethod
+    def new(user_id: str) -> "ChatSession":
+        return ChatSession(
+            session_id=str(uuid.uuid4()),
+            user_id=user_id,
+            title=None,
+            messages=[],
+            usage=[],
+            credentials={},
+            started_at=datetime.now(UTC),
+            updated_at=datetime.now(UTC),
+        )
+
+    @staticmethod
+    def from_db(
+        prisma_session: PrismaChatSession,
+        prisma_messages: list[PrismaChatMessage] | None = None,
+    ) -> "ChatSession":
+        """Convert Prisma models to Pydantic ChatSession."""
+        messages = []
+        if prisma_messages:
+            for msg in prisma_messages:
+                messages.append(
+                    ChatMessage(
+                        role=msg.role,
+                        content=msg.content,
+                        name=msg.name,
+                        tool_call_id=msg.toolCallId,
+                        refusal=msg.refusal,
+                        tool_calls=_parse_json_field(msg.toolCalls),
+                        function_call=_parse_json_field(msg.functionCall),
+                    )
+                )
+
        # Parse JSON fields from Prisma
        credentials = _parse_json_field(prisma_session.credentials, default={})
        successful_agent_runs = _parse_json_field(
@@ -110,10 +179,11 @@ class ChatSessionInfo(BaseModel):
                )
            )

-        return cls(
+        return ChatSession(
            session_id=prisma_session.id,
            user_id=prisma_session.userId,
            title=prisma_session.title,
+            messages=messages,
            usage=usage,
            credentials=credentials,
            started_at=prisma_session.createdAt,
@@ -122,55 +192,46 @@ class ChatSessionInfo(BaseModel):
            successful_agent_schedules=successful_agent_schedules,
        )

+    @staticmethod
+    def _merge_consecutive_assistant_messages(
+        messages: list[ChatCompletionMessageParam],
+    ) -> list[ChatCompletionMessageParam]:
+        """Merge consecutive assistant messages into single messages.

-class ChatSession(ChatSessionInfo):
-    messages: list[ChatMessage]
-
-    @classmethod
-    def new(cls, user_id: str) -> Self:
-        return cls(
-            session_id=str(uuid.uuid4()),
-            user_id=user_id,
-            title=None,
-            messages=[],
-            usage=[],
-            credentials={},
-            started_at=datetime.now(UTC),
-            updated_at=datetime.now(UTC),
-        )
-
-    @classmethod
-    def from_db(cls, prisma_session: PrismaChatSession) -> Self:
-        """Convert Prisma ChatSession to Pydantic ChatSession."""
-        if prisma_session.Messages is None:
-            raise ValueError(
-                f"Prisma session {prisma_session.id} is missing Messages relation"
-            )
-
-        return cls(
-            **ChatSessionInfo.from_db(prisma_session).model_dump(),
-            messages=[ChatMessage.from_db(m) for m in prisma_session.Messages],
-        )
-
-    def add_tool_call_to_current_turn(self, tool_call: dict) -> None:
-        """Attach a tool_call to the current turn's assistant message.
-
-        Searches backwards for the most recent assistant message (stopping at
-        any user message boundary). If found, appends the tool_call to it.
-        Otherwise creates a new assistant message with the tool_call.
+        Long-running tool flows can create split assistant messages: one with
+        text content and another with tool_calls. Anthropic's API requires
+        tool_result blocks to reference a tool_use in the immediately preceding
+        assistant message, so these splits cause 400 errors via OpenRouter.
        """
-        for msg in reversed(self.messages):
-            if msg.role == "user":
-                break
-            if msg.role == "assistant":
-                if not msg.tool_calls:
-                    msg.tool_calls = []
-                msg.tool_calls.append(tool_call)
-                return
+        if len(messages) < 2:
+            return messages

-        self.messages.append(
-            ChatMessage(role="assistant", content="", tool_calls=[tool_call])
-        )
+        result: list[ChatCompletionMessageParam] = [messages[0]]
+        for msg in messages[1:]:
+            prev = result[-1]
+            if prev.get("role") != "assistant" or msg.get("role") != "assistant":
+                result.append(msg)
+                continue
+
+            prev = cast(ChatCompletionAssistantMessageParam, prev)
+            curr = cast(ChatCompletionAssistantMessageParam, msg)
+
+            curr_content = curr.get("content") or ""
+            if curr_content:
+                prev_content = prev.get("content") or ""
+                prev["content"] = (
+                    f"{prev_content}\n{curr_content}" if prev_content else curr_content
+                )
+
+            curr_tool_calls = curr.get("tool_calls")
+            if curr_tool_calls:
+                prev_tool_calls = prev.get("tool_calls")
+                prev["tool_calls"] = (
+                    list(prev_tool_calls) + list(curr_tool_calls)
+                    if prev_tool_calls
+                    else list(curr_tool_calls)
+                )
+        return result

    def to_openai_messages(self) -> list[ChatCompletionMessageParam]:
        messages = []
@@ -260,70 +321,40 @@ class ChatSession(ChatSessionInfo):
                )
        return self._merge_consecutive_assistant_messages(messages)

-    @staticmethod
-    def _merge_consecutive_assistant_messages(
-        messages: list[ChatCompletionMessageParam],
-    ) -> list[ChatCompletionMessageParam]:
-        """Merge consecutive assistant messages into single messages.

-        Long-running tool flows can create split assistant messages: one with
-        text content and another with tool_calls. Anthropic's API requires
-        tool_result blocks to reference a tool_use in the immediately preceding
-        assistant message, so these splits cause 400 errors via OpenRouter.
-        """
-        if len(messages) < 2:
-            return messages
+async def _get_session_from_cache(session_id: str) -> ChatSession | None:
+    """Get a chat session from Redis cache."""
+    redis_key = _get_session_cache_key(session_id)
+    async_redis = await get_redis_async()
+    raw_session: bytes | None = await async_redis.get(redis_key)

-        result: list[ChatCompletionMessageParam] = [messages[0]]
-        for msg in messages[1:]:
-            prev = result[-1]
-            if prev.get("role") != "assistant" or msg.get("role") != "assistant":
-                result.append(msg)
-                continue
+    if raw_session is None:
+        return None

-            prev = cast(ChatCompletionAssistantMessageParam, prev)
-            curr = cast(ChatCompletionAssistantMessageParam, msg)
-
-            curr_content = curr.get("content") or ""
-            if curr_content:
-                prev_content = prev.get("content") or ""
-                prev["content"] = (
-                    f"{prev_content}\n{curr_content}" if prev_content else curr_content
-                )
-
-            curr_tool_calls = curr.get("tool_calls")
-            if curr_tool_calls:
-                prev_tool_calls = prev.get("tool_calls")
-                prev["tool_calls"] = (
-                    list(prev_tool_calls) + list(curr_tool_calls)
-                    if prev_tool_calls
-                    else list(curr_tool_calls)
-                )
-        return result
+    try:
+        session = ChatSession.model_validate_json(raw_session)
+        logger.info(
+            f"[CACHE] Loaded session {session_id}: {len(session.messages)} messages, "
+            f"last_roles={[m.role for m in session.messages[-3:]]}"  # Last 3 roles
+        )
+        return session
+    except Exception as e:
+        logger.error(f"Failed to deserialize session {session_id}: {e}", exc_info=True)
+        raise RedisError(f"Corrupted session data for {session_id}") from e


-def _parse_json_field(value: str | dict | list | None, default: Any = None) -> Any:
-    """Parse a JSON field that may be stored as string or already parsed."""
-    if value is None:
-        return default
-    if isinstance(value, str):
-        return json.loads(value)
-    return value
-
-
-# ================ Chat cache + DB operations ================ #
-
-# NOTE: Database calls are automatically routed through DatabaseManager if Prisma is not
-#       connected directly.
-
-
-async def cache_chat_session(session: ChatSession) -> None:
-    """Cache a chat session in Redis (without persisting to the database)."""
+async def _cache_session(session: ChatSession) -> None:
+    """Cache a chat session in Redis."""
    redis_key = _get_session_cache_key(session.session_id)
    async_redis = await get_redis_async()
    await async_redis.setex(redis_key, config.session_ttl, session.model_dump_json())


+async def cache_chat_session(session: ChatSession) -> None:
+    """Cache a chat session without persisting to the database."""
+    await _cache_session(session)
+
+
 async def invalidate_session_cache(session_id: str) -> None:
    """Invalidate a chat session from Redis cache.

@@ -339,6 +370,77 @@ async def invalidate_session_cache(session_id: str) -> None:
        logger.warning(f"Failed to invalidate session cache for {session_id}: {e}")


+async def _get_session_from_db(session_id: str) -> ChatSession | None:
+    """Get a chat session from the database."""
+    prisma_session = await chat_db.get_chat_session(session_id)
+    if not prisma_session:
+        return None
+
+    messages = prisma_session.Messages
+    logger.debug(
+        f"[DB] Loaded session {session_id}: {len(messages) if messages else 0} messages, "
+        f"roles={[m.role for m in messages[-3:]] if messages else []}"  # Last 3 roles
+    )
+
+    return ChatSession.from_db(prisma_session, messages)
+
+
+async def _save_session_to_db(
+    session: ChatSession, existing_message_count: int
+) -> None:
+    """Save or update a chat session in the database."""
+    # Check if session exists in DB
+    existing = await chat_db.get_chat_session(session.session_id)
+
+    if not existing:
+        # Create new session
+        await chat_db.create_chat_session(
+            session_id=session.session_id,
+            user_id=session.user_id,
+        )
+        existing_message_count = 0
+
+    # Calculate total tokens from usage
+    total_prompt = sum(u.prompt_tokens for u in session.usage)
+    total_completion = sum(u.completion_tokens for u in session.usage)
+
+    # Update session metadata
+    await chat_db.update_chat_session(
+        session_id=session.session_id,
+        credentials=session.credentials,
+        successful_agent_runs=session.successful_agent_runs,
+        successful_agent_schedules=session.successful_agent_schedules,
+        total_prompt_tokens=total_prompt,
+        total_completion_tokens=total_completion,
+    )
+
+    # Add new messages (only those after existing count)
+    new_messages = session.messages[existing_message_count:]
+    if new_messages:
+        messages_data = []
+        for msg in new_messages:
+            messages_data.append(
+                {
+                    "role": msg.role,
+                    "content": msg.content,
+                    "name": msg.name,
+                    "tool_call_id": msg.tool_call_id,
+                    "refusal": msg.refusal,
+                    "tool_calls": msg.tool_calls,
+                    "function_call": msg.function_call,
+                }
+            )
+        logger.debug(
+            f"[DB] Saving {len(new_messages)} messages to session {session.session_id}, "
+            f"roles={[m['role'] for m in messages_data]}"
+        )
+        await chat_db.add_chat_messages_batch(
+            session_id=session.session_id,
+            messages=messages_data,
+            start_sequence=existing_message_count,
+        )
+
+
 async def get_chat_session(
    session_id: str,
    user_id: str | None = None,
@@ -386,52 +488,13 @@ async def get_chat_session(

    # Cache the session from DB
    try:
-        await cache_chat_session(session)
-        logger.info(f"Cached session {session_id} from database")
+        await _cache_session(session)
    except Exception as e:
        logger.warning(f"Failed to cache session {session_id}: {e}")

    return session


-async def _get_session_from_cache(session_id: str) -> ChatSession | None:
-    """Get a chat session from Redis cache."""
-    redis_key = _get_session_cache_key(session_id)
-    async_redis = await get_redis_async()
-    raw_session: bytes | None = await async_redis.get(redis_key)
-
-    if raw_session is None:
-        return None
-
-    try:
-        session = ChatSession.model_validate_json(raw_session)
-        logger.info(
-            f"Loading session {session_id} from cache: "
-            f"message_count={len(session.messages)}, "
-            f"roles={[m.role for m in session.messages]}"
-        )
-        return session
-    except Exception as e:
-        logger.error(f"Failed to deserialize session {session_id}: {e}", exc_info=True)
-        raise RedisError(f"Corrupted session data for {session_id}") from e
-
-
-async def _get_session_from_db(session_id: str) -> ChatSession | None:
-    """Get a chat session from the database."""
-    session = await chat_db().get_chat_session(session_id)
-    if not session:
-        return None
-
-    logger.info(
-        f"Loaded session {session_id} from DB: "
-        f"has_messages={bool(session.messages)}, "
-        f"message_count={len(session.messages)}, "
-        f"roles={[m.role for m in session.messages]}"
-    )
-
-    return session
-
-
 async def upsert_chat_session(
    session: ChatSession,
 ) -> ChatSession:
@@ -451,35 +514,25 @@ async def upsert_chat_session(
    lock = await _get_session_lock(session.session_id)

    async with lock:
-        # Always query DB for existing message count to ensure consistency
-        existing_message_count = await chat_db().get_next_sequence(session.session_id)
+        # Get existing message count from DB for incremental saves
+        existing_message_count = await chat_db.get_chat_session_message_count(
+            session.session_id
+        )

        db_error: Exception | None = None

        # Save to database (primary storage)
        try:
-            await _save_session_to_db(
-                session,
-                existing_message_count,
-                skip_existence_check=existing_message_count > 0,
-            )
+            await _save_session_to_db(session, existing_message_count)
        except Exception as e:
            logger.error(
                f"Failed to save session {session.session_id} to database: {e}"
            )
            db_error = e

-        # Save to cache (best-effort, even if DB failed).
-        # Title updates (update_session_title) run *outside* this lock because
-        # they only touch the title field, not messages.  So a concurrent rename
-        # or auto-title may have written a newer title to Redis while this
-        # upsert was in progress.  Always prefer the cached title to avoid
-        # overwriting it with the stale in-memory copy.
+        # Save to cache (best-effort, even if DB failed)
        try:
-            existing_cached = await _get_session_from_cache(session.session_id)
-            if existing_cached and existing_cached.title:
-                session = session.model_copy(update={"title": existing_cached.title})
-            await cache_chat_session(session)
+            await _cache_session(session)
        except Exception as e:
            # If DB succeeded but cache failed, raise cache error
            if db_error is None:
@@ -500,75 +553,6 @@ async def upsert_chat_session(
        return session


-async def _save_session_to_db(
-    session: ChatSession,
-    existing_message_count: int,
-    *,
-    skip_existence_check: bool = False,
-) -> None:
-    """Save or update a chat session in the database.
-
-    Args:
-        skip_existence_check: When True, skip the ``get_chat_session`` query
-            and assume the session row already exists.  Saves one DB round trip
-            for incremental saves during streaming.
-    """
-    db = chat_db()
-
-    if not skip_existence_check:
-        # Check if session exists in DB
-        existing = await db.get_chat_session(session.session_id)
-
-        if not existing:
-            # Create new session
-            await db.create_chat_session(
-                session_id=session.session_id,
-                user_id=session.user_id,
-            )
-            existing_message_count = 0
-
-    # Calculate total tokens from usage
-    total_prompt = sum(u.prompt_tokens for u in session.usage)
-    total_completion = sum(u.completion_tokens for u in session.usage)
-
-    # Update session metadata
-    await db.update_chat_session(
-        session_id=session.session_id,
-        credentials=session.credentials,
-        successful_agent_runs=session.successful_agent_runs,
-        successful_agent_schedules=session.successful_agent_schedules,
-        total_prompt_tokens=total_prompt,
-        total_completion_tokens=total_completion,
-    )
-
-    # Add new messages (only those after existing count)
-    new_messages = session.messages[existing_message_count:]
-    if new_messages:
-        messages_data = []
-        for msg in new_messages:
-            messages_data.append(
-                {
-                    "role": msg.role,
-                    "content": msg.content,
-                    "name": msg.name,
-                    "tool_call_id": msg.tool_call_id,
-                    "refusal": msg.refusal,
-                    "tool_calls": msg.tool_calls,
-                    "function_call": msg.function_call,
-                }
-            )
-        logger.info(
-            f"Saving {len(new_messages)} new messages to DB for session {session.session_id}: "
-            f"roles={[m['role'] for m in messages_data]}, "
-            f"start_sequence={existing_message_count}"
-        )
-        await db.add_chat_messages_batch(
-            session_id=session.session_id,
-            messages=messages_data,
-            start_sequence=existing_message_count,
-        )
-
-
 async def append_and_save_message(session_id: str, message: ChatMessage) -> ChatSession:
    """Atomically append a message to a session and persist it.

@@ -584,7 +568,9 @@ async def append_and_save_message(session_id: str, message: ChatMessage) -> Chat
            raise ValueError(f"Session {session_id} not found")

        session.messages.append(message)
-        existing_message_count = await chat_db().get_next_sequence(session_id)
+        existing_message_count = await chat_db.get_chat_session_message_count(
+            session_id
+        )

        try:
            await _save_session_to_db(session, existing_message_count)
@@ -594,7 +580,7 @@ async def append_and_save_message(session_id: str, message: ChatMessage) -> Chat
            ) from e

        try:
-            await cache_chat_session(session)
+            await _cache_session(session)
        except Exception as e:
            logger.warning(f"Cache write failed for session {session_id}: {e}")

@@ -613,7 +599,7 @@ async def create_chat_session(user_id: str) -> ChatSession:

    # Create in database first - fail fast if this fails
    try:
-        await chat_db().create_chat_session(
+        await chat_db.create_chat_session(
            session_id=session.session_id,
            user_id=user_id,
        )
@@ -625,7 +611,7 @@ async def create_chat_session(user_id: str) -> ChatSession:

    # Cache the session (best-effort optimization, DB is source of truth)
    try:
-        await cache_chat_session(session)
+        await _cache_session(session)
    except Exception as e:
        logger.warning(f"Failed to cache new session {session.session_id}: {e}")

@@ -636,16 +622,20 @@ async def get_user_sessions(
    user_id: str,
    limit: int = 50,
    offset: int = 0,
-) -> tuple[list[ChatSessionInfo], int]:
+) -> tuple[list[ChatSession], int]:
    """Get chat sessions for a user from the database with total count.

    Returns:
        A tuple of (sessions, total_count) where total_count is the overall
        number of sessions for the user (not just the current page).
    """
-    db = chat_db()
-    sessions = await db.get_user_chat_sessions(user_id, limit, offset)
-    total_count = await db.get_user_session_count(user_id)
+    prisma_sessions = await chat_db.get_user_chat_sessions(user_id, limit, offset)
+    total_count = await chat_db.get_user_session_count(user_id)
+
+    sessions = []
+    for prisma_session in prisma_sessions:
+        # Convert without messages for listing (lighter weight)
+        sessions.append(ChatSession.from_db(prisma_session, None))

    return sessions, total_count

@@ -663,7 +653,7 @@ async def delete_chat_session(session_id: str, user_id: str | None = None) -> bo
    """
    # Delete from database first (with optional user_id validation)
    # This confirms ownership before invalidating cache
-    deleted = await chat_db().delete_chat_session(session_id, user_id)
+    deleted = await chat_db.delete_chat_session(session_id, user_id)

    if not deleted:
        return False
@@ -680,47 +670,27 @@ async def delete_chat_session(session_id: str, user_id: str | None = None) -> bo
    async with _session_locks_mutex:
        _session_locks.pop(session_id, None)

-    # Shut down any local browser daemon for this session (best-effort).
-    # Inline import required: all tool modules import ChatSession from this
-    # module, so any top-level import from tools.* would create a cycle.
-    try:
-        from .tools.agent_browser import close_browser_session
-
-        await close_browser_session(session_id, user_id=user_id)
-    except Exception as e:
-        logger.debug(f"Browser cleanup for session {session_id}: {e}")
-
    return True


-async def update_session_title(
-    session_id: str,
-    user_id: str,
-    title: str,
-    *,
-    only_if_empty: bool = False,
-) -> bool:
-    """Update the title of a chat session, scoped to the owning user.
+async def update_session_title(session_id: str, title: str) -> bool:
+    """Update only the title of a chat session.

-    Lightweight operation that doesn't touch messages, avoiding race conditions
-    with concurrent message updates.
+    This is a lightweight operation that doesn't touch messages, avoiding
+    race conditions with concurrent message updates. Use this for background
+    title generation instead of upsert_chat_session.

    Args:
        session_id: The session ID to update.
-        user_id: Owning user — the DB query filters on this.
        title: The new title to set.
-        only_if_empty: When True, uses an atomic ``UPDATE WHERE title IS NULL``
-            so auto-generated titles never overwrite a user-set title.

    Returns:
-        True if updated successfully, False otherwise (not found, wrong user,
-        or — when only_if_empty — title was already set).
+        True if updated successfully, False otherwise.
    """
    try:
-        updated = await chat_db().update_chat_session_title(
-            session_id, user_id, title, only_if_empty=only_if_empty
-        )
-        if not updated:
+        result = await chat_db.update_chat_session(session_id=session_id, title=title)
+        if result is None:
+            logger.warning(f"Session {session_id} not found for title update")
            return False

        # Update title in cache if it exists (instead of invalidating).
@@ -730,39 +700,14 @@ async def update_session_title(
            cached = await _get_session_from_cache(session_id)
            if cached:
                cached.title = title
-                await cache_chat_session(cached)
+                await _cache_session(cached)
        except Exception as e:
+            # Not critical - title will be correct on next full cache refresh
            logger.warning(
-                f"Cache title update failed for session {session_id} (non-critical): {e}"
+                f"Failed to update title in cache for session {session_id}: {e}"
            )

        return True
    except Exception as e:
        logger.error(f"Failed to update title for session {session_id}: {e}")
        return False
-
-
-# ==================== Chat session locks ==================== #
-
-_session_locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()
-_session_locks_mutex = asyncio.Lock()
-
-
-async def _get_session_lock(session_id: str) -> asyncio.Lock:
-    """Get or create a lock for a specific session to prevent concurrent upserts.
-
-    This was originally added to solve the specific problem of race conditions between
-    the session title thread and the conversation thread, which always occurs on the
-    same instance as we prevent rapid request sends on the frontend.
-
-    Uses WeakValueDictionary for automatic cleanup: locks are garbage collected
-    when no coroutine holds a reference to them, preventing memory leaks from
-    unbounded growth of session locks. Explicit cleanup also occurs
-    in `delete_chat_session()`.
-    """
-    async with _session_locks_mutex:
-        lock = _session_locks.get(session_id)
-        if lock is None:
-            lock = asyncio.Lock()
-            _session_locks[session_id] = lock
-        return lock
--- a/autogpt_platform/backend/backend/api/features/chat/model_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/model_test.py
@@ -331,96 +331,3 @@ def test_to_openai_messages_merges_split_assistants():
    tc_list = merged.get("tool_calls")
    assert tc_list is not None and len(list(tc_list)) == 1
    assert list(tc_list)[0]["id"] == "tc1"
-
-
-# --------------------------------------------------------------------------- #
-#  Concurrent save collision detection                                        #
-# --------------------------------------------------------------------------- #
-
-
-@pytest.mark.asyncio(loop_scope="session")
-async def test_concurrent_saves_collision_detection(setup_test_user, test_user_id):
-    """Test that concurrent saves from streaming loop and callback handle collisions correctly.
-
-    Simulates the race condition where:
-    1. Streaming loop starts with saved_msg_count=5
-    2. Long-running callback appends message #5 and saves
-    3. Streaming loop tries to save with stale count=5
-
-    The collision detection should handle this gracefully.
-    """
-    import asyncio
-
-    # Create a session with initial messages
-    session = ChatSession.new(user_id=test_user_id)
-    for i in range(3):
-        session.messages.append(
-            ChatMessage(
-                role="user" if i % 2 == 0 else "assistant", content=f"Message {i}"
-            )
-        )
-
-    # Save initial messages
-    session = await upsert_chat_session(session)
-
-    # Simulate streaming loop and callback saving concurrently
-    async def streaming_loop_save():
-        """Simulates streaming loop saving messages."""
-        # Add 2 messages
-        session.messages.append(ChatMessage(role="user", content="Streaming message 1"))
-        session.messages.append(
-            ChatMessage(role="assistant", content="Streaming message 2")
-        )
-
-        # Wait a bit to let callback potentially save first
-        await asyncio.sleep(0.01)
-
-        # Save (will query DB for existing count)
-        return await upsert_chat_session(session)
-
-    async def callback_save():
-        """Simulates long-running callback saving a message."""
-        # Add 1 message
-        session.messages.append(
-            ChatMessage(role="tool", content="Callback result", tool_call_id="tc1")
-        )
-
-        # Save immediately (will query DB for existing count)
-        return await upsert_chat_session(session)
-
-    # Run both saves concurrently - one will hit collision detection
-    results = await asyncio.gather(streaming_loop_save(), callback_save())
-
-    # Both should succeed
-    assert all(r is not None for r in results)
-
-    # Reload session from DB to verify
-    from backend.data.redis_client import get_redis_async
-
-    redis_key = f"chat:session:{session.session_id}"
-    async_redis = await get_redis_async()
-    await async_redis.delete(redis_key)  # Clear cache to force DB load
-
-    loaded_session = await get_chat_session(session.session_id, test_user_id)
-    assert loaded_session is not None
-
-    # Should have all 6 messages (3 initial + 2 streaming + 1 callback)
-    assert len(loaded_session.messages) == 6
-
-    # Verify no duplicate sequences
-    sequences = []
-    for i, msg in enumerate(loaded_session.messages):
-        # Messages should have sequential sequence numbers starting from 0
-        sequences.append(i)
-
-    # All sequences should be unique and sequential
-    assert sequences == list(range(6))
-
-    # Verify message content is preserved
-    contents = [m.content for m in loaded_session.messages]
-    assert "Message 0" in contents
-    assert "Message 1" in contents
-    assert "Message 2" in contents
-    assert "Streaming message 1" in contents
-    assert "Streaming message 2" in contents
-    assert "Callback result" in contents
--- a/autogpt_platform/backend/backend/api/features/chat/response_model.py
+++ b/autogpt_platform/backend/backend/api/features/chat/response_model.py
@@ -5,17 +5,12 @@ This module implements the AI SDK UI Stream Protocol (v1) for streaming chat res
 See: https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol
 """

-import json
-import logging
 from enum import Enum
 from typing import Any

 from pydantic import BaseModel, Field

 from backend.util.json import dumps as json_dumps
-from backend.util.truncate import truncate
-
-logger = logging.getLogger(__name__)


 class ResponseType(str, Enum):
@@ -52,8 +47,7 @@ class StreamBaseResponse(BaseModel):

    def to_sse(self) -> str:
        """Convert to SSE format."""
-        json_str = self.model_dump_json(exclude_none=True)
-        return f"data: {json_str}\n\n"
+        return f"data: {self.model_dump_json()}\n\n"


 # ========== Message Lifecycle ==========
@@ -64,13 +58,15 @@ class StreamStart(StreamBaseResponse):

    type: ResponseType = ResponseType.START
    messageId: str = Field(..., description="Unique message ID")
-    sessionId: str | None = Field(
+    taskId: str | None = Field(
        default=None,
-        description="Session ID for SSE reconnection.",
+        description="Task ID for SSE reconnection. Clients can reconnect using GET /tasks/{taskId}/stream",
    )

    def to_sse(self) -> str:
-        """Convert to SSE format, excluding non-protocol fields like sessionId."""
+        """Convert to SSE format, excluding non-protocol fields like taskId."""
+        import json
+
        data: dict[str, Any] = {
            "type": self.type.value,
            "messageId": self.messageId,
@@ -151,9 +147,6 @@ class StreamToolInputAvailable(StreamBaseResponse):
    )


-_MAX_TOOL_OUTPUT_SIZE = 100_000  # ~100 KB; truncate to avoid bloating SSE/DB
-
-
 class StreamToolOutputAvailable(StreamBaseResponse):
    """Tool execution result."""

@@ -168,12 +161,10 @@ class StreamToolOutputAvailable(StreamBaseResponse):
        default=True, description="Whether the tool execution succeeded"
    )

-    def model_post_init(self, __context: Any) -> None:
-        """Truncate oversized outputs after construction."""
-        self.output = truncate(self.output, _MAX_TOOL_OUTPUT_SIZE)
-
    def to_sse(self) -> str:
        """Convert to SSE format, excluding non-spec fields."""
+        import json
+
        data = {
            "type": self.type.value,
            "toolCallId": self.toolCallId,
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
@@ -2,34 +2,33 @@

 import asyncio
 import logging
-import re
+import uuid as uuid_module
 from collections.abc import AsyncGenerator
 from typing import Annotated
-from uuid import uuid4

 from autogpt_libs import auth
-from fastapi import APIRouter, Depends, HTTPException, Query, Response, Security
+from fastapi import APIRouter, Depends, Header, HTTPException, Query, Response, Security
 from fastapi.responses import StreamingResponse
-from prisma.models import UserWorkspaceFile
-from pydantic import BaseModel, Field, field_validator
+from pydantic import BaseModel

-from backend.copilot import service as chat_service
-from backend.copilot import stream_registry
-from backend.copilot.config import ChatConfig
-from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
-from backend.copilot.model import (
+from backend.util.exceptions import NotFoundError
+from backend.util.feature_flag import Flag, is_feature_enabled
+
+from . import service as chat_service
+from . import stream_registry
+from .completion_handler import process_operation_failure, process_operation_success
+from .config import ChatConfig
+from .model import (
    ChatMessage,
    ChatSession,
    append_and_save_message,
    create_chat_session,
-    delete_chat_session,
    get_chat_session,
    get_user_sessions,
-    update_session_title,
 )
-from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
-from backend.copilot.tools.e2b_sandbox import kill_sandbox
-from backend.copilot.tools.models import (
+from .response_model import StreamError, StreamFinish, StreamHeartbeat, StreamStart
+from .sdk import service as sdk_service
+from .tools.models import (
    AgentDetailsResponse,
    AgentOutputResponse,
    AgentPreviewResponse,
@@ -44,23 +43,18 @@ from backend.copilot.tools.models import (
    ErrorResponse,
    ExecutionStartedResponse,
    InputValidationErrorResponse,
-    MCPToolOutputResponse,
-    MCPToolsDiscoveredResponse,
    NeedLoginResponse,
    NoResultsResponse,
+    OperationInProgressResponse,
+    OperationPendingResponse,
+    OperationStartedResponse,
    SetupRequirementsResponse,
-    SuggestedGoalResponse,
    UnderstandingUpdatedResponse,
 )
-from backend.copilot.tracking import track_user_message
-from backend.data.workspace import get_or_create_workspace
-from backend.util.exceptions import NotFoundError
+from .tracking import track_user_message

 config = ChatConfig()

-_UUID_RE = re.compile(
-    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
-)

 logger = logging.getLogger(__name__)

@@ -89,9 +83,6 @@ class StreamChatRequest(BaseModel):
    message: str
    is_user_message: bool = True
    context: dict[str, str] | None = None  # {url: str, content: str}
-    file_ids: list[str] | None = Field(
-        default=None, max_length=20
-    )  # Workspace file IDs attached to this message


 class CreateSessionResponse(BaseModel):
@@ -105,8 +96,10 @@ class CreateSessionResponse(BaseModel):
 class ActiveStreamInfo(BaseModel):
    """Information about an active stream for reconnection."""

-    turn_id: str
+    task_id: str
    last_message_id: str  # Redis Stream message ID for resumption
+    operation_id: str  # Operation ID for completion tracking
+    tool_name: str  # Name of the tool being executed


 class SessionDetailResponse(BaseModel):
@@ -136,25 +129,12 @@ class ListSessionsResponse(BaseModel):
    total: int


-class CancelSessionResponse(BaseModel):
-    """Response model for the cancel session endpoint."""
+class OperationCompleteRequest(BaseModel):
+    """Request model for external completion webhook."""

-    cancelled: bool
-    reason: str | None = None
-
-
-class UpdateSessionTitleRequest(BaseModel):
-    """Request model for updating a session's title."""
-
-    title: str
-
-    @field_validator("title")
-    @classmethod
-    def title_must_not_be_blank(cls, v: str) -> str:
-        stripped = v.strip()
-        if not stripped:
-            raise ValueError("Title must not be blank")
-        return stripped
+    success: bool
+    result: dict | str | None = None
+    error: str | None = None


 # ========== Routes ==========
@@ -231,92 +211,6 @@ async def create_session(
    )


-@router.delete(
-    "/sessions/{session_id}",
-    dependencies=[Security(auth.requires_user)],
-    status_code=204,
-    responses={404: {"description": "Session not found or access denied"}},
-)
-async def delete_session(
-    session_id: str,
-    user_id: Annotated[str, Security(auth.get_user_id)],
-) -> Response:
-    """
-    Delete a chat session.
-
-    Permanently removes a chat session and all its messages.
-    Only the owner can delete their sessions.
-
-    Args:
-        session_id: The session ID to delete.
-        user_id: The authenticated user's ID.
-
-    Returns:
-        204 No Content on success.
-
-    Raises:
-        HTTPException: 404 if session not found or not owned by user.
-    """
-    deleted = await delete_chat_session(session_id, user_id)
-
-    if not deleted:
-        raise HTTPException(
-            status_code=404,
-            detail=f"Session {session_id} not found or access denied",
-        )
-
-    # Best-effort cleanup of the E2B sandbox (if any).
-    # sandbox_id is in Redis; kill_sandbox() fetches it from there.
-    e2b_cfg = ChatConfig()
-    if e2b_cfg.e2b_active:
-        assert e2b_cfg.e2b_api_key  # guaranteed by e2b_active check
-        try:
-            await kill_sandbox(session_id, e2b_cfg.e2b_api_key)
-        except Exception:
-            logger.warning(
-                "[E2B] Failed to kill sandbox for session %s", session_id[:12]
-            )
-
-    return Response(status_code=204)
-
-
-@router.patch(
-    "/sessions/{session_id}/title",
-    summary="Update session title",
-    dependencies=[Security(auth.requires_user)],
-    status_code=200,
-    responses={404: {"description": "Session not found or access denied"}},
-)
-async def update_session_title_route(
-    session_id: str,
-    request: UpdateSessionTitleRequest,
-    user_id: Annotated[str, Security(auth.get_user_id)],
-) -> dict:
-    """
-    Update the title of a chat session.
-
-    Allows the user to rename their chat session.
-
-    Args:
-        session_id: The session ID to update.
-        request: Request body containing the new title.
-        user_id: The authenticated user's ID.
-
-    Returns:
-        dict: Status of the update.
-
-    Raises:
-        HTTPException: 404 if session not found or not owned by user.
-    """
-    success = await update_session_title(session_id, user_id, request.title)
-    if not success:
-        raise HTTPException(
-            status_code=404,
-            detail=f"Session {session_id} not found or access denied",
-        )
-    return {"status": "ok"}
-
-
@router.get(
    "/sessions/{session_id}",
 )
@@ -328,7 +222,7 @@ async def get_session(
    Retrieve the details of a specific chat session.

    Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
-    If there's an active stream for this session, returns active_stream info for reconnection.
+    If there's an active stream for this session, returns the task_id for reconnection.

    Args:
        session_id: The unique identifier for the desired chat session.
@@ -346,21 +240,28 @@ async def get_session(

    # Check if there's an active stream for this session
    active_stream_info = None
-    active_session, last_message_id = await stream_registry.get_active_session(
+    active_task, last_message_id = await stream_registry.get_active_task_for_session(
        session_id, user_id
    )
    logger.info(
-        f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
+        f"[GET_SESSION] session={session_id}, active_task={active_task is not None}, "
        f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
    )
-    if active_session:
-        # Keep the assistant message (including tool_calls) so the frontend can
-        # render the correct tool UI (e.g. CreateAgent with mini game).
-        # convertChatSessionToUiMessages handles isComplete=false by setting
-        # tool parts without output to state "input-available".
+    if active_task:
+        # Filter out the in-progress assistant message from the session response.
+        # The client will receive the complete assistant response through the SSE
+        # stream replay instead, preventing duplicate content.
+        if messages and messages[-1].get("role") == "assistant":
+            messages = messages[:-1]
+
+        # Use "0-0" as last_message_id to replay the stream from the beginning.
+        # Since we filtered out the cached assistant message, the client needs
+        # the full stream to reconstruct the response.
        active_stream_info = ActiveStreamInfo(
-            turn_id=active_session.turn_id,
-            last_message_id=last_message_id,
+            task_id=active_task.task_id,
+            last_message_id="0-0",
+            operation_id=active_task.operation_id,
+            tool_name=active_task.tool_name,
        )

    return SessionDetailResponse(
@@ -373,51 +274,6 @@ async def get_session(
    )


-@router.post(
-    "/sessions/{session_id}/cancel",
-    status_code=200,
-)
-async def cancel_session_task(
-    session_id: str,
-    user_id: Annotated[str | None, Depends(auth.get_user_id)],
-) -> CancelSessionResponse:
-    """Cancel the active streaming task for a session.
-
-    Publishes a cancel event to the executor via RabbitMQ FANOUT, then
-    polls Redis until the task status flips from ``running`` or a timeout
-    (5 s) is reached.  Returns only after the cancellation is confirmed.
-    """
-    await _validate_and_get_session(session_id, user_id)
-
-    active_session, _ = await stream_registry.get_active_session(session_id, user_id)
-    if not active_session:
-        return CancelSessionResponse(cancelled=True, reason="no_active_session")
-
-    await enqueue_cancel_task(session_id)
-    logger.info(f"[CANCEL] Published cancel for session ...{session_id[-8:]}")
-
-    # Poll until the executor confirms the task is no longer running.
-    poll_interval = 0.5
-    max_wait = 5.0
-    waited = 0.0
-    while waited < max_wait:
-        await asyncio.sleep(poll_interval)
-        waited += poll_interval
-        session_state = await stream_registry.get_session(session_id)
-        if session_state is None or session_state.status != "running":
-            logger.info(
-                f"[CANCEL] Session ...{session_id[-8:]} confirmed stopped "
-                f"(status={session_state.status if session_state else 'gone'}) after {waited:.1f}s"
-            )
-            return CancelSessionResponse(cancelled=True)
-
-    logger.warning(
-        f"[CANCEL] Session ...{session_id[-8:]} not confirmed after {max_wait}s, force-completing"
-    )
-    await stream_registry.mark_session_completed(session_id, error_message="Cancelled")
-    return CancelSessionResponse(cancelled=True)
-
-
@router.post(
    "/sessions/{session_id}/stream",
 )
@@ -435,15 +291,16 @@ async def stream_chat_post(
      - Tool execution results

    The AI generation runs in a background task that continues even if the client disconnects.
-    All chunks are written to a per-turn Redis stream for reconnection support. If the client
-    disconnects, they can reconnect using GET /sessions/{session_id}/stream to resume.
+    All chunks are written to Redis for reconnection support. If the client disconnects,
+    they can reconnect using GET /tasks/{task_id}/stream to resume from where they left off.

    Args:
        session_id: The chat session identifier to associate with the streamed messages.
        request: Request body containing message, is_user_message, and optional context.
        user_id: Optional authenticated user ID.
    Returns:
-        StreamingResponse: SSE-formatted response chunks.
+        StreamingResponse: SSE-formatted response chunks. First chunk is a "start" event
+        containing the task_id for reconnection.

    """
    import asyncio
@@ -459,7 +316,7 @@ async def stream_chat_post(
        f"user={user_id}, message_len={len(request.message)}",
        extra={"json_fields": log_meta},
    )
-    await _validate_and_get_session(session_id, user_id)
+    session = await _validate_and_get_session(session_id, user_id)
    logger.info(
        f"[TIMING] session validated in {(time.perf_counter() - stream_start_time) * 1000:.1f}ms",
        extra={
@@ -470,38 +327,6 @@ async def stream_chat_post(
        },
    )

-    # Enrich message with file metadata if file_ids are provided.
-    # Also sanitise file_ids so only validated, workspace-scoped IDs are
-    # forwarded downstream (e.g. to the executor via enqueue_copilot_turn).
-    sanitized_file_ids: list[str] | None = None
-    if request.file_ids and user_id:
-        # Filter to valid UUIDs only to prevent DB abuse
-        valid_ids = [fid for fid in request.file_ids if _UUID_RE.match(fid)]
-
-        if valid_ids:
-            workspace = await get_or_create_workspace(user_id)
-            # Batch query instead of N+1
-            files = await UserWorkspaceFile.prisma().find_many(
-                where={
-                    "id": {"in": valid_ids},
-                    "workspaceId": workspace.id,
-                    "isDeleted": False,
-                }
-            )
-            # Only keep IDs that actually exist in the user's workspace
-            sanitized_file_ids = [wf.id for wf in files] or None
-            file_lines: list[str] = [
-                f"- {wf.name} ({wf.mimeType}, {round(wf.sizeBytes / 1024, 1)} KB), file_id={wf.id}"
-                for wf in files
-            ]
-            if file_lines:
-                files_block = (
-                    "\n\n[Attached files]\n"
-                    + "\n".join(file_lines)
-                    + "\nUse read_workspace_file with the file_id to access file contents."
-                )
-                request.message += files_block
-
    # Atomically append user message to session BEFORE creating task to avoid
    # race condition where GET_SESSION sees task as "running" but message isn't
    # saved yet.  append_and_save_message re-fetches inside a lock to prevent
@@ -518,47 +343,152 @@ async def stream_chat_post(
                message_length=len(request.message),
            )
        logger.info(f"[STREAM] Saving user message to session {session_id}")
-        await append_and_save_message(session_id, message)
+        session = await append_and_save_message(session_id, message)
        logger.info(f"[STREAM] User message saved for session {session_id}")

    # Create a task in the stream registry for reconnection support
-    turn_id = str(uuid4())
-    log_meta["turn_id"] = turn_id
+    task_id = str(uuid_module.uuid4())
+    operation_id = str(uuid_module.uuid4())
+    log_meta["task_id"] = task_id

-    session_create_start = time.perf_counter()
-    await stream_registry.create_session(
+    task_create_start = time.perf_counter()
+    await stream_registry.create_task(
+        task_id=task_id,
        session_id=session_id,
        user_id=user_id,
-        tool_call_id="chat_stream",
+        tool_call_id="chat_stream",  # Not a tool call, but needed for the model
        tool_name="chat",
-        turn_id=turn_id,
+        operation_id=operation_id,
    )
    logger.info(
-        f"[TIMING] create_session completed in {(time.perf_counter() - session_create_start) * 1000:.1f}ms",
+        f"[TIMING] create_task completed in {(time.perf_counter() - task_create_start) * 1000:.1f}ms",
        extra={
            "json_fields": {
                **log_meta,
-                "duration_ms": (time.perf_counter() - session_create_start) * 1000,
+                "duration_ms": (time.perf_counter() - task_create_start) * 1000,
            }
        },
    )

-    # Per-turn stream is always fresh (unique turn_id), subscribe from beginning
-    subscribe_from_id = "0-0"
+    # Background task that runs the AI generation independently of SSE connection
+    async def run_ai_generation():
+        import time as time_module

-    await enqueue_copilot_turn(
-        session_id=session_id,
-        user_id=user_id,
-        message=request.message,
-        turn_id=turn_id,
-        is_user_message=request.is_user_message,
-        context=request.context,
-        file_ids=sanitized_file_ids,
-    )
+        gen_start_time = time_module.perf_counter()
+        logger.info(
+            f"[TIMING] run_ai_generation STARTED, task={task_id}, session={session_id}, user={user_id}",
+            extra={"json_fields": log_meta},
+        )
+        first_chunk_time, ttfc = None, None
+        chunk_count = 0
+        try:
+            # Emit a start event with task_id for reconnection
+            start_chunk = StreamStart(messageId=task_id, taskId=task_id)
+            await stream_registry.publish_chunk(task_id, start_chunk)
+            logger.info(
+                f"[TIMING] StreamStart published at {(time_module.perf_counter() - gen_start_time) * 1000:.1f}ms",
+                extra={
+                    "json_fields": {
+                        **log_meta,
+                        "elapsed_ms": (time_module.perf_counter() - gen_start_time)
+                        * 1000,
+                    }
+                },
+            )

+            # Choose service based on LaunchDarkly flag (falls back to config default)
+            use_sdk = await is_feature_enabled(
+                Flag.COPILOT_SDK,
+                user_id or "anonymous",
+                default=config.use_claude_agent_sdk,
+            )
+            stream_fn = (
+                sdk_service.stream_chat_completion_sdk
+                if use_sdk
+                else chat_service.stream_chat_completion
+            )
+            logger.info(
+                f"[TIMING] Calling {'sdk' if use_sdk else 'standard'} stream_chat_completion",
+                extra={"json_fields": log_meta},
+            )
+            # Pass message=None since we already added it to the session above
+            async for chunk in stream_fn(
+                session_id,
+                None,  # Message already in session
+                is_user_message=request.is_user_message,
+                user_id=user_id,
+                session=session,  # Pass session with message already added
+                context=request.context,
+            ):
+                # Skip duplicate StreamStart — we already published one above
+                if isinstance(chunk, StreamStart):
+                    continue
+                chunk_count += 1
+                if first_chunk_time is None:
+                    first_chunk_time = time_module.perf_counter()
+                    ttfc = first_chunk_time - gen_start_time
+                    logger.info(
+                        f"[TIMING] FIRST AI CHUNK at {ttfc:.2f}s, type={type(chunk).__name__}",
+                        extra={
+                            "json_fields": {
+                                **log_meta,
+                                "chunk_type": type(chunk).__name__,
+                                "time_to_first_chunk_ms": ttfc * 1000,
+                            }
+                        },
+                    )
+                # Write to Redis (subscribers will receive via XREAD)
+                await stream_registry.publish_chunk(task_id, chunk)
+
+            gen_end_time = time_module.perf_counter()
+            total_time = (gen_end_time - gen_start_time) * 1000
+            logger.info(
+                f"[TIMING] run_ai_generation FINISHED in {total_time / 1000:.1f}s; "
+                f"task={task_id}, session={session_id}, "
+                f"ttfc={ttfc or -1:.2f}s, n_chunks={chunk_count}",
+                extra={
+                    "json_fields": {
+                        **log_meta,
+                        "total_time_ms": total_time,
+                        "time_to_first_chunk_ms": (
+                            ttfc * 1000 if ttfc is not None else None
+                        ),
+                        "n_chunks": chunk_count,
+                    }
+                },
+            )
+            await stream_registry.mark_task_completed(task_id, "completed")
+        except Exception as e:
+            elapsed = time_module.perf_counter() - gen_start_time
+            logger.error(
+                f"[TIMING] run_ai_generation ERROR after {elapsed:.2f}s: {e}",
+                extra={
+                    "json_fields": {
+                        **log_meta,
+                        "elapsed_ms": elapsed * 1000,
+                        "error": str(e),
+                    }
+                },
+            )
+            # Publish a StreamError so the frontend can display an error message
+            try:
+                await stream_registry.publish_chunk(
+                    task_id,
+                    StreamError(
+                        errorText="An error occurred. Please try again.",
+                        code="stream_error",
+                    ),
+                )
+            except Exception:
+                pass  # Best-effort; mark_task_completed will publish StreamFinish
+            await stream_registry.mark_task_completed(task_id, "failed")
+
+    # Start the AI generation in a background task
+    bg_task = asyncio.create_task(run_ai_generation())
+    await stream_registry.set_task_asyncio_task(task_id, bg_task)
    setup_time = (time.perf_counter() - stream_start_time) * 1000
    logger.info(
-        f"[TIMING] Task enqueued to RabbitMQ, setup={setup_time:.1f}ms",
+        f"[TIMING] Background task started, setup={setup_time:.1f}ms",
        extra={"json_fields": {**log_meta, "setup_time_ms": setup_time}},
    )

@@ -568,7 +498,7 @@ async def stream_chat_post(

        event_gen_start = time_module.perf_counter()
        logger.info(
-            f"[TIMING] event_generator STARTED, turn={turn_id}, session={session_id}, "
+            f"[TIMING] event_generator STARTED, task={task_id}, session={session_id}, "
            f"user={user_id}",
            extra={"json_fields": log_meta},
        )
@@ -576,12 +506,11 @@ async def stream_chat_post(
        first_chunk_yielded = False
        chunks_yielded = 0
        try:
-            # Subscribe from the position we captured before enqueuing
-            # This avoids replaying old messages while catching all new ones
-            subscriber_queue = await stream_registry.subscribe_to_session(
-                session_id=session_id,
+            # Subscribe to the task stream (this replays existing messages + live updates)
+            subscriber_queue = await stream_registry.subscribe_to_task(
+                task_id=task_id,
                user_id=user_id,
-                last_message_id=subscribe_from_id,
+                last_message_id="0-0",  # Get all messages from the beginning
            )

            if subscriber_queue is None:
@@ -596,7 +525,7 @@ async def stream_chat_post(
            )
            while True:
                try:
-                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=10.0)
+                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=30.0)
                    chunks_yielded += 1

                    if not first_chunk_yielded:
@@ -664,19 +593,19 @@ async def stream_chat_post(
            # Unsubscribe when client disconnects or stream ends
            if subscriber_queue is not None:
                try:
-                    await stream_registry.unsubscribe_from_session(
-                        session_id, subscriber_queue
+                    await stream_registry.unsubscribe_from_task(
+                        task_id, subscriber_queue
                    )
                except Exception as unsub_err:
                    logger.error(
-                        f"Error unsubscribing from session {session_id}: {unsub_err}",
+                        f"Error unsubscribing from task {task_id}: {unsub_err}",
                        exc_info=True,
                    )
            # AI SDK protocol termination - always yield even if unsubscribe fails
            total_time = time_module.perf_counter() - event_gen_start
            logger.info(
                f"[TIMING] event_generator FINISHED in {total_time:.2f}s; "
-                f"turn={turn_id}, session={session_id}, n_chunks={chunks_yielded}",
+                f"task={task_id}, session={session_id}, n_chunks={chunks_yielded}",
                extra={
                    "json_fields": {
                        **log_meta,
@@ -723,21 +652,17 @@ async def resume_session_stream(
    """
    import asyncio

-    active_session, last_message_id = await stream_registry.get_active_session(
+    active_task, _last_id = await stream_registry.get_active_task_for_session(
        session_id, user_id
    )

-    if not active_session:
+    if not active_task:
        return Response(status_code=204)

-    # Always replay from the beginning ("0-0") on resume.
-    # We can't use last_message_id because it's the latest ID in the backend
-    # stream, not the latest the frontend received — the gap causes lost
-    # messages. The frontend deduplicates replayed content.
-    subscriber_queue = await stream_registry.subscribe_to_session(
-        session_id=session_id,
+    subscriber_queue = await stream_registry.subscribe_to_task(
+        task_id=active_task.task_id,
        user_id=user_id,
-        last_message_id="0-0",
+        last_message_id="0-0",  # Full replay so useChat rebuilds the message
    )

    if subscriber_queue is None:
@@ -749,7 +674,7 @@ async def resume_session_stream(
        try:
            while True:
                try:
-                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=10.0)
+                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=30.0)
                    if chunk_count < 3:
                        logger.info(
                            "Resume stream chunk",
@@ -773,12 +698,12 @@ async def resume_session_stream(
            logger.error(f"Error in resume stream for session {session_id}: {e}")
        finally:
            try:
-                await stream_registry.unsubscribe_from_session(
-                    session_id, subscriber_queue
+                await stream_registry.unsubscribe_from_task(
+                    active_task.task_id, subscriber_queue
                )
            except Exception as unsub_err:
                logger.error(
-                    f"Error unsubscribing from session {active_session.session_id}: {unsub_err}",
+                    f"Error unsubscribing from task {active_task.task_id}: {unsub_err}",
                    exc_info=True,
                )
            logger.info(
@@ -806,6 +731,7 @@ async def resume_session_stream(
@router.patch(
    "/sessions/{session_id}/assign-user",
    dependencies=[Security(auth.requires_user)],
+    status_code=200,
 )
 async def session_assign_user(
    session_id: str,
@@ -828,6 +754,229 @@ async def session_assign_user(
    return {"status": "ok"}


+# ========== Task Streaming (SSE Reconnection) ==========
+
+
+@router.get(
+    "/tasks/{task_id}/stream",
+)
+async def stream_task(
+    task_id: str,
+    user_id: str | None = Depends(auth.get_user_id),
+    last_message_id: str = Query(
+        default="0-0",
+        description="Last Redis Stream message ID received (e.g., '1706540123456-0'). Use '0-0' for full replay.",
+    ),
+):
+    """
+    Reconnect to a long-running task's SSE stream.
+
+    When a long-running operation (like agent generation) starts, the client
+    receives a task_id. If the connection drops, the client can reconnect
+    using this endpoint to resume receiving updates.
+
+    Args:
+        task_id: The task ID from the operation_started response.
+        user_id: Authenticated user ID for ownership validation.
+        last_message_id: Last Redis Stream message ID received ("0-0" for full replay).
+
+    Returns:
+        StreamingResponse: SSE-formatted response chunks starting after last_message_id.
+
+    Raises:
+        HTTPException: 404 if task not found, 410 if task expired, 403 if access denied.
+    """
+    # Check task existence and expiry before subscribing
+    task, error_code = await stream_registry.get_task_with_expiry_info(task_id)
+
+    if error_code == "TASK_EXPIRED":
+        raise HTTPException(
+            status_code=410,
+            detail={
+                "code": "TASK_EXPIRED",
+                "message": "This operation has expired. Please try again.",
+            },
+        )
+
+    if error_code == "TASK_NOT_FOUND":
+        raise HTTPException(
+            status_code=404,
+            detail={
+                "code": "TASK_NOT_FOUND",
+                "message": f"Task {task_id} not found.",
+            },
+        )
+
+    # Validate ownership if task has an owner
+    if task and task.user_id and user_id != task.user_id:
+        raise HTTPException(
+            status_code=403,
+            detail={
+                "code": "ACCESS_DENIED",
+                "message": "You do not have access to this task.",
+            },
+        )
+
+    # Get subscriber queue from stream registry
+    subscriber_queue = await stream_registry.subscribe_to_task(
+        task_id=task_id,
+        user_id=user_id,
+        last_message_id=last_message_id,
+    )
+
+    if subscriber_queue is None:
+        raise HTTPException(
+            status_code=404,
+            detail={
+                "code": "TASK_NOT_FOUND",
+                "message": f"Task {task_id} not found or access denied.",
+            },
+        )
+
+    async def event_generator() -> AsyncGenerator[str, None]:
+        heartbeat_interval = 15.0  # Send heartbeat every 15 seconds
+        try:
+            while True:
+                try:
+                    # Wait for next chunk with timeout for heartbeats
+                    chunk = await asyncio.wait_for(
+                        subscriber_queue.get(), timeout=heartbeat_interval
+                    )
+                    yield chunk.to_sse()
+
+                    # Check for finish signal
+                    if isinstance(chunk, StreamFinish):
+                        break
+                except asyncio.TimeoutError:
+                    # Send heartbeat to keep connection alive
+                    yield StreamHeartbeat().to_sse()
+        except Exception as e:
+            logger.error(f"Error in task stream {task_id}: {e}", exc_info=True)
+        finally:
+            # Unsubscribe when client disconnects or stream ends
+            try:
+                await stream_registry.unsubscribe_from_task(task_id, subscriber_queue)
+            except Exception as unsub_err:
+                logger.error(
+                    f"Error unsubscribing from task {task_id}: {unsub_err}",
+                    exc_info=True,
+                )
+            # AI SDK protocol termination - always yield even if unsubscribe fails
+            yield "data: [DONE]\n\n"
+
+    return StreamingResponse(
+        event_generator(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+            "x-vercel-ai-ui-message-stream": "v1",
+        },
+    )
+
+
+@router.get(
+    "/tasks/{task_id}",
+)
+async def get_task_status(
+    task_id: str,
+    user_id: str | None = Depends(auth.get_user_id),
+) -> dict:
+    """
+    Get the status of a long-running task.
+
+    Args:
+        task_id: The task ID to check.
+        user_id: Authenticated user ID for ownership validation.
+
+    Returns:
+        dict: Task status including task_id, status, tool_name, and operation_id.
+
+    Raises:
+        NotFoundError: If task_id is not found or user doesn't have access.
+    """
+    task = await stream_registry.get_task(task_id)
+
+    if task is None:
+        raise NotFoundError(f"Task {task_id} not found.")
+
+    # Validate ownership - if task has an owner, requester must match
+    if task.user_id and user_id != task.user_id:
+        raise NotFoundError(f"Task {task_id} not found.")
+
+    return {
+        "task_id": task.task_id,
+        "session_id": task.session_id,
+        "status": task.status,
+        "tool_name": task.tool_name,
+        "operation_id": task.operation_id,
+        "created_at": task.created_at.isoformat(),
+    }
+
+
+# ========== External Completion Webhook ==========
+
+
+@router.post(
+    "/operations/{operation_id}/complete",
+    status_code=200,
+)
+async def complete_operation(
+    operation_id: str,
+    request: OperationCompleteRequest,
+    x_api_key: str | None = Header(default=None),
+) -> dict:
+    """
+    External completion webhook for long-running operations.
+
+    Called by Agent Generator (or other services) when an operation completes.
+    This triggers the stream registry to publish completion and continue LLM generation.
+
+    Args:
+        operation_id: The operation ID to complete.
+        request: Completion payload with success status and result/error.
+        x_api_key: Internal API key for authentication.
+
+    Returns:
+        dict: Status of the completion.
+
+    Raises:
+        HTTPException: If API key is invalid or operation not found.
+    """
+    # Validate internal API key - reject if not configured or invalid
+    if not config.internal_api_key:
+        logger.error(
+            "Operation complete webhook rejected: CHAT_INTERNAL_API_KEY not configured"
+        )
+        raise HTTPException(
+            status_code=503,
+            detail="Webhook not available: internal API key not configured",
+        )
+    if x_api_key != config.internal_api_key:
+        raise HTTPException(status_code=401, detail="Invalid API key")
+
+    # Find task by operation_id
+    task = await stream_registry.find_task_by_operation_id(operation_id)
+    if task is None:
+        raise HTTPException(
+            status_code=404,
+            detail=f"Operation {operation_id} not found",
+        )
+
+    logger.info(
+        f"Received completion webhook for operation {operation_id} "
+        f"(task_id={task.task_id}, success={request.success})"
+    )
+
+    if request.success:
+        await process_operation_success(task, request.result)
+    else:
+        await process_operation_failure(task, request.error)
+
+    return {"status": "ok", "task_id": task.task_id}
+
+
 # ========== Configuration ==========


@@ -902,14 +1051,14 @@ ToolResponseUnion = (
    | AgentPreviewResponse
    | AgentSavedResponse
    | ClarificationNeededResponse
-    | SuggestedGoalResponse
    | BlockListResponse
    | BlockDetailsResponse
    | BlockOutputResponse
    | DocSearchResultsResponse
    | DocPageResponse
-    | MCPToolsDiscoveredResponse
-    | MCPToolOutputResponse
+    | OperationStartedResponse
+    | OperationPendingResponse
+    | OperationInProgressResponse
 )


--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -1,251 +0,0 @@
-"""Tests for chat API routes: session title update and file attachment validation."""
-
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import pytest
-import pytest_mock
-
-from backend.api.features.chat import routes as chat_routes
-
-app = fastapi.FastAPI()
-app.include_router(chat_routes.router)
-
-client = fastapi.testclient.TestClient(app)
-
-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-
-@pytest.fixture(autouse=True)
-def setup_app_auth(mock_jwt_user):
-    """Setup auth overrides for all tests in this module"""
-    from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _mock_update_session_title(
-    mocker: pytest_mock.MockerFixture, *, success: bool = True
-):
-    """Mock update_session_title."""
-    return mocker.patch(
-        "backend.api.features.chat.routes.update_session_title",
-        new_callable=AsyncMock,
-        return_value=success,
-    )
-
-
-# ─── Update title: success ─────────────────────────────────────────────
-
-
-def test_update_title_success(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    mock_update = _mock_update_session_title(mocker, success=True)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "My project"},
-    )
-
-    assert response.status_code == 200
-    assert response.json() == {"status": "ok"}
-    mock_update.assert_called_once_with("sess-1", test_user_id, "My project")
-
-
-def test_update_title_trims_whitespace(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    mock_update = _mock_update_session_title(mocker, success=True)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "  trimmed  "},
-    )
-
-    assert response.status_code == 200
-    mock_update.assert_called_once_with("sess-1", test_user_id, "trimmed")
-
-
-# ─── Update title: blank / whitespace-only → 422 ──────────────────────
-
-
-def test_update_title_blank_rejected(
-    test_user_id: str,
-) -> None:
-    """Whitespace-only titles must be rejected before hitting the DB."""
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "   "},
-    )
-
-    assert response.status_code == 422
-
-
-def test_update_title_empty_rejected(
-    test_user_id: str,
-) -> None:
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": ""},
-    )
-
-    assert response.status_code == 422
-
-
-# ─── Update title: session not found or wrong user → 404 ──────────────
-
-
-def test_update_title_not_found(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    _mock_update_session_title(mocker, success=False)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "New name"},
-    )
-
-    assert response.status_code == 404
-
-
-# ─── file_ids Pydantic validation ─────────────────────────────────────
-
-
-def test_stream_chat_rejects_too_many_file_ids():
-    """More than 20 file_ids should be rejected by Pydantic validation (422)."""
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(21)],
-        },
-    )
-    assert response.status_code == 422
-
-
-def _mock_stream_internals(mocker: pytest_mock.MockFixture):
-    """Mock the async internals of stream_chat_post so tests can exercise
-    validation and enrichment logic without needing Redis/RabbitMQ."""
-    mocker.patch(
-        "backend.api.features.chat.routes._validate_and_get_session",
-        return_value=None,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.append_and_save_message",
-        return_value=None,
-    )
-    mock_registry = mocker.MagicMock()
-    mock_registry.create_session = mocker.AsyncMock(return_value=None)
-    mocker.patch(
-        "backend.api.features.chat.routes.stream_registry",
-        mock_registry,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.enqueue_copilot_turn",
-        return_value=None,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.track_user_message",
-        return_value=None,
-    )
-
-
-def test_stream_chat_accepts_20_file_ids(mocker: pytest_mock.MockFixture):
-    """Exactly 20 file_ids should be accepted (not rejected by validation)."""
-    _mock_stream_internals(mocker)
-    # Patch workspace lookup as imported by the routes module
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "ws-1"})(),
-    )
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(20)],
-        },
-    )
-    # Should get past validation — 200 streaming response expected
-    assert response.status_code == 200
-
-
-# ─── UUID format filtering ─────────────────────────────────────────────
-
-
-def test_file_ids_filters_invalid_uuids(mocker: pytest_mock.MockFixture):
-    """Non-UUID strings in file_ids should be silently filtered out
-    and NOT passed to the database query."""
-    _mock_stream_internals(mocker)
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "ws-1"})(),
-    )
-
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    valid_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
-    client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [
-                valid_id,
-                "not-a-uuid",
-                "../../../etc/passwd",
-                "",
-            ],
-        },
-    )
-
-    # The find_many call should only receive the one valid UUID
-    mock_prisma.find_many.assert_called_once()
-    call_kwargs = mock_prisma.find_many.call_args[1]
-    assert call_kwargs["where"]["id"]["in"] == [valid_id]
-
-
-# ─── Cross-workspace file_ids ─────────────────────────────────────────
-
-
-def test_file_ids_scoped_to_workspace(mocker: pytest_mock.MockFixture):
-    """The batch query should scope to the user's workspace."""
-    _mock_stream_internals(mocker)
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "my-workspace-id"})(),
-    )
-
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    fid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
-    client.post(
-        "/sessions/sess-1/stream",
-        json={"message": "hi", "file_ids": [fid]},
-    )
-
-    call_kwargs = mock_prisma.find_many.call_args[1]
-    assert call_kwargs["where"]["workspaceId"] == "my-workspace-id"
-    assert call_kwargs["where"]["isDeleted"] is False
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/init.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/init.py
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/response_adapter.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/response_adapter.py
@@ -0,0 +1,203 @@
+"""Response adapter for converting Claude Agent SDK messages to Vercel AI SDK format.
+
+This module provides the adapter layer that converts streaming messages from
+the Claude Agent SDK into the Vercel AI SDK UI Stream Protocol format that
+the frontend expects.
+"""
+
+import json
+import logging
+import uuid
+
+from claude_agent_sdk import (
+    AssistantMessage,
+    Message,
+    ResultMessage,
+    SystemMessage,
+    TextBlock,
+    ToolResultBlock,
+    ToolUseBlock,
+    UserMessage,
+)
+
+from backend.api.features.chat.response_model import (
+    StreamBaseResponse,
+    StreamError,
+    StreamFinish,
+    StreamFinishStep,
+    StreamStart,
+    StreamStartStep,
+    StreamTextDelta,
+    StreamTextEnd,
+    StreamTextStart,
+    StreamToolInputAvailable,
+    StreamToolInputStart,
+    StreamToolOutputAvailable,
+)
+from backend.api.features.chat.sdk.tool_adapter import (
+    MCP_TOOL_PREFIX,
+    pop_pending_tool_output,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class SDKResponseAdapter:
+    """Adapter for converting Claude Agent SDK messages to Vercel AI SDK format.
+
+    This class maintains state during a streaming session to properly track
+    text blocks, tool calls, and message lifecycle.
+    """
+
+    def __init__(self, message_id: str | None = None):
+        self.message_id = message_id or str(uuid.uuid4())
+        self.text_block_id = str(uuid.uuid4())
+        self.has_started_text = False
+        self.has_ended_text = False
+        self.current_tool_calls: dict[str, dict[str, str]] = {}
+        self.task_id: str | None = None
+        self.step_open = False
+
+    def set_task_id(self, task_id: str) -> None:
+        """Set the task ID for reconnection support."""
+        self.task_id = task_id
+
+    def convert_message(self, sdk_message: Message) -> list[StreamBaseResponse]:
+        """Convert a single SDK message to Vercel AI SDK format."""
+        responses: list[StreamBaseResponse] = []
+
+        if isinstance(sdk_message, SystemMessage):
+            if sdk_message.subtype == "init":
+                responses.append(
+                    StreamStart(messageId=self.message_id, taskId=self.task_id)
+                )
+                # Open the first step (matches non-SDK: StreamStart then StreamStartStep)
+                responses.append(StreamStartStep())
+                self.step_open = True
+
+        elif isinstance(sdk_message, AssistantMessage):
+            # After tool results, the SDK sends a new AssistantMessage for the
+            # next LLM turn. Open a new step if the previous one was closed.
+            if not self.step_open:
+                responses.append(StreamStartStep())
+                self.step_open = True
+
+            for block in sdk_message.content:
+                if isinstance(block, TextBlock):
+                    if block.text:
+                        self._ensure_text_started(responses)
+                        responses.append(
+                            StreamTextDelta(id=self.text_block_id, delta=block.text)
+                        )
+
+                elif isinstance(block, ToolUseBlock):
+                    self._end_text_if_open(responses)
+
+                    # Strip MCP prefix so frontend sees "find_block"
+                    # instead of "mcp__copilot__find_block".
+                    tool_name = block.name.removeprefix(MCP_TOOL_PREFIX)
+
+                    responses.append(
+                        StreamToolInputStart(toolCallId=block.id, toolName=tool_name)
+                    )
+                    responses.append(
+                        StreamToolInputAvailable(
+                            toolCallId=block.id,
+                            toolName=tool_name,
+                            input=block.input,
+                        )
+                    )
+                    self.current_tool_calls[block.id] = {"name": tool_name}
+
+        elif isinstance(sdk_message, UserMessage):
+            # UserMessage carries tool results back from tool execution.
+            content = sdk_message.content
+            blocks = content if isinstance(content, list) else []
+            for block in blocks:
+                if isinstance(block, ToolResultBlock) and block.tool_use_id:
+                    tool_info = self.current_tool_calls.get(block.tool_use_id, {})
+                    tool_name = tool_info.get("name", "unknown")
+
+                    # Prefer the stashed full output over the SDK's
+                    # (potentially truncated) ToolResultBlock content.
+                    # The SDK truncates large results, writing them to disk,
+                    # which breaks frontend widget parsing.
+                    output = pop_pending_tool_output(tool_name) or (
+                        _extract_tool_output(block.content)
+                    )
+
+                    responses.append(
+                        StreamToolOutputAvailable(
+                            toolCallId=block.tool_use_id,
+                            toolName=tool_name,
+                            output=output,
+                            success=not (block.is_error or False),
+                        )
+                    )
+
+            # Close the current step after tool results — the next
+            # AssistantMessage will open a new step for the continuation.
+            if self.step_open:
+                responses.append(StreamFinishStep())
+                self.step_open = False
+
+        elif isinstance(sdk_message, ResultMessage):
+            self._end_text_if_open(responses)
+            # Close the step before finishing.
+            if self.step_open:
+                responses.append(StreamFinishStep())
+                self.step_open = False
+
+            if sdk_message.subtype == "success":
+                responses.append(StreamFinish())
+            elif sdk_message.subtype in ("error", "error_during_execution"):
+                error_msg = getattr(sdk_message, "result", None) or "Unknown error"
+                responses.append(
+                    StreamError(errorText=str(error_msg), code="sdk_error")
+                )
+                responses.append(StreamFinish())
+            else:
+                logger.warning(
+                    f"Unexpected ResultMessage subtype: {sdk_message.subtype}"
+                )
+                responses.append(StreamFinish())
+
+        else:
+            logger.debug(f"Unhandled SDK message type: {type(sdk_message).__name__}")
+
+        return responses
+
+    def _ensure_text_started(self, responses: list[StreamBaseResponse]) -> None:
+        """Start (or restart) a text block if needed."""
+        if not self.has_started_text or self.has_ended_text:
+            if self.has_ended_text:
+                self.text_block_id = str(uuid.uuid4())
+                self.has_ended_text = False
+            responses.append(StreamTextStart(id=self.text_block_id))
+            self.has_started_text = True
+
+    def _end_text_if_open(self, responses: list[StreamBaseResponse]) -> None:
+        """End the current text block if one is open."""
+        if self.has_started_text and not self.has_ended_text:
+            responses.append(StreamTextEnd(id=self.text_block_id))
+            self.has_ended_text = True
+
+
+def _extract_tool_output(content: str | list[dict[str, str]] | None) -> str:
+    """Extract a string output from a ToolResultBlock's content field."""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts = [item.get("text", "") for item in content if item.get("type") == "text"]
+        if parts:
+            return "".join(parts)
+        try:
+            return json.dumps(content)
+        except (TypeError, ValueError):
+            return str(content)
+    if content is None:
+        return ""
+    try:
+        return json.dumps(content)
+    except (TypeError, ValueError):
+        return str(content)
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/response_adapter_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/response_adapter_test.py
@@ -1,8 +1,5 @@
 """Unit tests for the SDK response adapter."""

-import asyncio
-
-import pytest
 from claude_agent_sdk import (
    AssistantMessage,
    ResultMessage,
@@ -13,7 +10,7 @@ from claude_agent_sdk import (
    UserMessage,
 )

-from backend.copilot.response_model import (
+from backend.api.features.chat.response_model import (
    StreamBaseResponse,
    StreamError,
    StreamFinish,
@@ -30,14 +27,12 @@ from backend.copilot.response_model import (

 from .response_adapter import SDKResponseAdapter
 from .tool_adapter import MCP_TOOL_PREFIX
-from .tool_adapter import _pending_tool_outputs as _pto
-from .tool_adapter import _stash_event
-from .tool_adapter import stash_pending_tool_output as _stash
-from .tool_adapter import wait_for_stash


 def _adapter() -> SDKResponseAdapter:
-    return SDKResponseAdapter(message_id="msg-1", session_id="session-1")
+    a = SDKResponseAdapter(message_id="msg-1")
+    a.set_task_id("task-1")
+    return a


 # -- SystemMessage -----------------------------------------------------------
@@ -49,7 +44,7 @@ def test_system_init_emits_start_and_step():
    assert len(results) == 2
    assert isinstance(results[0], StreamStart)
    assert results[0].messageId == "msg-1"
-    assert results[0].sessionId == "session-1"
+    assert results[0].taskId == "task-1"
    assert isinstance(results[1], StreamStartStep)


@@ -369,314 +364,3 @@ def test_full_conversation_flow():
        "StreamFinishStep",  # step 2 closed
        "StreamFinish",
    ]
-
-
-# -- Flush unresolved tool calls --------------------------------------------
-
-
-def test_flush_unresolved_at_result_message():
-    """Built-in tools (WebSearch) without UserMessage results get flushed at ResultMessage."""
-    adapter = _adapter()
-    all_responses: list[StreamBaseResponse] = []
-
-    # 1. Init
-    all_responses.extend(
-        adapter.convert_message(SystemMessage(subtype="init", data={}))
-    )
-    # 2. Tool use (built-in tool — no MCP prefix)
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[
-                    ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
-                ],
-                model="test",
-            )
-        )
-    )
-    # 3. No UserMessage for this tool — go straight to ResultMessage
-    all_responses.extend(
-        adapter.convert_message(
-            ResultMessage(
-                subtype="success",
-                duration_ms=100,
-                duration_api_ms=50,
-                is_error=False,
-                num_turns=1,
-                session_id="s1",
-            )
-        )
-    )
-
-    types = [type(r).__name__ for r in all_responses]
-    assert types == [
-        "StreamStart",
-        "StreamStartStep",
-        "StreamToolInputStart",
-        "StreamToolInputAvailable",
-        "StreamToolOutputAvailable",  # flushed with empty output
-        "StreamFinishStep",  # step closed by flush
-        "StreamFinish",
-    ]
-    # The flushed output should be empty (no stash available)
-    output_event = [
-        r for r in all_responses if isinstance(r, StreamToolOutputAvailable)
-    ][0]
-    assert output_event.toolCallId == "ws-1"
-    assert output_event.toolName == "WebSearch"
-    assert output_event.output == ""
-
-
-def test_flush_unresolved_at_next_assistant_message():
-    """Built-in tools get flushed when the next AssistantMessage arrives."""
-    adapter = _adapter()
-    all_responses: list[StreamBaseResponse] = []
-
-    # 1. Init
-    all_responses.extend(
-        adapter.convert_message(SystemMessage(subtype="init", data={}))
-    )
-    # 2. Tool use (built-in — no UserMessage will come)
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[
-                    ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
-                ],
-                model="test",
-            )
-        )
-    )
-    # 3. Next AssistantMessage triggers flush before processing its blocks
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[TextBlock(text="Here are the results")], model="test"
-            )
-        )
-    )
-
-    types = [type(r).__name__ for r in all_responses]
-    assert types == [
-        "StreamStart",
-        "StreamStartStep",
-        "StreamToolInputStart",
-        "StreamToolInputAvailable",
-        # Flush at next AssistantMessage:
-        "StreamToolOutputAvailable",
-        "StreamFinishStep",  # step closed by flush
-        # New step for continuation text:
-        "StreamStartStep",
-        "StreamTextStart",
-        "StreamTextDelta",
-    ]
-
-
-def test_flush_with_stashed_output():
-    """Stashed output from PostToolUse hook is used when flushing."""
-    adapter = _adapter()
-
-    # Simulate PostToolUse hook stashing output
-    _pto.set({})
-    _stash("WebSearch", "Search result: 5 items found")
-
-    all_responses: list[StreamBaseResponse] = []
-
-    # Tool use
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[
-                    ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
-                ],
-                model="test",
-            )
-        )
-    )
-    # ResultMessage triggers flush
-    all_responses.extend(
-        adapter.convert_message(
-            ResultMessage(
-                subtype="success",
-                duration_ms=100,
-                duration_api_ms=50,
-                is_error=False,
-                num_turns=1,
-                session_id="s1",
-            )
-        )
-    )
-
-    output_events = [
-        r for r in all_responses if isinstance(r, StreamToolOutputAvailable)
-    ]
-    assert len(output_events) == 1
-    assert output_events[0].output == "Search result: 5 items found"
-
-    # Cleanup
-    _pto.set({})  # type: ignore[arg-type]
-
-
-# -- wait_for_stash synchronisation tests --
-
-
-@pytest.mark.asyncio
-async def test_wait_for_stash_signaled():
-    """wait_for_stash returns True when stash_pending_tool_output signals."""
-    _pto.set({})
-    event = asyncio.Event()
-    _stash_event.set(event)
-
-    # Simulate a PostToolUse hook that stashes output after a short delay
-    async def delayed_stash():
-        await asyncio.sleep(0.01)
-        _stash("WebSearch", "result data")
-
-    asyncio.create_task(delayed_stash())
-    result = await wait_for_stash(timeout=1.0)
-
-    assert result is True
-    pto = _pto.get()
-    assert pto is not None
-    assert pto.get("WebSearch") == ["result data"]
-
-    # Cleanup
-    _pto.set({})
-    _stash_event.set(None)
-
-
-@pytest.mark.asyncio
-async def test_wait_for_stash_timeout():
-    """wait_for_stash returns False on timeout when no stash occurs."""
-    _pto.set({})
-    event = asyncio.Event()
-    _stash_event.set(event)
-
-    result = await wait_for_stash(timeout=0.05)
-    assert result is False
-
-    # Cleanup
-    _pto.set({})
-    _stash_event.set(None)
-
-
-@pytest.mark.asyncio
-async def test_wait_for_stash_already_stashed():
-    """wait_for_stash picks up a stash that happened just before the wait."""
-    _pto.set({})
-    event = asyncio.Event()
-    _stash_event.set(event)
-
-    # Stash before waiting — simulates hook completing before message arrives
-    _stash("Read", "file contents")
-    # Event is now set; wait_for_stash detects the fast path and returns
-    # immediately without timing out.
-    result = await wait_for_stash(timeout=0.05)
-    assert result is True
-
-    # But the stash itself is populated
-    pto = _pto.get()
-    assert pto is not None
-    assert pto.get("Read") == ["file contents"]
-
-    # Cleanup
-    _pto.set({})
-    _stash_event.set(None)
-
-
-# -- Parallel tool call tests --
-
-
-def test_parallel_tool_calls_not_flushed_prematurely():
-    """Parallel tool calls should NOT be flushed when the next AssistantMessage
-    only contains ToolUseBlocks (parallel continuation)."""
-    adapter = SDKResponseAdapter()
-
-    # Init
-    adapter.convert_message(SystemMessage(subtype="init", data={}))
-
-    # First AssistantMessage: tool call #1
-    msg1 = AssistantMessage(
-        content=[ToolUseBlock(id="t1", name="WebSearch", input={"q": "foo"})],
-        model="test",
-    )
-    r1 = adapter.convert_message(msg1)
-    assert any(isinstance(r, StreamToolInputAvailable) for r in r1)
-    assert adapter.has_unresolved_tool_calls
-
-    # Second AssistantMessage: tool call #2 (parallel continuation)
-    msg2 = AssistantMessage(
-        content=[ToolUseBlock(id="t2", name="WebSearch", input={"q": "bar"})],
-        model="test",
-    )
-    r2 = adapter.convert_message(msg2)
-
-    # No flush should have happened — t1 should NOT have StreamToolOutputAvailable
-    output_events = [r for r in r2 if isinstance(r, StreamToolOutputAvailable)]
-    assert len(output_events) == 0, (
-        f"Tool-only AssistantMessage should not flush prior tools, "
-        f"but got {len(output_events)} output events"
-    )
-
-    # Both t1 and t2 should still be unresolved
-    assert "t1" not in adapter.resolved_tool_calls
-    assert "t2" not in adapter.resolved_tool_calls
-
-
-def test_text_assistant_message_flushes_prior_tools():
-    """An AssistantMessage with text (new turn) should flush unresolved tools."""
-    adapter = SDKResponseAdapter()
-
-    # Init
-    adapter.convert_message(SystemMessage(subtype="init", data={}))
-
-    # Tool call
-    msg1 = AssistantMessage(
-        content=[ToolUseBlock(id="t1", name="WebSearch", input={"q": "foo"})],
-        model="test",
-    )
-    adapter.convert_message(msg1)
-    assert adapter.has_unresolved_tool_calls
-
-    # Text AssistantMessage (new turn after tools completed)
-    msg2 = AssistantMessage(
-        content=[TextBlock(text="Here are the results")],
-        model="test",
-    )
-    r2 = adapter.convert_message(msg2)
-
-    # Flush SHOULD have happened — t1 gets empty output
-    output_events = [r for r in r2 if isinstance(r, StreamToolOutputAvailable)]
-    assert len(output_events) == 1
-    assert output_events[0].toolCallId == "t1"
-    assert "t1" in adapter.resolved_tool_calls
-
-
-def test_already_resolved_tool_skipped_in_user_message():
-    """A tool result in UserMessage should be skipped if already resolved by flush."""
-    adapter = SDKResponseAdapter()
-
-    adapter.convert_message(SystemMessage(subtype="init", data={}))
-
-    # Tool call + flush via text message
-    adapter.convert_message(
-        AssistantMessage(
-            content=[ToolUseBlock(id="t1", name="WebSearch", input={})],
-            model="test",
-        )
-    )
-    adapter.convert_message(
-        AssistantMessage(
-            content=[TextBlock(text="Done")],
-            model="test",
-        )
-    )
-    assert "t1" in adapter.resolved_tool_calls
-
-    # Now UserMessage arrives with the real result — should be skipped
-    user_msg = UserMessage(content=[ToolResultBlock(tool_use_id="t1", content="real")])
-    r = adapter.convert_message(user_msg)
-    output_events = [r_ for r_ in r if isinstance(r_, StreamToolOutputAvailable)]
-    assert (
-        len(output_events) == 0
-    ), "Already-resolved tool should not emit duplicate output"
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/security_hooks.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/security_hooks.py
@@ -6,18 +6,16 @@ ensuring multi-user isolation and preventing unauthorized operations.

 import json
 import logging
+import os
 import re
 from collections.abc import Callable
 from typing import Any, cast

-from backend.copilot.context import is_allowed_local_path
-
-from .tool_adapter import (
+from backend.api.features.chat.sdk.tool_adapter import (
    BLOCKED_TOOLS,
    DANGEROUS_PATTERNS,
    MCP_TOOL_PREFIX,
    WORKSPACE_SCOPED_TOOLS,
-    stash_pending_tool_output,
 )

 logger = logging.getLogger(__name__)
@@ -39,20 +37,40 @@ def _validate_workspace_path(
 ) -> dict[str, Any]:
    """Validate that a workspace-scoped tool only accesses allowed paths.

-    Delegates to :func:`is_allowed_local_path` which permits:
+    Allowed directories:
    - The SDK working directory (``/tmp/copilot-<session>/``)
-    - The current session's tool-results directory
-      (``~/.claude/projects/<encoded-cwd>/tool-results/``)
+    - The SDK tool-results directory (``~/.claude/projects/…/tool-results/``)
    """
    path = tool_input.get("file_path") or tool_input.get("path") or ""
    if not path:
        # Glob/Grep without a path default to cwd which is already sandboxed
        return {}

-    if is_allowed_local_path(path, sdk_cwd):
+    # Resolve relative paths against sdk_cwd (the SDK sets cwd so the LLM
+    # naturally uses relative paths like "test.txt" instead of absolute ones).
+    # Tilde paths (~/) are home-dir references, not relative — expand first.
+    if path.startswith("~"):
+        resolved = os.path.realpath(os.path.expanduser(path))
+    elif not os.path.isabs(path) and sdk_cwd:
+        resolved = os.path.realpath(os.path.join(sdk_cwd, path))
+    else:
+        resolved = os.path.realpath(path)
+
+    # Allow access within the SDK working directory
+    if sdk_cwd:
+        norm_cwd = os.path.realpath(sdk_cwd)
+        if resolved.startswith(norm_cwd + os.sep) or resolved == norm_cwd:
+            return {}
+
+    # Allow access to ~/.claude/projects/*/tool-results/ (big tool results)
+    claude_dir = os.path.realpath(os.path.expanduser("~/.claude/projects"))
+    tool_results_seg = os.sep + "tool-results" + os.sep
+    if resolved.startswith(claude_dir + os.sep) and tool_results_seg in resolved:
        return {}

-    logger.warning(f"Blocked {tool_name} outside workspace: {path}")
+    logger.warning(
+        f"Blocked {tool_name} outside workspace: {path} (resolved={resolved})"
+    )
    workspace_hint = f" Allowed workspace: {sdk_cwd}" if sdk_cwd else ""
    return _deny(
        f"[SECURITY] Tool '{tool_name}' can only access files within the workspace "
@@ -105,20 +123,20 @@ def _validate_user_isolation(
    """Validate that tool calls respect user isolation."""
    # For workspace file tools, ensure path doesn't escape
    if "workspace" in tool_name.lower():
-        # The "path" param is a cloud storage key (e.g. "/ASEAN/report.md")
-        # where a leading "/" is normal.  Only check for ".." traversal.
-        # Filesystem paths (source_path, save_to_path) are validated inside
-        # the tool itself via _validate_ephemeral_path.
        path = tool_input.get("path", "") or tool_input.get("file_path", "")
-        if path and ".." in path:
-            logger.warning(f"Blocked path traversal attempt: {path} by user {user_id}")
-            return {
-                "hookSpecificOutput": {
-                    "hookEventName": "PreToolUse",
-                    "permissionDecision": "deny",
-                    "permissionDecisionReason": "Path traversal not allowed",
+        if path:
+            # Check for path traversal
+            if ".." in path or path.startswith("/"):
+                logger.warning(
+                    f"Blocked path traversal attempt: {path} by user {user_id}"
+                )
+                return {
+                    "hookSpecificOutput": {
+                        "hookEventName": "PreToolUse",
+                        "permissionDecision": "deny",
+                        "permissionDecisionReason": "Path traversal not allowed",
+                    }
                }
-            }

    return {}

@@ -127,7 +145,7 @@ def create_security_hooks(
    user_id: str | None,
    sdk_cwd: str | None = None,
    max_subtasks: int = 3,
-    on_compact: Callable[[], None] | None = None,
+    on_stop: Callable[[str, str], None] | None = None,
 ) -> dict[str, Any]:
    """Create the security hooks configuration for Claude Agent SDK.

@@ -136,12 +154,15 @@ def create_security_hooks(
    - PostToolUse: Log successful tool executions
    - PostToolUseFailure: Log and handle failed tool executions
    - PreCompact: Log context compaction events (SDK handles compaction automatically)
+    - Stop: Capture transcript path for stateless resume (when *on_stop* is provided)

    Args:
        user_id: Current user ID for isolation validation
        sdk_cwd: SDK working directory for workspace-scoped tool validation
-        max_subtasks: Maximum concurrent Task (sub-agent) spawns allowed per session
-        on_compact: Callback invoked when SDK starts compacting context.
+        max_subtasks: Maximum Task (sub-agent) spawns allowed per session
+        on_stop: Callback ``(transcript_path, sdk_session_id)`` invoked when
+            the SDK finishes processing — used to read the JSONL transcript
+            before the CLI process exits.

    Returns:
        Hooks configuration dict for ClaudeAgentOptions
@@ -150,9 +171,8 @@ def create_security_hooks(
        from claude_agent_sdk import HookMatcher
        from claude_agent_sdk.types import HookContext, HookInput, SyncHookJSONOutput

-        # Per-session tracking for Task sub-agent concurrency.
-        # Set of tool_use_ids that consumed a slot — len() is the active count.
-        task_tool_use_ids: set[str] = set()
+        # Per-session counter for Task sub-agent spawns
+        task_spawn_count = 0

        async def pre_tool_use_hook(
            input_data: HookInput,
@@ -160,34 +180,23 @@ def create_security_hooks(
            context: HookContext,
        ) -> SyncHookJSONOutput:
            """Combined pre-tool-use validation hook."""
+            nonlocal task_spawn_count
            _ = context  # unused but required by signature
            tool_name = cast(str, input_data.get("tool_name", ""))
            tool_input = cast(dict[str, Any], input_data.get("tool_input", {}))

            # Rate-limit Task (sub-agent) spawns per session
            if tool_name == "Task":
-                # Block background task execution first — denied calls
-                # should not consume a subtask slot.
-                if tool_input.get("run_in_background"):
-                    logger.info(f"[SDK] Blocked background Task, user={user_id}")
-                    return cast(
-                        SyncHookJSONOutput,
-                        _deny(
-                            "Background task execution is not supported. "
-                            "Run tasks in the foreground instead "
-                            "(remove the run_in_background parameter)."
-                        ),
-                    )
-                if len(task_tool_use_ids) >= max_subtasks:
+                task_spawn_count += 1
+                if task_spawn_count > max_subtasks:
                    logger.warning(
                        f"[SDK] Task limit reached ({max_subtasks}), user={user_id}"
                    )
                    return cast(
                        SyncHookJSONOutput,
                        _deny(
-                            f"Maximum {max_subtasks} concurrent sub-tasks. "
-                            "Wait for running sub-tasks to finish, "
-                            "or continue in the main conversation."
+                            f"Maximum {max_subtasks} sub-tasks per session. "
+                            "Please continue in the main conversation."
                        ),
                    )

@@ -207,68 +216,18 @@ def create_security_hooks(
            if result:
                return cast(SyncHookJSONOutput, result)

-            # Reserve the Task slot only after all validations pass
-            if tool_name == "Task" and tool_use_id is not None:
-                task_tool_use_ids.add(tool_use_id)
-
            logger.debug(f"[SDK] Tool start: {tool_name}, user={user_id}")
            return cast(SyncHookJSONOutput, {})

-        def _release_task_slot(tool_name: str, tool_use_id: str | None) -> None:
-            """Release a Task concurrency slot if one was reserved."""
-            if tool_name == "Task" and tool_use_id in task_tool_use_ids:
-                task_tool_use_ids.discard(tool_use_id)
-                logger.info(
-                    "[SDK] Task slot released, active=%d/%d, user=%s",
-                    len(task_tool_use_ids),
-                    max_subtasks,
-                    user_id,
-                )
-
        async def post_tool_use_hook(
            input_data: HookInput,
            tool_use_id: str | None,
            context: HookContext,
        ) -> SyncHookJSONOutput:
-            """Log successful tool executions and stash SDK built-in tool outputs.
-
-            MCP tools stash their output in ``_execute_tool_sync`` before the
-            SDK can truncate it.  SDK built-in tools (WebSearch, Read, etc.)
-            are executed by the CLI internally — this hook captures their
-            output so the response adapter can forward it to the frontend.
-            """
+            """Log successful tool executions for observability."""
            _ = context
            tool_name = cast(str, input_data.get("tool_name", ""))
-
-            _release_task_slot(tool_name, tool_use_id)
-            is_builtin = not tool_name.startswith(MCP_TOOL_PREFIX)
-            logger.info(
-                "[SDK] PostToolUse: %s (builtin=%s, tool_use_id=%s)",
-                tool_name,
-                is_builtin,
-                (tool_use_id or "")[:12],
-            )
-
-            # Stash output for SDK built-in tools so the response adapter can
-            # emit StreamToolOutputAvailable even when the CLI doesn't surface
-            # a separate UserMessage with ToolResultBlock content.
-            if is_builtin:
-                tool_response = input_data.get("tool_response")
-                if tool_response is not None:
-                    resp_preview = str(tool_response)[:100]
-                    logger.info(
-                        "[SDK] Stashing builtin output for %s (%d chars): %s...",
-                        tool_name,
-                        len(str(tool_response)),
-                        resp_preview,
-                    )
-                    stash_pending_tool_output(tool_name, tool_response)
-                else:
-                    logger.warning(
-                        "[SDK] PostToolUse for builtin %s but tool_response is None",
-                        tool_name,
-                    )
-
+            logger.debug(f"[SDK] Tool success: {tool_name}, tool_use_id={tool_use_id}")
            return cast(SyncHookJSONOutput, {})

        async def post_tool_failure_hook(
@@ -284,9 +243,6 @@ def create_security_hooks(
                f"[SDK] Tool failed: {tool_name}, error={error}, "
                f"user={user_id}, tool_use_id={tool_use_id}"
            )
-
-            _release_task_slot(tool_name, tool_use_id)
-
            return cast(SyncHookJSONOutput, {})

        async def pre_compact_hook(
@@ -304,8 +260,30 @@ def create_security_hooks(
            logger.info(
                f"[SDK] Context compaction triggered: {trigger}, user={user_id}"
            )
-            if on_compact is not None:
-                on_compact()
+            return cast(SyncHookJSONOutput, {})
+
+        # --- Stop hook: capture transcript path for stateless resume ---
+        async def stop_hook(
+            input_data: HookInput,
+            tool_use_id: str | None,
+            context: HookContext,
+        ) -> SyncHookJSONOutput:
+            """Capture transcript path when SDK finishes processing.
+
+            The Stop hook fires while the CLI process is still alive, giving us
+            a reliable window to read the JSONL transcript before SIGTERM.
+            """
+            _ = context, tool_use_id
+            transcript_path = cast(str, input_data.get("transcript_path", ""))
+            sdk_session_id = cast(str, input_data.get("session_id", ""))
+
+            if transcript_path and on_stop:
+                logger.info(
+                    f"[SDK] Stop hook: transcript_path={transcript_path}, "
+                    f"sdk_session_id={sdk_session_id[:12]}..."
+                )
+                on_stop(transcript_path, sdk_session_id)
+
            return cast(SyncHookJSONOutput, {})

        hooks: dict[str, Any] = {
@@ -317,6 +295,9 @@ def create_security_hooks(
            "PreCompact": [HookMatcher(matcher="*", hooks=[pre_compact_hook])],
        }

+        if on_stop is not None:
+            hooks["Stop"] = [HookMatcher(matcher=None, hooks=[stop_hook])]
+
        return hooks
    except ImportError:
        # Fallback for when SDK isn't available - return empty hooks
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/security_hooks_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/security_hooks_test.py
@@ -0,0 +1,165 @@
+"""Unit tests for SDK security hooks."""
+
+import os
+
+from .security_hooks import _validate_tool_access, _validate_user_isolation
+
+SDK_CWD = "/tmp/copilot-abc123"
+
+
+def _is_denied(result: dict) -> bool:
+    hook = result.get("hookSpecificOutput", {})
+    return hook.get("permissionDecision") == "deny"
+
+
+# -- Blocked tools -----------------------------------------------------------
+
+
+def test_blocked_tools_denied():
+    for tool in ("bash", "shell", "exec", "terminal", "command"):
+        result = _validate_tool_access(tool, {})
+        assert _is_denied(result), f"{tool} should be blocked"
+
+
+def test_unknown_tool_allowed():
+    result = _validate_tool_access("SomeCustomTool", {})
+    assert result == {}
+
+
+# -- Workspace-scoped tools --------------------------------------------------
+
+
+def test_read_within_workspace_allowed():
+    result = _validate_tool_access(
+        "Read", {"file_path": f"{SDK_CWD}/file.txt"}, sdk_cwd=SDK_CWD
+    )
+    assert result == {}
+
+
+def test_write_within_workspace_allowed():
+    result = _validate_tool_access(
+        "Write", {"file_path": f"{SDK_CWD}/output.json"}, sdk_cwd=SDK_CWD
+    )
+    assert result == {}
+
+
+def test_edit_within_workspace_allowed():
+    result = _validate_tool_access(
+        "Edit", {"file_path": f"{SDK_CWD}/src/main.py"}, sdk_cwd=SDK_CWD
+    )
+    assert result == {}
+
+
+def test_glob_within_workspace_allowed():
+    result = _validate_tool_access("Glob", {"path": f"{SDK_CWD}/src"}, sdk_cwd=SDK_CWD)
+    assert result == {}
+
+
+def test_grep_within_workspace_allowed():
+    result = _validate_tool_access("Grep", {"path": f"{SDK_CWD}/src"}, sdk_cwd=SDK_CWD)
+    assert result == {}
+
+
+def test_read_outside_workspace_denied():
+    result = _validate_tool_access(
+        "Read", {"file_path": "/etc/passwd"}, sdk_cwd=SDK_CWD
+    )
+    assert _is_denied(result)
+
+
+def test_write_outside_workspace_denied():
+    result = _validate_tool_access(
+        "Write", {"file_path": "/home/user/secrets.txt"}, sdk_cwd=SDK_CWD
+    )
+    assert _is_denied(result)
+
+
+def test_traversal_attack_denied():
+    result = _validate_tool_access(
+        "Read",
+        {"file_path": f"{SDK_CWD}/../../etc/passwd"},
+        sdk_cwd=SDK_CWD,
+    )
+    assert _is_denied(result)
+
+
+def test_no_path_allowed():
+    """Glob/Grep without a path argument defaults to cwd — should pass."""
+    result = _validate_tool_access("Glob", {}, sdk_cwd=SDK_CWD)
+    assert result == {}
+
+
+def test_read_no_cwd_denies_absolute():
+    """If no sdk_cwd is set, absolute paths are denied."""
+    result = _validate_tool_access("Read", {"file_path": "/tmp/anything"})
+    assert _is_denied(result)
+
+
+# -- Tool-results directory --------------------------------------------------
+
+
+def test_read_tool_results_allowed():
+    home = os.path.expanduser("~")
+    path = f"{home}/.claude/projects/-tmp-copilot-abc123/tool-results/12345.txt"
+    result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
+    assert result == {}
+
+
+def test_read_claude_projects_without_tool_results_denied():
+    home = os.path.expanduser("~")
+    path = f"{home}/.claude/projects/-tmp-copilot-abc123/settings.json"
+    result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
+    assert _is_denied(result)
+
+
+# -- Built-in Bash is blocked (use bash_exec MCP tool instead) ---------------
+
+
+def test_bash_builtin_always_blocked():
+    """SDK built-in Bash is blocked — bash_exec MCP tool with bubblewrap is used instead."""
+    result = _validate_tool_access("Bash", {"command": "echo hello"}, sdk_cwd=SDK_CWD)
+    assert _is_denied(result)
+
+
+# -- Dangerous patterns ------------------------------------------------------
+
+
+def test_dangerous_pattern_blocked():
+    result = _validate_tool_access("SomeTool", {"cmd": "sudo rm -rf /"})
+    assert _is_denied(result)
+
+
+def test_subprocess_pattern_blocked():
+    result = _validate_tool_access("SomeTool", {"code": "subprocess.run(...)"})
+    assert _is_denied(result)
+
+
+# -- User isolation ----------------------------------------------------------
+
+
+def test_workspace_path_traversal_blocked():
+    result = _validate_user_isolation(
+        "workspace_read", {"path": "../../../etc/shadow"}, user_id="user-1"
+    )
+    assert _is_denied(result)
+
+
+def test_workspace_absolute_path_blocked():
+    result = _validate_user_isolation(
+        "workspace_read", {"path": "/etc/passwd"}, user_id="user-1"
+    )
+    assert _is_denied(result)
+
+
+def test_workspace_normal_path_allowed():
+    result = _validate_user_isolation(
+        "workspace_read", {"path": "src/main.py"}, user_id="user-1"
+    )
+    assert result == {}
+
+
+def test_non_workspace_tool_passes_isolation():
+    result = _validate_user_isolation(
+        "find_agent", {"query": "email"}, user_id="user-1"
+    )
+    assert result == {}
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/service.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/service.py
@@ -0,0 +1,752 @@
+"""Claude Agent SDK service layer for CoPilot chat completions."""
+
+import asyncio
+import json
+import logging
+import os
+import uuid
+from collections.abc import AsyncGenerator
+from dataclasses import dataclass
+from typing import Any
+
+from backend.util.exceptions import NotFoundError
+
+from .. import stream_registry
+from ..config import ChatConfig
+from ..model import (
+    ChatMessage,
+    ChatSession,
+    get_chat_session,
+    update_session_title,
+    upsert_chat_session,
+)
+from ..response_model import (
+    StreamBaseResponse,
+    StreamError,
+    StreamFinish,
+    StreamStart,
+    StreamTextDelta,
+    StreamToolInputAvailable,
+    StreamToolOutputAvailable,
+)
+from ..service import (
+    _build_system_prompt,
+    _execute_long_running_tool_with_streaming,
+    _generate_session_title,
+)
+from ..tools.models import OperationPendingResponse, OperationStartedResponse
+from ..tools.sandbox import WORKSPACE_PREFIX, make_session_path
+from ..tracking import track_user_message
+from .response_adapter import SDKResponseAdapter
+from .security_hooks import create_security_hooks
+from .tool_adapter import (
+    COPILOT_TOOL_NAMES,
+    SDK_DISALLOWED_TOOLS,
+    LongRunningCallback,
+    create_copilot_mcp_server,
+    set_execution_context,
+)
+from .transcript import (
+    download_transcript,
+    read_transcript_file,
+    upload_transcript,
+    validate_transcript,
+    write_transcript_to_tempfile,
+)
+
+logger = logging.getLogger(__name__)
+config = ChatConfig()
+
+# Set to hold background tasks to prevent garbage collection
+_background_tasks: set[asyncio.Task[Any]] = set()
+
+
+@dataclass
+class CapturedTranscript:
+    """Info captured by the SDK Stop hook for stateless --resume."""
+
+    path: str = ""
+    sdk_session_id: str = ""
+
+    @property
+    def available(self) -> bool:
+        return bool(self.path)
+
+
+_SDK_CWD_PREFIX = WORKSPACE_PREFIX
+
+# Appended to the system prompt to inform the agent about available tools.
+# The SDK built-in Bash is NOT available — use mcp__copilot__bash_exec instead,
+# which has kernel-level network isolation (unshare --net).
+_SDK_TOOL_SUPPLEMENT = """
+
+## Tool notes
+
+- The SDK built-in Bash tool is NOT available.  Use the `bash_exec` MCP tool
+  for shell commands — it runs in a network-isolated sandbox.
+- **Shared workspace**: The SDK Read/Write tools and `bash_exec` share the
+  same working directory. Files created by one are readable by the other.
+  These files are **ephemeral** — they exist only for the current session.
+- **Persistent storage**: Use `write_workspace_file` / `read_workspace_file`
+  for files that should persist across sessions (stored in cloud storage).
+- Long-running tools (create_agent, edit_agent, etc.) are handled
+  asynchronously.  You will receive an immediate response; the actual result
+  is delivered to the user via a background stream.
+"""
+
+
+def _build_long_running_callback(user_id: str | None) -> LongRunningCallback:
+    """Build a callback that delegates long-running tools to the non-SDK infrastructure.
+
+    Long-running tools (create_agent, edit_agent, etc.) are delegated to the
+    existing background infrastructure: stream_registry (Redis Streams),
+    database persistence, and SSE reconnection.  This means results survive
+    page refreshes / pod restarts, and the frontend shows the proper loading
+    widget with progress updates.
+
+    The returned callback matches the ``LongRunningCallback`` signature:
+    ``(tool_name, args, session) -> MCP response dict``.
+    """
+
+    async def _callback(
+        tool_name: str, args: dict[str, Any], session: ChatSession
+    ) -> dict[str, Any]:
+        operation_id = str(uuid.uuid4())
+        task_id = str(uuid.uuid4())
+        tool_call_id = f"sdk-{uuid.uuid4().hex[:12]}"
+        session_id = session.session_id
+
+        # --- Build user-friendly messages (matches non-SDK service) ---
+        if tool_name == "create_agent":
+            desc = args.get("description", "")
+            desc_preview = (desc[:100] + "...") if len(desc) > 100 else desc
+            pending_msg = (
+                f"Creating your agent: {desc_preview}"
+                if desc_preview
+                else "Creating agent... This may take a few minutes."
+            )
+            started_msg = (
+                "Agent creation started. You can close this tab - "
+                "check your library in a few minutes."
+            )
+        elif tool_name == "edit_agent":
+            changes = args.get("changes", "")
+            changes_preview = (changes[:100] + "...") if len(changes) > 100 else changes
+            pending_msg = (
+                f"Editing agent: {changes_preview}"
+                if changes_preview
+                else "Editing agent... This may take a few minutes."
+            )
+            started_msg = (
+                "Agent edit started. You can close this tab - "
+                "check your library in a few minutes."
+            )
+        else:
+            pending_msg = f"Running {tool_name}... This may take a few minutes."
+            started_msg = (
+                f"{tool_name} started. You can close this tab - "
+                "check back in a few minutes."
+            )
+
+        # --- Register task in Redis for SSE reconnection ---
+        await stream_registry.create_task(
+            task_id=task_id,
+            session_id=session_id,
+            user_id=user_id,
+            tool_call_id=tool_call_id,
+            tool_name=tool_name,
+            operation_id=operation_id,
+        )
+
+        # --- Save OperationPendingResponse to chat history ---
+        pending_message = ChatMessage(
+            role="tool",
+            content=OperationPendingResponse(
+                message=pending_msg,
+                operation_id=operation_id,
+                tool_name=tool_name,
+            ).model_dump_json(),
+            tool_call_id=tool_call_id,
+        )
+        session.messages.append(pending_message)
+        await upsert_chat_session(session)
+
+        # --- Spawn background task (reuses non-SDK infrastructure) ---
+        bg_task = asyncio.create_task(
+            _execute_long_running_tool_with_streaming(
+                tool_name=tool_name,
+                parameters=args,
+                tool_call_id=tool_call_id,
+                operation_id=operation_id,
+                task_id=task_id,
+                session_id=session_id,
+                user_id=user_id,
+            )
+        )
+        _background_tasks.add(bg_task)
+        bg_task.add_done_callback(_background_tasks.discard)
+        await stream_registry.set_task_asyncio_task(task_id, bg_task)
+
+        logger.info(
+            f"[SDK] Long-running tool {tool_name} delegated to background "
+            f"(operation_id={operation_id}, task_id={task_id})"
+        )
+
+        # --- Return OperationStartedResponse as MCP tool result ---
+        # This flows through SDK → response adapter → frontend, triggering
+        # the loading widget with SSE reconnection support.
+        started_json = OperationStartedResponse(
+            message=started_msg,
+            operation_id=operation_id,
+            tool_name=tool_name,
+            task_id=task_id,
+        ).model_dump_json()
+
+        return {
+            "content": [{"type": "text", "text": started_json}],
+            "isError": False,
+        }
+
+    return _callback
+
+
+def _resolve_sdk_model() -> str | None:
+    """Resolve the model name for the Claude Agent SDK CLI.
+
+    Uses ``config.claude_agent_model`` if set, otherwise derives from
+    ``config.model`` by stripping the OpenRouter provider prefix (e.g.,
+    ``"anthropic/claude-opus-4.6"`` → ``"claude-opus-4.6"``).
+    """
+    if config.claude_agent_model:
+        return config.claude_agent_model
+    model = config.model
+    if "/" in model:
+        return model.split("/", 1)[1]
+    return model
+
+
+def _build_sdk_env() -> dict[str, str]:
+    """Build env vars for the SDK CLI process.
+
+    Routes API calls through OpenRouter (or a custom base_url) using
+    the same ``config.api_key`` / ``config.base_url`` as the non-SDK path.
+    This gives per-call token and cost tracking on the OpenRouter dashboard.
+
+    Only overrides ``ANTHROPIC_API_KEY`` when a valid proxy URL and auth
+    token are both present — otherwise returns an empty dict so the SDK
+    falls back to its default credentials.
+    """
+    env: dict[str, str] = {}
+    if config.api_key and config.base_url:
+        # Strip /v1 suffix — SDK expects the base URL without a version path
+        base = config.base_url.rstrip("/")
+        if base.endswith("/v1"):
+            base = base[:-3]
+        if not base or not base.startswith("http"):
+            # Invalid base_url — don't override SDK defaults
+            return env
+        env["ANTHROPIC_BASE_URL"] = base
+        env["ANTHROPIC_AUTH_TOKEN"] = config.api_key
+        # Must be explicitly empty so the CLI uses AUTH_TOKEN instead
+        env["ANTHROPIC_API_KEY"] = ""
+    return env
+
+
+def _make_sdk_cwd(session_id: str) -> str:
+    """Create a safe, session-specific working directory path.
+
+    Delegates to :func:`~backend.api.features.chat.tools.sandbox.make_session_path`
+    (single source of truth for path sanitization) and adds a defence-in-depth
+    assertion.
+    """
+    cwd = make_session_path(session_id)
+    # Defence-in-depth: normpath + startswith is a CodeQL-recognised sanitizer
+    cwd = os.path.normpath(cwd)
+    if not cwd.startswith(_SDK_CWD_PREFIX):
+        raise ValueError(f"SDK cwd escaped prefix: {cwd}")
+    return cwd
+
+
+def _cleanup_sdk_tool_results(cwd: str) -> None:
+    """Remove SDK tool-result files for a specific session working directory.
+
+    The SDK creates tool-result files under ~/.claude/projects/<encoded-cwd>/tool-results/.
+    We clean only the specific cwd's results to avoid race conditions between
+    concurrent sessions.
+
+    Security: cwd MUST be created by _make_sdk_cwd() which sanitizes session_id.
+    """
+    import shutil
+
+    # Validate cwd is under the expected prefix
+    normalized = os.path.normpath(cwd)
+    if not normalized.startswith(_SDK_CWD_PREFIX):
+        logger.warning(f"[SDK] Rejecting cleanup for path outside workspace: {cwd}")
+        return
+
+    # SDK encodes the cwd path by replacing '/' with '-'
+    encoded_cwd = normalized.replace("/", "-")
+
+    # Construct the project directory path (known-safe home expansion)
+    claude_projects = os.path.expanduser("~/.claude/projects")
+    project_dir = os.path.join(claude_projects, encoded_cwd)
+
+    # Security check 3: Validate project_dir is under ~/.claude/projects
+    project_dir = os.path.normpath(project_dir)
+    if not project_dir.startswith(claude_projects):
+        logger.warning(
+            f"[SDK] Rejecting cleanup for escaped project path: {project_dir}"
+        )
+        return
+
+    results_dir = os.path.join(project_dir, "tool-results")
+    if os.path.isdir(results_dir):
+        for filename in os.listdir(results_dir):
+            file_path = os.path.join(results_dir, filename)
+            try:
+                if os.path.isfile(file_path):
+                    os.remove(file_path)
+            except OSError:
+                pass
+
+    # Also clean up the temp cwd directory itself
+    try:
+        shutil.rmtree(normalized, ignore_errors=True)
+    except OSError:
+        pass
+
+
+async def _compress_conversation_history(
+    session: ChatSession,
+) -> list[ChatMessage]:
+    """Compress prior conversation messages if they exceed the token threshold.
+
+    Uses the shared compress_context() from prompt.py which supports:
+    - LLM summarization of old messages (keeps recent ones intact)
+    - Progressive content truncation as fallback
+    - Middle-out deletion as last resort
+
+    Returns the compressed prior messages (everything except the current message).
+    """
+    prior = session.messages[:-1]
+    if len(prior) < 2:
+        return prior
+
+    from backend.util.prompt import compress_context
+
+    # Convert ChatMessages to dicts for compress_context
+    messages_dict = []
+    for msg in prior:
+        msg_dict: dict[str, Any] = {"role": msg.role}
+        if msg.content:
+            msg_dict["content"] = msg.content
+        if msg.tool_calls:
+            msg_dict["tool_calls"] = msg.tool_calls
+        if msg.tool_call_id:
+            msg_dict["tool_call_id"] = msg.tool_call_id
+        messages_dict.append(msg_dict)
+
+    try:
+        import openai
+
+        async with openai.AsyncOpenAI(
+            api_key=config.api_key, base_url=config.base_url, timeout=30.0
+        ) as client:
+            result = await compress_context(
+                messages=messages_dict,
+                model=config.model,
+                client=client,
+            )
+    except Exception as e:
+        logger.warning(f"[SDK] Context compression with LLM failed: {e}")
+        # Fall back to truncation-only (no LLM summarization)
+        result = await compress_context(
+            messages=messages_dict,
+            model=config.model,
+            client=None,
+        )
+
+    if result.was_compacted:
+        logger.info(
+            f"[SDK] Context compacted: {result.original_token_count} -> "
+            f"{result.token_count} tokens "
+            f"({result.messages_summarized} summarized, "
+            f"{result.messages_dropped} dropped)"
+        )
+        # Convert compressed dicts back to ChatMessages
+        return [
+            ChatMessage(
+                role=m["role"],
+                content=m.get("content"),
+                tool_calls=m.get("tool_calls"),
+                tool_call_id=m.get("tool_call_id"),
+            )
+            for m in result.messages
+        ]
+
+    return prior
+
+
+def _format_conversation_context(messages: list[ChatMessage]) -> str | None:
+    """Format conversation messages into a context prefix for the user message.
+
+    Returns a string like:
+        <conversation_history>
+        User: hello
+        You responded: Hi! How can I help?
+        </conversation_history>
+
+    Returns None if there are no messages to format.
+    """
+    if not messages:
+        return None
+
+    lines: list[str] = []
+    for msg in messages:
+        if not msg.content:
+            continue
+        if msg.role == "user":
+            lines.append(f"User: {msg.content}")
+        elif msg.role == "assistant":
+            lines.append(f"You responded: {msg.content}")
+        # Skip tool messages — they're internal details
+
+    if not lines:
+        return None
+
+    return "<conversation_history>\n" + "\n".join(lines) + "\n</conversation_history>"
+
+
+async def stream_chat_completion_sdk(
+    session_id: str,
+    message: str | None = None,
+    tool_call_response: str | None = None,  # noqa: ARG001
+    is_user_message: bool = True,
+    user_id: str | None = None,
+    retry_count: int = 0,  # noqa: ARG001
+    session: ChatSession | None = None,
+    context: dict[str, str] | None = None,  # noqa: ARG001
+) -> AsyncGenerator[StreamBaseResponse, None]:
+    """Stream chat completion using Claude Agent SDK.
+
+    Drop-in replacement for stream_chat_completion with improved reliability.
+    """
+
+    if session is None:
+        session = await get_chat_session(session_id, user_id)
+
+    if not session:
+        raise NotFoundError(
+            f"Session {session_id} not found. Please create a new session first."
+        )
+
+    if message:
+        session.messages.append(
+            ChatMessage(
+                role="user" if is_user_message else "assistant", content=message
+            )
+        )
+        if is_user_message:
+            track_user_message(
+                user_id=user_id, session_id=session_id, message_length=len(message)
+            )
+
+    session = await upsert_chat_session(session)
+
+    # Generate title for new sessions (first user message)
+    if is_user_message and not session.title:
+        user_messages = [m for m in session.messages if m.role == "user"]
+        if len(user_messages) == 1:
+            first_message = user_messages[0].content or message or ""
+            if first_message:
+                task = asyncio.create_task(
+                    _update_title_async(session_id, first_message, user_id)
+                )
+                _background_tasks.add(task)
+                task.add_done_callback(_background_tasks.discard)
+
+    # Build system prompt (reuses non-SDK path with Langfuse support)
+    has_history = len(session.messages) > 1
+    system_prompt, _ = await _build_system_prompt(
+        user_id, has_conversation_history=has_history
+    )
+    system_prompt += _SDK_TOOL_SUPPLEMENT
+    message_id = str(uuid.uuid4())
+    task_id = str(uuid.uuid4())
+
+    yield StreamStart(messageId=message_id, taskId=task_id)
+
+    stream_completed = False
+    # Initialise sdk_cwd before the try so the finally can reference it
+    # even if _make_sdk_cwd raises (in that case it stays as "").
+    sdk_cwd = ""
+    use_resume = False
+
+    try:
+        # Use a session-specific temp dir to avoid cleanup race conditions
+        # between concurrent sessions.
+        sdk_cwd = _make_sdk_cwd(session_id)
+        os.makedirs(sdk_cwd, exist_ok=True)
+
+        set_execution_context(
+            user_id,
+            session,
+            long_running_callback=_build_long_running_callback(user_id),
+        )
+        try:
+            from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
+
+            # Fail fast when no API credentials are available at all
+            sdk_env = _build_sdk_env()
+            if not sdk_env and not os.environ.get("ANTHROPIC_API_KEY"):
+                raise RuntimeError(
+                    "No API key configured. Set OPEN_ROUTER_API_KEY "
+                    "(or CHAT_API_KEY) for OpenRouter routing, "
+                    "or ANTHROPIC_API_KEY for direct Anthropic access."
+                )
+
+            mcp_server = create_copilot_mcp_server()
+
+            sdk_model = _resolve_sdk_model()
+
+            # --- Transcript capture via Stop hook ---
+            captured_transcript = CapturedTranscript()
+
+            def _on_stop(transcript_path: str, sdk_session_id: str) -> None:
+                captured_transcript.path = transcript_path
+                captured_transcript.sdk_session_id = sdk_session_id
+
+            security_hooks = create_security_hooks(
+                user_id,
+                sdk_cwd=sdk_cwd,
+                max_subtasks=config.claude_agent_max_subtasks,
+                on_stop=_on_stop if config.claude_agent_use_resume else None,
+            )
+
+            # --- Resume strategy: download transcript from bucket ---
+            resume_file: str | None = None
+            use_resume = False
+
+            if config.claude_agent_use_resume and user_id and len(session.messages) > 1:
+                transcript_content = await download_transcript(user_id, session_id)
+                if transcript_content and validate_transcript(transcript_content):
+                    resume_file = write_transcript_to_tempfile(
+                        transcript_content, session_id, sdk_cwd
+                    )
+                    if resume_file:
+                        use_resume = True
+                        logger.info(
+                            f"[SDK] Using --resume with transcript "
+                            f"({len(transcript_content)} bytes)"
+                        )
+
+            sdk_options_kwargs: dict[str, Any] = {
+                "system_prompt": system_prompt,
+                "mcp_servers": {"copilot": mcp_server},
+                "allowed_tools": COPILOT_TOOL_NAMES,
+                "disallowed_tools": SDK_DISALLOWED_TOOLS,
+                "hooks": security_hooks,
+                "cwd": sdk_cwd,
+                "max_buffer_size": config.claude_agent_max_buffer_size,
+            }
+            if sdk_env:
+                sdk_options_kwargs["model"] = sdk_model
+                sdk_options_kwargs["env"] = sdk_env
+            if use_resume and resume_file:
+                sdk_options_kwargs["resume"] = resume_file
+
+            options = ClaudeAgentOptions(**sdk_options_kwargs)  # type: ignore[arg-type]
+
+            adapter = SDKResponseAdapter(message_id=message_id)
+            adapter.set_task_id(task_id)
+
+            async with ClaudeSDKClient(options=options) as client:
+                current_message = message or ""
+                if not current_message and session.messages:
+                    last_user = [m for m in session.messages if m.role == "user"]
+                    if last_user:
+                        current_message = last_user[-1].content or ""
+
+                if not current_message.strip():
+                    yield StreamError(
+                        errorText="Message cannot be empty.",
+                        code="empty_prompt",
+                    )
+                    yield StreamFinish()
+                    return
+
+                # Build query: with --resume the CLI already has full
+                # context, so we only send the new message.  Without
+                # resume, compress history into a context prefix.
+                query_message = current_message
+                if not use_resume and len(session.messages) > 1:
+                    logger.warning(
+                        f"[SDK] Using compression fallback for session "
+                        f"{session_id} ({len(session.messages)} messages) — "
+                        f"no transcript available for --resume"
+                    )
+                    compressed = await _compress_conversation_history(session)
+                    history_context = _format_conversation_context(compressed)
+                    if history_context:
+                        query_message = (
+                            f"{history_context}\n\n"
+                            f"Now, the user says:\n{current_message}"
+                        )
+
+                logger.info(
+                    f"[SDK] Sending query ({len(session.messages)} msgs in session)"
+                )
+                logger.debug(f"[SDK] Query preview: {current_message[:80]!r}")
+                await client.query(query_message, session_id=session_id)
+
+                assistant_response = ChatMessage(role="assistant", content="")
+                accumulated_tool_calls: list[dict[str, Any]] = []
+                has_appended_assistant = False
+                has_tool_results = False
+
+                async for sdk_msg in client.receive_messages():
+                    logger.debug(
+                        f"[SDK] Received: {type(sdk_msg).__name__} "
+                        f"{getattr(sdk_msg, 'subtype', '')}"
+                    )
+                    for response in adapter.convert_message(sdk_msg):
+                        if isinstance(response, StreamStart):
+                            continue
+
+                        yield response
+
+                        if isinstance(response, StreamTextDelta):
+                            delta = response.delta or ""
+                            # After tool results, start a new assistant
+                            # message for the post-tool text.
+                            if has_tool_results and has_appended_assistant:
+                                assistant_response = ChatMessage(
+                                    role="assistant", content=delta
+                                )
+                                accumulated_tool_calls = []
+                                has_appended_assistant = False
+                                has_tool_results = False
+                                session.messages.append(assistant_response)
+                                has_appended_assistant = True
+                            else:
+                                assistant_response.content = (
+                                    assistant_response.content or ""
+                                ) + delta
+                                if not has_appended_assistant:
+                                    session.messages.append(assistant_response)
+                                    has_appended_assistant = True
+
+                        elif isinstance(response, StreamToolInputAvailable):
+                            accumulated_tool_calls.append(
+                                {
+                                    "id": response.toolCallId,
+                                    "type": "function",
+                                    "function": {
+                                        "name": response.toolName,
+                                        "arguments": json.dumps(response.input or {}),
+                                    },
+                                }
+                            )
+                            assistant_response.tool_calls = accumulated_tool_calls
+                            if not has_appended_assistant:
+                                session.messages.append(assistant_response)
+                                has_appended_assistant = True
+
+                        elif isinstance(response, StreamToolOutputAvailable):
+                            session.messages.append(
+                                ChatMessage(
+                                    role="tool",
+                                    content=(
+                                        response.output
+                                        if isinstance(response.output, str)
+                                        else str(response.output)
+                                    ),
+                                    tool_call_id=response.toolCallId,
+                                )
+                            )
+                            has_tool_results = True
+
+                        elif isinstance(response, StreamFinish):
+                            stream_completed = True
+
+                    if stream_completed:
+                        break
+
+                if (
+                    assistant_response.content or assistant_response.tool_calls
+                ) and not has_appended_assistant:
+                    session.messages.append(assistant_response)
+
+                # --- Capture transcript while CLI is still alive ---
+                # Must happen INSIDE async with: close() sends SIGTERM
+                # which kills the CLI before it can flush the JSONL.
+                if (
+                    config.claude_agent_use_resume
+                    and user_id
+                    and captured_transcript.available
+                ):
+                    # Give CLI time to flush JSONL writes before we read
+                    await asyncio.sleep(0.5)
+                    raw_transcript = read_transcript_file(captured_transcript.path)
+                    if raw_transcript:
+                        task = asyncio.create_task(
+                            _upload_transcript_bg(user_id, session_id, raw_transcript)
+                        )
+                        _background_tasks.add(task)
+                        task.add_done_callback(_background_tasks.discard)
+                    else:
+                        logger.debug("[SDK] Stop hook fired but transcript not usable")
+
+        except ImportError:
+            raise RuntimeError(
+                "claude-agent-sdk is not installed. "
+                "Disable SDK mode (CHAT_USE_CLAUDE_AGENT_SDK=false) "
+                "to use the OpenAI-compatible fallback."
+            )
+
+        await upsert_chat_session(session)
+        logger.debug(
+            f"[SDK] Session {session_id} saved with {len(session.messages)} messages"
+        )
+        if not stream_completed:
+            yield StreamFinish()
+
+    except Exception as e:
+        logger.error(f"[SDK] Error: {e}", exc_info=True)
+        try:
+            await upsert_chat_session(session)
+        except Exception as save_err:
+            logger.error(f"[SDK] Failed to save session on error: {save_err}")
+        yield StreamError(
+            errorText="An error occurred. Please try again.",
+            code="sdk_error",
+        )
+        yield StreamFinish()
+    finally:
+        if sdk_cwd:
+            _cleanup_sdk_tool_results(sdk_cwd)
+
+
+async def _upload_transcript_bg(
+    user_id: str, session_id: str, raw_content: str
+) -> None:
+    """Background task to strip progress entries and upload transcript."""
+    try:
+        await upload_transcript(user_id, session_id, raw_content)
+    except Exception as e:
+        logger.error(f"[SDK] Failed to upload transcript for {session_id}: {e}")
+
+
+async def _update_title_async(
+    session_id: str, message: str, user_id: str | None = None
+) -> None:
+    """Background task to update session title."""
+    try:
+        title = await _generate_session_title(
+            message, user_id=user_id, session_id=session_id
+        )
+        if title:
+            await update_session_title(session_id, title)
+            logger.debug(f"[SDK] Generated title for {session_id}: {title}")
+    except Exception as e:
+        logger.warning(f"[SDK] Failed to update session title: {e}")
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/tool_adapter.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/tool_adapter.py
@@ -0,0 +1,363 @@
+"""Tool adapter for wrapping existing CoPilot tools as Claude Agent SDK MCP tools.
+
+This module provides the adapter layer that converts existing BaseTool implementations
+into in-process MCP tools that can be used with the Claude Agent SDK.
+
+Long-running tools (``is_long_running=True``) are delegated to the non-SDK
+background infrastructure (stream_registry, Redis persistence, SSE reconnection)
+via a callback provided by the service layer.  This avoids wasteful SDK polling
+and makes results survive page refreshes.
+"""
+
+import itertools
+import json
+import logging
+import os
+import uuid
+from collections.abc import Awaitable, Callable
+from contextvars import ContextVar
+from typing import Any
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools import TOOL_REGISTRY
+from backend.api.features.chat.tools.base import BaseTool
+
+logger = logging.getLogger(__name__)
+
+# Allowed base directory for the Read tool (SDK saves oversized tool results here).
+# Restricted to ~/.claude/projects/ and further validated to require "tool-results"
+# in the path — prevents reading settings, credentials, or other sensitive files.
+_SDK_PROJECTS_DIR = os.path.expanduser("~/.claude/projects/")
+
+# MCP server naming - the SDK prefixes tool names as "mcp__{server_name}__{tool}"
+MCP_SERVER_NAME = "copilot"
+MCP_TOOL_PREFIX = f"mcp__{MCP_SERVER_NAME}__"
+
+# Context variables to pass user/session info to tool execution
+_current_user_id: ContextVar[str | None] = ContextVar("current_user_id", default=None)
+_current_session: ContextVar[ChatSession | None] = ContextVar(
+    "current_session", default=None
+)
+# Stash for MCP tool outputs before the SDK potentially truncates them.
+# Keyed by tool_name → full output string. Consumed (popped) by the
+# response adapter when it builds StreamToolOutputAvailable.
+_pending_tool_outputs: ContextVar[dict[str, str]] = ContextVar(
+    "pending_tool_outputs", default=None  # type: ignore[arg-type]
+)
+
+# Callback type for delegating long-running tools to the non-SDK infrastructure.
+# Args: (tool_name, arguments, session) → MCP-formatted response dict.
+LongRunningCallback = Callable[
+    [str, dict[str, Any], ChatSession], Awaitable[dict[str, Any]]
+]
+
+# ContextVar so the service layer can inject the callback per-request.
+_long_running_callback: ContextVar[LongRunningCallback | None] = ContextVar(
+    "long_running_callback", default=None
+)
+
+
+def set_execution_context(
+    user_id: str | None,
+    session: ChatSession,
+    long_running_callback: LongRunningCallback | None = None,
+) -> None:
+    """Set the execution context for tool calls.
+
+    This must be called before streaming begins to ensure tools have access
+    to user_id and session information.
+
+    Args:
+        user_id: Current user's ID.
+        session: Current chat session.
+        long_running_callback: Optional callback to delegate long-running tools
+            to the non-SDK background infrastructure (stream_registry + Redis).
+    """
+    _current_user_id.set(user_id)
+    _current_session.set(session)
+    _pending_tool_outputs.set({})
+    _long_running_callback.set(long_running_callback)
+
+
+def get_execution_context() -> tuple[str | None, ChatSession | None]:
+    """Get the current execution context."""
+    return (
+        _current_user_id.get(),
+        _current_session.get(),
+    )
+
+
+def pop_pending_tool_output(tool_name: str) -> str | None:
+    """Pop and return the stashed full output for *tool_name*.
+
+    The SDK CLI may truncate large tool results (writing them to disk and
+    replacing the content with a file reference). This stash keeps the
+    original MCP output so the response adapter can forward it to the
+    frontend for proper widget rendering.
+
+    Returns ``None`` if nothing was stashed for *tool_name*.
+    """
+    pending = _pending_tool_outputs.get(None)
+    if pending is None:
+        return None
+    return pending.pop(tool_name, None)
+
+
+async def _execute_tool_sync(
+    base_tool: BaseTool,
+    user_id: str | None,
+    session: ChatSession,
+    args: dict[str, Any],
+) -> dict[str, Any]:
+    """Execute a tool synchronously and return MCP-formatted response."""
+    effective_id = f"sdk-{uuid.uuid4().hex[:12]}"
+    result = await base_tool.execute(
+        user_id=user_id,
+        session=session,
+        tool_call_id=effective_id,
+        **args,
+    )
+
+    text = (
+        result.output if isinstance(result.output, str) else json.dumps(result.output)
+    )
+
+    # Stash the full output before the SDK potentially truncates it.
+    pending = _pending_tool_outputs.get(None)
+    if pending is not None:
+        pending[base_tool.name] = text
+
+    return {
+        "content": [{"type": "text", "text": text}],
+        "isError": not result.success,
+    }
+
+
+def _mcp_error(message: str) -> dict[str, Any]:
+    return {
+        "content": [
+            {"type": "text", "text": json.dumps({"error": message, "type": "error"})}
+        ],
+        "isError": True,
+    }
+
+
+def create_tool_handler(base_tool: BaseTool):
+    """Create an async handler function for a BaseTool.
+
+    This wraps the existing BaseTool._execute method to be compatible
+    with the Claude Agent SDK MCP tool format.
+
+    Long-running tools (``is_long_running=True``) are delegated to the
+    non-SDK background infrastructure via a callback set in the execution
+    context.  The callback persists the operation in Redis (stream_registry)
+    so results survive page refreshes and pod restarts.
+    """
+
+    async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
+        """Execute the wrapped tool and return MCP-formatted response."""
+        user_id, session = get_execution_context()
+
+        if session is None:
+            return _mcp_error("No session context available")
+
+        # --- Long-running: delegate to non-SDK background infrastructure ---
+        if base_tool.is_long_running:
+            callback = _long_running_callback.get(None)
+            if callback:
+                try:
+                    return await callback(base_tool.name, args, session)
+                except Exception as e:
+                    logger.error(
+                        f"Long-running callback failed for {base_tool.name}: {e}",
+                        exc_info=True,
+                    )
+                    return _mcp_error(f"Failed to start {base_tool.name}: {e}")
+            # No callback — fall through to synchronous execution
+            logger.warning(
+                f"[SDK] No long-running callback for {base_tool.name}, "
+                f"executing synchronously (may block)"
+            )
+
+        # --- Normal (fast) tool: execute synchronously ---
+        try:
+            return await _execute_tool_sync(base_tool, user_id, session, args)
+        except Exception as e:
+            logger.error(f"Error executing tool {base_tool.name}: {e}", exc_info=True)
+            return _mcp_error(f"Failed to execute {base_tool.name}: {e}")
+
+    return tool_handler
+
+
+def _build_input_schema(base_tool: BaseTool) -> dict[str, Any]:
+    """Build a JSON Schema input schema for a tool."""
+    return {
+        "type": "object",
+        "properties": base_tool.parameters.get("properties", {}),
+        "required": base_tool.parameters.get("required", []),
+    }
+
+
+async def _read_file_handler(args: dict[str, Any]) -> dict[str, Any]:
+    """Read a file with optional offset/limit. Restricted to SDK working directory.
+
+    After reading, the file is deleted to prevent accumulation in long-running pods.
+    """
+    file_path = args.get("file_path", "")
+    offset = args.get("offset", 0)
+    limit = args.get("limit", 2000)
+
+    # Security: only allow reads under ~/.claude/projects/**/tool-results/
+    real_path = os.path.realpath(file_path)
+    if not real_path.startswith(_SDK_PROJECTS_DIR) or "tool-results" not in real_path:
+        return {
+            "content": [{"type": "text", "text": f"Access denied: {file_path}"}],
+            "isError": True,
+        }
+
+    try:
+        with open(real_path) as f:
+            selected = list(itertools.islice(f, offset, offset + limit))
+        content = "".join(selected)
+        # Cleanup happens in _cleanup_sdk_tool_results after session ends;
+        # don't delete here — the SDK may read in multiple chunks.
+        return {"content": [{"type": "text", "text": content}], "isError": False}
+    except FileNotFoundError:
+        return {
+            "content": [{"type": "text", "text": f"File not found: {file_path}"}],
+            "isError": True,
+        }
+    except Exception as e:
+        return {
+            "content": [{"type": "text", "text": f"Error reading file: {e}"}],
+            "isError": True,
+        }
+
+
+_READ_TOOL_NAME = "Read"
+_READ_TOOL_DESCRIPTION = (
+    "Read a file from the local filesystem. "
+    "Use offset and limit to read specific line ranges for large files."
+)
+_READ_TOOL_SCHEMA = {
+    "type": "object",
+    "properties": {
+        "file_path": {
+            "type": "string",
+            "description": "The absolute path to the file to read",
+        },
+        "offset": {
+            "type": "integer",
+            "description": "Line number to start reading from (0-indexed). Default: 0",
+        },
+        "limit": {
+            "type": "integer",
+            "description": "Number of lines to read. Default: 2000",
+        },
+    },
+    "required": ["file_path"],
+}
+
+
+# Create the MCP server configuration
+def create_copilot_mcp_server():
+    """Create an in-process MCP server configuration for CoPilot tools.
+
+    This can be passed to ClaudeAgentOptions.mcp_servers.
+
+    Note: The actual SDK MCP server creation depends on the claude-agent-sdk
+    package being available. This function returns the configuration that
+    can be used with the SDK.
+    """
+    try:
+        from claude_agent_sdk import create_sdk_mcp_server, tool
+
+        # Create decorated tool functions
+        sdk_tools = []
+
+        for tool_name, base_tool in TOOL_REGISTRY.items():
+            handler = create_tool_handler(base_tool)
+            decorated = tool(
+                tool_name,
+                base_tool.description,
+                _build_input_schema(base_tool),
+            )(handler)
+            sdk_tools.append(decorated)
+
+        # Add the Read tool so the SDK can read back oversized tool results
+        read_tool = tool(
+            _READ_TOOL_NAME,
+            _READ_TOOL_DESCRIPTION,
+            _READ_TOOL_SCHEMA,
+        )(_read_file_handler)
+        sdk_tools.append(read_tool)
+
+        server = create_sdk_mcp_server(
+            name=MCP_SERVER_NAME,
+            version="1.0.0",
+            tools=sdk_tools,
+        )
+
+        return server
+
+    except ImportError:
+        # Let ImportError propagate so service.py handles the fallback
+        raise
+
+
+# SDK built-in tools allowed within the workspace directory.
+# Security hooks validate that file paths stay within sdk_cwd.
+# Bash is NOT included — use the sandboxed MCP bash_exec tool instead,
+# which provides kernel-level network isolation via unshare --net.
+# Task allows spawning sub-agents (rate-limited by security hooks).
+# WebSearch uses Brave Search via Anthropic's API — safe, no SSRF risk.
+_SDK_BUILTIN_TOOLS = ["Read", "Write", "Edit", "Glob", "Grep", "Task", "WebSearch"]
+
+# SDK built-in tools that must be explicitly blocked.
+# Bash: dangerous — agent uses mcp__copilot__bash_exec with kernel-level
+#   network isolation (unshare --net) instead.
+# WebFetch: SSRF risk — can reach internal network (localhost, 10.x, etc.).
+#   Agent uses the SSRF-protected mcp__copilot__web_fetch tool instead.
+SDK_DISALLOWED_TOOLS = ["Bash", "WebFetch"]
+
+# Tools that are blocked entirely in security hooks (defence-in-depth).
+# Includes SDK_DISALLOWED_TOOLS plus common aliases/synonyms.
+BLOCKED_TOOLS = {
+    *SDK_DISALLOWED_TOOLS,
+    "bash",
+    "shell",
+    "exec",
+    "terminal",
+    "command",
+}
+
+# Tools allowed only when their path argument stays within the SDK workspace.
+# The SDK uses these to handle oversized tool results (writes to tool-results/
+# files, then reads them back) and for workspace file operations.
+WORKSPACE_SCOPED_TOOLS = {"Read", "Write", "Edit", "Glob", "Grep"}
+
+# Dangerous patterns in tool inputs
+DANGEROUS_PATTERNS = [
+    r"sudo",
+    r"rm\s+-rf",
+    r"dd\s+if=",
+    r"/etc/passwd",
+    r"/etc/shadow",
+    r"chmod\s+777",
+    r"curl\s+.*\|.*sh",
+    r"wget\s+.*\|.*sh",
+    r"eval\s*\(",
+    r"exec\s*\(",
+    r"__import__",
+    r"os\.system",
+    r"subprocess",
+]
+
+# List of tool names for allowed_tools configuration
+# Include MCP tools, the MCP Read tool for oversized results,
+# and SDK built-in file tools for workspace operations.
+COPILOT_TOOL_NAMES = [
+    *[f"{MCP_TOOL_PREFIX}{name}" for name in TOOL_REGISTRY.keys()],
+    f"{MCP_TOOL_PREFIX}{_READ_TOOL_NAME}",
+    *_SDK_BUILTIN_TOOLS,
+]
--- a/autogpt_platform/backend/backend/api/features/chat/sdk/transcript.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/transcript.py
@@ -10,13 +10,10 @@ Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
 filesystem for self-hosted) — no DB column needed.
 """

+import json
 import logging
 import os
 import re
-import time
-from dataclasses import dataclass
-
-from backend.util import json

 logger = logging.getLogger(__name__)

@@ -34,16 +31,6 @@ STRIPPABLE_TYPES = frozenset(
    {"progress", "file-history-snapshot", "queue-operation", "summary", "pr-link"}
 )

-
-@dataclass
-class TranscriptDownload:
-    """Result of downloading a transcript with its metadata."""
-
-    content: str
-    message_count: int = 0  # session.messages length when uploaded
-    uploaded_at: float = 0.0  # epoch timestamp of upload
-
-
 # Workspace storage constants — deterministic path from session_id.
 TRANSCRIPT_STORAGE_PREFIX = "chat-transcripts"

@@ -59,37 +46,41 @@ def strip_progress_entries(content: str) -> str:
    Removes entries whose ``type`` is in ``STRIPPABLE_TYPES`` and reparents
    any remaining child entries so the ``parentUuid`` chain stays intact.
    Typically reduces transcript size by ~30%.
-
-    Entries that are not stripped or reparented are kept as their original
-    raw JSON line to avoid unnecessary re-serialization that changes
-    whitespace or key ordering.
    """
    lines = content.strip().split("\n")

-    # Parse entries, keeping the original line alongside the parsed dict.
-    parsed: list[tuple[str, dict | None]] = []
+    entries: list[dict] = []
    for line in lines:
-        parsed.append((line, json.loads(line, fallback=None)))
+        try:
+            entries.append(json.loads(line))
+        except json.JSONDecodeError:
+            # Keep unparseable lines as-is (safety)
+            entries.append({"_raw": line})

-    # First pass: identify stripped UUIDs and build parent map.
    stripped_uuids: set[str] = set()
    uuid_to_parent: dict[str, str] = {}
+    kept: list[dict] = []

-    for _line, entry in parsed:
-        if not isinstance(entry, dict):
+    for entry in entries:
+        if "_raw" in entry:
+            kept.append(entry)
            continue
        uid = entry.get("uuid", "")
        parent = entry.get("parentUuid", "")
+        entry_type = entry.get("type", "")
+
        if uid:
            uuid_to_parent[uid] = parent
-        if entry.get("type", "") in STRIPPABLE_TYPES and uid:
-            stripped_uuids.add(uid)

-    # Second pass: keep non-stripped entries, reparenting where needed.
-    # Preserve original line when no reparenting is required.
-    reparented: set[str] = set()
-    for _line, entry in parsed:
-        if not isinstance(entry, dict):
+        if entry_type in STRIPPABLE_TYPES:
+            if uid:
+                stripped_uuids.add(uid)
+        else:
+            kept.append(entry)
+
+    # Reparent: walk up chain through stripped entries to find surviving ancestor
+    for entry in kept:
+        if "_raw" in entry:
            continue
        parent = entry.get("parentUuid", "")
        original_parent = parent
@@ -97,32 +88,64 @@ def strip_progress_entries(content: str) -> str:
            parent = uuid_to_parent.get(parent, "")
        if parent != original_parent:
            entry["parentUuid"] = parent
-            uid = entry.get("uuid", "")
-            if uid:
-                reparented.add(uid)

    result_lines: list[str] = []
-    for line, entry in parsed:
-        if not isinstance(entry, dict):
-            result_lines.append(line)
-            continue
-        if entry.get("type", "") in STRIPPABLE_TYPES:
-            continue
-        uid = entry.get("uuid", "")
-        if uid in reparented:
-            # Re-serialize only entries whose parentUuid was changed.
-            result_lines.append(json.dumps(entry, separators=(",", ":")))
+    for entry in kept:
+        if "_raw" in entry:
+            result_lines.append(entry["_raw"])
        else:
-            result_lines.append(line)
+            result_lines.append(json.dumps(entry, separators=(",", ":")))

    return "\n".join(result_lines) + "\n"


 # ---------------------------------------------------------------------------
-# Local file I/O (write temp file for --resume)
+# Local file I/O (read from CLI's JSONL, write temp file for --resume)
 # ---------------------------------------------------------------------------


+def read_transcript_file(transcript_path: str) -> str | None:
+    """Read a JSONL transcript file from disk.
+
+    Returns the raw JSONL content, or ``None`` if the file is missing, empty,
+    or only contains metadata (≤2 lines with no conversation messages).
+    """
+    if not transcript_path or not os.path.isfile(transcript_path):
+        logger.debug(f"[Transcript] File not found: {transcript_path}")
+        return None
+
+    try:
+        with open(transcript_path) as f:
+            content = f.read()
+
+        if not content.strip():
+            logger.debug(f"[Transcript] Empty file: {transcript_path}")
+            return None
+
+        lines = content.strip().split("\n")
+        if len(lines) < 3:
+            # Raw files with ≤2 lines are metadata-only
+            # (queue-operation + file-history-snapshot, no conversation).
+            logger.debug(
+                f"[Transcript] Too few lines ({len(lines)}): {transcript_path}"
+            )
+            return None
+
+        # Quick structural validation — parse first and last lines.
+        json.loads(lines[0])
+        json.loads(lines[-1])
+
+        logger.info(
+            f"[Transcript] Read {len(lines)} lines, "
+            f"{len(content)} bytes from {transcript_path}"
+        )
+        return content
+
+    except (json.JSONDecodeError, OSError) as e:
+        logger.warning(f"[Transcript] Failed to read {transcript_path}: {e}")
+        return None
+
+
 def _sanitize_id(raw_id: str, max_len: int = 36) -> str:
    """Sanitize an ID for safe use in file paths.

@@ -137,34 +160,6 @@ def _sanitize_id(raw_id: str, max_len: int = 36) -> str:
 _SAFE_CWD_PREFIX = os.path.realpath("/tmp/copilot-")


-def cleanup_cli_project_dir(sdk_cwd: str) -> None:
-    """Remove the CLI's project directory for a specific working directory.
-
-    The CLI stores session data under ``~/.claude/projects/<encoded_cwd>/``.
-    Each SDK turn uses a unique ``sdk_cwd``, so the project directory is
-    safe to remove entirely after the transcript has been uploaded.
-    """
-    import shutil
-
-    # Encode cwd the same way CLI does (replaces non-alphanumeric with -)
-    cwd_encoded = re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(sdk_cwd))
-    config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
-    projects_base = os.path.realpath(os.path.join(config_dir, "projects"))
-    project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
-
-    if not project_dir.startswith(projects_base + os.sep):
-        logger.warning(
-            f"[Transcript] Cleanup path escaped projects base: {project_dir}"
-        )
-        return
-
-    if os.path.isdir(project_dir):
-        shutil.rmtree(project_dir, ignore_errors=True)
-        logger.debug(f"[Transcript] Cleaned up CLI project dir: {project_dir}")
-    else:
-        logger.debug(f"[Transcript] Project dir not found: {project_dir}")
-
-
 def write_transcript_to_tempfile(
    transcript_content: str,
    session_id: str,
@@ -207,29 +202,32 @@ def write_transcript_to_tempfile(
 def validate_transcript(content: str | None) -> bool:
    """Check that a transcript has actual conversation messages.

-    A valid transcript needs at least one assistant message (not just
-    queue-operation / file-history-snapshot metadata).  We do NOT require
-    a ``type: "user"`` entry because with ``--resume`` the user's message
-    is passed as a CLI query parameter and does not appear in the
-    transcript file.
+    A valid transcript for resume needs at least one user message and one
+    assistant message (not just queue-operation / file-history-snapshot
+    metadata).
    """
    if not content or not content.strip():
        return False

    lines = content.strip().split("\n")
+    if len(lines) < 2:
+        return False

+    has_user = False
    has_assistant = False

    for line in lines:
-        if not line.strip():
-            continue
-        entry = json.loads(line, fallback=None)
-        if not isinstance(entry, dict):
+        try:
+            entry = json.loads(line)
+            msg_type = entry.get("type")
+            if msg_type == "user":
+                has_user = True
+            elif msg_type == "assistant":
+                has_assistant = True
+        except json.JSONDecodeError:
            return False
-        if entry.get("type") == "assistant":
-            has_assistant = True

-    return has_assistant
+    return has_user and has_assistant


 # ---------------------------------------------------------------------------
@@ -250,15 +248,6 @@ def _storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]:
    )


-def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]:
-    """Return (workspace_id, file_id, filename) for a session's transcript metadata."""
-    return (
-        TRANSCRIPT_STORAGE_PREFIX,
-        _sanitize_id(user_id),
-        f"{_sanitize_id(session_id)}.meta.json",
-    )
-
-
 def _build_storage_path(user_id: str, session_id: str, backend: object) -> str:
    """Build the full storage path string that ``retrieve()`` expects.

@@ -279,51 +268,42 @@ def _build_storage_path(user_id: str, session_id: str, backend: object) -> str:
        return f"local://{wid}/{fid}/{fname}"


-async def upload_transcript(
-    user_id: str,
-    session_id: str,
-    content: str,
-    message_count: int = 0,
-    log_prefix: str = "[Transcript]",
-) -> None:
-    """Strip progress entries and upload complete transcript.
+async def upload_transcript(user_id: str, session_id: str, content: str) -> None:
+    """Strip progress entries and upload transcript to bucket storage.

-    The transcript represents the FULL active context (atomic).
-    Each upload REPLACES the previous transcript entirely.
-
-    The executor holds a cluster lock per session, so concurrent uploads for
-    the same session cannot happen.
-
-    Args:
-        content: Complete JSONL transcript (from TranscriptBuilder).
-        message_count: ``len(session.messages)`` at upload time.
+    Safety: only overwrites when the new (stripped) transcript is larger than
+    what is already stored.  Since JSONL is append-only, the latest transcript
+    is always the longest.  This prevents a slow/stale background task from
+    clobbering a newer upload from a concurrent turn.
    """
    from backend.util.workspace_storage import get_workspace_storage

-    # Strip metadata entries (progress, file-history-snapshot, etc.)
-    # Note: SDK-built transcripts shouldn't have these, but strip for safety
    stripped = strip_progress_entries(content)
    if not validate_transcript(stripped):
-        # Log entry types for debugging — helps identify why validation failed
-        entry_types: list[str] = []
-        for line in stripped.strip().split("\n"):
-            entry = json.loads(line, fallback={"type": "INVALID_JSON"})
-            entry_types.append(entry.get("type", "?"))
        logger.warning(
-            "%s Skipping upload — stripped content not valid "
-            "(types=%s, stripped_len=%d, raw_len=%d)",
-            log_prefix,
-            entry_types,
-            len(stripped),
-            len(content),
+            f"[Transcript] Skipping upload — stripped content is not a valid "
+            f"transcript for session {session_id}"
        )
-        logger.debug("%s Raw content preview: %s", log_prefix, content[:500])
-        logger.debug("%s Stripped content: %s", log_prefix, stripped[:500])
        return

    storage = await get_workspace_storage()
    wid, fid, fname = _storage_path_parts(user_id, session_id)
    encoded = stripped.encode("utf-8")
+    new_size = len(encoded)
+
+    # Check existing transcript size to avoid overwriting newer with older
+    path = _build_storage_path(user_id, session_id, storage)
+    try:
+        existing = await storage.retrieve(path)
+        if len(existing) >= new_size:
+            logger.info(
+                f"[Transcript] Skipping upload — existing transcript "
+                f"({len(existing)}B) >= new ({new_size}B) for session "
+                f"{session_id}"
+            )
+            return
+    except (FileNotFoundError, Exception):
+        pass  # No existing transcript or retrieval error — proceed with upload

    await storage.store(
        workspace_id=wid,
@@ -331,36 +311,16 @@ async def upload_transcript(
        filename=fname,
        content=encoded,
    )
-
-    # Update metadata so message_count stays current.  The gap-fill logic
-    # in _build_query_message relies on it to avoid re-compressing messages.
-    try:
-        meta = {"message_count": message_count, "uploaded_at": time.time()}
-        mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
-        await storage.store(
-            workspace_id=mwid,
-            file_id=mfid,
-            filename=mfname,
-            content=json.dumps(meta).encode("utf-8"),
-        )
-    except Exception as e:
-        logger.warning(f"{log_prefix} Failed to write metadata: {e}")
-
    logger.info(
-        f"{log_prefix} Uploaded {len(encoded)}B "
-        f"(stripped from {len(content)}B, msg_count={message_count})"
+        f"[Transcript] Uploaded {new_size} bytes "
+        f"(stripped from {len(content)}) for session {session_id}"
    )


-async def download_transcript(
-    user_id: str,
-    session_id: str,
-    log_prefix: str = "[Transcript]",
-) -> TranscriptDownload | None:
-    """Download transcript and metadata from bucket storage.
+async def download_transcript(user_id: str, session_id: str) -> str | None:
+    """Download transcript from bucket storage.

-    Returns a ``TranscriptDownload`` with the JSONL content and the
-    ``message_count`` watermark from the upload, or ``None`` if not found.
+    Returns the JSONL content string, or ``None`` if not found.
    """
    from backend.util.workspace_storage import get_workspace_storage

@@ -370,40 +330,17 @@ async def download_transcript(
    try:
        data = await storage.retrieve(path)
        content = data.decode("utf-8")
+        logger.info(
+            f"[Transcript] Downloaded {len(content)} bytes for session {session_id}"
+        )
+        return content
    except FileNotFoundError:
-        logger.debug(f"{log_prefix} No transcript in storage")
+        logger.debug(f"[Transcript] No transcript in storage for {session_id}")
        return None
    except Exception as e:
-        logger.warning(f"{log_prefix} Failed to download transcript: {e}")
+        logger.warning(f"[Transcript] Failed to download transcript: {e}")
        return None

-    # Try to load metadata (best-effort — old transcripts won't have it)
-    message_count = 0
-    uploaded_at = 0.0
-    try:
-        from backend.util.workspace_storage import GCSWorkspaceStorage
-
-        mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
-        if isinstance(storage, GCSWorkspaceStorage):
-            blob = f"workspaces/{mwid}/{mfid}/{mfname}"
-            meta_path = f"gcs://{storage.bucket_name}/{blob}"
-        else:
-            meta_path = f"local://{mwid}/{mfid}/{mfname}"
-
-        meta_data = await storage.retrieve(meta_path)
-        meta = json.loads(meta_data.decode("utf-8"), fallback={})
-        message_count = meta.get("message_count", 0)
-        uploaded_at = meta.get("uploaded_at", 0.0)
-    except (FileNotFoundError, Exception):
-        pass  # No metadata — treat as unknown (msg_count=0 → always fill gap)
-
-    logger.info(f"{log_prefix} Downloaded {len(content)}B (msg_count={message_count})")
-    return TranscriptDownload(
-        content=content,
-        message_count=message_count,
-        uploaded_at=uploaded_at,
-    )
-

 async def delete_transcript(user_id: str, session_id: str) -> None:
    """Delete transcript from bucket storage (e.g. after resume failure)."""
--- a/autogpt_platform/backend/backend/api/features/chat/service.py
+++ b/autogpt_platform/backend/backend/api/features/chat/service.py
--- a/autogpt_platform/backend/backend/api/features/chat/service_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/service_test.py
@@ -4,14 +4,87 @@ from os import getenv

 import pytest

+from . import service as chat_service
 from .model import create_chat_session, get_chat_session, upsert_chat_session
-from .response_model import StreamError, StreamTextDelta
+from .response_model import (
+    StreamError,
+    StreamFinish,
+    StreamTextDelta,
+    StreamToolOutputAvailable,
+)
 from .sdk import service as sdk_service
 from .sdk.transcript import download_transcript

 logger = logging.getLogger(__name__)


+@pytest.mark.asyncio(loop_scope="session")
+async def test_stream_chat_completion(setup_test_user, test_user_id):
+    """
+    Test the stream_chat_completion function.
+    """
+    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
+    if not api_key:
+        return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
+
+    session = await create_chat_session(test_user_id)
+
+    has_errors = False
+    has_ended = False
+    assistant_message = ""
+    async for chunk in chat_service.stream_chat_completion(
+        session.session_id, "Hello, how are you?", user_id=session.user_id
+    ):
+        logger.info(chunk)
+        if isinstance(chunk, StreamError):
+            has_errors = True
+        if isinstance(chunk, StreamTextDelta):
+            assistant_message += chunk.delta
+        if isinstance(chunk, StreamFinish):
+            has_ended = True
+
+    assert has_ended, "Chat completion did not end"
+    assert not has_errors, "Error occurred while streaming chat completion"
+    assert assistant_message, "Assistant message is empty"
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_stream_chat_completion_with_tool_calls(setup_test_user, test_user_id):
+    """
+    Test the stream_chat_completion function.
+    """
+    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
+    if not api_key:
+        return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
+
+    session = await create_chat_session(test_user_id)
+    session = await upsert_chat_session(session)
+
+    has_errors = False
+    has_ended = False
+    had_tool_calls = False
+    async for chunk in chat_service.stream_chat_completion(
+        session.session_id,
+        "Please find me an agent that can help me with my business. Use the query 'moneny printing agent'",
+        user_id=session.user_id,
+    ):
+        logger.info(chunk)
+        if isinstance(chunk, StreamError):
+            has_errors = True
+
+        if isinstance(chunk, StreamFinish):
+            has_ended = True
+        if isinstance(chunk, StreamToolOutputAvailable):
+            had_tool_calls = True
+
+    assert has_ended, "Chat completion did not end"
+    assert not has_errors, "Error occurred while streaming chat completion"
+    assert had_tool_calls, "Tool calls did not occur"
+    session = await get_chat_session(session.session_id)
+    assert session, "Session not found"
+    assert session.usage, "Usage is empty"
+
+
@pytest.mark.asyncio(loop_scope="session")
 async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
    """Test that the SDK --resume path captures and uses transcripts across turns.
@@ -41,6 +114,7 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
    )
    turn1_text = ""
    turn1_errors: list[str] = []
+    turn1_ended = False

    async for chunk in sdk_service.stream_chat_completion_sdk(
        session.session_id,
@@ -51,28 +125,25 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
            turn1_text += chunk.delta
        elif isinstance(chunk, StreamError):
            turn1_errors.append(chunk.errorText)
+        elif isinstance(chunk, StreamFinish):
+            turn1_ended = True

+    assert turn1_ended, "Turn 1 did not finish"
    assert not turn1_errors, f"Turn 1 errors: {turn1_errors}"
    assert turn1_text, "Turn 1 produced no text"

-    # Wait for background upload task to complete (retry up to 5s).
-    # The CLI may not produce a usable transcript for very short
-    # conversations (only metadata entries) — this is environment-dependent
-    # (CLI version, platform).  When that happens, multi-turn still works
-    # via conversation compression (non-resume path), but we can't test
-    # the --resume round-trip.
+    # Wait for background upload task to complete (retry up to 5s)
    transcript = None
    for _ in range(10):
        await asyncio.sleep(0.5)
        transcript = await download_transcript(test_user_id, session.session_id)
        if transcript:
            break
-    if not transcript:
-        return pytest.skip(
-            "CLI did not produce a usable transcript — "
-            "cannot test --resume round-trip in this environment"
-        )
-    logger.info(f"Turn 1 transcript uploaded: {len(transcript.content)} bytes")
+    assert transcript, (
+        "Transcript was not uploaded to bucket after turn 1 — "
+        "Stop hook may not have fired or transcript was too small"
+    )
+    logger.info(f"Turn 1 transcript uploaded: {len(transcript)} bytes")

    # Reload session for turn 2
    session = await get_chat_session(session.session_id, test_user_id)
@@ -82,6 +153,7 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
    turn2_msg = "What was the special keyword I asked you to remember?"
    turn2_text = ""
    turn2_errors: list[str] = []
+    turn2_ended = False

    async for chunk in sdk_service.stream_chat_completion_sdk(
        session.session_id,
@@ -93,7 +165,10 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
            turn2_text += chunk.delta
        elif isinstance(chunk, StreamError):
            turn2_errors.append(chunk.errorText)
+        elif isinstance(chunk, StreamFinish):
+            turn2_ended = True

+    assert turn2_ended, "Turn 2 did not finish"
    assert not turn2_errors, f"Turn 2 errors: {turn2_errors}"
    assert turn2_text, "Turn 2 produced no text"
    assert keyword in turn2_text, (
--- a/autogpt_platform/backend/backend/api/features/chat/stream_registry.py
+++ b/autogpt_platform/backend/backend/api/features/chat/stream_registry.py
--- a/autogpt_platform/backend/backend/api/features/chat/tools/IDEAS.md
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/IDEAS.md
--- a/autogpt_platform/backend/backend/api/features/chat/tools/init.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/init.py
@@ -1,17 +1,16 @@
-from __future__ import annotations
-
 import logging
 from typing import TYPE_CHECKING, Any

 from openai.types.chat import ChatCompletionToolParam

-from backend.copilot.tracking import track_tool_called
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tracking import track_tool_called

 from .add_understanding import AddUnderstandingTool
-from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreenshotTool
 from .agent_output import AgentOutputTool
 from .base import BaseTool
 from .bash_exec import BashExecTool
+from .check_operation_status import CheckOperationStatusTool
 from .create_agent import CreateAgentTool
 from .customize_agent import CustomizeAgentTool
 from .edit_agent import EditAgentTool
@@ -19,23 +18,10 @@ from .feature_requests import CreateFeatureRequestTool, SearchFeatureRequestsToo
 from .find_agent import FindAgentTool
 from .find_block import FindBlockTool
 from .find_library_agent import FindLibraryAgentTool
-from .fix_agent import FixAgentGraphTool
-from .get_agent_building_guide import GetAgentBuildingGuideTool
 from .get_doc_page import GetDocPageTool
-from .get_mcp_guide import GetMCPGuideTool
-from .manage_folders import (
-    CreateFolderTool,
-    DeleteFolderTool,
-    ListFoldersTool,
-    MoveAgentsToFolderTool,
-    MoveFolderTool,
-    UpdateFolderTool,
-)
 from .run_agent import RunAgentTool
 from .run_block import RunBlockTool
-from .run_mcp_tool import RunMCPToolTool
 from .search_docs import SearchDocsTool
-from .validate_agent import ValidateAgentGraphTool
 from .web_fetch import WebFetchTool
 from .workspace_files import (
    DeleteWorkspaceFileTool,
@@ -45,8 +31,7 @@ from .workspace_files import (
 )

 if TYPE_CHECKING:
-    from backend.copilot.model import ChatSession
-    from backend.copilot.response_model import StreamToolOutputAvailable
+    from backend.api.features.chat.response_model import StreamToolOutputAvailable

 logger = logging.getLogger(__name__)

@@ -59,36 +44,20 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
    "find_agent": FindAgentTool(),
    "find_block": FindBlockTool(),
    "find_library_agent": FindLibraryAgentTool(),
-    # Folder management tools
-    "create_folder": CreateFolderTool(),
-    "list_folders": ListFoldersTool(),
-    "update_folder": UpdateFolderTool(),
-    "move_folder": MoveFolderTool(),
-    "delete_folder": DeleteFolderTool(),
-    "move_agents_to_folder": MoveAgentsToFolderTool(),
    "run_agent": RunAgentTool(),
    "run_block": RunBlockTool(),
-    "run_mcp_tool": RunMCPToolTool(),
-    "get_mcp_guide": GetMCPGuideTool(),
    "view_agent_output": AgentOutputTool(),
+    "check_operation_status": CheckOperationStatusTool(),
    "search_docs": SearchDocsTool(),
    "get_doc_page": GetDocPageTool(),
-    "get_agent_building_guide": GetAgentBuildingGuideTool(),
    # Web fetch for safe URL retrieval
    "web_fetch": WebFetchTool(),
-    # Agent-browser multi-step automation (navigate, act, screenshot)
-    "browser_navigate": BrowserNavigateTool(),
-    "browser_act": BrowserActTool(),
-    "browser_screenshot": BrowserScreenshotTool(),
    # Sandboxed code execution (bubblewrap)
    "bash_exec": BashExecTool(),
    # Persistent workspace tools (cloud storage, survives across sessions)
    # Feature request tools
    "search_feature_requests": SearchFeatureRequestsTool(),
    "create_feature_request": CreateFeatureRequestTool(),
-    # Agent generation tools (local validation/fixing)
-    "validate_agent_graph": ValidateAgentGraphTool(),
-    "fix_agent_graph": FixAgentGraphTool(),
    # Workspace tools for CoPilot file operations
    "list_workspace_files": ListWorkspaceFilesTool(),
    "read_workspace_file": ReadWorkspaceFileTool(),
@@ -100,17 +69,10 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
 find_agent_tool = TOOL_REGISTRY["find_agent"]
 run_agent_tool = TOOL_REGISTRY["run_agent"]

-
-def get_available_tools() -> list[ChatCompletionToolParam]:
-    """Return OpenAI tool schemas for tools available in the current environment.
-
-    Called per-request so that env-var or binary availability is evaluated
-    fresh each time (e.g. browser_* tools are excluded when agent-browser
-    CLI is not installed).
-    """
-    return [
-        tool.as_openai_tool() for tool in TOOL_REGISTRY.values() if tool.is_available
-    ]
+# Generated from registry for OpenAI API
+tools: list[ChatCompletionToolParam] = [
+    tool.as_openai_tool() for tool in TOOL_REGISTRY.values()
+]


 def get_tool(tool_name: str) -> BaseTool | None:
--- a/autogpt_platform/backend/backend/api/features/chat/tools/_test_data.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/_test_data.py
@@ -1,46 +1,22 @@
-import logging
 import uuid
 from datetime import UTC, datetime
 from os import getenv

 import pytest
-import pytest_asyncio
 from prisma.types import ProfileCreateInput
 from pydantic import SecretStr

+from backend.api.features.chat.model import ChatSession
 from backend.api.features.store import db as store_db
 from backend.blocks.firecrawl.scrape import FirecrawlScrapeBlock
 from backend.blocks.io import AgentInputBlock, AgentOutputBlock
 from backend.blocks.llm import AITextGeneratorBlock
-from backend.copilot.model import ChatSession
-from backend.data import db as db_module
 from backend.data.db import prisma
 from backend.data.graph import Graph, Link, Node, create_graph
 from backend.data.model import APIKeyCredentials
 from backend.data.user import get_or_create_user
 from backend.integrations.credentials_store import IntegrationCredentialsStore

-_logger = logging.getLogger(__name__)
-
-
-async def _ensure_db_connected() -> None:
-    """Ensure the Prisma connection is alive on the current event loop.
-
-    On Python 3.11, the httpx transport inside Prisma can reference a stale
-    (closed) event loop when session-scoped async fixtures are evaluated long
-    after the initial ``server`` fixture connected Prisma.  A cheap health-check
-    followed by a reconnect fixes this without affecting other fixtures.
-    """
-    try:
-        await prisma.query_raw("SELECT 1")
-    except Exception:
-        _logger.info("Prisma connection stale – reconnecting")
-        try:
-            await db_module.disconnect()
-        except Exception:
-            pass
-        await db_module.connect()
-

 def make_session(user_id: str):
    return ChatSession(
@@ -55,19 +31,15 @@ def make_session(user_id: str):
    )


-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def setup_test_data(server):
+@pytest.fixture(scope="session")
+async def setup_test_data():
    """
    Set up test data for run_agent tests:
    1. Create a test user
    2. Create a test graph (agent input -> agent output)
    3. Create a store listing and store listing version
    4. Approve the store listing version
-
-    Depends on ``server`` to ensure Prisma is connected.
    """
-    await _ensure_db_connected()
-
    # 1. Create a test user
    user_data = {
        "sub": f"test-user-{uuid.uuid4()}",
@@ -151,8 +123,8 @@ async def setup_test_data(server):
    unique_slug = f"test-agent-{str(uuid.uuid4())[:8]}"
    store_submission = await store_db.create_store_submission(
        user_id=user.id,
-        graph_id=created_graph.id,
-        graph_version=created_graph.version,
+        agent_id=created_graph.id,
+        agent_version=created_graph.version,
        slug=unique_slug,
        name="Test Agent",
        description="A simple test agent",
@@ -161,10 +133,10 @@ async def setup_test_data(server):
        image_urls=["https://example.com/image.jpg"],
    )

-    assert store_submission.listing_version_id is not None
+    assert store_submission.store_listing_version_id is not None
    # 4. Approve the store listing version
    await store_db.review_store_submission(
-        store_listing_version_id=store_submission.listing_version_id,
+        store_listing_version_id=store_submission.store_listing_version_id,
        is_approved=True,
        external_comments="Approved for testing",
        internal_comments="Test approval",
@@ -178,19 +150,15 @@ async def setup_test_data(server):
    }


-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def setup_llm_test_data(server):
+@pytest.fixture(scope="session")
+async def setup_llm_test_data():
    """
    Set up test data for LLM agent tests:
    1. Create a test user
    2. Create test OpenAI credentials for the user
    3. Create a test graph with input -> LLM block -> output
    4. Create and approve a store listing
-
-    Depends on ``server`` to ensure Prisma is connected.
    """
-    await _ensure_db_connected()
-
    key = getenv("OPENAI_API_KEY")
    if not key:
        return pytest.skip("OPENAI_API_KEY is not set")
@@ -321,8 +289,8 @@ async def setup_llm_test_data(server):
    unique_slug = f"llm-test-agent-{str(uuid.uuid4())[:8]}"
    store_submission = await store_db.create_store_submission(
        user_id=user.id,
-        graph_id=created_graph.id,
-        graph_version=created_graph.version,
+        agent_id=created_graph.id,
+        agent_version=created_graph.version,
        slug=unique_slug,
        name="LLM Test Agent",
        description="An agent with LLM capabilities",
@@ -330,9 +298,9 @@ async def setup_llm_test_data(server):
        categories=["testing", "ai"],
        image_urls=["https://example.com/image.jpg"],
    )
-    assert store_submission.listing_version_id is not None
+    assert store_submission.store_listing_version_id is not None
    await store_db.review_store_submission(
-        store_listing_version_id=store_submission.listing_version_id,
+        store_listing_version_id=store_submission.store_listing_version_id,
        is_approved=True,
        external_comments="Approved for testing",
        internal_comments="Test approval for LLM agent",
@@ -347,18 +315,14 @@ async def setup_llm_test_data(server):
    }


-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def setup_firecrawl_test_data(server):
+@pytest.fixture(scope="session")
+async def setup_firecrawl_test_data():
    """
    Set up test data for Firecrawl agent tests (missing credentials scenario):
    1. Create a test user (WITHOUT Firecrawl credentials)
    2. Create a test graph with input -> Firecrawl block -> output
    3. Create and approve a store listing
-
-    Depends on ``server`` to ensure Prisma is connected.
    """
-    await _ensure_db_connected()
-
    # 1. Create a test user
    user_data = {
        "sub": f"test-user-{uuid.uuid4()}",
@@ -476,8 +440,8 @@ async def setup_firecrawl_test_data(server):
    unique_slug = f"firecrawl-test-agent-{str(uuid.uuid4())[:8]}"
    store_submission = await store_db.create_store_submission(
        user_id=user.id,
-        graph_id=created_graph.id,
-        graph_version=created_graph.version,
+        agent_id=created_graph.id,
+        agent_version=created_graph.version,
        slug=unique_slug,
        name="Firecrawl Test Agent",
        description="An agent with Firecrawl integration (no credentials)",
@@ -485,9 +449,9 @@ async def setup_firecrawl_test_data(server):
        categories=["testing", "scraping"],
        image_urls=["https://example.com/image.jpg"],
    )
-    assert store_submission.listing_version_id is not None
+    assert store_submission.store_listing_version_id is not None
    await store_db.review_store_submission(
-        store_listing_version_id=store_submission.listing_version_id,
+        store_listing_version_id=store_submission.store_listing_version_id,
        is_approved=True,
        external_comments="Approved for testing",
        internal_comments="Test approval for Firecrawl agent",
--- a/autogpt_platform/backend/backend/api/features/chat/tools/add_understanding.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/add_understanding.py
@@ -3,9 +3,11 @@
 import logging
 from typing import Any

-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import understanding_db
-from backend.data.understanding import BusinessUnderstandingInput
+from backend.api.features.chat.model import ChatSession
+from backend.data.understanding import (
+    BusinessUnderstandingInput,
+    upsert_business_understanding,
+)

 from .base import BaseTool
 from .models import ErrorResponse, ToolResponseBase, UnderstandingUpdatedResponse
@@ -97,9 +99,7 @@ and automations for the user's specific needs."""
        ]

        # Upsert with merge
-        understanding = await understanding_db().upsert_business_understanding(
-            user_id, input_data
-        )
+        understanding = await upsert_business_understanding(user_id, input_data)

        # Build current understanding summary (filter out empty values)
        current_understanding = {
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/init.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/init.py
@@ -1,20 +1,24 @@
 """Agent generator package - Creates agents from natural language."""

 from .core import (
+    AgentGeneratorNotConfiguredError,
    AgentJsonValidationError,
    AgentSummary,
    DecompositionResult,
    DecompositionStep,
    LibraryAgentSummary,
    MarketplaceAgentSummary,
+    customize_template,
+    decompose_goal,
    enrich_library_agents_from_steps,
    extract_search_terms_from_steps,
    extract_uuids_from_text,
+    generate_agent,
+    generate_agent_patch,
    get_agent_as_json,
    get_all_relevant_agents_for_generation,
    get_library_agent_by_graph_id,
    get_library_agent_by_id,
-    get_library_agents_by_ids,
    get_library_agents_for_generation,
    graph_to_json,
    json_to_graph,
@@ -22,28 +26,33 @@ from .core import (
    search_marketplace_agents_for_generation,
 )
 from .errors import get_user_message_for_error
-from .validation import AgentFixer, AgentValidator
+from .service import health_check as check_external_service_health
+from .service import is_external_service_configured

 __all__ = [
-    "AgentFixer",
-    "AgentValidator",
+    "AgentGeneratorNotConfiguredError",
    "AgentJsonValidationError",
    "AgentSummary",
    "DecompositionResult",
    "DecompositionStep",
    "LibraryAgentSummary",
    "MarketplaceAgentSummary",
+    "check_external_service_health",
+    "customize_template",
+    "decompose_goal",
    "enrich_library_agents_from_steps",
    "extract_search_terms_from_steps",
    "extract_uuids_from_text",
+    "generate_agent",
+    "generate_agent_patch",
    "get_agent_as_json",
    "get_all_relevant_agents_for_generation",
    "get_library_agent_by_graph_id",
    "get_library_agent_by_id",
-    "get_library_agents_by_ids",
    "get_library_agents_for_generation",
    "get_user_message_for_error",
    "graph_to_json",
+    "is_external_service_configured",
    "json_to_graph",
    "save_agent_to_library",
    "search_marketplace_agents_for_generation",
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/core.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/core.py
@@ -3,14 +3,20 @@
 import logging
 import re
 import uuid
-from collections.abc import Sequence
 from typing import Any, NotRequired, TypedDict

-from backend.data.db_accessors import graph_db, library_db, store_db
-from backend.data.graph import Graph, Link, Node
+from backend.api.features.library import db as library_db
+from backend.api.features.store import db as store_db
+from backend.data.graph import Graph, Link, Node, get_graph, get_store_listed_graphs
 from backend.util.exceptions import DatabaseError, NotFoundError

-from .helpers import UUID_RE_STR
+from .service import (
+    customize_template_external,
+    decompose_goal_external,
+    generate_agent_external,
+    generate_agent_patch_external,
+    is_external_service_configured,
+)

 logger = logging.getLogger(__name__)

@@ -72,7 +78,38 @@ class DecompositionResult(TypedDict, total=False):
 AgentSummary = LibraryAgentSummary | MarketplaceAgentSummary | dict[str, Any]


-_UUID_PATTERN = re.compile(UUID_RE_STR, re.IGNORECASE)
+def _to_dict_list(
+    agents: list[AgentSummary] | list[dict[str, Any]] | None,
+) -> list[dict[str, Any]] | None:
+    """Convert typed agent summaries to plain dicts for external service calls."""
+    if agents is None:
+        return None
+    return [dict(a) for a in agents]
+
+
+class AgentGeneratorNotConfiguredError(Exception):
+    """Raised when the external Agent Generator service is not configured."""
+
+    pass
+
+
+def _check_service_configured() -> None:
+    """Check if the external Agent Generator service is configured.
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the service is not configured.
+    """
+    if not is_external_service_configured():
+        raise AgentGeneratorNotConfiguredError(
+            "Agent Generator service is not configured. "
+            "Set AGENTGENERATOR_HOST environment variable to enable agent generation."
+        )
+
+
+_UUID_PATTERN = re.compile(
+    r"[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}",
+    re.IGNORECASE,
+)


 def extract_uuids_from_text(text: str) -> list[str]:
@@ -108,9 +145,8 @@ async def get_library_agent_by_id(
    Returns:
        LibraryAgentSummary if found, None otherwise
    """
-    db = library_db()
    try:
-        agent = await db.get_library_agent_by_graph_id(user_id, agent_id)
+        agent = await library_db.get_library_agent_by_graph_id(user_id, agent_id)
        if agent:
            logger.debug(f"Found library agent by graph_id: {agent.name}")
            return LibraryAgentSummary(
@@ -127,7 +163,7 @@ async def get_library_agent_by_id(
        logger.debug(f"Could not fetch library agent by graph_id {agent_id}: {e}")

    try:
-        agent = await db.get_library_agent(agent_id, user_id)
+        agent = await library_db.get_library_agent(agent_id, user_id)
        if agent:
            logger.debug(f"Found library agent by library_id: {agent.name}")
            return LibraryAgentSummary(
@@ -154,36 +190,6 @@ async def get_library_agent_by_id(
 get_library_agent_by_graph_id = get_library_agent_by_id


-async def get_library_agents_by_ids(
-    user_id: str,
-    agent_ids: list[str],
-) -> list[LibraryAgentSummary]:
-    """Fetch multiple library agents by their IDs.
-
-    Args:
-        user_id: The user ID
-        agent_ids: List of agent IDs (can be graph_ids or library agent IDs)
-
-    Returns:
-        List of LibraryAgentSummary for found agents (silently skips not found)
-    """
-    agents: list[LibraryAgentSummary] = []
-    for agent_id in agent_ids:
-        try:
-            agent = await get_library_agent_by_id(user_id, agent_id)
-            if agent:
-                agents.append(agent)
-                logger.debug(f"Fetched library agent by ID: {agent['name']}")
-            else:
-                logger.warning(f"Library agent not found for ID: {agent_id}")
-        except Exception as e:
-            logger.warning(f"Failed to fetch library agent {agent_id}: {e}")
-            continue
-
-    logger.info(f"Fetched {len(agents)}/{len(agent_ids)} library agents by ID")
-    return agents
-
-
 async def get_library_agents_for_generation(
    user_id: str,
    search_query: str | None = None,
@@ -208,17 +214,10 @@ async def get_library_agents_for_generation(
    Returns:
        List of LibraryAgentSummary with schemas and recent executions for sub-agent composition
    """
-    search_term = search_query.strip() if search_query else None
-    if search_term and len(search_term) > 100:
-        raise ValueError(
-            f"Search query is too long ({len(search_term)} chars, max 100). "
-            f"Please use a shorter, more specific search term."
-        )
-
    try:
-        response = await library_db().list_library_agents(
+        response = await library_db.list_library_agents(
            user_id=user_id,
-            search_term=search_term,
+            search_term=search_query,
            page=1,
            page_size=max_results,
            include_executions=True,
@@ -272,16 +271,9 @@ async def search_marketplace_agents_for_generation(
    Returns:
        List of LibraryAgentSummary with full input/output schemas
    """
-    search_term = search_query.strip()
-    if len(search_term) > 100:
-        raise ValueError(
-            f"Search query is too long ({len(search_term)} chars, max 100). "
-            f"Please use a shorter, more specific search term."
-        )
-
    try:
-        response = await store_db().get_store_agents(
-            search_query=search_term,
+        response = await store_db.get_store_agents(
+            search_query=search_query,
            page=1,
            page_size=max_results,
        )
@@ -294,7 +286,7 @@ async def search_marketplace_agents_for_generation(
            return []

        graph_ids = [agent.agent_graph_id for agent in agents_with_graphs]
-        graphs = await graph_db().get_store_listed_graphs(graph_ids)
+        graphs = await get_store_listed_graphs(*graph_ids)

        results: list[LibraryAgentSummary] = []
        for agent in agents_with_graphs:
@@ -432,7 +424,7 @@ def extract_search_terms_from_steps(
 async def enrich_library_agents_from_steps(
    user_id: str,
    decomposition_result: DecompositionResult | dict[str, Any],
-    existing_agents: Sequence[AgentSummary] | Sequence[dict[str, Any]],
+    existing_agents: list[AgentSummary] | list[dict[str, Any]],
    exclude_graph_id: str | None = None,
    include_marketplace: bool = True,
    max_additional_results: int = 10,
@@ -456,7 +448,7 @@ async def enrich_library_agents_from_steps(
    search_terms = extract_search_terms_from_steps(decomposition_result)

    if not search_terms:
-        return list(existing_agents)
+        return existing_agents

    existing_ids: set[str] = set()
    existing_names: set[str] = set()
@@ -516,6 +508,79 @@ async def enrich_library_agents_from_steps(
    return all_agents


+async def decompose_goal(
+    description: str,
+    context: str = "",
+    library_agents: list[AgentSummary] | None = None,
+) -> DecompositionResult | None:
+    """Break down a goal into steps or return clarifying questions.
+
+    Args:
+        description: Natural language goal description
+        context: Additional context (e.g., answers to previous questions)
+        library_agents: User's library agents available for sub-agent composition
+
+    Returns:
+        DecompositionResult with either:
+        - {"type": "clarifying_questions", "questions": [...]}
+        - {"type": "instructions", "steps": [...]}
+        Or None on error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for decompose_goal")
+    result = await decompose_goal_external(
+        description, context, _to_dict_list(library_agents)
+    )
+    return result  # type: ignore[return-value]
+
+
+async def generate_agent(
+    instructions: DecompositionResult | dict[str, Any],
+    library_agents: list[AgentSummary] | list[dict[str, Any]] | None = None,
+    operation_id: str | None = None,
+    task_id: str | None = None,
+) -> dict[str, Any] | None:
+    """Generate agent JSON from instructions.
+
+    Args:
+        instructions: Structured instructions from decompose_goal
+        library_agents: User's library agents available for sub-agent composition
+        operation_id: Operation ID for async processing (enables Redis Streams
+            completion notification)
+        task_id: Task ID for async processing (enables Redis Streams persistence
+            and SSE delivery)
+
+    Returns:
+        Agent JSON dict, {"status": "accepted"} for async, error dict {"type": "error", ...}, or None on error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for generate_agent")
+    result = await generate_agent_external(
+        dict(instructions), _to_dict_list(library_agents), operation_id, task_id
+    )
+
+    # Don't modify async response
+    if result and result.get("status") == "accepted":
+        return result
+
+    if result:
+        if isinstance(result, dict) and result.get("type") == "error":
+            return result
+        if "id" not in result:
+            result["id"] = str(uuid.uuid4())
+        if "version" not in result:
+            result["version"] = 1
+        if "is_active" not in result:
+            result["is_active"] = True
+    return result
+
+
 class AgentJsonValidationError(Exception):
    """Raised when agent JSON is invalid or missing required fields."""

@@ -595,10 +660,7 @@ def json_to_graph(agent_json: dict[str, Any]) -> Graph:


 async def save_agent_to_library(
-    agent_json: dict[str, Any],
-    user_id: str,
-    is_update: bool = False,
-    folder_id: str | None = None,
+    agent_json: dict[str, Any], user_id: str, is_update: bool = False
 ) -> tuple[Graph, Any]:
    """Save agent to database and user's library.

@@ -606,16 +668,14 @@ async def save_agent_to_library(
        agent_json: Agent JSON dict
        user_id: User ID
        is_update: Whether this is an update to an existing agent
-        folder_id: Optional folder ID to place the agent in

    Returns:
        Tuple of (created Graph, LibraryAgent)
    """
    graph = json_to_graph(agent_json)
-    db = library_db()
    if is_update:
-        return await db.update_graph_in_library(graph, user_id)
-    return await db.create_graph_in_library(graph, user_id, folder_id=folder_id)
+        return await library_db.update_graph_in_library(graph, user_id)
+    return await library_db.create_graph_in_library(graph, user_id)


 def graph_to_json(graph: Graph) -> dict[str, Any]:
@@ -675,14 +735,12 @@ async def get_agent_as_json(
    Returns:
        Agent as JSON dict or None if not found
    """
-    db = graph_db()
-
-    graph = await db.get_graph(agent_id, version=None, user_id=user_id)
+    graph = await get_graph(agent_id, version=None, user_id=user_id)

    if not graph and user_id:
        try:
-            library_agent = await library_db().get_library_agent(agent_id, user_id)
-            graph = await db.get_graph(
+            library_agent = await library_db.get_library_agent(agent_id, user_id)
+            graph = await get_graph(
                library_agent.graph_id, version=None, user_id=user_id
            )
        except NotFoundError:
@@ -692,3 +750,76 @@ async def get_agent_as_json(
        return None

    return graph_to_json(graph)
+
+
+async def generate_agent_patch(
+    update_request: str,
+    current_agent: dict[str, Any],
+    library_agents: list[AgentSummary] | None = None,
+    operation_id: str | None = None,
+    task_id: str | None = None,
+) -> dict[str, Any] | None:
+    """Update an existing agent using natural language.
+
+    The external Agent Generator service handles:
+    - Generating the patch
+    - Applying the patch
+    - Fixing and validating the result
+
+    Args:
+        update_request: Natural language description of changes
+        current_agent: Current agent JSON
+        library_agents: User's library agents available for sub-agent composition
+        operation_id: Operation ID for async processing (enables Redis Streams callback)
+        task_id: Task ID for async processing (enables Redis Streams callback)
+
+    Returns:
+        Updated agent JSON, clarifying questions dict {"type": "clarifying_questions", ...},
+        {"status": "accepted"} for async, error dict {"type": "error", ...}, or None on error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for generate_agent_patch")
+    return await generate_agent_patch_external(
+        update_request,
+        current_agent,
+        _to_dict_list(library_agents),
+        operation_id,
+        task_id,
+    )
+
+
+async def customize_template(
+    template_agent: dict[str, Any],
+    modification_request: str,
+    context: str = "",
+) -> dict[str, Any] | None:
+    """Customize a template/marketplace agent using natural language.
+
+    This is used when users want to modify a template or marketplace agent
+    to fit their specific needs before adding it to their library.
+
+    The external Agent Generator service handles:
+    - Understanding the modification request
+    - Applying changes to the template
+    - Fixing and validating the result
+
+    Args:
+        template_agent: The template agent JSON to customize
+        modification_request: Natural language description of customizations
+        context: Additional context (e.g., answers to previous questions)
+
+    Returns:
+        Customized agent JSON, clarifying questions dict {"type": "clarifying_questions", ...},
+        error dict {"type": "error", ...}, or None on unexpected error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for customize_template")
+    return await customize_template_external(
+        template_agent, modification_request, context
+    )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/dummy.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/dummy.py
@@ -0,0 +1,154 @@
+"""Dummy Agent Generator for testing.
+
+Returns mock responses matching the format expected from the external service.
+Enable via AGENTGENERATOR_USE_DUMMY=true in settings.
+
+WARNING: This is for testing only. Do not use in production.
+"""
+
+import asyncio
+import logging
+import uuid
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+# Dummy decomposition result (instructions type)
+DUMMY_DECOMPOSITION_RESULT: dict[str, Any] = {
+    "type": "instructions",
+    "steps": [
+        {
+            "description": "Get input from user",
+            "action": "input",
+            "block_name": "AgentInputBlock",
+        },
+        {
+            "description": "Process the input",
+            "action": "process",
+            "block_name": "TextFormatterBlock",
+        },
+        {
+            "description": "Return output to user",
+            "action": "output",
+            "block_name": "AgentOutputBlock",
+        },
+    ],
+}
+
+# Block IDs from backend/blocks/io.py
+AGENT_INPUT_BLOCK_ID = "c0a8e994-ebf1-4a9c-a4d8-89d09c86741b"
+AGENT_OUTPUT_BLOCK_ID = "363ae599-353e-4804-937e-b2ee3cef3da4"
+
+
+def _generate_dummy_agent_json() -> dict[str, Any]:
+    """Generate a minimal valid agent JSON for testing."""
+    input_node_id = str(uuid.uuid4())
+    output_node_id = str(uuid.uuid4())
+
+    return {
+        "id": str(uuid.uuid4()),
+        "version": 1,
+        "is_active": True,
+        "name": "Dummy Test Agent",
+        "description": "A dummy agent generated for testing purposes",
+        "nodes": [
+            {
+                "id": input_node_id,
+                "block_id": AGENT_INPUT_BLOCK_ID,
+                "input_default": {
+                    "name": "input",
+                    "title": "Input",
+                    "description": "Enter your input",
+                    "placeholder_values": [],
+                },
+                "metadata": {"position": {"x": 0, "y": 0}},
+            },
+            {
+                "id": output_node_id,
+                "block_id": AGENT_OUTPUT_BLOCK_ID,
+                "input_default": {
+                    "name": "output",
+                    "title": "Output",
+                    "description": "Agent output",
+                    "format": "{output}",
+                },
+                "metadata": {"position": {"x": 400, "y": 0}},
+            },
+        ],
+        "links": [
+            {
+                "id": str(uuid.uuid4()),
+                "source_id": input_node_id,
+                "sink_id": output_node_id,
+                "source_name": "result",
+                "sink_name": "value",
+                "is_static": False,
+            },
+        ],
+    }
+
+
+async def decompose_goal_dummy(
+    description: str,
+    context: str = "",
+    library_agents: list[dict[str, Any]] | None = None,
+) -> dict[str, Any]:
+    """Return dummy decomposition result."""
+    logger.info("Using dummy agent generator for decompose_goal")
+    return DUMMY_DECOMPOSITION_RESULT.copy()
+
+
+async def generate_agent_dummy(
+    instructions: dict[str, Any],
+    library_agents: list[dict[str, Any]] | None = None,
+    operation_id: str | None = None,
+    task_id: str | None = None,
+) -> dict[str, Any]:
+    """Return dummy agent JSON after a simulated delay."""
+    logger.info("Using dummy agent generator for generate_agent (30s delay)")
+    await asyncio.sleep(30)
+    return _generate_dummy_agent_json()
+
+
+async def generate_agent_patch_dummy(
+    update_request: str,
+    current_agent: dict[str, Any],
+    library_agents: list[dict[str, Any]] | None = None,
+    operation_id: str | None = None,
+    task_id: str | None = None,
+) -> dict[str, Any]:
+    """Return dummy patched agent (returns the current agent with updated description)."""
+    logger.info("Using dummy agent generator for generate_agent_patch")
+    patched = current_agent.copy()
+    patched["description"] = (
+        f"{current_agent.get('description', '')} (updated: {update_request})"
+    )
+    return patched
+
+
+async def customize_template_dummy(
+    template_agent: dict[str, Any],
+    modification_request: str,
+    context: str = "",
+) -> dict[str, Any]:
+    """Return dummy customized template (returns template with updated description)."""
+    logger.info("Using dummy agent generator for customize_template")
+    customized = template_agent.copy()
+    customized["description"] = (
+        f"{template_agent.get('description', '')} (customized: {modification_request})"
+    )
+    return customized
+
+
+async def get_blocks_dummy() -> list[dict[str, Any]]:
+    """Return dummy blocks list."""
+    logger.info("Using dummy agent generator for get_blocks")
+    return [
+        {"id": AGENT_INPUT_BLOCK_ID, "name": "AgentInputBlock"},
+        {"id": AGENT_OUTPUT_BLOCK_ID, "name": "AgentOutputBlock"},
+    ]
+
+
+async def health_check_dummy() -> bool:
+    """Always returns healthy for dummy service."""
+    return True
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/errors.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/errors.py
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/service.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/service.py
@@ -0,0 +1,549 @@
+"""External Agent Generator service client.
+
+This module provides a client for communicating with the external Agent Generator
+microservice. When AGENTGENERATOR_HOST is configured, the agent generation functions
+will delegate to the external service instead of using the built-in LLM-based implementation.
+"""
+
+import logging
+from typing import Any
+
+import httpx
+
+from backend.util.settings import Settings
+
+from .dummy import (
+    customize_template_dummy,
+    decompose_goal_dummy,
+    generate_agent_dummy,
+    generate_agent_patch_dummy,
+    get_blocks_dummy,
+    health_check_dummy,
+)
+
+logger = logging.getLogger(__name__)
+
+_dummy_mode_warned = False
+
+
+def _create_error_response(
+    error_message: str,
+    error_type: str = "unknown",
+    details: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    """Create a standardized error response dict.
+
+    Args:
+        error_message: Human-readable error message
+        error_type: Machine-readable error type
+        details: Optional additional error details
+
+    Returns:
+        Error dict with type="error" and error details
+    """
+    response: dict[str, Any] = {
+        "type": "error",
+        "error": error_message,
+        "error_type": error_type,
+    }
+    if details:
+        response["details"] = details
+    return response
+
+
+def _classify_http_error(e: httpx.HTTPStatusError) -> tuple[str, str]:
+    """Classify an HTTP error into error_type and message.
+
+    Args:
+        e: The HTTP status error
+
+    Returns:
+        Tuple of (error_type, error_message)
+    """
+    status = e.response.status_code
+    if status == 429:
+        return "rate_limit", f"Agent Generator rate limited: {e}"
+    elif status == 503:
+        return "service_unavailable", f"Agent Generator unavailable: {e}"
+    elif status == 504 or status == 408:
+        return "timeout", f"Agent Generator timed out: {e}"
+    else:
+        return "http_error", f"HTTP error calling Agent Generator: {e}"
+
+
+def _classify_request_error(e: httpx.RequestError) -> tuple[str, str]:
+    """Classify a request error into error_type and message.
+
+    Args:
+        e: The request error
+
+    Returns:
+        Tuple of (error_type, error_message)
+    """
+    error_str = str(e).lower()
+    if "timeout" in error_str or "timed out" in error_str:
+        return "timeout", f"Agent Generator request timed out: {e}"
+    elif "connect" in error_str:
+        return "connection_error", f"Could not connect to Agent Generator: {e}"
+    else:
+        return "request_error", f"Request error calling Agent Generator: {e}"
+
+
+_client: httpx.AsyncClient | None = None
+_settings: Settings | None = None
+
+
+def _get_settings() -> Settings:
+    """Get or create settings singleton."""
+    global _settings
+    if _settings is None:
+        _settings = Settings()
+    return _settings
+
+
+def _is_dummy_mode() -> bool:
+    """Check if dummy mode is enabled for testing."""
+    global _dummy_mode_warned
+    settings = _get_settings()
+    is_dummy = bool(settings.config.agentgenerator_use_dummy)
+    if is_dummy and not _dummy_mode_warned:
+        logger.warning(
+            "Agent Generator running in DUMMY MODE - returning mock responses. "
+            "Do not use in production!"
+        )
+        _dummy_mode_warned = True
+    return is_dummy
+
+
+def is_external_service_configured() -> bool:
+    """Check if external Agent Generator service is configured (or dummy mode)."""
+    settings = _get_settings()
+    return bool(settings.config.agentgenerator_host) or bool(
+        settings.config.agentgenerator_use_dummy
+    )
+
+
+def _get_base_url() -> str:
+    """Get the base URL for the external service."""
+    settings = _get_settings()
+    host = settings.config.agentgenerator_host
+    port = settings.config.agentgenerator_port
+    return f"http://{host}:{port}"
+
+
+def _get_client() -> httpx.AsyncClient:
+    """Get or create the HTTP client for the external service."""
+    global _client
+    if _client is None:
+        settings = _get_settings()
+        _client = httpx.AsyncClient(
+            base_url=_get_base_url(),
+            timeout=httpx.Timeout(settings.config.agentgenerator_timeout),
+        )
+    return _client
+
+
+async def decompose_goal_external(
+    description: str,
+    context: str = "",
+    library_agents: list[dict[str, Any]] | None = None,
+) -> dict[str, Any] | None:
+    """Call the external service to decompose a goal.
+
+    Args:
+        description: Natural language goal description
+        context: Additional context (e.g., answers to previous questions)
+        library_agents: User's library agents available for sub-agent composition
+
+    Returns:
+        Dict with either:
+        - {"type": "clarifying_questions", "questions": [...]}
+        - {"type": "instructions", "steps": [...]}
+        - {"type": "unachievable_goal", ...}
+        - {"type": "vague_goal", ...}
+        - {"type": "error", "error": "...", "error_type": "..."} on error
+        Or None on unexpected error
+    """
+    if _is_dummy_mode():
+        return await decompose_goal_dummy(description, context, library_agents)
+
+    client = _get_client()
+
+    if context:
+        description = f"{description}\n\nAdditional context from user:\n{context}"
+
+    payload: dict[str, Any] = {"description": description}
+    if library_agents:
+        payload["library_agents"] = library_agents
+
+    try:
+        response = await client.post("/api/decompose-description", json=payload)
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator decomposition failed: {error_msg} "
+                f"(type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        # Map the response to the expected format
+        response_type = data.get("type")
+        if response_type == "instructions":
+            return {"type": "instructions", "steps": data.get("steps", [])}
+        elif response_type == "clarifying_questions":
+            return {
+                "type": "clarifying_questions",
+                "questions": data.get("questions", []),
+            }
+        elif response_type == "unachievable_goal":
+            return {
+                "type": "unachievable_goal",
+                "reason": data.get("reason"),
+                "suggested_goal": data.get("suggested_goal"),
+            }
+        elif response_type == "vague_goal":
+            return {
+                "type": "vague_goal",
+                "suggested_goal": data.get("suggested_goal"),
+            }
+        elif response_type == "error":
+            # Pass through error from the service
+            return _create_error_response(
+                data.get("error", "Unknown error"),
+                data.get("error_type", "unknown"),
+            )
+        else:
+            logger.error(
+                f"Unknown response type from external service: {response_type}"
+            )
+            return _create_error_response(
+                f"Unknown response type from Agent Generator: {response_type}",
+                "invalid_response",
+            )
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def generate_agent_external(
+    instructions: dict[str, Any],
+    library_agents: list[dict[str, Any]] | None = None,
+    operation_id: str | None = None,
+    task_id: str | None = None,
+) -> dict[str, Any] | None:
+    """Call the external service to generate an agent from instructions.
+
+    Args:
+        instructions: Structured instructions from decompose_goal
+        library_agents: User's library agents available for sub-agent composition
+        operation_id: Operation ID for async processing (enables Redis Streams callback)
+        task_id: Task ID for async processing (enables Redis Streams callback)
+
+    Returns:
+        Agent JSON dict, {"status": "accepted"} for async, or error dict {"type": "error", ...} on error
+    """
+    if _is_dummy_mode():
+        return await generate_agent_dummy(
+            instructions, library_agents, operation_id, task_id
+        )
+
+    client = _get_client()
+
+    # Build request payload
+    payload: dict[str, Any] = {"instructions": instructions}
+    if library_agents:
+        payload["library_agents"] = library_agents
+    if operation_id and task_id:
+        payload["operation_id"] = operation_id
+        payload["task_id"] = task_id
+
+    try:
+        response = await client.post("/api/generate-agent", json=payload)
+
+        # Handle 202 Accepted for async processing
+        if response.status_code == 202:
+            logger.info(
+                f"Agent Generator accepted async request "
+                f"(operation_id={operation_id}, task_id={task_id})"
+            )
+            return {
+                "status": "accepted",
+                "operation_id": operation_id,
+                "task_id": task_id,
+            }
+
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator generation failed: {error_msg} (type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        return data.get("agent_json")
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def generate_agent_patch_external(
+    update_request: str,
+    current_agent: dict[str, Any],
+    library_agents: list[dict[str, Any]] | None = None,
+    operation_id: str | None = None,
+    task_id: str | None = None,
+) -> dict[str, Any] | None:
+    """Call the external service to generate a patch for an existing agent.
+
+    Args:
+        update_request: Natural language description of changes
+        current_agent: Current agent JSON
+        library_agents: User's library agents available for sub-agent composition
+        operation_id: Operation ID for async processing (enables Redis Streams callback)
+        task_id: Task ID for async processing (enables Redis Streams callback)
+
+    Returns:
+        Updated agent JSON, clarifying questions dict, {"status": "accepted"} for async, or error dict on error
+    """
+    if _is_dummy_mode():
+        return await generate_agent_patch_dummy(
+            update_request, current_agent, library_agents, operation_id, task_id
+        )
+
+    client = _get_client()
+
+    # Build request payload
+    payload: dict[str, Any] = {
+        "update_request": update_request,
+        "current_agent_json": current_agent,
+    }
+    if library_agents:
+        payload["library_agents"] = library_agents
+    if operation_id and task_id:
+        payload["operation_id"] = operation_id
+        payload["task_id"] = task_id
+
+    try:
+        response = await client.post("/api/update-agent", json=payload)
+
+        # Handle 202 Accepted for async processing
+        if response.status_code == 202:
+            logger.info(
+                f"Agent Generator accepted async update request "
+                f"(operation_id={operation_id}, task_id={task_id})"
+            )
+            return {
+                "status": "accepted",
+                "operation_id": operation_id,
+                "task_id": task_id,
+            }
+
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator patch generation failed: {error_msg} "
+                f"(type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        # Check if it's clarifying questions
+        if data.get("type") == "clarifying_questions":
+            return {
+                "type": "clarifying_questions",
+                "questions": data.get("questions", []),
+            }
+
+        # Check if it's an error passed through
+        if data.get("type") == "error":
+            return _create_error_response(
+                data.get("error", "Unknown error"),
+                data.get("error_type", "unknown"),
+            )
+
+        # Otherwise return the updated agent JSON
+        return data.get("agent_json")
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def customize_template_external(
+    template_agent: dict[str, Any],
+    modification_request: str,
+    context: str = "",
+) -> dict[str, Any] | None:
+    """Call the external service to customize a template/marketplace agent.
+
+    Args:
+        template_agent: The template agent JSON to customize
+        modification_request: Natural language description of customizations
+        context: Additional context (e.g., answers to previous questions)
+
+    Returns:
+        Customized agent JSON, clarifying questions dict, or error dict on error
+    """
+    if _is_dummy_mode():
+        return await customize_template_dummy(
+            template_agent, modification_request, context
+        )
+
+    client = _get_client()
+
+    request = modification_request
+    if context:
+        request = f"{modification_request}\n\nAdditional context from user:\n{context}"
+
+    payload: dict[str, Any] = {
+        "template_agent_json": template_agent,
+        "modification_request": request,
+    }
+
+    try:
+        response = await client.post("/api/template-modification", json=payload)
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator template customization failed: {error_msg} "
+                f"(type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        # Check if it's clarifying questions
+        if data.get("type") == "clarifying_questions":
+            return {
+                "type": "clarifying_questions",
+                "questions": data.get("questions", []),
+            }
+
+        # Check if it's an error passed through
+        if data.get("type") == "error":
+            return _create_error_response(
+                data.get("error", "Unknown error"),
+                data.get("error_type", "unknown"),
+            )
+
+        # Otherwise return the customized agent JSON
+        return data.get("agent_json")
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def get_blocks_external() -> list[dict[str, Any]] | None:
+    """Get available blocks from the external service.
+
+    Returns:
+        List of block info dicts or None on error
+    """
+    if _is_dummy_mode():
+        return await get_blocks_dummy()
+
+    client = _get_client()
+
+    try:
+        response = await client.get("/api/blocks")
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            logger.error("External service returned error getting blocks")
+            return None
+
+        return data.get("blocks", [])
+
+    except httpx.HTTPStatusError as e:
+        logger.error(f"HTTP error getting blocks from external service: {e}")
+        return None
+    except httpx.RequestError as e:
+        logger.error(f"Request error getting blocks from external service: {e}")
+        return None
+    except Exception as e:
+        logger.error(f"Unexpected error getting blocks from external service: {e}")
+        return None
+
+
+async def health_check() -> bool:
+    """Check if the external service is healthy.
+
+    Returns:
+        True if healthy, False otherwise
+    """
+    if not is_external_service_configured():
+        return False
+
+    if _is_dummy_mode():
+        return await health_check_dummy()
+
+    client = _get_client()
+
+    try:
+        response = await client.get("/health")
+        response.raise_for_status()
+        data = response.json()
+        return data.get("status") == "healthy" and data.get("blocks_loaded", False)
+    except Exception as e:
+        logger.warning(f"External agent generator health check failed: {e}")
+        return False
+
+
+async def close_client() -> None:
+    """Close the HTTP client."""
+    global _client
+    if _client is not None:
+        await _client.aclose()
+        _client = None
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_output.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_output.py
@@ -5,15 +5,15 @@ import re
 from datetime import datetime, timedelta, timezone
 from typing import Any

-from pydantic import BaseModel, Field, field_validator
+from pydantic import BaseModel, field_validator

+from backend.api.features.chat.model import ChatSession
+from backend.api.features.library import db as library_db
 from backend.api.features.library.model import LibraryAgent
-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import execution_db, library_db
+from backend.data import execution as execution_db
 from backend.data.execution import ExecutionStatus, GraphExecution, GraphExecutionMeta

 from .base import BaseTool
-from .execution_utils import TERMINAL_STATUSES, wait_for_execution
 from .models import (
    AgentOutputResponse,
    ErrorResponse,
@@ -34,7 +34,6 @@ class AgentOutputInput(BaseModel):
    store_slug: str = ""
    execution_id: str = ""
    run_time: str = "latest"
-    wait_if_running: int = Field(default=0, ge=0, le=300)

    @field_validator(
        "agent_name",
@@ -118,11 +117,6 @@ class AgentOutputTool(BaseTool):
        Select which run to retrieve using:
        - execution_id: Specific execution ID
        - run_time: 'latest' (default), 'yesterday', 'last week', or ISO date 'YYYY-MM-DD'
-
-        Wait for completion (optional):
-        - wait_if_running: Max seconds to wait if execution is still running (0-300).
-          If the execution is running/queued, waits up to this many seconds for completion.
-          Returns current status on timeout. If already finished, returns immediately.
        """

    @property
@@ -152,13 +146,6 @@ class AgentOutputTool(BaseTool):
                        "Time filter: 'latest', 'yesterday', 'last week', or 'YYYY-MM-DD'"
                    ),
                },
-                "wait_if_running": {
-                    "type": "integer",
-                    "description": (
-                        "Max seconds to wait if execution is still running (0-300). "
-                        "If running, waits for completion. Returns current state on timeout."
-                    ),
-                },
            },
            "required": [],
        }
@@ -178,12 +165,10 @@ class AgentOutputTool(BaseTool):
        Resolve agent from provided identifiers.
        Returns (library_agent, error_message).
        """
-        lib_db = library_db()
-
        # Priority 1: Exact library agent ID
        if library_agent_id:
            try:
-                agent = await lib_db.get_library_agent(library_agent_id, user_id)
+                agent = await library_db.get_library_agent(library_agent_id, user_id)
                return agent, None
            except Exception as e:
                logger.warning(f"Failed to get library agent by ID: {e}")
@@ -197,7 +182,7 @@ class AgentOutputTool(BaseTool):
                return None, f"Agent '{store_slug}' not found in marketplace"

            # Find in user's library by graph_id
-            agent = await lib_db.get_library_agent_by_graph_id(user_id, graph.id)
+            agent = await library_db.get_library_agent_by_graph_id(user_id, graph.id)
            if not agent:
                return (
                    None,
@@ -209,7 +194,7 @@ class AgentOutputTool(BaseTool):
        # Priority 3: Fuzzy name search in library
        if agent_name:
            try:
-                response = await lib_db.list_library_agents(
+                response = await library_db.list_library_agents(
                    user_id=user_id,
                    search_term=agent_name,
                    page_size=5,
@@ -238,20 +223,14 @@ class AgentOutputTool(BaseTool):
        execution_id: str | None,
        time_start: datetime | None,
        time_end: datetime | None,
-        include_running: bool = False,
    ) -> tuple[GraphExecution | None, list[GraphExecutionMeta], str | None]:
        """
        Fetch execution(s) based on filters.
        Returns (single_execution, available_executions_meta, error_message).
-
-        Args:
-            include_running: If True, also look for running/queued executions (for waiting)
        """
-        exec_db = execution_db()
-
        # If specific execution_id provided, fetch it directly
        if execution_id:
-            execution = await exec_db.get_graph_execution(
+            execution = await execution_db.get_graph_execution(
                user_id=user_id,
                execution_id=execution_id,
                include_node_executions=False,
@@ -260,25 +239,11 @@ class AgentOutputTool(BaseTool):
                return None, [], f"Execution '{execution_id}' not found"
            return execution, [], None

-        # Determine which statuses to query
-        statuses = [ExecutionStatus.COMPLETED]
-        if include_running:
-            statuses.extend(
-                [
-                    ExecutionStatus.RUNNING,
-                    ExecutionStatus.QUEUED,
-                    ExecutionStatus.INCOMPLETE,
-                    ExecutionStatus.REVIEW,
-                    ExecutionStatus.FAILED,
-                    ExecutionStatus.TERMINATED,
-                ]
-            )
-
-        # Get executions with time filters
-        executions = await exec_db.get_graph_executions(
+        # Get completed executions with time filters
+        executions = await execution_db.get_graph_executions(
            graph_id=graph_id,
            user_id=user_id,
-            statuses=statuses,
+            statuses=[ExecutionStatus.COMPLETED],
            created_time_gte=time_start,
            created_time_lte=time_end,
            limit=10,
@@ -289,7 +254,7 @@ class AgentOutputTool(BaseTool):

        # If only one execution, fetch full details
        if len(executions) == 1:
-            full_execution = await exec_db.get_graph_execution(
+            full_execution = await execution_db.get_graph_execution(
                user_id=user_id,
                execution_id=executions[0].id,
                include_node_executions=False,
@@ -297,7 +262,7 @@ class AgentOutputTool(BaseTool):
            return full_execution, [], None

        # Multiple executions - return latest with full details, plus list of available
-        full_execution = await exec_db.get_graph_execution(
+        full_execution = await execution_db.get_graph_execution(
            user_id=user_id,
            execution_id=executions[0].id,
            include_node_executions=False,
@@ -345,33 +310,10 @@ class AgentOutputTool(BaseTool):
                for e in available_executions[:5]
            ]

-        # Build appropriate message based on execution status
-        if execution.status == ExecutionStatus.COMPLETED:
-            message = f"Found execution outputs for agent '{agent.name}'"
-        elif execution.status == ExecutionStatus.FAILED:
-            message = f"Execution for agent '{agent.name}' failed"
-        elif execution.status == ExecutionStatus.TERMINATED:
-            message = f"Execution for agent '{agent.name}' was terminated"
-        elif execution.status == ExecutionStatus.REVIEW:
-            message = (
-                f"Execution for agent '{agent.name}' is awaiting human review. "
-                "The user needs to approve it before it can continue."
-            )
-        elif execution.status in (
-            ExecutionStatus.RUNNING,
-            ExecutionStatus.QUEUED,
-            ExecutionStatus.INCOMPLETE,
-        ):
-            message = (
-                f"Execution for agent '{agent.name}' is still {execution.status.value}. "
-                "Results may be incomplete. Use wait_if_running to wait for completion."
-            )
-        else:
-            message = f"Found execution for agent '{agent.name}' (status: {execution.status.value})"
-
+        message = f"Found execution outputs for agent '{agent.name}'"
        if len(available_executions) > 1:
            message += (
-                f" Showing latest of {len(available_executions)} matching executions."
+                f". Showing latest of {len(available_executions)} matching executions."
            )

        return AgentOutputResponse(
@@ -438,7 +380,7 @@ class AgentOutputTool(BaseTool):
            and not input_data.store_slug
        ):
            # Fetch execution directly to get graph_id
-            execution = await execution_db().get_graph_execution(
+            execution = await execution_db.get_graph_execution(
                user_id=user_id,
                execution_id=input_data.execution_id,
                include_node_executions=False,
@@ -450,7 +392,7 @@ class AgentOutputTool(BaseTool):
                )

            # Find library agent by graph_id
-            agent = await library_db().get_library_agent_by_graph_id(
+            agent = await library_db.get_library_agent_by_graph_id(
                user_id, execution.graph_id
            )
            if not agent:
@@ -486,17 +428,13 @@ class AgentOutputTool(BaseTool):
        # Parse time expression
        time_start, time_end = parse_time_expression(input_data.run_time)

-        # Check if we should wait for running executions
-        wait_timeout = input_data.wait_if_running
-
-        # Fetch execution(s) - include running if we're going to wait
+        # Fetch execution(s)
        execution, available_executions, exec_error = await self._get_execution(
            user_id=user_id,
            graph_id=agent.graph_id,
            execution_id=input_data.execution_id or None,
            time_start=time_start,
            time_end=time_end,
-            include_running=wait_timeout > 0,
        )

        if exec_error:
@@ -505,17 +443,4 @@ class AgentOutputTool(BaseTool):
                session_id=session_id,
            )

-        # If we have an execution that's still running and we should wait
-        if execution and wait_timeout > 0 and execution.status not in TERMINAL_STATUSES:
-            logger.info(
-                f"Execution {execution.id} is {execution.status}, "
-                f"waiting up to {wait_timeout}s for completion"
-            )
-            execution = await wait_for_execution(
-                user_id=user_id,
-                graph_id=agent.graph_id,
-                execution_id=execution.id,
-                timeout_seconds=wait_timeout,
-            )
-
        return self._build_response(agent, execution, available_executions, session_id)
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_search.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_search.py
@@ -1,15 +1,11 @@
 """Shared agent search functionality for find_agent and find_library_agent tools."""

-from __future__ import annotations
-
 import logging
 import re
-from typing import TYPE_CHECKING, Literal
+from typing import Literal

-if TYPE_CHECKING:
-    from backend.api.features.library.model import LibraryAgent
-
-from backend.data.db_accessors import library_db, store_db
+from backend.api.features.library import db as library_db
+from backend.api.features.store import db as store_db
 from backend.util.exceptions import DatabaseError, NotFoundError

 from .models import (
@@ -29,24 +25,92 @@ _UUID_PATTERN = re.compile(
    re.IGNORECASE,
 )

-# Keywords that should be treated as "list all" rather than a literal search
-_LIST_ALL_KEYWORDS = frozenset({"all", "*", "everything", "any", ""})
+
+def _is_uuid(text: str) -> bool:
+    """Check if text is a valid UUID v4."""
+    return bool(_UUID_PATTERN.match(text.strip()))
+
+
+async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | None:
+    """Fetch a library agent by ID (library agent ID or graph_id).
+
+    Tries multiple lookup strategies:
+    1. First by graph_id (AgentGraph primary key)
+    2. Then by library agent ID (LibraryAgent primary key)
+
+    Args:
+        user_id: The user ID
+        agent_id: The ID to look up (can be graph_id or library agent ID)
+
+    Returns:
+        AgentInfo if found, None otherwise
+    """
+    try:
+        agent = await library_db.get_library_agent_by_graph_id(user_id, agent_id)
+        if agent:
+            logger.debug(f"Found library agent by graph_id: {agent.name}")
+            return AgentInfo(
+                id=agent.id,
+                name=agent.name,
+                description=agent.description or "",
+                source="library",
+                in_library=True,
+                creator=agent.creator_name,
+                status=agent.status.value,
+                can_access_graph=agent.can_access_graph,
+                has_external_trigger=agent.has_external_trigger,
+                new_output=agent.new_output,
+                graph_id=agent.graph_id,
+            )
+    except DatabaseError:
+        raise
+    except Exception as e:
+        logger.warning(
+            f"Could not fetch library agent by graph_id {agent_id}: {e}",
+            exc_info=True,
+        )
+
+    try:
+        agent = await library_db.get_library_agent(agent_id, user_id)
+        if agent:
+            logger.debug(f"Found library agent by library_id: {agent.name}")
+            return AgentInfo(
+                id=agent.id,
+                name=agent.name,
+                description=agent.description or "",
+                source="library",
+                in_library=True,
+                creator=agent.creator_name,
+                status=agent.status.value,
+                can_access_graph=agent.can_access_graph,
+                has_external_trigger=agent.has_external_trigger,
+                new_output=agent.new_output,
+                graph_id=agent.graph_id,
+            )
+    except NotFoundError:
+        logger.debug(f"Library agent not found by library_id: {agent_id}")
+    except DatabaseError:
+        raise
+    except Exception as e:
+        logger.warning(
+            f"Could not fetch library agent by library_id {agent_id}: {e}",
+            exc_info=True,
+        )
+
+    return None


 async def search_agents(
    query: str,
    source: SearchSource,
-    session_id: str | None = None,
+    session_id: str | None,
    user_id: str | None = None,
 ) -> ToolResponseBase:
    """
    Search for agents in marketplace or user library.

-    For library searches, keywords like "all", "*", "everything", or an empty
-    query will list all agents without filtering.
-
    Args:
-        query: Search query string. Special keywords list all library agents.
+        query: Search query string
        source: "marketplace" or "library"
        session_id: Chat session ID
        user_id: User ID (required for library search)
@@ -54,11 +118,7 @@ async def search_agents(
    Returns:
        AgentsFoundResponse, NoResultsResponse, or ErrorResponse
    """
-    # Normalize list-all keywords to empty string for library searches
-    if source == "library" and query.lower().strip() in _LIST_ALL_KEYWORDS:
-        query = ""
-
-    if source == "marketplace" and not query:
+    if not query:
        return ErrorResponse(
            message="Please provide a search query", session_id=session_id
        )
@@ -73,7 +133,7 @@ async def search_agents(
    try:
        if source == "marketplace":
            logger.info(f"Searching marketplace for: {query}")
-            results = await store_db().get_store_agents(search_query=query, page_size=5)
+            results = await store_db.get_store_agents(search_query=query, page_size=5)
            for agent in results.agents:
                agents.append(
                    AgentInfo(
@@ -98,18 +158,28 @@ async def search_agents(
                    logger.info(f"Found agent by direct ID lookup: {agent.name}")

            if not agents:
-                search_term = query or None
-                logger.info(
-                    f"{'Listing all agents in' if not query else 'Searching'} "
-                    f"user library{'' if not query else f' for: {query}'}"
-                )
-                results = await library_db().list_library_agents(
+                logger.info(f"Searching user library for: {query}")
+                results = await library_db.list_library_agents(
                    user_id=user_id,  # type: ignore[arg-type]
-                    search_term=search_term,
-                    page_size=50 if not query else 10,
+                    search_term=query,
+                    page_size=10,
                )
                for agent in results.agents:
-                    agents.append(_library_agent_to_info(agent))
+                    agents.append(
+                        AgentInfo(
+                            id=agent.id,
+                            name=agent.name,
+                            description=agent.description or "",
+                            source="library",
+                            in_library=True,
+                            creator=agent.creator_name,
+                            status=agent.status.value,
+                            can_access_graph=agent.can_access_graph,
+                            has_external_trigger=agent.has_external_trigger,
+                            new_output=agent.new_output,
+                            graph_id=agent.graph_id,
+                        )
+                    )
        logger.info(f"Found {len(agents)} agents in {source}")
    except NotFoundError:
        pass
@@ -122,62 +192,42 @@ async def search_agents(
        )

    if not agents:
-        if source == "marketplace":
-            suggestions = [
+        suggestions = (
+            [
                "Try more general terms",
                "Browse categories in the marketplace",
                "Check spelling",
            ]
-            no_results_msg = (
-                f"No agents found matching '{query}'. Let the user know they can "
-                "try different keywords or browse the marketplace. Also let them "
-                "know you can create a custom agent for them based on their needs."
-            )
-        elif not query:
-            # User asked to list all but library is empty
-            suggestions = [
-                "Browse the marketplace to find and add agents",
-                "Use find_agent to search the marketplace",
-            ]
-            no_results_msg = (
-                "Your library is empty. Let the user know they can browse the "
-                "marketplace to find agents, or you can create a custom agent "
-                "for them based on their needs."
-            )
-        else:
-            suggestions = [
+            if source == "marketplace"
+            else [
                "Try different keywords",
                "Use find_agent to search the marketplace",
                "Check your library at /library",
            ]
-            no_results_msg = (
-                f"No agents matching '{query}' found in your library. Let the "
-                "user know you can create a custom agent for them based on "
-                "their needs."
-            )
+        )
+        no_results_msg = (
+            f"No agents found matching '{query}'. Let the user know they can try different keywords or browse the marketplace. Also let them know you can create a custom agent for them based on their needs."
+            if source == "marketplace"
+            else f"No agents matching '{query}' found in your library. Let the user know you can create a custom agent for them based on their needs."
+        )
        return NoResultsResponse(
            message=no_results_msg, session_id=session_id, suggestions=suggestions
        )

-    if source == "marketplace":
-        title = (
-            f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} for '{query}'"
-        )
-    elif not query:
-        title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} in your library"
-    else:
-        title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} in your library for '{query}'"
+    title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} "
+    title += (
+        f"for '{query}'"
+        if source == "marketplace"
+        else f"in your library for '{query}'"
+    )

    message = (
        "Now you have found some options for the user to choose from. "
        "You can add a link to a recommended agent at: /marketplace/agent/agent_id "
-        "Please ask the user if they would like to use any of these agents. "
-        "Let the user know we can create a custom agent for them based on their needs."
+        "Please ask the user if they would like to use any of these agents. Let the user know we can create a custom agent for them based on their needs."
        if source == "marketplace"
-        else "Found agents in the user's library. You can provide a link to view "
-        "an agent at: /library/agents/{agent_id}. Use agent_output to get "
-        "execution results, or run_agent to execute. Let the user know we can "
-        "create a custom agent for them based on their needs."
+        else "Found agents in the user's library. You can provide a link to view an agent at: "
+        "/library/agents/{agent_id}. Use agent_output to get execution results, or run_agent to execute. Let the user know we can create a custom agent for them based on their needs."
    )

    return AgentsFoundResponse(
@@ -187,70 +237,3 @@ async def search_agents(
        count=len(agents),
        session_id=session_id,
    )
-
-
-def _is_uuid(text: str) -> bool:
-    """Check if text is a valid UUID v4."""
-    return bool(_UUID_PATTERN.match(text.strip()))
-
-
-def _library_agent_to_info(agent: LibraryAgent) -> AgentInfo:
-    """Convert a library agent model to an AgentInfo."""
-    return AgentInfo(
-        id=agent.id,
-        name=agent.name,
-        description=agent.description or "",
-        source="library",
-        in_library=True,
-        creator=agent.creator_name,
-        status=agent.status.value,
-        can_access_graph=agent.can_access_graph,
-        has_external_trigger=agent.has_external_trigger,
-        new_output=agent.new_output,
-        graph_id=agent.graph_id,
-        graph_version=agent.graph_version,
-        input_schema=agent.input_schema,
-        output_schema=agent.output_schema,
-    )
-
-
-async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | None:
-    """Fetch a library agent by ID (library agent ID or graph_id).
-
-    Tries multiple lookup strategies:
-    1. First by graph_id (AgentGraph primary key)
-    2. Then by library agent ID (LibraryAgent primary key)
-    """
-    lib_db = library_db()
-
-    try:
-        agent = await lib_db.get_library_agent_by_graph_id(user_id, agent_id)
-        if agent:
-            logger.debug(f"Found library agent by graph_id: {agent.name}")
-            return _library_agent_to_info(agent)
-    except NotFoundError:
-        logger.debug(f"Library agent not found by graph_id: {agent_id}")
-    except DatabaseError:
-        raise
-    except Exception as e:
-        logger.warning(
-            f"Could not fetch library agent by graph_id {agent_id}: {e}",
-            exc_info=True,
-        )
-
-    try:
-        agent = await lib_db.get_library_agent(agent_id, user_id)
-        if agent:
-            logger.debug(f"Found library agent by library_id: {agent.name}")
-            return _library_agent_to_info(agent)
-    except NotFoundError:
-        logger.debug(f"Library agent not found by library_id: {agent_id}")
-    except DatabaseError:
-        raise
-    except Exception as e:
-        logger.warning(
-            f"Could not fetch library agent by library_id {agent_id}: {e}",
-            exc_info=True,
-        )
-
-    return None
--- a/autogpt_platform/backend/backend/api/features/chat/tools/base.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/base.py
@@ -0,0 +1,129 @@
+"""Base classes and shared utilities for chat tools."""
+
+import logging
+from typing import Any
+
+from openai.types.chat import ChatCompletionToolParam
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.response_model import StreamToolOutputAvailable
+
+from .models import ErrorResponse, NeedLoginResponse, ToolResponseBase
+
+logger = logging.getLogger(__name__)
+
+
+class BaseTool:
+    """Base class for all chat tools."""
+
+    @property
+    def name(self) -> str:
+        """Tool name for OpenAI function calling."""
+        raise NotImplementedError
+
+    @property
+    def description(self) -> str:
+        """Tool description for OpenAI."""
+        raise NotImplementedError
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        """Tool parameters schema for OpenAI."""
+        raise NotImplementedError
+
+    @property
+    def requires_auth(self) -> bool:
+        """Whether this tool requires authentication."""
+        return False
+
+    @property
+    def is_long_running(self) -> bool:
+        """Whether this tool is long-running and should execute in background.
+
+        Long-running tools (like agent generation) are executed via background
+        tasks to survive SSE disconnections. The result is persisted to chat
+        history and visible when the user refreshes.
+        """
+        return False
+
+    def as_openai_tool(self) -> ChatCompletionToolParam:
+        """Convert to OpenAI tool format."""
+        return ChatCompletionToolParam(
+            type="function",
+            function={
+                "name": self.name,
+                "description": self.description,
+                "parameters": self.parameters,
+            },
+        )
+
+    async def execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        tool_call_id: str,
+        **kwargs,
+    ) -> StreamToolOutputAvailable:
+        """Execute the tool with authentication check.
+
+        Args:
+            user_id: User ID (may be anonymous like "anon_123")
+            session_id: Chat session ID
+            **kwargs: Tool-specific parameters
+
+        Returns:
+            Pydantic response object
+
+        """
+        if self.requires_auth and not user_id:
+            logger.error(
+                f"Attempted tool call for {self.name} but user not authenticated"
+            )
+            return StreamToolOutputAvailable(
+                toolCallId=tool_call_id,
+                toolName=self.name,
+                output=NeedLoginResponse(
+                    message=f"Please sign in to use {self.name}",
+                    session_id=session.session_id,
+                ).model_dump_json(),
+                success=False,
+            )
+
+        try:
+            result = await self._execute(user_id, session, **kwargs)
+            return StreamToolOutputAvailable(
+                toolCallId=tool_call_id,
+                toolName=self.name,
+                output=result.model_dump_json(),
+            )
+        except Exception as e:
+            logger.error(f"Error in {self.name}: {e}", exc_info=True)
+            return StreamToolOutputAvailable(
+                toolCallId=tool_call_id,
+                toolName=self.name,
+                output=ErrorResponse(
+                    message=f"An error occurred while executing {self.name}",
+                    error=str(e),
+                    session_id=session.session_id,
+                ).model_dump_json(),
+                success=False,
+            )
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        """Internal execution logic to be implemented by subclasses.
+
+        Args:
+            user_id: User ID (authenticated or anonymous)
+            session_id: Chat session ID
+            **kwargs: Tool-specific parameters
+
+        Returns:
+            Pydantic response object
+
+        """
+        raise NotImplementedError
--- a/autogpt_platform/backend/backend/api/features/chat/tools/bash_exec.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/bash_exec.py
@@ -0,0 +1,131 @@
+"""Bash execution tool — run shell commands in a bubblewrap sandbox.
+
+Full Bash scripting is allowed (loops, conditionals, pipes, functions, etc.).
+Safety comes from OS-level isolation (bubblewrap): only system dirs visible
+read-only, writable workspace only, clean env, no network.
+
+Requires bubblewrap (``bwrap``) — the tool is disabled when bwrap is not
+available (e.g. macOS development).
+"""
+
+import logging
+from typing import Any
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
+    BashExecResponse,
+    ErrorResponse,
+    ToolResponseBase,
+)
+from backend.api.features.chat.tools.sandbox import (
+    get_workspace_dir,
+    has_full_sandbox,
+    run_sandboxed,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class BashExecTool(BaseTool):
+    """Execute Bash commands in a bubblewrap sandbox."""
+
+    @property
+    def name(self) -> str:
+        return "bash_exec"
+
+    @property
+    def description(self) -> str:
+        if not has_full_sandbox():
+            return (
+                "Bash execution is DISABLED — bubblewrap sandbox is not "
+                "available on this platform. Do not call this tool."
+            )
+        return (
+            "Execute a Bash command or script in a bubblewrap sandbox. "
+            "Full Bash scripting is supported (loops, conditionals, pipes, "
+            "functions, etc.). "
+            "The sandbox shares the same working directory as the SDK Read/Write "
+            "tools — files created by either are accessible to both. "
+            "SECURITY: Only system directories (/usr, /bin, /lib, /etc) are "
+            "visible read-only, the per-session workspace is the only writable "
+            "path, environment variables are wiped (no secrets), all network "
+            "access is blocked at the kernel level, and resource limits are "
+            "enforced (max 64 processes, 512MB memory, 50MB file size). "
+            "Application code, configs, and other directories are NOT accessible. "
+            "To fetch web content, use the web_fetch tool instead. "
+            "Execution is killed after the timeout (default 30s, max 120s). "
+            "Returns stdout and stderr. "
+            "Useful for file manipulation, data processing with Unix tools "
+            "(grep, awk, sed, jq, etc.), and running shell scripts."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "command": {
+                    "type": "string",
+                    "description": "Bash command or script to execute.",
+                },
+                "timeout": {
+                    "type": "integer",
+                    "description": (
+                        "Max execution time in seconds (default 30, max 120)."
+                    ),
+                    "default": 30,
+                },
+            },
+            "required": ["command"],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return False
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs: Any,
+    ) -> ToolResponseBase:
+        session_id = session.session_id if session else None
+
+        if not has_full_sandbox():
+            return ErrorResponse(
+                message="bash_exec requires bubblewrap sandbox (Linux only).",
+                error="sandbox_unavailable",
+                session_id=session_id,
+            )
+
+        command: str = (kwargs.get("command") or "").strip()
+        timeout: int = kwargs.get("timeout", 30)
+
+        if not command:
+            return ErrorResponse(
+                message="No command provided.",
+                error="empty_command",
+                session_id=session_id,
+            )
+
+        workspace = get_workspace_dir(session_id or "default")
+
+        stdout, stderr, exit_code, timed_out = await run_sandboxed(
+            command=["bash", "-c", command],
+            cwd=workspace,
+            timeout=timeout,
+        )
+
+        return BashExecResponse(
+            message=(
+                "Execution timed out"
+                if timed_out
+                else f"Command executed (exit {exit_code})"
+            ),
+            stdout=stdout,
+            stderr=stderr,
+            exit_code=exit_code,
+            timed_out=timed_out,
+            session_id=session_id,
+        )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/check_operation_status.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/check_operation_status.py
@@ -0,0 +1,127 @@
+"""CheckOperationStatusTool — query the status of a long-running operation."""
+
+import logging
+from typing import Any
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
+    ErrorResponse,
+    ResponseType,
+    ToolResponseBase,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class OperationStatusResponse(ToolResponseBase):
+    """Response for check_operation_status tool."""
+
+    type: ResponseType = ResponseType.OPERATION_STATUS
+    task_id: str
+    operation_id: str
+    status: str  # "running", "completed", "failed"
+    tool_name: str | None = None
+    message: str = ""
+
+
+class CheckOperationStatusTool(BaseTool):
+    """Check the status of a long-running operation (create_agent, edit_agent, etc.).
+
+    The CoPilot uses this tool to report back to the user whether an
+    operation that was started earlier has completed, failed, or is still
+    running.
+    """
+
+    @property
+    def name(self) -> str:
+        return "check_operation_status"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Check the current status of a long-running operation such as "
+            "create_agent or edit_agent. Accepts either an operation_id or "
+            "task_id from a previous operation_started response. "
+            "Returns the current status: running, completed, or failed."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "operation_id": {
+                    "type": "string",
+                    "description": (
+                        "The operation_id from an operation_started response."
+                    ),
+                },
+                "task_id": {
+                    "type": "string",
+                    "description": (
+                        "The task_id from an operation_started response. "
+                        "Used as fallback if operation_id is not provided."
+                    ),
+                },
+            },
+            "required": [],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return False
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        from backend.api.features.chat import stream_registry
+
+        operation_id = (kwargs.get("operation_id") or "").strip()
+        task_id = (kwargs.get("task_id") or "").strip()
+
+        if not operation_id and not task_id:
+            return ErrorResponse(
+                message="Please provide an operation_id or task_id.",
+                error="missing_parameter",
+            )
+
+        task = None
+        if operation_id:
+            task = await stream_registry.find_task_by_operation_id(operation_id)
+        if task is None and task_id:
+            task = await stream_registry.get_task(task_id)
+
+        if task is None:
+            # Task not in Redis — it may have already expired (TTL).
+            # Check conversation history for the result instead.
+            return ErrorResponse(
+                message=(
+                    "Operation not found — it may have already completed and "
+                    "expired from the status tracker. Check the conversation "
+                    "history for the result."
+                ),
+                error="not_found",
+            )
+
+        status_messages = {
+            "running": (
+                f"The {task.tool_name or 'operation'} is still running. "
+                "Please wait for it to complete."
+            ),
+            "completed": (
+                f"The {task.tool_name or 'operation'} has completed successfully."
+            ),
+            "failed": f"The {task.tool_name or 'operation'} has failed.",
+        }
+
+        return OperationStatusResponse(
+            task_id=task.task_id,
+            operation_id=task.operation_id,
+            status=task.status,
+            tool_name=task.tool_name,
+            message=status_messages.get(task.status, f"Status: {task.status}"),
+        )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/create_agent.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/create_agent.py
@@ -0,0 +1,335 @@
+"""CreateAgentTool - Creates agents from natural language descriptions."""
+
+import logging
+from typing import Any
+
+from backend.api.features.chat.model import ChatSession
+
+from .agent_generator import (
+    AgentGeneratorNotConfiguredError,
+    decompose_goal,
+    enrich_library_agents_from_steps,
+    generate_agent,
+    get_all_relevant_agents_for_generation,
+    get_user_message_for_error,
+    save_agent_to_library,
+)
+from .base import BaseTool
+from .models import (
+    AgentPreviewResponse,
+    AgentSavedResponse,
+    AsyncProcessingResponse,
+    ClarificationNeededResponse,
+    ClarifyingQuestion,
+    ErrorResponse,
+    ToolResponseBase,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class CreateAgentTool(BaseTool):
+    """Tool for creating agents from natural language descriptions."""
+
+    @property
+    def name(self) -> str:
+        return "create_agent"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Create a new agent workflow from a natural language description. "
+            "First generates a preview, then saves to library if save=true."
+        )
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    @property
+    def is_long_running(self) -> bool:
+        return True
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "description": {
+                    "type": "string",
+                    "description": (
+                        "Natural language description of what the agent should do. "
+                        "Be specific about inputs, outputs, and the workflow steps."
+                    ),
+                },
+                "context": {
+                    "type": "string",
+                    "description": (
+                        "Additional context or answers to previous clarifying questions. "
+                        "Include any preferences or constraints mentioned by the user."
+                    ),
+                },
+                "save": {
+                    "type": "boolean",
+                    "description": (
+                        "Whether to save the agent to the user's library. "
+                        "Default is true. Set to false for preview only."
+                    ),
+                    "default": True,
+                },
+            },
+            "required": ["description"],
+        }
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        """Execute the create_agent tool.
+
+        Flow:
+        1. Decompose the description into steps (may return clarifying questions)
+        2. Generate agent JSON (external service handles fixing and validation)
+        3. Preview or save based on the save parameter
+        """
+        description = kwargs.get("description", "").strip()
+        context = kwargs.get("context", "")
+        save = kwargs.get("save", True)
+        session_id = session.session_id if session else None
+
+        # Extract async processing params (passed by long-running tool handler)
+        operation_id = kwargs.get("_operation_id")
+        task_id = kwargs.get("_task_id")
+
+        if not description:
+            return ErrorResponse(
+                message="Please provide a description of what the agent should do.",
+                error="Missing description parameter",
+                session_id=session_id,
+            )
+
+        library_agents = None
+        if user_id:
+            try:
+                library_agents = await get_all_relevant_agents_for_generation(
+                    user_id=user_id,
+                    search_query=description,
+                    include_marketplace=True,
+                )
+                logger.debug(
+                    f"Found {len(library_agents)} relevant agents for sub-agent composition"
+                )
+            except Exception as e:
+                logger.warning(f"Failed to fetch library agents: {e}")
+
+        try:
+            decomposition_result = await decompose_goal(
+                description, context, library_agents
+            )
+        except AgentGeneratorNotConfiguredError:
+            return ErrorResponse(
+                message=(
+                    "Agent generation is not available. "
+                    "The Agent Generator service is not configured."
+                ),
+                error="service_not_configured",
+                session_id=session_id,
+            )
+
+        if decomposition_result is None:
+            return ErrorResponse(
+                message="Failed to analyze the goal. The agent generation service may be unavailable. Please try again.",
+                error="decomposition_failed",
+                details={"description": description[:100]},
+                session_id=session_id,
+            )
+
+        if decomposition_result.get("type") == "error":
+            error_msg = decomposition_result.get("error", "Unknown error")
+            error_type = decomposition_result.get("error_type", "unknown")
+            user_message = get_user_message_for_error(
+                error_type,
+                operation="analyze the goal",
+                llm_parse_message="The AI had trouble understanding this request. Please try rephrasing your goal.",
+            )
+            return ErrorResponse(
+                message=user_message,
+                error=f"decomposition_failed:{error_type}",
+                details={
+                    "description": description[:100],
+                    "service_error": error_msg,
+                    "error_type": error_type,
+                },
+                session_id=session_id,
+            )
+
+        if decomposition_result.get("type") == "clarifying_questions":
+            questions = decomposition_result.get("questions", [])
+            return ClarificationNeededResponse(
+                message=(
+                    "I need some more information to create this agent. "
+                    "Please answer the following questions:"
+                ),
+                questions=[
+                    ClarifyingQuestion(
+                        question=q.get("question", ""),
+                        keyword=q.get("keyword", ""),
+                        example=q.get("example"),
+                    )
+                    for q in questions
+                ],
+                session_id=session_id,
+            )
+
+        if decomposition_result.get("type") == "unachievable_goal":
+            suggested = decomposition_result.get("suggested_goal", "")
+            reason = decomposition_result.get("reason", "")
+            return ErrorResponse(
+                message=(
+                    f"This goal cannot be accomplished with the available blocks. "
+                    f"{reason} "
+                    f"Suggestion: {suggested}"
+                ),
+                error="unachievable_goal",
+                details={"suggested_goal": suggested, "reason": reason},
+                session_id=session_id,
+            )
+
+        if decomposition_result.get("type") == "vague_goal":
+            suggested = decomposition_result.get("suggested_goal", "")
+            return ErrorResponse(
+                message=(
+                    f"The goal is too vague to create a specific workflow. "
+                    f"Suggestion: {suggested}"
+                ),
+                error="vague_goal",
+                details={"suggested_goal": suggested},
+                session_id=session_id,
+            )
+
+        if user_id and library_agents is not None:
+            try:
+                library_agents = await enrich_library_agents_from_steps(
+                    user_id=user_id,
+                    decomposition_result=decomposition_result,
+                    existing_agents=library_agents,
+                    include_marketplace=True,
+                )
+                logger.debug(
+                    f"After enrichment: {len(library_agents)} total agents for sub-agent composition"
+                )
+            except Exception as e:
+                logger.warning(f"Failed to enrich library agents from steps: {e}")
+
+        try:
+            agent_json = await generate_agent(
+                decomposition_result,
+                library_agents,
+                operation_id=operation_id,
+                task_id=task_id,
+            )
+        except AgentGeneratorNotConfiguredError:
+            return ErrorResponse(
+                message=(
+                    "Agent generation is not available. "
+                    "The Agent Generator service is not configured."
+                ),
+                error="service_not_configured",
+                session_id=session_id,
+            )
+
+        if agent_json is None:
+            return ErrorResponse(
+                message="Failed to generate the agent. The agent generation service may be unavailable. Please try again.",
+                error="generation_failed",
+                details={"description": description[:100]},
+                session_id=session_id,
+            )
+
+        if isinstance(agent_json, dict) and agent_json.get("type") == "error":
+            error_msg = agent_json.get("error", "Unknown error")
+            error_type = agent_json.get("error_type", "unknown")
+            user_message = get_user_message_for_error(
+                error_type,
+                operation="generate the agent",
+                llm_parse_message="The AI had trouble generating the agent. Please try again or simplify your goal.",
+                validation_message=(
+                    "I wasn't able to create a valid agent for this request. "
+                    "The generated workflow had some structural issues. "
+                    "Please try simplifying your goal or breaking it into smaller steps."
+                ),
+                error_details=error_msg,
+            )
+            return ErrorResponse(
+                message=user_message,
+                error=f"generation_failed:{error_type}",
+                details={
+                    "description": description[:100],
+                    "service_error": error_msg,
+                    "error_type": error_type,
+                },
+                session_id=session_id,
+            )
+
+        # Check if Agent Generator accepted for async processing
+        if agent_json.get("status") == "accepted":
+            logger.info(
+                f"Agent generation delegated to async processing "
+                f"(operation_id={operation_id}, task_id={task_id})"
+            )
+            return AsyncProcessingResponse(
+                message="Agent generation started. You'll be notified when it's complete.",
+                operation_id=operation_id,
+                task_id=task_id,
+                session_id=session_id,
+            )
+
+        agent_name = agent_json.get("name", "Generated Agent")
+        agent_description = agent_json.get("description", "")
+        node_count = len(agent_json.get("nodes", []))
+        link_count = len(agent_json.get("links", []))
+
+        if not save:
+            return AgentPreviewResponse(
+                message=(
+                    f"I've generated an agent called '{agent_name}' with {node_count} blocks. "
+                    f"Review it and call create_agent with save=true to save it to your library."
+                ),
+                agent_json=agent_json,
+                agent_name=agent_name,
+                description=agent_description,
+                node_count=node_count,
+                link_count=link_count,
+                session_id=session_id,
+            )
+
+        if not user_id:
+            return ErrorResponse(
+                message="You must be logged in to save agents.",
+                error="auth_required",
+                session_id=session_id,
+            )
+
+        try:
+            created_graph, library_agent = await save_agent_to_library(
+                agent_json, user_id
+            )
+
+            return AgentSavedResponse(
+                message=f"Agent '{created_graph.name}' has been saved to your library!",
+                agent_id=created_graph.id,
+                agent_name=created_graph.name,
+                library_agent_id=library_agent.id,
+                library_agent_link=f"/library/agents/{library_agent.id}",
+                agent_page_link=f"/build?flowID={created_graph.id}",
+                session_id=session_id,
+            )
+        except Exception as e:
+            return ErrorResponse(
+                message=f"Failed to save the agent: {str(e)}",
+                error="save_failed",
+                details={"exception": str(e)},
+                session_id=session_id,
+            )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/customize_agent.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/customize_agent.py
@@ -0,0 +1,337 @@
+"""CustomizeAgentTool - Customizes marketplace/template agents using natural language."""
+
+import logging
+from typing import Any
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.store import db as store_db
+from backend.api.features.store.exceptions import AgentNotFoundError
+
+from .agent_generator import (
+    AgentGeneratorNotConfiguredError,
+    customize_template,
+    get_user_message_for_error,
+    graph_to_json,
+    save_agent_to_library,
+)
+from .base import BaseTool
+from .models import (
+    AgentPreviewResponse,
+    AgentSavedResponse,
+    ClarificationNeededResponse,
+    ClarifyingQuestion,
+    ErrorResponse,
+    ToolResponseBase,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class CustomizeAgentTool(BaseTool):
+    """Tool for customizing marketplace/template agents using natural language."""
+
+    @property
+    def name(self) -> str:
+        return "customize_agent"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Customize a marketplace or template agent using natural language. "
+            "Takes an existing agent from the marketplace and modifies it based on "
+            "the user's requirements before adding to their library."
+        )
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    @property
+    def is_long_running(self) -> bool:
+        return True
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "agent_id": {
+                    "type": "string",
+                    "description": (
+                        "The marketplace agent ID in format 'creator/slug' "
+                        "(e.g., 'autogpt/newsletter-writer'). "
+                        "Get this from find_agent results."
+                    ),
+                },
+                "modifications": {
+                    "type": "string",
+                    "description": (
+                        "Natural language description of how to customize the agent. "
+                        "Be specific about what changes you want to make."
+                    ),
+                },
+                "context": {
+                    "type": "string",
+                    "description": (
+                        "Additional context or answers to previous clarifying questions."
+                    ),
+                },
+                "save": {
+                    "type": "boolean",
+                    "description": (
+                        "Whether to save the customized agent to the user's library. "
+                        "Default is true. Set to false for preview only."
+                    ),
+                    "default": True,
+                },
+            },
+            "required": ["agent_id", "modifications"],
+        }
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        """Execute the customize_agent tool.
+
+        Flow:
+        1. Parse the agent ID to get creator/slug
+        2. Fetch the template agent from the marketplace
+        3. Call customize_template with the modification request
+        4. Preview or save based on the save parameter
+        """
+        agent_id = kwargs.get("agent_id", "").strip()
+        modifications = kwargs.get("modifications", "").strip()
+        context = kwargs.get("context", "")
+        save = kwargs.get("save", True)
+        session_id = session.session_id if session else None
+
+        if not agent_id:
+            return ErrorResponse(
+                message="Please provide the marketplace agent ID (e.g., 'creator/agent-name').",
+                error="missing_agent_id",
+                session_id=session_id,
+            )
+
+        if not modifications:
+            return ErrorResponse(
+                message="Please describe how you want to customize this agent.",
+                error="missing_modifications",
+                session_id=session_id,
+            )
+
+        # Parse agent_id in format "creator/slug"
+        parts = [p.strip() for p in agent_id.split("/")]
+        if len(parts) != 2 or not parts[0] or not parts[1]:
+            return ErrorResponse(
+                message=(
+                    f"Invalid agent ID format: '{agent_id}'. "
+                    "Expected format is 'creator/agent-name' "
+                    "(e.g., 'autogpt/newsletter-writer')."
+                ),
+                error="invalid_agent_id_format",
+                session_id=session_id,
+            )
+
+        creator_username, agent_slug = parts
+
+        # Fetch the marketplace agent details
+        try:
+            agent_details = await store_db.get_store_agent_details(
+                username=creator_username, agent_name=agent_slug
+            )
+        except AgentNotFoundError:
+            return ErrorResponse(
+                message=(
+                    f"Could not find marketplace agent '{agent_id}'. "
+                    "Please check the agent ID and try again."
+                ),
+                error="agent_not_found",
+                session_id=session_id,
+            )
+        except Exception as e:
+            logger.error(f"Error fetching marketplace agent {agent_id}: {e}")
+            return ErrorResponse(
+                message="Failed to fetch the marketplace agent. Please try again.",
+                error="fetch_error",
+                session_id=session_id,
+            )
+
+        if not agent_details.store_listing_version_id:
+            return ErrorResponse(
+                message=(
+                    f"The agent '{agent_id}' does not have an available version. "
+                    "Please try a different agent."
+                ),
+                error="no_version_available",
+                session_id=session_id,
+            )
+
+        # Get the full agent graph
+        try:
+            graph = await store_db.get_agent(agent_details.store_listing_version_id)
+            template_agent = graph_to_json(graph)
+        except Exception as e:
+            logger.error(f"Error fetching agent graph for {agent_id}: {e}")
+            return ErrorResponse(
+                message="Failed to fetch the agent configuration. Please try again.",
+                error="graph_fetch_error",
+                session_id=session_id,
+            )
+
+        # Call customize_template
+        try:
+            result = await customize_template(
+                template_agent=template_agent,
+                modification_request=modifications,
+                context=context,
+            )
+        except AgentGeneratorNotConfiguredError:
+            return ErrorResponse(
+                message=(
+                    "Agent customization is not available. "
+                    "The Agent Generator service is not configured."
+                ),
+                error="service_not_configured",
+                session_id=session_id,
+            )
+        except Exception as e:
+            logger.error(f"Error calling customize_template for {agent_id}: {e}")
+            return ErrorResponse(
+                message=(
+                    "Failed to customize the agent due to a service error. "
+                    "Please try again."
+                ),
+                error="customization_service_error",
+                session_id=session_id,
+            )
+
+        if result is None:
+            return ErrorResponse(
+                message=(
+                    "Failed to customize the agent. "
+                    "The agent generation service may be unavailable or timed out. "
+                    "Please try again."
+                ),
+                error="customization_failed",
+                session_id=session_id,
+            )
+
+        # Handle error response
+        if isinstance(result, dict) and result.get("type") == "error":
+            error_msg = result.get("error", "Unknown error")
+            error_type = result.get("error_type", "unknown")
+            user_message = get_user_message_for_error(
+                error_type,
+                operation="customize the agent",
+                llm_parse_message=(
+                    "The AI had trouble customizing the agent. "
+                    "Please try again or simplify your request."
+                ),
+                validation_message=(
+                    "The customized agent failed validation. "
+                    "Please try rephrasing your request."
+                ),
+                error_details=error_msg,
+            )
+            return ErrorResponse(
+                message=user_message,
+                error=f"customization_failed:{error_type}",
+                session_id=session_id,
+            )
+
+        # Handle clarifying questions
+        if isinstance(result, dict) and result.get("type") == "clarifying_questions":
+            questions = result.get("questions") or []
+            if not isinstance(questions, list):
+                logger.error(
+                    f"Unexpected clarifying questions format: {type(questions)}"
+                )
+                questions = []
+            return ClarificationNeededResponse(
+                message=(
+                    "I need some more information to customize this agent. "
+                    "Please answer the following questions:"
+                ),
+                questions=[
+                    ClarifyingQuestion(
+                        question=q.get("question", ""),
+                        keyword=q.get("keyword", ""),
+                        example=q.get("example"),
+                    )
+                    for q in questions
+                    if isinstance(q, dict)
+                ],
+                session_id=session_id,
+            )
+
+        # Result should be the customized agent JSON
+        if not isinstance(result, dict):
+            logger.error(f"Unexpected customize_template response type: {type(result)}")
+            return ErrorResponse(
+                message="Failed to customize the agent due to an unexpected response.",
+                error="unexpected_response_type",
+                session_id=session_id,
+            )
+
+        customized_agent = result
+
+        agent_name = customized_agent.get(
+            "name", f"Customized {agent_details.agent_name}"
+        )
+        agent_description = customized_agent.get("description", "")
+        nodes = customized_agent.get("nodes")
+        links = customized_agent.get("links")
+        node_count = len(nodes) if isinstance(nodes, list) else 0
+        link_count = len(links) if isinstance(links, list) else 0
+
+        if not save:
+            return AgentPreviewResponse(
+                message=(
+                    f"I've customized the agent '{agent_details.agent_name}'. "
+                    f"The customized agent has {node_count} blocks. "
+                    f"Review it and call customize_agent with save=true to save it."
+                ),
+                agent_json=customized_agent,
+                agent_name=agent_name,
+                description=agent_description,
+                node_count=node_count,
+                link_count=link_count,
+                session_id=session_id,
+            )
+
+        if not user_id:
+            return ErrorResponse(
+                message="You must be logged in to save agents.",
+                error="auth_required",
+                session_id=session_id,
+            )
+
+        # Save to user's library
+        try:
+            created_graph, library_agent = await save_agent_to_library(
+                customized_agent, user_id, is_update=False
+            )
+
+            return AgentSavedResponse(
+                message=(
+                    f"Customized agent '{created_graph.name}' "
+                    f"(based on '{agent_details.agent_name}') "
+                    f"has been saved to your library!"
+                ),
+                agent_id=created_graph.id,
+                agent_name=created_graph.name,
+                library_agent_id=library_agent.id,
+                library_agent_link=f"/library/agents/{library_agent.id}",
+                agent_page_link=f"/build?flowID={created_graph.id}",
+                session_id=session_id,
+            )
+        except Exception as e:
+            logger.error(f"Error saving customized agent: {e}")
+            return ErrorResponse(
+                message="Failed to save the customized agent. Please try again.",
+                error="save_failed",
+                session_id=session_id,
+            )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/edit_agent.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/edit_agent.py
@@ -0,0 +1,284 @@
+"""EditAgentTool - Edits existing agents using natural language."""
+
+import logging
+from typing import Any
+
+from backend.api.features.chat.model import ChatSession
+
+from .agent_generator import (
+    AgentGeneratorNotConfiguredError,
+    generate_agent_patch,
+    get_agent_as_json,
+    get_all_relevant_agents_for_generation,
+    get_user_message_for_error,
+    save_agent_to_library,
+)
+from .base import BaseTool
+from .models import (
+    AgentPreviewResponse,
+    AgentSavedResponse,
+    AsyncProcessingResponse,
+    ClarificationNeededResponse,
+    ClarifyingQuestion,
+    ErrorResponse,
+    ToolResponseBase,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class EditAgentTool(BaseTool):
+    """Tool for editing existing agents using natural language."""
+
+    @property
+    def name(self) -> str:
+        return "edit_agent"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Edit an existing agent from the user's library using natural language. "
+            "Generates updates to the agent while preserving unchanged parts."
+        )
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    @property
+    def is_long_running(self) -> bool:
+        return True
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "agent_id": {
+                    "type": "string",
+                    "description": (
+                        "The ID of the agent to edit. "
+                        "Can be a graph ID or library agent ID."
+                    ),
+                },
+                "changes": {
+                    "type": "string",
+                    "description": (
+                        "Natural language description of what changes to make. "
+                        "Be specific about what to add, remove, or modify."
+                    ),
+                },
+                "context": {
+                    "type": "string",
+                    "description": (
+                        "Additional context or answers to previous clarifying questions."
+                    ),
+                },
+                "save": {
+                    "type": "boolean",
+                    "description": (
+                        "Whether to save the changes. "
+                        "Default is true. Set to false for preview only."
+                    ),
+                    "default": True,
+                },
+            },
+            "required": ["agent_id", "changes"],
+        }
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        """Execute the edit_agent tool.
+
+        Flow:
+        1. Fetch the current agent
+        2. Generate updated agent (external service handles fixing and validation)
+        3. Preview or save based on the save parameter
+        """
+        agent_id = kwargs.get("agent_id", "").strip()
+        changes = kwargs.get("changes", "").strip()
+        context = kwargs.get("context", "")
+        save = kwargs.get("save", True)
+        session_id = session.session_id if session else None
+
+        # Extract async processing params (passed by long-running tool handler)
+        operation_id = kwargs.get("_operation_id")
+        task_id = kwargs.get("_task_id")
+
+        if not agent_id:
+            return ErrorResponse(
+                message="Please provide the agent ID to edit.",
+                error="Missing agent_id parameter",
+                session_id=session_id,
+            )
+
+        if not changes:
+            return ErrorResponse(
+                message="Please describe what changes you want to make.",
+                error="Missing changes parameter",
+                session_id=session_id,
+            )
+
+        current_agent = await get_agent_as_json(agent_id, user_id)
+
+        if current_agent is None:
+            return ErrorResponse(
+                message=f"Could not find agent with ID '{agent_id}' in your library.",
+                error="agent_not_found",
+                session_id=session_id,
+            )
+
+        library_agents = None
+        if user_id:
+            try:
+                graph_id = current_agent.get("id")
+                library_agents = await get_all_relevant_agents_for_generation(
+                    user_id=user_id,
+                    search_query=changes,
+                    exclude_graph_id=graph_id,
+                    include_marketplace=True,
+                )
+                logger.debug(
+                    f"Found {len(library_agents)} relevant agents for sub-agent composition"
+                )
+            except Exception as e:
+                logger.warning(f"Failed to fetch library agents: {e}")
+
+        update_request = changes
+        if context:
+            update_request = f"{changes}\n\nAdditional context:\n{context}"
+
+        try:
+            result = await generate_agent_patch(
+                update_request,
+                current_agent,
+                library_agents,
+                operation_id=operation_id,
+                task_id=task_id,
+            )
+        except AgentGeneratorNotConfiguredError:
+            return ErrorResponse(
+                message=(
+                    "Agent editing is not available. "
+                    "The Agent Generator service is not configured."
+                ),
+                error="service_not_configured",
+                session_id=session_id,
+            )
+
+        if result is None:
+            return ErrorResponse(
+                message="Failed to generate changes. The agent generation service may be unavailable or timed out. Please try again.",
+                error="update_generation_failed",
+                details={"agent_id": agent_id, "changes": changes[:100]},
+                session_id=session_id,
+            )
+
+        # Check if Agent Generator accepted for async processing
+        if result.get("status") == "accepted":
+            logger.info(
+                f"Agent edit delegated to async processing "
+                f"(operation_id={operation_id}, task_id={task_id})"
+            )
+            return AsyncProcessingResponse(
+                message="Agent edit started. You'll be notified when it's complete.",
+                operation_id=operation_id,
+                task_id=task_id,
+                session_id=session_id,
+            )
+
+        # Check if the result is an error from the external service
+        if isinstance(result, dict) and result.get("type") == "error":
+            error_msg = result.get("error", "Unknown error")
+            error_type = result.get("error_type", "unknown")
+            user_message = get_user_message_for_error(
+                error_type,
+                operation="generate the changes",
+                llm_parse_message="The AI had trouble generating the changes. Please try again or simplify your request.",
+                validation_message="The generated changes failed validation. Please try rephrasing your request.",
+                error_details=error_msg,
+            )
+            return ErrorResponse(
+                message=user_message,
+                error=f"update_generation_failed:{error_type}",
+                details={
+                    "agent_id": agent_id,
+                    "changes": changes[:100],
+                    "service_error": error_msg,
+                    "error_type": error_type,
+                },
+                session_id=session_id,
+            )
+
+        if result.get("type") == "clarifying_questions":
+            questions = result.get("questions", [])
+            return ClarificationNeededResponse(
+                message=(
+                    "I need some more information about the changes. "
+                    "Please answer the following questions:"
+                ),
+                questions=[
+                    ClarifyingQuestion(
+                        question=q.get("question", ""),
+                        keyword=q.get("keyword", ""),
+                        example=q.get("example"),
+                    )
+                    for q in questions
+                ],
+                session_id=session_id,
+            )
+
+        updated_agent = result
+
+        agent_name = updated_agent.get("name", "Updated Agent")
+        agent_description = updated_agent.get("description", "")
+        node_count = len(updated_agent.get("nodes", []))
+        link_count = len(updated_agent.get("links", []))
+
+        if not save:
+            return AgentPreviewResponse(
+                message=(
+                    f"I've updated the agent. "
+                    f"The agent now has {node_count} blocks. "
+                    f"Review it and call edit_agent with save=true to save the changes."
+                ),
+                agent_json=updated_agent,
+                agent_name=agent_name,
+                description=agent_description,
+                node_count=node_count,
+                link_count=link_count,
+                session_id=session_id,
+            )
+
+        if not user_id:
+            return ErrorResponse(
+                message="You must be logged in to save agents.",
+                error="auth_required",
+                session_id=session_id,
+            )
+
+        try:
+            created_graph, library_agent = await save_agent_to_library(
+                updated_agent, user_id, is_update=True
+            )
+
+            return AgentSavedResponse(
+                message=f"Updated agent '{created_graph.name}' has been saved to your library!",
+                agent_id=created_graph.id,
+                agent_name=created_graph.name,
+                library_agent_id=library_agent.id,
+                library_agent_link=f"/library/agents/{library_agent.id}",
+                agent_page_link=f"/build?flowID={created_graph.id}",
+                session_id=session_id,
+            )
+        except Exception as e:
+            return ErrorResponse(
+                message=f"Failed to save the updated agent: {str(e)}",
+                error="save_failed",
+                details={"exception": str(e)},
+                session_id=session_id,
+            )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/feature_requests.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/feature_requests.py
@@ -5,14 +5,9 @@ from typing import Any

 from pydantic import SecretStr

-from backend.blocks.linear._api import LinearClient
-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import user_db
-from backend.data.model import APIKeyCredentials
-from backend.util.settings import Settings
-
-from .base import BaseTool
-from .models import (
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
    ErrorResponse,
    FeatureRequestCreatedResponse,
    FeatureRequestInfo,
@@ -20,6 +15,10 @@ from .models import (
    NoResultsResponse,
    ToolResponseBase,
 )
+from backend.blocks.linear._api import LinearClient
+from backend.data.model import APIKeyCredentials
+from backend.data.user import get_user_email_by_id
+from backend.util.settings import Settings

 logger = logging.getLogger(__name__)

@@ -33,6 +32,7 @@ query SearchFeatureRequests($term: String!, $filter: IssueFilter, $first: Int) {
      id
      identifier
      title
+      description
    }
  }
 }
@@ -104,8 +104,8 @@ def _get_linear_config() -> tuple[LinearClient, str, str]:
    Raises RuntimeError if any required setting is missing.
    """
    secrets = _get_settings().secrets
-    if not secrets.copilot_linear_api_key:
-        raise RuntimeError("COPILOT_LINEAR_API_KEY is not configured")
+    if not secrets.linear_api_key:
+        raise RuntimeError("LINEAR_API_KEY is not configured")
    if not secrets.linear_feature_request_project_id:
        raise RuntimeError("LINEAR_FEATURE_REQUEST_PROJECT_ID is not configured")
    if not secrets.linear_feature_request_team_id:
@@ -114,7 +114,7 @@ def _get_linear_config() -> tuple[LinearClient, str, str]:
    credentials = APIKeyCredentials(
        id="system-linear",
        provider="linear",
-        api_key=SecretStr(secrets.copilot_linear_api_key),
+        api_key=SecretStr(secrets.linear_api_key),
        title="System Linear API Key",
    )
    client = LinearClient(credentials=credentials)
@@ -204,6 +204,7 @@ class SearchFeatureRequestsTool(BaseTool):
                    id=node["id"],
                    identifier=node["identifier"],
                    title=node["title"],
+                    description=node.get("description"),
                )
                for node in nodes
            ]
@@ -237,11 +238,7 @@ class CreateFeatureRequestTool(BaseTool):
            "Create a new feature request or add a customer need to an existing one. "
            "Always search first with search_feature_requests to avoid duplicates. "
            "If a matching request exists, pass its ID as existing_issue_id to add "
-            "the user's need to it instead of creating a duplicate. "
-            "IMPORTANT: Never include personally identifiable information (PII) in "
-            "the title or description — no names, emails, phone numbers, company "
-            "names, or other identifying details. Write titles and descriptions in "
-            "generic, feature-focused language."
+            "the user's need to it instead of creating a duplicate."
        )

    @property
@@ -251,20 +248,11 @@ class CreateFeatureRequestTool(BaseTool):
            "properties": {
                "title": {
                    "type": "string",
-                    "description": (
-                        "Title for the feature request. Must be generic and "
-                        "feature-focused — do not include any user names, emails, "
-                        "company names, or other PII."
-                    ),
+                    "description": "Title for the feature request.",
                },
                "description": {
                    "type": "string",
-                    "description": (
-                        "Detailed description of what the user wants and why. "
-                        "Must not contain any personally identifiable information "
-                        "(PII) — describe the feature need generically without "
-                        "referencing specific users, companies, or contact details."
-                    ),
+                    "description": "Detailed description of what the user wants and why.",
                },
                "existing_issue_id": {
                    "type": "string",
@@ -344,9 +332,7 @@ class CreateFeatureRequestTool(BaseTool):
        # Resolve a human-readable name (email) for the Linear customer record.
        # Fall back to user_id if the lookup fails or returns None.
        try:
-            customer_display_name = (
-                await user_db().get_user_email_by_id(user_id) or user_id
-            )
+            customer_display_name = await get_user_email_by_id(user_id) or user_id
        except Exception:
            customer_display_name = user_id

--- a/autogpt_platform/backend/backend/api/features/chat/tools/feature_requests_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/feature_requests_test.py
@@ -1,18 +1,22 @@
 """Tests for SearchFeatureRequestsTool and CreateFeatureRequestTool."""

-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import AsyncMock, patch

 import pytest

-from ._test_data import make_session
-from .feature_requests import CreateFeatureRequestTool, SearchFeatureRequestsTool
-from .models import (
+from backend.api.features.chat.tools.feature_requests import (
+    CreateFeatureRequestTool,
+    SearchFeatureRequestsTool,
+)
+from backend.api.features.chat.tools.models import (
    ErrorResponse,
    FeatureRequestCreatedResponse,
    FeatureRequestSearchResponse,
    NoResultsResponse,
 )

+from ._test_data import make_session
+
 _TEST_USER_ID = "test-user-feature-requests"
 _TEST_USER_EMAIL = "testuser@example.com"

@@ -35,7 +39,7 @@ def _mock_linear_config(*, query_return=None, mutate_return=None):
        client.mutate.return_value = mutate_return
    return (
        patch(
-            "backend.copilot.tools.feature_requests._get_linear_config",
+            "backend.api.features.chat.tools.feature_requests._get_linear_config",
            return_value=(client, _FAKE_PROJECT_ID, _FAKE_TEAM_ID),
        ),
        client,
@@ -117,11 +121,13 @@ class TestSearchFeatureRequestsTool:
                "id": "id-1",
                "identifier": "FR-1",
                "title": "Dark mode",
+                "description": "Add dark mode support",
            },
            {
                "id": "id-2",
                "identifier": "FR-2",
                "title": "Dark theme",
+                "description": None,
            },
        ]
        patcher, _ = _mock_linear_config(query_return=_search_response(nodes))
@@ -202,7 +208,7 @@ class TestSearchFeatureRequestsTool:
    async def test_linear_client_init_failure(self):
        session = make_session(user_id=_TEST_USER_ID)
        with patch(
-            "backend.copilot.tools.feature_requests._get_linear_config",
+            "backend.api.features.chat.tools.feature_requests._get_linear_config",
            side_effect=RuntimeError("No API key"),
        ):
            tool = SearchFeatureRequestsTool()
@@ -225,11 +231,10 @@ class TestCreateFeatureRequestTool:

    @pytest.fixture(autouse=True)
    def _patch_email_lookup(self):
-        mock_user_db = MagicMock()
-        mock_user_db.get_user_email_by_id = AsyncMock(return_value=_TEST_USER_EMAIL)
        with patch(
-            "backend.copilot.tools.feature_requests.user_db",
-            return_value=mock_user_db,
+            "backend.api.features.chat.tools.feature_requests.get_user_email_by_id",
+            new_callable=AsyncMock,
+            return_value=_TEST_USER_EMAIL,
        ):
            yield

@@ -342,7 +347,7 @@ class TestCreateFeatureRequestTool:
    async def test_linear_client_init_failure(self):
        session = make_session(user_id=_TEST_USER_ID)
        with patch(
-            "backend.copilot.tools.feature_requests._get_linear_config",
+            "backend.api.features.chat.tools.feature_requests._get_linear_config",
            side_effect=RuntimeError("No API key"),
        ):
            tool = CreateFeatureRequestTool()
--- a/autogpt_platform/backend/backend/api/features/chat/tools/find_agent.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/find_agent.py
@@ -2,7 +2,7 @@

 from typing import Any

-from backend.copilot.model import ChatSession
+from backend.api.features.chat.model import ChatSession

 from .agent_search import search_agents
 from .base import BaseTool
--- a/autogpt_platform/backend/backend/api/features/chat/tools/find_block.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/find_block.py
@@ -3,18 +3,17 @@ from typing import Any

 from prisma.enums import ContentType

-from backend.blocks import get_block
-from backend.blocks._base import BlockType
-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import search
-
-from .base import BaseTool, ToolResponseBase
-from .models import (
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool, ToolResponseBase
+from backend.api.features.chat.tools.models import (
    BlockInfoSummary,
    BlockListResponse,
    ErrorResponse,
    NoResultsResponse,
 )
+from backend.api.features.store.hybrid_search import unified_hybrid_search
+from backend.blocks import get_block
+from backend.blocks._base import BlockType

 logger = logging.getLogger(__name__)

@@ -32,7 +31,6 @@ COPILOT_EXCLUDED_BLOCK_TYPES = {
    BlockType.NOTE,  # Visual annotation only - no runtime behavior
    BlockType.HUMAN_IN_THE_LOOP,  # Pauses for human approval - CoPilot IS human-in-the-loop
    BlockType.AGENT,  # AgentExecutorBlock requires execution_context - use run_agent tool
-    BlockType.MCP_TOOL,  # Has dedicated run_mcp_tool tool with discovery + auth flow
 }

 # Specific block IDs excluded from CoPilot (STANDARD type but still require graph context)
@@ -72,15 +70,6 @@ class FindBlockTool(BaseTool):
                        "Use keywords like 'email', 'http', 'text', 'ai', etc."
                    ),
                },
-                "include_schemas": {
-                    "type": "boolean",
-                    "description": (
-                        "If true, include full input_schema and output_schema "
-                        "for each block. Use when generating agent JSON that "
-                        "needs block schemas. Default is false."
-                    ),
-                    "default": False,
-                },
            },
            "required": ["query"],
        }
@@ -108,7 +97,6 @@ class FindBlockTool(BaseTool):
            ErrorResponse: Error message
        """
        query = kwargs.get("query", "").strip()
-        include_schemas = kwargs.get("include_schemas", False)
        session_id = session.session_id

        if not query:
@@ -119,7 +107,7 @@ class FindBlockTool(BaseTool):

        try:
            # Search for blocks using hybrid search
-            results, total = await search().unified_hybrid_search(
+            results, total = await unified_hybrid_search(
                query=query,
                content_types=[ContentType.BLOCK],
                page=1,
@@ -153,21 +141,15 @@ class FindBlockTool(BaseTool):
                ):
                    continue

-                summary = BlockInfoSummary(
-                    id=block_id,
-                    name=block.name,
-                    description=block.optimized_description or block.description or "",
-                    categories=[c.value for c in block.categories],
+                blocks.append(
+                    BlockInfoSummary(
+                        id=block_id,
+                        name=block.name,
+                        description=block.description or "",
+                        categories=[c.value for c in block.categories],
+                    )
                )

-                if include_schemas:
-                    info = block.get_info()
-                    summary.input_schema = info.inputSchema
-                    summary.output_schema = info.outputSchema
-                    summary.static_output = info.staticOutput
-
-                blocks.append(summary)
-
                if len(blocks) >= _TARGET_RESULTS:
                    break

--- a/autogpt_platform/backend/backend/api/features/chat/tools/find_block_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/find_block_test.py
@@ -4,15 +4,15 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from backend.blocks._base import BlockType
-
-from ._test_data import make_session
-from .find_block import (
+from backend.api.features.chat.tools.find_block import (
    COPILOT_EXCLUDED_BLOCK_IDS,
    COPILOT_EXCLUDED_BLOCK_TYPES,
    FindBlockTool,
 )
-from .models import BlockListResponse
+from backend.api.features.chat.tools.models import BlockListResponse
+from backend.blocks._base import BlockType
+
+from ._test_data import make_session

 _TEST_USER_ID = "test-user-find-block"

@@ -25,7 +25,6 @@ def make_mock_block(
    input_schema: dict | None = None,
    output_schema: dict | None = None,
    credentials_fields: dict | None = None,
-    static_output: bool = False,
 ):
    """Create a mock block for testing."""
    mock = MagicMock()
@@ -34,7 +33,6 @@ def make_mock_block(
    mock.description = f"{name} description"
    mock.block_type = block_type
    mock.disabled = disabled
-    mock.static_output = static_output
    mock.input_schema = MagicMock()
    mock.input_schema.jsonschema.return_value = input_schema or {
        "properties": {},
@@ -44,15 +42,6 @@ def make_mock_block(
    mock.output_schema = MagicMock()
    mock.output_schema.jsonschema.return_value = output_schema or {}
    mock.categories = []
-    mock.optimized_description = None
-
-    # Mock get_info() for include_schemas support
-    mock_info = MagicMock()
-    mock_info.inputSchema = input_schema or {"properties": {}, "required": []}
-    mock_info.outputSchema = output_schema or {}
-    mock_info.staticOutput = static_output
-    mock.get_info.return_value = mock_info
-
    return mock


@@ -95,17 +84,13 @@ class TestFindBlockFiltering:
                "standard-block-id": standard_block,
            }.get(block_id)

-        mock_search_db = MagicMock()
-        mock_search_db.unified_hybrid_search = AsyncMock(
-            return_value=(search_results, 2)
-        )
-
        with patch(
-            "backend.copilot.tools.find_block.search",
-            return_value=mock_search_db,
+            "backend.api.features.chat.tools.find_block.unified_hybrid_search",
+            new_callable=AsyncMock,
+            return_value=(search_results, 2),
        ):
            with patch(
-                "backend.copilot.tools.find_block.get_block",
+                "backend.api.features.chat.tools.find_block.get_block",
                side_effect=mock_get_block,
            ):
                tool = FindBlockTool()
@@ -143,17 +128,13 @@ class TestFindBlockFiltering:
                "normal-block-id": normal_block,
            }.get(block_id)

-        mock_search_db = MagicMock()
-        mock_search_db.unified_hybrid_search = AsyncMock(
-            return_value=(search_results, 2)
-        )
-
        with patch(
-            "backend.copilot.tools.find_block.search",
-            return_value=mock_search_db,
+            "backend.api.features.chat.tools.find_block.unified_hybrid_search",
+            new_callable=AsyncMock,
+            return_value=(search_results, 2),
        ):
            with patch(
-                "backend.copilot.tools.find_block.get_block",
+                "backend.api.features.chat.tools.find_block.get_block",
                side_effect=mock_get_block,
            ):
                tool = FindBlockTool()
@@ -372,20 +353,13 @@ class TestFindBlockFiltering:
            for d in block_defs
        }

-        mock_search_db = MagicMock()
-        mock_search_db.unified_hybrid_search = AsyncMock(
-            return_value=(search_results, len(search_results))
-        )
-
-        with (
-            patch(
-                "backend.copilot.tools.find_block.search",
-                return_value=mock_search_db,
-            ),
-            patch(
-                "backend.copilot.tools.find_block.get_block",
-                side_effect=lambda bid: mock_blocks.get(bid),
-            ),
+        with patch(
+            "backend.api.features.chat.tools.find_block.unified_hybrid_search",
+            new_callable=AsyncMock,
+            return_value=(search_results, len(search_results)),
+        ), patch(
+            "backend.api.features.chat.tools.find_block.get_block",
+            side_effect=lambda bid: mock_blocks.get(bid),
        ):
            tool = FindBlockTool()
            response = await tool._execute(
@@ -410,92 +384,3 @@ class TestFindBlockFiltering:
            f"Average chars per block ({avg_chars}) exceeds 500. "
            f"Total response: {total_chars} chars for {response.count} blocks."
        )
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_include_schemas_false_omits_schemas(self):
-        """Without include_schemas, schemas should be empty dicts."""
-        session = make_session(user_id=_TEST_USER_ID)
-        input_schema = {"properties": {"url": {"type": "string"}}, "required": ["url"]}
-        output_schema = {"properties": {"result": {"type": "string"}}}
-
-        search_results = [{"content_id": "block-1", "score": 0.9}]
-        block = make_mock_block(
-            "block-1",
-            "Test Block",
-            BlockType.STANDARD,
-            input_schema=input_schema,
-            output_schema=output_schema,
-        )
-
-        mock_search_db = MagicMock()
-        mock_search_db.unified_hybrid_search = AsyncMock(
-            return_value=(search_results, 1)
-        )
-
-        with (
-            patch(
-                "backend.copilot.tools.find_block.search",
-                return_value=mock_search_db,
-            ),
-            patch(
-                "backend.copilot.tools.find_block.get_block",
-                return_value=block,
-            ),
-        ):
-            tool = FindBlockTool()
-            response = await tool._execute(
-                user_id=_TEST_USER_ID,
-                session=session,
-                query="test",
-                include_schemas=False,
-            )
-
-        assert isinstance(response, BlockListResponse)
-        assert response.blocks[0].input_schema == {}
-        assert response.blocks[0].output_schema == {}
-        assert response.blocks[0].static_output is False
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_include_schemas_true_populates_schemas(self):
-        """With include_schemas=true, schemas should be populated from block info."""
-        session = make_session(user_id=_TEST_USER_ID)
-        input_schema = {"properties": {"url": {"type": "string"}}, "required": ["url"]}
-        output_schema = {"properties": {"result": {"type": "string"}}}
-
-        search_results = [{"content_id": "block-1", "score": 0.9}]
-        block = make_mock_block(
-            "block-1",
-            "Test Block",
-            BlockType.STANDARD,
-            input_schema=input_schema,
-            output_schema=output_schema,
-            static_output=True,
-        )
-
-        mock_search_db = MagicMock()
-        mock_search_db.unified_hybrid_search = AsyncMock(
-            return_value=(search_results, 1)
-        )
-
-        with (
-            patch(
-                "backend.copilot.tools.find_block.search",
-                return_value=mock_search_db,
-            ),
-            patch(
-                "backend.copilot.tools.find_block.get_block",
-                return_value=block,
-            ),
-        ):
-            tool = FindBlockTool()
-            response = await tool._execute(
-                user_id=_TEST_USER_ID,
-                session=session,
-                query="test",
-                include_schemas=True,
-            )
-
-        assert isinstance(response, BlockListResponse)
-        assert response.blocks[0].input_schema == input_schema
-        assert response.blocks[0].output_schema == output_schema
-        assert response.blocks[0].static_output is True
--- a/autogpt_platform/backend/backend/api/features/chat/tools/find_library_agent.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/find_library_agent.py
@@ -2,7 +2,7 @@

 from typing import Any

-from backend.copilot.model import ChatSession
+from backend.api.features.chat.model import ChatSession

 from .agent_search import search_agents
 from .base import BaseTool
@@ -19,13 +19,9 @@ class FindLibraryAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Search for or list agents in the user's library. Use this to find "
-            "agents the user has already added to their library, including agents "
-            "they created or added from the marketplace. "
-            "When creating agents with sub-agent composition, use this to get "
-            "the agent's graph_id, graph_version, input_schema, and output_schema "
-            "needed for AgentExecutorBlock nodes. "
-            "Omit the query to list all agents."
+            "Search for agents in the user's library. Use this to find agents "
+            "the user has already added to their library, including agents they "
+            "created or added from the marketplace."
        )

    @property
@@ -35,13 +31,10 @@ class FindLibraryAgentTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find agents by name or description. "
-                        "Omit to list all agents in the library."
-                    ),
+                    "description": "Search query to find agents by name or description.",
                },
            },
-            "required": [],
+            "required": ["query"],
        }

    @property
@@ -52,7 +45,7 @@ class FindLibraryAgentTool(BaseTool):
        self, user_id: str | None, session: ChatSession, **kwargs
    ) -> ToolResponseBase:
        return await search_agents(
-            query=(kwargs.get("query") or "").strip(),
+            query=kwargs.get("query", "").strip(),
            source="library",
            session_id=session.session_id,
            user_id=user_id,
--- a/autogpt_platform/backend/backend/api/features/chat/tools/get_doc_page.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/get_doc_page.py
@@ -4,10 +4,13 @@ import logging
 from pathlib import Path
 from typing import Any

-from backend.copilot.model import ChatSession
-
-from .base import BaseTool
-from .models import DocPageResponse, ErrorResponse, ToolResponseBase
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
+    DocPageResponse,
+    ErrorResponse,
+    ToolResponseBase,
+)

 logger = logging.getLogger(__name__)

--- a/autogpt_platform/backend/backend/api/features/chat/tools/helpers.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/helpers.py
--- a/autogpt_platform/backend/backend/api/features/chat/tools/models.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/models.py
@@ -2,7 +2,7 @@

 from datetime import datetime
 from enum import Enum
-from typing import Any, Literal
+from typing import Any

 from pydantic import BaseModel, Field

@@ -12,70 +12,42 @@ from backend.data.model import CredentialsMetaInput
 class ResponseType(str, Enum):
    """Types of tool responses."""

-    # General
-    ERROR = "error"
-    NO_RESULTS = "no_results"
-    NEED_LOGIN = "need_login"
-
-    # Agent discovery & execution
    AGENTS_FOUND = "agents_found"
    AGENT_DETAILS = "agent_details"
    SETUP_REQUIREMENTS = "setup_requirements"
-    INPUT_VALIDATION_ERROR = "input_validation_error"
    EXECUTION_STARTED = "execution_started"
+    NEED_LOGIN = "need_login"
+    ERROR = "error"
+    NO_RESULTS = "no_results"
    AGENT_OUTPUT = "agent_output"
    UNDERSTANDING_UPDATED = "understanding_updated"
-    SUGGESTED_GOAL = "suggested_goal"
-
-    # Agent builder (create / edit / validate / fix)
-    AGENT_BUILDER_GUIDE = "agent_builder_guide"
-    AGENT_BUILDER_PREVIEW = "agent_builder_preview"
-    AGENT_BUILDER_SAVED = "agent_builder_saved"
-    AGENT_BUILDER_CLARIFICATION_NEEDED = "agent_builder_clarification_needed"
-    AGENT_BUILDER_VALIDATION_RESULT = "agent_builder_validation_result"
-    AGENT_BUILDER_FIX_RESULT = "agent_builder_fix_result"
-
-    # Block
+    AGENT_PREVIEW = "agent_preview"
+    AGENT_SAVED = "agent_saved"
+    CLARIFICATION_NEEDED = "clarification_needed"
    BLOCK_LIST = "block_list"
    BLOCK_DETAILS = "block_details"
    BLOCK_OUTPUT = "block_output"
-
-    # MCP
-    MCP_GUIDE = "mcp_guide"
-    MCP_TOOLS_DISCOVERED = "mcp_tools_discovered"
-    MCP_TOOL_OUTPUT = "mcp_tool_output"
-
-    # Docs
    DOC_SEARCH_RESULTS = "doc_search_results"
    DOC_PAGE = "doc_page"
-
-    # Workspace files
+    # Workspace response types
    WORKSPACE_FILE_LIST = "workspace_file_list"
    WORKSPACE_FILE_CONTENT = "workspace_file_content"
    WORKSPACE_FILE_METADATA = "workspace_file_metadata"
    WORKSPACE_FILE_WRITTEN = "workspace_file_written"
    WORKSPACE_FILE_DELETED = "workspace_file_deleted"
-
-    # Folder management
-    FOLDER_CREATED = "folder_created"
-    FOLDER_LIST = "folder_list"
-    FOLDER_UPDATED = "folder_updated"
-    FOLDER_MOVED = "folder_moved"
-    FOLDER_DELETED = "folder_deleted"
-    AGENTS_MOVED_TO_FOLDER = "agents_moved_to_folder"
-
-    # Browser automation
-    BROWSER_NAVIGATE = "browser_navigate"
-    BROWSER_ACT = "browser_act"
-    BROWSER_SCREENSHOT = "browser_screenshot"
-
+    # Long-running operation types
+    OPERATION_STARTED = "operation_started"
+    OPERATION_PENDING = "operation_pending"
+    OPERATION_IN_PROGRESS = "operation_in_progress"
+    # Input validation
+    INPUT_VALIDATION_ERROR = "input_validation_error"
+    # Web fetch
+    WEB_FETCH = "web_fetch"
    # Code execution
    BASH_EXEC = "bash_exec"
-
-    # Web
-    WEB_FETCH = "web_fetch"
-
-    # Feature requests
+    # Operation status check
+    OPERATION_STATUS = "operation_status"
+    # Feature request types
    FEATURE_REQUEST_SEARCH = "feature_request_search"
    FEATURE_REQUEST_CREATED = "feature_request_created"

@@ -108,15 +80,6 @@ class AgentInfo(BaseModel):
    has_external_trigger: bool | None = None
    new_output: bool | None = None
    graph_id: str | None = None
-    graph_version: int | None = None
-    input_schema: dict[str, Any] | None = Field(
-        default=None,
-        description="JSON Schema for the agent's inputs (for AgentExecutorBlock)",
-    )
-    output_schema: dict[str, Any] | None = Field(
-        default=None,
-        description="JSON Schema for the agent's outputs (for AgentExecutorBlock)",
-    )
    inputs: dict[str, Any] | None = Field(
        default=None,
        description="Input schema for the agent, including field names, types, and defaults",
@@ -307,7 +270,7 @@ class ClarifyingQuestion(BaseModel):
 class AgentPreviewResponse(ToolResponseBase):
    """Response for previewing a generated agent before saving."""

-    type: ResponseType = ResponseType.AGENT_BUILDER_PREVIEW
+    type: ResponseType = ResponseType.AGENT_PREVIEW
    agent_json: dict[str, Any]
    agent_name: str
    description: str
@@ -318,7 +281,7 @@ class AgentPreviewResponse(ToolResponseBase):
 class AgentSavedResponse(ToolResponseBase):
    """Response when an agent is saved to the library."""

-    type: ResponseType = ResponseType.AGENT_BUILDER_SAVED
+    type: ResponseType = ResponseType.AGENT_SAVED
    agent_id: str
    agent_name: str
    library_agent_id: str
@@ -329,26 +292,10 @@ class AgentSavedResponse(ToolResponseBase):
 class ClarificationNeededResponse(ToolResponseBase):
    """Response when the LLM needs more information from the user."""

-    type: ResponseType = ResponseType.AGENT_BUILDER_CLARIFICATION_NEEDED
+    type: ResponseType = ResponseType.CLARIFICATION_NEEDED
    questions: list[ClarifyingQuestion] = Field(default_factory=list)


-class SuggestedGoalResponse(ToolResponseBase):
-    """Response when the goal needs refinement with a suggested alternative."""
-
-    type: ResponseType = ResponseType.SUGGESTED_GOAL
-    suggested_goal: str = Field(description="The suggested alternative goal")
-    reason: str = Field(
-        default="", description="Why the original goal needs refinement"
-    )
-    original_goal: str = Field(
-        default="", description="The user's original goal for context"
-    )
-    goal_type: Literal["vague", "unachievable"] = Field(
-        default="vague", description="Type: 'vague' or 'unachievable'"
-    )
-
-
 # Documentation search models
 class DocSearchResult(BaseModel):
    """A single documentation search result."""
@@ -406,10 +353,6 @@ class BlockInfoSummary(BaseModel):
        default_factory=dict,
        description="Full JSON schema for block outputs",
    )
-    static_output: bool = Field(
-        default=False,
-        description="Whether the block produces output without needing input",
-    )
    required_inputs: list[BlockInputFieldInfo] = Field(
        default_factory=list,
        description="List of input fields for this block",
@@ -458,6 +401,63 @@ class BlockOutputResponse(ToolResponseBase):
    success: bool = True


+# Long-running operation models
+class OperationStartedResponse(ToolResponseBase):
+    """Response when a long-running operation has been started in the background.
+
+    This is returned immediately to the client while the operation continues
+    to execute. The user can close the tab and check back later.
+
+    The task_id can be used to reconnect to the SSE stream via
+    GET /chat/tasks/{task_id}/stream?last_idx=0
+    """
+
+    type: ResponseType = ResponseType.OPERATION_STARTED
+    operation_id: str
+    tool_name: str
+    task_id: str | None = None  # For SSE reconnection
+
+
+class OperationPendingResponse(ToolResponseBase):
+    """Response stored in chat history while a long-running operation is executing.
+
+    This is persisted to the database so users see a pending state when they
+    refresh before the operation completes.
+    """
+
+    type: ResponseType = ResponseType.OPERATION_PENDING
+    operation_id: str
+    tool_name: str
+
+
+class OperationInProgressResponse(ToolResponseBase):
+    """Response when an operation is already in progress.
+
+    Returned for idempotency when the same tool_call_id is requested again
+    while the background task is still running.
+    """
+
+    type: ResponseType = ResponseType.OPERATION_IN_PROGRESS
+    tool_call_id: str
+
+
+class AsyncProcessingResponse(ToolResponseBase):
+    """Response when an operation has been delegated to async processing.
+
+    This is returned by tools when the external service accepts the request
+    for async processing (HTTP 202 Accepted). The Redis Streams completion
+    consumer will handle the result when the external service completes.
+
+    The status field is specifically "accepted" to allow the long-running tool
+    handler to detect this response and skip LLM continuation.
+    """
+
+    type: ResponseType = ResponseType.OPERATION_STARTED
+    status: str = "accepted"  # Must be "accepted" for detection
+    operation_id: str | None = None
+    task_id: str | None = None
+
+
 class WebFetchResponse(ToolResponseBase):
    """Response for web_fetch tool."""

@@ -486,6 +486,7 @@ class FeatureRequestInfo(BaseModel):
    id: str
    identifier: str
    title: str
+    description: str | None = None


 class FeatureRequestSearchResponse(ToolResponseBase):
@@ -507,161 +508,3 @@ class FeatureRequestCreatedResponse(ToolResponseBase):
    issue_url: str
    is_new_issue: bool  # False if added to existing
    customer_name: str
-
-
-# MCP tool models
-class MCPToolInfo(BaseModel):
-    """Information about a single MCP tool discovered from a server."""
-
-    name: str
-    description: str
-    input_schema: dict[str, Any]
-
-
-class MCPToolsDiscoveredResponse(ToolResponseBase):
-    """Response when MCP tools are discovered from a server (agent-internal)."""
-
-    type: ResponseType = ResponseType.MCP_TOOLS_DISCOVERED
-    server_url: str
-    tools: list[MCPToolInfo]
-
-
-class MCPToolOutputResponse(ToolResponseBase):
-    """Response after executing an MCP tool."""
-
-    type: ResponseType = ResponseType.MCP_TOOL_OUTPUT
-    server_url: str
-    tool_name: str
-    result: Any = None
-    success: bool = True
-
-
-# Agent-browser multi-step automation models
-
-
-class BrowserNavigateResponse(ToolResponseBase):
-    """Response for browser_navigate tool."""
-
-    type: ResponseType = ResponseType.BROWSER_NAVIGATE
-    url: str
-    title: str
-    snapshot: str  # Interactive accessibility tree with @ref IDs
-
-
-class BrowserActResponse(ToolResponseBase):
-    """Response for browser_act tool."""
-
-    type: ResponseType = ResponseType.BROWSER_ACT
-    action: str
-    current_url: str = ""
-    snapshot: str  # Updated accessibility tree after the action
-
-
-class BrowserScreenshotResponse(ToolResponseBase):
-    """Response for browser_screenshot tool."""
-
-    type: ResponseType = ResponseType.BROWSER_SCREENSHOT
-    file_id: str  # Workspace file ID — use read_workspace_file to retrieve
-    filename: str
-
-
-# Agent generation tool response models
-
-
-class ValidationResultResponse(ToolResponseBase):
-    """Response for validate_agent_graph tool."""
-
-    type: ResponseType = ResponseType.AGENT_BUILDER_VALIDATION_RESULT
-    valid: bool
-    errors: list[str] = Field(default_factory=list)
-    error_count: int = 0
-
-
-class FixResultResponse(ToolResponseBase):
-    """Response for fix_agent_graph tool."""
-
-    type: ResponseType = ResponseType.AGENT_BUILDER_FIX_RESULT
-    fixed_agent_json: dict[str, Any]
-    fixes_applied: list[str] = Field(default_factory=list)
-    fix_count: int = 0
-    valid_after_fix: bool = False
-    remaining_errors: list[str] = Field(default_factory=list)
-
-
-# Folder management models
-
-
-class FolderAgentSummary(BaseModel):
-    """Lightweight agent info for folder listings."""
-
-    id: str
-    name: str
-    description: str = ""
-
-
-class FolderInfo(BaseModel):
-    """Information about a folder."""
-
-    id: str
-    name: str
-    parent_id: str | None = None
-    icon: str | None = None
-    color: str | None = None
-    agent_count: int = 0
-    subfolder_count: int = 0
-    agents: list[FolderAgentSummary] | None = None
-
-
-class FolderTreeInfo(FolderInfo):
-    """Folder with nested children for tree display."""
-
-    children: list["FolderTreeInfo"] = []
-
-
-class FolderCreatedResponse(ToolResponseBase):
-    """Response when a folder is created."""
-
-    type: ResponseType = ResponseType.FOLDER_CREATED
-    folder: FolderInfo
-
-
-class FolderListResponse(ToolResponseBase):
-    """Response for listing folders."""
-
-    type: ResponseType = ResponseType.FOLDER_LIST
-    folders: list[FolderInfo] = Field(default_factory=list)
-    tree: list[FolderTreeInfo] | None = None
-    root_agents: list[FolderAgentSummary] | None = None
-    count: int = 0
-
-
-class FolderUpdatedResponse(ToolResponseBase):
-    """Response when a folder is updated."""
-
-    type: ResponseType = ResponseType.FOLDER_UPDATED
-    folder: FolderInfo
-
-
-class FolderMovedResponse(ToolResponseBase):
-    """Response when a folder is moved."""
-
-    type: ResponseType = ResponseType.FOLDER_MOVED
-    folder: FolderInfo
-    target_parent_id: str | None = None
-
-
-class FolderDeletedResponse(ToolResponseBase):
-    """Response when a folder is deleted."""
-
-    type: ResponseType = ResponseType.FOLDER_DELETED
-    folder_id: str
-
-
-class AgentsMovedToFolderResponse(ToolResponseBase):
-    """Response when agents are moved to a folder."""
-
-    type: ResponseType = ResponseType.AGENTS_MOVED_TO_FOLDER
-    agent_ids: list[str]
-    agent_names: list[str] = []
-    folder_id: str | None = None
-    count: int = 0
--- a/autogpt_platform/backend/backend/api/features/chat/tools/run_agent.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/run_agent.py
@@ -5,13 +5,16 @@ from typing import Any

 from pydantic import BaseModel, Field, field_validator

-from backend.copilot.config import ChatConfig
-from backend.copilot.model import ChatSession
-from backend.copilot.tracking import track_agent_run_success, track_agent_scheduled
-from backend.data.db_accessors import graph_db, library_db, user_db
-from backend.data.execution import ExecutionStatus
+from backend.api.features.chat.config import ChatConfig
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tracking import (
+    track_agent_run_success,
+    track_agent_scheduled,
+)
+from backend.api.features.library import db as library_db
 from backend.data.graph import GraphModel
 from backend.data.model import CredentialsMetaInput
+from backend.data.user import get_user_by_id
 from backend.executor import utils as execution_utils
 from backend.util.clients import get_scheduler_client
 from backend.util.exceptions import DatabaseError, NotFoundError
@@ -21,15 +24,12 @@ from backend.util.timezone_utils import (
 )

 from .base import BaseTool
-from .execution_utils import get_execution_outputs, wait_for_execution
 from .helpers import get_inputs_from_schema
 from .models import (
    AgentDetails,
    AgentDetailsResponse,
-    AgentOutputResponse,
    ErrorResponse,
    ExecutionOptions,
-    ExecutionOutputInfo,
    ExecutionStartedResponse,
    InputValidationErrorResponse,
    SetupInfo,
@@ -70,7 +70,6 @@ class RunAgentInput(BaseModel):
    schedule_name: str = ""
    cron: str = ""
    timezone: str = "UTC"
-    wait_for_result: int = Field(default=0, ge=0, le=300)

    @field_validator(
        "username_agent_slug",
@@ -152,14 +151,6 @@ class RunAgentTool(BaseTool):
                    "type": "string",
                    "description": "IANA timezone for schedule (default: UTC)",
                },
-                "wait_for_result": {
-                    "type": "integer",
-                    "description": (
-                        "Max seconds to wait for execution to complete (0-300). "
-                        "If >0, blocks until the execution finishes or times out. "
-                        "Returns execution outputs when complete."
-                    ),
-                },
            },
            "required": [],
        }
@@ -209,7 +200,7 @@ class RunAgentTool(BaseTool):

            # Priority: library_agent_id if provided
            if has_library_id:
-                library_agent = await library_db().get_library_agent(
+                library_agent = await library_db.get_library_agent(
                    params.library_agent_id, user_id
                )
                if not library_agent:
@@ -218,7 +209,9 @@ class RunAgentTool(BaseTool):
                        session_id=session_id,
                    )
                # Get the graph from the library agent
-                graph = await graph_db().get_graph(
+                from backend.data.graph import get_graph
+
+                graph = await get_graph(
                    library_agent.graph_id,
                    library_agent.graph_version,
                    user_id=user_id,
@@ -354,7 +347,6 @@ class RunAgentTool(BaseTool):
                    graph=graph,
                    graph_credentials=graph_credentials,
                    inputs=params.inputs,
-                    wait_for_result=params.wait_for_result,
                )

        except NotFoundError as e:
@@ -438,9 +430,8 @@ class RunAgentTool(BaseTool):
        graph: GraphModel,
        graph_credentials: dict[str, CredentialsMetaInput],
        inputs: dict[str, Any],
-        wait_for_result: int = 0,
    ) -> ToolResponseBase:
-        """Execute an agent immediately, optionally waiting for completion."""
+        """Execute an agent immediately."""
        session_id = session.session_id

        # Check rate limits
@@ -477,91 +468,6 @@ class RunAgentTool(BaseTool):
        )

        library_agent_link = f"/library/agents/{library_agent.id}"
-
-        # If wait_for_result is requested, wait for execution to complete
-        if wait_for_result > 0:
-            logger.info(
-                f"Waiting up to {wait_for_result}s for execution {execution.id}"
-            )
-            completed = await wait_for_execution(
-                user_id=user_id,
-                graph_id=library_agent.graph_id,
-                execution_id=execution.id,
-                timeout_seconds=wait_for_result,
-            )
-
-            if completed and completed.status == ExecutionStatus.COMPLETED:
-                outputs = get_execution_outputs(completed)
-                return AgentOutputResponse(
-                    message=(
-                        f"Agent '{library_agent.name}' completed successfully. "
-                        f"View at {library_agent_link}."
-                    ),
-                    session_id=session_id,
-                    agent_name=library_agent.name,
-                    agent_id=library_agent.graph_id,
-                    library_agent_id=library_agent.id,
-                    library_agent_link=library_agent_link,
-                    execution=ExecutionOutputInfo(
-                        execution_id=execution.id,
-                        status=completed.status.value,
-                        started_at=completed.started_at,
-                        ended_at=completed.ended_at,
-                        outputs=outputs or {},
-                    ),
-                )
-            elif completed and completed.status == ExecutionStatus.FAILED:
-                error_detail = completed.stats.error if completed.stats else None
-                return ErrorResponse(
-                    message=(
-                        f"Agent '{library_agent.name}' execution failed. "
-                        f"View details at {library_agent_link}."
-                    ),
-                    session_id=session_id,
-                    error=error_detail,
-                )
-            elif completed and completed.status == ExecutionStatus.TERMINATED:
-                error_detail = completed.stats.error if completed.stats else None
-                return ErrorResponse(
-                    message=(
-                        f"Agent '{library_agent.name}' execution was terminated. "
-                        f"View details at {library_agent_link}."
-                    ),
-                    session_id=session_id,
-                    error=error_detail,
-                )
-            elif completed and completed.status == ExecutionStatus.REVIEW:
-                return ExecutionStartedResponse(
-                    message=(
-                        f"Agent '{library_agent.name}' is awaiting human review. "
-                        f"Check at {library_agent_link}."
-                    ),
-                    session_id=session_id,
-                    execution_id=execution.id,
-                    graph_id=library_agent.graph_id,
-                    graph_name=library_agent.name,
-                    library_agent_id=library_agent.id,
-                    library_agent_link=library_agent_link,
-                    status=ExecutionStatus.REVIEW.value,
-                )
-            else:
-                status = completed.status.value if completed else "unknown"
-                return ExecutionStartedResponse(
-                    message=(
-                        f"Agent '{library_agent.name}' is still {status} after "
-                        f"{wait_for_result}s. Check results later at "
-                        f"{library_agent_link}. "
-                        f"Use view_agent_output with wait_if_running to check again."
-                    ),
-                    session_id=session_id,
-                    execution_id=execution.id,
-                    graph_id=library_agent.graph_id,
-                    graph_name=library_agent.name,
-                    library_agent_id=library_agent.id,
-                    library_agent_link=library_agent_link,
-                    status=status,
-                )
-
        return ExecutionStartedResponse(
            message=(
                f"Agent '{library_agent.name}' execution started successfully. "
@@ -616,7 +522,7 @@ class RunAgentTool(BaseTool):
        library_agent = await get_or_create_library_agent(graph, user_id)

        # Get user timezone
-        user = await user_db().get_user_by_id(user_id)
+        user = await get_user_by_id(user_id)
        user_timezone = get_user_timezone_or_utc(user.timezone if user else timezone)

        # Create schedule
--- a/autogpt_platform/backend/backend/api/features/chat/tools/run_agent_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/run_agent_test.py
--- a/autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/run_block.py
@@ -7,17 +7,20 @@ from typing import Any

 from pydantic_core import PydanticUndefined

-from backend.blocks import BlockType, get_block
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.find_block import (
+    COPILOT_EXCLUDED_BLOCK_IDS,
+    COPILOT_EXCLUDED_BLOCK_TYPES,
+)
+from backend.blocks import get_block
 from backend.blocks._base import AnyBlockSchema
-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import workspace_db
 from backend.data.execution import ExecutionContext
 from backend.data.model import CredentialsFieldInfo, CredentialsMetaInput
+from backend.data.workspace import get_or_create_workspace
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.util.exceptions import BlockError

 from .base import BaseTool
-from .find_block import COPILOT_EXCLUDED_BLOCK_IDS, COPILOT_EXCLUDED_BLOCK_TYPES
 from .helpers import get_inputs_from_schema
 from .models import (
    BlockDetails,
@@ -83,7 +86,7 @@ class RunBlockTool(BaseTool):
                    ),
                },
            },
-            "required": ["block_id", "block_name", "input_data"],
+            "required": ["block_id", "input_data"],
        }

    @property
@@ -149,28 +152,20 @@ class RunBlockTool(BaseTool):
            block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
            or block.id in COPILOT_EXCLUDED_BLOCK_IDS
        ):
-            # Provide actionable guidance for blocks with dedicated tools
-            if block.block_type == BlockType.MCP_TOOL:
-                hint = (
-                    " Use the `run_mcp_tool` tool instead — it handles "
-                    "MCP server discovery, authentication, and execution."
-                )
-            elif block.block_type == BlockType.AGENT:
-                hint = " Use the `run_agent` tool instead."
-            else:
-                hint = " This block is designed for use within graphs only."
            return ErrorResponse(
-                message=f"Block '{block.name}' cannot be run directly.{hint}",
+                message=(
+                    f"Block '{block.name}' cannot be run directly in CoPilot. "
+                    "This block is designed for use within graphs only."
+                ),
                session_id=session_id,
            )

        logger.info(f"Executing block {block.name} ({block_id}) for user {user_id}")

        creds_manager = IntegrationCredentialsManager()
-        (
-            matched_credentials,
-            missing_credentials,
-        ) = await self._resolve_block_credentials(user_id, block, input_data)
+        matched_credentials, missing_credentials = (
+            await self._resolve_block_credentials(user_id, block, input_data)
+        )

        # Get block schemas for details/validation
        try:
@@ -281,7 +276,7 @@ class RunBlockTool(BaseTool):

        try:
            # Get or create user's workspace for CoPilot file operations
-            workspace = await workspace_db().get_or_create_workspace(user_id)
+            workspace = await get_or_create_workspace(user_id)

            # Generate synthetic IDs for CoPilot context
            # Each chat session is treated as its own agent with one continuous run
--- a/autogpt_platform/backend/backend/api/features/chat/tools/run_block_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/run_block_test.py
@@ -4,16 +4,16 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from backend.blocks._base import BlockType
-
-from ._test_data import make_session
-from .models import (
+from backend.api.features.chat.tools.models import (
    BlockDetailsResponse,
    BlockOutputResponse,
    ErrorResponse,
    InputValidationErrorResponse,
 )
-from .run_block import RunBlockTool
+from backend.api.features.chat.tools.run_block import RunBlockTool
+from backend.blocks._base import BlockType
+
+from ._test_data import make_session

 _TEST_USER_ID = "test-user-run-block"

@@ -77,7 +77,7 @@ class TestRunBlockFiltering:
        input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=input_block,
        ):
            tool = RunBlockTool()
@@ -89,7 +89,7 @@ class TestRunBlockFiltering:
            )

        assert isinstance(response, ErrorResponse)
-        assert "cannot be run directly" in response.message
+        assert "cannot be run directly in CoPilot" in response.message
        assert "designed for use within graphs only" in response.message

    @pytest.mark.asyncio(loop_scope="session")
@@ -103,7 +103,7 @@ class TestRunBlockFiltering:
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=smart_block,
        ):
            tool = RunBlockTool()
@@ -115,7 +115,7 @@ class TestRunBlockFiltering:
            )

        assert isinstance(response, ErrorResponse)
-        assert "cannot be run directly" in response.message
+        assert "cannot be run directly in CoPilot" in response.message

    @pytest.mark.asyncio(loop_scope="session")
    async def test_non_excluded_block_passes_guard(self):
@@ -127,7 +127,7 @@ class TestRunBlockFiltering:
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=standard_block,
        ):
            tool = RunBlockTool()
@@ -141,7 +141,7 @@ class TestRunBlockFiltering:
        # Should NOT be an ErrorResponse about CoPilot exclusion
        # (may be other errors like missing credentials, but not the exclusion guard)
        if isinstance(response, ErrorResponse):
-            assert "cannot be run directly" not in response.message
+            assert "cannot be run directly in CoPilot" not in response.message


 class TestRunBlockInputValidation:
@@ -183,7 +183,7 @@ class TestRunBlockInputValidation:
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=mock_block,
        ):
            tool = RunBlockTool()
@@ -222,7 +222,7 @@ class TestRunBlockInputValidation:
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=mock_block,
        ):
            tool = RunBlockTool()
@@ -263,7 +263,7 @@ class TestRunBlockInputValidation:
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=mock_block,
        ):
            tool = RunBlockTool()
@@ -302,19 +302,15 @@ class TestRunBlockInputValidation:

        mock_block.execute = mock_execute

-        mock_workspace_db = MagicMock()
-        mock_workspace_db.get_or_create_workspace = AsyncMock(
-            return_value=MagicMock(id="test-workspace-id")
-        )
-
        with (
            patch(
-                "backend.copilot.tools.run_block.get_block",
+                "backend.api.features.chat.tools.run_block.get_block",
                return_value=mock_block,
            ),
            patch(
-                "backend.copilot.tools.run_block.workspace_db",
-                return_value=mock_workspace_db,
+                "backend.api.features.chat.tools.run_block.get_or_create_workspace",
+                new_callable=AsyncMock,
+                return_value=MagicMock(id="test-workspace-id"),
            ),
        ):
            tool = RunBlockTool()
@@ -348,7 +344,7 @@ class TestRunBlockInputValidation:
        )

        with patch(
-            "backend.copilot.tools.run_block.get_block",
+            "backend.api.features.chat.tools.run_block.get_block",
            return_value=mock_block,
        ):
            tool = RunBlockTool()
--- a/autogpt_platform/backend/backend/api/features/chat/tools/sandbox.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/sandbox.py
@@ -13,7 +13,6 @@ import logging
 import os
 import platform
 import shutil
-import signal

 logger = logging.getLogger(__name__)

@@ -246,7 +245,6 @@ async def run_sandboxed(
            stderr=asyncio.subprocess.PIPE,
            cwd=cwd,
            env=safe_env,
-            start_new_session=True,  # Own process group for clean kill
        )

        try:
@@ -257,18 +255,7 @@ async def run_sandboxed(
            stderr = stderr_bytes.decode("utf-8", errors="replace")
            return stdout, stderr, proc.returncode or 0, False
        except asyncio.TimeoutError:
-            # Kill entire process group (bwrap + all children).
-            # proc.kill() alone only kills the bwrap parent, leaving
-            # children running until they finish naturally.
-            try:
-                os.killpg(proc.pid, signal.SIGKILL)
-            except ProcessLookupError:
-                pass  # Already exited
-            except OSError as kill_err:
-                logger.warning(
-                    "Failed to kill process group %d: %s", proc.pid, kill_err
-                )
-            # Always reap the subprocess regardless of killpg outcome.
+            proc.kill()
            await proc.communicate()
            return "", f"Execution timed out after {timeout}s", -1, True

--- a/autogpt_platform/backend/backend/api/features/chat/tools/search_docs.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/search_docs.py
@@ -5,17 +5,16 @@ from typing import Any

 from prisma.enums import ContentType

-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import search
-
-from .base import BaseTool
-from .models import (
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
    DocSearchResult,
    DocSearchResultsResponse,
    ErrorResponse,
    NoResultsResponse,
    ToolResponseBase,
 )
+from backend.api.features.store.hybrid_search import unified_hybrid_search

 logger = logging.getLogger(__name__)

@@ -118,7 +117,7 @@ class SearchDocsTool(BaseTool):

        try:
            # Search using hybrid search for DOCUMENTATION content type only
-            results, total = await search().unified_hybrid_search(
+            results, total = await unified_hybrid_search(
                query=query,
                content_types=[ContentType.DOCUMENTATION],
                page=1,
--- a/autogpt_platform/backend/backend/api/features/chat/tools/test_run_block_details.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/test_run_block_details.py
@@ -4,13 +4,13 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

+from backend.api.features.chat.tools.models import BlockDetailsResponse
+from backend.api.features.chat.tools.run_block import RunBlockTool
 from backend.blocks._base import BlockType
 from backend.data.model import CredentialsMetaInput
 from backend.integrations.providers import ProviderName

 from ._test_data import make_session
-from .models import BlockDetailsResponse
-from .run_block import RunBlockTool

 _TEST_USER_ID = "test-user-run-block-details"

@@ -61,7 +61,7 @@ async def test_run_block_returns_details_when_no_input_provided():
    )

    with patch(
-        "backend.copilot.tools.run_block.get_block",
+        "backend.api.features.chat.tools.run_block.get_block",
        return_value=http_block,
    ):
        # Mock credentials check to return no missing credentials
@@ -120,7 +120,7 @@ async def test_run_block_returns_details_when_only_credentials_provided():
    }

    with patch(
-        "backend.copilot.tools.run_block.get_block",
+        "backend.api.features.chat.tools.run_block.get_block",
        return_value=mock,
    ):
        with patch.object(
--- a/autogpt_platform/backend/backend/api/features/chat/tools/utils.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/utils.py
@@ -3,8 +3,9 @@
 import logging
 from typing import Any

+from backend.api.features.library import db as library_db
 from backend.api.features.library import model as library_model
-from backend.data.db_accessors import library_db, store_db
+from backend.api.features.store import db as store_db
 from backend.data.graph import GraphModel
 from backend.data.model import (
    Credentials,
@@ -38,14 +39,13 @@ async def fetch_graph_from_store_slug(
    Raises:
        DatabaseError: If there's a database error during lookup.
    """
-    sdb = store_db()
    try:
-        store_agent = await sdb.get_store_agent_details(username, agent_name)
+        store_agent = await store_db.get_store_agent_details(username, agent_name)
    except NotFoundError:
        return None, None

    # Get the graph from store listing version
-    graph = await sdb.get_available_graph(
+    graph = await store_db.get_available_graph(
        store_agent.store_listing_version_id, hide_nodes=False
    )
    return graph, store_agent
@@ -119,7 +119,7 @@ def build_missing_credentials_from_graph(
    preserving all supported credential types for each field.
    """
    matched_keys = set(matched_credentials.keys()) if matched_credentials else set()
-    aggregated_fields = graph.aggregate_credentials_inputs()
+    aggregated_fields = graph.regular_credentials_inputs

    return {
        field_key: _serialize_missing_credential(field_key, field_info)
@@ -210,13 +210,13 @@ async def get_or_create_library_agent(
    Returns:
        LibraryAgent instance
    """
-    existing = await library_db().get_library_agent_by_graph_id(
+    existing = await library_db.get_library_agent_by_graph_id(
        graph_id=graph.id, user_id=user_id
    )
    if existing:
        return existing

-    library_agents = await library_db().create_library_agent(
+    library_agents = await library_db.create_library_agent(
        graph=graph,
        user_id=user_id,
        create_library_agents_for_sub_graphs=False,
@@ -339,7 +339,7 @@ async def match_user_credentials_to_graph(
    missing_creds: list[str] = []

    # Get aggregated credentials requirements from the graph
-    aggregated_creds = graph.aggregate_credentials_inputs()
+    aggregated_creds = graph.regular_credentials_inputs
    logger.debug(
        f"Matching credentials for graph {graph.id}: {len(aggregated_creds)} required"
    )
--- a/autogpt_platform/backend/backend/api/features/chat/tools/utils_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/utils_test.py
@@ -0,0 +1,78 @@
+"""Tests for chat tools utility functions."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from backend.data.model import CredentialsFieldInfo
+
+
+def _make_regular_field() -> CredentialsFieldInfo:
+    return CredentialsFieldInfo.model_validate(
+        {
+            "credentials_provider": ["github"],
+            "credentials_types": ["api_key"],
+            "is_auto_credential": False,
+        },
+        by_alias=True,
+    )
+
+
+def test_build_missing_credentials_excludes_auto_creds():
+    """
+    build_missing_credentials_from_graph() should use regular_credentials_inputs
+    and thus exclude auto_credentials from the "missing" set.
+    """
+    from backend.api.features.chat.tools.utils import (
+        build_missing_credentials_from_graph,
+    )
+
+    regular_field = _make_regular_field()
+
+    mock_graph = MagicMock()
+    # regular_credentials_inputs should only return the non-auto field
+    mock_graph.regular_credentials_inputs = {
+        "github_api_key": (regular_field, {("node-1", "credentials")}, True),
+    }
+
+    result = build_missing_credentials_from_graph(mock_graph, matched_credentials=None)
+
+    # Should include the regular credential
+    assert "github_api_key" in result
+    # Should NOT include the auto_credential (not in regular_credentials_inputs)
+    assert "google_oauth2" not in result
+
+
+@pytest.mark.asyncio
+async def test_match_user_credentials_excludes_auto_creds():
+    """
+    match_user_credentials_to_graph() should use regular_credentials_inputs
+    and thus exclude auto_credentials from matching.
+    """
+    from backend.api.features.chat.tools.utils import match_user_credentials_to_graph
+
+    regular_field = _make_regular_field()
+
+    mock_graph = MagicMock()
+    mock_graph.id = "test-graph"
+    # regular_credentials_inputs returns only non-auto fields
+    mock_graph.regular_credentials_inputs = {
+        "github_api_key": (regular_field, {("node-1", "credentials")}, True),
+    }
+
+    # Mock the credentials manager to return no credentials
+    with patch(
+        "backend.api.features.chat.tools.utils.IntegrationCredentialsManager"
+    ) as MockCredsMgr:
+        mock_store = AsyncMock()
+        mock_store.get_all_creds.return_value = []
+        MockCredsMgr.return_value.store = mock_store
+
+        matched, missing = await match_user_credentials_to_graph(
+            user_id="test-user", graph=mock_graph
+        )
+
+    # No credentials available, so github should be missing
+    assert len(matched) == 0
+    assert len(missing) == 1
+    assert "github_api_key" in missing[0]
--- a/autogpt_platform/backend/backend/api/features/chat/tools/web_fetch.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/web_fetch.py
@@ -6,12 +6,15 @@ from typing import Any
 import aiohttp
 import html2text

-from backend.copilot.model import ChatSession
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
+    ErrorResponse,
+    ToolResponseBase,
+    WebFetchResponse,
+)
 from backend.util.request import Requests

-from .base import BaseTool
-from .models import ErrorResponse, ToolResponseBase, WebFetchResponse
-
 logger = logging.getLogger(__name__)

 # Limits
@@ -30,10 +33,6 @@ _TEXT_CONTENT_TYPES = {
    "application/xhtml+xml",
    "application/rss+xml",
    "application/atom+xml",
-    # RFC 7807 — JSON problem details; used by many REST APIs for error responses
-    "application/problem+json",
-    "application/problem+xml",
-    "application/ld+json",
 }


--- a/autogpt_platform/backend/backend/api/features/chat/tools/workspace_files.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/workspace_files.py
@@ -0,0 +1,626 @@
+"""CoPilot tools for workspace file operations."""
+
+import base64
+import logging
+from typing import Any, Optional
+
+from pydantic import BaseModel
+
+from backend.api.features.chat.model import ChatSession
+from backend.data.workspace import get_or_create_workspace
+from backend.util.settings import Config
+from backend.util.virus_scanner import scan_content_safe
+from backend.util.workspace import WorkspaceManager
+
+from .base import BaseTool
+from .models import ErrorResponse, ResponseType, ToolResponseBase
+
+logger = logging.getLogger(__name__)
+
+
+class WorkspaceFileInfoData(BaseModel):
+    """Data model for workspace file information (not a response itself)."""
+
+    file_id: str
+    name: str
+    path: str
+    mime_type: str
+    size_bytes: int
+
+
+class WorkspaceFileListResponse(ToolResponseBase):
+    """Response containing list of workspace files."""
+
+    type: ResponseType = ResponseType.WORKSPACE_FILE_LIST
+    files: list[WorkspaceFileInfoData]
+    total_count: int
+
+
+class WorkspaceFileContentResponse(ToolResponseBase):
+    """Response containing workspace file content (legacy, for small text files)."""
+
+    type: ResponseType = ResponseType.WORKSPACE_FILE_CONTENT
+    file_id: str
+    name: str
+    path: str
+    mime_type: str
+    content_base64: str
+
+
+class WorkspaceFileMetadataResponse(ToolResponseBase):
+    """Response containing workspace file metadata and download URL (prevents context bloat)."""
+
+    type: ResponseType = ResponseType.WORKSPACE_FILE_METADATA
+    file_id: str
+    name: str
+    path: str
+    mime_type: str
+    size_bytes: int
+    download_url: str
+    preview: str | None = None  # First 500 chars for text files
+
+
+class WorkspaceWriteResponse(ToolResponseBase):
+    """Response after writing a file to workspace."""
+
+    type: ResponseType = ResponseType.WORKSPACE_FILE_WRITTEN
+    file_id: str
+    name: str
+    path: str
+    size_bytes: int
+
+
+class WorkspaceDeleteResponse(ToolResponseBase):
+    """Response after deleting a file from workspace."""
+
+    type: ResponseType = ResponseType.WORKSPACE_FILE_DELETED
+    file_id: str
+    success: bool
+
+
+class ListWorkspaceFilesTool(BaseTool):
+    """Tool for listing files in user's workspace."""
+
+    @property
+    def name(self) -> str:
+        return "list_workspace_files"
+
+    @property
+    def description(self) -> str:
+        return (
+            "List files in the user's persistent workspace (cloud storage). "
+            "These files survive across sessions. "
+            "For ephemeral session files, use the SDK Read/Glob tools instead. "
+            "Returns file names, paths, sizes, and metadata. "
+            "Optionally filter by path prefix."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "path_prefix": {
+                    "type": "string",
+                    "description": (
+                        "Optional path prefix to filter files "
+                        "(e.g., '/documents/' to list only files in documents folder). "
+                        "By default, only files from the current session are listed."
+                    ),
+                },
+                "limit": {
+                    "type": "integer",
+                    "description": "Maximum number of files to return (default 50, max 100)",
+                    "minimum": 1,
+                    "maximum": 100,
+                },
+                "include_all_sessions": {
+                    "type": "boolean",
+                    "description": (
+                        "If true, list files from all sessions. "
+                        "Default is false (only current session's files)."
+                    ),
+                },
+            },
+            "required": [],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        session_id = session.session_id
+
+        if not user_id:
+            return ErrorResponse(
+                message="Authentication required",
+                session_id=session_id,
+            )
+
+        path_prefix: Optional[str] = kwargs.get("path_prefix")
+        limit = min(kwargs.get("limit", 50), 100)
+        include_all_sessions: bool = kwargs.get("include_all_sessions", False)
+
+        try:
+            workspace = await get_or_create_workspace(user_id)
+            # Pass session_id for session-scoped file access
+            manager = WorkspaceManager(user_id, workspace.id, session_id)
+
+            files = await manager.list_files(
+                path=path_prefix,
+                limit=limit,
+                include_all_sessions=include_all_sessions,
+            )
+            total = await manager.get_file_count(
+                path=path_prefix,
+                include_all_sessions=include_all_sessions,
+            )
+
+            file_infos = [
+                WorkspaceFileInfoData(
+                    file_id=f.id,
+                    name=f.name,
+                    path=f.path,
+                    mime_type=f.mimeType,
+                    size_bytes=f.sizeBytes,
+                )
+                for f in files
+            ]
+
+            scope_msg = "all sessions" if include_all_sessions else "current session"
+            return WorkspaceFileListResponse(
+                files=file_infos,
+                total_count=total,
+                message=f"Found {len(files)} files in workspace ({scope_msg})",
+                session_id=session_id,
+            )
+
+        except Exception as e:
+            logger.error(f"Error listing workspace files: {e}", exc_info=True)
+            return ErrorResponse(
+                message=f"Failed to list workspace files: {str(e)}",
+                error=str(e),
+                session_id=session_id,
+            )
+
+
+class ReadWorkspaceFileTool(BaseTool):
+    """Tool for reading file content from workspace."""
+
+    # Size threshold for returning full content vs metadata+URL
+    # Files larger than this return metadata with download URL to prevent context bloat
+    MAX_INLINE_SIZE_BYTES = 32 * 1024  # 32KB
+    # Preview size for text files
+    PREVIEW_SIZE = 500
+
+    @property
+    def name(self) -> str:
+        return "read_workspace_file"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Read a file from the user's persistent workspace (cloud storage). "
+            "These files survive across sessions. "
+            "For ephemeral session files, use the SDK Read tool instead. "
+            "Specify either file_id or path to identify the file. "
+            "For small text files, returns content directly. "
+            "For large or binary files, returns metadata and a download URL. "
+            "Paths are scoped to the current session by default. "
+            "Use /sessions/<session_id>/... for cross-session access."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "file_id": {
+                    "type": "string",
+                    "description": "The file's unique ID (from list_workspace_files)",
+                },
+                "path": {
+                    "type": "string",
+                    "description": (
+                        "The virtual file path (e.g., '/documents/report.pdf'). "
+                        "Scoped to current session by default."
+                    ),
+                },
+                "force_download_url": {
+                    "type": "boolean",
+                    "description": (
+                        "If true, always return metadata+URL instead of inline content. "
+                        "Default is false (auto-selects based on file size/type)."
+                    ),
+                },
+            },
+            "required": [],  # At least one must be provided
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    def _is_text_mime_type(self, mime_type: str) -> bool:
+        """Check if the MIME type is a text-based type."""
+        text_types = [
+            "text/",
+            "application/json",
+            "application/xml",
+            "application/javascript",
+            "application/x-python",
+            "application/x-sh",
+        ]
+        return any(mime_type.startswith(t) for t in text_types)
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        session_id = session.session_id
+
+        if not user_id:
+            return ErrorResponse(
+                message="Authentication required",
+                session_id=session_id,
+            )
+
+        file_id: Optional[str] = kwargs.get("file_id")
+        path: Optional[str] = kwargs.get("path")
+        force_download_url: bool = kwargs.get("force_download_url", False)
+
+        if not file_id and not path:
+            return ErrorResponse(
+                message="Please provide either file_id or path",
+                session_id=session_id,
+            )
+
+        try:
+            workspace = await get_or_create_workspace(user_id)
+            # Pass session_id for session-scoped file access
+            manager = WorkspaceManager(user_id, workspace.id, session_id)
+
+            # Get file info
+            if file_id:
+                file_info = await manager.get_file_info(file_id)
+                if file_info is None:
+                    return ErrorResponse(
+                        message=f"File not found: {file_id}",
+                        session_id=session_id,
+                    )
+                target_file_id = file_id
+            else:
+                # path is guaranteed to be non-None here due to the check above
+                assert path is not None
+                file_info = await manager.get_file_info_by_path(path)
+                if file_info is None:
+                    return ErrorResponse(
+                        message=f"File not found at path: {path}",
+                        session_id=session_id,
+                    )
+                target_file_id = file_info.id
+
+            # Decide whether to return inline content or metadata+URL
+            is_small_file = file_info.sizeBytes <= self.MAX_INLINE_SIZE_BYTES
+            is_text_file = self._is_text_mime_type(file_info.mimeType)
+
+            # Return inline content for small text files (unless force_download_url)
+            if is_small_file and is_text_file and not force_download_url:
+                content = await manager.read_file_by_id(target_file_id)
+                content_b64 = base64.b64encode(content).decode("utf-8")
+
+                return WorkspaceFileContentResponse(
+                    file_id=file_info.id,
+                    name=file_info.name,
+                    path=file_info.path,
+                    mime_type=file_info.mimeType,
+                    content_base64=content_b64,
+                    message=f"Successfully read file: {file_info.name}",
+                    session_id=session_id,
+                )
+
+            # Return metadata + workspace:// reference for large or binary files
+            # This prevents context bloat (100KB file = ~133KB as base64)
+            # Use workspace:// format so frontend urlTransform can add proxy prefix
+            download_url = f"workspace://{target_file_id}"
+
+            # Generate preview for text files
+            preview: str | None = None
+            if is_text_file:
+                try:
+                    content = await manager.read_file_by_id(target_file_id)
+                    preview_text = content[: self.PREVIEW_SIZE].decode(
+                        "utf-8", errors="replace"
+                    )
+                    if len(content) > self.PREVIEW_SIZE:
+                        preview_text += "..."
+                    preview = preview_text
+                except Exception:
+                    pass  # Preview is optional
+
+            return WorkspaceFileMetadataResponse(
+                file_id=file_info.id,
+                name=file_info.name,
+                path=file_info.path,
+                mime_type=file_info.mimeType,
+                size_bytes=file_info.sizeBytes,
+                download_url=download_url,
+                preview=preview,
+                message=f"File: {file_info.name} ({file_info.sizeBytes} bytes). Use download_url to retrieve content.",
+                session_id=session_id,
+            )
+
+        except FileNotFoundError as e:
+            return ErrorResponse(
+                message=str(e),
+                session_id=session_id,
+            )
+        except Exception as e:
+            logger.error(f"Error reading workspace file: {e}", exc_info=True)
+            return ErrorResponse(
+                message=f"Failed to read workspace file: {str(e)}",
+                error=str(e),
+                session_id=session_id,
+            )
+
+
+class WriteWorkspaceFileTool(BaseTool):
+    """Tool for writing files to workspace."""
+
+    @property
+    def name(self) -> str:
+        return "write_workspace_file"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Write or create a file in the user's persistent workspace (cloud storage). "
+            "These files survive across sessions. "
+            "For ephemeral session files, use the SDK Write tool instead. "
+            "Provide the content as a base64-encoded string. "
+            f"Maximum file size is {Config().max_file_size_mb}MB. "
+            "Files are saved to the current session's folder by default. "
+            "Use /sessions/<session_id>/... for cross-session access."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "filename": {
+                    "type": "string",
+                    "description": "Name for the file (e.g., 'report.pdf')",
+                },
+                "content_base64": {
+                    "type": "string",
+                    "description": "Base64-encoded file content",
+                },
+                "path": {
+                    "type": "string",
+                    "description": (
+                        "Optional virtual path where to save the file "
+                        "(e.g., '/documents/report.pdf'). "
+                        "Defaults to '/{filename}'. Scoped to current session."
+                    ),
+                },
+                "mime_type": {
+                    "type": "string",
+                    "description": (
+                        "Optional MIME type of the file. "
+                        "Auto-detected from filename if not provided."
+                    ),
+                },
+                "overwrite": {
+                    "type": "boolean",
+                    "description": "Whether to overwrite if file exists at path (default: false)",
+                },
+            },
+            "required": ["filename", "content_base64"],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        session_id = session.session_id
+
+        if not user_id:
+            return ErrorResponse(
+                message="Authentication required",
+                session_id=session_id,
+            )
+
+        filename: str = kwargs.get("filename", "")
+        content_b64: str = kwargs.get("content_base64", "")
+        path: Optional[str] = kwargs.get("path")
+        mime_type: Optional[str] = kwargs.get("mime_type")
+        overwrite: bool = kwargs.get("overwrite", False)
+
+        if not filename:
+            return ErrorResponse(
+                message="Please provide a filename",
+                session_id=session_id,
+            )
+
+        if not content_b64:
+            return ErrorResponse(
+                message="Please provide content_base64",
+                session_id=session_id,
+            )
+
+        # Decode content
+        try:
+            content = base64.b64decode(content_b64)
+        except Exception:
+            return ErrorResponse(
+                message="Invalid base64-encoded content",
+                session_id=session_id,
+            )
+
+        # Check size
+        max_file_size = Config().max_file_size_mb * 1024 * 1024
+        if len(content) > max_file_size:
+            return ErrorResponse(
+                message=f"File too large. Maximum size is {Config().max_file_size_mb}MB",
+                session_id=session_id,
+            )
+
+        try:
+            # Virus scan
+            await scan_content_safe(content, filename=filename)
+
+            workspace = await get_or_create_workspace(user_id)
+            # Pass session_id for session-scoped file access
+            manager = WorkspaceManager(user_id, workspace.id, session_id)
+
+            file_record = await manager.write_file(
+                content=content,
+                filename=filename,
+                path=path,
+                mime_type=mime_type,
+                overwrite=overwrite,
+            )
+
+            return WorkspaceWriteResponse(
+                file_id=file_record.id,
+                name=file_record.name,
+                path=file_record.path,
+                size_bytes=file_record.sizeBytes,
+                message=f"Successfully wrote file: {file_record.name}",
+                session_id=session_id,
+            )
+
+        except ValueError as e:
+            return ErrorResponse(
+                message=str(e),
+                session_id=session_id,
+            )
+        except Exception as e:
+            logger.error(f"Error writing workspace file: {e}", exc_info=True)
+            return ErrorResponse(
+                message=f"Failed to write workspace file: {str(e)}",
+                error=str(e),
+                session_id=session_id,
+            )
+
+
+class DeleteWorkspaceFileTool(BaseTool):
+    """Tool for deleting files from workspace."""
+
+    @property
+    def name(self) -> str:
+        return "delete_workspace_file"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Delete a file from the user's persistent workspace (cloud storage). "
+            "Specify either file_id or path to identify the file. "
+            "Paths are scoped to the current session by default. "
+            "Use /sessions/<session_id>/... for cross-session access."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "file_id": {
+                    "type": "string",
+                    "description": "The file's unique ID (from list_workspace_files)",
+                },
+                "path": {
+                    "type": "string",
+                    "description": (
+                        "The virtual file path (e.g., '/documents/report.pdf'). "
+                        "Scoped to current session by default."
+                    ),
+                },
+            },
+            "required": [],  # At least one must be provided
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return True
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs,
+    ) -> ToolResponseBase:
+        session_id = session.session_id
+
+        if not user_id:
+            return ErrorResponse(
+                message="Authentication required",
+                session_id=session_id,
+            )
+
+        file_id: Optional[str] = kwargs.get("file_id")
+        path: Optional[str] = kwargs.get("path")
+
+        if not file_id and not path:
+            return ErrorResponse(
+                message="Please provide either file_id or path",
+                session_id=session_id,
+            )
+
+        try:
+            workspace = await get_or_create_workspace(user_id)
+            # Pass session_id for session-scoped file access
+            manager = WorkspaceManager(user_id, workspace.id, session_id)
+
+            # Determine the file_id to delete
+            target_file_id: str
+            if file_id:
+                target_file_id = file_id
+            else:
+                # path is guaranteed to be non-None here due to the check above
+                assert path is not None
+                file_info = await manager.get_file_info_by_path(path)
+                if file_info is None:
+                    return ErrorResponse(
+                        message=f"File not found at path: {path}",
+                        session_id=session_id,
+                    )
+                target_file_id = file_info.id
+
+            success = await manager.delete_file(target_file_id)
+
+            if not success:
+                return ErrorResponse(
+                    message=f"File not found: {target_file_id}",
+                    session_id=session_id,
+                )
+
+            return WorkspaceDeleteResponse(
+                file_id=target_file_id,
+                success=True,
+                message="File deleted successfully",
+                session_id=session_id,
+            )
+
+        except Exception as e:
+            logger.error(f"Error deleting workspace file: {e}", exc_info=True)
+            return ErrorResponse(
+                message=f"Failed to delete workspace file: {str(e)}",
+                error=str(e),
+                session_id=session_id,
+            )
--- a/autogpt_platform/backend/backend/api/features/chat/tracking.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tracking.py
--- a/autogpt_platform/backend/backend/api/features/executions/review/routes.py
+++ b/autogpt_platform/backend/backend/api/features/executions/review/routes.py
@@ -22,7 +22,6 @@ from backend.data.human_review import (
 )
 from backend.data.model import USER_TIMEZONE_NOT_SET
 from backend.data.user import get_user_by_id
-from backend.data.workspace import get_or_create_workspace
 from backend.executor.utils import add_graph_execution

 from .model import PendingHumanReviewModel, ReviewRequest, ReviewResponse
@@ -322,13 +321,10 @@ async def process_review_action(
                    user.timezone if user.timezone != USER_TIMEZONE_NOT_SET else "UTC"
                )

-                workspace = await get_or_create_workspace(user_id)
-
                execution_context = ExecutionContext(
                    human_in_the_loop_safe_mode=settings.human_in_the_loop_safe_mode,
                    sensitive_action_safe_mode=settings.sensitive_action_safe_mode,
                    user_timezone=user_timezone,
-                    workspace_id=workspace.id,
                )

                await add_graph_execution(
--- a/autogpt_platform/backend/backend/api/features/library/db.py
+++ b/autogpt_platform/backend/backend/api/features/library/db.py
--- a/autogpt_platform/backend/backend/api/features/library/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/library/db_test.py
@@ -4,6 +4,7 @@ import prisma.enums
 import prisma.models
 import pytest

+import backend.api.features.store.exceptions
 from backend.data.db import connect
 from backend.data.includes import library_agent_include

@@ -143,7 +144,6 @@ async def test_add_agent_to_library(mocker):
    )

    mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
-    mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
    mock_library_agent.return_value.find_unique = mocker.AsyncMock(return_value=None)
    mock_library_agent.return_value.create = mocker.AsyncMock(
        return_value=mock_library_agent_data
@@ -178,6 +178,7 @@ async def test_add_agent_to_library(mocker):
                "agentGraphVersion": 1,
            }
        },
+        include={"AgentGraph": True},
    )
    # Check that create was called with the expected data including settings
    create_call_args = mock_library_agent.return_value.create.call_args
@@ -217,7 +218,7 @@ async def test_add_agent_to_library_not_found(mocker):
    )

    # Call function and verify exception
-    with pytest.raises(db.NotFoundError):
+    with pytest.raises(backend.api.features.store.exceptions.AgentNotFoundError):
        await db.add_store_agent_to_library("version123", "test-user")

    # Verify mock called correctly
--- a/autogpt_platform/backend/backend/api/features/library/exceptions.py
+++ b/autogpt_platform/backend/backend/api/features/library/exceptions.py
@@ -1,10 +0,0 @@
-class FolderValidationError(Exception):
-    """Raised when folder operations fail validation."""
-
-    pass
-
-
-class FolderAlreadyExistsError(FolderValidationError):
-    """Raised when a folder with the same name already exists in the location."""
-
-    pass
--- a/autogpt_platform/backend/backend/api/features/library/model.py
+++ b/autogpt_platform/backend/backend/api/features/library/model.py
@@ -26,95 +26,6 @@ class LibraryAgentStatus(str, Enum):
    ERROR = "ERROR"


-# === Folder Models ===
-
-
-class LibraryFolder(pydantic.BaseModel):
-    """Represents a folder for organizing library agents."""
-
-    id: str
-    user_id: str
-    name: str
-    icon: str | None = None
-    color: str | None = None
-    parent_id: str | None = None
-    created_at: datetime.datetime
-    updated_at: datetime.datetime
-    agent_count: int = 0  # Direct agents in folder
-    subfolder_count: int = 0  # Direct child folders
-
-    @staticmethod
-    def from_db(
-        folder: prisma.models.LibraryFolder,
-        agent_count: int = 0,
-        subfolder_count: int = 0,
-    ) -> "LibraryFolder":
-        """Factory method that constructs a LibraryFolder from a Prisma model."""
-        return LibraryFolder(
-            id=folder.id,
-            user_id=folder.userId,
-            name=folder.name,
-            icon=folder.icon,
-            color=folder.color,
-            parent_id=folder.parentId,
-            created_at=folder.createdAt,
-            updated_at=folder.updatedAt,
-            agent_count=agent_count,
-            subfolder_count=subfolder_count,
-        )
-
-
-class LibraryFolderTree(LibraryFolder):
-    """Folder with nested children for tree view."""
-
-    children: list["LibraryFolderTree"] = []
-
-
-class FolderCreateRequest(pydantic.BaseModel):
-    """Request model for creating a folder."""
-
-    name: str = pydantic.Field(..., min_length=1, max_length=100)
-    icon: str | None = None
-    color: str | None = pydantic.Field(
-        None, pattern=r"^#[0-9A-Fa-f]{6}$", description="Hex color code (#RRGGBB)"
-    )
-    parent_id: str | None = None
-
-
-class FolderUpdateRequest(pydantic.BaseModel):
-    """Request model for updating a folder."""
-
-    name: str | None = pydantic.Field(None, min_length=1, max_length=100)
-    icon: str | None = None
-    color: str | None = None
-
-
-class FolderMoveRequest(pydantic.BaseModel):
-    """Request model for moving a folder to a new parent."""
-
-    target_parent_id: str | None = None  # None = move to root
-
-
-class BulkMoveAgentsRequest(pydantic.BaseModel):
-    """Request model for moving multiple agents to a folder."""
-
-    agent_ids: list[str]
-    folder_id: str | None = None  # None = move to root
-
-
-class FolderListResponse(pydantic.BaseModel):
-    """Response schema for a list of folders."""
-
-    folders: list[LibraryFolder]
-    pagination: Pagination
-
-
-class FolderTreeResponse(pydantic.BaseModel):
-    """Response schema for folder tree structure."""
-
-    tree: list[LibraryFolderTree]
-
-
 class MarketplaceListingCreator(pydantic.BaseModel):
    """Creator information for a marketplace listing."""

@@ -209,9 +120,6 @@ class LibraryAgent(pydantic.BaseModel):
    can_access_graph: bool
    is_latest_version: bool
    is_favorite: bool
-    folder_id: str | None = None
-    folder_name: str | None = None  # Denormalized for display
-
    recommended_schedule_cron: str | None = None
    settings: GraphSettings = pydantic.Field(default_factory=GraphSettings)
    marketplace_listing: Optional["MarketplaceListing"] = None
@@ -351,8 +259,6 @@ class LibraryAgent(pydantic.BaseModel):
            can_access_graph=can_access_graph,
            is_latest_version=is_latest_version,
            is_favorite=agent.isFavorite,
-            folder_id=agent.folderId,
-            folder_name=agent.Folder.name if agent.Folder else None,
            recommended_schedule_cron=agent.AgentGraph.recommendedScheduleCron,
            settings=_parse_settings(agent.settings),
            marketplace_listing=marketplace_listing_data,
@@ -564,7 +470,3 @@ class LibraryAgentUpdateRequest(pydantic.BaseModel):
    settings: Optional[GraphSettings] = pydantic.Field(
        default=None, description="User-specific settings for this library agent"
    )
-    folder_id: Optional[str] = pydantic.Field(
-        default=None,
-        description="Folder ID to move agent to (None to move to root)",
-    )
--- a/autogpt_platform/backend/backend/api/features/library/routes/init.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes/init.py
@@ -1,11 +1,9 @@
 import fastapi

 from .agents import router as agents_router
-from .folders import router as folders_router
 from .presets import router as presets_router

 router = fastapi.APIRouter()

 router.include_router(presets_router)
-router.include_router(folders_router)
 router.include_router(agents_router)
--- a/autogpt_platform/backend/backend/api/features/library/routes/agents.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes/agents.py
@@ -41,14 +41,6 @@ async def list_library_agents(
        ge=1,
        description="Number of agents per page (must be >= 1)",
    ),
-    folder_id: Optional[str] = Query(
-        None,
-        description="Filter by folder ID",
-    ),
-    include_root_only: bool = Query(
-        False,
-        description="Only return agents without a folder (root-level agents)",
-    ),
 ) -> library_model.LibraryAgentResponse:
    """
    Get all agents in the user's library (both created and saved).
@@ -59,8 +51,6 @@ async def list_library_agents(
        sort_by=sort_by,
        page=page,
        page_size=page_size,
-        folder_id=folder_id,
-        include_root_only=include_root_only,
    )


@@ -178,7 +168,6 @@ async def update_library_agent(
        is_favorite=payload.is_favorite,
        is_archived=payload.is_archived,
        settings=payload.settings,
-        folder_id=payload.folder_id,
    )


--- a/autogpt_platform/backend/backend/api/features/library/routes/folders.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes/folders.py
@@ -1,287 +0,0 @@
-from typing import Optional
-
-import autogpt_libs.auth as autogpt_auth_lib
-from fastapi import APIRouter, Query, Security, status
-from fastapi.responses import Response
-
-from .. import db as library_db
-from .. import model as library_model
-
-router = APIRouter(
-    prefix="/folders",
-    tags=["library", "folders", "private"],
-    dependencies=[Security(autogpt_auth_lib.requires_user)],
-)
-
-
-@router.get(
-    "",
-    summary="List Library Folders",
-    response_model=library_model.FolderListResponse,
-    responses={
-        200: {"description": "List of folders"},
-        500: {"description": "Server error"},
-    },
-)
-async def list_folders(
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-    parent_id: Optional[str] = Query(
-        None,
-        description="Filter by parent folder ID. If not provided, returns root-level folders.",
-    ),
-    include_relations: bool = Query(
-        True,
-        description="Include agent and subfolder relations (for counts)",
-    ),
-) -> library_model.FolderListResponse:
-    """
-    List folders for the authenticated user.
-
-    Args:
-        user_id: ID of the authenticated user.
-        parent_id: Optional parent folder ID to filter by.
-        include_relations: Whether to include agent and subfolder relations for counts.
-
-    Returns:
-        A FolderListResponse containing folders.
-    """
-    folders = await library_db.list_folders(
-        user_id=user_id,
-        parent_id=parent_id,
-        include_relations=include_relations,
-    )
-    return library_model.FolderListResponse(
-        folders=folders,
-        pagination=library_model.Pagination(
-            total_items=len(folders),
-            total_pages=1,
-            current_page=1,
-            page_size=len(folders),
-        ),
-    )
-
-
-@router.get(
-    "/tree",
-    summary="Get Folder Tree",
-    response_model=library_model.FolderTreeResponse,
-    responses={
-        200: {"description": "Folder tree structure"},
-        500: {"description": "Server error"},
-    },
-)
-async def get_folder_tree(
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.FolderTreeResponse:
-    """
-    Get the full folder tree for the authenticated user.
-
-    Args:
-        user_id: ID of the authenticated user.
-
-    Returns:
-        A FolderTreeResponse containing the nested folder structure.
-    """
-    tree = await library_db.get_folder_tree(user_id=user_id)
-    return library_model.FolderTreeResponse(tree=tree)
-
-
-@router.get(
-    "/{folder_id}",
-    summary="Get Folder",
-    response_model=library_model.LibraryFolder,
-    responses={
-        200: {"description": "Folder details"},
-        404: {"description": "Folder not found"},
-        500: {"description": "Server error"},
-    },
-)
-async def get_folder(
-    folder_id: str,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Get a specific folder.
-
-    Args:
-        folder_id: ID of the folder to retrieve.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The requested LibraryFolder.
-    """
-    return await library_db.get_folder(folder_id=folder_id, user_id=user_id)
-
-
-@router.post(
-    "",
-    summary="Create Folder",
-    status_code=status.HTTP_201_CREATED,
-    response_model=library_model.LibraryFolder,
-    responses={
-        201: {"description": "Folder created successfully"},
-        400: {"description": "Validation error"},
-        404: {"description": "Parent folder not found"},
-        409: {"description": "Folder name conflict"},
-        500: {"description": "Server error"},
-    },
-)
-async def create_folder(
-    payload: library_model.FolderCreateRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Create a new folder.
-
-    Args:
-        payload: The folder creation request.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The created LibraryFolder.
-    """
-    return await library_db.create_folder(
-        user_id=user_id,
-        name=payload.name,
-        parent_id=payload.parent_id,
-        icon=payload.icon,
-        color=payload.color,
-    )
-
-
-@router.patch(
-    "/{folder_id}",
-    summary="Update Folder",
-    response_model=library_model.LibraryFolder,
-    responses={
-        200: {"description": "Folder updated successfully"},
-        400: {"description": "Validation error"},
-        404: {"description": "Folder not found"},
-        409: {"description": "Folder name conflict"},
-        500: {"description": "Server error"},
-    },
-)
-async def update_folder(
-    folder_id: str,
-    payload: library_model.FolderUpdateRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Update a folder's properties.
-
-    Args:
-        folder_id: ID of the folder to update.
-        payload: The folder update request.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The updated LibraryFolder.
-    """
-    return await library_db.update_folder(
-        folder_id=folder_id,
-        user_id=user_id,
-        name=payload.name,
-        icon=payload.icon,
-        color=payload.color,
-    )
-
-
-@router.post(
-    "/{folder_id}/move",
-    summary="Move Folder",
-    response_model=library_model.LibraryFolder,
-    responses={
-        200: {"description": "Folder moved successfully"},
-        400: {"description": "Validation error (circular reference)"},
-        404: {"description": "Folder or target parent not found"},
-        409: {"description": "Folder name conflict in target location"},
-        500: {"description": "Server error"},
-    },
-)
-async def move_folder(
-    folder_id: str,
-    payload: library_model.FolderMoveRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Move a folder to a new parent.
-
-    Args:
-        folder_id: ID of the folder to move.
-        payload: The move request with target parent.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The moved LibraryFolder.
-    """
-    return await library_db.move_folder(
-        folder_id=folder_id,
-        user_id=user_id,
-        target_parent_id=payload.target_parent_id,
-    )
-
-
-@router.delete(
-    "/{folder_id}",
-    summary="Delete Folder",
-    status_code=status.HTTP_204_NO_CONTENT,
-    responses={
-        204: {"description": "Folder deleted successfully"},
-        404: {"description": "Folder not found"},
-        500: {"description": "Server error"},
-    },
-)
-async def delete_folder(
-    folder_id: str,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> Response:
-    """
-    Soft-delete a folder and all its contents.
-
-    Args:
-        folder_id: ID of the folder to delete.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        204 No Content if successful.
-    """
-    await library_db.delete_folder(
-        folder_id=folder_id,
-        user_id=user_id,
-        soft_delete=True,
-    )
-    return Response(status_code=status.HTTP_204_NO_CONTENT)
-
-
-# === Bulk Agent Operations ===
-
-
-@router.post(
-    "/agents/bulk-move",
-    summary="Bulk Move Agents",
-    response_model=list[library_model.LibraryAgent],
-    responses={
-        200: {"description": "Agents moved successfully"},
-        404: {"description": "Folder not found"},
-        500: {"description": "Server error"},
-    },
-)
-async def bulk_move_agents(
-    payload: library_model.BulkMoveAgentsRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> list[library_model.LibraryAgent]:
-    """
-    Move multiple agents to a folder.
-
-    Args:
-        payload: The bulk move request with agent IDs and target folder.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The updated LibraryAgents.
-    """
-    return await library_db.bulk_move_agents_to_folder(
-        agent_ids=payload.agent_ids,
-        folder_id=payload.folder_id,
-        user_id=user_id,
-    )
--- a/autogpt_platform/backend/backend/api/features/library/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes_test.py
@@ -115,8 +115,6 @@ async def test_get_library_agents_success(
        sort_by=library_model.LibraryAgentSort.UPDATED_AT,
        page=1,
        page_size=15,
-        folder_id=None,
-        include_root_only=False,
    )


--- a/autogpt_platform/backend/backend/api/features/mcp/routes.py
+++ b/autogpt_platform/backend/backend/api/features/mcp/routes.py
@@ -7,24 +7,20 @@ frontend can list available tools on an MCP server before placing a block.

 import logging
 from typing import Annotated, Any
+from urllib.parse import urlparse

 import fastapi
 from autogpt_libs.auth import get_user_id
 from fastapi import Security
-from pydantic import BaseModel, Field, SecretStr
+from pydantic import BaseModel, Field

 from backend.api.features.integrations.router import CredentialsMetaResponse
 from backend.blocks.mcp.client import MCPClient, MCPClientError
-from backend.blocks.mcp.helpers import (
-    auto_lookup_mcp_credential,
-    normalize_mcp_url,
-    server_host,
-)
 from backend.blocks.mcp.oauth import MCPOAuthHandler
 from backend.data.model import OAuth2Credentials
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.integrations.providers import ProviderName
-from backend.util.request import HTTPClientError, Requests, validate_url_host
+from backend.util.request import HTTPClientError, Requests
 from backend.util.settings import Settings

 logger = logging.getLogger(__name__)
@@ -78,20 +74,32 @@ async def discover_tools(
    If the user has a stored MCP credential for this server URL, it will be
    used automatically — no need to pass an explicit auth token.
    """
-    # Validate URL to prevent SSRF — blocks loopback and private IP ranges.
-    try:
-        await validate_url_host(request.server_url)
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
-
    auth_token = request.auth_token

    # Auto-use stored MCP credential when no explicit token is provided.
    if not auth_token:
-        best_cred = await auto_lookup_mcp_credential(
-            user_id, normalize_mcp_url(request.server_url)
+        mcp_creds = await creds_manager.store.get_creds_by_provider(
+            user_id, ProviderName.MCP.value
        )
+        # Find the freshest credential for this server URL
+        best_cred: OAuth2Credentials | None = None
+        for cred in mcp_creds:
+            if (
+                isinstance(cred, OAuth2Credentials)
+                and (cred.metadata or {}).get("mcp_server_url") == request.server_url
+            ):
+                if best_cred is None or (
+                    (cred.access_token_expires_at or 0)
+                    > (best_cred.access_token_expires_at or 0)
+                ):
+                    best_cred = cred
        if best_cred:
+            # Refresh the token if expired before using it
+            best_cred = await creds_manager.refresh_if_needed(user_id, best_cred)
+            logger.info(
+                f"Using MCP credential {best_cred.id} for {request.server_url}, "
+                f"expires_at={best_cred.access_token_expires_at}"
+            )
            auth_token = best_cred.access_token.get_secret_value()

    client = MCPClient(request.server_url, auth_token=auth_token)
@@ -126,7 +134,7 @@ async def discover_tools(
        ],
        server_name=(
            init_result.get("serverInfo", {}).get("name")
-            or server_host(request.server_url)
+            or urlparse(request.server_url).hostname
            or "MCP"
        ),
        protocol_version=init_result.get("protocolVersion"),
@@ -165,16 +173,7 @@ async def mcp_oauth_login(
    3. Performs Dynamic Client Registration (RFC 7591) if available
    4. Returns the authorization URL for the frontend to open in a popup
    """
-    # Validate URL to prevent SSRF — blocks loopback and private IP ranges.
-    try:
-        await validate_url_host(request.server_url)
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
-
-    # Normalize the URL so that credentials stored here are matched consistently
-    # by auto_lookup_mcp_credential (which also uses normalized URLs).
-    server_url = normalize_mcp_url(request.server_url)
-    client = MCPClient(server_url)
+    client = MCPClient(request.server_url)

    # Step 1: Discover protected-resource metadata (RFC 9728)
    protected_resource = await client.discover_auth()
@@ -183,16 +182,7 @@ async def mcp_oauth_login(

    if protected_resource and protected_resource.get("authorization_servers"):
        auth_server_url = protected_resource["authorization_servers"][0]
-        resource_url = protected_resource.get("resource", server_url)
-
-        # Validate the auth server URL from metadata to prevent SSRF.
-        try:
-            await validate_url_host(auth_server_url)
-        except ValueError as e:
-            raise fastapi.HTTPException(
-                status_code=400,
-                detail=f"Invalid authorization server URL in metadata: {e}",
-            )
+        resource_url = protected_resource.get("resource", request.server_url)

        # Step 2a: Discover auth-server metadata (RFC 8414)
        metadata = await client.discover_auth_server_metadata(auth_server_url)
@@ -202,7 +192,7 @@ async def mcp_oauth_login(
        # Don't assume a resource_url — omitting it lets the auth server choose
        # the correct audience for the token (RFC 8707 resource is optional).
        resource_url = None
-        metadata = await client.discover_auth_server_metadata(server_url)
+        metadata = await client.discover_auth_server_metadata(request.server_url)

    if (
        not metadata
@@ -232,18 +222,12 @@ async def mcp_oauth_login(
    client_id = ""
    client_secret = ""
    if registration_endpoint:
-        # Validate the registration endpoint to prevent SSRF via metadata.
-        try:
-            await validate_url_host(registration_endpoint)
-        except ValueError:
-            pass  # Skip registration, fall back to default client_id
-        else:
-            reg_result = await _register_mcp_client(
-                registration_endpoint, redirect_uri, server_url
-            )
-            if reg_result:
-                client_id = reg_result.get("client_id", "")
-                client_secret = reg_result.get("client_secret", "")
+        reg_result = await _register_mcp_client(
+            registration_endpoint, redirect_uri, request.server_url
+        )
+        if reg_result:
+            client_id = reg_result.get("client_id", "")
+            client_secret = reg_result.get("client_secret", "")

    if not client_id:
        client_id = "autogpt-platform"
@@ -261,7 +245,7 @@ async def mcp_oauth_login(
            "token_url": token_url,
            "revoke_url": revoke_url,
            "resource_url": resource_url,
-            "server_url": server_url,
+            "server_url": request.server_url,
            "client_id": client_id,
            "client_secret": client_secret,
        },
@@ -358,7 +342,7 @@ async def mcp_oauth_callback(
    credentials.metadata["mcp_token_url"] = meta["token_url"]
    credentials.metadata["mcp_resource_url"] = meta.get("resource_url", "")

-    hostname = server_host(meta["server_url"])
+    hostname = urlparse(meta["server_url"]).hostname or meta["server_url"]
    credentials.title = f"MCP: {hostname}"

    # Remove old MCP credentials for the same server to prevent stale token buildup.
@@ -373,9 +357,7 @@ async def mcp_oauth_callback(
            ):
                await creds_manager.store.delete_creds_by_id(user_id, old.id)
                logger.info(
-                    "Removed old MCP credential %s for %s",
-                    old.id,
-                    server_host(meta["server_url"]),
+                    f"Removed old MCP credential {old.id} for {meta['server_url']}"
                )
    except Exception:
        logger.debug("Could not clean up old MCP credentials", exc_info=True)
@@ -393,93 +375,6 @@ async def mcp_oauth_callback(
    )


-# ======================== Bearer Token ======================== #
-
-
-class MCPStoreTokenRequest(BaseModel):
-    """Request to store a bearer token for an MCP server that doesn't support OAuth."""
-
-    server_url: str = Field(
-        description="MCP server URL the token authenticates against"
-    )
-    token: SecretStr = Field(
-        min_length=1, description="Bearer token / API key for the MCP server"
-    )
-
-
-@router.post(
-    "/token",
-    summary="Store a bearer token for an MCP server",
-)
-async def mcp_store_token(
-    request: MCPStoreTokenRequest,
-    user_id: Annotated[str, Security(get_user_id)],
-) -> CredentialsMetaResponse:
-    """
-    Store a manually provided bearer token as an MCP credential.
-
-    Used by the Copilot MCPSetupCard when the server doesn't support the MCP
-    OAuth discovery flow (returns 400 from /oauth/login).  Subsequent
-    ``run_mcp_tool`` calls will automatically pick up the token via
-    ``_auto_lookup_credential``.
-    """
-    token = request.token.get_secret_value().strip()
-    if not token:
-        raise fastapi.HTTPException(status_code=422, detail="Token must not be blank.")
-
-    # Validate URL to prevent SSRF — blocks loopback and private IP ranges.
-    try:
-        await validate_url_host(request.server_url)
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
-
-    # Normalize URL so trailing-slash variants match existing credentials.
-    server_url = normalize_mcp_url(request.server_url)
-    hostname = server_host(server_url)
-
-    # Collect IDs of old credentials to clean up after successful create.
-    old_cred_ids: list[str] = []
-    try:
-        old_creds = await creds_manager.store.get_creds_by_provider(
-            user_id, ProviderName.MCP.value
-        )
-        old_cred_ids = [
-            old.id
-            for old in old_creds
-            if isinstance(old, OAuth2Credentials)
-            and normalize_mcp_url((old.metadata or {}).get("mcp_server_url", ""))
-            == server_url
-        ]
-    except Exception:
-        logger.debug("Could not query old MCP token credentials", exc_info=True)
-
-    credentials = OAuth2Credentials(
-        provider=ProviderName.MCP.value,
-        title=f"MCP: {hostname}",
-        access_token=SecretStr(token),
-        scopes=[],
-        metadata={"mcp_server_url": server_url},
-    )
-    await creds_manager.create(user_id, credentials)
-
-    # Only delete old credentials after the new one is safely stored.
-    for old_id in old_cred_ids:
-        try:
-            await creds_manager.store.delete_creds_by_id(user_id, old_id)
-        except Exception:
-            logger.debug("Could not clean up old MCP token credential", exc_info=True)
-
-    return CredentialsMetaResponse(
-        id=credentials.id,
-        provider=credentials.provider,
-        type=credentials.type,
-        title=credentials.title,
-        scopes=credentials.scopes,
-        username=credentials.username,
-        host=hostname,
-    )
-
-
 # ======================== Helpers ======================== #


@@ -505,7 +400,5 @@ async def _register_mcp_client(
            return data
        return None
    except Exception as e:
-        logger.warning(
-            "Dynamic client registration failed for %s: %s", server_host(server_url), e
-        )
+        logger.warning(f"Dynamic client registration failed for {server_url}: {e}")
        return None
--- a/autogpt_platform/backend/backend/api/features/mcp/test_routes.py
+++ b/autogpt_platform/backend/backend/api/features/mcp/test_routes.py
@@ -11,11 +11,9 @@ import httpx
 import pytest
 import pytest_asyncio
 from autogpt_libs.auth import get_user_id
-from pydantic import SecretStr

 from backend.api.features.mcp.routes import router
 from backend.blocks.mcp.client import MCPClientError, MCPTool
-from backend.data.model import OAuth2Credentials
 from backend.util.request import HTTPClientError

 app = fastapi.FastAPI()
@@ -30,16 +28,6 @@ async def client():
        yield c


-@pytest.fixture(autouse=True)
-def _bypass_ssrf_validation():
-    """Bypass validate_url_host in all route tests (test URLs don't resolve)."""
-    with patch(
-        "backend.api.features.mcp.routes.validate_url_host",
-        new_callable=AsyncMock,
-    ):
-        yield
-
-
 class TestDiscoverTools:
    @pytest.mark.asyncio(loop_scope="session")
    async def test_discover_tools_success(self, client):
@@ -68,12 +56,9 @@ class TestDiscoverTools:

        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                return_value={
@@ -122,6 +107,10 @@ class TestDiscoverTools:
    @pytest.mark.asyncio(loop_scope="session")
    async def test_discover_tools_auto_uses_stored_credential(self, client):
        """When no explicit token is given, stored MCP credentials are used."""
+        from pydantic import SecretStr
+
+        from backend.data.model import OAuth2Credentials
+
        stored_cred = OAuth2Credentials(
            provider="mcp",
            title="MCP: example.com",
@@ -135,12 +124,10 @@ class TestDiscoverTools:

        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=stored_cred,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[stored_cred])
+            mock_cm.refresh_if_needed = AsyncMock(return_value=stored_cred)
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                return_value={"serverInfo": {}, "protocolVersion": "2025-03-26"}
@@ -162,12 +149,9 @@ class TestDiscoverTools:
    async def test_discover_tools_mcp_error(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                side_effect=MCPClientError("Connection refused")
@@ -185,12 +169,9 @@ class TestDiscoverTools:
    async def test_discover_tools_generic_error(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(side_effect=Exception("Network timeout"))

@@ -206,12 +187,9 @@ class TestDiscoverTools:
    async def test_discover_tools_auth_required(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                side_effect=HTTPClientError("HTTP 401 Error: Unauthorized", 401)
@@ -229,12 +207,9 @@ class TestDiscoverTools:
    async def test_discover_tools_forbidden(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                side_effect=HTTPClientError("HTTP 403 Error: Forbidden", 403)
@@ -356,6 +331,10 @@ class TestOAuthLogin:
 class TestOAuthCallback:
    @pytest.mark.asyncio(loop_scope="session")
    async def test_oauth_callback_success(self, client):
+        from pydantic import SecretStr
+
+        from backend.data.model import OAuth2Credentials
+
        mock_creds = OAuth2Credentials(
            provider="mcp",
            title=None,
@@ -455,118 +434,3 @@ class TestOAuthCallback:

        assert response.status_code == 400
        assert "token exchange failed" in response.json()["detail"].lower()
-
-
-class TestStoreToken:
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_success(self, client):
-        with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
-            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
-            mock_cm.create = AsyncMock()
-
-            response = await client.post(
-                "/token",
-                json={
-                    "server_url": "https://mcp.example.com/mcp",
-                    "token": "my-api-key-123",
-                },
-            )
-
-        assert response.status_code == 200
-        data = response.json()
-        assert data["provider"] == "mcp"
-        assert data["type"] == "oauth2"
-        assert data["host"] == "mcp.example.com"
-        mock_cm.create.assert_called_once()
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_blank_rejected(self, client):
-        """Blank token string (after stripping) should return 422."""
-        response = await client.post(
-            "/token",
-            json={
-                "server_url": "https://mcp.example.com/mcp",
-                "token": "   ",
-            },
-        )
-        # Pydantic min_length=1 catches the whitespace-only token
-        assert response.status_code == 422
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_replaces_old_credential(self, client):
-        old_cred = OAuth2Credentials(
-            provider="mcp",
-            title="MCP: mcp.example.com",
-            access_token=SecretStr("old-token"),
-            scopes=[],
-            metadata={"mcp_server_url": "https://mcp.example.com/mcp"},
-        )
-        with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
-            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[old_cred])
-            mock_cm.create = AsyncMock()
-            mock_cm.store.delete_creds_by_id = AsyncMock()
-
-            response = await client.post(
-                "/token",
-                json={
-                    "server_url": "https://mcp.example.com/mcp",
-                    "token": "new-token",
-                },
-            )
-
-        assert response.status_code == 200
-        mock_cm.store.delete_creds_by_id.assert_called_once_with(
-            "test-user-id", old_cred.id
-        )
-
-
-class TestSSRFValidation:
-    """Verify that validate_url_host is enforced on all endpoints."""
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_discover_tools_ssrf_blocked(self, client):
-        with patch(
-            "backend.api.features.mcp.routes.validate_url_host",
-            new_callable=AsyncMock,
-            side_effect=ValueError("blocked loopback"),
-        ):
-            response = await client.post(
-                "/discover-tools",
-                json={"server_url": "http://localhost/mcp"},
-            )
-
-        assert response.status_code == 400
-        assert "blocked loopback" in response.json()["detail"].lower()
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_oauth_login_ssrf_blocked(self, client):
-        with patch(
-            "backend.api.features.mcp.routes.validate_url_host",
-            new_callable=AsyncMock,
-            side_effect=ValueError("blocked private IP"),
-        ):
-            response = await client.post(
-                "/oauth/login",
-                json={"server_url": "http://10.0.0.1/mcp"},
-            )
-
-        assert response.status_code == 400
-        assert "blocked private ip" in response.json()["detail"].lower()
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_ssrf_blocked(self, client):
-        with patch(
-            "backend.api.features.mcp.routes.validate_url_host",
-            new_callable=AsyncMock,
-            side_effect=ValueError("blocked loopback"),
-        ):
-            response = await client.post(
-                "/token",
-                json={
-                    "server_url": "http://127.0.0.1/mcp",
-                    "token": "some-token",
-                },
-            )
-
-        assert response.status_code == 400
-        assert "blocked loopback" in response.json()["detail"].lower()
--- a/autogpt_platform/backend/backend/api/features/store/cache.py
+++ b/autogpt_platform/backend/backend/api/features/store/cache.py
@@ -1,3 +1,5 @@
+from typing import Literal
+
 from backend.util.cache import cached

 from . import db as store_db
@@ -21,7 +23,7 @@ def clear_all_caches():
 async def _get_cached_store_agents(
    featured: bool,
    creator: str | None,
-    sorted_by: store_db.StoreAgentsSortOptions | None,
+    sorted_by: Literal["rating", "runs", "name", "updated_at"] | None,
    search_query: str | None,
    category: str | None,
    page: int,
@@ -55,7 +57,7 @@ async def _get_cached_agent_details(
 async def _get_cached_store_creators(
    featured: bool,
    search_query: str | None,
-    sorted_by: store_db.StoreCreatorsSortOptions | None,
+    sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None,
    page: int,
    page_size: int,
 ):
@@ -73,4 +75,4 @@ async def _get_cached_store_creators(
@cached(maxsize=100, ttl_seconds=300, shared_cache=True)
 async def _get_cached_creator_details(username: str):
    """Cached helper to get creator details."""
-    return await store_db.get_store_creator(username=username.lower())
+    return await store_db.get_store_creator_details(username=username.lower())
--- a/autogpt_platform/backend/backend/api/features/store/content_handlers.py
+++ b/autogpt_platform/backend/backend/api/features/store/content_handlers.py
@@ -9,26 +9,15 @@ import logging
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
 from pathlib import Path
-from typing import Any, get_args, get_origin
+from typing import Any

 from prisma.enums import ContentType

-from backend.blocks.llm import LlmModel
 from backend.data.db import query_raw_with_schema

 logger = logging.getLogger(__name__)


-def _contains_type(annotation: Any, target: type) -> bool:
-    """Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
-    if annotation is target:
-        return True
-    origin = get_origin(annotation)
-    if origin is None:
-        return False
-    return any(_contains_type(arg, target) for arg in get_args(annotation))
-
-
@dataclass
 class ContentItem:
    """Represents a piece of content to be embedded."""
@@ -199,51 +188,45 @@ class BlockHandler(ContentHandler):
            try:
                block_instance = block_cls()

+                # Skip disabled blocks - they shouldn't be indexed
                if block_instance.disabled:
                    continue

                # Build searchable text from block metadata
                parts = []
-                if block_instance.name:
+                if hasattr(block_instance, "name") and block_instance.name:
                    parts.append(block_instance.name)
-                if block_instance.description:
+                if (
+                    hasattr(block_instance, "description")
+                    and block_instance.description
+                ):
                    parts.append(block_instance.description)
-                if block_instance.categories:
+                if hasattr(block_instance, "categories") and block_instance.categories:
+                    # Convert BlockCategory enum to strings
                    parts.append(
                        " ".join(str(cat.value) for cat in block_instance.categories)
                    )

-                # Add input schema field descriptions
-                block_input_fields = block_instance.input_schema.model_fields
-                parts += [
-                    f"{field_name}: {field_info.description}"
-                    for field_name, field_info in block_input_fields.items()
-                    if field_info.description
-                ]
+                # Add input/output schema info
+                if hasattr(block_instance, "input_schema"):
+                    schema = block_instance.input_schema
+                    if hasattr(schema, "model_json_schema"):
+                        schema_dict = schema.model_json_schema()
+                        if "properties" in schema_dict:
+                            for prop_name, prop_info in schema_dict[
+                                "properties"
+                            ].items():
+                                if "description" in prop_info:
+                                    parts.append(
+                                        f"{prop_name}: {prop_info['description']}"
+                                    )

                searchable_text = " ".join(parts)

+                # Convert categories set of enums to list of strings for JSON serialization
+                categories = getattr(block_instance, "categories", set())
                categories_list = (
-                    [cat.value for cat in block_instance.categories]
-                    if block_instance.categories
-                    else []
-                )
-
-                # Extract provider names from credentials fields
-                credentials_info = (
-                    block_instance.input_schema.get_credentials_fields_info()
-                )
-                is_integration = len(credentials_info) > 0
-                provider_names = [
-                    provider.value.lower()
-                    for info in credentials_info.values()
-                    for provider in info.provider
-                ]
-
-                # Check if block has LlmModel field in input schema
-                has_llm_model_field = any(
-                    _contains_type(field.annotation, LlmModel)
-                    for field in block_instance.input_schema.model_fields.values()
+                    [cat.value for cat in categories] if categories else []
                )

                items.append(
@@ -252,11 +235,8 @@ class BlockHandler(ContentHandler):
                        content_type=ContentType.BLOCK,
                        searchable_text=searchable_text,
                        metadata={
-                            "name": block_instance.name,
+                            "name": getattr(block_instance, "name", ""),
                            "categories": categories_list,
-                            "providers": provider_names,
-                            "has_llm_model_field": has_llm_model_field,
-                            "is_integration": is_integration,
                        },
                        user_id=None,  # Blocks are public
                    )
--- a/autogpt_platform/backend/backend/api/features/store/content_handlers_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/content_handlers_test.py
@@ -82,10 +82,9 @@ async def test_block_handler_get_missing_items(mocker):
    mock_block_instance.description = "Performs calculations"
    mock_block_instance.categories = [MagicMock(value="MATH")]
    mock_block_instance.disabled = False
-    mock_field = MagicMock()
-    mock_field.description = "Math expression to evaluate"
-    mock_block_instance.input_schema.model_fields = {"expression": mock_field}
-    mock_block_instance.input_schema.get_credentials_fields_info.return_value = {}
+    mock_block_instance.input_schema.model_json_schema.return_value = {
+        "properties": {"expression": {"description": "Math expression to evaluate"}}
+    }
    mock_block_class.return_value = mock_block_instance

    mock_blocks = {"block-uuid-1": mock_block_class}
@@ -310,19 +309,19 @@ async def test_content_handlers_registry():


@pytest.mark.asyncio(loop_scope="session")
-async def test_block_handler_handles_empty_attributes():
-    """Test BlockHandler handles blocks with empty/falsy attribute values."""
+async def test_block_handler_handles_missing_attributes():
+    """Test BlockHandler gracefully handles blocks with missing attributes."""
    handler = BlockHandler()

-    # Mock block with empty values (all attributes exist but are falsy)
+    # Mock block with minimal attributes
    mock_block_class = MagicMock()
    mock_block_instance = MagicMock()
    mock_block_instance.name = "Minimal Block"
    mock_block_instance.disabled = False
-    mock_block_instance.description = ""
-    mock_block_instance.categories = set()
-    mock_block_instance.input_schema.model_fields = {}
-    mock_block_instance.input_schema.get_credentials_fields_info.return_value = {}
+    # No description, categories, or schema
+    del mock_block_instance.description
+    del mock_block_instance.categories
+    del mock_block_instance.input_schema
    mock_block_class.return_value = mock_block_instance

    mock_blocks = {"block-minimal": mock_block_class}
@@ -353,8 +352,6 @@ async def test_block_handler_skips_failed_blocks():
    good_instance.description = "Works fine"
    good_instance.categories = []
    good_instance.disabled = False
-    good_instance.input_schema.model_fields = {}
-    good_instance.input_schema.get_credentials_fields_info.return_value = {}
    good_block.return_value = good_instance

    bad_block = MagicMock()
--- a/autogpt_platform/backend/backend/api/features/store/db.py
+++ b/autogpt_platform/backend/backend/api/features/store/db.py
--- a/autogpt_platform/backend/backend/api/features/store/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/db_test.py
@@ -26,7 +26,7 @@ async def test_get_store_agents(mocker):
    mock_agents = [
        prisma.models.StoreAgent(
            listing_id="test-id",
-            listing_version_id="version123",
+            storeListingVersionId="version123",
            slug="test-agent",
            agent_name="Test Agent",
            agent_video=None,
@@ -40,11 +40,11 @@ async def test_get_store_agents(mocker):
            runs=10,
            rating=4.5,
            versions=["1.0"],
-            graph_id="test-graph-id",
-            graph_versions=["1"],
+            agentGraphVersions=["1"],
+            agentGraphId="test-graph-id",
            updated_at=datetime.now(),
            is_available=False,
-            use_for_onboarding=False,
+            useForOnboarding=False,
        )
    ]

@@ -68,10 +68,10 @@ async def test_get_store_agents(mocker):

@pytest.mark.asyncio(loop_scope="session")
 async def test_get_store_agent_details(mocker):
-    # Mock data - StoreAgent view already contains the active version data
+    # Mock data
    mock_agent = prisma.models.StoreAgent(
        listing_id="test-id",
-        listing_version_id="version123",
+        storeListingVersionId="version123",
        slug="test-agent",
        agent_name="Test Agent",
        agent_video="video.mp4",
@@ -85,38 +85,102 @@ async def test_get_store_agent_details(mocker):
        runs=10,
        rating=4.5,
        versions=["1.0"],
-        graph_id="test-graph-id",
-        graph_versions=["1"],
+        agentGraphVersions=["1"],
+        agentGraphId="test-graph-id",
        updated_at=datetime.now(),
-        is_available=True,
-        use_for_onboarding=False,
+        is_available=False,
+        useForOnboarding=False,
    )

-    # Mock StoreAgent prisma call
+    # Mock active version agent (what we want to return for active version)
+    mock_active_agent = prisma.models.StoreAgent(
+        listing_id="test-id",
+        storeListingVersionId="active-version-id",
+        slug="test-agent",
+        agent_name="Test Agent Active",
+        agent_video="active_video.mp4",
+        agent_image=["active_image.jpg"],
+        featured=False,
+        creator_username="creator",
+        creator_avatar="avatar.jpg",
+        sub_heading="Test heading active",
+        description="Test description active",
+        categories=["test"],
+        runs=15,
+        rating=4.8,
+        versions=["1.0", "2.0"],
+        agentGraphVersions=["1", "2"],
+        agentGraphId="test-graph-id-active",
+        updated_at=datetime.now(),
+        is_available=True,
+        useForOnboarding=False,
+    )
+
+    # Create a mock StoreListing result
+    mock_store_listing = mocker.MagicMock()
+    mock_store_listing.activeVersionId = "active-version-id"
+    mock_store_listing.hasApprovedVersion = True
+    mock_store_listing.ActiveVersion = mocker.MagicMock()
+    mock_store_listing.ActiveVersion.recommendedScheduleCron = None
+
+    # Mock StoreAgent prisma call - need to handle multiple calls
    mock_store_agent = mocker.patch("prisma.models.StoreAgent.prisma")
-    mock_store_agent.return_value.find_first = mocker.AsyncMock(return_value=mock_agent)
+
+    # Set up side_effect to return different results for different calls
+    def mock_find_first_side_effect(*args, **kwargs):
+        where_clause = kwargs.get("where", {})
+        if "storeListingVersionId" in where_clause:
+            # Second call for active version
+            return mock_active_agent
+        else:
+            # First call for initial lookup
+            return mock_agent
+
+    mock_store_agent.return_value.find_first = mocker.AsyncMock(
+        side_effect=mock_find_first_side_effect
+    )
+
+    # Mock Profile prisma call
+    mock_profile = mocker.MagicMock()
+    mock_profile.userId = "user-id-123"
+    mock_profile_db = mocker.patch("prisma.models.Profile.prisma")
+    mock_profile_db.return_value.find_first = mocker.AsyncMock(
+        return_value=mock_profile
+    )
+
+    # Mock StoreListing prisma call
+    mock_store_listing_db = mocker.patch("prisma.models.StoreListing.prisma")
+    mock_store_listing_db.return_value.find_first = mocker.AsyncMock(
+        return_value=mock_store_listing
+    )

    # Call function
    result = await db.get_store_agent_details("creator", "test-agent")

-    # Verify results - constructed from the StoreAgent view
+    # Verify results - should use active version data
    assert result.slug == "test-agent"
-    assert result.agent_name == "Test Agent"
-    assert result.active_version_id == "version123"
+    assert result.agent_name == "Test Agent Active"  # From active version
+    assert result.active_version_id == "active-version-id"
    assert result.has_approved_version is True
-    assert result.store_listing_version_id == "version123"
-    assert result.graph_id == "test-graph-id"
-    assert result.runs == 10
-    assert result.rating == 4.5
+    assert (
+        result.store_listing_version_id == "active-version-id"
+    )  # Should be active version ID

-    # Verify single StoreAgent lookup
-    mock_store_agent.return_value.find_first.assert_called_once_with(
+    # Verify mocks called correctly - now expecting 2 calls
+    assert mock_store_agent.return_value.find_first.call_count == 2
+
+    # Check the specific calls
+    calls = mock_store_agent.return_value.find_first.call_args_list
+    assert calls[0] == mocker.call(
        where={"creator_username": "creator", "slug": "test-agent"}
    )
+    assert calls[1] == mocker.call(where={"storeListingVersionId": "active-version-id"})
+
+    mock_store_listing_db.return_value.find_first.assert_called_once()


@pytest.mark.asyncio(loop_scope="session")
-async def test_get_store_creator(mocker):
+async def test_get_store_creator_details(mocker):
    # Mock data
    mock_creator_data = prisma.models.Creator(
        name="Test Creator",
@@ -138,7 +202,7 @@ async def test_get_store_creator(mocker):
    mock_creator.return_value.find_unique.return_value = mock_creator_data

    # Call function
-    result = await db.get_store_creator("creator")
+    result = await db.get_store_creator_details("creator")

    # Verify results
    assert result.username == "creator"
@@ -154,110 +218,61 @@ async def test_get_store_creator(mocker):

@pytest.mark.asyncio(loop_scope="session")
 async def test_create_store_submission(mocker):
-    now = datetime.now()
-
-    # Mock agent graph (with no pending submissions) and user with profile
-    mock_profile = prisma.models.Profile(
-        id="profile-id",
-        userId="user-id",
-        name="Test User",
-        username="testuser",
-        description="Test",
-        isFeatured=False,
-        links=[],
-        createdAt=now,
-        updatedAt=now,
-    )
-    mock_user = prisma.models.User(
-        id="user-id",
-        email="test@example.com",
-        createdAt=now,
-        updatedAt=now,
-        Profile=[mock_profile],
-        emailVerified=True,
-        metadata="{}",  # type: ignore[reportArgumentType]
-        integrations="",
-        maxEmailsPerDay=1,
-        notifyOnAgentRun=True,
-        notifyOnZeroBalance=True,
-        notifyOnLowBalance=True,
-        notifyOnBlockExecutionFailed=True,
-        notifyOnContinuousAgentError=True,
-        notifyOnDailySummary=True,
-        notifyOnWeeklySummary=True,
-        notifyOnMonthlySummary=True,
-        notifyOnAgentApproved=True,
-        notifyOnAgentRejected=True,
-        timezone="Europe/Delft",
-    )
+    # Mock data
    mock_agent = prisma.models.AgentGraph(
        id="agent-id",
        version=1,
        userId="user-id",
-        createdAt=now,
+        createdAt=datetime.now(),
        isActive=True,
-        StoreListingVersions=[],
-        User=mock_user,
    )

-    # Mock the created StoreListingVersion (returned by create)
-    mock_store_listing_obj = prisma.models.StoreListing(
+    mock_listing = prisma.models.StoreListing(
        id="listing-id",
-        createdAt=now,
-        updatedAt=now,
+        createdAt=datetime.now(),
+        updatedAt=datetime.now(),
        isDeleted=False,
        hasApprovedVersion=False,
        slug="test-agent",
        agentGraphId="agent-id",
-        owningUserId="user-id",
-        useForOnboarding=False,
-    )
-    mock_version = prisma.models.StoreListingVersion(
-        id="version-id",
-        agentGraphId="agent-id",
        agentGraphVersion=1,
-        name="Test Agent",
-        description="Test description",
-        createdAt=now,
-        updatedAt=now,
-        subHeading="",
-        imageUrls=[],
-        categories=[],
-        isFeatured=False,
-        isDeleted=False,
-        version=1,
-        storeListingId="listing-id",
-        submissionStatus=prisma.enums.SubmissionStatus.PENDING,
-        isAvailable=True,
-        submittedAt=now,
-        StoreListing=mock_store_listing_obj,
+        owningUserId="user-id",
+        Versions=[
+            prisma.models.StoreListingVersion(
+                id="version-id",
+                agentGraphId="agent-id",
+                agentGraphVersion=1,
+                name="Test Agent",
+                description="Test description",
+                createdAt=datetime.now(),
+                updatedAt=datetime.now(),
+                subHeading="Test heading",
+                imageUrls=["image.jpg"],
+                categories=["test"],
+                isFeatured=False,
+                isDeleted=False,
+                version=1,
+                storeListingId="listing-id",
+                submissionStatus=prisma.enums.SubmissionStatus.PENDING,
+                isAvailable=True,
+            )
+        ],
+        useForOnboarding=False,
    )

    # Mock prisma calls
    mock_agent_graph = mocker.patch("prisma.models.AgentGraph.prisma")
    mock_agent_graph.return_value.find_first = mocker.AsyncMock(return_value=mock_agent)

-    # Mock transaction context manager
-    mock_tx = mocker.MagicMock()
-    mocker.patch(
-        "backend.api.features.store.db.transaction",
-        return_value=mocker.AsyncMock(
-            __aenter__=mocker.AsyncMock(return_value=mock_tx),
-            __aexit__=mocker.AsyncMock(return_value=False),
-        ),
-    )
-
-    mock_sl = mocker.patch("prisma.models.StoreListing.prisma")
-    mock_sl.return_value.find_unique = mocker.AsyncMock(return_value=None)
-
-    mock_slv = mocker.patch("prisma.models.StoreListingVersion.prisma")
-    mock_slv.return_value.create = mocker.AsyncMock(return_value=mock_version)
+    mock_store_listing = mocker.patch("prisma.models.StoreListing.prisma")
+    mock_store_listing.return_value.find_first = mocker.AsyncMock(return_value=None)
+    mock_store_listing.return_value.create = mocker.AsyncMock(return_value=mock_listing)

    # Call function
    result = await db.create_store_submission(
        user_id="user-id",
-        graph_id="agent-id",
-        graph_version=1,
+        agent_id="agent-id",
+        agent_version=1,
        slug="test-agent",
        name="Test Agent",
        description="Test description",
@@ -266,11 +281,11 @@ async def test_create_store_submission(mocker):
    # Verify results
    assert result.name == "Test Agent"
    assert result.description == "Test description"
-    assert result.listing_version_id == "version-id"
+    assert result.store_listing_version_id == "version-id"

    # Verify mocks called correctly
    mock_agent_graph.return_value.find_first.assert_called_once()
-    mock_slv.return_value.create.assert_called_once()
+    mock_store_listing.return_value.create.assert_called_once()


@pytest.mark.asyncio(loop_scope="session")
@@ -303,6 +318,7 @@ async def test_update_profile(mocker):
        description="Test description",
        links=["link1"],
        avatar_url="avatar.jpg",
+        is_featured=False,
    )

    # Call function
@@ -373,7 +389,7 @@ async def test_get_store_agents_with_search_and_filters_parameterized():
        creators=["creator1'; DROP TABLE Users; --", "creator2"],
        category="AI'; DELETE FROM StoreAgent; --",
        featured=True,
-        sorted_by=db.StoreAgentsSortOptions.RATING,
+        sorted_by="rating",
        page=1,
        page_size=20,
    )
--- a/autogpt_platform/backend/backend/api/features/store/exceptions.py
+++ b/autogpt_platform/backend/backend/api/features/store/exceptions.py
@@ -57,6 +57,12 @@ class StoreError(ValueError):
    pass


+class AgentNotFoundError(NotFoundError):
+    """Raised when an agent is not found"""
+
+    pass
+
+
 class CreatorNotFoundError(NotFoundError):
    """Raised when a creator is not found"""

--- a/autogpt_platform/backend/backend/api/features/store/hybrid_search.py
+++ b/autogpt_platform/backend/backend/api/features/store/hybrid_search.py
@@ -568,7 +568,7 @@ async def hybrid_search(
            SELECT uce."contentId" as "storeListingVersionId"
            FROM {{schema_prefix}}"UnifiedContentEmbedding" uce
            INNER JOIN {{schema_prefix}}"StoreAgent" sa
-                ON uce."contentId" = sa.listing_version_id
+                ON uce."contentId" = sa."storeListingVersionId"
            WHERE uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
            AND uce."userId" IS NULL
            AND uce.search @@ plainto_tsquery('english', {query_param})
@@ -582,7 +582,7 @@ async def hybrid_search(
                SELECT uce."contentId", uce.embedding
                FROM {{schema_prefix}}"UnifiedContentEmbedding" uce
                INNER JOIN {{schema_prefix}}"StoreAgent" sa
-                    ON uce."contentId" = sa.listing_version_id
+                    ON uce."contentId" = sa."storeListingVersionId"
                WHERE uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
                AND uce."userId" IS NULL
                AND {where_clause}
@@ -605,7 +605,7 @@ async def hybrid_search(
                sa.featured,
                sa.is_available,
                sa.updated_at,
-                sa.graph_id,
+                sa."agentGraphId",
                -- Searchable text for BM25 reranking
                COALESCE(sa.agent_name, '') || ' ' || COALESCE(sa.sub_heading, '') || ' ' || COALESCE(sa.description, '') as searchable_text,
                -- Semantic score
@@ -627,9 +627,9 @@ async def hybrid_search(
                sa.runs as popularity_raw
            FROM candidates c
            INNER JOIN {{schema_prefix}}"StoreAgent" sa
-                ON c."storeListingVersionId" = sa.listing_version_id
+                ON c."storeListingVersionId" = sa."storeListingVersionId"
            INNER JOIN {{schema_prefix}}"UnifiedContentEmbedding" uce
-                ON sa.listing_version_id = uce."contentId"
+                ON sa."storeListingVersionId" = uce."contentId"
                AND uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
        ),
        max_vals AS (
@@ -665,7 +665,7 @@ async def hybrid_search(
                featured,
                is_available,
                updated_at,
-                graph_id,
+                "agentGraphId",
                searchable_text,
                semantic_score,
                lexical_score,
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
Nick Tindle	c42aab3482	Merge branch 'dev' into ntindle/google-issues-fix Resolved conflicts: - executor/utils.py: kept auto-credentials validation + dev's comment update - graph_test.py: kept both auto-credentials tests AND MCP deduplication tests	2026-02-16 00:09:35 -06:00
Nick Tindle	e7705427bb	fix: add is_auto_credential and input_field_name to auto credentials schema Post-refactor fix: these fields were moved from data/block.py to blocks/_base.py in #12068	2026-02-12 16:13:23 -06:00
Nicholas Tindle	201ec5aa3a	Merge branch 'dev' into ntindle/google-issues-fix	2026-02-12 16:12:20 -06:00
Nicholas Tindle	2b8134a711	Merge branch 'dev' into ntindle/google-issues-fix	2026-02-09 13:46:37 -06:00
Nicholas Tindle	90b3b5ba16	fix(backend): Fix misplaced section header in graph_test.py Move the _reassign_ids section comment to above the actual _reassign_ids tests, and label the combine() tests correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 16:11:47 -06:00
Nicholas Tindle	f4f81bc4fc	fix(backend): Remove _credentials_id key on fork instead of setting to None Setting _credentials_id to None on fork was ambiguous — both "forked, needs re-auth" and "chained data from upstream" were represented as None. This caused _acquire_auto_credentials to silently skip credential acquisition for forked agents, leading to confusing TypeErrors at runtime. Now the key is deleted entirely, making the three states unambiguous: - Present with value: user-selected credentials - Present as None: chained data from upstream block - Absent: forked/needs re-authentication Also adds pre-run validation for the missing key case and makes error messages provider-agnostic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 17:34:16 -06:00
Nicholas Tindle	c5abc01f25	fix(backend): Add error handling for auto-credentials store lookup Wrap get_creds_by_id call in try/except in the auto-credentials validation path to match the error handling pattern used for regular credentials. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 16:53:29 -06:00
Nicholas Tindle	8b7053c1de	merge: Resolve conflicts with dev (PR #11986 graph model refactor) Adapt auto-credentials filtering to dev's refactored graph model: - aggregate_credentials_inputs() now returns 3-tuples (field_info, node_pairs, is_required) - credentials_input_schema moved to GraphModel, builds JSON schema directly - Update regular/auto_credentials_inputs properties for 3-tuple format - Update test mocks and assertions for new tuple format and class hierarchy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 16:39:57 -06:00
Nicholas Tindle	e00c1202ad	fix(platform): Fix Google Drive auto-credentials handling across the platform - Tag auto-credentials with `is_auto_credential` and `input_field_name` on `CredentialsFieldInfo` to distinguish them from regular user-provided credentials - Add `regular_credentials_inputs` and `auto_credentials_inputs` properties to `Graph` so UI schemas, CoPilot, and library presets only surface regular credentials - Extract `_acquire_auto_credentials()` helper in executor to resolve embedded `_credentials_id` at execution time with proper lock management - Validate auto-credentials ownership in `_validate_node_input_credentials()` to catch stale/missing credentials before execution - Clear `_credentials_id` in `_reassign_ids()` on graph fork so cloned agents require re-authentication - Propagate `is_auto_credential` through `combine()` and `discriminate()` on `CredentialsFieldInfo` - Add `referrerPolicy: "no-referrer-when-downgrade"` to Google API script loading to fix Firefox API key validation - Comprehensive test coverage for all new behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 16:08:53 -06:00