fix(copilot): resolve dangling tool spinners when stream finishes

When the backend doesn't emit StreamToolOutputAvailable for every tool call before StreamFinish (e.g. for SDK built-in tools such as WebSearch), the frontend tool spinners spin forever. Add a useEffect that watches for the streaming→ready transition and marks any tool parts still in the input-available or input-streaming state as output-available. Extract a shared resolveInProgressTools helper used by both the stop handler (cancelled) and the stream-end path (completed).
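
A minimal sketch of the idea described above, not the code from this change: the `ToolPart` shape, the `Resolution` type, the `output` payload, and the `isStreamEndTransition` helper are assumptions for illustration. Only the state names, the streaming→ready transition, and the `resolveInProgressTools` name come from the description.

```typescript
// Hypothetical part shape; the real frontend message types are not shown in this change.
type ToolState = "input-streaming" | "input-available" | "output-available";

interface ToolPart {
  type: string; // e.g. "tool-WebSearch" (naming is an assumption)
  state: ToolState;
  output?: unknown;
}

// Why the tool is being resolved: the stream ended normally or the user pressed stop.
type Resolution = "completed" | "cancelled";

// Return a copy of `parts` in which every tool part that is still in progress
// is marked output-available. Parts that already have output are left untouched.
function resolveInProgressTools(
  parts: ToolPart[],
  resolution: Resolution,
): ToolPart[] {
  return parts.map((part): ToolPart => {
    const inProgress =
      part.state === "input-streaming" || part.state === "input-available";
    if (!inProgress) return part;
    // The placeholder output is an assumption; the real payload may differ.
    return { ...part, state: "output-available", output: { status: resolution } };
  });
}

// True only on the streaming -> ready edge. In a component, a useEffect that
// keeps the previous status in a ref could call this and then apply
// resolveInProgressTools(parts, "completed") to the last assistant message.
function isStreamEndTransition(prev: string, next: string): boolean {
  return prev === "streaming" && next === "ready";
}
```

Under these assumptions, the stop handler would call `resolveInProgressTools(parts, "cancelled")`, while the stream-end effect would call it with `"completed"`, so both paths share one place that clears dangling spinners.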
style: fix Black formatting on cancel endpoint
2026-03-17 03:00:27 -04:00 · 2026-02-20 03:48:20 +07:00 · 2026-02-20 02:46:03 +07:00 · 2026-02-20 02:31:18 +07:00 · 2026-02-20 02:25:43 +07:00 · 2026-02-20 02:21:28 +07:00
448 changed files with 25117 additions and 39090 deletions
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -149,7 +149,7 @@ jobs:
          driver-opts: network=host

      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v4
+        uses: crazy-max/ghaction-github-runtime@v3

      - name: Set up Platform - Build Docker images (with cache)
        working-directory: autogpt_platform
--- a/.gitignore
+++ b/.gitignore
@@ -180,6 +180,4 @@ autogpt_platform/backend/settings.py
 .claude/settings.local.json
 CLAUDE.local.md
 /autogpt_platform/backend/logs
-.next
-# Implementation plans (generated by AI agents)
-plans/
+.next
--- a/.nvmrc
+++ b/.nvmrc
@@ -1 +0,0 @@
-22
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,10 +1,3 @@
-default_install_hook_types:
-  - pre-commit
-  - pre-push
-  - post-checkout
-
-default_stages: [pre-commit]
-
 repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
@@ -24,7 +17,6 @@ repos:
        name: Detect secrets
        description: Detects high entropy strings that are likely to be passwords.
        files: ^autogpt_platform/
-        exclude: pnpm-lock\.yaml$
        stages: [pre-push]

  - repo: local
@@ -34,106 +26,49 @@ repos:
      - id: poetry-install
        name: Check & Install dependencies - AutoGPT Platform - Backend
        alias: poetry-install-platform-backend
+        entry: poetry -C autogpt_platform/backend install
        # include autogpt_libs source (since it's a path dependency)
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$" || exit 0;
-          poetry -C autogpt_platform/backend install
-          '
-        always_run: true
+        files: ^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - AutoGPT Platform - Libs
        alias: poetry-install-platform-libs
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/autogpt_libs/poetry\.lock$" || exit 0;
-          poetry -C autogpt_platform/autogpt_libs install
-          '
-        always_run: true
+        entry: poetry -C autogpt_platform/autogpt_libs install
+        files: ^autogpt_platform/autogpt_libs/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: pnpm-install
-        name: Check & Install dependencies - AutoGPT Platform - Frontend
-        alias: pnpm-install-platform-frontend
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/frontend/pnpm-lock\.yaml$" || exit 0;
-          pnpm --prefix autogpt_platform/frontend install
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - Classic - AutoGPT
        alias: poetry-install-classic-autogpt
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/(original_autogpt|forge)/poetry\.lock$" || exit 0;
-          poetry -C classic/original_autogpt install
-          '
+        entry: poetry -C classic/original_autogpt install
        # include forge source (since it's a path dependency)
-        always_run: true
+        files: ^classic/(original_autogpt|forge)/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - Classic - Forge
        alias: poetry-install-classic-forge
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/forge/poetry\.lock$" || exit 0;
-          poetry -C classic/forge install
-          '
-        always_run: true
+        entry: poetry -C classic/forge install
+        files: ^classic/forge/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - Classic - Benchmark
        alias: poetry-install-classic-benchmark
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/benchmark/poetry\.lock$" || exit 0;
-          poetry -C classic/benchmark install
-          '
-        always_run: true
+        entry: poetry -C classic/benchmark install
+        files: ^classic/benchmark/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

  - repo: local
    # For proper type checking, Prisma client must be up-to-date.
@@ -141,54 +76,12 @@ repos:
      - id: prisma-generate
        name: Prisma Generate - AutoGPT Platform - Backend
        alias: prisma-generate-platform-backend
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema\.prisma)$" || exit 0;
-          cd autogpt_platform/backend
-          && poetry run prisma generate
-          && poetry run gen-prisma-stub
-          '
+        entry: bash -c 'cd autogpt_platform/backend && poetry run prisma generate'
        # include everything that triggers poetry install + the prisma schema
-        always_run: true
+        files: ^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema.prisma)$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: export-api-schema
-        name: Export API schema - AutoGPT Platform - Backend -> Frontend
-        alias: export-api-schema-platform
-        entry: >
-          bash -c '
-          cd autogpt_platform/backend
-          && poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
-          && cd ../frontend
-          && pnpm prettier --write ./src/app/api/openapi.json
-          '
-        files: ^autogpt_platform/backend/
-        language: system
-        pass_filenames: false
-
-      - id: generate-api-client
-        name: Generate API client - AutoGPT Platform - Frontend
-        alias: generate-api-client-platform-frontend
-        entry: >
-          bash -c '
-          SCHEMA=autogpt_platform/frontend/src/app/api/openapi.json;
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --quiet "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF" -- "$SCHEMA" && exit 0
-          else
-            git diff --quiet HEAD -- "$SCHEMA" && exit 0
-          fi;
-          cd autogpt_platform/frontend && pnpm generate:api
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.7.2
--- a/autogpt_platform/.gitignore
+++ b/autogpt_platform/.gitignore
@@ -1,3 +1,2 @@
 *.ignore.*
-*.ign.*
-.application.logs
+*.ign.*
--- a/autogpt_platform/backend/.env.default
+++ b/autogpt_platform/backend/.env.default
@@ -190,8 +190,5 @@ ZEROBOUNCE_API_KEY=
 POSTHOG_API_KEY=
 POSTHOG_HOST=https://eu.i.posthog.com

-# Tally Form Integration (pre-populate business understanding on signup)
-TALLY_API_KEY=
-
 # Other Services
 AUTOMOD_API_KEY=
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -95,7 +95,7 @@ ENV DEBIAN_FRONTEND=noninteractive

 # Install Python, FFmpeg, ImageMagick, and CLI tools for agent use.
 # bubblewrap provides OS-level sandbox (whitelist-only FS + no network)
-# for the bash_exec MCP tool (fallback when E2B is not configured).
+# for the bash_exec MCP tool.
 # Using --no-install-recommends saves ~650MB by skipping unnecessary deps like llvm, mesa, etc.
 RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.13 \
@@ -111,29 +111,13 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 # Copy poetry (build-time only, for `poetry install --only-root` to create entry points)
 COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
 COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
-# Copy Node.js installation for Prisma and agent-browser.
-# npm/npx are symlinks in the builder (-> ../lib/node_modules/npm/bin/*-cli.js);
-# COPY resolves them to regular files, breaking require() paths.  Recreate as
-# proper symlinks so npm/npx can find their modules.
+# Copy Node.js installation for Prisma
 COPY --from=builder /usr/bin/node /usr/bin/node
 COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
-RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
-    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
+COPY --from=builder /usr/bin/npm /usr/bin/npm
+COPY --from=builder /usr/bin/npx /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

-# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
-# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
-    libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
-    libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
-    libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
-    fonts-liberation libfontconfig1 \
-    && rm -rf /var/lib/apt/lists/* \
-    && npm install -g agent-browser \
-    && agent-browser install \
-    && rm -rf /tmp/* /root/.npm
-
 WORKDIR /app/autogpt_platform/backend

 # Copy only the .venv from builder (not the entire /app directory)
--- a/autogpt_platform/backend/backend/api/external/middleware.py
+++ b/autogpt_platform/backend/backend/api/external/middleware.py
@@ -88,23 +88,20 @@ async def require_auth(
    )


-def require_permission(*permissions: APIKeyPermission):
+def require_permission(permission: APIKeyPermission):
    """
-    Dependency function for checking required permissions.
-    All listed permissions must be present.
+    Dependency function for checking specific permissions
    (works with API keys and OAuth tokens)
    """

-    async def check_permissions(
+    async def check_permission(
        auth: APIAuthorizationInfo = Security(require_auth),
    ) -> APIAuthorizationInfo:
-        missing = [p for p in permissions if p not in auth.scopes]
-        if missing:
+        if permission not in auth.scopes:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
-                detail=f"Missing required permission(s): "
-                f"{', '.join(p.value for p in missing)}",
+                detail=f"Missing required permission: {permission.value}",
            )
        return auth

-    return check_permissions
+    return check_permission
--- a/autogpt_platform/backend/backend/api/external/v1/routes.py
+++ b/autogpt_platform/backend/backend/api/external/v1/routes.py
@@ -1,7 +1,7 @@
 import logging
 import urllib.parse
 from collections import defaultdict
-from typing import Annotated, Any, Optional, Sequence
+from typing import Annotated, Any, Literal, Optional, Sequence

 from fastapi import APIRouter, Body, HTTPException, Security
 from prisma.enums import AgentExecutionStatus, APIKeyPermission
@@ -9,17 +9,15 @@ from pydantic import BaseModel, Field
 from typing_extensions import TypedDict

 import backend.api.features.store.cache as store_cache
-import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
 import backend.blocks
-from backend.api.external.middleware import require_auth, require_permission
+from backend.api.external.middleware import require_permission
 from backend.data import execution as execution_db
 from backend.data import graph as graph_db
 from backend.data import user as user_db
 from backend.data.auth.base import APIAuthorizationInfo
 from backend.data.block import BlockInput, CompletedBlockOutput
 from backend.executor.utils import add_graph_execution
-from backend.integrations.webhooks.graph_lifecycle_hooks import on_graph_activate
 from backend.util.settings import Settings

 from .integrations import integrations_router
@@ -97,43 +95,6 @@ async def execute_graph_block(
    return output


-@v1_router.post(
-    path="/graphs",
-    tags=["graphs"],
-    status_code=201,
-    dependencies=[
-        Security(
-            require_permission(
-                APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY
-            )
-        )
-    ],
-)
-async def create_graph(
-    graph: graph_db.Graph,
-    auth: APIAuthorizationInfo = Security(
-        require_permission(APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY)
-    ),
-) -> graph_db.GraphModel:
-    """
-    Create a new agent graph.
-
-    The graph will be validated and assigned a new ID.
-    It is automatically added to the user's library.
-    """
-    from backend.api.features.library import db as library_db
-
-    graph_model = graph_db.make_graph_model(graph, auth.user_id)
-    graph_model.reassign_ids(user_id=auth.user_id, reassign_graph_id=True)
-    graph_model.validate_graph(for_run=False)
-
-    await graph_db.create_graph(graph_model, user_id=auth.user_id)
-    await library_db.create_library_agent(graph_model, auth.user_id)
-    activated_graph = await on_graph_activate(graph_model, user_id=auth.user_id)
-
-    return activated_graph
-
-
@v1_router.post(
    path="/graphs/{graph_id}/execute/{graph_version}",
    tags=["graphs"],
@@ -231,13 +192,13 @@ async def get_graph_execution_results(
@v1_router.get(
    path="/store/agents",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.StoreAgentsResponse,
 )
 async def get_store_agents(
    featured: bool = False,
    creator: str | None = None,
-    sorted_by: store_db.StoreAgentsSortOptions | None = None,
+    sorted_by: Literal["rating", "runs", "name", "updated_at"] | None = None,
    search_query: str | None = None,
    category: str | None = None,
    page: int = 1,
@@ -279,7 +240,7 @@ async def get_store_agents(
@v1_router.get(
    path="/store/agents/{username}/{agent_name}",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.StoreAgentDetails,
 )
 async def get_store_agent(
@@ -307,13 +268,13 @@ async def get_store_agent(
@v1_router.get(
    path="/store/creators",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.CreatorsResponse,
 )
 async def get_store_creators(
    featured: bool = False,
    search_query: str | None = None,
-    sorted_by: store_db.StoreCreatorsSortOptions | None = None,
+    sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None = None,
    page: int = 1,
    page_size: int = 20,
 ) -> store_model.CreatorsResponse:
@@ -349,7 +310,7 @@ async def get_store_creators(
@v1_router.get(
    path="/store/creators/{username}",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.CreatorDetails,
 )
 async def get_store_creator(
--- a/autogpt_platform/backend/backend/api/features/admin/model.py
+++ b/autogpt_platform/backend/backend/api/features/admin/model.py
@@ -1,8 +1,4 @@
-from datetime import datetime
-from typing import Any, Literal, Optional
-
-import prisma.enums
-from pydantic import BaseModel, EmailStr
+from pydantic import BaseModel

 from backend.data.model import UserTransaction
 from backend.util.models import Pagination
@@ -18,42 +14,3 @@ class UserHistoryResponse(BaseModel):
 class AddUserCreditsResponse(BaseModel):
    new_balance: int
    transaction_key: str
-
-
-class CreateInvitedUserRequest(BaseModel):
-    email: EmailStr
-    name: Optional[str] = None
-
-
-class InvitedUserResponse(BaseModel):
-    id: str
-    email: str
-    status: prisma.enums.InvitedUserStatus
-    auth_user_id: Optional[str] = None
-    name: Optional[str] = None
-    tally_understanding: Optional[dict[str, Any]] = None
-    tally_status: prisma.enums.TallyComputationStatus
-    tally_computed_at: Optional[datetime] = None
-    tally_error: Optional[str] = None
-    created_at: datetime
-    updated_at: datetime
-
-
-class InvitedUsersResponse(BaseModel):
-    invited_users: list[InvitedUserResponse]
-
-
-class BulkInvitedUserRowResponse(BaseModel):
-    row_number: int
-    email: Optional[str] = None
-    name: Optional[str] = None
-    status: Literal["CREATED", "SKIPPED", "ERROR"]
-    message: str
-    invited_user: Optional[InvitedUserResponse] = None
-
-
-class BulkInvitedUsersResponse(BaseModel):
-    created_count: int
-    skipped_count: int
-    error_count: int
-    results: list[BulkInvitedUserRowResponse]
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
@@ -24,13 +24,14 @@ router = fastapi.APIRouter(
@router.get(
    "/listings",
    summary="Get Admin Listings History",
+    response_model=store_model.StoreListingsWithVersionsResponse,
 )
 async def get_admin_listings_with_versions(
    status: typing.Optional[prisma.enums.SubmissionStatus] = None,
    search: typing.Optional[str] = None,
    page: int = 1,
    page_size: int = 20,
-) -> store_model.StoreListingsWithVersionsAdminViewResponse:
+):
    """
    Get store listings with their version history for admins.

@@ -44,26 +45,36 @@ async def get_admin_listings_with_versions(
        page_size: Number of items per page

    Returns:
-        Paginated listings with their versions
+        StoreListingsWithVersionsResponse with listings and their versions
    """
-    listings = await store_db.get_admin_listings_with_versions(
-        status=status,
-        search_query=search,
-        page=page,
-        page_size=page_size,
-    )
-    return listings
+    try:
+        listings = await store_db.get_admin_listings_with_versions(
+            status=status,
+            search_query=search,
+            page=page,
+            page_size=page_size,
+        )
+        return listings
+    except Exception as e:
+        logger.exception("Error getting admin listings with versions: %s", e)
+        return fastapi.responses.JSONResponse(
+            status_code=500,
+            content={
+                "detail": "An error occurred while retrieving listings with versions"
+            },
+        )


@router.post(
    "/submissions/{store_listing_version_id}/review",
    summary="Review Store Submission",
+    response_model=store_model.StoreSubmission,
 )
 async def review_submission(
    store_listing_version_id: str,
    request: store_model.ReviewSubmissionRequest,
    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
-) -> store_model.StoreSubmissionAdminView:
+):
    """
    Review a store listing submission.

@@ -73,24 +84,31 @@ async def review_submission(
        user_id: Authenticated admin user performing the review

    Returns:
-        StoreSubmissionAdminView with updated review information
+        StoreSubmission with updated review information
    """
-    already_approved = await store_db.check_submission_already_approved(
-        store_listing_version_id=store_listing_version_id,
-    )
-    submission = await store_db.review_store_submission(
-        store_listing_version_id=store_listing_version_id,
-        is_approved=request.is_approved,
-        external_comments=request.comments,
-        internal_comments=request.internal_comments or "",
-        reviewer_id=user_id,
-    )
+    try:
+        already_approved = await store_db.check_submission_already_approved(
+            store_listing_version_id=store_listing_version_id,
+        )
+        submission = await store_db.review_store_submission(
+            store_listing_version_id=store_listing_version_id,
+            is_approved=request.is_approved,
+            external_comments=request.comments,
+            internal_comments=request.internal_comments or "",
+            reviewer_id=user_id,
+        )

-    state_changed = already_approved != request.is_approved
-    # Clear caches whenever approval state changes, since store visibility can change
-    if state_changed:
-        store_cache.clear_all_caches()
-    return submission
+        state_changed = already_approved != request.is_approved
+        # Clear caches when the request is approved as it updates what is shown on the store
+        if state_changed:
+            store_cache.clear_all_caches()
+        return submission
+    except Exception as e:
+        logger.exception("Error reviewing submission: %s", e)
+        return fastapi.responses.JSONResponse(
+            status_code=500,
+            content={"detail": "An error occurred while reviewing the submission"},
+        )


@router.get(
--- a/autogpt_platform/backend/backend/api/features/admin/user_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/user_admin_routes.py
@@ -1,143 +0,0 @@
-import logging
-
-from autogpt_libs.auth import get_user_id, requires_admin_user
-from fastapi import APIRouter, File, Security, UploadFile
-
-from backend.data.invited_user import (
-    BulkInvitedUsersResult,
-    InvitedUserRecord,
-    bulk_create_invited_users_from_file,
-    create_invited_user,
-    list_invited_users,
-    retry_invited_user_tally,
-    revoke_invited_user,
-)
-
-from .model import (
-    BulkInvitedUserRowResponse,
-    BulkInvitedUsersResponse,
-    CreateInvitedUserRequest,
-    InvitedUserResponse,
-    InvitedUsersResponse,
-)
-
-logger = logging.getLogger(__name__)
-
-
-router = APIRouter(
-    prefix="/admin",
-    tags=["users", "admin"],
-    dependencies=[Security(requires_admin_user)],
-)
-
-
-def _to_response(invited_user: InvitedUserRecord) -> InvitedUserResponse:
-    return InvitedUserResponse(**invited_user.model_dump())
-
-
-def _to_bulk_response(result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
-    return BulkInvitedUsersResponse(
-        created_count=result.created_count,
-        skipped_count=result.skipped_count,
-        error_count=result.error_count,
-        results=[
-            BulkInvitedUserRowResponse(
-                row_number=row.row_number,
-                email=row.email,
-                name=row.name,
-                status=row.status,
-                message=row.message,
-                invited_user=(
-                    _to_response(row.invited_user)
-                    if row.invited_user is not None
-                    else None
-                ),
-            )
-            for row in result.results
-        ],
-    )
-
-
-@router.get(
-    "/invited-users",
-    response_model=InvitedUsersResponse,
-    summary="List Invited Users",
-)
-async def get_invited_users(
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUsersResponse:
-    logger.info("Admin user %s requested invited users", admin_user_id)
-    invited_users = await list_invited_users()
-    return InvitedUsersResponse(
-        invited_users=[_to_response(invited_user) for invited_user in invited_users]
-    )
-
-
-@router.post(
-    "/invited-users",
-    response_model=InvitedUserResponse,
-    summary="Create Invited User",
-)
-async def create_invited_user_route(
-    request: CreateInvitedUserRequest,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s created invited user for %s",
-        admin_user_id,
-        request.email,
-    )
-    invited_user = await create_invited_user(request.email, request.name)
-    return _to_response(invited_user)
-
-
-@router.post(
-    "/invited-users/bulk",
-    response_model=BulkInvitedUsersResponse,
-    summary="Bulk Create Invited Users",
-    operation_id="postV2BulkCreateInvitedUsers",
-)
-async def bulk_create_invited_users_route(
-    file: UploadFile = File(...),
-    admin_user_id: str = Security(get_user_id),
-) -> BulkInvitedUsersResponse:
-    logger.info(
-        "Admin user %s bulk invited users from %s",
-        admin_user_id,
-        file.filename or "<unnamed>",
-    )
-    content = await file.read()
-    result = await bulk_create_invited_users_from_file(file.filename, content)
-    return _to_bulk_response(result)
-
-
-@router.post(
-    "/invited-users/{invited_user_id}/revoke",
-    response_model=InvitedUserResponse,
-    summary="Revoke Invited User",
-)
-async def revoke_invited_user_route(
-    invited_user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
-    invited_user = await revoke_invited_user(invited_user_id)
-    return _to_response(invited_user)
-
-
-@router.post(
-    "/invited-users/{invited_user_id}/retry-tally",
-    response_model=InvitedUserResponse,
-    summary="Retry Invited User Tally",
-)
-async def retry_invited_user_tally_route(
-    invited_user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s retried Tally seed for invited user %s",
-        admin_user_id,
-        invited_user_id,
-    )
-    invited_user = await retry_invited_user_tally(invited_user_id)
-    return _to_response(invited_user)
--- a/autogpt_platform/backend/backend/api/features/admin/user_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/user_admin_routes_test.py
@@ -1,165 +0,0 @@
-from datetime import datetime, timezone
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import prisma.enums
-import pytest
-import pytest_mock
-from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-from backend.data.invited_user import (
-    BulkInvitedUserRowResult,
-    BulkInvitedUsersResult,
-    InvitedUserRecord,
-)
-
-from .user_admin_routes import router as user_admin_router
-
-app = fastapi.FastAPI()
-app.include_router(user_admin_router)
-
-client = fastapi.testclient.TestClient(app)
-
-
-@pytest.fixture(autouse=True)
-def setup_app_admin_auth(mock_jwt_admin):
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _sample_invited_user() -> InvitedUserRecord:
-    now = datetime.now(timezone.utc)
-    return InvitedUserRecord(
-        id="invite-1",
-        email="invited@example.com",
-        status=prisma.enums.InvitedUserStatus.INVITED,
-        auth_user_id=None,
-        name="Invited User",
-        tally_understanding=None,
-        tally_status=prisma.enums.TallyComputationStatus.PENDING,
-        tally_computed_at=None,
-        tally_error=None,
-        created_at=now,
-        updated_at=now,
-    )
-
-
-def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
-    return BulkInvitedUsersResult(
-        created_count=1,
-        skipped_count=1,
-        error_count=0,
-        results=[
-            BulkInvitedUserRowResult(
-                row_number=1,
-                email="invited@example.com",
-                name=None,
-                status="CREATED",
-                message="Invite created",
-                invited_user=_sample_invited_user(),
-            ),
-            BulkInvitedUserRowResult(
-                row_number=2,
-                email="duplicate@example.com",
-                name=None,
-                status="SKIPPED",
-                message="An invited user with this email already exists",
-                invited_user=None,
-            ),
-        ],
-    )
-
-
-def test_get_invited_users(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.list_invited_users",
-        AsyncMock(return_value=[_sample_invited_user()]),
-    )
-
-    response = client.get("/admin/invited-users")
-
-    assert response.status_code == 200
-    data = response.json()
-    assert len(data["invited_users"]) == 1
-    assert data["invited_users"][0]["email"] == "invited@example.com"
-    assert data["invited_users"][0]["status"] == "INVITED"
-
-
-def test_create_invited_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.create_invited_user",
-        AsyncMock(return_value=_sample_invited_user()),
-    )
-
-    response = client.post(
-        "/admin/invited-users",
-        json={"email": "invited@example.com", "name": "Invited User"},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["email"] == "invited@example.com"
-    assert data["name"] == "Invited User"
-
-
-def test_bulk_create_invited_users(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
-        AsyncMock(return_value=_sample_bulk_invited_users_result()),
-    )
-
-    response = client.post(
-        "/admin/invited-users/bulk",
-        files={
-            "file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
-        },
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["created_count"] == 1
-    assert data["skipped_count"] == 1
-    assert data["results"][0]["status"] == "CREATED"
-    assert data["results"][1]["status"] == "SKIPPED"
-
-
-def test_revoke_invited_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    revoked = _sample_invited_user().model_copy(
-        update={"status": prisma.enums.InvitedUserStatus.REVOKED}
-    )
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.revoke_invited_user",
-        AsyncMock(return_value=revoked),
-    )
-
-    response = client.post("/admin/invited-users/invite-1/revoke")
-
-    assert response.status_code == 200
-    assert response.json()["status"] == "REVOKED"
-
-
-def test_retry_invited_user_tally(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    retried = _sample_invited_user().model_copy(
-        update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
-    )
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
-        AsyncMock(return_value=retried),
-    )
-
-    response = client.post("/admin/invited-users/invite-1/retry-tally")
-
-    assert response.status_code == 200
-    assert response.json()["tally_status"] == "RUNNING"
--- a/autogpt_platform/backend/backend/api/features/builder/db.py
+++ b/autogpt_platform/backend/backend/api/features/builder/db.py
@@ -1,17 +1,15 @@
 import logging
 from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
 from difflib import SequenceMatcher
-from typing import Any, Sequence, get_args, get_origin
+from typing import Sequence

 import prisma
-from prisma.enums import ContentType
-from prisma.models import mv_suggested_blocks

 import backend.api.features.library.db as library_db
 import backend.api.features.library.model as library_model
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
-from backend.api.features.store.hybrid_search import unified_hybrid_search
 from backend.blocks import load_all_blocks
 from backend.blocks._base import (
    AnyBlockSchema,
@@ -21,6 +19,7 @@ from backend.blocks._base import (
    BlockType,
 )
 from backend.blocks.llm import LlmModel
+from backend.data.db import query_raw_with_schema
 from backend.integrations.providers import ProviderName
 from backend.util.cache import cached
 from backend.util.models import Pagination
@@ -43,16 +42,6 @@ MAX_LIBRARY_AGENT_RESULTS = 100
 MAX_MARKETPLACE_AGENT_RESULTS = 100
 MIN_SCORE_FOR_FILTERED_RESULTS = 10.0

-# Boost blocks over marketplace agents in search results
-BLOCK_SCORE_BOOST = 50.0
-
-# Block IDs to exclude from search results
-EXCLUDED_BLOCK_IDS = frozenset(
-    {
-        "e189baac-8c20-45a1-94a7-55177ea42565",  # AgentExecutorBlock
-    }
-)
-
 SearchResultItem = BlockInfo | library_model.LibraryAgent | store_model.StoreAgent


@@ -75,8 +64,8 @@ def get_block_categories(category_blocks: int = 3) -> list[BlockCategoryResponse

    for block_type in load_all_blocks().values():
        block: AnyBlockSchema = block_type()
-        # Skip disabled and excluded blocks
-        if block.disabled or block.id in EXCLUDED_BLOCK_IDS:
+        # Skip disabled blocks
+        if block.disabled:
            continue
        # Skip blocks that don't have categories (all should have at least one)
        if not block.categories:
@@ -127,9 +116,6 @@ def get_blocks(
        # Skip disabled blocks
        if block.disabled:
            continue
-        # Skip excluded blocks
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
        # Skip blocks that don't match the category
        if category and category not in {c.name.lower() for c in block.categories}:
            continue
@@ -269,25 +255,14 @@ async def _build_cached_search_results(
        "my_agents": 0,
    }

-    # Use hybrid search when query is present, otherwise list all blocks
-    if (include_blocks or include_integrations) and normalized_query:
-        block_results, block_total, integration_total = await _hybrid_search_blocks(
-            query=search_query,
-            include_blocks=include_blocks,
-            include_integrations=include_integrations,
-        )
-        scored_items.extend(block_results)
-        total_items["blocks"] = block_total
-        total_items["integrations"] = integration_total
-    elif include_blocks or include_integrations:
-        # No query - list all blocks using in-memory approach
-        block_results, block_total, integration_total = _collect_block_results(
-            include_blocks=include_blocks,
-            include_integrations=include_integrations,
-        )
-        scored_items.extend(block_results)
-        total_items["blocks"] = block_total
-        total_items["integrations"] = integration_total
+    block_results, block_total, integration_total = _collect_block_results(
+        normalized_query=normalized_query,
+        include_blocks=include_blocks,
+        include_integrations=include_integrations,
+    )
+    scored_items.extend(block_results)
+    total_items["blocks"] = block_total
+    total_items["integrations"] = integration_total

    if include_library_agents:
        library_response = await library_db.list_library_agents(
@@ -332,14 +307,10 @@ async def _build_cached_search_results(

 def _collect_block_results(
    *,
+    normalized_query: str,
    include_blocks: bool,
    include_integrations: bool,
 ) -> tuple[list[_ScoredItem], int, int]:
-    """
-    Collect all blocks for listing (no search query).
-
-    All blocks get BLOCK_SCORE_BOOST to prioritize them over marketplace agents.
-    """
    results: list[_ScoredItem] = []
    block_count = 0
    integration_count = 0
@@ -352,10 +323,6 @@ def _collect_block_results(
        if block.disabled:
            continue

-        # Skip excluded blocks
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
-
        block_info = block.get_info()
        credentials = list(block.input_schema.get_credentials_fields().values())
        is_integration = len(credentials) > 0
@@ -365,6 +332,10 @@ def _collect_block_results(
        if not is_integration and not include_blocks:
            continue

+        score = _score_block(block, block_info, normalized_query)
+        if not _should_include_item(score, normalized_query):
+            continue
+
        filter_type: FilterType = "integrations" if is_integration else "blocks"
        if is_integration:
            integration_count += 1
@@ -375,122 +346,8 @@ def _collect_block_results(
            _ScoredItem(
                item=block_info,
                filter_type=filter_type,
-                score=BLOCK_SCORE_BOOST,
-                sort_key=block_info.name.lower(),
-            )
-        )
-
-    return results, block_count, integration_count
-
-
-async def _hybrid_search_blocks(
-    *,
-    query: str,
-    include_blocks: bool,
-    include_integrations: bool,
-) -> tuple[list[_ScoredItem], int, int]:
-    """
-    Search blocks using hybrid search with builder-specific filtering.
-
-    Uses unified_hybrid_search for semantic + lexical search, then applies
-    post-filtering for block/integration types and scoring adjustments.
-
-    Scoring:
-        - Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
-          to prioritize blocks over marketplace agents in combined results
-        - +30 for exact name match, +15 for prefix name match
-        - +20 if the block has an LlmModel field and the query matches an LLM model name
-
-    Args:
-        query: The search query string
-        include_blocks: Whether to include regular blocks
-        include_integrations: Whether to include integration blocks
-
-    Returns:
-        Tuple of (scored_items, block_count, integration_count)
-    """
-    results: list[_ScoredItem] = []
-    block_count = 0
-    integration_count = 0
-
-    if not include_blocks and not include_integrations:
-        return results, block_count, integration_count
-
-    normalized_query = query.strip().lower()
-
-    # Fetch more results to account for post-filtering
-    search_results, _ = await unified_hybrid_search(
-        query=query,
-        content_types=[ContentType.BLOCK],
-        page=1,
-        page_size=150,
-        min_score=0.10,
-    )
-
-    # Load all blocks for getting BlockInfo
-    all_blocks = load_all_blocks()
-
-    for result in search_results:
-        block_id = result["content_id"]
-
-        # Skip excluded blocks
-        if block_id in EXCLUDED_BLOCK_IDS:
-            continue
-
-        metadata = result.get("metadata", {})
-        hybrid_score = result.get("relevance", 0.0)
-
-        # Get the actual block class
-        if block_id not in all_blocks:
-            continue
-
-        block_cls = all_blocks[block_id]
-        block: AnyBlockSchema = block_cls()
-
-        if block.disabled:
-            continue
-
-        # Check block/integration filter using metadata
-        is_integration = metadata.get("is_integration", False)
-
-        if is_integration and not include_integrations:
-            continue
-        if not is_integration and not include_blocks:
-            continue
-
-        # Get block info
-        block_info = block.get_info()
-
-        # Calculate final score: scale hybrid score and add builder-specific bonuses
-        # Hybrid scores are 0-1, builder scores were 0-200+
-        # Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
-        final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
-
-        # Add LLM model match bonus
-        has_llm_field = metadata.get("has_llm_model_field", False)
-        if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
-            final_score += 20
-
-        # Add exact/prefix match bonus for deterministic tie-breaking
-        name = block_info.name.lower()
-        if name == normalized_query:
-            final_score += 30
-        elif name.startswith(normalized_query):
-            final_score += 15
-
-        # Track counts
-        filter_type: FilterType = "integrations" if is_integration else "blocks"
-        if is_integration:
-            integration_count += 1
-        else:
-            block_count += 1
-
-        results.append(
-            _ScoredItem(
-                item=block_info,
-                filter_type=filter_type,
-                score=final_score,
-                sort_key=name,
+                score=score,
+                sort_key=_get_item_name(block_info),
            )
        )

@@ -615,8 +472,6 @@ async def _get_static_counts():
        block: AnyBlockSchema = block_type()
        if block.disabled:
            continue
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue

        all_blocks += 1

@@ -643,25 +498,47 @@ async def _get_static_counts():
    }


-def _contains_type(annotation: Any, target: type) -> bool:
-    """Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
-    if annotation is target:
-        return True
-    origin = get_origin(annotation)
-    if origin is None:
-        return False
-    return any(_contains_type(arg, target) for arg in get_args(annotation))
-
-
 def _matches_llm_model(schema_cls: type[BlockSchema], query: str) -> bool:
    for field in schema_cls.model_fields.values():
-        if _contains_type(field.annotation, LlmModel):
+        if field.annotation == LlmModel:
            # Check if query matches any value in llm_models
            if any(query in name for name in llm_models):
                return True
    return False


+def _score_block(
+    block: AnyBlockSchema,
+    block_info: BlockInfo,
+    normalized_query: str,
+) -> float:
+    if not normalized_query:
+        return 0.0
+
+    name = block_info.name.lower()
+    description = block_info.description.lower()
+    score = _score_primary_fields(name, description, normalized_query)
+
+    category_text = " ".join(
+        category.get("category", "").lower() for category in block_info.categories
+    )
+    score += _score_additional_field(category_text, normalized_query, 12, 6)
+
+    credentials_info = block.input_schema.get_credentials_fields_info().values()
+    provider_names = [
+        provider.value.lower()
+        for info in credentials_info
+        for provider in info.provider
+    ]
+    provider_text = " ".join(provider_names)
+    score += _score_additional_field(provider_text, normalized_query, 15, 6)
+
+    if _matches_llm_model(block.input_schema, normalized_query):
+        score += 20
+
+    return score
+
+
 def _score_library_agent(
    agent: library_model.LibraryAgent,
    normalized_query: str,
@@ -768,20 +645,31 @@ def _get_all_providers() -> dict[ProviderName, Provider]:
    return providers


-@cached(ttl_seconds=3600, shared_cache=True)
+@cached(ttl_seconds=3600)
 async def get_suggested_blocks(count: int = 5) -> list[BlockInfo]:
-    """Return the most-executed blocks from the last 14 days.
+    suggested_blocks = []
+    # Sum the number of executions for each block type
+    # Prisma cannot group by nested relations, so we do a raw query
+    # Calculate the cutoff timestamp
+    timestamp_threshold = datetime.now(timezone.utc) - timedelta(days=30)

-    Queries the mv_suggested_blocks materialized view (refreshed hourly via pg_cron)
-    and returns the top `count` blocks sorted by execution count, excluding
-    Input/Output/Agent block types and blocks in EXCLUDED_BLOCK_IDS.
-    """
-    results = await mv_suggested_blocks.prisma().find_many()
+    results = await query_raw_with_schema(
+        """
+        SELECT
+            agent_node."agentBlockId" AS block_id,
+            COUNT(execution.id) AS execution_count
+        FROM {schema_prefix}"AgentNodeExecution" execution
+        JOIN {schema_prefix}"AgentNode" agent_node ON execution."agentNodeId" = agent_node.id
+        WHERE execution."endedTime" >= $1::timestamp
+        GROUP BY agent_node."agentBlockId"
+        ORDER BY execution_count DESC;
+        """,
+        timestamp_threshold,
+    )

    # Get the top blocks based on execution count
-    # But ignore Input, Output, Agent, and excluded blocks
+    # But ignore Input and Output blocks
    blocks: list[tuple[BlockInfo, int]] = []
-    execution_counts = {row.block_id: row.execution_count for row in results}

    for block_type in load_all_blocks().values():
        block: AnyBlockSchema = block_type()
@@ -791,9 +679,11 @@ async def get_suggested_blocks(count: int = 5) -> list[BlockInfo]:
            BlockType.AGENT,
        ):
            continue
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
-        execution_count = execution_counts.get(block.id, 0)
+        # Find the execution count for this block
+        execution_count = next(
+            (row["execution_count"] for row in results if row["block_id"] == block.id),
+            0,
+        )
        blocks.append((block.get_info(), execution_count))
    # Sort blocks by execution count
    blocks.sort(key=lambda x: x[1], reverse=True)
--- a/autogpt_platform/backend/backend/api/features/builder/model.py
+++ b/autogpt_platform/backend/backend/api/features/builder/model.py
@@ -27,6 +27,7 @@ class SearchEntry(BaseModel):

 # Suggestions
 class SuggestionsResponse(BaseModel):
+    otto_suggestions: list[str]
    recent_searches: list[SearchEntry]
    providers: list[ProviderName]
    top_blocks: list[BlockInfo]
--- a/autogpt_platform/backend/backend/api/features/builder/routes.py
+++ b/autogpt_platform/backend/backend/api/features/builder/routes.py
@@ -1,5 +1,5 @@
 import logging
-from typing import Annotated, Sequence, cast, get_args
+from typing import Annotated, Sequence

 import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
@@ -10,8 +10,6 @@ from backend.util.models import Pagination
 from . import db as builder_db
 from . import model as builder_model

-VALID_FILTER_VALUES = get_args(builder_model.FilterType)
-
 logger = logging.getLogger(__name__)

 router = fastapi.APIRouter(
@@ -51,6 +49,11 @@ async def get_suggestions(
    Get all suggestions for the Blocks Menu.
    """
    return builder_model.SuggestionsResponse(
+        otto_suggestions=[
+            "What blocks do I need to get started?",
+            "Help me create a list",
+            "Help me feed my data to Google Maps",
+        ],
        recent_searches=await builder_db.get_recent_searches(user_id),
        providers=[
            ProviderName.TWITTER,
@@ -148,7 +151,7 @@ async def get_providers(
 async def search(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
    search_query: Annotated[str | None, fastapi.Query()] = None,
-    filter: Annotated[str | None, fastapi.Query()] = None,
+    filter: Annotated[list[builder_model.FilterType] | None, fastapi.Query()] = None,
    search_id: Annotated[str | None, fastapi.Query()] = None,
    by_creator: Annotated[list[str] | None, fastapi.Query()] = None,
    page: Annotated[int, fastapi.Query()] = 1,
@@ -157,20 +160,9 @@ async def search(
    """
    Search for blocks (including integrations), marketplace agents, and user library agents.
    """
-    # Parse and validate filter parameter
-    filters: list[builder_model.FilterType]
-    if filter:
-        filter_values = [f.strip() for f in filter.split(",")]
-        invalid_filters = [f for f in filter_values if f not in VALID_FILTER_VALUES]
-        if invalid_filters:
-            raise fastapi.HTTPException(
-                status_code=400,
-                detail=f"Invalid filter value(s): {', '.join(invalid_filters)}. "
-                f"Valid values are: {', '.join(VALID_FILTER_VALUES)}",
-            )
-        filters = cast(list[builder_model.FilterType], filter_values)
-    else:
-        filters = [
+    # If no filters are provided, then we will return all types
+    if not filter:
+        filter = [
            "blocks",
            "integrations",
            "marketplace_agents",
@@ -182,7 +174,7 @@ async def search(
    cached_results = await builder_db.get_sorted_search_results(
        user_id=user_id,
        search_query=search_query,
-        filters=filters,
+        filters=filter,
        by_creator=by_creator,
    )

@@ -204,7 +196,7 @@ async def search(
        user_id,
        builder_model.SearchEntry(
            search_query=search_query,
-            filter=filters,
+            filter=filter,
            by_creator=by_creator,
            search_id=search_id,
        ),
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
@@ -2,21 +2,23 @@

 import asyncio
 import logging
-import re
+import uuid as uuid_module
 from collections.abc import AsyncGenerator
 from typing import Annotated
-from uuid import uuid4

 from autogpt_libs import auth
-from fastapi import APIRouter, Depends, HTTPException, Query, Response, Security
+from fastapi import APIRouter, Depends, Header, HTTPException, Query, Response, Security
 from fastapi.responses import StreamingResponse
-from prisma.models import UserWorkspaceFile
-from pydantic import BaseModel, Field, field_validator
+from pydantic import BaseModel

 from backend.copilot import service as chat_service
 from backend.copilot import stream_registry
+from backend.copilot.completion_handler import (
+    process_operation_failure,
+    process_operation_success,
+)
 from backend.copilot.config import ChatConfig
-from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
+from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_task
 from backend.copilot.model import (
    ChatMessage,
    ChatSession,
@@ -25,7 +27,6 @@ from backend.copilot.model import (
    delete_chat_session,
    get_chat_session,
    get_user_sessions,
-    update_session_title,
 )
 from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
 from backend.copilot.tools.models import (
@@ -43,23 +44,20 @@ from backend.copilot.tools.models import (
    ErrorResponse,
    ExecutionStartedResponse,
    InputValidationErrorResponse,
-    MCPToolOutputResponse,
-    MCPToolsDiscoveredResponse,
    NeedLoginResponse,
    NoResultsResponse,
+    OperationInProgressResponse,
+    OperationPendingResponse,
+    OperationStartedResponse,
    SetupRequirementsResponse,
    SuggestedGoalResponse,
    UnderstandingUpdatedResponse,
 )
 from backend.copilot.tracking import track_user_message
-from backend.data.workspace import get_or_create_workspace
 from backend.util.exceptions import NotFoundError

 config = ChatConfig()

-_UUID_RE = re.compile(
-    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
-)

 logger = logging.getLogger(__name__)

@@ -88,9 +86,6 @@ class StreamChatRequest(BaseModel):
    message: str
    is_user_message: bool = True
    context: dict[str, str] | None = None  # {url: str, content: str}
-    file_ids: list[str] | None = Field(
-        default=None, max_length=20
-    )  # Workspace file IDs attached to this message


 class CreateSessionResponse(BaseModel):
@@ -104,8 +99,10 @@ class CreateSessionResponse(BaseModel):
 class ActiveStreamInfo(BaseModel):
    """Information about an active stream for reconnection."""

-    turn_id: str
+    task_id: str
    last_message_id: str  # Redis Stream message ID for resumption
+    operation_id: str  # Operation ID for completion tracking
+    tool_name: str  # Name of the tool being executed


 class SessionDetailResponse(BaseModel):
@@ -135,25 +132,20 @@ class ListSessionsResponse(BaseModel):
    total: int


-class CancelSessionResponse(BaseModel):
-    """Response model for the cancel session endpoint."""
+class CancelTaskResponse(BaseModel):
+    """Response model for the cancel task endpoint."""

    cancelled: bool
+    task_id: str | None = None
    reason: str | None = None


-class UpdateSessionTitleRequest(BaseModel):
-    """Request model for updating a session's title."""
+class OperationCompleteRequest(BaseModel):
+    """Request model for external completion webhook."""

-    title: str
-
-    @field_validator("title")
-    @classmethod
-    def title_must_not_be_blank(cls, v: str) -> str:
-        stripped = v.strip()
-        if not stripped:
-            raise ValueError("Title must not be blank")
-        return stripped
+    success: bool
+    result: dict | str | None = None
+    error: str | None = None


 # ========== Routes ==========
@@ -264,58 +256,9 @@ async def delete_session(
            detail=f"Session {session_id} not found or access denied",
        )

-    # Best-effort cleanup of the E2B sandbox (if any).
-    config = ChatConfig()
-    if config.use_e2b_sandbox and config.e2b_api_key:
-        from backend.copilot.tools.e2b_sandbox import kill_sandbox
-
-        try:
-            await kill_sandbox(session_id, config.e2b_api_key)
-        except Exception:
-            logger.warning(
-                "[E2B] Failed to kill sandbox for session %s", session_id[:12]
-            )
-
    return Response(status_code=204)


-@router.patch(
-    "/sessions/{session_id}/title",
-    summary="Update session title",
-    dependencies=[Security(auth.requires_user)],
-    status_code=200,
-    responses={404: {"description": "Session not found or access denied"}},
-)
-async def update_session_title_route(
-    session_id: str,
-    request: UpdateSessionTitleRequest,
-    user_id: Annotated[str, Security(auth.get_user_id)],
-) -> dict:
-    """
-    Update the title of a chat session.
-
-    Allows the user to rename their chat session.
-
-    Args:
-        session_id: The session ID to update.
-        request: Request body containing the new title.
-        user_id: The authenticated user's ID.
-
-    Returns:
-        dict: Status of the update.
-
-    Raises:
-        HTTPException: 404 if session not found or not owned by user.
-    """
-    success = await update_session_title(session_id, user_id, request.title)
-    if not success:
-        raise HTTPException(
-            status_code=404,
-            detail=f"Session {session_id} not found or access denied",
-        )
-    return {"status": "ok"}
-
-
 @router.get(
    "/sessions/{session_id}",
 )
@@ -327,7 +270,7 @@ async def get_session(
    Retrieve the details of a specific chat session.

    Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
-    If there's an active stream for this session, returns active_stream info for reconnection.
+    If there's an active stream for this session, returns the task_id for reconnection.

    Args:
        session_id: The unique identifier for the desired chat session.
@@ -345,21 +288,28 @@ async def get_session(

    # Check if there's an active stream for this session
    active_stream_info = None
-    active_session, last_message_id = await stream_registry.get_active_session(
+    active_task, last_message_id = await stream_registry.get_active_task_for_session(
        session_id, user_id
    )
    logger.info(
-        f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
+        f"[GET_SESSION] session={session_id}, active_task={active_task is not None}, "
        f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
    )
-    if active_session:
-        # Keep the assistant message (including tool_calls) so the frontend can
-        # render the correct tool UI (e.g. CreateAgent with mini game).
-        # convertChatSessionToUiMessages handles isComplete=false by setting
-        # tool parts without output to state "input-available".
+    if active_task:
+        # Filter out the in-progress assistant message from the session response.
+        # The client will receive the complete assistant response through the SSE
+        # stream replay instead, preventing duplicate content.
+        if messages and messages[-1].get("role") == "assistant":
+            messages = messages[:-1]
+
+        # Use "0-0" as last_message_id to replay the stream from the beginning.
+        # Since we filtered out the cached assistant message, the client needs
+        # the full stream to reconstruct the response.
        active_stream_info = ActiveStreamInfo(
-            turn_id=active_session.turn_id,
-            last_message_id=last_message_id,
+            task_id=active_task.task_id,
+            last_message_id="0-0",
+            operation_id=active_task.operation_id,
+            tool_name=active_task.tool_name,
        )

    return SessionDetailResponse(
@@ -379,7 +329,7 @@ async def get_session(
 async def cancel_session_task(
    session_id: str,
    user_id: Annotated[str | None, Depends(auth.get_user_id)],
-) -> CancelSessionResponse:
+) -> CancelTaskResponse:
    """Cancel the active streaming task for a session.

    Publishes a cancel event to the executor via RabbitMQ FANOUT, then
@@ -388,33 +338,39 @@ async def cancel_session_task(
    """
    await _validate_and_get_session(session_id, user_id)

-    active_session, _ = await stream_registry.get_active_session(session_id, user_id)
-    if not active_session:
-        return CancelSessionResponse(cancelled=True, reason="no_active_session")
+    active_task, _ = await stream_registry.get_active_task_for_session(
+        session_id, user_id
+    )
+    if not active_task:
+        return CancelTaskResponse(cancelled=False, reason="no_active_task")

-    await enqueue_cancel_task(session_id)
-    logger.info(f"[CANCEL] Published cancel for session ...{session_id[-8:]}")
+    task_id = active_task.task_id
+    await enqueue_cancel_task(task_id)
+    logger.info(
+        f"[CANCEL] Published cancel for task ...{task_id[-8:]} "
+        f"session ...{session_id[-8:]}"
+    )

    # Poll until the executor confirms the task is no longer running.
+    # Keep max_wait below typical reverse-proxy read timeouts.
    poll_interval = 0.5
    max_wait = 5.0
    waited = 0.0
    while waited < max_wait:
        await asyncio.sleep(poll_interval)
        waited += poll_interval
-        session_state = await stream_registry.get_session(session_id)
-        if session_state is None or session_state.status != "running":
+        task = await stream_registry.get_task(task_id)
+        if task is None or task.status != "running":
            logger.info(
-                f"[CANCEL] Session ...{session_id[-8:]} confirmed stopped "
-                f"(status={session_state.status if session_state else 'gone'}) after {waited:.1f}s"
+                f"[CANCEL] Task ...{task_id[-8:]} confirmed stopped "
+                f"(status={task.status if task else 'gone'}) after {waited:.1f}s"
            )
-            return CancelSessionResponse(cancelled=True)
+            return CancelTaskResponse(cancelled=True, task_id=task_id)

-    logger.warning(
-        f"[CANCEL] Session ...{session_id[-8:]} not confirmed after {max_wait}s, force-completing"
+    logger.warning(f"[CANCEL] Task ...{task_id[-8:]} not confirmed after {max_wait}s")
+    return CancelTaskResponse(
+        cancelled=True, task_id=task_id, reason="cancel_published_not_confirmed"
    )
-    await stream_registry.mark_session_completed(session_id, error_message="Cancelled")
-    return CancelSessionResponse(cancelled=True)


 @router.post(
@@ -434,15 +390,16 @@ async def stream_chat_post(
      - Tool execution results

    The AI generation runs in a background task that continues even if the client disconnects.
-    All chunks are written to a per-turn Redis stream for reconnection support. If the client
-    disconnects, they can reconnect using GET /sessions/{session_id}/stream to resume.
+    All chunks are written to Redis for reconnection support. If the client disconnects,
+    they can reconnect using GET /tasks/{task_id}/stream to resume from where they left off.

    Args:
        session_id: The chat session identifier to associate with the streamed messages.
        request: Request body containing message, is_user_message, and optional context.
        user_id: Optional authenticated user ID.
    Returns:
-        StreamingResponse: SSE-formatted response chunks.
+        StreamingResponse: SSE-formatted response chunks. First chunk is a "start" event
+        containing the task_id for reconnection.

    """
    import asyncio
@@ -469,38 +426,6 @@ async def stream_chat_post(
        },
    )

-    # Enrich message with file metadata if file_ids are provided.
-    # Also sanitise file_ids so only validated, workspace-scoped IDs are
-    # forwarded downstream (e.g. to the executor via enqueue_copilot_turn).
-    sanitized_file_ids: list[str] | None = None
-    if request.file_ids and user_id:
-        # Filter to valid UUIDs only to prevent DB abuse
-        valid_ids = [fid for fid in request.file_ids if _UUID_RE.match(fid)]
-
-        if valid_ids:
-            workspace = await get_or_create_workspace(user_id)
-            # Batch query instead of N+1
-            files = await UserWorkspaceFile.prisma().find_many(
-                where={
-                    "id": {"in": valid_ids},
-                    "workspaceId": workspace.id,
-                    "isDeleted": False,
-                }
-            )
-            # Only keep IDs that actually exist in the user's workspace
-            sanitized_file_ids = [wf.id for wf in files] or None
-            file_lines: list[str] = [
-                f"- {wf.name} ({wf.mimeType}, {round(wf.sizeBytes / 1024, 1)} KB), file_id={wf.id}"
-                for wf in files
-            ]
-            if file_lines:
-                files_block = (
-                    "\n\n[Attached files]\n"
-                    + "\n".join(file_lines)
-                    + "\nUse read_workspace_file with the file_id to access file contents."
-                )
-                request.message += files_block
-
    # Atomically append user message to session BEFORE creating task to avoid
    # race condition where GET_SESSION sees task as "running" but message isn't
    # saved yet.  append_and_save_message re-fetches inside a lock to prevent
@@ -521,38 +446,37 @@ async def stream_chat_post(
        logger.info(f"[STREAM] User message saved for session {session_id}")

    # Create a task in the stream registry for reconnection support
-    turn_id = str(uuid4())
-    log_meta["turn_id"] = turn_id
+    task_id = str(uuid_module.uuid4())
+    operation_id = str(uuid_module.uuid4())
+    log_meta["task_id"] = task_id

-    session_create_start = time.perf_counter()
-    await stream_registry.create_session(
+    task_create_start = time.perf_counter()
+    await stream_registry.create_task(
+        task_id=task_id,
        session_id=session_id,
        user_id=user_id,
-        tool_call_id="chat_stream",
+        tool_call_id="chat_stream",  # Not a tool call, but needed for the model
        tool_name="chat",
-        turn_id=turn_id,
+        operation_id=operation_id,
    )
    logger.info(
-        f"[TIMING] create_session completed in {(time.perf_counter() - session_create_start) * 1000:.1f}ms",
+        f"[TIMING] create_task completed in {(time.perf_counter() - task_create_start) * 1000:.1f}ms",
        extra={
            "json_fields": {
                **log_meta,
-                "duration_ms": (time.perf_counter() - session_create_start) * 1000,
+                "duration_ms": (time.perf_counter() - task_create_start) * 1000,
            }
        },
    )

-    # Per-turn stream is always fresh (unique turn_id), subscribe from beginning
-    subscribe_from_id = "0-0"
-
-    await enqueue_copilot_turn(
+    await enqueue_copilot_task(
+        task_id=task_id,
        session_id=session_id,
        user_id=user_id,
+        operation_id=operation_id,
        message=request.message,
-        turn_id=turn_id,
        is_user_message=request.is_user_message,
        context=request.context,
-        file_ids=sanitized_file_ids,
    )

    setup_time = (time.perf_counter() - stream_start_time) * 1000
@@ -567,7 +491,7 @@ async def stream_chat_post(

        event_gen_start = time_module.perf_counter()
        logger.info(
-            f"[TIMING] event_generator STARTED, turn={turn_id}, session={session_id}, "
+            f"[TIMING] event_generator STARTED, task={task_id}, session={session_id}, "
            f"user={user_id}",
            extra={"json_fields": log_meta},
        )
@@ -575,12 +499,11 @@ async def stream_chat_post(
        first_chunk_yielded = False
        chunks_yielded = 0
        try:
-            # Subscribe from the position we captured before enqueuing
-            # This avoids replaying old messages while catching all new ones
-            subscriber_queue = await stream_registry.subscribe_to_session(
-                session_id=session_id,
+            # Subscribe to the task stream (this replays existing messages + live updates)
+            subscriber_queue = await stream_registry.subscribe_to_task(
+                task_id=task_id,
                user_id=user_id,
-                last_message_id=subscribe_from_id,
+                last_message_id="0-0",  # Get all messages from the beginning
            )

            if subscriber_queue is None:
@@ -595,7 +518,7 @@ async def stream_chat_post(
            )
            while True:
                try:
-                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=10.0)
+                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=30.0)
                    chunks_yielded += 1

                    if not first_chunk_yielded:
@@ -663,19 +586,19 @@ async def stream_chat_post(
            # Unsubscribe when client disconnects or stream ends
            if subscriber_queue is not None:
                try:
-                    await stream_registry.unsubscribe_from_session(
-                        session_id, subscriber_queue
+                    await stream_registry.unsubscribe_from_task(
+                        task_id, subscriber_queue
                    )
                except Exception as unsub_err:
                    logger.error(
-                        f"Error unsubscribing from session {session_id}: {unsub_err}",
+                        f"Error unsubscribing from task {task_id}: {unsub_err}",
                        exc_info=True,
                    )
            # AI SDK protocol termination - always yield even if unsubscribe fails
            total_time = time_module.perf_counter() - event_gen_start
            logger.info(
                f"[TIMING] event_generator FINISHED in {total_time:.2f}s; "
-                f"turn={turn_id}, session={session_id}, n_chunks={chunks_yielded}",
+                f"task={task_id}, session={session_id}, n_chunks={chunks_yielded}",
                extra={
                    "json_fields": {
                        **log_meta,
@@ -722,21 +645,17 @@ async def resume_session_stream(
    """
    import asyncio

-    active_session, last_message_id = await stream_registry.get_active_session(
+    active_task, _last_id = await stream_registry.get_active_task_for_session(
        session_id, user_id
    )

-    if not active_session:
+    if not active_task:
        return Response(status_code=204)

-    # Always replay from the beginning ("0-0") on resume.
-    # We can't use last_message_id because it's the latest ID in the backend
-    # stream, not the latest the frontend received — the gap causes lost
-    # messages. The frontend deduplicates replayed content.
-    subscriber_queue = await stream_registry.subscribe_to_session(
-        session_id=session_id,
+    subscriber_queue = await stream_registry.subscribe_to_task(
+        task_id=active_task.task_id,
        user_id=user_id,
-        last_message_id="0-0",
+        last_message_id="0-0",  # Full replay so useChat rebuilds the message
    )

    if subscriber_queue is None:
@@ -748,7 +667,7 @@ async def resume_session_stream(
        try:
            while True:
                try:
-                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=10.0)
+                    chunk = await asyncio.wait_for(subscriber_queue.get(), timeout=30.0)
                    if chunk_count < 3:
                        logger.info(
                            "Resume stream chunk",
@@ -772,12 +691,12 @@ async def resume_session_stream(
            logger.error(f"Error in resume stream for session {session_id}: {e}")
        finally:
            try:
-                await stream_registry.unsubscribe_from_session(
-                    session_id, subscriber_queue
+                await stream_registry.unsubscribe_from_task(
+                    active_task.task_id, subscriber_queue
                )
            except Exception as unsub_err:
                logger.error(
-                    f"Error unsubscribing from session {active_session.session_id}: {unsub_err}",
+                    f"Error unsubscribing from task {active_task.task_id}: {unsub_err}",
                    exc_info=True,
                )
            logger.info(
@@ -805,6 +724,7 @@ async def resume_session_stream(
@router.patch(
    "/sessions/{session_id}/assign-user",
    dependencies=[Security(auth.requires_user)],
+    status_code=200,
 )
 async def session_assign_user(
    session_id: str,
@@ -827,6 +747,229 @@ async def session_assign_user(
    return {"status": "ok"}


+# ========== Task Streaming (SSE Reconnection) ==========
+
+
+@router.get(
+    "/tasks/{task_id}/stream",
+)
+async def stream_task(
+    task_id: str,
+    user_id: str | None = Depends(auth.get_user_id),
+    last_message_id: str = Query(
+        default="0-0",
+        description="Last Redis Stream message ID received (e.g., '1706540123456-0'). Use '0-0' for full replay.",
+    ),
+):
+    """
+    Reconnect to a long-running task's SSE stream.
+
+    When a long-running operation (like agent generation) starts, the client
+    receives a task_id. If the connection drops, the client can reconnect
+    using this endpoint to resume receiving updates.
+
+    Args:
+        task_id: The task ID from the operation_started response.
+        user_id: Authenticated user ID for ownership validation.
+        last_message_id: Last Redis Stream message ID received ("0-0" for full replay).
+
+    Returns:
+        StreamingResponse: SSE-formatted response chunks starting after last_message_id.
+
+    Raises:
+        HTTPException: 404 if task not found, 410 if task expired, 403 if access denied.
+    """
+    # Check task existence and expiry before subscribing
+    task, error_code = await stream_registry.get_task_with_expiry_info(task_id)
+
+    if error_code == "TASK_EXPIRED":
+        raise HTTPException(
+            status_code=410,
+            detail={
+                "code": "TASK_EXPIRED",
+                "message": "This operation has expired. Please try again.",
+            },
+        )
+
+    if error_code == "TASK_NOT_FOUND":
+        raise HTTPException(
+            status_code=404,
+            detail={
+                "code": "TASK_NOT_FOUND",
+                "message": f"Task {task_id} not found.",
+            },
+        )
+
+    # Validate ownership if task has an owner
+    if task and task.user_id and user_id != task.user_id:
+        raise HTTPException(
+            status_code=403,
+            detail={
+                "code": "ACCESS_DENIED",
+                "message": "You do not have access to this task.",
+            },
+        )
+
+    # Get subscriber queue from stream registry
+    subscriber_queue = await stream_registry.subscribe_to_task(
+        task_id=task_id,
+        user_id=user_id,
+        last_message_id=last_message_id,
+    )
+
+    if subscriber_queue is None:
+        raise HTTPException(
+            status_code=404,
+            detail={
+                "code": "TASK_NOT_FOUND",
+                "message": f"Task {task_id} not found or access denied.",
+            },
+        )
+
+    async def event_generator() -> AsyncGenerator[str, None]:
+        heartbeat_interval = 15.0  # Send heartbeat every 15 seconds
+        try:
+            while True:
+                try:
+                    # Wait for next chunk with timeout for heartbeats
+                    chunk = await asyncio.wait_for(
+                        subscriber_queue.get(), timeout=heartbeat_interval
+                    )
+                    yield chunk.to_sse()
+
+                    # Check for finish signal
+                    if isinstance(chunk, StreamFinish):
+                        break
+                except asyncio.TimeoutError:
+                    # Send heartbeat to keep connection alive
+                    yield StreamHeartbeat().to_sse()
+        except Exception as e:
+            logger.error(f"Error in task stream {task_id}: {e}", exc_info=True)
+        finally:
+            # Unsubscribe when client disconnects or stream ends
+            try:
+                await stream_registry.unsubscribe_from_task(task_id, subscriber_queue)
+            except Exception as unsub_err:
+                logger.error(
+                    f"Error unsubscribing from task {task_id}: {unsub_err}",
+                    exc_info=True,
+                )
+            # AI SDK protocol termination - always yield even if unsubscribe fails
+            yield "data: [DONE]\n\n"
+
+    return StreamingResponse(
+        event_generator(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+            "x-vercel-ai-ui-message-stream": "v1",
+        },
+    )
+
+
+@router.get(
+    "/tasks/{task_id}",
+)
+async def get_task_status(
+    task_id: str,
+    user_id: str | None = Depends(auth.get_user_id),
+) -> dict:
+    """
+    Get the status of a long-running task.
+
+    Args:
+        task_id: The task ID to check.
+        user_id: Authenticated user ID for ownership validation.
+
+    Returns:
+        dict: task_id, session_id, status, tool_name, operation_id, and created_at.
+
+    Raises:
+        NotFoundError: If task_id is not found or user doesn't have access.
+    """
+    task = await stream_registry.get_task(task_id)
+
+    if task is None:
+        raise NotFoundError(f"Task {task_id} not found.")
+
+    # Validate ownership - if task has an owner, requester must match
+    if task.user_id and user_id != task.user_id:
+        raise NotFoundError(f"Task {task_id} not found.")
+
+    return {
+        "task_id": task.task_id,
+        "session_id": task.session_id,
+        "status": task.status,
+        "tool_name": task.tool_name,
+        "operation_id": task.operation_id,
+        "created_at": task.created_at.isoformat(),
+    }
+
+
+# ========== External Completion Webhook ==========
+
+
+@router.post(
+    "/operations/{operation_id}/complete",
+    status_code=200,
+)
+async def complete_operation(
+    operation_id: str,
+    request: OperationCompleteRequest,
+    x_api_key: str | None = Header(default=None),
+) -> dict:
+    """
+    External completion webhook for long-running operations.
+
+    Called by Agent Generator (or other services) when an operation completes.
+    This triggers the stream registry to publish completion and continue LLM generation.
+
+    Args:
+        operation_id: The operation ID to complete.
+        request: Completion payload with success status and result/error.
+        x_api_key: Internal API key for authentication.
+
+    Returns:
+        dict: Status of the completion.
+
+    Raises:
+        HTTPException: 503 if the internal API key is not configured, 401 if the key is invalid, 404 if the operation is not found.
+    """
+    # Validate internal API key - reject if not configured or invalid
+    if not config.internal_api_key:
+        logger.error(
+            "Operation complete webhook rejected: CHAT_INTERNAL_API_KEY not configured"
+        )
+        raise HTTPException(
+            status_code=503,
+            detail="Webhook not available: internal API key not configured",
+        )
+    if x_api_key != config.internal_api_key:
+        raise HTTPException(status_code=401, detail="Invalid API key")
+
+    # Find task by operation_id
+    task = await stream_registry.find_task_by_operation_id(operation_id)
+    if task is None:
+        raise HTTPException(
+            status_code=404,
+            detail=f"Operation {operation_id} not found",
+        )
+
+    logger.info(
+        f"Received completion webhook for operation {operation_id} "
+        f"(task_id={task.task_id}, success={request.success})"
+    )
+
+    if request.success:
+        await process_operation_success(task, request.result)
+    else:
+        await process_operation_failure(task, request.error)
+
+    return {"status": "ok", "task_id": task.task_id}
+
+
 # ========== Configuration ==========


@@ -907,8 +1050,9 @@ ToolResponseUnion = (
    | BlockOutputResponse
    | DocSearchResultsResponse
    | DocPageResponse
-    | MCPToolsDiscoveredResponse
-    | MCPToolOutputResponse
+    | OperationStartedResponse
+    | OperationPendingResponse
+    | OperationInProgressResponse
 )


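Review note: the new `stream_task` generator above waits on the subscriber queue with a timeout and emits a heartbeat whenever nothing arrives in time. The following is a minimal, standalone sketch of that pattern, not the code in this patch. The names `sse_events` and `FINISH` are hypothetical stand-ins (for the generator and for `StreamFinish`), and the `: heartbeat` comment frame is an assumption, since the actual output of `StreamHeartbeat().to_sse()` is not shown in this diff.

```python
import asyncio
from typing import AsyncGenerator

FINISH = "finish"  # hypothetical sentinel standing in for StreamFinish


async def sse_events(
    queue: "asyncio.Queue[str]", heartbeat_interval: float = 15.0
) -> AsyncGenerator[str, None]:
    """Yield SSE frames from a queue, sending a heartbeat when the queue is idle."""
    while True:
        try:
            # Wait for the next chunk, but no longer than the heartbeat interval.
            chunk = await asyncio.wait_for(queue.get(), timeout=heartbeat_interval)
        except asyncio.TimeoutError:
            # Nothing arrived in time: emit an SSE comment line to keep the
            # connection alive (assumed format, see the note above).
            yield ": heartbeat\n\n"
            continue
        yield f"data: {chunk}\n\n"
        if chunk == FINISH:
            break
    # Termination marker, mirroring the "data: [DONE]" frame used in the patch.
    yield "data: [DONE]\n\n"
```

In this sketch the `[DONE]` frame is yielded after the loop rather than from a `finally` block, which sidesteps yielding while the generator is being closed. Whether that trade-off suits the real endpoint depends on how the framework closes the generator on client disconnect.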
--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -1,251 +0,0 @@
-"""Tests for chat API routes: session title update and file attachment validation."""
-
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import pytest
-import pytest_mock
-
-from backend.api.features.chat import routes as chat_routes
-
-app = fastapi.FastAPI()
-app.include_router(chat_routes.router)
-
-client = fastapi.testclient.TestClient(app)
-
-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-
-@pytest.fixture(autouse=True)
-def setup_app_auth(mock_jwt_user):
-    """Setup auth overrides for all tests in this module"""
-    from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _mock_update_session_title(
-    mocker: pytest_mock.MockerFixture, *, success: bool = True
-):
-    """Mock update_session_title."""
-    return mocker.patch(
-        "backend.api.features.chat.routes.update_session_title",
-        new_callable=AsyncMock,
-        return_value=success,
-    )
-
-
-# ─── Update title: success ─────────────────────────────────────────────
-
-
-def test_update_title_success(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    mock_update = _mock_update_session_title(mocker, success=True)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "My project"},
-    )
-
-    assert response.status_code == 200
-    assert response.json() == {"status": "ok"}
-    mock_update.assert_called_once_with("sess-1", test_user_id, "My project")
-
-
-def test_update_title_trims_whitespace(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    mock_update = _mock_update_session_title(mocker, success=True)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "  trimmed  "},
-    )
-
-    assert response.status_code == 200
-    mock_update.assert_called_once_with("sess-1", test_user_id, "trimmed")
-
-
-# ─── Update title: blank / whitespace-only → 422 ──────────────────────
-
-
-def test_update_title_blank_rejected(
-    test_user_id: str,
-) -> None:
-    """Whitespace-only titles must be rejected before hitting the DB."""
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "   "},
-    )
-
-    assert response.status_code == 422
-
-
-def test_update_title_empty_rejected(
-    test_user_id: str,
-) -> None:
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": ""},
-    )
-
-    assert response.status_code == 422
-
-
-# ─── Update title: session not found or wrong user → 404 ──────────────
-
-
-def test_update_title_not_found(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    _mock_update_session_title(mocker, success=False)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "New name"},
-    )
-
-    assert response.status_code == 404
-
-
-# ─── file_ids Pydantic validation ─────────────────────────────────────
-
-
-def test_stream_chat_rejects_too_many_file_ids():
-    """More than 20 file_ids should be rejected by Pydantic validation (422)."""
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(21)],
-        },
-    )
-    assert response.status_code == 422
-
-
-def _mock_stream_internals(mocker: pytest_mock.MockFixture):
-    """Mock the async internals of stream_chat_post so tests can exercise
-    validation and enrichment logic without needing Redis/RabbitMQ."""
-    mocker.patch(
-        "backend.api.features.chat.routes._validate_and_get_session",
-        return_value=None,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.append_and_save_message",
-        return_value=None,
-    )
-    mock_registry = mocker.MagicMock()
-    mock_registry.create_session = mocker.AsyncMock(return_value=None)
-    mocker.patch(
-        "backend.api.features.chat.routes.stream_registry",
-        mock_registry,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.enqueue_copilot_turn",
-        return_value=None,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.track_user_message",
-        return_value=None,
-    )
-
-
-def test_stream_chat_accepts_20_file_ids(mocker: pytest_mock.MockFixture):
-    """Exactly 20 file_ids should be accepted (not rejected by validation)."""
-    _mock_stream_internals(mocker)
-    # Patch workspace lookup as imported by the routes module
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "ws-1"})(),
-    )
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(20)],
-        },
-    )
-    # Should get past validation — 200 streaming response expected
-    assert response.status_code == 200
-
-
-# ─── UUID format filtering ─────────────────────────────────────────────
-
-
-def test_file_ids_filters_invalid_uuids(mocker: pytest_mock.MockFixture):
-    """Non-UUID strings in file_ids should be silently filtered out
-    and NOT passed to the database query."""
-    _mock_stream_internals(mocker)
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "ws-1"})(),
-    )
-
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    valid_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
-    client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [
-                valid_id,
-                "not-a-uuid",
-                "../../../etc/passwd",
-                "",
-            ],
-        },
-    )
-
-    # The find_many call should only receive the one valid UUID
-    mock_prisma.find_many.assert_called_once()
-    call_kwargs = mock_prisma.find_many.call_args[1]
-    assert call_kwargs["where"]["id"]["in"] == [valid_id]
-
-
-# ─── Cross-workspace file_ids ─────────────────────────────────────────
-
-
-def test_file_ids_scoped_to_workspace(mocker: pytest_mock.MockFixture):
-    """The batch query should scope to the user's workspace."""
-    _mock_stream_internals(mocker)
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "my-workspace-id"})(),
-    )
-
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    fid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
-    client.post(
-        "/sessions/sess-1/stream",
-        json={"message": "hi", "file_ids": [fid]},
-    )
-
-    call_kwargs = mock_prisma.find_many.call_args[1]
-    assert call_kwargs["where"]["workspaceId"] == "my-workspace-id"
-    assert call_kwargs["where"]["isDeleted"] is False
--- a/autogpt_platform/backend/backend/api/features/executions/review/routes.py
+++ b/autogpt_platform/backend/backend/api/features/executions/review/routes.py
@@ -22,7 +22,6 @@ from backend.data.human_review import (
 )
 from backend.data.model import USER_TIMEZONE_NOT_SET
 from backend.data.user import get_user_by_id
-from backend.data.workspace import get_or_create_workspace
 from backend.executor.utils import add_graph_execution

 from .model import PendingHumanReviewModel, ReviewRequest, ReviewResponse
@@ -322,13 +321,10 @@ async def process_review_action(
                    user.timezone if user.timezone != USER_TIMEZONE_NOT_SET else "UTC"
                )

-                workspace = await get_or_create_workspace(user_id)
-
                execution_context = ExecutionContext(
                    human_in_the_loop_safe_mode=settings.human_in_the_loop_safe_mode,
                    sensitive_action_safe_mode=settings.sensitive_action_safe_mode,
                    user_timezone=user_timezone,
-                    workspace_id=workspace.id,
                )

                await add_graph_execution(
--- a/autogpt_platform/backend/backend/api/features/library/db.py
+++ b/autogpt_platform/backend/backend/api/features/library/db.py
--- a/autogpt_platform/backend/backend/api/features/library/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/library/db_test.py
@@ -4,6 +4,7 @@ import prisma.enums
 import prisma.models
 import pytest

+import backend.api.features.store.exceptions
 from backend.data.db import connect
 from backend.data.includes import library_agent_include

@@ -143,7 +144,6 @@ async def test_add_agent_to_library(mocker):
    )

    mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
-    mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
    mock_library_agent.return_value.find_unique = mocker.AsyncMock(return_value=None)
    mock_library_agent.return_value.create = mocker.AsyncMock(
        return_value=mock_library_agent_data
@@ -178,6 +178,7 @@ async def test_add_agent_to_library(mocker):
                "agentGraphVersion": 1,
            }
        },
+        include={"AgentGraph": True},
    )
    # Check that create was called with the expected data including settings
    create_call_args = mock_library_agent.return_value.create.call_args
@@ -217,7 +218,7 @@ async def test_add_agent_to_library_not_found(mocker):
    )

    # Call function and verify exception
-    with pytest.raises(db.NotFoundError):
+    with pytest.raises(backend.api.features.store.exceptions.AgentNotFoundError):
        await db.add_store_agent_to_library("version123", "test-user")

    # Verify mock called correctly
--- a/autogpt_platform/backend/backend/api/features/library/exceptions.py
+++ b/autogpt_platform/backend/backend/api/features/library/exceptions.py
@@ -1,10 +0,0 @@
-class FolderValidationError(Exception):
-    """Raised when folder operations fail validation."""
-
-    pass
-
-
-class FolderAlreadyExistsError(FolderValidationError):
-    """Raised when a folder with the same name already exists in the location."""
-
-    pass
--- a/autogpt_platform/backend/backend/api/features/library/model.py
+++ b/autogpt_platform/backend/backend/api/features/library/model.py
@@ -26,95 +26,6 @@ class LibraryAgentStatus(str, Enum):
    ERROR = "ERROR"


-# === Folder Models ===
-
-
-class LibraryFolder(pydantic.BaseModel):
-    """Represents a folder for organizing library agents."""
-
-    id: str
-    user_id: str
-    name: str
-    icon: str | None = None
-    color: str | None = None
-    parent_id: str | None = None
-    created_at: datetime.datetime
-    updated_at: datetime.datetime
-    agent_count: int = 0  # Direct agents in folder
-    subfolder_count: int = 0  # Direct child folders
-
-    @staticmethod
-    def from_db(
-        folder: prisma.models.LibraryFolder,
-        agent_count: int = 0,
-        subfolder_count: int = 0,
-    ) -> "LibraryFolder":
-        """Factory method that constructs a LibraryFolder from a Prisma model."""
-        return LibraryFolder(
-            id=folder.id,
-            user_id=folder.userId,
-            name=folder.name,
-            icon=folder.icon,
-            color=folder.color,
-            parent_id=folder.parentId,
-            created_at=folder.createdAt,
-            updated_at=folder.updatedAt,
-            agent_count=agent_count,
-            subfolder_count=subfolder_count,
-        )
-
-
-class LibraryFolderTree(LibraryFolder):
-    """Folder with nested children for tree view."""
-
-    children: list["LibraryFolderTree"] = []
-
-
-class FolderCreateRequest(pydantic.BaseModel):
-    """Request model for creating a folder."""
-
-    name: str = pydantic.Field(..., min_length=1, max_length=100)
-    icon: str | None = None
-    color: str | None = pydantic.Field(
-        None, pattern=r"^#[0-9A-Fa-f]{6}$", description="Hex color code (#RRGGBB)"
-    )
-    parent_id: str | None = None
-
-
-class FolderUpdateRequest(pydantic.BaseModel):
-    """Request model for updating a folder."""
-
-    name: str | None = pydantic.Field(None, min_length=1, max_length=100)
-    icon: str | None = None
-    color: str | None = None
-
-
-class FolderMoveRequest(pydantic.BaseModel):
-    """Request model for moving a folder to a new parent."""
-
-    target_parent_id: str | None = None  # None = move to root
-
-
-class BulkMoveAgentsRequest(pydantic.BaseModel):
-    """Request model for moving multiple agents to a folder."""
-
-    agent_ids: list[str]
-    folder_id: str | None = None  # None = move to root
-
-
-class FolderListResponse(pydantic.BaseModel):
-    """Response schema for a list of folders."""
-
-    folders: list[LibraryFolder]
-    pagination: Pagination
-
-
-class FolderTreeResponse(pydantic.BaseModel):
-    """Response schema for folder tree structure."""
-
-    tree: list[LibraryFolderTree]
-
-
 class MarketplaceListingCreator(pydantic.BaseModel):
    """Creator information for a marketplace listing."""

@@ -209,9 +120,6 @@ class LibraryAgent(pydantic.BaseModel):
    can_access_graph: bool
    is_latest_version: bool
    is_favorite: bool
-    folder_id: str | None = None
-    folder_name: str | None = None  # Denormalized for display
-
    recommended_schedule_cron: str | None = None
    settings: GraphSettings = pydantic.Field(default_factory=GraphSettings)
    marketplace_listing: Optional["MarketplaceListing"] = None
@@ -351,8 +259,6 @@ class LibraryAgent(pydantic.BaseModel):
            can_access_graph=can_access_graph,
            is_latest_version=is_latest_version,
            is_favorite=agent.isFavorite,
-            folder_id=agent.folderId,
-            folder_name=agent.Folder.name if agent.Folder else None,
            recommended_schedule_cron=agent.AgentGraph.recommendedScheduleCron,
            settings=_parse_settings(agent.settings),
            marketplace_listing=marketplace_listing_data,
@@ -564,7 +470,3 @@ class LibraryAgentUpdateRequest(pydantic.BaseModel):
    settings: Optional[GraphSettings] = pydantic.Field(
        default=None, description="User-specific settings for this library agent"
    )
-    folder_id: Optional[str] = pydantic.Field(
-        default=None,
-        description="Folder ID to move agent to (None to move to root)",
-    )
--- a/autogpt_platform/backend/backend/api/features/library/routes/__init__.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes/__init__.py
@@ -1,11 +1,9 @@
 import fastapi

 from .agents import router as agents_router
-from .folders import router as folders_router
 from .presets import router as presets_router

 router = fastapi.APIRouter()

 router.include_router(presets_router)
-router.include_router(folders_router)
 router.include_router(agents_router)
--- a/autogpt_platform/backend/backend/api/features/library/routes/agents.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes/agents.py
@@ -41,14 +41,6 @@ async def list_library_agents(
        ge=1,
        description="Number of agents per page (must be >= 1)",
    ),
-    folder_id: Optional[str] = Query(
-        None,
-        description="Filter by folder ID",
-    ),
-    include_root_only: bool = Query(
-        False,
-        description="Only return agents without a folder (root-level agents)",
-    ),
 ) -> library_model.LibraryAgentResponse:
    """
    Get all agents in the user's library (both created and saved).
@@ -59,8 +51,6 @@ async def list_library_agents(
        sort_by=sort_by,
        page=page,
        page_size=page_size,
-        folder_id=folder_id,
-        include_root_only=include_root_only,
    )


@@ -178,7 +168,6 @@ async def update_library_agent(
        is_favorite=payload.is_favorite,
        is_archived=payload.is_archived,
        settings=payload.settings,
-        folder_id=payload.folder_id,
    )


--- a/autogpt_platform/backend/backend/api/features/library/routes/folders.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes/folders.py
@@ -1,287 +0,0 @@
-from typing import Optional
-
-import autogpt_libs.auth as autogpt_auth_lib
-from fastapi import APIRouter, Query, Security, status
-from fastapi.responses import Response
-
-from .. import db as library_db
-from .. import model as library_model
-
-router = APIRouter(
-    prefix="/folders",
-    tags=["library", "folders", "private"],
-    dependencies=[Security(autogpt_auth_lib.requires_user)],
-)
-
-
-@router.get(
-    "",
-    summary="List Library Folders",
-    response_model=library_model.FolderListResponse,
-    responses={
-        200: {"description": "List of folders"},
-        500: {"description": "Server error"},
-    },
-)
-async def list_folders(
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-    parent_id: Optional[str] = Query(
-        None,
-        description="Filter by parent folder ID. If not provided, returns root-level folders.",
-    ),
-    include_relations: bool = Query(
-        True,
-        description="Include agent and subfolder relations (for counts)",
-    ),
-) -> library_model.FolderListResponse:
-    """
-    List folders for the authenticated user.
-
-    Args:
-        user_id: ID of the authenticated user.
-        parent_id: Optional parent folder ID to filter by.
-        include_relations: Whether to include agent and subfolder relations for counts.
-
-    Returns:
-        A FolderListResponse containing folders.
-    """
-    folders = await library_db.list_folders(
-        user_id=user_id,
-        parent_id=parent_id,
-        include_relations=include_relations,
-    )
-    return library_model.FolderListResponse(
-        folders=folders,
-        pagination=library_model.Pagination(
-            total_items=len(folders),
-            total_pages=1,
-            current_page=1,
-            page_size=len(folders),
-        ),
-    )
-
-
-@router.get(
-    "/tree",
-    summary="Get Folder Tree",
-    response_model=library_model.FolderTreeResponse,
-    responses={
-        200: {"description": "Folder tree structure"},
-        500: {"description": "Server error"},
-    },
-)
-async def get_folder_tree(
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.FolderTreeResponse:
-    """
-    Get the full folder tree for the authenticated user.
-
-    Args:
-        user_id: ID of the authenticated user.
-
-    Returns:
-        A FolderTreeResponse containing the nested folder structure.
-    """
-    tree = await library_db.get_folder_tree(user_id=user_id)
-    return library_model.FolderTreeResponse(tree=tree)
-
-
-@router.get(
-    "/{folder_id}",
-    summary="Get Folder",
-    response_model=library_model.LibraryFolder,
-    responses={
-        200: {"description": "Folder details"},
-        404: {"description": "Folder not found"},
-        500: {"description": "Server error"},
-    },
-)
-async def get_folder(
-    folder_id: str,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Get a specific folder.
-
-    Args:
-        folder_id: ID of the folder to retrieve.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The requested LibraryFolder.
-    """
-    return await library_db.get_folder(folder_id=folder_id, user_id=user_id)
-
-
-@router.post(
-    "",
-    summary="Create Folder",
-    status_code=status.HTTP_201_CREATED,
-    response_model=library_model.LibraryFolder,
-    responses={
-        201: {"description": "Folder created successfully"},
-        400: {"description": "Validation error"},
-        404: {"description": "Parent folder not found"},
-        409: {"description": "Folder name conflict"},
-        500: {"description": "Server error"},
-    },
-)
-async def create_folder(
-    payload: library_model.FolderCreateRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Create a new folder.
-
-    Args:
-        payload: The folder creation request.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The created LibraryFolder.
-    """
-    return await library_db.create_folder(
-        user_id=user_id,
-        name=payload.name,
-        parent_id=payload.parent_id,
-        icon=payload.icon,
-        color=payload.color,
-    )
-
-
-@router.patch(
-    "/{folder_id}",
-    summary="Update Folder",
-    response_model=library_model.LibraryFolder,
-    responses={
-        200: {"description": "Folder updated successfully"},
-        400: {"description": "Validation error"},
-        404: {"description": "Folder not found"},
-        409: {"description": "Folder name conflict"},
-        500: {"description": "Server error"},
-    },
-)
-async def update_folder(
-    folder_id: str,
-    payload: library_model.FolderUpdateRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Update a folder's properties.
-
-    Args:
-        folder_id: ID of the folder to update.
-        payload: The folder update request.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The updated LibraryFolder.
-    """
-    return await library_db.update_folder(
-        folder_id=folder_id,
-        user_id=user_id,
-        name=payload.name,
-        icon=payload.icon,
-        color=payload.color,
-    )
-
-
-@router.post(
-    "/{folder_id}/move",
-    summary="Move Folder",
-    response_model=library_model.LibraryFolder,
-    responses={
-        200: {"description": "Folder moved successfully"},
-        400: {"description": "Validation error (circular reference)"},
-        404: {"description": "Folder or target parent not found"},
-        409: {"description": "Folder name conflict in target location"},
-        500: {"description": "Server error"},
-    },
-)
-async def move_folder(
-    folder_id: str,
-    payload: library_model.FolderMoveRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> library_model.LibraryFolder:
-    """
-    Move a folder to a new parent.
-
-    Args:
-        folder_id: ID of the folder to move.
-        payload: The move request with target parent.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The moved LibraryFolder.
-    """
-    return await library_db.move_folder(
-        folder_id=folder_id,
-        user_id=user_id,
-        target_parent_id=payload.target_parent_id,
-    )
-
-
-@router.delete(
-    "/{folder_id}",
-    summary="Delete Folder",
-    status_code=status.HTTP_204_NO_CONTENT,
-    responses={
-        204: {"description": "Folder deleted successfully"},
-        404: {"description": "Folder not found"},
-        500: {"description": "Server error"},
-    },
-)
-async def delete_folder(
-    folder_id: str,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> Response:
-    """
-    Soft-delete a folder and all its contents.
-
-    Args:
-        folder_id: ID of the folder to delete.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        204 No Content if successful.
-    """
-    await library_db.delete_folder(
-        folder_id=folder_id,
-        user_id=user_id,
-        soft_delete=True,
-    )
-    return Response(status_code=status.HTTP_204_NO_CONTENT)
-
-
-# === Bulk Agent Operations ===
-
-
-@router.post(
-    "/agents/bulk-move",
-    summary="Bulk Move Agents",
-    response_model=list[library_model.LibraryAgent],
-    responses={
-        200: {"description": "Agents moved successfully"},
-        404: {"description": "Folder not found"},
-        500: {"description": "Server error"},
-    },
-)
-async def bulk_move_agents(
-    payload: library_model.BulkMoveAgentsRequest,
-    user_id: str = Security(autogpt_auth_lib.get_user_id),
-) -> list[library_model.LibraryAgent]:
-    """
-    Move multiple agents to a folder.
-
-    Args:
-        payload: The bulk move request with agent IDs and target folder.
-        user_id: ID of the authenticated user.
-
-    Returns:
-        The updated LibraryAgents.
-    """
-    return await library_db.bulk_move_agents_to_folder(
-        agent_ids=payload.agent_ids,
-        folder_id=payload.folder_id,
-        user_id=user_id,
-    )
--- a/autogpt_platform/backend/backend/api/features/library/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/library/routes_test.py
@@ -115,8 +115,6 @@ async def test_get_library_agents_success(
        sort_by=library_model.LibraryAgentSort.UPDATED_AT,
        page=1,
        page_size=15,
-        folder_id=None,
-        include_root_only=False,
    )


--- a/autogpt_platform/backend/backend/api/features/mcp/routes.py
+++ b/autogpt_platform/backend/backend/api/features/mcp/routes.py
@@ -7,24 +7,20 @@ frontend can list available tools on an MCP server before placing a block.

 import logging
 from typing import Annotated, Any
+from urllib.parse import urlparse

 import fastapi
 from autogpt_libs.auth import get_user_id
 from fastapi import Security
-from pydantic import BaseModel, Field, SecretStr
+from pydantic import BaseModel, Field

 from backend.api.features.integrations.router import CredentialsMetaResponse
 from backend.blocks.mcp.client import MCPClient, MCPClientError
-from backend.blocks.mcp.helpers import (
-    auto_lookup_mcp_credential,
-    normalize_mcp_url,
-    server_host,
-)
 from backend.blocks.mcp.oauth import MCPOAuthHandler
 from backend.data.model import OAuth2Credentials
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.integrations.providers import ProviderName
-from backend.util.request import HTTPClientError, Requests, validate_url
+from backend.util.request import HTTPClientError, Requests
 from backend.util.settings import Settings

 logger = logging.getLogger(__name__)
@@ -78,20 +74,32 @@ async def discover_tools(
    If the user has a stored MCP credential for this server URL, it will be
    used automatically — no need to pass an explicit auth token.
    """
-    # Validate URL to prevent SSRF — blocks loopback and private IP ranges.
-    try:
-        await validate_url(request.server_url, trusted_origins=[])
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
-
    auth_token = request.auth_token

    # Auto-use stored MCP credential when no explicit token is provided.
    if not auth_token:
-        best_cred = await auto_lookup_mcp_credential(
-            user_id, normalize_mcp_url(request.server_url)
+        mcp_creds = await creds_manager.store.get_creds_by_provider(
+            user_id, ProviderName.MCP.value
        )
+        # Find the freshest credential for this server URL
+        best_cred: OAuth2Credentials | None = None
+        for cred in mcp_creds:
+            if (
+                isinstance(cred, OAuth2Credentials)
+                and (cred.metadata or {}).get("mcp_server_url") == request.server_url
+            ):
+                if best_cred is None or (
+                    (cred.access_token_expires_at or 0)
+                    > (best_cred.access_token_expires_at or 0)
+                ):
+                    best_cred = cred
        if best_cred:
+            # Refresh the token if expired before using it
+            best_cred = await creds_manager.refresh_if_needed(user_id, best_cred)
+            logger.info(
+                f"Using MCP credential {best_cred.id} for {request.server_url}, "
+                f"expires_at={best_cred.access_token_expires_at}"
+            )
            auth_token = best_cred.access_token.get_secret_value()

    client = MCPClient(request.server_url, auth_token=auth_token)
@@ -126,7 +134,7 @@ async def discover_tools(
        ],
        server_name=(
            init_result.get("serverInfo", {}).get("name")
-            or server_host(request.server_url)
+            or urlparse(request.server_url).hostname
            or "MCP"
        ),
        protocol_version=init_result.get("protocolVersion"),
@@ -165,16 +173,7 @@ async def mcp_oauth_login(
    3. Performs Dynamic Client Registration (RFC 7591) if available
    4. Returns the authorization URL for the frontend to open in a popup
    """
-    # Validate URL to prevent SSRF — blocks loopback and private IP ranges.
-    try:
-        await validate_url(request.server_url, trusted_origins=[])
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
-
-    # Normalize the URL so that credentials stored here are matched consistently
-    # by auto_lookup_mcp_credential (which also uses normalized URLs).
-    server_url = normalize_mcp_url(request.server_url)
-    client = MCPClient(server_url)
+    client = MCPClient(request.server_url)

    # Step 1: Discover protected-resource metadata (RFC 9728)
    protected_resource = await client.discover_auth()
@@ -183,16 +182,7 @@ async def mcp_oauth_login(

    if protected_resource and protected_resource.get("authorization_servers"):
        auth_server_url = protected_resource["authorization_servers"][0]
-        resource_url = protected_resource.get("resource", server_url)
-
-        # Validate the auth server URL from metadata to prevent SSRF.
-        try:
-            await validate_url(auth_server_url, trusted_origins=[])
-        except ValueError as e:
-            raise fastapi.HTTPException(
-                status_code=400,
-                detail=f"Invalid authorization server URL in metadata: {e}",
-            )
+        resource_url = protected_resource.get("resource", request.server_url)

        # Step 2a: Discover auth-server metadata (RFC 8414)
        metadata = await client.discover_auth_server_metadata(auth_server_url)
@@ -202,7 +192,7 @@ async def mcp_oauth_login(
        # Don't assume a resource_url — omitting it lets the auth server choose
        # the correct audience for the token (RFC 8707 resource is optional).
        resource_url = None
-        metadata = await client.discover_auth_server_metadata(server_url)
+        metadata = await client.discover_auth_server_metadata(request.server_url)

    if (
        not metadata
@@ -232,18 +222,12 @@ async def mcp_oauth_login(
    client_id = ""
    client_secret = ""
    if registration_endpoint:
-        # Validate the registration endpoint to prevent SSRF via metadata.
-        try:
-            await validate_url(registration_endpoint, trusted_origins=[])
-        except ValueError:
-            pass  # Skip registration, fall back to default client_id
-        else:
-            reg_result = await _register_mcp_client(
-                registration_endpoint, redirect_uri, server_url
-            )
-            if reg_result:
-                client_id = reg_result.get("client_id", "")
-                client_secret = reg_result.get("client_secret", "")
+        reg_result = await _register_mcp_client(
+            registration_endpoint, redirect_uri, request.server_url
+        )
+        if reg_result:
+            client_id = reg_result.get("client_id", "")
+            client_secret = reg_result.get("client_secret", "")

    if not client_id:
        client_id = "autogpt-platform"
@@ -261,7 +245,7 @@ async def mcp_oauth_login(
            "token_url": token_url,
            "revoke_url": revoke_url,
            "resource_url": resource_url,
-            "server_url": server_url,
+            "server_url": request.server_url,
            "client_id": client_id,
            "client_secret": client_secret,
        },
@@ -358,7 +342,7 @@ async def mcp_oauth_callback(
    credentials.metadata["mcp_token_url"] = meta["token_url"]
    credentials.metadata["mcp_resource_url"] = meta.get("resource_url", "")

-    hostname = server_host(meta["server_url"])
+    hostname = urlparse(meta["server_url"]).hostname or meta["server_url"]
    credentials.title = f"MCP: {hostname}"

    # Remove old MCP credentials for the same server to prevent stale token buildup.
@@ -373,9 +357,7 @@ async def mcp_oauth_callback(
            ):
                await creds_manager.store.delete_creds_by_id(user_id, old.id)
                logger.info(
-                    "Removed old MCP credential %s for %s",
-                    old.id,
-                    server_host(meta["server_url"]),
+                    f"Removed old MCP credential {old.id} for {meta['server_url']}"
                )
    except Exception:
        logger.debug("Could not clean up old MCP credentials", exc_info=True)
@@ -393,93 +375,6 @@ async def mcp_oauth_callback(
    )


-# ======================== Bearer Token ======================== #
-
-
-class MCPStoreTokenRequest(BaseModel):
-    """Request to store a bearer token for an MCP server that doesn't support OAuth."""
-
-    server_url: str = Field(
-        description="MCP server URL the token authenticates against"
-    )
-    token: SecretStr = Field(
-        min_length=1, description="Bearer token / API key for the MCP server"
-    )
-
-
-@router.post(
-    "/token",
-    summary="Store a bearer token for an MCP server",
-)
-async def mcp_store_token(
-    request: MCPStoreTokenRequest,
-    user_id: Annotated[str, Security(get_user_id)],
-) -> CredentialsMetaResponse:
-    """
-    Store a manually provided bearer token as an MCP credential.
-
-    Used by the Copilot MCPSetupCard when the server doesn't support the MCP
-    OAuth discovery flow (returns 400 from /oauth/login).  Subsequent
-    ``run_mcp_tool`` calls will automatically pick up the token via
-    ``_auto_lookup_credential``.
-    """
-    token = request.token.get_secret_value().strip()
-    if not token:
-        raise fastapi.HTTPException(status_code=422, detail="Token must not be blank.")
-
-    # Validate URL to prevent SSRF — blocks loopback and private IP ranges.
-    try:
-        await validate_url(request.server_url, trusted_origins=[])
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
-
-    # Normalize URL so trailing-slash variants match existing credentials.
-    server_url = normalize_mcp_url(request.server_url)
-    hostname = server_host(server_url)
-
-    # Collect IDs of old credentials to clean up after successful create.
-    old_cred_ids: list[str] = []
-    try:
-        old_creds = await creds_manager.store.get_creds_by_provider(
-            user_id, ProviderName.MCP.value
-        )
-        old_cred_ids = [
-            old.id
-            for old in old_creds
-            if isinstance(old, OAuth2Credentials)
-            and normalize_mcp_url((old.metadata or {}).get("mcp_server_url", ""))
-            == server_url
-        ]
-    except Exception:
-        logger.debug("Could not query old MCP token credentials", exc_info=True)
-
-    credentials = OAuth2Credentials(
-        provider=ProviderName.MCP.value,
-        title=f"MCP: {hostname}",
-        access_token=SecretStr(token),
-        scopes=[],
-        metadata={"mcp_server_url": server_url},
-    )
-    await creds_manager.create(user_id, credentials)
-
-    # Only delete old credentials after the new one is safely stored.
-    for old_id in old_cred_ids:
-        try:
-            await creds_manager.store.delete_creds_by_id(user_id, old_id)
-        except Exception:
-            logger.debug("Could not clean up old MCP token credential", exc_info=True)
-
-    return CredentialsMetaResponse(
-        id=credentials.id,
-        provider=credentials.provider,
-        type=credentials.type,
-        title=credentials.title,
-        scopes=credentials.scopes,
-        username=credentials.username,
-        host=hostname,
-    )
-
-
 # ======================== Helpers ======================== #


@@ -505,7 +400,5 @@ async def _register_mcp_client(
            return data
        return None
    except Exception as e:
-        logger.warning(
-            "Dynamic client registration failed for %s: %s", server_host(server_url), e
-        )
+        logger.warning(f"Dynamic client registration failed for {server_url}: {e}")
        return None
--- a/autogpt_platform/backend/backend/api/features/mcp/test_routes.py
+++ b/autogpt_platform/backend/backend/api/features/mcp/test_routes.py
@@ -11,11 +11,9 @@ import httpx
 import pytest
 import pytest_asyncio
 from autogpt_libs.auth import get_user_id
-from pydantic import SecretStr

 from backend.api.features.mcp.routes import router
 from backend.blocks.mcp.client import MCPClientError, MCPTool
-from backend.data.model import OAuth2Credentials
 from backend.util.request import HTTPClientError

 app = fastapi.FastAPI()
@@ -30,16 +28,6 @@ async def client():
        yield c


-@pytest.fixture(autouse=True)
-def _bypass_ssrf_validation():
-    """Bypass validate_url in all route tests (test URLs don't resolve)."""
-    with patch(
-        "backend.api.features.mcp.routes.validate_url",
-        new_callable=AsyncMock,
-    ):
-        yield
-
-
 class TestDiscoverTools:
    @pytest.mark.asyncio(loop_scope="session")
    async def test_discover_tools_success(self, client):
@@ -68,12 +56,9 @@ class TestDiscoverTools:

        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                return_value={
@@ -122,6 +107,10 @@ class TestDiscoverTools:
    @pytest.mark.asyncio(loop_scope="session")
    async def test_discover_tools_auto_uses_stored_credential(self, client):
        """When no explicit token is given, stored MCP credentials are used."""
+        from pydantic import SecretStr
+
+        from backend.data.model import OAuth2Credentials
+
        stored_cred = OAuth2Credentials(
            provider="mcp",
            title="MCP: example.com",
@@ -135,12 +124,10 @@ class TestDiscoverTools:

        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=stored_cred,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[stored_cred])
+            mock_cm.refresh_if_needed = AsyncMock(return_value=stored_cred)
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                return_value={"serverInfo": {}, "protocolVersion": "2025-03-26"}
@@ -162,12 +149,9 @@ class TestDiscoverTools:
    async def test_discover_tools_mcp_error(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                side_effect=MCPClientError("Connection refused")
@@ -185,12 +169,9 @@ class TestDiscoverTools:
    async def test_discover_tools_generic_error(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(side_effect=Exception("Network timeout"))

@@ -206,12 +187,9 @@ class TestDiscoverTools:
    async def test_discover_tools_auth_required(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                side_effect=HTTPClientError("HTTP 401 Error: Unauthorized", 401)
@@ -229,12 +207,9 @@ class TestDiscoverTools:
    async def test_discover_tools_forbidden(self, client):
        with (
            patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
-            patch(
-                "backend.api.features.mcp.routes.auto_lookup_mcp_credential",
-                new_callable=AsyncMock,
-                return_value=None,
-            ),
+            patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
        ):
+            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
            instance = MockClient.return_value
            instance.initialize = AsyncMock(
                side_effect=HTTPClientError("HTTP 403 Error: Forbidden", 403)
@@ -356,6 +331,10 @@ class TestOAuthLogin:
 class TestOAuthCallback:
    @pytest.mark.asyncio(loop_scope="session")
    async def test_oauth_callback_success(self, client):
+        from pydantic import SecretStr
+
+        from backend.data.model import OAuth2Credentials
+
        mock_creds = OAuth2Credentials(
            provider="mcp",
            title=None,
@@ -455,118 +434,3 @@ class TestOAuthCallback:

        assert response.status_code == 400
        assert "token exchange failed" in response.json()["detail"].lower()
-
-
-class TestStoreToken:
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_success(self, client):
-        with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
-            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
-            mock_cm.create = AsyncMock()
-
-            response = await client.post(
-                "/token",
-                json={
-                    "server_url": "https://mcp.example.com/mcp",
-                    "token": "my-api-key-123",
-                },
-            )
-
-        assert response.status_code == 200
-        data = response.json()
-        assert data["provider"] == "mcp"
-        assert data["type"] == "oauth2"
-        assert data["host"] == "mcp.example.com"
-        mock_cm.create.assert_called_once()
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_blank_rejected(self, client):
-        """Blank token string (after stripping) should return 422."""
-        response = await client.post(
-            "/token",
-            json={
-                "server_url": "https://mcp.example.com/mcp",
-                "token": "   ",
-            },
-        )
-        # Pydantic min_length=1 catches the whitespace-only token
-        assert response.status_code == 422
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_replaces_old_credential(self, client):
-        old_cred = OAuth2Credentials(
-            provider="mcp",
-            title="MCP: mcp.example.com",
-            access_token=SecretStr("old-token"),
-            scopes=[],
-            metadata={"mcp_server_url": "https://mcp.example.com/mcp"},
-        )
-        with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
-            mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[old_cred])
-            mock_cm.create = AsyncMock()
-            mock_cm.store.delete_creds_by_id = AsyncMock()
-
-            response = await client.post(
-                "/token",
-                json={
-                    "server_url": "https://mcp.example.com/mcp",
-                    "token": "new-token",
-                },
-            )
-
-        assert response.status_code == 200
-        mock_cm.store.delete_creds_by_id.assert_called_once_with(
-            "test-user-id", old_cred.id
-        )
-
-
-class TestSSRFValidation:
-    """Verify that validate_url is enforced on all endpoints."""
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_discover_tools_ssrf_blocked(self, client):
-        with patch(
-            "backend.api.features.mcp.routes.validate_url",
-            new_callable=AsyncMock,
-            side_effect=ValueError("blocked loopback"),
-        ):
-            response = await client.post(
-                "/discover-tools",
-                json={"server_url": "http://localhost/mcp"},
-            )
-
-        assert response.status_code == 400
-        assert "blocked loopback" in response.json()["detail"].lower()
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_oauth_login_ssrf_blocked(self, client):
-        with patch(
-            "backend.api.features.mcp.routes.validate_url",
-            new_callable=AsyncMock,
-            side_effect=ValueError("blocked private IP"),
-        ):
-            response = await client.post(
-                "/oauth/login",
-                json={"server_url": "http://10.0.0.1/mcp"},
-            )
-
-        assert response.status_code == 400
-        assert "blocked private ip" in response.json()["detail"].lower()
-
-    @pytest.mark.asyncio(loop_scope="session")
-    async def test_store_token_ssrf_blocked(self, client):
-        with patch(
-            "backend.api.features.mcp.routes.validate_url",
-            new_callable=AsyncMock,
-            side_effect=ValueError("blocked loopback"),
-        ):
-            response = await client.post(
-                "/token",
-                json={
-                    "server_url": "http://127.0.0.1/mcp",
-                    "token": "some-token",
-                },
-            )
-
-        assert response.status_code == 400
-        assert "blocked loopback" in response.json()["detail"].lower()
--- a/autogpt_platform/backend/backend/api/features/store/cache.py
+++ b/autogpt_platform/backend/backend/api/features/store/cache.py
@@ -1,3 +1,5 @@
+from typing import Literal
+
 from backend.util.cache import cached

 from . import db as store_db
@@ -21,7 +23,7 @@ def clear_all_caches():
 async def _get_cached_store_agents(
    featured: bool,
    creator: str | None,
-    sorted_by: store_db.StoreAgentsSortOptions | None,
+    sorted_by: Literal["rating", "runs", "name", "updated_at"] | None,
    search_query: str | None,
    category: str | None,
    page: int,
@@ -55,7 +57,7 @@ async def _get_cached_agent_details(
 async def _get_cached_store_creators(
    featured: bool,
    search_query: str | None,
-    sorted_by: store_db.StoreCreatorsSortOptions | None,
+    sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None,
    page: int,
    page_size: int,
 ):
@@ -73,4 +75,4 @@ async def _get_cached_store_creators(
 @cached(maxsize=100, ttl_seconds=300, shared_cache=True)
 async def _get_cached_creator_details(username: str):
    """Cached helper to get creator details."""
-    return await store_db.get_store_creator(username=username.lower())
+    return await store_db.get_store_creator_details(username=username.lower())
--- a/autogpt_platform/backend/backend/api/features/store/content_handlers.py
+++ b/autogpt_platform/backend/backend/api/features/store/content_handlers.py
@@ -9,26 +9,15 @@ import logging
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
 from pathlib import Path
-from typing import Any, get_args, get_origin
+from typing import Any

 from prisma.enums import ContentType

-from backend.blocks.llm import LlmModel
 from backend.data.db import query_raw_with_schema

 logger = logging.getLogger(__name__)


-def _contains_type(annotation: Any, target: type) -> bool:
-    """Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
-    if annotation is target:
-        return True
-    origin = get_origin(annotation)
-    if origin is None:
-        return False
-    return any(_contains_type(arg, target) for arg in get_args(annotation))
-
-
 @dataclass
 class ContentItem:
    """Represents a piece of content to be embedded."""
@@ -199,51 +188,45 @@ class BlockHandler(ContentHandler):
            try:
                block_instance = block_cls()

+                # Skip disabled blocks - they shouldn't be indexed
                if block_instance.disabled:
                    continue

                # Build searchable text from block metadata
                parts = []
-                if block_instance.name:
+                if hasattr(block_instance, "name") and block_instance.name:
                    parts.append(block_instance.name)
-                if block_instance.description:
+                if (
+                    hasattr(block_instance, "description")
+                    and block_instance.description
+                ):
                    parts.append(block_instance.description)
-                if block_instance.categories:
+                if hasattr(block_instance, "categories") and block_instance.categories:
+                    # Convert BlockCategory enum to strings
                    parts.append(
                        " ".join(str(cat.value) for cat in block_instance.categories)
                    )

-                # Add input schema field descriptions
-                block_input_fields = block_instance.input_schema.model_fields
-                parts += [
-                    f"{field_name}: {field_info.description}"
-                    for field_name, field_info in block_input_fields.items()
-                    if field_info.description
-                ]
+                # Add input/output schema info
+                if hasattr(block_instance, "input_schema"):
+                    schema = block_instance.input_schema
+                    if hasattr(schema, "model_json_schema"):
+                        schema_dict = schema.model_json_schema()
+                        if "properties" in schema_dict:
+                            for prop_name, prop_info in schema_dict[
+                                "properties"
+                            ].items():
+                                if "description" in prop_info:
+                                    parts.append(
+                                        f"{prop_name}: {prop_info['description']}"
+                                    )

                searchable_text = " ".join(parts)

+                # Convert categories set of enums to list of strings for JSON serialization
+                categories = getattr(block_instance, "categories", set())
                categories_list = (
-                    [cat.value for cat in block_instance.categories]
-                    if block_instance.categories
-                    else []
-                )
-
-                # Extract provider names from credentials fields
-                credentials_info = (
-                    block_instance.input_schema.get_credentials_fields_info()
-                )
-                is_integration = len(credentials_info) > 0
-                provider_names = [
-                    provider.value.lower()
-                    for info in credentials_info.values()
-                    for provider in info.provider
-                ]
-
-                # Check if block has LlmModel field in input schema
-                has_llm_model_field = any(
-                    _contains_type(field.annotation, LlmModel)
-                    for field in block_instance.input_schema.model_fields.values()
+                    [cat.value for cat in categories] if categories else []
                )

                items.append(
@@ -252,11 +235,8 @@ class BlockHandler(ContentHandler):
                        content_type=ContentType.BLOCK,
                        searchable_text=searchable_text,
                        metadata={
-                            "name": block_instance.name,
+                            "name": getattr(block_instance, "name", ""),
                            "categories": categories_list,
-                            "providers": provider_names,
-                            "has_llm_model_field": has_llm_model_field,
-                            "is_integration": is_integration,
                        },
                        user_id=None,  # Blocks are public
                    )
--- a/autogpt_platform/backend/backend/api/features/store/content_handlers_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/content_handlers_test.py
@@ -82,10 +82,9 @@ async def test_block_handler_get_missing_items(mocker):
    mock_block_instance.description = "Performs calculations"
    mock_block_instance.categories = [MagicMock(value="MATH")]
    mock_block_instance.disabled = False
-    mock_field = MagicMock()
-    mock_field.description = "Math expression to evaluate"
-    mock_block_instance.input_schema.model_fields = {"expression": mock_field}
-    mock_block_instance.input_schema.get_credentials_fields_info.return_value = {}
+    mock_block_instance.input_schema.model_json_schema.return_value = {
+        "properties": {"expression": {"description": "Math expression to evaluate"}}
+    }
    mock_block_class.return_value = mock_block_instance

    mock_blocks = {"block-uuid-1": mock_block_class}
@@ -310,19 +309,19 @@ async def test_content_handlers_registry():


 @pytest.mark.asyncio(loop_scope="session")
-async def test_block_handler_handles_empty_attributes():
-    """Test BlockHandler handles blocks with empty/falsy attribute values."""
+async def test_block_handler_handles_missing_attributes():
+    """Test BlockHandler gracefully handles blocks with missing attributes."""
    handler = BlockHandler()

-    # Mock block with empty values (all attributes exist but are falsy)
+    # Mock block with minimal attributes
    mock_block_class = MagicMock()
    mock_block_instance = MagicMock()
    mock_block_instance.name = "Minimal Block"
    mock_block_instance.disabled = False
-    mock_block_instance.description = ""
-    mock_block_instance.categories = set()
-    mock_block_instance.input_schema.model_fields = {}
-    mock_block_instance.input_schema.get_credentials_fields_info.return_value = {}
+    # No description, categories, or schema
+    del mock_block_instance.description
+    del mock_block_instance.categories
+    del mock_block_instance.input_schema
    mock_block_class.return_value = mock_block_instance

    mock_blocks = {"block-minimal": mock_block_class}
@@ -353,8 +352,6 @@ async def test_block_handler_skips_failed_blocks():
    good_instance.description = "Works fine"
    good_instance.categories = []
    good_instance.disabled = False
-    good_instance.input_schema.model_fields = {}
-    good_instance.input_schema.get_credentials_fields_info.return_value = {}
    good_block.return_value = good_instance

    bad_block = MagicMock()
--- a/autogpt_platform/backend/backend/api/features/store/db.py
+++ b/autogpt_platform/backend/backend/api/features/store/db.py
--- a/autogpt_platform/backend/backend/api/features/store/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/db_test.py
@@ -26,7 +26,7 @@ async def test_get_store_agents(mocker):
    mock_agents = [
        prisma.models.StoreAgent(
            listing_id="test-id",
-            listing_version_id="version123",
+            storeListingVersionId="version123",
            slug="test-agent",
            agent_name="Test Agent",
            agent_video=None,
@@ -40,11 +40,11 @@ async def test_get_store_agents(mocker):
            runs=10,
            rating=4.5,
            versions=["1.0"],
-            graph_id="test-graph-id",
-            graph_versions=["1"],
+            agentGraphVersions=["1"],
+            agentGraphId="test-graph-id",
            updated_at=datetime.now(),
            is_available=False,
-            use_for_onboarding=False,
+            useForOnboarding=False,
        )
    ]

@@ -68,10 +68,10 @@ async def test_get_store_agents(mocker):

 @pytest.mark.asyncio(loop_scope="session")
 async def test_get_store_agent_details(mocker):
-    # Mock data - StoreAgent view already contains the active version data
+    # Mock data
    mock_agent = prisma.models.StoreAgent(
        listing_id="test-id",
-        listing_version_id="version123",
+        storeListingVersionId="version123",
        slug="test-agent",
        agent_name="Test Agent",
        agent_video="video.mp4",
@@ -85,38 +85,102 @@ async def test_get_store_agent_details(mocker):
        runs=10,
        rating=4.5,
        versions=["1.0"],
-        graph_id="test-graph-id",
-        graph_versions=["1"],
+        agentGraphVersions=["1"],
+        agentGraphId="test-graph-id",
        updated_at=datetime.now(),
-        is_available=True,
-        use_for_onboarding=False,
+        is_available=False,
+        useForOnboarding=False,
    )

-    # Mock StoreAgent prisma call
+    # Mock active version agent (what we want to return for active version)
+    mock_active_agent = prisma.models.StoreAgent(
+        listing_id="test-id",
+        storeListingVersionId="active-version-id",
+        slug="test-agent",
+        agent_name="Test Agent Active",
+        agent_video="active_video.mp4",
+        agent_image=["active_image.jpg"],
+        featured=False,
+        creator_username="creator",
+        creator_avatar="avatar.jpg",
+        sub_heading="Test heading active",
+        description="Test description active",
+        categories=["test"],
+        runs=15,
+        rating=4.8,
+        versions=["1.0", "2.0"],
+        agentGraphVersions=["1", "2"],
+        agentGraphId="test-graph-id-active",
+        updated_at=datetime.now(),
+        is_available=True,
+        useForOnboarding=False,
+    )
+
+    # Create a mock StoreListing result
+    mock_store_listing = mocker.MagicMock()
+    mock_store_listing.activeVersionId = "active-version-id"
+    mock_store_listing.hasApprovedVersion = True
+    mock_store_listing.ActiveVersion = mocker.MagicMock()
+    mock_store_listing.ActiveVersion.recommendedScheduleCron = None
+
+    # Mock StoreAgent prisma call - need to handle multiple calls
    mock_store_agent = mocker.patch("prisma.models.StoreAgent.prisma")
-    mock_store_agent.return_value.find_first = mocker.AsyncMock(return_value=mock_agent)
+
+    # Set up side_effect to return different results for different calls
+    def mock_find_first_side_effect(*args, **kwargs):
+        where_clause = kwargs.get("where", {})
+        if "storeListingVersionId" in where_clause:
+            # Second call for active version
+            return mock_active_agent
+        else:
+            # First call for initial lookup
+            return mock_agent
+
+    mock_store_agent.return_value.find_first = mocker.AsyncMock(
+        side_effect=mock_find_first_side_effect
+    )
+
+    # Mock Profile prisma call
+    mock_profile = mocker.MagicMock()
+    mock_profile.userId = "user-id-123"
+    mock_profile_db = mocker.patch("prisma.models.Profile.prisma")
+    mock_profile_db.return_value.find_first = mocker.AsyncMock(
+        return_value=mock_profile
+    )
+
+    # Mock StoreListing prisma call
+    mock_store_listing_db = mocker.patch("prisma.models.StoreListing.prisma")
+    mock_store_listing_db.return_value.find_first = mocker.AsyncMock(
+        return_value=mock_store_listing
+    )

    # Call function
    result = await db.get_store_agent_details("creator", "test-agent")

-    # Verify results - constructed from the StoreAgent view
+    # Verify results - should use active version data
    assert result.slug == "test-agent"
-    assert result.agent_name == "Test Agent"
-    assert result.active_version_id == "version123"
+    assert result.agent_name == "Test Agent Active"  # From active version
+    assert result.active_version_id == "active-version-id"
    assert result.has_approved_version is True
-    assert result.store_listing_version_id == "version123"
-    assert result.graph_id == "test-graph-id"
-    assert result.runs == 10
-    assert result.rating == 4.5
+    assert (
+        result.store_listing_version_id == "active-version-id"
+    )  # Should be active version ID

-    # Verify single StoreAgent lookup
-    mock_store_agent.return_value.find_first.assert_called_once_with(
+    # Verify mocks called correctly - now expecting 2 calls
+    assert mock_store_agent.return_value.find_first.call_count == 2
+
+    # Check the specific calls
+    calls = mock_store_agent.return_value.find_first.call_args_list
+    assert calls[0] == mocker.call(
        where={"creator_username": "creator", "slug": "test-agent"}
    )
+    assert calls[1] == mocker.call(where={"storeListingVersionId": "active-version-id"})
+
+    mock_store_listing_db.return_value.find_first.assert_called_once()


 @pytest.mark.asyncio(loop_scope="session")
-async def test_get_store_creator(mocker):
+async def test_get_store_creator_details(mocker):
    # Mock data
    mock_creator_data = prisma.models.Creator(
        name="Test Creator",
@@ -138,7 +202,7 @@ async def test_get_store_creator(mocker):
    mock_creator.return_value.find_unique.return_value = mock_creator_data

    # Call function
-    result = await db.get_store_creator("creator")
+    result = await db.get_store_creator_details("creator")

    # Verify results
    assert result.username == "creator"
@@ -154,110 +218,61 @@ async def test_get_store_creator(mocker):

 @pytest.mark.asyncio(loop_scope="session")
 async def test_create_store_submission(mocker):
-    now = datetime.now()
-
-    # Mock agent graph (with no pending submissions) and user with profile
-    mock_profile = prisma.models.Profile(
-        id="profile-id",
-        userId="user-id",
-        name="Test User",
-        username="testuser",
-        description="Test",
-        isFeatured=False,
-        links=[],
-        createdAt=now,
-        updatedAt=now,
-    )
-    mock_user = prisma.models.User(
-        id="user-id",
-        email="test@example.com",
-        createdAt=now,
-        updatedAt=now,
-        Profile=[mock_profile],
-        emailVerified=True,
-        metadata="{}",  # type: ignore[reportArgumentType]
-        integrations="",
-        maxEmailsPerDay=1,
-        notifyOnAgentRun=True,
-        notifyOnZeroBalance=True,
-        notifyOnLowBalance=True,
-        notifyOnBlockExecutionFailed=True,
-        notifyOnContinuousAgentError=True,
-        notifyOnDailySummary=True,
-        notifyOnWeeklySummary=True,
-        notifyOnMonthlySummary=True,
-        notifyOnAgentApproved=True,
-        notifyOnAgentRejected=True,
-        timezone="Europe/Delft",
-    )
+    # Mock data
    mock_agent = prisma.models.AgentGraph(
        id="agent-id",
        version=1,
        userId="user-id",
-        createdAt=now,
+        createdAt=datetime.now(),
        isActive=True,
-        StoreListingVersions=[],
-        User=mock_user,
    )

-    # Mock the created StoreListingVersion (returned by create)
-    mock_store_listing_obj = prisma.models.StoreListing(
+    mock_listing = prisma.models.StoreListing(
        id="listing-id",
-        createdAt=now,
-        updatedAt=now,
+        createdAt=datetime.now(),
+        updatedAt=datetime.now(),
        isDeleted=False,
        hasApprovedVersion=False,
        slug="test-agent",
        agentGraphId="agent-id",
-        owningUserId="user-id",
-        useForOnboarding=False,
-    )
-    mock_version = prisma.models.StoreListingVersion(
-        id="version-id",
-        agentGraphId="agent-id",
        agentGraphVersion=1,
-        name="Test Agent",
-        description="Test description",
-        createdAt=now,
-        updatedAt=now,
-        subHeading="",
-        imageUrls=[],
-        categories=[],
-        isFeatured=False,
-        isDeleted=False,
-        version=1,
-        storeListingId="listing-id",
-        submissionStatus=prisma.enums.SubmissionStatus.PENDING,
-        isAvailable=True,
-        submittedAt=now,
-        StoreListing=mock_store_listing_obj,
+        owningUserId="user-id",
+        Versions=[
+            prisma.models.StoreListingVersion(
+                id="version-id",
+                agentGraphId="agent-id",
+                agentGraphVersion=1,
+                name="Test Agent",
+                description="Test description",
+                createdAt=datetime.now(),
+                updatedAt=datetime.now(),
+                subHeading="Test heading",
+                imageUrls=["image.jpg"],
+                categories=["test"],
+                isFeatured=False,
+                isDeleted=False,
+                version=1,
+                storeListingId="listing-id",
+                submissionStatus=prisma.enums.SubmissionStatus.PENDING,
+                isAvailable=True,
+            )
+        ],
+        useForOnboarding=False,
    )

    # Mock prisma calls
    mock_agent_graph = mocker.patch("prisma.models.AgentGraph.prisma")
    mock_agent_graph.return_value.find_first = mocker.AsyncMock(return_value=mock_agent)

-    # Mock transaction context manager
-    mock_tx = mocker.MagicMock()
-    mocker.patch(
-        "backend.api.features.store.db.transaction",
-        return_value=mocker.AsyncMock(
-            __aenter__=mocker.AsyncMock(return_value=mock_tx),
-            __aexit__=mocker.AsyncMock(return_value=False),
-        ),
-    )
-
-    mock_sl = mocker.patch("prisma.models.StoreListing.prisma")
-    mock_sl.return_value.find_unique = mocker.AsyncMock(return_value=None)
-
-    mock_slv = mocker.patch("prisma.models.StoreListingVersion.prisma")
-    mock_slv.return_value.create = mocker.AsyncMock(return_value=mock_version)
+    mock_store_listing = mocker.patch("prisma.models.StoreListing.prisma")
+    mock_store_listing.return_value.find_first = mocker.AsyncMock(return_value=None)
+    mock_store_listing.return_value.create = mocker.AsyncMock(return_value=mock_listing)

    # Call function
    result = await db.create_store_submission(
        user_id="user-id",
-        graph_id="agent-id",
-        graph_version=1,
+        agent_id="agent-id",
+        agent_version=1,
        slug="test-agent",
        name="Test Agent",
        description="Test description",
@@ -266,11 +281,11 @@ async def test_create_store_submission(mocker):
    # Verify results
    assert result.name == "Test Agent"
    assert result.description == "Test description"
-    assert result.listing_version_id == "version-id"
+    assert result.store_listing_version_id == "version-id"

    # Verify mocks called correctly
    mock_agent_graph.return_value.find_first.assert_called_once()
-    mock_slv.return_value.create.assert_called_once()
+    mock_store_listing.return_value.create.assert_called_once()


 @pytest.mark.asyncio(loop_scope="session")
@@ -303,6 +318,7 @@ async def test_update_profile(mocker):
        description="Test description",
        links=["link1"],
        avatar_url="avatar.jpg",
+        is_featured=False,
    )

    # Call function
@@ -373,7 +389,7 @@ async def test_get_store_agents_with_search_and_filters_parameterized():
        creators=["creator1'; DROP TABLE Users; --", "creator2"],
        category="AI'; DELETE FROM StoreAgent; --",
        featured=True,
-        sorted_by=db.StoreAgentsSortOptions.RATING,
+        sorted_by="rating",
        page=1,
        page_size=20,
    )
--- a/autogpt_platform/backend/backend/api/features/store/exceptions.py
+++ b/autogpt_platform/backend/backend/api/features/store/exceptions.py
@@ -57,6 +57,12 @@ class StoreError(ValueError):
    pass


+class AgentNotFoundError(NotFoundError):
+    """Raised when an agent is not found"""
+
+    pass
+
+
 class CreatorNotFoundError(NotFoundError):
    """Raised when a creator is not found"""

--- a/autogpt_platform/backend/backend/api/features/store/hybrid_search.py
+++ b/autogpt_platform/backend/backend/api/features/store/hybrid_search.py
@@ -568,7 +568,7 @@ async def hybrid_search(
            SELECT uce."contentId" as "storeListingVersionId"
            FROM {{schema_prefix}}"UnifiedContentEmbedding" uce
            INNER JOIN {{schema_prefix}}"StoreAgent" sa
-                ON uce."contentId" = sa.listing_version_id
+                ON uce."contentId" = sa."storeListingVersionId"
            WHERE uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
            AND uce."userId" IS NULL
            AND uce.search @@ plainto_tsquery('english', {query_param})
@@ -582,7 +582,7 @@ async def hybrid_search(
                SELECT uce."contentId", uce.embedding
                FROM {{schema_prefix}}"UnifiedContentEmbedding" uce
                INNER JOIN {{schema_prefix}}"StoreAgent" sa
-                    ON uce."contentId" = sa.listing_version_id
+                    ON uce."contentId" = sa."storeListingVersionId"
                WHERE uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
                AND uce."userId" IS NULL
                AND {where_clause}
@@ -605,7 +605,7 @@ async def hybrid_search(
                sa.featured,
                sa.is_available,
                sa.updated_at,
-                sa.graph_id,
+                sa."agentGraphId",
                -- Searchable text for BM25 reranking
                COALESCE(sa.agent_name, '') || ' ' || COALESCE(sa.sub_heading, '') || ' ' || COALESCE(sa.description, '') as searchable_text,
                -- Semantic score
@@ -627,9 +627,9 @@ async def hybrid_search(
                sa.runs as popularity_raw
            FROM candidates c
            INNER JOIN {{schema_prefix}}"StoreAgent" sa
-                ON c."storeListingVersionId" = sa.listing_version_id
+                ON c."storeListingVersionId" = sa."storeListingVersionId"
            INNER JOIN {{schema_prefix}}"UnifiedContentEmbedding" uce
-                ON sa.listing_version_id = uce."contentId"
+                ON sa."storeListingVersionId" = uce."contentId"
                AND uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
        ),
        max_vals AS (
@@ -665,7 +665,7 @@ async def hybrid_search(
                featured,
                is_available,
                updated_at,
-                graph_id,
+                "agentGraphId",
                searchable_text,
                semantic_score,
                lexical_score,
--- a/autogpt_platform/backend/backend/api/features/store/model.py
+++ b/autogpt_platform/backend/backend/api/features/store/model.py
@@ -1,14 +1,11 @@
 import datetime
-from typing import TYPE_CHECKING, List, Self
+from typing import List

 import prisma.enums
 import pydantic

 from backend.util.models import Pagination

-if TYPE_CHECKING:
-    import prisma.models
-

 class ChangelogEntry(pydantic.BaseModel):
    version: str
@@ -16,9 +13,9 @@ class ChangelogEntry(pydantic.BaseModel):
    date: datetime.datetime


-class MyUnpublishedAgent(pydantic.BaseModel):
-    graph_id: str
-    graph_version: int
+class MyAgent(pydantic.BaseModel):
+    agent_id: str
+    agent_version: int
    agent_name: str
    agent_image: str | None = None
    description: str
@@ -26,8 +23,8 @@ class MyUnpublishedAgent(pydantic.BaseModel):
    recommended_schedule_cron: str | None = None


-class MyUnpublishedAgentsResponse(pydantic.BaseModel):
-    agents: list[MyUnpublishedAgent]
+class MyAgentsResponse(pydantic.BaseModel):
+    agents: list[MyAgent]
    pagination: Pagination


@@ -43,21 +40,6 @@ class StoreAgent(pydantic.BaseModel):
    rating: float
    agent_graph_id: str

-    @classmethod
-    def from_db(cls, agent: "prisma.models.StoreAgent") -> "StoreAgent":
-        return cls(
-            slug=agent.slug,
-            agent_name=agent.agent_name,
-            agent_image=agent.agent_image[0] if agent.agent_image else "",
-            creator=agent.creator_username or "Needs Profile",
-            creator_avatar=agent.creator_avatar or "",
-            sub_heading=agent.sub_heading,
-            description=agent.description,
-            runs=agent.runs,
-            rating=agent.rating,
-            agent_graph_id=agent.graph_id,
-        )
-

 class StoreAgentsResponse(pydantic.BaseModel):
    agents: list[StoreAgent]
@@ -80,192 +62,81 @@ class StoreAgentDetails(pydantic.BaseModel):
    runs: int
    rating: float
    versions: list[str]
-    graph_id: str
-    graph_versions: list[str]
+    agentGraphVersions: list[str]
+    agentGraphId: str
    last_updated: datetime.datetime
    recommended_schedule_cron: str | None = None

-    active_version_id: str
-    has_approved_version: bool
+    active_version_id: str | None = None
+    has_approved_version: bool = False

    # Optional changelog data when include_changelog=True
    changelog: list[ChangelogEntry] | None = None

-    @classmethod
-    def from_db(cls, agent: "prisma.models.StoreAgent") -> "StoreAgentDetails":
-        return cls(
-            store_listing_version_id=agent.listing_version_id,
-            slug=agent.slug,
-            agent_name=agent.agent_name,
-            agent_video=agent.agent_video or "",
-            agent_output_demo=agent.agent_output_demo or "",
-            agent_image=agent.agent_image,
-            creator=agent.creator_username or "",
-            creator_avatar=agent.creator_avatar or "",
-            sub_heading=agent.sub_heading,
-            description=agent.description,
-            categories=agent.categories,
-            runs=agent.runs,
-            rating=agent.rating,
-            versions=agent.versions,
-            graph_id=agent.graph_id,
-            graph_versions=agent.graph_versions,
-            last_updated=agent.updated_at,
-            recommended_schedule_cron=agent.recommended_schedule_cron,
-            active_version_id=agent.listing_version_id,
-            has_approved_version=True,  # StoreAgent view only has approved agents
-        )

-
-class Profile(pydantic.BaseModel):
-    """Marketplace user profile (only attributes that the user can update)"""
-
-    username: str
+class Creator(pydantic.BaseModel):
    name: str
+    username: str
    description: str
-    avatar_url: str | None
-    links: list[str]
-
-
-class ProfileDetails(Profile):
-    """Marketplace user profile (including read-only fields)"""
-
-    is_featured: bool
-
-    @classmethod
-    def from_db(cls, profile: "prisma.models.Profile") -> "ProfileDetails":
-        return cls(
-            name=profile.name,
-            username=profile.username,
-            avatar_url=profile.avatarUrl,
-            description=profile.description,
-            links=profile.links,
-            is_featured=profile.isFeatured,
-        )
-
-
-class CreatorDetails(ProfileDetails):
-    """Marketplace creator profile details, including aggregated stats"""
-
+    avatar_url: str
    num_agents: int
-    agent_runs: int
    agent_rating: float
-    top_categories: list[str]
-
-    @classmethod
-    def from_db(cls, creator: "prisma.models.Creator") -> "CreatorDetails":  # type: ignore[override]
-        return cls(
-            name=creator.name,
-            username=creator.username,
-            avatar_url=creator.avatar_url,
-            description=creator.description,
-            links=creator.links,
-            is_featured=creator.is_featured,
-            num_agents=creator.num_agents,
-            agent_runs=creator.agent_runs,
-            agent_rating=creator.agent_rating,
-            top_categories=creator.top_categories,
-        )
+    agent_runs: int
+    is_featured: bool


 class CreatorsResponse(pydantic.BaseModel):
-    creators: List[CreatorDetails]
+    creators: List[Creator]
    pagination: Pagination


-class StoreSubmission(pydantic.BaseModel):
-    # From StoreListing:
-    listing_id: str
-    user_id: str
-    slug: str
+class CreatorDetails(pydantic.BaseModel):
+    name: str
+    username: str
+    description: str
+    links: list[str]
+    avatar_url: str
+    agent_rating: float
+    agent_runs: int
+    top_categories: list[str]

-    # From StoreListingVersion:
-    listing_version_id: str
-    listing_version: int
-    graph_id: str
-    graph_version: int
+
+class Profile(pydantic.BaseModel):
+    name: str
+    username: str
+    description: str
+    links: list[str]
+    avatar_url: str
+    is_featured: bool = False
+
+
+class StoreSubmission(pydantic.BaseModel):
+    listing_id: str
+    agent_id: str
+    agent_version: int
    name: str
    sub_heading: str
+    slug: str
    description: str
-    instructions: str | None
-    categories: list[str]
+    instructions: str | None = None
    image_urls: list[str]
-    video_url: str | None
-    agent_output_demo_url: str | None
-
-    submitted_at: datetime.datetime | None
-    changes_summary: str | None
+    date_submitted: datetime.datetime
    status: prisma.enums.SubmissionStatus
-    reviewed_at: datetime.datetime | None = None
+    runs: int
+    rating: float
+    store_listing_version_id: str | None = None
+    version: int | None = None  # Actual version number from the database
+
    reviewer_id: str | None = None
    review_comments: str | None = None  # External comments visible to creator
+    internal_comments: str | None = None  # Private notes for admin use only
+    reviewed_at: datetime.datetime | None = None
+    changes_summary: str | None = None

-    # Aggregated from AgentGraphExecutions and StoreListingReviews:
-    run_count: int = 0
-    review_count: int = 0
-    review_avg_rating: float = 0.0
-
-    @classmethod
-    def from_db(cls, _sub: "prisma.models.StoreSubmission") -> Self:
-        """Construct from the StoreSubmission Prisma view."""
-        return cls(
-            listing_id=_sub.listing_id,
-            user_id=_sub.user_id,
-            slug=_sub.slug,
-            listing_version_id=_sub.listing_version_id,
-            listing_version=_sub.listing_version,
-            graph_id=_sub.graph_id,
-            graph_version=_sub.graph_version,
-            name=_sub.name,
-            sub_heading=_sub.sub_heading,
-            description=_sub.description,
-            instructions=_sub.instructions,
-            categories=_sub.categories,
-            image_urls=_sub.image_urls,
-            video_url=_sub.video_url,
-            agent_output_demo_url=_sub.agent_output_demo_url,
-            submitted_at=_sub.submitted_at,
-            changes_summary=_sub.changes_summary,
-            status=_sub.status,
-            reviewed_at=_sub.reviewed_at,
-            reviewer_id=_sub.reviewer_id,
-            review_comments=_sub.review_comments,
-            run_count=_sub.run_count,
-            review_count=_sub.review_count,
-            review_avg_rating=_sub.review_avg_rating,
-        )
-
-    @classmethod
-    def from_listing_version(cls, _lv: "prisma.models.StoreListingVersion") -> Self:
-        """
-        Construct from the StoreListingVersion Prisma model (with StoreListing included)
-        """
-        if not (_l := _lv.StoreListing):
-            raise ValueError("StoreListingVersion must have included StoreListing")
-
-        return cls(
-            listing_id=_l.id,
-            user_id=_l.owningUserId,
-            slug=_l.slug,
-            listing_version_id=_lv.id,
-            listing_version=_lv.version,
-            graph_id=_lv.agentGraphId,
-            graph_version=_lv.agentGraphVersion,
-            name=_lv.name,
-            sub_heading=_lv.subHeading,
-            description=_lv.description,
-            instructions=_lv.instructions,
-            categories=_lv.categories,
-            image_urls=_lv.imageUrls,
-            video_url=_lv.videoUrl,
-            agent_output_demo_url=_lv.agentOutputDemoUrl,
-            submitted_at=_lv.submittedAt,
-            changes_summary=_lv.changesSummary,
-            status=_lv.submissionStatus,
-            reviewed_at=_lv.reviewedAt,
-            reviewer_id=_lv.reviewerId,
-            review_comments=_lv.reviewComments,
-        )
+    # Additional fields for editing
+    video_url: str | None = None
+    agent_output_demo_url: str | None = None
+    categories: list[str] = []


 class StoreSubmissionsResponse(pydantic.BaseModel):
@@ -273,12 +144,33 @@ class StoreSubmissionsResponse(pydantic.BaseModel):
    pagination: Pagination


+class StoreListingWithVersions(pydantic.BaseModel):
+    """A store listing with its version history"""
+
+    listing_id: str
+    slug: str
+    agent_id: str
+    agent_version: int
+    active_version_id: str | None = None
+    has_approved_version: bool = False
+    creator_email: str | None = None
+    latest_version: StoreSubmission | None = None
+    versions: list[StoreSubmission] = []
+
+
+class StoreListingsWithVersionsResponse(pydantic.BaseModel):
+    """Response model for listings with version history"""
+
+    listings: list[StoreListingWithVersions]
+    pagination: Pagination
+
+
 class StoreSubmissionRequest(pydantic.BaseModel):
-    graph_id: str = pydantic.Field(
-        ..., min_length=1, description="Graph ID cannot be empty"
+    agent_id: str = pydantic.Field(
+        ..., min_length=1, description="Agent ID cannot be empty"
    )
-    graph_version: int = pydantic.Field(
-        ..., gt=0, description="Graph version must be greater than 0"
+    agent_version: int = pydantic.Field(
+        ..., gt=0, description="Agent version must be greater than 0"
    )
    slug: str
    name: str
@@ -306,42 +198,12 @@ class StoreSubmissionEditRequest(pydantic.BaseModel):
    recommended_schedule_cron: str | None = None


-class StoreSubmissionAdminView(StoreSubmission):
-    internal_comments: str | None  # Private admin notes
-
-    @classmethod
-    def from_db(cls, _sub: "prisma.models.StoreSubmission") -> Self:
-        return cls(
-            **StoreSubmission.from_db(_sub).model_dump(),
-            internal_comments=_sub.internal_comments,
-        )
-
-    @classmethod
-    def from_listing_version(cls, _lv: "prisma.models.StoreListingVersion") -> Self:
-        return cls(
-            **StoreSubmission.from_listing_version(_lv).model_dump(),
-            internal_comments=_lv.internalComments,
-        )
-
-
-class StoreListingWithVersionsAdminView(pydantic.BaseModel):
-    """A store listing with its version history"""
-
-    listing_id: str
-    graph_id: str
-    slug: str
-    active_listing_version_id: str | None = None
-    has_approved_version: bool = False
-    creator_email: str | None = None
-    latest_version: StoreSubmissionAdminView | None = None
-    versions: list[StoreSubmissionAdminView] = []
-
-
-class StoreListingsWithVersionsAdminViewResponse(pydantic.BaseModel):
-    """Response model for listings with version history"""
-
-    listings: list[StoreListingWithVersionsAdminView]
-    pagination: Pagination
+class ProfileDetails(pydantic.BaseModel):
+    name: str
+    username: str
+    description: str
+    links: list[str]
+    avatar_url: str | None = None


 class StoreReview(pydantic.BaseModel):
--- a/autogpt_platform/backend/backend/api/features/store/model_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/model_test.py
@@ -0,0 +1,203 @@
+import datetime
+
+import prisma.enums
+
+from . import model as store_model
+
+
+def test_pagination():
+    pagination = store_model.Pagination(
+        total_items=100, total_pages=5, current_page=2, page_size=20
+    )
+    assert pagination.total_items == 100
+    assert pagination.total_pages == 5
+    assert pagination.current_page == 2
+    assert pagination.page_size == 20
+
+
+def test_store_agent():
+    agent = store_model.StoreAgent(
+        slug="test-agent",
+        agent_name="Test Agent",
+        agent_image="test.jpg",
+        creator="creator1",
+        creator_avatar="avatar.jpg",
+        sub_heading="Test subheading",
+        description="Test description",
+        runs=50,
+        rating=4.5,
+        agent_graph_id="test-graph-id",
+    )
+    assert agent.slug == "test-agent"
+    assert agent.agent_name == "Test Agent"
+    assert agent.runs == 50
+    assert agent.rating == 4.5
+    assert agent.agent_graph_id == "test-graph-id"
+
+
+def test_store_agents_response():
+    response = store_model.StoreAgentsResponse(
+        agents=[
+            store_model.StoreAgent(
+                slug="test-agent",
+                agent_name="Test Agent",
+                agent_image="test.jpg",
+                creator="creator1",
+                creator_avatar="avatar.jpg",
+                sub_heading="Test subheading",
+                description="Test description",
+                runs=50,
+                rating=4.5,
+                agent_graph_id="test-graph-id",
+            )
+        ],
+        pagination=store_model.Pagination(
+            total_items=1, total_pages=1, current_page=1, page_size=20
+        ),
+    )
+    assert len(response.agents) == 1
+    assert response.pagination.total_items == 1
+
+
+def test_store_agent_details():
+    details = store_model.StoreAgentDetails(
+        store_listing_version_id="version123",
+        slug="test-agent",
+        agent_name="Test Agent",
+        agent_video="video.mp4",
+        agent_output_demo="demo.mp4",
+        agent_image=["image1.jpg", "image2.jpg"],
+        creator="creator1",
+        creator_avatar="avatar.jpg",
+        sub_heading="Test subheading",
+        description="Test description",
+        categories=["cat1", "cat2"],
+        runs=50,
+        rating=4.5,
+        versions=["1.0", "2.0"],
+        agentGraphVersions=["1", "2"],
+        agentGraphId="test-graph-id",
+        last_updated=datetime.datetime.now(),
+    )
+    assert details.slug == "test-agent"
+    assert len(details.agent_image) == 2
+    assert len(details.categories) == 2
+    assert len(details.versions) == 2
+
+
+def test_creator():
+    creator = store_model.Creator(
+        agent_rating=4.8,
+        agent_runs=1000,
+        name="Test Creator",
+        username="creator1",
+        description="Test description",
+        avatar_url="avatar.jpg",
+        num_agents=5,
+        is_featured=False,
+    )
+    assert creator.name == "Test Creator"
+    assert creator.num_agents == 5
+
+
+def test_creators_response():
+    response = store_model.CreatorsResponse(
+        creators=[
+            store_model.Creator(
+                agent_rating=4.8,
+                agent_runs=1000,
+                name="Test Creator",
+                username="creator1",
+                description="Test description",
+                avatar_url="avatar.jpg",
+                num_agents=5,
+                is_featured=False,
+            )
+        ],
+        pagination=store_model.Pagination(
+            total_items=1, total_pages=1, current_page=1, page_size=20
+        ),
+    )
+    assert len(response.creators) == 1
+    assert response.pagination.total_items == 1
+
+
+def test_creator_details():
+    details = store_model.CreatorDetails(
+        name="Test Creator",
+        username="creator1",
+        description="Test description",
+        links=["link1.com", "link2.com"],
+        avatar_url="avatar.jpg",
+        agent_rating=4.8,
+        agent_runs=1000,
+        top_categories=["cat1", "cat2"],
+    )
+    assert details.name == "Test Creator"
+    assert len(details.links) == 2
+    assert details.agent_rating == 4.8
+    assert len(details.top_categories) == 2
+
+
+def test_store_submission():
+    submission = store_model.StoreSubmission(
+        listing_id="listing123",
+        agent_id="agent123",
+        agent_version=1,
+        sub_heading="Test subheading",
+        name="Test Agent",
+        slug="test-agent",
+        description="Test description",
+        image_urls=["image1.jpg", "image2.jpg"],
+        date_submitted=datetime.datetime(2023, 1, 1),
+        status=prisma.enums.SubmissionStatus.PENDING,
+        runs=50,
+        rating=4.5,
+    )
+    assert submission.name == "Test Agent"
+    assert len(submission.image_urls) == 2
+    assert submission.status == prisma.enums.SubmissionStatus.PENDING
+
+
+def test_store_submissions_response():
+    response = store_model.StoreSubmissionsResponse(
+        submissions=[
+            store_model.StoreSubmission(
+                listing_id="listing123",
+                agent_id="agent123",
+                agent_version=1,
+                sub_heading="Test subheading",
+                name="Test Agent",
+                slug="test-agent",
+                description="Test description",
+                image_urls=["image1.jpg"],
+                date_submitted=datetime.datetime(2023, 1, 1),
+                status=prisma.enums.SubmissionStatus.PENDING,
+                runs=50,
+                rating=4.5,
+            )
+        ],
+        pagination=store_model.Pagination(
+            total_items=1, total_pages=1, current_page=1, page_size=20
+        ),
+    )
+    assert len(response.submissions) == 1
+    assert response.pagination.total_items == 1
+
+
+def test_store_submission_request():
+    request = store_model.StoreSubmissionRequest(
+        agent_id="agent123",
+        agent_version=1,
+        slug="test-agent",
+        name="Test Agent",
+        sub_heading="Test subheading",
+        video_url="video.mp4",
+        image_urls=["image1.jpg", "image2.jpg"],
+        description="Test description",
+        categories=["cat1", "cat2"],
+    )
+    assert request.agent_id == "agent123"
+    assert request.agent_version == 1
+    assert len(request.image_urls) == 2
+    assert len(request.categories) == 2
--- a/autogpt_platform/backend/backend/api/features/store/routes.py
+++ b/autogpt_platform/backend/backend/api/features/store/routes.py
@@ -1,17 +1,16 @@
 import logging
 import tempfile
+import typing
 import urllib.parse
+from typing import Literal

 import autogpt_libs.auth
 import fastapi
 import fastapi.responses
 import prisma.enums
-from fastapi import Query, Security
-from pydantic import BaseModel

 import backend.data.graph
 import backend.util.json
-from backend.util.exceptions import NotFoundError
 from backend.util.models import Pagination

 from . import cache as store_cache
@@ -35,15 +34,22 @@ router = fastapi.APIRouter()
    "/profile",
    summary="Get user profile",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.ProfileDetails,
 )
 async def get_profile(
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> store_model.ProfileDetails:
-    """Get the profile details for the authenticated user."""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Get the profile details for the authenticated user.
+    Cached for 1 hour per user.
+    """
    profile = await store_db.get_user_profile(user_id)
    if profile is None:
-        raise NotFoundError("User does not have a profile yet")
+        return fastapi.responses.JSONResponse(
+            status_code=404,
+            content={"detail": "Profile not found"},
+        )
    return profile


@@ -51,17 +57,98 @@ async def get_profile(
    "/profile",
    summary="Update user profile",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.CreatorDetails,
 )
 async def update_or_create_profile(
    profile: store_model.Profile,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> store_model.ProfileDetails:
-    """Update the store profile for the authenticated user."""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Update the store profile for the authenticated user.
+
+    Args:
+        profile (Profile): The updated profile details
+        user_id (str): ID of the authenticated user
+
+    Returns:
+        CreatorDetails: The updated profile
+
+    Raises:
+        HTTPException: If there is an error updating the profile
+    """
    updated_profile = await store_db.update_profile(user_id=user_id, profile=profile)
    return updated_profile


+##############################################
+############### Agent Endpoints ##############
+##############################################
+
+
+@router.get(
+    "/agents",
+    summary="List store agents",
+    tags=["store", "public"],
+    response_model=store_model.StoreAgentsResponse,
+)
+async def get_agents(
+    featured: bool = False,
+    creator: str | None = None,
+    sorted_by: Literal["rating", "runs", "name", "updated_at"] | None = None,
+    search_query: str | None = None,
+    category: str | None = None,
+    page: int = 1,
+    page_size: int = 20,
+):
+    """
+    Get a paginated list of agents from the store with optional filtering and sorting.
+
+    Args:
+        featured (bool, optional): Filter to only show featured agents. Defaults to False.
+        creator (str | None, optional): Filter agents by creator username. Defaults to None.
+        sorted_by (str | None, optional): Sort agents by "rating", "runs", "name", or "updated_at". Defaults to None.
+        search_query (str | None, optional): Search agents by name, subheading and description. Defaults to None.
+        category (str | None, optional): Filter agents by category. Defaults to None.
+        page (int, optional): Page number for pagination. Defaults to 1.
+        page_size (int, optional): Number of agents per page. Defaults to 20.
+
+    Returns:
+        StoreAgentsResponse: Paginated list of agents matching the filters
+
+    Raises:
+        HTTPException: If page or page_size are less than 1
+
+    Used for:
+    - Home Page Featured Agents
+    - Home Page Top Agents
+    - Search Results
+    - Agent Details - Other Agents By Creator
+    - Agent Details - Similar Agents
+    - Creator Details - Agents By Creator
+    """
+    if page < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page must be greater than 0"
+        )
+
+    if page_size < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page size must be greater than 0"
+        )
+
+    agents = await store_cache._get_cached_store_agents(
+        featured=featured,
+        creator=creator,
+        sorted_by=sorted_by,
+        search_query=search_query,
+        category=category,
+        page=page,
+        page_size=page_size,
+    )
+    return agents
+
+
 ##############################################
 ############### Search Endpoints #############
 ##############################################
@@ -71,30 +158,60 @@ async def update_or_create_profile(
    "/search",
    summary="Unified search across all content types",
    tags=["store", "public"],
+    response_model=store_model.UnifiedSearchResponse,
 )
 async def unified_search(
    query: str,
-    content_types: list[prisma.enums.ContentType] | None = Query(
+    content_types: list[str] | None = fastapi.Query(
        default=None,
-        description="Content types to search. If not specified, searches all.",
+        description="Content types to search: STORE_AGENT, BLOCK, DOCUMENTATION. If not specified, searches all.",
    ),
-    page: int = Query(ge=1, default=1),
-    page_size: int = Query(ge=1, default=20),
-    user_id: str | None = Security(
+    page: int = 1,
+    page_size: int = 20,
+    user_id: str | None = fastapi.Security(
        autogpt_libs.auth.get_optional_user_id, use_cache=False
    ),
-) -> store_model.UnifiedSearchResponse:
+):
    """
-    Search across all content types (marketplace agents, blocks, documentation)
-    using hybrid search.
+    Search across all content types (store agents, blocks, documentation) using hybrid search.

    Combines semantic (embedding-based) and lexical (text-based) search for best results.
+
+    Args:
+        query: The search query string
+        content_types: Optional list of content types to filter by (STORE_AGENT, BLOCK, DOCUMENTATION)
+        page: Page number for pagination (default 1)
+        page_size: Number of results per page (default 20)
+        user_id: Optional authenticated user ID (for user-scoped content in future)
+
+    Returns:
+        UnifiedSearchResponse: Paginated list of search results with relevance scores
    """
+    if page < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page must be greater than 0"
+        )
+
+    if page_size < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page size must be greater than 0"
+        )
+
+    # Convert string content types to enum
+    content_type_enums: list[prisma.enums.ContentType] | None = None
+    if content_types:
+        try:
+            content_type_enums = [prisma.enums.ContentType(ct) for ct in content_types]
+        except ValueError as e:
+            raise fastapi.HTTPException(
+                status_code=422,
+                detail=f"Invalid content type. Valid values: STORE_AGENT, BLOCK, DOCUMENTATION. Error: {e}",
+            )

    # Perform unified hybrid search
    results, total = await store_hybrid_search.unified_hybrid_search(
        query=query,
-        content_types=content_types,
+        content_types=content_type_enums,
        user_id=user_id,
        page=page,
        page_size=page_size,
@@ -128,69 +245,22 @@ async def unified_search(
    )


-##############################################
-############### Agent Endpoints ##############
-##############################################
-
-
-@router.get(
-    "/agents",
-    summary="List store agents",
-    tags=["store", "public"],
-)
-async def get_agents(
-    featured: bool = Query(
-        default=False, description="Filter to only show featured agents"
-    ),
-    creator: str | None = Query(
-        default=None, description="Filter agents by creator username"
-    ),
-    category: str | None = Query(default=None, description="Filter agents by category"),
-    search_query: str | None = Query(
-        default=None, description="Literal + semantic search on names and descriptions"
-    ),
-    sorted_by: store_db.StoreAgentsSortOptions | None = Query(
-        default=None,
-        description="Property to sort results by. Ignored if search_query is provided.",
-    ),
-    page: int = Query(ge=1, default=1),
-    page_size: int = Query(ge=1, default=20),
-) -> store_model.StoreAgentsResponse:
-    """
-    Get a paginated list of agents from the marketplace,
-    with optional filtering and sorting.
-
-    Used for:
-    - Home Page Featured Agents
-    - Home Page Top Agents
-    - Search Results
-    - Agent Details - Other Agents By Creator
-    - Agent Details - Similar Agents
-    - Creator Details - Agents By Creator
-    """
-    agents = await store_cache._get_cached_store_agents(
-        featured=featured,
-        creator=creator,
-        sorted_by=sorted_by,
-        search_query=search_query,
-        category=category,
-        page=page,
-        page_size=page_size,
-    )
-    return agents
-
-
 @router.get(
    "/agents/{username}/{agent_name}",
    summary="Get specific agent",
    tags=["store", "public"],
+    response_model=store_model.StoreAgentDetails,
 )
-async def get_agent_by_name(
+async def get_agent(
    username: str,
    agent_name: str,
-    include_changelog: bool = Query(default=False),
-) -> store_model.StoreAgentDetails:
-    """Get details of a marketplace agent"""
+    include_changelog: bool = fastapi.Query(default=False),
+):
+    """
+    This is only used on the AgentDetails Page.
+
+    It returns the store listing agent's details.
+    """
    username = urllib.parse.unquote(username).lower()
    # URL decode the agent name since it comes from the URL path
    agent_name = urllib.parse.unquote(agent_name).lower()
@@ -200,82 +270,76 @@ async def get_agent_by_name(
    return agent


+@router.get(
+    "/graph/{store_listing_version_id}",
+    summary="Get agent graph",
+    tags=["store"],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+)
+async def get_graph_meta_by_store_listing_version_id(
+    store_listing_version_id: str,
+) -> backend.data.graph.GraphModelWithoutNodes:
+    """
+    Get Agent Graph from Store Listing Version ID.
+    """
+    graph = await store_db.get_available_graph(store_listing_version_id)
+    return graph
+
+
+@router.get(
+    "/agents/{store_listing_version_id}",
+    summary="Get agent by version",
+    tags=["store"],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.StoreAgentDetails,
+)
+async def get_store_agent(store_listing_version_id: str):
+    """
+    Get Store Agent Details from Store Listing Version ID.
+    """
+    agent = await store_db.get_store_agent_by_version_id(store_listing_version_id)
+
+    return agent
+
+
 @router.post(
    "/agents/{username}/{agent_name}/review",
    summary="Create agent review",
    tags=["store"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.StoreReview,
 )
-async def post_user_review_for_agent(
+async def create_review(
    username: str,
    agent_name: str,
    review: store_model.StoreReviewCreate,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> store_model.StoreReview:
-    """Post a user review on a marketplace agent listing"""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Create a review for a store agent.
+
+    Args:
+        username: Creator's username
+        agent_name: Name/slug of the agent
+        review: Review details including score and optional comments
+        user_id: ID of authenticated user creating the review
+
+    Returns:
+        The created review
+    """
    username = urllib.parse.unquote(username).lower()
    agent_name = urllib.parse.unquote(agent_name).lower()
-
+    # Create the review
    created_review = await store_db.create_store_review(
        user_id=user_id,
        store_listing_version_id=review.store_listing_version_id,
        score=review.score,
        comments=review.comments,
    )
+
    return created_review


-@router.get(
-    "/listings/versions/{store_listing_version_id}",
-    summary="Get agent by version",
-    tags=["store"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
-)
-async def get_agent_by_listing_version(
-    store_listing_version_id: str,
-) -> store_model.StoreAgentDetails:
-    agent = await store_db.get_store_agent_by_version_id(store_listing_version_id)
-    return agent
-
-
-@router.get(
-    "/listings/versions/{store_listing_version_id}/graph",
-    summary="Get agent graph",
-    tags=["store"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
-)
-async def get_graph_meta_by_store_listing_version_id(
-    store_listing_version_id: str,
-) -> backend.data.graph.GraphModelWithoutNodes:
-    """Get outline of graph belonging to a specific marketplace listing version"""
-    graph = await store_db.get_available_graph(store_listing_version_id)
-    return graph
-
-
-@router.get(
-    "/listings/versions/{store_listing_version_id}/graph/download",
-    summary="Download agent file",
-    tags=["store", "public"],
-)
-async def download_agent_file(
-    store_listing_version_id: str,
-) -> fastapi.responses.FileResponse:
-    """Download agent graph file for a specific marketplace listing version"""
-    graph_data = await store_db.get_agent(store_listing_version_id)
-    file_name = f"agent_{graph_data.id}_v{graph_data.version or 'latest'}.json"
-
-    # Sending graph as a stream (similar to marketplace v1)
-    with tempfile.NamedTemporaryFile(
-        mode="w", suffix=".json", delete=False
-    ) as tmp_file:
-        tmp_file.write(backend.util.json.dumps(graph_data))
-        tmp_file.flush()
-
-        return fastapi.responses.FileResponse(
-            tmp_file.name, filename=file_name, media_type="application/json"
-        )
-
-
 ##############################################
 ############# Creator Endpoints #############
 ##############################################
@@ -285,19 +349,37 @@ async def download_agent_file(
    "/creators",
    summary="List store creators",
    tags=["store", "public"],
+    response_model=store_model.CreatorsResponse,
 )
 async def get_creators(
-    featured: bool = Query(
-        default=False, description="Filter to only show featured creators"
-    ),
-    search_query: str | None = Query(
-        default=None, description="Literal + semantic search on names and descriptions"
-    ),
-    sorted_by: store_db.StoreCreatorsSortOptions | None = None,
-    page: int = Query(ge=1, default=1),
-    page_size: int = Query(ge=1, default=20),
-) -> store_model.CreatorsResponse:
-    """List or search marketplace creators"""
+    featured: bool = False,
+    search_query: str | None = None,
+    sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None = None,
+    page: int = 1,
+    page_size: int = 20,
+):
+    """
+    This is needed for:
+    - Home Page Featured Creators
+    - Search Results Page
+
+    ---
+
+    To support this functionality we need:
+    - featured: bool - to limit the list to just featured creators
+    - search_query: str - vector search based on the creator's profile description.
+    - sorted_by: [agent_rating, agent_runs, num_agents] - field to sort the creators by
+    """
+    if page < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page must be greater than 0"
+        )
+
+    if page_size < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page size must be greater than 0"
+        )
+
    creators = await store_cache._get_cached_store_creators(
        featured=featured,
        search_query=search_query,
@@ -309,12 +391,18 @@ async def get_creators(


 @router.get(
-    "/creators/{username}",
+    "/creator/{username}",
    summary="Get creator details",
    tags=["store", "public"],
+    response_model=store_model.CreatorDetails,
 )
-async def get_creator(username: str) -> store_model.CreatorDetails:
-    """Get details on a marketplace creator"""
+async def get_creator(
+    username: str,
+):
+    """
+    Get the details of a creator.
+    - Creator Details Page
+    """
    username = urllib.parse.unquote(username).lower()
    creator = await store_cache._get_cached_creator_details(username=username)
    return creator
@@ -326,17 +414,20 @@ async def get_creator(username: str) -> store_model.CreatorDetails:


 @router.get(
-    "/my-unpublished-agents",
+    "/myagents",
    summary="Get my agents",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.MyAgentsResponse,
 )
-async def get_my_unpublished_agents(
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-    page: int = Query(ge=1, default=1),
-    page_size: int = Query(ge=1, default=20),
-) -> store_model.MyUnpublishedAgentsResponse:
-    """List the authenticated user's unpublished agents"""
+async def get_my_agents(
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+    page: typing.Annotated[int, fastapi.Query(ge=1)] = 1,
+    page_size: typing.Annotated[int, fastapi.Query(ge=1)] = 20,
+):
+    """
+    Get user's own agents.
+    """
    agents = await store_db.get_my_agents(user_id, page=page, page_size=page_size)
    return agents

@@ -345,17 +436,28 @@ async def get_my_unpublished_agents(
    "/submissions/{submission_id}",
    summary="Delete store submission",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=bool,
 )
 async def delete_submission(
    submission_id: str,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> bool:
-    """Delete a marketplace listing submission"""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Delete a store listing submission.
+
+    Args:
+        user_id (str): ID of the authenticated user
+        submission_id (str): ID of the submission to be deleted
+
+    Returns:
+        bool: True if the submission was successfully deleted, False otherwise
+    """
    result = await store_db.delete_store_submission(
        user_id=user_id,
        submission_id=submission_id,
    )
+
    return result


@@ -363,14 +465,37 @@ async def delete_submission(
    "/submissions",
    summary="List my submissions",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.StoreSubmissionsResponse,
 )
 async def get_submissions(
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-    page: int = Query(ge=1, default=1),
-    page_size: int = Query(ge=1, default=20),
-) -> store_model.StoreSubmissionsResponse:
-    """List the authenticated user's marketplace listing submissions"""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+    page: int = 1,
+    page_size: int = 20,
+):
+    """
+    Get a paginated list of store submissions for the authenticated user.
+
+    Args:
+        user_id (str): ID of the authenticated user
+        page (int, optional): Page number for pagination. Defaults to 1.
+        page_size (int, optional): Number of submissions per page. Defaults to 20.
+
+    Returns:
+        StoreSubmissionsResponse: Paginated list of store submissions
+
+    Raises:
+        HTTPException: If page or page_size are less than 1
+    """
+    if page < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page must be greater than 0"
+        )
+
+    if page_size < 1:
+        raise fastapi.HTTPException(
+            status_code=422, detail="Page size must be greater than 0"
+        )
    listings = await store_db.get_store_submissions(
        user_id=user_id,
        page=page,
@@ -383,17 +508,30 @@ async def get_submissions(
    "/submissions",
    summary="Create store submission",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.StoreSubmission,
 )
 async def create_submission(
    submission_request: store_model.StoreSubmissionRequest,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> store_model.StoreSubmission:
-    """Submit a new marketplace listing for review"""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Create a new store listing submission.
+
+    Args:
+        submission_request (StoreSubmissionRequest): The submission details
+        user_id (str): ID of the authenticated user submitting the listing
+
+    Returns:
+        StoreSubmission: The created store submission
+
+    Raises:
+        HTTPException: If there is an error creating the submission
+    """
    result = await store_db.create_store_submission(
        user_id=user_id,
-        graph_id=submission_request.graph_id,
-        graph_version=submission_request.graph_version,
+        agent_id=submission_request.agent_id,
+        agent_version=submission_request.agent_version,
        slug=submission_request.slug,
        name=submission_request.name,
        video_url=submission_request.video_url,
@@ -406,6 +544,7 @@ async def create_submission(
        changes_summary=submission_request.changes_summary or "Initial Submission",
        recommended_schedule_cron=submission_request.recommended_schedule_cron,
    )
+
    return result


@@ -413,14 +552,28 @@ async def create_submission(
    "/submissions/{store_listing_version_id}",
    summary="Edit store submission",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
+    response_model=store_model.StoreSubmission,
 )
 async def edit_submission(
    store_listing_version_id: str,
    submission_request: store_model.StoreSubmissionEditRequest,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> store_model.StoreSubmission:
-    """Update a pending marketplace listing submission"""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Edit an existing store listing submission.
+
+    Args:
+        store_listing_version_id (str): ID of the store listing version to edit
+        submission_request (StoreSubmissionEditRequest): The updated submission details
+        user_id (str): ID of the authenticated user editing the listing
+
+    Returns:
+        StoreSubmission: The updated store submission
+
+    Raises:
+        HTTPException: If there is an error editing the submission
+    """
    result = await store_db.edit_store_submission(
        user_id=user_id,
        store_listing_version_id=store_listing_version_id,
@@ -435,6 +588,7 @@ async def edit_submission(
        changes_summary=submission_request.changes_summary,
        recommended_schedule_cron=submission_request.recommended_schedule_cron,
    )
+
    return result


@@ -442,61 +596,115 @@ async def edit_submission(
    "/submissions/media",
    summary="Upload submission media",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
 )
 async def upload_submission_media(
    file: fastapi.UploadFile,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> str:
-    """Upload media for a marketplace listing submission"""
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+):
+    """
+    Upload media (images/videos) for a store listing submission.
+
+    Args:
+        file (UploadFile): The media file to upload
+        user_id (str): ID of the authenticated user uploading the media
+
+    Returns:
+        str: URL of the uploaded media file
+
+    Raises:
+        HTTPException: If there is an error uploading the media
+    """
    media_url = await store_media.upload_media(user_id=user_id, file=file)
    return media_url


-class ImageURLResponse(BaseModel):
-    image_url: str
-
-
@router.post(
    "/submissions/generate_image",
    summary="Generate submission image",
    tags=["store", "private"],
-    dependencies=[Security(autogpt_libs.auth.requires_user)],
+    dependencies=[fastapi.Security(autogpt_libs.auth.requires_user)],
 )
 async def generate_image(
-    graph_id: str,
-    user_id: str = Security(autogpt_libs.auth.get_user_id),
-) -> ImageURLResponse:
+    agent_id: str,
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+) -> fastapi.responses.Response:
    """
-    Generate an image for a marketplace listing submission based on the properties
-    of a given graph.
+    Generate an image for a store listing submission.
+
+    Args:
+        agent_id (str): ID of the agent to generate an image for
+        user_id (str): ID of the authenticated user
+
+    Returns:
+        JSONResponse: JSON containing the URL of the generated image
    """
-    graph = await backend.data.graph.get_graph(
-        graph_id=graph_id, version=None, user_id=user_id
+    agent = await backend.data.graph.get_graph(
+        graph_id=agent_id, version=None, user_id=user_id
    )

-    if not graph:
-        raise NotFoundError(f"Agent graph #{graph_id} not found")
+    if not agent:
+        raise fastapi.HTTPException(
+            status_code=404, detail=f"Agent with ID {agent_id} not found"
+        )
    # Use .jpeg here since we are generating JPEG images
-    filename = f"agent_{graph_id}.jpeg"
+    filename = f"agent_{agent_id}.jpeg"

    existing_url = await store_media.check_media_exists(user_id, filename)
    if existing_url:
-        logger.info(f"Using existing image for agent graph {graph_id}")
-        return ImageURLResponse(image_url=existing_url)
+        logger.info(f"Using existing image for agent {agent_id}")
+        return fastapi.responses.JSONResponse(content={"image_url": existing_url})
    # Generate agent image as JPEG
-    image = await store_image_gen.generate_agent_image(agent=graph)
+    image = await store_image_gen.generate_agent_image(agent=agent)

    # Create UploadFile with the correct filename and content_type
    image_file = fastapi.UploadFile(
        file=image,
        filename=filename,
    )
+
    image_url = await store_media.upload_media(
        user_id=user_id, file=image_file, use_file_name=True
    )

-    return ImageURLResponse(image_url=image_url)
+    return fastapi.responses.JSONResponse(content={"image_url": image_url})
+
+
+@router.get(
+    "/download/agents/{store_listing_version_id}",
+    summary="Download agent file",
+    tags=["store", "public"],
+)
+async def download_agent_file(
+    store_listing_version_id: str = fastapi.Path(
+        ..., description="The ID of the agent to download"
+    ),
+) -> fastapi.responses.FileResponse:
+    """
+    Download the agent file by streaming its content.
+
+    Args:
+        store_listing_version_id (str): The ID of the agent to download
+
+    Returns:
+        FileResponse: A JSON file response containing the agent's graph data.
+
+    Raises:
+        HTTPException: If the agent is not found or an unexpected error occurs.
+    """
+    graph_data = await store_db.get_agent(store_listing_version_id)
+    file_name = f"agent_{graph_data.id}_v{graph_data.version or 'latest'}.json"
+
+    # Sending graph as a stream (similar to marketplace v1)
+    with tempfile.NamedTemporaryFile(
+        mode="w", suffix=".json", delete=False
+    ) as tmp_file:
+        tmp_file.write(backend.util.json.dumps(graph_data))
+        tmp_file.flush()
+
+        return fastapi.responses.FileResponse(
+            tmp_file.name, filename=file_name, media_type="application/json"
+        )


 ##############################################
--- a/autogpt_platform/backend/backend/api/features/store/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/routes_test.py
@@ -8,8 +8,6 @@ import pytest
 import pytest_mock
 from pytest_snapshot.plugin import Snapshot

-from backend.api.features.store.db import StoreAgentsSortOptions
-
 from . import model as store_model
 from . import routes as store_routes

@@ -198,7 +196,7 @@ def test_get_agents_sorted(
    mock_db_call.assert_called_once_with(
        featured=False,
        creators=None,
-        sorted_by=StoreAgentsSortOptions.RUNS,
+        sorted_by="runs",
        search_query=None,
        category=None,
        page=1,
@@ -382,11 +380,9 @@ def test_get_agent_details(
        runs=100,
        rating=4.5,
        versions=["1.0.0", "1.1.0"],
-        graph_versions=["1", "2"],
-        graph_id="test-graph-id",
+        agentGraphVersions=["1", "2"],
+        agentGraphId="test-graph-id",
        last_updated=FIXED_NOW,
-        active_version_id="test-version-id",
-        has_approved_version=True,
    )
    mock_db_call = mocker.patch("backend.api.features.store.db.get_store_agent_details")
    mock_db_call.return_value = mocked_value
@@ -439,17 +435,15 @@ def test_get_creators_pagination(
 ) -> None:
    mocked_value = store_model.CreatorsResponse(
        creators=[
-            store_model.CreatorDetails(
+            store_model.Creator(
                name=f"Creator {i}",
                username=f"creator{i}",
-                avatar_url=f"avatar{i}.jpg",
                description=f"Creator {i} description",
-                links=[f"user{i}.link.com"],
-                is_featured=False,
+                avatar_url=f"avatar{i}.jpg",
                num_agents=1,
-                agent_runs=100,
                agent_rating=4.5,
-                top_categories=["cat1", "cat2", "cat3"],
+                agent_runs=100,
+                is_featured=False,
            )
            for i in range(5)
        ],
@@ -502,19 +496,19 @@ def test_get_creator_details(
    mocked_value = store_model.CreatorDetails(
        name="Test User",
        username="creator1",
-        avatar_url="avatar.jpg",
        description="Test creator description",
        links=["link1.com", "link2.com"],
-        is_featured=True,
-        num_agents=5,
-        agent_runs=1000,
+        avatar_url="avatar.jpg",
        agent_rating=4.8,
+        agent_runs=1000,
        top_categories=["category1", "category2"],
    )
-    mock_db_call = mocker.patch("backend.api.features.store.db.get_store_creator")
+    mock_db_call = mocker.patch(
+        "backend.api.features.store.db.get_store_creator_details"
+    )
    mock_db_call.return_value = mocked_value

-    response = client.get("/creators/creator1")
+    response = client.get("/creator/creator1")
    assert response.status_code == 200

    data = store_model.CreatorDetails.model_validate(response.json())
@@ -534,26 +528,19 @@ def test_get_submissions_success(
        submissions=[
            store_model.StoreSubmission(
                listing_id="test-listing-id",
-                user_id="test-user-id",
-                slug="test-agent",
-                listing_version_id="test-version-id",
-                listing_version=1,
-                graph_id="test-agent-id",
-                graph_version=1,
                name="Test Agent",
-                sub_heading="Test agent subheading",
                description="Test agent description",
-                instructions="Click the button!",
-                categories=["test-category"],
                image_urls=["test.jpg"],
-                video_url="test.mp4",
-                agent_output_demo_url="demo_video.mp4",
-                submitted_at=FIXED_NOW,
-                changes_summary="Initial Submission",
+                date_submitted=FIXED_NOW,
                status=prisma.enums.SubmissionStatus.APPROVED,
-                run_count=50,
-                review_count=5,
-                review_avg_rating=4.2,
+                runs=50,
+                rating=4.2,
+                agent_id="test-agent-id",
+                agent_version=1,
+                sub_heading="Test agent subheading",
+                slug="test-agent",
+                video_url="test.mp4",
+                categories=["test-category"],
            )
        ],
        pagination=store_model.Pagination(
--- a/autogpt_platform/backend/backend/api/features/store/test_cache_delete.py
+++ b/autogpt_platform/backend/backend/api/features/store/test_cache_delete.py
@@ -11,7 +11,6 @@ import pytest
 from backend.util.models import Pagination

 from . import cache as store_cache
-from .db import StoreAgentsSortOptions
 from .model import StoreAgent, StoreAgentsResponse


@@ -216,7 +215,7 @@ class TestCacheDeletion:
            await store_cache._get_cached_store_agents(
                featured=True,
                creator="testuser",
-                sorted_by=StoreAgentsSortOptions.RATING,
+                sorted_by="rating",
                search_query="AI assistant",
                category="productivity",
                page=2,
@@ -228,7 +227,7 @@ class TestCacheDeletion:
            deleted = store_cache._get_cached_store_agents.cache_delete(
                featured=True,
                creator="testuser",
-                sorted_by=StoreAgentsSortOptions.RATING,
+                sorted_by="rating",
                search_query="AI assistant",
                category="productivity",
                page=2,
@@ -240,7 +239,7 @@ class TestCacheDeletion:
            deleted = store_cache._get_cached_store_agents.cache_delete(
                featured=True,
                creator="testuser",
-                sorted_by=StoreAgentsSortOptions.RATING,
+                sorted_by="rating",
                search_query="AI assistant",
                category="productivity",
                page=2,
--- a/autogpt_platform/backend/backend/api/features/v1.py
+++ b/autogpt_platform/backend/backend/api/features/v1.py
@@ -55,7 +55,6 @@ from backend.data.credit import (
    set_auto_top_up,
 )
 from backend.data.graph import GraphSettings
-from backend.data.invited_user import get_or_activate_user
 from backend.data.model import CredentialsMetaInput, UserOnboarding
 from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
 from backend.data.onboarding import (
@@ -71,6 +70,7 @@ from backend.data.onboarding import (
    update_user_onboarding,
 )
 from backend.data.user import (
+    get_or_create_user,
    get_user_by_id,
    get_user_notification_preference,
    update_user_email,
@@ -126,9 +126,6 @@ v1_router = APIRouter()
 ########################################################


-_tally_background_tasks: set[asyncio.Task] = set()
-
-
@v1_router.post(
    "/auth/user",
    summary="Get or create user",
@@ -136,23 +133,7 @@ _tally_background_tasks: set[asyncio.Task] = set()
    dependencies=[Security(requires_user)],
 )
 async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
-    user = await get_or_activate_user(user_data)
-
-    # Fire-and-forget: backfill Tally understanding when invite pre-seeding did
-    # not produce a stored result before first activation.
-    age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
-    if age_seconds < 30:
-        try:
-            from backend.data.tally import populate_understanding_from_tally
-
-            task = asyncio.create_task(
-                populate_understanding_from_tally(user.id, user.email)
-            )
-            _tally_background_tasks.add(task)
-            task.add_done_callback(_tally_background_tasks.discard)
-        except Exception:
-            logger.debug("Failed to start Tally population task", exc_info=True)
-
+    user = await get_or_create_user(user_data)
    return user.model_dump()


@@ -163,11 +144,8 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
    dependencies=[Security(requires_user)],
 )
 async def update_user_email_route(
-    user_id: Annotated[str, Security(get_user_id)],
-    email: str = Body(...),
-    user_data: dict = Security(get_jwt_payload),
+    user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
 ) -> dict[str, str]:
-    await get_or_activate_user(user_data)
    await update_user_email(user_id, email)

    return {"email": email}
@@ -183,7 +161,7 @@ async def get_user_timezone_route(
    user_data: dict = Security(get_jwt_payload),
 ) -> TimezoneResponse:
    """Get user timezone setting."""
-    user = await get_or_activate_user(user_data)
+    user = await get_or_create_user(user_data)
    return TimezoneResponse(timezone=user.timezone)


@@ -194,12 +172,9 @@ async def get_user_timezone_route(
    dependencies=[Security(requires_user)],
 )
 async def update_user_timezone_route(
-    user_id: Annotated[str, Security(get_user_id)],
-    request: UpdateTimezoneRequest,
-    user_data: dict = Security(get_jwt_payload),
+    user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
 ) -> TimezoneResponse:
    """Update user timezone. The timezone should be a valid IANA timezone identifier."""
-    await get_or_activate_user(user_data)
    user = await update_user_timezone(user_id, str(request.timezone))
    return TimezoneResponse(timezone=user.timezone)

@@ -212,9 +187,7 @@ async def update_user_timezone_route(
 )
 async def get_preferences(
    user_id: Annotated[str, Security(get_user_id)],
-    user_data: dict = Security(get_jwt_payload),
 ) -> NotificationPreference:
-    await get_or_activate_user(user_data)
    preferences = await get_user_notification_preference(user_id)
    return preferences

@@ -228,9 +201,7 @@ async def get_preferences(
 async def update_preferences(
    user_id: Annotated[str, Security(get_user_id)],
    preferences: NotificationPreferenceDTO = Body(...),
-    user_data: dict = Security(get_jwt_payload),
 ) -> NotificationPreference:
-    await get_or_activate_user(user_data)
    output = await update_user_notification_preference(user_id, preferences)
    return output

@@ -457,6 +428,7 @@ async def execute_graph_block(
 async def upload_file(
    user_id: Annotated[str, Security(get_user_id)],
    file: UploadFile = File(...),
+    provider: str = "gcs",
    expiration_hours: int = 24,
 ) -> UploadFileResponse:
    """
@@ -519,6 +491,7 @@ async def upload_file(
    storage_path = await cloud_storage.store_file(
        content=content,
        filename=file_name,
+        provider=provider,
        expiration_hours=expiration_hours,
        user_id=user_id,
    )
--- a/autogpt_platform/backend/backend/api/features/v1_test.py
+++ b/autogpt_platform/backend/backend/api/features/v1_test.py
@@ -1,5 +1,5 @@
 import json
-from datetime import datetime, timezone
+from datetime import datetime
 from io import BytesIO
 from unittest.mock import AsyncMock, Mock, patch

@@ -43,7 +43,6 @@ def test_get_or_create_user_route(
 ) -> None:
    """Test get or create user endpoint"""
    mock_user = Mock()
-    mock_user.created_at = datetime.now(timezone.utc)
    mock_user.model_dump.return_value = {
        "id": test_user_id,
        "email": "test@example.com",
@@ -51,7 +50,7 @@ def test_get_or_create_user_route(
    }

    mocker.patch(
-        "backend.api.features.v1.get_or_activate_user",
+        "backend.api.features.v1.get_or_create_user",
        return_value=mock_user,
    )

@@ -515,6 +514,7 @@ async def test_upload_file_success(test_user_id: str):
        result = await upload_file(
            file=upload_file_mock,
            user_id=test_user_id,
+            provider="gcs",
            expiration_hours=24,
        )

@@ -532,6 +532,7 @@ async def test_upload_file_success(test_user_id: str):
        mock_handler.store_file.assert_called_once_with(
            content=file_content,
            filename="test.txt",
+            provider="gcs",
            expiration_hours=24,
            user_id=test_user_id,
        )
--- a/autogpt_platform/backend/backend/api/features/workspace/routes.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py
@@ -3,29 +3,15 @@ Workspace API routes for managing user file storage.
 """

 import logging
-import os
 import re
 from typing import Annotated
 from urllib.parse import quote

 import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
-from fastapi import Query, UploadFile
 from fastapi.responses import Response
-from pydantic import BaseModel

-from backend.data.workspace import (
-    WorkspaceFile,
-    count_workspace_files,
-    get_or_create_workspace,
-    get_workspace,
-    get_workspace_file,
-    get_workspace_total_size,
-    soft_delete_workspace_file,
-)
-from backend.util.settings import Config
-from backend.util.virus_scanner import scan_content_safe
-from backend.util.workspace import WorkspaceManager
+from backend.data.workspace import WorkspaceFile, get_workspace, get_workspace_file
 from backend.util.workspace_storage import get_workspace_storage


@@ -112,25 +98,6 @@ async def _create_file_download_response(file: WorkspaceFile) -> Response:
            raise


-class UploadFileResponse(BaseModel):
-    file_id: str
-    name: str
-    path: str
-    mime_type: str
-    size_bytes: int
-
-
-class DeleteFileResponse(BaseModel):
-    deleted: bool
-
-
-class StorageUsageResponse(BaseModel):
-    used_bytes: int
-    limit_bytes: int
-    used_percent: float
-    file_count: int
-
-
@router.get(
    "/files/{file_id}/download",
    summary="Download file by ID",
@@ -153,148 +120,3 @@ async def download_file(
        raise fastapi.HTTPException(status_code=404, detail="File not found")

    return await _create_file_download_response(file)
-
-
-@router.delete(
-    "/files/{file_id}",
-    summary="Delete a workspace file",
-)
-async def delete_workspace_file(
-    user_id: Annotated[str, fastapi.Security(get_user_id)],
-    file_id: str,
-) -> DeleteFileResponse:
-    """
-    Soft-delete a workspace file and attempt to remove it from storage.
-
-    Used when a user clears a file input in the builder.
-    """
-    workspace = await get_workspace(user_id)
-    if workspace is None:
-        raise fastapi.HTTPException(status_code=404, detail="Workspace not found")
-
-    manager = WorkspaceManager(user_id, workspace.id)
-    deleted = await manager.delete_file(file_id)
-    if not deleted:
-        raise fastapi.HTTPException(status_code=404, detail="File not found")
-
-    return DeleteFileResponse(deleted=True)
-
-
-@router.post(
-    "/files/upload",
-    summary="Upload file to workspace",
-)
-async def upload_file(
-    user_id: Annotated[str, fastapi.Security(get_user_id)],
-    file: UploadFile,
-    session_id: str | None = Query(default=None),
-) -> UploadFileResponse:
-    """
-    Upload a file to the user's workspace.
-
-    Files are stored in session-scoped paths when session_id is provided,
-    so the agent's session-scoped tools can discover them automatically.
-    """
-    config = Config()
-
-    # Sanitize filename — strip any directory components
-    filename = os.path.basename(file.filename or "upload") or "upload"
-
-    # Read file content with early abort on size limit
-    max_file_bytes = config.max_file_size_mb * 1024 * 1024
-    chunks: list[bytes] = []
-    total_size = 0
-    while chunk := await file.read(64 * 1024):  # 64KB chunks
-        total_size += len(chunk)
-        if total_size > max_file_bytes:
-            raise fastapi.HTTPException(
-                status_code=413,
-                detail=f"File exceeds maximum size of {config.max_file_size_mb} MB",
-            )
-        chunks.append(chunk)
-    content = b"".join(chunks)
-
-    # Get or create workspace
-    workspace = await get_or_create_workspace(user_id)
-
-    # Pre-write storage cap check (soft check — final enforcement is post-write)
-    storage_limit_bytes = config.max_workspace_storage_mb * 1024 * 1024
-    current_usage = await get_workspace_total_size(workspace.id)
-    if storage_limit_bytes and current_usage + len(content) > storage_limit_bytes:
-        used_percent = (current_usage / storage_limit_bytes) * 100
-        raise fastapi.HTTPException(
-            status_code=413,
-            detail={
-                "message": "Storage limit exceeded",
-                "used_bytes": current_usage,
-                "limit_bytes": storage_limit_bytes,
-                "used_percent": round(used_percent, 1),
-            },
-        )
-
-    # Warn at 80% usage
-    if (
-        storage_limit_bytes
-        and (usage_ratio := (current_usage + len(content)) / storage_limit_bytes) >= 0.8
-    ):
-        logger.warning(
-            f"User {user_id} workspace storage at {usage_ratio * 100:.1f}% "
-            f"({current_usage + len(content)} / {storage_limit_bytes} bytes)"
-        )
-
-    # Virus scan
-    await scan_content_safe(content, filename=filename)
-
-    # Write file via WorkspaceManager
-    manager = WorkspaceManager(user_id, workspace.id, session_id)
-    try:
-        workspace_file = await manager.write_file(content, filename)
-    except ValueError as e:
-        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
-
-    # Post-write storage check — eliminates TOCTOU race on the quota.
-    # If a concurrent upload pushed us over the limit, undo this write.
-    new_total = await get_workspace_total_size(workspace.id)
-    if storage_limit_bytes and new_total > storage_limit_bytes:
-        await soft_delete_workspace_file(workspace_file.id, workspace.id)
-        raise fastapi.HTTPException(
-            status_code=413,
-            detail={
-                "message": "Storage limit exceeded (concurrent upload)",
-                "used_bytes": new_total,
-                "limit_bytes": storage_limit_bytes,
-            },
-        )
-
-    return UploadFileResponse(
-        file_id=workspace_file.id,
-        name=workspace_file.name,
-        path=workspace_file.path,
-        mime_type=workspace_file.mime_type,
-        size_bytes=workspace_file.size_bytes,
-    )
-
-
-@router.get(
-    "/storage/usage",
-    summary="Get workspace storage usage",
-)
-async def get_storage_usage(
-    user_id: Annotated[str, fastapi.Security(get_user_id)],
-) -> StorageUsageResponse:
-    """
-    Get storage usage information for the user's workspace.
-    """
-    config = Config()
-    workspace = await get_or_create_workspace(user_id)
-
-    used_bytes = await get_workspace_total_size(workspace.id)
-    file_count = await count_workspace_files(workspace.id)
-    limit_bytes = config.max_workspace_storage_mb * 1024 * 1024
-
-    return StorageUsageResponse(
-        used_bytes=used_bytes,
-        limit_bytes=limit_bytes,
-        used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
-        file_count=file_count,
-    )
--- a/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
@@ -1,359 +0,0 @@
-"""Tests for workspace file upload and download routes."""
-
-import io
-from datetime import datetime, timezone
-
-import fastapi
-import fastapi.testclient
-import pytest
-import pytest_mock
-
-from backend.api.features.workspace import routes as workspace_routes
-from backend.data.workspace import WorkspaceFile
-
-app = fastapi.FastAPI()
-app.include_router(workspace_routes.router)
-
-
-@app.exception_handler(ValueError)
-async def _value_error_handler(
-    request: fastapi.Request, exc: ValueError
-) -> fastapi.responses.JSONResponse:
-    """Mirror the production ValueError → 400 mapping from rest_api.py."""
-    return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})
-
-
-client = fastapi.testclient.TestClient(app)
-
-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
-
-_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
-
-MOCK_FILE = WorkspaceFile(
-    id="file-aaa-bbb",
-    workspace_id="ws-1",
-    created_at=_NOW,
-    updated_at=_NOW,
-    name="hello.txt",
-    path="/session/hello.txt",
-    mime_type="text/plain",
-    size_bytes=13,
-    storage_path="local://hello.txt",
-)
-
-
-@pytest.fixture(autouse=True)
-def setup_app_auth(mock_jwt_user):
-    from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _upload(
-    filename: str = "hello.txt",
-    content: bytes = b"Hello, world!",
-    content_type: str = "text/plain",
-):
-    """Helper to POST a file upload."""
-    return client.post(
-        "/files/upload?session_id=sess-1",
-        files={"file": (filename, io.BytesIO(content), content_type)},
-    )
-
-
-# ---- Happy path ----
-
-
-def test_upload_happy_path(mocker: pytest_mock.MockFixture):
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        return_value=0,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.scan_content_safe",
-        return_value=None,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    response = _upload()
-    assert response.status_code == 200
-    data = response.json()
-    assert data["file_id"] == "file-aaa-bbb"
-    assert data["name"] == "hello.txt"
-    assert data["size_bytes"] == 13
-
-
-# ---- Per-file size limit ----
-
-
-def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
-    """Files larger than max_file_size_mb should be rejected with 413."""
-    cfg = mocker.patch("backend.api.features.workspace.routes.Config")
-    cfg.return_value.max_file_size_mb = 0  # 0 MB → any content is too big
-    cfg.return_value.max_workspace_storage_mb = 500
-
-    response = _upload(content=b"x" * 1024)
-    assert response.status_code == 413
-
-
-# ---- Storage quota exceeded ----
-
-
-def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    # Current usage already at limit
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        return_value=500 * 1024 * 1024,
-    )
-
-    response = _upload()
-    assert response.status_code == 413
-    assert "Storage limit exceeded" in response.text
-
-
-# ---- Post-write quota race (B2) ----
-
-
-def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
-    """If a concurrent upload tips the total over the limit after write,
-    the file should be soft-deleted and 413 returned."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    # Pre-write check passes (under limit), but post-write check fails
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        side_effect=[0, 600 * 1024 * 1024],  # first call OK, second over limit
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.scan_content_safe",
-        return_value=None,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-    mock_delete = mocker.patch(
-        "backend.api.features.workspace.routes.soft_delete_workspace_file",
-        return_value=None,
-    )
-
-    response = _upload()
-    assert response.status_code == 413
-    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
-
-
-# ---- Any extension accepted (no allowlist) ----
-
-
-def test_upload_any_extension(mocker: pytest_mock.MockFixture):
-    """Any file extension should be accepted — ClamAV is the security layer."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        return_value=0,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.scan_content_safe",
-        return_value=None,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    response = _upload(filename="data.xyz", content=b"arbitrary")
-    assert response.status_code == 200
-
-
-# ---- Virus scan rejection ----
-
-
-def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
-    """Files flagged by ClamAV should be rejected and never written to storage."""
-    from backend.api.features.store.exceptions import VirusDetectedError
-
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        return_value=0,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.scan_content_safe",
-        side_effect=VirusDetectedError("Eicar-Test-Signature"),
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
-    assert response.status_code == 400
-    assert "Virus detected" in response.text
-    mock_manager.write_file.assert_not_called()
-
-
-# ---- No file extension ----
-
-
-def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
-    """Files without an extension should be accepted and stored as-is."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        return_value=0,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.scan_content_safe",
-        return_value=None,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    response = _upload(
-        filename="Makefile",
-        content=b"all:\n\techo hello",
-        content_type="application/octet-stream",
-    )
-    assert response.status_code == 200
-    mock_manager.write_file.assert_called_once()
-    assert mock_manager.write_file.call_args[0][1] == "Makefile"
-
-
-# ---- Filename sanitization (SF5) ----
-
-
-def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
-    """Path-traversal filenames should be reduced to their basename."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_total_size",
-        return_value=0,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.scan_content_safe",
-        return_value=None,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    # Filename with traversal
-    _upload(filename="../../etc/passwd.txt")
-
-    # write_file should have been called with just the basename
-    mock_manager.write_file.assert_called_once()
-    call_args = mock_manager.write_file.call_args
-    assert call_args[0][1] == "passwd.txt"
-
-
-# ---- Download ----
-
-
-def test_download_file_not_found(mocker: pytest_mock.MockFixture):
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace_file",
-        return_value=None,
-    )
-
-    response = client.get("/files/some-file-id/download")
-    assert response.status_code == 404
-
-
-# ---- Delete ----
-
-
-def test_delete_file_success(mocker: pytest_mock.MockFixture):
-    """Deleting an existing file should return {"deleted": true}."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.delete_file = mocker.AsyncMock(return_value=True)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    response = client.delete("/files/file-aaa-bbb")
-    assert response.status_code == 200
-    assert response.json() == {"deleted": True}
-    mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")
-
-
-def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
-    """Deleting a non-existent file should return 404."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
-    )
-    mock_manager = mocker.MagicMock()
-    mock_manager.delete_file = mocker.AsyncMock(return_value=False)
-    mocker.patch(
-        "backend.api.features.workspace.routes.WorkspaceManager",
-        return_value=mock_manager,
-    )
-
-    response = client.delete("/files/nonexistent-id")
-    assert response.status_code == 404
-    assert "File not found" in response.text
-
-
-def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
-    """Deleting when user has no workspace should return 404."""
-    mocker.patch(
-        "backend.api.features.workspace.routes.get_workspace",
-        return_value=None,
-    )
-
-    response = client.delete("/files/file-aaa-bbb")
-    assert response.status_code == 404
-    assert "Workspace not found" in response.text
--- a/autogpt_platform/backend/backend/api/rest_api.py
+++ b/autogpt_platform/backend/backend/api/rest_api.py
@@ -19,7 +19,6 @@ from prisma.errors import PrismaError
 import backend.api.features.admin.credit_admin_routes
 import backend.api.features.admin.execution_analytics_routes
 import backend.api.features.admin.store_admin_routes
-import backend.api.features.admin.user_admin_routes
 import backend.api.features.builder
 import backend.api.features.builder.routes
 import backend.api.features.chat.routes as chat_routes
@@ -42,11 +41,11 @@ import backend.data.user
 import backend.integrations.webhooks.utils
 import backend.util.service
 import backend.util.settings
-from backend.api.features.library.exceptions import (
-    FolderAlreadyExistsError,
-    FolderValidationError,
-)
 from backend.blocks.llm import DEFAULT_LLM_MODEL
+from backend.copilot.completion_consumer import (
+    start_completion_consumer,
+    stop_completion_consumer,
+)
 from backend.data.model import Credentials
 from backend.integrations.providers import ProviderName
 from backend.monitoring.instrumentation import instrument_fastapi
@@ -56,7 +55,6 @@ from backend.util.exceptions import (
    MissingConfigError,
    NotAuthorizedError,
    NotFoundError,
-    PreconditionFailed,
 )
 from backend.util.feature_flag import initialize_launchdarkly, shutdown_launchdarkly
 from backend.util.service import UnhealthyServiceError
@@ -125,9 +123,21 @@ async def lifespan_context(app: fastapi.FastAPI):
    await backend.data.graph.migrate_llm_models(DEFAULT_LLM_MODEL)
    await backend.integrations.webhooks.utils.migrate_legacy_triggered_graphs()

+    # Start chat completion consumer for Redis Streams notifications
+    try:
+        await start_completion_consumer()
+    except Exception as e:
+        logger.warning(f"Could not start chat completion consumer: {e}")
+
    with launch_darkly_context():
        yield

+    # Stop chat completion consumer
+    try:
+        await stop_completion_consumer()
+    except Exception as e:
+        logger.warning(f"Error stopping chat completion consumer: {e}")
+
    try:
        await shutdown_cloud_storage_handler()
    except Exception as e:
@@ -267,17 +277,12 @@ async def validation_error_handler(


 app.add_exception_handler(PrismaError, handle_internal_http_error(500))
-app.add_exception_handler(
-    FolderAlreadyExistsError, handle_internal_http_error(409, False)
-)
-app.add_exception_handler(FolderValidationError, handle_internal_http_error(400, False))
 app.add_exception_handler(NotFoundError, handle_internal_http_error(404, False))
 app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403, False))
 app.add_exception_handler(RequestValidationError, validation_error_handler)
 app.add_exception_handler(pydantic.ValidationError, validation_error_handler)
 app.add_exception_handler(MissingConfigError, handle_internal_http_error(503))
 app.add_exception_handler(ValueError, handle_internal_http_error(400))
-app.add_exception_handler(PreconditionFailed, handle_internal_http_error(428))
 app.add_exception_handler(Exception, handle_internal_http_error(500))

 app.include_router(backend.api.features.v1.v1_router, tags=["v1"], prefix="/api")
@@ -312,11 +317,6 @@ app.include_router(
    tags=["v2", "admin"],
    prefix="/api/executions",
 )
-app.include_router(
-    backend.api.features.admin.user_admin_routes.router,
-    tags=["v2", "admin"],
-    prefix="/api/users",
-)
 app.include_router(
    backend.api.features.executions.review.routes.router,
    tags=["v2", "executions", "review"],
--- a/autogpt_platform/backend/backend/app.py
+++ b/autogpt_platform/backend/backend/app.py
@@ -24,7 +24,7 @@ def run_processes(*processes: "AppProcess", **kwargs):
        # Run the last process in the foreground.
        processes[-1].start(background=False, **kwargs)
    finally:
-        for process in reversed(processes):
+        for process in processes:
            try:
                process.stop()
            except Exception as e:
--- a/autogpt_platform/backend/backend/blocks/llm.py
+++ b/autogpt_platform/backend/backend/blocks/llm.py
@@ -116,7 +116,6 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
    CLAUDE_4_5_SONNET = "claude-sonnet-4-5-20250929"
    CLAUDE_4_5_HAIKU = "claude-haiku-4-5-20251001"
    CLAUDE_4_6_OPUS = "claude-opus-4-6"
-    CLAUDE_4_6_SONNET = "claude-sonnet-4-6"
    CLAUDE_3_HAIKU = "claude-3-haiku-20240307"
    # AI/ML API models
    AIML_API_QWEN2_5_72B = "Qwen/Qwen2.5-72B-Instruct-Turbo"
@@ -275,9 +274,6 @@ MODEL_METADATA = {
    LlmModel.CLAUDE_4_6_OPUS: ModelMetadata(
        "anthropic", 200000, 128000, "Claude Opus 4.6", "Anthropic", "Anthropic", 3
    ),  # claude-opus-4-6
-    LlmModel.CLAUDE_4_6_SONNET: ModelMetadata(
-        "anthropic", 200000, 64000, "Claude Sonnet 4.6", "Anthropic", "Anthropic", 3
-    ),  # claude-sonnet-4-6
    LlmModel.CLAUDE_4_5_OPUS: ModelMetadata(
        "anthropic", 200000, 64000, "Claude Opus 4.5", "Anthropic", "Anthropic", 3
    ),  # claude-opus-4-5-20251101
--- a/autogpt_platform/backend/backend/blocks/mcp/block.py
+++ b/autogpt_platform/backend/backend/blocks/mcp/block.py
@@ -6,6 +6,7 @@ and execute them. Works like AgentExecutorBlock — the user selects a tool from
 dropdown and the input/output schema adapts dynamically.
 """

+import json
 import logging
 from typing import Any, Literal

@@ -19,11 +20,6 @@ from backend.blocks._base import (
    BlockType,
 )
 from backend.blocks.mcp.client import MCPClient, MCPClientError
-from backend.blocks.mcp.helpers import (
-    auto_lookup_mcp_credential,
-    normalize_mcp_url,
-    parse_mcp_content,
-)
 from backend.data.block import BlockInput, BlockOutput
 from backend.data.model import (
    CredentialsField,
@@ -183,7 +179,31 @@ class MCPToolBlock(Block):
                f"{error_text or 'Unknown error'}"
            )

-        return parse_mcp_content(result.content)
+        # Extract text content from the result
+        output_parts = []
+        for item in result.content:
+            if item.get("type") == "text":
+                text = item.get("text", "")
+                # Try to parse as JSON for structured output
+                try:
+                    output_parts.append(json.loads(text))
+                except (json.JSONDecodeError, ValueError):
+                    output_parts.append(text)
+            elif item.get("type") == "image":
+                output_parts.append(
+                    {
+                        "type": "image",
+                        "data": item.get("data"),
+                        "mimeType": item.get("mimeType"),
+                    }
+                )
+            elif item.get("type") == "resource":
+                output_parts.append(item.get("resource", {}))
+
+        # If single result, unwrap
+        if len(output_parts) == 1:
+            return output_parts[0]
+        return output_parts if output_parts else None

    @staticmethod
    async def _auto_lookup_credential(
@@ -191,10 +211,37 @@ class MCPToolBlock(Block):
    ) -> "OAuth2Credentials | None":
        """Auto-lookup stored MCP credential for a server URL.

-        Delegates to :func:`~backend.blocks.mcp.helpers.auto_lookup_mcp_credential`.
-        The caller should pass a normalized URL.
+        This is a fallback for nodes that don't have ``credentials`` explicitly
+        set (e.g. nodes created before the credential field was wired up).
        """
-        return await auto_lookup_mcp_credential(user_id, server_url)
+        from backend.integrations.creds_manager import IntegrationCredentialsManager
+        from backend.integrations.providers import ProviderName
+
+        try:
+            mgr = IntegrationCredentialsManager()
+            mcp_creds = await mgr.store.get_creds_by_provider(
+                user_id, ProviderName.MCP.value
+            )
+            best: OAuth2Credentials | None = None
+            for cred in mcp_creds:
+                if (
+                    isinstance(cred, OAuth2Credentials)
+                    and (cred.metadata or {}).get("mcp_server_url") == server_url
+                ):
+                    if best is None or (
+                        (cred.access_token_expires_at or 0)
+                        > (best.access_token_expires_at or 0)
+                    ):
+                        best = cred
+            if best:
+                best = await mgr.refresh_if_needed(user_id, best)
+                logger.info(
+                    "Auto-resolved MCP credential %s for %s", best.id, server_url
+                )
+            return best
+        except Exception:
+            logger.warning("Auto-lookup MCP credential failed", exc_info=True)
+            return None

    async def run(
        self,
@@ -231,7 +278,7 @@ class MCPToolBlock(Block):
        # the stored MCP credential for this server URL.
        if credentials is None:
            credentials = await self._auto_lookup_credential(
-                user_id, normalize_mcp_url(input_data.server_url)
+                user_id, input_data.server_url
            )

        auth_token = (
--- a/autogpt_platform/backend/backend/blocks/mcp/client.py
+++ b/autogpt_platform/backend/backend/blocks/mcp/client.py
@@ -55,9 +55,7 @@ class MCPClient:
        server_url: str,
        auth_token: str | None = None,
    ):
-        from backend.blocks.mcp.helpers import normalize_mcp_url
-
-        self.server_url = normalize_mcp_url(server_url)
+        self.server_url = server_url.rstrip("/")
        self.auth_token = auth_token
        self._request_id = 0
        self._session_id: str | None = None
--- a/autogpt_platform/backend/backend/blocks/mcp/helpers.py
+++ b/autogpt_platform/backend/backend/blocks/mcp/helpers.py
@@ -1,117 +0,0 @@
-"""Shared MCP helpers used by blocks, copilot tools, and API routes."""
-
-from __future__ import annotations
-
-import json
-import logging
-from typing import TYPE_CHECKING, Any
-from urllib.parse import urlparse
-
-if TYPE_CHECKING:
-    from backend.data.model import OAuth2Credentials
-
-logger = logging.getLogger(__name__)
-
-
-def normalize_mcp_url(url: str) -> str:
-    """Normalize an MCP server URL for consistent credential matching.
-
-    Strips leading/trailing whitespace and a single trailing slash so that
-    ``https://mcp.example.com/`` and ``https://mcp.example.com`` resolve to
-    the same stored credential.
-    """
-    return url.strip().rstrip("/")
-
-
-def server_host(server_url: str) -> str:
-    """Extract the hostname from a server URL for display purposes.
-
-    Uses ``parsed.hostname`` (never ``netloc``) to strip any embedded
-    username/password before surfacing the value in UI messages.
-    """
-    try:
-        parsed = urlparse(server_url)
-        return parsed.hostname or server_url
-    except Exception:
-        return server_url
-
-
-def parse_mcp_content(content: list[dict[str, Any]]) -> Any:
-    """Parse MCP tool response content into a plain Python value.
-
-    - text items: parsed as JSON when possible, kept as str otherwise
-    - image items: kept as ``{type, data, mimeType}`` dict for frontend rendering
-    - resource items: unwrapped to their resource payload dict
-
-    Single-item responses are unwrapped from the list; multiple items are
-    returned as a list; empty content returns ``None``.
-    """
-    output_parts: list[Any] = []
-    for item in content:
-        item_type = item.get("type")
-        if item_type == "text":
-            text = item.get("text", "")
-            try:
-                output_parts.append(json.loads(text))
-            except (json.JSONDecodeError, ValueError):
-                output_parts.append(text)
-        elif item_type == "image":
-            output_parts.append(
-                {
-                    "type": "image",
-                    "data": item.get("data"),
-                    "mimeType": item.get("mimeType"),
-                }
-            )
-        elif item_type == "resource":
-            output_parts.append(item.get("resource", {}))
-
-    if len(output_parts) == 1:
-        return output_parts[0]
-    return output_parts or None
-
-
-async def auto_lookup_mcp_credential(
-    user_id: str, server_url: str
-) -> OAuth2Credentials | None:
-    """Look up the best stored MCP credential for *server_url*.
-
-    The caller should pass a **normalized** URL (via :func:`normalize_mcp_url`)
-    so the comparison with ``mcp_server_url`` in credential metadata matches.
-
-    Returns the credential with the latest ``access_token_expires_at``, refreshed
-    if needed, or ``None`` when no match is found.
-    """
-    from backend.data.model import OAuth2Credentials
-    from backend.integrations.creds_manager import IntegrationCredentialsManager
-    from backend.integrations.providers import ProviderName
-
-    try:
-        mgr = IntegrationCredentialsManager()
-        mcp_creds = await mgr.store.get_creds_by_provider(
-            user_id, ProviderName.MCP.value
-        )
-        # Collect all matching credentials and pick the best one.
-        # Primary sort: latest access_token_expires_at (tokens with expiry
-        # are preferred over non-expiring ones).  Secondary sort: last in
-        # iteration order, which corresponds to the most recently created
-        # row — this acts as a tiebreaker when multiple bearer tokens have
-        # no expiry (e.g. after a failed old-credential cleanup).
-        best: OAuth2Credentials | None = None
-        for cred in mcp_creds:
-            if (
-                isinstance(cred, OAuth2Credentials)
-                and (cred.metadata or {}).get("mcp_server_url") == server_url
-            ):
-                if best is None or (
-                    (cred.access_token_expires_at or 0)
-                    >= (best.access_token_expires_at or 0)
-                ):
-                    best = cred
-        if best:
-            best = await mgr.refresh_if_needed(user_id, best)
-            logger.info("Auto-resolved MCP credential %s for %s", best.id, server_url)
-        return best
-    except Exception:
-        logger.warning("Auto-lookup MCP credential failed", exc_info=True)
-        return None
--- a/autogpt_platform/backend/backend/blocks/mcp/test_helpers.py
+++ b/autogpt_platform/backend/backend/blocks/mcp/test_helpers.py
@@ -1,98 +0,0 @@
-"""Unit tests for the shared MCP helpers."""
-
-from backend.blocks.mcp.helpers import normalize_mcp_url, parse_mcp_content, server_host
-
-# ---------------------------------------------------------------------------
-# normalize_mcp_url
-# ---------------------------------------------------------------------------
-
-
-def test_normalize_trailing_slash():
-    assert normalize_mcp_url("https://mcp.example.com/") == "https://mcp.example.com"
-
-
-def test_normalize_whitespace():
-    assert normalize_mcp_url("  https://mcp.example.com  ") == "https://mcp.example.com"
-
-
-def test_normalize_both():
-    assert (
-        normalize_mcp_url("  https://mcp.example.com/  ") == "https://mcp.example.com"
-    )
-
-
-def test_normalize_noop():
-    assert normalize_mcp_url("https://mcp.example.com") == "https://mcp.example.com"
-
-
-def test_normalize_path_with_trailing_slash():
-    assert (
-        normalize_mcp_url("https://mcp.example.com/path/")
-        == "https://mcp.example.com/path"
-    )
-
-
-# ---------------------------------------------------------------------------
-# server_host
-# ---------------------------------------------------------------------------
-
-
-def test_server_host_standard_url():
-    assert server_host("https://mcp.example.com/mcp") == "mcp.example.com"
-
-
-def test_server_host_strips_credentials():
-    """hostname must not expose user:pass."""
-    assert server_host("https://user:secret@mcp.example.com/mcp") == "mcp.example.com"
-
-
-def test_server_host_with_port():
-    """Port should not appear in hostname (hostname strips it)."""
-    assert server_host("https://mcp.example.com:8080/mcp") == "mcp.example.com"
-
-
-def test_server_host_fallback():
-    """Falls back to the raw string for un-parseable URLs."""
-    assert server_host("not-a-url") == "not-a-url"
-
-
-# ---------------------------------------------------------------------------
-# parse_mcp_content
-# ---------------------------------------------------------------------------
-
-
-def test_parse_text_plain():
-    assert parse_mcp_content([{"type": "text", "text": "hello world"}]) == "hello world"
-
-
-def test_parse_text_json():
-    content = [{"type": "text", "text": '{"status": "ok", "count": 42}'}]
-    assert parse_mcp_content(content) == {"status": "ok", "count": 42}
-
-
-def test_parse_image():
-    content = [{"type": "image", "data": "abc123==", "mimeType": "image/png"}]
-    assert parse_mcp_content(content) == {
-        "type": "image",
-        "data": "abc123==",
-        "mimeType": "image/png",
-    }
-
-
-def test_parse_resource():
-    content = [
-        {"type": "resource", "resource": {"uri": "file:///tmp/out.txt", "text": "hi"}}
-    ]
-    assert parse_mcp_content(content) == {"uri": "file:///tmp/out.txt", "text": "hi"}
-
-
-def test_parse_multi_item():
-    content = [
-        {"type": "text", "text": "first"},
-        {"type": "text", "text": "second"},
-    ]
-    assert parse_mcp_content(content) == ["first", "second"]
-
-
-def test_parse_empty():
-    assert parse_mcp_content([]) is None
--- a/autogpt_platform/backend/backend/blocks/stagehand/blocks.py
+++ b/autogpt_platform/backend/backend/blocks/stagehand/blocks.py
@@ -83,8 +83,7 @@ class StagehandRecommendedLlmModel(str, Enum):
    GPT41_MINI = "gpt-4.1-mini-2025-04-14"

    # Anthropic
-    CLAUDE_4_5_SONNET = "claude-sonnet-4-5-20250929"  # Keep for backwards compat
-    CLAUDE_4_6_SONNET = "claude-sonnet-4-6"
+    CLAUDE_4_5_SONNET = "claude-sonnet-4-5-20250929"

    @property
    def provider_name(self) -> str:
@@ -138,7 +137,7 @@ class StagehandObserveBlock(Block):
        model: StagehandRecommendedLlmModel = SchemaField(
            title="LLM Model",
            description="LLM to use for Stagehand (provider is inferred)",
-            default=StagehandRecommendedLlmModel.CLAUDE_4_6_SONNET,
+            default=StagehandRecommendedLlmModel.CLAUDE_4_5_SONNET,
            advanced=False,
        )
        model_credentials: AICredentials = AICredentialsField()
@@ -228,7 +227,7 @@ class StagehandActBlock(Block):
        model: StagehandRecommendedLlmModel = SchemaField(
            title="LLM Model",
            description="LLM to use for Stagehand (provider is inferred)",
-            default=StagehandRecommendedLlmModel.CLAUDE_4_6_SONNET,
+            default=StagehandRecommendedLlmModel.CLAUDE_4_5_SONNET,
            advanced=False,
        )
        model_credentials: AICredentials = AICredentialsField()
@@ -325,7 +324,7 @@ class StagehandExtractBlock(Block):
        model: StagehandRecommendedLlmModel = SchemaField(
            title="LLM Model",
            description="LLM to use for Stagehand (provider is inferred)",
-            default=StagehandRecommendedLlmModel.CLAUDE_4_6_SONNET,
+            default=StagehandRecommendedLlmModel.CLAUDE_4_5_SONNET,
            advanced=False,
        )
        model_credentials: AICredentials = AICredentialsField()
--- a/autogpt_platform/backend/backend/blocks/system/store_operations.py
+++ b/autogpt_platform/backend/backend/blocks/system/store_operations.py
@@ -1,8 +1,8 @@
 import logging
+from typing import Literal

 from pydantic import BaseModel

-from backend.api.features.store.db import StoreAgentsSortOptions
 from backend.blocks._base import (
    Block,
    BlockCategory,
@@ -176,8 +176,8 @@ class SearchStoreAgentsBlock(Block):
        category: str | None = SchemaField(
            description="Filter by category", default=None
        )
-        sort_by: StoreAgentsSortOptions = SchemaField(
-            description="How to sort the results", default=StoreAgentsSortOptions.RATING
+        sort_by: Literal["rating", "runs", "name", "updated_at"] = SchemaField(
+            description="How to sort the results", default="rating"
        )
        limit: int = SchemaField(
            description="Maximum number of results to return", default=10, ge=1, le=100
@@ -278,7 +278,7 @@ class SearchStoreAgentsBlock(Block):
        self,
        query: str | None = None,
        category: str | None = None,
-        sort_by: StoreAgentsSortOptions = StoreAgentsSortOptions.RATING,
+        sort_by: Literal["rating", "runs", "name", "updated_at"] = "rating",
        limit: int = 10,
    ) -> SearchAgentsResponse:
        """
--- a/autogpt_platform/backend/backend/blocks/telegram/_api.py
+++ b/autogpt_platform/backend/backend/blocks/telegram/_api.py
@@ -1,182 +0,0 @@
-"""
-Telegram Bot API helper functions.
-
-Provides utilities for making authenticated requests to the Telegram Bot API.
-"""
-
-import logging
-from io import BytesIO
-from typing import Any, Optional
-
-from pydantic import BaseModel
-
-from backend.data.model import APIKeyCredentials
-from backend.util.request import Requests
-
-logger = logging.getLogger(__name__)
-
-TELEGRAM_API_BASE = "https://api.telegram.org"
-
-
-class TelegramMessageResult(BaseModel, extra="allow"):
-    """Result from Telegram send/edit message API calls."""
-
-    message_id: int = 0
-    chat: dict[str, Any] = {}
-    date: int = 0
-    text: str = ""
-
-
-class TelegramFileResult(BaseModel, extra="allow"):
-    """Result from Telegram getFile API call."""
-
-    file_id: str = ""
-    file_unique_id: str = ""
-    file_size: int = 0
-    file_path: str = ""
-
-
-class TelegramAPIException(ValueError):
-    """Exception raised for Telegram API errors."""
-
-    def __init__(self, message: str, error_code: int = 0):
-        super().__init__(message)
-        self.error_code = error_code
-
-
-def get_bot_api_url(bot_token: str, method: str) -> str:
-    """Construct Telegram Bot API URL for a method."""
-    return f"{TELEGRAM_API_BASE}/bot{bot_token}/{method}"
-
-
-def get_file_url(bot_token: str, file_path: str) -> str:
-    """Construct Telegram file download URL."""
-    return f"{TELEGRAM_API_BASE}/file/bot{bot_token}/{file_path}"
-
-
-async def call_telegram_api(
-    credentials: APIKeyCredentials,
-    method: str,
-    data: Optional[dict[str, Any]] = None,
-) -> TelegramMessageResult:
-    """
-    Make a request to the Telegram Bot API.
-
-    Args:
-        credentials: Bot token credentials
-        method: API method name (e.g., "sendMessage", "getFile")
-        data: Request parameters
-
-    Returns:
-        API response result
-
-    Raises:
-        TelegramAPIException: If the API returns an error
-    """
-    token = credentials.api_key.get_secret_value()
-    url = get_bot_api_url(token, method)
-
-    response = await Requests().post(url, json=data or {})
-    result = response.json()
-
-    if not result.get("ok"):
-        error_code = result.get("error_code", 0)
-        description = result.get("description", "Unknown error")
-        raise TelegramAPIException(description, error_code)
-
-    return TelegramMessageResult(**result.get("result", {}))
-
-
-async def call_telegram_api_with_file(
-    credentials: APIKeyCredentials,
-    method: str,
-    file_field: str,
-    file_data: bytes,
-    filename: str,
-    content_type: str,
-    data: Optional[dict[str, Any]] = None,
-) -> TelegramMessageResult:
-    """
-    Make a multipart/form-data request to the Telegram Bot API with a file upload.
-
-    Args:
-        credentials: Bot token credentials
-        method: API method name (e.g., "sendPhoto", "sendVoice")
-        file_field: Form field name for the file (e.g., "photo", "voice")
-        file_data: Raw file bytes
-        filename: Filename for the upload
-        content_type: MIME type of the file
-        data: Additional form parameters
-
-    Returns:
-        API response result
-
-    Raises:
-        TelegramAPIException: If the API returns an error
-    """
-    token = credentials.api_key.get_secret_value()
-    url = get_bot_api_url(token, method)
-
-    files = [(file_field, (filename, BytesIO(file_data), content_type))]
-
-    response = await Requests().post(url, files=files, data=data or {})
-    result = response.json()
-
-    if not result.get("ok"):
-        error_code = result.get("error_code", 0)
-        description = result.get("description", "Unknown error")
-        raise TelegramAPIException(description, error_code)
-
-    return TelegramMessageResult(**result.get("result", {}))
-
-
-async def get_file_info(
-    credentials: APIKeyCredentials, file_id: str
-) -> TelegramFileResult:
-    """
-    Get file information from Telegram.
-
-    Args:
-        credentials: Bot token credentials
-        file_id: Telegram file_id from message
-
-    Returns:
-        File info dict containing file_id, file_unique_id, file_size, file_path
-    """
-    result = await call_telegram_api(credentials, "getFile", {"file_id": file_id})
-    return TelegramFileResult(**result.model_dump())
-
-
-async def get_file_download_url(credentials: APIKeyCredentials, file_id: str) -> str:
-    """
-    Get the download URL for a Telegram file.
-
-    Args:
-        credentials: Bot token credentials
-        file_id: Telegram file_id from message
-
-    Returns:
-        Full download URL
-    """
-    token = credentials.api_key.get_secret_value()
-    result = await get_file_info(credentials, file_id)
-    file_path = result.file_path
-    if not file_path:
-        raise TelegramAPIException("No file_path returned from getFile")
-    return get_file_url(token, file_path)
-
-
-async def download_telegram_file(credentials: APIKeyCredentials, file_id: str) -> bytes:
-    """
-    Download a file from Telegram servers.
-
-    Args:
-        credentials: Bot token credentials
-        file_id: Telegram file_id
-
-    Returns:
-        File content as bytes
-    """
-    url = await get_file_download_url(credentials, file_id)
-    response = await Requests().get(url)
-    return response.content
--- a/autogpt_platform/backend/backend/blocks/telegram/_auth.py
+++ b/autogpt_platform/backend/backend/blocks/telegram/_auth.py
@@ -1,43 +0,0 @@
-"""
-Telegram Bot credentials handling.
-
-Telegram bots use an API key (bot token) obtained from @BotFather.
-"""
-
-from typing import Literal
-
-from pydantic import SecretStr
-
-from backend.data.model import APIKeyCredentials, CredentialsField, CredentialsMetaInput
-from backend.integrations.providers import ProviderName
-
-# Bot token credentials (API key style)
-TelegramCredentials = APIKeyCredentials
-TelegramCredentialsInput = CredentialsMetaInput[
-    Literal[ProviderName.TELEGRAM], Literal["api_key"]
-]
-
-
-def TelegramCredentialsField() -> TelegramCredentialsInput:
-    """Creates a Telegram bot token credentials field."""
-    return CredentialsField(
-        description="Telegram Bot API token from @BotFather. "
-        "Create a bot at https://t.me/BotFather to get your token."
-    )
-
-
-# Test credentials for unit tests
-TEST_CREDENTIALS = APIKeyCredentials(
-    id="01234567-89ab-cdef-0123-456789abcdef",
-    provider="telegram",
-    api_key=SecretStr("test_telegram_bot_token"),
-    title="Mock Telegram Bot Token",
-    expires_at=None,
-)
-
-TEST_CREDENTIALS_INPUT = {
-    "provider": TEST_CREDENTIALS.provider,
-    "id": TEST_CREDENTIALS.id,
-    "type": TEST_CREDENTIALS.type,
-    "title": TEST_CREDENTIALS.title,
-}
--- a/autogpt_platform/backend/backend/blocks/telegram/blocks.py
+++ b/autogpt_platform/backend/backend/blocks/telegram/blocks.py
--- a/autogpt_platform/backend/backend/blocks/telegram/triggers.py
+++ b/autogpt_platform/backend/backend/blocks/telegram/triggers.py
@@ -1,377 +0,0 @@
-"""
-Telegram trigger blocks for receiving messages via webhooks.
-"""
-
-import logging
-
-from pydantic import BaseModel
-
-from backend.blocks._base import (
-    Block,
-    BlockCategory,
-    BlockOutput,
-    BlockSchemaInput,
-    BlockSchemaOutput,
-    BlockWebhookConfig,
-)
-from backend.data.model import SchemaField
-from backend.integrations.providers import ProviderName
-from backend.integrations.webhooks.telegram import TelegramWebhookType
-
-from ._auth import (
-    TEST_CREDENTIALS,
-    TEST_CREDENTIALS_INPUT,
-    TelegramCredentialsField,
-    TelegramCredentialsInput,
-)
-
-logger = logging.getLogger(__name__)
-
-
-# Example payload for testing
-EXAMPLE_MESSAGE_PAYLOAD = {
-    "update_id": 123456789,
-    "message": {
-        "message_id": 1,
-        "from": {
-            "id": 12345678,
-            "is_bot": False,
-            "first_name": "John",
-            "last_name": "Doe",
-            "username": "johndoe",
-            "language_code": "en",
-        },
-        "chat": {
-            "id": 12345678,
-            "first_name": "John",
-            "last_name": "Doe",
-            "username": "johndoe",
-            "type": "private",
-        },
-        "date": 1234567890,
-        "text": "Hello, bot!",
-    },
-}
-
-
-class TelegramTriggerBase:
-    """Base class for Telegram trigger blocks."""
-
-    class Input(BlockSchemaInput):
-        credentials: TelegramCredentialsInput = TelegramCredentialsField()
-        payload: dict = SchemaField(hidden=True, default_factory=dict)
-
-
-class TelegramMessageTriggerBlock(TelegramTriggerBase, Block):
-    """
-    Triggers when a message is received or edited in your Telegram bot.
-
-    Supports text, photos, voice messages, audio files, documents, and videos.
-    Connect the outputs to other blocks to process messages and send responses.
-    """
-
-    class Input(TelegramTriggerBase.Input):
-        class EventsFilter(BaseModel):
-            """Filter for message types to receive."""
-
-            text: bool = True
-            photo: bool = False
-            voice: bool = False
-            audio: bool = False
-            document: bool = False
-            video: bool = False
-            edited_message: bool = False
-
-        events: EventsFilter = SchemaField(
-            title="Message Types", description="Types of messages to receive"
-        )
-
-    class Output(BlockSchemaOutput):
-        payload: dict = SchemaField(
-            description="The complete webhook payload from Telegram"
-        )
-        chat_id: int = SchemaField(
-            description="The chat ID where the message was received. "
-            "Use this to send replies."
-        )
-        message_id: int = SchemaField(description="The unique message ID")
-        user_id: int = SchemaField(description="The user ID who sent the message")
-        username: str = SchemaField(description="Username of the sender (may be empty)")
-        first_name: str = SchemaField(description="First name of the sender")
-        event: str = SchemaField(
-            description="The message type (text, photo, voice, audio, etc.)"
-        )
-        text: str = SchemaField(
-            description="Text content of the message (for text messages)"
-        )
-        photo_file_id: str = SchemaField(
-            description="File ID of the photo (for photo messages). "
-            "Use GetTelegramFileBlock to download."
-        )
-        voice_file_id: str = SchemaField(
-            description="File ID of the voice message (for voice messages). "
-            "Use GetTelegramFileBlock to download."
-        )
-        audio_file_id: str = SchemaField(
-            description="File ID of the audio file (for audio messages). "
-            "Use GetTelegramFileBlock to download."
-        )
-        file_id: str = SchemaField(
-            description="File ID for document/video messages. "
-            "Use GetTelegramFileBlock to download."
-        )
-        file_name: str = SchemaField(
-            description="Original filename (for document/audio messages)"
-        )
-        caption: str = SchemaField(description="Caption for media messages")
-        is_edited: bool = SchemaField(
-            description="Whether this is an edit of a previously sent message"
-        )
-
-    def __init__(self):
-        super().__init__(
-            id="4435e4e0-df6e-4301-8f35-ad70b12fc9ec",
-            description="Triggers when a message is received or edited in your Telegram bot. "
-            "Supports text, photos, voice messages, audio files, documents, and videos.",
-            categories={BlockCategory.SOCIAL},
-            input_schema=TelegramMessageTriggerBlock.Input,
-            output_schema=TelegramMessageTriggerBlock.Output,
-            webhook_config=BlockWebhookConfig(
-                provider=ProviderName.TELEGRAM,
-                webhook_type=TelegramWebhookType.BOT,
-                resource_format="bot",
-                event_filter_input="events",
-                event_format="message.{event}",
-            ),
-            test_input={
-                "events": {"text": True, "photo": True},
-                "credentials": TEST_CREDENTIALS_INPUT,
-                "payload": EXAMPLE_MESSAGE_PAYLOAD,
-            },
-            test_credentials=TEST_CREDENTIALS,
-            test_output=[
-                ("payload", EXAMPLE_MESSAGE_PAYLOAD),
-                ("chat_id", 12345678),
-                ("message_id", 1),
-                ("user_id", 12345678),
-                ("username", "johndoe"),
-                ("first_name", "John"),
-                ("is_edited", False),
-                ("event", "text"),
-                ("text", "Hello, bot!"),
-                ("photo_file_id", ""),
-                ("voice_file_id", ""),
-                ("audio_file_id", ""),
-                ("file_id", ""),
-                ("file_name", ""),
-                ("caption", ""),
-            ],
-        )
-
-    async def run(self, input_data: Input, **kwargs) -> BlockOutput:
-        payload = input_data.payload
-        is_edited = "edited_message" in payload
-        message = payload.get("message") or payload.get("edited_message", {})
-
-        # Extract common fields
-        chat = message.get("chat", {})
-        sender = message.get("from", {})
-
-        yield "payload", payload
-        yield "chat_id", chat.get("id", 0)
-        yield "message_id", message.get("message_id", 0)
-        yield "user_id", sender.get("id", 0)
-        yield "username", sender.get("username", "")
-        yield "first_name", sender.get("first_name", "")
-        yield "is_edited", is_edited
-
-        # For edited messages, yield event as "edited_message" and extract
-        # all content fields from the edited message body
-        if is_edited:
-            yield "event", "edited_message"
-            yield "text", message.get("text", "")
-            photos = message.get("photo", [])
-            yield "photo_file_id", photos[-1].get("file_id", "") if photos else ""
-            voice = message.get("voice", {})
-            yield "voice_file_id", voice.get("file_id", "")
-            audio = message.get("audio", {})
-            yield "audio_file_id", audio.get("file_id", "")
-            document = message.get("document", {})
-            video = message.get("video", {})
-            yield "file_id", (document.get("file_id", "") or video.get("file_id", ""))
-            yield "file_name", (
-                document.get("file_name", "") or audio.get("file_name", "")
-            )
-            yield "caption", message.get("caption", "")
-        # Determine message type and extract content
-        elif "text" in message:
-            yield "event", "text"
-            yield "text", message.get("text", "")
-            yield "photo_file_id", ""
-            yield "voice_file_id", ""
-            yield "audio_file_id", ""
-            yield "file_id", ""
-            yield "file_name", ""
-            yield "caption", ""
-        elif "photo" in message:
-            # Get the largest photo (last in array)
-            photos = message.get("photo", [])
-            photo_fid = photos[-1].get("file_id", "") if photos else ""
-            yield "event", "photo"
-            yield "text", ""
-            yield "photo_file_id", photo_fid
-            yield "voice_file_id", ""
-            yield "audio_file_id", ""
-            yield "file_id", ""
-            yield "file_name", ""
-            yield "caption", message.get("caption", "")
-        elif "voice" in message:
-            voice = message.get("voice", {})
-            yield "event", "voice"
-            yield "text", ""
-            yield "photo_file_id", ""
-            yield "voice_file_id", voice.get("file_id", "")
-            yield "audio_file_id", ""
-            yield "file_id", ""
-            yield "file_name", ""
-            yield "caption", message.get("caption", "")
-        elif "audio" in message:
-            audio = message.get("audio", {})
-            yield "event", "audio"
-            yield "text", ""
-            yield "photo_file_id", ""
-            yield "voice_file_id", ""
-            yield "audio_file_id", audio.get("file_id", "")
-            yield "file_id", ""
-            yield "file_name", audio.get("file_name", "")
-            yield "caption", message.get("caption", "")
-        elif "document" in message:
-            document = message.get("document", {})
-            yield "event", "document"
-            yield "text", ""
-            yield "photo_file_id", ""
-            yield "voice_file_id", ""
-            yield "audio_file_id", ""
-            yield "file_id", document.get("file_id", "")
-            yield "file_name", document.get("file_name", "")
-            yield "caption", message.get("caption", "")
-        elif "video" in message:
-            video = message.get("video", {})
-            yield "event", "video"
-            yield "text", ""
-            yield "photo_file_id", ""
-            yield "voice_file_id", ""
-            yield "audio_file_id", ""
-            yield "file_id", video.get("file_id", "")
-            yield "file_name", video.get("file_name", "")
-            yield "caption", message.get("caption", "")
-        else:
-            yield "event", "other"
-            yield "text", ""
-            yield "photo_file_id", ""
-            yield "voice_file_id", ""
-            yield "audio_file_id", ""
-            yield "file_id", ""
-            yield "file_name", ""
-            yield "caption", ""
-
-
-# Example payload for reaction trigger testing
-EXAMPLE_REACTION_PAYLOAD = {
-    "update_id": 123456790,
-    "message_reaction": {
-        "chat": {
-            "id": 12345678,
-            "first_name": "John",
-            "last_name": "Doe",
-            "username": "johndoe",
-            "type": "private",
-        },
-        "message_id": 42,
-        "user": {
-            "id": 12345678,
-            "is_bot": False,
-            "first_name": "John",
-            "username": "johndoe",
-        },
-        "date": 1234567890,
-        "new_reaction": [{"type": "emoji", "emoji": "👍"}],
-        "old_reaction": [],
-    },
-}
-
-
-class TelegramMessageReactionTriggerBlock(TelegramTriggerBase, Block):
-    """
-    Triggers when a reaction to a message is changed.
-
-    Works automatically in private chats. In group chats, the bot must be
-    an administrator to receive reaction updates.
-    """
-
-    class Input(TelegramTriggerBase.Input):
-        pass
-
-    class Output(BlockSchemaOutput):
-        payload: dict = SchemaField(
-            description="The complete webhook payload from Telegram"
-        )
-        chat_id: int = SchemaField(
-            description="The chat ID where the reaction occurred"
-        )
-        message_id: int = SchemaField(description="The message ID that was reacted to")
-        user_id: int = SchemaField(description="The user ID who changed the reaction")
-        username: str = SchemaField(description="Username of the user (may be empty)")
-        new_reactions: list = SchemaField(
-            description="List of new reactions on the message"
-        )
-        old_reactions: list = SchemaField(
-            description="List of previous reactions on the message"
-        )
-
-    def __init__(self):
-        super().__init__(
-            id="82525328-9368-4966-8f0c-cd78e80181fd",
-            description="Triggers when a reaction to a message is changed. "
-            "Works in private chats automatically. "
-            "In groups, the bot must be an administrator.",
-            categories={BlockCategory.SOCIAL},
-            input_schema=TelegramMessageReactionTriggerBlock.Input,
-            output_schema=TelegramMessageReactionTriggerBlock.Output,
-            webhook_config=BlockWebhookConfig(
-                provider=ProviderName.TELEGRAM,
-                webhook_type=TelegramWebhookType.BOT,
-                resource_format="bot",
-                event_filter_input="",
-                event_format="message_reaction",
-            ),
-            test_input={
-                "credentials": TEST_CREDENTIALS_INPUT,
-                "payload": EXAMPLE_REACTION_PAYLOAD,
-            },
-            test_credentials=TEST_CREDENTIALS,
-            test_output=[
-                ("payload", EXAMPLE_REACTION_PAYLOAD),
-                ("chat_id", 12345678),
-                ("message_id", 42),
-                ("user_id", 12345678),
-                ("username", "johndoe"),
-                ("new_reactions", [{"type": "emoji", "emoji": "👍"}]),
-                ("old_reactions", []),
-            ],
-        )
-
-    async def run(self, input_data: Input, **kwargs) -> BlockOutput:
-        payload = input_data.payload
-        reaction = payload.get("message_reaction", {})
-
-        chat = reaction.get("chat", {})
-        user = reaction.get("user", {})
-
-        yield "payload", payload
-        yield "chat_id", chat.get("id", 0)
-        yield "message_id", reaction.get("message_id", 0)
-        yield "user_id", user.get("id", 0)
-        yield "username", user.get("username", "")
-        yield "new_reactions", reaction.get("new_reaction", [])
-        yield "old_reactions", reaction.get("old_reaction", [])
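Note on the removed `TelegramMessageTriggerBlock.run`: the long if/elif chain picks the event name by checking which content key is present in the message, with edits taking priority. A compact sketch of the same ordering, assuming plain dict payloads shaped like `EXAMPLE_MESSAGE_PAYLOAD`. `classify_update` and `largest_photo_file_id` are hypothetical helpers for illustration and were not part of the removed file.

```python
from typing import Any

# Checked in the same order as the removed if/elif chain.
MESSAGE_KINDS = ("text", "photo", "voice", "audio", "document", "video")


def classify_update(payload: dict[str, Any]) -> str:
    """Return the event name the trigger would yield for this update."""
    # Edited messages win over any content-based classification.
    if "edited_message" in payload:
        return "edited_message"
    message = payload.get("message") or {}
    for kind in MESSAGE_KINDS:
        if kind in message:
            return kind
    return "other"


def largest_photo_file_id(message: dict[str, Any]) -> str:
    """Telegram lists photo sizes smallest first; take the last entry."""
    photos = message.get("photo") or []
    return photos[-1].get("file_id", "") if photos else ""
```

A table of keys keeps the precedence visible in one place, at the cost of the per-branch field extraction that the removed block spelled out explicitly.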
--- a/autogpt_platform/backend/backend/blocks/test/test_store_operations.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_store_operations.py
@@ -2,7 +2,6 @@ from unittest.mock import MagicMock

 import pytest

-from backend.api.features.store.db import StoreAgentsSortOptions
 from backend.blocks.system.library_operations import (
    AddToLibraryFromStoreBlock,
    LibraryAgent,
@@ -122,10 +121,7 @@ async def test_search_store_agents_block(mocker):
    )

    input_data = block.Input(
-        query="test",
-        category="productivity",
-        sort_by=StoreAgentsSortOptions.RATING,  # type: ignore[reportArgumentType]
-        limit=10,
+        query="test", category="productivity", sort_by="rating", limit=10
    )

    outputs = {}
--- a/autogpt_platform/backend/backend/cli/generate_openapi_json.py
+++ b/autogpt_platform/backend/backend/cli/generate_openapi_json.py
@@ -34,12 +34,10 @@ def main(output: Path, pretty: bool):
    """Generate and output the OpenAPI JSON specification."""
    openapi_schema = get_openapi_schema()

-    json_output = json.dumps(
-        openapi_schema, indent=2 if pretty else None, ensure_ascii=False
-    )
+    json_output = json.dumps(openapi_schema, indent=2 if pretty else None)

    if output:
-        output.write_text(json_output, encoding="utf-8")
+        output.write_text(json_output)
        click.echo(f"✅ OpenAPI specification written to {output}\n\nPreview:")
        click.echo(f"\n{json_output[:500]} ...")
    else:
--- a/autogpt_platform/backend/backend/copilot/baseline/__init__.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/__init__.py
@@ -1,3 +0,0 @@
-from .service import stream_chat_completion_baseline
-
-__all__ = ["stream_chat_completion_baseline"]
--- a/autogpt_platform/backend/backend/copilot/baseline/service.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service.py
@@ -1,424 +0,0 @@
-"""Baseline LLM fallback — OpenAI-compatible streaming with tool calling.
-
-Used when ``CHAT_USE_CLAUDE_AGENT_SDK=false``, e.g. as a fallback when the
-Claude Agent SDK / Anthropic API is unavailable.  Routes through any
-OpenAI-compatible provider (OpenRouter by default) and reuses the same
-shared tool registry as the SDK path.
-"""
-
-import asyncio
-import logging
-import uuid
-from collections.abc import AsyncGenerator
-from typing import Any
-
-import orjson
-from langfuse import propagate_attributes
-
-from backend.copilot.model import (
-    ChatMessage,
-    ChatSession,
-    get_chat_session,
-    update_session_title,
-    upsert_chat_session,
-)
-from backend.copilot.prompting import get_baseline_supplement
-from backend.copilot.response_model import (
-    StreamBaseResponse,
-    StreamError,
-    StreamFinish,
-    StreamFinishStep,
-    StreamStart,
-    StreamStartStep,
-    StreamTextDelta,
-    StreamTextEnd,
-    StreamTextStart,
-    StreamToolInputAvailable,
-    StreamToolInputStart,
-    StreamToolOutputAvailable,
-)
-from backend.copilot.service import (
-    _build_system_prompt,
-    _generate_session_title,
-    client,
-    config,
-)
-from backend.copilot.tools import execute_tool, get_available_tools
-from backend.copilot.tracking import track_user_message
-from backend.util.exceptions import NotFoundError
-from backend.util.prompt import compress_context
-
-logger = logging.getLogger(__name__)
-
-# Set to hold background tasks to prevent garbage collection
-_background_tasks: set[asyncio.Task[Any]] = set()
-
-# Maximum number of tool-call rounds before forcing a text response.
-_MAX_TOOL_ROUNDS = 30
-
-
-async def _update_title_async(
-    session_id: str, message: str, user_id: str | None
-) -> None:
-    """Generate and persist a session title in the background."""
-    try:
-        title = await _generate_session_title(message, user_id, session_id)
-        if title and user_id:
-            await update_session_title(session_id, user_id, title, only_if_empty=True)
-    except Exception as e:
-        logger.warning("[Baseline] Failed to update session title: %s", e)
-
-
-async def _compress_session_messages(
-    messages: list[ChatMessage],
-) -> list[ChatMessage]:
-    """Compress session messages if they exceed the model's token limit.
-
-    Uses the shared compress_context() utility which supports LLM-based
-    summarization of older messages while keeping recent ones intact,
-    with progressive truncation and middle-out deletion as fallbacks.
-    """
-    messages_dict = []
-    for msg in messages:
-        msg_dict: dict[str, Any] = {"role": msg.role}
-        if msg.content:
-            msg_dict["content"] = msg.content
-        messages_dict.append(msg_dict)
-
-    try:
-        result = await compress_context(
-            messages=messages_dict,
-            model=config.model,
-            client=client,
-        )
-    except Exception as e:
-        logger.warning("[Baseline] Context compression with LLM failed: %s", e)
-        result = await compress_context(
-            messages=messages_dict,
-            model=config.model,
-            client=None,
-        )
-
-    if result.was_compacted:
-        logger.info(
-            "[Baseline] Context compacted: %d -> %d tokens "
-            "(%d summarized, %d dropped)",
-            result.original_token_count,
-            result.token_count,
-            result.messages_summarized,
-            result.messages_dropped,
-        )
-        return [
-            ChatMessage(role=m["role"], content=m.get("content"))
-            for m in result.messages
-        ]
-
-    return messages
-
-
-async def stream_chat_completion_baseline(
-    session_id: str,
-    message: str | None = None,
-    is_user_message: bool = True,
-    user_id: str | None = None,
-    session: ChatSession | None = None,
-    **_kwargs: Any,
-) -> AsyncGenerator[StreamBaseResponse, None]:
-    """Baseline LLM with tool calling via OpenAI-compatible API.
-
-    Designed as a fallback when the Claude Agent SDK is unavailable.
-    Uses the same tool registry as the SDK path but routes through any
-    OpenAI-compatible provider (e.g. OpenRouter).
-
-    Flow: stream response -> if tool_calls, execute them -> feed results back -> repeat.
-    """
-    if session is None:
-        session = await get_chat_session(session_id, user_id)
-
-    if not session:
-        raise NotFoundError(
-            f"Session {session_id} not found. Please create a new session first."
-        )
-
-    # Append user message
-    new_role = "user" if is_user_message else "assistant"
-    if message and (
-        len(session.messages) == 0
-        or not (
-            session.messages[-1].role == new_role
-            and session.messages[-1].content == message
-        )
-    ):
-        session.messages.append(ChatMessage(role=new_role, content=message))
-        if is_user_message:
-            track_user_message(
-                user_id=user_id,
-                session_id=session_id,
-                message_length=len(message),
-            )
-
-    session = await upsert_chat_session(session)
-
-    # Generate title for new sessions
-    if is_user_message and not session.title:
-        user_messages = [m for m in session.messages if m.role == "user"]
-        if len(user_messages) == 1:
-            first_message = user_messages[0].content or message or ""
-            if first_message:
-                task = asyncio.create_task(
-                    _update_title_async(session_id, first_message, user_id)
-                )
-                _background_tasks.add(task)
-                task.add_done_callback(_background_tasks.discard)
-
-    message_id = str(uuid.uuid4())
-
-    # Build system prompt only on the first turn to avoid mid-conversation
-    # changes from concurrent chats updating business understanding.
-    is_first_turn = len(session.messages) <= 1
-    if is_first_turn:
-        base_system_prompt, _ = await _build_system_prompt(
-            user_id, has_conversation_history=False
-        )
-    else:
-        base_system_prompt, _ = await _build_system_prompt(
-            user_id=None, has_conversation_history=True
-        )
-
-    # Append tool documentation and technical notes
-    system_prompt = base_system_prompt + get_baseline_supplement()
-
-    # Compress context if approaching the model's token limit
-    messages_for_context = await _compress_session_messages(session.messages)
-
-    # Build OpenAI message list from session history
-    openai_messages: list[dict[str, Any]] = [
-        {"role": "system", "content": system_prompt}
-    ]
-    for msg in messages_for_context:
-        if msg.role in ("user", "assistant") and msg.content:
-            openai_messages.append({"role": msg.role, "content": msg.content})
-
-    tools = get_available_tools()
-
-    yield StreamStart(messageId=message_id, sessionId=session_id)
-
-    # Propagate user/session context to Langfuse so all LLM calls within
-    # this request are grouped under a single trace with proper attribution.
-    _trace_ctx: Any = None
-    try:
-        _trace_ctx = propagate_attributes(
-            user_id=user_id,
-            session_id=session_id,
-            trace_name="copilot-baseline",
-            tags=["baseline"],
-        )
-        _trace_ctx.__enter__()
-    except Exception:
-        logger.warning("[Baseline] Langfuse trace context setup failed")
-
-    assistant_text = ""
-    text_block_id = str(uuid.uuid4())
-    text_started = False
-    step_open = False
-    try:
-        for _round in range(_MAX_TOOL_ROUNDS):
-            # Open a new step for each LLM round
-            yield StreamStartStep()
-            step_open = True
-
-            # Stream a response from the model
-            create_kwargs: dict[str, Any] = dict(
-                model=config.model,
-                messages=openai_messages,
-                stream=True,
-            )
-            if tools:
-                create_kwargs["tools"] = tools
-            response = await client.chat.completions.create(**create_kwargs)  # type: ignore[arg-type]  # dynamic kwargs
-
-            # Accumulate streamed response (text + tool calls)
-            round_text = ""
-            tool_calls_by_index: dict[int, dict[str, str]] = {}
-
-            async for chunk in response:
-                delta = chunk.choices[0].delta if chunk.choices else None
-                if not delta:
-                    continue
-
-                # Text content
-                if delta.content:
-                    if not text_started:
-                        yield StreamTextStart(id=text_block_id)
-                        text_started = True
-                    round_text += delta.content
-                    yield StreamTextDelta(id=text_block_id, delta=delta.content)
-
-                # Tool call fragments (streamed incrementally)
-                if delta.tool_calls:
-                    for tc in delta.tool_calls:
-                        idx = tc.index
-                        if idx not in tool_calls_by_index:
-                            tool_calls_by_index[idx] = {
-                                "id": "",
-                                "name": "",
-                                "arguments": "",
-                            }
-                        entry = tool_calls_by_index[idx]
-                        if tc.id:
-                            entry["id"] = tc.id
-                        if tc.function and tc.function.name:
-                            entry["name"] = tc.function.name
-                        if tc.function and tc.function.arguments:
-                            entry["arguments"] += tc.function.arguments
-
-            # Close text block if we had one this round
-            if text_started:
-                yield StreamTextEnd(id=text_block_id)
-                text_started = False
-                text_block_id = str(uuid.uuid4())
-
-            # Accumulate text for session persistence
-            assistant_text += round_text
-
-            # No tool calls -> model is done
-            if not tool_calls_by_index:
-                yield StreamFinishStep()
-                step_open = False
-                break
-
-            # Close step before tool execution
-            yield StreamFinishStep()
-            step_open = False
-
-            # Append the assistant message with tool_calls to context.
-            assistant_msg: dict[str, Any] = {"role": "assistant"}
-            if round_text:
-                assistant_msg["content"] = round_text
-            assistant_msg["tool_calls"] = [
-                {
-                    "id": tc["id"],
-                    "type": "function",
-                    "function": {
-                        "name": tc["name"],
-                        "arguments": tc["arguments"] or "{}",
-                    },
-                }
-                for tc in tool_calls_by_index.values()
-            ]
-            openai_messages.append(assistant_msg)
-
-            # Execute each tool call and stream events
-            for tc in tool_calls_by_index.values():
-                tool_call_id = tc["id"]
-                tool_name = tc["name"]
-                raw_args = tc["arguments"] or "{}"
-                try:
-                    tool_args = orjson.loads(raw_args)
-                except orjson.JSONDecodeError as parse_err:
-                    parse_error = (
-                        f"Invalid JSON arguments for tool '{tool_name}': {parse_err}"
-                    )
-                    logger.warning("[Baseline] %s", parse_error)
-                    yield StreamToolOutputAvailable(
-                        toolCallId=tool_call_id,
-                        toolName=tool_name,
-                        output=parse_error,
-                        success=False,
-                    )
-                    openai_messages.append(
-                        {
-                            "role": "tool",
-                            "tool_call_id": tool_call_id,
-                            "content": parse_error,
-                        }
-                    )
-                    continue
-
-                yield StreamToolInputStart(toolCallId=tool_call_id, toolName=tool_name)
-                yield StreamToolInputAvailable(
-                    toolCallId=tool_call_id,
-                    toolName=tool_name,
-                    input=tool_args,
-                )
-
-                # Execute via shared tool registry
-                try:
-                    result: StreamToolOutputAvailable = await execute_tool(
-                        tool_name=tool_name,
-                        parameters=tool_args,
-                        user_id=user_id,
-                        session=session,
-                        tool_call_id=tool_call_id,
-                    )
-                    yield result
-                    tool_output = (
-                        result.output
-                        if isinstance(result.output, str)
-                        else str(result.output)
-                    )
-                except Exception as e:
-                    error_output = f"Tool execution error: {e}"
-                    logger.error(
-                        "[Baseline] Tool %s failed: %s",
-                        tool_name,
-                        error_output,
-                        exc_info=True,
-                    )
-                    yield StreamToolOutputAvailable(
-                        toolCallId=tool_call_id,
-                        toolName=tool_name,
-                        output=error_output,
-                        success=False,
-                    )
-                    tool_output = error_output
-
-                # Append tool result to context for next round
-                openai_messages.append(
-                    {
-                        "role": "tool",
-                        "tool_call_id": tool_call_id,
-                        "content": tool_output,
-                    }
-                )
-        else:
-            # for-loop exhausted without break -> tool-round limit hit
-            limit_msg = (
-                f"Exceeded {_MAX_TOOL_ROUNDS} tool-call rounds "
-                "without a final response."
-            )
-            logger.error("[Baseline] %s", limit_msg)
-            yield StreamError(
-                errorText=limit_msg,
-                code="baseline_tool_round_limit",
-            )
-
-    except Exception as e:
-        error_msg = str(e) or type(e).__name__
-        logger.error("[Baseline] Streaming error: %s", error_msg, exc_info=True)
-        # Close any open text/step before emitting error
-        if text_started:
-            yield StreamTextEnd(id=text_block_id)
-        if step_open:
-            yield StreamFinishStep()
-        yield StreamError(errorText=error_msg, code="baseline_error")
-        # Still persist whatever we got
-    finally:
-        # Close Langfuse trace context
-        if _trace_ctx is not None:
-            try:
-                _trace_ctx.__exit__(None, None, None)
-            except Exception:
-                logger.warning("[Baseline] Langfuse trace context teardown failed")
-
-        # Persist assistant response
-        if assistant_text:
-            session.messages.append(
-                ChatMessage(role="assistant", content=assistant_text)
-            )
-        try:
-            await upsert_chat_session(session)
-        except Exception as persist_err:
-            logger.error("[Baseline] Failed to persist session: %s", persist_err)
-
-    yield StreamFinish()
--- a/autogpt_platform/backend/backend/copilot/baseline/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service_test.py
@@ -1,99 +0,0 @@
-import logging
-from os import getenv
-
-import pytest
-
-from backend.copilot.baseline import stream_chat_completion_baseline
-from backend.copilot.model import (
-    create_chat_session,
-    get_chat_session,
-    upsert_chat_session,
-)
-from backend.copilot.response_model import (
-    StreamError,
-    StreamFinish,
-    StreamStart,
-    StreamTextDelta,
-)
-
-logger = logging.getLogger(__name__)
-
-
-@pytest.mark.asyncio(loop_scope="session")
-async def test_baseline_multi_turn(setup_test_user, test_user_id):
-    """Test that the baseline LLM path streams responses and maintains history.
-
-    Turn 1: Send a message with a unique keyword.
-    Turn 2: Ask the model to recall the keyword — proving conversation history
-    is correctly passed to the single-call LLM.
-    """
-    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
-    if not api_key:
-        return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
-
-    session = await create_chat_session(test_user_id)
-    session = await upsert_chat_session(session)
-
-    # --- Turn 1: send a message with a unique keyword ---
-    keyword = "QUASAR99"
-    turn1_msg = (
-        f"Please remember this special keyword: {keyword}. "
-        "Just confirm you've noted it, keep your response brief."
-    )
-    turn1_text = ""
-    turn1_errors: list[str] = []
-    got_start = False
-    got_finish = False
-
-    async for chunk in stream_chat_completion_baseline(
-        session.session_id,
-        turn1_msg,
-        user_id=test_user_id,
-    ):
-        if isinstance(chunk, StreamStart):
-            got_start = True
-        elif isinstance(chunk, StreamTextDelta):
-            turn1_text += chunk.delta
-        elif isinstance(chunk, StreamError):
-            turn1_errors.append(chunk.errorText)
-        elif isinstance(chunk, StreamFinish):
-            got_finish = True
-
-    assert got_start, "Turn 1 did not yield StreamStart"
-    assert got_finish, "Turn 1 did not yield StreamFinish"
-    assert not turn1_errors, f"Turn 1 errors: {turn1_errors}"
-    assert turn1_text, "Turn 1 produced no text"
-    logger.info(f"Turn 1 response: {turn1_text[:100]}")
-
-    # Reload session for turn 2
-    session = await get_chat_session(session.session_id, test_user_id)
-    assert session, "Session not found after turn 1"
-
-    # Verify messages were persisted (user + assistant)
-    assert (
-        len(session.messages) >= 2
-    ), f"Expected at least 2 messages after turn 1, got {len(session.messages)}"
-
-    # --- Turn 2: ask model to recall the keyword ---
-    turn2_msg = "What was the special keyword I asked you to remember?"
-    turn2_text = ""
-    turn2_errors: list[str] = []
-
-    async for chunk in stream_chat_completion_baseline(
-        session.session_id,
-        turn2_msg,
-        user_id=test_user_id,
-        session=session,
-    ):
-        if isinstance(chunk, StreamTextDelta):
-            turn2_text += chunk.delta
-        elif isinstance(chunk, StreamError):
-            turn2_errors.append(chunk.errorText)
-
-    assert not turn2_errors, f"Turn 2 errors: {turn2_errors}"
-    assert turn2_text, "Turn 2 produced no text"
-    assert keyword in turn2_text, (
-        f"Model did not recall keyword '{keyword}' in turn 2. "
-        f"Response: {turn2_text[:200]}"
-    )
-    logger.info(f"Turn 2 recalled keyword successfully: {turn2_text[:100]}")
--- a/autogpt_platform/backend/backend/copilot/completion_consumer.py
+++ b/autogpt_platform/backend/backend/copilot/completion_consumer.py
@@ -0,0 +1,349 @@
+"""Redis Streams consumer for operation completion messages.
+
+This module provides a consumer (ChatCompletionConsumer) that listens for
+completion notifications (OperationCompleteMessage) from external services
+(like Agent Generator) and triggers the appropriate stream registry and
+chat service updates via process_operation_success/process_operation_failure.
+
+Why Redis Streams instead of RabbitMQ?
+--------------------------------------
+While the project typically uses RabbitMQ for async task queues (e.g., execution
+queue), Redis Streams was chosen for chat completion notifications because:
+
+1. **Unified Infrastructure**: The SSE reconnection feature already uses Redis
+   Streams (via stream_registry) for message persistence and replay. Using Redis
+   Streams for completion notifications keeps all chat streaming infrastructure
+   in one system, simplifying operations and reducing cross-system coordination.
+
+2. **Message Replay**: Redis Streams support XREAD with arbitrary message IDs,
+   allowing consumers to replay missed messages after reconnection. This aligns
+   with the SSE reconnection pattern where clients can resume from last_message_id.
+
+3. **Consumer Groups with XAUTOCLAIM**: Redis consumer groups provide automatic
+   load balancing across pods with explicit message claiming (XAUTOCLAIM) for
+   recovering from dead consumers - ideal for the completion callback pattern.
+
+4. **Lower Latency**: For real-time SSE updates, Redis (already in-memory for
+   stream_registry) provides lower latency than an additional RabbitMQ hop.
+
+5. **Atomicity with Task State**: Completion processing often needs to update
+   task metadata stored in Redis. Keeping both in Redis enables simpler
+   transactional semantics without distributed coordination.
+
+The consumer uses Redis Streams with consumer groups for reliable message
+processing across multiple platform pods, with XAUTOCLAIM for reclaiming
+stale pending messages from dead consumers.
+"""
+
+import asyncio
+import logging
+import uuid
+from typing import Any
+
+import orjson
+from pydantic import BaseModel
+from redis.exceptions import ResponseError
+
+from backend.data.redis_client import get_redis_async
+
+from . import stream_registry
+from .completion_handler import process_operation_failure, process_operation_success
+from .config import ChatConfig
+
+logger = logging.getLogger(__name__)
+config = ChatConfig()
+
+
+class OperationCompleteMessage(BaseModel):
+    """Message format for operation completion notifications."""
+
+    operation_id: str
+    task_id: str
+    success: bool
+    result: dict | str | None = None
+    error: str | None = None
+
+
+class ChatCompletionConsumer:
+    """Consumer for chat operation completion messages from Redis Streams.
+
+    Database operations are handled through the chat_db() accessor, which
+    routes through DatabaseManager RPC when Prisma is not directly connected.
+
+    Uses Redis consumer groups so multiple platform pods can consume messages
+    reliably; unacknowledged entries are reclaimed explicitly via XAUTOCLAIM.
+    """
+
+    def __init__(self):
+        self._consumer_task: asyncio.Task | None = None
+        self._running = False
+        self._consumer_name = f"consumer-{uuid.uuid4().hex[:8]}"
+
+    async def start(self) -> None:
+        """Start the completion consumer."""
+        if self._running:
+            logger.warning("Completion consumer already running")
+            return
+
+        # Create consumer group if it doesn't exist
+        try:
+            redis = await get_redis_async()
+            await redis.xgroup_create(
+                config.stream_completion_name,
+                config.stream_consumer_group,
+                id="0",
+                mkstream=True,
+            )
+            logger.info(
+                f"Created consumer group '{config.stream_consumer_group}' "
+                f"on stream '{config.stream_completion_name}'"
+            )
+        except ResponseError as e:
+            if "BUSYGROUP" in str(e):
+                logger.debug(
+                    f"Consumer group '{config.stream_consumer_group}' already exists"
+                )
+            else:
+                raise
+
+        self._running = True
+        self._consumer_task = asyncio.create_task(self._consume_messages())
+        logger.info(
+            f"Chat completion consumer started (consumer: {self._consumer_name})"
+        )
+
+    async def stop(self) -> None:
+        """Stop the completion consumer."""
+        self._running = False
+
+        if self._consumer_task:
+            self._consumer_task.cancel()
+            try:
+                await self._consumer_task
+            except asyncio.CancelledError:
+                pass
+            self._consumer_task = None
+
+        logger.info("Chat completion consumer stopped")
+
+    async def _consume_messages(self) -> None:
+        """Main message consumption loop with retry logic."""
+        max_retries = 10
+        retry_delay = 5  # seconds
+        retry_count = 0
+        block_timeout = 5000  # milliseconds
+
+        while self._running and retry_count < max_retries:
+            try:
+                redis = await get_redis_async()
+
+                # Reset retry count on successful connection
+                retry_count = 0
+
+                while self._running:
+                    # First, claim any stale pending messages from dead consumers
+                    # Redis does NOT auto-redeliver pending messages; we must explicitly
+                    # claim them using XAUTOCLAIM
+                    try:
+                        claimed_result = await redis.xautoclaim(
+                            name=config.stream_completion_name,
+                            groupname=config.stream_consumer_group,
+                            consumername=self._consumer_name,
+                            min_idle_time=config.stream_claim_min_idle_ms,
+                            start_id="0-0",
+                            count=10,
+                        )
+                        # xautoclaim returns: (next_start_id, [(id, data), ...], [deleted_ids])
+                        if claimed_result and len(claimed_result) >= 2:
+                            claimed_entries = claimed_result[1]
+                            if claimed_entries:
+                                logger.info(
+                                    f"Claimed {len(claimed_entries)} stale pending messages"
+                                )
+                                for entry_id, data in claimed_entries:
+                                    if not self._running:
+                                        return
+                                    await self._process_entry(redis, entry_id, data)
+                    except Exception as e:
+                        logger.warning(f"XAUTOCLAIM failed (non-fatal): {e}")
+
+                    # Read new messages from the stream
+                    messages = await redis.xreadgroup(
+                        groupname=config.stream_consumer_group,
+                        consumername=self._consumer_name,
+                        streams={config.stream_completion_name: ">"},
+                        block=block_timeout,
+                        count=10,
+                    )
+
+                    if not messages:
+                        continue
+
+                    for stream_name, entries in messages:
+                        for entry_id, data in entries:
+                            if not self._running:
+                                return
+                            await self._process_entry(redis, entry_id, data)
+
+            except asyncio.CancelledError:
+                logger.info("Consumer cancelled")
+                return
+            except Exception as e:
+                retry_count += 1
+                logger.error(
+                    f"Consumer error (retry {retry_count}/{max_retries}): {e}",
+                    exc_info=True,
+                )
+                if self._running and retry_count < max_retries:
+                    await asyncio.sleep(retry_delay)
+                else:
+                    logger.error("Max retries reached, stopping consumer")
+                    return
+
+    async def _process_entry(
+        self, redis: Any, entry_id: str, data: dict[str, Any]
+    ) -> None:
+        """Process a single stream entry and acknowledge it on success.
+
+        Args:
+            redis: Redis client connection
+            entry_id: The stream entry ID
+            data: The entry data dict
+        """
+        try:
+            # Handle the message
+            message_data = data.get("data")
+            if message_data:
+                await self._handle_message(
+                    message_data.encode()
+                    if isinstance(message_data, str)
+                    else message_data
+                )
+
+            # Acknowledge the message after successful processing
+            await redis.xack(
+                config.stream_completion_name,
+                config.stream_consumer_group,
+                entry_id,
+            )
+        except Exception as e:
+            logger.error(
+                f"Error processing completion message {entry_id}: {e}",
+                exc_info=True,
+            )
+            # Message remains in pending state and will be claimed by
+            # XAUTOCLAIM after min_idle_time expires
+
+    async def _handle_message(self, body: bytes) -> None:
+        """Handle a completion message."""
+        try:
+            data = orjson.loads(body)
+            message = OperationCompleteMessage(**data)
+        except Exception as e:
+            logger.error(f"Failed to parse completion message: {e}")
+            return
+
+        logger.info(
+            f"[COMPLETION] Received completion for operation {message.operation_id} "
+            f"(task_id={message.task_id}, success={message.success})"
+        )
+
+        # Find task in registry
+        task = await stream_registry.find_task_by_operation_id(message.operation_id)
+        if task is None:
+            task = await stream_registry.get_task(message.task_id)
+
+        if task is None:
+            logger.warning(
+                f"[COMPLETION] Task not found for operation {message.operation_id} "
+                f"(task_id={message.task_id})"
+            )
+            return
+
+        logger.info(
+            f"[COMPLETION] Found task: task_id={task.task_id}, "
+            f"session_id={task.session_id}, tool_call_id={task.tool_call_id}"
+        )
+
+        # Guard against empty task fields
+        if not task.task_id or not task.session_id or not task.tool_call_id:
+            logger.error(
+                f"[COMPLETION] Task has empty critical fields! "
+                f"task_id={task.task_id!r}, session_id={task.session_id!r}, "
+                f"tool_call_id={task.tool_call_id!r}"
+            )
+            return
+
+        if message.success:
+            await self._handle_success(task, message)
+        else:
+            await self._handle_failure(task, message)
+
+    async def _handle_success(
+        self,
+        task: stream_registry.ActiveTask,
+        message: OperationCompleteMessage,
+    ) -> None:
+        """Handle successful operation completion."""
+        await process_operation_success(task, message.result)
+
+    async def _handle_failure(
+        self,
+        task: stream_registry.ActiveTask,
+        message: OperationCompleteMessage,
+    ) -> None:
+        """Handle failed operation completion."""
+        await process_operation_failure(task, message.error)
+
+
+# Module-level consumer instance
+_consumer: ChatCompletionConsumer | None = None
+
+
+async def start_completion_consumer() -> None:
+    """Start the global completion consumer."""
+    global _consumer
+    if _consumer is None:
+        _consumer = ChatCompletionConsumer()
+    await _consumer.start()
+
+
+async def stop_completion_consumer() -> None:
+    """Stop the global completion consumer."""
+    global _consumer
+    if _consumer:
+        await _consumer.stop()
+        _consumer = None
+
+
+async def publish_operation_complete(
+    operation_id: str,
+    task_id: str,
+    success: bool,
+    result: dict | str | None = None,
+    error: str | None = None,
+) -> None:
+    """Publish an operation completion message to Redis Streams.
+
+    Args:
+        operation_id: The operation ID that completed.
+        task_id: The task ID associated with the operation.
+        success: Whether the operation succeeded.
+        result: The result data (for success).
+        error: The error message (for failure).
+    """
+    message = OperationCompleteMessage(
+        operation_id=operation_id,
+        task_id=task_id,
+        success=success,
+        result=result,
+        error=error,
+    )
+
+    redis = await get_redis_async()
+    await redis.xadd(
+        config.stream_completion_name,
+        {"data": message.model_dump_json()},
+        maxlen=config.stream_max_length,
+    )
+    logger.info(f"Published completion for operation {operation_id}")
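A minimal sketch of the wire format used by `publish_operation_complete` and `_process_entry` above: the publisher stores a single `data` field holding the JSON-serialized message, and the consumer reads that field back, accepting either `str` or `bytes`. This is a stdlib-only illustration, not the module's code: `CompletionEnvelope`, `encode_entry`, and `decode_entry` are hypothetical names, and `dataclasses` plus `json` stand in for the pydantic `OperationCompleteMessage` and `orjson`.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class CompletionEnvelope:
    """Hypothetical stand-in for OperationCompleteMessage (stdlib only)."""

    operation_id: str
    task_id: str
    success: bool
    result: dict | str | None = None
    error: str | None = None


def encode_entry(message: CompletionEnvelope) -> dict[str, str]:
    """Build the field mapping that would be passed to XADD."""
    return {"data": json.dumps(asdict(message))}


def decode_entry(fields: dict) -> CompletionEnvelope | None:
    """Mirror the consumer: read the 'data' field, accept str or bytes."""
    raw = fields.get("data")
    if not raw:
        return None
    body = raw.encode() if isinstance(raw, str) else raw
    try:
        return CompletionEnvelope(**json.loads(body))
    except (ValueError, TypeError):
        # Malformed payloads are dropped rather than retried.
        return None


entry = encode_entry(
    CompletionEnvelope("op-1", "task-1", success=True, result={"ok": 1})
)
decoded = decode_entry(entry)
```

As in the consumer above, a payload that fails to parse is discarded instead of being left pending, so a permanently malformed entry cannot be reclaimed and retried indefinitely.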
--- a/autogpt_platform/backend/backend/copilot/completion_handler.py
+++ b/autogpt_platform/backend/backend/copilot/completion_handler.py
@@ -0,0 +1,329 @@
+"""Shared completion handling for operation success and failure.
+
+This module provides common logic for handling operation completion from both:
+- The Redis Streams consumer (completion_consumer.py)
+- The HTTP webhook endpoint (routes.py)
+"""
+
+import logging
+from typing import Any
+
+import orjson
+
+from backend.data.db_accessors import chat_db
+
+from . import service as chat_service
+from . import stream_registry
+from .response_model import StreamError, StreamToolOutputAvailable
+from .tools.models import ErrorResponse
+
+logger = logging.getLogger(__name__)
+
+# Tools that produce agent_json that needs to be saved to library
+AGENT_GENERATION_TOOLS = {"create_agent", "edit_agent"}
+
+# Keys whose values are redacted when agent_json is returned in error responses
+SENSITIVE_KEYS = frozenset(
+    {
+        "api_key",
+        "apikey",
+        "api_secret",
+        "password",
+        "secret",
+        "credentials",
+        "credential",
+        "token",
+        "access_token",
+        "refresh_token",
+        "private_key",
+        "privatekey",
+        "auth",
+        "authorization",
+    }
+)
+
+
+def _sanitize_agent_json(obj: Any) -> Any:
+    """Recursively sanitize agent_json by redacting values of sensitive keys.
+
+    Args:
+        obj: The object to sanitize (dict, list, or primitive)
+
+    Returns:
+        Sanitized copy with values of sensitive keys replaced by "[REDACTED]"
+    """
+    if isinstance(obj, dict):
+        return {
+            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else _sanitize_agent_json(v)
+            for k, v in obj.items()
+        }
+    elif isinstance(obj, list):
+        return [_sanitize_agent_json(item) for item in obj]
+    else:
+        return obj
+
+
+class ToolMessageUpdateError(Exception):
+    """Raised when updating a tool message in the database fails."""
+
+    pass
+
+
+async def _update_tool_message(
+    session_id: str,
+    tool_call_id: str,
+    content: str,
+) -> None:
+    """Update tool message in database using the chat_db accessor.
+
+    Routes through DatabaseManager RPC when Prisma is not directly
+    connected (e.g. in the CoPilot Executor microservice).
+
+    Args:
+        session_id: The session ID
+        tool_call_id: The tool call ID to update
+        content: The new content for the message
+
+    Raises:
+        ToolMessageUpdateError: If the database update fails.
+    """
+    try:
+        updated = await chat_db().update_tool_message_content(
+            session_id=session_id,
+            tool_call_id=tool_call_id,
+            new_content=content,
+        )
+        if not updated:
+            raise ToolMessageUpdateError(
+                f"No message found with tool_call_id="
+                f"{tool_call_id} in session {session_id}"
+            )
+    except ToolMessageUpdateError:
+        raise
+    except Exception as e:
+        logger.error(
+            f"[COMPLETION] Failed to update tool message: {e}",
+            exc_info=True,
+        )
+        raise ToolMessageUpdateError(
+            f"Failed to update tool message for tool call #{tool_call_id}: {e}"
+        ) from e
+
+
+def serialize_result(result: dict | list | str | int | float | bool | None) -> str:
+    """Serialize result to JSON string with sensible defaults.
+
+    Args:
+        result: The result to serialize. Can be a dict, list, string,
+            number, boolean, or None.
+
+    Returns:
+        JSON string representation of the result. Returns '{"status": "completed"}'
+        only when result is explicitly None.
+    """
+    if isinstance(result, str):
+        return result
+    if result is None:
+        return '{"status": "completed"}'
+    return orjson.dumps(result).decode("utf-8")
+
+
+async def _save_agent_from_result(
+    result: dict[str, Any],
+    user_id: str | None,
+    tool_name: str,
+) -> dict[str, Any]:
+    """Save agent to library if result contains agent_json.
+
+    Args:
+        result: The result dict that may contain agent_json
+        user_id: The user ID to save the agent for
+        tool_name: The tool name (create_agent or edit_agent)
+
+    Returns:
+        Updated result dict with saved agent details, or original result if no agent_json
+    """
+    if not user_id:
+        logger.warning("[COMPLETION] Cannot save agent: no user_id in task")
+        return result
+
+    agent_json = result.get("agent_json")
+    if not agent_json:
+        logger.warning(
+            f"[COMPLETION] {tool_name} completed but no agent_json in result"
+        )
+        return result
+
+    try:
+        from .tools.agent_generator import save_agent_to_library
+
+        is_update = tool_name == "edit_agent"
+        created_graph, library_agent = await save_agent_to_library(
+            agent_json, user_id, is_update=is_update
+        )
+
+        logger.info(
+            f"[COMPLETION] Saved agent '{created_graph.name}' to library "
+            f"(graph_id={created_graph.id}, library_agent_id={library_agent.id})"
+        )
+
+        # Return a response similar to AgentSavedResponse
+        return {
+            "type": "agent_saved",
+            "message": f"Agent '{created_graph.name}' has been saved to your library!",
+            "agent_id": created_graph.id,
+            "agent_name": created_graph.name,
+            "library_agent_id": library_agent.id,
+            "library_agent_link": f"/library/agents/{library_agent.id}",
+            "agent_page_link": f"/build?flowID={created_graph.id}",
+        }
+    except Exception as e:
+        logger.error(
+            f"[COMPLETION] Failed to save agent to library: {e}",
+            exc_info=True,
+        )
+        # Return error but don't fail the whole operation
+        # Sanitize agent_json to remove sensitive keys before returning
+        return {
+            "type": "error",
+            "message": f"Agent was generated but failed to save: {e}",
+            "error": str(e),
+            "agent_json": _sanitize_agent_json(agent_json),
+        }
+
+
+async def process_operation_success(
+    task: stream_registry.ActiveTask,
+    result: dict | str | None,
+) -> None:
+    """Handle successful operation completion.
+
+    Publishes the result to the stream registry, updates the database,
+    generates LLM continuation, and marks the task as completed.
+
+    Args:
+        task: The active task that completed
+        result: The result data from the operation
+
+    Raises:
+        ToolMessageUpdateError: If the database update fails. The task
+            will be marked as failed instead of completed.
+    """
+    # For agent generation tools, save the agent to library
+    if task.tool_name in AGENT_GENERATION_TOOLS and isinstance(result, dict):
+        result = await _save_agent_from_result(result, task.user_id, task.tool_name)
+
+    # Serialize result for output (only substitute default when result is exactly None)
+    result_output = result if result is not None else {"status": "completed"}
+    output_str = (
+        result_output
+        if isinstance(result_output, str)
+        else orjson.dumps(result_output).decode("utf-8")
+    )
+
+    # Publish result to stream registry
+    await stream_registry.publish_chunk(
+        task.task_id,
+        StreamToolOutputAvailable(
+            toolCallId=task.tool_call_id,
+            toolName=task.tool_name,
+            output=output_str,
+            success=True,
+        ),
+    )
+
+    # Update pending operation in database
+    # If this fails, we must not continue to mark the task as completed
+    result_str = serialize_result(result)
+    try:
+        await _update_tool_message(
+            session_id=task.session_id,
+            tool_call_id=task.tool_call_id,
+            content=result_str,
+        )
+    except ToolMessageUpdateError:
+        # DB update failed - mark task as failed to avoid inconsistent state
+        logger.error(
+            f"[COMPLETION] DB update failed for task {task.task_id}, "
+            "marking as failed instead of completed"
+        )
+        await stream_registry.publish_chunk(
+            task.task_id,
+            StreamError(errorText="Failed to save operation result to database"),
+        )
+        await stream_registry.mark_task_completed(task.task_id, status="failed")
+        raise
+
+    # Generate LLM continuation with streaming
+    try:
+        await chat_service._generate_llm_continuation_with_streaming(
+            session_id=task.session_id,
+            user_id=task.user_id,
+            task_id=task.task_id,
+        )
+    except Exception as e:
+        logger.error(
+            f"[COMPLETION] Failed to generate LLM continuation: {e}",
+            exc_info=True,
+        )
+
+    # Mark task as completed and release Redis lock
+    await stream_registry.mark_task_completed(task.task_id, status="completed")
+    try:
+        await chat_service._mark_operation_completed(task.tool_call_id)
+    except Exception as e:
+        logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
+
+    logger.info(
+        f"[COMPLETION] Successfully processed completion for task {task.task_id}"
+    )
+
+
+async def process_operation_failure(
+    task: stream_registry.ActiveTask,
+    error: str | None,
+) -> None:
+    """Handle failed operation completion.
+
+    Publishes the error to the stream registry, updates the database
+    with the error response, and marks the task as failed.
+
+    Args:
+        task: The active task that failed
+        error: The error message from the operation
+    """
+    error_msg = error or "Operation failed"
+
+    # Publish error to stream registry
+    await stream_registry.publish_chunk(
+        task.task_id,
+        StreamError(errorText=error_msg),
+    )
+
+    # Update pending operation with error
+    # If this fails, we still continue to mark the task as failed
+    error_response = ErrorResponse(
+        message=error_msg,
+        error=error,
+    )
+    try:
+        await _update_tool_message(
+            session_id=task.session_id,
+            tool_call_id=task.tool_call_id,
+            content=error_response.model_dump_json(),
+        )
+    except ToolMessageUpdateError:
+        # DB update failed - log but continue with cleanup
+        logger.error(
+            f"[COMPLETION] DB update failed while processing failure for task {task.task_id}, "
+            "continuing with cleanup"
+        )
+
+    # Mark task as failed and release Redis lock
+    await stream_registry.mark_task_completed(task.task_id, status="failed")
+    try:
+        await chat_service._mark_operation_completed(task.tool_call_id)
+    except Exception as e:
+        logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
+
+    logger.info(f"[COMPLETION] Processed failure for task {task.task_id}: {error_msg}")
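A minimal usage sketch of the two pure helpers in completion_handler.py, reproduced in trimmed form so it runs standalone. Assumptions: `SENSITIVE` is a shortened, illustrative subset of `SENSITIVE_KEYS`, `redact` and `serialize` are hypothetical names mirroring `_sanitize_agent_json` and `serialize_result`, and stdlib `json` with compact separators stands in for `orjson`.

```python
import json
from typing import Any

# Shortened, illustrative subset of SENSITIVE_KEYS.
SENSITIVE = frozenset({"api_key", "password", "token"})


def redact(obj: Any) -> Any:
    """Replace values of sensitive keys at any depth; the keys are kept."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj


def serialize(result: Any) -> str:
    """Strings pass through, None becomes a default, the rest is JSON."""
    if isinstance(result, str):
        return result
    if result is None:
        return '{"status": "completed"}'
    return json.dumps(result, separators=(",", ":"))


agent = {"name": "demo", "nodes": [{"API_Key": "sk-123", "input": {"token": "t"}}]}
cleaned = redact(agent)
```

The match is case-insensitive on the key name and builds a new structure, so the original `agent` dict is left unmodified.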
--- a/autogpt_platform/backend/backend/copilot/config.py
+++ b/autogpt_platform/backend/backend/copilot/config.py
@@ -26,34 +26,63 @@ class ChatConfig(BaseSettings):
    # Session TTL Configuration - 12 hours
    session_ttl: int = Field(default=43200, description="Session TTL in seconds")

+    # Streaming Configuration
+    stream_timeout: int = Field(default=300, description="Stream timeout in seconds")
+    max_retries: int = Field(
+        default=3,
+        description="Max retries for fallback path (SDK handles retries internally)",
+    )
    max_agent_runs: int = Field(default=30, description="Maximum number of agent runs")
    max_agent_schedules: int = Field(
        default=30, description="Maximum number of agent schedules"
    )

+    # Long-running operation configuration
+    long_running_operation_ttl: int = Field(
+        default=600,
+        description="TTL in seconds for long-running operation tracking in Redis (safety net if pod dies)",
+    )
+
    # Stream registry configuration for SSE reconnection
    stream_ttl: int = Field(
        default=3600,
        description="TTL in seconds for stream data in Redis (1 hour)",
    )
-    stream_lock_ttl: int = Field(
-        default=120,
-        description="TTL in seconds for stream lock (2 minutes). Short timeout allows "
-        "reconnection after refresh/crash without long waits.",
-    )
    stream_max_length: int = Field(
        default=10000,
        description="Maximum number of messages to store per stream",
    )

-    # Redis key prefixes for stream registry
-    session_meta_prefix: str = Field(
-        default="chat:task:meta:",
-        description="Prefix for session metadata hash keys",
+    # Redis Streams configuration for completion consumer
+    stream_completion_name: str = Field(
+        default="chat:completions",
+        description="Redis Stream name for operation completions",
    )
-    turn_stream_prefix: str = Field(
+    stream_consumer_group: str = Field(
+        default="chat_consumers",
+        description="Consumer group name for completion stream",
+    )
+    stream_claim_min_idle_ms: int = Field(
+        default=60000,
+        description="Minimum idle time in milliseconds before claiming pending messages from dead consumers",
+    )
+
+    # Redis key prefixes for stream registry
+    task_meta_prefix: str = Field(
+        default="chat:task:meta:",
+        description="Prefix for task metadata hash keys",
+    )
+    task_stream_prefix: str = Field(
        default="chat:stream:",
-        description="Prefix for turn message stream keys",
+        description="Prefix for task message stream keys",
+    )
+    task_op_prefix: str = Field(
+        default="chat:task:op:",
+        description="Prefix for operation ID to task ID mapping keys",
+    )
+    internal_api_key: str | None = Field(
+        default=None,
+        description="API key for internal webhook callbacks (env: CHAT_INTERNAL_API_KEY)",
    )

    # Langfuse Prompt Management Configuration
@@ -62,15 +91,11 @@ class ChatConfig(BaseSettings):
        default="CoPilot Prompt",
        description="Name of the prompt in Langfuse to fetch",
    )
-    langfuse_prompt_cache_ttl: int = Field(
-        default=300,
-        description="Cache TTL in seconds for Langfuse prompt (0 to disable caching)",
-    )

    # Claude Agent SDK Configuration
    use_claude_agent_sdk: bool = Field(
        default=True,
-        description="Use Claude Agent SDK (True) or OpenAI-compatible LLM baseline (False)",
+        description="Use Claude Agent SDK for chat completions",
    )
    claude_agent_model: str | None = Field(
        default=None,
@@ -84,60 +109,25 @@ class ChatConfig(BaseSettings):
    )
    claude_agent_max_subtasks: int = Field(
        default=10,
-        description="Max number of concurrent sub-agent Tasks the SDK can run per session.",
+        description="Max number of sub-agent Tasks the SDK can spawn per session.",
    )
    claude_agent_use_resume: bool = Field(
        default=True,
        description="Use --resume for multi-turn conversations instead of "
        "history compression. Falls back to compression when unavailable.",
    )
-    use_claude_code_subscription: bool = Field(
-        default=False,
-        description="For personal/dev use: use Claude Code CLI subscription auth instead of API keys. Requires `claude login` on the host. Only works with SDK mode.",
-    )

-    # E2B Sandbox Configuration
-    use_e2b_sandbox: bool = Field(
+    # Extended thinking configuration for Claude models
+    thinking_enabled: bool = Field(
        default=True,
-        description="Use E2B cloud sandboxes for persistent bash/python execution. "
-        "When enabled, bash_exec routes commands to E2B and SDK file tools "
-        "operate directly on the sandbox via E2B's filesystem API.",
+        description="Enable adaptive thinking for Claude models via OpenRouter",
    )
-    e2b_api_key: str | None = Field(
-        default=None,
-        description="E2B API key. Falls back to E2B_API_KEY environment variable.",
-    )
-    e2b_sandbox_template: str = Field(
-        default="base",
-        description="E2B sandbox template to use for copilot sessions.",
-    )
-    e2b_sandbox_timeout: int = Field(
-        default=43200,  # 12 hours — same as session_ttl
-        description="E2B sandbox keepalive timeout in seconds.",
-    )
-
-    @field_validator("use_e2b_sandbox", mode="before")
-    @classmethod
-    def get_use_e2b_sandbox(cls, v):
-        """Get use_e2b_sandbox from environment if not provided."""
-        env_val = os.getenv("CHAT_USE_E2B_SANDBOX", "").lower()
-        if env_val:
-            return env_val in ("true", "1", "yes", "on")
-        return True if v is None else v
-
-    @field_validator("e2b_api_key", mode="before")
-    @classmethod
-    def get_e2b_api_key(cls, v):
-        """Get E2B API key from environment if not provided."""
-        if not v:
-            v = os.getenv("CHAT_E2B_API_KEY") or os.getenv("E2B_API_KEY")
-        return v

    @field_validator("api_key", mode="before")
    @classmethod
    def get_api_key(cls, v):
        """Get API key from environment if not provided."""
-        if not v:
+        if v is None:
            # Try to get from environment variables
            # First check for CHAT_API_KEY (Pydantic prefix)
            v = os.getenv("CHAT_API_KEY")
@@ -147,16 +137,13 @@ class ChatConfig(BaseSettings):
            if not v:
                # Fall back to OPENAI_API_KEY
                v = os.getenv("OPENAI_API_KEY")
-            # Note: ANTHROPIC_API_KEY is intentionally NOT included here.
-            # The SDK CLI picks it up from the env directly. Including it
-            # would pair it with the OpenRouter base_url, causing auth failures.
        return v

    @field_validator("base_url", mode="before")
    @classmethod
    def get_base_url(cls, v):
        """Get base URL from environment if not provided."""
-        if not v:
+        if v is None:
            # Check for OpenRouter or custom base URL
            v = os.getenv("CHAT_BASE_URL")
            if not v:
@@ -167,6 +154,14 @@ class ChatConfig(BaseSettings):
                v = "https://openrouter.ai/api/v1"
        return v

+    @field_validator("internal_api_key", mode="before")
+    @classmethod
+    def get_internal_api_key(cls, v):
+        """Get internal API key from environment if not provided."""
+        if v is None:
+            v = os.getenv("CHAT_INTERNAL_API_KEY")
+        return v
+
    @field_validator("use_claude_agent_sdk", mode="before")
    @classmethod
    def get_use_claude_agent_sdk(cls, v):
@@ -178,15 +173,6 @@ class ChatConfig(BaseSettings):
        # Default to True (SDK enabled by default)
        return True if v is None else v

-    @field_validator("use_claude_code_subscription", mode="before")
-    @classmethod
-    def get_use_claude_code_subscription(cls, v):
-        """Get use_claude_code_subscription from environment if not provided."""
-        env_val = os.getenv("CHAT_USE_CLAUDE_CODE_SUBSCRIPTION", "").lower()
-        if env_val:
-            return env_val in ("true", "1", "yes", "on")
-        return False if v is None else v
-
    # Prompt paths for different contexts
    PROMPT_PATHS: dict[str, str] = {
        "default": "prompts/chat_system.md",
--- a/autogpt_platform/backend/backend/copilot/constants.py
+++ b/autogpt_platform/backend/backend/copilot/constants.py
@@ -1,11 +0,0 @@
-"""Shared constants for the CoPilot module."""
-
-# Special message prefixes for text-based markers (parsed by frontend).
-# The hex suffix makes accidental LLM generation of these strings virtually
-# impossible, avoiding false-positive marker detection in normal conversation.
-COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]"  # Renders as ErrorCard
-COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]"  # Renders as system info message
-
-# Compaction notice messages shown to users.
-COMPACTION_DONE_MSG = "Earlier messages were summarized to fit within context limits."
-COMPACTION_TOOL_NAME = "context_compaction"
--- a/autogpt_platform/backend/backend/copilot/db.py
+++ b/autogpt_platform/backend/backend/copilot/db.py
@@ -3,9 +3,8 @@
 import asyncio
 import logging
 from datetime import UTC, datetime
-from typing import Any
+from typing import Any, cast

-from prisma.errors import UniqueViolationError
 from prisma.models import ChatMessage as PrismaChatMessage
 from prisma.models import ChatSession as PrismaChatSession
 from prisma.types import (
@@ -16,7 +15,7 @@ from prisma.types import (
 )

 from backend.data import db
-from backend.util.json import SafeJson, sanitize_string
+from backend.util.json import SafeJson

 from .model import ChatMessage, ChatSession, ChatSessionInfo

@@ -81,35 +80,6 @@ async def update_chat_session(
    return ChatSession.from_db(session) if session else None


-async def update_chat_session_title(
-    session_id: str,
-    user_id: str,
-    title: str,
-    *,
-    only_if_empty: bool = False,
-) -> bool:
-    """Update the title of a chat session, scoped to the owning user.
-
-    Always filters by (session_id, user_id) so callers cannot mutate another
-    user's session even when they know the session_id.
-
-    Args:
-        only_if_empty: When True, uses an atomic ``UPDATE WHERE title IS NULL``
-            guard so auto-generated titles never overwrite a user-set title.
-
-    Returns True if a row was updated, False otherwise (session not found,
-    wrong user, or — when only_if_empty — title was already set).
-    """
-    where: ChatSessionWhereInput = {"id": session_id, "userId": user_id}
-    if only_if_empty:
-        where["title"] = None
-    result = await PrismaChatSession.prisma().update_many(
-        where=where,
-        data={"title": title, "updatedAt": datetime.now(UTC)},
-    )
-    return result > 0
-
-
 async def add_chat_message(
    session_id: str,
    role: str,
@@ -122,24 +92,24 @@ async def add_chat_message(
    function_call: dict[str, Any] | None = None,
 ) -> ChatMessage:
    """Add a message to a chat session."""
-    # Build ChatMessageCreateInput with only non-None values
-    # (Prisma TypedDict rejects optional fields set to None)
-    data: ChatMessageCreateInput = {
+    # Build input dict dynamically rather than using ChatMessageCreateInput directly
+    # because Prisma's TypedDict validation rejects optional fields set to None.
+    # We only include fields that have values, then cast at the end.
+    data: dict[str, Any] = {
        "Session": {"connect": {"id": session_id}},
        "role": role,
        "sequence": sequence,
    }

-    # Add optional string fields — sanitize to strip PostgreSQL-incompatible
-    # control characters (null bytes etc.) that may appear in tool outputs.
+    # Add optional string fields
    if content is not None:
-        data["content"] = sanitize_string(content)
+        data["content"] = content
    if name is not None:
        data["name"] = name
    if tool_call_id is not None:
        data["toolCallId"] = tool_call_id
    if refusal is not None:
-        data["refusal"] = sanitize_string(refusal)
+        data["refusal"] = refusal

    # Add optional JSON fields only when they have values
    if tool_calls is not None:
@@ -153,7 +123,7 @@ async def add_chat_message(
            where={"id": session_id},
            data={"updatedAt": datetime.now(UTC)},
        ),
-        PrismaChatMessage.prisma().create(data=data),
+        PrismaChatMessage.prisma().create(data=cast(ChatMessageCreateInput, data)),
    )
    return ChatMessage.from_db(message)

@@ -162,94 +132,58 @@ async def add_chat_messages_batch(
    session_id: str,
    messages: list[dict[str, Any]],
    start_sequence: int,
-) -> int:
+) -> list[ChatMessage]:
    """Add multiple messages to a chat session in a batch.

-    Uses collision detection with retry: tries to create messages starting
-    at start_sequence. If a unique constraint violation occurs (e.g., the
-    streaming loop and long-running callback race), queries the latest
-    sequence and retries with the correct offset. This avoids unnecessary
-    upserts and DB queries in the common case (no collision).
-
-    Returns:
-        Next sequence number for the next message to be inserted. This equals
-        start_sequence + len(messages) and allows callers to update their
-        counters even when collision detection adjusts start_sequence.
+    Uses a transaction for atomicity - if any message creation fails,
+    the entire batch is rolled back.
    """
    if not messages:
-        # No messages to add - return current count
-        return start_sequence
+        return []

-    max_retries = 5
-    for attempt in range(max_retries):
-        try:
-            # Single timestamp for all messages and session update
-            now = datetime.now(UTC)
+    created_messages = []

-            async with db.transaction() as tx:
-                # Build all message data
-                messages_data = []
-                for i, msg in enumerate(messages):
-                    # Build ChatMessageCreateInput with only non-None values
-                    # (Prisma TypedDict rejects optional fields set to None)
-                    # Note: create_many doesn't support nested creates, use sessionId directly
-                    data: ChatMessageCreateInput = {
-                        "sessionId": session_id,
-                        "role": msg["role"],
-                        "sequence": start_sequence + i,
-                        "createdAt": now,
-                    }
+    async with db.transaction() as tx:
+        for i, msg in enumerate(messages):
+            # Build input dict dynamically rather than using ChatMessageCreateInput
+            # directly because Prisma's TypedDict validation rejects optional fields
+            # set to None. We only include fields that have values, then cast.
+            data: dict[str, Any] = {
+                "Session": {"connect": {"id": session_id}},
+                "role": msg["role"],
+                "sequence": start_sequence + i,
+            }

-                    # Add optional string fields — sanitize to strip
-                    # PostgreSQL-incompatible control characters.
-                    if msg.get("content") is not None:
-                        data["content"] = sanitize_string(msg["content"])
-                    if msg.get("name") is not None:
-                        data["name"] = msg["name"]
-                    if msg.get("tool_call_id") is not None:
-                        data["toolCallId"] = msg["tool_call_id"]
-                    if msg.get("refusal") is not None:
-                        data["refusal"] = sanitize_string(msg["refusal"])
+            # Add optional string fields
+            if msg.get("content") is not None:
+                data["content"] = msg["content"]
+            if msg.get("name") is not None:
+                data["name"] = msg["name"]
+            if msg.get("tool_call_id") is not None:
+                data["toolCallId"] = msg["tool_call_id"]
+            if msg.get("refusal") is not None:
+                data["refusal"] = msg["refusal"]

-                    # Add optional JSON fields only when they have values
-                    if msg.get("tool_calls") is not None:
-                        data["toolCalls"] = SafeJson(msg["tool_calls"])
-                    if msg.get("function_call") is not None:
-                        data["functionCall"] = SafeJson(msg["function_call"])
+            # Add optional JSON fields only when they have values
+            if msg.get("tool_calls") is not None:
+                data["toolCalls"] = SafeJson(msg["tool_calls"])
+            if msg.get("function_call") is not None:
+                data["functionCall"] = SafeJson(msg["function_call"])

-                    messages_data.append(data)
+            created = await PrismaChatMessage.prisma(tx).create(
+                data=cast(ChatMessageCreateInput, data)
+            )
+            created_messages.append(created)

-                # Run create_many and session update in parallel within transaction
-                # Both use the same timestamp for consistency
-                await asyncio.gather(
-                    PrismaChatMessage.prisma(tx).create_many(data=messages_data),
-                    PrismaChatSession.prisma(tx).update(
-                        where={"id": session_id},
-                        data={"updatedAt": now},
-                    ),
-                )
+        # Update session's updatedAt timestamp within the same transaction.
+        # Note: Token usage (total_prompt_tokens, total_completion_tokens) is updated
+        # separately via update_chat_session() after streaming completes.
+        await PrismaChatSession.prisma(tx).update(
+            where={"id": session_id},
+            data={"updatedAt": datetime.now(UTC)},
+        )

-            # Return next sequence number for counter sync
-            return start_sequence + len(messages)
-
-        except UniqueViolationError:
-            if attempt < max_retries - 1:
-                # Collision detected - query MAX(sequence)+1 and retry with correct offset
-                logger.info(
-                    f"Collision detected for session {session_id} at sequence "
-                    f"{start_sequence}, querying DB for latest sequence"
-                )
-                start_sequence = await get_next_sequence(session_id)
-                logger.info(
-                    f"Retrying batch insert with start_sequence={start_sequence}"
-                )
-                continue
-            else:
-                # Max retries exceeded - propagate error
-                raise
-
-    # Should never reach here due to raise in exception handler
-    raise RuntimeError(f"Failed to insert messages after {max_retries} attempts")
+    return [ChatMessage.from_db(m) for m in created_messages]


 async def get_user_chat_sessions(
@@ -303,20 +237,10 @@ async def delete_chat_session(session_id: str, user_id: str | None = None) -> bo
        return False


-async def get_next_sequence(session_id: str) -> int:
-    """Get the next sequence number for a new message in this session.
-
-    Uses MAX(sequence) + 1 for robustness. Returns 0 if no messages exist.
-    More robust than COUNT(*) because it's immune to deleted messages.
-
-    Optimized to select only the sequence column using raw SQL.
-    The unique index on (sessionId, sequence) makes this query fast.
-    """
-    results = await db.query_raw_with_schema(
-        'SELECT "sequence" FROM {schema_prefix}"ChatMessage" WHERE "sessionId" = $1 ORDER BY "sequence" DESC LIMIT 1',
-        session_id,
-    )
-    return 0 if not results else results[0]["sequence"] + 1
+async def get_chat_session_message_count(session_id: str) -> int:
+    """Get the number of messages in a chat session."""
+    count = await PrismaChatMessage.prisma().count(where={"sessionId": session_id})
+    return count


 async def update_tool_message_content(
@@ -343,7 +267,7 @@ async def update_tool_message_content(
                "toolCallId": tool_call_id,
            },
            data={
-                "content": sanitize_string(new_content),
+                "content": new_content,
            },
        )
        if result == 0:
--- a/autogpt_platform/backend/backend/copilot/executor/manager.py
+++ b/autogpt_platform/backend/backend/copilot/executor/manager.py
@@ -4,7 +4,6 @@ This module contains the CoPilotExecutor class that consumes chat tasks from
 RabbitMQ and processes them using a thread pool, following the graph executor pattern.
 """

-import asyncio
 import logging
 import os
 import threading
@@ -26,7 +25,7 @@ from backend.util.process import AppProcess
 from backend.util.retry import continuous_retry
 from backend.util.settings import Settings

-from .processor import execute_copilot_turn, init_worker
+from .processor import execute_copilot_task, init_worker
 from .utils import (
    COPILOT_CANCEL_QUEUE_NAME,
    COPILOT_EXECUTION_QUEUE_NAME,
@@ -182,13 +181,13 @@ class CoPilotExecutor(AppProcess):
            self._executor.shutdown(wait=False)

        # Release any remaining locks
-        for session_id, lock in list(self._task_locks.items()):
+        for task_id, lock in list(self._task_locks.items()):
            try:
                lock.release()
-                logger.info(f"[cleanup {pid}] Released lock for {session_id}")
+                logger.info(f"[cleanup {pid}] Released lock for {task_id}")
            except Exception as e:
                logger.error(
-                    f"[cleanup {pid}] Failed to release lock for {session_id}: {e}"
+                    f"[cleanup {pid}] Failed to release lock for {task_id}: {e}"
                )

        logger.info(f"[cleanup {pid}] Graceful shutdown completed")
@@ -268,20 +267,20 @@ class CoPilotExecutor(AppProcess):
    ):
        """Handle cancel message from FANOUT exchange."""
        request = CancelCoPilotEvent.model_validate_json(body)
-        session_id = request.session_id
-        if not session_id:
-            logger.warning("Cancel message missing 'session_id'")
+        task_id = request.task_id
+        if not task_id:
+            logger.warning("Cancel message missing 'task_id'")
            return
-        if session_id not in self.active_tasks:
-            logger.debug(f"Cancel received for {session_id} but not active")
+        if task_id not in self.active_tasks:
+            logger.debug(f"Cancel received for {task_id} but not active")
            return

-        _, cancel_event = self.active_tasks[session_id]
-        logger.info(f"Received cancel for {session_id}")
+        _, cancel_event = self.active_tasks[task_id]
+        logger.info(f"Received cancel for {task_id}")
        if not cancel_event.is_set():
            cancel_event.set()
        else:
-            logger.debug(f"Cancel already set for {session_id}")
+            logger.debug(f"Cancel already set for {task_id}")

    def _handle_run_message(
        self,
@@ -353,12 +352,12 @@ class CoPilotExecutor(AppProcess):
            ack_message(reject=True, requeue=False)
            return

-        session_id = entry.session_id
+        task_id = entry.task_id

-        # Check for local duplicate - session is already running on this executor
-        if session_id in self.active_tasks:
+        # Check for local duplicate - task is already running on this executor
+        if task_id in self.active_tasks:
            logger.warning(
-                f"Session {session_id} already running locally, rejecting duplicate"
+                f"Task {task_id} already running locally, rejecting duplicate"
            )
            ack_message(reject=True, requeue=False)
            return
@@ -366,69 +365,64 @@ class CoPilotExecutor(AppProcess):
        # Try to acquire cluster-wide lock
        cluster_lock = ClusterLock(
            redis=redis.get_redis(),
-            key=f"copilot:session:{session_id}:lock",
+            key=f"copilot:task:{task_id}:lock",
            owner_id=self.executor_id,
            timeout=settings.config.cluster_lock_timeout,
        )
        current_owner = cluster_lock.try_acquire()
        if current_owner != self.executor_id:
            if current_owner is not None:
-                logger.warning(
-                    f"Session {session_id} already running on pod {current_owner}"
-                )
+                logger.warning(f"Task {task_id} already running on pod {current_owner}")
                ack_message(reject=True, requeue=False)
            else:
                logger.warning(
-                    f"Could not acquire lock for {session_id} - Redis unavailable"
+                    f"Could not acquire lock for {task_id} - Redis unavailable"
                )
                ack_message(reject=True, requeue=True)
            return

        # Execute the task
        try:
-            self._task_locks[session_id] = cluster_lock
+            self._task_locks[task_id] = cluster_lock

            logger.info(
-                f"Acquired cluster lock for {session_id}, "
-                f"executor_id={self.executor_id}"
+                f"Acquired cluster lock for {task_id}, executor_id={self.executor_id}"
            )

            cancel_event = threading.Event()
            future = self.executor.submit(
-                execute_copilot_turn, entry, cancel_event, cluster_lock
+                execute_copilot_task, entry, cancel_event, cluster_lock
            )
-            self.active_tasks[session_id] = (future, cancel_event)
+            self.active_tasks[task_id] = (future, cancel_event)
        except Exception as e:
-            logger.warning(f"Failed to setup execution for {session_id}: {e}")
+            logger.warning(f"Failed to set up execution for {task_id}: {e}")
            cluster_lock.release()
-            if session_id in self._task_locks:
-                del self._task_locks[session_id]
+            if task_id in self._task_locks:
+                del self._task_locks[task_id]
            ack_message(reject=True, requeue=True)
            return

        self._update_metrics()

        def on_run_done(f: Future):
-            logger.info(f"Run completed for {session_id}")
-            error_msg = None
+            logger.info(f"Run completed for {task_id}")
            try:
                if exec_error := f.exception():
-                    error_msg = str(exec_error) or type(exec_error).__name__
-                    logger.error(f"Execution for {session_id} failed: {error_msg}")
+                    logger.error(f"Execution for {task_id} failed: {exec_error}")
+                    # Don't requeue failed tasks - they've been marked as failed
+                    # in the stream registry. Requeuing would cause infinite retries
+                    # for deterministic failures.
                    ack_message(reject=True, requeue=False)
                else:
                    ack_message(reject=False, requeue=False)
-            except asyncio.CancelledError:
-                logger.info(f"Run completion callback cancelled for {session_id}")
            except BaseException as e:
-                error_msg = str(e) or type(e).__name__
-                logger.exception(f"Error in run completion callback: {error_msg}")
+                logger.exception(f"Error in run completion callback: {e}")
            finally:
                # Release the cluster lock
-                if session_id in self._task_locks:
-                    logger.info(f"Releasing cluster lock for {session_id}")
-                    self._task_locks[session_id].release()
-                    del self._task_locks[session_id]
+                if task_id in self._task_locks:
+                    logger.info(f"Releasing cluster lock for {task_id}")
+                    self._task_locks[task_id].release()
+                    del self._task_locks[task_id]
                self._cleanup_completed_tasks()

        future.add_done_callback(on_run_done)
@@ -439,11 +433,11 @@ class CoPilotExecutor(AppProcess):
        """Remove completed futures from active_tasks and update metrics."""
        completed_tasks = []
        with self._active_tasks_lock:
-            for session_id, (future, _) in list(self.active_tasks.items()):
+            for task_id, (future, _) in list(self.active_tasks.items()):
                if future.done():
-                    completed_tasks.append(session_id)
-                    self.active_tasks.pop(session_id, None)
-                    logger.info(f"Cleaned up completed session {session_id}")
+                    completed_tasks.append(task_id)
+                    self.active_tasks.pop(task_id, None)
+                    logger.info(f"Cleaned up completed task {task_id}")

        self._update_metrics()
        return completed_tasks
--- a/autogpt_platform/backend/backend/copilot/executor/processor.py
+++ b/autogpt_platform/backend/backend/copilot/executor/processor.py
@@ -1,20 +1,18 @@
 """CoPilot execution processor - per-worker execution logic.

-This module contains the processor class that handles CoPilot session execution
+This module contains the processor class that handles CoPilot task execution
 in a thread-local context, following the graph executor pattern.
 """

 import asyncio
 import logging
-import os
-import subprocess
 import threading
 import time

+from backend.copilot import service as copilot_service
 from backend.copilot import stream_registry
-from backend.copilot.baseline import stream_chat_completion_baseline
 from backend.copilot.config import ChatConfig
-from backend.copilot.response_model import StreamFinish
+from backend.copilot.response_model import StreamError, StreamFinish, StreamFinishStep
 from backend.copilot.sdk import service as sdk_service
 from backend.executor.cluster_lock import ClusterLock
 from backend.util.decorator import error_logged
@@ -34,17 +32,17 @@ logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]"
 _tls = threading.local()


-def execute_copilot_turn(
+def execute_copilot_task(
    entry: CoPilotExecutionEntry,
    cancel: threading.Event,
    cluster_lock: ClusterLock,
 ):
-    """Execute a single CoPilot turn (user message → AI response).
+    """Execute a CoPilot task using the thread-local processor.

    This function is the entry point called by the thread pool executor.

    Args:
-        entry: The turn payload
+        entry: The task payload
        cancel: Threading event to signal cancellation
        cluster_lock: Distributed lock for this execution
    """
@@ -78,16 +76,16 @@ def cleanup_worker():


 class CoPilotProcessor:
-    """Per-worker execution logic for CoPilot sessions.
+    """Per-worker execution logic for CoPilot tasks.

    This class is instantiated once per worker thread and handles the execution
-    of CoPilot chat generation sessions. It maintains an async event loop for
+    of CoPilot chat generation tasks. It maintains an async event loop for
    running the async service code.

    The execution flow:
-        1. Session entry is picked from RabbitMQ queue
-        2. Manager submits to thread pool
-        3. Processor executes in its event loop
+        1. CoPilot task is picked from RabbitMQ queue
+        2. Manager submits task to thread pool
+        3. Processor executes the task in its event loop
        4. Results are published to Redis Streams
    """

@@ -110,41 +108,8 @@ class CoPilotProcessor:
        )
        self.execution_thread.start()

-        # Skip the SDK's per-request CLI version check — the bundled CLI is
-        # already version-matched to the SDK package.
-        os.environ.setdefault("CLAUDE_AGENT_SDK_SKIP_VERSION_CHECK", "1")
-
-        # Pre-warm the bundled CLI binary so the OS page-caches the ~185 MB
-        # executable.  First spawn pays ~1.2 s; subsequent spawns ~0.65 s.
-        self._prewarm_cli()
-
        logger.info(f"[CoPilotExecutor] Worker {self.tid} started")

-    def _prewarm_cli(self) -> None:
-        """Run the bundled CLI binary once to warm OS page caches."""
-        try:
-            from claude_agent_sdk._internal.transport.subprocess_cli import (
-                SubprocessCLITransport,
-            )
-
-            cli_path = SubprocessCLITransport._find_bundled_cli(None)  # type: ignore[arg-type]
-            if cli_path:
-                result = subprocess.run(
-                    [cli_path, "-v"],
-                    capture_output=True,
-                    timeout=10,
-                )
-                if result.returncode == 0:
-                    logger.info(f"[CoPilotExecutor] CLI pre-warm done: {cli_path}")
-                else:
-                    logger.warning(
-                        "[CoPilotExecutor] CLI pre-warm failed (rc=%d): %s",
-                        result.returncode,  # type: ignore[reportCallIssue]
-                        cli_path,
-                    )
-        except Exception as e:
-            logger.debug(f"[CoPilotExecutor] CLI pre-warm skipped: {e}")
-
    def cleanup(self):
        """Clean up event-loop-bound resources before the loop is destroyed.

@@ -154,16 +119,13 @@ class CoPilotProcessor:
        """
        from backend.util.workspace_storage import shutdown_workspace_storage

-        coro = shutdown_workspace_storage()
        try:
-            future = asyncio.run_coroutine_threadsafe(coro, self.execution_loop)
+            future = asyncio.run_coroutine_threadsafe(
+                shutdown_workspace_storage(), self.execution_loop
+            )
            future.result(timeout=5)
        except Exception as e:
-            coro.close()  # Prevent "coroutine was never awaited" warning
-            error_msg = str(e) or type(e).__name__
-            logger.warning(
-                f"[CoPilotExecutor] Worker {self.tid} cleanup error: {error_msg}"
-            )
+            logger.warning(f"[CoPilotExecutor] Worker {self.tid} cleanup error: {e}")

        # Stop the event loop
        self.execution_loop.call_soon_threadsafe(self.execution_loop.stop)
@@ -177,17 +139,19 @@ class CoPilotProcessor:
        cancel: threading.Event,
        cluster_lock: ClusterLock,
    ):
-        """Execute a CoPilot turn.
+        """Execute a CoPilot task.

-        Runs the async logic in the worker's event loop and handles errors.
+        This is the main entry point for task execution. It runs the async
+        execution logic in the worker's event loop and handles errors.

        Args:
-            entry: The turn payload containing session and message info
+            entry: The task payload containing session and message info
            cancel: Threading event to signal cancellation
            cluster_lock: Distributed lock to prevent duplicate execution
        """
        log = CoPilotLogMetadata(
            logging.getLogger(__name__),
+            task_id=entry.task_id,
            session_id=entry.session_id,
            user_id=entry.user_id,
        )
@@ -195,30 +159,38 @@ class CoPilotProcessor:

        start_time = time.monotonic()

-        # Run the async execution in our event loop
-        future = asyncio.run_coroutine_threadsafe(
-            self._execute_async(entry, cancel, cluster_lock, log),
-            self.execution_loop,
-        )
+        try:
+            # Run the async execution in our event loop
+            future = asyncio.run_coroutine_threadsafe(
+                self._execute_async(entry, cancel, cluster_lock, log),
+                self.execution_loop,
+            )

-        # Wait for completion, checking cancel periodically
-        while not future.done():
-            try:
-                future.result(timeout=1.0)
-            except asyncio.TimeoutError:
-                if cancel.is_set():
-                    log.info("Cancellation requested")
-                    future.cancel()
-                    break
-                # Refresh cluster lock to maintain ownership
-                cluster_lock.refresh()
+            # Wait for completion, checking cancel periodically
+            while not future.done():
+                try:
+                    future.result(timeout=1.0)
+                except asyncio.TimeoutError:
+                    if cancel.is_set():
+                        log.info("Cancellation requested")
+                        future.cancel()
+                        break
+                    # Refresh cluster lock to maintain ownership
+                    cluster_lock.refresh()

-        if not future.cancelled():
-            # Get result to propagate any exceptions
-            future.result()
+            if not future.cancelled():
+                # Get result to propagate any exceptions
+                future.result()

-        elapsed = time.monotonic() - start_time
-        log.info(f"Execution completed in {elapsed:.2f}s")
+            elapsed = time.monotonic() - start_time
+            log.info(f"Execution completed in {elapsed:.2f}s")
+
+        except Exception as e:
+            elapsed = time.monotonic() - start_time
+            log.error(f"Execution failed after {elapsed:.2f}s: {e}")
+            # Note: _execute_async already marks the task as failed before re-raising,
+            # so we don't call _mark_task_failed here to avoid duplicate error events.
+            raise

    async def _execute_async(
        self,
@@ -227,26 +199,24 @@ class CoPilotProcessor:
        cluster_lock: ClusterLock,
        log: CoPilotLogMetadata,
    ):
-        """Async execution logic for a CoPilot turn.
+        """Async execution logic for a CoPilot task.

-        Calls the chat completion service (SDK or baseline) and publishes
-        results to the stream registry.
+        This method calls the existing stream_chat_completion service function
+        and publishes results to the stream registry.

        Args:
-            entry: The turn payload
+            entry: The task payload
            cancel: Threading event to signal cancellation
            cluster_lock: Distributed lock for refresh
-            log: Structured logger
+            log: Structured logger for this task
        """
        last_refresh = time.monotonic()
        refresh_interval = 30.0  # Refresh lock every 30 seconds
-        error_msg = None

        try:
-            # Choose service based on LaunchDarkly flag.
-            # Claude Code subscription forces SDK mode (CLI subprocess auth).
+            # Choose service based on LaunchDarkly flag
            config = ChatConfig()
-            use_sdk = config.use_claude_code_subscription or await is_feature_enabled(
+            use_sdk = await is_feature_enabled(
                Flag.COPILOT_SDK,
                entry.user_id or "anonymous",
                default=config.use_claude_agent_sdk,
@@ -254,60 +224,64 @@ class CoPilotProcessor:
            stream_fn = (
                sdk_service.stream_chat_completion_sdk
                if use_sdk
-                else stream_chat_completion_baseline
+                else copilot_service.stream_chat_completion
            )
-            log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
+            log.info(f"Using {'SDK' if use_sdk else 'standard'} service")

-            # Stream chat completion and publish chunks to Redis.
+            # Stream chat completion and publish chunks to Redis
            async for chunk in stream_fn(
                session_id=entry.session_id,
                message=entry.message if entry.message else None,
                is_user_message=entry.is_user_message,
                user_id=entry.user_id,
                context=entry.context,
-                file_ids=entry.file_ids,
            ):
+                # Check for cancellation
                if cancel.is_set():
-                    log.info("Cancel requested, breaking stream")
-                    break
+                    log.info("Cancelled during streaming")
+                    await stream_registry.publish_chunk(
+                        entry.task_id, StreamError(errorText="Operation cancelled")
+                    )
+                    await stream_registry.publish_chunk(
+                        entry.task_id, StreamFinishStep()
+                    )
+                    await stream_registry.publish_chunk(entry.task_id, StreamFinish())
+                    await stream_registry.mark_task_completed(
+                        entry.task_id, status="failed"
+                    )
+                    return

+                # Refresh cluster lock periodically
                current_time = time.monotonic()
                if current_time - last_refresh >= refresh_interval:
                    cluster_lock.refresh()
                    last_refresh = current_time

-                # Skip StreamFinish — mark_session_completed publishes it.
-                if isinstance(chunk, StreamFinish):
-                    continue
+                # Publish chunk to stream registry
+                await stream_registry.publish_chunk(entry.task_id, chunk)

-                try:
-                    await stream_registry.publish_chunk(entry.turn_id, chunk)
-                except Exception as e:
-                    log.error(
-                        f"Error publishing chunk {type(chunk).__name__}: {e}",
-                        exc_info=True,
-                    )
+            # Mark task as completed
+            await stream_registry.mark_task_completed(entry.task_id, status="completed")
+            log.info("Task completed successfully")

-            # Stream loop completed
-            if cancel.is_set():
-                log.info("Stream cancelled by user")
-
-        except BaseException as e:
-            # Handle all exceptions (including CancelledError) with appropriate logging
-            if isinstance(e, asyncio.CancelledError):
-                log.info("Turn cancelled")
-                error_msg = "Operation cancelled"
-            else:
-                error_msg = str(e) or type(e).__name__
-                log.error(f"Turn failed: {error_msg}")
+        except asyncio.CancelledError:
+            log.info("Task cancelled")
+            await stream_registry.mark_task_completed(entry.task_id, status="failed")
            raise
-        finally:
-            # If no exception but user cancelled, still mark as cancelled
-            if not error_msg and cancel.is_set():
-                error_msg = "Operation cancelled"
-            try:
-                await stream_registry.mark_session_completed(
-                    entry.session_id, error_message=error_msg
-                )
-            except Exception as mark_err:
-                log.error(f"Failed to mark session completed: {mark_err}")
+
+        except Exception as e:
+            log.error(f"Task failed: {e}")
+            await self._mark_task_failed(entry.task_id, str(e))
+            raise
+
+    async def _mark_task_failed(self, task_id: str, error_message: str):
+        """Mark a task as failed and publish error to stream registry."""
+        try:
+            await stream_registry.publish_chunk(
+                task_id, StreamError(errorText=error_message)
+            )
+            await stream_registry.publish_chunk(task_id, StreamFinishStep())
+            await stream_registry.publish_chunk(task_id, StreamFinish())
+            await stream_registry.mark_task_completed(task_id, status="failed")
+        except Exception as e:
+            logger.error(f"Failed to mark task {task_id} as failed: {e}")
--- a/autogpt_platform/backend/backend/copilot/executor/utils.py
+++ b/autogpt_platform/backend/backend/copilot/executor/utils.py
@@ -28,7 +28,7 @@ class CoPilotLogMetadata(TruncatedLogger):
    Args:
        logger: The underlying logger instance
        max_length: Maximum log message length before truncation
-        **kwargs: Metadata key-value pairs (e.g., session_id="xyz", turn_id="abc")
+        **kwargs: Metadata key-value pairs (e.g., task_id="abc", session_id="xyz")
            These are added to json_fields in cloud mode, or to the prefix in local mode.
    """

@@ -135,15 +135,18 @@ class CoPilotExecutionEntry(BaseModel):
    This model represents a chat generation task to be processed by the executor.
    """

-    session_id: str
-    """Chat session ID (also used for dedup/locking)"""
+    task_id: str
+    """Unique identifier for this task (used for stream registry)"""

-    turn_id: str = ""
-    """Per-turn UUID for Redis stream isolation"""
+    session_id: str
+    """Chat session ID"""

    user_id: str | None
    """User ID (may be None for anonymous users)"""

+    operation_id: str
+    """Operation ID for webhook callbacks and completion tracking"""
+
    message: str
    """User's message to process"""

@@ -153,50 +156,47 @@ class CoPilotExecutionEntry(BaseModel):
    context: dict[str, str] | None = None
    """Optional context for the message (e.g., {url: str, content: str})"""

-    file_ids: list[str] | None = None
-    """Workspace file IDs attached to the user's message"""
-

 class CancelCoPilotEvent(BaseModel):
    """Event to cancel a CoPilot operation."""

-    session_id: str
-    """Session ID to cancel"""
+    task_id: str
+    """Task ID to cancel"""


 # ============ Queue Publishing Helpers ============ #


-async def enqueue_copilot_turn(
+async def enqueue_copilot_task(
+    task_id: str,
    session_id: str,
    user_id: str | None,
+    operation_id: str,
    message: str,
-    turn_id: str,
    is_user_message: bool = True,
    context: dict[str, str] | None = None,
-    file_ids: list[str] | None = None,
 ) -> None:
    """Enqueue a CoPilot task for processing by the executor service.

    Args:
-        session_id: Chat session ID (also used for dedup/locking)
+        task_id: Unique identifier for this task (used for stream registry)
+        session_id: Chat session ID
        user_id: User ID (may be None for anonymous users)
+        operation_id: Operation ID for webhook callbacks and completion tracking
        message: User's message to process
-        turn_id: Per-turn UUID for Redis stream isolation
        is_user_message: Whether the message is from the user (vs system/assistant)
        context: Optional context for the message (e.g., {url: str, content: str})
-        file_ids: Optional workspace file IDs attached to the user's message
    """
    from backend.util.clients import get_async_copilot_queue

    entry = CoPilotExecutionEntry(
+        task_id=task_id,
        session_id=session_id,
-        turn_id=turn_id,
        user_id=user_id,
+        operation_id=operation_id,
        message=message,
        is_user_message=is_user_message,
        context=context,
-        file_ids=file_ids,
    )

    queue_client = await get_async_copilot_queue()
@@ -207,15 +207,15 @@ async def enqueue_copilot_turn(
    )


-async def enqueue_cancel_task(session_id: str) -> None:
-    """Publish a cancel request for a running CoPilot session.
+async def enqueue_cancel_task(task_id: str) -> None:
+    """Publish a cancel request for a running CoPilot task.

    Sends a ``CancelCoPilotEvent`` to the FANOUT exchange so all executor
    pods receive the cancellation signal.
    """
    from backend.util.clients import get_async_copilot_queue

-    event = CancelCoPilotEvent(session_id=session_id)
+    event = CancelCoPilotEvent(task_id=task_id)
    queue_client = await get_async_copilot_queue()
    await queue_client.publish_message(
        routing_key="",  # FANOUT ignores routing key
--- a/autogpt_platform/backend/backend/copilot/model.py
+++ b/autogpt_platform/backend/backend/copilot/model.py
@@ -432,9 +432,7 @@ async def _get_session_from_db(session_id: str) -> ChatSession | None:
    return session


-async def upsert_chat_session(
-    session: ChatSession,
-) -> ChatSession:
+async def upsert_chat_session(session: ChatSession) -> ChatSession:
    """Update a chat session in both cache and database.

    Uses session-level locking to prevent race conditions when concurrent
@@ -451,34 +449,24 @@ async def upsert_chat_session(
    lock = await _get_session_lock(session.session_id)

    async with lock:
-        # Always query DB for existing message count to ensure consistency
-        existing_message_count = await chat_db().get_next_sequence(session.session_id)
+        # Get existing message count from DB for incremental saves
+        existing_message_count = await chat_db().get_chat_session_message_count(
+            session.session_id
+        )

        db_error: Exception | None = None

        # Save to database (primary storage)
        try:
-            await _save_session_to_db(
-                session,
-                existing_message_count,
-                skip_existence_check=existing_message_count > 0,
-            )
+            await _save_session_to_db(session, existing_message_count)
        except Exception as e:
            logger.error(
                f"Failed to save session {session.session_id} to database: {e}"
            )
            db_error = e

-        # Save to cache (best-effort, even if DB failed).
-        # Title updates (update_session_title) run *outside* this lock because
-        # they only touch the title field, not messages.  So a concurrent rename
-        # or auto-title may have written a newer title to Redis while this
-        # upsert was in progress.  Always prefer the cached title to avoid
-        # overwriting it with the stale in-memory copy.
+        # Save to cache (best-effort, even if DB failed)
        try:
-            existing_cached = await _get_session_from_cache(session.session_id)
-            if existing_cached and existing_cached.title:
-                session = session.model_copy(update={"title": existing_cached.title})
            await cache_chat_session(session)
        except Exception as e:
            # If DB succeeded but cache failed, raise cache error
@@ -501,31 +489,21 @@ async def upsert_chat_session(


 async def _save_session_to_db(
-    session: ChatSession,
-    existing_message_count: int,
-    *,
-    skip_existence_check: bool = False,
+    session: ChatSession, existing_message_count: int
 ) -> None:
-    """Save or update a chat session in the database.
-
-    Args:
-        skip_existence_check: When True, skip the ``get_chat_session`` query
-            and assume the session row already exists.  Saves one DB round trip
-            for incremental saves during streaming.
-    """
+    """Save or update a chat session in the database."""
    db = chat_db()

-    if not skip_existence_check:
-        # Check if session exists in DB
-        existing = await db.get_chat_session(session.session_id)
+    # Check if session exists in DB
+    existing = await db.get_chat_session(session.session_id)

-        if not existing:
-            # Create new session
-            await db.create_chat_session(
-                session_id=session.session_id,
-                user_id=session.user_id,
-            )
-            existing_message_count = 0
+    if not existing:
+        # Create new session
+        await db.create_chat_session(
+            session_id=session.session_id,
+            user_id=session.user_id,
+        )
+        existing_message_count = 0

    # Calculate total tokens from usage
    total_prompt = sum(u.prompt_tokens for u in session.usage)
@@ -584,7 +562,9 @@ async def append_and_save_message(session_id: str, message: ChatMessage) -> Chat
            raise ValueError(f"Session {session_id} not found")

        session.messages.append(message)
-        existing_message_count = await chat_db().get_next_sequence(session_id)
+        existing_message_count = await chat_db().get_chat_session_message_count(
+            session_id
+        )

        try:
            await _save_session_to_db(session, existing_message_count)
@@ -680,47 +660,27 @@ async def delete_chat_session(session_id: str, user_id: str | None = None) -> bo
    async with _session_locks_mutex:
        _session_locks.pop(session_id, None)

-    # Shut down any local browser daemon for this session (best-effort).
-    # Inline import required: all tool modules import ChatSession from this
-    # module, so any top-level import from tools.* would create a cycle.
-    try:
-        from .tools.agent_browser import close_browser_session
-
-        await close_browser_session(session_id, user_id=user_id)
-    except Exception as e:
-        logger.debug(f"Browser cleanup for session {session_id}: {e}")
-
    return True


-async def update_session_title(
-    session_id: str,
-    user_id: str,
-    title: str,
-    *,
-    only_if_empty: bool = False,
-) -> bool:
-    """Update the title of a chat session, scoped to the owning user.
+async def update_session_title(session_id: str, title: str) -> bool:
+    """Update only the title of a chat session.

-    Lightweight operation that doesn't touch messages, avoiding race conditions
-    with concurrent message updates.
+    This is a lightweight operation that doesn't touch messages, avoiding
+    race conditions with concurrent message updates. Use this for background
+    title generation instead of upsert_chat_session.

    Args:
        session_id: The session ID to update.
-        user_id: Owning user — the DB query filters on this.
        title: The new title to set.
-        only_if_empty: When True, uses an atomic ``UPDATE WHERE title IS NULL``
-            so auto-generated titles never overwrite a user-set title.

    Returns:
-        True if updated successfully, False otherwise (not found, wrong user,
-        or — when only_if_empty — title was already set).
+        True if updated successfully, False otherwise.
    """
    try:
-        updated = await chat_db().update_chat_session_title(
-            session_id, user_id, title, only_if_empty=only_if_empty
-        )
-        if not updated:
+        result = await chat_db().update_chat_session(session_id=session_id, title=title)
+        if result is None:
+            logger.warning(f"Session {session_id} not found for title update")
            return False

        # Update title in cache if it exists (instead of invalidating).
@@ -732,8 +692,9 @@ async def update_session_title(
                cached.title = title
                await cache_chat_session(cached)
        except Exception as e:
+            # Not critical - title will be correct on next full cache refresh
            logger.warning(
-                f"Cache title update failed for session {session_id} (non-critical): {e}"
+                f"Failed to update title in cache for session {session_id}: {e}"
            )

        return True
--- a/autogpt_platform/backend/backend/copilot/model_test.py
+++ b/autogpt_platform/backend/backend/copilot/model_test.py
@@ -331,96 +331,3 @@ def test_to_openai_messages_merges_split_assistants():
    tc_list = merged.get("tool_calls")
    assert tc_list is not None and len(list(tc_list)) == 1
    assert list(tc_list)[0]["id"] == "tc1"
-
-
-# --------------------------------------------------------------------------- #
-#  Concurrent save collision detection                                        #
-# --------------------------------------------------------------------------- #
-
-
-@pytest.mark.asyncio(loop_scope="session")
-async def test_concurrent_saves_collision_detection(setup_test_user, test_user_id):
-    """Test that concurrent saves from streaming loop and callback handle collisions correctly.
-
-    Simulates the race condition where:
-    1. Streaming loop starts with saved_msg_count=5
-    2. Long-running callback appends message #5 and saves
-    3. Streaming loop tries to save with stale count=5
-
-    The collision detection should handle this gracefully.
-    """
-    import asyncio
-
-    # Create a session with initial messages
-    session = ChatSession.new(user_id=test_user_id)
-    for i in range(3):
-        session.messages.append(
-            ChatMessage(
-                role="user" if i % 2 == 0 else "assistant", content=f"Message {i}"
-            )
-        )
-
-    # Save initial messages
-    session = await upsert_chat_session(session)
-
-    # Simulate streaming loop and callback saving concurrently
-    async def streaming_loop_save():
-        """Simulates streaming loop saving messages."""
-        # Add 2 messages
-        session.messages.append(ChatMessage(role="user", content="Streaming message 1"))
-        session.messages.append(
-            ChatMessage(role="assistant", content="Streaming message 2")
-        )
-
-        # Wait a bit to let callback potentially save first
-        await asyncio.sleep(0.01)
-
-        # Save (will query DB for existing count)
-        return await upsert_chat_session(session)
-
-    async def callback_save():
-        """Simulates long-running callback saving a message."""
-        # Add 1 message
-        session.messages.append(
-            ChatMessage(role="tool", content="Callback result", tool_call_id="tc1")
-        )
-
-        # Save immediately (will query DB for existing count)
-        return await upsert_chat_session(session)
-
-    # Run both saves concurrently - one will hit collision detection
-    results = await asyncio.gather(streaming_loop_save(), callback_save())
-
-    # Both should succeed
-    assert all(r is not None for r in results)
-
-    # Reload session from DB to verify
-    from backend.data.redis_client import get_redis_async
-
-    redis_key = f"chat:session:{session.session_id}"
-    async_redis = await get_redis_async()
-    await async_redis.delete(redis_key)  # Clear cache to force DB load
-
-    loaded_session = await get_chat_session(session.session_id, test_user_id)
-    assert loaded_session is not None
-
-    # Should have all 6 messages (3 initial + 2 streaming + 1 callback)
-    assert len(loaded_session.messages) == 6
-
-    # Verify no duplicate sequences
-    sequences = []
-    for i, msg in enumerate(loaded_session.messages):
-        # Messages should have sequential sequence numbers starting from 0
-        sequences.append(i)
-
-    # All sequences should be unique and sequential
-    assert sequences == list(range(6))
-
-    # Verify message content is preserved
-    contents = [m.content for m in loaded_session.messages]
-    assert "Message 0" in contents
-    assert "Message 1" in contents
-    assert "Message 2" in contents
-    assert "Streaming message 1" in contents
-    assert "Streaming message 2" in contents
-    assert "Callback result" in contents
--- a/autogpt_platform/backend/backend/copilot/parallel_tool_calls_test.py
+++ b/autogpt_platform/backend/backend/copilot/parallel_tool_calls_test.py
@@ -0,0 +1,272 @@
+"""Tests for parallel tool call execution in CoPilot.
+
+These tests mock _yield_tool_call to avoid importing the full copilot stack
+which requires Prisma, DB connections, etc.
+"""
+
+import asyncio
+import time
+from typing import Any, cast
+
+import pytest
+
+
+@pytest.mark.asyncio
+async def test_parallel_tool_calls_run_concurrently():
+    """Multiple tool calls should complete in ~max(delays), not sum(delays)."""
+    # Import here to allow module-level mocking if needed
+    from backend.copilot.response_model import (
+        StreamToolInputAvailable,
+        StreamToolOutputAvailable,
+    )
+    from backend.copilot.service import _execute_tool_calls_parallel
+
+    n_tools = 3
+    delay_per_tool = 0.2
+    tool_calls = [
+        {
+            "id": f"call_{i}",
+            "type": "function",
+            "function": {"name": f"tool_{i}", "arguments": "{}"},
+        }
+        for i in range(n_tools)
+    ]
+
+    # Minimal session mock
+    class FakeSession:
+        session_id = "test"
+        user_id = "test"
+
+        def __init__(self):
+            self.messages = []
+
+    original_yield = None
+
+    async def fake_yield(tc_list, idx, sess, lock=None):
+        yield StreamToolInputAvailable(
+            toolCallId=tc_list[idx]["id"],
+            toolName=tc_list[idx]["function"]["name"],
+            input={},
+        )
+        await asyncio.sleep(delay_per_tool)
+        yield StreamToolOutputAvailable(
+            toolCallId=tc_list[idx]["id"],
+            toolName=tc_list[idx]["function"]["name"],
+            output="{}",
+        )
+
+    import backend.copilot.service as svc
+
+    original_yield = svc._yield_tool_call
+    svc._yield_tool_call = fake_yield
+    try:
+        start = time.monotonic()
+        events = []
+        async for event in _execute_tool_calls_parallel(
+            tool_calls, cast(Any, FakeSession())
+        ):
+            events.append(event)
+        elapsed = time.monotonic() - start
+    finally:
+        svc._yield_tool_call = original_yield
+
+    assert len(events) == n_tools * 2
+    # Parallel: should take ~delay, not ~n*delay
+    assert elapsed < delay_per_tool * (
+        n_tools - 0.5
+    ), f"Took {elapsed:.2f}s, expected parallel (~{delay_per_tool}s)"
+
+
+@pytest.mark.asyncio
+async def test_single_tool_call_works():
+    """A single tool call should yield exactly its input and output events."""
+    from backend.copilot.response_model import (
+        StreamToolInputAvailable,
+        StreamToolOutputAvailable,
+    )
+    from backend.copilot.service import _execute_tool_calls_parallel
+
+    tool_calls = [
+        {
+            "id": "call_0",
+            "type": "function",
+            "function": {"name": "t", "arguments": "{}"},
+        }
+    ]
+
+    class FakeSession:
+        session_id = "test"
+        user_id = "test"
+
+        def __init__(self):
+            self.messages = []
+
+    async def fake_yield(tc_list, idx, sess, lock=None):
+        yield StreamToolInputAvailable(toolCallId="call_0", toolName="t", input={})
+        yield StreamToolOutputAvailable(toolCallId="call_0", toolName="t", output="{}")
+
+    import backend.copilot.service as svc
+
+    orig = svc._yield_tool_call
+    svc._yield_tool_call = fake_yield
+    try:
+        events = [
+            e
+            async for e in _execute_tool_calls_parallel(
+                tool_calls, cast(Any, FakeSession())
+            )
+        ]
+    finally:
+        svc._yield_tool_call = orig
+
+    assert len(events) == 2
+
+
+@pytest.mark.asyncio
+async def test_retryable_error_propagates():
+    """Retryable errors should be raised after all tools finish."""
+    from backend.copilot.response_model import StreamToolOutputAvailable
+    from backend.copilot.service import _execute_tool_calls_parallel
+
+    tool_calls = [
+        {
+            "id": f"call_{i}",
+            "type": "function",
+            "function": {"name": f"t_{i}", "arguments": "{}"},
+        }
+        for i in range(2)
+    ]
+
+    class FakeSession:
+        session_id = "test"
+        user_id = "test"
+
+        def __init__(self):
+            self.messages = []
+
+    async def fake_yield(tc_list, idx, sess, lock=None):
+        if idx == 1:
+            raise KeyError("bad")
+        from backend.copilot.response_model import StreamToolInputAvailable
+
+        yield StreamToolInputAvailable(
+            toolCallId=tc_list[idx]["id"], toolName="t_0", input={}
+        )
+        await asyncio.sleep(0.05)
+        yield StreamToolOutputAvailable(
+            toolCallId=tc_list[idx]["id"], toolName="t_0", output="{}"
+        )
+
+    import backend.copilot.service as svc
+
+    orig = svc._yield_tool_call
+    svc._yield_tool_call = fake_yield
+    try:
+        events = []
+        with pytest.raises(KeyError):
+            async for event in _execute_tool_calls_parallel(
+                tool_calls, cast(Any, FakeSession())
+            ):
+                events.append(event)
+        # First tool's events should still be yielded
+        assert any(isinstance(e, StreamToolOutputAvailable) for e in events)
+    finally:
+        svc._yield_tool_call = orig
+
+
+@pytest.mark.asyncio
+async def test_session_lock_shared():
+    """All parallel tools should receive the same lock instance."""
+    from backend.copilot.response_model import (
+        StreamToolInputAvailable,
+        StreamToolOutputAvailable,
+    )
+    from backend.copilot.service import _execute_tool_calls_parallel
+
+    tool_calls = [
+        {
+            "id": f"call_{i}",
+            "type": "function",
+            "function": {"name": f"t_{i}", "arguments": "{}"},
+        }
+        for i in range(3)
+    ]
+
+    class FakeSession:
+        session_id = "test"
+        user_id = "test"
+
+        def __init__(self):
+            self.messages = []
+
+    observed_locks = []
+
+    async def fake_yield(tc_list, idx, sess, lock=None):
+        observed_locks.append(lock)
+        yield StreamToolInputAvailable(
+            toolCallId=tc_list[idx]["id"], toolName=f"t_{idx}", input={}
+        )
+        yield StreamToolOutputAvailable(
+            toolCallId=tc_list[idx]["id"], toolName=f"t_{idx}", output="{}"
+        )
+
+    import backend.copilot.service as svc
+
+    orig = svc._yield_tool_call
+    svc._yield_tool_call = fake_yield
+    try:
+        async for _ in _execute_tool_calls_parallel(
+            tool_calls, cast(Any, FakeSession())
+        ):
+            pass
+    finally:
+        svc._yield_tool_call = orig
+
+    assert len(observed_locks) == 3
+    assert observed_locks[0] is observed_locks[1] is observed_locks[2]
+    assert isinstance(observed_locks[0], asyncio.Lock)
+
+
+@pytest.mark.asyncio
+async def test_cancellation_cleans_up():
+    """Closing the generator should cancel in-flight tasks."""
+    from backend.copilot.response_model import StreamToolInputAvailable
+    from backend.copilot.service import _execute_tool_calls_parallel
+
+    tool_calls = [
+        {
+            "id": f"call_{i}",
+            "type": "function",
+            "function": {"name": f"t_{i}", "arguments": "{}"},
+        }
+        for i in range(2)
+    ]
+
+    class FakeSession:
+        session_id = "test"
+        user_id = "test"
+
+        def __init__(self):
+            self.messages = []
+
+    started = asyncio.Event()
+
+    async def fake_yield(tc_list, idx, sess, lock=None):
+        yield StreamToolInputAvailable(
+            toolCallId=tc_list[idx]["id"], toolName=f"t_{idx}", input={}
+        )
+        started.set()
+        await asyncio.sleep(10)  # simulate long-running
+
+    import backend.copilot.service as svc
+
+    orig = svc._yield_tool_call
+    svc._yield_tool_call = fake_yield
+    try:
+        gen = _execute_tool_calls_parallel(tool_calls, cast(Any, FakeSession()))
+        await gen.__anext__()  # get first event
+        await started.wait()
+        await gen.aclose()  # close generator
+    finally:
+        svc._yield_tool_call = orig
+    # If we get here without hanging, cleanup worked
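
Note: the two tests above pin down two behaviours of `_execute_tool_calls_parallel`: every tool call receives the same `asyncio.Lock`, and closing the generator does not hang on in-flight work. The service implementation is not part of this excerpt, so the following is only a minimal sketch of one way to obtain both properties. `run_parallel`, `make_stream`, and the `_DONE` sentinel are hypothetical names chosen for illustration, not the project's API.

```python
import asyncio
from collections.abc import AsyncGenerator, AsyncIterator, Callable

_DONE = object()  # hypothetical sentinel: one worker has finished its stream


async def run_parallel(
    count: int,
    make_stream: Callable[[int, asyncio.Lock], AsyncIterator[str]],
) -> AsyncGenerator[str, None]:
    """Interleave events from `count` streams and cancel workers on close."""
    lock = asyncio.Lock()  # a single lock handed to every worker
    queue: asyncio.Queue = asyncio.Queue()

    async def worker(idx: int) -> None:
        try:
            async for event in make_stream(idx, lock):
                await queue.put(event)
        finally:
            # The queue is unbounded, so this put never blocks.
            await queue.put(_DONE)

    tasks = [asyncio.create_task(worker(i)) for i in range(count)]
    try:
        remaining = count
        while remaining:
            item = await queue.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item
    finally:
        # Runs on normal exhaustion and on aclose(): stop in-flight workers
        # and wait for them so no task outlives the generator.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

With this shape, `aclose()` raises `GeneratorExit` at the `yield`, the `finally` block cancels the sleeping workers, and the call returns promptly, which is the behaviour `test_cancellation_cleans_up` checks for.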
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -1,191 +0,0 @@
-"""Centralized prompt building logic for CoPilot.
-
-This module contains all prompt construction functions and constants,
-handling the distinction between:
-- SDK mode vs Baseline mode (tool documentation needs)
-- Local mode vs E2B mode (storage/filesystem differences)
-"""
-
-from backend.copilot.tools import TOOL_REGISTRY
-
-# Shared technical notes that apply to both SDK and baseline modes
-_SHARED_TOOL_NOTES = """\
-
-### Sharing files with the user
-After saving a file to the persistent workspace with `write_workspace_file`,
-share it with the user by embedding the `download_url` from the response in
-your message as a Markdown link or image:
-
-- **Any file** — shows as a clickable download link:
-  `[report.csv](workspace://file_id#text/csv)`
-- **Image** — renders inline in chat:
-  `![chart](workspace://file_id#image/png)`
-- **Video** — renders inline in chat with player controls:
-  `![recording](workspace://file_id#video/mp4)`
-
-The `download_url` field in the `write_workspace_file` response is already
-in the correct format — paste it directly after the `(` in the Markdown.
-
-### Sub-agent tasks
-- When using the Task tool, NEVER set `run_in_background` to true.
-  All tasks must run in the foreground.
-"""
-
-
-# Environment-specific supplement templates
-def _build_storage_supplement(
-    working_dir: str,
-    sandbox_type: str,
-    storage_system_1_name: str,
-    storage_system_1_characteristics: list[str],
-    storage_system_1_persistence: list[str],
-    file_move_name_1_to_2: str,
-    file_move_name_2_to_1: str,
-) -> str:
-    """Build storage/filesystem supplement for a specific environment.
-
-    Template function handles all formatting (bullets, indentation, markdown).
-    Callers provide clean data as lists of strings.
-
-    Args:
-        working_dir: Working directory path
-        sandbox_type: Description of bash_exec sandbox
-        storage_system_1_name: Name of primary storage (ephemeral or cloud)
-        storage_system_1_characteristics: List of characteristic descriptions
-        storage_system_1_persistence: List of persistence behavior descriptions
-        file_move_name_1_to_2: Direction label for primary→persistent
-        file_move_name_2_to_1: Direction label for persistent→primary
-    """
-    # Format lists as bullet points with proper indentation
-    characteristics = "\n".join(f"   - {c}" for c in storage_system_1_characteristics)
-    persistence = "\n".join(f"   - {p}" for p in storage_system_1_persistence)
-
-    return f"""
-
-## Tool notes
-
-### Shell commands
-- The SDK built-in Bash tool is NOT available.  Use the `bash_exec` MCP tool
-  for shell commands — it runs {sandbox_type}.
-
-### Working directory
-- Your working directory is: `{working_dir}`
-- All SDK file tools AND `bash_exec` operate on the same filesystem
-- Use relative paths or absolute paths under `{working_dir}` for all file operations
-
-### Two storage systems — CRITICAL to understand
-
-1. **{storage_system_1_name}** (`{working_dir}`):
-{characteristics}
-{persistence}
-
-2. **Persistent workspace** (cloud storage):
-   - Files here **survive across sessions indefinitely**
-
-### Moving files between storages
-- **{file_move_name_1_to_2}**: Copy to persistent workspace
-- **{file_move_name_2_to_1}**: Download for processing
-
-### File persistence
-Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
-{_SHARED_TOOL_NOTES}"""
-
-
-# Pre-built supplements for common environments
-def _get_local_storage_supplement(cwd: str) -> str:
-    """Local ephemeral storage (files lost between turns)."""
-    return _build_storage_supplement(
-        working_dir=cwd,
-        sandbox_type="in a network-isolated sandbox",
-        storage_system_1_name="Ephemeral working directory",
-        storage_system_1_characteristics=[
-            "Shared by SDK Read/Write/Edit/Glob/Grep tools AND `bash_exec`",
-        ],
-        storage_system_1_persistence=[
-            "Files here are **lost between turns** — do NOT rely on them persisting",
-            "Use for temporary work: running scripts, processing data, etc.",
-        ],
-        file_move_name_1_to_2="Ephemeral → Persistent",
-        file_move_name_2_to_1="Persistent → Ephemeral",
-    )
-
-
-def _get_cloud_sandbox_supplement() -> str:
-    """Cloud persistent sandbox (files survive across turns in session)."""
-    return _build_storage_supplement(
-        working_dir="/home/user",
-        sandbox_type="in a cloud sandbox with full internet access",
-        storage_system_1_name="Cloud sandbox",
-        storage_system_1_characteristics=[
-            "Shared by all file tools AND `bash_exec` — same filesystem",
-            "Full Linux environment with internet access",
-        ],
-        storage_system_1_persistence=[
-            "Files **persist across turns** within the current session",
-            "Lost when the session expires (12 h inactivity)",
-        ],
-        file_move_name_1_to_2="Sandbox → Persistent",
-        file_move_name_2_to_1="Persistent → Sandbox",
-    )
-
-
-def _generate_tool_documentation() -> str:
-    """Auto-generate tool documentation from TOOL_REGISTRY.
-
-    NOTE: This is ONLY used in baseline mode (direct OpenAI API).
-    SDK mode doesn't need it since Claude gets tool schemas automatically.
-
-    This generates a complete list of available tools with their descriptions,
-    ensuring the documentation stays in sync with the actual tool implementations.
-    All workflow guidance is now embedded in individual tool descriptions.
-
-    Only documents tools that are available in the current environment
-    (checked via tool.is_available property).
-    """
-    docs = "\n## AVAILABLE TOOLS\n\n"
-
-    # Sort tools alphabetically for consistent output
-    # Filter by is_available to match get_available_tools() behavior
-    for name in sorted(TOOL_REGISTRY.keys()):
-        tool = TOOL_REGISTRY[name]
-        if not tool.is_available:
-            continue
-        schema = tool.as_openai_tool()
-        desc = schema["function"].get("description", "No description available")
-        # Format as bullet list with tool name in code style
-        docs += f"- **`{name}`**: {desc}\n"
-
-    return docs
-
-
-def get_sdk_supplement(use_e2b: bool, cwd: str = "") -> str:
-    """Get the supplement for SDK mode (Claude Agent SDK).
-
-    SDK mode does NOT include tool documentation because Claude automatically
-    receives tool schemas from the SDK. Only includes technical notes about
-    storage systems and execution environment.
-
-    Args:
-        use_e2b: Whether E2B cloud sandbox is being used
-        cwd: Current working directory (only used in local_storage mode)
-
-    Returns:
-        The supplement string to append to the system prompt
-    """
-    if use_e2b:
-        return _get_cloud_sandbox_supplement()
-    return _get_local_storage_supplement(cwd)
-
-
-def get_baseline_supplement() -> str:
-    """Get the supplement for baseline mode (direct OpenAI API).
-
-    Baseline mode INCLUDES auto-generated tool documentation because the
-    direct API doesn't automatically provide tool schemas to Claude.
-    Also includes shared technical notes (but NOT SDK-specific environment details).
-
-    Returns:
-        The supplement string to append to the system prompt
-    """
-    tool_docs = _generate_tool_documentation()
-    return tool_docs + _SHARED_TOOL_NOTES
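
Note: for readers who want to see what the removed `_generate_tool_documentation` produced, here is a small self-contained sketch of the same formatting. `FakeTool` and `generate_tool_docs` are hypothetical stand-ins: the real function read each description from `tool.as_openai_tool()["function"]`, which is simplified here to a plain attribute.

```python
from dataclasses import dataclass


@dataclass
class FakeTool:
    """Hypothetical stand-in for a TOOL_REGISTRY entry."""

    description: str
    is_available: bool = True


def generate_tool_docs(registry: dict[str, FakeTool]) -> str:
    """List available tools alphabetically, one Markdown bullet per tool."""
    lines = [
        f"- **`{name}`**: {tool.description}\n"
        for name, tool in sorted(registry.items())
        if tool.is_available  # unavailable tools are skipped entirely
    ]
    return "\n## AVAILABLE TOOLS\n\n" + "".join(lines)


if __name__ == "__main__":
    registry = {
        "run_block": FakeTool("Run a block"),
        "bash_exec": FakeTool("Run a shell command", is_available=False),
        "find_agent": FakeTool("Search for agents"),
    }
    print(generate_tool_docs(registry))
```

Sorting by name keeps the prompt text stable between runs, so an unchanged registry yields an identical system prompt.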
--- a/autogpt_platform/backend/backend/copilot/response_model.py
+++ b/autogpt_platform/backend/backend/copilot/response_model.py
@@ -5,17 +5,12 @@ This module implements the AI SDK UI Stream Protocol (v1) for streaming chat res
 See: https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol
 """

-import json
-import logging
 from enum import Enum
 from typing import Any

 from pydantic import BaseModel, Field

 from backend.util.json import dumps as json_dumps
-from backend.util.truncate import truncate
-
-logger = logging.getLogger(__name__)


 class ResponseType(str, Enum):
@@ -52,8 +47,7 @@ class StreamBaseResponse(BaseModel):

    def to_sse(self) -> str:
        """Convert to SSE format."""
-        json_str = self.model_dump_json(exclude_none=True)
-        return f"data: {json_str}\n\n"
+        return f"data: {self.model_dump_json()}\n\n"


 # ========== Message Lifecycle ==========
@@ -64,13 +58,15 @@ class StreamStart(StreamBaseResponse):

    type: ResponseType = ResponseType.START
    messageId: str = Field(..., description="Unique message ID")
-    sessionId: str | None = Field(
+    taskId: str | None = Field(
        default=None,
-        description="Session ID for SSE reconnection.",
+        description="Task ID for SSE reconnection. Clients can reconnect using GET /tasks/{taskId}/stream",
    )

    def to_sse(self) -> str:
-        """Convert to SSE format, excluding non-protocol fields like sessionId."""
+        """Convert to SSE format, excluding non-protocol fields like taskId."""
+        import json
+
        data: dict[str, Any] = {
            "type": self.type.value,
            "messageId": self.messageId,
@@ -151,9 +147,6 @@ class StreamToolInputAvailable(StreamBaseResponse):
    )


-_MAX_TOOL_OUTPUT_SIZE = 100_000  # ~100 KB; truncate to avoid bloating SSE/DB
-
-
 class StreamToolOutputAvailable(StreamBaseResponse):
    """Tool execution result."""

@@ -168,12 +161,10 @@ class StreamToolOutputAvailable(StreamBaseResponse):
        default=True, description="Whether the tool execution succeeded"
    )

-    def model_post_init(self, __context: Any) -> None:
-        """Truncate oversized outputs after construction."""
-        self.output = truncate(self.output, _MAX_TOOL_OUTPUT_SIZE)
-
    def to_sse(self) -> str:
        """Convert to SSE format, excluding non-spec fields."""
+        import json
+
        data = {
            "type": self.type.value,
            "toolCallId": self.toolCallId,
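
Note on the base `to_sse` change above: with `exclude_none=True` removed, fields left at `None` are serialized as explicit `null` values instead of being omitted, because Pydantic's `model_dump_json()` includes `None` fields by default. The sketch below illustrates the difference using the standard-library `json` module as a stand-in for Pydantic; the `to_sse` helper and its `payload` argument are illustrative only and are not the module's code.

```python
import json
from typing import Any


def to_sse(payload: dict[str, Any], *, exclude_none: bool = False) -> str:
    """Frame a JSON payload as one Server-Sent Events `data:` message."""
    if exclude_none:
        # Old behaviour: drop keys whose value is None before serializing.
        payload = {k: v for k, v in payload.items() if v is not None}
    # An SSE message is terminated by a blank line, hence the double newline.
    return f"data: {json.dumps(payload)}\n\n"


if __name__ == "__main__":
    event = {"type": "start", "taskId": None}
    print(repr(to_sse(event)))                     # taskId sent as null
    print(repr(to_sse(event, exclude_none=True)))  # taskId omitted
```

Clients that treat a missing key and a `null` value differently may need a check after this change; subclasses that override `to_sse` and build their own dict are unaffected.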
--- a/autogpt_platform/backend/backend/copilot/sdk/compaction.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/compaction.py
@@ -1,239 +0,0 @@
-"""Compaction tracking for SDK-based chat sessions.
-
-Encapsulates the state machine and event emission for context compaction,
-both pre-query (history compressed before SDK query) and SDK-internal
-(PreCompact hook fires mid-stream).
-
-All compaction-related helpers live here: event builders, message filtering,
-persistence, and the ``CompactionTracker`` state machine.
-"""
-
-import asyncio
-import logging
-import uuid
-from collections.abc import Callable
-
-from ..constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
-from ..model import ChatMessage, ChatSession
-from ..response_model import (
-    StreamBaseResponse,
-    StreamFinishStep,
-    StreamStartStep,
-    StreamToolInputAvailable,
-    StreamToolInputStart,
-    StreamToolOutputAvailable,
-)
-
-logger = logging.getLogger(__name__)
-
-
-# ---------------------------------------------------------------------------
-# Event builders (private — use CompactionTracker or compaction_events)
-# ---------------------------------------------------------------------------
-
-
-def _start_events(tool_call_id: str) -> list[StreamBaseResponse]:
-    """Build the opening events for a compaction tool call."""
-    return [
-        StreamStartStep(),
-        StreamToolInputStart(toolCallId=tool_call_id, toolName=COMPACTION_TOOL_NAME),
-        StreamToolInputAvailable(
-            toolCallId=tool_call_id, toolName=COMPACTION_TOOL_NAME, input={}
-        ),
-    ]
-
-
-def _end_events(tool_call_id: str, message: str) -> list[StreamBaseResponse]:
-    """Build the closing events for a compaction tool call."""
-    return [
-        StreamToolOutputAvailable(
-            toolCallId=tool_call_id,
-            toolName=COMPACTION_TOOL_NAME,
-            output=message,
-        ),
-        StreamFinishStep(),
-    ]
-
-
-def _new_tool_call_id() -> str:
-    return f"compaction-{uuid.uuid4().hex[:12]}"
-
-
-# ---------------------------------------------------------------------------
-# Public event builder
-# ---------------------------------------------------------------------------
-
-
-def emit_compaction(session: ChatSession) -> list[StreamBaseResponse]:
-    """Create, persist, and return a self-contained compaction tool call.
-
-    Convenience for callers that don't use ``CompactionTracker`` (e.g. the
-    legacy non-SDK streaming path in ``service.py``).
-    """
-    tc_id = _new_tool_call_id()
-    evts = compaction_events(COMPACTION_DONE_MSG, tool_call_id=tc_id)
-    _persist(session, tc_id, COMPACTION_DONE_MSG)
-    return evts
-
-
-def compaction_events(
-    message: str, tool_call_id: str | None = None
-) -> list[StreamBaseResponse]:
-    """Emit a self-contained compaction tool call (already completed).
-
-    When *tool_call_id* is provided it is reused (e.g. for persistence that
-    must match an already-streamed start event).  Otherwise a new ID is
-    generated.
-    """
-    tc_id = tool_call_id or _new_tool_call_id()
-    return _start_events(tc_id) + _end_events(tc_id, message)
-
-
-# ---------------------------------------------------------------------------
-# Message filtering
-# ---------------------------------------------------------------------------
-
-
-def filter_compaction_messages(
-    messages: list[ChatMessage],
-) -> list[ChatMessage]:
-    """Remove synthetic compaction tool-call messages (UI-only artifacts).
-
-    Strips assistant messages whose only tool calls are compaction calls,
-    and their corresponding tool-result messages.
-    """
-    compaction_ids: set[str] = set()
-    filtered: list[ChatMessage] = []
-    for msg in messages:
-        if msg.role == "assistant" and msg.tool_calls:
-            for tc in msg.tool_calls:
-                if tc.get("function", {}).get("name") == COMPACTION_TOOL_NAME:
-                    compaction_ids.add(tc.get("id", ""))
-            real_calls = [
-                tc
-                for tc in msg.tool_calls
-                if tc.get("function", {}).get("name") != COMPACTION_TOOL_NAME
-            ]
-            if not real_calls and not msg.content:
-                continue
-        if msg.role == "tool" and msg.tool_call_id in compaction_ids:
-            continue
-        filtered.append(msg)
-    return filtered
-
-
-# ---------------------------------------------------------------------------
-# Persistence
-# ---------------------------------------------------------------------------
-
-
-def _persist(session: ChatSession, tool_call_id: str, message: str) -> None:
-    """Append compaction tool-call + result to session messages.
-
-    Compaction events are synthetic so they bypass the normal adapter
-    accumulation.  This explicitly records them so they survive a page refresh.
-    """
-    session.messages.append(
-        ChatMessage(
-            role="assistant",
-            content="",
-            tool_calls=[
-                {
-                    "id": tool_call_id,
-                    "type": "function",
-                    "function": {
-                        "name": COMPACTION_TOOL_NAME,
-                        "arguments": "{}",
-                    },
-                }
-            ],
-        )
-    )
-    session.messages.append(
-        ChatMessage(role="tool", content=message, tool_call_id=tool_call_id)
-    )
-
-
-# ---------------------------------------------------------------------------
-# CompactionTracker — state machine for streaming sessions
-# ---------------------------------------------------------------------------
-
-
-class CompactionTracker:
-    """Tracks compaction state and yields UI events.
-
-    Two compaction paths:
-
-    1. **Pre-query** — history compressed before the SDK query starts.
-       Call :meth:`emit_pre_query` to yield a self-contained tool call.
-
-    2. **SDK-internal** — ``PreCompact`` hook fires mid-stream.
-       Call :meth:`emit_start_if_ready` on heartbeat ticks and
-       :meth:`emit_end_if_ready` when a message arrives.
-    """
-
-    def __init__(self) -> None:
-        self._compact_start = asyncio.Event()
-        self._start_emitted = False
-        self._done = False
-        self._tool_call_id = ""
-
-    @property
-    def on_compact(self) -> Callable[[], None]:
-        """Callback for the PreCompact hook."""
-        return self._compact_start.set
-
-    # ------------------------------------------------------------------
-    # Pre-query compaction
-    # ------------------------------------------------------------------
-
-    def emit_pre_query(self, session: ChatSession) -> list[StreamBaseResponse]:
-        """Emit + persist a self-contained compaction tool call."""
-        self._done = True
-        return emit_compaction(session)
-
-    # ------------------------------------------------------------------
-    # SDK-internal compaction
-    # ------------------------------------------------------------------
-
-    def reset_for_query(self) -> None:
-        """Reset per-query state before a new SDK query."""
-        self._done = False
-        self._start_emitted = False
-        self._tool_call_id = ""
-
-    def emit_start_if_ready(self) -> list[StreamBaseResponse]:
-        """If the PreCompact hook fired, emit start events (spinning tool)."""
-        if self._compact_start.is_set() and not self._start_emitted and not self._done:
-            self._compact_start.clear()
-            self._start_emitted = True
-            self._tool_call_id = _new_tool_call_id()
-            return _start_events(self._tool_call_id)
-        return []
-
-    async def emit_end_if_ready(self, session: ChatSession) -> list[StreamBaseResponse]:
-        """If compaction is in progress, emit end events and persist."""
-        # Yield so pending hook tasks can set compact_start
-        await asyncio.sleep(0)
-
-        if self._done:
-            return []
-        if not self._start_emitted and not self._compact_start.is_set():
-            return []
-
-        if self._start_emitted:
-            # Close the open spinner
-            done_events = _end_events(self._tool_call_id, COMPACTION_DONE_MSG)
-            persist_id = self._tool_call_id
-        else:
-            # PreCompact fired but start never emitted — self-contained
-            persist_id = _new_tool_call_id()
-            done_events = compaction_events(
-                COMPACTION_DONE_MSG, tool_call_id=persist_id
-            )
-
-        self._compact_start.clear()
-        self._start_emitted = False
-        self._done = True
-        _persist(session, persist_id, COMPACTION_DONE_MSG)
-        return done_events
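
Note: the removed `CompactionTracker` is easiest to follow as three flags and two transitions. The sketch below is a simplified stand-in, not the deleted class: it replaces the `asyncio.Event` with a boolean, returns placeholder strings instead of `Stream*` response objects, and leaves out tool-call IDs, persistence, and the `await asyncio.sleep(0)` yield. `MiniCompactionTracker` and the event strings are hypothetical names.

```python
class MiniCompactionTracker:
    """Boolean-flag stand-in for the compaction state machine."""

    START = ["start-step", "tool-input-start", "tool-input-available"]
    END = ["tool-output-available", "finish-step"]

    def __init__(self) -> None:
        self._hook_fired = False  # plays the role of the asyncio.Event
        self._start_emitted = False
        self._done = False

    def on_compact(self) -> None:
        """Called by the (hypothetical) PreCompact hook."""
        self._hook_fired = True

    def emit_start_if_ready(self) -> list[str]:
        """Open the spinner once per query, and only if the hook fired."""
        if self._hook_fired and not self._start_emitted and not self._done:
            self._hook_fired = False
            self._start_emitted = True
            return list(self.START)
        return []

    def emit_end_if_ready(self) -> list[str]:
        """Close an open spinner, or emit a complete call if none was opened."""
        if self._done or not (self._start_emitted or self._hook_fired):
            return []
        events = list(self.END) if self._start_emitted else self.START + self.END
        self._hook_fired = False
        self._start_emitted = False
        self._done = True
        return events


if __name__ == "__main__":
    tracker = MiniCompactionTracker()
    tracker.on_compact()
    print(tracker.emit_start_if_ready())  # the three opening events
    print(tracker.emit_end_if_ready())    # the two closing events
    print(tracker.emit_end_if_ready())    # empty: _done blocks a repeat
```

The `_done` flag is what prevents a second compaction call within one query; in the removed code `reset_for_query` cleared it before the next SDK query.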
--- a/autogpt_platform/backend/backend/copilot/sdk/compaction_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/compaction_test.py
@@ -1,291 +0,0 @@
-"""Tests for sdk/compaction.py — event builders, filtering, persistence, and
-CompactionTracker state machine."""
-
-import pytest
-
-from backend.copilot.constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
-from backend.copilot.model import ChatMessage, ChatSession
-from backend.copilot.response_model import (
-    StreamFinishStep,
-    StreamStartStep,
-    StreamToolInputAvailable,
-    StreamToolInputStart,
-    StreamToolOutputAvailable,
-)
-from backend.copilot.sdk.compaction import (
-    CompactionTracker,
-    compaction_events,
-    emit_compaction,
-    filter_compaction_messages,
-)
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-
-def _make_session() -> ChatSession:
-    return ChatSession.new(user_id="test-user")
-
-
-# ---------------------------------------------------------------------------
-# compaction_events
-# ---------------------------------------------------------------------------
-
-
-class TestCompactionEvents:
-    def test_returns_start_and_end_events(self):
-        evts = compaction_events("done")
-        assert len(evts) == 5
-        assert isinstance(evts[0], StreamStartStep)
-        assert isinstance(evts[1], StreamToolInputStart)
-        assert isinstance(evts[2], StreamToolInputAvailable)
-        assert isinstance(evts[3], StreamToolOutputAvailable)
-        assert isinstance(evts[4], StreamFinishStep)
-
-    def test_uses_provided_tool_call_id(self):
-        evts = compaction_events("msg", tool_call_id="my-id")
-        tool_start = evts[1]
-        assert isinstance(tool_start, StreamToolInputStart)
-        assert tool_start.toolCallId == "my-id"
-
-    def test_generates_id_when_not_provided(self):
-        evts = compaction_events("msg")
-        tool_start = evts[1]
-        assert isinstance(tool_start, StreamToolInputStart)
-        assert tool_start.toolCallId.startswith("compaction-")
-
-    def test_tool_name_is_context_compaction(self):
-        evts = compaction_events("msg")
-        tool_start = evts[1]
-        assert isinstance(tool_start, StreamToolInputStart)
-        assert tool_start.toolName == COMPACTION_TOOL_NAME
-
-
-# ---------------------------------------------------------------------------
-# emit_compaction
-# ---------------------------------------------------------------------------
-
-
-class TestEmitCompaction:
-    def test_persists_to_session(self):
-        session = _make_session()
-        assert len(session.messages) == 0
-        evts = emit_compaction(session)
-        assert len(evts) == 5
-        # Should have appended 2 messages (assistant tool call + tool result)
-        assert len(session.messages) == 2
-        assert session.messages[0].role == "assistant"
-        assert session.messages[0].tool_calls is not None
-        assert (
-            session.messages[0].tool_calls[0]["function"]["name"]
-            == COMPACTION_TOOL_NAME
-        )
-        assert session.messages[1].role == "tool"
-        assert session.messages[1].content == COMPACTION_DONE_MSG
-
-
-# ---------------------------------------------------------------------------
-# filter_compaction_messages
-# ---------------------------------------------------------------------------
-
-
-class TestFilterCompactionMessages:
-    def test_removes_compaction_tool_calls(self):
-        msgs = [
-            ChatMessage(role="user", content="hello"),
-            ChatMessage(
-                role="assistant",
-                content="",
-                tool_calls=[
-                    {
-                        "id": "comp-1",
-                        "type": "function",
-                        "function": {"name": COMPACTION_TOOL_NAME, "arguments": "{}"},
-                    }
-                ],
-            ),
-            ChatMessage(
-                role="tool", content=COMPACTION_DONE_MSG, tool_call_id="comp-1"
-            ),
-            ChatMessage(role="assistant", content="world"),
-        ]
-        filtered = filter_compaction_messages(msgs)
-        assert len(filtered) == 2
-        assert filtered[0].content == "hello"
-        assert filtered[1].content == "world"
-
-    def test_keeps_non_compaction_tool_calls(self):
-        msgs = [
-            ChatMessage(
-                role="assistant",
-                content="",
-                tool_calls=[
-                    {
-                        "id": "real-1",
-                        "type": "function",
-                        "function": {"name": "search", "arguments": "{}"},
-                    }
-                ],
-            ),
-            ChatMessage(role="tool", content="result", tool_call_id="real-1"),
-        ]
-        filtered = filter_compaction_messages(msgs)
-        assert len(filtered) == 2
-
-    def test_keeps_assistant_with_content_and_compaction_call(self):
-        """If assistant message has both content and a compaction tool call,
-        the message is kept (has real content)."""
-        msgs = [
-            ChatMessage(
-                role="assistant",
-                content="I have content",
-                tool_calls=[
-                    {
-                        "id": "comp-1",
-                        "type": "function",
-                        "function": {"name": COMPACTION_TOOL_NAME, "arguments": "{}"},
-                    }
-                ],
-            ),
-        ]
-        filtered = filter_compaction_messages(msgs)
-        assert len(filtered) == 1
-
-    def test_empty_list(self):
-        assert filter_compaction_messages([]) == []
-
-
-# ---------------------------------------------------------------------------
-# CompactionTracker
-# ---------------------------------------------------------------------------
-
-
-class TestCompactionTracker:
-    def test_on_compact_sets_event(self):
-        tracker = CompactionTracker()
-        tracker.on_compact()
-        assert tracker._compact_start.is_set()
-
-    def test_emit_start_if_ready_no_event(self):
-        tracker = CompactionTracker()
-        assert tracker.emit_start_if_ready() == []
-
-    def test_emit_start_if_ready_with_event(self):
-        tracker = CompactionTracker()
-        tracker.on_compact()
-        evts = tracker.emit_start_if_ready()
-        assert len(evts) == 3
-        assert isinstance(evts[0], StreamStartStep)
-        assert isinstance(evts[1], StreamToolInputStart)
-        assert isinstance(evts[2], StreamToolInputAvailable)
-
-    def test_emit_start_only_once(self):
-        tracker = CompactionTracker()
-        tracker.on_compact()
-        evts1 = tracker.emit_start_if_ready()
-        assert len(evts1) == 3
-        # Second call should return empty
-        evts2 = tracker.emit_start_if_ready()
-        assert evts2 == []
-
-    @pytest.mark.asyncio
-    async def test_emit_end_after_start(self):
-        tracker = CompactionTracker()
-        session = _make_session()
-        tracker.on_compact()
-        tracker.emit_start_if_ready()
-        evts = await tracker.emit_end_if_ready(session)
-        assert len(evts) == 2
-        assert isinstance(evts[0], StreamToolOutputAvailable)
-        assert isinstance(evts[1], StreamFinishStep)
-        # Should persist
-        assert len(session.messages) == 2
-
-    @pytest.mark.asyncio
-    async def test_emit_end_without_start_self_contained(self):
-        """If PreCompact fired but start was never emitted, emit_end
-        produces a self-contained compaction event."""
-        tracker = CompactionTracker()
-        session = _make_session()
-        tracker.on_compact()
-        # Don't call emit_start_if_ready
-        evts = await tracker.emit_end_if_ready(session)
-        assert len(evts) == 5  # Full self-contained event
-        assert isinstance(evts[0], StreamStartStep)
-        assert len(session.messages) == 2
-
-    @pytest.mark.asyncio
-    async def test_emit_end_no_op_when_done(self):
-        tracker = CompactionTracker()
-        session = _make_session()
-        tracker.on_compact()
-        tracker.emit_start_if_ready()
-        await tracker.emit_end_if_ready(session)
-        # Second call should be no-op
-        evts = await tracker.emit_end_if_ready(session)
-        assert evts == []
-
-    @pytest.mark.asyncio
-    async def test_emit_end_no_op_when_nothing_happened(self):
-        tracker = CompactionTracker()
-        session = _make_session()
-        evts = await tracker.emit_end_if_ready(session)
-        assert evts == []
-
-    def test_emit_pre_query(self):
-        tracker = CompactionTracker()
-        session = _make_session()
-        evts = tracker.emit_pre_query(session)
-        assert len(evts) == 5
-        assert len(session.messages) == 2
-        assert tracker._done is True
-
-    def test_reset_for_query(self):
-        tracker = CompactionTracker()
-        tracker._done = True
-        tracker._start_emitted = True
-        tracker._tool_call_id = "old"
-        tracker.reset_for_query()
-        assert tracker._done is False
-        assert tracker._start_emitted is False
-        assert tracker._tool_call_id == ""
-
-    @pytest.mark.asyncio
-    async def test_pre_query_blocks_sdk_compaction(self):
-        """After pre-query compaction, SDK compaction events are suppressed."""
-        tracker = CompactionTracker()
-        session = _make_session()
-        tracker.emit_pre_query(session)
-        tracker.on_compact()
-        evts = tracker.emit_start_if_ready()
-        assert evts == []  # _done blocks it
-
-    @pytest.mark.asyncio
-    async def test_reset_allows_new_compaction(self):
-        """After reset_for_query, compaction can fire again."""
-        tracker = CompactionTracker()
-        session = _make_session()
-        tracker.emit_pre_query(session)
-        tracker.reset_for_query()
-        tracker.on_compact()
-        evts = tracker.emit_start_if_ready()
-        assert len(evts) == 3  # Start events emitted
-
-    @pytest.mark.asyncio
-    async def test_tool_call_id_consistency(self):
-        """Start and end events use the same tool_call_id."""
-        tracker = CompactionTracker()
-        session = _make_session()
-        tracker.on_compact()
-        start_evts = tracker.emit_start_if_ready()
-        end_evts = await tracker.emit_end_if_ready(session)
-        start_evt = start_evts[1]
-        end_evt = end_evts[0]
-        assert isinstance(start_evt, StreamToolInputStart)
-        assert isinstance(end_evt, StreamToolOutputAvailable)
-        assert start_evt.toolCallId == end_evt.toolCallId
-        # Persisted ID should also match
-        tool_calls = session.messages[0].tool_calls
-        assert tool_calls is not None
-        assert tool_calls[0]["id"] == start_evt.toolCallId
--- a/autogpt_platform/backend/backend/copilot/sdk/dummy.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/dummy.py
@@ -1,59 +0,0 @@
-"""Dummy SDK service for testing copilot streaming.
-
-Returns mock streaming responses without calling Claude Agent SDK.
-Enable via COPILOT_TEST_MODE=true environment variable.
-
-WARNING: This is for testing only. Do not use in production.
-"""
-
-import asyncio
-import logging
-import uuid
-from collections.abc import AsyncGenerator
-from typing import Any
-
-from ..model import ChatSession
-from ..response_model import StreamBaseResponse, StreamStart, StreamTextDelta
-
-logger = logging.getLogger(__name__)
-
-
-async def stream_chat_completion_dummy(
-    session_id: str,
-    message: str | None = None,
-    tool_call_response: str | None = None,
-    is_user_message: bool = True,
-    user_id: str | None = None,
-    retry_count: int = 0,
-    session: ChatSession | None = None,
-    context: dict[str, str] | None = None,
-    **_kwargs: Any,
-) -> AsyncGenerator[StreamBaseResponse, None]:
-    """Stream dummy chat completion for testing.
-
-    Returns a simple streaming response with text deltas to test:
-    - Streaming infrastructure works
-    - No timeout occurs
-    - Text arrives in chunks
-    - StreamFinish is sent by mark_session_completed
-    """
-    logger.warning(
-        f"[TEST MODE] Using dummy copilot streaming for session {session_id}"
-    )
-
-    message_id = str(uuid.uuid4())
-    text_block_id = str(uuid.uuid4())
-
-    # Start the stream
-    yield StreamStart(messageId=message_id, sessionId=session_id)
-
-    # Simulate streaming text response with delays
-    dummy_response = "I counted: 1... 2... 3. All done!"
-    words = dummy_response.split()
-
-    for i, word in enumerate(words):
-        # Add space except for last word
-        text = word if i == len(words) - 1 else f"{word} "
-        yield StreamTextDelta(id=text_block_id, delta=text)
-        # Small delay to simulate real streaming
-        await asyncio.sleep(0.1)
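Note on the removed dummy.py: its chunking loop emits each word with a trailing space, except the last one, so that concatenating the deltas gives back the single-spaced text. The sketch below isolates that behaviour in plain Python. `stream_words` and `collect` are hypothetical names used only for illustration, and the real module yields `StreamTextDelta` objects, not bare strings.

```python
import asyncio
from collections.abc import AsyncGenerator


async def stream_words(text: str, delay: float = 0.0) -> AsyncGenerator[str, None]:
    """Yield `text` word by word; every chunk but the last ends with a space."""
    words = text.split()
    for i, word in enumerate(words):
        # Add a space after every word except the final one.
        yield word if i == len(words) - 1 else f"{word} "
        # Pause between chunks to imitate a real stream.
        await asyncio.sleep(delay)


async def collect(text: str) -> list[str]:
    """Gather all chunks into a list (illustrative helper)."""
    return [chunk async for chunk in stream_words(text)]
```

Joining the collected chunks should reproduce the input whenever the input is already single-spaced, which is the property a frontend relies on when it appends deltas in order.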
--- a/autogpt_platform/backend/backend/copilot/sdk/e2b_file_tools.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/e2b_file_tools.py
@@ -1,362 +0,0 @@
-"""MCP file-tool handlers that route to the E2B cloud sandbox.
-
-When E2B is active, these tools replace the SDK built-in Read/Write/Edit/
-Glob/Grep so that all file operations share the same ``/home/user``
-filesystem as ``bash_exec``.
-
-SDK-internal paths (``~/.claude/projects/…/tool-results/``) are handled
-by the separate ``Read`` MCP tool registered in ``tool_adapter.py``.
-"""
-
-from __future__ import annotations
-
-import itertools
-import json
-import logging
-import os
-import shlex
-from typing import Any, Callable
-
-from backend.copilot.tools.e2b_sandbox import E2B_WORKDIR
-
-logger = logging.getLogger(__name__)
-
-
-# Lazy imports to break circular dependency with tool_adapter.
-
-
-def _get_sandbox():  # type: ignore[return]
-    from .tool_adapter import get_current_sandbox  # noqa: E402
-
-    return get_current_sandbox()
-
-
-def _is_allowed_local(path: str) -> bool:
-    from .tool_adapter import is_allowed_local_path  # noqa: E402
-
-    return is_allowed_local_path(path)
-
-
-def _resolve_remote(path: str) -> str:
-    """Normalise *path* to an absolute sandbox path under ``/home/user``.
-
-    Raises :class:`ValueError` if the resolved path escapes the sandbox.
-    """
-    candidate = path if os.path.isabs(path) else os.path.join(E2B_WORKDIR, path)
-    normalized = os.path.normpath(candidate)
-    if normalized != E2B_WORKDIR and not normalized.startswith(E2B_WORKDIR + "/"):
-        raise ValueError(f"Path must be within {E2B_WORKDIR}: {path}")
-    return normalized
-
-
-def _mcp(text: str, *, error: bool = False) -> dict[str, Any]:
-    if error:
-        text = json.dumps({"error": text, "type": "error"})
-    return {"content": [{"type": "text", "text": text}], "isError": error}
-
-
-def _get_sandbox_and_path(
-    file_path: str,
-) -> tuple[Any, str] | dict[str, Any]:
-    """Common preamble: get sandbox + resolve path, or return MCP error."""
-    sandbox = _get_sandbox()
-    if sandbox is None:
-        return _mcp("No E2B sandbox available", error=True)
-    try:
-        remote = _resolve_remote(file_path)
-    except ValueError as exc:
-        return _mcp(str(exc), error=True)
-    return sandbox, remote
-
-
-# Tool handlers
-
-
-async def _handle_read_file(args: dict[str, Any]) -> dict[str, Any]:
-    file_path: str = args.get("file_path", "")
-    offset: int = max(0, int(args.get("offset", 0)))
-    limit: int = max(1, int(args.get("limit", 2000)))
-
-    if not file_path:
-        return _mcp("file_path is required", error=True)
-
-    # SDK-internal paths (tool-results, ephemeral working dir) stay on the host.
-    if _is_allowed_local(file_path):
-        return _read_local(file_path, offset, limit)
-
-    result = _get_sandbox_and_path(file_path)
-    if isinstance(result, dict):
-        return result
-    sandbox, remote = result
-
-    try:
-        raw: bytes = await sandbox.files.read(remote, format="bytes")
-        content = raw.decode("utf-8", errors="replace")
-    except Exception as exc:
-        return _mcp(f"Failed to read {remote}: {exc}", error=True)
-
-    lines = content.splitlines(keepends=True)
-    selected = list(itertools.islice(lines, offset, offset + limit))
-    numbered = "".join(
-        f"{i + offset + 1:>6}\t{line}" for i, line in enumerate(selected)
-    )
-    return _mcp(numbered)
-
-
-async def _handle_write_file(args: dict[str, Any]) -> dict[str, Any]:
-    file_path: str = args.get("file_path", "")
-    content: str = args.get("content", "")
-
-    if not file_path:
-        return _mcp("file_path is required", error=True)
-
-    result = _get_sandbox_and_path(file_path)
-    if isinstance(result, dict):
-        return result
-    sandbox, remote = result
-
-    try:
-        parent = os.path.dirname(remote)
-        if parent and parent != E2B_WORKDIR:
-            await sandbox.files.make_dir(parent)
-        await sandbox.files.write(remote, content)
-    except Exception as exc:
-        return _mcp(f"Failed to write {remote}: {exc}", error=True)
-
-    return _mcp(f"Successfully wrote to {remote}")
-
-
-async def _handle_edit_file(args: dict[str, Any]) -> dict[str, Any]:
-    file_path: str = args.get("file_path", "")
-    old_string: str = args.get("old_string", "")
-    new_string: str = args.get("new_string", "")
-    replace_all: bool = args.get("replace_all", False)
-
-    if not file_path:
-        return _mcp("file_path is required", error=True)
-    if not old_string:
-        return _mcp("old_string is required", error=True)
-
-    result = _get_sandbox_and_path(file_path)
-    if isinstance(result, dict):
-        return result
-    sandbox, remote = result
-
-    try:
-        raw: bytes = await sandbox.files.read(remote, format="bytes")
-        content = raw.decode("utf-8", errors="replace")
-    except Exception as exc:
-        return _mcp(f"Failed to read {remote}: {exc}", error=True)
-
-    count = content.count(old_string)
-    if count == 0:
-        return _mcp(f"old_string not found in {file_path}", error=True)
-    if count > 1 and not replace_all:
-        return _mcp(
-            f"old_string appears {count} times in {file_path}. "
-            "Use replace_all=true or provide a more unique string.",
-            error=True,
-        )
-
-    updated = (
-        content.replace(old_string, new_string)
-        if replace_all
-        else content.replace(old_string, new_string, 1)
-    )
-    try:
-        await sandbox.files.write(remote, updated)
-    except Exception as exc:
-        return _mcp(f"Failed to write {remote}: {exc}", error=True)
-
-    return _mcp(f"Edited {remote} ({count} replacement{'s' if count > 1 else ''})")
-
-
-async def _handle_glob(args: dict[str, Any]) -> dict[str, Any]:
-    pattern: str = args.get("pattern", "")
-    path: str = args.get("path", "")
-
-    if not pattern:
-        return _mcp("pattern is required", error=True)
-
-    sandbox = _get_sandbox()
-    if sandbox is None:
-        return _mcp("No E2B sandbox available", error=True)
-
-    try:
-        search_dir = _resolve_remote(path) if path else E2B_WORKDIR
-    except ValueError as exc:
-        return _mcp(str(exc), error=True)
-
-    cmd = f"find {shlex.quote(search_dir)} -name {shlex.quote(pattern)} -type f 2>/dev/null | head -500"
-    try:
-        result = await sandbox.commands.run(cmd, cwd=E2B_WORKDIR, timeout=10)
-    except Exception as exc:
-        return _mcp(f"Glob failed: {exc}", error=True)
-
-    files = [line for line in (result.stdout or "").strip().splitlines() if line]
-    return _mcp(json.dumps(files, indent=2))
-
-
-async def _handle_grep(args: dict[str, Any]) -> dict[str, Any]:
-    pattern: str = args.get("pattern", "")
-    path: str = args.get("path", "")
-    include: str = args.get("include", "")
-
-    if not pattern:
-        return _mcp("pattern is required", error=True)
-
-    sandbox = _get_sandbox()
-    if sandbox is None:
-        return _mcp("No E2B sandbox available", error=True)
-
-    try:
-        search_dir = _resolve_remote(path) if path else E2B_WORKDIR
-    except ValueError as exc:
-        return _mcp(str(exc), error=True)
-
-    parts = ["grep", "-rn", "--color=never"]
-    if include:
-        parts.extend(["--include", include])
-    parts.extend([pattern, search_dir])
-    cmd = " ".join(shlex.quote(p) for p in parts) + " 2>/dev/null | head -200"
-
-    try:
-        result = await sandbox.commands.run(cmd, cwd=E2B_WORKDIR, timeout=15)
-    except Exception as exc:
-        return _mcp(f"Grep failed: {exc}", error=True)
-
-    output = (result.stdout or "").strip()
-    return _mcp(output if output else "No matches found.")
-
-
-# Local read (for SDK-internal paths)
-
-
-def _read_local(file_path: str, offset: int, limit: int) -> dict[str, Any]:
-    """Read from the host filesystem (defence-in-depth path check)."""
-    if not _is_allowed_local(file_path):
-        return _mcp(f"Path not allowed: {file_path}", error=True)
-    expanded = os.path.realpath(os.path.expanduser(file_path))
-    try:
-        with open(expanded) as fh:
-            selected = list(itertools.islice(fh, offset, offset + limit))
-        numbered = "".join(
-            f"{i + offset + 1:>6}\t{line}" for i, line in enumerate(selected)
-        )
-        return _mcp(numbered)
-    except FileNotFoundError:
-        return _mcp(f"File not found: {file_path}", error=True)
-    except Exception as exc:
-        return _mcp(f"Error reading {file_path}: {exc}", error=True)
-
-
-# Tool descriptors (name, description, schema, handler)
-
-E2B_FILE_TOOLS: list[tuple[str, str, dict[str, Any], Callable[..., Any]]] = [
-    (
-        "read_file",
-        "Read a file from the cloud sandbox (/home/user). "
-        "Use offset and limit for large files.",
-        {
-            "type": "object",
-            "properties": {
-                "file_path": {
-                    "type": "string",
-                    "description": "Path (relative to /home/user, or absolute).",
-                },
-                "offset": {
-                    "type": "integer",
-                    "description": "Line to start reading from (0-indexed). Default: 0.",
-                },
-                "limit": {
-                    "type": "integer",
-                    "description": "Number of lines to read. Default: 2000.",
-                },
-            },
-            "required": ["file_path"],
-        },
-        _handle_read_file,
-    ),
-    (
-        "write_file",
-        "Write or create a file in the cloud sandbox (/home/user). "
-        "Parent directories are created automatically. "
-        "To copy a workspace file into the sandbox, use "
-        "read_workspace_file with save_to_path instead.",
-        {
-            "type": "object",
-            "properties": {
-                "file_path": {
-                    "type": "string",
-                    "description": "Path (relative to /home/user, or absolute).",
-                },
-                "content": {"type": "string", "description": "Content to write."},
-            },
-            "required": ["file_path", "content"],
-        },
-        _handle_write_file,
-    ),
-    (
-        "edit_file",
-        "Targeted text replacement in a sandbox file. "
-        "old_string must appear in the file and is replaced with new_string.",
-        {
-            "type": "object",
-            "properties": {
-                "file_path": {
-                    "type": "string",
-                    "description": "Path (relative to /home/user, or absolute).",
-                },
-                "old_string": {"type": "string", "description": "Text to find."},
-                "new_string": {"type": "string", "description": "Replacement text."},
-                "replace_all": {
-                    "type": "boolean",
-                    "description": "Replace all occurrences (default: false).",
-                },
-            },
-            "required": ["file_path", "old_string", "new_string"],
-        },
-        _handle_edit_file,
-    ),
-    (
-        "glob",
-        "Search for files by name pattern in the cloud sandbox.",
-        {
-            "type": "object",
-            "properties": {
-                "pattern": {
-                    "type": "string",
-                    "description": "Glob pattern (e.g. *.py).",
-                },
-                "path": {
-                    "type": "string",
-                    "description": "Directory to search. Default: /home/user.",
-                },
-            },
-            "required": ["pattern"],
-        },
-        _handle_glob,
-    ),
-    (
-        "grep",
-        "Search file contents by regex in the cloud sandbox.",
-        {
-            "type": "object",
-            "properties": {
-                "pattern": {"type": "string", "description": "Regex pattern."},
-                "path": {
-                    "type": "string",
-                    "description": "File or directory. Default: /home/user.",
-                },
-                "include": {
-                    "type": "string",
-                    "description": "Glob to filter files (e.g. *.py).",
-                },
-            },
-            "required": ["pattern"],
-        },
-        _handle_grep,
-    ),
-]
-
-E2B_FILE_TOOL_NAMES: list[str] = [name for name, *_ in E2B_FILE_TOOLS]
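Note on the removed e2b_file_tools.py: the sandbox boundary check in `_resolve_remote` is a lexical containment test. The standalone sketch below mirrors it under two assumptions: that `E2B_WORKDIR` is `/home/user` (inferred from the tests in this diff), and that using `posixpath` in place of `os.path` is acceptable because sandbox paths are always POSIX. `resolve_remote` is a hypothetical name for illustration.

```python
import posixpath

WORKDIR = "/home/user"  # assumed value of E2B_WORKDIR


def resolve_remote(path: str, workdir: str = WORKDIR) -> str:
    """Normalise `path` to an absolute path under `workdir`, or raise ValueError."""
    candidate = path if posixpath.isabs(path) else posixpath.join(workdir, path)
    # normpath collapses "." and ".." segments lexically; it does not touch
    # the filesystem and therefore does not resolve symlinks.
    normalized = posixpath.normpath(candidate)
    # Compare against workdir + "/" so that a sibling such as /home/username
    # is not accepted merely because it shares the /home/user prefix.
    if normalized != workdir and not normalized.startswith(workdir + "/"):
        raise ValueError(f"Path must be within {workdir}: {path}")
    return normalized
```

For example, `resolve_remote("a/b/../c.txt")` stays inside the sandbox, while `resolve_remote("/home/user/../../etc/passwd")` raises. Because the check is purely lexical, a symlink inside the sandbox that points elsewhere would pass it; whether that matters depends on how the sandbox itself is isolated.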
--- a/autogpt_platform/backend/backend/copilot/sdk/e2b_file_tools_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/e2b_file_tools_test.py
@@ -1,153 +0,0 @@
-"""Tests for E2B file-tool path validation and local read safety.
-
-Pure unit tests with no external dependencies (no E2B, no sandbox).
-"""
-
-import os
-
-import pytest
-
-from .e2b_file_tools import _read_local, _resolve_remote
-from .tool_adapter import _current_project_dir
-
-_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
-
-
-# ---------------------------------------------------------------------------
-# _resolve_remote — sandbox path normalisation & boundary enforcement
-# ---------------------------------------------------------------------------
-
-
-class TestResolveRemote:
-    def test_relative_path_resolved(self):
-        assert _resolve_remote("src/main.py") == "/home/user/src/main.py"
-
-    def test_absolute_within_sandbox(self):
-        assert _resolve_remote("/home/user/file.txt") == "/home/user/file.txt"
-
-    def test_workdir_itself(self):
-        assert _resolve_remote("/home/user") == "/home/user"
-
-    def test_relative_dotslash(self):
-        assert _resolve_remote("./README.md") == "/home/user/README.md"
-
-    def test_traversal_blocked(self):
-        with pytest.raises(ValueError, match="must be within /home/user"):
-            _resolve_remote("../../etc/passwd")
-
-    def test_absolute_traversal_blocked(self):
-        with pytest.raises(ValueError, match="must be within /home/user"):
-            _resolve_remote("/home/user/../../etc/passwd")
-
-    def test_absolute_outside_sandbox_blocked(self):
-        with pytest.raises(ValueError, match="must be within /home/user"):
-            _resolve_remote("/etc/passwd")
-
-    def test_root_blocked(self):
-        with pytest.raises(ValueError, match="must be within /home/user"):
-            _resolve_remote("/")
-
-    def test_home_other_user_blocked(self):
-        with pytest.raises(ValueError, match="must be within /home/user"):
-            _resolve_remote("/home/other/file.txt")
-
-    def test_deep_nested_allowed(self):
-        assert _resolve_remote("a/b/c/d/e.txt") == "/home/user/a/b/c/d/e.txt"
-
-    def test_trailing_slash_normalised(self):
-        assert _resolve_remote("src/") == "/home/user/src"
-
-    def test_double_dots_within_sandbox_ok(self):
-        """Path that resolves back within /home/user is allowed."""
-        assert _resolve_remote("a/b/../c.txt") == "/home/user/a/c.txt"
-
-
-# ---------------------------------------------------------------------------
-# _read_local — host filesystem reads with allowlist enforcement
-#
-# In E2B mode, _read_local only allows tool-results paths (via
-# is_allowed_local_path without sdk_cwd).  Regular files live on the
-# sandbox, not the host.
-# ---------------------------------------------------------------------------
-
-
-class TestReadLocal:
-    def _make_tool_results_file(self, encoded: str, filename: str, content: str) -> str:
-        """Create a tool-results file and return its path."""
-        tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
-        os.makedirs(tool_results_dir, exist_ok=True)
-        filepath = os.path.join(tool_results_dir, filename)
-        with open(filepath, "w") as f:
-            f.write(content)
-        return filepath
-
-    def test_read_tool_results_file(self):
-        """Reading a tool-results file should succeed."""
-        encoded = "-tmp-copilot-e2b-test-read"
-        filepath = self._make_tool_results_file(
-            encoded, "result.txt", "line 1\nline 2\nline 3\n"
-        )
-        token = _current_project_dir.set(encoded)
-        try:
-            result = _read_local(filepath, offset=0, limit=2000)
-            assert result["isError"] is False
-            assert "line 1" in result["content"][0]["text"]
-            assert "line 2" in result["content"][0]["text"]
-        finally:
-            _current_project_dir.reset(token)
-            os.unlink(filepath)
-
-    def test_read_disallowed_path_blocked(self):
-        """Reading /etc/passwd should be blocked by the allowlist."""
-        result = _read_local("/etc/passwd", offset=0, limit=10)
-        assert result["isError"] is True
-        assert "not allowed" in result["content"][0]["text"].lower()
-
-    def test_read_nonexistent_tool_results(self):
-        """A missing tool-results file yields a 'not found' error result."""
-        encoded = "-tmp-copilot-e2b-test-nofile"
-        tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
-        os.makedirs(tool_results_dir, exist_ok=True)
-        filepath = os.path.join(tool_results_dir, "nonexistent.txt")
-        token = _current_project_dir.set(encoded)
-        try:
-            result = _read_local(filepath, offset=0, limit=10)
-            assert result["isError"] is True
-            assert "not found" in result["content"][0]["text"].lower()
-        finally:
-            _current_project_dir.reset(token)
-            os.rmdir(tool_results_dir)
-
-    def test_read_traversal_path_blocked(self):
-        """A traversal attempt that escapes allowed directories is blocked."""
-        result = _read_local("/tmp/copilot-abc/../../etc/shadow", offset=0, limit=10)
-        assert result["isError"] is True
-        assert "not allowed" in result["content"][0]["text"].lower()
-
-    def test_read_arbitrary_host_path_blocked(self):
-        """Arbitrary host paths are blocked even if they exist."""
-        result = _read_local("/proc/self/environ", offset=0, limit=10)
-        assert result["isError"] is True
-
-    def test_read_with_offset_and_limit(self):
-        """Offset and limit should control which lines are returned."""
-        encoded = "-tmp-copilot-e2b-test-offset"
-        content = "".join(f"line {i}\n" for i in range(10))
-        filepath = self._make_tool_results_file(encoded, "lines.txt", content)
-        token = _current_project_dir.set(encoded)
-        try:
-            result = _read_local(filepath, offset=3, limit=2)
-            assert result["isError"] is False
-            text = result["content"][0]["text"]
-            assert "line 3" in text
-            assert "line 4" in text
-            assert "line 2" not in text
-            assert "line 5" not in text
-        finally:
-            _current_project_dir.reset(token)
-            os.unlink(filepath)
-
-    def test_read_without_project_dir_blocks_all(self):
-        """Without _current_project_dir set, all paths are blocked."""
-        result = _read_local("/tmp/anything.txt", offset=0, limit=10)
-        assert result["isError"] is True
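Note on the removed e2b_file_tools_test.py: the offset/limit test exercises the windowing and numbering that `_read_local` and `_handle_read_file` share. A minimal sketch of that step follows. `number_window` is a hypothetical helper name, and the handlers in the diff inline this logic instead of calling a shared function.

```python
import itertools
from collections.abc import Iterable


def number_window(lines: Iterable[str], offset: int, limit: int) -> str:
    """Return `limit` lines starting at 0-indexed `offset`, prefixed with 1-based numbers."""
    # islice works on any iterable, so an open file handle can be windowed
    # without reading the whole file into memory.
    selected = itertools.islice(lines, offset, offset + limit)
    return "".join(
        # Right-align the line number in a 6-character column, then a tab.
        f"{i + offset + 1:>6}\t{line}"
        for i, line in enumerate(selected)
    )
```

The offset is 0-indexed while the printed numbers are 1-based, so `offset=3` starts at the line labelled 4.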
--- a/autogpt_platform/backend/backend/copilot/sdk/otel_setup_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/otel_setup_test.py
@@ -1,172 +0,0 @@
-"""Tests for OTEL tracing setup in the SDK copilot path."""
-
-import os
-from unittest.mock import MagicMock, patch
-
-
-class TestSetupLangfuseOtel:
-    """Tests for _setup_langfuse_otel()."""
-
-    def test_noop_when_langfuse_not_configured(self):
-        """No env vars should be set when Langfuse credentials are missing."""
-        with patch(
-            "backend.copilot.sdk.service._is_langfuse_configured", return_value=False
-        ):
-            from backend.copilot.sdk.service import _setup_langfuse_otel
-
-            # Clear any previously set env vars
-            env_keys = [
-                "LANGSMITH_OTEL_ENABLED",
-                "LANGSMITH_OTEL_ONLY",
-                "LANGSMITH_TRACING",
-                "OTEL_EXPORTER_OTLP_ENDPOINT",
-                "OTEL_EXPORTER_OTLP_HEADERS",
-            ]
-            saved = {k: os.environ.pop(k, None) for k in env_keys}
-            try:
-                _setup_langfuse_otel()
-                for key in env_keys:
-                    assert key not in os.environ, f"{key} should not be set"
-            finally:
-                for k, v in saved.items():
-                    if v is not None:
-                        os.environ[k] = v
-
-    def test_sets_env_vars_when_langfuse_configured(self):
-        """OTEL env vars should be set when Langfuse credentials exist."""
-        mock_settings = MagicMock()
-        mock_settings.secrets.langfuse_public_key = "pk-test-123"
-        mock_settings.secrets.langfuse_secret_key = "sk-test-456"
-        mock_settings.secrets.langfuse_host = "https://langfuse.example.com"
-        mock_settings.secrets.langfuse_tracing_environment = "test"
-
-        with (
-            patch(
-                "backend.copilot.sdk.service._is_langfuse_configured",
-                return_value=True,
-            ),
-            patch("backend.copilot.sdk.service.Settings", return_value=mock_settings),
-            patch(
-                "backend.copilot.sdk.service.configure_claude_agent_sdk",
-                return_value=True,
-            ) as mock_configure,
-        ):
-            from backend.copilot.sdk.service import _setup_langfuse_otel
-
-            # Clear env vars so setdefault works
-            env_keys = [
-                "LANGSMITH_OTEL_ENABLED",
-                "LANGSMITH_OTEL_ONLY",
-                "LANGSMITH_TRACING",
-                "OTEL_EXPORTER_OTLP_ENDPOINT",
-                "OTEL_EXPORTER_OTLP_HEADERS",
-                "OTEL_RESOURCE_ATTRIBUTES",
-            ]
-            saved = {k: os.environ.pop(k, None) for k in env_keys}
-            try:
-                _setup_langfuse_otel()
-
-                assert os.environ["LANGSMITH_OTEL_ENABLED"] == "true"
-                assert os.environ["LANGSMITH_OTEL_ONLY"] == "true"
-                assert os.environ["LANGSMITH_TRACING"] == "true"
-                assert (
-                    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"]
-                    == "https://langfuse.example.com/api/public/otel"
-                )
-                assert "Authorization=Basic" in os.environ["OTEL_EXPORTER_OTLP_HEADERS"]
-                assert (
-                    os.environ["OTEL_RESOURCE_ATTRIBUTES"]
-                    == "langfuse.environment=test"
-                )
-
-                mock_configure.assert_called_once_with(tags=["sdk"])
-            finally:
-                for k, v in saved.items():
-                    if v is not None:
-                        os.environ[k] = v
-                    elif k in os.environ:
-                        del os.environ[k]
-
-    def test_existing_env_vars_not_overwritten(self):
-        """Explicit env-var overrides should not be clobbered."""
-        mock_settings = MagicMock()
-        mock_settings.secrets.langfuse_public_key = "pk-test"
-        mock_settings.secrets.langfuse_secret_key = "sk-test"
-        mock_settings.secrets.langfuse_host = "https://langfuse.example.com"
-
-        with (
-            patch(
-                "backend.copilot.sdk.service._is_langfuse_configured",
-                return_value=True,
-            ),
-            patch("backend.copilot.sdk.service.Settings", return_value=mock_settings),
-            patch(
-                "backend.copilot.sdk.service.configure_claude_agent_sdk",
-                return_value=True,
-            ),
-        ):
-            from backend.copilot.sdk.service import _setup_langfuse_otel
-
-            saved = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
-            try:
-                os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://custom.endpoint/v1"
-                _setup_langfuse_otel()
-                assert (
-                    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"]
-                    == "https://custom.endpoint/v1"
-                )
-            finally:
-                if saved is not None:
-                    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = saved
-                elif "OTEL_EXPORTER_OTLP_ENDPOINT" in os.environ:
-                    del os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"]
-
-    def test_graceful_failure_on_exception(self):
-        """Setup should not raise even if internal code fails."""
-        with (
-            patch(
-                "backend.copilot.sdk.service._is_langfuse_configured",
-                return_value=True,
-            ),
-            patch(
-                "backend.copilot.sdk.service.Settings",
-                side_effect=RuntimeError("settings unavailable"),
-            ),
-        ):
-            from backend.copilot.sdk.service import _setup_langfuse_otel
-
-            # Should not raise — just logs and returns
-            _setup_langfuse_otel()
-
-
-class TestPropagateAttributesImport:
-    """Verify langfuse.propagate_attributes is available."""
-
-    def test_propagate_attributes_is_importable(self):
-        from langfuse import propagate_attributes
-
-        assert callable(propagate_attributes)
-
-    def test_propagate_attributes_returns_context_manager(self):
-        from langfuse import propagate_attributes
-
-        ctx = propagate_attributes(user_id="u1", session_id="s1", tags=["test"])
-        assert hasattr(ctx, "__enter__")
-        assert hasattr(ctx, "__exit__")
-
-
-class TestReceiveResponseCompat:
-    """Verify ClaudeSDKClient.receive_response() exists (langsmith patches it)."""
-
-    def test_receive_response_exists(self):
-        from claude_agent_sdk import ClaudeSDKClient
-
-        assert hasattr(ClaudeSDKClient, "receive_response")
-
-    def test_receive_response_is_async_generator(self):
-        import inspect
-
-        from claude_agent_sdk import ClaudeSDKClient
-
-        method = getattr(ClaudeSDKClient, "receive_response")
-        assert inspect.isfunction(method) or inspect.ismethod(method)
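Note on the removed otel_setup_test.py: each test clears the relevant environment variables, relies on `setdefault` semantics in the code under test, and restores the previous values in a `finally` block. The sketch below factors that save/clear/restore pattern into a context manager. `cleared_env` and the `EXAMPLE_*` variable names are hypothetical and appear nowhere in the original tests.

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def cleared_env(*keys: str) -> Iterator[None]:
    """Temporarily remove `keys` from os.environ and restore them on exit."""
    # pop() returns None for keys that were not set, which we remember so
    # that values created inside the block can be removed again afterwards.
    saved = {k: os.environ.pop(k, None) for k in keys}
    try:
        yield
    finally:
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v
```

Inside the block, `os.environ.setdefault("EXAMPLE_FLAG", "true")` takes effect because the key is absent. Outside it, a pre-existing value wins, which is the behaviour `test_existing_env_vars_not_overwritten` checks.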
--- a/autogpt_platform/backend/backend/copilot/sdk/query_builder_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/query_builder_test.py
@@ -1,255 +0,0 @@
-"""Tests for _format_conversation_context and _build_query_message."""
-
-from datetime import UTC, datetime
-
-import pytest
-
-from backend.copilot.model import ChatMessage, ChatSession
-from backend.copilot.sdk.service import (
-    _build_query_message,
-    _format_conversation_context,
-)
-
-# ---------------------------------------------------------------------------
-# _format_conversation_context
-# ---------------------------------------------------------------------------
-
-
-def test_format_empty_list():
-    assert _format_conversation_context([]) is None
-
-
-def test_format_none_content_messages():
-    msgs = [ChatMessage(role="user", content=None)]
-    assert _format_conversation_context(msgs) is None
-
-
-def test_format_user_message():
-    msgs = [ChatMessage(role="user", content="hello")]
-    result = _format_conversation_context(msgs)
-    assert result is not None
-    assert "User: hello" in result
-    assert result.startswith("<conversation_history>")
-    assert result.endswith("</conversation_history>")
-
-
-def test_format_assistant_text():
-    msgs = [ChatMessage(role="assistant", content="hi there")]
-    result = _format_conversation_context(msgs)
-    assert result is not None
-    assert "You responded: hi there" in result
-
-
-def test_format_assistant_tool_calls():
-    msgs = [
-        ChatMessage(
-            role="assistant",
-            content=None,
-            tool_calls=[{"function": {"name": "search", "arguments": '{"q": "test"}'}}],
-        )
-    ]
-    result = _format_conversation_context(msgs)
-    assert result is not None
-    assert 'You called tool: search({"q": "test"})' in result
-
-
-def test_format_tool_result():
-    msgs = [ChatMessage(role="tool", content='{"result": "ok"}')]
-    result = _format_conversation_context(msgs)
-    assert result is not None
-    assert 'Tool result: {"result": "ok"}' in result
-
-
-def test_format_tool_result_none_content():
-    msgs = [ChatMessage(role="tool", content=None)]
-    result = _format_conversation_context(msgs)
-    assert result is not None
-    assert "Tool result: " in result
-
-
-def test_format_full_conversation():
-    msgs = [
-        ChatMessage(role="user", content="find agents"),
-        ChatMessage(
-            role="assistant",
-            content="I'll search for agents.",
-            tool_calls=[
-                {"function": {"name": "find_agents", "arguments": '{"q": "test"}'}}
-            ],
-        ),
-        ChatMessage(role="tool", content='[{"id": "1", "name": "Agent1"}]'),
-        ChatMessage(role="assistant", content="Found Agent1."),
-    ]
-    result = _format_conversation_context(msgs)
-    assert result is not None
-    assert "User: find agents" in result
-    assert "You responded: I'll search for agents." in result
-    assert "You called tool: find_agents" in result
-    assert "Tool result:" in result
-    assert "You responded: Found Agent1." in result
-
-
-# ---------------------------------------------------------------------------
-# _build_query_message
-# ---------------------------------------------------------------------------
-
-
-def _make_session(messages: list[ChatMessage]) -> ChatSession:
-    """Build a minimal ChatSession with the given messages."""
-    now = datetime.now(UTC)
-    return ChatSession(
-        session_id="test-session",
-        user_id="user-1",
-        messages=messages,
-        title="test",
-        usage=[],
-        started_at=now,
-        updated_at=now,
-    )
-
-
-@pytest.mark.asyncio
-async def test_build_query_resume_up_to_date():
-    """With --resume and transcript covers all messages, return raw message."""
-    session = _make_session(
-        [
-            ChatMessage(role="user", content="hello"),
-            ChatMessage(role="assistant", content="hi"),
-            ChatMessage(role="user", content="what's new?"),
-        ]
-    )
-    result, was_compacted = await _build_query_message(
-        "what's new?",
-        session,
-        use_resume=True,
-        transcript_msg_count=2,
-        session_id="test-session",
-    )
-    # transcript_msg_count == msg_count - 1, so no gap
-    assert result == "what's new?"
-    assert was_compacted is False
-
-
-@pytest.mark.asyncio
-async def test_build_query_resume_stale_transcript():
-    """With --resume and stale transcript, gap context is prepended."""
-    session = _make_session(
-        [
-            ChatMessage(role="user", content="turn 1"),
-            ChatMessage(role="assistant", content="reply 1"),
-            ChatMessage(role="user", content="turn 2"),
-            ChatMessage(role="assistant", content="reply 2"),
-            ChatMessage(role="user", content="turn 3"),
-        ]
-    )
-    result, was_compacted = await _build_query_message(
-        "turn 3",
-        session,
-        use_resume=True,
-        transcript_msg_count=2,
-        session_id="test-session",
-    )
-    assert "<conversation_history>" in result
-    assert "turn 2" in result
-    assert "reply 2" in result
-    assert "Now, the user says:\nturn 3" in result
-    assert was_compacted is False  # gap context does not compact
-
-
-@pytest.mark.asyncio
-async def test_build_query_resume_zero_msg_count():
-    """With --resume but transcript_msg_count=0, return raw message."""
-    session = _make_session(
-        [
-            ChatMessage(role="user", content="hello"),
-            ChatMessage(role="assistant", content="hi"),
-            ChatMessage(role="user", content="new msg"),
-        ]
-    )
-    result, was_compacted = await _build_query_message(
-        "new msg",
-        session,
-        use_resume=True,
-        transcript_msg_count=0,
-        session_id="test-session",
-    )
-    assert result == "new msg"
-    assert was_compacted is False
-
-
-@pytest.mark.asyncio
-async def test_build_query_no_resume_single_message():
-    """Without --resume and only 1 message, return raw message."""
-    session = _make_session([ChatMessage(role="user", content="first")])
-    result, was_compacted = await _build_query_message(
-        "first",
-        session,
-        use_resume=False,
-        transcript_msg_count=0,
-        session_id="test-session",
-    )
-    assert result == "first"
-    assert was_compacted is False
-
-
-@pytest.mark.asyncio
-async def test_build_query_no_resume_multi_message(monkeypatch):
-    """Without --resume and multiple messages, compress and prepend."""
-    session = _make_session(
-        [
-            ChatMessage(role="user", content="older question"),
-            ChatMessage(role="assistant", content="older answer"),
-            ChatMessage(role="user", content="new question"),
-        ]
-    )
-
-    # Mock _compress_messages to return the messages as-is
-    async def _mock_compress(msgs):
-        return msgs, False
-
-    monkeypatch.setattr(
-        "backend.copilot.sdk.service._compress_messages",
-        _mock_compress,
-    )
-
-    result, was_compacted = await _build_query_message(
-        "new question",
-        session,
-        use_resume=False,
-        transcript_msg_count=0,
-        session_id="test-session",
-    )
-    assert "<conversation_history>" in result
-    assert "older question" in result
-    assert "older answer" in result
-    assert "Now, the user says:\nnew question" in result
-    assert was_compacted is False  # mock returns False
-
-
-@pytest.mark.asyncio
-async def test_build_query_no_resume_multi_message_compacted(monkeypatch):
-    """When compression actually compacts, was_compacted should be True."""
-    session = _make_session(
-        [
-            ChatMessage(role="user", content="old"),
-            ChatMessage(role="assistant", content="reply"),
-            ChatMessage(role="user", content="new"),
-        ]
-    )
-
-    async def _mock_compress(msgs):
-        return msgs, True  # Simulate actual compaction
-
-    monkeypatch.setattr(
-        "backend.copilot.sdk.service._compress_messages",
-        _mock_compress,
-    )
-
-    result, was_compacted = await _build_query_message(
-        "new",
-        session,
-        use_resume=False,
-        transcript_msg_count=0,
-        session_id="test-session",
-    )
-    assert was_compacted is True
--- a/autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
@@ -47,20 +47,19 @@ class SDKResponseAdapter:
    text blocks, tool calls, and message lifecycle.
    """

-    def __init__(self, message_id: str | None = None, session_id: str | None = None):
+    def __init__(self, message_id: str | None = None):
        self.message_id = message_id or str(uuid.uuid4())
-        self.session_id = session_id
        self.text_block_id = str(uuid.uuid4())
        self.has_started_text = False
        self.has_ended_text = False
        self.current_tool_calls: dict[str, dict[str, str]] = {}
        self.resolved_tool_calls: set[str] = set()
+        self.task_id: str | None = None
        self.step_open = False

-    @property
-    def has_unresolved_tool_calls(self) -> bool:
-        """True when there are tool calls that haven't received output yet."""
-        return bool(self.current_tool_calls.keys() - self.resolved_tool_calls)
+    def set_task_id(self, task_id: str) -> None:
+        """Set the task ID for reconnection support."""
+        self.task_id = task_id

    def convert_message(self, sdk_message: Message) -> list[StreamBaseResponse]:
        """Convert a single SDK message to Vercel AI SDK format."""
@@ -69,7 +68,7 @@ class SDKResponseAdapter:
        if isinstance(sdk_message, SystemMessage):
            if sdk_message.subtype == "init":
                responses.append(
-                    StreamStart(messageId=self.message_id, sessionId=self.session_id)
+                    StreamStart(messageId=self.message_id, taskId=self.task_id)
                )
                # Open the first step (matches non-SDK: StreamStart then StreamStartStep)
                responses.append(StreamStartStep())
@@ -78,12 +77,7 @@ class SDKResponseAdapter:
        elif isinstance(sdk_message, AssistantMessage):
            # Flush any SDK built-in tool calls that didn't get a UserMessage
            # result (e.g. WebSearch, Read handled internally by the CLI).
-            # BUT skip flush when this AssistantMessage is a parallel tool
-            # continuation (contains only ToolUseBlocks) — the prior tools
-            # are still executing concurrently and haven't finished yet.
-            is_tool_only = all(isinstance(b, ToolUseBlock) for b in sdk_message.content)
-            if not is_tool_only:
-                self._flush_unresolved_tool_calls(responses)
+            self._flush_unresolved_tool_calls(responses)

            # After tool results, the SDK sends a new AssistantMessage for the
            # next LLM turn. Open a new step if the previous one was closed.
@@ -124,24 +118,8 @@ class SDKResponseAdapter:
            blocks = content if isinstance(content, list) else []
            resolved_in_blocks: set[str] = set()

-            sid = (self.session_id or "?")[:12]
-            parent_id_preview = getattr(sdk_message, "parent_tool_use_id", None)
-            logger.info(
-                "[SDK] [%s] UserMessage: %d blocks, content_type=%s, "
-                "parent_tool_use_id=%s",
-                sid,
-                len(blocks),
-                type(content).__name__,
-                parent_id_preview[:12] if parent_id_preview else "None",
-            )
-
            for block in blocks:
                if isinstance(block, ToolResultBlock) and block.tool_use_id:
-                    # Skip if already resolved (e.g. by flush) — the real
-                    # result supersedes the empty flush, but re-emitting
-                    # would confuse the frontend's state machine.
-                    if block.tool_use_id in self.resolved_tool_calls:
-                        continue
                    tool_info = self.current_tool_calls.get(block.tool_use_id, {})
                    tool_name = tool_info.get("name", "unknown")

@@ -166,11 +144,7 @@ class SDKResponseAdapter:
            # Handle SDK built-in tool results carried via parent_tool_use_id
            # instead of (or in addition to) ToolResultBlock content.
            parent_id = sdk_message.parent_tool_use_id
-            if (
-                parent_id
-                and parent_id not in resolved_in_blocks
-                and parent_id not in self.resolved_tool_calls
-            ):
+            if parent_id and parent_id not in resolved_in_blocks:
                tool_info = self.current_tool_calls.get(parent_id, {})
                tool_name = tool_info.get("name", "unknown")

@@ -254,28 +228,11 @@ class SDKResponseAdapter:
        output, which we pop and emit here before the next ``AssistantMessage``
        starts.
        """
-        unresolved = [
-            (tid, info.get("name", "unknown"))
-            for tid, info in self.current_tool_calls.items()
-            if tid not in self.resolved_tool_calls
-        ]
-        sid = (self.session_id or "?")[:12]
-        if not unresolved:
-            logger.info(
-                "[SDK] [%s] Flush called but all %d tool(s) already resolved",
-                sid,
-                len(self.current_tool_calls),
-            )
-            return
-        logger.info(
-            "[SDK] [%s] Flushing %d unresolved tool call(s): %s",
-            sid,
-            len(unresolved),
-            ", ".join(f"{name}({tid[:12]})" for tid, name in unresolved),
-        )
-
        flushed = False
-        for tool_id, tool_name in unresolved:
+        for tool_id, tool_info in self.current_tool_calls.items():
+            if tool_id in self.resolved_tool_calls:
+                continue
+            tool_name = tool_info.get("name", "unknown")
            output = pop_pending_tool_output(tool_name)
            if output is not None:
                responses.append(
@@ -288,12 +245,9 @@ class SDKResponseAdapter:
                )
                self.resolved_tool_calls.add(tool_id)
                flushed = True
-                logger.info(
-                    "[SDK] [%s] Flushed stashed output for %s " "(call %s, %d chars)",
-                    sid,
-                    tool_name,
-                    tool_id[:12],
-                    len(output),
+                logger.debug(
+                    f"Flushed pending output for built-in tool {tool_name} "
+                    f"(call {tool_id})"
                )
            else:
                # No output available — emit an empty output so the frontend
@@ -309,14 +263,9 @@ class SDKResponseAdapter:
                )
                self.resolved_tool_calls.add(tool_id)
                flushed = True
-                logger.warning(
-                    "[SDK] [%s] Flushed EMPTY output for unresolved tool %s "
-                    "(call %s) — stash was empty (likely SDK hook race "
-                    "condition: PostToolUse hook hadn't completed before "
-                    "flush was triggered)",
-                    sid,
-                    tool_name,
-                    tool_id[:12],
+                logger.debug(
+                    f"Flushed empty output for unresolved tool {tool_name} "
+                    f"(call {tool_id})"
                )

        if flushed and self.step_open:
--- a/autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
@@ -1,8 +1,5 @@
 """Unit tests for the SDK response adapter."""

-import asyncio
-
-import pytest
 from claude_agent_sdk import (
    AssistantMessage,
    ResultMessage,
@@ -30,14 +27,12 @@ from backend.copilot.response_model import (

 from .response_adapter import SDKResponseAdapter
 from .tool_adapter import MCP_TOOL_PREFIX
-from .tool_adapter import _pending_tool_outputs as _pto
-from .tool_adapter import _stash_event
-from .tool_adapter import stash_pending_tool_output as _stash
-from .tool_adapter import wait_for_stash


 def _adapter() -> SDKResponseAdapter:
-    return SDKResponseAdapter(message_id="msg-1", session_id="session-1")
+    a = SDKResponseAdapter(message_id="msg-1")
+    a.set_task_id("task-1")
+    return a


 # -- SystemMessage -----------------------------------------------------------
@@ -49,7 +44,7 @@ def test_system_init_emits_start_and_step():
    assert len(results) == 2
    assert isinstance(results[0], StreamStart)
    assert results[0].messageId == "msg-1"
-    assert results[0].sessionId == "session-1"
+    assert results[0].taskId == "task-1"
    assert isinstance(results[1], StreamStartStep)


@@ -369,310 +364,3 @@ def test_full_conversation_flow():
        "StreamFinishStep",  # step 2 closed
        "StreamFinish",
    ]
-
-
-# -- Flush unresolved tool calls --------------------------------------------
-
-
-def test_flush_unresolved_at_result_message():
-    """Built-in tools (WebSearch) without UserMessage results get flushed at ResultMessage."""
-    adapter = _adapter()
-    all_responses: list[StreamBaseResponse] = []
-
-    # 1. Init
-    all_responses.extend(
-        adapter.convert_message(SystemMessage(subtype="init", data={}))
-    )
-    # 2. Tool use (built-in tool — no MCP prefix)
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[
-                    ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
-                ],
-                model="test",
-            )
-        )
-    )
-    # 3. No UserMessage for this tool — go straight to ResultMessage
-    all_responses.extend(
-        adapter.convert_message(
-            ResultMessage(
-                subtype="success",
-                duration_ms=100,
-                duration_api_ms=50,
-                is_error=False,
-                num_turns=1,
-                session_id="s1",
-            )
-        )
-    )
-
-    types = [type(r).__name__ for r in all_responses]
-    assert types == [
-        "StreamStart",
-        "StreamStartStep",
-        "StreamToolInputStart",
-        "StreamToolInputAvailable",
-        "StreamToolOutputAvailable",  # flushed with empty output
-        "StreamFinishStep",  # step closed by flush
-        "StreamFinish",
-    ]
-    # The flushed output should be empty (no stash available)
-    output_event = [
-        r for r in all_responses if isinstance(r, StreamToolOutputAvailable)
-    ][0]
-    assert output_event.toolCallId == "ws-1"
-    assert output_event.toolName == "WebSearch"
-    assert output_event.output == ""
-
-
-def test_flush_unresolved_at_next_assistant_message():
-    """Built-in tools get flushed when the next AssistantMessage arrives."""
-    adapter = _adapter()
-    all_responses: list[StreamBaseResponse] = []
-
-    # 1. Init
-    all_responses.extend(
-        adapter.convert_message(SystemMessage(subtype="init", data={}))
-    )
-    # 2. Tool use (built-in — no UserMessage will come)
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[
-                    ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
-                ],
-                model="test",
-            )
-        )
-    )
-    # 3. Next AssistantMessage triggers flush before processing its blocks
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[TextBlock(text="Here are the results")], model="test"
-            )
-        )
-    )
-
-    types = [type(r).__name__ for r in all_responses]
-    assert types == [
-        "StreamStart",
-        "StreamStartStep",
-        "StreamToolInputStart",
-        "StreamToolInputAvailable",
-        # Flush at next AssistantMessage:
-        "StreamToolOutputAvailable",
-        "StreamFinishStep",  # step closed by flush
-        # New step for continuation text:
-        "StreamStartStep",
-        "StreamTextStart",
-        "StreamTextDelta",
-    ]
-
-
-def test_flush_with_stashed_output():
-    """Stashed output from PostToolUse hook is used when flushing."""
-    adapter = _adapter()
-
-    # Simulate PostToolUse hook stashing output
-    _pto.set({})
-    _stash("WebSearch", "Search result: 5 items found")
-
-    all_responses: list[StreamBaseResponse] = []
-
-    # Tool use
-    all_responses.extend(
-        adapter.convert_message(
-            AssistantMessage(
-                content=[
-                    ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
-                ],
-                model="test",
-            )
-        )
-    )
-    # ResultMessage triggers flush
-    all_responses.extend(
-        adapter.convert_message(
-            ResultMessage(
-                subtype="success",
-                duration_ms=100,
-                duration_api_ms=50,
-                is_error=False,
-                num_turns=1,
-                session_id="s1",
-            )
-        )
-    )
-
-    output_events = [
-        r for r in all_responses if isinstance(r, StreamToolOutputAvailable)
-    ]
-    assert len(output_events) == 1
-    assert output_events[0].output == "Search result: 5 items found"
-
-    # Cleanup
-    _pto.set({})  # type: ignore[arg-type]
-
-
-# -- wait_for_stash synchronisation tests --
-
-
-@pytest.mark.asyncio
-async def test_wait_for_stash_signaled():
-    """wait_for_stash returns True when stash_pending_tool_output signals."""
-    _pto.set({})
-    event = asyncio.Event()
-    _stash_event.set(event)
-
-    # Simulate a PostToolUse hook that stashes output after a short delay
-    async def delayed_stash():
-        await asyncio.sleep(0.01)
-        _stash("WebSearch", "result data")
-
-    asyncio.create_task(delayed_stash())
-    result = await wait_for_stash(timeout=1.0)
-
-    assert result is True
-    assert _pto.get({}).get("WebSearch") == ["result data"]
-
-    # Cleanup
-    _pto.set({})  # type: ignore[arg-type]
-    _stash_event.set(None)
-
-
-@pytest.mark.asyncio
-async def test_wait_for_stash_timeout():
-    """wait_for_stash returns False on timeout when no stash occurs."""
-    _pto.set({})
-    event = asyncio.Event()
-    _stash_event.set(event)
-
-    result = await wait_for_stash(timeout=0.05)
-    assert result is False
-
-    # Cleanup
-    _pto.set({})  # type: ignore[arg-type]
-    _stash_event.set(None)
-
-
-@pytest.mark.asyncio
-async def test_wait_for_stash_already_stashed():
-    """wait_for_stash picks up a stash that happened just before the wait."""
-    _pto.set({})
-    event = asyncio.Event()
-    _stash_event.set(event)
-
-    # Stash before waiting — simulates hook completing before message arrives
-    _stash("Read", "file contents")
-    # Event is now set; wait_for_stash detects the fast path and returns
-    # immediately without timing out.
-    result = await wait_for_stash(timeout=0.05)
-    assert result is True
-
-    # But the stash itself is populated
-    assert _pto.get({}).get("Read") == ["file contents"]
-
-    # Cleanup
-    _pto.set({})  # type: ignore[arg-type]
-    _stash_event.set(None)
-
-
-# -- Parallel tool call tests --
-
-
-def test_parallel_tool_calls_not_flushed_prematurely():
-    """Parallel tool calls should NOT be flushed when the next AssistantMessage
-    only contains ToolUseBlocks (parallel continuation)."""
-    adapter = SDKResponseAdapter()
-
-    # Init
-    adapter.convert_message(SystemMessage(subtype="init", data={}))
-
-    # First AssistantMessage: tool call #1
-    msg1 = AssistantMessage(
-        content=[ToolUseBlock(id="t1", name="WebSearch", input={"q": "foo"})],
-        model="test",
-    )
-    r1 = adapter.convert_message(msg1)
-    assert any(isinstance(r, StreamToolInputAvailable) for r in r1)
-    assert adapter.has_unresolved_tool_calls
-
-    # Second AssistantMessage: tool call #2 (parallel continuation)
-    msg2 = AssistantMessage(
-        content=[ToolUseBlock(id="t2", name="WebSearch", input={"q": "bar"})],
-        model="test",
-    )
-    r2 = adapter.convert_message(msg2)
-
-    # No flush should have happened — t1 should NOT have StreamToolOutputAvailable
-    output_events = [r for r in r2 if isinstance(r, StreamToolOutputAvailable)]
-    assert len(output_events) == 0, (
-        f"Tool-only AssistantMessage should not flush prior tools, "
-        f"but got {len(output_events)} output events"
-    )
-
-    # Both t1 and t2 should still be unresolved
-    assert "t1" not in adapter.resolved_tool_calls
-    assert "t2" not in adapter.resolved_tool_calls
-
-
-def test_text_assistant_message_flushes_prior_tools():
-    """An AssistantMessage with text (new turn) should flush unresolved tools."""
-    adapter = SDKResponseAdapter()
-
-    # Init
-    adapter.convert_message(SystemMessage(subtype="init", data={}))
-
-    # Tool call
-    msg1 = AssistantMessage(
-        content=[ToolUseBlock(id="t1", name="WebSearch", input={"q": "foo"})],
-        model="test",
-    )
-    adapter.convert_message(msg1)
-    assert adapter.has_unresolved_tool_calls
-
-    # Text AssistantMessage (new turn after tools completed)
-    msg2 = AssistantMessage(
-        content=[TextBlock(text="Here are the results")],
-        model="test",
-    )
-    r2 = adapter.convert_message(msg2)
-
-    # Flush SHOULD have happened — t1 gets empty output
-    output_events = [r for r in r2 if isinstance(r, StreamToolOutputAvailable)]
-    assert len(output_events) == 1
-    assert output_events[0].toolCallId == "t1"
-    assert "t1" in adapter.resolved_tool_calls
-
-
-def test_already_resolved_tool_skipped_in_user_message():
-    """A tool result in UserMessage should be skipped if already resolved by flush."""
-    adapter = SDKResponseAdapter()
-
-    adapter.convert_message(SystemMessage(subtype="init", data={}))
-
-    # Tool call + flush via text message
-    adapter.convert_message(
-        AssistantMessage(
-            content=[ToolUseBlock(id="t1", name="WebSearch", input={})],
-            model="test",
-        )
-    )
-    adapter.convert_message(
-        AssistantMessage(
-            content=[TextBlock(text="Done")],
-            model="test",
-        )
-    )
-    assert "t1" in adapter.resolved_tool_calls
-
-    # Now UserMessage arrives with the real result — should be skipped
-    user_msg = UserMessage(content=[ToolResultBlock(tool_use_id="t1", content="real")])
-    r = adapter.convert_message(user_msg)
-    output_events = [r_ for r_ in r if isinstance(r_, StreamToolOutputAvailable)]
-    assert (
-        len(output_events) == 0
-    ), "Already-resolved tool should not emit duplicate output"
--- a/autogpt_platform/backend/backend/copilot/sdk/sdk_compat_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/sdk_compat_test.py
@@ -1,194 +0,0 @@
-"""SDK compatibility tests — verify the claude-agent-sdk public API surface we depend on.
-
-Instead of pinning to a narrow version range, these tests verify that the
-installed SDK exposes every class, function, attribute, and method the copilot
-integration relies on.  If an SDK upgrade removes or renames something these
-tests will catch it immediately.
-"""
-
-import inspect
-
-import pytest
-
-# ---------------------------------------------------------------------------
-# Public types & factories
-# ---------------------------------------------------------------------------
-
-
-def test_sdk_exports_client_and_options():
-    from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
-
-    assert inspect.isclass(ClaudeSDKClient)
-    assert inspect.isclass(ClaudeAgentOptions)
-
-
-def test_sdk_exports_message_types():
-    from claude_agent_sdk import (
-        AssistantMessage,
-        Message,
-        ResultMessage,
-        SystemMessage,
-        UserMessage,
-    )
-
-    for cls in (AssistantMessage, ResultMessage, SystemMessage, UserMessage):
-        assert inspect.isclass(cls), f"{cls.__name__} is not a class"
-    # Message is a Union type alias, just verify it's importable
-    assert Message is not None
-
-
-def test_sdk_exports_content_block_types():
-    from claude_agent_sdk import TextBlock, ToolResultBlock, ToolUseBlock
-
-    for cls in (TextBlock, ToolResultBlock, ToolUseBlock):
-        assert inspect.isclass(cls), f"{cls.__name__} is not a class"
-
-
-def test_sdk_exports_mcp_helpers():
-    from claude_agent_sdk import create_sdk_mcp_server, tool
-
-    assert callable(create_sdk_mcp_server)
-    assert callable(tool)
-
-
-# ---------------------------------------------------------------------------
-# ClaudeSDKClient interface
-# ---------------------------------------------------------------------------
-
-
-def test_client_has_required_methods():
-    from claude_agent_sdk import ClaudeSDKClient
-
-    required = ["connect", "disconnect", "query", "receive_messages"]
-    for name in required:
-        attr = getattr(ClaudeSDKClient, name, None)
-        assert attr is not None, f"ClaudeSDKClient.{name} missing"
-        assert callable(attr), f"ClaudeSDKClient.{name} is not callable"
-
-
-def test_client_supports_async_context_manager():
-    from claude_agent_sdk import ClaudeSDKClient
-
-    assert hasattr(ClaudeSDKClient, "__aenter__")
-    assert hasattr(ClaudeSDKClient, "__aexit__")
-
-
-# ---------------------------------------------------------------------------
-# ClaudeAgentOptions fields
-# ---------------------------------------------------------------------------
-
-
-def test_agent_options_accepts_required_fields():
-    """Verify ClaudeAgentOptions accepts all kwargs our code passes."""
-    from claude_agent_sdk import ClaudeAgentOptions
-
-    opts = ClaudeAgentOptions(
-        system_prompt="test",
-        cwd="/tmp",
-    )
-    assert opts.system_prompt == "test"
-    assert opts.cwd == "/tmp"
-
-
-def test_agent_options_accepts_all_our_fields():
-    """Comprehensive check of every field we use in service.py."""
-    from claude_agent_sdk import ClaudeAgentOptions
-
-    fields_we_use = [
-        "system_prompt",
-        "mcp_servers",
-        "allowed_tools",
-        "disallowed_tools",
-        "hooks",
-        "cwd",
-        "model",
-        "env",
-        "resume",
-        "max_buffer_size",
-    ]
-    sig = inspect.signature(ClaudeAgentOptions)
-    for field in fields_we_use:
-        assert field in sig.parameters, (
-            f"ClaudeAgentOptions no longer accepts '{field}' — "
-            f"available params: {list(sig.parameters.keys())}"
-        )
-
-
-# ---------------------------------------------------------------------------
-# Message attributes
-# ---------------------------------------------------------------------------
-
-
-def test_assistant_message_has_content_and_model():
-    from claude_agent_sdk import AssistantMessage, TextBlock
-
-    msg = AssistantMessage(content=[TextBlock(text="hi")], model="test")
-    assert hasattr(msg, "content")
-    assert hasattr(msg, "model")
-
-
-def test_result_message_has_required_attrs():
-    from claude_agent_sdk import ResultMessage
-
-    msg = ResultMessage(
-        subtype="success",
-        duration_ms=100,
-        duration_api_ms=50,
-        is_error=False,
-        num_turns=1,
-        session_id="s1",
-    )
-    assert msg.subtype == "success"
-    assert hasattr(msg, "result")
-
-
-def test_system_message_has_subtype_and_data():
-    from claude_agent_sdk import SystemMessage
-
-    msg = SystemMessage(subtype="init", data={})
-    assert msg.subtype == "init"
-    assert msg.data == {}
-
-
-def test_user_message_has_parent_tool_use_id():
-    from claude_agent_sdk import UserMessage
-
-    msg = UserMessage(content="test")
-    assert hasattr(msg, "parent_tool_use_id")
-    assert hasattr(msg, "tool_use_result")
-
-
-def test_tool_use_block_has_id_name_input():
-    from claude_agent_sdk import ToolUseBlock
-
-    block = ToolUseBlock(id="t1", name="test", input={"key": "val"})
-    assert block.id == "t1"
-    assert block.name == "test"
-    assert block.input == {"key": "val"}
-
-
-def test_tool_result_block_has_required_attrs():
-    from claude_agent_sdk import ToolResultBlock
-
-    block = ToolResultBlock(tool_use_id="t1", content="result")
-    assert block.tool_use_id == "t1"
-    assert block.content == "result"
-    assert hasattr(block, "is_error")
-
-
-# ---------------------------------------------------------------------------
-# Hook types
-# ---------------------------------------------------------------------------
-
-
-@pytest.mark.parametrize(
-    "hook_event",
-    ["PreToolUse", "PostToolUse", "Stop"],
-)
-def test_sdk_exports_hook_event_type(hook_event: str):
-    """Verify HookEvent literal includes the events our security_hooks use."""
-    from claude_agent_sdk.types import HookEvent
-
-    # HookEvent is a Literal type — check that our events are valid values.
-    # We can't easily inspect Literal at runtime, so just verify the type exists.
-    assert HookEvent is not None
--- a/autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
@@ -6,6 +6,7 @@ ensuring multi-user isolation and preventing unauthorized operations.

 import json
 import logging
+import os
 import re
 from collections.abc import Callable
 from typing import Any, cast
@@ -15,7 +16,6 @@ from .tool_adapter import (
    DANGEROUS_PATTERNS,
    MCP_TOOL_PREFIX,
    WORKSPACE_SCOPED_TOOLS,
-    is_allowed_local_path,
    stash_pending_tool_output,
 )

@@ -38,20 +38,40 @@ def _validate_workspace_path(
 ) -> dict[str, Any]:
    """Validate that a workspace-scoped tool only accesses allowed paths.

-    Delegates to :func:`is_allowed_local_path` which permits:
+    Allowed directories:
    - The SDK working directory (``/tmp/copilot-<session>/``)
-    - The current session's tool-results directory
-      (``~/.claude/projects/<encoded-cwd>/tool-results/``)
+    - The SDK tool-results directory (``~/.claude/projects/…/tool-results/``)
    """
    path = tool_input.get("file_path") or tool_input.get("path") or ""
    if not path:
        # Glob/Grep without a path default to cwd which is already sandboxed
        return {}

-    if is_allowed_local_path(path, sdk_cwd):
+    # Resolve relative paths against sdk_cwd (the SDK sets cwd so the LLM
+    # naturally uses relative paths like "test.txt" instead of absolute ones).
+    # Tilde paths (~/) are home-dir references, not relative — expand first.
+    if path.startswith("~"):
+        resolved = os.path.realpath(os.path.expanduser(path))
+    elif not os.path.isabs(path) and sdk_cwd:
+        resolved = os.path.realpath(os.path.join(sdk_cwd, path))
+    else:
+        resolved = os.path.realpath(path)
+
+    # Allow access within the SDK working directory
+    if sdk_cwd:
+        norm_cwd = os.path.realpath(sdk_cwd)
+        if resolved.startswith(norm_cwd + os.sep) or resolved == norm_cwd:
+            return {}
+
+    # Allow access to ~/.claude/projects/*/tool-results/ (big tool results)
+    claude_dir = os.path.realpath(os.path.expanduser("~/.claude/projects"))
+    tool_results_seg = os.sep + "tool-results" + os.sep
+    if resolved.startswith(claude_dir + os.sep) and tool_results_seg in resolved:
        return {}

-    logger.warning(f"Blocked {tool_name} outside workspace: {path}")
+    logger.warning(
+        f"Blocked {tool_name} outside workspace: {path} (resolved={resolved})"
+    )
    workspace_hint = f" Allowed workspace: {sdk_cwd}" if sdk_cwd else ""
    return _deny(
        f"[SECURITY] Tool '{tool_name}' can only access files within the workspace "
@@ -104,20 +124,20 @@ def _validate_user_isolation(
    """Validate that tool calls respect user isolation."""
    # For workspace file tools, ensure path doesn't escape
    if "workspace" in tool_name.lower():
-        # The "path" param is a cloud storage key (e.g. "/ASEAN/report.md")
-        # where a leading "/" is normal.  Only check for ".." traversal.
-        # Filesystem paths (source_path, save_to_path) are validated inside
-        # the tool itself via _validate_ephemeral_path.
        path = tool_input.get("path", "") or tool_input.get("file_path", "")
-        if path and ".." in path:
-            logger.warning(f"Blocked path traversal attempt: {path} by user {user_id}")
-            return {
-                "hookSpecificOutput": {
-                    "hookEventName": "PreToolUse",
-                    "permissionDecision": "deny",
-                    "permissionDecisionReason": "Path traversal not allowed",
+        if path:
+            # Check for path traversal
+            if ".." in path or path.startswith("/"):
+                logger.warning(
+                    f"Blocked path traversal attempt: {path} by user {user_id}"
+                )
+                return {
+                    "hookSpecificOutput": {
+                        "hookEventName": "PreToolUse",
+                        "permissionDecision": "deny",
+                        "permissionDecisionReason": "Path traversal not allowed",
+                    }
                }
-            }

    return {}

@@ -126,7 +146,7 @@ def create_security_hooks(
    user_id: str | None,
    sdk_cwd: str | None = None,
    max_subtasks: int = 3,
-    on_compact: Callable[[], None] | None = None,
+    on_stop: Callable[[str, str], None] | None = None,
 ) -> dict[str, Any]:
    """Create the security hooks configuration for Claude Agent SDK.

@@ -135,12 +155,15 @@ def create_security_hooks(
    - PostToolUse: Log successful tool executions
    - PostToolUseFailure: Log and handle failed tool executions
    - PreCompact: Log context compaction events (SDK handles compaction automatically)
+    - Stop: Capture transcript path for stateless resume (when *on_stop* is provided)

    Args:
        user_id: Current user ID for isolation validation
        sdk_cwd: SDK working directory for workspace-scoped tool validation
-        max_subtasks: Maximum concurrent Task (sub-agent) spawns allowed per session
-        on_compact: Callback invoked when SDK starts compacting context.
+        max_subtasks: Maximum Task (sub-agent) spawns allowed per session
+        on_stop: Callback ``(transcript_path, sdk_session_id)`` invoked when
+            the SDK finishes processing — used to read the JSONL transcript
+            before the CLI process exits.

    Returns:
        Hooks configuration dict for ClaudeAgentOptions
@@ -149,9 +172,8 @@ def create_security_hooks(
        from claude_agent_sdk import HookMatcher
        from claude_agent_sdk.types import HookContext, HookInput, SyncHookJSONOutput

-        # Per-session tracking for Task sub-agent concurrency.
-        # Set of tool_use_ids that consumed a slot — len() is the active count.
-        task_tool_use_ids: set[str] = set()
+        # Per-session counter for Task sub-agent spawns
+        task_spawn_count = 0

        async def pre_tool_use_hook(
            input_data: HookInput,
@@ -159,34 +181,23 @@ def create_security_hooks(
            context: HookContext,
        ) -> SyncHookJSONOutput:
            """Combined pre-tool-use validation hook."""
+            nonlocal task_spawn_count
            _ = context  # unused but required by signature
            tool_name = cast(str, input_data.get("tool_name", ""))
            tool_input = cast(dict[str, Any], input_data.get("tool_input", {}))

            # Rate-limit Task (sub-agent) spawns per session
            if tool_name == "Task":
-                # Block background task execution first — denied calls
-                # should not consume a subtask slot.
-                if tool_input.get("run_in_background"):
-                    logger.info(f"[SDK] Blocked background Task, user={user_id}")
-                    return cast(
-                        SyncHookJSONOutput,
-                        _deny(
-                            "Background task execution is not supported. "
-                            "Run tasks in the foreground instead "
-                            "(remove the run_in_background parameter)."
-                        ),
-                    )
-                if len(task_tool_use_ids) >= max_subtasks:
+                task_spawn_count += 1
+                if task_spawn_count > max_subtasks:
                    logger.warning(
                        f"[SDK] Task limit reached ({max_subtasks}), user={user_id}"
                    )
                    return cast(
                        SyncHookJSONOutput,
                        _deny(
-                            f"Maximum {max_subtasks} concurrent sub-tasks. "
-                            "Wait for running sub-tasks to finish, "
-                            "or continue in the main conversation."
+                            f"Maximum {max_subtasks} sub-tasks per session. "
+                            "Please continue in the main conversation."
                        ),
                    )

@@ -206,24 +217,9 @@ def create_security_hooks(
            if result:
                return cast(SyncHookJSONOutput, result)

-            # Reserve the Task slot only after all validations pass
-            if tool_name == "Task" and tool_use_id is not None:
-                task_tool_use_ids.add(tool_use_id)
-
            logger.debug(f"[SDK] Tool start: {tool_name}, user={user_id}")
            return cast(SyncHookJSONOutput, {})

-        def _release_task_slot(tool_name: str, tool_use_id: str | None) -> None:
-            """Release a Task concurrency slot if one was reserved."""
-            if tool_name == "Task" and tool_use_id in task_tool_use_ids:
-                task_tool_use_ids.discard(tool_use_id)
-                logger.info(
-                    "[SDK] Task slot released, active=%d/%d, user=%s",
-                    len(task_tool_use_ids),
-                    max_subtasks,
-                    user_id,
-                )
-
        async def post_tool_use_hook(
            input_data: HookInput,
            tool_use_id: str | None,
@@ -238,35 +234,15 @@ def create_security_hooks(
            """
            _ = context
            tool_name = cast(str, input_data.get("tool_name", ""))
-
-            _release_task_slot(tool_name, tool_use_id)
-            is_builtin = not tool_name.startswith(MCP_TOOL_PREFIX)
-            logger.info(
-                "[SDK] PostToolUse: %s (builtin=%s, tool_use_id=%s)",
-                tool_name,
-                is_builtin,
-                (tool_use_id or "")[:12],
-            )
+            logger.debug(f"[SDK] Tool success: {tool_name}, tool_use_id={tool_use_id}")

            # Stash output for SDK built-in tools so the response adapter can
            # emit StreamToolOutputAvailable even when the CLI doesn't surface
            # a separate UserMessage with ToolResultBlock content.
-            if is_builtin:
+            if not tool_name.startswith(MCP_TOOL_PREFIX):
                tool_response = input_data.get("tool_response")
                if tool_response is not None:
-                    resp_preview = str(tool_response)[:100]
-                    logger.info(
-                        "[SDK] Stashing builtin output for %s (%d chars): %s...",
-                        tool_name,
-                        len(str(tool_response)),
-                        resp_preview,
-                    )
                    stash_pending_tool_output(tool_name, tool_response)
-                else:
-                    logger.warning(
-                        "[SDK] PostToolUse for builtin %s but tool_response is None",
-                        tool_name,
-                    )

            return cast(SyncHookJSONOutput, {})

@@ -283,9 +259,6 @@ def create_security_hooks(
                f"[SDK] Tool failed: {tool_name}, error={error}, "
                f"user={user_id}, tool_use_id={tool_use_id}"
            )
-
-            _release_task_slot(tool_name, tool_use_id)
-
            return cast(SyncHookJSONOutput, {})

        async def pre_compact_hook(
@@ -303,8 +276,30 @@ def create_security_hooks(
            logger.info(
                f"[SDK] Context compaction triggered: {trigger}, user={user_id}"
            )
-            if on_compact is not None:
-                on_compact()
+            return cast(SyncHookJSONOutput, {})
+
+        # --- Stop hook: capture transcript path for stateless resume ---
+        async def stop_hook(
+            input_data: HookInput,
+            tool_use_id: str | None,
+            context: HookContext,
+        ) -> SyncHookJSONOutput:
+            """Capture transcript path when SDK finishes processing.
+
+            The Stop hook fires while the CLI process is still alive, giving us
+            a reliable window to read the JSONL transcript before SIGTERM.
+            """
+            _ = context, tool_use_id
+            transcript_path = cast(str, input_data.get("transcript_path", ""))
+            sdk_session_id = cast(str, input_data.get("session_id", ""))
+
+            if transcript_path and on_stop:
+                logger.info(
+                    f"[SDK] Stop hook: transcript_path={transcript_path}, "
+                    f"sdk_session_id={sdk_session_id[:12]}..."
+                )
+                on_stop(transcript_path, sdk_session_id)
+
            return cast(SyncHookJSONOutput, {})

        hooks: dict[str, Any] = {
@@ -316,6 +311,9 @@ def create_security_hooks(
            "PreCompact": [HookMatcher(matcher="*", hooks=[pre_compact_hook])],
        }

+        if on_stop is not None:
+            hooks["Stop"] = [HookMatcher(matcher=None, hooks=[stop_hook])]
+
        return hooks
    except ImportError:
        # Fallback for when SDK isn't available - return empty hooks
--- a/autogpt_platform/backend/backend/copilot/sdk/security_hooks_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/security_hooks_test.py
@@ -7,23 +7,11 @@ tool access, and dangerous input patterns.

 import os

-import pytest
-
 from .security_hooks import _validate_tool_access, _validate_user_isolation
-from .service import _is_tool_error_or_denial

 SDK_CWD = "/tmp/copilot-abc123"


-def _sdk_available() -> bool:
-    try:
-        import claude_agent_sdk  # noqa: F401
-
-        return True
-    except ImportError:
-        return False
-
-
 def _is_denied(result: dict) -> bool:
    hook = result.get("hookSpecificOutput", {})
    return hook.get("permissionDecision") == "deny"
@@ -120,31 +108,17 @@ def test_read_no_cwd_denies_absolute():


 def test_read_tool_results_allowed():
-    from .tool_adapter import _current_project_dir
-
    home = os.path.expanduser("~")
    path = f"{home}/.claude/projects/-tmp-copilot-abc123/tool-results/12345.txt"
-    # is_allowed_local_path requires the session's encoded cwd to be set
-    token = _current_project_dir.set("-tmp-copilot-abc123")
-    try:
-        result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
-        assert result == {}
-    finally:
-        _current_project_dir.reset(token)
+    result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
+    assert result == {}


-def test_read_claude_projects_session_dir_allowed():
-    """Files within the current session's project dir are allowed."""
-    from .tool_adapter import _current_project_dir
-
+def test_read_claude_projects_without_tool_results_denied():
    home = os.path.expanduser("~")
    path = f"{home}/.claude/projects/-tmp-copilot-abc123/settings.json"
-    token = _current_project_dir.set("-tmp-copilot-abc123")
-    try:
-        result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
-        assert not _is_denied(result)
-    finally:
-        _current_project_dir.reset(token)
+    result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
+    assert _is_denied(result)


 # -- Built-in Bash is blocked (use bash_exec MCP tool instead) ---------------
@@ -179,12 +153,11 @@ def test_workspace_path_traversal_blocked():
    assert _is_denied(result)


-def test_workspace_absolute_path_allowed():
-    """Workspace 'path' is a cloud storage key — leading '/' is normal."""
+def test_workspace_absolute_path_blocked():
    result = _validate_user_isolation(
-        "workspace_read", {"path": "/ASEAN/report.md"}, user_id="user-1"
+        "workspace_read", {"path": "/etc/passwd"}, user_id="user-1"
    )
-    assert result == {}
+    assert _is_denied(result)


 def test_workspace_normal_path_allowed():
@@ -215,218 +188,3 @@ def test_bash_builtin_blocked_message_clarity():
    reason = _reason(_validate_tool_access("Bash", {"command": "echo hello"}))
    assert "[SECURITY]" in reason
    assert "cannot be bypassed" in reason
-
-
-# -- Task sub-agent hooks (require SDK) --------------------------------------
-
-
-@pytest.fixture()
-def _hooks():
-    """Create security hooks and return (pre, post, post_failure) handlers."""
-    from .security_hooks import create_security_hooks
-
-    hooks = create_security_hooks(user_id="u1", sdk_cwd=SDK_CWD, max_subtasks=2)
-    pre = hooks["PreToolUse"][0].hooks[0]
-    post = hooks["PostToolUse"][0].hooks[0]
-    post_failure = hooks["PostToolUseFailure"][0].hooks[0]
-    return pre, post, post_failure
-
-
-@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
-@pytest.mark.asyncio
-async def test_task_background_blocked(_hooks):
-    """Task with run_in_background=true must be denied."""
-    pre, _, _ = _hooks
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"run_in_background": True, "prompt": "x"}},
-        tool_use_id=None,
-        context={},
-    )
-    assert _is_denied(result)
-    assert "foreground" in _reason(result).lower()
-
-
-@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
-@pytest.mark.asyncio
-async def test_task_foreground_allowed(_hooks):
-    """Task without run_in_background should be allowed."""
-    pre, _, _ = _hooks
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"prompt": "do stuff"}},
-        tool_use_id="tu-1",
-        context={},
-    )
-    assert not _is_denied(result)
-
-
-@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
-@pytest.mark.asyncio
-async def test_task_limit_enforced(_hooks):
-    """Task spawns beyond max_subtasks should be denied."""
-    pre, _, _ = _hooks
-    # First two should pass
-    for i in range(2):
-        result = await pre(
-            {"tool_name": "Task", "tool_input": {"prompt": "ok"}},
-            tool_use_id=f"tu-limit-{i}",
-            context={},
-        )
-        assert not _is_denied(result)
-
-    # Third should be denied (limit=2)
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"prompt": "over limit"}},
-        tool_use_id="tu-limit-2",
-        context={},
-    )
-    assert _is_denied(result)
-    assert "Maximum" in _reason(result)
-
-
-@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
-@pytest.mark.asyncio
-async def test_task_slot_released_on_completion(_hooks):
-    """Completing a Task should free a slot so new Tasks can be spawned."""
-    pre, post, _ = _hooks
-    # Fill both slots
-    for i in range(2):
-        result = await pre(
-            {"tool_name": "Task", "tool_input": {"prompt": "ok"}},
-            tool_use_id=f"tu-comp-{i}",
-            context={},
-        )
-        assert not _is_denied(result)
-
-    # Third should be denied — at capacity
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"prompt": "over"}},
-        tool_use_id="tu-comp-2",
-        context={},
-    )
-    assert _is_denied(result)
-
-    # Complete first task — frees a slot
-    await post(
-        {"tool_name": "Task", "tool_input": {}},
-        tool_use_id="tu-comp-0",
-        context={},
-    )
-
-    # Now a new Task should be allowed
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"prompt": "after release"}},
-        tool_use_id="tu-comp-3",
-        context={},
-    )
-    assert not _is_denied(result)
-
-
-@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
-@pytest.mark.asyncio
-async def test_task_slot_released_on_failure(_hooks):
-    """A failed Task should also free its concurrency slot."""
-    pre, _, post_failure = _hooks
-    # Fill both slots
-    for i in range(2):
-        result = await pre(
-            {"tool_name": "Task", "tool_input": {"prompt": "ok"}},
-            tool_use_id=f"tu-fail-{i}",
-            context={},
-        )
-        assert not _is_denied(result)
-
-    # At capacity
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"prompt": "over"}},
-        tool_use_id="tu-fail-2",
-        context={},
-    )
-    assert _is_denied(result)
-
-    # Fail first task — should free a slot
-    await post_failure(
-        {"tool_name": "Task", "tool_input": {}, "error": "something broke"},
-        tool_use_id="tu-fail-0",
-        context={},
-    )
-
-    # New Task should be allowed
-    result = await pre(
-        {"tool_name": "Task", "tool_input": {"prompt": "after failure"}},
-        tool_use_id="tu-fail-3",
-        context={},
-    )
-    assert not _is_denied(result)
-
-
-# -- _is_tool_error_or_denial ------------------------------------------------
-
-
-class TestIsToolErrorOrDenial:
-    def test_none_content(self):
-        assert _is_tool_error_or_denial(None) is False
-
-    def test_empty_content(self):
-        assert _is_tool_error_or_denial("") is False
-
-    def test_benign_output(self):
-        assert _is_tool_error_or_denial("All good, no issues.") is False
-
-    def test_security_marker(self):
-        assert _is_tool_error_or_denial("[SECURITY] Tool access blocked") is True
-
-    def test_cannot_be_bypassed(self):
-        assert _is_tool_error_or_denial("This restriction cannot be bypassed.") is True
-
-    def test_not_allowed(self):
-        assert _is_tool_error_or_denial("Operation not allowed in sandbox") is True
-
-    def test_background_task_denial(self):
-        assert (
-            _is_tool_error_or_denial(
-                "Background task execution is not supported. "
-                "Run tasks in the foreground instead."
-            )
-            is True
-        )
-
-    def test_subtask_limit_denial(self):
-        assert (
-            _is_tool_error_or_denial(
-                "Maximum 2 concurrent sub-tasks. "
-                "Wait for running sub-tasks to finish, "
-                "or continue in the main conversation."
-            )
-            is True
-        )
-
-    def test_denied_marker(self):
-        assert (
-            _is_tool_error_or_denial("Access denied: insufficient privileges") is True
-        )
-
-    def test_blocked_marker(self):
-        assert _is_tool_error_or_denial("Request blocked by security policy") is True
-
-    def test_failed_marker(self):
-        assert _is_tool_error_or_denial("Failed to execute tool: timeout") is True
-
-    def test_mcp_iserror(self):
-        assert _is_tool_error_or_denial('{"isError": true, "content": []}') is True
-
-    def test_benign_error_in_value(self):
-        """Content like '0 errors found' should not trigger — 'error' was removed."""
-        assert _is_tool_error_or_denial("0 errors found") is False
-
-    def test_benign_permission_field(self):
-        """Schema descriptions mentioning 'permission' should not trigger."""
-        assert (
-            _is_tool_error_or_denial(
-                '{"fields": [{"name": "permission_level", "type": "int"}]}'
-            )
-            is False
-        )
-
-    def test_benign_not_found_in_listing(self):
-        """File listing containing 'not found' in filenames should not trigger."""
-        assert _is_tool_error_or_denial("readme.md\nfile-not-found-handler.py") is False
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
--- a/autogpt_platform/backend/backend/copilot/sdk/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service_test.py
@@ -1,247 +0,0 @@
-"""Tests for SDK service helpers."""
-
-import base64
-import os
-from dataclasses import dataclass
-from unittest.mock import AsyncMock, patch
-
-import pytest
-
-from .service import _prepare_file_attachments
-
-
-@dataclass
-class _FakeFileInfo:
-    id: str
-    name: str
-    path: str
-    mime_type: str
-    size_bytes: int
-
-
-_PATCH_TARGET = "backend.copilot.sdk.service.get_manager"
-
-
-class TestPrepareFileAttachments:
-    @pytest.mark.asyncio
-    async def test_empty_list_returns_empty(self, tmp_path):
-        result = await _prepare_file_attachments([], "u", "s", str(tmp_path))
-        assert result.hint == ""
-        assert result.image_blocks == []
-
-    @pytest.mark.asyncio
-    async def test_image_embedded_as_vision_block(self, tmp_path):
-        """JPEG images should become vision content blocks, not files on disk."""
-        raw = b"\xff\xd8\xff\xe0fake-jpeg"
-        info = _FakeFileInfo(
-            id="abc",
-            name="photo.jpg",
-            path="/photo.jpg",
-            mime_type="image/jpeg",
-            size_bytes=len(raw),
-        )
-        mgr = AsyncMock()
-        mgr.get_file_info.return_value = info
-        mgr.read_file_by_id.return_value = raw
-
-        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mgr):
-            result = await _prepare_file_attachments(
-                ["abc"], "user1", "sess1", str(tmp_path)
-            )
-
-        assert "1 file" in result.hint
-        assert "photo.jpg" in result.hint
-        assert "embedded as image" in result.hint
-        assert len(result.image_blocks) == 1
-        block = result.image_blocks[0]
-        assert block["type"] == "image"
-        assert block["source"]["media_type"] == "image/jpeg"
-        assert block["source"]["data"] == base64.b64encode(raw).decode("ascii")
-        # Image should NOT be written to disk (embedded instead)
-        assert not os.path.exists(os.path.join(tmp_path, "photo.jpg"))
-
-    @pytest.mark.asyncio
-    async def test_pdf_saved_to_disk(self, tmp_path):
-        """PDFs should be saved to disk for Read tool access, not embedded."""
-        info = _FakeFileInfo("f1", "doc.pdf", "/doc.pdf", "application/pdf", 50)
-        mgr = AsyncMock()
-        mgr.get_file_info.return_value = info
-        mgr.read_file_by_id.return_value = b"%PDF-1.4 fake"
-
-        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mgr):
-            result = await _prepare_file_attachments(["f1"], "u", "s", str(tmp_path))
-
-        assert result.image_blocks == []
-        saved = tmp_path / "doc.pdf"
-        assert saved.exists()
-        assert saved.read_bytes() == b"%PDF-1.4 fake"
-        assert str(saved) in result.hint
-
-    @pytest.mark.asyncio
-    async def test_mixed_images_and_files(self, tmp_path):
-        """Images become blocks, non-images go to disk."""
-        infos = {
-            "id1": _FakeFileInfo("id1", "a.png", "/a.png", "image/png", 4),
-            "id2": _FakeFileInfo("id2", "b.pdf", "/b.pdf", "application/pdf", 4),
-            "id3": _FakeFileInfo("id3", "c.txt", "/c.txt", "text/plain", 4),
-        }
-        mgr = AsyncMock()
-        mgr.get_file_info.side_effect = lambda fid: infos[fid]
-        mgr.read_file_by_id.return_value = b"data"
-
-        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mgr):
-            result = await _prepare_file_attachments(
-                ["id1", "id2", "id3"], "u", "s", str(tmp_path)
-            )
-
-        assert "3 files" in result.hint
-        assert "a.png" in result.hint
-        assert "b.pdf" in result.hint
-        assert "c.txt" in result.hint
-        # Only the image should be a vision block
-        assert len(result.image_blocks) == 1
-        assert result.image_blocks[0]["source"]["media_type"] == "image/png"
-        # Non-image files should be on disk
-        assert (tmp_path / "b.pdf").exists()
-        assert (tmp_path / "c.txt").exists()
-        # Read tool hint should appear (has non-image files)
-        assert "Read tool" in result.hint
-
-    @pytest.mark.asyncio
-    async def test_singular_noun(self, tmp_path):
-        info = _FakeFileInfo("x", "only.txt", "/only.txt", "text/plain", 2)
-        mgr = AsyncMock()
-        mgr.get_file_info.return_value = info
-        mgr.read_file_by_id.return_value = b"hi"
-
-        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mgr):
-            result = await _prepare_file_attachments(["x"], "u", "s", str(tmp_path))
-
-        assert "1 file." in result.hint
-
-    @pytest.mark.asyncio
-    async def test_missing_file_skipped(self, tmp_path):
-        mgr = AsyncMock()
-        mgr.get_file_info.return_value = None
-
-        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mgr):
-            result = await _prepare_file_attachments(
-                ["missing-id"], "u", "s", str(tmp_path)
-            )
-
-        assert result.hint == ""
-        assert result.image_blocks == []
-
-    @pytest.mark.asyncio
-    async def test_image_only_no_read_hint(self, tmp_path):
-        """When all files are images, no Read tool hint should appear."""
-        info = _FakeFileInfo("i1", "cat.png", "/cat.png", "image/png", 4)
-        mgr = AsyncMock()
-        mgr.get_file_info.return_value = info
-        mgr.read_file_by_id.return_value = b"data"
-
-        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mgr):
-            result = await _prepare_file_attachments(["i1"], "u", "s", str(tmp_path))
-
-        assert "Read tool" not in result.hint
-        assert len(result.image_blocks) == 1
-
-
-class TestPromptSupplement:
-    """Tests for centralized prompt supplement generation."""
-
-    def test_sdk_supplement_excludes_tool_docs(self):
-        """SDK mode should NOT include tool documentation (Claude gets schemas automatically)."""
-        from backend.copilot.prompting import get_sdk_supplement
-
-        # Test both local and E2B modes
-        local_supplement = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")
-        e2b_supplement = get_sdk_supplement(use_e2b=True, cwd="")
-
-        # Should NOT have tool list section
-        assert "## AVAILABLE TOOLS" not in local_supplement
-        assert "## AVAILABLE TOOLS" not in e2b_supplement
-
-        # Should still have technical notes
-        assert "## Tool notes" in local_supplement
-        assert "## Tool notes" in e2b_supplement
-
-    def test_baseline_supplement_includes_tool_docs(self):
-        """Baseline mode MUST include tool documentation (direct API needs it)."""
-        from backend.copilot.prompting import get_baseline_supplement
-
-        supplement = get_baseline_supplement()
-
-        # MUST have tool list section
-        assert "## AVAILABLE TOOLS" in supplement
-
-        # Should NOT have environment-specific notes (SDK-only)
-        assert "## Tool notes" not in supplement
-
-    def test_baseline_supplement_includes_key_tools(self):
-        """Baseline supplement should document all essential tools."""
-        from backend.copilot.prompting import get_baseline_supplement
-        from backend.copilot.tools import TOOL_REGISTRY
-
-        docs = get_baseline_supplement()
-
-        # Core agent workflow tools (always available)
-        assert "`create_agent`" in docs
-        assert "`run_agent`" in docs
-        assert "`find_library_agent`" in docs
-        assert "`edit_agent`" in docs
-
-        # MCP integration (always available)
-        assert "`run_mcp_tool`" in docs
-
-        # Folder management (always available)
-        assert "`create_folder`" in docs
-
-        # Browser tools only if available (Playwright may not be installed in CI)
-        if (
-            TOOL_REGISTRY.get("browser_navigate")
-            and TOOL_REGISTRY["browser_navigate"].is_available
-        ):
-            assert "`browser_navigate`" in docs
-
-    def test_baseline_supplement_includes_workflows(self):
-        """Baseline supplement should include workflow guidance in tool descriptions."""
-        from backend.copilot.prompting import get_baseline_supplement
-
-        docs = get_baseline_supplement()
-
-        # Workflows are now in individual tool descriptions (not separate sections)
-        # Check that key workflow concepts appear in tool descriptions
-        assert "suggested_goal" in docs or "clarifying_questions" in docs
-        assert "run_mcp_tool" in docs
-
-    def test_baseline_supplement_completeness(self):
-        """All available tools from TOOL_REGISTRY should appear in baseline supplement."""
-        from backend.copilot.prompting import get_baseline_supplement
-        from backend.copilot.tools import TOOL_REGISTRY
-
-        docs = get_baseline_supplement()
-
-        # Verify each available registered tool is documented
-        # (matches _generate_tool_documentation which filters by is_available)
-        for tool_name, tool in TOOL_REGISTRY.items():
-            if not tool.is_available:
-                continue
-            assert (
-                f"`{tool_name}`" in docs
-            ), f"Tool '{tool_name}' missing from baseline supplement"
-
-    def test_baseline_supplement_no_duplicate_tools(self):
-        """No tool should appear multiple times in baseline supplement."""
-        from backend.copilot.prompting import get_baseline_supplement
-        from backend.copilot.tools import TOOL_REGISTRY
-
-        docs = get_baseline_supplement()
-
-        # Count occurrences of each available tool in the entire supplement
-        for tool_name, tool in TOOL_REGISTRY.items():
-            if not tool.is_available:
-                continue
-            # Count how many times this tool appears as a bullet point
-            count = docs.count(f"- **`{tool_name}`**")
-            assert count == 1, f"Tool '{tool_name}' appears {count} times (should be 1)"
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
@@ -2,91 +2,32 @@

 This module provides the adapter layer that converts existing BaseTool implementations
 into in-process MCP tools that can be used with the Claude Agent SDK.
+
+Long-running tools (``is_long_running=True``) are delegated to the non-SDK
+background infrastructure (stream_registry, Redis persistence, SSE reconnection)
+via a callback provided by the service layer.  This avoids wasteful SDK polling
+and makes results survive page refreshes.
 """

-import asyncio
 import itertools
 import json
 import logging
 import os
-import re
 import uuid
+from collections.abc import Awaitable, Callable
 from contextvars import ContextVar
-from typing import TYPE_CHECKING, Any
-
-from claude_agent_sdk import create_sdk_mcp_server, tool
+from typing import Any

 from backend.copilot.model import ChatSession
 from backend.copilot.tools import TOOL_REGISTRY
 from backend.copilot.tools.base import BaseTool
-from backend.util.truncate import truncate
-
-from .e2b_file_tools import E2B_FILE_TOOL_NAMES, E2B_FILE_TOOLS
-
-if TYPE_CHECKING:
-    from e2b import AsyncSandbox

 logger = logging.getLogger(__name__)

 # Allowed base directory for the Read tool (SDK saves oversized tool results here).
 # Restricted to ~/.claude/projects/ and further validated to require "tool-results"
 # in the path — prevents reading settings, credentials, or other sensitive files.
-_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
-
-# Max MCP response size in chars — keeps tool output under the SDK's 10 MB JSON buffer.
-_MCP_MAX_CHARS = 500_000
-
-# Context variable holding the encoded project directory name for the current
-# session (e.g. "-private-tmp-copilot-<uuid>").  Set by set_execution_context()
-# so that path validation can scope tool-results reads to the current session.
-_current_project_dir: ContextVar[str] = ContextVar("_current_project_dir", default="")
-
-
-def _encode_cwd_for_cli(cwd: str) -> str:
-    """Encode a working directory path the same way the Claude CLI does.
-
-    The CLI replaces all non-alphanumeric characters with ``-``.
-    """
-    return re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(cwd))
-
-
-def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
-    """Check whether *path* is an allowed host-filesystem path.
-
-    Allowed:
-    - Files under *sdk_cwd* (``/tmp/copilot-<session>/``)
-    - Files under ``~/.claude/projects/<encoded-cwd>/`` — the SDK's
-      project directory for this session (tool-results, transcripts, etc.)
-
-    Both checks are scoped to the **current session** so sessions cannot
-    read each other's data.
-    """
-    if not path:
-        return False
-
-    if path.startswith("~"):
-        resolved = os.path.realpath(os.path.expanduser(path))
-    elif not os.path.isabs(path) and sdk_cwd:
-        resolved = os.path.realpath(os.path.join(sdk_cwd, path))
-    else:
-        resolved = os.path.realpath(path)
-
-    # Allow access within the SDK working directory
-    if sdk_cwd:
-        norm_cwd = os.path.realpath(sdk_cwd)
-        if resolved == norm_cwd or resolved.startswith(norm_cwd + os.sep):
-            return True
-
-    # Allow access within the current session's CLI project directory
-    # (~/.claude/projects/<encoded-cwd>/).
-    encoded = _current_project_dir.get("")
-    if encoded:
-        session_project = os.path.join(_SDK_PROJECTS_DIR, encoded)
-        if resolved == session_project or resolved.startswith(session_project + os.sep):
-            return True
-
-    return False
-
+_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects")) + os.sep

 # MCP server naming - the SDK prefixes tool names as "mcp__{server_name}__{tool}"
 MCP_SERVER_NAME = "copilot"
@@ -97,66 +38,45 @@ _current_user_id: ContextVar[str | None] = ContextVar("current_user_id", default
 _current_session: ContextVar[ChatSession | None] = ContextVar(
    "current_session", default=None
 )
-# E2B cloud sandbox for the current turn (None when E2B is not configured).
-# Passed to bash_exec so commands run on E2B instead of the local bwrap sandbox.
-_current_sandbox: ContextVar["AsyncSandbox | None"] = ContextVar(
-    "_current_sandbox", default=None
-)
-# Raw SDK working directory path (e.g. /tmp/copilot-<session_id>).
-# Used by workspace tools to save binary files for the CLI's built-in Read.
-_current_sdk_cwd: ContextVar[str] = ContextVar("_current_sdk_cwd", default="")
-
 # Stash for MCP tool outputs before the SDK potentially truncates them.
 # Keyed by tool_name → full output string. Consumed (popped) by the
 # response adapter when it builds StreamToolOutputAvailable.
 _pending_tool_outputs: ContextVar[dict[str, list[str]]] = ContextVar(
-    "pending_tool_outputs",
-    default=None,  # type: ignore[arg-type]
+    "pending_tool_outputs", default=None  # type: ignore[arg-type]
 )
-# Event signaled whenever stash_pending_tool_output() adds a new entry.
-# Used by the streaming loop to wait for PostToolUse hooks to complete
-# instead of sleeping an arbitrary duration.  The SDK fires hooks via
-# start_soon (fire-and-forget) so the next message can arrive before
-# the hook stashes its output — this event bridges that gap.
-_stash_event: ContextVar[asyncio.Event | None] = ContextVar(
-    "_stash_event", default=None
+
+# Callback type for delegating long-running tools to the non-SDK infrastructure.
+# Args: (tool_name, arguments, session) → MCP-formatted response dict.
+LongRunningCallback = Callable[
+    [str, dict[str, Any], ChatSession], Awaitable[dict[str, Any]]
+]
+
+# ContextVar so the service layer can inject the callback per-request.
+_long_running_callback: ContextVar[LongRunningCallback | None] = ContextVar(
+    "long_running_callback", default=None
 )


 def set_execution_context(
    user_id: str | None,
    session: ChatSession,
-    sandbox: "AsyncSandbox | None" = None,
-    sdk_cwd: str | None = None,
+    long_running_callback: LongRunningCallback | None = None,
 ) -> None:
    """Set the execution context for tool calls.

    This must be called before streaming begins to ensure tools have access
-    to user_id, session, and (optionally) an E2B sandbox for bash execution.
+    to user_id and session information.

    Args:
        user_id: Current user's ID.
        session: Current chat session.
-        sandbox: Optional E2B sandbox; when set, bash_exec routes commands there.
-        sdk_cwd: SDK working directory; used to scope tool-results reads.
+        long_running_callback: Optional callback to delegate long-running tools
+            to the non-SDK background infrastructure (stream_registry + Redis).
    """
    _current_user_id.set(user_id)
    _current_session.set(session)
-    _current_sandbox.set(sandbox)
-    _current_sdk_cwd.set(sdk_cwd or "")
-    _current_project_dir.set(_encode_cwd_for_cli(sdk_cwd) if sdk_cwd else "")
    _pending_tool_outputs.set({})
-    _stash_event.set(asyncio.Event())
-
-
-def get_current_sandbox() -> "AsyncSandbox | None":
-    """Return the E2B sandbox for the current turn, or None."""
-    return _current_sandbox.get()
-
-
-def get_sdk_cwd() -> str:
-    """Return the SDK ephemeral working directory for the current turn."""
-    return _current_sdk_cwd.get()
+    _long_running_callback.set(long_running_callback)


 def get_execution_context() -> tuple[str | None, ChatSession | None]:
@@ -214,43 +134,6 @@ def stash_pending_tool_output(tool_name: str, output: Any) -> None:
        except (TypeError, ValueError):
            text = str(output)
    pending.setdefault(tool_name, []).append(text)
-    # Signal any waiters that new output is available.
-    event = _stash_event.get(None)
-    if event is not None:
-        event.set()
-
-
-async def wait_for_stash(timeout: float = 0.5) -> bool:
-    """Wait for a PostToolUse hook to stash tool output.
-
-    The SDK fires PostToolUse hooks asynchronously via ``start_soon()`` —
-    the next message (AssistantMessage/ResultMessage) can arrive before the
-    hook completes and stashes its output.  This function bridges that gap
-    by waiting on the ``_stash_event``, which is signaled by
-    :func:`stash_pending_tool_output`.
-
-    After the event fires, callers should ``await asyncio.sleep(0)`` to
-    give any remaining concurrent hooks a chance to complete.
-
-    Returns ``True`` if a stash signal was received, ``False`` on timeout.
-    The timeout is a safety net — normally the stash happens within
-    microseconds of yielding to the event loop.
-    """
-    event = _stash_event.get(None)
-    if event is None:
-        return False
-    # Fast path: hook already completed before we got here.
-    if event.is_set():
-        event.clear()
-        return True
-    # Slow path: wait for the hook to signal.
-    try:
-        async with asyncio.timeout(timeout):
-            await event.wait()
-        event.clear()
-        return True
-    except TimeoutError:
-        return False


 async def _execute_tool_sync(
@@ -272,12 +155,66 @@ async def _execute_tool_sync(
        result.output if isinstance(result.output, str) else json.dumps(result.output)
    )

+    # Stash the full output before the SDK potentially truncates it.
+    pending = _pending_tool_outputs.get(None)
+    if pending is not None:
+        pending.setdefault(base_tool.name, []).append(text)
+
+    content_blocks: list[dict[str, str]] = [{"type": "text", "text": text}]
+
+    # If the tool result contains inline image data, add an MCP image block
+    # so Claude can "see" the image (e.g. read_workspace_file on a small PNG).
+    image_block = _extract_image_block(text)
+    if image_block:
+        content_blocks.append(image_block)
+
    return {
-        "content": [{"type": "text", "text": text}],
+        "content": content_blocks,
        "isError": not result.success,
    }


+# MIME types that Claude can process as image content blocks.
+_SUPPORTED_IMAGE_TYPES = frozenset(
+    {"image/png", "image/jpeg", "image/gif", "image/webp"}
+)
+
+
+def _extract_image_block(text: str) -> dict[str, str] | None:
+    """Extract an MCP image content block from a tool result JSON string.
+
+    Detects workspace file responses with ``content_base64`` and an image
+    MIME type, returning an MCP-format image block that allows Claude to
+    "see" the image.  Returns ``None`` if the result is not an inline image.
+    """
+    try:
+        data = json.loads(text)
+    except (json.JSONDecodeError, TypeError):
+        return None
+
+    if not isinstance(data, dict):
+        return None
+
+    mime_type = data.get("mime_type", "")
+    base64_content = data.get("content_base64", "")
+
+    # Only inline small images; large ones would exceed Claude's limits.
+    # 32 KB raw is about 43 KB once base64-encoded (4 output chars per 3 bytes).
+    _MAX_IMAGE_BASE64_BYTES = 43_000
+    if (
+        mime_type in _SUPPORTED_IMAGE_TYPES
+        and base64_content
+        and len(base64_content) <= _MAX_IMAGE_BASE64_BYTES
+    ):
+        return {
+            "type": "image",
+            "data": base64_content,
+            "mimeType": mime_type,
+        }
+
+    return None
+
+
 def _mcp_error(message: str) -> dict[str, Any]:
    return {
        "content": [
@@ -292,6 +229,11 @@ def create_tool_handler(base_tool: BaseTool):

    This wraps the existing BaseTool._execute method to be compatible
    with the Claude Agent SDK MCP tool format.
+
+    Long-running tools (``is_long_running=True``) are delegated to the
+    non-SDK background infrastructure via a callback set in the execution
+    context.  The callback persists the operation in Redis (stream_registry)
+    so results survive page refreshes and pod restarts.
    """

    async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
@@ -301,6 +243,25 @@ def create_tool_handler(base_tool: BaseTool):
        if session is None:
            return _mcp_error("No session context available")

+        # --- Long-running: delegate to non-SDK background infrastructure ---
+        if base_tool.is_long_running:
+            callback = _long_running_callback.get(None)
+            if callback:
+                try:
+                    return await callback(base_tool.name, args, session)
+                except Exception as e:
+                    logger.error(
+                        f"Long-running callback failed for {base_tool.name}: {e}",
+                        exc_info=True,
+                    )
+                    return _mcp_error(f"Failed to start {base_tool.name}: {e}")
+            # No callback — fall through to synchronous execution
+            logger.warning(
+                f"[SDK] No long-running callback for {base_tool.name}, "
+                f"executing synchronously (may block)"
+            )
+
+        # --- Normal (fast) tool: execute synchronously ---
        try:
            return await _execute_tool_sync(base_tool, user_id, session, args)
        except Exception as e:
@@ -320,32 +281,29 @@ def _build_input_schema(base_tool: BaseTool) -> dict[str, Any]:


 async def _read_file_handler(args: dict[str, Any]) -> dict[str, Any]:
-    """Read a local file with optional offset/limit.
+    """Read a file with optional offset/limit (SDK tool-results paths only).

-    Only allows paths that pass :func:`is_allowed_local_path` — the current
-    session's tool-results directory and ephemeral working directory.
+    The file is not deleted here; cleanup runs after the session ends.
    """
    file_path = args.get("file_path", "")
    offset = args.get("offset", 0)
    limit = args.get("limit", 2000)

-    if not is_allowed_local_path(file_path):
+    # Security: only allow reads under ~/.claude/projects/**/tool-results/
+    real_path = os.path.realpath(file_path)
+    if not real_path.startswith(_SDK_PROJECTS_DIR) or "tool-results" not in real_path:
        return {
            "content": [{"type": "text", "text": f"Access denied: {file_path}"}],
            "isError": True,
        }

-    resolved = os.path.realpath(os.path.expanduser(file_path))
    try:
-        with open(resolved) as f:
+        with open(real_path) as f:
            selected = list(itertools.islice(f, offset, offset + limit))
        content = "".join(selected)
        # Cleanup happens in _cleanup_sdk_tool_results after session ends;
        # don't delete here — the SDK may read in multiple chunks.
-        return {
-            "content": [{"type": "text", "text": content}],
-            "isError": False,
-        }
+        return {"content": [{"type": "text", "text": content}], "isError": False}
    except FileNotFoundError:
        return {
            "content": [{"type": "text", "text": f"File not found: {file_path}"}],
@@ -383,86 +341,50 @@ _READ_TOOL_SCHEMA = {
 }


-# ---------------------------------------------------------------------------
-# MCP result helpers
-# ---------------------------------------------------------------------------
-
-
-def _text_from_mcp_result(result: dict[str, Any]) -> str:
-    """Extract concatenated text from an MCP response's content blocks."""
-    content = result.get("content", [])
-    if not isinstance(content, list):
-        return ""
-    return "".join(
-        b.get("text", "")
-        for b in content
-        if isinstance(b, dict) and b.get("type") == "text"
-    )
-
-
-def create_copilot_mcp_server(*, use_e2b: bool = False):
+# Create the MCP server configuration
+def create_copilot_mcp_server():
    """Create an in-process MCP server configuration for CoPilot tools.

-    When *use_e2b* is True, five additional MCP file tools are registered
-    that route directly to the E2B sandbox filesystem, and the caller should
-    disable the corresponding SDK built-in tools via
-    :func:`get_sdk_disallowed_tools`.
+    This can be passed to ClaudeAgentOptions.mcp_servers.
+
+    Note: requires the claude-agent-sdk package.  If it is not installed,
+    the ImportError propagates to the caller so that service.py can
+    handle the fallback.
    """
+    try:
+        from claude_agent_sdk import create_sdk_mcp_server, tool

-    def _truncating(fn, tool_name: str):
-        """Wrap a tool handler so its response is truncated to stay under the
-        SDK's 10 MB JSON buffer, and stash the (truncated) output for the
-        response adapter before the SDK can apply its own head-truncation.
+        # Create decorated tool functions
+        sdk_tools = []

-        Applied once to every registered tool."""
-
-        async def wrapper(args: dict[str, Any]) -> dict[str, Any]:
-            result = await fn(args)
-            truncated = truncate(result, _MCP_MAX_CHARS)
-
-            # Stash the text so the response adapter can forward our
-            # middle-out truncated version to the frontend instead of the
-            # SDK's head-truncated version (for outputs >~100 KB the SDK
-            # persists to tool-results/ with a 2 KB head-only preview).
-            if not truncated.get("isError"):
-                text = _text_from_mcp_result(truncated)
-                if text:
-                    stash_pending_tool_output(tool_name, text)
-
-            return truncated
-
-        return wrapper
-
-    sdk_tools = []
-
-    for tool_name, base_tool in TOOL_REGISTRY.items():
-        handler = create_tool_handler(base_tool)
-        decorated = tool(
-            tool_name,
-            base_tool.description,
-            _build_input_schema(base_tool),
-        )(_truncating(handler, tool_name))
-        sdk_tools.append(decorated)
-
-    # E2B file tools replace SDK built-in Read/Write/Edit/Glob/Grep.
-    if use_e2b:
-        for name, desc, schema, handler in E2B_FILE_TOOLS:
-            decorated = tool(name, desc, schema)(_truncating(handler, name))
+        for tool_name, base_tool in TOOL_REGISTRY.items():
+            handler = create_tool_handler(base_tool)
+            decorated = tool(
+                tool_name,
+                base_tool.description,
+                _build_input_schema(base_tool),
+            )(handler)
            sdk_tools.append(decorated)

-    # Read tool for SDK-truncated tool results (always needed).
-    read_tool = tool(
-        _READ_TOOL_NAME,
-        _READ_TOOL_DESCRIPTION,
-        _READ_TOOL_SCHEMA,
-    )(_truncating(_read_file_handler, _READ_TOOL_NAME))
-    sdk_tools.append(read_tool)
+        # Add the Read tool so the SDK can read back oversized tool results
+        read_tool = tool(
+            _READ_TOOL_NAME,
+            _READ_TOOL_DESCRIPTION,
+            _READ_TOOL_SCHEMA,
+        )(_read_file_handler)
+        sdk_tools.append(read_tool)

-    return create_sdk_mcp_server(
-        name=MCP_SERVER_NAME,
-        version="1.0.0",
-        tools=sdk_tools,
-    )
+        server = create_sdk_mcp_server(
+            name=MCP_SERVER_NAME,
+            version="1.0.0",
+            tools=sdk_tools,
+        )
+
+        return server
+
+    except ImportError:
+        # Let ImportError propagate so service.py handles the fallback
+        raise


 # SDK built-in tools allowed within the workspace directory.
@@ -472,11 +394,16 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
 # Task allows spawning sub-agents (rate-limited by security hooks).
 # WebSearch uses Brave Search via Anthropic's API — safe, no SSRF risk.
 # TodoWrite manages the task checklist shown in the UI — no security concern.
-# In E2B mode, all five are disabled — MCP equivalents provide direct sandbox
-# access.  read_file also handles local tool-results and ephemeral reads.
-_SDK_BUILTIN_FILE_TOOLS = ["Read", "Write", "Edit", "Glob", "Grep"]
-_SDK_BUILTIN_ALWAYS = ["Task", "WebSearch", "TodoWrite"]
-_SDK_BUILTIN_TOOLS = [*_SDK_BUILTIN_FILE_TOOLS, *_SDK_BUILTIN_ALWAYS]
+_SDK_BUILTIN_TOOLS = [
+    "Read",
+    "Write",
+    "Edit",
+    "Glob",
+    "Grep",
+    "Task",
+    "WebSearch",
+    "TodoWrite",
+]

 # SDK built-in tools that must be explicitly blocked.
 # Bash: dangerous — agent uses mcp__copilot__bash_exec with kernel-level
@@ -523,37 +450,11 @@ DANGEROUS_PATTERNS = [
    r"subprocess",
 ]

-# Static tool name list for the non-E2B case (backward compatibility).
+# List of tool names for allowed_tools configuration
+# Include MCP tools, the MCP Read tool for oversized results,
+# and SDK built-in file tools for workspace operations.
 COPILOT_TOOL_NAMES = [
    *[f"{MCP_TOOL_PREFIX}{name}" for name in TOOL_REGISTRY.keys()],
    f"{MCP_TOOL_PREFIX}{_READ_TOOL_NAME}",
    *_SDK_BUILTIN_TOOLS,
 ]
-
-
-def get_copilot_tool_names(*, use_e2b: bool = False) -> list[str]:
-    """Build the ``allowed_tools`` list for :class:`ClaudeAgentOptions`.
-
-    When *use_e2b* is True the SDK built-in file tools are replaced by MCP
-    equivalents that route to the E2B sandbox.
-    """
-    if not use_e2b:
-        return list(COPILOT_TOOL_NAMES)
-
-    return [
-        *[f"{MCP_TOOL_PREFIX}{name}" for name in TOOL_REGISTRY.keys()],
-        f"{MCP_TOOL_PREFIX}{_READ_TOOL_NAME}",
-        *[f"{MCP_TOOL_PREFIX}{name}" for name in E2B_FILE_TOOL_NAMES],
-        *_SDK_BUILTIN_ALWAYS,
-    ]
-
-
-def get_sdk_disallowed_tools(*, use_e2b: bool = False) -> list[str]:
-    """Build the ``disallowed_tools`` list for :class:`ClaudeAgentOptions`.
-
-    When *use_e2b* is True the SDK built-in file tools are also disabled
-    because MCP equivalents provide direct sandbox access.
-    """
-    if not use_e2b:
-        return list(SDK_DISALLOWED_TOOLS)
-    return [*SDK_DISALLOWED_TOOLS, *_SDK_BUILTIN_FILE_TOOLS]
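Note on the inline-image size cap added in `_extract_image_block` above: the following is a minimal standalone sketch, not the module's code. The names `extract_image_block`, `b64_len`, and `MAX_B64_CHARS` are hypothetical stand-ins that mirror the logic in the diff. It shows how the 43,000-character cap relates to raw image size, assuming standard padded base64 (4 output characters per 3 input bytes).

```python
from __future__ import annotations

import base64
import json

# Hypothetical stand-ins for the constants in the diff.
SUPPORTED = frozenset({"image/png", "image/jpeg", "image/gif", "image/webp"})
MAX_B64_CHARS = 43_000


def b64_len(raw_bytes: int) -> int:
    """Length of padded base64 output for raw_bytes input bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def extract_image_block(text: str) -> dict[str, str] | None:
    """Return an MCP-style image block, or None if text is not a small image."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    mime = data.get("mime_type", "")
    payload = data.get("content_base64", "")
    if mime in SUPPORTED and payload and len(payload) <= MAX_B64_CHARS:
        return {"type": "image", "data": payload, "mimeType": mime}
    return None


def make_result(raw_bytes: int) -> str:
    """Build a fake workspace-file response carrying raw_bytes of image data."""
    payload = base64.b64encode(b"\x00" * raw_bytes).decode("ascii")
    return json.dumps({"mime_type": "image/png", "content_base64": payload})


if __name__ == "__main__":
    print(b64_len(32_000), b64_len(32 * 1024))
    print(extract_image_block(make_result(32_000)) is not None)
    print(extract_image_block(make_result(32 * 1024)) is not None)
```

Under these assumptions a 32,000-byte image fits under the cap, while a 32 KiB (32,768-byte) image encodes to more than 43,000 characters and is returned as text only.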
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py
@@ -1,170 +0,0 @@
-"""Tests for tool_adapter helpers: truncation, stash, context vars."""
-
-import pytest
-
-from backend.util.truncate import truncate
-
-from .tool_adapter import (
-    _MCP_MAX_CHARS,
-    _text_from_mcp_result,
-    get_sdk_cwd,
-    pop_pending_tool_output,
-    set_execution_context,
-    stash_pending_tool_output,
-)
-
-# ---------------------------------------------------------------------------
-# _text_from_mcp_result
-# ---------------------------------------------------------------------------
-
-
-class TestTextFromMcpResult:
-    def test_single_text_block(self):
-        result = {"content": [{"type": "text", "text": "hello"}]}
-        assert _text_from_mcp_result(result) == "hello"
-
-    def test_multiple_text_blocks_concatenated(self):
-        result = {
-            "content": [
-                {"type": "text", "text": "one"},
-                {"type": "text", "text": "two"},
-            ]
-        }
-        assert _text_from_mcp_result(result) == "onetwo"
-
-    def test_non_text_blocks_ignored(self):
-        result = {
-            "content": [
-                {"type": "image", "data": "..."},
-                {"type": "text", "text": "only this"},
-            ]
-        }
-        assert _text_from_mcp_result(result) == "only this"
-
-    def test_empty_content_list(self):
-        assert _text_from_mcp_result({"content": []}) == ""
-
-    def test_missing_content_key(self):
-        assert _text_from_mcp_result({}) == ""
-
-    def test_non_list_content(self):
-        assert _text_from_mcp_result({"content": "raw string"}) == ""
-
-    def test_missing_text_field(self):
-        result = {"content": [{"type": "text"}]}
-        assert _text_from_mcp_result(result) == ""
-
-
-# ---------------------------------------------------------------------------
-# get_sdk_cwd
-# ---------------------------------------------------------------------------
-
-
-class TestGetSdkCwd:
-    def test_returns_empty_string_by_default(self):
-        set_execution_context(
-            user_id="test",
-            session=None,  # type: ignore[arg-type]
-            sandbox=None,
-        )
-        assert get_sdk_cwd() == ""
-
-    def test_returns_set_value(self):
-        set_execution_context(
-            user_id="test",
-            session=None,  # type: ignore[arg-type]
-            sandbox=None,
-            sdk_cwd="/tmp/copilot-test-123",
-        )
-        assert get_sdk_cwd() == "/tmp/copilot-test-123"
-
-
-# ---------------------------------------------------------------------------
-# stash / pop round-trip (the mechanism _truncating relies on)
-# ---------------------------------------------------------------------------
-
-
-class TestToolOutputStash:
-    @pytest.fixture(autouse=True)
-    def _init_context(self):
-        """Initialise the context vars that stash_pending_tool_output needs."""
-        set_execution_context(
-            user_id="test",
-            session=None,  # type: ignore[arg-type]
-            sandbox=None,
-            sdk_cwd="/tmp/test",
-        )
-
-    def test_stash_and_pop(self):
-        stash_pending_tool_output("my_tool", "output1")
-        assert pop_pending_tool_output("my_tool") == "output1"
-
-    def test_pop_empty_returns_none(self):
-        assert pop_pending_tool_output("nonexistent") is None
-
-    def test_fifo_order(self):
-        stash_pending_tool_output("t", "first")
-        stash_pending_tool_output("t", "second")
-        assert pop_pending_tool_output("t") == "first"
-        assert pop_pending_tool_output("t") == "second"
-        assert pop_pending_tool_output("t") is None
-
-    def test_dict_serialised_to_json(self):
-        stash_pending_tool_output("t", {"key": "value"})
-        assert pop_pending_tool_output("t") == '{"key": "value"}'
-
-    def test_separate_tool_names(self):
-        stash_pending_tool_output("a", "alpha")
-        stash_pending_tool_output("b", "beta")
-        assert pop_pending_tool_output("b") == "beta"
-        assert pop_pending_tool_output("a") == "alpha"
-
-
-# ---------------------------------------------------------------------------
-# _truncating wrapper (integration via create_copilot_mcp_server)
-# ---------------------------------------------------------------------------
-
-
-class TestTruncationAndStashIntegration:
-    """Test truncation + stash behavior that _truncating relies on."""
-
-    @pytest.fixture(autouse=True)
-    def _init_context(self):
-        set_execution_context(
-            user_id="test",
-            session=None,  # type: ignore[arg-type]
-            sandbox=None,
-            sdk_cwd="/tmp/test",
-        )
-
-    def test_small_output_stashed(self):
-        """Non-error output is stashed for the response adapter."""
-        result = {
-            "content": [{"type": "text", "text": "small output"}],
-            "isError": False,
-        }
-        truncated = truncate(result, _MCP_MAX_CHARS)
-        text = _text_from_mcp_result(truncated)
-        assert text == "small output"
-        stash_pending_tool_output("test_tool", text)
-        assert pop_pending_tool_output("test_tool") == "small output"
-
-    def test_error_result_not_stashed(self):
-        """Error results should not be stashed."""
-        result = {
-            "content": [{"type": "text", "text": "error msg"}],
-            "isError": True,
-        }
-        # _truncating only stashes when not result.get("isError")
-        if not result.get("isError"):
-            stash_pending_tool_output("err_tool", "should not happen")
-        assert pop_pending_tool_output("err_tool") is None
-
-    def test_large_output_truncated(self):
-        """Output exceeding _MCP_MAX_CHARS is truncated before stashing."""
-        big_text = "x" * (_MCP_MAX_CHARS + 100_000)
-        result = {"content": [{"type": "text", "text": big_text}]}
-        truncated = truncate(result, _MCP_MAX_CHARS)
-        text = _text_from_mcp_result(truncated)
-        assert len(text) < len(big_text)
-        assert len(str(truncated)) <= _MCP_MAX_CHARS
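Note on the stash contract exercised by the removed tests above: the sketch below is a hedged, self-contained illustration of a per-context FIFO stash keyed by tool name. The names `init_context`, `stash`, and `pop` are hypothetical. The real `stash_pending_tool_output` and `pop_pending_tool_output` live in `tool_adapter.py` and may differ in detail, since `pop_pending_tool_output` is not shown in this diff.

```python
from __future__ import annotations

import json
from contextvars import ContextVar
from typing import Any

# Hypothetical stand-in for _pending_tool_outputs: tool name -> queued outputs.
_pending: ContextVar[dict[str, list[str]] | None] = ContextVar(
    "pending", default=None
)


def init_context() -> None:
    """Give the current context a fresh, empty stash."""
    _pending.set({})


def stash(tool_name: str, output: Any) -> None:
    """Append output (serialised to JSON if not a string) to the tool's queue."""
    pending = _pending.get()
    if pending is None:
        return  # no execution context set; nothing to stash into
    try:
        text = output if isinstance(output, str) else json.dumps(output)
    except (TypeError, ValueError):
        text = str(output)
    pending.setdefault(tool_name, []).append(text)


def pop(tool_name: str) -> str | None:
    """Remove and return the oldest stashed output for tool_name, if any."""
    pending = _pending.get()
    if not pending:
        return None
    queue = pending.get(tool_name)
    if not queue:
        return None
    text = queue.pop(0)  # FIFO: oldest output first
    if not queue:
        del pending[tool_name]
    return text
```

Keeping a list per tool name, rather than a single value, lets two calls to the same tool in one turn be matched to their outputs in call order.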
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript.py
@@ -10,14 +10,13 @@ Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
 filesystem for self-hosted) — no DB column needed.
 """

+import json
 import logging
 import os
 import re
 import time
 from dataclasses import dataclass

-from backend.util import json
-
 logger = logging.getLogger(__name__)

 # UUIDs are hex + hyphens; strip everything else to prevent path injection.
@@ -59,37 +58,41 @@ def strip_progress_entries(content: str) -> str:
    Removes entries whose ``type`` is in ``STRIPPABLE_TYPES`` and reparents
    any remaining child entries so the ``parentUuid`` chain stays intact.
    Typically reduces transcript size by ~30%.
-
-    Entries that are not stripped or reparented are kept as their original
-    raw JSON line to avoid unnecessary re-serialization that changes
-    whitespace or key ordering.
    """
    lines = content.strip().split("\n")

-    # Parse entries, keeping the original line alongside the parsed dict.
-    parsed: list[tuple[str, dict | None]] = []
+    entries: list[dict] = []
    for line in lines:
-        parsed.append((line, json.loads(line, fallback=None)))
+        try:
+            entries.append(json.loads(line))
+        except json.JSONDecodeError:
+            # Keep unparseable lines as-is (safety)
+            entries.append({"_raw": line})

-    # First pass: identify stripped UUIDs and build parent map.
    stripped_uuids: set[str] = set()
    uuid_to_parent: dict[str, str] = {}
+    kept: list[dict] = []

-    for _line, entry in parsed:
-        if not isinstance(entry, dict):
+    for entry in entries:
+        if "_raw" in entry:
+            kept.append(entry)
            continue
        uid = entry.get("uuid", "")
        parent = entry.get("parentUuid", "")
+        entry_type = entry.get("type", "")
+
        if uid:
            uuid_to_parent[uid] = parent
-        if entry.get("type", "") in STRIPPABLE_TYPES and uid:
-            stripped_uuids.add(uid)

-    # Second pass: keep non-stripped entries, reparenting where needed.
-    # Preserve original line when no reparenting is required.
-    reparented: set[str] = set()
-    for _line, entry in parsed:
-        if not isinstance(entry, dict):
+        if entry_type in STRIPPABLE_TYPES:
+            if uid:
+                stripped_uuids.add(uid)
+        else:
+            kept.append(entry)
+
+    # Reparent: walk up chain through stripped entries to find surviving ancestor
+    for entry in kept:
+        if "_raw" in entry:
            continue
        parent = entry.get("parentUuid", "")
        original_parent = parent
@@ -97,32 +100,60 @@ def strip_progress_entries(content: str) -> str:
            parent = uuid_to_parent.get(parent, "")
        if parent != original_parent:
            entry["parentUuid"] = parent
-            uid = entry.get("uuid", "")
-            if uid:
-                reparented.add(uid)

    result_lines: list[str] = []
-    for line, entry in parsed:
-        if not isinstance(entry, dict):
-            result_lines.append(line)
-            continue
-        if entry.get("type", "") in STRIPPABLE_TYPES:
-            continue
-        uid = entry.get("uuid", "")
-        if uid in reparented:
-            # Re-serialize only entries whose parentUuid was changed.
-            result_lines.append(json.dumps(entry, separators=(",", ":")))
+    for entry in kept:
+        if "_raw" in entry:
+            result_lines.append(entry["_raw"])
        else:
-            result_lines.append(line)
+            result_lines.append(json.dumps(entry, separators=(",", ":")))

    return "\n".join(result_lines) + "\n"


 # ---------------------------------------------------------------------------
-# Local file I/O (write temp file for --resume)
+# Local file I/O (read from CLI's JSONL, write temp file for --resume)
 # ---------------------------------------------------------------------------


+def read_transcript_file(transcript_path: str) -> str | None:
+    """Read a JSONL transcript file from disk.
+
+    Returns the raw JSONL content, or ``None`` if the file is missing, empty,
+    or only contains metadata (≤2 lines with no conversation messages).
+    """
+    if not transcript_path or not os.path.isfile(transcript_path):
+        logger.debug(f"[Transcript] File not found: {transcript_path}")
+        return None
+
+    try:
+        with open(transcript_path) as f:
+            content = f.read()
+
+        if not content.strip():
+            return None
+
+        lines = content.strip().split("\n")
+        if len(lines) < 3:
+            # Raw files with ≤2 lines are metadata-only
+            # (queue-operation + file-history-snapshot, no conversation).
+            return None
+
+        # Quick structural validation — parse first and last lines.
+        json.loads(lines[0])
+        json.loads(lines[-1])
+
+        logger.info(
+            f"[Transcript] Read {len(lines)} lines, "
+            f"{len(content)} bytes from {transcript_path}"
+        )
+        return content
+
+    except (json.JSONDecodeError, OSError) as e:
+        logger.warning(f"[Transcript] Failed to read {transcript_path}: {e}")
+        return None
+
+
 def _sanitize_id(raw_id: str, max_len: int = 36) -> str:
    """Sanitize an ID for safe use in file paths.

@@ -137,6 +168,14 @@ def _sanitize_id(raw_id: str, max_len: int = 36) -> str:
 _SAFE_CWD_PREFIX = os.path.realpath("/tmp/copilot-")


+def _encode_cwd_for_cli(cwd: str) -> str:
+    """Encode a working directory path the same way the Claude CLI does.
+
+    The CLI replaces all non-alphanumeric characters with ``-``.
+    """
+    return re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(cwd))
+
+
 def cleanup_cli_project_dir(sdk_cwd: str) -> None:
    """Remove the CLI's project directory for a specific working directory.

@@ -146,8 +185,7 @@ def cleanup_cli_project_dir(sdk_cwd: str) -> None:
    """
    import shutil

-    # Encode cwd the same way CLI does (replaces non-alphanumeric with -)
-    cwd_encoded = re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(sdk_cwd))
+    cwd_encoded = _encode_cwd_for_cli(sdk_cwd)
    config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
    projects_base = os.path.realpath(os.path.join(config_dir, "projects"))
    project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
@@ -207,29 +245,32 @@ def write_transcript_to_tempfile(
 def validate_transcript(content: str | None) -> bool:
    """Check that a transcript has actual conversation messages.

-    A valid transcript needs at least one assistant message (not just
-    queue-operation / file-history-snapshot metadata).  We do NOT require
-    a ``type: "user"`` entry because with ``--resume`` the user's message
-    is passed as a CLI query parameter and does not appear in the
-    transcript file.
+    A valid transcript for resume needs at least one user message and one
+    assistant message (not just queue-operation / file-history-snapshot
+    metadata).
    """
    if not content or not content.strip():
        return False

    lines = content.strip().split("\n")
+    if len(lines) < 2:
+        return False

+    has_user = False
    has_assistant = False

    for line in lines:
-        if not line.strip():
-            continue
-        entry = json.loads(line, fallback=None)
-        if not isinstance(entry, dict):
+        try:
+            entry = json.loads(line)
+            msg_type = entry.get("type")
+            if msg_type == "user":
+                has_user = True
+            elif msg_type == "assistant":
+                has_assistant = True
+        except (json.JSONDecodeError, AttributeError):
            return False
-        if entry.get("type") == "assistant":
-            has_assistant = True

-    return has_assistant
+    return has_user and has_assistant


 # ---------------------------------------------------------------------------
@@ -284,46 +325,45 @@ async def upload_transcript(
    session_id: str,
    content: str,
    message_count: int = 0,
-    log_prefix: str = "[Transcript]",
 ) -> None:
-    """Strip progress entries and upload complete transcript.
+    """Strip progress entries and upload transcript to bucket storage.

-    The transcript represents the FULL active context (atomic).
-    Each upload REPLACES the previous transcript entirely.
-
-    The executor holds a cluster lock per session, so concurrent uploads for
-    the same session cannot happen.
+    Safety: only overwrites when the new (stripped) transcript is larger than
+    what is already stored.  Since JSONL is append-only, the latest transcript
+    is always the longest.  This prevents a slow/stale background task from
+    clobbering a newer upload from a concurrent turn.

    Args:
-        content: Complete JSONL transcript (from TranscriptBuilder).
-        message_count: ``len(session.messages)`` at upload time.
+        message_count: ``len(session.messages)`` at upload time — used by
+            the next turn to detect staleness and compress only the gap.
    """
    from backend.util.workspace_storage import get_workspace_storage

-    # Strip metadata entries (progress, file-history-snapshot, etc.)
-    # Note: SDK-built transcripts shouldn't have these, but strip for safety
    stripped = strip_progress_entries(content)
    if not validate_transcript(stripped):
-        # Log entry types for debugging — helps identify why validation failed
-        entry_types: list[str] = []
-        for line in stripped.strip().split("\n"):
-            entry = json.loads(line, fallback={"type": "INVALID_JSON"})
-            entry_types.append(entry.get("type", "?"))
        logger.warning(
-            "%s Skipping upload — stripped content not valid "
-            "(types=%s, stripped_len=%d, raw_len=%d)",
-            log_prefix,
-            entry_types,
-            len(stripped),
-            len(content),
+            f"[Transcript] Skipping upload — stripped content not valid "
+            f"for session {session_id}"
        )
-        logger.debug("%s Raw content preview: %s", log_prefix, content[:500])
-        logger.debug("%s Stripped content: %s", log_prefix, stripped[:500])
        return

    storage = await get_workspace_storage()
    wid, fid, fname = _storage_path_parts(user_id, session_id)
    encoded = stripped.encode("utf-8")
+    new_size = len(encoded)
+
+    # Check existing transcript size to avoid overwriting newer with older
+    path = _build_storage_path(user_id, session_id, storage)
+    try:
+        existing = await storage.retrieve(path)
+        if len(existing) >= new_size:
+            logger.info(
+                f"[Transcript] Skipping upload — existing ({len(existing)}B) "
+                f">= new ({new_size}B) for session {session_id}"
+            )
+            return
+    except Exception:
+        pass  # No existing transcript or retrieval error — proceed with upload

    await storage.store(
        workspace_id=wid,
@@ -332,8 +372,11 @@ async def upload_transcript(
        content=encoded,
    )

-    # Update metadata so message_count stays current.  The gap-fill logic
-    # in _build_query_message relies on it to avoid re-compressing messages.
+    # Store metadata alongside the transcript so the next turn can detect
+    # staleness and only compress the gap instead of the full history.
+    # Wrapped in try/except so a metadata write failure doesn't orphan
+    # the already-uploaded transcript — the next turn will just fall back
+    # to full gap fill (msg_count=0).
    try:
        meta = {"message_count": message_count, "uploaded_at": time.time()}
        mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
@@ -344,18 +387,17 @@ async def upload_transcript(
            content=json.dumps(meta).encode("utf-8"),
        )
    except Exception as e:
-        logger.warning(f"{log_prefix} Failed to write metadata: {e}")
+        logger.warning(f"[Transcript] Failed to write metadata for {session_id}: {e}")

    logger.info(
-        f"{log_prefix} Uploaded {len(encoded)}B "
-        f"(stripped from {len(content)}B, msg_count={message_count})"
+        f"[Transcript] Uploaded {new_size}B "
+        f"(stripped from {len(content)}B, msg_count={message_count}) "
+        f"for session {session_id}"
    )


 async def download_transcript(
-    user_id: str,
-    session_id: str,
-    log_prefix: str = "[Transcript]",
+    user_id: str, session_id: str
 ) -> TranscriptDownload | None:
    """Download transcript and metadata from bucket storage.

@@ -371,10 +413,10 @@ async def download_transcript(
        data = await storage.retrieve(path)
        content = data.decode("utf-8")
    except FileNotFoundError:
-        logger.debug(f"{log_prefix} No transcript in storage")
+        logger.debug(f"[Transcript] No transcript in storage for {session_id}")
        return None
    except Exception as e:
-        logger.warning(f"{log_prefix} Failed to download transcript: {e}")
+        logger.warning(f"[Transcript] Failed to download transcript: {e}")
        return None

    # Try to load metadata (best-effort — old transcripts won't have it)
@@ -391,13 +433,16 @@ async def download_transcript(
            meta_path = f"local://{mwid}/{mfid}/{mfname}"

        meta_data = await storage.retrieve(meta_path)
-        meta = json.loads(meta_data.decode("utf-8"), fallback={})
+        meta = json.loads(meta_data.decode("utf-8"))
        message_count = meta.get("message_count", 0)
        uploaded_at = meta.get("uploaded_at", 0.0)
-    except (FileNotFoundError, Exception):
+    except Exception:
        pass  # No metadata — treat as unknown (msg_count=0 → always fill gap)

-    logger.info(f"{log_prefix} Downloaded {len(content)}B (msg_count={message_count})")
+    logger.info(
+        f"[Transcript] Downloaded {len(content)}B "
+        f"(msg_count={message_count}) for session {session_id}"
+    )
    return TranscriptDownload(
        content=content,
        message_count=message_count,
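
Note on the size-based overwrite guard added to upload_transcript above: the following is a minimal sketch of the idea, not the project's implementation. `InMemoryStorage`, `upload_if_larger`, and `demo` are hypothetical names introduced for illustration, and the storage interface is simplified to a single path argument (the real `storage.store` takes workspace/file identifiers). It assumes, as the docstring does, that the JSONL transcript is append-only, so a newer transcript is always larger in bytes.

```python
import asyncio


class InMemoryStorage:
    """Hypothetical stand-in for the workspace storage backend."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    async def retrieve(self, path: str) -> bytes:
        if path not in self._blobs:
            raise FileNotFoundError(path)
        return self._blobs[path]

    async def store(self, path: str, content: bytes) -> None:
        self._blobs[path] = content


async def upload_if_larger(storage: InMemoryStorage, path: str, content: str) -> bool:
    """Store content only when it is strictly larger than what is stored.

    Returns True when the upload happened, False when it was skipped.
    """
    encoded = content.encode("utf-8")
    try:
        existing = await storage.retrieve(path)
    except FileNotFoundError:
        existing = None  # nothing stored yet, so proceed with the upload
    if existing is not None and len(existing) >= len(encoded):
        return False  # stored copy is at least as long: treat ours as stale
    await storage.store(path, encoded)
    return True


async def demo() -> list[bool]:
    storage = InMemoryStorage()
    turn1 = '{"type": "user"}\n{"type": "assistant"}\n'
    turn2 = turn1 + '{"type": "user"}\n{"type": "assistant"}\n'
    return [
        await upload_if_larger(storage, "t.jsonl", turn1),  # first upload
        await upload_if_larger(storage, "t.jsonl", turn2),  # longer, replaces
        await upload_if_larger(storage, "t.jsonl", turn1),  # stale, skipped
    ]
```

Running `asyncio.run(demo())` exercises the three cases in order: an initial upload, a longer replacement, and a stale upload that is skipped. The check and the write are not atomic, so this guard narrows the race between concurrent turns but does not remove it.
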
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
@@ -1,188 +0,0 @@
-"""Build complete JSONL transcript from SDK messages.
-
-The transcript represents the FULL active context at any point in time.
-Each upload REPLACES the previous transcript atomically.
-
-Flow:
-  Turn 1: Upload [msg1, msg2]
-  Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
-  Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
-
-The transcript is never incremental - always the complete atomic state.
-"""
-
-import logging
-from typing import Any
-from uuid import uuid4
-
-from pydantic import BaseModel
-
-from backend.util import json
-
-from .transcript import STRIPPABLE_TYPES
-
-logger = logging.getLogger(__name__)
-
-
-class TranscriptEntry(BaseModel):
-    """Single transcript entry (user or assistant turn)."""
-
-    type: str
-    uuid: str
-    parentUuid: str | None
-    message: dict[str, Any]
-
-
-class TranscriptBuilder:
-    """Build complete JSONL transcript from SDK messages.
-
-    This builder maintains the FULL conversation state, not incremental changes.
-    The output is always the complete active context.
-    """
-
-    def __init__(self) -> None:
-        self._entries: list[TranscriptEntry] = []
-        self._last_uuid: str | None = None
-
-    def _last_is_assistant(self) -> bool:
-        return bool(self._entries) and self._entries[-1].type == "assistant"
-
-    def _last_message_id(self) -> str:
-        """Return the message.id of the last entry, or '' if none."""
-        if self._entries:
-            return self._entries[-1].message.get("id", "")
-        return ""
-
-    def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
-        """Load complete previous transcript.
-
-        This loads the FULL previous context. As new messages come in,
-        we append to this state. The final output is the complete context
-        (previous + new), not just the delta.
-        """
-        if not content or not content.strip():
-            return
-
-        lines = content.strip().split("\n")
-        for line_num, line in enumerate(lines, 1):
-            if not line.strip():
-                continue
-
-            data = json.loads(line, fallback=None)
-            if data is None:
-                logger.warning(
-                    "%s Failed to parse transcript line %d/%d",
-                    log_prefix,
-                    line_num,
-                    len(lines),
-                )
-                continue
-
-            # Load all non-strippable entries (user/assistant/system/etc.)
-            # Skip only STRIPPABLE_TYPES to match strip_progress_entries() behavior
-            entry_type = data.get("type", "")
-            if entry_type in STRIPPABLE_TYPES:
-                continue
-
-            entry = TranscriptEntry(
-                type=data["type"],
-                uuid=data.get("uuid") or str(uuid4()),
-                parentUuid=data.get("parentUuid"),
-                message=data.get("message", {}),
-            )
-            self._entries.append(entry)
-            self._last_uuid = entry.uuid
-
-        logger.info(
-            "%s Loaded %d entries from previous transcript (last_uuid=%s)",
-            log_prefix,
-            len(self._entries),
-            self._last_uuid[:12] if self._last_uuid else None,
-        )
-
-    def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
-        """Append a user entry."""
-        msg_uuid = uuid or str(uuid4())
-
-        self._entries.append(
-            TranscriptEntry(
-                type="user",
-                uuid=msg_uuid,
-                parentUuid=self._last_uuid,
-                message={"role": "user", "content": content},
-            )
-        )
-        self._last_uuid = msg_uuid
-
-    def append_tool_result(self, tool_use_id: str, content: str) -> None:
-        """Append a tool result as a user entry (one per tool call)."""
-        self.append_user(
-            content=[
-                {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
-            ]
-        )
-
-    def append_assistant(
-        self,
-        content_blocks: list[dict],
-        model: str = "",
-        stop_reason: str | None = None,
-    ) -> None:
-        """Append an assistant entry.
-
-        Consecutive assistant entries automatically share the same message ID
-        so the CLI can merge them (thinking → text → tool_use) into a single
-        API message on ``--resume``.  A new ID is assigned whenever an
-        assistant entry follows a non-assistant entry (user message or tool
-        result), because that marks the start of a new API response.
-        """
-        message_id = (
-            self._last_message_id()
-            if self._last_is_assistant()
-            else f"msg_sdk_{uuid4().hex[:24]}"
-        )
-
-        msg_uuid = str(uuid4())
-
-        self._entries.append(
-            TranscriptEntry(
-                type="assistant",
-                uuid=msg_uuid,
-                parentUuid=self._last_uuid,
-                message={
-                    "role": "assistant",
-                    "model": model,
-                    "id": message_id,
-                    "type": "message",
-                    "content": content_blocks,
-                    "stop_reason": stop_reason,
-                    "stop_sequence": None,
-                },
-            )
-        )
-        self._last_uuid = msg_uuid
-
-    def to_jsonl(self) -> str:
-        """Export complete context as JSONL.
-
-        Consecutive assistant entries are kept separate to match the
-        native CLI format — the SDK merges them internally on resume.
-
-        Returns the FULL conversation state (all entries), not incremental.
-        This output REPLACES any previous transcript.
-        """
-        if not self._entries:
-            return ""
-
-        lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
-        return "\n".join(lines) + "\n"
-
-    @property
-    def entry_count(self) -> int:
-        """Total number of entries in the complete context."""
-        return len(self._entries)
-
-    @property
-    def is_empty(self) -> bool:
-        """Whether this builder has any entries."""
-        return len(self._entries) == 0
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
@@ -1,11 +1,11 @@
 """Unit tests for JSONL transcript management utilities."""

+import json
 import os

-from backend.util import json
-
 from .transcript import (
    STRIPPABLE_TYPES,
+    read_transcript_file,
    strip_progress_entries,
    validate_transcript,
    write_transcript_to_tempfile,
@@ -38,6 +38,49 @@ PROGRESS_ENTRY = {
 VALID_TRANSCRIPT = _make_jsonl(METADATA_LINE, FILE_HISTORY, USER_MSG, ASST_MSG)


+# --- read_transcript_file ---
+
+
+class TestReadTranscriptFile:
+    def test_returns_content_for_valid_file(self, tmp_path):
+        path = tmp_path / "session.jsonl"
+        path.write_text(VALID_TRANSCRIPT)
+        result = read_transcript_file(str(path))
+        assert result is not None
+        assert "user" in result
+
+    def test_returns_none_for_missing_file(self):
+        assert read_transcript_file("/nonexistent/path.jsonl") is None
+
+    def test_returns_none_for_empty_path(self):
+        assert read_transcript_file("") is None
+
+    def test_returns_none_for_empty_file(self, tmp_path):
+        path = tmp_path / "empty.jsonl"
+        path.write_text("")
+        assert read_transcript_file(str(path)) is None
+
+    def test_returns_none_for_metadata_only(self, tmp_path):
+        content = _make_jsonl(METADATA_LINE, FILE_HISTORY)
+        path = tmp_path / "meta.jsonl"
+        path.write_text(content)
+        assert read_transcript_file(str(path)) is None
+
+    def test_returns_none_for_invalid_json(self, tmp_path):
+        path = tmp_path / "bad.jsonl"
+        path.write_text("not json\n{}\n{}\n")
+        assert read_transcript_file(str(path)) is None
+
+    def test_no_size_limit(self, tmp_path):
+        """Large files are accepted — bucket storage has no size limit."""
+        big_content = {"type": "user", "uuid": "u9", "data": "x" * 1_000_000}
+        content = _make_jsonl(METADATA_LINE, FILE_HISTORY, big_content, ASST_MSG)
+        path = tmp_path / "big.jsonl"
+        path.write_text(content)
+        result = read_transcript_file(str(path))
+        assert result is not None
+
+
 # --- write_transcript_to_tempfile ---


@@ -112,56 +155,12 @@ class TestValidateTranscript:
        assert validate_transcript(content) is False

    def test_assistant_only_no_user(self):
-        """With --resume the user message is a CLI query param, not a transcript entry.
-        A transcript with only assistant entries is valid."""
        content = _make_jsonl(METADATA_LINE, FILE_HISTORY, ASST_MSG)
-        assert validate_transcript(content) is True
-
-    def test_resume_transcript_without_user_entry(self):
-        """Simulates a real --resume stop hook transcript: the CLI session file
-        has summary + assistant entries but no user entry."""
-        summary = {"type": "summary", "uuid": "s1", "text": "context..."}
-        asst1 = {
-            "type": "assistant",
-            "uuid": "a1",
-            "message": {"role": "assistant", "content": "Hello!"},
-        }
-        asst2 = {
-            "type": "assistant",
-            "uuid": "a2",
-            "parentUuid": "a1",
-            "message": {"role": "assistant", "content": "Sure, let me help."},
-        }
-        content = _make_jsonl(summary, asst1, asst2)
-        assert validate_transcript(content) is True
-
-    def test_single_assistant_entry(self):
-        """A transcript with just one assistant line is valid — the CLI may
-        produce short transcripts for simple responses with no tool use."""
-        content = json.dumps(ASST_MSG) + "\n"
-        assert validate_transcript(content) is True
+        assert validate_transcript(content) is False

    def test_invalid_json_returns_false(self):
        assert validate_transcript("not json\n{}\n{}\n") is False

-    def test_malformed_json_after_valid_assistant_returns_false(self):
-        """Validation must scan all lines - malformed JSON anywhere should fail."""
-        valid_asst = json.dumps(ASST_MSG)
-        malformed = "not valid json"
-        content = valid_asst + "\n" + malformed + "\n"
-        assert validate_transcript(content) is False
-
-    def test_blank_lines_are_skipped(self):
-        """Transcripts with blank lines should be valid if they contain assistant entries."""
-        content = (
-            json.dumps(USER_MSG)
-            + "\n\n"  # blank line
-            + json.dumps(ASST_MSG)
-            + "\n"
-            + "\n"  # another blank line
-        )
-        assert validate_transcript(content) is True
-

 # --- strip_progress_entries ---

@@ -254,31 +253,3 @@ class TestStripProgressEntries:
        assert "queue-operation" not in result_types
        assert "user" in result_types
        assert "assistant" in result_types
-
-    def test_preserves_original_line_formatting(self):
-        """Non-reparented entries keep their original JSON formatting."""
-        # orjson produces compact JSON - test that we preserve the exact input
-        # when no reparenting is needed (no re-serialization)
-        original_line = json.dumps(USER_MSG)
-
-        content = original_line + "\n" + json.dumps(ASST_MSG) + "\n"
-        result = strip_progress_entries(content)
-        result_lines = result.strip().split("\n")
-
-        # Original line should be byte-identical (not re-serialized)
-        assert result_lines[0] == original_line
-
-    def test_reparented_entries_are_reserialized(self):
-        """Entries whose parentUuid changes must be re-serialized."""
-        progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1"}
-        asst = {
-            "type": "assistant",
-            "uuid": "a1",
-            "parentUuid": "p1",
-            "message": {"role": "assistant", "content": "done"},
-        }
-        content = _make_jsonl(USER_MSG, progress, asst)
-        result = strip_progress_entries(content)
-        lines = result.strip().split("\n")
-        asst_entry = json.loads(lines[-1])
-        assert asst_entry["parentUuid"] == "u1"  # reparented
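
Note on the tightened validate_transcript contract exercised by the tests above: a small standalone sketch, using only the standard-library `json` module, of a validator that requires at least one `user` and one `assistant` entry and rejects any unparsable or non-object line. `has_user_and_assistant` is a hypothetical name; this illustrates the rule and is not the function in transcript.py.

```python
from __future__ import annotations

import json


def has_user_and_assistant(content: str | None) -> bool:
    """Return True only if every line is a JSON object and both a
    "user" and an "assistant" entry are present."""
    if not content or not content.strip():
        return False

    has_user = False
    has_assistant = False
    for line in content.strip().split("\n"):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            return False  # a blank or malformed line invalidates the transcript
        if not isinstance(entry, dict):
            return False  # valid JSON, but not an object (e.g. a list or number)
        msg_type = entry.get("type")
        if msg_type == "user":
            has_user = True
        elif msg_type == "assistant":
            has_assistant = True

    return has_user and has_assistant
```

Under this rule an assistant-only transcript is rejected, which matches the updated expectation in test_assistant_only_no_user.
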
--- a/autogpt_platform/backend/backend/copilot/service.py
+++ b/autogpt_platform/backend/backend/copilot/service.py
--- a/autogpt_platform/backend/backend/copilot/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/service_test.py
@@ -4,14 +4,87 @@ from os import getenv

 import pytest

+from . import service as chat_service
 from .model import create_chat_session, get_chat_session, upsert_chat_session
-from .response_model import StreamError, StreamTextDelta
+from .response_model import (
+    StreamError,
+    StreamFinish,
+    StreamTextDelta,
+    StreamToolOutputAvailable,
+)
 from .sdk import service as sdk_service
 from .sdk.transcript import download_transcript

 logger = logging.getLogger(__name__)


+@pytest.mark.asyncio(loop_scope="session")
+async def test_stream_chat_completion(setup_test_user, test_user_id):
+    """
+    Test the stream_chat_completion function.
+    """
+    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
+    if not api_key:
+        return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
+
+    session = await create_chat_session(test_user_id)
+
+    has_errors = False
+    has_ended = False
+    assistant_message = ""
+    async for chunk in chat_service.stream_chat_completion(
+        session.session_id, "Hello, how are you?", user_id=session.user_id
+    ):
+        logger.info(chunk)
+        if isinstance(chunk, StreamError):
+            has_errors = True
+        if isinstance(chunk, StreamTextDelta):
+            assistant_message += chunk.delta
+        if isinstance(chunk, StreamFinish):
+            has_ended = True
+
+    assert has_ended, "Chat completion did not end"
+    assert not has_errors, "Error occurred while streaming chat completion"
+    assert assistant_message, "Assistant message is empty"
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_stream_chat_completion_with_tool_calls(setup_test_user, test_user_id):
+    """
+    Test that stream_chat_completion emits tool output for a tool-calling prompt.
+    """
+    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
+    if not api_key:
+        return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
+
+    session = await create_chat_session(test_user_id)
+    session = await upsert_chat_session(session)
+
+    has_errors = False
+    has_ended = False
+    had_tool_calls = False
+    async for chunk in chat_service.stream_chat_completion(
+        session.session_id,
+        "Please find me an agent that can help me with my business. Use the query 'money printing agent'",
+        user_id=session.user_id,
+    ):
+        logger.info(chunk)
+        if isinstance(chunk, StreamError):
+            has_errors = True
+
+        if isinstance(chunk, StreamFinish):
+            has_ended = True
+        if isinstance(chunk, StreamToolOutputAvailable):
+            had_tool_calls = True
+
+    assert has_ended, "Chat completion did not end"
+    assert not has_errors, "Error occurred while streaming chat completion"
+    assert had_tool_calls, "Tool calls did not occur"
+    session = await get_chat_session(session.session_id)
+    assert session, "Session not found"
+    assert session.usage, "Usage is empty"
+
+
 @pytest.mark.asyncio(loop_scope="session")
 async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
    """Test that the SDK --resume path captures and uses transcripts across turns.
@@ -41,6 +114,7 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
    )
    turn1_text = ""
    turn1_errors: list[str] = []
+    turn1_ended = False

    async for chunk in sdk_service.stream_chat_completion_sdk(
        session.session_id,
@@ -51,27 +125,24 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
            turn1_text += chunk.delta
        elif isinstance(chunk, StreamError):
            turn1_errors.append(chunk.errorText)
+        elif isinstance(chunk, StreamFinish):
+            turn1_ended = True

+    assert turn1_ended, "Turn 1 did not finish"
    assert not turn1_errors, f"Turn 1 errors: {turn1_errors}"
    assert turn1_text, "Turn 1 produced no text"

-    # Wait for background upload task to complete (retry up to 5s).
-    # The CLI may not produce a usable transcript for very short
-    # conversations (only metadata entries) — this is environment-dependent
-    # (CLI version, platform).  When that happens, multi-turn still works
-    # via conversation compression (non-resume path), but we can't test
-    # the --resume round-trip.
+    # Wait for background upload task to complete (retry up to 5s)
    transcript = None
    for _ in range(10):
        await asyncio.sleep(0.5)
        transcript = await download_transcript(test_user_id, session.session_id)
        if transcript:
            break
-    if not transcript:
-        return pytest.skip(
-            "CLI did not produce a usable transcript — "
-            "cannot test --resume round-trip in this environment"
-        )
+    assert transcript, (
+        "Transcript was not uploaded to bucket after turn 1 — "
+        "Stop hook may not have fired or transcript was too small"
+    )
    logger.info(f"Turn 1 transcript uploaded: {len(transcript.content)} bytes")

    # Reload session for turn 2
@@ -82,6 +153,7 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
    turn2_msg = "What was the special keyword I asked you to remember?"
    turn2_text = ""
    turn2_errors: list[str] = []
+    turn2_ended = False

    async for chunk in sdk_service.stream_chat_completion_sdk(
        session.session_id,
@@ -93,7 +165,10 @@ async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
            turn2_text += chunk.delta
        elif isinstance(chunk, StreamError):
            turn2_errors.append(chunk.errorText)
+        elif isinstance(chunk, StreamFinish):
+            turn2_ended = True

+    assert turn2_ended, "Turn 2 did not finish"
    assert not turn2_errors, f"Turn 2 errors: {turn2_errors}"
    assert turn2_text, "Turn 2 produced no text"
    assert keyword in turn2_text, (
--- a/autogpt_platform/backend/backend/copilot/stream_registry.py
+++ b/autogpt_platform/backend/backend/copilot/stream_registry.py
--- a/autogpt_platform/backend/backend/copilot/test_copilot_e2e.py
+++ b/autogpt_platform/backend/backend/copilot/test_copilot_e2e.py
@@ -1,401 +0,0 @@
-"""End-to-end tests for Copilot streaming with dummy implementations.
-
-These tests verify the complete copilot flow using dummy implementations
-for agent generator and SDK service, allowing automated testing without
-external LLM calls.
-
-Enable test mode with COPILOT_TEST_MODE=true environment variable.
-
-Note: StreamFinish is NOT emitted by the dummy service — it is published
-by mark_session_completed in the processor layer.  These tests only cover
-the service-level streaming output (StreamStart + StreamTextDelta).
-"""
-
-import asyncio
-import os
-from uuid import uuid4
-
-import pytest
-
-from backend.copilot.model import ChatMessage, ChatSession, upsert_chat_session
-from backend.copilot.response_model import (
-    StreamError,
-    StreamHeartbeat,
-    StreamStart,
-    StreamTextDelta,
-)
-from backend.copilot.sdk.dummy import stream_chat_completion_dummy
-
-
-@pytest.fixture(autouse=True)
-def enable_test_mode():
-    """Enable test mode for all tests in this module."""
-    os.environ["COPILOT_TEST_MODE"] = "true"
-    yield
-    os.environ.pop("COPILOT_TEST_MODE", None)
-
-
-@pytest.mark.asyncio
-async def test_dummy_streaming_basic_flow():
-    """Test that dummy streaming produces correct event sequence."""
-    events = []
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-session-basic",
-        message="Hello",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        events.append(event)
-
-    # Verify we got events
-    assert len(events) > 0, "Should receive events"
-
-    # Verify StreamStart
-    start_events = [e for e in events if isinstance(e, StreamStart)]
-    assert len(start_events) == 1
-    assert start_events[0].messageId
-    assert start_events[0].sessionId
-
-    # Verify StreamTextDelta events
-    text_events = [e for e in events if isinstance(e, StreamTextDelta)]
-    assert len(text_events) > 0
-    full_text = "".join(e.delta for e in text_events)
-    assert len(full_text) > 0
-
-    # Verify order: start before text
-    start_idx = events.index(start_events[0])
-    first_text_idx = events.index(text_events[0]) if text_events else -1
-    if first_text_idx >= 0:
-        assert start_idx < first_text_idx
-
-    print(f"✅ Basic flow: {len(events)} events, {len(text_events)} text deltas")
-
-
-@pytest.mark.asyncio
-async def test_streaming_no_timeout():
-    """Test that streaming completes within reasonable time without timeout."""
-    import time
-
-    start_time = time.monotonic()
-    event_count = 0
-
-    async for _event in stream_chat_completion_dummy(
-        session_id="test-session-timeout",
-        message="count to 10",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        event_count += 1
-
-    elapsed = time.monotonic() - start_time
-
-    # Should complete in < 5 seconds (dummy has 0.1s delays between words)
-    assert elapsed < 5.0, f"Streaming took {elapsed:.1f}s, expected < 5s"
-    assert event_count > 0, "Should receive events"
-
-    print(f"✅ No timeout: completed in {elapsed:.2f}s with {event_count} events")
-
-
-@pytest.mark.asyncio
-async def test_streaming_event_types():
-    """Test that all expected event types are present."""
-    event_types = set()
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-session-types",
-        message="test",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        event_types.add(type(event).__name__)
-
-    # Required event types (StreamFinish is published by processor, not service)
-    assert "StreamStart" in event_types, "Missing StreamStart"
-    assert "StreamTextDelta" in event_types, "Missing StreamTextDelta"
-
-    print(f"✅ Event types: {sorted(event_types)}")
-
-
-@pytest.mark.asyncio
-async def test_streaming_text_content():
-    """Test that streamed text is coherent and complete."""
-    text_events = []
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-session-content",
-        message="count to 3",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        if isinstance(event, StreamTextDelta):
-            text_events.append(event)
-
-    # Verify text deltas
-    assert len(text_events) > 0, "Should have text deltas"
-
-    # Reconstruct full text
-    full_text = "".join(e.delta for e in text_events)
-    assert len(full_text) > 0, "Text should not be empty"
-    assert (
-        "1" in full_text or "counted" in full_text.lower()
-    ), "Text should contain count"
-
-    # Verify all deltas have IDs
-    for text_event in text_events:
-        assert text_event.id, "Text delta must have ID"
-        assert text_event.delta, "Text delta must have content"
-
-    print(f"✅ Text content: '{full_text}' ({len(text_events)} deltas)")
-
-
-@pytest.mark.asyncio
-async def test_streaming_heartbeat_timing():
-    """Test that heartbeats are sent at correct interval during long operations."""
-    # This test would need a dummy that takes longer
-    # For now, just verify heartbeat structure if we receive one
-    heartbeats = []
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-session-heartbeat",
-        message="test",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        if isinstance(event, StreamHeartbeat):
-            heartbeats.append(event)
-
-    # Dummy is fast, so we might not get heartbeats
-    # But if we do, verify they're valid
-    if heartbeats:
-        print(f"✅ Heartbeat structure verified ({len(heartbeats)} received)")
-    else:
-        print("✅ No heartbeats (dummy executes quickly)")
-
-
-@pytest.mark.asyncio
-async def test_error_handling():
-    """Test that errors are properly formatted and sent."""
-    # This would require a dummy that can trigger errors
-    # For now, just verify error event structure
-
-    error = StreamError(errorText="Test error", code="test_error")
-    assert error.errorText == "Test error"
-    assert error.code == "test_error"
-    assert str(error.type.value) in ["error", "error"]
-
-    print("✅ Error structure verified")
-
-
-@pytest.mark.asyncio
-async def test_concurrent_sessions():
-    """Test that multiple sessions can stream concurrently."""
-
-    async def stream_session(session_id: str) -> int:
-        count = 0
-        async for _event in stream_chat_completion_dummy(
-            session_id=session_id,
-            message="test",
-            is_user_message=True,
-            user_id="test-user",
-        ):
-            count += 1
-        return count
-
-    # Run 3 concurrent sessions
-    results = await asyncio.gather(
-        stream_session("session-1"),
-        stream_session("session-2"),
-        stream_session("session-3"),
-    )
-
-    # All should complete successfully
-    assert all(count > 0 for count in results), "All sessions should produce events"
-    print(f"✅ Concurrent sessions: {results} events each")
-
-
-@pytest.mark.asyncio
-@pytest.mark.xfail(
-    reason="Event loop isolation issue with DB operations in tests - needs fixture refactoring"
-)
-async def test_session_state_persistence():
-    """Test that session state is maintained across multiple messages."""
-    from datetime import datetime, timezone
-
-    session_id = f"test-session-{uuid4()}"
-    user_id = "test-user"
-
-    # Create session with first message
-    session = ChatSession(
-        session_id=session_id,
-        user_id=user_id,
-        messages=[
-            ChatMessage(role="user", content="Hello"),
-            ChatMessage(role="assistant", content="Hi there!"),
-        ],
-        usage=[],
-        started_at=datetime.now(timezone.utc),
-        updated_at=datetime.now(timezone.utc),
-    )
-    await upsert_chat_session(session)
-
-    # Stream second message
-    events = []
-    async for event in stream_chat_completion_dummy(
-        session_id=session_id,
-        message="How are you?",
-        is_user_message=True,
-        user_id=user_id,
-        session=session,  # Pass existing session
-    ):
-        events.append(event)
-
-    # Verify events were produced
-    assert len(events) > 0, "Should produce events for second message"
-
-    print(f"✅ Session persistence: {len(events)} events for second message")
-
-
-@pytest.mark.asyncio
-async def test_message_deduplication():
-    """Test that duplicate messages are filtered out."""
-
-    # Simulate receiving duplicate events (e.g., from reconnection)
-    events = []
-
-    # First stream
-    async for event in stream_chat_completion_dummy(
-        session_id="test-dedup-1",
-        message="Hello",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        events.append(event)
-
-    # Count unique message IDs in StreamStart events
-    start_events = [e for e in events if isinstance(e, StreamStart)]
-    message_ids = [e.messageId for e in start_events]
-
-    # Verify all IDs are present
-    assert len(message_ids) == len(set(message_ids)), "Message IDs should be unique"
-
-    print(f"✅ Deduplication: {len(events)} events, all unique")
-
-
-@pytest.mark.asyncio
-async def test_event_ordering():
-    """Test that events arrive in correct order."""
-    events = []
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-ordering",
-        message="Test",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        events.append(event)
-
-    # Find event indices
-    start_idx = next(
-        (i for i, e in enumerate(events) if isinstance(e, StreamStart)), None
-    )
-    text_indices = [i for i, e in enumerate(events) if isinstance(e, StreamTextDelta)]
-
-    # Verify ordering
-    assert start_idx is not None, "Should have StreamStart"
-    assert start_idx == 0, "StreamStart should be first"
-
-    if text_indices:
-        assert all(
-            start_idx < i for i in text_indices
-        ), "Text deltas should be after start"
-
-    print(f"✅ Event ordering: start({start_idx}) < text deltas")
-
-
-@pytest.mark.asyncio
-async def test_stream_completeness():
-    """Test that stream includes all required event types."""
-    events = []
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-completeness",
-        message="Complete stream test",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        events.append(event)
-
-    # Check for required events (StreamFinish is published by processor)
-    has_start = any(isinstance(e, StreamStart) for e in events)
-    has_text = any(isinstance(e, StreamTextDelta) for e in events)
-
-    assert has_start, "Stream must include StreamStart"
-    assert has_text, "Stream must include text deltas"
-
-    # Verify exactly one start
-    start_count = sum(1 for e in events if isinstance(e, StreamStart))
-    assert start_count == 1, f"Should have exactly 1 StreamStart, got {start_count}"
-
-    print(
-        f"✅ Completeness: 1 start, {sum(1 for e in events if isinstance(e, StreamTextDelta))} text deltas"
-    )
-
-
-@pytest.mark.asyncio
-async def test_text_delta_consistency():
-    """Test that text deltas have consistent IDs and build coherent text."""
-    text_events = []
-
-    async for event in stream_chat_completion_dummy(
-        session_id="test-consistency",
-        message="Test consistency",
-        is_user_message=True,
-        user_id="test-user",
-    ):
-        if isinstance(event, StreamTextDelta):
-            text_events.append(event)
-
-    # Verify all text deltas have IDs
-    assert all(e.id for e in text_events), "All text deltas must have IDs"
-
-    # Verify all deltas have the same ID (same text block)
-    if text_events:
-        first_id = text_events[0].id
-        assert all(
-            e.id == first_id for e in text_events
-        ), "All text deltas should share the same block ID"
-
-    # Verify deltas build coherent text
-    full_text = "".join(e.delta for e in text_events)
-    assert len(full_text) > 0, "Deltas should build non-empty text"
-    assert (
-        full_text == full_text.strip()
-    ), "Text should not have leading/trailing whitespace artifacts"
-
-    print(
-        f"✅ Consistency: {len(text_events)} deltas with ID '{text_events[0].id if text_events else 'N/A'}', text: '{full_text}'"
-    )
-
-
-if __name__ == "__main__":
-    # Run tests directly
-
-    print("Running Copilot E2E tests with dummy implementations...")
-    print("=" * 60)
-
-    asyncio.run(test_dummy_streaming_basic_flow())
-    asyncio.run(test_streaming_no_timeout())
-    asyncio.run(test_streaming_event_types())
-    asyncio.run(test_streaming_text_content())
-    asyncio.run(test_streaming_heartbeat_timing())
-    asyncio.run(test_error_handling())
-    asyncio.run(test_concurrent_sessions())
-    asyncio.run(test_session_state_persistence())
-    asyncio.run(test_message_deduplication())
-    asyncio.run(test_event_ordering())
-    asyncio.run(test_stream_completeness())
-    asyncio.run(test_text_delta_consistency())
-
-    print("=" * 60)
-    print("✅ All E2E tests passed!")
--- a/autogpt_platform/backend/backend/copilot/tools/__init__.py
+++ b/autogpt_platform/backend/backend/copilot/tools/__init__.py
@@ -1,17 +1,16 @@
-from __future__ import annotations
-
 import logging
 from typing import TYPE_CHECKING, Any

 from openai.types.chat import ChatCompletionToolParam

+from backend.copilot.model import ChatSession
 from backend.copilot.tracking import track_tool_called

 from .add_understanding import AddUnderstandingTool
-from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreenshotTool
 from .agent_output import AgentOutputTool
 from .base import BaseTool
 from .bash_exec import BashExecTool
+from .check_operation_status import CheckOperationStatusTool
 from .create_agent import CreateAgentTool
 from .customize_agent import CustomizeAgentTool
 from .edit_agent import EditAgentTool
@@ -20,17 +19,8 @@ from .find_agent import FindAgentTool
 from .find_block import FindBlockTool
 from .find_library_agent import FindLibraryAgentTool
 from .get_doc_page import GetDocPageTool
-from .manage_folders import (
-    CreateFolderTool,
-    DeleteFolderTool,
-    ListFoldersTool,
-    MoveAgentsToFolderTool,
-    MoveFolderTool,
-    UpdateFolderTool,
-)
 from .run_agent import RunAgentTool
 from .run_block import RunBlockTool
-from .run_mcp_tool import RunMCPToolTool
 from .search_docs import SearchDocsTool
 from .web_fetch import WebFetchTool
 from .workspace_files import (
@@ -41,7 +31,6 @@ from .workspace_files import (
 )

 if TYPE_CHECKING:
-    from backend.copilot.model import ChatSession
    from backend.copilot.response_model import StreamToolOutputAvailable

 logger = logging.getLogger(__name__)
@@ -55,25 +44,14 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
    "find_agent": FindAgentTool(),
    "find_block": FindBlockTool(),
    "find_library_agent": FindLibraryAgentTool(),
-    # Folder management tools
-    "create_folder": CreateFolderTool(),
-    "list_folders": ListFoldersTool(),
-    "update_folder": UpdateFolderTool(),
-    "move_folder": MoveFolderTool(),
-    "delete_folder": DeleteFolderTool(),
-    "move_agents_to_folder": MoveAgentsToFolderTool(),
    "run_agent": RunAgentTool(),
    "run_block": RunBlockTool(),
-    "run_mcp_tool": RunMCPToolTool(),
    "view_agent_output": AgentOutputTool(),
+    "check_operation_status": CheckOperationStatusTool(),
    "search_docs": SearchDocsTool(),
    "get_doc_page": GetDocPageTool(),
    # Web fetch for safe URL retrieval
    "web_fetch": WebFetchTool(),
-    # Agent-browser multi-step automation (navigate, act, screenshot)
-    "browser_navigate": BrowserNavigateTool(),
-    "browser_act": BrowserActTool(),
-    "browser_screenshot": BrowserScreenshotTool(),
    # Sandboxed code execution (bubblewrap)
    "bash_exec": BashExecTool(),
    # Persistent workspace tools (cloud storage, survives across sessions)
@@ -91,17 +69,10 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
 find_agent_tool = TOOL_REGISTRY["find_agent"]
 run_agent_tool = TOOL_REGISTRY["run_agent"]

-
-def get_available_tools() -> list[ChatCompletionToolParam]:
-    """Return OpenAI tool schemas for tools available in the current environment.
-
-    Called per-request so that env-var or binary availability is evaluated
-    fresh each time (e.g. browser_* tools are excluded when agent-browser
-    CLI is not installed).
-    """
-    return [
-        tool.as_openai_tool() for tool in TOOL_REGISTRY.values() if tool.is_available
-    ]
+# Generated from registry for OpenAI API
+tools: list[ChatCompletionToolParam] = [
+    tool.as_openai_tool() for tool in TOOL_REGISTRY.values()
+]


 def get_tool(tool_name: str) -> BaseTool | None:
Author	SHA1	Message	Date
Zamil Majdy	11e6fca8c3	fix(copilot): resolve dangling tool spinners when stream finishes When the backend doesn't emit StreamToolOutputAvailable for all tool calls before StreamFinish (e.g. SDK built-in tools like WebSearch), the frontend spinners would spin forever. Add a useEffect that watches for the streaming→ready transition and marks any remaining input-available/input-streaming tool parts as output-available. Extract shared resolveInProgressTools helper used by both the stop handler (cancelled) and stream-end (completed).	2026-02-20 03:48:20 +07:00
Zamil Majdy	6e737e0b74	style: fix Black formatting on cancel endpoint	2026-02-20 02:46:03 +07:00
Zamil Majdy	5ce002803d	fix(copilot): toast when cancel confirmation times out Check the reason field in the cancel response — if "cancel_published_not_confirmed", show a non-destructive toast so the user knows the stop was sent but not yet confirmed by the executor.	2026-02-20 02:31:18 +07:00
Zamil Majdy	f8ad8484ee	refactor(copilot): convert stop to plain function declaration Remove useCallback wrapper per project guidelines — stopRef.current captures the latest closure on every render regardless.	2026-02-20 02:25:43 +07:00
Zamil Majdy	b6064d0155	fix(copilot): address round-2 PR review and fix tool loading on stop Backend: - Add _validate_and_get_session() call to cancel endpoint (404 for invalid sessions, consistent with other endpoints) - Reduce polling max_wait from 10s to 5s (stay below reverse-proxy read timeouts) - Return cancelled=True with reason="cancel_published_not_confirmed" on timeout (cancel event IS published, just not yet confirmed) Frontend: - Mark in-progress tool parts as output-error on stop so spinners clear immediately instead of spinning forever - Toast on cancel API failure (network error / 5xx)	2026-02-20 02:21:28 +07:00
Zamil Majdy	76e0c96aa9	feat: fix openapi.json	2026-02-20 02:14:07 +07:00
Zamil Majdy	3364a8e415	refactor(copilot): use generated client for cancel API call Replace raw fetch() with generated postV2CancelSessionTask() and remove the now-unnecessary dedicated cancel proxy route — the general /api/proxy handles auth and forwarding. Toast on cancel failure so the user knows the backend may still be running.	2026-02-20 02:10:10 +07:00
Zamil Majdy	9f4f2749a4	fix(copilot): address PR review comments for cancel endpoint - Add CancelTaskResponse Pydantic model with typed return annotation - Handle non-JSON backend responses in cancel proxy route - Check for "no-token-found" token before forwarding auth header - Truncate IDs in log messages for consistency - Add cancel endpoint to openapi.json for frontend codegen	2026-02-20 02:02:14 +07:00
Zamil Majdy	2b0f457985	feat(copilot): wire up stop button to cancel executor tasks The stop button was completely disconnected — clicking it only aborted the client-side SSE fetch while the executor kept running indefinitely. - Add `enqueue_cancel_task()` to publish `CancelCoPilotEvent` to the existing RabbitMQ FANOUT exchange that the executor already consumes - Add `POST /sessions/{session_id}/cancel` endpoint that finds the active task, publishes the cancel event, and polls Redis until the task status confirms stopped (up to 10s) - Add Next.js API proxy route for the cancel endpoint - Wrap the AI SDK's `stop()` to also call the cancel API so the executor actually terminates	2026-02-20 01:20:19 +07:00
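
The cancel flow described in commits 2b0f457985 and b6064d0155 (publish the cancel event, poll until the task is confirmed stopped, and on timeout still report cancelled=True with reason="cancel_published_not_confirmed") can be sketched roughly as below. This is a minimal illustration, not the code from the PR: `wait_for_cancel`, `CancelResult`, the `get_status` callback, the `"running"` status string, and the `poll_interval` parameter are all hypothetical names chosen for the example. Only the 5 second ceiling and the reason string come from the commit messages.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class CancelResult:
    """Hypothetical response shape: cancelled flag plus an optional reason."""

    cancelled: bool
    reason: Optional[str] = None


async def wait_for_cancel(
    get_status: Callable[[], Awaitable[str]],
    max_wait: float = 5.0,  # commit b6064d0155 reduced the ceiling from 10s to 5s
    poll_interval: float = 0.5,  # assumed value, not stated in the commits
) -> CancelResult:
    """Poll a task status until it leaves "running" or max_wait elapses.

    Assumes the cancel event has already been published by the caller.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait
    while True:
        # Any status other than "running" is treated as confirmed stopped.
        if await get_status() != "running":
            return CancelResult(cancelled=True)
        if loop.time() >= deadline:
            # The event was published, but the executor has not confirmed yet.
            return CancelResult(
                cancelled=True, reason="cancel_published_not_confirmed"
            )
        await asyncio.sleep(poll_interval)
```

A caller can branch on `reason` to show the non-destructive "stop sent, not yet confirmed" toast mentioned in commit 5ce002803d, while treating a result without a reason as a confirmed stop.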