fix(copilot): prevent duplicate error markers and extract shared helper

- Extract `_append_error_marker()` helper to deduplicate marker appending logic across 4 call sites
- Skip appending error marker in BaseException handler when one was already appended inside the stream loop (ended_with_stream_error)
- Update misleading "mark as retryable" comment to match actual behavior (uses retryable prefix, not a model field)
- Add docstring to `_safe()` helper
- Remove unused `prefix` variable from stream error tuple
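
For reviewers, here is a minimal sketch of the duplicate-marker guard described above. It is not the code in this commit: only `_append_error_marker` and `ended_with_stream_error` come from the message, while the prefixes, the `STREAM_ERROR` sentinel, the `consume()` function, and the plain-list message store are hypothetical stand-ins.

```python
# Hypothetical prefixes; the real values live in the copilot module.
ERROR_PREFIX = "[error]"
RETRYABLE_ERROR_PREFIX = "[retryable-error]"

# Hypothetical sentinel standing in for an error event from the stream.
STREAM_ERROR = object()


def _append_error_marker(messages: list[str], text: str, *, retryable: bool = False) -> None:
    """Append a single error marker; retryability is encoded in the prefix."""
    prefix = RETRYABLE_ERROR_PREFIX if retryable else ERROR_PREFIX
    messages.append(f"{prefix} {text}")


def consume(messages: list[str], chunks) -> None:
    """Copy chunks into messages, recording at most one error marker."""
    ended_with_stream_error = False
    try:
        for chunk in chunks:
            if chunk is STREAM_ERROR:
                # Marker appended inside the stream loop.
                _append_error_marker(messages, "stream failed", retryable=True)
                ended_with_stream_error = True
                raise RuntimeError("stream failed")
            messages.append(chunk)
    except BaseException as exc:
        # Skip the marker here if the loop already appended one.
        if not ended_with_stream_error:
            _append_error_marker(messages, str(exc))
        raise
```

With this shape, a stream error yields one retryable marker, and any other failure yields one plain marker from the handler.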
2026-03-17 03:00:27 -04:00 · 2026-03-17 13:51:19 +07:00 · 2026-03-17 13:50:30 +07:00 · 2026-03-17 13:44:50 +07:00 · 2026-03-17 13:36:21 +07:00 · 2026-03-17 13:32:29 +07:00
56 changed files with 4764 additions and 709 deletions
--- a/.claude/skills/backend-check/SKILL.md
+++ b/.claude/skills/backend-check/SKILL.md
@@ -1,17 +0,0 @@
---
-name: backend-check
-description: Run the full backend formatting, linting, and test suite. Ensures code quality before commits and PRs. TRIGGER when backend Python code has been modified and needs validation.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Backend Check
-
-## Steps
-
-1. **Format**: `poetry run format` — runs formatting AND linting. NEVER run ruff/black/isort individually
-2. **Fix** any remaining errors manually, re-run until clean
-3. **Test**: `poetry run test` (runs DB setup + pytest). For specific files: `poetry run pytest -s -vvv <test_files>`
-4. **Snapshots** (if needed): `poetry run pytest path/to/test.py --snapshot-update` — review with `git diff`
--- a/.claude/skills/code-style/SKILL.md
+++ b/.claude/skills/code-style/SKILL.md
@@ -1,35 +0,0 @@
---
-name: code-style
-description: Python code style preferences for the AutoGPT backend. Apply when writing or reviewing Python code. TRIGGER when writing new Python code, reviewing PRs, or refactoring backend code.
-user-invocable: false
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Code Style
-
-## Imports
-
- **Top-level only** — no local/inner imports. Move all imports to the top of the file.
-
-## Typing
-
- **No duck typing** — avoid `hasattr`, `getattr`, `isinstance` for type dispatch. Use proper typed interfaces, unions, or protocols.
- **Pydantic models** over dataclass, namedtuple, or raw dict for structured data.
- **No linter suppressors** — avoid `# type: ignore`, `# noqa`, `# pyright: ignore` etc. 99% of the time the right fix is fixing the type/code, not silencing the tool.
-
-## Code Structure
-
- **List comprehensions** over manual loop-and-append.
- **Early return** — guard clauses first, avoid deep nesting.
- **Flatten inline** — prefer short, concise expressions. Reduce `if/else` chains with direct returns or ternaries when readable.
- **Modular functions** — break complex logic into small, focused functions rather than long blocks with nested conditionals.
-
-## Review Checklist
-
-Before finishing, always ask:
- Can any function be split into smaller pieces?
- Is there unnecessary nesting that an early return would eliminate?
- Can any loop be a comprehension?
- Is there a simpler way to express this logic?
--- a/.claude/skills/frontend-check/SKILL.md
+++ b/.claude/skills/frontend-check/SKILL.md
@@ -1,16 +0,0 @@
---
-name: frontend-check
-description: Run the full frontend formatting, linting, and type checking suite. Ensures code quality before commits and PRs. TRIGGER when frontend TypeScript/React code has been modified and needs validation.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Frontend Check
-
-## Steps (in order)
-
-1. **Format**: `pnpm format` — NEVER run individual formatters
-2. **Lint**: `pnpm lint` — fix errors, re-run until clean
-3. **Types**: `pnpm types` — if it keeps failing after multiple attempts, stop and ask the user
--- a/.claude/skills/new-block/SKILL.md
+++ b/.claude/skills/new-block/SKILL.md
@@ -1,29 +0,0 @@
---
-name: new-block
-description: Create a new backend block following the Block SDK Guide. Guides through provider configuration, schema definition, authentication, and testing. TRIGGER when user asks to create a new block, add a new integration, or build a new node for the graph editor.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# New Block Creation
-
-Read `docs/platform/block-sdk-guide.md` first for the full guide.
-
-## Steps
-
-1. **Provider config** (if external service): create `_config.py` with `ProviderBuilder`
-2. **Block file** in `backend/blocks/` (from `autogpt_platform/backend/`):
-   - Generate a UUID once with `uuid.uuid4()`, then **hard-code that string** as `id` (IDs must be stable across imports)
-   - `Input(BlockSchema)` and `Output(BlockSchema)` classes
-   - `async def run` that `yield`s output fields
-3. **Files**: use `store_media_file()` with `"for_block_output"` for outputs
-4. **Test**: `poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[MyBlock]' -xvs`
-5. **Format**: `poetry run format`
-
-## Rules
-
- Analyze interfaces: do inputs/outputs connect well with other blocks in a graph?
- Use top-level imports, avoid duck typing
- Always use `for_block_output` for block outputs
--- a/.claude/skills/openapi-regen/SKILL.md
+++ b/.claude/skills/openapi-regen/SKILL.md
@@ -1,28 +0,0 @@
---
-name: openapi-regen
-description: Regenerate the OpenAPI spec and frontend API client. Starts the backend REST server, fetches the spec, and regenerates the typed frontend hooks. TRIGGER when API routes change, new endpoints are added, or frontend API types are stale.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# OpenAPI Spec Regeneration
-
-## Steps
-
-1. **Run end-to-end** in a single shell block (so `REST_PID` persists):
-   ```bash
-   cd autogpt_platform/backend && poetry run rest &
-   REST_PID=$!
-   WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && kill $REST_PID && exit 1; done
-   cd ../frontend && pnpm generate:api:force
-   kill $REST_PID
-   pnpm types && pnpm lint && pnpm format
-   ```
-
-## Rules
-
- Always use `pnpm generate:api:force` (not `pnpm generate:api`)
- Don't manually edit files in `src/app/api/__generated__/`
- Generated hooks follow: `use{Method}{Version}{OperationName}`
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -0,0 +1,79 @@
+---
+name: pr-address
+description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
+user-invocable: true
+args: "[PR number or URL] — if omitted, finds PR for current branch."
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# PR Address
+
+## Find the PR
+
+```bash
+gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
+gh pr view {N}
+```
+
+## Fetch comments (all sources)
+
+```bash
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews       # top-level reviews
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments      # inline review comments
+gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments     # PR conversation comments
+```
+
+**Bots to watch for:**
+- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
+- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
+- `coderabbitai[bot]` — automated review. Address actionable items.
+
+## For each unaddressed comment
+
+Address comments **one at a time**: fix → commit → push → inline reply → next.
+
+1. Read the referenced code, make the fix (or reply explaining why it's not needed)
+2. Commit and push the fix
+3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
+
+| Comment type | How to reply |
+|---|---|
+| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="Fixed in <commit-sha>: <description>"` |
+| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="Fixed in <commit-sha>: <description>"` |
+
+## Format and commit
+
+After fixing, format the changed code:
+
+- **Backend** (from `autogpt_platform/backend/`): `poetry run format`
+- **Frontend** (from `autogpt_platform/frontend/`): `pnpm format && pnpm lint && pnpm types`
+
+If API routes changed, regenerate the frontend client:
+```bash
+cd autogpt_platform/backend && poetry run rest &
+REST_PID=$!
+trap "kill $REST_PID 2>/dev/null" EXIT
+WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && exit 1; done
+cd ../frontend && pnpm generate:api:force
+kill $REST_PID 2>/dev/null; trap - EXIT
+```
+Never manually edit files in `src/app/api/__generated__/`.
+
+Then commit and **push immediately** — never batch commits without pushing.
+
+For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).
+
+## The loop
+
+```text
+address comments → format → commit → push
+→ re-check comments → fix new ones → push
+→ wait for CI → re-check comments after CI settles
+→ repeat until: all comments addressed AND CI green AND no new comments arriving
+```
+
+While CI runs, stay productive: run local tests, address remaining comments.
+
+**The loop ends when:** CI fully green + all comments addressed + no new comments since CI settled.
--- a/.claude/skills/pr-create/SKILL.md
+++ b/.claude/skills/pr-create/SKILL.md
@@ -1,31 +0,0 @@
---
-name: pr-create
-description: Create a pull request for the current branch. TRIGGER when user asks to create a PR, open a pull request, push changes for review, or submit work for merging.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Create Pull Request
-
-## Steps
-
-1. **Check for existing PR**: `gh pr view --json url -q .url 2>/dev/null` — if a PR already exists, output its URL and stop
-2. **Understand changes**: `git status`, `git diff dev...HEAD`, `git log dev..HEAD --oneline`
-3. **Read PR template**: `.github/PULL_REQUEST_TEMPLATE.md`
-4. **Draft PR title**: Use conventional commits format (see CLAUDE.md for types and scopes)
-5. **Fill out PR template** as the body — be thorough in the Changes section
-6. **Format first** (if relevant changes exist):
-   - Backend: `cd autogpt_platform/backend && poetry run format`
-   - Frontend: `cd autogpt_platform/frontend && pnpm format`
-   - Fix any lint errors, then commit formatting changes before pushing
-7. **Push**: `git push -u origin HEAD`
-8. **Create PR**: `gh pr create --base dev`
-9. **Output** the PR URL
-
-## Rules
-
- Always target `dev` branch
- Do NOT run tests — CI will handle that
- Use the PR template from `.github/PULL_REQUEST_TEMPLATE.md`
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -1,51 +1,74 @@
 ---
 name: pr-review
-description: Address all open PR review comments systematically. Fetches comments, addresses each one, reacts +1/-1, and replies when clarification is needed. Keeps iterating until all comments are addressed and CI is green. TRIGGER when user shares a PR URL, asks to address review comments, fix PR feedback, or respond to reviewer comments.
+description: Review a PR for correctness, security, code quality, and testing issues. TRIGGER when user asks to review a PR, check PR quality, or give feedback on a PR.
 user-invocable: true
+args: "[PR number or URL] — if omitted, finds PR for current branch."
 metadata:
  author: autogpt-team
  version: "1.0.0"
 ---

-# PR Review Comment Workflow
+# PR Review

-## Steps
+## Find the PR

-1. **Find PR**: `gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT`
-2. **Fetch comments** (all three sources):
-   - `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` (top-level reviews)
-   - `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` (inline review comments)
-   - `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` (PR conversation comments)
-3. **Skip** comments already reacted to by PR author
-4. **For each unreacted comment**:
-   - Read referenced code, make the fix (or reply if you disagree/need info)
-   - **Inline review comments** (`pulls/{N}/comments`):
-     - React: `gh api repos/.../pulls/comments/{ID}/reactions -f content="+1"` (or `-1`)
-     - Reply: `gh api repos/.../pulls/{N}/comments/{ID}/replies -f body="..."`
-   - **PR conversation comments** (`issues/{N}/comments`):
-     - React: `gh api repos/.../issues/comments/{ID}/reactions -f content="+1"` (or `-1`)
-     - No threaded replies — post a new issue comment if needed
-   - **Top-level reviews**: no reaction API — address in code, reply via issue comment if needed
-5. **Include autogpt-reviewer bot fixes** too
-6. **Format**: `cd autogpt_platform/backend && poetry run format`, `cd autogpt_platform/frontend && pnpm format`
-7. **Commit & push**
-8. **Re-fetch comments** immediately — address any new unreacted ones before waiting on CI
-9. **Stay productive while CI runs** — don't idle. In priority order:
-   - Run any pending local tests (`poetry run pytest`, e2e, etc.) and fix failures
-   - Address any remaining comments
-   - Only poll `gh pr checks {N}` as the last resort when there's truly nothing left to do
-10. **If CI fails** — fix, go back to step 6
-11. **Re-fetch comments again** after CI is green — address anything that appeared while CI was running
-12. **Done** only when: all comments reacted AND CI is green.
+```bash
+gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
+gh pr view {N}
+```

-## CRITICAL: Do Not Stop
+## Read the diff

-**Loop is: address → format → commit → push → re-check comments → run local tests → wait CI → re-check comments → repeat.**
+```bash
+gh pr diff {N}
+```

-Never idle. If CI is running and you have nothing to address, run local tests. Waiting on CI is the last resort.
+## Fetch existing review comments

-## Rules
+Before posting anything, fetch existing inline comments to avoid duplicates:

- One todo per comment
- For inline review comments: reply on existing threads. For PR conversation comments: post a new issue comment (API doesn't support threaded replies)
- React to every comment: +1 addressed, -1 disagreed (with explanation)
+```bash
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
+```
+
+## What to check
+
+**Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).
+
+**Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
+
+**Code quality:** apply rules from backend/frontend CLAUDE.md files.
+
+**Architecture:** DRY, single responsibility, modular functions. `Security()` vs `Depends()` for FastAPI auth. `data:` for SSE events, `: comment` for heartbeats. `transaction=True` for Redis pipelines.
+
+**Testing:** edge cases covered, colocated `*_test.py` (backend) / `__tests__/` (frontend), mocks target where symbol is **used** not defined, `AsyncMock` for async.
+
+## Output format
+
+Every comment **must** be prefixed with `🤖` and a criticality badge:
+
+| Tier | Badge | Meaning |
+|---|---|---|
+| Blocker | `🔴 **Blocker**` | Must fix before merge |
+| Should Fix | `🟠 **Should Fix**` | Important improvement |
+| Nice to Have | `🟡 **Nice to Have**` | Minor suggestion |
+| Nit | `🔵 **Nit**` | Style / wording |
+
+Example: `🤖 🔴 **Blocker**: Missing error handling for X — suggest wrapping in try/except.`
+
+## Post inline comments
+
+For each finding, post an inline comment on the PR (do not just write a local report):
+
+```bash
+# Get the latest commit SHA for the PR
+COMMIT_SHA=$(gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.head.sha')
+
+# Post an inline comment on a specific file/line
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments \
+  -f body="🤖 🔴 **Blocker**: <description>" \
+  -f commit_id="$COMMIT_SHA" \
+  -f path="<file path>" \
+  -F line=<line number>
+```
--- a/.claude/skills/worktree-setup/SKILL.md
+++ b/.claude/skills/worktree-setup/SKILL.md
@@ -1,45 +0,0 @@
---
-name: worktree-setup
-description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, generates Prisma client, and optionally starts the app (with port conflict resolution) or runs tests. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
-user-invocable: true
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Worktree Setup
-
-## Preferred: Use Branchlet
-
-The repo has a `.branchlet.json` config — it handles env file copying, dependency installation, and Prisma generation automatically.
-
-```bash
-npm install -g branchlet                                      # install once
-branchlet create -n <name> -s <source-branch> -b <new-branch>
-branchlet list --json   # list all worktrees
-```
-
-## Manual Fallback
-
-If branchlet isn't available:
-
-1. `git worktree add ../<RepoName><N> <branch-name>`
-2. Copy `.env` files: `backend/.env`, `frontend/.env`, `autogpt_platform/.env`, `db/docker/.env`
-3. Install deps:
-   - `cd autogpt_platform/backend && poetry install && poetry run prisma generate`
-   - `cd autogpt_platform/frontend && pnpm install`
-
-## Running the App
-
-Free ports first — backend uses: 8001, 8002, 8003, 8005, 8006, 8007, 8008.
-
-```bash
-for port in 8001 8002 8003 8005 8006 8007 8008; do
-  lsof -ti :$port | xargs kill -9 2>/dev/null || true
-done
-cd <worktree>/autogpt_platform/backend && poetry run app
-```
-
-## CoPilot Testing Gotcha
-
-SDK mode spawns a Claude subprocess — **won't work inside Claude Code**. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.
--- a/.claude/skills/worktree/SKILL.md
+++ b/.claude/skills/worktree/SKILL.md
@@ -0,0 +1,85 @@
+---
+name: worktree
+description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, and generates Prisma client. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
+user-invocable: true
+args: "[name] — optional worktree name (e.g., 'AutoGPT7'). If omitted, uses next available AutoGPT<N>."
+metadata:
+  author: autogpt-team
+  version: "3.0.0"
+---
+
+# Worktree Setup
+
+## Create the worktree
+
+Derive paths from the git toplevel. If a name is provided as argument, use it. Otherwise, check `git worktree list` and pick the next `AutoGPT<N>`.
+
+```bash
+ROOT=$(git rev-parse --show-toplevel)
+PARENT=$(dirname "$ROOT")
+
+# From an existing branch
+git worktree add "$PARENT/<NAME>" <branch-name>
+
+# From a new branch off dev
+git worktree add -b <new-branch> "$PARENT/<NAME>" dev
+```
+
+## Copy environment files
+
+Copy `.env` from the root worktree. Falls back to `.env.default` if `.env` doesn't exist.
+
+```bash
+ROOT=$(git rev-parse --show-toplevel)
+TARGET="$(dirname "$ROOT")/<NAME>"
+
+for envpath in autogpt_platform/backend autogpt_platform/frontend autogpt_platform; do
+  if [ -f "$ROOT/$envpath/.env" ]; then
+    cp "$ROOT/$envpath/.env" "$TARGET/$envpath/.env"
+  elif [ -f "$ROOT/$envpath/.env.default" ]; then
+    cp "$ROOT/$envpath/.env.default" "$TARGET/$envpath/.env"
+  fi
+done
+```
+
+## Install dependencies
+
+```bash
+TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
+cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install
+cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate
+cd "$TARGET/autogpt_platform/frontend" && pnpm install
+```
+
+Replace `<NAME>` with the actual worktree name (e.g., `AutoGPT7`).
+
+## Running the app (optional)
+
+Backend uses ports: 8001, 8002, 8003, 8005, 8006, 8007, 8008. Free them first if needed:
+
+```bash
+TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
+for port in 8001 8002 8003 8005 8006 8007 8008; do
+  lsof -ti :$port | xargs kill -9 2>/dev/null || true
+done
+cd "$TARGET/autogpt_platform/backend" && poetry run app
+```
+
+## CoPilot testing
+
+SDK mode spawns a Claude subprocess — won't work inside Claude Code. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.
+
+## Cleanup
+
+```bash
+# Replace <NAME> with the actual worktree name (e.g., AutoGPT7)
+git worktree remove "$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
+```
+
+## Alternative: Branchlet (optional)
+
+If [branchlet](https://www.npmjs.com/package/branchlet) is installed:
+
+```bash
+branchlet create -n <name> -s <source-branch> -b <new-branch>
+```
--- a/.github/workflows/platform-backend-ci.yml
+++ b/.github/workflows/platform-backend-ci.yml
@@ -5,12 +5,14 @@ on:
    branches: [master, dev, ci-test*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  pull_request:
    branches: [master, dev, release-*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  merge_group:
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -120,175 +120,6 @@ jobs:
          token: ${{ secrets.GITHUB_TOKEN }}
          exitOnceUploaded: true

-  e2e_test:
-    name: end-to-end tests
-    runs-on: big-boi
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-        with:
-          submodules: recursive
-
-      - name: Set up Platform - Copy default supabase .env
-        run: |
-          cp ../.env.default ../.env
-
-      - name: Set up Platform - Copy backend .env and set OpenAI API key
-        run: |
-          cp ../backend/.env.default ../backend/.env
-          echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
-        env:
-          # Used by E2E test data script to generate embeddings for approved store agents
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-
-      - name: Set up Platform - Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-        with:
-          driver: docker-container
-          driver-opts: network=host
-
-      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v4
-
-      - name: Set up Platform - Build Docker images (with cache)
-        working-directory: autogpt_platform
-        run: |
-          pip install pyyaml
-
-          # Resolve extends and generate a flat compose file that bake can understand
-          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
-
-          # Add cache configuration to the resolved compose file
-          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
-            --source docker-compose.resolved.yml \
-            --cache-from "type=gha" \
-            --cache-to "type=gha,mode=max" \
-            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend') }}" \
-            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src') }}" \
-            --git-ref "${{ github.ref }}"
-
-          # Build with bake using the resolved compose file (now includes cache config)
-          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Cache E2E test data
-        id: e2e-data-cache
-        uses: actions/cache@v5
-        with:
-          path: /tmp/e2e_test_data.sql
-          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-frontend-ci.yml') }}
-
-      - name: Set up Platform - Start Supabase DB + Auth
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
-          echo "Waiting for database to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
-          echo "Waiting for auth service to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
-
-      - name: Set up Platform - Run migrations
-        run: |
-          echo "Running migrations..."
-          docker compose -f ../docker-compose.resolved.yml run --rm migrate
-          echo "✅ Migrations completed"
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Load cached E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
-        run: |
-          echo "✅ Found cached E2E test data, restoring..."
-          {
-            echo "SET session_replication_role = 'replica';"
-            cat /tmp/e2e_test_data.sql
-            echo "SET session_replication_role = 'origin';"
-          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
-          # Refresh materialized views after restore
-          docker compose -f ../docker-compose.resolved.yml exec -T db \
-            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
-
-          echo "✅ E2E test data restored from cache"
-
-      - name: Set up Platform - Start (all other services)
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d --no-build
-          echo "Waiting for rest_server to be ready..."
-          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Create E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
-        run: |
-          echo "Creating E2E test data..."
-          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
-          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
-            echo "❌ E2E test data creation failed!"
-            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
-            exit 1
-          }
-
-          # Dump auth.users + platform schema for cache (two separate dumps)
-          echo "Dumping database for cache..."
-          {
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --table='auth.users' postgres
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --schema=platform \
-              --exclude-table='platform._prisma_migrations' \
-              --exclude-table='platform.apscheduler_jobs' \
-              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
-              postgres
-          } > /tmp/e2e_test_data.sql
-
-          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
-
-      - name: Set up tests - Enable corepack
-        run: corepack enable
-
-      - name: Set up tests - Set up Node
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
-
-      - name: Set up tests - Install dependencies
-        run: pnpm install --frozen-lockfile
-
-      - name: Set up tests - Install browser 'chromium'
-        run: pnpm playwright install --with-deps chromium
-
-      - name: Run Playwright tests
-        run: pnpm test:no-build
-        continue-on-error: false
-
-      - name: Upload Playwright report
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-report
-          path: playwright-report
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Upload Playwright test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-test-results
-          path: test-results
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Print Final Docker Compose logs
-        if: always()
-        run: docker compose -f ../docker-compose.resolved.yml logs
-
  integration_test:
    runs-on: ubuntu-latest
    needs: setup
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -1,14 +1,18 @@
-name: AutoGPT Platform - Frontend CI
+name: AutoGPT Platform - Full-stack CI

 on:
  push:
    branches: [master, dev]
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
+      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  pull_request:
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
+      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  merge_group:

@@ -24,42 +28,28 @@ defaults:
 jobs:
  setup:
    runs-on: ubuntu-latest
-    outputs:
-      cache-key: ${{ steps.cache-key.outputs.key }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

-      - name: Set up Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-
      - name: Enable corepack
        run: corepack enable

-      - name: Generate cache key
-        id: cache-key
-        run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT
-
-      - name: Cache dependencies
-        uses: actions/cache@v5
+      - name: Set up Node
+        uses: actions/setup-node@v6
        with:
-          path: ~/.pnpm-store
-          key: ${{ steps.cache-key.outputs.key }}
-          restore-keys: |
-            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
-            ${{ runner.os }}-pnpm-
+          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Install dependencies
+      - name: Install dependencies to populate cache
        run: pnpm install --frozen-lockfile

-  types:
-    runs-on: big-boi
+  check-api-types:
+    name: check API types
+    runs-on: ubuntu-latest
    needs: setup
-    strategy:
-      fail-fast: false

    steps:
      - name: Checkout repository
@@ -67,70 +57,256 @@ jobs:
        with:
          submodules: recursive

-      - name: Set up Node.js
+      # ------------------------ Backend setup ------------------------
+
+      - name: Set up Backend - Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Set up Backend - Install Poetry
+        working-directory: autogpt_platform/backend
+        run: |
+          POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
+          echo "Installing Poetry version ${POETRY_VERSION}"
+          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$POETRY_VERSION python3 -
+
+      - name: Set up Backend - Set up dependency cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+
+      - name: Set up Backend - Install dependencies
+        working-directory: autogpt_platform/backend
+        run: poetry install
+
+      - name: Set up Backend - Generate Prisma client
+        working-directory: autogpt_platform/backend
+        run: poetry run prisma generate && poetry run gen-prisma-stub
+
+      - name: Set up Frontend - Export OpenAPI schema from Backend
+        working-directory: autogpt_platform/backend
+        run: poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
+
+      # ------------------------ Frontend setup ------------------------
+
+      - name: Set up Frontend - Enable corepack
+        run: corepack enable
+
+      - name: Set up Frontend - Set up Node
        uses: actions/setup-node@v6
        with:
          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Enable corepack
-        run: corepack enable
-
-      - name: Copy default supabase .env
-        run: |
-          cp ../.env.default ../.env
-
-      - name: Copy backend .env
-        run: |
-          cp ../backend/.env.default ../backend/.env
-
-      - name: Run docker compose
-        run: |
-          docker compose -f ../docker-compose.yml --profile local up -d deps_backend
-
-      - name: Restore dependencies cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.pnpm-store
-          key: ${{ needs.setup.outputs.cache-key }}
-          restore-keys: |
-            ${{ runner.os }}-pnpm-
-
-      - name: Install dependencies
+      - name: Set up Frontend - Install dependencies
        run: pnpm install --frozen-lockfile

-      - name: Setup .env
-        run: cp .env.default .env
-
-      - name: Wait for services to be ready
-        run: |
-          echo "Waiting for rest_server to be ready..."
-          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
-          echo "Waiting for database to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
-
-      - name: Generate API queries
-        run: pnpm generate:api:force
+      - name: Set up Frontend - Format OpenAPI schema
+        id: format-schema
+        run: pnpm prettier --write ./src/app/api/openapi.json

      - name: Check for API schema changes
        run: |
          if ! git diff --exit-code src/app/api/openapi.json; then
            echo "❌ API schema changes detected in src/app/api/openapi.json"
            echo ""
-            echo "The openapi.json file has been modified after running 'pnpm generate:api-all'."
+            echo "The openapi.json file has been modified after exporting the API schema."
            echo "This usually means changes have been made in the BE endpoints without updating the Frontend."
            echo "The API schema is now out of sync with the Front-end queries."
            echo ""
            echo "To fix this:"
-            echo "1. Pull the backend 'docker compose pull && docker compose up -d --build --force-recreate'"
-            echo "2. Run 'pnpm generate:api' locally"
-            echo "3. Run 'pnpm types' locally"
-            echo "4. Fix any TypeScript errors that may have been introduced"
-            echo "5. Commit and push your changes"
+            printf '\nIn the backend directory:\n'
+            echo "1. Run 'poetry run export-api-schema --output ../frontend/src/app/api/openapi.json'"
+            printf '\nIn the frontend directory:\n'
+            echo "2. Run 'pnpm prettier --write src/app/api/openapi.json'"
+            echo "3. Run 'pnpm generate:api'"
+            echo "4. Run 'pnpm types'"
+            echo "5. Fix any TypeScript errors that may have been introduced"
+            echo "6. Commit and push your changes"
            echo ""
            exit 1
          else
            echo "✅ No API schema changes detected"
          fi

-      - name: Run Typescript checks
+      - name: Set up Frontend - Generate API client
+        id: generate-api-client
+        run: pnpm orval --config ./orval.config.ts
+        # Continue with type generation & check even if there are schema changes
+        if: success() || (steps.format-schema.outcome == 'success')
+
+      - name: Check for TypeScript errors
        run: pnpm types
+        if: success() || (steps.generate-api-client.outcome == 'success')
+
+  e2e_test:
+    name: end-to-end tests
+    runs-on: big-boi
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Set up Platform - Copy default supabase .env
+        run: |
+          cp ../.env.default ../.env
+
+      - name: Set up Platform - Copy backend .env and set OpenAI API key
+        run: |
+          cp ../backend/.env.default ../backend/.env
+          echo "OPENAI_INTERNAL_API_KEY=${OPENAI_API_KEY}" >> ../backend/.env
+        env:
+          # Used by E2E test data script to generate embeddings for approved store agents
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+
+      - name: Set up Platform - Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+        with:
+          driver: docker-container
+          driver-opts: network=host
+
+      - name: Set up Platform - Expose GHA cache to docker buildx CLI
+        uses: crazy-max/ghaction-github-runtime@v4
+
+      - name: Set up Platform - Build Docker images (with cache)
+        working-directory: autogpt_platform
+        run: |
+          pip install pyyaml
+
+          # Resolve extends and generate a flat compose file that bake can understand
+          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
+
+          # Add cache configuration to the resolved compose file
+          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
+            --source docker-compose.resolved.yml \
+            --cache-from "type=gha" \
+            --cache-to "type=gha,mode=max" \
+            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
+            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
+            --git-ref "${{ github.ref }}"
+
+          # Build with bake using the resolved compose file (now includes cache config)
+          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Cache E2E test data
+        id: e2e-data-cache
+        uses: actions/cache@v5
+        with:
+          path: /tmp/e2e_test_data.sql
+          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-fullstack-ci.yml') }}
+
+      - name: Set up Platform - Start Supabase DB + Auth
+        run: |
+          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
+          echo "Waiting for database to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
+          echo "Waiting for auth service to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
+
+      - name: Set up Platform - Run migrations
+        run: |
+          echo "Running migrations..."
+          docker compose -f ../docker-compose.resolved.yml run --rm migrate
+          echo "✅ Migrations completed"
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Load cached E2E test data
+        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
+        run: |
+          echo "✅ Found cached E2E test data, restoring..."
+          {
+            echo "SET session_replication_role = 'replica';"
+            cat /tmp/e2e_test_data.sql
+            echo "SET session_replication_role = 'origin';"
+          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
+          # Refresh materialized views after restore
+          docker compose -f ../docker-compose.resolved.yml exec -T db \
+            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
+
+          echo "✅ E2E test data restored from cache"
+
+      - name: Set up Platform - Start (all other services)
+        run: |
+          docker compose -f ../docker-compose.resolved.yml up -d --no-build
+          echo "Waiting for rest_server to be ready..."
+          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Create E2E test data
+        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
+        run: |
+          echo "Creating E2E test data..."
+          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
+          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
+            echo "❌ E2E test data creation failed!"
+            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
+            exit 1
+          }
+
+          # Dump auth.users + platform schema for cache (two separate dumps)
+          echo "Dumping database for cache..."
+          {
+            docker compose -f ../docker-compose.resolved.yml exec -T db \
+              pg_dump -U postgres --data-only --column-inserts \
+              --table='auth.users' postgres
+            docker compose -f ../docker-compose.resolved.yml exec -T db \
+              pg_dump -U postgres --data-only --column-inserts \
+              --schema=platform \
+              --exclude-table='platform._prisma_migrations' \
+              --exclude-table='platform.apscheduler_jobs' \
+              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
+              postgres
+          } > /tmp/e2e_test_data.sql
+
+          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
+
+      - name: Set up tests - Enable corepack
+        run: corepack enable
+
+      - name: Set up tests - Set up Node
+        uses: actions/setup-node@v6
+        with:
+          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+
+      - name: Set up tests - Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Set up tests - Install browser 'chromium'
+        run: pnpm playwright install --with-deps chromium
+
+      - name: Run Playwright tests
+        run: pnpm test:no-build
+        continue-on-error: false
+
+      - name: Upload Playwright report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-report
+          path: playwright-report
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Upload Playwright test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-test-results
+          path: test-results
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Print Final Docker Compose logs
+        if: always()
+        run: docker compose -f ../docker-compose.resolved.yml logs
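Review note: the readiness checks in the `e2e_test` job (`timeout 60 sh -c 'until ...; do sleep 2; done'`) are a poll-until-ready loop with a deadline. The same idea can be sketched in Python as below. `wait_until` is a hypothetical helper written for illustration only; it is not part of this change or of any script in the repository.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed.

    Returns True if the check succeeded and False on timeout, so the caller
    decides whether a timeout is fatal.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Returning a boolean rather than raising mirrors how the workflow treats the checks differently: the `pg_isready` loop fails the step on timeout, while the auth schema and `rest_server` loops continue with `|| echo "... continuing..."`.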
--- a/autogpt_platform/CLAUDE.md
+++ b/autogpt_platform/CLAUDE.md
@@ -60,9 +60,12 @@ AutoGPT Platform is a monorepo containing:

 ### Reviewing/Revising Pull Requests

- When the user runs /pr-comments or tries to fetch them, also run gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews to get the reviews
- Use gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews/[review_id]/comments to get the review contents
- Use gh api /repos/Significant-Gravitas/AutoGPT/issues/9924/comments to get the pr specific comments
+Use `/pr-review` to review a PR or `/pr-address` to address comments.
+
+When fetching comments manually:
+- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
+- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
+- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments

 ### Conventional Commits

--- a/autogpt_platform/backend/CLAUDE.md
+++ b/autogpt_platform/backend/CLAUDE.md
@@ -58,10 +58,31 @@ poetry run pytest path/to/test.py --snapshot-update
 - **Authentication**: JWT-based with Supabase integration
 - **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies

+## Code Style
+
+- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
+- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
+- **Pydantic models** over dataclass/namedtuple/dict for structured data
+- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
+- **List comprehensions** over manual loop-and-append
+- **Early return** — guard clauses first, avoid deep nesting
+- **Lazy `%s` logging** — `logger.info("Processing %s items", count)` not `logger.info(f"Processing {count} items")`
+- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
+- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
+- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
+- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
+- **`max(0, value)` guards** — for computed values that should never be negative
+- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
+- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
+- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
+
 ## Testing Approach

 - Uses pytest with snapshot testing for API responses
 - Test files are colocated with source files (`*_test.py`)
+- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
+- After refactoring, update mock targets to match new module paths
+- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)

 ## Database Schema

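Review note: a few of the Code Style rules added above (early return, `max(0, value)` guards, lazy `%s` logging) can be shown together in one short function. This is only an illustrative sketch; `remaining_credits` is a hypothetical name and does not exist in the backend.

```python
import logging

logger = logging.getLogger(__name__)


def remaining_credits(balance: int, cost: int) -> int:
    """Return the balance left after a charge, never below zero."""
    # Early return / guard clause: reject the invalid case first, no nesting.
    if cost < 0:
        raise ValueError("cost must be non-negative")

    # max(0, value) guard: a computed balance should never be negative.
    remaining = max(0, balance - cost)

    # Lazy %s logging: arguments are only formatted if INFO is enabled.
    logger.info("Charged %s credits, %s remaining", cost, remaining)
    return remaining
```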
--- a/autogpt_platform/backend/backend/blocks/github/commits.py
+++ b/autogpt_platform/backend/backend/blocks/github/commits.py
@@ -11,7 +11,10 @@ from backend.blocks._base import (
    BlockSchemaInput,
    BlockSchemaOutput,
 )
+from backend.data.execution import ExecutionContext
 from backend.data.model import SchemaField
+from backend.util.file import parse_data_uri, resolve_media_content
+from backend.util.type import MediaFileType

 from ._api import get_api
 from ._auth import (
@@ -178,7 +181,8 @@ class FileOperation(StrEnum):

 class FileOperationInput(TypedDict):
    path: str
-    content: str
+    # MediaFileType is a str NewType — no runtime breakage for existing callers.
+    content: MediaFileType
    operation: FileOperation


@@ -275,11 +279,11 @@ class GithubMultiFileCommitBlock(Block):
        base_tree_sha = commit_data["tree"]["sha"]

        # 3. Build tree entries for each file operation (blobs created concurrently)
-        async def _create_blob(content: str) -> str:
+        async def _create_blob(content: str, encoding: str = "utf-8") -> str:
            blob_url = repo_url + "/git/blobs"
            blob_response = await api.post(
                blob_url,
-                json={"content": content, "encoding": "utf-8"},
+                json={"content": content, "encoding": encoding},
            )
            return blob_response.json()["sha"]

@@ -301,10 +305,19 @@ class GithubMultiFileCommitBlock(Block):
            else:
                upsert_files.append((path, file_op.get("content", "")))

-        # Create all blobs concurrently
+        # Create all blobs concurrently. Data URIs (from store_media_file)
+        # are sent as base64 blobs to preserve binary content.
        if upsert_files:
+
+            async def _make_blob(content: str) -> str:
+                parsed = parse_data_uri(content)
+                if parsed is not None:
+                    _, b64_payload = parsed
+                    return await _create_blob(b64_payload, encoding="base64")
+                return await _create_blob(content)
+
            blob_shas = await asyncio.gather(
-                *[_create_blob(content) for _, content in upsert_files]
+                *[_make_blob(content) for _, content in upsert_files]
            )
            for (path, _), blob_sha in zip(upsert_files, blob_shas):
                tree_entries.append(
@@ -358,15 +371,36 @@ class GithubMultiFileCommitBlock(Block):
        input_data: Input,
        *,
        credentials: GithubCredentials,
+        execution_context: ExecutionContext,
        **kwargs,
    ) -> BlockOutput:
        try:
+            # Resolve media references (workspace://, data:, URLs) to data
+            # URIs so _make_blob can send binary content correctly.
+            resolved_files: list[FileOperationInput] = []
+            for file_op in input_data.files:
+                content = file_op.get("content", "")
+                operation = FileOperation(file_op.get("operation", "upsert"))
+                if operation != FileOperation.DELETE:
+                    content = await resolve_media_content(
+                        MediaFileType(content),
+                        execution_context,
+                        return_format="for_external_api",
+                    )
+                resolved_files.append(
+                    FileOperationInput(
+                        path=file_op["path"],
+                        content=MediaFileType(content),
+                        operation=operation,
+                    )
+                )
+
            sha, url = await self.multi_file_commit(
                credentials,
                input_data.repo_url,
                input_data.branch,
                input_data.commit_message,
-                input_data.files,
+                resolved_files,
            )
            yield "sha", sha
            yield "url", url
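Review note: the `_make_blob` change above picks the blob encoding from the shape of the content: data URIs are sent as `base64`, everything else as `utf-8`. The sketch below isolates that decision. `split_data_uri` is a hypothetical stand-in for `parse_data_uri` (whose exact return type is not shown in this diff), and `blob_request_body` is an illustrative name, not a function in the block.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: a data URI looks like "data:<mime>;base64,<payload>".
_DATA_URI_RE = re.compile(
    r"^data:(?P<mime>[^;,]+);base64,(?P<payload>.*)$", re.DOTALL
)


def split_data_uri(value: str) -> Optional[Tuple[str, str]]:
    """Return (mime, base64_payload) for a base64 data URI, else None."""
    match = _DATA_URI_RE.match(value)
    if match is None:
        return None
    return match.group("mime"), match.group("payload")


def blob_request_body(content: str) -> Dict[str, str]:
    """Build the JSON body for a Git blob creation request."""
    parsed = split_data_uri(content)
    if parsed is not None:
        # Binary content: forward the base64 payload untouched.
        _, b64_payload = parsed
        return {"content": b64_payload, "encoding": "base64"}
    # Plain text: send as-is with the default encoding.
    return {"content": content, "encoding": "utf-8"}
```

Keeping the payload base64-encoded end to end is what avoids corrupting binary files such as images, which would not survive a round trip through a UTF-8 string.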
--- a/autogpt_platform/backend/backend/blocks/github/test_github_blocks.py
+++ b/autogpt_platform/backend/backend/blocks/github/test_github_blocks.py
@@ -8,6 +8,7 @@ from backend.blocks.github.pull_requests import (
    GithubMergePullRequestBlock,
    prepare_pr_api_url,
 )
+from backend.data.execution import ExecutionContext
 from backend.util.exceptions import BlockExecutionError

 # ── prepare_pr_api_url tests ──
@@ -97,7 +98,11 @@ async def test_multi_file_commit_error_path():
        "credentials": TEST_CREDENTIALS_INPUT,
    }
    with pytest.raises(BlockExecutionError, match="ref update failed"):
-        async for _ in block.execute(input_data, credentials=TEST_CREDENTIALS):
+        async for _ in block.execute(
+            input_data,
+            credentials=TEST_CREDENTIALS,
+            execution_context=ExecutionContext(),
+        ):
            pass


--- a/autogpt_platform/backend/backend/copilot/baseline/service.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service.py
@@ -40,7 +40,7 @@ from backend.copilot.response_model import (
 from backend.copilot.service import (
    _build_system_prompt,
    _generate_session_title,
-    client,
+    _get_openai_client,
    config,
 )
 from backend.copilot.tools import execute_tool, get_available_tools
@@ -89,7 +89,7 @@ async def _compress_session_messages(
        result = await compress_context(
            messages=messages_dict,
            model=config.model,
-            client=client,
+            client=_get_openai_client(),
        )
    except Exception as e:
        logger.warning("[Baseline] Context compression with LLM failed: %s", e)
@@ -235,7 +235,7 @@ async def stream_chat_completion_baseline(
            )
            if tools:
                create_kwargs["tools"] = tools
-            response = await client.chat.completions.create(**create_kwargs)  # type: ignore[arg-type]  # dynamic kwargs
+            response = await _get_openai_client().chat.completions.create(**create_kwargs)  # type: ignore[arg-type]  # dynamic kwargs

            # Accumulate streamed response (text + tool calls)
            round_text = ""
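Review note: this file now calls `_get_openai_client()` at each use site instead of importing a module-level `client`. The getter's implementation is not part of this section, so the following is only one common way such an accessor could be written, under the assumption that the client should be built once on first use. `FakeClient` and `get_client` are hypothetical names.

```python
from functools import lru_cache


class FakeClient:
    """Placeholder for an API client that is expensive to construct."""

    instances = 0

    def __init__(self) -> None:
        FakeClient.instances += 1


@lru_cache(maxsize=1)
def get_client() -> FakeClient:
    """Build the client on first call and return the cached one afterwards."""
    return FakeClient()
```

Deferring construction to the first call means importing the module no longer requires API configuration to be present at import time.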
--- a/autogpt_platform/backend/backend/copilot/config.py
+++ b/autogpt_platform/backend/backend/copilot/config.py
@@ -94,6 +94,11 @@ class ChatConfig(BaseSettings):
        description="Use --resume for multi-turn conversations instead of "
        "history compression. Falls back to compression when unavailable.",
    )
+    use_openrouter: bool = Field(
+        default=True,
+        description="Route API calls through OpenRouter proxy. When False, the SDK "
+        "uses ANTHROPIC_API_KEY from the environment directly (no proxy hop).",
+    )
    use_claude_code_subscription: bool = Field(
        default=False,
        description="For personal/dev use: use Claude Code CLI subscription auth instead of API keys. Requires `claude login` on the host. Only works with SDK mode.",
@@ -209,6 +214,15 @@ class ChatConfig(BaseSettings):
        # Default to True (SDK enabled by default)
        return True if v is None else v

+    @field_validator("use_openrouter", mode="before")
+    @classmethod
+    def get_use_openrouter(cls, v):
+        """Get use_openrouter from environment if not provided."""
+        env_val = os.getenv("CHAT_USE_OPENROUTER", "").lower()
+        if env_val:
+            return env_val in ("true", "1", "yes", "on")
+        return True if v is None else v
+
    @field_validator("use_claude_code_subscription", mode="before")
    @classmethod
    def get_use_claude_code_subscription(cls, v):
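Review note: the precedence in `get_use_openrouter` is easy to misread, so here is the same logic as a plain function without Pydantic. `resolve_flag` is a hypothetical name used only for this sketch. As written in the diff, any non-empty `CHAT_USE_OPENROUTER` value overrides the value passed in, and any string outside the truthy set counts as `False`.

```python
import os
from typing import Optional

_TRUTHY = ("true", "1", "yes", "on")


def resolve_flag(env_name: str, value: Optional[bool], default: bool = True) -> bool:
    """Resolve a boolean flag: env var first, then the given value, then default."""
    env_val = os.getenv(env_name, "").lower()
    if env_val:
        # A non-empty env value always wins; unknown strings mean False.
        return env_val in _TRUTHY
    return default if value is None else value
```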
--- a/autogpt_platform/backend/backend/copilot/constants.py
+++ b/autogpt_platform/backend/backend/copilot/constants.py
@@ -4,6 +4,9 @@
 # The hex suffix makes accidental LLM generation of these strings virtually
 # impossible, avoiding false-positive marker detection in normal conversation.
 COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]"  # Renders as ErrorCard
+COPILOT_RETRYABLE_ERROR_PREFIX = (
+    "[__COPILOT_RETRYABLE_ERROR_a9c2__]"  # ErrorCard + retry
+)
 COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]"  # Renders as system info message

 # Prefix for all synthetic IDs generated by CoPilot block execution.
@@ -35,3 +38,24 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
    Format: "{node_id}:{random_hex}" → returns "{node_id}".
    """
    return node_exec_id.rsplit(COPILOT_NODE_EXEC_ID_SEPARATOR, 1)[0]
+
+
+# ---------------------------------------------------------------------------
+# Transient Anthropic API error detection
+# ---------------------------------------------------------------------------
+# Patterns in error text that indicate a transient Anthropic API error
+# (ECONNRESET / dropped TCP connection) which is retryable.
+_TRANSIENT_ERROR_PATTERNS = (
+    "socket connection was closed unexpectedly",
+    "ECONNRESET",
+    "connection was forcibly closed",
+    "network socket disconnected",
+)
+
+FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
+
+
+def is_transient_api_error(error_text: str) -> bool:
+    """Return True if *error_text* matches a known transient Anthropic API error."""
+    lower = error_text.lower()
+    return any(pat.lower() in lower for pat in _TRANSIENT_ERROR_PATTERNS)
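Review note: the new `is_transient_api_error` check is presumably what selects between the two error prefixes. The sketch below copies the constants and the matcher from this diff and adds a hypothetical `build_error_marker` helper to show the selection. The marker layout (`prefix`, a space, then the message) is an assumption, not the format used by the service.

```python
COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]"
COPILOT_RETRYABLE_ERROR_PREFIX = "[__COPILOT_RETRYABLE_ERROR_a9c2__]"

_TRANSIENT_ERROR_PATTERNS = (
    "socket connection was closed unexpectedly",
    "ECONNRESET",
    "connection was forcibly closed",
    "network socket disconnected",
)


def is_transient_api_error(error_text: str) -> bool:
    """Case-insensitive substring match against the known transient patterns."""
    lower = error_text.lower()
    return any(pat.lower() in lower for pat in _TRANSIENT_ERROR_PATTERNS)


def build_error_marker(error_text: str) -> str:
    """Hypothetical helper: choose the retryable prefix for transient errors."""
    prefix = (
        COPILOT_RETRYABLE_ERROR_PREFIX
        if is_transient_api_error(error_text)
        else COPILOT_ERROR_PREFIX
    )
    return f"{prefix} {error_text}"
```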
--- a/autogpt_platform/backend/backend/copilot/context.py
+++ b/autogpt_platform/backend/backend/copilot/context.py
@@ -11,6 +11,8 @@ from contextvars import ContextVar
 from typing import TYPE_CHECKING

 from backend.copilot.model import ChatSession
+from backend.data.db_accessors import workspace_db
+from backend.util.workspace import WorkspaceManager

 if TYPE_CHECKING:
    from e2b import AsyncSandbox
@@ -82,6 +84,17 @@ def resolve_sandbox_path(path: str) -> str:
    return normalized


+async def get_workspace_manager(user_id: str, session_id: str) -> WorkspaceManager:
+    """Create a session-scoped :class:`WorkspaceManager`.
+
+    Placed here (rather than in ``tools/workspace_files``) so that modules
+    like ``sdk/file_ref`` can import it without triggering the heavy
+    ``tools/__init__`` import chain.
+    """
+    workspace = await workspace_db().get_or_create_workspace(user_id)
+    return WorkspaceManager(user_id, workspace.id, session_id)
+
+
 def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
    """Return True if *path* is within an allowed host-filesystem location.

--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -52,11 +52,43 @@ Examples:
 You can embed a reference inside any string argument, or use it as the entire
 value.  Multiple references in one argument are all expanded.

-**Type coercion**: The platform automatically coerces expanded string values
-to match the block's expected input types.  For example, if a block expects
-`list[list[str]]` and you pass a string containing a JSON array (e.g. from
-an @@agptfile: expansion), the string will be parsed into the correct type.
+**Structured data**: When the **entire** argument value is a single file
+reference (no surrounding text), the platform automatically parses the file
+content based on its extension or MIME type.  Supported formats: JSON, JSONL,
+CSV, TSV, YAML, TOML, Parquet, and Excel (.xlsx — first sheet only).
+For example, pass `@@agptfile:workspace://<id>` where the file is a `.csv` and
+the rows will be parsed into `list[list[str]]` automatically.  If the format is
+unrecognised or parsing fails, the content is returned as a plain string.
+Legacy `.xls` files are **not** supported — only the modern `.xlsx` format.

+**Type coercion**: The platform also coerces expanded values to match the
+block's expected input types.  For example, if a block expects `list[list[str]]`
+and the expanded value is a JSON string, it will be parsed into the correct type.
+
+### Media file inputs (format: "file")
+Some block inputs accept media files — their schema shows `"format": "file"`.
+These fields accept:
+- **`workspace://<file_id>`** or **`workspace://<file_id>#<mime>`** — preferred
+  for large files (images, videos, PDFs). The platform passes the reference
+  directly to the block without reading the content into memory.
+- **`data:<mime>;base64,<payload>`** — inline base64 data URI, suitable for
+  small files only.
+
+When a block input has `format: "file"`, **pass the `workspace://` URI
+directly as the value** (do NOT wrap it in `@@agptfile:`). This avoids large
+payloads in tool arguments and preserves binary content (images, videos)
+that would be corrupted by text encoding.
+
+Example — committing an image file to GitHub:
+```json
+{
+  "files": [{
+    "path": "docs/hero.png",
+    "content": "workspace://abc123#image/png",
+    "operation": "upsert"
+  }]
+}
+```

 ### Sub-agent tasks
 - When using the Task tool, NEVER set `run_in_background` to true.
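Review note: the "Structured data" paragraph above describes parsing by file extension with a fallback to the raw string. A minimal sketch of that behaviour for the text formats only, using the standard library, could look like this. `parse_by_extension` is a hypothetical function; the real logic lives in `backend.util.file_content_parser`, which is not shown here and also covers YAML, TOML, Parquet, and Excel.

```python
import csv
import io
import json
from typing import Any


def parse_by_extension(filename: str, text: str) -> Any:
    """Parse text according to the file extension; return it unchanged on failure."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    try:
        if ext == "json":
            return json.loads(text)
        if ext == "jsonl":
            # One JSON document per non-empty line.
            return [json.loads(line) for line in text.splitlines() if line.strip()]
        if ext in ("csv", "tsv"):
            delimiter = "\t" if ext == "tsv" else ","
            return list(csv.reader(io.StringIO(text), delimiter=delimiter))
    except (ValueError, csv.Error):
        # Parsing failed: fall back to the plain string.
        return text
    # Unrecognised format: plain string.
    return text
```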
--- a/autogpt_platform/backend/backend/copilot/sdk/__init__.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/__init__.py
@@ -3,12 +3,45 @@
 This module provides the integration layer between the Claude Agent SDK
 and the existing CoPilot tool system, enabling drop-in replacement of
 the current LLM orchestration with the battle-tested Claude Agent SDK.
+
+Submodule imports are deferred via PEP 562 ``__getattr__`` to break a
+circular import cycle::
+
+    sdk/__init__ → tool_adapter → copilot.tools (TOOL_REGISTRY)
+    copilot.tools → run_block → sdk.file_ref  (no cycle here, but…)
+    sdk/__init__ → service → copilot.prompting → copilot.tools  (cycle!)
+
+``tool_adapter`` uses ``TOOL_REGISTRY`` at **module level** to build the
+static ``COPILOT_TOOL_NAMES`` list, so the import cannot be deferred to
+function scope without a larger refactor (moving tool-name registration
+to a separate lightweight module).  The lazy-import pattern here is the
+least invasive way to break the cycle while keeping module-level constants
+intact.
 """

-from .service import stream_chat_completion_sdk
-from .tool_adapter import create_copilot_mcp_server
+from typing import Any

 __all__ = [
    "stream_chat_completion_sdk",
    "create_copilot_mcp_server",
 ]
+
+# Dispatch table for PEP 562 lazy imports.  Each entry is a (module, attr)
+# pair so new exports can be added without touching __getattr__ itself.
+_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
+    "stream_chat_completion_sdk": (".service", "stream_chat_completion_sdk"),
+    "create_copilot_mcp_server": (".tool_adapter", "create_copilot_mcp_server"),
+}
+
+
+def __getattr__(name: str) -> Any:
+    entry = _LAZY_IMPORTS.get(name)
+    if entry is not None:
+        module_path, attr = entry
+        import importlib
+
+        module = importlib.import_module(module_path, package=__name__)
+        value = getattr(module, attr)
+        globals()[name] = value
+        return value
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
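Review note: the PEP 562 pattern above can be tried in isolation. The sketch below builds a throwaway module object with a module-level `__getattr__` backed by a dispatch table, so the target modules are imported only on first attribute access and the result is cached on the module. `make_lazy_module` and the `lazy_demo` table are hypothetical and exist only for this demonstration; standard-library modules stand in for `.service` and `.tool_adapter`.

```python
import importlib
import types
from typing import Any, Dict, Tuple

# name -> (module to import, attribute to fetch from it)
_LAZY_DEMO: Dict[str, Tuple[str, str]] = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def make_lazy_module(name: str, table: Dict[str, Tuple[str, str]]) -> types.ModuleType:
    """Create a module whose attributes are resolved lazily from `table`."""
    module = types.ModuleType(name)

    def __getattr__(attr: str) -> Any:
        entry = table.get(attr)
        if entry is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, target = entry
        value = getattr(importlib.import_module(module_path), target)
        # Cache on the module so __getattr__ is not consulted again.
        setattr(module, attr, value)
        return value

    # PEP 562: a function named __getattr__ in the module namespace is
    # called when normal attribute lookup fails.
    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", _LAZY_DEMO)
```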
--- a/autogpt_platform/backend/backend/copilot/sdk/file_ref.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/file_ref.py
@@ -41,12 +41,20 @@ from typing import Any
 from backend.copilot.context import (
    get_current_sandbox,
    get_sdk_cwd,
+    get_workspace_manager,
    is_allowed_local_path,
    resolve_sandbox_path,
 )
 from backend.copilot.model import ChatSession
-from backend.copilot.tools.workspace_files import get_manager
 from backend.util.file import parse_workspace_uri
+from backend.util.file_content_parser import (
+    BINARY_FORMATS,
+    MIME_TO_FORMAT,
+    PARSE_EXCEPTIONS,
+    infer_format_from_uri,
+    parse_file_content,
+)
+from backend.util.type import MediaFileType


 class FileRefExpansionError(Exception):
@@ -74,6 +82,8 @@ _FILE_REF_RE = re.compile(
 _MAX_EXPAND_CHARS = 200_000
 # Maximum total characters across all @@agptfile: expansions in one string.
 _MAX_TOTAL_EXPAND_CHARS = 1_000_000
+# Maximum raw byte size for bare ref structured parsing (10 MB).
+_MAX_BARE_REF_BYTES = 10_000_000


@dataclass
@@ -83,6 +93,11 @@ class FileRef:
    end_line: int | None  # 1-indexed, inclusive


+# ---------------------------------------------------------------------------
+# Public API  (top-down: main functions first, helpers below)
+# ---------------------------------------------------------------------------
+
+
 def parse_file_ref(text: str) -> FileRef | None:
    """Return a :class:`FileRef` if *text* is a bare file reference token.

@@ -104,17 +119,6 @@ def parse_file_ref(text: str) -> FileRef | None:
    return FileRef(uri=m.group(1), start_line=start, end_line=end)


-def _apply_line_range(text: str, start: int | None, end: int | None) -> str:
-    """Slice *text* to the requested 1-indexed line range (inclusive)."""
-    if start is None and end is None:
-        return text
-    lines = text.splitlines(keepends=True)
-    s = (start - 1) if start is not None else 0
-    e = end if end is not None else len(lines)
-    selected = list(itertools.islice(lines, s, e))
-    return "".join(selected)
-
-
 async def read_file_bytes(
    uri: str,
    user_id: str | None,
@@ -130,27 +134,47 @@ async def read_file_bytes(
    if plain.startswith("workspace://"):
        if not user_id:
            raise ValueError("workspace:// file references require authentication")
-        manager = await get_manager(user_id, session.session_id)
+        manager = await get_workspace_manager(user_id, session.session_id)
        ws = parse_workspace_uri(plain)
        try:
-            return await (
+            data = await (
                manager.read_file(ws.file_ref)
                if ws.is_path
                else manager.read_file_by_id(ws.file_ref)
            )
        except FileNotFoundError:
            raise ValueError(f"File not found: {plain}")
-        except Exception as exc:
+        except (PermissionError, OSError) as exc:
            raise ValueError(f"Failed to read {plain}: {exc}") from exc
+        except (AttributeError, TypeError, RuntimeError) as exc:
+            # AttributeError/TypeError: workspace manager returned an
+            # unexpected type or interface; RuntimeError: async runtime issues.
+            logger.warning("Unexpected error reading %s: %s", plain, exc)
+            raise ValueError(f"Failed to read {plain}: {exc}") from exc
+        # NOTE: Workspace API does not support pre-read size checks;
+        # the full file is loaded before the size guard below.
+        if len(data) > _MAX_BARE_REF_BYTES:
+            raise ValueError(
+                f"File too large ({len(data)} bytes, limit {_MAX_BARE_REF_BYTES})"
+            )
+        return data

    if is_allowed_local_path(plain, get_sdk_cwd()):
        resolved = os.path.realpath(os.path.expanduser(plain))
        try:
+            # Read with a one-byte overshoot to detect files that exceed the limit
+            # without a separate os.path.getsize call (avoids TOCTOU race).
            with open(resolved, "rb") as fh:
-                return fh.read()
+                data = fh.read(_MAX_BARE_REF_BYTES + 1)
+            if len(data) > _MAX_BARE_REF_BYTES:
+                raise ValueError(
+                    f"File too large (>{_MAX_BARE_REF_BYTES} bytes, "
+                    f"limit {_MAX_BARE_REF_BYTES})"
+                )
+            return data
        except FileNotFoundError:
            raise ValueError(f"File not found: {plain}")
-        except Exception as exc:
+        except OSError as exc:
            raise ValueError(f"Failed to read {plain}: {exc}") from exc

    sandbox = get_current_sandbox()
@@ -162,9 +186,33 @@ async def read_file_bytes(
                f"Path is not allowed (not in workspace, sdk_cwd, or sandbox): {plain}"
            ) from exc
        try:
-            return bytes(await sandbox.files.read(remote, format="bytes"))
-        except Exception as exc:
+            data = bytes(await sandbox.files.read(remote, format="bytes"))
+        except (FileNotFoundError, OSError, UnicodeDecodeError) as exc:
            raise ValueError(f"Failed to read from sandbox: {plain}: {exc}") from exc
+        except Exception as exc:
+            # E2B SDK raises SandboxException subclasses (NotFoundException,
+            # TimeoutException, NotEnoughSpaceException, etc.) which don't
+            # inherit from standard exceptions.  Import lazily to avoid a
+            # hard dependency on e2b at module level.
+            try:
+                from e2b.exceptions import SandboxException  # noqa: PLC0415
+
+                if isinstance(exc, SandboxException):
+                    raise ValueError(
+                        f"Failed to read from sandbox: {plain}: {exc}"
+                    ) from exc
+            except ImportError:
+                pass
+            # Re-raise unexpected exceptions (TypeError, AttributeError, etc.)
+            # so they surface as real bugs rather than being silently masked.
+            raise
+        # NOTE: E2B sandbox API does not support pre-read size checks;
+        # the full file is loaded before the size guard below.
+        if len(data) > _MAX_BARE_REF_BYTES:
+            raise ValueError(
+                f"File too large ({len(data)} bytes, limit {_MAX_BARE_REF_BYTES})"
+            )
+        return data

    raise ValueError(
        f"Path is not allowed (not in workspace, sdk_cwd, or sandbox): {plain}"
@@ -178,15 +226,13 @@ async def resolve_file_ref(
 ) -> str:
    """Resolve a :class:`FileRef` to its text content."""
    raw = await read_file_bytes(ref.uri, user_id, session)
-    return _apply_line_range(
-        raw.decode("utf-8", errors="replace"), ref.start_line, ref.end_line
-    )
+    return _apply_line_range(_to_str(raw), ref.start_line, ref.end_line)


 async def expand_file_refs_in_string(
    text: str,
    user_id: str | None,
-    session: "ChatSession",
+    session: ChatSession,
    *,
    raise_on_error: bool = False,
 ) -> str:
@@ -232,6 +278,9 @@ async def expand_file_refs_in_string(
            if len(content) > _MAX_EXPAND_CHARS:
                content = content[:_MAX_EXPAND_CHARS] + "\n... [truncated]"
            remaining = _MAX_TOTAL_EXPAND_CHARS - total_chars
+            # remaining == 0 means the budget was exactly exhausted by the
+            # previous ref.  The elif below (len > remaining) won't catch
+            # this since 0 > 0 is false, so we need the <= 0 check.
            if remaining <= 0:
                content = "[file-ref budget exhausted: total expansion limit reached]"
            elif len(content) > remaining:
@@ -252,13 +301,31 @@ async def expand_file_refs_in_string(
 async def expand_file_refs_in_args(
    args: dict[str, Any],
    user_id: str | None,
-    session: "ChatSession",
+    session: ChatSession,
+    *,
+    input_schema: dict[str, Any] | None = None,
 ) -> dict[str, Any]:
    """Recursively expand ``@@agptfile:...`` references in tool call arguments.

    String values are expanded in-place.  Nested dicts and lists are
    traversed.  Non-string scalars are returned unchanged.

+    **Bare references** (the entire argument value is a single
+    ``@@agptfile:...`` token with no surrounding text) are resolved and then
+    parsed according to the file's extension or MIME type.  See
+    :mod:`backend.util.file_content_parser` for the full list of supported
+    formats (JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel).
+
+    When *input_schema* is provided and the target property has
+    ``"type": "string"``, structured parsing is skipped — the raw file content
+    is returned as a plain string so blocks receive the original text.
+
+    If the format is unrecognised or a text format fails to parse, the content
+    is returned as a plain string; binary formats raise on parse failure.
+
+    **Embedded references** (``@@agptfile:`` mixed with other text) always
+    produce a plain string — structured parsing only applies to bare refs.
+
    Raises :class:`FileRefExpansionError` if any reference fails to resolve,
    so the tool is *not* executed with an error string as its input.  The
    caller (the MCP tool wrapper) should convert this into an MCP error
@@ -267,15 +334,382 @@ async def expand_file_refs_in_args(
    if not args:
        return args

-    async def _expand(value: Any) -> Any:
+    properties = (input_schema or {}).get("properties", {})
+
+    async def _expand(
+        value: Any,
+        *,
+        prop_schema: dict[str, Any] | None = None,
+    ) -> Any:
+        """Recursively expand a single argument value.
+
+        Strings are checked for ``@@agptfile:`` references and expanded
+        (bare refs get structured parsing; embedded refs get inline
+        substitution).  Dicts and lists are traversed recursively,
+        threading the corresponding sub-schema from *prop_schema* so
+        that nested fields also receive correct type-aware expansion.
+        Non-string scalars pass through unchanged.
+        """
        if isinstance(value, str):
+            ref = parse_file_ref(value)
+            if ref is not None:
+                # MediaFileType fields: return the raw URI immediately —
+                # no file reading, no format inference, no content parsing.
+                if _is_media_file_field(prop_schema):
+                    return ref.uri
+
+                fmt = infer_format_from_uri(ref.uri)
+                # Workspace URIs by ID (workspace://abc123) have no extension.
+                # When the MIME fragment is also missing, fall back to the
+                # workspace file manager's metadata for format detection.
+                if fmt is None and ref.uri.startswith("workspace://"):
+                    fmt = await _infer_format_from_workspace(ref.uri, user_id, session)
+                return await _expand_bare_ref(ref, fmt, user_id, session, prop_schema)
+
+            # Not a bare ref — do normal inline expansion.
            return await expand_file_refs_in_string(
                value, user_id, session, raise_on_error=True
            )
        if isinstance(value, dict):
-            return {k: await _expand(v) for k, v in value.items()}
+            # When the schema says this is an object but doesn't define
+            # inner properties, skip expansion — the caller (e.g.
+            # RunBlockTool) will expand with the actual nested schema.
+            if (
+                prop_schema is not None
+                and prop_schema.get("type") == "object"
+                and "properties" not in prop_schema
+            ):
+                return value
+            nested_props = (prop_schema or {}).get("properties", {})
+            return {
+                k: await _expand(v, prop_schema=nested_props.get(k))
+                for k, v in value.items()
+            }
        if isinstance(value, list):
-            return [await _expand(item) for item in value]
+            items_schema = (prop_schema or {}).get("items")
+            return [await _expand(item, prop_schema=items_schema) for item in value]
        return value

-    return {k: await _expand(v) for k, v in args.items()}
+    return {k: await _expand(v, prop_schema=properties.get(k)) for k, v in args.items()}
+
+
+# ---------------------------------------------------------------------------
+# Private helpers  (used by the public functions above)
+# ---------------------------------------------------------------------------
+
+
+def _apply_line_range(text: str, start: int | None, end: int | None) -> str:
+    """Slice *text* to the requested 1-indexed line range (inclusive).
+
+    When the requested range extends beyond the file, a note is appended
+    so the LLM knows it received the entire remaining content.
+    """
+    if start is None and end is None:
+        return text
+    lines = text.splitlines(keepends=True)
+    total = len(lines)
+    s = (start - 1) if start is not None else 0
+    e = end if end is not None else total
+    selected = list(itertools.islice(lines, s, e))
+    result = "".join(selected)
+    if end is not None and end > total:
+        result += f"\n[Note: file has only {total} lines]\n"
+    return result
+
+
+def _to_str(content: str | bytes) -> str:
+    """Decode *content* to a string if it is bytes, otherwise return as-is."""
+    if isinstance(content, str):
+        return content
+    return content.decode("utf-8", errors="replace")
+
+
+def _check_content_size(content: str | bytes) -> None:
+    """Raise :class:`ValueError` if *content* exceeds the byte limit.
+
+    Raises ``ValueError`` (not ``FileRefExpansionError``) so that the caller
+    (``_expand_bare_ref``) can unify all resolution errors into a single
+    ``except ValueError`` → ``FileRefExpansionError`` handler, keeping the
+    error-flow consistent with ``read_file_bytes`` and ``resolve_file_ref``.
+
+    For ``bytes``, the length is the byte count directly.  For ``str``,
+    we encode to UTF-8 first because multi-byte characters (e.g. emoji)
+    mean the byte size can be up to 4x the character count.
+    """
+    if isinstance(content, bytes):
+        size = len(content)
+    else:
+        char_len = len(content)
+        # Fast lower bound: UTF-8 byte count >= char count.
+        # If char count already exceeds the limit, reject immediately
+        # without allocating an encoded copy.
+        if char_len > _MAX_BARE_REF_BYTES:
+            size = char_len  # real byte size is at least this large
+        # Fast upper bound: each char is at most 4 UTF-8 bytes.
+        # If worst-case is still under the limit, skip encoding entirely.
+        elif char_len * 4 <= _MAX_BARE_REF_BYTES:
+            return
+        else:
+            # Edge case: char count is under limit but multibyte chars
+            # might push byte count over. Encode to get exact size.
+            size = len(content.encode("utf-8"))
+    if size > _MAX_BARE_REF_BYTES:
+        raise ValueError(
+            f"File too large for structured parsing "
+            f"({size} bytes, limit {_MAX_BARE_REF_BYTES})"
+        )
+
+
+async def _infer_format_from_workspace(
+    uri: str,
+    user_id: str | None,
+    session: ChatSession,
+) -> str | None:
+    """Look up workspace file metadata to infer the format.
+
+    Workspace URIs by ID (``workspace://abc123``) have no file extension.
+    When the MIME fragment is also absent, we query the workspace file
+    manager for the file's stored MIME type and original filename.
+    """
+    if not user_id:
+        return None
+    try:
+        ws = parse_workspace_uri(uri)
+        manager = await get_workspace_manager(user_id, session.session_id)
+        info = await (
+            manager.get_file_info(ws.file_ref)
+            if not ws.is_path
+            else manager.get_file_info_by_path(ws.file_ref)
+        )
+        if info is None:
+            return None
+        # Try MIME type first, then filename extension.
+        mime = (info.mime_type or "").split(";", 1)[0].strip().lower()
+        return MIME_TO_FORMAT.get(mime) or infer_format_from_uri(info.name)
+    except (
+        ValueError,
+        FileNotFoundError,
+        OSError,
+        PermissionError,
+        AttributeError,
+        TypeError,
+    ):
+        # Expected failures: bad URI, missing file, permission denied, or
+        # workspace manager returning unexpected types.  Propagate anything
+        # else (e.g. programming errors) so they don't get silently swallowed.
+        logger.debug("workspace metadata lookup failed for %s", uri, exc_info=True)
+        return None
+
+
+def _is_media_file_field(prop_schema: dict[str, Any] | None) -> bool:
+    """Return True if *prop_schema* describes a MediaFileType field (format: file)."""
+    if prop_schema is None:
+        return False
+    return (
+        prop_schema.get("type") == "string"
+        and prop_schema.get("format") == MediaFileType.string_format
+    )
+
+
+async def _expand_bare_ref(
+    ref: FileRef,
+    fmt: str | None,
+    user_id: str | None,
+    session: ChatSession,
+    prop_schema: dict[str, Any] | None,
+) -> Any:
+    """Resolve and parse a bare ``@@agptfile:`` reference.
+
+    This is the structured-parsing path: the file is read, optionally parsed
+    according to *fmt*, and adapted to the target *prop_schema*.
+
+    Raises :class:`FileRefExpansionError` on resolution or parsing failure.
+
+    Note: MediaFileType fields (format: "file") are handled earlier in
+    ``_expand`` to avoid unnecessary format inference and file I/O.
+    """
+    try:
+        if fmt is not None and fmt in BINARY_FORMATS:
+            # Binary formats need raw bytes, not UTF-8 text.
+            # Line ranges are meaningless for binary formats (parquet/xlsx),
+            # so ignore them and parse full bytes.  Log a warning so the
+            # dropped range is visible in the logs.
+            if ref.start_line is not None or ref.end_line is not None:
+                logger.warning(
+                    "Line range [%s-%s] ignored for binary format %s (%s); "
+                    "binary formats are always parsed in full.",
+                    ref.start_line,
+                    ref.end_line,
+                    fmt,
+                    ref.uri,
+                )
+            content: str | bytes = await read_file_bytes(ref.uri, user_id, session)
+        else:
+            content = await resolve_file_ref(ref, user_id, session)
+    except ValueError as exc:
+        raise FileRefExpansionError(str(exc)) from exc
+
+    # For known formats this rejects files >10 MB before parsing.
+    # For unknown formats _MAX_EXPAND_CHARS (200K chars) below is stricter,
+    # but this check still guards the parsing path which has no char limit.
+    # _check_content_size raises ValueError, which we unify here just like
+    # resolution errors above.
+    try:
+        _check_content_size(content)
+    except ValueError as exc:
+        raise FileRefExpansionError(str(exc)) from exc
+
+    # When the schema declares this parameter as "string",
+    # return raw file content — don't parse into a structured
+    # type that would need json.dumps() serialisation.
+    expect_string = (prop_schema or {}).get("type") == "string"
+    if expect_string:
+        if isinstance(content, bytes):
+            raise FileRefExpansionError(
+                f"Cannot use {fmt} file as text input: "
+                f"binary formats (parquet, xlsx) must be passed "
+                f"to a block that accepts structured data (list/object), "
+                f"not a string-typed parameter."
+            )
+        return content
+
+    if fmt is not None:
+        # Use strict mode for binary formats so we surface the
+        # actual error (e.g. missing pyarrow/openpyxl, corrupt
+        # file) instead of silently returning garbled bytes.
+        strict = fmt in BINARY_FORMATS
+        try:
+            parsed = parse_file_content(content, fmt, strict=strict)
+        except PARSE_EXCEPTIONS as exc:
+            raise FileRefExpansionError(f"Failed to parse {fmt} file: {exc}") from exc
+        # Normalize bytes fallback to str so tools never
+        # receive raw bytes when parsing fails.
+        if isinstance(parsed, bytes):
+            parsed = _to_str(parsed)
+        return _adapt_to_schema(parsed, prop_schema)
+
+    # Unknown format — return as plain string, but apply
+    # the same per-ref character limit used by inline refs
+    # to prevent injecting unexpectedly large content.
+    text = _to_str(content)
+    if len(text) > _MAX_EXPAND_CHARS:
+        text = text[:_MAX_EXPAND_CHARS] + "\n... [truncated]"
+    return text
+
+
+def _adapt_to_schema(parsed: Any, prop_schema: dict[str, Any] | None) -> Any:
+    """Adapt a parsed file value to better fit the target schema type.
+
+    When the parser returns a natural type (e.g. dict from YAML, list from CSV)
+    that doesn't match the block's expected type, this function converts it to
+    a more useful representation instead of relying on pydantic's generic
+    coercion (which can produce awkward results like flattened dicts → lists).
+
+    Returns *parsed* unchanged when no adaptation is needed.
+    """
+    if prop_schema is None:
+        return parsed
+
+    target_type = prop_schema.get("type")
+
+    # Dict → array: delegate to helper.
+    if isinstance(parsed, dict) and target_type == "array":
+        return _adapt_dict_to_array(parsed, prop_schema)
+
+    # List → object: delegate to helper (raises for non-tabular lists).
+    if isinstance(parsed, list) and target_type == "object":
+        return _adapt_list_to_object(parsed)
+
+    # Tabular list → Any (no type): convert to list of dicts.
+    # Blocks like FindInDictionaryBlock have `input: Any` which produces
+    # a schema with no "type" key.  Tabular [[header],[rows]] is unusable
+    # for key lookup, but [{col: val}, ...] works with FindInDict's
+    # list-of-dicts branch (lines 195-199 in data_manipulation.py).
+    if isinstance(parsed, list) and target_type is None and _is_tabular(parsed):
+        return _tabular_to_list_of_dicts(parsed)
+
+    return parsed
+
+
+def _adapt_dict_to_array(parsed: dict, prop_schema: dict[str, Any]) -> Any:
+    """Adapt a parsed dict to an array-typed field.
+
+    Extracts list-valued entries when the target item type is ``array``,
+    passes through unchanged when item type is ``string`` (lets pydantic error),
+    or wraps in ``[parsed]`` as a fallback.
+    """
+    items_type = (prop_schema.get("items") or {}).get("type")
+    if items_type == "array":
+        # Target is List[List[Any]] — extract list-typed values from the
+        # dict as inner lists.  E.g. YAML {"fruits": [{...},...]} with
+        # ConcatenateLists (List[List[Any]]) → [[{...},...]].
+        list_values = [v for v in parsed.values() if isinstance(v, list)]
+        if list_values:
+            return list_values
+    if items_type == "string":
+        # Target is List[str] — wrapping a dict would give [dict]
+        # which can't coerce to strings.  Return unchanged and let
+        # pydantic surface a clear validation error.
+        return parsed
+    # Fallback: wrap in a single-element list so the block gets [dict]
+    # instead of pydantic flattening keys/values into a flat list.
+    return [parsed]
+
+
+def _adapt_list_to_object(parsed: list) -> Any:
+    """Adapt a parsed list to an object-typed field.
+
+    Converts tabular lists to column-dicts; raises for non-tabular lists.
+    """
+    if _is_tabular(parsed):
+        return _tabular_to_column_dict(parsed)
+    # Non-tabular list (e.g. a plain Python list from a YAML file) cannot
+    # be meaningfully coerced to an object.  Raise explicitly so callers
+    # get a clear error rather than pydantic silently wrapping the list.
+    raise FileRefExpansionError(
+        "Cannot adapt a non-tabular list to an object-typed field. "
+        "Expected a tabular structure ([[header], [row1], ...]) or a dict."
+    )
+
+
+def _is_tabular(parsed: Any) -> bool:
+    """Check if parsed data is in tabular format: [[header], [row1], ...].
+
+    Uses isinstance checks because this is a structural type guard on
+    opaque parser output (Any), not duck typing.  A Protocol wouldn't
+    help here — we need to verify exact list-of-lists shape.
+    """
+    if not isinstance(parsed, list) or len(parsed) < 2:
+        return False
+    header = parsed[0]
+    if not isinstance(header, list) or not header:
+        return False
+    if not all(isinstance(h, str) for h in header):
+        return False
+    return all(isinstance(row, list) for row in parsed[1:])
+
+
+def _tabular_to_list_of_dicts(parsed: list) -> list[dict[str, Any]]:
+    """Convert [[header], [row1], ...] → [{header[0]: row[0], ...}, ...].
+
+    Ragged rows (fewer columns than the header) get None for missing values.
+    Extra values beyond the header length are silently dropped.
+    """
+    header = parsed[0]
+    return [
+        dict(itertools.zip_longest(header, row[: len(header)], fillvalue=None))
+        for row in parsed[1:]
+    ]
+
+
+def _tabular_to_column_dict(parsed: list) -> dict[str, list]:
+    """Convert [[header], [row1], ...] → {"col1": [val1, ...], ...}.
+
+    Ragged rows (fewer columns than the header) get None for missing values,
+    ensuring all columns have equal length.
+    """
+    header = parsed[0]
+    return {
+        col: [row[i] if i < len(row) else None for row in parsed[1:]]
+        for i, col in enumerate(header)
+    }
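The two tabular conversions defined above differ mainly in how they treat ragged rows. The sketch below restates them as standalone functions (the unprefixed names `is_tabular`, `to_list_of_dicts` and `to_column_dict` and the sample table are illustrative, not part of the diff) so the behaviour can be checked without importing the backend.

```python
import itertools
from typing import Any


def is_tabular(parsed: Any) -> bool:
    """True for [[header], [row1], ...] with a non-empty all-string header."""
    if not isinstance(parsed, list) or len(parsed) < 2:
        return False
    header = parsed[0]
    if not isinstance(header, list) or not header:
        return False
    if not all(isinstance(h, str) for h in header):
        return False
    return all(isinstance(row, list) for row in parsed[1:])


def to_list_of_dicts(table: list) -> list[dict[str, Any]]:
    """One dict per data row; short rows are padded with None, extras dropped."""
    header = table[0]
    return [
        dict(itertools.zip_longest(header, row[: len(header)], fillvalue=None))
        for row in table[1:]
    ]


def to_column_dict(table: list) -> dict[str, list]:
    """One list per column; short rows contribute None to missing columns."""
    header = table[0]
    return {
        col: [row[i] if i < len(row) else None for row in table[1:]]
        for i, col in enumerate(header)
    }


# Illustrative table with one ragged row ("Bob" has no score).
table = [["Name", "Score"], ["Alice", "90"], ["Bob"]]
rows = to_list_of_dicts(table)
columns = to_column_dict(table)
```

With this input, `rows` holds one dict per person and `columns` holds one equal-length list per header, with `None` filling the missing score in both shapes.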
--- a/autogpt_platform/backend/backend/copilot/sdk/file_ref_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/file_ref_integration_test.py
@@ -175,6 +175,199 @@ async def test_expand_args_replaces_file_ref_in_nested_dict():
        assert result["count"] == 42


+# ---------------------------------------------------------------------------
+# expand_file_refs_in_args — bare ref structured parsing
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_json_returns_parsed_dict():
+    """Bare ref to a .json file returns parsed dict, not raw string."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        json_file = os.path.join(sdk_cwd, "data.json")
+        with open(json_file, "w") as f:
+            f.write('{"key": "value", "count": 42}')
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"data": f"@@agptfile:{json_file}"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert result["data"] == {"key": "value", "count": 42}
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_csv_returns_parsed_table():
+    """Bare ref to a .csv file returns list[list[str]] table."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        csv_file = os.path.join(sdk_cwd, "data.csv")
+        with open(csv_file, "w") as f:
+            f.write("Name,Score\nAlice,90\nBob,85")
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"input": f"@@agptfile:{csv_file}"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert result["input"] == [
+            ["Name", "Score"],
+            ["Alice", "90"],
+            ["Bob", "85"],
+        ]
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_unknown_extension_returns_string():
+    """Bare ref to a file with unknown extension returns plain string."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        txt_file = os.path.join(sdk_cwd, "readme.txt")
+        with open(txt_file, "w") as f:
+            f.write("plain text content")
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"data": f"@@agptfile:{txt_file}"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert result["data"] == "plain text content"
+        assert isinstance(result["data"], str)
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_invalid_json_falls_back_to_string():
+    """Bare ref to a .json file with invalid JSON falls back to string."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        json_file = os.path.join(sdk_cwd, "bad.json")
+        with open(json_file, "w") as f:
+            f.write("not valid json {{{")
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"data": f"@@agptfile:{json_file}"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert result["data"] == "not valid json {{{"
+        assert isinstance(result["data"], str)
+
+
+@pytest.mark.asyncio
+async def test_embedded_ref_always_returns_string_even_for_json():
+    """Embedded ref (text around it) returns plain string, not parsed JSON."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        json_file = os.path.join(sdk_cwd, "data.json")
+        with open(json_file, "w") as f:
+            f.write('{"key": "value"}')
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"data": f"prefix @@agptfile:{json_file} suffix"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert isinstance(result["data"], str)
+        assert result["data"].startswith("prefix ")
+        assert result["data"].endswith(" suffix")
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_yaml_returns_parsed_dict():
+    """Bare ref to a .yaml file returns parsed dict."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        yaml_file = os.path.join(sdk_cwd, "config.yaml")
+        with open(yaml_file, "w") as f:
+            f.write("name: test\ncount: 42\n")
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"config": f"@@agptfile:{yaml_file}"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert result["config"] == {"name": "test", "count": 42}
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_binary_with_line_range_ignores_range():
+    """Bare ref to a binary file (.parquet) with line range parses the full file.
+
+    Binary formats (parquet, xlsx) ignore line ranges — the full content is
+    parsed and the range is silently dropped with a log warning.
+    """
+    try:
+        import pandas as pd
+    except ImportError:
+        pytest.skip("pandas not installed")
+    try:
+        import pyarrow  # noqa: F401  # pyright: ignore[reportMissingImports]
+    except ImportError:
+        pytest.skip("pyarrow not installed")
+
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        parquet_file = os.path.join(sdk_cwd, "data.parquet")
+        import io as _io
+
+        df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+        buf = _io.BytesIO()
+        df.to_parquet(buf, index=False)
+        with open(parquet_file, "wb") as f:
+            f.write(buf.getvalue())
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            # Line range [1-2] should be silently ignored for binary formats.
+            result = await expand_file_refs_in_args(
+                {"data": f"@@agptfile:{parquet_file}[1-2]"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        # Full file is returned despite the line range.
+        assert result["data"] == [["A", "B"], [1, 4], [2, 5], [3, 6]]
+
+
+@pytest.mark.asyncio
+async def test_bare_ref_toml_returns_parsed_dict():
+    """Bare ref to a .toml file returns parsed dict."""
+    with tempfile.TemporaryDirectory() as sdk_cwd:
+        toml_file = os.path.join(sdk_cwd, "config.toml")
+        with open(toml_file, "w") as f:
+            f.write('name = "test"\ncount = 42\n')
+
+        with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
+            mock_cwd_var.get.return_value = sdk_cwd
+
+            result = await expand_file_refs_in_args(
+                {"config": f"@@agptfile:{toml_file}"},
+                user_id="u1",
+                session=_make_session(),
+            )
+
+        assert result["config"] == {"name": "test", "count": 42}
+
+
 # ---------------------------------------------------------------------------
 # _read_file_handler — extended to accept workspace:// and local paths
 # ---------------------------------------------------------------------------
@@ -219,7 +412,7 @@ async def test_read_file_handler_workspace_uri():
        "backend.copilot.sdk.tool_adapter.get_execution_context",
        return_value=("user-1", mock_session),
    ), patch(
-        "backend.copilot.sdk.file_ref.get_manager",
+        "backend.copilot.sdk.file_ref.get_workspace_manager",
        new=AsyncMock(return_value=mock_manager),
    ):
        result = await _read_file_handler(
@@ -276,7 +469,7 @@ async def test_read_file_bytes_workspace_virtual_path():
    mock_manager.read_file.return_value = b"virtual path content"

    with patch(
-        "backend.copilot.sdk.file_ref.get_manager",
+        "backend.copilot.sdk.file_ref.get_workspace_manager",
        new=AsyncMock(return_value=mock_manager),
    ):
        result = await read_file_bytes("workspace:///reports/q1.md", "user-1", session)
--- a/autogpt_platform/backend/backend/copilot/sdk/file_ref_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/file_ref_test.py
--- a/autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
@@ -20,6 +20,7 @@ from claude_agent_sdk import (
    UserMessage,
 )

+from backend.copilot.constants import FRIENDLY_TRANSIENT_MSG, is_transient_api_error
 from backend.copilot.response_model import (
    StreamBaseResponse,
    StreamError,
@@ -214,10 +215,12 @@ class SDKResponseAdapter:
            if sdk_message.subtype == "success":
                responses.append(StreamFinish())
            elif sdk_message.subtype in ("error", "error_during_execution"):
-                error_msg = sdk_message.result or "Unknown error"
-                responses.append(
-                    StreamError(errorText=str(error_msg), code="sdk_error")
-                )
+                raw_error = str(sdk_message.result or "Unknown error")
+                if is_transient_api_error(raw_error):
+                    error_text, code = FRIENDLY_TRANSIENT_MSG, "transient_api_error"
+                else:
+                    error_text, code = raw_error, "sdk_error"
+                responses.append(StreamError(errorText=error_text, code=code))
                responses.append(StreamFinish())
            else:
                logger.warning(
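Note on the response_adapter.py hunk above: a minimal sketch of the error-classification step, assuming `is_transient_api_error` does a case-insensitive substring match. The patterns, the message text, and the `StreamError` dataclass below are hypothetical stand-ins; the real definitions live in `backend.copilot.constants` and `backend.copilot.response_model` and are not part of this excerpt.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values: the real constant and patterns are not shown in the diff.
FRIENDLY_TRANSIENT_MSG = "Temporary connection problem. Please try again."
_TRANSIENT_PATTERNS = ("socket connection was closed", "econnreset")


def is_transient_api_error(text: str) -> bool:
    """Assumed behaviour: case-insensitive substring match on known patterns."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in _TRANSIENT_PATTERNS)


@dataclass
class StreamError:
    """Stand-in for the response model used by the adapter."""

    errorText: str
    code: str


def to_stream_error(result: Optional[str]) -> StreamError:
    """Map a raw SDK error result to the error sent to the frontend."""
    raw_error = str(result or "Unknown error")
    if is_transient_api_error(raw_error):
        # Hide the raw text and use a code the caller can treat as retryable.
        return StreamError(FRIENDLY_TRANSIENT_MSG, "transient_api_error")
    return StreamError(raw_error, "sdk_error")
```

The same (text, code) pair is what the service layer later uses to decide between the retryable and non-retryable marker prefix.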
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -29,6 +29,7 @@ from langfuse import propagate_attributes
 from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
 from pydantic import BaseModel

+from backend.copilot.context import get_workspace_manager
 from backend.data.redis_client import get_redis_async
 from backend.executor.cluster_lock import AsyncClusterLock
 from backend.util.exceptions import NotFoundError
@@ -36,7 +37,13 @@ from backend.util.prompt import compress_context
 from backend.util.settings import Settings

 from ..config import ChatConfig
-from ..constants import COPILOT_ERROR_PREFIX, COPILOT_SYSTEM_PREFIX
+from ..constants import (
+    COPILOT_ERROR_PREFIX,
+    COPILOT_RETRYABLE_ERROR_PREFIX,
+    COPILOT_SYSTEM_PREFIX,
+    FRIENDLY_TRANSIENT_MSG,
+    is_transient_api_error,
+)
 from ..model import (
    ChatMessage,
    ChatSession,
@@ -62,7 +69,6 @@ from ..service import (
 )
 from ..tools.e2b_sandbox import get_or_create_sandbox, pause_sandbox_direct
 from ..tools.sandbox import WORKSPACE_PREFIX, make_session_path
-from ..tools.workspace_files import get_manager
 from ..tracking import track_user_message
 from .compaction import CompactionTracker, filter_compaction_messages
 from .response_adapter import SDKResponseAdapter
@@ -88,6 +94,28 @@ logger = logging.getLogger(__name__)
 config = ChatConfig()


+def _append_error_marker(
+    session: ChatSession | None,
+    display_msg: str,
+    *,
+    retryable: bool = False,
+) -> None:
+    """Append a copilot error marker to *session* so it persists across refresh.
+
+    Args:
+        session: The chat session to append to (no-op if ``None``).
+        display_msg: User-visible error text.
+        retryable: If ``True``, use the retryable prefix so the frontend
+            shows a "Try Again" button.
+    """
+    if session is None:
+        return
+    prefix = COPILOT_RETRYABLE_ERROR_PREFIX if retryable else COPILOT_ERROR_PREFIX
+    session.messages.append(
+        ChatMessage(role="assistant", content=f"{prefix} {display_msg}")
+    )
+
+
 def _setup_langfuse_otel() -> None:
    """Configure OTEL tracing for the Claude Agent SDK → Langfuse.

@@ -207,61 +235,57 @@ def _build_sdk_env(
    session_id: str | None = None,
    user_id: str | None = None,
 ) -> dict[str, str]:
-    """Build env vars for the SDK CLI process.
+    """Build env vars for the SDK CLI subprocess.

-    Routes API calls through OpenRouter (or a custom base_url) using
-    the same ``config.api_key`` / ``config.base_url`` as the non-SDK path.
-    This gives per-call token and cost tracking on the OpenRouter dashboard.
-
-    When *session_id* is provided, an ``x-session-id`` custom header is
-    injected via ``ANTHROPIC_CUSTOM_HEADERS`` so that OpenRouter Broadcast
-    forwards traces (including cost/usage) to Langfuse for the
-    ``/api/v1/messages`` endpoint.
-
-    Only overrides ``ANTHROPIC_API_KEY`` when a valid proxy URL and auth
-    token are both present — otherwise returns an empty dict so the SDK
-    falls back to its default credentials.
+    Three modes (checked in order):
+    1. **Subscription** — clears all keys; CLI uses ``claude login`` auth.
+    2. **Direct Anthropic** — returns ``{}``; subprocess inherits
+       ``ANTHROPIC_API_KEY`` from the parent environment.
+    3. **OpenRouter** (default) — overrides base URL and auth token to
+       route through the proxy, with Langfuse trace headers.
    """
-    env: dict[str, str] = {}
-
+    # --- Mode 1: Claude Code subscription auth ---
    if config.use_claude_code_subscription:
-        # Claude Code subscription: let the CLI use its own logged-in auth.
-        # Explicitly clear API key env vars so the subprocess doesn't pick
-        # them up from the parent process and bypass subscription auth.
        _validate_claude_code_subscription()
-        env["ANTHROPIC_API_KEY"] = ""
-        env["ANTHROPIC_AUTH_TOKEN"] = ""
-        env["ANTHROPIC_BASE_URL"] = ""
-    elif config.api_key and config.base_url:
-        # Strip /v1 suffix — SDK expects the base URL without a version path
-        base = config.base_url.rstrip("/")
-        if base.endswith("/v1"):
-            base = base[:-3]
-        if not base or not base.startswith("http"):
-            # Invalid base_url — don't override SDK defaults
-            return env
-        env["ANTHROPIC_BASE_URL"] = base
-        env["ANTHROPIC_AUTH_TOKEN"] = config.api_key
-        # Must be explicitly empty so the CLI uses AUTH_TOKEN instead
-        env["ANTHROPIC_API_KEY"] = ""
+        return {
+            "ANTHROPIC_API_KEY": "",
+            "ANTHROPIC_AUTH_TOKEN": "",
+            "ANTHROPIC_BASE_URL": "",
+        }
+
+    # --- Mode 2: Direct Anthropic (no proxy hop) ---
+    # Also the fallback when OpenRouter is enabled but credentials are missing.
+    # Strip /v1 suffix — SDK expects the base URL without a version path.
+    base = (config.base_url or "").rstrip("/")
+    if base.endswith("/v1"):
+        base = base[:-3]
+    if (
+        not config.use_openrouter
+        or not config.api_key
+        or not base
+        or not base.startswith("http")
+    ):
+        return {}
+
+    # --- Mode 3: OpenRouter proxy ---
+    env: dict[str, str] = {
+        "ANTHROPIC_BASE_URL": base,
+        "ANTHROPIC_AUTH_TOKEN": config.api_key,
+        "ANTHROPIC_API_KEY": "",  # force CLI to use AUTH_TOKEN
+    }

    # Inject broadcast headers so OpenRouter forwards traces to Langfuse.
-    # The ``x-session-id`` header is *required* for the Anthropic-native
-    # ``/messages`` endpoint — without it broadcast silently drops the
-    # trace even when org-level Langfuse integration is configured.
-    def _safe(value: str) -> str:
-        """Strip CR/LF to prevent header injection, then truncate."""
-        return value.replace("\r", "").replace("\n", "").strip()[:128]
+    def _safe(v: str) -> str:
+        """Sanitise a header value: drop CR/LF, trim whitespace, cap at 128 chars."""
+        return v.replace("\r", "").replace("\n", "").strip()[:128]

-    headers: list[str] = []
+    parts = []
    if session_id:
-        headers.append(f"x-session-id: {_safe(session_id)}")
+        parts.append(f"x-session-id: {_safe(session_id)}")
    if user_id:
-        headers.append(f"x-user-id: {_safe(user_id)}")
-    # Only inject headers when routing through OpenRouter/proxy — they're
-    # meaningless (and leak internal IDs) when using subscription mode.
-    if headers and env.get("ANTHROPIC_BASE_URL"):
-        env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(headers)
+        parts.append(f"x-user-id: {_safe(user_id)}")
+    if parts:
+        env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)

    return env

@@ -565,7 +589,7 @@ async def _prepare_file_attachments(
        return empty

    try:
-        manager = await get_manager(user_id, session_id)
+        manager = await get_workspace_manager(user_id, session_id)
    except Exception:
        logger.warning(
            "Failed to create workspace manager for file attachments",
@@ -653,13 +677,17 @@ async def stream_chat_completion_sdk(
    # Type narrowing: session is guaranteed ChatSession after the check above
    session = cast(ChatSession, session)

-    # Clean up stale error markers from previous turn before starting new turn
-    # If the last message contains an error marker, remove it (user is retrying)
-    if (
+    # Clean up ALL trailing error markers from previous turn before starting
+    # a new turn.  Multiple markers can accumulate when a mid-stream error is
+    # followed by a cleanup error in __aexit__ (both append a marker).
+    while (
        len(session.messages) > 0
        and session.messages[-1].role == "assistant"
        and session.messages[-1].content
-        and COPILOT_ERROR_PREFIX in session.messages[-1].content
+        and (
+            COPILOT_ERROR_PREFIX in session.messages[-1].content
+            or COPILOT_RETRYABLE_ERROR_PREFIX in session.messages[-1].content
+        )
    ):
        logger.info(
            "[SDK] [%s] Removing stale error marker from previous turn",
@@ -797,7 +825,7 @@ async def stream_chat_completion_sdk(
                )
            except Exception as transcript_err:
                logger.warning(
-                    "%s Transcript download failed, continuing without " "--resume: %s",
+                    "%s Transcript download failed, continuing without --resume: %s",
                    log_prefix,
                    transcript_err,
                )
@@ -820,7 +848,7 @@ async def stream_chat_completion_sdk(
            is_valid = validate_transcript(dl.content)
            dl_lines = dl.content.strip().split("\n") if dl.content else []
            logger.info(
-                "%s Downloaded transcript: %dB, %d lines, " "msg_count=%d, valid=%s",
+                "%s Downloaded transcript: %dB, %d lines, msg_count=%d, valid=%s",
                log_prefix,
                len(dl.content),
                len(dl_lines),
@@ -1039,23 +1067,36 @@ async def stream_chat_completion_sdk(
                        # Exception in receive_response() — capture it
                        # so the session can still be saved and the
                        # frontend gets a clean finish.
-                        logger.error(
+                        if is_transient_api_error(str(stream_err)):
+                            log, display, code = (
+                                logger.warning,
+                                FRIENDLY_TRANSIENT_MSG,
+                                "transient_api_error",
+                            )
+                        else:
+                            log, display, code = (
+                                logger.error,
+                                f"SDK stream error: {stream_err}",
+                                "sdk_stream_error",
+                            )
+
+                        log(
                            "%s Stream error from SDK: %s",
                            log_prefix,
                            stream_err,
                            exc_info=True,
                        )
                        ended_with_stream_error = True
-
-                        yield StreamError(
-                            errorText=f"SDK stream error: {stream_err}",
-                            code="sdk_stream_error",
+                        _append_error_marker(
+                            session,
+                            display,
+                            retryable=(code == "transient_api_error"),
                        )
+                        yield StreamError(errorText=display, code=code)
                        break

                    logger.info(
-                        "%s Received: %s %s "
-                        "(unresolved=%d, current=%d, resolved=%d)",
+                        "%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
                        log_prefix,
                        type(sdk_msg).__name__,
                        getattr(sdk_msg, "subtype", ""),
@@ -1069,15 +1110,42 @@ async def stream_chat_completion_sdk(
                    # so we can debug Anthropic API 400s surfaced by the CLI.
                    sdk_error = getattr(sdk_msg, "error", None)
                    if isinstance(sdk_msg, AssistantMessage) and sdk_error:
+                        error_text = str(sdk_error)
+                        error_preview = str(sdk_msg.content)[:500]
                        logger.error(
                            "[SDK] [%s] AssistantMessage has error=%s, "
                            "content_blocks=%d, content_preview=%s",
                            session_id[:12],
                            sdk_error,
                            len(sdk_msg.content),
-                            str(sdk_msg.content)[:500],
+                            error_preview,
                        )

+                        # Intercept transient API errors (socket closed,
+                        # ECONNRESET) — replace the raw message with a
+                        # user-friendly error text and use the retryable
+                        # error prefix so the frontend shows a retry button.
+                        # Check both the error field and content for patterns.
+                        if is_transient_api_error(error_text) or is_transient_api_error(
+                            error_preview
+                        ):
+                            logger.warning(
+                                "%s Transient Anthropic API error detected, "
+                                "suppressing raw error text",
+                                log_prefix,
+                            )
+                            ended_with_stream_error = True
+                            _append_error_marker(
+                                session,
+                                FRIENDLY_TRANSIENT_MSG,
+                                retryable=True,
+                            )
+                            yield StreamError(
+                                errorText=FRIENDLY_TRANSIENT_MSG,
+                                code="transient_api_error",
+                            )
+                            break
+
                    # Race-condition fix: SDK hooks (PostToolUse) are
                    # executed asynchronously via start_soon() — the next
                    # message can arrive before the hook stashes output.
@@ -1176,7 +1244,7 @@ async def stream_chat_completion_sdk(
                                extra,
                            )

-                        # Log errors being sent to frontend
+                        # Persist error markers so they survive page refresh
                        if isinstance(response, StreamError):
                            logger.error(
                                "%s Sending error to frontend: %s (code=%s)",
@@ -1184,6 +1252,12 @@ async def stream_chat_completion_sdk(
                                response.errorText,
                                response.code,
                            )
+                            _append_error_marker(
+                                session,
+                                response.errorText,
+                                retryable=(response.code == "transient_api_error"),
+                            )
+                            ended_with_stream_error = True

                        yield response

@@ -1378,14 +1452,18 @@ async def stream_chat_completion_sdk(
            else:
                logger.error("%s Error: %s", log_prefix, error_msg, exc_info=True)

-        # Append error marker to session (non-invasive text parsing approach)
-        # The finally block will persist the session with this error marker
-        if session:
-            session.messages.append(
-                ChatMessage(
-                    role="assistant", content=f"{COPILOT_ERROR_PREFIX} {error_msg}"
-                )
-            )
+        is_transient = is_transient_api_error(error_msg)
+        if is_transient:
+            display_msg, code = FRIENDLY_TRANSIENT_MSG, "transient_api_error"
+        else:
+            display_msg, code = error_msg, "sdk_error"
+
+        # Append error marker to session (non-invasive text parsing approach).
+        # The finally block will persist the session with this error marker.
+        # Skip if a marker was already appended inside the stream loop
+        # (ended_with_stream_error) to avoid duplicate stale markers.
+        if not ended_with_stream_error:
+            _append_error_marker(session, display_msg, retryable=is_transient)
            logger.debug(
                "%s Appended error marker, will be persisted in finally",
                log_prefix,
@@ -1397,10 +1475,7 @@ async def stream_chat_completion_sdk(
            isinstance(e, RuntimeError) and "cancel scope" in str(e)
        )
        if not is_cancellation:
-            yield StreamError(
-                errorText=error_msg,
-                code="sdk_error",
-            )
+            yield StreamError(errorText=display_msg, code=code)

        raise
    finally:
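Note on the sdk/service.py hunks above: a minimal sketch of how the marker helper and the trailing-marker cleanup loop fit together. The prefix strings and the two dataclasses are hypothetical stand-ins, and the `pop()` in the loop body is an assumption, since the diff shows only the loop condition and the log call.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical prefix values; the real ones come from ..constants.
COPILOT_ERROR_PREFIX = "[__copilot_error__]"
COPILOT_RETRYABLE_ERROR_PREFIX = "[__copilot_retryable_error__]"


@dataclass
class ChatMessage:
    role: str
    content: str


@dataclass
class ChatSession:
    messages: List[ChatMessage] = field(default_factory=list)


def append_error_marker(
    session: Optional[ChatSession], display_msg: str, *, retryable: bool = False
) -> None:
    """Append an error marker message; do nothing when there is no session."""
    if session is None:
        return
    prefix = COPILOT_RETRYABLE_ERROR_PREFIX if retryable else COPILOT_ERROR_PREFIX
    session.messages.append(
        ChatMessage(role="assistant", content=f"{prefix} {display_msg}")
    )


def _is_error_marker(msg: ChatMessage) -> bool:
    return (
        msg.role == "assistant"
        and bool(msg.content)
        and (
            COPILOT_ERROR_PREFIX in msg.content
            or COPILOT_RETRYABLE_ERROR_PREFIX in msg.content
        )
    )


def strip_trailing_error_markers(session: ChatSession) -> int:
    """Remove every trailing marker (not just the last) and return the count."""
    removed = 0
    while session.messages and _is_error_marker(session.messages[-1]):
        session.messages.pop()  # assumed loop body
        removed += 1
    return removed
```

Using `while` rather than `if` matters only when two markers were stacked in one turn; the `ended_with_stream_error` guard in the exception handler is meant to keep that from happening in the first place.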
--- a/autogpt_platform/backend/backend/copilot/sdk/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service_test.py
@@ -20,7 +20,7 @@ class _FakeFileInfo:
    size_bytes: int


-_PATCH_TARGET = "backend.copilot.sdk.service.get_manager"
+_PATCH_TARGET = "backend.copilot.sdk.service.get_workspace_manager"


 class TestPrepareFileAttachments:
--- a/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
@@ -347,7 +347,7 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
    :func:`get_sdk_disallowed_tools`.
    """

-    def _truncating(fn, tool_name: str):
+    def _truncating(fn, tool_name: str, input_schema: dict[str, Any] | None = None):
        """Wrap a tool handler so its response is truncated to stay under the
        SDK's 10 MB JSON buffer, and stash the (truncated) output for the
        response adapter before the SDK can apply its own head-truncation.
@@ -361,7 +361,9 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
            user_id, session = get_execution_context()
            if session is not None:
                try:
-                    args = await expand_file_refs_in_args(args, user_id, session)
+                    args = await expand_file_refs_in_args(
+                        args, user_id, session, input_schema=input_schema
+                    )
                except FileRefExpansionError as exc:
                    return _mcp_error(
                        f"@@agptfile: reference could not be resolved: {exc}. "
@@ -389,11 +391,12 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):

    for tool_name, base_tool in TOOL_REGISTRY.items():
        handler = create_tool_handler(base_tool)
+        schema = _build_input_schema(base_tool)
        decorated = tool(
            tool_name,
            base_tool.description,
-            _build_input_schema(base_tool),
-        )(_truncating(handler, tool_name))
+            schema,
+        )(_truncating(handler, tool_name, input_schema=schema))
        sdk_tools.append(decorated)

    # E2B file tools replace SDK built-in Read/Write/Edit/Glob/Grep.
--- a/autogpt_platform/backend/backend/copilot/service.py
+++ b/autogpt_platform/backend/backend/copilot/service.py
@@ -28,10 +28,24 @@ logger = logging.getLogger(__name__)

 config = ChatConfig()
 settings = Settings()
-client = LangfuseAsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
+
+_client: LangfuseAsyncOpenAI | None = None
+_langfuse = None


-langfuse = get_client()
+def _get_openai_client() -> LangfuseAsyncOpenAI:
+    global _client
+    if _client is None:
+        _client = LangfuseAsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
+    return _client
+
+
+def _get_langfuse():
+    global _langfuse
+    if _langfuse is None:
+        _langfuse = get_client()
+    return _langfuse
+

 # Default system prompt used when Langfuse is not configured
 # Provides minimal baseline tone and personality - all workflow, tools, and
@@ -84,7 +98,7 @@ async def _get_system_prompt_template(context: str) -> str:
                else "latest"
            )
            prompt = await asyncio.to_thread(
-                langfuse.get_prompt,
+                _get_langfuse().get_prompt,
                config.langfuse_prompt_name,
                label=label,
                cache_ttl_seconds=config.langfuse_prompt_cache_ttl,
@@ -158,7 +172,7 @@ async def _generate_session_title(
            "environment": settings.config.app_env.value,
        }

-        response = await client.chat.completions.create(
+        response = await _get_openai_client().chat.completions.create(
            model=config.title_model,
            messages=[
                {
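Note on the copilot/service.py hunks above: the module-level `client` and `langfuse` objects are replaced by lazily created singletons, so importing the module no longer constructs them. A minimal sketch of the pattern, using a hypothetical `ExpensiveClient` class in place of the real Langfuse and OpenAI clients:

```python
from typing import Optional


class ExpensiveClient:
    """Hypothetical client; counts how many times it has been constructed."""

    instances = 0

    def __init__(self) -> None:
        ExpensiveClient.instances += 1


_client: Optional[ExpensiveClient] = None


def get_expensive_client() -> ExpensiveClient:
    """Create the client on first use and reuse it afterwards."""
    global _client
    if _client is None:
        _client = ExpensiveClient()
    return _client
```

Nothing is constructed until the first call, which also makes the module easier to import in tests where credentials are not configured. The sketch takes no lock, so two threads racing on the first call could each build a client.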
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
@@ -32,6 +32,7 @@ import shutil
 import tempfile
 from typing import Any

+from backend.copilot.context import get_workspace_manager
 from backend.copilot.model import ChatSession
 from backend.util.request import validate_url_host

@@ -43,7 +44,6 @@ from .models import (
    ErrorResponse,
    ToolResponseBase,
 )
-from .workspace_files import get_manager

 logger = logging.getLogger(__name__)

@@ -194,7 +194,7 @@ async def _save_browser_state(
            ),
        }

-        manager = await get_manager(user_id, session.session_id)
+        manager = await get_workspace_manager(user_id, session.session_id)
        await manager.write_file(
            content=json.dumps(state).encode("utf-8"),
            filename=_STATE_FILENAME,
@@ -218,7 +218,7 @@ async def _restore_browser_state(
    Returns True on success (or no state to restore), False on failure.
    """
    try:
-        manager = await get_manager(user_id, session.session_id)
+        manager = await get_workspace_manager(user_id, session.session_id)

        file_info = await manager.get_file_info_by_path(_STATE_FILENAME)
        if file_info is None:
@@ -360,7 +360,7 @@ async def close_browser_session(session_name: str, user_id: str | None = None) -
    # Delete persisted browser state (cookies, localStorage) from workspace.
    if user_id:
        try:
-            manager = await get_manager(user_id, session_name)
+            manager = await get_workspace_manager(user_id, session_name)
            file_info = await manager.get_file_info_by_path(_STATE_FILENAME)
            if file_info is not None:
                await manager.delete_file(file_info.id)
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser_test.py
@@ -897,7 +897,7 @@ class TestHasLocalSession:
 # _save_browser_state
 # ---------------------------------------------------------------------------

-_GET_MANAGER = "backend.copilot.tools.agent_browser.get_manager"
+_GET_MANAGER = "backend.copilot.tools.agent_browser.get_workspace_manager"


 def _make_mock_manager():
--- a/autogpt_platform/backend/backend/copilot/tools/run_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_block.py
@@ -12,6 +12,7 @@ from backend.copilot.constants import (
    COPILOT_SESSION_PREFIX,
 )
 from backend.copilot.model import ChatSession
+from backend.copilot.sdk.file_ref import FileRefExpansionError, expand_file_refs_in_args
 from backend.data.db_accessors import review_db
 from backend.data.execution import ExecutionContext

@@ -197,6 +198,29 @@ class RunBlockTool(BaseTool):
                session_id=session_id,
            )

+        # Expand @@agptfile: refs in input_data with the block's input
+        # schema.  The generic _truncating wrapper skips opaque object
+        # properties (input_data has no declared inner properties in the
+        # tool schema), so file ref tokens are still intact here.
+        # Using the block's schema lets us return raw text for string-typed
+        # fields and parsed structures for list/dict-typed fields.
+        if input_data:
+            try:
+                input_data = await expand_file_refs_in_args(
+                    input_data,
+                    user_id,
+                    session,
+                    input_schema=input_schema,
+                )
+            except FileRefExpansionError as exc:
+                return ErrorResponse(
+                    message=(
+                        f"Failed to resolve file reference: {exc}. "
+                        "Ensure the file exists before referencing it."
+                    ),
+                    session_id=session_id,
+                )
+
        if missing_credentials:
            # Return setup requirements response with missing credentials
            credentials_fields_info = block.input_schema.get_credentials_fields_info()
--- a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
+++ b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
@@ -10,11 +10,11 @@ from pydantic import BaseModel
 from backend.copilot.context import (
    E2B_WORKDIR,
    get_current_sandbox,
+    get_workspace_manager,
    resolve_sandbox_path,
 )
 from backend.copilot.model import ChatSession
 from backend.copilot.tools.sandbox import make_session_path
-from backend.data.db_accessors import workspace_db
 from backend.util.settings import Config
 from backend.util.virus_scanner import scan_content_safe
 from backend.util.workspace import WorkspaceManager
@@ -218,12 +218,6 @@ def _is_text_mime(mime_type: str) -> bool:
    return any(mime_type.startswith(t) for t in _TEXT_MIME_PREFIXES)


-async def get_manager(user_id: str, session_id: str) -> WorkspaceManager:
-    """Create a session-scoped WorkspaceManager."""
-    workspace = await workspace_db().get_or_create_workspace(user_id)
-    return WorkspaceManager(user_id, workspace.id, session_id)
-
-
 async def _resolve_file(
    manager: WorkspaceManager,
    file_id: str | None,
@@ -386,7 +380,7 @@ class ListWorkspaceFilesTool(BaseTool):
        include_all_sessions: bool = kwargs.get("include_all_sessions", False)

        try:
-            manager = await get_manager(user_id, session_id)
+            manager = await get_workspace_manager(user_id, session_id)
            files = await manager.list_files(
                path=path_prefix, limit=limit, include_all_sessions=include_all_sessions
            )
@@ -536,7 +530,7 @@ class ReadWorkspaceFileTool(BaseTool):
            )

        try:
-            manager = await get_manager(user_id, session_id)
+            manager = await get_workspace_manager(user_id, session_id)
            resolved = await _resolve_file(manager, file_id, path, session_id)
            if isinstance(resolved, ErrorResponse):
                return resolved
@@ -772,7 +766,7 @@ class WriteWorkspaceFileTool(BaseTool):

        try:
            await scan_content_safe(content, filename=filename)
-            manager = await get_manager(user_id, session_id)
+            manager = await get_workspace_manager(user_id, session_id)
            rec = await manager.write_file(
                content=content,
                filename=filename,
@@ -899,7 +893,7 @@ class DeleteWorkspaceFileTool(BaseTool):
            )

        try:
-            manager = await get_manager(user_id, session_id)
+            manager = await get_workspace_manager(user_id, session_id)
            resolved = await _resolve_file(manager, file_id, path, session_id)
            if isinstance(resolved, ErrorResponse):
                return resolved
--- a/autogpt_platform/backend/backend/executor/manager.py
+++ b/autogpt_platform/backend/backend/executor/manager.py
@@ -61,7 +61,12 @@ from backend.util.decorator import (
    error_logged,
    time_measured,
 )
-from backend.util.exceptions import InsufficientBalanceError, ModerationError
+from backend.util.exceptions import (
+    GraphNotFoundError,
+    InsufficientBalanceError,
+    ModerationError,
+    NotFoundError,
+)
 from backend.util.file import clean_exec_files
 from backend.util.logging import TruncatedLogger, configure_logging
 from backend.util.metrics import DiscordChannel
@@ -375,9 +380,16 @@ async def execute_node(
            log_metadata.debug("Node produced output", **{output_name: output_data})
            yield output_name, output_data
    except Exception as ex:
-        # Capture exception WITH context still set before restoring scope
-        sentry_sdk.capture_exception(error=ex, scope=scope)
-        sentry_sdk.flush()  # Ensure it's sent before we restore scope
+        # Only capture unexpected errors to Sentry, not user-caused ones.
+        # Most ValueError subclasses here are expected (BlockExecutionError,
+        # InsufficientBalanceError, plain ValueError for auth/disabled blocks, etc.)
+        # but NotFoundError/GraphNotFoundError could indicate real platform issues.
+        is_expected = isinstance(ex, ValueError) and not isinstance(
+            ex, (NotFoundError, GraphNotFoundError)
+        )
+        if not is_expected:
+            sentry_sdk.capture_exception(error=ex, scope=scope)
+            sentry_sdk.flush()
        # Re-raise to maintain normal error flow
        raise
    finally:
@@ -1478,7 +1490,7 @@ class ExecutionProcessor:
                    alert_message, DiscordChannel.PRODUCT
                )
            except Exception as e:
-                logger.error(f"Failed to send low balance Discord alert: {e}")
+                logger.warning(f"Failed to send low balance Discord alert: {e}")


 class ExecutionManager(AppProcess):
@@ -1900,17 +1912,16 @@ class ExecutionManager(AppProcess):
            channel = client.get_channel()
            channel.connection.add_callback_threadsafe(lambda: channel.stop_consuming())

-            try:
-                thread.join(timeout=300)
-            except TimeoutError:
-                logger.error(
+            thread.join(timeout=300)
+            if thread.is_alive():
+                logger.warning(
                    f"{prefix} ⚠️ Run thread did not finish in time, forcing disconnect"
                )

            client.disconnect()
            logger.info(f"{prefix} ✅ Run client disconnected")
        except Exception as e:
-            logger.error(f"{prefix} ⚠️ Error disconnecting run client: {type(e)} {e}")
+            logger.warning(f"{prefix} ⚠️ Error disconnecting run client: {type(e)} {e}")

    def cleanup(self):
        """Override cleanup to implement graceful shutdown with active execution waiting."""
@@ -1926,7 +1937,9 @@ class ExecutionManager(AppProcess):
            )
            logger.info(f"{prefix} ✅ Exec consumer has been signaled to stop")
        except Exception as e:
-            logger.error(f"{prefix} ⚠️ Error signaling consumer to stop: {type(e)} {e}")
+            logger.warning(
+                f"{prefix} ⚠️ Error signaling consumer to stop: {type(e)} {e}"
+            )

        # Wait for active executions to complete
        if self.active_graph_runs:
@@ -1957,7 +1970,7 @@ class ExecutionManager(AppProcess):
                waited += wait_interval

            if self.active_graph_runs:
-                logger.error(
+                logger.warning(
                    f"{prefix} ⚠️ {len(self.active_graph_runs)} executions still running after {max_wait}s"
                )
            else:
@@ -1968,7 +1981,7 @@ class ExecutionManager(AppProcess):
            self.executor.shutdown(cancel_futures=True, wait=False)
            logger.info(f"{prefix} ✅ Executor shutdown completed")
        except Exception as e:
-            logger.error(f"{prefix} ⚠️ Error during executor shutdown: {type(e)} {e}")
+            logger.warning(f"{prefix} ⚠️ Error during executor shutdown: {type(e)} {e}")

        # Release remaining execution locks
        try:
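
Note on the `thread.join(timeout=300)` change in the hunk above: `threading.Thread.join` returns `None` whether or not the timeout elapsed and does not raise `TimeoutError`, so the removed `except TimeoutError` branch could never run. A minimal sketch of the replacement pattern follows; the helper name `join_with_timeout` and the event-based worker are illustrative assumptions, not code from this commit.

```python
import threading


def join_with_timeout(thread: threading.Thread, timeout: float) -> bool:
    """Wait up to *timeout* seconds; return True if the thread finished."""
    # join() returns None in both cases, so the only way to detect a
    # timeout is to check is_alive() afterwards.
    thread.join(timeout=timeout)
    return not thread.is_alive()


# Hypothetical worker that blocks until it is told to stop.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, daemon=True)
worker.start()

finished_early = join_with_timeout(worker, timeout=0.05)  # worker still blocked
stop.set()
finished_late = join_with_timeout(worker, timeout=5.0)  # worker can now exit
```

Here `finished_early` reflects the timed-out join and `finished_late` the successful one, which is the distinction the new `if thread.is_alive():` check relies on.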
--- a/autogpt_platform/backend/backend/executor/scheduler.py
+++ b/autogpt_platform/backend/backend/executor/scheduler.py
@@ -94,7 +94,7 @@ SCHEDULER_OPERATION_TIMEOUT_SECONDS = 300  # 5 minutes for scheduler operations
 def job_listener(event):
    """Logs job execution outcomes for better monitoring."""
    if event.exception:
-        logger.error(
+        logger.warning(
            f"Job {event.job_id} failed: {type(event.exception).__name__}: {event.exception}"
        )
    else:
@@ -137,7 +137,7 @@ def run_async(coro, timeout: float = SCHEDULER_OPERATION_TIMEOUT_SECONDS):
    try:
        return future.result(timeout=timeout)
    except Exception as e:
-        logger.error(f"Async operation failed: {type(e).__name__}: {e}")
+        logger.warning(f"Async operation failed: {type(e).__name__}: {e}")
        raise


@@ -186,7 +186,7 @@ async def _execute_graph(**kwargs):


 async def _handle_graph_validation_error(args: "GraphExecutionJobArgs") -> None:
-    logger.error(
+    logger.warning(
        f"Scheduled Graph {args.graph_id} failed validation. Unscheduling graph"
    )
    if args.schedule_id:
@@ -196,8 +196,9 @@ async def _handle_graph_validation_error(args: "GraphExecutionJobArgs") -> None:
            user_id=args.user_id,
        )
    else:
-        logger.error(
-            f"Unable to unschedule graph: {args.graph_id} as this is an old job with no associated schedule_id please remove manually"
+        logger.warning(
+            f"Unable to unschedule graph: {args.graph_id} as this is an old job "
+            f"with no associated schedule_id; please remove manually"
        )


--- a/autogpt_platform/backend/backend/notifications/notifications.py
+++ b/autogpt_platform/backend/backend/notifications/notifications.py
@@ -303,9 +303,9 @@ class NotificationManager(AppService):
                    )

                    if not oldest_message:
-                        # this should never happen
-                        logger.error(
-                            f"Batch for user {batch.user_id} and type {notification_type} has no oldest message whichshould never happen!!!!!!!!!!!!!!!!"
+                        logger.warning(
+                            f"Batch for user {batch.user_id} and type {notification_type} "
+                            f"has no oldest message — batch may have been cleared concurrently"
                        )
                        continue

@@ -318,7 +318,7 @@ class NotificationManager(AppService):
                        ).get_user_email_by_id(batch.user_id)

                        if not recipient_email:
-                            logger.error(
+                            logger.warning(
                                f"User email not found for user {batch.user_id}"
                            )
                            continue
@@ -344,7 +344,7 @@ class NotificationManager(AppService):
                        ).get_user_notification_batch(batch.user_id, notification_type)

                        if not batch_data or not batch_data.notifications:
-                            logger.error(
+                            logger.warning(
                                f"Batch data not found for user {batch.user_id}"
                            )
                            # Clear the batch
@@ -372,7 +372,7 @@ class NotificationManager(AppService):
                                    )
                                )
                            except Exception as e:
-                                logger.error(
+                                logger.warning(
                                    f"Error parsing notification event: {e=}, {db_event=}"
                                )
                                continue
@@ -415,7 +415,10 @@ class NotificationManager(AppService):
    async def discord_system_alert(
        self, content: str, channel: DiscordChannel = DiscordChannel.PLATFORM
    ):
-        await discord_send_alert(content, channel)
+        try:
+            await discord_send_alert(content, channel)
+        except Exception as e:
+            logger.warning(f"Failed to send Discord system alert: {e}")

    async def _queue_scheduled_notification(self, event: SummaryParamsEventModel):
        """Queue a scheduled notification - exposed method for other services to call"""
@@ -516,7 +519,7 @@ class NotificationManager(AppService):
                raise ValueError("Invalid event type or params")

        except Exception as e:
-            logger.error(f"Failed to gather summary data: {e}")
+            logger.warning(f"Failed to gather summary data: {e}")
            # Return sensible defaults in case of error
            if event_type == NotificationType.DAILY_SUMMARY and isinstance(
                params, DailySummaryParams
@@ -562,8 +565,9 @@ class NotificationManager(AppService):
            should_retry=False
        ).get_user_notification_oldest_message_in_batch(user_id, event_type)
        if not oldest_message:
-            logger.error(
-                f"Batch for user {user_id} and type {event_type} has no oldest message whichshould never happen!!!!!!!!!!!!!!!!"
+            logger.warning(
+                f"Batch for user {user_id} and type {event_type} "
+                f"has no oldest message — batch may have been cleared concurrently"
            )
            return False
        oldest_age = oldest_message.created_at
@@ -585,7 +589,7 @@ class NotificationManager(AppService):
                get_notif_data_type(event.type)
            ].model_validate_json(message)
        except Exception as e:
-            logger.error(f"Error parsing message due to non matching schema {e}")
+            logger.warning(f"Error parsing message due to non matching schema {e}")
            return None

    async def _process_admin_message(self, message: str) -> bool:
@@ -614,7 +618,7 @@ class NotificationManager(AppService):
                should_retry=False
            ).get_user_email_by_id(event.user_id)
            if not recipient_email:
-                logger.error(f"User email not found for user {event.user_id}")
+                logger.warning(f"User email not found for user {event.user_id}")
                return False

            should_send = await self._should_email_user_based_on_preference(
@@ -651,7 +655,7 @@ class NotificationManager(AppService):
                should_retry=False
            ).get_user_email_by_id(event.user_id)
            if not recipient_email:
-                logger.error(f"User email not found for user {event.user_id}")
+                logger.warning(f"User email not found for user {event.user_id}")
                return False

            should_send = await self._should_email_user_based_on_preference(
@@ -672,7 +676,7 @@ class NotificationManager(AppService):
                should_retry=False
            ).get_user_notification_batch(event.user_id, event.type)
            if not batch or not batch.notifications:
-                logger.error(f"Batch not found for user {event.user_id}")
+                logger.warning(f"Batch not found for user {event.user_id}")
                return False
            unsub_link = generate_unsubscribe_link(event.user_id)

@@ -745,7 +749,7 @@ class NotificationManager(AppService):
                                        f"Removed {len(chunk_ids)} sent notifications from batch"
                                    )
                                except Exception as e:
-                                    logger.error(
+                                    logger.warning(
                                        f"Failed to remove sent notifications: {e}"
                                    )
                                    # Continue anyway - better to risk duplicates than lose emails
@@ -770,7 +774,7 @@ class NotificationManager(AppService):
                        else:
                            # Message is too large even after size reduction
                            if attempt_size == 1:
-                                logger.error(
+                                logger.warning(
                                    f"Failed to send notification at index {i}: "
                                    f"Single notification exceeds email size limit "
                                    f"({len(test_message):,} chars > {MAX_EMAIL_SIZE:,} chars). "
@@ -789,7 +793,7 @@ class NotificationManager(AppService):
                                            f"Removed oversized notification {chunk_ids[0]} from batch permanently"
                                        )
                                    except Exception as e:
-                                        logger.error(
+                                        logger.warning(
                                            f"Failed to remove oversized notification: {e}"
                                        )

@@ -823,7 +827,7 @@ class NotificationManager(AppService):
                                        f"Set email verification to false for user {event.user_id}"
                                    )
                                except Exception as deactivation_error:
-                                    logger.error(
+                                    logger.warning(
                                        f"Failed to deactivate email for user {event.user_id}: "
                                        f"{deactivation_error}"
                                    )
@@ -835,7 +839,7 @@ class NotificationManager(AppService):
                                        f"Disabled all notification preferences for user {event.user_id}"
                                    )
                                except Exception as disable_error:
-                                    logger.error(
+                                    logger.warning(
                                        f"Failed to disable notification preferences: {disable_error}"
                                    )

@@ -848,7 +852,7 @@ class NotificationManager(AppService):
                                        f"Cleared ALL notification batches for user {event.user_id}"
                                    )
                                except Exception as remove_error:
-                                    logger.error(
+                                    logger.warning(
                                        f"Failed to clear batches for inactive recipient: {remove_error}"
                                    )

@@ -859,7 +863,7 @@ class NotificationManager(AppService):
                                "422" in error_message
                                or "unprocessable" in error_message
                            ):
-                                logger.error(
+                                logger.warning(
                                    f"Failed to send notification at index {i}: "
                                    f"Malformed notification data rejected by Postmark. "
                                    f"Error: {e}. Removing from batch permanently."
@@ -877,7 +881,7 @@ class NotificationManager(AppService):
                                            "Removed malformed notification from batch permanently"
                                        )
                                    except Exception as remove_error:
-                                        logger.error(
+                                        logger.warning(
                                            f"Failed to remove malformed notification: {remove_error}"
                                        )
                            # Check if it's a ValueError for size limit
@@ -885,14 +889,14 @@ class NotificationManager(AppService):
                                isinstance(e, ValueError)
                                and "too large" in error_message
                            ):
-                                logger.error(
+                                logger.warning(
                                    f"Failed to send notification at index {i}: "
                                    f"Notification size exceeds email limit. "
                                    f"Error: {e}. Skipping this notification."
                                )
                            # Other API errors
                            else:
-                                logger.error(
+                                logger.warning(
                                    f"Failed to send notification at index {i}: "
                                    f"Email API error ({error_type}): {e}. "
                                    f"Skipping this notification."
@@ -907,7 +911,9 @@ class NotificationManager(AppService):

                if not chunk_sent:
                    # Should not reach here due to single notification handling
-                    logger.error(f"Failed to send notifications starting at index {i}")
+                    logger.warning(
+                        f"Failed to send notifications starting at index {i}"
+                    )
                    failed_indices.append(i)
                    i += 1

@@ -946,7 +952,7 @@ class NotificationManager(AppService):
                should_retry=False
            ).get_user_email_by_id(event.user_id)
            if not recipient_email:
-                logger.error(f"User email not found for user {event.user_id}")
+                logger.warning(f"User email not found for user {event.user_id}")
                return False
            should_send = await self._should_email_user_based_on_preference(
                event.user_id, event.type
@@ -1007,7 +1013,10 @@ class NotificationManager(AppService):
                        # Let message.process() handle the rejection
                        pass
                    except Exception as e:
-                        logger.error(f"Error processing message in {queue_name}: {e}")
+                        logger.warning(
+                            f"Error processing message in {queue_name}: {e}",
+                            exc_info=True,
+                        )
                        # Let message.process() handle the rejection
                        raise
        except asyncio.CancelledError:
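
Note on the `discord_system_alert` change in this file: wrapping the send in `try`/`except` makes the alert best-effort, so a failing webhook no longer propagates to the caller. The sketch below shows the same shape under stated assumptions: `flaky_send` and `send_alert_best_effort` are hypothetical names, and the boolean return value is added here only to make the outcome observable (the method in the diff returns nothing).

```python
import asyncio
import logging

logger = logging.getLogger("alerts_sketch")


async def flaky_send(content: str) -> None:
    # Hypothetical stand-in for a network call that always fails.
    raise ConnectionError("webhook unreachable")


async def send_alert_best_effort(content: str, send=flaky_send) -> bool:
    """Try to send an alert; log and swallow any failure."""
    try:
        await send(content)
        return True
    except Exception as e:
        # exc_info=True attaches the traceback to the warning record,
        # as the queue-processing handler in this diff does.
        logger.warning("Failed to send alert: %s", e, exc_info=True)
        return False


delivered = asyncio.run(send_alert_best_effort("low balance"))
```

The trade-off is that callers can no longer react to a failed alert, which seems acceptable only where the alert is purely informational.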
--- a/autogpt_platform/backend/backend/notifications/test_notifications.py
+++ b/autogpt_platform/backend/backend/notifications/test_notifications.py
@@ -256,9 +256,9 @@ class TestNotificationErrorHandling:
            assert 2 not in successful_indices  # Index 2 failed

            # Verify 422 error was logged
-            error_calls = [call[0][0] for call in mock_logger.error.call_args_list]
+            warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list]
            assert any(
-                "422" in call or "malformed" in call.lower() for call in error_calls
+                "422" in call or "malformed" in call.lower() for call in warning_calls
            )

            # Verify all notifications were removed (4 successful + 1 malformed)
@@ -371,10 +371,10 @@ class TestNotificationErrorHandling:
            assert 3 not in successful_indices  # Index 3 was not sent

            # Verify oversized error was logged
-            error_calls = [call[0][0] for call in mock_logger.error.call_args_list]
+            warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list]
            assert any(
                "exceeds email size limit" in call or "oversized" in call.lower()
-                for call in error_calls
+                for call in warning_calls
            )

    @pytest.mark.asyncio
@@ -478,10 +478,10 @@ class TestNotificationErrorHandling:
            assert 1 in failed_indices  # Index 1 failed

            # Verify generic error was logged
-            error_calls = [call[0][0] for call in mock_logger.error.call_args_list]
+            warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list]
            assert any(
                "api error" in call.lower() or "skipping" in call.lower()
-                for call in error_calls
+                for call in warning_calls
            )

            # Only successful ones should be removed from batch (failed one stays for retry)
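
Note on the test changes above: because the production code now logs at `warning` level, the assertions have to read `mock_logger.warning.call_args_list` instead of `mock_logger.error.call_args_list`. A minimal, self-contained sketch of that assertion pattern follows; `send_batch` and the logger name are hypothetical and only mimic the shape of the real code.

```python
import logging
from unittest.mock import patch

logger = logging.getLogger("notifications_sketch")


def send_batch(items):
    """Return the indices that were 'sent'; log a warning for bad items."""
    sent = []
    for i, item in enumerate(items):
        if item is None:
            logger.warning(
                f"Failed to send notification at index {i}: malformed data"
            )
            continue
        sent.append(i)
    return sent


with patch.object(logger, "warning") as mock_warning:
    sent = send_batch(["a", None, "c"])
    # call[0] is the positional-args tuple; call[0][0] is the message string.
    warning_calls = [call[0][0] for call in mock_warning.call_args_list]
```

Matching on a substring of the message, as the real tests do, keeps the assertion independent of the exact wording around it.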
--- a/autogpt_platform/backend/backend/util/cloud_storage.py
+++ b/autogpt_platform/backend/backend/util/cloud_storage.py
@@ -613,5 +613,5 @@ async def cleanup_expired_files_async() -> int:
            )
            return deleted_count
        except Exception as e:
-            logger.error(f"[CloudStorage] Error during cloud storage cleanup: {e}")
+            logger.warning(f"[CloudStorage] Error during cloud storage cleanup: {e}")
            return 0
--- a/autogpt_platform/backend/backend/util/file.py
+++ b/autogpt_platform/backend/backend/util/file.py
@@ -275,13 +275,12 @@ async def store_media_file(
    # Process file
    elif file.startswith("data:"):
        # Data URI
-        match = re.match(r"^data:([^;]+);base64,(.*)$", file, re.DOTALL)
-        if not match:
+        parsed_uri = parse_data_uri(file)
+        if parsed_uri is None:
            raise ValueError(
                "Invalid data URI format. Expected data:<mime>;base64,<data>"
            )
-        mime_type = match.group(1).strip().lower()
-        b64_content = match.group(2).strip()
+        mime_type, b64_content = parsed_uri

        # Generate filename and decode
        extension = _extension_from_mime(mime_type)
@@ -415,13 +414,70 @@ def get_dir_size(path: Path) -> int:
    return total


+async def resolve_media_content(
+    content: MediaFileType,
+    execution_context: "ExecutionContext",
+    *,
+    return_format: MediaReturnFormat,
+) -> MediaFileType:
+    """Resolve a ``MediaFileType`` value if it is a media reference, pass through otherwise.
+
+    Convenience wrapper around :func:`is_media_file_ref` + :func:`store_media_file`.
+    Plain text content (source code, filenames) is returned unchanged.  Media
+    references (``data:``, ``workspace://``, ``http(s)://``) are resolved via
+    :func:`store_media_file` using *return_format*.
+
+    Use this when a block field is typed as ``MediaFileType`` but may contain
+    either literal text or a media reference.
+    """
+    if not content or not is_media_file_ref(content):
+        return content
+    return await store_media_file(
+        content, execution_context, return_format=return_format
+    )
+
+
+def is_media_file_ref(value: str) -> bool:
+    """Return True if *value* looks like a ``MediaFileType`` reference.
+
+    Detects data URIs, workspace:// references, and HTTP(S) URLs — the
+    formats accepted by :func:`store_media_file`.  Plain text content
+    (e.g. source code, filenames) returns False.
+
+    Known limitation: HTTP(S) URL detection is heuristic.  Any string that
+    starts with ``http://`` or ``https://`` is treated as a media URL, even
+    if it appears as a URL inside source-code comments or documentation.
+    Blocks that produce source code or Markdown as output may therefore
+    trigger false positives.  Callers that need higher precision should
+    inspect the string further (e.g. verify the URL is reachable or has a
+    media-friendly extension).
+
+    Note: this does *not* match local file paths, which are ambiguous
+    (could be filenames or actual paths).  Blocks that need to resolve
+    local paths should check for them separately.
+    """
+    return value.startswith(("data:", "workspace://", "http://", "https://"))
+
+
+def parse_data_uri(value: str) -> tuple[str, str] | None:
+    """Parse a ``data:<mime>;base64,<payload>`` URI.
+
+    Returns ``(mime_type, base64_payload)`` if *value* is a valid data URI,
+    or ``None`` if it is not.
+    """
+    match = re.match(r"^data:([^;]+);base64,(.*)$", value, re.DOTALL)
+    if not match:
+        return None
+    return match.group(1).strip().lower(), match.group(2).strip()
+
+
 def get_mime_type(file: str) -> str:
    """
    Get the MIME type of a file, whether it's a data URI, URL, or local path.
    """
    if file.startswith("data:"):
-        match = re.match(r"^data:([^;]+);base64,", file)
-        return match.group(1) if match else "application/octet-stream"
+        parsed_uri = parse_data_uri(file)
+        return parsed_uri[0] if parsed_uri else "application/octet-stream"

    elif file.startswith(("http://", "https://")):
        parsed_url = urlparse(file)
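
Note on the helpers added in this file: the sketch below copies `is_media_file_ref` and `parse_data_uri` from the hunk above (docstrings shortened, return type written with `Optional` for older interpreters) and exercises them on a few inputs. The sample strings are made up for illustration.

```python
import re
from typing import Optional, Tuple


def is_media_file_ref(value: str) -> bool:
    """True for data URIs, workspace:// references, and HTTP(S) URLs."""
    return value.startswith(("data:", "workspace://", "http://", "https://"))


def parse_data_uri(value: str) -> Optional[Tuple[str, str]]:
    """Return (mime_type, base64_payload), or None if not a base64 data URI."""
    match = re.match(r"^data:([^;]+);base64,(.*)$", value, re.DOTALL)
    if not match:
        return None
    return match.group(1).strip().lower(), match.group(2).strip()


# The MIME type is normalised to lower case.
parsed = parse_data_uri("data:Image/PNG;base64,aGVsbG8=")

# Data URIs without the ";base64" marker are rejected.
not_base64 = parse_data_uri("data:text/plain,hello")

# Prefix matching is heuristic: any text starting with a URL scheme matches.
text_starting_with_url = is_media_file_ref("https://example.com is mentioned here")
plain_code = is_media_file_ref("print('hello')")
```

One visible consequence, assuming `get_mime_type` is used as shown in the diff: it now returns the lowercased, stripped MIME type for data URIs, because it goes through `parse_data_uri`.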
--- a/autogpt_platform/backend/backend/util/file_content_parser.py
+++ b/autogpt_platform/backend/backend/util/file_content_parser.py
@@ -0,0 +1,375 @@
+"""Parse file content into structured Python objects based on file format.
+
+Used by the ``@@agptfile:`` expansion system to eagerly parse well-known file
+formats into native Python types *before* schema-driven coercion runs.  This
+lets blocks with ``Any``-typed inputs receive structured data rather than raw
+strings, while blocks expecting strings get the value coerced back via
+``convert()``.
+
+Supported formats:
+
+- **JSON** (``.json``) — arrays and objects are promoted; scalars stay as strings
+- **JSON Lines** (``.jsonl``, ``.ndjson``) — each non-empty line parsed as JSON;
+  when all lines are dicts with the same keys (tabular data), output is
+  ``list[list[Any]]`` with a header row, consistent with CSV/Parquet/Excel;
+  otherwise returns a plain ``list`` of parsed values
+- **CSV** (``.csv``) — ``csv.reader`` → ``list[list[str]]``
+- **TSV** (``.tsv``) — tab-delimited → ``list[list[str]]``
+- **YAML** (``.yaml``, ``.yml``) — parsed via PyYAML; containers only
+- **TOML** (``.toml``) — parsed via stdlib ``tomllib``
+- **Parquet** (``.parquet``) — via pandas/pyarrow → ``list[list[Any]]`` with header row
+- **Excel** (``.xlsx``) — via pandas/openpyxl → ``list[list[Any]]`` with header row
+  (legacy ``.xls`` is **not** supported — only the modern OOXML format)
+
+The **fallback contract** is enforced by :func:`parse_file_content`, not by
+individual parser functions.  If any parser raises, ``parse_file_content``
+catches the exception and returns the original content unchanged (string for
+text formats, bytes for binary formats).  Callers should never see an
+exception from the public API when ``strict=False``.
+"""
+
+import csv
+import io
+import json
+import logging
+import tomllib
+import zipfile
+from collections.abc import Callable
+
+# posixpath.splitext handles forward-slash URI paths correctly on all platforms,
+# unlike os.path.splitext which uses platform-native separators.
+from posixpath import splitext
+from typing import Any
+
+import yaml
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Extension / MIME → format label mapping
+# ---------------------------------------------------------------------------
+
+_EXT_TO_FORMAT: dict[str, str] = {
+    ".json": "json",
+    ".jsonl": "jsonl",
+    ".ndjson": "jsonl",
+    ".csv": "csv",
+    ".tsv": "tsv",
+    ".yaml": "yaml",
+    ".yml": "yaml",
+    ".toml": "toml",
+    ".parquet": "parquet",
+    ".xlsx": "xlsx",
+}
+
+MIME_TO_FORMAT: dict[str, str] = {
+    "application/json": "json",
+    "application/x-ndjson": "jsonl",
+    "application/jsonl": "jsonl",
+    "text/csv": "csv",
+    "text/tab-separated-values": "tsv",
+    "application/x-yaml": "yaml",
+    "application/yaml": "yaml",
+    "text/yaml": "yaml",
+    "application/toml": "toml",
+    "application/vnd.apache.parquet": "parquet",
+    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
+}
+
+# Formats that require raw bytes rather than decoded text.
+BINARY_FORMATS: frozenset[str] = frozenset({"parquet", "xlsx"})
+
+
+# ---------------------------------------------------------------------------
+# Public API  (top-down: main functions first, helpers below)
+# ---------------------------------------------------------------------------
+
+
+def infer_format_from_uri(uri: str) -> str | None:
+    """Return a format label based on URI extension or MIME fragment.
+
+    Returns ``None`` when the format cannot be determined — the caller should
+    fall back to returning the content as a plain string.
+    """
+    # 1. Check MIME fragment  (workspace://abc123#application/json)
+    if "#" in uri:
+        _, fragment = uri.rsplit("#", 1)
+        fmt = MIME_TO_FORMAT.get(fragment.lower())
+        if fmt:
+            return fmt
+
+    # 2. Check file extension from the path portion.
+    #    Strip the fragment first so ".json#mime" doesn't confuse splitext.
+    path = uri.split("#")[0].split("?")[0]
+    _, ext = splitext(path)
+    fmt = _EXT_TO_FORMAT.get(ext.lower())
+    if fmt is not None:
+        return fmt
+
+    # Legacy .xls is not supported — map it so callers can produce a
+    # user-friendly error instead of returning garbled binary.
+    if ext.lower() == ".xls":
+        return "xls"
+
+    return None
+
+
+def parse_file_content(content: str | bytes, fmt: str, *, strict: bool = False) -> Any:
+    """Parse *content* according to *fmt* and return a native Python value.
+
+    When *strict* is ``False`` (default), returns the original *content*
+    unchanged if *fmt* is not recognised or parsing fails for any reason.
+    This mode **never raises**.
+
+    When *strict* is ``True``, parsing errors are propagated to the caller.
+    Unrecognised formats or type mismatches (e.g. text for a binary format)
+    still return *content* unchanged without raising.
+    """
+    if fmt == "xls":
+        return (
+            "[Unsupported format] Legacy .xls files are not supported. "
+            "Please re-save the file as .xlsx (Excel 2007+) and upload again."
+        )
+
+    try:
+        if fmt in BINARY_FORMATS:
+            parser = _BINARY_PARSERS.get(fmt)
+            if parser is None:
+                return content
+            if isinstance(content, str):
+                # Caller gave us text for a binary format — can't parse.
+                return content
+            return parser(content)
+
+        parser = _TEXT_PARSERS.get(fmt)
+        if parser is None:
+            return content
+        if isinstance(content, bytes):
+            content = content.decode("utf-8", errors="replace")
+        return parser(content)
+
+    except PARSE_EXCEPTIONS:
+        if strict:
+            raise
+        logger.debug("Structured parsing failed for format=%s, falling back", fmt)
+        return content
+
+
+# ---------------------------------------------------------------------------
+# Exception loading helpers
+# ---------------------------------------------------------------------------
+
+
+def _load_openpyxl_exception() -> type[Exception]:
+    """Return openpyxl's InvalidFileException, raising ImportError if absent."""
+    from openpyxl.utils.exceptions import InvalidFileException  # noqa: PLC0415
+
+    return InvalidFileException
+
+
+def _load_arrow_exception() -> type[Exception]:
+    """Return pyarrow's ArrowException, raising ImportError if absent."""
+    from pyarrow import ArrowException  # noqa: PLC0415
+
+    return ArrowException
+
+
+def _optional_exc(loader: "Callable[[], type[Exception]]") -> "type[Exception] | None":
+    """Return the exception class from *loader*, or ``None`` if the dep is absent."""
+    try:
+        return loader()
+    except ImportError:
+        return None
+
+
+# Exception types that can be raised during file content parsing.
+# Shared between ``parse_file_content`` (which catches them in non-strict mode)
+# and ``file_ref._expand_bare_ref`` (which re-raises them as FileRefExpansionError).
+#
+# Optional-dependency exception types are loaded via ``_optional_exc``, which
+# returns ``None`` when the dependency is not installed; ``None`` entries are
+# filtered out below so the tuple holds only real exception classes.
+PARSE_EXCEPTIONS: tuple[type[BaseException], ...] = tuple(
+    exc
+    for exc in (
+        json.JSONDecodeError,
+        csv.Error,
+        yaml.YAMLError,
+        tomllib.TOMLDecodeError,
+        ValueError,
+        UnicodeDecodeError,
+        ImportError,
+        OSError,
+        KeyError,
+        TypeError,
+        zipfile.BadZipFile,
+        _optional_exc(_load_openpyxl_exception),
+        # ArrowException covers ArrowIOError and ArrowCapacityError which
+        # do not inherit from standard exceptions; ArrowInvalid/ArrowTypeError
+        # already map to ValueError/TypeError but this catches the rest.
+        _optional_exc(_load_arrow_exception),
+    )
+    if exc is not None
+)
+
+
+# ---------------------------------------------------------------------------
+# Text-based parsers  (content: str → Any)
+# ---------------------------------------------------------------------------
+
+
+def _parse_container(parser: Callable[[str], Any], content: str) -> list | dict | str:
+    """Parse *content* and return the result only if it is a container (list/dict).
+
+    Scalar values (strings, numbers, booleans, None) are discarded and the
+    original *content* string is returned instead.  This prevents e.g. a JSON
+    file containing just ``42`` from silently becoming an int.
+    """
+    parsed = parser(content)
+    if isinstance(parsed, (list, dict)):
+        return parsed
+    return content
+
+
+def _parse_json(content: str) -> list | dict | str:
+    return _parse_container(json.loads, content)
+
+
+def _parse_jsonl(content: str) -> Any:
+    lines = [json.loads(line) for line in content.splitlines() if line.strip()]
+    if not lines:
+        return content
+
+    # When every line is a dict with the same keys, convert to table format
+    # (header row + data rows) — consistent with CSV/TSV/Parquet/Excel output.
+    # Require ≥2 dicts so a single-line JSONL stays as [dict] (not a table).
+    if len(lines) >= 2 and all(isinstance(obj, dict) for obj in lines):
+        keys = list(lines[0].keys())
+        # Cache as tuple to avoid O(n×k) list allocations in the all() call.
+        keys_tuple = tuple(keys)
+        if keys and all(tuple(obj.keys()) == keys_tuple for obj in lines[1:]):
+            return [keys] + [[obj[k] for k in keys] for obj in lines]
+
+    return lines
+
+
+def _parse_csv(content: str) -> Any:
+    return _parse_delimited(content, delimiter=",")
+
+
+def _parse_tsv(content: str) -> Any:
+    return _parse_delimited(content, delimiter="\t")
+
+
+def _parse_delimited(content: str, *, delimiter: str) -> Any:
+    reader = csv.reader(io.StringIO(content), delimiter=delimiter)
+    # csv.reader yields [] for blank lines (and e.g. ["", ""] for a lone
+    # delimiter). Filter out rows where every cell is empty.
+    rows = [row for row in reader if _row_has_content(row)]
+    if not rows:
+        return content
+    # If the declared delimiter produces only single-column rows, try
+    # sniffing the actual delimiter — catches misidentified files (e.g.
+    # a tab-delimited file with a .csv extension).
+    if len(rows[0]) == 1:
+        try:
+            dialect = csv.Sniffer().sniff(content[:8192])
+            if dialect.delimiter != delimiter:
+                reader = csv.reader(io.StringIO(content), dialect)
+                rows = [row for row in reader if _row_has_content(row)]
+        except csv.Error:
+            pass
+    if rows and len(rows[0]) >= 2:
+        return rows
+    return content
+
+
+def _row_has_content(row: list[str]) -> bool:
+    """Return True when *row* contains at least one non-empty cell.
+
+    ``csv.reader`` yields ``[]`` for blank lines and all-empty rows such as
+    ``["", ""]`` for delimiter-only lines.  This predicate filters both out
+    consistently across the initial read and the sniffer-fallback re-read.
+    """
+    return any(cell for cell in row)
+
+
+def _parse_yaml(content: str) -> list | dict | str:
+    # NOTE: YAML anchor/alias expansion can amplify input beyond the 10MB cap.
+    # safe_load prevents code execution; for production hardening consider
+    # a YAML parser with expansion limits (e.g. ruamel.yaml with max_alias_count).
+    if "\n---" in content or content.startswith("---\n"):
+        # yaml.safe_load accepts exactly one document and raises ComposerError
+        # when more follow; a lone leading "---" also lands here.  Warn callers.
+        logger.warning(
+            "YAML document separator (---) detected; "
+            "yaml.safe_load supports only a single document."
+        )
+    return _parse_container(yaml.safe_load, content)
+
+
+def _parse_toml(content: str) -> Any:
+    parsed = tomllib.loads(content)
+    # tomllib.loads always returns a dict — return it even if empty.
+    return parsed
+
+
+_TEXT_PARSERS: dict[str, Callable[[str], Any]] = {
+    "json": _parse_json,
+    "jsonl": _parse_jsonl,
+    "csv": _parse_csv,
+    "tsv": _parse_tsv,
+    "yaml": _parse_yaml,
+    "toml": _parse_toml,
+}
+
+# ---------------------------------------------------------------------------
+# Binary-based parsers  (content: bytes → Any)
+# ---------------------------------------------------------------------------
+
+
+def _parse_parquet(content: bytes) -> list[list[Any]]:
+    import pandas as pd
+
+    df = pd.read_parquet(io.BytesIO(content))
+    return _df_to_rows(df)
+
+
+def _parse_xlsx(content: bytes) -> list[list[Any]]:
+    import pandas as pd
+
+    # Explicitly specify the openpyxl engine; the default engine varies by
+    # pandas version.  openpyxl cannot read legacy .xls, which has no parser here.
+    df = pd.read_excel(io.BytesIO(content), engine="openpyxl")
+    return _df_to_rows(df)
+
+
+def _df_to_rows(df: Any) -> list[list[Any]]:
+    """Convert a DataFrame to ``list[list[Any]]`` with a header row.
+
+    NaN values are replaced with ``None`` so the result is JSON-serializable.
+    Uses explicit cell-level checking because ``df.where(df.notna(), None)``
+    silently converts ``None`` back to ``NaN`` in float64 columns.
+    """
+    header = df.columns.tolist()
+    rows = [
+        [None if _is_nan(cell) else cell for cell in row] for row in df.values.tolist()
+    ]
+    return [header] + rows
+
+
+def _is_nan(cell: Any) -> bool:
+    """Check if a cell value is NaN, handling non-scalar types (lists, dicts).
+
+    ``pd.isna()`` on a list returns a boolean array, whose truth value is
+    ambiguous (``ValueError``) for multiple elements.  Guard with a scalar check.
+    """
+    import pandas as pd
+
+    return bool(pd.api.types.is_scalar(cell) and pd.isna(cell))
+
+
+_BINARY_PARSERS: dict[str, Callable[[bytes], Any]] = {
+    "parquet": _parse_parquet,
+    "xlsx": _parse_xlsx,
+}
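
The blank-row filtering and delimiter-sniffing fallback in `_parse_delimited` above can be tried in isolation. The following is a minimal standard-library sketch, not the code from this patch: `read_rows` is a hypothetical stand-in that mirrors the logic but omits the "return the raw content" fallback for non-tabular input.

```python
import csv
import io


def read_rows(content: str, delimiter: str = ",") -> list[list[str]]:
    """Hypothetical stand-in for _parse_delimited (no raw-content fallback)."""
    reader = csv.reader(io.StringIO(content), delimiter=delimiter)
    # any(row) is False for [] and for rows made only of empty strings.
    rows = [row for row in reader if any(row)]
    # A single-column result suggests the declared delimiter may be wrong.
    if rows and len(rows[0]) == 1:
        try:
            dialect = csv.Sniffer().sniff(content[:8192])
        except csv.Error:
            return rows  # sniffing failed; keep the single-column rows
        if dialect.delimiter != delimiter:
            reader = csv.reader(io.StringIO(content), dialect)
            rows = [row for row in reader if any(row)]
    return rows


# Unfiltered view: csv.reader represents the blank line as an empty list.
raw = list(csv.reader(io.StringIO("A,B\n\n1,2\n")))
filtered = read_rows("A,B\n\n1,2\n")
# Tab-delimited content declared as comma-delimited goes through the sniffer.
tabbed = read_rows("Name\tScore\nAlice\t90\nBob\t85")
```

Filtering with `any(row)` covers both the empty list produced by a blank line and rows whose cells are all empty strings, so the same predicate works for the initial read and the re-read.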
--- a/autogpt_platform/backend/backend/util/file_content_parser_test.py
+++ b/autogpt_platform/backend/backend/util/file_content_parser_test.py
@@ -0,0 +1,624 @@
+"""Tests for file_content_parser — format inference and structured parsing."""
+
+import io
+import json
+
+import pytest
+
+from backend.util.file_content_parser import (
+    BINARY_FORMATS,
+    infer_format_from_uri,
+    parse_file_content,
+)
+
+# ---------------------------------------------------------------------------
+# infer_format_from_uri
+# ---------------------------------------------------------------------------
+
+
+class TestInferFormat:
+    # --- extension-based ---
+
+    def test_json_extension(self):
+        assert infer_format_from_uri("/home/user/data.json") == "json"
+
+    def test_jsonl_extension(self):
+        assert infer_format_from_uri("/tmp/events.jsonl") == "jsonl"
+
+    def test_ndjson_extension(self):
+        assert infer_format_from_uri("/tmp/events.ndjson") == "jsonl"
+
+    def test_csv_extension(self):
+        assert infer_format_from_uri("workspace:///reports/sales.csv") == "csv"
+
+    def test_tsv_extension(self):
+        assert infer_format_from_uri("/home/user/data.tsv") == "tsv"
+
+    def test_yaml_extension(self):
+        assert infer_format_from_uri("/home/user/config.yaml") == "yaml"
+
+    def test_yml_extension(self):
+        assert infer_format_from_uri("/home/user/config.yml") == "yaml"
+
+    def test_toml_extension(self):
+        assert infer_format_from_uri("/home/user/config.toml") == "toml"
+
+    def test_parquet_extension(self):
+        assert infer_format_from_uri("/data/table.parquet") == "parquet"
+
+    def test_xlsx_extension(self):
+        assert infer_format_from_uri("/data/spreadsheet.xlsx") == "xlsx"
+
+    def test_xls_extension_returns_xls_label(self):
+        # Legacy .xls is mapped so callers can produce a helpful error.
+        assert infer_format_from_uri("/data/old_spreadsheet.xls") == "xls"
+
+    def test_case_insensitive(self):
+        assert infer_format_from_uri("/data/FILE.JSON") == "json"
+        assert infer_format_from_uri("/data/FILE.CSV") == "csv"
+
+    def test_unicode_filename(self):
+        assert infer_format_from_uri("/home/user/\u30c7\u30fc\u30bf.json") == "json"
+        assert infer_format_from_uri("/home/user/\u00e9t\u00e9.csv") == "csv"
+
+    def test_unknown_extension(self):
+        assert infer_format_from_uri("/home/user/readme.txt") is None
+
+    def test_no_extension(self):
+        assert infer_format_from_uri("workspace://abc123") is None
+
+    # --- MIME-based ---
+
+    def test_mime_json(self):
+        assert infer_format_from_uri("workspace://abc123#application/json") == "json"
+
+    def test_mime_csv(self):
+        assert infer_format_from_uri("workspace://abc123#text/csv") == "csv"
+
+    def test_mime_tsv(self):
+        assert (
+            infer_format_from_uri("workspace://abc123#text/tab-separated-values")
+            == "tsv"
+        )
+
+    def test_mime_ndjson(self):
+        assert (
+            infer_format_from_uri("workspace://abc123#application/x-ndjson") == "jsonl"
+        )
+
+    def test_mime_yaml(self):
+        assert infer_format_from_uri("workspace://abc123#application/x-yaml") == "yaml"
+
+    def test_mime_xlsx(self):
+        uri = "workspace://abc123#application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+        assert infer_format_from_uri(uri) == "xlsx"
+
+    def test_mime_parquet(self):
+        assert (
+            infer_format_from_uri("workspace://abc123#application/vnd.apache.parquet")
+            == "parquet"
+        )
+
+    def test_unknown_mime(self):
+        assert infer_format_from_uri("workspace://abc123#text/plain") is None
+
+    def test_unknown_mime_falls_through_to_extension(self):
+        # Unknown MIME (text/plain) should fall through to extension-based detection.
+        assert infer_format_from_uri("workspace:///data.csv#text/plain") == "csv"
+
+    # --- MIME takes precedence over extension ---
+
+    def test_mime_overrides_extension(self):
+        # .txt extension but JSON MIME → json
+        assert infer_format_from_uri("workspace:///file.txt#application/json") == "json"
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — JSON
+# ---------------------------------------------------------------------------
+
+
+class TestParseJson:
+    def test_array(self):
+        result = parse_file_content("[1, 2, 3]", "json")
+        assert result == [1, 2, 3]
+
+    def test_object(self):
+        result = parse_file_content('{"key": "value"}', "json")
+        assert result == {"key": "value"}
+
+    def test_nested(self):
+        content = json.dumps({"rows": [[1, 2], [3, 4]]})
+        result = parse_file_content(content, "json")
+        assert result == {"rows": [[1, 2], [3, 4]]}
+
+    def test_scalar_string_stays_as_string(self):
+        result = parse_file_content('"hello"', "json")
+        assert result == '"hello"'  # original content, not parsed
+
+    def test_scalar_number_stays_as_string(self):
+        result = parse_file_content("42", "json")
+        assert result == "42"
+
+    def test_scalar_boolean_stays_as_string(self):
+        result = parse_file_content("true", "json")
+        assert result == "true"
+
+    def test_null_stays_as_string(self):
+        result = parse_file_content("null", "json")
+        assert result == "null"
+
+    def test_invalid_json_fallback(self):
+        content = "not json at all"
+        result = parse_file_content(content, "json")
+        assert result == content
+
+    def test_empty_string_fallback(self):
+        result = parse_file_content("", "json")
+        assert result == ""
+
+    def test_bytes_input_decoded(self):
+        result = parse_file_content(b"[1, 2, 3]", "json")
+        assert result == [1, 2, 3]
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — JSONL
+# ---------------------------------------------------------------------------
+
+
+class TestParseJsonl:
+    def test_tabular_uniform_dicts_to_table_format(self):
+        """JSONL with uniform dict keys → table format (header + rows),
+        consistent with CSV/TSV/Parquet/Excel output."""
+        content = '{"name":"apple","color":"red"}\n{"name":"banana","color":"yellow"}\n{"name":"cherry","color":"red"}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [
+            ["name", "color"],
+            ["apple", "red"],
+            ["banana", "yellow"],
+            ["cherry", "red"],
+        ]
+
+    def test_tabular_single_key_dicts(self):
+        """JSONL with single-key uniform dicts → table format."""
+        content = '{"a": 1}\n{"a": 2}\n{"a": 3}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [["a"], [1], [2], [3]]
+
+    def test_tabular_blank_lines_skipped(self):
+        content = '{"a": 1}\n\n{"a": 2}\n'
+        result = parse_file_content(content, "jsonl")
+        assert result == [["a"], [1], [2]]
+
+    def test_heterogeneous_dicts_stay_as_list(self):
+        """JSONL with different keys across objects → list of dicts (no table)."""
+        content = '{"name":"apple"}\n{"color":"red"}\n{"size":3}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [{"name": "apple"}, {"color": "red"}, {"size": 3}]
+
+    def test_partially_overlapping_keys_stay_as_list(self):
+        """JSONL dicts with partially overlapping keys → list of dicts."""
+        content = '{"name":"apple","color":"red"}\n{"name":"banana","size":"medium"}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [
+            {"name": "apple", "color": "red"},
+            {"name": "banana", "size": "medium"},
+        ]
+
+    def test_mixed_types_stay_as_list(self):
+        """JSONL with non-dict lines → list of parsed values (no table)."""
+        content = '1\n"hello"\n[1,2]\n'
+        result = parse_file_content(content, "jsonl")
+        assert result == [1, "hello", [1, 2]]
+
+    def test_mixed_dicts_and_non_dicts_stay_as_list(self):
+        """JSONL mixing dicts and non-dicts → list of parsed values."""
+        content = '{"a": 1}\n42\n{"b": 2}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [{"a": 1}, 42, {"b": 2}]
+
+    def test_tabular_preserves_key_order(self):
+        """Table header should follow the key order of the first object."""
+        content = '{"z": 1, "a": 2}\n{"z": 3, "a": 4}'
+        result = parse_file_content(content, "jsonl")
+        assert result[0] == ["z", "a"]  # order from first object
+        assert result[1] == [1, 2]
+        assert result[2] == [3, 4]
+
+    def test_single_dict_stays_as_list(self):
+        """Single-line JSONL with one dict → [dict], NOT a table.
+        Tabular detection requires ≥2 dicts to avoid vacuously true all()."""
+        content = '{"a": 1, "b": 2}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [{"a": 1, "b": 2}]
+
+    def test_tabular_with_none_values(self):
+        """Uniform keys but some null values → table with None cells."""
+        content = '{"name":"apple","color":"red"}\n{"name":"banana","color":null}'
+        result = parse_file_content(content, "jsonl")
+        assert result == [
+            ["name", "color"],
+            ["apple", "red"],
+            ["banana", None],
+        ]
+
+    def test_empty_file_fallback(self):
+        result = parse_file_content("", "jsonl")
+        assert result == ""
+
+    def test_all_blank_lines_fallback(self):
+        result = parse_file_content("\n\n\n", "jsonl")
+        assert result == "\n\n\n"
+
+    def test_invalid_line_fallback(self):
+        content = '{"a": 1}\nnot json\n'
+        result = parse_file_content(content, "jsonl")
+        assert result == content  # fallback
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — CSV
+# ---------------------------------------------------------------------------
+
+
+class TestParseCsv:
+    def test_basic(self):
+        content = "Name,Score\nAlice,90\nBob,85"
+        result = parse_file_content(content, "csv")
+        assert result == [["Name", "Score"], ["Alice", "90"], ["Bob", "85"]]
+
+    def test_quoted_fields(self):
+        content = 'Name,Bio\nAlice,"Loves, commas"\nBob,Simple'
+        result = parse_file_content(content, "csv")
+        assert result[1] == ["Alice", "Loves, commas"]
+
+    def test_single_column_fallback(self):
+        # Only 1 column — not tabular enough.
+        content = "Name\nAlice\nBob"
+        result = parse_file_content(content, "csv")
+        assert result == content
+
+    def test_empty_rows_skipped(self):
+        content = "A,B\n\n1,2\n\n3,4"
+        result = parse_file_content(content, "csv")
+        assert result == [["A", "B"], ["1", "2"], ["3", "4"]]
+
+    def test_empty_file_fallback(self):
+        result = parse_file_content("", "csv")
+        assert result == ""
+
+    def test_utf8_bom(self):
+        """CSV with a leading BOM character should still parse into rows."""
+        bom = "\ufeff"
+        content = bom + "Name,Score\nAlice,90\nBob,85"
+        result = parse_file_content(content, "csv")
+        # The BOM may be part of the first header cell; ensure rows are still parsed.
+        assert len(result) == 3
+        assert result[1] == ["Alice", "90"]
+        assert result[2] == ["Bob", "85"]
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — TSV
+# ---------------------------------------------------------------------------
+
+
+class TestParseTsv:
+    def test_basic(self):
+        content = "Name\tScore\nAlice\t90\nBob\t85"
+        result = parse_file_content(content, "tsv")
+        assert result == [["Name", "Score"], ["Alice", "90"], ["Bob", "85"]]
+
+    def test_single_column_fallback(self):
+        content = "Name\nAlice\nBob"
+        result = parse_file_content(content, "tsv")
+        assert result == content
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — YAML
+# ---------------------------------------------------------------------------
+
+
+class TestParseYaml:
+    def test_list(self):
+        content = "- apple\n- banana\n- cherry"
+        result = parse_file_content(content, "yaml")
+        assert result == ["apple", "banana", "cherry"]
+
+    def test_dict(self):
+        content = "name: Alice\nage: 30"
+        result = parse_file_content(content, "yaml")
+        assert result == {"name": "Alice", "age": 30}
+
+    def test_nested(self):
+        content = "users:\n  - name: Alice\n  - name: Bob"
+        result = parse_file_content(content, "yaml")
+        assert result == {"users": [{"name": "Alice"}, {"name": "Bob"}]}
+
+    def test_scalar_stays_as_string(self):
+        result = parse_file_content("hello world", "yaml")
+        assert result == "hello world"
+
+    def test_invalid_yaml_fallback(self):
+        content = ":\n  :\n    invalid: - -"
+        result = parse_file_content(content, "yaml")
+        # Malformed YAML should fall back to the original string, not raise.
+        assert result == content
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — TOML
+# ---------------------------------------------------------------------------
+
+
+class TestParseToml:
+    def test_basic(self):
+        content = '[server]\nhost = "localhost"\nport = 8080'
+        result = parse_file_content(content, "toml")
+        assert result == {"server": {"host": "localhost", "port": 8080}}
+
+    def test_flat(self):
+        content = 'name = "test"\ncount = 42'
+        result = parse_file_content(content, "toml")
+        assert result == {"name": "test", "count": 42}
+
+    def test_empty_string_returns_empty_dict(self):
+        result = parse_file_content("", "toml")
+        assert result == {}
+
+    def test_invalid_toml_fallback(self):
+        result = parse_file_content("not = [valid toml", "toml")
+        assert result == "not = [valid toml"
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — Parquet (binary)
+# ---------------------------------------------------------------------------
+
+
+try:
+    import pyarrow as _pa  # noqa: F401  # pyright: ignore[reportMissingImports]
+
+    _has_pyarrow = True
+except ImportError:
+    _has_pyarrow = False
+
+
+@pytest.mark.skipif(not _has_pyarrow, reason="pyarrow not installed")
+class TestParseParquet:
+    @pytest.fixture
+    def parquet_bytes(self) -> bytes:
+        import pandas as pd
+
+        df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [90, 85]})
+        buf = io.BytesIO()
+        df.to_parquet(buf, index=False)
+        return buf.getvalue()
+
+    def test_basic(self, parquet_bytes: bytes):
+        result = parse_file_content(parquet_bytes, "parquet")
+        assert result == [["Name", "Score"], ["Alice", 90], ["Bob", 85]]
+
+    def test_string_input_fallback(self):
+        # Parquet is binary — string input can't be parsed.
+        result = parse_file_content("not parquet", "parquet")
+        assert result == "not parquet"
+
+    def test_invalid_bytes_fallback(self):
+        result = parse_file_content(b"not parquet bytes", "parquet")
+        assert result == b"not parquet bytes"
+
+    def test_empty_bytes_fallback(self):
+        """Empty binary input should return the empty bytes, not crash."""
+        result = parse_file_content(b"", "parquet")
+        assert result == b""
+
+    def test_nan_replaced_with_none(self):
+        """NaN values in Parquet must become None for JSON serializability."""
+        import math
+
+        import pandas as pd
+
+        df = pd.DataFrame({"A": [1.0, float("nan"), 3.0], "B": ["x", None, "z"]})
+        buf = io.BytesIO()
+        df.to_parquet(buf, index=False)
+        result = parse_file_content(buf.getvalue(), "parquet")
+        # Row with NaN in float col → None
+        assert result[2][0] is None  # float NaN → None
+        assert result[2][1] is None  # str None → None
+        # Ensure no NaN leaks
+        for row in result[1:]:
+            for cell in row:
+                if isinstance(cell, float):
+                    assert not math.isnan(cell), f"NaN leaked: {row}"
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — Excel (binary)
+# ---------------------------------------------------------------------------
+
+
+class TestParseExcel:
+    @pytest.fixture
+    def xlsx_bytes(self) -> bytes:
+        import pandas as pd
+
+        df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [90, 85]})
+        buf = io.BytesIO()
+        df.to_excel(buf, index=False)  # type: ignore[arg-type]  # BytesIO is a valid target
+        return buf.getvalue()
+
+    def test_basic(self, xlsx_bytes: bytes):
+        result = parse_file_content(xlsx_bytes, "xlsx")
+        assert result == [["Name", "Score"], ["Alice", 90], ["Bob", 85]]
+
+    def test_string_input_fallback(self):
+        result = parse_file_content("not xlsx", "xlsx")
+        assert result == "not xlsx"
+
+    def test_invalid_bytes_fallback(self):
+        result = parse_file_content(b"not xlsx bytes", "xlsx")
+        assert result == b"not xlsx bytes"
+
+    def test_empty_bytes_fallback(self):
+        """Empty binary input should return the empty bytes, not crash."""
+        result = parse_file_content(b"", "xlsx")
+        assert result == b""
+
+    def test_nan_replaced_with_none(self):
+        """NaN values in float columns must become None for JSON serializability."""
+        import math
+
+        import pandas as pd
+
+        df = pd.DataFrame({"A": [1.0, float("nan"), 3.0], "B": ["x", "y", None]})
+        buf = io.BytesIO()
+        df.to_excel(buf, index=False)  # type: ignore[arg-type]
+        result = parse_file_content(buf.getvalue(), "xlsx")
+        # Row with NaN in float col → None, not float('nan')
+        assert result[2][0] is None  # float NaN → None
+        assert result[3][1] is None  # str None → None
+        # Ensure no NaN leaks
+        for row in result[1:]:  # skip header
+            for cell in row:
+                if isinstance(cell, float):
+                    assert not math.isnan(cell), f"NaN leaked: {row}"
+
+
+# ---------------------------------------------------------------------------
+# parse_file_content — unknown format / fallback
+# ---------------------------------------------------------------------------
+
+
+class TestFallback:
+    def test_unknown_format_returns_content(self):
+        result = parse_file_content("hello world", "xml")
+        assert result == "hello world"
+
+    def test_unrecognised_format_returns_content(self):
+        # Not normally called with an unrecognised format, but must not crash.
+        result = parse_file_content("hello", "unknown_format")
+        assert result == "hello"
+
+
+# ---------------------------------------------------------------------------
+# BINARY_FORMATS
+# ---------------------------------------------------------------------------
+
+
+class TestBinaryFormats:
+    def test_parquet_is_binary(self):
+        assert "parquet" in BINARY_FORMATS
+
+    def test_xlsx_is_binary(self):
+        assert "xlsx" in BINARY_FORMATS
+
+    def test_text_formats_not_binary(self):
+        for fmt in ("json", "jsonl", "csv", "tsv", "yaml", "toml"):
+            assert fmt not in BINARY_FORMATS
+
+
+# ---------------------------------------------------------------------------
+# MIME mapping
+# ---------------------------------------------------------------------------
+
+
+class TestMimeMapping:
+    def test_application_yaml(self):
+        assert infer_format_from_uri("workspace://abc123#application/yaml") == "yaml"
+
+
+# ---------------------------------------------------------------------------
+# CSV sniffer fallback
+# ---------------------------------------------------------------------------
+
+
+class TestCsvSnifferFallback:
+    def test_tab_delimited_with_csv_format(self):
+        """Tab-delimited content parsed as csv should use sniffer fallback."""
+        content = "Name\tScore\nAlice\t90\nBob\t85"
+        result = parse_file_content(content, "csv")
+        assert result == [["Name", "Score"], ["Alice", "90"], ["Bob", "85"]]
+
+    def test_sniffer_failure_returns_content(self):
+        """When sniffer fails, single-column falls back to raw content."""
+        content = "Name\nAlice\nBob"
+        result = parse_file_content(content, "csv")
+        assert result == content
+
+
+# ---------------------------------------------------------------------------
+# OpenpyxlInvalidFile fallback
+# ---------------------------------------------------------------------------
+
+
+class TestOpenpyxlFallback:
+    def test_invalid_xlsx_non_strict(self):
+        """Invalid xlsx bytes should fall back gracefully in non-strict mode."""
+        result = parse_file_content(b"not xlsx bytes", "xlsx")
+        assert result == b"not xlsx bytes"
+
+
+# ---------------------------------------------------------------------------
+# Header-only CSV
+# ---------------------------------------------------------------------------
+
+
+class TestHeaderOnlyCsv:
+    def test_header_only_csv_returns_header_row(self):
+        """CSV with only a header row (no data rows) should return [[header]]."""
+        content = "Name,Score"
+        result = parse_file_content(content, "csv")
+        assert result == [["Name", "Score"]]
+
+    def test_header_only_csv_with_trailing_newline(self):
+        content = "Name,Score\n"
+        result = parse_file_content(content, "csv")
+        assert result == [["Name", "Score"]]
+
+
+# ---------------------------------------------------------------------------
+# Binary format + line range (line range ignored for binary formats)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(not _has_pyarrow, reason="pyarrow not installed")
+class TestBinaryFormatLineRange:
+    def test_parquet_ignores_line_range(self):
+        """Binary formats should parse the full file regardless of line range.
+
+        Line ranges are meaningless for binary formats (parquet/xlsx) — the
+        caller (file_ref._expand_bare_ref) passes raw bytes and the parser
+        should return the complete structured data.
+        """
+        import pandas as pd
+
+        df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+        buf = io.BytesIO()
+        df.to_parquet(buf, index=False)
+        # parse_file_content itself doesn't take a line range; this checks
+        # that the full content is parsed. The bytes are never truncated
+        # upstream for binary formats, by design.
+        result = parse_file_content(buf.getvalue(), "parquet")
+        assert result == [["A", "B"], [1, 4], [2, 5], [3, 6]]
+
+
+# ---------------------------------------------------------------------------
+# Legacy .xls UX
+# ---------------------------------------------------------------------------
+
+
+class TestXlsFallback:
+    def test_xls_returns_helpful_error_string(self):
+        """Uploading a .xls file should produce a helpful error, not garbled binary."""
+        result = parse_file_content(b"\xd0\xcf\x11\xe0garbled", "xls")
+        assert isinstance(result, str)
+        assert ".xlsx" in result
+        assert "not supported" in result.lower()
+
+    def test_xls_with_string_content(self):
+        result = parse_file_content("some text", "xls")
+        assert isinstance(result, str)
+        assert ".xlsx" in result
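
The JSONL tabular-detection rules exercised by `TestParseJsonl` above (at least two records, all dicts, identical keys in identical order) can be sketched standalone as follows. `jsonl_to_table` is a hypothetical name for illustration; unlike `_parse_jsonl` in this patch, it returns an empty list rather than the original string when no records are found.

```python
import json
from typing import Any


def jsonl_to_table(content: str) -> Any:
    """Hypothetical sketch: uniform dict records become header + rows."""
    records = [json.loads(line) for line in content.splitlines() if line.strip()]
    # Require at least two dicts so a single record is not treated as a table.
    if len(records) >= 2 and all(isinstance(r, dict) for r in records):
        keys = list(records[0])
        # Key order matters: {"a", "b"} and {"b", "a"} do not count as uniform.
        if keys and all(list(r) == keys for r in records[1:]):
            return [keys] + [[r[k] for k in keys] for r in records]
    return records


uniform = jsonl_to_table('{"z": 1, "a": 2}\n{"z": 3, "a": 4}')
mixed = jsonl_to_table('{"name": "apple"}\n{"color": "red"}')
single = jsonl_to_table('{"a": 1, "b": 2}')
```

Comparing ordered key lists keeps the header deterministic (it follows the first record), at the cost of leaving records with reordered keys as a plain list of dicts.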
--- a/autogpt_platform/backend/backend/util/file_test.py
+++ b/autogpt_platform/backend/backend/util/file_test.py
@@ -8,7 +8,12 @@ from unittest.mock import AsyncMock, MagicMock, patch
 import pytest

 from backend.data.execution import ExecutionContext
-from backend.util.file import store_media_file
+from backend.util.file import (
+    is_media_file_ref,
+    parse_data_uri,
+    resolve_media_content,
+    store_media_file,
+)
 from backend.util.type import MediaFileType


@@ -344,3 +349,162 @@ class TestFileCloudIntegration:
                    execution_context=make_test_context(graph_exec_id=graph_exec_id),
                    return_format="for_local_processing",
                )
+
+
+# ---------------------------------------------------------------------------
+# is_media_file_ref
+# ---------------------------------------------------------------------------
+
+
+class TestIsMediaFileRef:
+    def test_data_uri(self):
+        assert is_media_file_ref("data:image/png;base64,iVBORw0KGg==") is True
+
+    def test_workspace_uri(self):
+        assert is_media_file_ref("workspace://abc123") is True
+
+    def test_workspace_uri_with_mime(self):
+        assert is_media_file_ref("workspace://abc123#image/png") is True
+
+    def test_http_url(self):
+        assert is_media_file_ref("http://example.com/image.png") is True
+
+    def test_https_url(self):
+        assert is_media_file_ref("https://example.com/image.png") is True
+
+    def test_plain_text(self):
+        assert is_media_file_ref("print('hello')") is False
+
+    def test_local_path(self):
+        assert is_media_file_ref("/tmp/file.txt") is False
+
+    def test_empty_string(self):
+        assert is_media_file_ref("") is False
+
+    def test_filename(self):
+        assert is_media_file_ref("image.png") is False
+
+
+# ---------------------------------------------------------------------------
+# parse_data_uri
+# ---------------------------------------------------------------------------
+
+
+class TestParseDataUri:
+    def test_valid_png(self):
+        result = parse_data_uri("data:image/png;base64,iVBORw0KGg==")
+        assert result is not None
+        mime, payload = result
+        assert mime == "image/png"
+        assert payload == "iVBORw0KGg=="
+
+    def test_valid_text(self):
+        result = parse_data_uri("data:text/plain;base64,SGVsbG8=")
+        assert result is not None
+        assert result[0] == "text/plain"
+        assert result[1] == "SGVsbG8="
+
+    def test_mime_case_normalized(self):
+        result = parse_data_uri("data:IMAGE/PNG;base64,abc")
+        assert result is not None
+        assert result[0] == "image/png"
+
+    def test_not_data_uri(self):
+        assert parse_data_uri("workspace://abc123") is None
+
+    def test_plain_text(self):
+        assert parse_data_uri("hello world") is None
+
+    def test_missing_base64(self):
+        assert parse_data_uri("data:image/png;utf-8,abc") is None
+
+    def test_empty_payload(self):
+        result = parse_data_uri("data:image/png;base64,")
+        assert result is not None
+        assert result[1] == ""
+
+
+# ---------------------------------------------------------------------------
+# resolve_media_content
+# ---------------------------------------------------------------------------
+
+
+class TestResolveMediaContent:
+    @pytest.mark.asyncio
+    async def test_plain_text_passthrough(self):
+        """Plain text content (not a media ref) passes through unchanged."""
+        ctx = make_test_context()
+        result = await resolve_media_content(
+            MediaFileType("print('hello')"),
+            ctx,
+            return_format="for_external_api",
+        )
+        assert result == "print('hello')"
+
+    @pytest.mark.asyncio
+    async def test_empty_string_passthrough(self):
+        """Empty string passes through unchanged."""
+        ctx = make_test_context()
+        result = await resolve_media_content(
+            MediaFileType(""),
+            ctx,
+            return_format="for_external_api",
+        )
+        assert result == ""
+
+    @pytest.mark.asyncio
+    async def test_media_ref_delegates_to_store(self):
+        """Media references are resolved via store_media_file."""
+        ctx = make_test_context()
+        with patch(
+            "backend.util.file.store_media_file",
+            new=AsyncMock(return_value=MediaFileType("data:image/png;base64,abc")),
+        ) as mock_store:
+            result = await resolve_media_content(
+                MediaFileType("workspace://img123"),
+                ctx,
+                return_format="for_external_api",
+            )
+        assert result == "data:image/png;base64,abc"
+        mock_store.assert_called_once_with(
+            MediaFileType("workspace://img123"),
+            ctx,
+            return_format="for_external_api",
+        )
+
+    @pytest.mark.asyncio
+    async def test_data_uri_delegates_to_store(self):
+        """Data URIs are also resolved via store_media_file."""
+        ctx = make_test_context()
+        data_uri = "data:image/png;base64,iVBORw0KGg=="
+        with patch(
+            "backend.util.file.store_media_file",
+            new=AsyncMock(return_value=MediaFileType(data_uri)),
+        ) as mock_store:
+            result = await resolve_media_content(
+                MediaFileType(data_uri),
+                ctx,
+                return_format="for_external_api",
+            )
+        assert result == data_uri
+        mock_store.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_https_url_delegates_to_store(self):
+        """HTTPS URLs are resolved via store_media_file."""
+        ctx = make_test_context()
+        with patch(
+            "backend.util.file.store_media_file",
+            new=AsyncMock(return_value=MediaFileType("data:image/png;base64,abc")),
+        ) as mock_store:
+            result = await resolve_media_content(
+                MediaFileType("https://example.com/image.png"),
+                ctx,
+                return_format="for_local_processing",
+            )
+        assert result == "data:image/png;base64,abc"
+        mock_store.assert_called_once_with(
+            MediaFileType("https://example.com/image.png"),
+            ctx,
+            return_format="for_local_processing",
+        )
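
Reviewer note: the `TestParseDataUri` cases above pin down the observable contract of `parse_data_uri` without showing its body. The sketch below is one hypothetical implementation that would satisfy those cases; the name `parse_data_uri_sketch` and the regex are assumptions for illustration, not the code in `backend.util.file`.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (an assumption, not taken from the PR):
# "data:<mime>;base64,<payload>", with a possibly empty payload.
_DATA_URI_RE = re.compile(r"^data:([^;,]+);base64,(.*)$", re.IGNORECASE | re.DOTALL)


def parse_data_uri_sketch(value: str) -> Optional[Tuple[str, str]]:
    """Return (lowercased mime type, base64 payload), or None if not a base64 data URI."""
    match = _DATA_URI_RE.match(value)
    if match is None:
        return None
    mime, payload = match.groups()
    # The tests expect the MIME type to be case-normalized; the payload is left untouched.
    return mime.lower(), payload


print(parse_data_uri_sketch("data:IMAGE/PNG;base64,abc"))
print(parse_data_uri_sketch("workspace://abc123"))
```

Under these assumptions, a non-base64 encoding such as `;utf-8,` is rejected because the pattern requires the literal `;base64,` separator.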
--- a/autogpt_platform/backend/backend/util/metrics.py
+++ b/autogpt_platform/backend/backend/util/metrics.py
@@ -10,7 +10,7 @@ from sentry_sdk.integrations.launchdarkly import LaunchDarklyIntegration
 from sentry_sdk.integrations.logging import LoggingIntegration

 from backend.util import feature_flag
-from backend.util.settings import Settings
+from backend.util.settings import BehaveAs, Settings

 settings = Settings()
 logger = logging.getLogger(__name__)
@@ -21,6 +21,95 @@ class DiscordChannel(str, Enum):
    PRODUCT = "product"  # For product alerts (low balance, zero balance, etc.)


+def _before_send(event, hint):
+    """Filter out expected/transient errors from Sentry to reduce noise."""
+    if "exc_info" in hint:
+        exc_type, exc_value, _ = hint["exc_info"]
+        exc_msg = str(exc_value).lower() if exc_value else ""
+
+        # AMQP/RabbitMQ transient connection errors — expected during deploys
+        amqp_keywords = [
+            "amqpconnection",
+            "amqpconnector",
+            "connection_forced",
+            "channelinvalidstateerror",
+            "no active transport",
+        ]
+        if any(kw in exc_msg for kw in amqp_keywords):
+            return None
+
+        # "connection refused" only for AMQP-related exceptions (not other services)
+        if "connection refused" in exc_msg:
+            exc_module = getattr(exc_type, "__module__", "") or ""
+            exc_name = getattr(exc_type, "__name__", "") or ""
+            amqp_indicators = ["aio_pika", "aiormq", "amqp", "pika", "rabbitmq"]
+            if any(
+                ind in exc_module.lower() or ind in exc_name.lower()
+                for ind in amqp_indicators
+            ) or any(kw in exc_msg for kw in ["amqp", "pika", "rabbitmq"]):
+                return None
+
+        # User-caused credential/auth errors — not platform bugs
+        user_auth_keywords = [
+            "incorrect api key",
+            "invalid x-api-key",
+            "missing authentication header",
+            "invalid api token",
+            "authentication_error",
+        ]
+        if any(kw in exc_msg for kw in user_auth_keywords):
+            return None
+
+        # Expected business logic — insufficient balance
+        if "insufficient balance" in exc_msg or "no credits left" in exc_msg:
+            return None
+
+        # Expected security check — blocked IP access
+        if "access to blocked or private ip" in exc_msg:
+            return None
+
+        # Discord bot token misconfiguration — not a platform error
+        if "improper token has been passed" in exc_msg or (
+            exc_type and exc_type.__name__ == "Forbidden" and "50001" in exc_msg
+        ):
+            return None
+
+        # Google metadata DNS errors — expected in non-GCP environments
+        if (
+            "metadata.google.internal" in exc_msg
+            and settings.config.behave_as != BehaveAs.CLOUD
+        ):
+            return None
+
+        # Inactive email recipients — expected for bounced addresses
+        if "marked as inactive" in exc_msg or "inactive addresses" in exc_msg:
+            return None
+
+    # Also filter log-based events for known noisy messages.
+    # Sentry's LoggingIntegration stores log messages under "logentry", not "message".
+    logentry = event.get("logentry") or {}
+    log_msg = (
+        logentry.get("formatted") or logentry.get("message") or event.get("message")
+    )
+    if event.get("logger") and log_msg:
+        msg = log_msg.lower()
+        noisy_patterns = [
+            "amqpconnection",
+            "connection_forced",
+            "unclosed client session",
+            "unclosed connector",
+        ]
+        if any(p in msg for p in noisy_patterns):
+            return None
+        # "connection refused" in logs only when AMQP-related context is present
+        if "connection refused" in msg and any(
+            ind in msg for ind in ("amqp", "pika", "rabbitmq", "aio_pika", "aiormq")
+        ):
+            return None
+
+    return event
+
+
 def sentry_init():
    sentry_dsn = settings.secrets.sentry_dsn
    integrations = []
@@ -35,6 +124,7 @@ def sentry_init():
        profiles_sample_rate=1.0,
        environment=f"app:{settings.config.app_env.value}-behave:{settings.config.behave_as.value}",
        _experiments={"enable_logs": True},
+        before_send=_before_send,
        integrations=[
            AsyncioIntegration(),
            LoggingIntegration(sentry_logs_level=logging.INFO),
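
Reviewer note: `_before_send` relies on the Sentry SDK convention that a `before_send` hook returning `None` drops the event, while returning the event sends it. The sketch below reduces that idea to a standalone function that can be exercised without Sentry installed; `before_send_sketch` and the shortened `DROP_KEYWORDS` tuple are hypothetical and cover only a subset of the keywords in the diff.

```python
from typing import Any, Dict, Optional

# Hypothetical subset of the keywords filtered in the diff above.
DROP_KEYWORDS = ("connection_forced", "insufficient balance", "incorrect api key")


def before_send_sketch(
    event: Dict[str, Any], hint: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return None to drop the event, or the event itself to let it through."""
    if "exc_info" in hint:
        _exc_type, exc_value, _tb = hint["exc_info"]
        # Lowercase once so keyword matching is case-insensitive.
        exc_msg = str(exc_value).lower() if exc_value else ""
        if any(kw in exc_msg for kw in DROP_KEYWORDS):
            return None
    return event


noisy_hint = {"exc_info": (ValueError, ValueError("Insufficient balance"), None)}
real_hint = {"exc_info": (ValueError, ValueError("boom"), None)}

dropped = before_send_sketch({"level": "error"}, noisy_hint)
kept = before_send_sketch({"level": "error"}, real_hint)
print(dropped, kept)
```

A function in this shape is easy to unit test, which may be worth doing for the real `_before_send` since substring matching can silently swallow unrelated errors that happen to contain one of the keywords.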
--- a/autogpt_platform/backend/backend/util/retry.py
+++ b/autogpt_platform/backend/backend/util/retry.py
@@ -64,7 +64,7 @@ def send_rate_limited_discord_alert(
        return True

    except Exception as alert_error:
-        logger.error(f"Failed to send Discord alert: {alert_error}")
+        logger.warning(f"Failed to send Discord alert: {alert_error}")
        return False


@@ -182,7 +182,8 @@ def conn_retry(
        func_name = getattr(retry_state.fn, "__name__", "unknown")

        if retry_state.outcome.failed and retry_state.next_action is None:
-            logger.error(f"{prefix} {action_name} failed after retries: {exception}")
+            # Final failure is logged by sync_wrapper/async_wrapper — skip here to avoid duplicates
+            pass
        else:
            if attempt_number == EXCESSIVE_RETRY_THRESHOLD:
                if send_rate_limited_discord_alert(
@@ -225,7 +226,7 @@ def conn_retry(
                logger.info(f"{prefix} {action_name} completed successfully.")
                return result
            except Exception as e:
-                logger.error(f"{prefix} {action_name} failed after retries: {e}")
+                logger.warning(f"{prefix} {action_name} failed after retries: {e}")
                raise

        @wraps(func)
@@ -237,7 +238,7 @@ def conn_retry(
                logger.info(f"{prefix} {action_name} completed successfully.")
                return result
            except Exception as e:
-                logger.error(f"{prefix} {action_name} failed after retries: {e}")
+                logger.warning(f"{prefix} {action_name} failed after retries: {e}")
                raise

        return async_wrapper if is_coroutine else sync_wrapper
--- a/autogpt_platform/backend/poetry.lock
+++ b/autogpt_platform/backend/poetry.lock
@@ -1360,6 +1360,18 @@ files = [
 dnspython = ">=2.0.0"
 idna = ">=2.0.0"

+[[package]]
+name = "et-xmlfile"
+version = "2.0.0"
+description = "An implementation of lxml.xmlfile for the standard library"
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "et_xmlfile-2.0.0-py3-none-any.whl", hash = "sha256:7a91720bc756843502c3b7504c77b8fe44217c85c537d85037f0f536151b2caa"},
+    {file = "et_xmlfile-2.0.0.tar.gz", hash = "sha256:dab3f4764309081ce75662649be815c4c9081e88f0837825f90fd28317d4da54"},
+]
+
 [[package]]
 name = "exa-py"
 version = "1.16.1"
@@ -4228,6 +4240,21 @@ datalib = ["numpy (>=1)", "pandas (>=1.2.3)", "pandas-stubs (>=1.1.0.11)"]
 realtime = ["websockets (>=13,<16)"]
 voice-helpers = ["numpy (>=2.0.2)", "sounddevice (>=0.5.1)"]

+[[package]]
+name = "openpyxl"
+version = "3.1.5"
+description = "A Python library to read/write Excel 2010 xlsx/xlsm files"
+optional = false
+python-versions = ">=3.8"
+groups = ["main"]
+files = [
+    {file = "openpyxl-3.1.5-py2.py3-none-any.whl", hash = "sha256:5282c12b107bffeef825f4617dc029afaf41d0ea60823bbb665ef3079dc79de2"},
+    {file = "openpyxl-3.1.5.tar.gz", hash = "sha256:cf0e3cf56142039133628b5acffe8ef0c12bc902d2aadd3e0fe5878dc08d1050"},
+]
+
+[package.dependencies]
+et-xmlfile = "*"
+
 [[package]]
 name = "opentelemetry-api"
 version = "1.39.1"
@@ -5430,6 +5457,66 @@ files = [
    {file = "psycopg2_binary-2.9.11-cp39-cp39-win_amd64.whl", hash = "sha256:875039274f8a2361e5207857899706da840768e2a775bf8c65e82f60b197df02"},
 ]

+[[package]]
+name = "pyarrow"
+version = "23.0.1"
+description = "Python library for Apache Arrow"
+optional = false
+python-versions = ">=3.10"
+groups = ["main"]
+files = [
+    {file = "pyarrow-23.0.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:3fab8f82571844eb3c460f90a75583801d14ca0cc32b1acc8c361650e006fd56"},
+    {file = "pyarrow-23.0.1-cp310-cp310-macosx_12_0_x86_64.whl", hash = "sha256:3f91c038b95f71ddfc865f11d5876c42f343b4495535bd262c7b321b0b94507c"},
+    {file = "pyarrow-23.0.1-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:d0744403adabef53c985a7f8a082b502a368510c40d184df349a0a8754533258"},
+    {file = "pyarrow-23.0.1-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:c33b5bf406284fd0bba436ed6f6c3ebe8e311722b441d89397c54f871c6863a2"},
+    {file = "pyarrow-23.0.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:ddf743e82f69dcd6dbbcb63628895d7161e04e56794ef80550ac6f3315eeb1d5"},
+    {file = "pyarrow-23.0.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:e052a211c5ac9848ae15d5ec875ed0943c0221e2fcfe69eee80b604b4e703222"},
+    {file = "pyarrow-23.0.1-cp310-cp310-win_amd64.whl", hash = "sha256:5abde149bb3ce524782d838eb67ac095cd3fd6090eba051130589793f1a7f76d"},
+    {file = "pyarrow-23.0.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:6f0147ee9e0386f519c952cc670eb4a8b05caa594eeffe01af0e25f699e4e9bb"},
+    {file = "pyarrow-23.0.1-cp311-cp311-macosx_12_0_x86_64.whl", hash = "sha256:0ae6e17c828455b6265d590100c295193f93cc5675eb0af59e49dbd00d2de350"},
+    {file = "pyarrow-23.0.1-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:fed7020203e9ef273360b9e45be52a2a47d3103caf156a30ace5247ffb51bdbd"},
+    {file = "pyarrow-23.0.1-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:26d50dee49d741ac0e82185033488d28d35be4d763ae6f321f97d1140eb7a0e9"},
+    {file = "pyarrow-23.0.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:3c30143b17161310f151f4a2bcfe41b5ff744238c1039338779424e38579d701"},
+    {file = "pyarrow-23.0.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:db2190fa79c80a23fdd29fef4b8992893f024ae7c17d2f5f4db7171fa30c2c78"},
+    {file = "pyarrow-23.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:f00f993a8179e0e1c9713bcc0baf6d6c01326a406a9c23495ec1ba9c9ebf2919"},
+    {file = "pyarrow-23.0.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:f4b0dbfa124c0bb161f8b5ebb40f1a680b70279aa0c9901d44a2b5a20806039f"},
+    {file = "pyarrow-23.0.1-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:7707d2b6673f7de054e2e83d59f9e805939038eebe1763fe811ee8fa5c0cd1a7"},
+    {file = "pyarrow-23.0.1-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:86ff03fb9f1a320266e0de855dee4b17da6794c595d207f89bba40d16b5c78b9"},
+    {file = "pyarrow-23.0.1-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:813d99f31275919c383aab17f0f455a04f5a429c261cc411b1e9a8f5e4aaaa05"},
+    {file = "pyarrow-23.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bf5842f960cddd2ef757d486041d57c96483efc295a8c4a0e20e704cbbf39c67"},
+    {file = "pyarrow-23.0.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:564baf97c858ecc03ec01a41062e8f4698abc3e6e2acd79c01c2e97880a19730"},
+    {file = "pyarrow-23.0.1-cp312-cp312-win_amd64.whl", hash = "sha256:07deae7783782ac7250989a7b2ecde9b3c343a643f82e8a4df03d93b633006f0"},
+    {file = "pyarrow-23.0.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:6b8fda694640b00e8af3c824f99f789e836720aa8c9379fb435d4c4953a756b8"},
+    {file = "pyarrow-23.0.1-cp313-cp313-macosx_12_0_x86_64.whl", hash = "sha256:8ff51b1addc469b9444b7c6f3548e19dc931b172ab234e995a60aea9f6e6025f"},
+    {file = "pyarrow-23.0.1-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:71c5be5cbf1e1cb6169d2a0980850bccb558ddc9b747b6206435313c47c37677"},
+    {file = "pyarrow-23.0.1-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:9b6f4f17b43bc39d56fec96e53fe89d94bac3eb134137964371b45352d40d0c2"},
+    {file = "pyarrow-23.0.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:9fc13fc6c403d1337acab46a2c4346ca6c9dec5780c3c697cf8abfd5e19b6b37"},
+    {file = "pyarrow-23.0.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:5c16ed4f53247fa3ffb12a14d236de4213a4415d127fe9cebed33d51671113e2"},
+    {file = "pyarrow-23.0.1-cp313-cp313-win_amd64.whl", hash = "sha256:cecfb12ef629cf6be0b1887f9f86463b0dd3dc3195ae6224e74006be4736035a"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:29f7f7419a0e30264ea261fdc0e5fe63ce5a6095003db2945d7cd78df391a7e1"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-macosx_12_0_x86_64.whl", hash = "sha256:33d648dc25b51fd8055c19e4261e813dfc4d2427f068bcecc8b53d01b81b0500"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:cd395abf8f91c673dd3589cadc8cc1ee4e8674fa61b2e923c8dd215d9c7d1f41"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:00be9576d970c31defb5c32eb72ef585bf600ef6d0a82d5eccaae96639cf9d07"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:c2139549494445609f35a5cda4eb94e2c9e4d704ce60a095b342f82460c73a83"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:7044b442f184d84e2351e5084600f0d7343d6117aabcbc1ac78eb1ae11eb4125"},
+    {file = "pyarrow-23.0.1-cp313-cp313t-win_amd64.whl", hash = "sha256:a35581e856a2fafa12f3f54fce4331862b1cfb0bef5758347a858a4aa9d6bae8"},
+    {file = "pyarrow-23.0.1-cp314-cp314-macosx_12_0_arm64.whl", hash = "sha256:5df1161da23636a70838099d4aaa65142777185cc0cdba4037a18cee7d8db9ca"},
+    {file = "pyarrow-23.0.1-cp314-cp314-macosx_12_0_x86_64.whl", hash = "sha256:fa8e51cb04b9f8c9c5ace6bab63af9a1f88d35c0d6cbf53e8c17c098552285e1"},
+    {file = "pyarrow-23.0.1-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:0b95a3994f015be13c63148fef8832e8a23938128c185ee951c98908a696e0eb"},
+    {file = "pyarrow-23.0.1-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:4982d71350b1a6e5cfe1af742c53dfb759b11ce14141870d05d9e540d13bc5d1"},
+    {file = "pyarrow-23.0.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:c250248f1fe266db627921c89b47b7c06fee0489ad95b04d50353537d74d6886"},
+    {file = "pyarrow-23.0.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:5f4763b83c11c16e5f4c15601ba6dfa849e20723b46aa2617cb4bffe8768479f"},
+    {file = "pyarrow-23.0.1-cp314-cp314-win_amd64.whl", hash = "sha256:3a4c85ef66c134161987c17b147d6bffdca4566f9a4c1d81a0a01cdf08414ea5"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-macosx_12_0_arm64.whl", hash = "sha256:17cd28e906c18af486a499422740298c52d7c6795344ea5002a7720b4eadf16d"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-macosx_12_0_x86_64.whl", hash = "sha256:76e823d0e86b4fb5e1cf4a58d293036e678b5a4b03539be933d3b31f9406859f"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:a62e1899e3078bf65943078b3ad2a6ddcacf2373bc06379aac61b1e548a75814"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:df088e8f640c9fae3b1f495b3c64755c4e719091caf250f3a74d095ddf3c836d"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:46718a220d64677c93bc243af1d44b55998255427588e400677d7192671845c7"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:a09f3876e87f48bc2f13583ab551f0379e5dfb83210391e68ace404181a20690"},
+    {file = "pyarrow-23.0.1-cp314-cp314t-win_amd64.whl", hash = "sha256:527e8d899f14bd15b740cd5a54ad56b7f98044955373a17179d5956ddb93d9ce"},
+    {file = "pyarrow-23.0.1.tar.gz", hash = "sha256:b8c5873e33440b2bc2f4a79d2b47017a89c5a24116c055625e6f2ee50523f019"},
+]
+
 [[package]]
 name = "pyasn1"
 version = "0.6.2"
@@ -8882,4 +8969,4 @@ cffi = ["cffi (>=1.17,<2.0) ; platform_python_implementation != \"PyPy\" and pyt
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.10,<3.14"
-content-hash = "4e4365721cd3b68c58c237353b74adae1c64233fd4446904c335f23eb866fdca"
+content-hash = "86dab25684dd46e635a33bd33281a926e5626a874ecc048c34389fecf34a87d8"
--- a/autogpt_platform/backend/pyproject.toml
+++ b/autogpt_platform/backend/pyproject.toml
@@ -92,6 +92,8 @@ gravitas-md2gdocs = "^0.1.0"
 posthog = "^7.6.0"
 fpdf2 = "^2.8.6"
 langsmith = "^0.7.7"
+openpyxl = "^3.1.5"
+pyarrow = "^23.0.0"

 [tool.poetry.group.dev.dependencies]
 aiohappyeyeballs = "^2.6.1"
--- a/autogpt_platform/frontend/CLAUDE.md
+++ b/autogpt_platform/frontend/CLAUDE.md
@@ -44,6 +44,12 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti

 - Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
 - Use function declarations (not arrow functions) for components/handlers
+- No `dark:` Tailwind classes — the design system handles dark mode
+- Use Next.js `<Link>` for internal navigation — never raw `<a>` tags
+- No `any` types unless the value genuinely can be anything
+- No linter suppressors (`// @ts-ignore`, `// eslint-disable`) — fix the actual issue
+- **File length** — keep files under ~200 lines; extract sub-components or hooks into their own files when a file grows beyond this
+- **Function/component length** — keep render functions and hooks under ~50 lines; extract named helpers or sub-components when they grow longer

 ## Architecture

--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatContainer/ChatContainer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatContainer/ChatContainer.tsx
@@ -2,7 +2,7 @@
 import { ChatInput } from "@/app/(platform)/copilot/components/ChatInput/ChatInput";
 import { UIDataTypes, UIMessage, UITools } from "ai";
 import { LayoutGroup, motion } from "framer-motion";
-import { ReactNode } from "react";
+import { ReactNode, useCallback } from "react";
 import { ChatMessagesContainer } from "../ChatMessagesContainer/ChatMessagesContainer";
 import { CopilotChatActionsProvider } from "../CopilotChatActionsProvider/CopilotChatActionsProvider";
 import { EmptySession } from "../EmptySession/EmptySession";
@@ -52,6 +52,20 @@ export const ChatContainer = ({
    !!isSessionError;
  const inputLayoutId = "copilot-2-chat-input";

+  // Retry: re-send the last user message (used by ErrorCard on transient errors)
+  const handleRetry = useCallback(() => {
+    const lastUserMsg = [...messages].reverse().find((m) => m.role === "user");
+    const lastText = lastUserMsg?.parts
+      .filter(
+        (p): p is Extract<typeof p, { type: "text" }> => p.type === "text",
+      )
+      .map((p) => p.text)
+      .join("");
+    if (lastText) {
+      onSend(lastText);
+    }
+  }, [messages, onSend]);
+
  return (
    <CopilotChatActionsProvider onSend={onSend}>
      <LayoutGroup id="copilot-2-chat-layout">
@@ -65,6 +79,7 @@ export const ChatContainer = ({
                isLoading={isLoadingSession}
                headerSlot={headerSlot}
                sessionID={sessionId}
+                onRetry={handleRetry}
              />
              <motion.div
                initial={{ opacity: 0 }}
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/ChatMessagesContainer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/ChatMessagesContainer.tsx
@@ -32,11 +32,13 @@ interface Props {
  isLoading: boolean;
  headerSlot?: React.ReactNode;
  sessionID?: string | null;
+  onRetry?: () => void;
 }

 function renderSegments(
  segments: RenderSegment[],
  messageID: string,
+  onRetry?: () => void,
 ): React.ReactNode[] {
  return segments.map((seg, segIdx) => {
    if (seg.kind === "collapsed-group") {
@@ -48,6 +50,7 @@ function renderSegments(
        part={seg.part}
        messageID={messageID}
        partIndex={seg.index}
+        onRetry={onRetry}
      />
    );
  });
@@ -104,6 +107,7 @@ export function ChatMessagesContainer({
  isLoading,
  headerSlot,
  sessionID,
+  onRetry,
 }: Props) {
  const lastMessage = messages[messages.length - 1];
  const graphExecId = useMemo(() => extractGraphExecId(messages), [messages]);
@@ -212,13 +216,18 @@ export function ChatMessagesContainer({
                  </ReasoningCollapse>
                )}
                {responseSegments
-                  ? renderSegments(responseSegments, message.id)
+                  ? renderSegments(
+                      responseSegments,
+                      message.id,
+                      isLastAssistant ? onRetry : undefined,
+                    )
                  : message.parts.map((part, i) => (
                      <MessagePartRenderer
                        key={`${message.id}-${i}`}
                        part={part}
                        messageID={message.id}
                        partIndex={i}
+                        onRetry={isLastAssistant ? onRetry : undefined}
                      />
                    ))}
                {isLastInTurn && !isCurrentlyStreaming && (
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/components/MessagePartRenderer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/components/MessagePartRenderer.tsx
@@ -69,9 +69,15 @@ interface Props {
  part: UIMessage<unknown, UIDataTypes, UITools>["parts"][number];
  messageID: string;
  partIndex: number;
+  onRetry?: () => void;
 }

-export function MessagePartRenderer({ part, messageID, partIndex }: Props) {
+export function MessagePartRenderer({
+  part,
+  messageID,
+  partIndex,
+  onRetry,
+}: Props) {
  const key = `${messageID}-${partIndex}`;

  switch (part.type) {
@@ -80,7 +86,7 @@ export function MessagePartRenderer({ part, messageID, partIndex }: Props) {
        part.text,
      );

-      if (markerType === "error") {
+      if (markerType === "error" || markerType === "retryable_error") {
        const lowerMarker = markerText.toLowerCase();
        const isCancellation =
          lowerMarker === "operation cancelled" ||
@@ -100,6 +106,7 @@ export function MessagePartRenderer({ part, messageID, partIndex }: Props) {
            key={key}
            responseError={{ message: markerText }}
            context="execution"
+            onRetry={markerType === "retryable_error" ? onRetry : undefined}
          />
        );
      }
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/helpers.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/helpers.ts
@@ -172,16 +172,22 @@ export function getTurnMessages(
 // The hex suffix makes it virtually impossible for an LLM to accidentally
 // produce these strings in normal conversation.
 const COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]";
+const COPILOT_RETRYABLE_ERROR_PREFIX = "[__COPILOT_RETRYABLE_ERROR_a9c2__]";
 const COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]";

-export type MarkerType = "error" | "system" | null;
+export type MarkerType = "error" | "retryable_error" | "system" | null;

 /** Escape all regex special characters in a string. */
 function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
 }

-// Pre-compiled marker regexes (avoids re-creating on every call / render)
+// Pre-compiled marker regexes (avoids re-creating on every call / render).
+// Retryable check must come first since it's more specific.
+const RETRYABLE_ERROR_MARKER_RE = new RegExp(
+  `${escapeRegExp(COPILOT_RETRYABLE_ERROR_PREFIX)}\\s*(.+?)$`,
+  "s",
+);
 const ERROR_MARKER_RE = new RegExp(
  `${escapeRegExp(COPILOT_ERROR_PREFIX)}\\s*(.+?)$`,
  "s",
@@ -196,6 +202,15 @@ export function parseSpecialMarkers(text: string): {
  markerText: string;
  cleanText: string;
 } {
+  const retryableMatch = text.match(RETRYABLE_ERROR_MARKER_RE);
+  if (retryableMatch) {
+    return {
+      markerType: "retryable_error",
+      markerText: retryableMatch[1].trim(),
+      cleanText: text.replace(retryableMatch[0], "").trim(),
+    };
+  }
+
  const errorMatch = text.match(ERROR_MARKER_RE);
  if (errorMatch) {
    return {
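
Reviewer note: the marker parsing above checks the retryable prefix before the plain error prefix and returns the first match. The sketch below restates that order in Python for illustration only. The two prefix strings are copied from the diff; the function name and the no-match return value are assumptions, and the system marker is omitted.

```python
import re
from typing import Optional, Tuple

ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]"
RETRYABLE_PREFIX = "[__COPILOT_RETRYABLE_ERROR_a9c2__]"

# DOTALL mirrors the "s" flag; \Z anchors at the very end of the string.
_RETRYABLE_RE = re.compile(re.escape(RETRYABLE_PREFIX) + r"\s*(.+?)\Z", re.DOTALL)
_ERROR_RE = re.compile(re.escape(ERROR_PREFIX) + r"\s*(.+?)\Z", re.DOTALL)


def parse_markers_sketch(text: str) -> Tuple[Optional[str], str, str]:
    """Return (marker_type, marker_text, clean_text); retryable is checked first."""
    for marker_type, pattern in (
        ("retryable_error", _RETRYABLE_RE),
        ("error", _ERROR_RE),
    ):
        match = pattern.search(text)
        if match:
            marker_text = match.group(1).strip()
            clean_text = text.replace(match.group(0), "", 1).strip()
            return marker_type, marker_text, clean_text
    # Assumed fallback for text without a marker (not shown in the diff).
    return None, "", text


print(parse_markers_sketch("Partial answer " + RETRYABLE_PREFIX + " Rate limited"))
```

Only text carrying the retryable prefix yields `retryable_error`, which is the case where `MessagePartRenderer` forwards `onRetry` to the error card.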
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/JumpBackIn/JumpBackIn.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/JumpBackIn/JumpBackIn.tsx
@@ -0,0 +1,40 @@
+"use client";
+
+import { ArrowRight, Lightning } from "@phosphor-icons/react";
+import NextLink from "next/link";
+
+import { Button } from "@/components/atoms/Button/Button";
+import { Text } from "@/components/atoms/Text/Text";
+import { useJumpBackIn } from "./useJumpBackIn";
+
+export function JumpBackIn() {
+  const { agent, isLoading } = useJumpBackIn();
+
+  if (isLoading || !agent) {
+    return null;
+  }
+
+  return (
+    <div className="flex items-center justify-between rounded-large border border-zinc-200 bg-gradient-to-r from-zinc-50 to-white px-5 py-4">
+      <div className="flex items-center gap-3">
+        <div className="flex h-9 w-9 items-center justify-center rounded-full bg-zinc-900">
+          <Lightning size={18} weight="fill" className="text-white" />
+        </div>
+        <div className="flex flex-col">
+          <Text variant="small" className="text-zinc-500">
+            Continue where you left off
+          </Text>
+          <Text variant="body-medium" className="text-zinc-900">
+            {agent.name}
+          </Text>
+        </div>
+      </div>
+      <NextLink href={`/library/agents/${agent.id}`}>
+        <Button variant="primary" size="small" className="gap-1.5">
+          Jump Back In
+          <ArrowRight size={16} />
+        </Button>
+      </NextLink>
+    </div>
+  );
+}
--- a/autogpt_platform/frontend/src/app/(platform)/library/components/JumpBackIn/useJumpBackIn.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/library/components/JumpBackIn/useJumpBackIn.ts
@@ -0,0 +1,28 @@
+"use client";
+
+import { useGetV2ListLibraryAgents } from "@/app/api/__generated__/endpoints/library/library";
+import { okData } from "@/app/api/helpers";
+
+export function useJumpBackIn() {
+  const { data, isLoading } = useGetV2ListLibraryAgents(
+    {
+      page: 1,
+      page_size: 1,
+      sort_by: "updatedAt",
+    },
+    {
+      query: { select: okData },
+    },
+  );
+
+  // The API doesn't include execution data by default (include_executions is
+  // internal to the backend), so recent_executions is always empty here.
+  // We use the most recently updated agent as the "jump back in" candidate
+  // instead — updatedAt is the best available proxy for recent activity.
+  const agent = data?.agents[0] ?? null;
+
+  return {
+    agent,
+    isLoading,
+  };
+}
--- a/autogpt_platform/frontend/src/app/(platform)/library/page.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/library/page.tsx
@@ -2,6 +2,7 @@

 import { useEffect, useState, useCallback } from "react";
 import { HeartIcon, ListIcon } from "@phosphor-icons/react";
+import { JumpBackIn } from "./components/JumpBackIn/JumpBackIn";
 import { LibraryActionHeader } from "./components/LibraryActionHeader/LibraryActionHeader";
 import { LibraryAgentList } from "./components/LibraryAgentList/LibraryAgentList";
 import { Tab } from "./components/LibraryTabs/LibraryTabs";
@@ -38,6 +39,7 @@ export default function LibraryPage() {
      onAnimationComplete={handleFavoriteAnimationComplete}
    >
      <main className="pt-160 container min-h-screen space-y-4 pb-20 pt-16 sm:px-8 md:px-12">
+        <JumpBackIn />
        <LibraryActionHeader setSearchTerm={setSearchTerm} />
        <LibraryAgentList
          searchTerm={searchTerm}