Merge remote-tracking branch 'origin/dev' into feat/github-cli-copilot

fix(copilot): remove implicit gh auth setup-git from sandbox creation
Remove the automatic GitHub credential helper configuration that ran on every E2B sandbox connect/reconnect. This addresses a review concern that the setup implicitly gave AutoPilot full GitHub access without user awareness or opt-in. The bash_exec tool already injects GH_TOKEN/GITHUB_TOKEN per command for users who have connected their account via connect_integration, which is the explicit opt-in path.
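
For reviewers, here is a minimal sketch of the per-command injection path described above, using the two-TTL caching idea that the new integration_creds.py documents (found tokens kept for 5 minutes, "not connected" results for 60 seconds). It is an illustration only, not the code in this PR: the real module uses cachetools.TTLCache and an async credential lookup, while `TwoTierTokenCache`, `env_for_command`, the synchronous `lookup` callable, and the injectable `clock` are hypothetical names introduced here so the example runs with the standard library alone.

```python
import time
from typing import Callable, Optional

# Mirrors the PROVIDER_ENV_VARS mapping added in integration_creds.py.
PROVIDER_ENV_VARS: dict[str, list[str]] = {"github": ["GH_TOKEN", "GITHUB_TOKEN"]}

Lookup = Callable[[str, str], Optional[str]]


class TwoTierTokenCache:
    """Hypothetical stand-in: long TTL for found tokens, short TTL for misses."""

    def __init__(
        self,
        lookup: Lookup,
        token_ttl: float = 300.0,
        null_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._token_ttl = token_ttl
        self._null_ttl = null_ttl
        self._clock = clock
        # (user_id, provider) -> (expires_at, token or None)
        self._entries: dict[tuple[str, str], tuple[float, Optional[str]]] = {}

    def get(self, user_id: str, provider: str) -> Optional[str]:
        key = (user_id, provider)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no lookup needed
        token = self._lookup(user_id, provider)
        ttl = self._token_ttl if token else self._null_ttl
        self._entries[key] = (now + ttl, token)
        return token

    def invalidate(self, user_id: str, provider: str) -> None:
        """Drop the cached entry, e.g. after credentials change."""
        self._entries.pop((user_id, provider), None)


def env_for_command(cache: TwoTierTokenCache, user_id: str) -> dict[str, str]:
    """Build the extra env vars for one command; empty if nothing is connected."""
    env: dict[str, str] = {}
    for provider, var_names in PROVIDER_ENV_VARS.items():
        token = cache.get(user_id, provider)
        if token:
            for name in var_names:
                env[name] = token
    return env
```

In this sketch a user who has not connected GitHub gets an empty mapping, so nothing is injected into the sandbox unless the explicit opt-in has happened; calling `invalidate` after a credential change makes the next command pick up the new token immediately instead of waiting for the TTL.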
2026-03-17 03:00:27 -04:00 · 2026-03-17 06:17:03 +07:00 · 2026-03-17 00:36:51 +07:00 · 2026-03-16 17:10:18 +07:00 · 2026-03-16 15:52:40 +07:00 · 2026-03-16 15:45:18 +07:00
25 changed files with 1332 additions and 527 deletions
--- a/.github/workflows/platform-backend-ci.yml
+++ b/.github/workflows/platform-backend-ci.yml
@@ -5,14 +5,12 @@ on:
    branches: [master, dev, ci-test*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  pull_request:
    branches: [master, dev, release-*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  merge_group:
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -120,6 +120,175 @@ jobs:
          token: ${{ secrets.GITHUB_TOKEN }}
          exitOnceUploaded: true

+  e2e_test:
+    name: end-to-end tests
+    runs-on: big-boi
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Set up Platform - Copy default supabase .env
+        run: |
+          cp ../.env.default ../.env
+
+      - name: Set up Platform - Copy backend .env and set OpenAI API key
+        run: |
+          cp ../backend/.env.default ../backend/.env
+          echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
+        env:
+          # Used by E2E test data script to generate embeddings for approved store agents
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+
+      - name: Set up Platform - Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+        with:
+          driver: docker-container
+          driver-opts: network=host
+
+      - name: Set up Platform - Expose GHA cache to docker buildx CLI
+        uses: crazy-max/ghaction-github-runtime@v4
+
+      - name: Set up Platform - Build Docker images (with cache)
+        working-directory: autogpt_platform
+        run: |
+          pip install pyyaml
+
+          # Resolve extends and generate a flat compose file that bake can understand
+          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
+
+          # Add cache configuration to the resolved compose file
+          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
+            --source docker-compose.resolved.yml \
+            --cache-from "type=gha" \
+            --cache-to "type=gha,mode=max" \
+            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend') }}" \
+            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src') }}" \
+            --git-ref "${{ github.ref }}"
+
+          # Build with bake using the resolved compose file (now includes cache config)
+          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Cache E2E test data
+        id: e2e-data-cache
+        uses: actions/cache@v5
+        with:
+          path: /tmp/e2e_test_data.sql
+          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-frontend-ci.yml') }}
+
+      - name: Set up Platform - Start Supabase DB + Auth
+        run: |
+          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
+          echo "Waiting for database to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
+          echo "Waiting for auth service to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
+
+      - name: Set up Platform - Run migrations
+        run: |
+          echo "Running migrations..."
+          docker compose -f ../docker-compose.resolved.yml run --rm migrate
+          echo "✅ Migrations completed"
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Load cached E2E test data
+        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
+        run: |
+          echo "✅ Found cached E2E test data, restoring..."
+          {
+            echo "SET session_replication_role = 'replica';"
+            cat /tmp/e2e_test_data.sql
+            echo "SET session_replication_role = 'origin';"
+          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
+          # Refresh materialized views after restore
+          docker compose -f ../docker-compose.resolved.yml exec -T db \
+            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
+
+          echo "✅ E2E test data restored from cache"
+
+      - name: Set up Platform - Start (all other services)
+        run: |
+          docker compose -f ../docker-compose.resolved.yml up -d --no-build
+          echo "Waiting for rest_server to be ready..."
+          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Create E2E test data
+        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
+        run: |
+          echo "Creating E2E test data..."
+          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
+          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
+            echo "❌ E2E test data creation failed!"
+            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
+            exit 1
+          }
+
+          # Dump auth.users + platform schema for cache (two separate dumps)
+          echo "Dumping database for cache..."
+          {
+            docker compose -f ../docker-compose.resolved.yml exec -T db \
+              pg_dump -U postgres --data-only --column-inserts \
+              --table='auth.users' postgres
+            docker compose -f ../docker-compose.resolved.yml exec -T db \
+              pg_dump -U postgres --data-only --column-inserts \
+              --schema=platform \
+              --exclude-table='platform._prisma_migrations' \
+              --exclude-table='platform.apscheduler_jobs' \
+              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
+              postgres
+          } > /tmp/e2e_test_data.sql
+
+          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
+
+      - name: Set up tests - Enable corepack
+        run: corepack enable
+
+      - name: Set up tests - Set up Node
+        uses: actions/setup-node@v6
+        with:
+          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+
+      - name: Set up tests - Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Set up tests - Install browser 'chromium'
+        run: pnpm playwright install --with-deps chromium
+
+      - name: Run Playwright tests
+        run: pnpm test:no-build
+        continue-on-error: false
+
+      - name: Upload Playwright report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-report
+          path: playwright-report
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Upload Playwright test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-test-results
+          path: test-results
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Print Final Docker Compose logs
+        if: always()
+        run: docker compose -f ../docker-compose.resolved.yml logs
+
  integration_test:
    runs-on: ubuntu-latest
    needs: setup
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -1,18 +1,14 @@
-name: AutoGPT Platform - Full-stack CI
+name: AutoGPT Platform - Frontend CI

 on:
  push:
    branches: [master, dev]
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
-      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  pull_request:
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
-      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  merge_group:

@@ -28,28 +24,42 @@ defaults:
 jobs:
  setup:
    runs-on: ubuntu-latest
+    outputs:
+      cache-key: ${{ steps.cache-key.outputs.key }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

-      - name: Enable corepack
-        run: corepack enable
-
-      - name: Set up Node
+      - name: Set up Node.js
        uses: actions/setup-node@v6
        with:
          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Install dependencies to populate cache
+      - name: Enable corepack
+        run: corepack enable
+
+      - name: Generate cache key
+        id: cache-key
+        run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT
+
+      - name: Cache dependencies
+        uses: actions/cache@v5
+        with:
+          path: ~/.pnpm-store
+          key: ${{ steps.cache-key.outputs.key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-
+
+      - name: Install dependencies
        run: pnpm install --frozen-lockfile

-  check-api-types:
-    name: check API types
-    runs-on: ubuntu-latest
+  types:
+    runs-on: big-boi
    needs: setup
+    strategy:
+      fail-fast: false

    steps:
      - name: Checkout repository
@@ -57,256 +67,70 @@ jobs:
        with:
          submodules: recursive

-      # ------------------------ Backend setup ------------------------
-
-      - name: Set up Backend - Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
-      - name: Set up Backend - Install Poetry
-        working-directory: autogpt_platform/backend
-        run: |
-          POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
-          echo "Installing Poetry version ${POETRY_VERSION}"
-          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$POETRY_VERSION python3 -
-
-      - name: Set up Backend - Set up dependency cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
-
-      - name: Set up Backend - Install dependencies
-        working-directory: autogpt_platform/backend
-        run: poetry install
-
-      - name: Set up Backend - Generate Prisma client
-        working-directory: autogpt_platform/backend
-        run: poetry run prisma generate && poetry run gen-prisma-stub
-
-      - name: Set up Frontend - Export OpenAPI schema from Backend
-        working-directory: autogpt_platform/backend
-        run: poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
-
-      # ------------------------ Frontend setup ------------------------
-
-      - name: Set up Frontend - Enable corepack
-        run: corepack enable
-
-      - name: Set up Frontend - Set up Node
+      - name: Set up Node.js
        uses: actions/setup-node@v6
        with:
          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Set up Frontend - Install dependencies
+      - name: Enable corepack
+        run: corepack enable
+
+      - name: Copy default supabase .env
+        run: |
+          cp ../.env.default ../.env
+
+      - name: Copy backend .env
+        run: |
+          cp ../backend/.env.default ../backend/.env
+
+      - name: Run docker compose
+        run: |
+          docker compose -f ../docker-compose.yml --profile local up -d deps_backend
+
+      - name: Restore dependencies cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.pnpm-store
+          key: ${{ needs.setup.outputs.cache-key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-
+
+      - name: Install dependencies
        run: pnpm install --frozen-lockfile

-      - name: Set up Frontend - Format OpenAPI schema
-        id: format-schema
-        run: pnpm prettier --write ./src/app/api/openapi.json
+      - name: Setup .env
+        run: cp .env.default .env
+
+      - name: Wait for services to be ready
+        run: |
+          echo "Waiting for rest_server to be ready..."
+          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
+          echo "Waiting for database to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
+
+      - name: Generate API queries
+        run: pnpm generate:api:force

      - name: Check for API schema changes
        run: |
          if ! git diff --exit-code src/app/api/openapi.json; then
            echo "❌ API schema changes detected in src/app/api/openapi.json"
            echo ""
-            echo "The openapi.json file has been modified after exporting the API schema."
+            echo "The openapi.json file has been modified after running 'pnpm generate:api-all'."
            echo "This usually means changes have been made in the BE endpoints without updating the Frontend."
            echo "The API schema is now out of sync with the Front-end queries."
            echo ""
            echo "To fix this:"
-            echo "\nIn the backend directory:"
-            echo "1. Run 'poetry run export-api-schema --output ../frontend/src/app/api/openapi.json'"
-            echo "\nIn the frontend directory:"
-            echo "2. Run 'pnpm prettier --write src/app/api/openapi.json'"
-            echo "3. Run 'pnpm generate:api'"
-            echo "4. Run 'pnpm types'"
-            echo "5. Fix any TypeScript errors that may have been introduced"
-            echo "6. Commit and push your changes"
+            echo "1. Pull the backend 'docker compose pull && docker compose up -d --build --force-recreate'"
+            echo "2. Run 'pnpm generate:api' locally"
+            echo "3. Run 'pnpm types' locally"
+            echo "4. Fix any TypeScript errors that may have been introduced"
+            echo "5. Commit and push your changes"
            echo ""
            exit 1
          else
            echo "✅ No API schema changes detected"
          fi

-      - name: Set up Frontend - Generate API client
-        id: generate-api-client
-        run: pnpm orval --config ./orval.config.ts
-        # Continue with type generation & check even if there are schema changes
-        if: success() || (steps.format-schema.outcome == 'success')
-
-      - name: Check for TypeScript errors
+      - name: Run Typescript checks
        run: pnpm types
-        if: success() || (steps.generate-api-client.outcome == 'success')
-
-  e2e_test:
-    name: end-to-end tests
-    runs-on: big-boi
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-        with:
-          submodules: recursive
-
-      - name: Set up Platform - Copy default supabase .env
-        run: |
-          cp ../.env.default ../.env
-
-      - name: Set up Platform - Copy backend .env and set OpenAI API key
-        run: |
-          cp ../backend/.env.default ../backend/.env
-          echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
-        env:
-          # Used by E2E test data script to generate embeddings for approved store agents
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-
-      - name: Set up Platform - Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-        with:
-          driver: docker-container
-          driver-opts: network=host
-
-      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v4
-
-      - name: Set up Platform - Build Docker images (with cache)
-        working-directory: autogpt_platform
-        run: |
-          pip install pyyaml
-
-          # Resolve extends and generate a flat compose file that bake can understand
-          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
-
-          # Add cache configuration to the resolved compose file
-          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
-            --source docker-compose.resolved.yml \
-            --cache-from "type=gha" \
-            --cache-to "type=gha,mode=max" \
-            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
-            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
-            --git-ref "${{ github.ref }}"
-
-          # Build with bake using the resolved compose file (now includes cache config)
-          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Cache E2E test data
-        id: e2e-data-cache
-        uses: actions/cache@v5
-        with:
-          path: /tmp/e2e_test_data.sql
-          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-fullstack-ci.yml') }}
-
-      - name: Set up Platform - Start Supabase DB + Auth
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
-          echo "Waiting for database to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
-          echo "Waiting for auth service to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
-
-      - name: Set up Platform - Run migrations
-        run: |
-          echo "Running migrations..."
-          docker compose -f ../docker-compose.resolved.yml run --rm migrate
-          echo "✅ Migrations completed"
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Load cached E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
-        run: |
-          echo "✅ Found cached E2E test data, restoring..."
-          {
-            echo "SET session_replication_role = 'replica';"
-            cat /tmp/e2e_test_data.sql
-            echo "SET session_replication_role = 'origin';"
-          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
-          # Refresh materialized views after restore
-          docker compose -f ../docker-compose.resolved.yml exec -T db \
-            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
-
-          echo "✅ E2E test data restored from cache"
-
-      - name: Set up Platform - Start (all other services)
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d --no-build
-          echo "Waiting for rest_server to be ready..."
-          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Create E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
-        run: |
-          echo "Creating E2E test data..."
-          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
-          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
-            echo "❌ E2E test data creation failed!"
-            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
-            exit 1
-          }
-
-          # Dump auth.users + platform schema for cache (two separate dumps)
-          echo "Dumping database for cache..."
-          {
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --table='auth.users' postgres
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --schema=platform \
-              --exclude-table='platform._prisma_migrations' \
-              --exclude-table='platform.apscheduler_jobs' \
-              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
-              postgres
-          } > /tmp/e2e_test_data.sql
-
-          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
-
-      - name: Set up tests - Enable corepack
-        run: corepack enable
-
-      - name: Set up tests - Set up Node
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
-
-      - name: Set up tests - Install dependencies
-        run: pnpm install --frozen-lockfile
-
-      - name: Set up tests - Install browser 'chromium'
-        run: pnpm playwright install --with-deps chromium
-
-      - name: Run Playwright tests
-        run: pnpm test:no-build
-        continue-on-error: false
-
-      - name: Upload Playwright report
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-report
-          path: playwright-report
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Upload Playwright test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-test-results
-          path: test-results
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Print Final Docker Compose logs
-        if: always()
-        run: docker compose -f ../docker-compose.resolved.yml logs
--- a/autogpt_platform/backend/backend/copilot/baseline/service.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service.py
@@ -40,7 +40,7 @@ from backend.copilot.response_model import (
 from backend.copilot.service import (
    _build_system_prompt,
    _generate_session_title,
-    _get_openai_client,
+    client,
    config,
 )
 from backend.copilot.tools import execute_tool, get_available_tools
@@ -89,7 +89,7 @@ async def _compress_session_messages(
        result = await compress_context(
            messages=messages_dict,
            model=config.model,
-            client=_get_openai_client(),
+            client=client,
        )
    except Exception as e:
        logger.warning("[Baseline] Context compression with LLM failed: %s", e)
@@ -235,7 +235,7 @@ async def stream_chat_completion_baseline(
            )
            if tools:
                create_kwargs["tools"] = tools
-            response = await _get_openai_client().chat.completions.create(**create_kwargs)  # type: ignore[arg-type]  # dynamic kwargs
+            response = await client.chat.completions.create(**create_kwargs)  # type: ignore[arg-type]  # dynamic kwargs

            # Accumulate streamed response (text + tool calls)
            round_text = ""
--- a/autogpt_platform/backend/backend/copilot/config.py
+++ b/autogpt_platform/backend/backend/copilot/config.py
@@ -94,11 +94,6 @@ class ChatConfig(BaseSettings):
        description="Use --resume for multi-turn conversations instead of "
        "history compression. Falls back to compression when unavailable.",
    )
-    use_openrouter: bool = Field(
-        default=True,
-        description="Route API calls through OpenRouter proxy. When False, the SDK "
-        "uses ANTHROPIC_API_KEY from the environment directly (no proxy hop).",
-    )
    use_claude_code_subscription: bool = Field(
        default=False,
        description="For personal/dev use: use Claude Code CLI subscription auth instead of API keys. Requires `claude login` on the host. Only works with SDK mode.",
@@ -214,15 +209,6 @@ class ChatConfig(BaseSettings):
        # Default to True (SDK enabled by default)
        return True if v is None else v

-    @field_validator("use_openrouter", mode="before")
-    @classmethod
-    def get_use_openrouter(cls, v):
-        """Get use_openrouter from environment if not provided."""
-        env_val = os.getenv("CHAT_USE_OPENROUTER", "").lower()
-        if env_val:
-            return env_val in ("true", "1", "yes", "on")
-        return True if v is None else v
-
    @field_validator("use_claude_code_subscription", mode="before")
    @classmethod
    def get_use_claude_code_subscription(cls, v):
--- a/autogpt_platform/backend/backend/copilot/constants.py
+++ b/autogpt_platform/backend/backend/copilot/constants.py
@@ -4,9 +4,6 @@
 # The hex suffix makes accidental LLM generation of these strings virtually
 # impossible, avoiding false-positive marker detection in normal conversation.
 COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]"  # Renders as ErrorCard
-COPILOT_RETRYABLE_ERROR_PREFIX = (
-    "[__COPILOT_RETRYABLE_ERROR_a9c2__]"  # ErrorCard + retry
-)
 COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]"  # Renders as system info message

 # Prefix for all synthetic IDs generated by CoPilot block execution.
@@ -38,24 +35,3 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
    Format: "{node_id}:{random_hex}" → returns "{node_id}".
    """
    return node_exec_id.rsplit(COPILOT_NODE_EXEC_ID_SEPARATOR, 1)[0]
-
-
-# ---------------------------------------------------------------------------
-# Transient Anthropic API error detection
-# ---------------------------------------------------------------------------
-# Patterns in error text that indicate a transient Anthropic API error
-# (ECONNRESET / dropped TCP connection) which is retryable.
-_TRANSIENT_ERROR_PATTERNS = (
-    "socket connection was closed unexpectedly",
-    "ECONNRESET",
-    "connection was forcibly closed",
-    "network socket disconnected",
-)
-
-FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
-
-
-def is_transient_api_error(error_text: str) -> bool:
-    """Return True if *error_text* matches a known transient Anthropic API error."""
-    lower = error_text.lower()
-    return any(pat.lower() in lower for pat in _TRANSIENT_ERROR_PATTERNS)
--- a/autogpt_platform/backend/backend/copilot/integration_creds.py
+++ b/autogpt_platform/backend/backend/copilot/integration_creds.py
@@ -0,0 +1,162 @@
+"""Integration credential lookup with per-process TTL cache.
+
+Provides token retrieval for connected integrations so that copilot tools
+(e.g. bash_exec) can inject auth tokens into the execution environment without
+hitting the database on every command.
+
+Cache semantics (handled automatically by TTLCache):
+- Token found → cached for _TOKEN_CACHE_TTL (5 min).  Avoids repeated DB hits
+  for users who have credentials and are running many bash commands.
+- No credentials found → cached for _NULL_CACHE_TTL (60 s).  Avoids a DB hit
+  on every E2B command for users who haven't connected an account yet, while
+  still picking up a newly-connected account within one minute.
+
+Both caches are bounded to _CACHE_MAX_SIZE entries; cachetools evicts the
+least-recently-used entry when the limit is reached.
+
+Multi-worker note: both caches are in-process only.  Each worker/replica
+maintains its own independent cache, so a credential fetch may be duplicated
+across processes.  This is acceptable for the current goal (reduce DB hits per
+session per-process), but if cache efficiency across replicas becomes important
+a shared cache (e.g. Redis) should be used instead.
+"""
+
+import logging
+from typing import cast
+
+from cachetools import TTLCache
+
+from backend.data.model import APIKeyCredentials, OAuth2Credentials
+from backend.integrations.creds_manager import (
+    IntegrationCredentialsManager,
+    register_creds_changed_hook,
+)
+
+logger = logging.getLogger(__name__)
+
+# Maps provider slug → env var names to inject when the provider is connected.
+# Add new providers here when adding integration support.
+# NOTE: keep in sync with connect_integration._PROVIDER_INFO — both registries
+# must be updated when adding a new provider.
+PROVIDER_ENV_VARS: dict[str, list[str]] = {
+    "github": ["GH_TOKEN", "GITHUB_TOKEN"],
+}
+
+_TOKEN_CACHE_TTL = 300.0  # seconds — for found tokens
+_NULL_CACHE_TTL = 60.0  # seconds — for "not connected" results
+_CACHE_MAX_SIZE = 10_000
+
+# (user_id, provider) → token string.  TTLCache handles expiry + eviction.
+# Thread-safety note: TTLCache is NOT thread-safe, but that is acceptable here
+# because all callers (get_provider_token, invalidate_user_provider_cache) run
+# exclusively on the asyncio event loop.  There are no await points between a
+# cache read and its corresponding write within any function, so no concurrent
+# coroutine can interleave.  If ThreadPoolExecutor workers are ever added to
+# this path, a threading.RLock should be wrapped around these caches.
+_token_cache: TTLCache[tuple[str, str], str] = TTLCache(
+    maxsize=_CACHE_MAX_SIZE, ttl=_TOKEN_CACHE_TTL
+)
+# Separate cache for "no credentials" results with a shorter TTL.
+_null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
+    maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
+)
+
+
+def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
+    """Remove the cached entry for *user_id*/*provider* from both caches.
+
+    Call this after storing new credentials so that the next
+    ``get_provider_token()`` call performs a fresh DB lookup instead of
+    serving a stale TTL-cached result.
+    """
+    key = (user_id, provider)
+    _token_cache.pop(key, None)
+    _null_cache.pop(key, None)
+
+
+# Register this module's cache-bust function with the credentials manager so
+# that any create/update/delete operation immediately evicts stale cache
+# entries.  This avoids a lazy import inside creds_manager and eliminates the
+# circular-import risk.
+register_creds_changed_hook(invalidate_user_provider_cache)
+
+# Module-level singleton to avoid re-instantiating IntegrationCredentialsManager
+# on every cache-miss call to get_provider_token().
+_manager = IntegrationCredentialsManager()
+
+
+async def get_provider_token(user_id: str, provider: str) -> str | None:
+    """Return the user's access token for *provider*, or ``None`` if not connected.
+
+    OAuth2 tokens are preferred (refreshed if needed); API keys are the fallback.
+    Found tokens are cached for _TOKEN_CACHE_TTL (5 min).  "Not connected" results
+    are cached for _NULL_CACHE_TTL (60 s) to avoid a DB hit on every bash_exec
+    command for users who haven't connected yet, while still picking up a
+    newly-connected account within one minute.
+    """
+    cache_key = (user_id, provider)
+
+    if cache_key in _null_cache:
+        return None
+    if cached := _token_cache.get(cache_key):
+        return cached
+
+    manager = _manager
+    try:
+        creds_list = await manager.store.get_creds_by_provider(user_id, provider)
+    except Exception:
+        logger.debug("Failed to fetch %s credentials for user %s", provider, user_id)
+        return None
+
+    # Pass 1: prefer OAuth2 (carry scope info, refreshable via token endpoint).
+    # Sort so broader-scoped tokens come first: a token with "repo" scope covers
+    # full git access, while a public-data-only token lacks push/pull permission.
+    # lock=False — background injection; not worth a distributed lock acquisition.
+    oauth2_creds = sorted(
+        [c for c in creds_list if c.type == "oauth2"],
+        key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
+    )
+    for creds in oauth2_creds:
+        if creds.type == "oauth2":
+            try:
+                fresh = await manager.refresh_if_needed(
+                    user_id, cast(OAuth2Credentials, creds), lock=False
+                )
+                token = fresh.access_token.get_secret_value()
+            except Exception:
+                logger.warning(
+                    "Failed to refresh %s OAuth token for user %s; "
+                    "falling back to potentially stale token",
+                    provider,
+                    user_id,
+                )
+                token = cast(OAuth2Credentials, creds).access_token.get_secret_value()
+            _token_cache[cache_key] = token
+            return token
+
+    # Pass 2: fall back to API key (no expiry, no refresh needed).
+    for creds in creds_list:
+        if creds.type == "api_key":
+            token = cast(APIKeyCredentials, creds).api_key.get_secret_value()
+            _token_cache[cache_key] = token
+            return token
+
+    # No credentials found — cache to avoid repeated DB hits.
+    _null_cache[cache_key] = True
+    return None
+
+
+async def get_integration_env_vars(user_id: str) -> dict[str, str]:
+    """Return env vars for all providers the user has connected.
+
+    Iterates :data:`PROVIDER_ENV_VARS`, fetches each token, and builds a flat
+    ``{env_var: token}`` dict ready to pass to a subprocess or E2B sandbox.
+    Only providers with a stored credential contribute entries.
+    """
+    env: dict[str, str] = {}
+    for provider, var_names in PROVIDER_ENV_VARS.items():
+        token = await get_provider_token(user_id, provider)
+        if token:
+            for var in var_names:
+                env[var] = token
+    return env
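
Reviewer sketch (not part of the patch): the module above keeps two caches, one for found tokens with a long TTL and one for "not connected" results with a short TTL, and drops both entries when credentials change. The standalone snippet below illustrates that pattern under simplifying assumptions. `TinyTTLCache` is a hypothetical, unbounded stand-in for `cachetools.TTLCache`, and `TokenLookup`, `fetch`, and the injectable `clock` are illustrative names that do not exist in the codebase.

```python
import time
from typing import Awaitable, Callable, Optional

Key = tuple[str, str]  # (user_id, provider)


class TinyTTLCache:
    """Hypothetical stand-in for cachetools.TTLCache (no size bound)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data: dict[Key, tuple[float, object]] = {}

    def get(self, key: Key) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: Key, value: object) -> None:
        self._data[key] = (self._clock() + self._ttl, value)

    def pop(self, key: Key) -> None:
        self._data.pop(key, None)


class TokenLookup:
    """Positive and negative caches in front of an async credential fetch."""

    def __init__(
        self,
        fetch: Callable[[str, str], Awaitable[Optional[str]]],
        clock: Callable[[], float] = time.monotonic,
        found_ttl: float = 300.0,   # mirrors _TOKEN_CACHE_TTL
        missing_ttl: float = 60.0,  # mirrors _NULL_CACHE_TTL
    ):
        self._fetch = fetch
        self._found = TinyTTLCache(found_ttl, clock)
        self._missing = TinyTTLCache(missing_ttl, clock)

    async def get(self, user_id: str, provider: str) -> Optional[str]:
        key = (user_id, provider)
        if self._missing.get(key):
            return None  # recently confirmed "not connected"
        cached = self._found.get(key)
        if cached is not None:
            return str(cached)
        token = await self._fetch(user_id, provider)
        if token is None:
            self._missing.set(key, True)
        else:
            self._found.set(key, token)
        return token

    def invalidate(self, user_id: str, provider: str) -> None:
        # Called when credentials change so the next get() refetches.
        key = (user_id, provider)
        self._found.pop(key)
        self._missing.pop(key)
```

The short negative TTL bounds how long a newly connected account can go unnoticed if no invalidation hook fires, while the longer positive TTL keeps repeated commands from hitting the store.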
--- a/autogpt_platform/backend/backend/copilot/integration_creds_test.py
+++ b/autogpt_platform/backend/backend/copilot/integration_creds_test.py
@@ -0,0 +1,193 @@
+"""Tests for integration_creds — TTL cache and token lookup paths."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from pydantic import SecretStr
+
+from backend.copilot.integration_creds import (
+    _NULL_CACHE_TTL,
+    _TOKEN_CACHE_TTL,
+    PROVIDER_ENV_VARS,
+    _null_cache,
+    _token_cache,
+    get_integration_env_vars,
+    get_provider_token,
+    invalidate_user_provider_cache,
+)
+from backend.data.model import APIKeyCredentials, OAuth2Credentials
+
+_USER = "user-integration-creds-test"
+_PROVIDER = "github"
+
+
+def _make_api_key_creds(key: str = "test-api-key") -> APIKeyCredentials:
+    return APIKeyCredentials(
+        id="creds-api-key",
+        provider=_PROVIDER,
+        api_key=SecretStr(key),
+        title="Test API Key",
+        expires_at=None,
+    )
+
+
+def _make_oauth2_creds(token: str = "test-oauth-token") -> OAuth2Credentials:
+    return OAuth2Credentials(
+        id="creds-oauth2",
+        provider=_PROVIDER,
+        title="Test OAuth",
+        access_token=SecretStr(token),
+        refresh_token=SecretStr("test-refresh"),
+        access_token_expires_at=None,
+        refresh_token_expires_at=None,
+        scopes=[],
+    )
+
+
+@pytest.fixture(autouse=True)
+def clear_caches():
+    """Ensure clean caches before and after every test."""
+    _token_cache.clear()
+    _null_cache.clear()
+    yield
+    _token_cache.clear()
+    _null_cache.clear()
+
+
+class TestInvalidateUserProviderCache:
+    def test_removes_token_entry(self):
+        key = (_USER, _PROVIDER)
+        _token_cache[key] = "tok"
+        invalidate_user_provider_cache(_USER, _PROVIDER)
+        assert key not in _token_cache
+
+    def test_removes_null_entry(self):
+        key = (_USER, _PROVIDER)
+        _null_cache[key] = True
+        invalidate_user_provider_cache(_USER, _PROVIDER)
+        assert key not in _null_cache
+
+    def test_noop_when_key_not_cached(self):
+        # Should not raise even when there is no cache entry.
+        invalidate_user_provider_cache("no-such-user", _PROVIDER)
+
+    def test_only_removes_targeted_key(self):
+        other_key = ("other-user", _PROVIDER)
+        _token_cache[other_key] = "other-tok"
+        invalidate_user_provider_cache(_USER, _PROVIDER)
+        assert other_key in _token_cache
+
+
+class TestGetProviderToken:
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_returns_cached_token_without_db_hit(self):
+        _token_cache[(_USER, _PROVIDER)] = "cached-tok"
+
+        mock_manager = MagicMock()
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "cached-tok"
+        mock_manager.store.get_creds_by_provider.assert_not_called()
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_returns_none_for_null_cached_provider(self):
+        _null_cache[(_USER, _PROVIDER)] = True
+
+        mock_manager = MagicMock()
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result is None
+        mock_manager.store.get_creds_by_provider.assert_not_called()
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_api_key_creds_returned_and_cached(self):
+        api_creds = _make_api_key_creds("my-api-key")
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[api_creds])
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "my-api-key"
+        assert _token_cache.get((_USER, _PROVIDER)) == "my-api-key"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_oauth2_preferred_over_api_key(self):
+        oauth_creds = _make_oauth2_creds("oauth-tok")
+        api_creds = _make_api_key_creds("api-tok")
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(
+            return_value=[api_creds, oauth_creds]
+        )
+        mock_manager.refresh_if_needed = AsyncMock(return_value=oauth_creds)
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "oauth-tok"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_oauth2_refresh_failure_falls_back_to_stale_token(self):
+        oauth_creds = _make_oauth2_creds("stale-oauth-tok")
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
+        mock_manager.refresh_if_needed = AsyncMock(side_effect=RuntimeError("network"))
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "stale-oauth-tok"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_no_credentials_caches_null_entry(self):
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[])
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result is None
+        assert _null_cache.get((_USER, _PROVIDER)) is True
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_db_exception_returns_none_without_caching(self):
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(
+            side_effect=RuntimeError("db down")
+        )
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result is None
+        # DB errors are not cached — next call will retry
+        assert (_USER, _PROVIDER) not in _token_cache
+        assert (_USER, _PROVIDER) not in _null_cache
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_null_cache_has_shorter_ttl_than_token_cache(self):
+        """Verify the TTL constants are set correctly for each cache."""
+        assert _null_cache.ttl == _NULL_CACHE_TTL
+        assert _token_cache.ttl == _TOKEN_CACHE_TTL
+        assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
+
+
+class TestGetIntegrationEnvVars:
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_injects_all_env_vars_for_provider(self):
+        _token_cache[(_USER, "github")] = "gh-tok"
+
+        result = await get_integration_env_vars(_USER)
+
+        for var in PROVIDER_ENV_VARS["github"]:
+            assert result[var] == "gh-tok"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_empty_dict_when_no_credentials(self):
+        _null_cache[(_USER, "github")] = True
+
+        result = await get_integration_env_vars(_USER)
+
+        assert result == {}
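
Reviewer sketch (not part of the patch): the tests above exercise the selection order used by `get_provider_token`, where OAuth2 credentials win over API keys and, among OAuth2 credentials, one carrying the `repo` scope is tried first. The snippet below reproduces only that ordering on a simplified record. `Cred` and `pick_token` are hypothetical names, and token refresh and caching are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cred:
    """Hypothetical, simplified credential record (not the backend model)."""

    type: str  # "oauth2" or "api_key"
    secret: str
    scopes: list[str] = field(default_factory=list)


def pick_token(creds: list[Cred]) -> Optional[str]:
    # Pass 1: OAuth2 credentials, with "repo"-scoped ones sorted first.
    # sorted() is stable, so ties keep their original order.
    oauth2 = sorted(
        (c for c in creds if c.type == "oauth2"),
        key=lambda c: 0 if "repo" in c.scopes else 1,
    )
    if oauth2:
        return oauth2[0].secret
    # Pass 2: fall back to the first API key, if any.
    for c in creds:
        if c.type == "api_key":
            return c.secret
    return None
```

Because the sort key only distinguishes "has `repo`" from "does not", credentials with equal priority are returned in the order the store listed them.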
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -95,6 +95,25 @@ Example — committing an image file to GitHub:
  All tasks must run in the foreground.
 """

+# E2B-only notes — E2B has full internet access so gh CLI works there.
+# Not shown in local (bubblewrap) mode: --unshare-net blocks all network.
+_E2B_TOOL_NOTES = """
+### GitHub CLI (`gh`) and git
+- If the user has connected their GitHub account, both `gh` and `git` are
+  pre-authenticated — use them directly without any manual login step.
+  `git` HTTPS operations (clone, push, pull) work automatically.
+- If the token changes mid-session (e.g. user reconnects with a new token),
+  run `gh auth setup-git` to re-register the credential helper.
+- If `gh` or `git` fails with an authentication error (e.g. "authentication
+  required", "could not read Username", or exit code 128), call
+  `connect_integration(provider="github")` to surface the GitHub credentials
+  setup card so the user can connect their account. Once connected, retry
+  the operation.
+- For operations that need broader access (e.g. private org repos, GitHub
+  Actions), pass the required scopes: e.g.
+  `connect_integration(provider="github", scopes=["repo", "read:org"])`.
+"""
+

 # Environment-specific supplement templates
 def _build_storage_supplement(
@@ -105,6 +124,7 @@ def _build_storage_supplement(
    storage_system_1_persistence: list[str],
    file_move_name_1_to_2: str,
    file_move_name_2_to_1: str,
+    extra_notes: str = "",
 ) -> str:
    """Build storage/filesystem supplement for a specific environment.

@@ -119,6 +139,7 @@ def _build_storage_supplement(
        storage_system_1_persistence: List of persistence behavior descriptions
        file_move_name_1_to_2: Direction label for primary→persistent
        file_move_name_2_to_1: Direction label for persistent→primary
+        extra_notes: Environment-specific notes appended after shared notes
    """
    # Format lists as bullet points with proper indentation
    characteristics = "\n".join(f"   - {c}" for c in storage_system_1_characteristics)
@@ -152,12 +173,16 @@ def _build_storage_supplement(

 ### File persistence
 Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
-{_SHARED_TOOL_NOTES}"""
+{_SHARED_TOOL_NOTES}{extra_notes}"""


 # Pre-built supplements for common environments
 def _get_local_storage_supplement(cwd: str) -> str:
-    """Local ephemeral storage (files lost between turns)."""
+    """Local ephemeral storage (files lost between turns).
+
+    Network is isolated (bubblewrap --unshare-net), so internet-dependent CLIs
+    like gh will not work — no integration env-var notes are included.
+    """
    return _build_storage_supplement(
        working_dir=cwd,
        sandbox_type="in a network-isolated sandbox",
@@ -175,7 +200,11 @@ def _get_local_storage_supplement(cwd: str) -> str:


 def _get_cloud_sandbox_supplement() -> str:
-    """Cloud persistent sandbox (files survive across turns in session)."""
+    """Cloud persistent sandbox (files survive across turns in session).
+
+    E2B has full internet access, so integration tokens (GH_TOKEN etc.) are
+    injected per command in bash_exec — include the CLI guidance notes.
+    """
    return _build_storage_supplement(
        working_dir="/home/user",
        sandbox_type="in a cloud sandbox with full internet access",
@@ -190,6 +219,7 @@ def _get_cloud_sandbox_supplement() -> str:
        ],
        file_move_name_1_to_2="Sandbox → Persistent",
        file_move_name_2_to_1="Persistent → Sandbox",
+        extra_notes=_E2B_TOOL_NOTES,
    )


--- a/autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
@@ -20,7 +20,6 @@ from claude_agent_sdk import (
    UserMessage,
 )

-from backend.copilot.constants import FRIENDLY_TRANSIENT_MSG, is_transient_api_error
 from backend.copilot.response_model import (
    StreamBaseResponse,
    StreamError,
@@ -215,12 +214,10 @@ class SDKResponseAdapter:
            if sdk_message.subtype == "success":
                responses.append(StreamFinish())
            elif sdk_message.subtype in ("error", "error_during_execution"):
-                raw_error = str(sdk_message.result or "Unknown error")
-                if is_transient_api_error(raw_error):
-                    error_text, code = FRIENDLY_TRANSIENT_MSG, "transient_api_error"
-                else:
-                    error_text, code = raw_error, "sdk_error"
-                responses.append(StreamError(errorText=error_text, code=code))
+                error_msg = sdk_message.result or "Unknown error"
+                responses.append(
+                    StreamError(errorText=str(error_msg), code="sdk_error")
+                )
                responses.append(StreamFinish())
            else:
                logger.warning(
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -37,13 +37,7 @@ from backend.util.prompt import compress_context
 from backend.util.settings import Settings

 from ..config import ChatConfig
-from ..constants import (
-    COPILOT_ERROR_PREFIX,
-    COPILOT_RETRYABLE_ERROR_PREFIX,
-    COPILOT_SYSTEM_PREFIX,
-    FRIENDLY_TRANSIENT_MSG,
-    is_transient_api_error,
-)
+from ..constants import COPILOT_ERROR_PREFIX, COPILOT_SYSTEM_PREFIX
 from ..model import (
    ChatMessage,
    ChatSession,
@@ -94,28 +88,6 @@ logger = logging.getLogger(__name__)
 config = ChatConfig()


-def _append_error_marker(
-    session: ChatSession | None,
-    display_msg: str,
-    *,
-    retryable: bool = False,
-) -> None:
-    """Append a copilot error marker to *session* so it persists across refresh.
-
-    Args:
-        session: The chat session to append to (no-op if ``None``).
-        display_msg: User-visible error text.
-        retryable: If ``True``, use the retryable prefix so the frontend
-            shows a "Try Again" button.
-    """
-    if session is None:
-        return
-    prefix = COPILOT_RETRYABLE_ERROR_PREFIX if retryable else COPILOT_ERROR_PREFIX
-    session.messages.append(
-        ChatMessage(role="assistant", content=f"{prefix} {display_msg}")
-    )
-
-
 def _setup_langfuse_otel() -> None:
    """Configure OTEL tracing for the Claude Agent SDK → Langfuse.

@@ -235,57 +207,61 @@ def _build_sdk_env(
    session_id: str | None = None,
    user_id: str | None = None,
 ) -> dict[str, str]:
-    """Build env vars for the SDK CLI subprocess.
+    """Build env vars for the SDK CLI process.

-    Three modes (checked in order):
-    1. **Subscription** — clears all keys; CLI uses ``claude login`` auth.
-    2. **Direct Anthropic** — returns ``{}``; subprocess inherits
-       ``ANTHROPIC_API_KEY`` from the parent environment.
-    3. **OpenRouter** (default) — overrides base URL and auth token to
-       route through the proxy, with Langfuse trace headers.
+    Routes API calls through OpenRouter (or a custom base_url) using
+    the same ``config.api_key`` / ``config.base_url`` as the non-SDK path.
+    This gives per-call token and cost tracking on the OpenRouter dashboard.
+
+    When *session_id* is provided, an ``x-session-id`` custom header is
+    injected via ``ANTHROPIC_CUSTOM_HEADERS`` so that OpenRouter Broadcast
+    forwards traces (including cost/usage) to Langfuse for the
+    ``/api/v1/messages`` endpoint.
+
+    Overrides the ``ANTHROPIC_*`` vars only when a valid base URL and API key
+    are both present (subscription mode blanks them instead); otherwise
+    returns an empty dict so the SDK falls back to its default credentials.
    """
-    # --- Mode 1: Claude Code subscription auth ---
+    env: dict[str, str] = {}
+
    if config.use_claude_code_subscription:
+        # Claude Code subscription: let the CLI use its own logged-in auth.
+        # Explicitly clear API key env vars so the subprocess doesn't pick
+        # them up from the parent process and bypass subscription auth.
        _validate_claude_code_subscription()
-        return {
-            "ANTHROPIC_API_KEY": "",
-            "ANTHROPIC_AUTH_TOKEN": "",
-            "ANTHROPIC_BASE_URL": "",
-        }
-
-    # --- Mode 2: Direct Anthropic (no proxy hop) ---
-    # Also the fallback when OpenRouter is enabled but credentials are missing.
-    # Strip /v1 suffix — SDK expects the base URL without a version path.
-    base = (config.base_url or "").rstrip("/")
-    if base.endswith("/v1"):
-        base = base[:-3]
-    if (
-        not config.use_openrouter
-        or not config.api_key
-        or not base
-        or not base.startswith("http")
-    ):
-        return {}
-
-    # --- Mode 3: OpenRouter proxy ---
-    env: dict[str, str] = {
-        "ANTHROPIC_BASE_URL": base,
-        "ANTHROPIC_AUTH_TOKEN": config.api_key,
-        "ANTHROPIC_API_KEY": "",  # force CLI to use AUTH_TOKEN
-    }
+        env["ANTHROPIC_API_KEY"] = ""
+        env["ANTHROPIC_AUTH_TOKEN"] = ""
+        env["ANTHROPIC_BASE_URL"] = ""
+    elif config.api_key and config.base_url:
+        # Strip /v1 suffix — SDK expects the base URL without a version path
+        base = config.base_url.rstrip("/")
+        if base.endswith("/v1"):
+            base = base[:-3]
+        if not base or not base.startswith("http"):
+            # Invalid base_url — don't override SDK defaults
+            return env
+        env["ANTHROPIC_BASE_URL"] = base
+        env["ANTHROPIC_AUTH_TOKEN"] = config.api_key
+        # Must be explicitly empty so the CLI uses AUTH_TOKEN instead
+        env["ANTHROPIC_API_KEY"] = ""

    # Inject broadcast headers so OpenRouter forwards traces to Langfuse.
-    def _safe(v: str) -> str:
-        """Sanitise a header value: strip newlines/whitespace and cap length."""
-        return v.replace("\r", "").replace("\n", "").strip()[:128]
+    # The ``x-session-id`` header is *required* for the Anthropic-native
+    # ``/messages`` endpoint — without it broadcast silently drops the
+    # trace even when org-level Langfuse integration is configured.
+    def _safe(value: str) -> str:
+        """Strip CR/LF to prevent header injection, then truncate."""
+        return value.replace("\r", "").replace("\n", "").strip()[:128]

-    parts = []
+    headers: list[str] = []
    if session_id:
-        parts.append(f"x-session-id: {_safe(session_id)}")
+        headers.append(f"x-session-id: {_safe(session_id)}")
    if user_id:
-        parts.append(f"x-user-id: {_safe(user_id)}")
-    if parts:
-        env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
+        headers.append(f"x-user-id: {_safe(user_id)}")
+    # Only inject headers when routing through OpenRouter/proxy — they're
+    # meaningless (and leak internal IDs) when using subscription mode.
+    if headers and env.get("ANTHROPIC_BASE_URL"):
+        env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(headers)

    return env

@@ -677,17 +653,13 @@ async def stream_chat_completion_sdk(
    # Type narrowing: session is guaranteed ChatSession after the check above
    session = cast(ChatSession, session)

-    # Clean up ALL trailing error markers from previous turn before starting
-    # a new turn.  Multiple markers can accumulate when a mid-stream error is
-    # followed by a cleanup error in __aexit__ (both append a marker).
-    while (
+    # Clean up a stale error marker from the previous turn before starting a
+    # new turn.  If the last message contains one, remove it (user is retrying).
+    if (
        len(session.messages) > 0
        and session.messages[-1].role == "assistant"
        and session.messages[-1].content
-        and (
-            COPILOT_ERROR_PREFIX in session.messages[-1].content
-            or COPILOT_RETRYABLE_ERROR_PREFIX in session.messages[-1].content
-        )
+        and COPILOT_ERROR_PREFIX in session.messages[-1].content
    ):
        logger.info(
            "[SDK] [%s] Removing stale error marker from previous turn",
@@ -797,7 +769,7 @@ async def stream_chat_completion_sdk(
                    )
                return None
            try:
-                return await get_or_create_sandbox(
+                sandbox = await get_or_create_sandbox(
                    session_id,
                    api_key=e2b_api_key,
                    template=config.e2b_sandbox_template,
@@ -811,7 +783,9 @@ async def stream_chat_completion_sdk(
                    e2b_err,
                    exc_info=True,
                )
-            return None
+                return None
+
+            return sandbox

        async def _fetch_transcript():
            """Download transcript for --resume if applicable."""
@@ -825,7 +799,7 @@ async def stream_chat_completion_sdk(
                )
            except Exception as transcript_err:
                logger.warning(
-                    "%s Transcript download failed, continuing without --resume: %s",
+                    "%s Transcript download failed, continuing without " "--resume: %s",
                    log_prefix,
                    transcript_err,
                )
@@ -848,7 +822,7 @@ async def stream_chat_completion_sdk(
            is_valid = validate_transcript(dl.content)
            dl_lines = dl.content.strip().split("\n") if dl.content else []
            logger.info(
-                "%s Downloaded transcript: %dB, %d lines, msg_count=%d, valid=%s",
+                "%s Downloaded transcript: %dB, %d lines, " "msg_count=%d, valid=%s",
                log_prefix,
                len(dl.content),
                len(dl_lines),
@@ -1067,36 +1041,23 @@ async def stream_chat_completion_sdk(
                        # Exception in receive_response() — capture it
                        # so the session can still be saved and the
                        # frontend gets a clean finish.
-                        if is_transient_api_error(str(stream_err)):
-                            log, display, code = (
-                                logger.warning,
-                                FRIENDLY_TRANSIENT_MSG,
-                                "transient_api_error",
-                            )
-                        else:
-                            log, display, code = (
-                                logger.error,
-                                f"SDK stream error: {stream_err}",
-                                "sdk_stream_error",
-                            )
-
-                        log(
+                        logger.error(
                            "%s Stream error from SDK: %s",
                            log_prefix,
                            stream_err,
                            exc_info=True,
                        )
                        ended_with_stream_error = True
-                        _append_error_marker(
-                            session,
-                            display,
-                            retryable=(code == "transient_api_error"),
+
+                        yield StreamError(
+                            errorText=f"SDK stream error: {stream_err}",
+                            code="sdk_stream_error",
                        )
-                        yield StreamError(errorText=display, code=code)
                        break

                    logger.info(
-                        "%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
+                        "%s Received: %s %s "
+                        "(unresolved=%d, current=%d, resolved=%d)",
                        log_prefix,
                        type(sdk_msg).__name__,
                        getattr(sdk_msg, "subtype", ""),
@@ -1110,42 +1071,15 @@ async def stream_chat_completion_sdk(
                    # so we can debug Anthropic API 400s surfaced by the CLI.
                    sdk_error = getattr(sdk_msg, "error", None)
                    if isinstance(sdk_msg, AssistantMessage) and sdk_error:
-                        error_text = str(sdk_error)
-                        error_preview = str(sdk_msg.content)[:500]
                        logger.error(
                            "[SDK] [%s] AssistantMessage has error=%s, "
                            "content_blocks=%d, content_preview=%s",
                            session_id[:12],
                            sdk_error,
                            len(sdk_msg.content),
-                            error_preview,
+                            str(sdk_msg.content)[:500],
                        )

-                        # Intercept transient API errors (socket closed,
-                        # ECONNRESET) — replace the raw message with a
-                        # user-friendly error text and use the retryable
-                        # error prefix so the frontend shows a retry button.
-                        # Check both the error field and content for patterns.
-                        if is_transient_api_error(error_text) or is_transient_api_error(
-                            error_preview
-                        ):
-                            logger.warning(
-                                "%s Transient Anthropic API error detected, "
-                                "suppressing raw error text",
-                                log_prefix,
-                            )
-                            ended_with_stream_error = True
-                            _append_error_marker(
-                                session,
-                                FRIENDLY_TRANSIENT_MSG,
-                                retryable=True,
-                            )
-                            yield StreamError(
-                                errorText=FRIENDLY_TRANSIENT_MSG,
-                                code="transient_api_error",
-                            )
-                            break
-
                    # Race-condition fix: SDK hooks (PostToolUse) are
                    # executed asynchronously via start_soon() — the next
                    # message can arrive before the hook stashes output.
@@ -1244,7 +1178,7 @@ async def stream_chat_completion_sdk(
                                extra,
                            )

-                        # Persist error markers so they survive page refresh
+                        # Log errors being sent to frontend
                        if isinstance(response, StreamError):
                            logger.error(
                                "%s Sending error to frontend: %s (code=%s)",
@@ -1252,12 +1186,6 @@ async def stream_chat_completion_sdk(
                                response.errorText,
                                response.code,
                            )
-                            _append_error_marker(
-                                session,
-                                response.errorText,
-                                retryable=(response.code == "transient_api_error"),
-                            )
-                            ended_with_stream_error = True

                        yield response

@@ -1452,18 +1380,14 @@ async def stream_chat_completion_sdk(
            else:
                logger.error("%s Error: %s", log_prefix, error_msg, exc_info=True)

-        is_transient = is_transient_api_error(error_msg)
-        if is_transient:
-            display_msg, code = FRIENDLY_TRANSIENT_MSG, "transient_api_error"
-        else:
-            display_msg, code = error_msg, "sdk_error"
-
-        # Append error marker to session (non-invasive text parsing approach).
-        # The finally block will persist the session with this error marker.
-        # Skip if a marker was already appended inside the stream loop
-        # (ended_with_stream_error) to avoid duplicate stale markers.
-        if not ended_with_stream_error:
-            _append_error_marker(session, display_msg, retryable=is_transient)
+        # Append error marker to session (non-invasive text parsing approach)
+        # The finally block will persist the session with this error marker
+        if session:
+            session.messages.append(
+                ChatMessage(
+                    role="assistant", content=f"{COPILOT_ERROR_PREFIX} {error_msg}"
+                )
+            )
            logger.debug(
                "%s Appended error marker, will be persisted in finally",
                log_prefix,
@@ -1475,7 +1399,10 @@ async def stream_chat_completion_sdk(
            isinstance(e, RuntimeError) and "cancel scope" in str(e)
        )
        if not is_cancellation:
-            yield StreamError(errorText=display_msg, code=code)
+            yield StreamError(
+                errorText=error_msg,
+                code="sdk_error",
+            )

        raise
    finally:
--- a/autogpt_platform/backend/backend/copilot/service.py
+++ b/autogpt_platform/backend/backend/copilot/service.py
@@ -28,24 +28,10 @@ logger = logging.getLogger(__name__)

 config = ChatConfig()
 settings = Settings()
-
-_client: LangfuseAsyncOpenAI | None = None
-_langfuse = None
+client = LangfuseAsyncOpenAI(api_key=config.api_key, base_url=config.base_url)


-def _get_openai_client() -> LangfuseAsyncOpenAI:
-    global _client
-    if _client is None:
-        _client = LangfuseAsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
-    return _client
-
-
-def _get_langfuse():
-    global _langfuse
-    if _langfuse is None:
-        _langfuse = get_client()
-    return _langfuse
-
+langfuse = get_client()

 # Default system prompt used when Langfuse is not configured
 # Provides minimal baseline tone and personality - all workflow, tools, and
@@ -98,7 +84,7 @@ async def _get_system_prompt_template(context: str) -> str:
                else "latest"
            )
            prompt = await asyncio.to_thread(
-                _get_langfuse().get_prompt,
+                langfuse.get_prompt,
                config.langfuse_prompt_name,
                label=label,
                cache_ttl_seconds=config.langfuse_prompt_cache_ttl,
@@ -172,7 +158,7 @@ async def _generate_session_title(
            "environment": settings.config.app_env.value,
        }

-        response = await _get_openai_client().chat.completions.create(
+        response = await client.chat.completions.create(
            model=config.title_model,
            messages=[
                {
--- a/autogpt_platform/backend/backend/copilot/tools/__init__.py
+++ b/autogpt_platform/backend/backend/copilot/tools/__init__.py
@@ -12,6 +12,7 @@ from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreensho
 from .agent_output import AgentOutputTool
 from .base import BaseTool
 from .bash_exec import BashExecTool
+from .connect_integration import ConnectIntegrationTool
 from .continue_run_block import ContinueRunBlockTool
 from .create_agent import CreateAgentTool
 from .customize_agent import CustomizeAgentTool
@@ -84,6 +85,7 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
    "browser_screenshot": BrowserScreenshotTool(),
    # Sandboxed code execution (bubblewrap)
    "bash_exec": BashExecTool(),
+    "connect_integration": ConnectIntegrationTool(),
    # Persistent workspace tools (cloud storage, survives across sessions)
    # Feature request tools
    "search_feature_requests": SearchFeatureRequestsTool(),
--- a/autogpt_platform/backend/backend/copilot/tools/bash_exec.py
+++ b/autogpt_platform/backend/backend/copilot/tools/bash_exec.py
@@ -22,6 +22,7 @@ from e2b import AsyncSandbox
 from e2b.exceptions import TimeoutException

 from backend.copilot.context import E2B_WORKDIR, get_current_sandbox
+from backend.copilot.integration_creds import get_integration_env_vars
 from backend.copilot.model import ChatSession

 from .base import BaseTool
@@ -96,7 +97,9 @@ class BashExecTool(BaseTool):

        sandbox = get_current_sandbox()
        if sandbox is not None:
-            return await self._execute_on_e2b(sandbox, command, timeout, session_id)
+            return await self._execute_on_e2b(
+                sandbox, command, timeout, session_id, user_id
+            )

        # Bubblewrap fallback: local isolated execution.
        if not has_full_sandbox():
@@ -133,14 +136,27 @@ class BashExecTool(BaseTool):
        command: str,
        timeout: int,
        session_id: str | None,
+        user_id: str | None = None,
    ) -> ToolResponseBase:
-        """Execute *command* on the E2B sandbox via commands.run()."""
+        """Execute *command* on the E2B sandbox via commands.run().
+
+        Integration tokens (e.g. GH_TOKEN) are injected into the sandbox env
+        for any user with connected accounts. E2B has full internet access, so
+        CLI tools like ``gh`` work without manual authentication.
+        """
+        envs: dict[str, str] = {
+            "PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin",
+        }
+        if user_id is not None:
+            integration_env = await get_integration_env_vars(user_id)
+            envs.update(integration_env)
+
        try:
            result = await sandbox.commands.run(
                f"bash -c {shlex.quote(command)}",
                cwd=E2B_WORKDIR,
                timeout=timeout,
-                envs={"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"},
+                envs=envs,
            )
            return BashExecResponse(
                message=f"Command executed on E2B (exit {result.exit_code})",
--- a/autogpt_platform/backend/backend/copilot/tools/bash_exec_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/bash_exec_test.py
@@ -0,0 +1,78 @@
+"""Tests for BashExecTool — E2B path with token injection."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from ._test_data import make_session
+from .bash_exec import BashExecTool
+from .models import BashExecResponse
+
+_USER = "user-bash-exec-test"
+
+
+def _make_tool() -> BashExecTool:
+    return BashExecTool()
+
+
+def _make_sandbox(exit_code: int = 0, stdout: str = "", stderr: str = "") -> MagicMock:
+    result = MagicMock()
+    result.exit_code = exit_code
+    result.stdout = stdout
+    result.stderr = stderr
+
+    sandbox = MagicMock()
+    sandbox.commands.run = AsyncMock(return_value=result)
+    return sandbox
+
+
+class TestBashExecE2BTokenInjection:
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_token_injected_when_user_id_set(self):
+        """When user_id is provided, integration env vars are merged into sandbox envs."""
+        tool = _make_tool()
+        session = make_session(user_id=_USER)
+        sandbox = _make_sandbox(stdout="ok")
+        env_vars = {"GH_TOKEN": "gh-secret", "GITHUB_TOKEN": "gh-secret"}
+
+        with patch(
+            "backend.copilot.tools.bash_exec.get_integration_env_vars",
+            new=AsyncMock(return_value=env_vars),
+        ) as mock_get_env:
+            result = await tool._execute_on_e2b(
+                sandbox=sandbox,
+                command="echo hi",
+                timeout=10,
+                session_id=session.session_id,
+                user_id=_USER,
+            )
+
+        mock_get_env.assert_awaited_once_with(_USER)
+        call_kwargs = sandbox.commands.run.call_args[1]
+        assert call_kwargs["envs"]["GH_TOKEN"] == "gh-secret"
+        assert call_kwargs["envs"]["GITHUB_TOKEN"] == "gh-secret"
+        assert isinstance(result, BashExecResponse)
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_no_token_injection_when_user_id_is_none(self):
+        """When user_id is None, get_integration_env_vars must NOT be called."""
+        tool = _make_tool()
+        session = make_session(user_id=_USER)
+        sandbox = _make_sandbox(stdout="ok")
+
+        with patch(
+            "backend.copilot.tools.bash_exec.get_integration_env_vars",
+            new=AsyncMock(return_value={"GH_TOKEN": "should-not-appear"}),
+        ) as mock_get_env:
+            result = await tool._execute_on_e2b(
+                sandbox=sandbox,
+                command="echo hi",
+                timeout=10,
+                session_id=session.session_id,
+                user_id=None,
+            )
+
+        mock_get_env.assert_not_called()
+        call_kwargs = sandbox.commands.run.call_args[1]
+        assert "GH_TOKEN" not in call_kwargs["envs"]
+        assert isinstance(result, BashExecResponse)
--- a/autogpt_platform/backend/backend/copilot/tools/connect_integration.py
+++ b/autogpt_platform/backend/backend/copilot/tools/connect_integration.py
@@ -0,0 +1,215 @@
+"""Tool for prompting the user to connect a required integration.
+
+When the copilot encounters an authentication failure (e.g. `gh` CLI returns
+"authentication required"), it calls this tool to surface the credentials
+setup card in the chat — the same UI that appears when a GitHub block runs
+without configured credentials.
+"""
+
+import functools
+from typing import Any, TypedDict
+
+from backend.copilot.model import ChatSession
+from backend.copilot.tools.models import (
+    ErrorResponse,
+    ResponseType,
+    SetupInfo,
+    SetupRequirementsResponse,
+    ToolResponseBase,
+    UserReadiness,
+)
+
+from .base import BaseTool
+
+
+class _ProviderInfo(TypedDict):
+    name: str
+    types: list[str]
+    # Default OAuth scopes requested when the agent doesn't specify any.
+    scopes: list[str]
+
+
+class _CredentialEntry(TypedDict):
+    """Shape of each entry inside SetupRequirementsResponse.user_readiness.missing_credentials."""
+
+    id: str
+    title: str
+    provider: str
+    provider_name: str
+    type: str
+    types: list[str]
+    scopes: list[str]
+
+
+@functools.lru_cache(maxsize=1)
+def _is_github_oauth_configured() -> bool:
+    """Return True if GitHub OAuth env vars are set.
+
+    Evaluated lazily (not at import time) to avoid triggering Secrets() during
+    module import, which can fail in environments where secrets are not loaded.
+    """
+    from backend.blocks.github._auth import GITHUB_OAUTH_IS_CONFIGURED
+
+    return GITHUB_OAUTH_IS_CONFIGURED
+
+
+# Registry of known providers: name + supported credential types for the UI.
+# When adding a new provider, also add its env var names to
+# backend.copilot.integration_creds.PROVIDER_ENV_VARS.
+def _get_provider_info() -> dict[str, _ProviderInfo]:
+    """Build the provider registry, evaluating OAuth config lazily."""
+    return {
+        "github": {
+            "name": "GitHub",
+            "types": (
+                ["api_key", "oauth2"] if _is_github_oauth_configured() else ["api_key"]
+            ),
+            # Default: repo scope covers clone/push/pull for public and private repos.
+            # Agent can request additional scopes (e.g. "read:org") via the scopes param.
+            "scopes": ["repo"],
+        },
+    }
+
+
+class ConnectIntegrationTool(BaseTool):
+    """Surface the credentials setup UI when an integration is not connected."""
+
+    @property
+    def name(self) -> str:
+        return "connect_integration"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Prompt the user to connect a required integration (e.g. GitHub). "
+            "Call this when an external CLI or API call fails because the user "
+            "has not connected the relevant account. "
+            "The tool surfaces a credentials setup card in the chat so the user "
+            "can authenticate without leaving the page. "
+            "After the user connects the account, retry the operation. "
+            "In E2B/cloud sandbox mode the token (GH_TOKEN/GITHUB_TOKEN) is "
+            "automatically injected per-command in bash_exec — no manual export needed. "
+            "In local bubblewrap mode network is isolated so GitHub CLI commands "
+            "will still fail after connecting; inform the user of this limitation."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "provider": {
+                    "type": "string",
+                    "description": (
+                        "Integration provider slug, e.g. 'github'. "
+                        "Must be one of the supported providers."
+                    ),
+                    "enum": list(_get_provider_info().keys()),
+                },
+                "reason": {
+                    "type": "string",
+                    "description": (
+                        "Brief explanation of why the integration is needed, "
+                        "shown to the user in the setup card."
+                    ),
+                    "maxLength": 500,
+                },
+                "scopes": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": (
+                        "OAuth scopes to request. Omit to use the provider default. "
+                        "Add extra scopes when you need more access — e.g. for GitHub: "
+                        "'repo' (clone/push/pull), 'read:org' (org membership), "
+                        "'workflow' (GitHub Actions). "
+                        "Requesting only the scopes you actually need is best practice."
+                    ),
+                },
+            },
+            "required": ["provider"],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        # Require auth so only authenticated users can trigger the setup card.
+        # The card itself is user-agnostic (no per-user data needed), so
+        # user_id is intentionally unused in _execute.
+        return True
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs: Any,
+    ) -> ToolResponseBase:
+        del user_id  # setup card is user-agnostic; auth is enforced via requires_auth
+        session_id = session.session_id if session else None
+        provider: str = (kwargs.get("provider") or "").strip().lower()
+        reason: str = (kwargs.get("reason") or "").strip()[
+            :500
+        ]  # cap LLM-controlled text
+        extra_scopes: list[str] = [
+            str(s).strip() for s in (kwargs.get("scopes") or []) if str(s).strip()
+        ]
+
+        provider_info = _get_provider_info()
+        info = provider_info.get(provider)
+        if not info:
+            supported = ", ".join(f"'{p}'" for p in provider_info)
+            return ErrorResponse(
+                message=(
+                    f"Unknown provider '{provider}'. "
+                    f"Supported providers: {supported}."
+                ),
+                error="unknown_provider",
+                session_id=session_id,
+            )
+
+        provider_name: str = info["name"]
+        supported_types: list[str] = info["types"]
+        # Merge agent-requested scopes with provider defaults (deduplicated, order preserved).
+        default_scopes: list[str] = info["scopes"]
+        seen: set[str] = set()
+        scopes: list[str] = []
+        for s in default_scopes + extra_scopes:
+            if s not in seen:
+                seen.add(s)
+                scopes.append(s)
+        field_key = f"{provider}_credentials"
+
+        message_parts = [
+            f"To continue, please connect your {provider_name} account.",
+        ]
+        if reason:
+            message_parts.append(reason)
+
+        credential_entry: _CredentialEntry = {
+            "id": field_key,
+            "title": f"{provider_name} Credentials",
+            "provider": provider,
+            "provider_name": provider_name,
+            "type": supported_types[0],
+            "types": supported_types,
+            "scopes": scopes,
+        }
+        missing_credentials: dict[str, _CredentialEntry] = {field_key: credential_entry}
+
+        return SetupRequirementsResponse(
+            type=ResponseType.SETUP_REQUIREMENTS,
+            message=" ".join(message_parts),
+            session_id=session_id,
+            setup_info=SetupInfo(
+                agent_id=f"connect_{provider}",
+                agent_name=provider_name,
+                user_readiness=UserReadiness(
+                    has_all_credentials=False,
+                    missing_credentials=missing_credentials,
+                    ready_to_run=False,
+                ),
+                requirements={
+                    "credentials": [missing_credentials[field_key]],
+                    "inputs": [],
+                    "execution_modes": [],
+                },
+            ),
+        )
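Note on the scope merge in `_execute` above: provider defaults come first, agent-requested extras follow, and duplicates are dropped while order is preserved. A minimal sketch of that behaviour in isolation, assuming Python 3.7+ where `dict.fromkeys` keeps insertion order. `merge_scopes` is a hypothetical helper name used only for illustration; the PR itself uses an explicit `seen` set.

```python
def merge_scopes(default_scopes: list[str], extra_scopes: list[str]) -> list[str]:
    """Hypothetical helper: union of scopes, first occurrence wins, order kept."""
    # Normalise the LLM-supplied extras the way _execute does:
    # stringify, strip whitespace, drop empty entries.
    cleaned = [str(s).strip() for s in extra_scopes if str(s).strip()]
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(default_scopes + cleaned))


print(merge_scopes(["repo"], ["read:org", " repo ", "", "workflow"]))
# ['repo', 'read:org', 'workflow']
```

Either form gives the same result; the one-liner is just a more compact way to express "deduplicated, order preserved".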
--- a/autogpt_platform/backend/backend/copilot/tools/connect_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/connect_integration_test.py
@@ -0,0 +1,135 @@
+"""Tests for ConnectIntegrationTool."""
+
+import pytest
+
+from ._test_data import make_session
+from .connect_integration import ConnectIntegrationTool
+from .models import ErrorResponse, SetupRequirementsResponse
+
+_TEST_USER_ID = "test-user-connect-integration"
+
+
+class TestConnectIntegrationTool:
+    def _make_tool(self) -> ConnectIntegrationTool:
+        return ConnectIntegrationTool()
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_unknown_provider_returns_error(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="nonexistent"
+        )
+        assert isinstance(result, ErrorResponse)
+        assert result.error == "unknown_provider"
+        assert "nonexistent" in result.message
+        assert "github" in result.message
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_empty_provider_returns_error(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider=""
+        )
+        assert isinstance(result, ErrorResponse)
+        assert result.error == "unknown_provider"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_github_provider_returns_setup_response(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        assert result.setup_info.agent_name == "GitHub"
+        assert result.setup_info.agent_id == "connect_github"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_github_has_missing_credentials_in_readiness(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        readiness = result.setup_info.user_readiness
+        assert readiness.has_all_credentials is False
+        assert readiness.ready_to_run is False
+        assert "github_credentials" in readiness.missing_credentials
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_github_requirements_include_credential_entry(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        creds = result.setup_info.requirements["credentials"]
+        assert len(creds) == 1
+        assert creds[0]["provider"] == "github"
+        assert creds[0]["id"] == "github_credentials"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_reason_appears_in_message(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        reason = "Needed to create a pull request."
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github", reason=reason
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        assert reason in result.message
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_session_id_propagated(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        assert result.session_id == session.session_id
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_provider_case_insensitive(self):
+        """Provider slug is normalised to lowercase before lookup."""
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="GitHub"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+
+    def test_tool_name(self):
+        assert ConnectIntegrationTool().name == "connect_integration"
+
+    def test_requires_auth(self):
+        assert ConnectIntegrationTool().requires_auth is True
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_unauthenticated_user_gets_need_login_response(self):
+        """execute() with user_id=None must return NeedLoginResponse, not the setup card.
+
+        This verifies that the requires_auth guard in BaseTool.execute() fires
+        before _execute() is called, so unauthenticated callers cannot probe
+        which integrations are configured.
+        """
+        import json
+
+        tool = self._make_tool()
+        # Session still needs a user_id string; the None is passed to execute()
+        # to simulate an unauthenticated call.
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool.execute(
+            user_id=None,
+            session=session,
+            tool_call_id="test-call-id",
+            provider="github",
+        )
+        raw = result.output
+        output = json.loads(raw) if isinstance(raw, str) else raw
+        assert output.get("type") == "need_login"
+        assert result.success is False
--- a/autogpt_platform/backend/backend/integrations/creds_manager.py
+++ b/autogpt_platform/backend/backend/integrations/creds_manager.py
@@ -25,6 +25,35 @@ logger = logging.getLogger(__name__)
 settings = Settings()


+_on_creds_changed: Callable[[str, str], None] | None = None
+
+
+def register_creds_changed_hook(hook: Callable[[str, str], None]) -> None:
+    """Register a callback invoked after any credential is created/updated/deleted.
+
+    The callback receives ``(user_id, provider)`` and should be idempotent.
+    Only one hook can be registered at a time; calling this again replaces the
+    previous hook.  Intended to be called once at application startup by the
+    copilot module to bust its token cache without creating an import cycle.
+    """
+    global _on_creds_changed
+    _on_creds_changed = hook
+
+
+def _bust_copilot_cache(user_id: str, provider: str) -> None:
+    """Invoke the registered hook (if any) to bust downstream token caches."""
+    if _on_creds_changed is not None:
+        try:
+            _on_creds_changed(user_id, provider)
+        except Exception:
+            logger.warning(
+                "Credential-change hook failed for user=%s provider=%s",
+                user_id,
+                provider,
+                exc_info=True,
+            )
+
+
 class IntegrationCredentialsManager:
    """
    Handles the lifecycle of integration credentials.
@@ -69,7 +98,11 @@ class IntegrationCredentialsManager:
        return self._locks

    async def create(self, user_id: str, credentials: Credentials) -> None:
-        return await self.store.add_creds(user_id, credentials)
+        result = await self.store.add_creds(user_id, credentials)
+        # Bust the copilot token cache so that the next bash_exec picks up the
+        # new credential immediately instead of waiting for _NULL_CACHE_TTL.
+        _bust_copilot_cache(user_id, credentials.provider)
+        return result

    async def exists(self, user_id: str, credentials_id: str) -> bool:
        return (await self.store.get_creds_by_id(user_id, credentials_id)) is not None
@@ -156,6 +189,8 @@ class IntegrationCredentialsManager:

                fresh_credentials = await oauth_handler.refresh_tokens(credentials)
                await self.store.update_creds(user_id, fresh_credentials)
+                # Bust copilot cache so the refreshed token is picked up immediately.
+                _bust_copilot_cache(user_id, fresh_credentials.provider)
                if _lock and (await _lock.locked()) and (await _lock.owned()):
                    try:
                        await _lock.release()
@@ -168,10 +203,17 @@ class IntegrationCredentialsManager:
    async def update(self, user_id: str, updated: Credentials) -> None:
        async with self._locked(user_id, updated.id):
            await self.store.update_creds(user_id, updated)
+        # Bust the copilot token cache so the updated credential is picked up immediately.
+        _bust_copilot_cache(user_id, updated.provider)

    async def delete(self, user_id: str, credentials_id: str) -> None:
        async with self._locked(user_id, credentials_id):
+            # Read inside the lock to avoid TOCTOU — another coroutine could
+            # delete the same credential between the read and the delete.
+            creds = await self.store.get_creds_by_id(user_id, credentials_id)
            await self.store.delete_creds_by_id(user_id, credentials_id)
+        if creds:
+            _bust_copilot_cache(user_id, creds.provider)

    # -- Locking utilities -- #

--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatContainer/ChatContainer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatContainer/ChatContainer.tsx
@@ -2,7 +2,7 @@
 import { ChatInput } from "@/app/(platform)/copilot/components/ChatInput/ChatInput";
 import { UIDataTypes, UIMessage, UITools } from "ai";
 import { LayoutGroup, motion } from "framer-motion";
-import { ReactNode, useCallback } from "react";
+import { ReactNode } from "react";
 import { ChatMessagesContainer } from "../ChatMessagesContainer/ChatMessagesContainer";
 import { CopilotChatActionsProvider } from "../CopilotChatActionsProvider/CopilotChatActionsProvider";
 import { EmptySession } from "../EmptySession/EmptySession";
@@ -52,20 +52,6 @@ export const ChatContainer = ({
    !!isSessionError;
  const inputLayoutId = "copilot-2-chat-input";

-  // Retry: re-send the last user message (used by ErrorCard on transient errors)
-  const handleRetry = useCallback(() => {
-    const lastUserMsg = [...messages].reverse().find((m) => m.role === "user");
-    const lastText = lastUserMsg?.parts
-      .filter(
-        (p): p is Extract<typeof p, { type: "text" }> => p.type === "text",
-      )
-      .map((p) => p.text)
-      .join("");
-    if (lastText) {
-      onSend(lastText);
-    }
-  }, [messages, onSend]);
-
  return (
    <CopilotChatActionsProvider onSend={onSend}>
      <LayoutGroup id="copilot-2-chat-layout">
@@ -79,7 +65,6 @@ export const ChatContainer = ({
                isLoading={isLoadingSession}
                headerSlot={headerSlot}
                sessionID={sessionId}
-                onRetry={handleRetry}
              />
              <motion.div
                initial={{ opacity: 0 }}
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/ChatMessagesContainer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/ChatMessagesContainer.tsx
@@ -32,13 +32,11 @@ interface Props {
  isLoading: boolean;
  headerSlot?: React.ReactNode;
  sessionID?: string | null;
-  onRetry?: () => void;
 }

 function renderSegments(
  segments: RenderSegment[],
  messageID: string,
-  onRetry?: () => void,
 ): React.ReactNode[] {
  return segments.map((seg, segIdx) => {
    if (seg.kind === "collapsed-group") {
@@ -50,7 +48,6 @@ function renderSegments(
        part={seg.part}
        messageID={messageID}
        partIndex={seg.index}
-        onRetry={onRetry}
      />
    );
  });
@@ -107,7 +104,6 @@ export function ChatMessagesContainer({
  isLoading,
  headerSlot,
  sessionID,
-  onRetry,
 }: Props) {
  const lastMessage = messages[messages.length - 1];
  const graphExecId = useMemo(() => extractGraphExecId(messages), [messages]);
@@ -216,18 +212,13 @@ export function ChatMessagesContainer({
                  </ReasoningCollapse>
                )}
                {responseSegments
-                  ? renderSegments(
-                      responseSegments,
-                      message.id,
-                      isLastAssistant ? onRetry : undefined,
-                    )
+                  ? renderSegments(responseSegments, message.id)
                  : message.parts.map((part, i) => (
                      <MessagePartRenderer
                        key={`${message.id}-${i}`}
                        part={part}
                        messageID={message.id}
                        partIndex={i}
-                        onRetry={isLastAssistant ? onRetry : undefined}
                      />
                    ))}
                {isLastInTurn && !isCurrentlyStreaming && (
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/components/MessagePartRenderer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/components/MessagePartRenderer.tsx
@@ -3,6 +3,7 @@ import { ErrorCard } from "@/components/molecules/ErrorCard/ErrorCard";
 import { ExclamationMarkIcon } from "@phosphor-icons/react";
 import { ToolUIPart, UIDataTypes, UIMessage, UITools } from "ai";
 import { useState } from "react";
+import { ConnectIntegrationTool } from "../../../tools/ConnectIntegrationTool/ConnectIntegrationTool";
 import { CreateAgentTool } from "../../../tools/CreateAgent/CreateAgent";
 import { EditAgentTool } from "../../../tools/EditAgent/EditAgent";
 import {
@@ -69,15 +70,9 @@ interface Props {
  part: UIMessage<unknown, UIDataTypes, UITools>["parts"][number];
  messageID: string;
  partIndex: number;
-  onRetry?: () => void;
 }

-export function MessagePartRenderer({
-  part,
-  messageID,
-  partIndex,
-  onRetry,
-}: Props) {
+export function MessagePartRenderer({ part, messageID, partIndex }: Props) {
  const key = `${messageID}-${partIndex}`;

  switch (part.type) {
@@ -86,7 +81,7 @@ export function MessagePartRenderer({
        part.text,
      );

-      if (markerType === "error" || markerType === "retryable_error") {
+      if (markerType === "error") {
        const lowerMarker = markerText.toLowerCase();
        const isCancellation =
          lowerMarker === "operation cancelled" ||
@@ -106,7 +101,6 @@ export function MessagePartRenderer({
            key={key}
            responseError={{ message: markerText }}
            context="execution"
-            onRetry={markerType === "retryable_error" ? onRetry : undefined}
          />
        );
      }
@@ -136,6 +130,8 @@ export function MessagePartRenderer({
    case "tool-search_docs":
    case "tool-get_doc_page":
      return <SearchDocsTool key={key} part={part as ToolUIPart} />;
+    case "tool-connect_integration":
+      return <ConnectIntegrationTool key={key} part={part as ToolUIPart} />;
    case "tool-run_block":
    case "tool-continue_run_block":
      return <RunBlockTool key={key} part={part as ToolUIPart} />;
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/helpers.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ChatMessagesContainer/helpers.ts
@@ -172,22 +172,16 @@ export function getTurnMessages(
 // The hex suffix makes it virtually impossible for an LLM to accidentally
 // produce these strings in normal conversation.
 const COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]";
-const COPILOT_RETRYABLE_ERROR_PREFIX = "[__COPILOT_RETRYABLE_ERROR_a9c2__]";
 const COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]";

-export type MarkerType = "error" | "retryable_error" | "system" | null;
+export type MarkerType = "error" | "system" | null;

 /** Escape all regex special characters in a string. */
 function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
 }

-// Pre-compiled marker regexes (avoids re-creating on every call / render).
-// Retryable check must come first since it's more specific.
-const RETRYABLE_ERROR_MARKER_RE = new RegExp(
-  `${escapeRegExp(COPILOT_RETRYABLE_ERROR_PREFIX)}\\s*(.+?)$`,
-  "s",
-);
+// Pre-compiled marker regexes (avoids re-creating on every call / render)
 const ERROR_MARKER_RE = new RegExp(
  `${escapeRegExp(COPILOT_ERROR_PREFIX)}\\s*(.+?)$`,
  "s",
@@ -202,15 +196,6 @@ export function parseSpecialMarkers(text: string): {
  markerText: string;
  cleanText: string;
 } {
-  const retryableMatch = text.match(RETRYABLE_ERROR_MARKER_RE);
-  if (retryableMatch) {
-    return {
-      markerType: "retryable_error",
-      markerText: retryableMatch[1].trim(),
-      cleanText: text.replace(retryableMatch[0], "").trim(),
-    };
-  }
-
  const errorMatch = text.match(ERROR_MARKER_RE);
  if (errorMatch) {
    return {
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/tools/ConnectIntegrationTool/ConnectIntegrationTool.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/tools/ConnectIntegrationTool/ConnectIntegrationTool.tsx
@@ -0,0 +1,104 @@
+"use client";
+
+import type { SetupRequirementsResponse } from "@/app/api/__generated__/models/setupRequirementsResponse";
+import type { ToolUIPart } from "ai";
+import { useState } from "react";
+import { MorphingTextAnimation } from "../../components/MorphingTextAnimation/MorphingTextAnimation";
+import { ContentMessage } from "../../components/ToolAccordion/AccordionContent";
+import { SetupRequirementsCard } from "../RunBlock/components/SetupRequirementsCard/SetupRequirementsCard";
+
+type Props = {
+  part: ToolUIPart;
+};
+
+function parseJson(raw: unknown): unknown {
+  if (typeof raw === "string") {
+    try {
+      return JSON.parse(raw);
+    } catch {
+      return null;
+    }
+  }
+  return raw;
+}
+
+function parseOutput(raw: unknown): SetupRequirementsResponse | null {
+  const parsed = parseJson(raw);
+  if (parsed && typeof parsed === "object" && "setup_info" in parsed) {
+    return parsed as SetupRequirementsResponse;
+  }
+  return null;
+}
+
+function parseError(raw: unknown): string | null {
+  const parsed = parseJson(raw);
+  if (parsed && typeof parsed === "object" && "message" in parsed) {
+    return String((parsed as { message: unknown }).message);
+  }
+  return null;
+}
+
+export function ConnectIntegrationTool({ part }: Props) {
+  // Persist dismissed state here so SetupRequirementsCard remounts don't re-enable Proceed.
+  const [isDismissed, setIsDismissed] = useState(false);
+
+  const isStreaming =
+    part.state === "input-streaming" || part.state === "input-available";
+  const isError = part.state === "output-error";
+
+  const output =
+    part.state === "output-available"
+      ? parseOutput((part as { output?: unknown }).output)
+      : null;
+
+  const errorMessage = isError
+    ? (parseError((part as { output?: unknown }).output) ??
+      "Failed to connect integration")
+    : null;
+
+  const rawProvider =
+    (part as { input?: { provider?: string } }).input?.provider ?? "";
+  const providerName =
+    output?.setup_info?.agent_name ??
+    // Sanitize LLM-controlled provider slug: trim and cap at 64 chars to
+    // prevent runaway text in the DOM.
+    (rawProvider ? rawProvider.trim().slice(0, 64) : "integration");
+
+  const label = isStreaming
+    ? `Connecting ${providerName}…`
+    : isError
+      ? `Failed to connect ${providerName}`
+      : output
+        ? `Connect ${output.setup_info?.agent_name ?? providerName}`
+        : `Connect ${providerName}`;
+
+  return (
+    <div className="py-2">
+      <div className="flex items-center gap-2 text-sm text-muted-foreground">
+        <MorphingTextAnimation
+          text={label}
+          className={isError ? "text-red-500" : undefined}
+        />
+      </div>
+
+      {isError && errorMessage && (
+        <p className="mt-1 text-sm text-red-500">{errorMessage}</p>
+      )}
+
+      {output && (
+        <div className="mt-2">
+          {isDismissed ? (
+            <ContentMessage>Connected. Continuing…</ContentMessage>
+          ) : (
+            <SetupRequirementsCard
+              output={output}
+              credentialsLabel={`${output.setup_info?.agent_name ?? providerName} credentials`}
+              retryInstruction="I've connected my account. Please continue."
+              onComplete={() => setIsDismissed(true)}
+            />
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
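Review note on the parse helpers in ConnectIntegrationTool.tsx: the sketch below reproduces `parseJson` and `parseOutput` from the file above to show how they treat the four input shapes a tool part can carry. `SetupLike` is a hypothetical stand-in for the generated `SetupRequirementsResponse` type, and the sample payloads are made up for illustration.

```typescript
// Hypothetical stand-in for the generated SetupRequirementsResponse type;
// only the field checked by the guard is modelled here.
type SetupLike = { setup_info?: { agent_name?: string } };

function parseJson(raw: unknown): unknown {
  if (typeof raw === "string") {
    try {
      return JSON.parse(raw);
    } catch {
      return null; // malformed JSON is treated as "no output"
    }
  }
  return raw; // already-parsed values pass through untouched
}

function parseOutput(raw: unknown): SetupLike | null {
  const parsed = parseJson(raw);
  if (parsed && typeof parsed === "object" && "setup_info" in parsed) {
    return parsed as SetupLike;
  }
  return null;
}

// Illustrative payloads (assumed, not taken from a real backend response).
const fromString = parseOutput('{"setup_info":{"agent_name":"GitHub"}}');
const fromObject = parseOutput({ setup_info: { agent_name: "GitHub" } });
const fromGarbage = parseOutput("not json");
const fromWrongShape = parseOutput({ message: "boom" });
```

Under these assumptions, the string and object inputs both yield a usable response, while malformed JSON and objects without `setup_info` yield `null`, so the component renders no card for them.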
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/tools/RunBlock/components/SetupRequirementsCard/SetupRequirementsCard.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/tools/RunBlock/components/SetupRequirementsCard/SetupRequirementsCard.tsx
@@ -23,12 +23,16 @@ interface Props {
  /** Override the label shown above the credentials section.
   * Defaults to "Credentials". */
  credentialsLabel?: string;
+  /** Called after Proceed is clicked so the parent can persist the dismissed state
+   * across remounts (avoids re-enabling the Proceed button on remount). */
+  onComplete?: () => void;
 }

 export function SetupRequirementsCard({
  output,
  retryInstruction,
  credentialsLabel,
+  onComplete,
 }: Props) {
  const { onSend } = useCopilotChatActions();

@@ -68,13 +72,17 @@ export function SetupRequirementsCard({
      return v !== undefined && v !== null && v !== "";
    });

+  if (hasSent) {
+    return <ContentMessage>Connected. Continuing…</ContentMessage>;
+  }
+
  const canRun =
-    !hasSent &&
    (!needsCredentials || isAllCredentialsComplete) &&
    (!needsInputs || isAllInputsComplete);

  function handleRun() {
    setHasSent(true);
+    onComplete?.();

    const parts: string[] = [];
    if (needsCredentials) {
--- a/autogpt_platform/frontend/src/components/contextual/CredentialsInput/useCredentialsInput.ts
+++ b/autogpt_platform/frontend/src/components/contextual/CredentialsInput/useCredentialsInput.ts
@@ -125,9 +125,9 @@ export function useCredentialsInput({
      if (hasAttemptedAutoSelect.current) return;
      hasAttemptedAutoSelect.current = true;

-      // Auto-select if exactly one credential matches.
-      // For optional fields with multiple options, let the user choose.
-      if (isOptional && savedCreds.length > 1) return;
+      // Auto-select only when there is exactly one saved credential.
+      // With multiple options the user must choose — regardless of optional/required.
+      if (savedCreds.length > 1) return;

      const cred = savedCreds[0];
      onSelectCredential({
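Review note on the auto-select change in useCredentialsInput.ts: the rule after this hunk can be read as "auto-select only when exactly one credential is saved". The sketch below isolates that rule. `pickAutoSelect` and `SavedCredential` are hypothetical names, and returning `null` for an empty list is an assumption, since the hunk does not show how the hook handles that case.

```typescript
// Hypothetical minimal credential shape, for illustration only.
type SavedCredential = { id: string; title: string };

// Returns the credential to auto-select, or null when the user must choose
// (several saved credentials) or nothing is saved (assumed behaviour).
function pickAutoSelect(
  savedCreds: SavedCredential[],
): SavedCredential | null {
  return savedCreds.length === 1 ? (savedCreds[0] ?? null) : null;
}

const work: SavedCredential = { id: "1", title: "work" };
const personal: SavedCredential = { id: "2", title: "personal" };

const single = pickAutoSelect([work]);
const multiple = pickAutoSelect([work, personal]);
const none = pickAutoSelect([]);
```

The previous condition only skipped auto-selection for optional fields, so a required field with several saved credentials would silently take the first one. The new condition removes that asymmetry.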
Author	SHA1	Message	Date
Zamil Majdy	88eaab2baa	Merge remote-tracking branch 'origin/dev' into feat/github-cli-copilot	2026-03-17 06:17:03 +07:00
Zamil Majdy	4b0a445635	fix(copilot): remove implicit gh auth setup-git from sandbox creation Remove the automatic GitHub credential helper configuration that ran on every E2B sandbox connect/reconnect. This addressed a review concern about implicitly giving AutoPilot full GitHub access without user awareness or opt-in. The bash_exec tool already injects GH_TOKEN/GITHUB_TOKEN per-command for users who have connected their account via connect_integration, which is the explicit opt-in path.	2026-03-17 00:36:51 +07:00
Zamil Majdy	36312d2c6e	fix(backend/copilot): bust cache on OAuth refresh + persist dismissed state - creds_manager: call _bust_copilot_cache after refresh_if_needed persists the refreshed token so the copilot cache doesn't serve a stale access token after silent refresh - ConnectIntegrationTool: lift isDismissed state to parent so SetupRequirementsCard remounts don't re-enable the Proceed button; onComplete callback propagates the dismissed signal up	2026-03-16 17:10:18 +07:00
Zamil Majdy	d6d3b8d710	fix(copilot): address coderabbitai major issues — scope token description to E2B, guard cache-bust hook - connect_integration.py: clarify that GH_TOKEN is injected per-command in E2B/cloud only; note that bubblewrap isolates network so retry won't work - creds_manager._bust_copilot_cache: wrap _on_creds_changed in try/except so a failing hook doesn't turn successful create/update/delete into a 500	2026-03-16 15:52:40 +07:00
Zamil Majdy	17d8d0bf05	fix(backend/copilot): run gh auth setup-git once on sandbox connect/reconnect Move git credential helper setup out of bash_exec (where it ran on every command) and into _setup_e2b so it runs exactly once per sandbox connect or reconnect. Non-fatal: logged at debug level on failure.	2026-03-16 15:45:18 +07:00
Zamil Majdy	5a2ab65f41	fix(backend/copilot): run gh auth setup-git once per sandbox session Use grep to skip re-running if the credential helper is already configured in ~/.gitconfig — only pays the cost on first command. Agent can still call it manually if GH_TOKEN changes mid-session.	2026-03-16 15:42:54 +07:00
Zamil Majdy	81a318de3e	feat(backend/copilot): improve GitHub OAuth UX and git auth - Dynamic OAuth scopes: connect_integration tool now accepts a `scopes` param so the agent can request exactly the access it needs (e.g. `["repo", "read:org"]`); GitHub defaults to `["repo"]` so git push/pull works out of the box instead of public-data-only - Lazy git auth: prepend `gh auth setup-git` on every E2B bash_exec when GH_TOKEN is present — git HTTPS clone/push/pull now work automatically without the agent needing to set this up manually - Prefer broadest-scoped OAuth2 credential: sort repo-scoped tokens first so a stale public-data token is never picked over a full one - Collapse SetupRequirementsCard to "Connected. Continuing…" after Proceed is clicked instead of leaving the full card visible - Fix credential auto-select: don't silently pick the first token when multiple credentials exist — let the user choose via the dropdown	2026-03-16 15:26:14 +07:00
Zamil Majdy	62c8e8634b	fix(copilot): patch _manager singleton directly in tests instead of class constructor The module-level _manager singleton is created at import time, so patching IntegrationCredentialsManager after import has no effect. Patch the _manager attribute directly so get_provider_token uses the mock.	2026-03-16 06:32:36 +07:00
Zamil Majdy	b91c959cd9	fix(copilot): address remaining review findings - creds_manager: fix TOCTOU in delete() — move get_creds_by_id inside the lock - creds_manager: replace lazy import in _bust_copilot_cache with a register_creds_changed_hook() callback so creds_manager has no runtime dependency on the copilot module - integration_creds: register invalidate_user_provider_cache at module import via register_creds_changed_hook() — eliminates the circular-import risk - integration_creds: add module-level _manager singleton (avoids re-instantiating IntegrationCredentialsManager on every cache miss) - integration_creds: document TTLCache asyncio-only thread-safety assumption - connect_integration: defer GITHUB_OAUTH_IS_CONFIGURED evaluation to runtime with an lru_cache'd helper; importing the module no longer triggers Secrets() - connect_integration: type missing_credentials dict with _CredentialEntry TypedDict - connect_integration: cap reason field at 500 chars; add maxLength to JSON schema - bash_exec: use 'user_id is not None' instead of truthy check - connect_integration_test: add test for unauthenticated caller (requires_auth guard) - bash_exec_test: add E2B path tests — token injected when user_id set, skipped when user_id is None - ConnectIntegrationTool.tsx: sanitize LLM-controlled providerName fallback (trim + slice to 64 chars)	2026-03-16 06:21:44 +07:00
Zamil Majdy	5b95a2a1ef	refactor(copilot): strongly type _PROVIDER_INFO with TypedDict Replace dict[str, Any] with a _ProviderInfo TypedDict for provider metadata entries, eliminating key/type drift as new providers are added.	2026-03-16 06:04:02 +07:00
Zamil Majdy	9c2a601167	refactor(copilot): simplify cache with cachetools.TTLCache, fix prompt wording - Replace manual dict+sentinel cache with two TTLCache instances: _token_cache (5min TTL) and _null_cache (60s TTL) - Remove _cache_set helper and _NO_TOKEN sentinel — TTLCache handles expiry and LRU eviction natively - Update tests to use _token_cache/_null_cache directly; add TTL constant test - Change _E2B_TOOL_NOTES from "GH_TOKEN is set" to "gh is pre-authenticated" so the AI doesn't attempt to read the env var directly	2026-03-16 00:16:26 +07:00
Zamil Majdy	b98e37bf23	refactor(copilot): DRY cache-bust helper, fast eviction test, unified JSON parse Backend: - Extract _bust_copilot_cache() in creds_manager.py; create/update/delete now each call it once instead of repeating the try/except ImportError block - test_evicts_oldest_when_full: patch _CACHE_MAX_SIZE to 3 to avoid allocating 10 000 entries in CI; remove now-unused _CACHE_MAX_SIZE import Frontend: - Extract parseJson() helper shared by parseOutput and parseError in ConnectIntegrationTool.tsx, eliminating duplicated try/catch logic	2026-03-16 00:01:10 +07:00
Zamil Majdy	fec8924361	fix(copilot): bust token cache on update/delete, tighten except clause - creds_manager.create/update/delete now all call invalidate_user_provider_cache after mutating credentials, so the next bash_exec always picks up the current state without waiting for TTL to expire - Change broad `except Exception` to `except ImportError` in all three methods so real bugs inside invalidate_user_provider_cache are not silently swallowed - delete() reads the provider before deletion so we know which cache key to evict - Add tests for invalidate_user_provider_cache: removes sentinel/token entry, no-op when key absent, only removes the targeted key	2026-03-15 23:57:11 +07:00
Zamil Majdy	712aee7302	fix(copilot): warn on stale OAuth token fallback, document per-process cache - Log at WARNING (not DEBUG) when OAuth refresh fails and we fall back to a potentially stale token, so operators can diagnose repeated auth failures - Add multi-worker note to module docstring: _token_cache is process-local; each replica maintains its own cache (acceptable for current goal, but a shared cache would be needed for cross-replica efficiency)	2026-03-15 23:53:10 +07:00
Zamil Majdy	bef292033e	fix(copilot): render error state in ConnectIntegrationTool When part.state is 'output-error', show the error message from the backend (ErrorResponse.message) in red text below the status line. Without this, errors from unknown/unsupported providers were silently discarded, leaving the user without any feedback.	2026-03-15 23:51:09 +07:00
Zamil Majdy	ec6974e3b8	fix(copilot): invalidate null cache on credential creation When a user connects an integration, IntegrationCredentialsManager.create() now calls invalidate_user_provider_cache() to remove any stale _NO_TOKEN sentinel from the TTL cache. Without this, the first retry after connecting would still return None for up to _NULL_CACHE_TTL (60 s). The import is done lazily inside create() to avoid a circular import between integrations.creds_manager and copilot.integration_creds.	2026-03-15 23:50:34 +07:00
Zamil Majdy	2ef5e2fe77	feat(copilot): bounded TTL cache with sentinel for integration creds - Replace empty-string sentinel with explicit _NO_TOKEN = object() to avoid ambiguity with zero-length tokens - Bound _token_cache to _CACHE_MAX_SIZE=10_000 entries; _cache_set() evicts oldest insertion-order entry when full - Cache "not connected" results with _NULL_CACHE_TTL=60s (vs 300s for found tokens) to avoid a DB hit on every E2B bash_exec for users who haven't connected yet, while still picking up a new connection quickly - Add integration_creds_test.py covering all cache paths, sentinel, eviction, OAuth2 preferred/fallback, DB exception, and env var injection	2026-03-15 23:46:05 +07:00
Zamil Majdy	0a8c7221ce	fix(copilot): address all review findings - prompting: rename _SDK_TOOL_NOTES → _E2B_TOOL_NOTES; pass it only to _get_cloud_sandbox_supplement() via new extra_notes param — local (bubblewrap) mode uses --unshare-net so gh CLI cannot reach GitHub - integration_creds: cache None results with 60 s TTL (_NULL_CACHE_TTL) to avoid a DB hit on every E2B bash_exec for users without GitHub creds; found tokens still cached for 5 min (_TOKEN_CACHE_TTL) - connect_integration: add cross-reference comment to PROVIDER_ENV_VARS - ConnectIntegrationTool: use provider-specific credentialsLabel (e.g. "GitHub credentials" instead of "Integration credentials")	2026-03-15 23:35:25 +07:00
Zamil Majdy	840d1de636	refactor(copilot): move token injection to bash_exec + add integration_creds module - Extract integration token lookup into backend/copilot/integration_creds.py: * Generic get_provider_token(user_id, provider) with 5-min TTL cache * get_integration_env_vars(user_id) loops over PROVIDER_ENV_VARS registry * Adding a new provider only requires a one-line PROVIDER_ENV_VARS entry - Inject tokens lazily in bash_exec._execute_on_e2b (E2B has internet access; bubblewrap uses --unshare-net so gh CLI cannot reach GitHub regardless) - Remove eager per-turn GH_TOKEN injection from sdk/service.py (wrong layer: bubblewrap is network-isolated, E2B injection now done per-command in bash_exec) - Fix unsafe output.setup_info?.agent_name access in ConnectIntegrationTool - Add connect_integration_test.py: unknown provider, known provider structure, reason in message, session_id propagation, case-insensitive provider slug	2026-03-15 23:29:33 +07:00
Zamil Majdy	ac55ab619b	fix(copilot): address coderabbitai nitpicks - Use `type Props` instead of `interface Props` in ConnectIntegrationTool - Simplify parseOutput: skip stringify→parse round-trip for objects - Document why requires_auth=True despite user_id not being used	2026-03-15 23:14:54 +07:00
Zamil Majdy	a8014d1e92	fix(copilot): address sentry — refresh expired OAuth tokens, handle object output in parseOutput - service.py: use IntegrationCredentialsManager.refresh_if_needed() instead of raw IntegrationCredentialsStore so expired GitHub OAuth tokens are refreshed before injection; falls back to stale token on refresh failure to avoid breaking the turn entirely (lock=False to avoid blocking the turn) - ConnectIntegrationTool.tsx: parseOutput now handles both string and already-parsed object inputs, matching the RunBlock helper pattern used elsewhere in the codebase	2026-03-15 23:06:57 +07:00
Zamil Majdy	7de13c7713	fix(copilot): address self-review — GH_TOKEN OAuth preference, unknown provider error, baseline note scope - service.py: two-pass loop in _get_github_token_for_user() to genuinely prefer OAuth2 tokens over API keys; use creds.type discriminator instead of isinstance to match codebase style - connect_integration.py: return ErrorResponse (not SetupRequirementsResponse) for unknown providers so the frontend renders a proper error instead of a blank broken card; trim "Integration" suffix from agent_name to avoid "Connect GitHub Integration" redundancy - prompting.py: move GitHub CLI / connect_integration guidance from _SHARED_TOOL_NOTES (baseline+SDK) to _SDK_TOOL_NOTES (SDK-only) since baseline mode has no subprocess, no gh CLI, and no connect_integration tool - ConnectIntegrationTool.tsx: simplify parseOutput to short-circuit when raw is not a string, removing unnecessary JSON.stringify round-trip	2026-03-15 23:00:46 +07:00
Zamil Majdy	9358b525a0	feat(copilot): inject GH_TOKEN and add connect_integration tool for missing GitHub credentials When the user has connected GitHub, GH_TOKEN is automatically injected into the Claude Agent SDK subprocess environment so `gh` CLI works without any manual auth step. When GitHub is not connected, the copilot can call the new connect_integration(provider="github") MCP tool, which surfaces the same credentials setup card used by GitHub blocks — letting the user connect their account inline without leaving the chat. - backend: _get_github_token_for_user() fetches the user's GitHub credentials (OAuth2 or API key) and injects GH_TOKEN + GITHUB_TOKEN into sdk_env before the Claude Agent SDK subprocess starts - backend: ConnectIntegrationTool MCP tool returns a SetupRequirementsResponse for any known provider (github for now) - backend: prompting.py documents the gh CLI / connect_integration flow in _SHARED_TOOL_NOTES so the copilot knows when to call it - frontend: ConnectIntegrationTool component renders the existing SetupRequirementsCard with a tailored retry instruction - frontend: MessagePartRenderer dispatches tool-connect_integration to the new component	2026-03-15 22:55:08 +07:00
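Review note on the token cache described in the commits above: found tokens are cached for 5 minutes, "not connected" results for 60 seconds, and credential create/update/delete invalidates the entry. The backend implements this in Python with two `cachetools.TTLCache` instances. The sketch below is not that code; it is a simplified TypeScript illustration that swaps in a single map with a per-entry TTL, and every name in it (`ProviderTokenCache`, `TOKEN_TTL_MS`, `NULL_TTL_MS`) is hypothetical.

```typescript
const TOKEN_TTL_MS = 5 * 60 * 1000; // found tokens: 5 minutes
const NULL_TTL_MS = 60 * 1000; // "not connected" results: 60 seconds

type Entry = { token: string | null; expiresAt: number };

class ProviderTokenCache {
  private readonly entries = new Map<string, Entry>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  private key(userId: string, provider: string): string {
    return `${userId}:${provider}`;
  }

  // Returns undefined on a miss or expired entry, null for a cached
  // "not connected" result, and the token string otherwise.
  get(userId: string, provider: string): string | null | undefined {
    const k = this.key(userId, provider);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(k);
      return undefined;
    }
    return entry.token;
  }

  // Null results get the shorter TTL so a new connection is picked up quickly.
  set(userId: string, provider: string, token: string | null): void {
    const ttl = token === null ? NULL_TTL_MS : TOKEN_TTL_MS;
    this.entries.set(this.key(userId, provider), {
      token,
      expiresAt: this.now() + ttl,
    });
  }

  // Intended to be called when credentials are created, updated, or deleted.
  invalidate(userId: string, provider: string): void {
    this.entries.delete(this.key(userId, provider));
  }
}
```

The shorter TTL for null results bounds how long a user who has just connected an account could keep seeing "not connected", and the explicit invalidation closes that window entirely on the process that handled the credential change. As the commit history notes, such a cache is process-local, so other replicas still rely on TTL expiry.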