fix(copilot): remove redundant success-path transcript upload

The success path always uploaded the resume file (old downloaded data), then the finally block overwrote with the stop hook (new turn data). With always-upload, this caused the smaller stop hook to overwrite larger (but stale) data from the resume file. Remove the success path upload — the finally block handles it correctly by preferring stop hook content and falling back to the resume file when empty.
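The selection rule this message describes (prefer stop hook content, fall back to the resume file when the stop hook is empty) can be sketched as below. This is only an illustration, not the code in `service.py`: `pick_transcript` and its parameter names are hypothetical, and it assumes both sources have already been read into strings, with `None` standing for a missing file.

```python
from typing import Optional


def pick_transcript(
    stop_hook_content: Optional[str],
    resume_file_content: Optional[str],
) -> Optional[str]:
    """Prefer the stop hook transcript; fall back to the resume file if it is empty."""
    # The stop hook holds the data written during the turn that just finished.
    if stop_hook_content and stop_hook_content.strip():
        return stop_hook_content
    # The resume file is the previously downloaded transcript (older data).
    if resume_file_content and resume_file_content.strip():
        return resume_file_content
    # Nothing usable, e.g. a first turn without --resume.
    return None
```

With a single decision point like this, only one upload happens per turn, so a stale source cannot be written first and then replaced.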
fix(copilot): always upload transcript instead of size-based skip
2026-03-17 03:00:27 -04:00 · 2026-03-06 12:45:12 +07:00 · 2026-03-06 02:16:36 +07:00 · 2026-03-05 18:49:41 +00:00 · 2026-03-05 18:44:55 +00:00 · 2026-03-05 15:59:06 +00:00
7 changed files with 47 additions and 93 deletions
--- a/.github/workflows/classic-autogpt-ci.yml
+++ b/.github/workflows/classic-autogpt-ci.yml
@@ -139,7 +139,7 @@ jobs:

      - name: Upload logs to artifact
        if: always()
-        uses: actions/upload-artifact@v7
+        uses: actions/upload-artifact@v4
        with:
          name: test-logs
          path: classic/original_autogpt/logs/
--- a/.github/workflows/classic-forge-ci.yml
+++ b/.github/workflows/classic-forge-ci.yml
@@ -237,7 +237,7 @@ jobs:

      - name: Upload logs to artifact
        if: always()
-        uses: actions/upload-artifact@v7
+        uses: actions/upload-artifact@v4
        with:
          name: test-logs
          path: classic/forge/logs/
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -149,7 +149,7 @@ jobs:
          driver-opts: network=host

      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v3
+        uses: crazy-max/ghaction-github-runtime@v4

      - name: Set up Platform - Build Docker images (with cache)
        working-directory: autogpt_platform
@@ -269,7 +269,7 @@ jobs:

      - name: Upload Playwright report
        if: always()
-        uses: actions/upload-artifact@v7
+        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report
@@ -278,7 +278,7 @@ jobs:

      - name: Upload Playwright test results
        if: always()
-        uses: actions/upload-artifact@v7
+        uses: actions/upload-artifact@v4
        with:
          name: playwright-test-results
          path: test-results
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -111,13 +111,29 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 # Copy poetry (build-time only, for `poetry install --only-root` to create entry points)
 COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
 COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
-# Copy Node.js installation for Prisma
+# Copy Node.js installation for Prisma and agent-browser.
+# npm/npx are symlinks in the builder (-> ../lib/node_modules/npm/bin/*-cli.js);
+# COPY resolves them to regular files, breaking require() paths.  Recreate as
+# proper symlinks so npm/npx can find their modules.
 COPY --from=builder /usr/bin/node /usr/bin/node
 COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
-COPY --from=builder /usr/bin/npm /usr/bin/npm
-COPY --from=builder /usr/bin/npx /usr/bin/npx
+RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
+    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

+# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
+# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
+    libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
+    libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
+    libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
+    fonts-liberation libfontconfig1 \
+    && rm -rf /var/lib/apt/lists/* \
+    && npm install -g agent-browser \
+    && agent-browser install \
+    && rm -rf /tmp/* /root/.npm
+
 WORKDIR /app/autogpt_platform/backend

 # Copy only the .venv from builder (not the entire /app directory)
--- a/autogpt_platform/backend/backend/copilot/model.py
+++ b/autogpt_platform/backend/backend/copilot/model.py
@@ -705,19 +705,10 @@ async def update_session_title(session_id: str, title: str) -> bool:
            logger.warning(f"Session {session_id} not found for title update")
            return False

-        # Update title in cache if it exists (instead of invalidating).
-        # This prevents race conditions where cache invalidation causes
-        # the frontend to see stale DB data while streaming is still in progress.
-        try:
-            cached = await _get_session_from_cache(session_id)
-            if cached:
-                cached.title = title
-                await cache_chat_session(cached)
-        except Exception as e:
-            # Not critical - title will be correct on next full cache refresh
-            logger.warning(
-                f"Failed to update title in cache for session {session_id}: {e}"
-            )
+        # Invalidate the cache so the next access reloads from DB with the
+        # updated title. This avoids a read-modify-write on the full session
+        # blob, which could overwrite concurrent message updates.
+        await invalidate_session_cache(session_id)

        return True
    except Exception as e:
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -1408,46 +1408,11 @@ async def stream_chat_completion_sdk(
            ) and not has_appended_assistant:
                session.messages.append(assistant_response)

-        # --- Upload transcript for next-turn --resume ---
-        # After async with the SDK task group has exited, so the Stop
-        # hook has already fired and the CLI has been SIGTERMed.  The
-        # CLI uses appendFileSync, so all writes are safely on disk.
-        if config.claude_agent_use_resume and user_id:
-            # With --resume the CLI appends to the resume file (most
-            # complete).  Otherwise use the Stop hook path.
-            if use_resume and resume_file:
-                raw_transcript = read_transcript_file(resume_file)
-                logger.debug("[SDK] Transcript source: resume file")
-            elif captured_transcript.path:
-                raw_transcript = read_transcript_file(captured_transcript.path)
-                logger.debug(
-                    "[SDK] Transcript source: stop hook (%s), read result: %s",
-                    captured_transcript.path,
-                    f"{len(raw_transcript)}B" if raw_transcript else "None",
-                )
-            else:
-                raw_transcript = None
-
-            if not raw_transcript:
-                logger.debug(
-                    "[SDK] No usable transcript — CLI file had no "
-                    "conversation entries (expected for first turn "
-                    "without --resume)"
-                )
-
-            if raw_transcript:
-                # Shield the upload from generator cancellation so a
-                # client disconnect / page refresh doesn't lose the
-                # transcript.  The upload must finish even if the SSE
-                # connection is torn down.
-                await asyncio.shield(
-                    _try_upload_transcript(
-                        user_id,
-                        session_id,
-                        raw_transcript,
-                        message_count=len(session.messages),
-                    )
-                )
+        # Transcript upload is handled in the finally block below — it
+        # correctly prefers the stop hook content (new turn data) over the
+        # resume file (old downloaded data).  Uploading here would write
+        # stale data that the finally block then overwrites with potentially
+        # smaller (but newer) stop hook content.

        logger.info(
            "[SDK] [%s] Stream completed successfully with %d messages",
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript.py
@@ -331,10 +331,10 @@ async def upload_transcript(
 ) -> None:
    """Strip progress entries and upload transcript to bucket storage.

-    Safety: only overwrites when the new (stripped) transcript is larger than
-    what is already stored.  Since JSONL is append-only, the latest transcript
-    is always the longest.  This prevents a slow/stale background task from
-    clobbering a newer upload from a concurrent turn.
+    The executor holds a cluster lock per session, so concurrent uploads for
+    the same session cannot happen.  We always overwrite — with ``--resume``
+    the CLI may compact old tool results, so neither byte size nor line count
+    is a reliable proxy for "newer".

    Args:
        message_count: ``len(session.messages)`` at upload time — used by
@@ -353,33 +353,16 @@ async def upload_transcript(
    storage = await get_workspace_storage()
    wid, fid, fname = _storage_path_parts(user_id, session_id)
    encoded = stripped.encode("utf-8")
-    new_size = len(encoded)

-    # Check existing transcript size to avoid overwriting newer with older
-    path = _build_storage_path(user_id, session_id, storage)
-    content_skipped = False
-    try:
-        existing = await storage.retrieve(path)
-        if len(existing) >= new_size:
-            logger.info(
-                f"[Transcript] Skipping content upload — existing ({len(existing)}B) "
-                f">= new ({new_size}B) for session {session_id}"
-            )
-            content_skipped = True
-    except (FileNotFoundError, Exception):
-        pass  # No existing transcript or retrieval error — proceed with upload
+    await storage.store(
+        workspace_id=wid,
+        file_id=fid,
+        filename=fname,
+        content=encoded,
+    )

-    if not content_skipped:
-        await storage.store(
-            workspace_id=wid,
-            file_id=fid,
-            filename=fname,
-            content=encoded,
-        )
-
-    # Always update metadata (even when content is skipped) so message_count
-    # stays current.  The gap-fill logic in _build_query_message relies on
-    # message_count to avoid re-compressing the same messages every turn.
+    # Update metadata so message_count stays current.  The gap-fill logic
+    # in _build_query_message relies on it to avoid re-compressing messages.
    try:
        meta = {"message_count": message_count, "uploaded_at": time.time()}
        mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
@@ -393,9 +376,8 @@ async def upload_transcript(
        logger.warning(f"[Transcript] Failed to write metadata for {session_id}: {e}")

    logger.info(
-        f"[Transcript] Uploaded {new_size}B "
-        f"(stripped from {len(content)}B, msg_count={message_count}, "
-        f"content_skipped={content_skipped}) "
+        f"[Transcript] Uploaded {len(encoded)}B "
+        f"(stripped from {len(content)}B, msg_count={message_count}) "
        f"for session {session_id}"
    )
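To see why the removed size check could leave the stored transcript stale, consider this toy model. It is a sketch under assumptions: `InMemoryStore`, `upload_if_larger`, and `upload_always` are hypothetical stand-ins rather than the real workspace storage API, the JSONL entries are invented, and compaction is simulated simply by making the newer transcript shorter.

```python
class InMemoryStore:
    """Hypothetical stand-in for bucket storage: one blob per session."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}


def upload_if_larger(store: InMemoryStore, session_id: str, content: bytes) -> bool:
    """Removed behaviour: skip the write when the stored blob is at least as large."""
    existing = store.blobs.get(session_id)
    if existing is not None and len(existing) >= len(content):
        return False  # upload skipped, stored transcript stays as it was
    store.blobs[session_id] = content
    return True


def upload_always(store: InMemoryStore, session_id: str, content: bytes) -> bool:
    """New behaviour: overwrite unconditionally (relies on the per-session lock)."""
    store.blobs[session_id] = content
    return True


# Turn 1 contains a large tool result; by turn 2 it has been compacted,
# so the newer transcript is smaller even though it holds more entries.
turn_1 = b'{"type":"tool_result","content":"' + b"x" * 500 + b'"}\n'
turn_2 = (
    b'{"type":"tool_result","content":"[compacted]"}\n'
    b'{"type":"user","content":"next question"}\n'
)

size_gated = InMemoryStore()
upload_if_larger(size_gated, "session-1", turn_1)
turn_2_written = upload_if_larger(size_gated, "session-1", turn_2)

always = InMemoryStore()
upload_always(always, "session-1", turn_1)
upload_always(always, "session-1", turn_2)

print(turn_2_written)                           # False
print(size_gated.blobs["session-1"] == turn_1)  # True: stale transcript kept
print(always.blobs["session-1"] == turn_2)      # True
```

Unconditional overwrite is only safe because, as the updated docstring states, the executor holds a cluster lock per session, so two uploads for the same session cannot interleave.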
Author	SHA1	Message	Date
Zamil Majdy	3cfe2f384a	fix(copilot): remove redundant success-path transcript upload The success path always uploaded the resume file (old downloaded data), then the finally block overwrote with the stop hook (new turn data). With always-upload, this caused the smaller stop hook to overwrite larger (but stale) data from the resume file. Remove the success path upload — the finally block handles it correctly by preferring stop hook content and falling back to the resume file when empty.	2026-03-06 12:45:12 +07:00
Zamil Majdy	fea6711ae7	fix(copilot): always upload transcript instead of size-based skip The size comparison (existing >= new) prevented transcript uploads when the CLI compacted old tool results via --resume, causing the stored transcript to become permanently stale. Since the executor holds a cluster lock per session, concurrent uploads cannot race — just always overwrite.	2026-03-06 02:16:36 +07:00
Zamil Majdy	21c705af6e	fix(backend/copilot): prevent title update from overwriting session messages (#12302) ### Changes 🏗️ Fixes a race condition in `update_session_title()` where the background title generation task could overwrite the Redis session cache with a stale snapshot, causing the copilot to "forget" its previous turns. Root cause: `update_session_title()` performs a read-modify-write on the Redis cache (read full session → set title → write back). Meanwhile, `upsert_chat_session()` writes a newer version with more messages during streaming. If the title task reads early (e.g., 34 messages) and writes late (after streaming persisted 101 messages), the stale 34-message version overwrites the 101-message version. When the next message lands on a different pod, it loads the stale session from Redis. Fix: Replace the read-modify-write with a simple cache invalidation (`invalidate_session_cache`). The title is already updated in the DB; the next access just reloads from DB with the correct title and messages. No locks, no deserialization of the full session blob, no risk of stale overwrites. Evidence from prod logs (session `41a3814c`): - Pod `tm2jb` persisted session with 101 messages - Pod `phflm` loaded session from Redis cache with only 35 messages (66 messages lost) - The title background task ran between these events, overwriting the cache ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/copilot/model_test.py` — 15/15 pass - [x] All pre-commit hooks pass (ruff, black, isort, pyright) - [ ] After deploy: verify long sessions no longer lose context on multi-pod setups	2026-03-05 18:49:41 +00:00
Zamil Majdy	a576be9db2	fix(backend): install agent-browser + Chromium in Docker image (#12301) The Copilot browser tool (`browser_navigate`, `browser_act`, `browser_screenshot`) has been broken on dev because `agent-browser` CLI + Chromium were never installed in the backend Docker image. ### Changes 🏗️ - Added `npx playwright install-deps chromium` to install Chromium runtime libraries (libnss3, libatk, etc.) - Added `npm install -g agent-browser` to install the CLI - Added `agent-browser install` to download the Chromium binary - Layer is placed after existing COPY-from-builder lines to preserve Docker cache ordering ### Root cause Every `browser_navigate` call fails with: ``` WARNING [browser_navigate] open failed for <url>: agent-browser is not installed (run: npm install -g agent-browser && agent-browser install). ``` The error originates from `FileNotFoundError` in `agent_browser.py:101` when the subprocess tries to execute the `agent-browser` binary which doesn't exist in the container. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified `agent-browser` binary is missing from current dev pod via `kubectl logs` - [x] Confirmed session `01eeac29-5a7` shows repeated failures for all URLs - [ ] After deploy: verify browser_navigate works in a Copilot session on dev #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes)	2026-03-05 18:44:55 +00:00
dependabot[bot]	5e90585f10	chore(deps): bump crazy-max/ghaction-github-runtime from 3 to 4 (#12262) Bumps [crazy-max/ghaction-github-runtime](https://github.com/crazy-max/ghaction-github-runtime) from 3 to 4. Release notes (sourced from [crazy-max/ghaction-github-runtime's releases](https://github.com/crazy-max/ghaction-github-runtime/releases)): v3.1.0 - Bump `@actions/core` from 1.10.0 to 1.11.1 in [crazy-max/ghaction-github-runtime#58](https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/58) - Bump braces from 3.0.2 to 3.0.3 in [crazy-max/ghaction-github-runtime#54](https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/54) - Bump cross-spawn from 7.0.3 to 7.0.6 in [crazy-max/ghaction-github-runtime#59](https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/59) - Bump ip from 2.0.0 to 2.0.1 in [crazy-max/ghaction-github-runtime#50](https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/50) - Bump micromatch from 4.0.5 to 4.0.8 in [crazy-max/ghaction-github-runtime#55](https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/55) - Bump tar from 6.1.14 to 6.2.1 in [crazy-max/ghaction-github-runtime#51](https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/51) Full Changelog: https://github.com/crazy-max/ghaction-github-runtime/compare/v3.0.0...v3.1.0 Commits: - `04d248b` Merge pull request [#76](https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/76) from crazy-max/node24 - `c8f8e4e` node 24 as default runtime - `494a382` Merge pull request [#68](https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/68) from crazy-max/dependabot/npm_and_yarn/actions/core-2.0.1 - `5d51b8e` Merge pull request [#74](https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/74) from crazy-max/dependabot/npm_and_yarn/minimatch-3.1.5 - `f7077dc` chore: update generated content - `4d1e035` chore(deps): bump minimatch from 3.1.2 to 3.1.5 - `b59d56d` chore(deps): bump `@actions/core` from 1.11.1 to 2.0.1 - `6d0e2ef` Merge pull request [#75](https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/75) from crazy-max/esm - `41d6f6a` remove codecov config - `b5018ec` chore: update generated content - Additional commits viewable in [compare view](https://github.com/crazy-max/ghaction-github-runtime/compare/v3...v4) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-03-05 15:59:06 +00:00
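The cache race described in the #12302 commit message can be reproduced in miniature. The following is a sketch under assumptions, not the project's code: `Store`, `title_read_modify_write`, `title_invalidate`, and `run` are hypothetical names, an in-process dict stands in for both Redis and the database, and the interleaving is forced with an `asyncio.Event`. The message counts 34 and 101 are the ones quoted in the commit message.

```python
import asyncio
import copy


class Store:
    """Hypothetical stand-in: `db` is the source of truth, `cache` mimics Redis."""

    def __init__(self, n_messages: int) -> None:
        self.db = {"title": None, "messages": [f"m{i}" for i in range(n_messages)]}
        self.cache = copy.deepcopy(self.db)

    def persist_messages(self, messages: list) -> None:
        """Streaming path: write the DB and refresh the cache with the full session."""
        self.db["messages"] = list(messages)
        self.cache = copy.deepcopy(self.db)

    def get_session(self) -> dict:
        """Read through the cache, reloading from the DB after an invalidation."""
        if self.cache is None:
            self.cache = copy.deepcopy(self.db)
        return copy.deepcopy(self.cache)


async def title_read_modify_write(store: Store, title: str, stream_done: asyncio.Event) -> None:
    store.db["title"] = title
    snapshot = store.get_session()  # reads early: 34 messages
    await stream_done.wait()        # streaming persists 101 messages meanwhile
    snapshot["title"] = title
    store.cache = snapshot          # writes late: the stale snapshot wins


async def title_invalidate(store: Store, title: str, stream_done: asyncio.Event) -> None:
    store.db["title"] = title
    await stream_done.wait()
    store.cache = None              # next read reloads title and messages from the DB


async def run(title_task) -> int:
    store = Store(34)
    stream_done = asyncio.Event()
    task = asyncio.create_task(title_task(store, "My chat", stream_done))
    await asyncio.sleep(0)          # let the title task start first
    store.persist_messages([f"m{i}" for i in range(101)])
    stream_done.set()
    await task
    return len(store.get_session()["messages"])


stale_count = asyncio.run(run(title_read_modify_write))
fresh_count = asyncio.run(run(title_invalidate))
print(stale_count, fresh_count)  # 34 101
```

Invalidation trades one extra DB read on the next access for never writing a whole-session snapshot from the title task, which is the design choice the fix makes.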