dx(orchestrate): fix stale-review gate and add pr-test evaluation rules to SKILL.md (#12701)

## Changes

### verify-complete.sh

- CHANGES_REQUESTED reviews are now compared against the latest commit timestamp. If the review was submitted **before** the latest commit, it is treated as stale and does not block verification.
- Added fail-closed guard: if the `gh pr view` fetch fails, the script exits 1 (rather than treating missing data as "no blocking reviews").
- Fixed edge case: a `CHANGES_REQUESTED` review with a null `submittedAt` is now counted as fresh/blocking (previously silently skipped).
- Combined two separate `gh pr view` calls into one (`--json commits,latestReviews`) to reduce API calls and ensure consistency.

### SKILL.md (orchestrate skill)

- Added `### /pr-test result evaluation` section with explicit pass/partial/fail handling table.
- **PARTIAL on any headline feature scenario = immediate blocker**: re-brief the agent, fix, and re-run from scratch. Never approve or output ORCHESTRATOR:DONE with a PARTIAL headline result.
- Concrete incident callout: PR #12699 S5 (Apply suggestions) was PARTIAL (the AI never output JSON action blocks) but was nearly approved. This rule prevents recurrence.
- Updated `verify-complete.sh` description throughout to include "no fresh CHANGES_REQUESTED".
- Added staleness rule documentation: a review only blocks if submitted *after* the latest commit.

## Why

Two separate incidents prompted these changes:

1. **verify-complete.sh false positive**: An automated bot (autogpt-pr-reviewer) submitted a `CHANGES_REQUESTED` review in April. An agent then pushed fixing commits. The old script still blocked on the stale review, preventing the PR from being verified as done.
2. **Missed PARTIAL signal**: PR #12699 had a PARTIAL result on its headline scenario (S5 Apply button) because the AI emitted direct builder tool calls instead of JSON action blocks. The orchestrator nearly approved it. The new SKILL.md rule makes PARTIAL = blocker explicit.
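
The staleness rule described above can be sketched in a few lines of bash. This is a minimal illustration, not the code in `verify-complete.sh`: the script in this PR converts both timestamps to epoch seconds with `date`, whereas this sketch compares the strings directly. That swap assumes both values are ISO-8601 UTC timestamps in the fixed form `YYYY-MM-DDTHH:MM:SSZ`, which sort chronologically as plain strings. The function name `review_blocks` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of verify-complete.sh.
# Exit status 0 means the review is fresh (blocking); 1 means it is stale.
# Assumption: both arguments are ISO-8601 UTC strings like 2026-04-08T01:00:00Z.
review_blocks() {
  local review_ts="$1" commit_ts="$2"
  # Missing data on either side: fail closed and treat the review as blocking.
  if [ -z "$review_ts" ] || [ -z "$commit_ts" ]; then
    return 0
  fi
  # Plain string comparison; valid only because both values share one
  # fixed-width UTC format. A review at the same instant as the commit is stale.
  [[ "$review_ts" > "$commit_ts" ]]
}

# Usage: a review submitted one hour after the latest commit.
if review_blocks "2026-04-08T02:00:00Z" "2026-04-08T01:00:00Z"; then
  echo "blocking"
else
  echo "stale"
fi
```

String comparison sidesteps the GNU/BSD `date` differences the real script has to handle, at the cost of only working when every timestamp uses the same UTC format.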
## Checklist

- [x] I have read the contribution guide
- [x] My changes follow the code style of this project
- [x] Changes are limited to the scope of this PR (< 20% unrelated changes)
- [x] All new and existing tests pass
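
The `/pr-test` rule from the Changes section (only an all-PASS set of headline scenarios qualifies for approval) can be expressed as a small gate. This is a sketch under stated assumptions: the function name `all_headline_pass` and the convention of passing one result string per headline scenario are hypothetical, and nothing like it is added by this PR, which documents the rule in SKILL.md only.

```bash
#!/usr/bin/env bash
# Hypothetical helper; the name and argument convention are assumptions.
# Takes one result string (PASS, PARTIAL, or FAIL) per headline scenario
# and succeeds only when every one of them is exactly PASS.
all_headline_pass() {
  # No results at all is not evidence of success, so fail closed.
  [ "$#" -gt 0 ] || return 1
  local result
  for result in "$@"; do
    # Any PARTIAL or FAIL (or unrecognized value) blocks approval.
    [ "$result" = "PASS" ] || return 1
  done
  return 0
}

# Usage: a mix of PASS and PARTIAL must not qualify for approval.
if all_headline_pass PASS PASS PARTIAL; then
  echo "approve"
else
  echo "re-brief agent"
fi
```

Treating anything other than an explicit `PASS` as a failure keeps the gate consistent with the fail-closed stance taken elsewhere in this change.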
2026-04-08 03:00:28 -04:00 · 2026-04-08 08:58:42 +07:00
8 changed files with 98 additions and 985 deletions
--- a/.claude/skills/orchestrate/SKILL.md
+++ b/.claude/skills/orchestrate/SKILL.md
@@ -25,7 +25,7 @@ STATE_FILE=~/.claude/orchestrator-state.json
 | `spawn-agent.sh SESSION PATH SPARE NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]` | Create window + checkout branch + launch claude + send task. **Stdout: `SESSION:WIN` only** |
 | `recycle-agent.sh WINDOW PATH SPARE_BRANCH` | Kill window + restore spare branch |
 | `run-loop.sh` | **Mechanical babysitter** — idle restart + dialog approval + recycle on ORCHESTRATOR:DONE + supervisor health check + all-done notification |
-| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green. Repo auto-derived from state file `.repo` or git remote. |
+| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green + no fresh CHANGES_REQUESTED. Repo auto-derived from state file `.repo` or git remote. |
 | `notify.sh MESSAGE` | Send notification via Discord webhook (env `DISCORD_WEBHOOK_URL` or state `.discord_webhook`), macOS notification center, and stdout |
 | `capacity.sh [REPO_ROOT]` | Print available + in-use worktrees |
 | `status.sh` | Print fleet status + live pane commands |
@@ -64,7 +64,7 @@ spare/N branch  →  spawn-agent.sh (--session-id UUID)  →  window + feat/bran
                                                                 ↓
                                                        ORCHESTRATOR:DONE
                                                                 ↓
-                                    verify-complete.sh: checkpoints ✓ + 0 threads + CI green
+                                    verify-complete.sh: checkpoints ✓ + 0 threads + CI green + no fresh CHANGES_REQUESTED
                                                                 ↓
                                              state → "done", notify, window KEPT OPEN
                                                                 ↓
@@ -328,7 +328,9 @@ For each agent, decide:

 ### Strict ORCHESTRATOR:DONE gate

-`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CHANGES_REQUESTED, CI green, spawned_at). Run it:
+`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CI green, spawned_at, and CHANGES_REQUESTED). Run it:
+
+**CHANGES_REQUESTED staleness rule**: a `CHANGES_REQUESTED` review only blocks if it was submitted *after* the latest commit. If the latest commit postdates the review, the review is considered stale (feedback already addressed) and does not block. This avoids false negatives when a bot reviewer hasn't re-reviewed after the agent's fixing commits.

 ```bash
 SKILLS_DIR=~/.claude/orchestrator/scripts
@@ -412,6 +414,38 @@ Please verify: <specific behaviors to check>.

 Only one `/pr-test` at a time — they share ports and DB.

+### /pr-test result evaluation
+
+**PARTIAL on any headline feature scenario is an immediate blocker.** Do not approve, do not mark done, do not let the agent output `ORCHESTRATOR:DONE`.
+
+| `/pr-test` result | Action |
+|---|---|
+| All headline scenarios **PASS** | Proceed to evaluation step 2 |
+| Any headline scenario **PARTIAL** | Re-brief the agent immediately — see below |
+| Any headline scenario **FAIL** | Re-brief the agent immediately |
+
+**What PARTIAL means**: the feature is only partly working. Example: the Apply button never appeared, or the AI returned no action blocks. The agent addressed part of the objective but not all of it.
+
+**When any headline scenario is PARTIAL or FAIL:**
+
+1. Do NOT mark the agent done or accept `ORCHESTRATOR:DONE`
+2. Re-brief the agent with the specific scenario that failed and what was missing:
+   ```bash
+   tmux send-keys -t SESSION:WIN "PARTIAL result on /pr-test — S5 (Apply button) never appeared. The AI must output JSON action blocks for the Apply button to render. Fix this before re-running /pr-test."
+   sleep 0.3
+   tmux send-keys -t SESSION:WIN Enter
+   ```
+3. Set state back to `running`:
+   ```bash
+   jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "running"' \
+     ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+   ```
+4. Wait for new `ORCHESTRATOR:DONE`, then re-run `/pr-test` from scratch
+
+**Rule: only ALL-PASS qualifies for approval.** A mix of PASS + PARTIAL is a failure.
+
+> **Why this matters**: PR #12699 was wrongly approved with S5 PARTIAL — the AI never output JSON action blocks so the Apply button never appeared. The fix was already in the agent's reach but slipped through because PARTIAL was not treated as blocking.
+
 ### 2. Do your own evaluation

 1. **Read the PR diff and objective** — does the code actually implement what was asked? Is anything obviously missing or half-done?
@@ -421,8 +455,9 @@ Only one `/pr-test` at a time — they share ports and DB.

 ### 3. Decide

- `/pr-test` passes + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
- `/pr-test` fails or evaluation finds gaps → re-brief the agent with specific failures, set state back to `running`
+- `/pr-test` all scenarios PASS + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
+- `/pr-test` any scenario PARTIAL or FAIL → re-brief the agent with the specific failing scenario, set state back to `running` (see `/pr-test result evaluation` above)
+- Evaluation finds gaps even with all PASS → re-brief the agent with specific gaps, set state back to `running`

 **Never mark done based purely on script output.** You hold the full objective context; the script does not.

@@ -441,6 +476,7 @@ Stop the fleet (`active = false`) when **all** of the following are true:
 | All agents are `done` or `escalated` | `jq '[.agents[] | select(.state | test("running\|stuck\|idle\|waiting_approval"))] | length' ~/.claude/orchestrator-state.json` == 0 |
 | All PRs have 0 unresolved review threads | GraphQL `isResolved` check per PR |
 | All PRs have green CI **on a run triggered after the agent's last push** | `gh run list --branch BRANCH --limit 1` timestamp > `spawned_at` in state |
+| No fresh CHANGES_REQUESTED (after latest commit) | `verify-complete.sh` checks this — stale pre-commit reviews are ignored |
 | No agents are `escalated` without human review | If any are escalated, surface to user first |

 **Do NOT stop just because agents output `ORCHESTRATOR:DONE`.** That is a signal to verify, not a signal to stop.
--- a/.claude/skills/orchestrate/scripts/verify-complete.sh
+++ b/.claude/skills/orchestrate/scripts/verify-complete.sh
@@ -115,13 +115,64 @@ if [ "$UNRESOLVED" -gt 0 ]; then
 fi

 # --- Check 6: no CHANGES_REQUESTED (checked AFTER CI — bots post reviews after their check) ---
-CHANGES_REQUESTED=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
-  --json reviews --jq '[.reviews[] | select(.state == "CHANGES_REQUESTED")] | length' 2>/dev/null || echo "0")
+# A CHANGES_REQUESTED review is stale if the latest commit was pushed AFTER the review was submitted.
+# Stale reviews (pre-dating the fixing commits) should not block verification.
+#
+# Fetch commits and latestReviews in a single call and fail closed — if gh fails,
+# treat that as NOT COMPLETE rather than silently passing.
+# Use latestReviews (not reviews) so each reviewer's latest state is used — superseded
+# CHANGES_REQUESTED entries are automatically excluded when the reviewer later approved.
+# Note: we intentionally use committedDate (not PR updatedAt) because updatedAt changes on any
+# PR activity (bot comments, label changes) which would create false negatives.
+PR_REVIEW_METADATA=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
+  --json commits,latestReviews 2>/dev/null) || {
+  echo "NOT COMPLETE: unable to fetch PR review metadata for PR #$PR_NUMBER" >&2
+  exit 1
+}

-if [ "$CHANGES_REQUESTED" -gt 0 ]; then
-  REQUESTERS=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
-    --json reviews --jq '[.reviews[] | select(.state == "CHANGES_REQUESTED") | .author.login] | join(", ")' 2>/dev/null || echo "unknown")
-  echo "NOT COMPLETE: CHANGES_REQUESTED from ${REQUESTERS} on PR #$PR_NUMBER" >&2
+LATEST_COMMIT_DATE=$(jq -r '.commits[-1].committedDate // ""' <<< "$PR_REVIEW_METADATA")
+CHANGES_REQUESTED_REVIEWS=$(jq '[.latestReviews[]? | select(.state == "CHANGES_REQUESTED")]' <<< "$PR_REVIEW_METADATA")
+
+BLOCKING_CHANGES_REQUESTED=0
+BLOCKING_REQUESTERS=""
+
+if [ -n "$LATEST_COMMIT_DATE" ] && [ "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length)" -gt 0 ]; then
+  if date --version >/dev/null 2>&1; then
+    LATEST_COMMIT_EPOCH=$(date -d "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
+  else
+    LATEST_COMMIT_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
+  fi
+
+  while IFS= read -r review; do
+    [ -z "$review" ] && continue
+    REVIEW_DATE=$(echo "$review" | jq -r '.submittedAt // ""')
+    REVIEWER=$(echo "$review" | jq -r '.author.login // "unknown"')
+    if [ -z "$REVIEW_DATE" ]; then
+      # No submission date — treat as fresh (conservative: blocks verification)
+      BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
+      BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
+    else
+      if date --version >/dev/null 2>&1; then
+        REVIEW_EPOCH=$(date -d "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
+      else
+        REVIEW_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
+      fi
+      if [ "$REVIEW_EPOCH" -gt "$LATEST_COMMIT_EPOCH" ]; then
+        # Review was submitted AFTER latest commit — still fresh, blocks verification
+        BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
+        BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
+      fi
+      # Review submitted BEFORE latest commit — stale, skip
+    fi
+  done <<< "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -c '.[]')"
+else
+  # No commit date or no changes_requested — check raw count as fallback
+  BLOCKING_CHANGES_REQUESTED=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length 2>/dev/null || echo "0")
+  BLOCKING_REQUESTERS=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -r '[.[].author.login] | join(", ")' 2>/dev/null || echo "unknown")
+fi
+
+if [ "$BLOCKING_CHANGES_REQUESTED" -gt 0 ]; then
+  echo "NOT COMPLETE: CHANGES_REQUESTED (after latest commit) from ${BLOCKING_REQUESTERS} on PR #$PR_NUMBER" >&2
  exit 1
 fi

--- a/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/BuilderChatPanel.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/BuilderChatPanel.tsx
@@ -1,317 +0,0 @@
-"use client";
-
-import { Button } from "@/components/atoms/Button/Button";
-import { cn } from "@/lib/utils";
-import {
-  ChatCircle,
-  PaperPlaneTilt,
-  SpinnerGap,
-  StopCircle,
-  X,
-} from "@phosphor-icons/react";
-import { KeyboardEvent, useEffect, useRef, useState } from "react";
-import type { CustomNode } from "../FlowEditor/nodes/CustomNode/CustomNode";
-import { GraphAction } from "./helpers";
-import { useBuilderChatPanel } from "./useBuilderChatPanel";
-
-interface Props {
-  className?: string;
-  isGraphLoaded?: boolean;
-}
-
-export function BuilderChatPanel({ className, isGraphLoaded }: Props) {
-  const {
-    isOpen,
-    handleToggle,
-    messages,
-    sendMessage,
-    stop,
-    status,
-    isCreatingSession,
-    sessionError,
-    sessionId,
-    nodes,
-    parsedActions,
-    handleApplyAction,
-  } = useBuilderChatPanel({ isGraphLoaded });
-
-  const [inputValue, setInputValue] = useState("");
-  const messagesEndRef = useRef<HTMLDivElement>(null);
-  const isStreaming = status === "streaming" || status === "submitted";
-  // Block input until the session is ready to prevent messages being sent
-  // before the seed context has been delivered to the AI.
-  const canSend =
-    Boolean(sessionId) && !isCreatingSession && !sessionError && !isStreaming;
-
-  // Scroll to bottom whenever a new message lands (AI response or user send)
-  useEffect(() => {
-    messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
-  }, [messages.length]);
-
-  function handleSend() {
-    const text = inputValue.trim();
-    if (!text || !canSend) return;
-    setInputValue("");
-    sendMessage({ text });
-    setTimeout(() => {
-      messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
-    }, 50);
-  }
-
-  function handleKeyDown(e: KeyboardEvent<HTMLTextAreaElement>) {
-    if (e.key === "Enter" && !e.shiftKey) {
-      e.preventDefault();
-      handleSend();
-    }
-  }
-
-  return (
-    <div
-      className={cn(
-        "pointer-events-none fixed bottom-4 right-4 z-50 flex flex-col items-end gap-2",
-        className,
-      )}
-    >
-      {isOpen && (
-        <div className="pointer-events-auto flex h-[70vh] w-96 flex-col overflow-hidden rounded-xl border border-slate-200 bg-white shadow-2xl">
-          <PanelHeader onClose={handleToggle} />
-
-          <MessageList
-            messages={messages}
-            isCreatingSession={isCreatingSession}
-            sessionError={sessionError}
-            nodes={nodes}
-            parsedActions={parsedActions}
-            onApplyAction={handleApplyAction}
-            messagesEndRef={messagesEndRef}
-          />
-
-          <PanelInput
-            value={inputValue}
-            onChange={setInputValue}
-            onKeyDown={handleKeyDown}
-            onSend={handleSend}
-            onStop={stop}
-            isStreaming={isStreaming}
-            isDisabled={!canSend}
-          />
-        </div>
-      )}
-
-      <button
-        onClick={handleToggle}
-        className={cn(
-          "pointer-events-auto flex h-12 w-12 items-center justify-center rounded-full shadow-lg transition-colors",
-          isOpen
-            ? "bg-slate-800 text-white hover:bg-slate-700"
-            : "border border-slate-200 bg-white text-slate-700 hover:bg-slate-50",
-        )}
-        aria-label={isOpen ? "Close chat" : "Chat with builder"}
-      >
-        {isOpen ? <X size={20} /> : <ChatCircle size={22} weight="fill" />}
-      </button>
-    </div>
-  );
-}
-
-function PanelHeader({ onClose }: { onClose: () => void }) {
-  return (
-    <div className="flex items-center justify-between border-b border-slate-100 px-4 py-3">
-      <div className="flex items-center gap-2">
-        <ChatCircle size={18} weight="fill" className="text-violet-600" />
-        <span className="text-sm font-semibold text-slate-800">
-          Chat with Builder
-        </span>
-      </div>
-      <Button variant="icon" size="icon" onClick={onClose} aria-label="Close">
-        <X size={16} />
-      </Button>
-    </div>
-  );
-}
-
-interface MessageListProps {
-  messages: ReturnType<typeof useBuilderChatPanel>["messages"];
-  isCreatingSession: boolean;
-  sessionError: boolean;
-  nodes: CustomNode[];
-  parsedActions: GraphAction[];
-  onApplyAction: (action: GraphAction) => void;
-  messagesEndRef: React.RefObject<HTMLDivElement>;
-}
-
-function MessageList({
-  messages,
-  isCreatingSession,
-  sessionError,
-  nodes,
-  parsedActions,
-  onApplyAction,
-  messagesEndRef,
-}: MessageListProps) {
-  return (
-    <div className="flex-1 space-y-3 overflow-y-auto p-4">
-      {isCreatingSession && (
-        <div className="flex items-center gap-2 text-xs text-slate-500">
-          <SpinnerGap size={14} className="animate-spin" />
-          <span>Setting up chat session…</span>
-        </div>
-      )}
-
-      {sessionError && (
-        <div className="rounded-lg border border-red-100 bg-red-50 px-3 py-2 text-xs text-red-600">
-          Failed to start chat session. Please close and try again.
-        </div>
-      )}
-
-      {messages.map((msg) => {
-        const textParts = msg.parts
-          .filter(
-            (p): p is Extract<typeof p, { type: "text" }> => p.type === "text",
-          )
-          .map((p) => p.text)
-          .join("");
-
-        if (!textParts) return null;
-
-        return (
-          <div
-            key={msg.id}
-            className={cn(
-              "max-w-[85%] rounded-lg px-3 py-2 text-sm leading-relaxed",
-              msg.role === "user"
-                ? "ml-auto bg-violet-600 text-white"
-                : "bg-slate-100 text-slate-800",
-            )}
-          >
-            {textParts}
-          </div>
-        );
-      })}
-
-      {parsedActions.length > 0 && (
-        <div className="space-y-2 rounded-lg border border-violet-100 bg-violet-50 p-3">
-          <p className="text-xs font-medium text-violet-700">
-            AI applied these changes
-          </p>
-          {parsedActions.map((action) => {
-            const key =
-              action.type === "update_node_input"
-                ? `${action.nodeId}:${action.key}`
-                : `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`;
-            return (
-              <ActionItem
-                key={key}
-                action={action}
-                nodes={nodes}
-                onApply={() => onApplyAction(action)}
-              />
-            );
-          })}
-        </div>
-      )}
-
-      <div ref={messagesEndRef} />
-    </div>
-  );
-}
-
-function ActionItem({
-  action,
-  nodes,
-  onApply,
-}: {
-  action: GraphAction;
-  nodes: CustomNode[];
-  onApply: () => void;
-}) {
-  // The AI applies changes server-side via edit_agent; the canvas refreshes
-  // automatically via invalidateQueries. The button starts in the applied state
-  // to reflect that changes are already live — not pending user confirmation.
-  const [applied, setApplied] = useState(true);
-
-  function handleApply() {
-    onApply();
-    setApplied(true);
-  }
-
-  const nodeName = (id: string) =>
-    nodes.find((n) => n.id === id)?.data.title ?? id;
-
-  const label =
-    action.type === "update_node_input"
-      ? `Set "${nodeName(action.nodeId)}" "${action.key}" = ${JSON.stringify(action.value)}`
-      : `Connect "${nodeName(action.source)}" → "${nodeName(action.target)}"`;
-
-  return (
-    <div className="flex items-start justify-between gap-2 rounded bg-white p-2 text-xs shadow-sm">
-      <span className="leading-tight text-slate-700">{label}</span>
-      <button
-        onClick={handleApply}
-        disabled={applied}
-        className={cn(
-          "shrink-0 rounded px-2 py-0.5 text-xs font-medium transition-colors",
-          applied
-            ? "bg-green-100 text-green-700"
-            : "bg-violet-600 text-white hover:bg-violet-700",
-        )}
-      >
-        {applied ? "Applied" : "Apply"}
-      </button>
-    </div>
-  );
-}
-
-interface PanelInputProps {
-  value: string;
-  onChange: (v: string) => void;
-  onKeyDown: (e: KeyboardEvent<HTMLTextAreaElement>) => void;
-  onSend: () => void;
-  onStop: () => void;
-  isStreaming: boolean;
-  isDisabled: boolean;
-}
-
-function PanelInput({
-  value,
-  onChange,
-  onKeyDown,
-  onSend,
-  onStop,
-  isStreaming,
-  isDisabled,
-}: PanelInputProps) {
-  return (
-    <div className="border-t border-slate-100 p-3">
-      <div className="flex items-end gap-2">
-        <textarea
-          value={value}
-          disabled={isDisabled}
-          onChange={(e) => onChange(e.target.value)}
-          onKeyDown={onKeyDown}
-          placeholder="Ask about your agent…"
-          rows={2}
-          className="flex-1 resize-none rounded-lg border border-slate-200 bg-slate-50 px-3 py-2 text-sm text-slate-800 placeholder:text-slate-400 focus:border-violet-400 focus:outline-none focus:ring-1 focus:ring-violet-200 disabled:opacity-50"
-        />
-        {isStreaming ? (
-          <button
-            onClick={onStop}
-            className="flex h-9 w-9 items-center justify-center rounded-lg bg-red-100 text-red-600 transition-colors hover:bg-red-200"
-            aria-label="Stop"
-          >
-            <StopCircle size={18} />
-          </button>
-        ) : (
-          <button
-            onClick={onSend}
-            disabled={isDisabled || !value.trim()}
-            className="flex h-9 w-9 items-center justify-center rounded-lg bg-violet-600 text-white transition-colors hover:bg-violet-700 disabled:opacity-40"
-            aria-label="Send"
-          >
-            <PaperPlaneTilt size={18} />
-          </button>
-        )}
-      </div>
-    </div>
-  );
-}
--- a/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/tests/BuilderChatPanel.test.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/tests/BuilderChatPanel.test.tsx
@@ -1,316 +0,0 @@
-import {
-  render,
-  screen,
-  fireEvent,
-  cleanup,
-} from "@/tests/integrations/test-utils";
-import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
-import { BuilderChatPanel } from "../BuilderChatPanel";
-import { serializeGraphForChat, parseGraphActions } from "../helpers";
-import type { CustomNode } from "../../FlowEditor/nodes/CustomNode/CustomNode";
-import type { CustomEdge } from "../../FlowEditor/edges/CustomEdge";
-
-// Mock the hook so we isolate the component rendering
-vi.mock("../useBuilderChatPanel", () => ({
-  useBuilderChatPanel: vi.fn(),
-}));
-
-import { useBuilderChatPanel } from "../useBuilderChatPanel";
-
-const mockUseBuilderChatPanel = vi.mocked(useBuilderChatPanel);
-
-function makeMockHook(
-  overrides: Partial<ReturnType<typeof useBuilderChatPanel>> = {},
-): ReturnType<typeof useBuilderChatPanel> {
-  return {
-    isOpen: false,
-    handleToggle: vi.fn(),
-    messages: [],
-    sendMessage: vi.fn(),
-    stop: vi.fn(),
-    status: "ready",
-    isCreatingSession: false,
-    sessionError: false,
-    sessionId: null,
-    nodes: [],
-    parsedActions: [],
-    handleApplyAction: vi.fn(),
-    ...overrides,
-  };
-}
-
-beforeEach(() => {
-  mockUseBuilderChatPanel.mockReturnValue(makeMockHook());
-});
-
-afterEach(() => {
-  cleanup();
-});
-
-describe("BuilderChatPanel", () => {
-  it("renders the toggle button when closed", () => {
-    render(<BuilderChatPanel />);
-    expect(screen.getByLabelText("Chat with builder")).toBeDefined();
-  });
-
-  it("does not render the panel content when closed", () => {
-    render(<BuilderChatPanel />);
-    expect(screen.queryByText("Chat with Builder")).toBeNull();
-  });
-
-  it("calls handleToggle when the toggle button is clicked", () => {
-    const handleToggle = vi.fn();
-    mockUseBuilderChatPanel.mockReturnValue(makeMockHook({ handleToggle }));
-    render(<BuilderChatPanel />);
-    fireEvent.click(screen.getByLabelText("Chat with builder"));
-    expect(handleToggle).toHaveBeenCalledOnce();
-  });
-
-  it("renders the panel when isOpen is true", () => {
-    mockUseBuilderChatPanel.mockReturnValue(makeMockHook({ isOpen: true }));
-    render(<BuilderChatPanel />);
-    expect(screen.getByText("Chat with Builder")).toBeDefined();
-  });
-
-  it("shows creating session indicator when isCreatingSession is true", () => {
-    mockUseBuilderChatPanel.mockReturnValue(
-      makeMockHook({ isOpen: true, isCreatingSession: true }),
-    );
-    render(<BuilderChatPanel />);
-    expect(screen.getByText(/Setting up chat session/i)).toBeDefined();
-  });
-
-  it("renders user and assistant messages", () => {
-    mockUseBuilderChatPanel.mockReturnValue(
-      makeMockHook({
-        isOpen: true,
-        messages: [
-          {
-            id: "1",
-            role: "user",
-            parts: [{ type: "text", text: "What does this agent do?" }],
-          },
-          {
-            id: "2",
-            role: "assistant",
-            parts: [{ type: "text", text: "This agent searches the web." }],
-          },
-        ] as ReturnType<typeof useBuilderChatPanel>["messages"],
-      }),
-    );
-    render(<BuilderChatPanel />);
-    expect(screen.getByText("What does this agent do?")).toBeDefined();
-    expect(screen.getByText("This agent searches the web.")).toBeDefined();
-  });
-
-  it("renders applied actions section when parsedActions are present", () => {
-    mockUseBuilderChatPanel.mockReturnValue(
-      makeMockHook({
-        isOpen: true,
-        parsedActions: [
-          {
-            type: "update_node_input",
-            nodeId: "1",
-            key: "query",
-            value: "AI news",
-          },
-        ],
-      }),
-    );
-    render(<BuilderChatPanel />);
-    expect(screen.getByText("AI applied these changes")).toBeDefined();
-    expect(screen.getByText("Applied")).toBeDefined();
-  });
-
-  it("shows pre-applied actions as disabled", () => {
-    const action = {
-      type: "update_node_input" as const,
-      nodeId: "1",
-      key: "query",
-      value: "AI news",
-    };
-    mockUseBuilderChatPanel.mockReturnValue(
-      makeMockHook({
-        isOpen: true,
-        parsedActions: [action],
-      }),
-    );
-    render(<BuilderChatPanel />);
-    const button = screen.getByRole("button", {
-      name: "Applied",
-    }) as HTMLButtonElement;
-    expect(button.disabled).toBe(true);
-  });
-
-  it("calls sendMessage when the user submits a message", () => {
-    const sendMessage = vi.fn();
-    mockUseBuilderChatPanel.mockReturnValue(
-      makeMockHook({ isOpen: true, sessionId: "sess-1", sendMessage }),
-    );
-    render(<BuilderChatPanel />);
-    const textarea = screen.getByPlaceholderText("Ask about your agent…");
-    fireEvent.change(textarea, { target: { value: "Add a summarizer block" } });
-    fireEvent.click(screen.getByLabelText("Send"));
-    expect(sendMessage).toHaveBeenCalledWith({
-      text: "Add a summarizer block",
-    });
-  });
-
-  it("shows Stop button when streaming", () => {
-    const stop = vi.fn();
-    mockUseBuilderChatPanel.mockReturnValue(
-      makeMockHook({ isOpen: true, status: "streaming", stop }),
-    );
-    render(<BuilderChatPanel />);
-    expect(screen.getByLabelText("Stop")).toBeDefined();
-    fireEvent.click(screen.getByLabelText("Stop"));
-    expect(stop).toHaveBeenCalledOnce();
-  });
-});
-
-describe("serializeGraphForChat", () => {
-  it("returns empty message when no nodes", () => {
-    const result = serializeGraphForChat([], []);
-    expect(result).toBe("The graph is currently empty.");
-  });
-
-  it("lists block names and descriptions", () => {
-    const nodes = [
-      {
-        id: "1",
-        data: {
-          title: "Google Search",
-          description: "Searches the web",
-          hardcodedValues: {},
-          inputSchema: {},
-          outputSchema: {},
-          uiType: 1,
-          block_id: "block-1",
-          costs: [],
-          categories: [],
-        },
-        type: "custom" as const,
-        position: { x: 0, y: 0 },
-      },
-    ] as unknown as CustomNode[];
-
-    const result = serializeGraphForChat(nodes, []);
-    expect(result).toContain('"Google Search"');
-    expect(result).toContain("Searches the web");
-  });
-
-  it("lists connections between nodes", () => {
-    const nodes = [
-      {
-        id: "1",
-        data: {
-          title: "Search",
-          description: "",
-          hardcodedValues: {},
-          inputSchema: {},
-          outputSchema: {},
-          uiType: 1,
-          block_id: "b1",
-          costs: [],
-          categories: [],
-        },
-        type: "custom" as const,
-        position: { x: 0, y: 0 },
-      },
-      {
-        id: "2",
-        data: {
-          title: "Formatter",
-          description: "",
-          hardcodedValues: {},
-          inputSchema: {},
-          outputSchema: {},
-          uiType: 1,
-          block_id: "b2",
-          costs: [],
-          categories: [],
-        },
-        type: "custom" as const,
-        position: { x: 200, y: 0 },
-      },
-    ] as unknown as CustomNode[];
-
-    const edges = [
-      {
-        id: "1:result->2:input",
-        source: "1",
-        target: "2",
-        sourceHandle: "result",
-        targetHandle: "input",
-        type: "custom" as const,
-      },
-    ] as unknown as CustomEdge[];
-
-    const result = serializeGraphForChat(nodes, edges);
-    expect(result).toContain("Connections");
-    expect(result).toContain('"Search"');
-    expect(result).toContain('"Formatter"');
-  });
-});
-
-describe("parseGraphActions", () => {
-  it("returns empty array for plain text", () => {
-    expect(parseGraphActions("This agent searches the web.")).toEqual([]);
-  });
-
-  it("parses update_node_input action", () => {
-    const text = `
-Here is a suggestion:
-\`\`\`json
-{"action": "update_node_input", "node_id": "1", "key": "query", "value": "AI news"}
-\`\`\`
-    `;
-    const actions = parseGraphActions(text);
-    expect(actions).toHaveLength(1);
-    expect(actions[0]).toEqual({
-      type: "update_node_input",
-      nodeId: "1",
-      key: "query",
-      value: "AI news",
-    });
-  });
-
-  it("parses connect_nodes action", () => {
-    const text = `
-\`\`\`json
-{"action": "connect_nodes", "source": "1", "target": "2", "source_handle": "result", "target_handle": "input"}
-\`\`\`
-    `;
-    const actions = parseGraphActions(text);
-    expect(actions).toHaveLength(1);
-    expect(actions[0]).toEqual({
-      type: "connect_nodes",
-      source: "1",
-      target: "2",
-      sourceHandle: "result",
-      targetHandle: "input",
-    });
-  });
-
-  it("ignores invalid JSON blocks", () => {
-    const text = "```json\nnot valid json\n```";
-    expect(parseGraphActions(text)).toEqual([]);
-  });
-
-  it("ignores blocks without action field", () => {
-    const text = '```json\n{"key": "value"}\n```';
-    expect(parseGraphActions(text)).toEqual([]);
-  });
-
-  it("ignores update_node_input actions with missing required fields", () => {
-    const text =
-      '```json\n{"action": "update_node_input", "node_id": "1"}\n```';
-    expect(parseGraphActions(text)).toEqual([]);
-  });
-
-  it("ignores connect_nodes actions with empty handles", () => {
-    const text =
-      '```json\n{"action": "connect_nodes", "source": "1", "target": "2", "source_handle": "", "target_handle": "input"}\n```';
-    expect(parseGraphActions(text)).toEqual([]);
-  });
-});
--- a/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/helpers.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/helpers.ts
@@ -1,110 +0,0 @@
-import type { CustomNode } from "../FlowEditor/nodes/CustomNode/CustomNode";
-import type { CustomEdge } from "../FlowEditor/edges/CustomEdge";
-
-export type GraphAction =
-  | {
-      type: "update_node_input";
-      nodeId: string;
-      key: string;
-      value: unknown;
-    }
-  | {
-      type: "connect_nodes";
-      source: string;
-      target: string;
-      sourceHandle: string;
-      targetHandle: string;
-    };
-
-export function serializeGraphForChat(
-  nodes: CustomNode[],
-  edges: CustomEdge[],
-): string {
-  if (nodes.length === 0) return "The graph is currently empty.";
-
-  const nodeLines = nodes.map((n) => {
-    const name = n.data.metadata?.customized_name || n.data.title;
-    const desc = n.data.description ? ` — ${n.data.description}` : "";
-    return `- Node ${n.id}: "${name}"${desc}`;
-  });
-
-  const edgeLines = edges.map((e) => {
-    const src = nodes.find((n) => n.id === e.source);
-    const tgt = nodes.find((n) => n.id === e.target);
-    const srcName =
-      src?.data.metadata?.customized_name || src?.data.title || e.source;
-    const tgtName =
-      tgt?.data.metadata?.customized_name || tgt?.data.title || e.target;
-    return `- "${srcName}" (${e.sourceHandle}) → "${tgtName}" (${e.targetHandle})`;
-  });
-
-  const parts = [`Blocks (${nodes.length}):\n${nodeLines.join("\n")}`];
-  if (edgeLines.length > 0) {
-    parts.push(`Connections (${edges.length}):\n${edgeLines.join("\n")}`);
-  }
-  return parts.join("\n\n");
-}
-
-export function parseGraphActions(text: string): GraphAction[] {
-  const actions: GraphAction[] = [];
-  const jsonBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?```/g;
-  let match: RegExpExecArray | null;
-
-  while ((match = jsonBlockRegex.exec(text)) !== null) {
-    try {
-      const parsed = JSON.parse(match[1]) as unknown;
-      if (
-        typeof parsed !== "object" ||
-        parsed === null ||
-        !("action" in parsed)
-      ) {
-        continue;
-      }
-      const obj = parsed as Record<string, unknown>;
-      if (obj.action === "update_node_input") {
-        const nodeId = obj.node_id;
-        const key = obj.key;
-        if (
-          typeof nodeId !== "string" ||
-          !nodeId ||
-          typeof key !== "string" ||
-          !key ||
-          obj.value === undefined
-        )
-          continue;
-        actions.push({
-          type: "update_node_input",
-          nodeId,
-          key,
-          value: obj.value,
-        });
-      } else if (obj.action === "connect_nodes") {
-        const source = obj.source;
-        const target = obj.target;
-        const sourceHandle = obj.source_handle;
-        const targetHandle = obj.target_handle;
-        if (
-          typeof source !== "string" ||
-          !source ||
-          typeof target !== "string" ||
-          !target ||
-          typeof sourceHandle !== "string" ||
-          !sourceHandle ||
-          typeof targetHandle !== "string" ||
-          !targetHandle
-        )
-          continue;
-        actions.push({
-          type: "connect_nodes",
-          source,
-          target,
-          sourceHandle,
-          targetHandle,
-        });
-      }
-    } catch {
-      // Not valid JSON, skip
-    }
-  }
-  return actions;
-}
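Reviewer note on the removed `parseGraphActions`: it only recognized actions that arrived as fenced JSON blocks containing an `action` key, so an assistant turn with no such blocks produced an empty list. The extraction step can be sketched roughly as below. `extractActionObjects` and `ActionObject` are hypothetical names used only for illustration, the per-field validation from the original is omitted, and the fence pattern is rebuilt from a repeated backtick purely to keep this note readable.

```typescript
// Illustrative sketch only: `extractActionObjects` is a hypothetical name,
// not a function in the codebase. It mirrors the extraction step of the
// removed helper but skips the per-field validation.
type ActionObject = Record<string, unknown> & { action: unknown };

function extractActionObjects(text: string): ActionObject[] {
  const found: ActionObject[] = [];
  // Equivalent to the removed helper's pattern: a triple-backtick fence with
  // an optional "json" tag, capturing the fence body lazily.
  const tick = "`".repeat(3);
  const fence = new RegExp(
    tick + "(?:json)?\\s*\\n?([\\s\\S]*?)\\n?" + tick,
    "g",
  );
  let match: RegExpExecArray | null;
  while ((match = fence.exec(text)) !== null) {
    try {
      const parsed: unknown = JSON.parse(match[1]);
      if (typeof parsed === "object" && parsed !== null && "action" in parsed) {
        found.push(parsed as ActionObject);
      }
    } catch {
      // Body was not valid JSON: skip the block, as the original did.
    }
  }
  return found;
}
```

Plain prose, or a fenced block whose body is not JSON, yields no entries, which matches the behaviour the deleted tests asserted.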
--- a/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/useBuilderChatPanel.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/build/components/BuilderChatPanel/useBuilderChatPanel.ts
@@ -1,222 +0,0 @@
-import { postV2CreateSession } from "@/app/api/__generated__/endpoints/chat/chat";
-import { getGetV1GetSpecificGraphQueryKey } from "@/app/api/__generated__/endpoints/graphs/graphs";
-import { getWebSocketToken } from "@/lib/supabase/actions";
-import { environment } from "@/services/environment";
-import { useQueryClient } from "@tanstack/react-query";
-import { useChat } from "@ai-sdk/react";
-import { DefaultChatTransport } from "ai";
-import { useEffect, useMemo, useRef, useState } from "react";
-import { parseAsString, useQueryStates } from "nuqs";
-import { useShallow } from "zustand/react/shallow";
-import { useEdgeStore } from "../../stores/edgeStore";
-import { useNodeStore } from "../../stores/nodeStore";
-import {
-  GraphAction,
-  parseGraphActions,
-  serializeGraphForChat,
-} from "./helpers";
-
-type SendMessageFn = ReturnType<typeof useChat>["sendMessage"];
-
-interface UseBuilderChatPanelArgs {
-  isGraphLoaded?: boolean;
-}
-
-export function useBuilderChatPanel({
-  isGraphLoaded = true,
-}: UseBuilderChatPanelArgs = {}) {
-  const [isOpen, setIsOpen] = useState(false);
-  const [sessionId, setSessionId] = useState<string | null>(null);
-  const [isCreatingSession, setIsCreatingSession] = useState(false);
-  const [sessionError, setSessionError] = useState(false);
-  const initializedRef = useRef(false);
-  const sendMessageRef = useRef<SendMessageFn | null>(null);
-  const prevStatusRef = useRef<string>("ready");
-
-  const [{ flowID }] = useQueryStates({ flowID: parseAsString });
-  const queryClient = useQueryClient();
-
-  const nodes = useNodeStore(useShallow((s) => s.nodes));
-  const edges = useEdgeStore(useShallow((s) => s.edges));
-  const updateNodeData = useNodeStore(useShallow((s) => s.updateNodeData));
-  const addEdge = useEdgeStore(useShallow((s) => s.addEdge));
-
-  // Reset session and initialized state when the user navigates to a different
-  // graph so the new graph's context is sent to the AI on next open.
-  useEffect(() => {
-    setSessionId(null);
-    setSessionError(false);
-    initializedRef.current = false;
-  }, [flowID]);
-
-  useEffect(() => {
-    if (!isOpen || sessionId || isCreatingSession || sessionError) return;
-
-    async function createSession() {
-      setIsCreatingSession(true);
-      try {
-        const res = await postV2CreateSession(null);
-        if (res.status === 200) {
-          setSessionId(res.data.id);
-        } else {
-          setSessionError(true);
-        }
-      } catch {
-        setSessionError(true);
-      } finally {
-        setIsCreatingSession(false);
-      }
-    }
-
-    createSession();
-  }, [isOpen, sessionId, isCreatingSession, sessionError]);
-
-  const transport = useMemo(
-    () =>
-      sessionId
-        ? new DefaultChatTransport({
-            api: `${environment.getAGPTServerBaseUrl()}/api/chat/sessions/${sessionId}/stream`,
-            prepareSendMessagesRequest: async ({ messages }) => {
-              const last = messages[messages.length - 1];
-              const { token, error } = await getWebSocketToken();
-              if (error || !token)
-                throw new Error(
-                  "Authentication failed — please sign in again.",
-                );
-              const messageText =
-                last.parts
-                  ?.map((p) => (p.type === "text" ? p.text : ""))
-                  .join("") ?? "";
-              return {
-                body: {
-                  message: messageText,
-                  is_user_message: last.role === "user",
-                  context: null,
-                  file_ids: null,
-                  mode: null,
-                },
-                headers: { Authorization: `Bearer ${token}` },
-              };
-            },
-          })
-        : null,
-    [sessionId],
-  );
-
-  const { messages, sendMessage, stop, status } = useChat({
-    id: sessionId ?? undefined,
-    transport: transport ?? undefined,
-  });
-
-  // Keep a stable ref so the initialization effect can call sendMessage
-  // without including it in the deps array (avoids re-triggering the effect)
-  sendMessageRef.current = sendMessage;
-
-  // Parsed actions from the last assistant message. Placed before the
-  // invalidation effect so the effect can check whether a turn mutated the graph.
-  const parsedActions = useMemo(() => {
-    const assistantMessages = messages.filter((m) => m.role === "assistant");
-    const last = assistantMessages[assistantMessages.length - 1];
-    if (!last) return [];
-    const text = last.parts
-      .filter(
-        (p): p is Extract<typeof p, { type: "text" }> => p.type === "text",
-      )
-      .map((p) => p.text)
-      .join("");
-    const parsed = parseGraphActions(text);
-    const seen = new Set<string>();
-    return parsed.filter((action) => {
-      const key =
-        action.type === "update_node_input"
-          ? `${action.nodeId}:${action.key}`
-          : `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`;
-      if (seen.has(key)) return false;
-      seen.add(key);
-      return true;
-    });
-  }, [messages]);
-
-  // Refresh the canvas only when the AI turn actually mutated the graph via
-  // edit_agent. Gating on parsedActions.length > 0 avoids an unnecessary
-  // refetch after read-only turns (e.g. the initial description response).
-  useEffect(() => {
-    const prev = prevStatusRef.current;
-    prevStatusRef.current = status;
-    if (
-      status === "ready" &&
-      (prev === "streaming" || prev === "submitted") &&
-      flowID &&
-      parsedActions.length > 0
-    ) {
-      queryClient.invalidateQueries({
-        queryKey: getGetV1GetSpecificGraphQueryKey(flowID),
-      });
-    }
-  }, [status, flowID, queryClient, parsedActions.length]);
-
-  useEffect(() => {
-    if (!sessionId || !transport || !isGraphLoaded || initializedRef.current)
-      return;
-    initializedRef.current = true;
-    const summary = serializeGraphForChat(nodes, edges);
-    sendMessageRef.current?.({
-      text:
-        `I'm building an agent in the AutoGPT flow builder. Here's the current graph:\n\n${summary}\n\n` +
-        `IMPORTANT: When you modify the graph using edit_agent or fix_agent_graph, you MUST output one JSON ` +
-        `code block per change using EXACTLY these formats — no other structure is recognized:\n\n` +
-        `To update a node input field:\n` +
-        `\`\`\`json\n{"action": "update_node_input", "node_id": "<exact node id>", "key": "<input field name>", "value": <new value>}\n\`\`\`\n\n` +
-        `To add a connection between nodes:\n` +
-        `\`\`\`json\n{"action": "connect_nodes", "source": "<source node id>", "target": "<target node id>", "source_handle": "<output handle name>", "target_handle": "<input handle name>"}\n\`\`\`\n\n` +
-        `Rules: the "action" key is required and must be exactly "update_node_input" or "connect_nodes". ` +
-        `Do not use any other field names (e.g. "block", "change", "field", "from", "to" are NOT valid).\n\n` +
-        `What does this agent do?`,
-    });
-  }, [sessionId, transport, isGraphLoaded]);
-
-  function handleToggle() {
-    // Reset session error when reopening so the panel can retry session creation
-    if (!isOpen && !sessionId) {
-      setSessionError(false);
-    }
-    setIsOpen((o) => !o);
-  }
-
-  function handleApplyAction(action: GraphAction) {
-    if (action.type === "update_node_input") {
-      const node = nodes.find((n) => n.id === action.nodeId);
-      if (!node) return;
-      updateNodeData(action.nodeId, {
-        hardcodedValues: {
-          ...node.data.hardcodedValues,
-          [action.key]: action.value,
-        },
-      });
-    } else if (action.type === "connect_nodes") {
-      addEdge({
-        id: `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`,
-        source: action.source,
-        target: action.target,
-        sourceHandle: action.sourceHandle,
-        targetHandle: action.targetHandle,
-        type: "custom",
-      });
-    }
-  }
-
-  return {
-    isOpen,
-    handleToggle,
-    messages,
-    sendMessage,
-    stop,
-    status,
-    isCreatingSession,
-    sessionError,
-    sessionId,
-    nodes,
-    parsedActions,
-    handleApplyAction,
-  };
-}
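Reviewer note on the removed hook: `parsedActions` de-duplicated actions by a composite string key, keeping the first occurrence. A standalone sketch of that step follows. The `GraphAction` type is copied from the removed `helpers.ts`; `actionKey` and `dedupeActions` are hypothetical names introduced here, since the original inlined this logic in a `useMemo`.

```typescript
// GraphAction is copied from the removed helpers.ts; `actionKey` and
// `dedupeActions` are hypothetical names for logic the hook inlined.
type GraphAction =
  | { type: "update_node_input"; nodeId: string; key: string; value: unknown }
  | {
      type: "connect_nodes";
      source: string;
      target: string;
      sourceHandle: string;
      targetHandle: string;
    };

// Build the same composite key the removed hook used. Note that `value` is
// not part of the key for update_node_input actions.
function actionKey(action: GraphAction): string {
  return action.type === "update_node_input"
    ? `${action.nodeId}:${action.key}`
    : `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`;
}

// Keep only the first action seen for each key, preserving input order.
function dedupeActions(actions: GraphAction[]): GraphAction[] {
  const seen = new Set<string>();
  return actions.filter((action) => {
    const key = actionKey(action);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Because the key ignores `value`, two suggestions for the same node input collapse to whichever appeared first in the message.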
--- a/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/Flow/Flow.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/Flow/Flow.tsx
@@ -1,8 +1,6 @@
 import { useGetV1GetSpecificGraph } from "@/app/api/__generated__/endpoints/graphs/graphs";
 import { okData } from "@/app/api/helpers";
 import { FloatingReviewsPanel } from "@/components/organisms/FloatingReviewsPanel/FloatingReviewsPanel";
-import { BuilderChatPanel } from "../../BuilderChatPanel/BuilderChatPanel";
-import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag";
 import { Background, ReactFlow } from "@xyflow/react";
 import { parseAsString, useQueryStates } from "nuqs";
 import { useCallback, useMemo } from "react";
@@ -92,8 +90,6 @@ export const Flow = () => {
    useShallow((state) => state.isGraphRunning),
  );

-  const isBuilderChatEnabled = useGetFlag(Flag.BUILDER_CHAT_PANEL);
-
  return (
    <div className="flex h-full w-full dark:bg-slate-900">
      <div className="relative flex-1">
@@ -138,9 +134,6 @@ export const Flow = () => {
        executionId={flowExecutionID || undefined}
        graphId={flowID || undefined}
      />
-      {isBuilderChatEnabled && (
-        <BuilderChatPanel isGraphLoaded={isInitialLoadComplete} />
-      )}
    </div>
  );
 };
--- a/autogpt_platform/frontend/src/services/feature-flags/use-get-flag.ts
+++ b/autogpt_platform/frontend/src/services/feature-flags/use-get-flag.ts
@@ -10,7 +10,6 @@ export enum Flag {
  ENABLE_PLATFORM_PAYMENT = "enable-platform-payment",
  ARTIFACTS = "artifacts",
  CHAT_MODE_OPTION = "chat-mode-option",
-  BUILDER_CHAT_PANEL = "builder-chat-panel",
 }

 const isPwMockEnabled = process.env.NEXT_PUBLIC_PW_TEST === "true";
@@ -21,7 +20,6 @@ const defaultFlags = {
  [Flag.ENABLE_PLATFORM_PAYMENT]: false,
  [Flag.ARTIFACTS]: false,
  [Flag.CHAT_MODE_OPTION]: false,
-  [Flag.BUILDER_CHAT_PANEL]: false,
 };

 type FlagValues = typeof defaultFlags;