Compare commits

1 Commits

Author SHA1 Message Date
Zamil Majdy
f5e2eccda7 dx(orchestrate): fix stale-review gate and add pr-test evaluation rules to SKILL.md (#12701)
## Changes

### verify-complete.sh
- CHANGES_REQUESTED reviews are now compared against the latest commit
timestamp. If the review was submitted **before** the latest commit, it
is treated as stale and does not block verification.
- Added fail-closed guard: if the `gh pr view` fetch fails, the script
exits 1 (rather than treating missing data as "no blocking reviews").
- Fixed edge case: a `CHANGES_REQUESTED` review with a null
`submittedAt` is now counted as fresh/blocking (previously silently
skipped).
- Combined two separate `gh pr view` calls into one (`--json
commits,latestReviews`) to reduce API calls and ensure consistency.
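
The staleness rule described in this list can be sketched as a small shell function. This is a minimal illustration rather than the code in `verify-complete.sh`: the names `ts_key` and `review_blocks` are hypothetical, and the sketch assumes both timestamps are UTC ISO-8601 strings without fractional seconds, which lets them be compared as plain integers instead of going through `date`.

```bash
# Hypothetical helpers for illustration; not part of verify-complete.sh.
# Assumes UTC timestamps shaped like 2026-04-08T01:58:42Z (no fractional seconds).

# Reduce a timestamp to its digits so it can be compared as an integer.
ts_key() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Exit status 0 means the review blocks; 1 means it is stale.
review_blocks() {
  review_date=$1
  commit_date=$2
  # Missing data on either side: stay conservative and block.
  if [ -z "$review_date" ] || [ -z "$commit_date" ]; then
    return 0
  fi
  # Blocks only when the review is strictly newer than the latest commit.
  [ "$(ts_key "$review_date")" -gt "$(ts_key "$commit_date")" ]
}
```

For example, `review_blocks "2026-04-01T09:00:00Z" "2026-04-08T01:58:42Z"` returns a non-zero status because the review predates the commit, while an empty review date returns zero and therefore blocks.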

### SKILL.md (orchestrate skill)
- Added `### /pr-test result evaluation` section with explicit
pass/partial/fail handling table
- **PARTIAL on any headline feature scenario = immediate blocker**:
re-brief the agent, fix, and re-run from scratch. Never approve or
output ORCHESTRATOR:DONE with a PARTIAL headline result.
- Concrete incident callout: PR #12699 S5 (Apply suggestions) was
PARTIAL — AI never output JSON action blocks — but was nearly approved.
This rule prevents recurrence.
- Updated `verify-complete.sh` description throughout to include "no
fresh CHANGES_REQUESTED"
- Added staleness rule documentation: a review only blocks if submitted
*after* the latest commit
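
The pass/partial/fail handling amounts to an all-or-nothing check over the headline scenarios. A minimal sketch of that decision follows; the function name `pr_test_action` and the output words `proceed` and `rebrief` are hypothetical, and treating an empty result list as a failure is an assumption made here, not something SKILL.md states.

```bash
# Hypothetical helper; the names are illustrative and do not come from SKILL.md.
# Arguments: one result per headline scenario (PASS, PARTIAL, or FAIL).
# Prints "proceed" only when every result is PASS, otherwise "rebrief".
pr_test_action() {
  # Assumption: no results at all is treated like a failure.
  if [ "$#" -eq 0 ]; then
    echo "rebrief"
    return 0
  fi
  for result in "$@"; do
    # Any PARTIAL or FAIL (or unknown value) blocks approval.
    if [ "$result" != "PASS" ]; then
      echo "rebrief"
      return 0
    fi
  done
  echo "proceed"
}
```

Called as `pr_test_action PASS PASS PARTIAL`, the sketch prints `rebrief`, which matches the rule that a mix of PASS and PARTIAL counts as a failure.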

## Why

Two separate incidents prompted these changes:

1. **verify-complete.sh false positive**: An automated bot
(autogpt-pr-reviewer) submitted a `CHANGES_REQUESTED` review in April.
An agent then pushed fixing commits. The old script still blocked on the
stale review, preventing the PR from being verified as done.

2. **Missed PARTIAL signal**: PR #12699 had a PARTIAL result on its
headline scenario (S5 Apply button) because the AI emitted direct
builder tool calls instead of JSON action blocks. The orchestrator
nearly approved it. The new SKILL.md rule makes PARTIAL = blocker
explicit.

## Checklist

- [x] I have read the contribution guide
- [x] My changes follow the code style of this project  
- [x] Changes are limited to the scope of this PR (< 20% unrelated
changes)
- [x] All new and existing tests pass
2026-04-08 08:58:42 +07:00
8 changed files with 98 additions and 985 deletions

@@ -25,7 +25,7 @@ STATE_FILE=~/.claude/orchestrator-state.json
| `spawn-agent.sh SESSION PATH SPARE NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]` | Create window + checkout branch + launch claude + send task. **Stdout: `SESSION:WIN` only** |
| `recycle-agent.sh WINDOW PATH SPARE_BRANCH` | Kill window + restore spare branch |
| `run-loop.sh` | **Mechanical babysitter** — idle restart + dialog approval + recycle on ORCHESTRATOR:DONE + supervisor health check + all-done notification |
-| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green. Repo auto-derived from state file `.repo` or git remote. |
+| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green + no fresh CHANGES_REQUESTED. Repo auto-derived from state file `.repo` or git remote. |
| `notify.sh MESSAGE` | Send notification via Discord webhook (env `DISCORD_WEBHOOK_URL` or state `.discord_webhook`), macOS notification center, and stdout |
| `capacity.sh [REPO_ROOT]` | Print available + in-use worktrees |
| `status.sh` | Print fleet status + live pane commands |
@@ -64,7 +64,7 @@ spare/N branch → spawn-agent.sh (--session-id UUID) → window + feat/bran
ORCHESTRATOR:DONE
-verify-complete.sh: checkpoints ✓ + 0 threads + CI green
+verify-complete.sh: checkpoints ✓ + 0 threads + CI green + no fresh CHANGES_REQUESTED
state → "done", notify, window KEPT OPEN
@@ -328,7 +328,9 @@ For each agent, decide:
### Strict ORCHESTRATOR:DONE gate
-`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CHANGES_REQUESTED, CI green, spawned_at). Run it:
+`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CI green, spawned_at, and CHANGES_REQUESTED). Run it:
+**CHANGES_REQUESTED staleness rule**: a `CHANGES_REQUESTED` review only blocks if it was submitted *after* the latest commit. If the latest commit postdates the review, the review is considered stale (feedback already addressed) and does not block. This avoids false negatives when a bot reviewer hasn't re-reviewed after the agent's fixing commits.
```bash
SKILLS_DIR=~/.claude/orchestrator/scripts
@@ -412,6 +414,38 @@ Please verify: <specific behaviors to check>.
Only one `/pr-test` at a time — they share ports and DB.
### /pr-test result evaluation
**PARTIAL on any headline feature scenario is an immediate blocker.** Do not approve, do not mark done, do not let the agent output `ORCHESTRATOR:DONE`.
| `/pr-test` result | Action |
|---|---|
| All headline scenarios **PASS** | Proceed to evaluation step 2 |
| Any headline scenario **PARTIAL** | Re-brief the agent immediately — see below |
| Any headline scenario **FAIL** | Re-brief the agent immediately |
**What PARTIAL means**: the feature is only partly working. Example: the Apply button never appeared, or the AI returned no action blocks. The agent addressed part of the objective but not all of it.
**When any headline scenario is PARTIAL or FAIL:**
1. Do NOT mark the agent done or accept `ORCHESTRATOR:DONE`
2. Re-brief the agent with the specific scenario that failed and what was missing:
```bash
tmux send-keys -t SESSION:WIN "PARTIAL result on /pr-test — S5 (Apply button) never appeared. The AI must output JSON action blocks for the Apply button to render. Fix this before re-running /pr-test."
sleep 0.3
tmux send-keys -t SESSION:WIN Enter
```
3. Set state back to `running`:
```bash
jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "running"' \
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
4. Wait for new `ORCHESTRATOR:DONE`, then re-run `/pr-test` from scratch
**Rule: only ALL-PASS qualifies for approval.** A mix of PASS + PARTIAL is a failure.
> **Why this matters**: PR #12699 was wrongly approved with S5 PARTIAL — the AI never output JSON action blocks so the Apply button never appeared. The fix was already in the agent's reach but slipped through because PARTIAL was not treated as blocking.
### 2. Do your own evaluation
1. **Read the PR diff and objective** — does the code actually implement what was asked? Is anything obviously missing or half-done?
@@ -421,8 +455,9 @@ Only one `/pr-test` at a time — they share ports and DB.
### 3. Decide
-- `/pr-test` passes + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
-- `/pr-test` fails or evaluation finds gaps → re-brief the agent with specific failures, set state back to `running`
+- `/pr-test` all scenarios PASS + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
+- `/pr-test` any scenario PARTIAL or FAIL → re-brief the agent with the specific failing scenario, set state back to `running` (see `/pr-test result evaluation` above)
+- Evaluation finds gaps even with all PASS → re-brief the agent with specific gaps, set state back to `running`
**Never mark done based purely on script output.** You hold the full objective context; the script does not.
@@ -441,6 +476,7 @@ Stop the fleet (`active = false`) when **all** of the following are true:
| All agents are `done` or `escalated` | `jq '[.agents[] | select(.state | test("running\|stuck\|idle\|waiting_approval"))] | length' ~/.claude/orchestrator-state.json` == 0 |
| All PRs have 0 unresolved review threads | GraphQL `isResolved` check per PR |
| All PRs have green CI **on a run triggered after the agent's last push** | `gh run list --branch BRANCH --limit 1` timestamp > `spawned_at` in state |
+| No fresh CHANGES_REQUESTED (after latest commit) | `verify-complete.sh` checks this — stale pre-commit reviews are ignored |
| No agents are `escalated` without human review | If any are escalated, surface to user first |
**Do NOT stop just because agents output `ORCHESTRATOR:DONE`.** That is a signal to verify, not a signal to stop.

@@ -115,13 +115,64 @@ if [ "$UNRESOLVED" -gt 0 ]; then
fi
# --- Check 6: no CHANGES_REQUESTED (checked AFTER CI — bots post reviews after their check) ---
-CHANGES_REQUESTED=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
-  --json reviews --jq '[.reviews[] | select(.state == "CHANGES_REQUESTED")] | length' 2>/dev/null || echo "0")
+# A CHANGES_REQUESTED review is stale if the latest commit was pushed AFTER the review was submitted.
+# Stale reviews (pre-dating the fixing commits) should not block verification.
+#
+# Fetch commits and latestReviews in a single call and fail closed — if gh fails,
+# treat that as NOT COMPLETE rather than silently passing.
+# Use latestReviews (not reviews) so each reviewer's latest state is used — superseded
+# CHANGES_REQUESTED entries are automatically excluded when the reviewer later approved.
+# Note: we intentionally use committedDate (not PR updatedAt) because updatedAt changes on any
+# PR activity (bot comments, label changes) which would create false negatives.
+PR_REVIEW_METADATA=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
+  --json commits,latestReviews 2>/dev/null) || {
+  echo "NOT COMPLETE: unable to fetch PR review metadata for PR #$PR_NUMBER" >&2
+  exit 1
+}
-if [ "$CHANGES_REQUESTED" -gt 0 ]; then
-  REQUESTERS=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
-    --json reviews --jq '[.reviews[] | select(.state == "CHANGES_REQUESTED") | .author.login] | join(", ")' 2>/dev/null || echo "unknown")
-  echo "NOT COMPLETE: CHANGES_REQUESTED from ${REQUESTERS} on PR #$PR_NUMBER" >&2
+LATEST_COMMIT_DATE=$(jq -r '.commits[-1].committedDate // ""' <<< "$PR_REVIEW_METADATA")
+CHANGES_REQUESTED_REVIEWS=$(jq '[.latestReviews[]? | select(.state == "CHANGES_REQUESTED")]' <<< "$PR_REVIEW_METADATA")
+BLOCKING_CHANGES_REQUESTED=0
+BLOCKING_REQUESTERS=""
+if [ -n "$LATEST_COMMIT_DATE" ] && [ "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length)" -gt 0 ]; then
+  if date --version >/dev/null 2>&1; then
+    LATEST_COMMIT_EPOCH=$(date -d "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
+  else
+    LATEST_COMMIT_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
+  fi
+  while IFS= read -r review; do
+    [ -z "$review" ] && continue
+    REVIEW_DATE=$(echo "$review" | jq -r '.submittedAt // ""')
+    REVIEWER=$(echo "$review" | jq -r '.author.login // "unknown"')
+    if [ -z "$REVIEW_DATE" ]; then
+      # No submission date — treat as fresh (conservative: blocks verification)
+      BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
+      BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
+    else
+      if date --version >/dev/null 2>&1; then
+        REVIEW_EPOCH=$(date -d "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
+      else
+        REVIEW_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
+      fi
+      if [ "$REVIEW_EPOCH" -gt "$LATEST_COMMIT_EPOCH" ]; then
+        # Review was submitted AFTER latest commit — still fresh, blocks verification
+        BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
+        BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
+      fi
+      # Review submitted BEFORE latest commit — stale, skip
+    fi
+  done <<< "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -c '.[]')"
+else
+  # No commit date or no changes_requested — check raw count as fallback
+  BLOCKING_CHANGES_REQUESTED=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length 2>/dev/null || echo "0")
+  BLOCKING_REQUESTERS=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -r '[.[].author.login] | join(", ")' 2>/dev/null || echo "unknown")
+fi
+if [ "$BLOCKING_CHANGES_REQUESTED" -gt 0 ]; then
+  echo "NOT COMPLETE: CHANGES_REQUESTED (after latest commit) from ${BLOCKING_REQUESTERS} on PR #$PR_NUMBER" >&2
exit 1
fi
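
The hunk above repeats the GNU/BSD `date` detection for the commit timestamp and for each review timestamp, and substitutes `0` when parsing fails. As far as this hunk shows, a review date that fails to parse would then compare as older than the commit and be skipped as stale. One possible way to factor this out is sketched below; `iso_to_epoch` is a hypothetical name and not part of the script, and the BSD branch assumes a UTC timestamp ending in `Z`, as the original format string does.

```bash
# Hypothetical helper for illustration; verify-complete.sh inlines this logic.
# Prints the epoch seconds for an ISO-8601 UTC timestamp.
# Returns non-zero when the timestamp cannot be parsed, so the caller
# can decide to fail closed instead of silently using 0.
iso_to_epoch() {
  if date --version >/dev/null 2>&1; then
    # GNU date understands ISO-8601 input via -d.
    date -d "$1" "+%s" 2>/dev/null
  else
    # BSD/macOS date needs an explicit input format.
    TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$1" "+%s" 2>/dev/null
  fi
}
```

A caller could then write `REVIEW_EPOCH=$(iso_to_epoch "$REVIEW_DATE")` and count the review as blocking when the call fails, which keeps the fail-closed behaviour the commit message describes.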

@@ -1,317 +0,0 @@
"use client";
import { Button } from "@/components/atoms/Button/Button";
import { cn } from "@/lib/utils";
import {
ChatCircle,
PaperPlaneTilt,
SpinnerGap,
StopCircle,
X,
} from "@phosphor-icons/react";
import { KeyboardEvent, useEffect, useRef, useState } from "react";
import type { CustomNode } from "../FlowEditor/nodes/CustomNode/CustomNode";
import { GraphAction } from "./helpers";
import { useBuilderChatPanel } from "./useBuilderChatPanel";
interface Props {
className?: string;
isGraphLoaded?: boolean;
}
export function BuilderChatPanel({ className, isGraphLoaded }: Props) {
const {
isOpen,
handleToggle,
messages,
sendMessage,
stop,
status,
isCreatingSession,
sessionError,
sessionId,
nodes,
parsedActions,
handleApplyAction,
} = useBuilderChatPanel({ isGraphLoaded });
const [inputValue, setInputValue] = useState("");
const messagesEndRef = useRef<HTMLDivElement>(null);
const isStreaming = status === "streaming" || status === "submitted";
// Block input until the session is ready to prevent messages being sent
// before the seed context has been delivered to the AI.
const canSend =
Boolean(sessionId) && !isCreatingSession && !sessionError && !isStreaming;
// Scroll to bottom whenever a new message lands (AI response or user send)
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages.length]);
function handleSend() {
const text = inputValue.trim();
if (!text || !canSend) return;
setInputValue("");
sendMessage({ text });
setTimeout(() => {
messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
}, 50);
}
function handleKeyDown(e: KeyboardEvent<HTMLTextAreaElement>) {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
handleSend();
}
}
return (
<div
className={cn(
"pointer-events-none fixed bottom-4 right-4 z-50 flex flex-col items-end gap-2",
className,
)}
>
{isOpen && (
<div className="pointer-events-auto flex h-[70vh] w-96 flex-col overflow-hidden rounded-xl border border-slate-200 bg-white shadow-2xl">
<PanelHeader onClose={handleToggle} />
<MessageList
messages={messages}
isCreatingSession={isCreatingSession}
sessionError={sessionError}
nodes={nodes}
parsedActions={parsedActions}
onApplyAction={handleApplyAction}
messagesEndRef={messagesEndRef}
/>
<PanelInput
value={inputValue}
onChange={setInputValue}
onKeyDown={handleKeyDown}
onSend={handleSend}
onStop={stop}
isStreaming={isStreaming}
isDisabled={!canSend}
/>
</div>
)}
<button
onClick={handleToggle}
className={cn(
"pointer-events-auto flex h-12 w-12 items-center justify-center rounded-full shadow-lg transition-colors",
isOpen
? "bg-slate-800 text-white hover:bg-slate-700"
: "border border-slate-200 bg-white text-slate-700 hover:bg-slate-50",
)}
aria-label={isOpen ? "Close chat" : "Chat with builder"}
>
{isOpen ? <X size={20} /> : <ChatCircle size={22} weight="fill" />}
</button>
</div>
);
}
function PanelHeader({ onClose }: { onClose: () => void }) {
return (
<div className="flex items-center justify-between border-b border-slate-100 px-4 py-3">
<div className="flex items-center gap-2">
<ChatCircle size={18} weight="fill" className="text-violet-600" />
<span className="text-sm font-semibold text-slate-800">
Chat with Builder
</span>
</div>
<Button variant="icon" size="icon" onClick={onClose} aria-label="Close">
<X size={16} />
</Button>
</div>
);
}
interface MessageListProps {
messages: ReturnType<typeof useBuilderChatPanel>["messages"];
isCreatingSession: boolean;
sessionError: boolean;
nodes: CustomNode[];
parsedActions: GraphAction[];
onApplyAction: (action: GraphAction) => void;
messagesEndRef: React.RefObject<HTMLDivElement>;
}
function MessageList({
messages,
isCreatingSession,
sessionError,
nodes,
parsedActions,
onApplyAction,
messagesEndRef,
}: MessageListProps) {
return (
<div className="flex-1 space-y-3 overflow-y-auto p-4">
{isCreatingSession && (
<div className="flex items-center gap-2 text-xs text-slate-500">
<SpinnerGap size={14} className="animate-spin" />
<span>Setting up chat session</span>
</div>
)}
{sessionError && (
<div className="rounded-lg border border-red-100 bg-red-50 px-3 py-2 text-xs text-red-600">
Failed to start chat session. Please close and try again.
</div>
)}
{messages.map((msg) => {
const textParts = msg.parts
.filter(
(p): p is Extract<typeof p, { type: "text" }> => p.type === "text",
)
.map((p) => p.text)
.join("");
if (!textParts) return null;
return (
<div
key={msg.id}
className={cn(
"max-w-[85%] rounded-lg px-3 py-2 text-sm leading-relaxed",
msg.role === "user"
? "ml-auto bg-violet-600 text-white"
: "bg-slate-100 text-slate-800",
)}
>
{textParts}
</div>
);
})}
{parsedActions.length > 0 && (
<div className="space-y-2 rounded-lg border border-violet-100 bg-violet-50 p-3">
<p className="text-xs font-medium text-violet-700">
AI applied these changes
</p>
{parsedActions.map((action) => {
const key =
action.type === "update_node_input"
? `${action.nodeId}:${action.key}`
: `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`;
return (
<ActionItem
key={key}
action={action}
nodes={nodes}
onApply={() => onApplyAction(action)}
/>
);
})}
</div>
)}
<div ref={messagesEndRef} />
</div>
);
}
function ActionItem({
action,
nodes,
onApply,
}: {
action: GraphAction;
nodes: CustomNode[];
onApply: () => void;
}) {
// The AI applies changes server-side via edit_agent; the canvas refreshes
// automatically via invalidateQueries. The button starts in the applied state
// to reflect that changes are already live — not pending user confirmation.
const [applied, setApplied] = useState(true);
function handleApply() {
onApply();
setApplied(true);
}
const nodeName = (id: string) =>
nodes.find((n) => n.id === id)?.data.title ?? id;
const label =
action.type === "update_node_input"
? `Set "${nodeName(action.nodeId)}" "${action.key}" = ${JSON.stringify(action.value)}`
: `Connect "${nodeName(action.source)}" → "${nodeName(action.target)}"`;
return (
<div className="flex items-start justify-between gap-2 rounded bg-white p-2 text-xs shadow-sm">
<span className="leading-tight text-slate-700">{label}</span>
<button
onClick={handleApply}
disabled={applied}
className={cn(
"shrink-0 rounded px-2 py-0.5 text-xs font-medium transition-colors",
applied
? "bg-green-100 text-green-700"
: "bg-violet-600 text-white hover:bg-violet-700",
)}
>
{applied ? "Applied" : "Apply"}
</button>
</div>
);
}
interface PanelInputProps {
value: string;
onChange: (v: string) => void;
onKeyDown: (e: KeyboardEvent<HTMLTextAreaElement>) => void;
onSend: () => void;
onStop: () => void;
isStreaming: boolean;
isDisabled: boolean;
}
function PanelInput({
value,
onChange,
onKeyDown,
onSend,
onStop,
isStreaming,
isDisabled,
}: PanelInputProps) {
return (
<div className="border-t border-slate-100 p-3">
<div className="flex items-end gap-2">
<textarea
value={value}
disabled={isDisabled}
onChange={(e) => onChange(e.target.value)}
onKeyDown={onKeyDown}
placeholder="Ask about your agent…"
rows={2}
className="flex-1 resize-none rounded-lg border border-slate-200 bg-slate-50 px-3 py-2 text-sm text-slate-800 placeholder:text-slate-400 focus:border-violet-400 focus:outline-none focus:ring-1 focus:ring-violet-200 disabled:opacity-50"
/>
{isStreaming ? (
<button
onClick={onStop}
className="flex h-9 w-9 items-center justify-center rounded-lg bg-red-100 text-red-600 transition-colors hover:bg-red-200"
aria-label="Stop"
>
<StopCircle size={18} />
</button>
) : (
<button
onClick={onSend}
disabled={isDisabled || !value.trim()}
className="flex h-9 w-9 items-center justify-center rounded-lg bg-violet-600 text-white transition-colors hover:bg-violet-700 disabled:opacity-40"
aria-label="Send"
>
<PaperPlaneTilt size={18} />
</button>
)}
</div>
</div>
);
}

@@ -1,316 +0,0 @@
import {
render,
screen,
fireEvent,
cleanup,
} from "@/tests/integrations/test-utils";
import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
import { BuilderChatPanel } from "../BuilderChatPanel";
import { serializeGraphForChat, parseGraphActions } from "../helpers";
import type { CustomNode } from "../../FlowEditor/nodes/CustomNode/CustomNode";
import type { CustomEdge } from "../../FlowEditor/edges/CustomEdge";
// Mock the hook so we isolate the component rendering
vi.mock("../useBuilderChatPanel", () => ({
useBuilderChatPanel: vi.fn(),
}));
import { useBuilderChatPanel } from "../useBuilderChatPanel";
const mockUseBuilderChatPanel = vi.mocked(useBuilderChatPanel);
function makeMockHook(
overrides: Partial<ReturnType<typeof useBuilderChatPanel>> = {},
): ReturnType<typeof useBuilderChatPanel> {
return {
isOpen: false,
handleToggle: vi.fn(),
messages: [],
sendMessage: vi.fn(),
stop: vi.fn(),
status: "ready",
isCreatingSession: false,
sessionError: false,
sessionId: null,
nodes: [],
parsedActions: [],
handleApplyAction: vi.fn(),
...overrides,
};
}
beforeEach(() => {
mockUseBuilderChatPanel.mockReturnValue(makeMockHook());
});
afterEach(() => {
cleanup();
});
describe("BuilderChatPanel", () => {
it("renders the toggle button when closed", () => {
render(<BuilderChatPanel />);
expect(screen.getByLabelText("Chat with builder")).toBeDefined();
});
it("does not render the panel content when closed", () => {
render(<BuilderChatPanel />);
expect(screen.queryByText("Chat with Builder")).toBeNull();
});
it("calls handleToggle when the toggle button is clicked", () => {
const handleToggle = vi.fn();
mockUseBuilderChatPanel.mockReturnValue(makeMockHook({ handleToggle }));
render(<BuilderChatPanel />);
fireEvent.click(screen.getByLabelText("Chat with builder"));
expect(handleToggle).toHaveBeenCalledOnce();
});
it("renders the panel when isOpen is true", () => {
mockUseBuilderChatPanel.mockReturnValue(makeMockHook({ isOpen: true }));
render(<BuilderChatPanel />);
expect(screen.getByText("Chat with Builder")).toBeDefined();
});
it("shows creating session indicator when isCreatingSession is true", () => {
mockUseBuilderChatPanel.mockReturnValue(
makeMockHook({ isOpen: true, isCreatingSession: true }),
);
render(<BuilderChatPanel />);
expect(screen.getByText(/Setting up chat session/i)).toBeDefined();
});
it("renders user and assistant messages", () => {
mockUseBuilderChatPanel.mockReturnValue(
makeMockHook({
isOpen: true,
messages: [
{
id: "1",
role: "user",
parts: [{ type: "text", text: "What does this agent do?" }],
},
{
id: "2",
role: "assistant",
parts: [{ type: "text", text: "This agent searches the web." }],
},
] as ReturnType<typeof useBuilderChatPanel>["messages"],
}),
);
render(<BuilderChatPanel />);
expect(screen.getByText("What does this agent do?")).toBeDefined();
expect(screen.getByText("This agent searches the web.")).toBeDefined();
});
it("renders applied actions section when parsedActions are present", () => {
mockUseBuilderChatPanel.mockReturnValue(
makeMockHook({
isOpen: true,
parsedActions: [
{
type: "update_node_input",
nodeId: "1",
key: "query",
value: "AI news",
},
],
}),
);
render(<BuilderChatPanel />);
expect(screen.getByText("AI applied these changes")).toBeDefined();
expect(screen.getByText("Applied")).toBeDefined();
});
it("shows pre-applied actions as disabled", () => {
const action = {
type: "update_node_input" as const,
nodeId: "1",
key: "query",
value: "AI news",
};
mockUseBuilderChatPanel.mockReturnValue(
makeMockHook({
isOpen: true,
parsedActions: [action],
}),
);
render(<BuilderChatPanel />);
const button = screen.getByRole("button", {
name: "Applied",
}) as HTMLButtonElement;
expect(button.disabled).toBe(true);
});
it("calls sendMessage when the user submits a message", () => {
const sendMessage = vi.fn();
mockUseBuilderChatPanel.mockReturnValue(
makeMockHook({ isOpen: true, sessionId: "sess-1", sendMessage }),
);
render(<BuilderChatPanel />);
const textarea = screen.getByPlaceholderText("Ask about your agent…");
fireEvent.change(textarea, { target: { value: "Add a summarizer block" } });
fireEvent.click(screen.getByLabelText("Send"));
expect(sendMessage).toHaveBeenCalledWith({
text: "Add a summarizer block",
});
});
it("shows Stop button when streaming", () => {
const stop = vi.fn();
mockUseBuilderChatPanel.mockReturnValue(
makeMockHook({ isOpen: true, status: "streaming", stop }),
);
render(<BuilderChatPanel />);
expect(screen.getByLabelText("Stop")).toBeDefined();
fireEvent.click(screen.getByLabelText("Stop"));
expect(stop).toHaveBeenCalledOnce();
});
});
describe("serializeGraphForChat", () => {
it("returns empty message when no nodes", () => {
const result = serializeGraphForChat([], []);
expect(result).toBe("The graph is currently empty.");
});
it("lists block names and descriptions", () => {
const nodes = [
{
id: "1",
data: {
title: "Google Search",
description: "Searches the web",
hardcodedValues: {},
inputSchema: {},
outputSchema: {},
uiType: 1,
block_id: "block-1",
costs: [],
categories: [],
},
type: "custom" as const,
position: { x: 0, y: 0 },
},
] as unknown as CustomNode[];
const result = serializeGraphForChat(nodes, []);
expect(result).toContain('"Google Search"');
expect(result).toContain("Searches the web");
});
it("lists connections between nodes", () => {
const nodes = [
{
id: "1",
data: {
title: "Search",
description: "",
hardcodedValues: {},
inputSchema: {},
outputSchema: {},
uiType: 1,
block_id: "b1",
costs: [],
categories: [],
},
type: "custom" as const,
position: { x: 0, y: 0 },
},
{
id: "2",
data: {
title: "Formatter",
description: "",
hardcodedValues: {},
inputSchema: {},
outputSchema: {},
uiType: 1,
block_id: "b2",
costs: [],
categories: [],
},
type: "custom" as const,
position: { x: 200, y: 0 },
},
] as unknown as CustomNode[];
const edges = [
{
id: "1:result->2:input",
source: "1",
target: "2",
sourceHandle: "result",
targetHandle: "input",
type: "custom" as const,
},
] as unknown as CustomEdge[];
const result = serializeGraphForChat(nodes, edges);
expect(result).toContain("Connections");
expect(result).toContain('"Search"');
expect(result).toContain('"Formatter"');
});
});
describe("parseGraphActions", () => {
it("returns empty array for plain text", () => {
expect(parseGraphActions("This agent searches the web.")).toEqual([]);
});
it("parses update_node_input action", () => {
const text = `
Here is a suggestion:
\`\`\`json
{"action": "update_node_input", "node_id": "1", "key": "query", "value": "AI news"}
\`\`\`
`;
const actions = parseGraphActions(text);
expect(actions).toHaveLength(1);
expect(actions[0]).toEqual({
type: "update_node_input",
nodeId: "1",
key: "query",
value: "AI news",
});
});
it("parses connect_nodes action", () => {
const text = `
\`\`\`json
{"action": "connect_nodes", "source": "1", "target": "2", "source_handle": "result", "target_handle": "input"}
\`\`\`
`;
const actions = parseGraphActions(text);
expect(actions).toHaveLength(1);
expect(actions[0]).toEqual({
type: "connect_nodes",
source: "1",
target: "2",
sourceHandle: "result",
targetHandle: "input",
});
});
it("ignores invalid JSON blocks", () => {
const text = "```json\nnot valid json\n```";
expect(parseGraphActions(text)).toEqual([]);
});
it("ignores blocks without action field", () => {
const text = '```json\n{"key": "value"}\n```';
expect(parseGraphActions(text)).toEqual([]);
});
it("ignores update_node_input actions with missing required fields", () => {
const text =
'```json\n{"action": "update_node_input", "node_id": "1"}\n```';
expect(parseGraphActions(text)).toEqual([]);
});
it("ignores connect_nodes actions with empty handles", () => {
const text =
'```json\n{"action": "connect_nodes", "source": "1", "target": "2", "source_handle": "", "target_handle": "input"}\n```';
expect(parseGraphActions(text)).toEqual([]);
});
});

@@ -1,110 +0,0 @@
import type { CustomNode } from "../FlowEditor/nodes/CustomNode/CustomNode";
import type { CustomEdge } from "../FlowEditor/edges/CustomEdge";
export type GraphAction =
| {
type: "update_node_input";
nodeId: string;
key: string;
value: unknown;
}
| {
type: "connect_nodes";
source: string;
target: string;
sourceHandle: string;
targetHandle: string;
};
export function serializeGraphForChat(
nodes: CustomNode[],
edges: CustomEdge[],
): string {
if (nodes.length === 0) return "The graph is currently empty.";
const nodeLines = nodes.map((n) => {
const name = n.data.metadata?.customized_name || n.data.title;
const desc = n.data.description ? `${n.data.description}` : "";
return `- Node ${n.id}: "${name}"${desc}`;
});
const edgeLines = edges.map((e) => {
const src = nodes.find((n) => n.id === e.source);
const tgt = nodes.find((n) => n.id === e.target);
const srcName =
src?.data.metadata?.customized_name || src?.data.title || e.source;
const tgtName =
tgt?.data.metadata?.customized_name || tgt?.data.title || e.target;
return `- "${srcName}" (${e.sourceHandle}) → "${tgtName}" (${e.targetHandle})`;
});
const parts = [`Blocks (${nodes.length}):\n${nodeLines.join("\n")}`];
if (edgeLines.length > 0) {
parts.push(`Connections (${edges.length}):\n${edgeLines.join("\n")}`);
}
return parts.join("\n\n");
}
export function parseGraphActions(text: string): GraphAction[] {
const actions: GraphAction[] = [];
const jsonBlockRegex = /```(?:json)?\s*\n?([\s\S]*?)\n?```/g;
let match: RegExpExecArray | null;
while ((match = jsonBlockRegex.exec(text)) !== null) {
try {
const parsed = JSON.parse(match[1]) as unknown;
if (
typeof parsed !== "object" ||
parsed === null ||
!("action" in parsed)
) {
continue;
}
const obj = parsed as Record<string, unknown>;
if (obj.action === "update_node_input") {
const nodeId = obj.node_id;
const key = obj.key;
if (
typeof nodeId !== "string" ||
!nodeId ||
typeof key !== "string" ||
!key ||
obj.value === undefined
)
continue;
actions.push({
type: "update_node_input",
nodeId,
key,
value: obj.value,
});
} else if (obj.action === "connect_nodes") {
const source = obj.source;
const target = obj.target;
const sourceHandle = obj.source_handle;
const targetHandle = obj.target_handle;
if (
typeof source !== "string" ||
!source ||
typeof target !== "string" ||
!target ||
typeof sourceHandle !== "string" ||
!sourceHandle ||
typeof targetHandle !== "string" ||
!targetHandle
)
continue;
actions.push({
type: "connect_nodes",
source,
target,
sourceHandle,
targetHandle,
});
}
} catch {
// Not valid JSON, skip
}
}
return actions;
}

@@ -1,222 +0,0 @@
import { postV2CreateSession } from "@/app/api/__generated__/endpoints/chat/chat";
import { getGetV1GetSpecificGraphQueryKey } from "@/app/api/__generated__/endpoints/graphs/graphs";
import { getWebSocketToken } from "@/lib/supabase/actions";
import { environment } from "@/services/environment";
import { useQueryClient } from "@tanstack/react-query";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";
import { useEffect, useMemo, useRef, useState } from "react";
import { parseAsString, useQueryStates } from "nuqs";
import { useShallow } from "zustand/react/shallow";
import { useEdgeStore } from "../../stores/edgeStore";
import { useNodeStore } from "../../stores/nodeStore";
import {
  GraphAction,
  parseGraphActions,
  serializeGraphForChat,
} from "./helpers";

type SendMessageFn = ReturnType<typeof useChat>["sendMessage"];

interface UseBuilderChatPanelArgs {
  isGraphLoaded?: boolean;
}

export function useBuilderChatPanel({
  isGraphLoaded = true,
}: UseBuilderChatPanelArgs = {}) {
  const [isOpen, setIsOpen] = useState(false);
  const [sessionId, setSessionId] = useState<string | null>(null);
  const [isCreatingSession, setIsCreatingSession] = useState(false);
  const [sessionError, setSessionError] = useState(false);
  const initializedRef = useRef(false);
  const sendMessageRef = useRef<SendMessageFn | null>(null);
  const prevStatusRef = useRef<string>("ready");

  const [{ flowID }] = useQueryStates({ flowID: parseAsString });
  const queryClient = useQueryClient();

  const nodes = useNodeStore(useShallow((s) => s.nodes));
  const edges = useEdgeStore(useShallow((s) => s.edges));
  const updateNodeData = useNodeStore(useShallow((s) => s.updateNodeData));
  const addEdge = useEdgeStore(useShallow((s) => s.addEdge));

  // Reset session and initialized state when the user navigates to a different
  // graph so the new graph's context is sent to the AI on next open.
  useEffect(() => {
    setSessionId(null);
    setSessionError(false);
    initializedRef.current = false;
  }, [flowID]);

  useEffect(() => {
    if (!isOpen || sessionId || isCreatingSession || sessionError) return;

    async function createSession() {
      setIsCreatingSession(true);
      try {
        const res = await postV2CreateSession(null);
        if (res.status === 200) {
          setSessionId(res.data.id);
        } else {
          setSessionError(true);
        }
      } catch {
        setSessionError(true);
      } finally {
        setIsCreatingSession(false);
      }
    }

    createSession();
  }, [isOpen, sessionId, isCreatingSession, sessionError]);

  const transport = useMemo(
    () =>
      sessionId
        ? new DefaultChatTransport({
            api: `${environment.getAGPTServerBaseUrl()}/api/chat/sessions/${sessionId}/stream`,
            prepareSendMessagesRequest: async ({ messages }) => {
              const last = messages[messages.length - 1];
              const { token, error } = await getWebSocketToken();
              if (error || !token)
                throw new Error(
                  "Authentication failed — please sign in again.",
                );
              const messageText =
                last.parts
                  ?.map((p) => (p.type === "text" ? p.text : ""))
                  .join("") ?? "";
              return {
                body: {
                  message: messageText,
                  is_user_message: last.role === "user",
                  context: null,
                  file_ids: null,
                  mode: null,
                },
                headers: { Authorization: `Bearer ${token}` },
              };
            },
          })
        : null,
    [sessionId],
  );

  const { messages, sendMessage, stop, status } = useChat({
    id: sessionId ?? undefined,
    transport: transport ?? undefined,
  });

  // Keep a stable ref so the initialization effect can call sendMessage
  // without including it in the deps array (avoids re-triggering the effect)
  sendMessageRef.current = sendMessage;

  // Parsed actions from the last assistant message. Placed before the
  // invalidation effect so the effect can check whether a turn mutated the graph.
  const parsedActions = useMemo(() => {
    const assistantMessages = messages.filter((m) => m.role === "assistant");
    const last = assistantMessages[assistantMessages.length - 1];
    if (!last) return [];
    const text = last.parts
      .filter(
        (p): p is Extract<typeof p, { type: "text" }> => p.type === "text",
      )
      .map((p) => p.text)
      .join("");
    const parsed = parseGraphActions(text);
    const seen = new Set<string>();
    return parsed.filter((action) => {
      const key =
        action.type === "update_node_input"
          ? `${action.nodeId}:${action.key}`
          : `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`;
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }, [messages]);

  // Refresh the canvas only when the AI turn actually mutated the graph via
  // edit_agent. Gating on parsedActions.length > 0 avoids an unnecessary
  // refetch after read-only turns (e.g. the initial description response).
  useEffect(() => {
    const prev = prevStatusRef.current;
    prevStatusRef.current = status;
    if (
      status === "ready" &&
      (prev === "streaming" || prev === "submitted") &&
      flowID &&
      parsedActions.length > 0
    ) {
      queryClient.invalidateQueries({
        queryKey: getGetV1GetSpecificGraphQueryKey(flowID),
      });
    }
  }, [status, flowID, queryClient, parsedActions.length]);

  useEffect(() => {
    if (!sessionId || !transport || !isGraphLoaded || initializedRef.current)
      return;
    initializedRef.current = true;
    const summary = serializeGraphForChat(nodes, edges);
    sendMessageRef.current?.({
      text:
        `I'm building an agent in the AutoGPT flow builder. Here's the current graph:\n\n${summary}\n\n` +
        `IMPORTANT: When you modify the graph using edit_agent or fix_agent_graph, you MUST output one JSON ` +
        `code block per change using EXACTLY these formats — no other structure is recognized:\n\n` +
        `To update a node input field:\n` +
        `\`\`\`json\n{"action": "update_node_input", "node_id": "<exact node id>", "key": "<input field name>", "value": <new value>}\n\`\`\`\n\n` +
        `To add a connection between nodes:\n` +
        `\`\`\`json\n{"action": "connect_nodes", "source": "<source node id>", "target": "<target node id>", "source_handle": "<output handle name>", "target_handle": "<input handle name>"}\n\`\`\`\n\n` +
        `Rules: the "action" key is required and must be exactly "update_node_input" or "connect_nodes". ` +
        `Do not use any other field names (e.g. "block", "change", "field", "from", "to" are NOT valid).\n\n` +
        `What does this agent do?`,
    });
  }, [sessionId, transport, isGraphLoaded]);

  function handleToggle() {
    // Reset session error when reopening so the panel can retry session creation
    if (!isOpen && !sessionId) {
      setSessionError(false);
    }
    setIsOpen((o) => !o);
  }

  function handleApplyAction(action: GraphAction) {
    if (action.type === "update_node_input") {
      const node = nodes.find((n) => n.id === action.nodeId);
      if (!node) return;
      updateNodeData(action.nodeId, {
        hardcodedValues: {
          ...node.data.hardcodedValues,
          [action.key]: action.value,
        },
      });
    } else if (action.type === "connect_nodes") {
      addEdge({
        id: `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`,
        source: action.source,
        target: action.target,
        sourceHandle: action.sourceHandle,
        targetHandle: action.targetHandle,
        type: "custom",
      });
    }
  }

  return {
    isOpen,
    handleToggle,
    messages,
    sendMessage,
    stop,
    status,
    isCreatingSession,
    sessionError,
    sessionId,
    nodes,
    parsedActions,
    handleApplyAction,
  };
}
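
The de-duplication inside `parsedActions` is easier to reason about in isolation. The sketch below lifts it into a pure function using the same key scheme as the hook. The `GraphAction` type is a hypothetical stand-in inferred from the hook's usage, and `actionKey` and `dedupeActions` are illustrative names that do not appear in the removed file.

```typescript
// Hypothetical stand-in for the GraphAction type imported from "./helpers".
type GraphAction =
  | { type: "update_node_input"; nodeId: string; key: string; value: unknown }
  | {
      type: "connect_nodes";
      source: string;
      target: string;
      sourceHandle: string;
      targetHandle: string;
    };

// Identity key per action, mirroring the template strings used in the hook:
// node updates are keyed by node and field, connections by both endpoints.
function actionKey(action: GraphAction): string {
  return action.type === "update_node_input"
    ? `${action.nodeId}:${action.key}`
    : `${action.source}:${action.sourceHandle}->${action.target}:${action.targetHandle}`;
}

// Keeps the first action seen for each key and drops later repeats.
function dedupeActions(actions: GraphAction[]): GraphAction[] {
  const seen = new Set<string>();
  return actions.filter((action) => {
    const key = actionKey(action);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Because the key for `update_node_input` ignores `value`, two updates to the same field of the same node collapse to the first one emitted, even when their values differ. The connection key matches the edge `id` built in `handleApplyAction`, so each surviving `connect_nodes` action maps to a distinct edge id.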


@@ -1,8 +1,6 @@
import { useGetV1GetSpecificGraph } from "@/app/api/__generated__/endpoints/graphs/graphs";
import { okData } from "@/app/api/helpers";
import { FloatingReviewsPanel } from "@/components/organisms/FloatingReviewsPanel/FloatingReviewsPanel";
import { BuilderChatPanel } from "../../BuilderChatPanel/BuilderChatPanel";
import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag";
import { Background, ReactFlow } from "@xyflow/react";
import { parseAsString, useQueryStates } from "nuqs";
import { useCallback, useMemo } from "react";
@@ -92,8 +90,6 @@ export const Flow = () => {
    useShallow((state) => state.isGraphRunning),
  );

  const isBuilderChatEnabled = useGetFlag(Flag.BUILDER_CHAT_PANEL);

  return (
    <div className="flex h-full w-full dark:bg-slate-900">
      <div className="relative flex-1">
@@ -138,9 +134,6 @@ export const Flow = () => {
        executionId={flowExecutionID || undefined}
        graphId={flowID || undefined}
      />
      {isBuilderChatEnabled && (
        <BuilderChatPanel isGraphLoaded={isInitialLoadComplete} />
      )}
    </div>
  );
};


@@ -10,7 +10,6 @@ export enum Flag {
  ENABLE_PLATFORM_PAYMENT = "enable-platform-payment",
  ARTIFACTS = "artifacts",
  CHAT_MODE_OPTION = "chat-mode-option",
  BUILDER_CHAT_PANEL = "builder-chat-panel",
}

const isPwMockEnabled = process.env.NEXT_PUBLIC_PW_TEST === "true";
@@ -21,7 +20,6 @@ const defaultFlags = {
  [Flag.ENABLE_PLATFORM_PAYMENT]: false,
  [Flag.ARTIFACTS]: false,
  [Flag.CHAT_MODE_OPTION]: false,
  [Flag.BUILDER_CHAT_PANEL]: false,
};

type FlagValues = typeof defaultFlags;