Compare commits
35 Commits
dev
...
test-scree
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
86906ced40 | ||
|
|
a33ae7b24b | ||
|
|
42f072fb2a | ||
|
|
1a52b0d02c | ||
|
|
b101069eaf | ||
|
|
de094eee36 | ||
|
|
bddc633a11 | ||
|
|
2411cc386d | ||
|
|
49bef40ef0 | ||
|
|
eeb2f08d6d | ||
|
|
eda02f9ce6 | ||
|
|
2a969e5018 | ||
|
|
a68f48e6b7 | ||
|
|
2bf5a37646 | ||
|
|
289a19d402 | ||
|
|
e57e48272a | ||
|
|
c2f421cb42 | ||
|
|
e3d589b180 | ||
|
|
8de935c84b | ||
|
|
a55653f8c1 | ||
|
|
3e6faf2de7 | ||
|
|
22e8c5c353 | ||
|
|
b3d9e9e856 | ||
|
|
32bfe1b209 | ||
|
|
b220fe4347 | ||
|
|
61513b9dad | ||
|
|
e753aee7a0 | ||
|
|
3f24a003ad | ||
|
|
a369fbe169 | ||
|
|
d3173605eb | ||
|
|
98c27653f2 | ||
|
|
dced534df3 | ||
|
|
4ebe294707 | ||
|
|
2e8e115cd1 | ||
|
|
5ca49a8ec9 |
@@ -1,545 +0,0 @@
|
||||
---
|
||||
name: orchestrate
|
||||
description: "Meta-agent supervisor that manages a fleet of Claude Code agents running in tmux windows. Auto-discovers spare worktrees, spawns agents, monitors state, kicks idle agents, approves safe confirmations, and recycles worktrees when done. TRIGGER when user asks to supervise agents, run parallel tasks, manage worktrees, check agent status, or orchestrate parallel work."
|
||||
user-invocable: true
|
||||
argument-hint: "any free text — e.g. 'start 3 agents on X Y Z', 'show status', 'add task: implement feature A', 'stop', 'how many are free?'"
|
||||
metadata:
|
||||
author: autogpt-team
|
||||
version: "6.0.0"
|
||||
---
|
||||
|
||||
# Orchestrate — Agent Fleet Supervisor
|
||||
|
||||
One tmux session, N windows — each window is one agent working in its own worktree. Speak naturally; Claude maps your intent to the right scripts.
|
||||
|
||||
## Scripts
|
||||
|
||||
```bash
|
||||
SKILLS_DIR=$(git rev-parse --show-toplevel)/.claude/skills/orchestrate/scripts
|
||||
STATE_FILE=~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
| Script | Purpose |
|
||||
|---|---|
|
||||
| `find-spare.sh [REPO_ROOT]` | List free worktrees — one `PATH BRANCH` per line |
|
||||
| `spawn-agent.sh SESSION PATH SPARE NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]` | Create window + checkout branch + launch claude + send task. **Stdout: `SESSION:WIN` only** |
|
||||
| `recycle-agent.sh WINDOW PATH SPARE_BRANCH` | Kill window + restore spare branch |
|
||||
| `run-loop.sh` | **Mechanical babysitter** — idle restart + dialog approval + recycle on ORCHESTRATOR:DONE + supervisor health check + all-done notification |
|
||||
| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green + no fresh CHANGES_REQUESTED. Repo auto-derived from state file `.repo` or git remote. |
|
||||
| `notify.sh MESSAGE` | Send notification via Discord webhook (env `DISCORD_WEBHOOK_URL` or state `.discord_webhook`), macOS notification center, and stdout |
|
||||
| `capacity.sh [REPO_ROOT]` | Print available + in-use worktrees |
|
||||
| `status.sh` | Print fleet status + live pane commands |
|
||||
| `poll-cycle.sh` | One monitoring cycle — classifies panes, tracks checkpoints, returns JSON action array |
|
||||
| `classify-pane.sh WINDOW` | Classify one pane state |
|
||||
|
||||
## Supervision model
|
||||
|
||||
```
|
||||
Orchestrating Claude (this Claude session — IS the supervisor)
|
||||
└── Reads pane output, checks CI, intervenes with targeted guidance
|
||||
run-loop.sh (separate tmux window, every 30s)
|
||||
└── Mechanical only: idle restart, dialog approval, recycle on ORCHESTRATOR:DONE
|
||||
```
|
||||
|
||||
**You (the orchestrating Claude)** are the supervisor. After spawning agents, stay in this conversation and actively monitor: poll each agent's pane every 2-3 minutes, check CI, nudge stalled agents, and verify completions. Do not spawn a separate supervisor Claude window — it loses context, is hard to observe, and compounds context compression problems.
|
||||
|
||||
**run-loop.sh** is the mechanical layer — zero tokens, handles things that need no judgment: restart crashed agents, press Enter on dialogs, recycle completed worktrees (only after `verify-complete.sh` passes).
|
||||
|
||||
## Checkpoint protocol
|
||||
|
||||
Agents output checkpoints as they complete each required step:
|
||||
|
||||
```
|
||||
CHECKPOINT:<step-name>
|
||||
```
|
||||
|
||||
Required steps are passed as args to `spawn-agent.sh` (e.g. `pr-address pr-test`). `run-loop.sh` will not recycle a window until all required checkpoints are found in the pane output. If `verify-complete.sh` fails, the agent is re-briefed automatically.
|
||||
|
||||
## Worktree lifecycle
|
||||
|
||||
```text
|
||||
spare/N branch → spawn-agent.sh (--session-id UUID) → window + feat/branch + claude running
|
||||
↓
|
||||
CHECKPOINT:<step> (as steps complete)
|
||||
↓
|
||||
ORCHESTRATOR:DONE
|
||||
↓
|
||||
verify-complete.sh: checkpoints ✓ + 0 threads + CI green + no fresh CHANGES_REQUESTED
|
||||
↓
|
||||
state → "done", notify, window KEPT OPEN
|
||||
↓
|
||||
user/orchestrator explicitly requests recycle
|
||||
↓
|
||||
recycle-agent.sh → spare/N (free again)
|
||||
```
|
||||
|
||||
**Windows are never auto-killed.** The worktree stays on its branch, the session stays alive. The agent is done working but the window, git state, and Claude session are all preserved until you choose to recycle.
|
||||
|
||||
**To resume a done or crashed session:**
|
||||
```bash
|
||||
# Resume by stored session ID (preferred — exact session, full context)
|
||||
claude --resume SESSION_ID --permission-mode bypassPermissions
|
||||
|
||||
# Or resume most recent session in that worktree directory
|
||||
cd /path/to/worktree && claude --continue --permission-mode bypassPermissions
|
||||
```
|
||||
|
||||
**To manually recycle when ready:**
|
||||
```bash
|
||||
bash ~/.claude/orchestrator/scripts/recycle-agent.sh SESSION:WIN WORKTREE_PATH spare/N
|
||||
# Then update state:
|
||||
jq --arg w "SESSION:WIN" '.agents |= map(if .window == $w then .state = "recycled" else . end)' \
|
||||
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
## State file (`~/.claude/orchestrator-state.json`)
|
||||
|
||||
Never committed to git. You maintain this file directly using `jq` + atomic writes (`.tmp` → `mv`).
|
||||
|
||||
```json
|
||||
{
|
||||
"active": true,
|
||||
"tmux_session": "autogpt1",
|
||||
"idle_threshold_seconds": 300,
|
||||
"loop_window": "autogpt1:5",
|
||||
"repo": "Significant-Gravitas/AutoGPT",
|
||||
"discord_webhook": "https://discord.com/api/webhooks/...",
|
||||
"last_poll_at": 0,
|
||||
"agents": [
|
||||
{
|
||||
"window": "autogpt1:3",
|
||||
"worktree": "AutoGPT6",
|
||||
"worktree_path": "/path/to/AutoGPT6",
|
||||
"spare_branch": "spare/6",
|
||||
"branch": "feat/my-feature",
|
||||
"objective": "Implement X and open a PR",
|
||||
"pr_number": "12345",
|
||||
"session_id": "550e8400-e29b-41d4-a716-446655440000",
|
||||
"steps": ["pr-address", "pr-test"],
|
||||
"checkpoints": ["pr-address"],
|
||||
"state": "running",
|
||||
"last_output_hash": "",
|
||||
"last_seen_at": 0,
|
||||
"spawned_at": 0,
|
||||
"idle_since": 0,
|
||||
"revision_count": 0,
|
||||
"last_rebriefed_at": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Top-level optional fields:
|
||||
- `repo` — GitHub `owner/repo` for CI/thread checks. Auto-derived from git remote if omitted.
|
||||
- `discord_webhook` — Discord webhook URL for completion notifications. Also reads `DISCORD_WEBHOOK_URL` env var.
|
||||
|
||||
Per-agent fields:
|
||||
- `session_id` — UUID passed to `claude --session-id` at spawn; use with `claude --resume UUID` to restore exact session context after a crash or window close.
|
||||
- `last_rebriefed_at` — Unix timestamp of last re-brief; enforces 5-min cooldown to prevent spam.
|
||||
|
||||
Agent states: `running` | `idle` | `stuck` | `waiting_approval` | `complete` | `done` | `escalated`
|
||||
|
||||
`done` means verified complete — window is still open, session still alive, worktree still on task branch. Not recycled yet.
|
||||
|
||||
## Serial /pr-test rule
|
||||
|
||||
`/pr-test` and `/pr-test --fix` run local Docker + integration tests that use shared ports, a shared database, and shared build caches. **Running two `/pr-test` jobs simultaneously will cause port conflicts and database corruption.**
|
||||
|
||||
**Rule: only one `/pr-test` runs at a time. The orchestrator serializes them.**
|
||||
|
||||
You (the orchestrating Claude) own the test queue:
|
||||
1. Agents do `pr-review` and `pr-address` in parallel — that's safe (they only push code and reply to GitHub).
|
||||
2. When a PR needs local testing, add it to your mental queue — don't give agents a `pr-test` step.
|
||||
3. Run `/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER --fix` yourself, sequentially.
|
||||
4. Feed results back to the relevant agent via `tmux send-keys`:
|
||||
```bash
|
||||
tmux send-keys -t SESSION:WIN "Local tests for PR #N: <paste failure output or 'all passed'>. Fix any failures and push, then output ORCHESTRATOR:DONE."
|
||||
sleep 0.3
|
||||
tmux send-keys -t SESSION:WIN Enter
|
||||
```
|
||||
5. Wait for CI to confirm green before marking the agent done.
|
||||
|
||||
If multiple PRs need testing at the same time, pick the one furthest along (fewest pending CI checks) and test it first. Only start the next test after the previous one completes.
|
||||
|
||||
## Session restore (tested and confirmed)
|
||||
|
||||
Agent sessions are saved to disk. To restore a closed or crashed session:
|
||||
|
||||
```bash
|
||||
# If session_id is in state (preferred):
|
||||
NEW_WIN=$(tmux new-window -t SESSION -n WORKTREE_NAME -P -F '#{window_index}')
|
||||
tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --resume SESSION_ID --permission-mode bypassPermissions" Enter
|
||||
|
||||
# If no session_id (use --continue for most recent session in that directory):
|
||||
tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --continue --permission-mode bypassPermissions" Enter
|
||||
```
|
||||
|
||||
`--continue` restores the full conversation history including all tool calls, file edits, and context. The agent resumes exactly where it left off. After restoring, update the window address in the state file:
|
||||
|
||||
```bash
|
||||
jq --arg old "SESSION:OLD_WIN" --arg new "SESSION:NEW_WIN" \
|
||||
'(.agents[] | select(.window == $old)).window = $new' \
|
||||
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
## Intent → action mapping
|
||||
|
||||
Match the user's message to one of these intents:
|
||||
|
||||
| The user says something like… | What to do |
|
||||
|---|---|
|
||||
| "status", "what's running", "show agents" | Run `status.sh` + `capacity.sh`, show output |
|
||||
| "how many free", "capacity", "available worktrees" | Run `capacity.sh`, show output |
|
||||
| "start N agents on X, Y, Z" or "run these tasks: …" | See **Spawning agents** below |
|
||||
| "add task: …", "add one more agent for …" | See **Adding an agent** below |
|
||||
| "stop", "shut down", "pause the fleet" | See **Stopping** below |
|
||||
| "poll", "check now", "run a cycle" | Run `poll-cycle.sh`, process actions |
|
||||
| "recycle window X", "free up autogpt3" | Run `recycle-agent.sh` directly |
|
||||
|
||||
When the intent is ambiguous, show capacity first and ask what tasks to run.
|
||||
|
||||
## Spawning agents
|
||||
|
||||
### 1. Resolve tmux session
|
||||
|
||||
```bash
|
||||
tmux list-sessions -F "#{session_name}: #{session_windows} windows" 2>/dev/null
|
||||
```
|
||||
|
||||
Use an existing session. **Never create a tmux session from within Claude** — it becomes a child of Claude's process and dies when the session ends. If no session exists, tell the user to run `tmux new-session -d -s autogpt1` in their terminal first, then re-invoke `/orchestrate`.
|
||||
|
||||
### 2. Show available capacity
|
||||
|
||||
```bash
|
||||
bash $SKILLS_DIR/capacity.sh $(git rev-parse --show-toplevel)
|
||||
```
|
||||
|
||||
### 3. Collect tasks from the user
|
||||
|
||||
For each task, gather:
|
||||
- **objective** — what to do (e.g. "implement feature X and open a PR")
|
||||
- **branch name** — e.g. `feat/my-feature` (derive from objective if not given)
|
||||
- **pr_number** — GitHub PR number if working on an existing PR (for verification)
|
||||
- **steps** — required checkpoint names in order (e.g. `pr-address pr-test`) — derive from objective
|
||||
|
||||
Ask for `idle_threshold_seconds` only if the user mentions it (default: 300).
|
||||
|
||||
Never ask the user to specify a worktree — auto-assign from `find-spare.sh`.
|
||||
|
||||
### 4. Spawn one agent per task
|
||||
|
||||
```bash
|
||||
# Get ordered list of spare worktrees
|
||||
SPARE_LIST=$(bash $SKILLS_DIR/find-spare.sh $(git rev-parse --show-toplevel))
|
||||
|
||||
# For each task, take the next spare line:
|
||||
WORKTREE_PATH=$(echo "$SPARE_LINE" | awk '{print $1}')
|
||||
SPARE_BRANCH=$(echo "$SPARE_LINE" | awk '{print $2}')
|
||||
|
||||
# With PR number and required steps:
|
||||
WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE" "$PR_NUMBER" "pr-address" "pr-test")
|
||||
|
||||
# Without PR (new work):
|
||||
WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE")
|
||||
```
|
||||
|
||||
Build an agent record and append it to the state file. If the state file doesn't exist yet, initialize it:
|
||||
|
||||
```bash
|
||||
# Derive repo from git remote (used by verify-complete.sh + supervisor)
|
||||
REPO=$(git remote get-url origin 2>/dev/null | sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
|
||||
|
||||
jq -n \
|
||||
--arg session "$SESSION" \
|
||||
--arg repo "$REPO" \
|
||||
--argjson threshold 300 \
|
||||
'{active:true, tmux_session:$session, idle_threshold_seconds:$threshold,
|
||||
repo:$repo, loop_window:null, supervisor_window:null, last_poll_at:0, agents:[]}' \
|
||||
> ~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
Optionally add a Discord webhook for completion notifications:
|
||||
```bash
|
||||
jq --arg hook "$DISCORD_WEBHOOK_URL" '.discord_webhook = $hook' ~/.claude/orchestrator-state.json \
|
||||
> /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
`spawn-agent.sh` writes the initial agent record (window, worktree_path, branch, objective, state, etc.) to the state file automatically — **do not append the record again after calling it.** The record already exists and `pr_number`/`steps` are patched in by the script itself.
|
||||
|
||||
### 5. Start the mechanical babysitter
|
||||
|
||||
```bash
|
||||
LOOP_WIN=$(tmux new-window -t "$SESSION" -n "orchestrator" -P -F '#{window_index}')
|
||||
LOOP_WINDOW="${SESSION}:${LOOP_WIN}"
|
||||
tmux send-keys -t "$LOOP_WINDOW" "bash $SKILLS_DIR/run-loop.sh" Enter
|
||||
|
||||
jq --arg w "$LOOP_WINDOW" '.loop_window = $w' ~/.claude/orchestrator-state.json \
|
||||
> /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
### 6. Begin supervising directly in this conversation
|
||||
|
||||
You are the supervisor. After spawning, immediately start your first poll loop (see **Supervisor duties** below) and continue every 2-3 minutes. Do NOT spawn a separate supervisor Claude window.
|
||||
|
||||
## Adding an agent
|
||||
|
||||
Find the next spare worktree, then spawn and append to state — same as steps 2–4 above but for a single task. If no spare worktrees are available, tell the user.
|
||||
|
||||
## Supervisor duties (YOUR job, every 2-3 min in this conversation)
|
||||
|
||||
You are the supervisor. Run this poll loop directly in your Claude session — not in a separate window.
|
||||
|
||||
### Poll loop mechanism
|
||||
|
||||
You are reactive — you only act when a tool completes or the user sends a message. To create a self-sustaining poll loop without user involvement:
|
||||
|
||||
1. Start each poll with `run_in_background: true` + a sleep before the work:
|
||||
```bash
|
||||
sleep 120 && tmux capture-pane -t autogpt1:0 -p -S -200 | tail -40
|
||||
# + similar for each active window
|
||||
```
|
||||
2. When the background job notifies you, read the pane output and take action.
|
||||
3. Immediately schedule the next background poll — this keeps the loop alive.
|
||||
4. Stop scheduling when all agents are done/escalated.
|
||||
|
||||
**Never tell the user "I'll poll every 2-3 minutes"** — that does nothing without a trigger. Start the background job instead.
|
||||
|
||||
### Each poll: what to check
|
||||
|
||||
```bash
|
||||
# 1. Read state
|
||||
cat ~/.claude/orchestrator-state.json | jq '.agents[] | {window, worktree, branch, state, pr_number, checkpoints}'
|
||||
|
||||
# 2. For each running/stuck/idle agent, capture pane
|
||||
tmux capture-pane -t SESSION:WIN -p -S -200 | tail -60
|
||||
```
|
||||
|
||||
For each agent, decide:
|
||||
|
||||
| What you see | Action |
|
||||
|---|---|
|
||||
| Spinner / tools running | Do nothing — agent is working |
|
||||
| Idle `❯` prompt, no `ORCHESTRATOR:DONE` | Stalled — send specific nudge with objective from state |
|
||||
| Stuck in error loop | Send targeted fix with exact error + solution |
|
||||
| Waiting for input / question | Answer and unblock via `tmux send-keys` |
|
||||
| CI red | `gh pr checks PR_NUMBER --repo REPO` → tell agent exactly what's failing |
|
||||
| Context compacted / agent lost | Send recovery: `cat ~/.claude/orchestrator-state.json | jq '.agents[] | select(.window=="WIN")'` + `gh pr view PR_NUMBER --json title,body` |
|
||||
| `ORCHESTRATOR:DONE` in output | Run `verify-complete.sh` — if it fails, re-brief with specific reason |
|
||||
|
||||
### Strict ORCHESTRATOR:DONE gate
|
||||
|
||||
`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CI green, spawned_at, and CHANGES_REQUESTED). Run it:
|
||||
|
||||
**CHANGES_REQUESTED staleness rule**: a `CHANGES_REQUESTED` review only blocks if it was submitted *after* the latest commit. If the latest commit postdates the review, the review is considered stale (feedback already addressed) and does not block. This avoids false negatives when a bot reviewer hasn't re-reviewed after the agent's fixing commits.
|
||||
|
||||
```bash
|
||||
SKILLS_DIR=~/.claude/orchestrator/scripts
|
||||
bash $SKILLS_DIR/verify-complete.sh SESSION:WIN
|
||||
```
|
||||
|
||||
If it passes → run-loop.sh will recycle the window automatically. No manual action needed.
|
||||
If it fails → re-brief the agent with the failure reason. Never manually mark state `done` to bypass this.
|
||||
|
||||
### Re-brief a stalled agent
|
||||
|
||||
```bash
|
||||
OBJ=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .objective' ~/.claude/orchestrator-state.json)
|
||||
PR=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .pr_number' ~/.claude/orchestrator-state.json)
|
||||
tmux send-keys -t SESSION:WIN "You appear stalled. Your objective: $OBJ. Check: gh pr view $PR --json title,body,headRefName to reorient."
|
||||
sleep 0.3
|
||||
tmux send-keys -t SESSION:WIN Enter
|
||||
```
|
||||
|
||||
If `image_path` is set on the agent record, include: "Re-read context at IMAGE_PATH with the Read tool."
|
||||
|
||||
## Self-recovery protocol (agents)
|
||||
|
||||
spawn-agent.sh automatically includes this instruction in every objective:
|
||||
|
||||
> If your context compacts and you lose track of what to do, run:
|
||||
> `cat ~/.claude/orchestrator-state.json | jq '.agents[] | select(.window=="SESSION:WIN")'`
|
||||
> and `gh pr view PR_NUMBER --json title,body,headRefName` to reorient.
|
||||
> Output each completed step as `CHECKPOINT:<step-name>` on its own line.
|
||||
|
||||
## Passing images and screenshots to agents
|
||||
|
||||
`tmux send-keys` is text-only — you cannot paste a raw image into a pane. To give an agent visual context (screenshots, diagrams, mockups):
|
||||
|
||||
1. **Save the image to a temp file** with a stable path:
|
||||
```bash
|
||||
# If the user drags in a screenshot or you receive a file path:
|
||||
IMAGE_PATH="/tmp/orchestrator-context-$(date +%s).png"
|
||||
cp "$USER_PROVIDED_PATH" "$IMAGE_PATH"
|
||||
```
|
||||
|
||||
2. **Reference the path in the objective string**:
|
||||
```bash
|
||||
OBJECTIVE="Implement the layout shown in /tmp/orchestrator-context-1234567890.png. Read that image first with the Read tool to understand the design."
|
||||
```
|
||||
|
||||
3. The agent uses its `Read` tool to view the image at startup — Claude Code agents are multimodal and can read image files directly.
|
||||
|
||||
**Rule**: always use `/tmp/orchestrator-context-<timestamp>.png` as the naming convention so the supervisor knows what to look for if it needs to re-brief an agent with the same image.
|
||||
|
||||
---
|
||||
|
||||
## Orchestrator final evaluation (YOU decide, not the script)
|
||||
|
||||
`verify-complete.sh` is a gate — it blocks premature marking. But it cannot tell you if the work is actually good. That is YOUR job.
|
||||
|
||||
When run-loop marks an agent `pending_evaluation` and you're notified, do all of these before marking done:
|
||||
|
||||
### 1. Run /pr-test (required, serialized, use TodoWrite to queue)
|
||||
|
||||
`/pr-test` is the only reliable confirmation that the objective is actually met. Run it yourself, not the agent.
|
||||
|
||||
**When multiple PRs reach `pending_evaluation` at the same time, use TodoWrite to queue them:**
|
||||
```
|
||||
- [ ] /pr-test PR #12636 — fix copilot retry logic
|
||||
- [ ] /pr-test PR #12699 — builder chat panel
|
||||
```
|
||||
Run one at a time. Check off as you go.
|
||||
|
||||
```
|
||||
/pr-test https://github.com/Significant-Gravitas/AutoGPT/pull/PR_NUMBER
|
||||
```
|
||||
|
||||
**/pr-test can be lazy** — if it gives vague output, re-run with full context:
|
||||
|
||||
```
|
||||
/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER
|
||||
Context: This PR implements <objective from state file>. Key files: <list>.
|
||||
Please verify: <specific behaviors to check>.
|
||||
```
|
||||
|
||||
Only one `/pr-test` at a time — they share ports and DB.
|
||||
|
||||
### /pr-test result evaluation
|
||||
|
||||
**PARTIAL on any headline feature scenario is an immediate blocker.** Do not approve, do not mark done, do not let the agent output `ORCHESTRATOR:DONE`.
|
||||
|
||||
| `/pr-test` result | Action |
|
||||
|---|---|
|
||||
| All headline scenarios **PASS** | Proceed to evaluation step 2 |
|
||||
| Any headline scenario **PARTIAL** | Re-brief the agent immediately — see below |
|
||||
| Any headline scenario **FAIL** | Re-brief the agent immediately |
|
||||
|
||||
**What PARTIAL means**: the feature is only partly working. Example: the Apply button never appeared, or the AI returned no action blocks. The agent addressed part of the objective but not all of it.
|
||||
|
||||
**When any headline scenario is PARTIAL or FAIL:**
|
||||
|
||||
1. Do NOT mark the agent done or accept `ORCHESTRATOR:DONE`
|
||||
2. Re-brief the agent with the specific scenario that failed and what was missing:
|
||||
```bash
|
||||
tmux send-keys -t SESSION:WIN "PARTIAL result on /pr-test — S5 (Apply button) never appeared. The AI must output JSON action blocks for the Apply button to render. Fix this before re-running /pr-test."
|
||||
sleep 0.3
|
||||
tmux send-keys -t SESSION:WIN Enter
|
||||
```
|
||||
3. Set state back to `running`:
|
||||
```bash
|
||||
jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "running"' \
|
||||
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
```
|
||||
4. Wait for new `ORCHESTRATOR:DONE`, then re-run `/pr-test` from scratch
|
||||
|
||||
**Rule: only ALL-PASS qualifies for approval.** A mix of PASS + PARTIAL is a failure.
|
||||
|
||||
> **Why this matters**: PR #12699 was wrongly approved with S5 PARTIAL — the AI never output JSON action blocks so the Apply button never appeared. The fix was already in the agent's reach but slipped through because PARTIAL was not treated as blocking.
|
||||
|
||||
### 2. Do your own evaluation
|
||||
|
||||
1. **Read the PR diff and objective** — does the code actually implement what was asked? Is anything obviously missing or half-done?
|
||||
2. **Read the resolved threads** — were comments addressed with real fixes, or just dismissed/resolved without changes?
|
||||
3. **Check CI run names** — any suspicious retries that shouldn't have passed?
|
||||
4. **Check the PR description** — title, summary, test plan complete?
|
||||
|
||||
### 3. Decide
|
||||
|
||||
- `/pr-test` all scenarios PASS + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
|
||||
- `/pr-test` any scenario PARTIAL or FAIL → re-brief the agent with the specific failing scenario, set state back to `running` (see `/pr-test result evaluation` above)
|
||||
- Evaluation finds gaps even with all PASS → re-brief the agent with specific gaps, set state back to `running`
|
||||
|
||||
**Never mark done based purely on script output.** You hold the full objective context; the script does not.
|
||||
|
||||
```bash
|
||||
# Mark done after your positive evaluation:
|
||||
jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "done"' \
|
||||
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
```
|
||||
|
||||
## When to stop the fleet
|
||||
|
||||
Stop the fleet (`active = false`) when **all** of the following are true:
|
||||
|
||||
| Check | How to verify |
|
||||
|---|---|
|
||||
| All agents are `done` or `escalated` | `jq '[.agents[] | select(.state | test("running\|stuck\|idle\|waiting_approval"))] | length' ~/.claude/orchestrator-state.json` == 0 |
|
||||
| All PRs have 0 unresolved review threads | GraphQL `isResolved` check per PR |
|
||||
| All PRs have green CI **on a run triggered after the agent's last push** | `gh run list --branch BRANCH --limit 1` timestamp > `spawned_at` in state |
|
||||
| No fresh CHANGES_REQUESTED (after latest commit) | `verify-complete.sh` checks this — stale pre-commit reviews are ignored |
|
||||
| No agents are `escalated` without human review | If any are escalated, surface to user first |
|
||||
|
||||
**Do NOT stop just because agents output `ORCHESTRATOR:DONE`.** That is a signal to verify, not a signal to stop.
|
||||
|
||||
**Do stop** if the user explicitly says "stop", "shut down", or "kill everything", even with agents still running.
|
||||
|
||||
```bash
|
||||
# Graceful stop
|
||||
jq '.active = false' ~/.claude/orchestrator-state.json > /tmp/orch.tmp \
|
||||
&& mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
|
||||
|
||||
LOOP_WINDOW=$(jq -r '.loop_window // ""' ~/.claude/orchestrator-state.json)
|
||||
[ -n "$LOOP_WINDOW" ] && tmux kill-window -t "$LOOP_WINDOW" 2>/dev/null || true
|
||||
```
|
||||
|
||||
Does **not** recycle running worktrees — agents may still be mid-task. Run `capacity.sh` to see what's still in progress.
|
||||
|
||||
## tmux send-keys pattern
|
||||
|
||||
**Always split long messages into text + Enter as two separate calls with a sleep between them.** If sent as one call (`"text" Enter`), Enter can fire before the full string is buffered into Claude's input — leaving the message stuck as `[Pasted text +N lines]` unsent.
|
||||
|
||||
```bash
|
||||
# CORRECT — text then Enter separately
|
||||
tmux send-keys -t "$WINDOW" "your long message here"
|
||||
sleep 0.3
|
||||
tmux send-keys -t "$WINDOW" Enter
|
||||
|
||||
# WRONG — Enter may fire before text is buffered
|
||||
tmux send-keys -t "$WINDOW" "your long message here" Enter
|
||||
```
|
||||
|
||||
Short single-character sends (`y`, `Down`, empty Enter for dialog approval) are safe to combine since they have no buffering lag.
|
||||
|
||||
---
|
||||
|
||||
## Protected worktrees
|
||||
|
||||
Some worktrees must **never** be used as spare worktrees for agent tasks because they host files critical to the orchestrator itself:
|
||||
|
||||
| Worktree | Protected branch | Why |
|
||||
|---|---|---|
|
||||
| `AutoGPT1` | `dx/orchestrate-skill` | Hosts the orchestrate skill scripts. `recycle-agent.sh` would check out `spare/1`, wiping `.claude/skills/` and breaking all subsequent `spawn-agent.sh` calls. |
|
||||
|
||||
**Rule**: when selecting spare worktrees via `find-spare.sh`, skip any worktree whose CURRENT branch matches a protected branch. If you accidentally spawn an agent in a protected worktree, do not let `recycle-agent.sh` run on it — manually restore the branch after the agent finishes.
|
||||
|
||||
When `dx/orchestrate-skill` is merged into `dev`, `AutoGPT1` becomes a normal spare again.
|
||||
|
||||
---
|
||||
|
||||
## Key rules
|
||||
|
||||
1. **Scripts do all the heavy lifting** — don't reimplement their logic inline in this file
|
||||
2. **Never ask the user to pick a worktree** — auto-assign from `find-spare.sh` output
|
||||
3. **Never restart a running agent** — only restart on `idle` kicks (foreground is a shell)
|
||||
4. **Auto-dismiss settings dialogs** — if "Enter to confirm" appears, send Down+Enter
|
||||
5. **Always `--permission-mode bypassPermissions`** on every spawn
|
||||
6. **Escalate after 3 kicks** — mark `escalated`, surface to user
|
||||
7. **Atomic state writes** — always write to `.tmp` then `mv`
|
||||
8. **Never approve destructive commands** outside the worktree scope — when in doubt, escalate
|
||||
9. **Never recycle without verification** — `verify-complete.sh` must pass before recycling
|
||||
10. **No TASK.md files** — commit risk; use state file + `gh pr view` for agent context persistence
|
||||
11. **Re-brief stalled agents** — read objective from state file + `gh pr view`, send via tmux
|
||||
12. **ORCHESTRATOR:DONE is a signal to verify, not to accept** — always run `verify-complete.sh` and check CI run timestamp before recycling
|
||||
13. **Protected worktrees** — never use the worktree hosting the skill scripts as a spare
|
||||
14. **Images via file path** — save screenshots to `/tmp/orchestrator-context-<ts>.png`, pass path in objective; agents read with the `Read` tool
|
||||
15. **Split send-keys** — always separate text and Enter with `sleep 0.3` between calls for long strings
|
||||
@@ -1,43 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# capacity.sh — show fleet capacity: available spare worktrees + in-use agents
|
||||
#
|
||||
# Usage: capacity.sh [REPO_ROOT]
|
||||
# REPO_ROOT defaults to the root worktree of the current git repo.
|
||||
#
|
||||
# Reads: ~/.claude/orchestrator-state.json (skipped if missing or corrupt)
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
|
||||
|
||||
echo "=== Available (spare) worktrees ==="
|
||||
if [ -n "$REPO_ROOT" ]; then
|
||||
SPARE=$("$SCRIPTS_DIR/find-spare.sh" "$REPO_ROOT" 2>/dev/null || echo "")
|
||||
else
|
||||
SPARE=$("$SCRIPTS_DIR/find-spare.sh" 2>/dev/null || echo "")
|
||||
fi
|
||||
|
||||
if [ -z "$SPARE" ]; then
|
||||
echo " (none)"
|
||||
else
|
||||
while IFS= read -r line; do
|
||||
[ -z "$line" ] && continue
|
||||
echo " ✓ $line"
|
||||
done <<< "$SPARE"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "=== In-use worktrees ==="
|
||||
if [ -f "$STATE_FILE" ] && jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
|
||||
IN_USE=$(jq -r '.agents[] | select(.state != "done") | " [\(.state)] \(.worktree_path) → \(.branch)"' \
|
||||
"$STATE_FILE" 2>/dev/null || echo "")
|
||||
if [ -n "$IN_USE" ]; then
|
||||
echo "$IN_USE"
|
||||
else
|
||||
echo " (none)"
|
||||
fi
|
||||
else
|
||||
echo " (no active state file)"
|
||||
fi
|
||||
@@ -1,85 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# classify-pane.sh — Classify the current state of a tmux pane
|
||||
#
|
||||
# Usage: classify-pane.sh <tmux-target>
|
||||
# tmux-target: e.g. "work:0", "work:1.0"
|
||||
#
|
||||
# Output (stdout): JSON object:
|
||||
# { "state": "running|idle|waiting_approval|complete", "reason": "...", "pane_cmd": "..." }
|
||||
#
|
||||
# Exit codes: 0=ok, 1=error (invalid target or tmux window not found)
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
TARGET="${1:-}"
|
||||
|
||||
if [ -z "$TARGET" ]; then
|
||||
echo '{"state":"error","reason":"no target provided","pane_cmd":""}'
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Validate tmux target format: session:window or session:window.pane
|
||||
if ! [[ "$TARGET" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
|
||||
echo '{"state":"error","reason":"invalid tmux target format","pane_cmd":""}'
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check session exists (use %%:* to extract session name from session:window)
|
||||
if ! tmux list-windows -t "${TARGET%%:*}" &>/dev/null 2>&1; then
|
||||
echo '{"state":"error","reason":"tmux target not found","pane_cmd":""}'
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Get the current foreground command in the pane
|
||||
PANE_CMD=$(tmux display-message -t "$TARGET" -p '#{pane_current_command}' 2>/dev/null || echo "unknown")
|
||||
|
||||
# Capture and strip ANSI codes (use perl for cross-platform compatibility — BSD sed lacks \x1b support)
|
||||
RAW=$(tmux capture-pane -t "$TARGET" -p -S -50 2>/dev/null || echo "")
|
||||
CLEAN=$(echo "$RAW" | perl -pe 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\x1b\(B//g; s/\x1b\[\?[0-9]*[hl]//g; s/\r//g' \
|
||||
| grep -v '^[[:space:]]*$' || true)
|
||||
|
||||
# --- Check: explicit completion marker ---
|
||||
# Must be on its own line (not buried in the objective text sent at spawn time).
|
||||
if echo "$CLEAN" | grep -qE "^[[:space:]]*ORCHESTRATOR:DONE[[:space:]]*$"; then
|
||||
jq -n --arg cmd "$PANE_CMD" '{"state":"complete","reason":"ORCHESTRATOR:DONE marker found","pane_cmd":$cmd}'
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# --- Check: Claude Code approval prompt patterns ---
|
||||
LAST_40=$(echo "$CLEAN" | tail -40)
|
||||
APPROVAL_PATTERNS=(
|
||||
"Do you want to proceed"
|
||||
"Do you want to make this"
|
||||
"\\[y/n\\]"
|
||||
"\\[Y/n\\]"
|
||||
"\\[n/Y\\]"
|
||||
"Proceed\\?"
|
||||
"Allow this command"
|
||||
"Run bash command"
|
||||
"Allow bash"
|
||||
"Would you like"
|
||||
"Press enter to continue"
|
||||
"Esc to cancel"
|
||||
)
|
||||
for pattern in "${APPROVAL_PATTERNS[@]}"; do
|
||||
if echo "$LAST_40" | grep -qiE "$pattern"; then
|
||||
jq -n --arg pattern "$pattern" --arg cmd "$PANE_CMD" \
|
||||
'{"state":"waiting_approval","reason":"approval pattern: \($pattern)","pane_cmd":$cmd}'
|
||||
exit 0
|
||||
fi
|
||||
done
|
||||
|
||||
# --- Check: shell prompt (claude has exited) ---
|
||||
# If the foreground process is a shell (not claude/node), the agent has exited
|
||||
case "$PANE_CMD" in
|
||||
zsh|bash|fish|sh|dash|tcsh|ksh)
|
||||
jq -n --arg cmd "$PANE_CMD" \
|
||||
'{"state":"idle","reason":"agent exited — shell prompt active","pane_cmd":$cmd}'
|
||||
exit 0
|
||||
;;
|
||||
esac
|
||||
|
||||
# Agent is still running (claude/node/python is the foreground process)
|
||||
jq -n --arg cmd "$PANE_CMD" \
|
||||
'{"state":"running","reason":"foreground process: \($cmd)","pane_cmd":$cmd}'
|
||||
exit 0
|
||||
@@ -1,24 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# find-spare.sh — list worktrees on spare/N branches (free to use)
|
||||
#
|
||||
# Usage: find-spare.sh [REPO_ROOT]
|
||||
# REPO_ROOT defaults to the root worktree containing the current git repo.
|
||||
#
|
||||
# Output (stdout): one line per available worktree: "PATH BRANCH"
|
||||
# e.g.: /Users/me/Code/AutoGPT3 spare/3
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
|
||||
if [ -z "$REPO_ROOT" ]; then
|
||||
echo "Error: not inside a git repo and no REPO_ROOT provided" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
git -C "$REPO_ROOT" worktree list --porcelain \
|
||||
| awk '
|
||||
/^worktree / { path = substr($0, 10) }
|
||||
/^branch / { branch = substr($0, 8); print path " " branch }
|
||||
' \
|
||||
| { grep -E " refs/heads/spare/[0-9]+$" || true; } \
|
||||
| sed 's|refs/heads/||'
|
||||
@@ -1,40 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# notify.sh — send a fleet notification message
|
||||
#
|
||||
# Delivery order (first available wins):
|
||||
# 1. Discord webhook — DISCORD_WEBHOOK_URL env var OR state file .discord_webhook
|
||||
# 2. macOS notification center — osascript (silent fail if unavailable)
|
||||
# 3. Stdout only
|
||||
#
|
||||
# Usage: notify.sh MESSAGE
|
||||
# Exit: always 0 (notification failure must not abort the caller)
|
||||
|
||||
MESSAGE="${1:-}"
|
||||
[ -z "$MESSAGE" ] && exit 0
|
||||
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
|
||||
# --- Resolve Discord webhook ---
|
||||
WEBHOOK="${DISCORD_WEBHOOK_URL:-}"
|
||||
if [ -z "$WEBHOOK" ] && [ -f "$STATE_FILE" ]; then
|
||||
WEBHOOK=$(jq -r '.discord_webhook // ""' "$STATE_FILE" 2>/dev/null || echo "")
|
||||
fi
|
||||
|
||||
# --- Discord delivery ---
|
||||
if [ -n "$WEBHOOK" ]; then
|
||||
PAYLOAD=$(jq -n --arg msg "$MESSAGE" '{"content": $msg}')
|
||||
curl -s -X POST "$WEBHOOK" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "$PAYLOAD" > /dev/null 2>&1 || true
|
||||
fi
|
||||
|
||||
# --- macOS notification center (silent if not macOS or osascript missing) ---
|
||||
if command -v osascript &>/dev/null 2>&1; then
|
||||
# Escape single quotes for AppleScript
|
||||
SAFE_MSG=$(echo "$MESSAGE" | sed "s/'/\\\\'/g")
|
||||
osascript -e "display notification \"${SAFE_MSG}\" with title \"Orchestrator\"" 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# Always print to stdout so run-loop.sh logs it
|
||||
echo "$MESSAGE"
|
||||
exit 0
|
||||
@@ -1,257 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# poll-cycle.sh — Single orchestrator poll cycle
|
||||
#
|
||||
# Reads ~/.claude/orchestrator-state.json, classifies each agent, updates state,
|
||||
# and outputs a JSON array of actions for Claude to take.
|
||||
#
|
||||
# Usage: poll-cycle.sh
|
||||
# Output (stdout): JSON array of action objects
|
||||
# [{ "window": "work:0", "action": "kick|approve|none", "state": "...",
|
||||
# "worktree": "...", "objective": "...", "reason": "..." }]
|
||||
#
|
||||
# The state file is updated in-place (atomic write via .tmp).
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
CLASSIFY="$SCRIPTS_DIR/classify-pane.sh"
|
||||
|
||||
# Cross-platform md5: always outputs just the hex digest
|
||||
md5_hash() {
|
||||
if command -v md5sum &>/dev/null; then
|
||||
md5sum | awk '{print $1}'
|
||||
else
|
||||
md5 | awk '{print $NF}'
|
||||
fi
|
||||
}
|
||||
|
||||
# Clean up temp file on any exit (avoids stale .tmp if jq write fails)
|
||||
trap 'rm -f "${STATE_FILE}.tmp"' EXIT
|
||||
|
||||
# Ensure state file exists
|
||||
if [ ! -f "$STATE_FILE" ]; then
|
||||
echo '{"active":false,"agents":[]}' > "$STATE_FILE"
|
||||
fi
|
||||
|
||||
# Validate JSON upfront before any jq reads that run under set -e.
|
||||
# A truncated/corrupt file (e.g. from a SIGKILL mid-write) would otherwise
|
||||
# abort the script at the ACTIVE read below without emitting any JSON output.
|
||||
if ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
|
||||
echo "State file parse error — check $STATE_FILE" >&2
|
||||
echo "[]"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
ACTIVE=$(jq -r '.active // false' "$STATE_FILE")
|
||||
if [ "$ACTIVE" != "true" ]; then
|
||||
echo "[]"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
NOW=$(date +%s)
|
||||
IDLE_THRESHOLD=$(jq -r '.idle_threshold_seconds // 300' "$STATE_FILE")
|
||||
|
||||
ACTIONS="[]"
|
||||
UPDATED_AGENTS="[]"
|
||||
|
||||
# Read agents as newline-delimited JSON objects.
|
||||
# jq exits non-zero when .agents[] has no matches on an empty array, which is valid —
|
||||
# so we suppress that exit code and separately validate the file is well-formed JSON.
|
||||
if ! AGENTS_JSON=$(jq -e -c '.agents // empty | .[]' "$STATE_FILE" 2>/dev/null); then
|
||||
if ! jq -e '.' "$STATE_FILE" > /dev/null 2>&1; then
|
||||
echo "State file parse error — check $STATE_FILE" >&2
|
||||
fi
|
||||
echo "[]"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
if [ -z "$AGENTS_JSON" ]; then
|
||||
echo "[]"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
while IFS= read -r agent; do
|
||||
[ -z "$agent" ] && continue
|
||||
|
||||
# Use // "" defaults so a single malformed field doesn't abort the whole cycle
|
||||
WINDOW=$(echo "$agent" | jq -r '.window // ""')
|
||||
WORKTREE=$(echo "$agent" | jq -r '.worktree // ""')
|
||||
OBJECTIVE=$(echo "$agent"| jq -r '.objective // ""')
|
||||
STATE=$(echo "$agent" | jq -r '.state // "running"')
|
||||
LAST_HASH=$(echo "$agent"| jq -r '.last_output_hash // ""')
|
||||
IDLE_SINCE=$(echo "$agent"| jq -r '.idle_since // 0')
|
||||
REVISION_COUNT=$(echo "$agent"| jq -r '.revision_count // 0')
|
||||
|
||||
# Validate window format to prevent tmux target injection.
|
||||
# Allow session:window (numeric or named) and session:window.pane
|
||||
if ! [[ "$WINDOW" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
|
||||
echo "Skipping agent with invalid window value: $WINDOW" >&2
|
||||
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
|
||||
continue
|
||||
fi
|
||||
|
||||
# Pass-through terminal-state agents
|
||||
if [[ "$STATE" == "done" || "$STATE" == "escalated" || "$STATE" == "complete" || "$STATE" == "pending_evaluation" ]]; then
|
||||
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
|
||||
continue
|
||||
fi
|
||||
|
||||
# Classify pane.
|
||||
# classify-pane.sh always emits JSON before exit (even on error), so using
|
||||
# "|| echo '...'" would concatenate two JSON objects when it exits non-zero.
|
||||
# Use "|| true" inside the substitution so set -euo pipefail does not abort
|
||||
# the poll cycle when classify exits with a non-zero status code.
|
||||
CLASSIFICATION=$("$CLASSIFY" "$WINDOW" 2>/dev/null || true)
|
||||
[ -z "$CLASSIFICATION" ] && CLASSIFICATION='{"state":"error","reason":"classify failed","pane_cmd":"unknown"}'
|
||||
|
||||
PANE_STATE=$(echo "$CLASSIFICATION" | jq -r '.state')
|
||||
PANE_REASON=$(echo "$CLASSIFICATION" | jq -r '.reason')
|
||||
|
||||
# Capture full pane output once — used for hash (stuck detection) and checkpoint parsing.
|
||||
# Use -S -500 to get the last ~500 lines of scrollback so checkpoints aren't missed.
|
||||
RAW=$(tmux capture-pane -t "$WINDOW" -p -S -500 2>/dev/null || echo "")
|
||||
|
||||
# --- Checkpoint tracking ---
|
||||
# Parse any "CHECKPOINT:<step>" lines the agent has output and merge into state file.
|
||||
# The agent writes these as it completes each required step so verify-complete.sh can gate recycling.
|
||||
EXISTING_CPS=$(echo "$agent" | jq -c '.checkpoints // []')
|
||||
NEW_CHECKPOINTS_JSON="$EXISTING_CPS"
|
||||
if [ -n "$RAW" ]; then
|
||||
FOUND_CPS=$(echo "$RAW" \
|
||||
| grep -oE "CHECKPOINT:[a-zA-Z0-9_-]+" \
|
||||
| sed 's/CHECKPOINT://' \
|
||||
| sort -u \
|
||||
| jq -R . | jq -s . 2>/dev/null || echo "[]")
|
||||
NEW_CHECKPOINTS_JSON=$(jq -n \
|
||||
--argjson existing "$EXISTING_CPS" \
|
||||
--argjson found "$FOUND_CPS" \
|
||||
'($existing + $found) | unique' 2>/dev/null || echo "$EXISTING_CPS")
|
||||
fi
|
||||
|
||||
# Compute content hash for stuck-detection (only for running agents)
|
||||
CURRENT_HASH=""
|
||||
if [[ "$PANE_STATE" == "running" ]] && [ -n "$RAW" ]; then
|
||||
CURRENT_HASH=$(echo "$RAW" | tail -20 | md5_hash)
|
||||
fi
|
||||
|
||||
NEW_STATE="$STATE"
|
||||
NEW_IDLE_SINCE="$IDLE_SINCE"
|
||||
NEW_REVISION_COUNT="$REVISION_COUNT"
|
||||
ACTION="none"
|
||||
REASON="$PANE_REASON"
|
||||
|
||||
case "$PANE_STATE" in
|
||||
complete)
|
||||
# Agent output ORCHESTRATOR:DONE — mark pending_evaluation so orchestrator handles it.
|
||||
# run-loop does NOT verify or notify; orchestrator's background poll picks this up.
|
||||
NEW_STATE="pending_evaluation"
|
||||
ACTION="complete" # run-loop logs it but takes no action
|
||||
;;
|
||||
waiting_approval)
|
||||
NEW_STATE="waiting_approval"
|
||||
ACTION="approve"
|
||||
;;
|
||||
idle)
|
||||
# Agent process has exited — needs restart
|
||||
NEW_STATE="idle"
|
||||
ACTION="kick"
|
||||
REASON="agent exited (shell is foreground)"
|
||||
NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 ))
|
||||
NEW_IDLE_SINCE=$NOW
|
||||
if [ "$NEW_REVISION_COUNT" -ge 3 ]; then
|
||||
NEW_STATE="escalated"
|
||||
ACTION="none"
|
||||
REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention"
|
||||
fi
|
||||
;;
|
||||
running)
|
||||
# Clear idle_since only when transitioning from idle (agent was kicked and
|
||||
# restarted). Do NOT reset for stuck — idle_since must persist across polls
|
||||
# so STUCK_DURATION can accumulate and trigger escalation.
|
||||
# Also update the local IDLE_SINCE so the hash-stability check below uses
|
||||
# the reset value on this same poll, not the stale kick timestamp.
|
||||
if [[ "$STATE" == "idle" ]]; then
|
||||
NEW_IDLE_SINCE=0
|
||||
IDLE_SINCE=0
|
||||
fi
|
||||
# Check if hash has been stable (agent may be stuck mid-task)
|
||||
if [ -n "$CURRENT_HASH" ] && [ "$CURRENT_HASH" = "$LAST_HASH" ] && [ "$LAST_HASH" != "" ]; then
|
||||
if [ "$IDLE_SINCE" = "0" ] || [ "$IDLE_SINCE" = "null" ]; then
|
||||
NEW_IDLE_SINCE=$NOW
|
||||
else
|
||||
STUCK_DURATION=$(( NOW - IDLE_SINCE ))
|
||||
if [ "$STUCK_DURATION" -gt "$IDLE_THRESHOLD" ]; then
|
||||
NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 ))
|
||||
NEW_IDLE_SINCE=$NOW
|
||||
if [ "$NEW_REVISION_COUNT" -ge 3 ]; then
|
||||
NEW_STATE="escalated"
|
||||
ACTION="none"
|
||||
REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention"
|
||||
else
|
||||
NEW_STATE="stuck"
|
||||
ACTION="kick"
|
||||
REASON="output unchanged for ${STUCK_DURATION}s (threshold: ${IDLE_THRESHOLD}s)"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
else
|
||||
# Only reset the idle timer when we have a valid hash comparison (pane
|
||||
# capture succeeded). If CURRENT_HASH is empty (tmux capture-pane failed),
|
||||
# preserve existing timers so stuck detection is not inadvertently reset.
|
||||
if [ -n "$CURRENT_HASH" ]; then
|
||||
NEW_STATE="running"
|
||||
NEW_IDLE_SINCE=0
|
||||
fi
|
||||
fi
|
||||
;;
|
||||
error)
|
||||
REASON="classify error: $PANE_REASON"
|
||||
;;
|
||||
esac
|
||||
|
||||
# Build updated agent record (ensure idle_since and revision_count are numeric)
|
||||
# Use || true on each jq call so a malformed field skips this agent rather than
|
||||
# aborting the entire poll cycle under set -e.
|
||||
UPDATED_AGENT=$(echo "$agent" | jq \
|
||||
--arg state "$NEW_STATE" \
|
||||
--arg hash "$CURRENT_HASH" \
|
||||
--argjson now "$NOW" \
|
||||
--arg idle_since "$NEW_IDLE_SINCE" \
|
||||
--arg revision_count "$NEW_REVISION_COUNT" \
|
||||
--argjson checkpoints "$NEW_CHECKPOINTS_JSON" \
|
||||
'.state = $state
|
||||
| .last_output_hash = (if $hash == "" then .last_output_hash else $hash end)
|
||||
| .last_seen_at = $now
|
||||
| .idle_since = ($idle_since | tonumber)
|
||||
| .revision_count = ($revision_count | tonumber)
|
||||
| .checkpoints = $checkpoints' 2>/dev/null) || {
|
||||
echo "Warning: failed to build updated agent for window $WINDOW — keeping original" >&2
|
||||
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
|
||||
continue
|
||||
}
|
||||
|
||||
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$UPDATED_AGENT" '. + [$a]')
|
||||
|
||||
# Add action if needed
|
||||
if [ "$ACTION" != "none" ]; then
|
||||
ACTION_OBJ=$(jq -n \
|
||||
--arg window "$WINDOW" \
|
||||
--arg action "$ACTION" \
|
||||
--arg state "$NEW_STATE" \
|
||||
--arg worktree "$WORKTREE" \
|
||||
--arg objective "$OBJECTIVE" \
|
||||
--arg reason "$REASON" \
|
||||
'{window:$window, action:$action, state:$state, worktree:$worktree, objective:$objective, reason:$reason}')
|
||||
ACTIONS=$(echo "$ACTIONS" | jq --argjson a "$ACTION_OBJ" '. + [$a]')
|
||||
fi
|
||||
|
||||
done <<< "$AGENTS_JSON"
|
||||
|
||||
# Atomic state file update
|
||||
jq --argjson agents "$UPDATED_AGENTS" \
|
||||
--argjson now "$NOW" \
|
||||
'.agents = $agents | .last_poll_at = $now' \
|
||||
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
|
||||
|
||||
echo "$ACTIONS"
|
||||
@@ -1,32 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# recycle-agent.sh — kill a tmux window and restore the worktree to its spare branch
|
||||
#
|
||||
# Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH
|
||||
# WINDOW — tmux target, e.g. autogpt1:3
|
||||
# WORKTREE_PATH — absolute path to the git worktree
|
||||
# SPARE_BRANCH — branch to restore, e.g. spare/6
|
||||
#
|
||||
# Stdout: one status line
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
if [ $# -lt 3 ]; then
|
||||
echo "Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
WINDOW="$1"
|
||||
WORKTREE_PATH="$2"
|
||||
SPARE_BRANCH="$3"
|
||||
|
||||
# Kill the tmux window (ignore error — may already be gone)
|
||||
tmux kill-window -t "$WINDOW" 2>/dev/null || true
|
||||
|
||||
# Restore to spare branch: abort any in-progress operation, then clean
|
||||
git -C "$WORKTREE_PATH" rebase --abort 2>/dev/null || true
|
||||
git -C "$WORKTREE_PATH" merge --abort 2>/dev/null || true
|
||||
git -C "$WORKTREE_PATH" reset --hard HEAD 2>/dev/null
|
||||
git -C "$WORKTREE_PATH" clean -fd 2>/dev/null
|
||||
git -C "$WORKTREE_PATH" checkout "$SPARE_BRANCH"
|
||||
|
||||
echo "Recycled: $(basename "$WORKTREE_PATH") → $SPARE_BRANCH (window $WINDOW closed)"
|
||||
@@ -1,164 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# run-loop.sh — Mechanical babysitter for the agent fleet (runs in its own tmux window)
|
||||
#
|
||||
# Handles ONLY two things that need no intelligence:
|
||||
# idle → restart claude using --resume SESSION_ID (or --continue) to restore context
|
||||
# approve → auto-approve safe dialogs, press Enter on numbered-option dialogs
|
||||
#
|
||||
# Everything else — ORCHESTRATOR:DONE, verification, /pr-test, final evaluation,
|
||||
# marking done, deciding to close windows — is the orchestrating Claude's job.
|
||||
# poll-cycle.sh sets state to pending_evaluation when ORCHESTRATOR:DONE is detected;
|
||||
# the orchestrator's background poll loop handles it from there.
|
||||
#
|
||||
# Usage: run-loop.sh
|
||||
# Env: POLL_INTERVAL (default: 30), ORCHESTRATOR_STATE_FILE
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# Copy scripts to a stable location outside the repo so they survive branch
|
||||
# checkouts (e.g. recycle-agent.sh switching spare/N back into this worktree
|
||||
# would wipe .claude/skills/orchestrate/scripts if the skill only exists on the
|
||||
# current branch).
|
||||
_ORIGIN_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
STABLE_SCRIPTS_DIR="$HOME/.claude/orchestrator/scripts"
|
||||
mkdir -p "$STABLE_SCRIPTS_DIR"
|
||||
cp "$_ORIGIN_DIR"/*.sh "$STABLE_SCRIPTS_DIR/"
|
||||
chmod +x "$STABLE_SCRIPTS_DIR"/*.sh
|
||||
SCRIPTS_DIR="$STABLE_SCRIPTS_DIR"
|
||||
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
POLL_INTERVAL="${POLL_INTERVAL:-30}"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# update_state WINDOW FIELD VALUE
|
||||
# ---------------------------------------------------------------------------
|
||||
update_state() {
|
||||
local window="$1" field="$2" value="$3"
|
||||
jq --arg w "$window" --arg f "$field" --arg v "$value" \
|
||||
'.agents |= map(if .window == $w then .[$f] = $v else . end)' \
|
||||
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
|
||||
}
|
||||
|
||||
update_state_int() {
|
||||
local window="$1" field="$2" value="$3"
|
||||
jq --arg w "$window" --arg f "$field" --argjson v "$value" \
|
||||
'.agents |= map(if .window == $w then .[$f] = $v else . end)' \
|
||||
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
|
||||
}
|
||||
|
||||
agent_field() {
|
||||
jq -r --arg w "$1" --arg f "$2" \
|
||||
'.agents[] | select(.window == $w) | .[$f] // ""' \
|
||||
"$STATE_FILE" 2>/dev/null
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# wait_for_prompt WINDOW — wait up to 60s for Claude's ❯ prompt
|
||||
# ---------------------------------------------------------------------------
|
||||
wait_for_prompt() {
|
||||
local window="$1"
|
||||
for i in $(seq 1 60); do
|
||||
local cmd pane
|
||||
cmd=$(tmux display-message -t "$window" -p '#{pane_current_command}' 2>/dev/null || echo "")
|
||||
pane=$(tmux capture-pane -t "$window" -p 2>/dev/null || echo "")
|
||||
if echo "$pane" | grep -q "Enter to confirm"; then
|
||||
tmux send-keys -t "$window" Down Enter; sleep 2; continue
|
||||
fi
|
||||
[[ "$cmd" == "node" ]] && echo "$pane" | grep -q "❯" && return 0
|
||||
sleep 1
|
||||
done
|
||||
return 1 # timed out
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# handle_kick WINDOW STATE — only for idle (crashed) agents, not stuck
|
||||
# ---------------------------------------------------------------------------
|
||||
handle_kick() {
|
||||
local window="$1" state="$2"
|
||||
[[ "$state" != "idle" ]] && return # stuck agents handled by supervisor
|
||||
|
||||
local worktree_path session_id
|
||||
worktree_path=$(agent_field "$window" "worktree_path")
|
||||
session_id=$(agent_field "$window" "session_id")
|
||||
|
||||
echo "[$(date +%H:%M:%S)] KICK restart $window — agent exited, resuming session"
|
||||
|
||||
# Resume the exact session so the agent retains full context — no need to re-send objective
|
||||
if [ -n "$session_id" ]; then
|
||||
tmux send-keys -t "$window" "cd '${worktree_path}' && claude --resume '${session_id}' --permission-mode bypassPermissions" Enter
|
||||
else
|
||||
tmux send-keys -t "$window" "cd '${worktree_path}' && claude --continue --permission-mode bypassPermissions" Enter
|
||||
fi
|
||||
|
||||
wait_for_prompt "$window" || echo "[$(date +%H:%M:%S)] KICK WARNING $window — timed out waiting for ❯"
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# handle_approve WINDOW — auto-approve dialogs that need no judgment
|
||||
# ---------------------------------------------------------------------------
|
||||
handle_approve() {
|
||||
local window="$1"
|
||||
local pane_tail
|
||||
pane_tail=$(tmux capture-pane -t "$window" -p 2>/dev/null | tail -3 || echo "")
|
||||
|
||||
# Settings error dialog at startup
|
||||
if echo "$pane_tail" | grep -q "Enter to confirm"; then
|
||||
echo "[$(date +%H:%M:%S)] APPROVE dialog $window — settings error"
|
||||
tmux send-keys -t "$window" Down Enter
|
||||
return
|
||||
fi
|
||||
|
||||
# Numbered-option dialog (e.g. "Do you want to make this edit?")
|
||||
# ❯ is already on option 1 (Yes) — Enter confirms it
|
||||
if echo "$pane_tail" | grep -qE "❯\s*1\." || echo "$pane_tail" | grep -q "Esc to cancel"; then
|
||||
echo "[$(date +%H:%M:%S)] APPROVE edit $window"
|
||||
tmux send-keys -t "$window" "" Enter
|
||||
return
|
||||
fi
|
||||
|
||||
# y/n prompt for safe operations
|
||||
if echo "$pane_tail" | grep -qiE "(^git |^npm |^pnpm |^poetry |^pytest|^docker |^make |^cargo |^pip |^yarn |curl .*(localhost|127\.0\.0\.1))"; then
|
||||
echo "[$(date +%H:%M:%S)] APPROVE safe $window"
|
||||
tmux send-keys -t "$window" "y" Enter
|
||||
return
|
||||
fi
|
||||
|
||||
# Anything else — supervisor handles it, just log
|
||||
echo "[$(date +%H:%M:%S)] APPROVE skip $window — unknown dialog, supervisor will handle"
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Main loop
|
||||
# ---------------------------------------------------------------------------
|
||||
echo "[$(date +%H:%M:%S)] run-loop started (mechanical only, poll every ${POLL_INTERVAL}s)"
|
||||
echo "[$(date +%H:%M:%S)] Supervisor: orchestrating Claude session (not a separate window)"
|
||||
echo "---"
|
||||
|
||||
while true; do
|
||||
if ! jq -e '.active == true' "$STATE_FILE" >/dev/null 2>&1; then
|
||||
echo "[$(date +%H:%M:%S)] active=false — exiting."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
ACTIONS=$("$SCRIPTS_DIR/poll-cycle.sh" 2>/dev/null || echo "[]")
|
||||
KICKED=0; DONE=0
|
||||
|
||||
while IFS= read -r action; do
|
||||
[ -z "$action" ] && continue
|
||||
WINDOW=$(echo "$action" | jq -r '.window // ""')
|
||||
ACTION=$(echo "$action" | jq -r '.action // ""')
|
||||
STATE=$(echo "$action" | jq -r '.state // ""')
|
||||
|
||||
case "$ACTION" in
|
||||
kick) handle_kick "$WINDOW" "$STATE" || true; KICKED=$(( KICKED + 1 )) ;;
|
||||
approve) handle_approve "$WINDOW" || true ;;
|
||||
complete) DONE=$(( DONE + 1 )) ;; # poll-cycle already set state=pending_evaluation; orchestrator handles
|
||||
esac
|
||||
done < <(echo "$ACTIONS" | jq -c '.[]' 2>/dev/null || true)
|
||||
|
||||
RUNNING=$(jq '[.agents[] | select(.state | test("running|stuck|waiting_approval|idle"))] | length' \
|
||||
"$STATE_FILE" 2>/dev/null || echo 0)
|
||||
|
||||
echo "[$(date +%H:%M:%S)] Poll — ${RUNNING} running ${KICKED} kicked ${DONE} recycled"
|
||||
sleep "$POLL_INTERVAL"
|
||||
done
|
||||
@@ -1,122 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# spawn-agent.sh — create tmux window, checkout branch, launch claude, send task
|
||||
#
|
||||
# Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]
|
||||
# SESSION — tmux session name, e.g. autogpt1
|
||||
# WORKTREE_PATH — absolute path to the git worktree
|
||||
# SPARE_BRANCH — spare branch being replaced, e.g. spare/6 (saved for recycle)
|
||||
# NEW_BRANCH — task branch to create, e.g. feat/my-feature
|
||||
# OBJECTIVE — task description sent to the agent
|
||||
# PR_NUMBER — (optional) GitHub PR number for completion verification
|
||||
# STEPS... — (optional) required checkpoint names, e.g. pr-address pr-test
|
||||
#
|
||||
# Stdout: SESSION:WINDOW_INDEX (nothing else — callers rely on this)
|
||||
# Exit non-zero on failure.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
if [ $# -lt 5 ]; then
|
||||
echo "Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
SESSION="$1"
|
||||
WORKTREE_PATH="$2"
|
||||
SPARE_BRANCH="$3"
|
||||
NEW_BRANCH="$4"
|
||||
OBJECTIVE="$5"
|
||||
PR_NUMBER="${6:-}"
|
||||
STEPS=("${@:7}")
|
||||
WORKTREE_NAME=$(basename "$WORKTREE_PATH")
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
|
||||
# Generate a stable session ID so this agent's Claude session can always be resumed:
|
||||
# claude --resume $SESSION_ID --permission-mode bypassPermissions
|
||||
SESSION_ID=$(uuidgen 2>/dev/null || python3 -c "import uuid; print(uuid.uuid4())")
|
||||
|
||||
# Create (or switch to) the task branch
|
||||
git -C "$WORKTREE_PATH" checkout -b "$NEW_BRANCH" 2>/dev/null \
|
||||
|| git -C "$WORKTREE_PATH" checkout "$NEW_BRANCH"
|
||||
|
||||
# Open a new named tmux window; capture its numeric index
|
||||
WIN_IDX=$(tmux new-window -t "$SESSION" -n "$WORKTREE_NAME" -P -F '#{window_index}')
|
||||
WINDOW="${SESSION}:${WIN_IDX}"
|
||||
|
||||
# Append the initial agent record to the state file so subsequent jq updates find it.
|
||||
# This must happen before the pr_number/steps update below.
|
||||
if [ -f "$STATE_FILE" ]; then
|
||||
NOW=$(date +%s)
|
||||
jq --arg window "$WINDOW" \
|
||||
--arg worktree "$WORKTREE_NAME" \
|
||||
--arg worktree_path "$WORKTREE_PATH" \
|
||||
--arg spare_branch "$SPARE_BRANCH" \
|
||||
--arg branch "$NEW_BRANCH" \
|
||||
--arg objective "$OBJECTIVE" \
|
||||
--arg session_id "$SESSION_ID" \
|
||||
--argjson now "$NOW" \
|
||||
'.agents += [{
|
||||
"window": $window,
|
||||
"worktree": $worktree,
|
||||
"worktree_path": $worktree_path,
|
||||
"spare_branch": $spare_branch,
|
||||
"branch": $branch,
|
||||
"objective": $objective,
|
||||
"session_id": $session_id,
|
||||
"state": "running",
|
||||
"checkpoints": [],
|
||||
"last_output_hash": "",
|
||||
"last_seen_at": $now,
|
||||
"spawned_at": $now,
|
||||
"idle_since": 0,
|
||||
"revision_count": 0,
|
||||
"last_rebriefed_at": 0
|
||||
}]' "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
|
||||
fi
|
||||
|
||||
# Store pr_number + steps in state file if provided (enables verify-complete.sh).
|
||||
# The agent record was appended above so the jq select now finds it.
|
||||
if [ -n "$PR_NUMBER" ] && [ -f "$STATE_FILE" ]; then
|
||||
if [ "${#STEPS[@]}" -gt 0 ]; then
|
||||
STEPS_JSON=$(printf '%s\n' "${STEPS[@]}" | jq -R . | jq -s .)
|
||||
else
|
||||
STEPS_JSON='[]'
|
||||
fi
|
||||
jq --arg w "$WINDOW" --arg pr "$PR_NUMBER" --argjson steps "$STEPS_JSON" \
|
||||
'.agents |= map(if .window == $w then . + {pr_number: $pr, steps: $steps, checkpoints: []} else . end)' \
|
||||
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
|
||||
fi
|
||||
|
||||
# Launch claude with a stable session ID so it can always be resumed after a crash:
|
||||
# claude --resume SESSION_ID --permission-mode bypassPermissions
|
||||
tmux send-keys -t "$WINDOW" "cd '${WORKTREE_PATH}' && claude --permission-mode bypassPermissions --session-id '${SESSION_ID}'" Enter
|
||||
|
||||
# Wait up to 60s for claude to be fully interactive:
|
||||
# both pane_current_command == 'node' AND the '❯' prompt is visible.
|
||||
PROMPT_FOUND=false
|
||||
for i in $(seq 1 60); do
|
||||
CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "")
|
||||
PANE=$(tmux capture-pane -t "$WINDOW" -p 2>/dev/null || echo "")
|
||||
if echo "$PANE" | grep -q "Enter to confirm"; then
|
||||
tmux send-keys -t "$WINDOW" Down Enter
|
||||
sleep 2
|
||||
continue
|
||||
fi
|
||||
if [[ "$CMD" == "node" ]] && echo "$PANE" | grep -q "❯"; then
|
||||
PROMPT_FOUND=true
|
||||
break
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
if ! $PROMPT_FOUND; then
|
||||
echo "[spawn-agent] WARNING: timed out waiting for ❯ prompt on $WINDOW — sending objective anyway" >&2
|
||||
fi
|
||||
|
||||
# Send the task. Split text and Enter — if combined, Enter can fire before the string
|
||||
# is fully buffered, leaving the message stuck as "[Pasted text +N lines]" unsent.
|
||||
tmux send-keys -t "$WINDOW" "${OBJECTIVE} Output each completed step as CHECKPOINT:<step-name>. When ALL steps are done, output ORCHESTRATOR:DONE on its own line."
|
||||
sleep 0.3
|
||||
tmux send-keys -t "$WINDOW" Enter
|
||||
|
||||
# Only output the window address — nothing else (callers parse this)
|
||||
echo "$WINDOW"
|
||||
@@ -1,43 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# status.sh — print orchestrator status: state file summary + live tmux pane commands
|
||||
#
|
||||
# Usage: status.sh
|
||||
# Reads: ~/.claude/orchestrator-state.json
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
|
||||
if [ ! -f "$STATE_FILE" ] || ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
|
||||
echo "No orchestrator state found at $STATE_FILE"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# Header: active status, session, thresholds, last poll
|
||||
jq -r '
|
||||
"=== Orchestrator [\(if .active then "RUNNING" else "STOPPED" end)] ===",
|
||||
"Session: \(.tmux_session // "unknown") | Idle threshold: \(.idle_threshold_seconds // 300)s",
|
||||
"Last poll: \(if (.last_poll_at // 0) == 0 then "never" else (.last_poll_at | strftime("%H:%M:%S")) end)",
|
||||
""
|
||||
' "$STATE_FILE"
|
||||
|
||||
# Each agent: state, window, worktree/branch, truncated objective
|
||||
AGENT_COUNT=$(jq '.agents | length' "$STATE_FILE")
|
||||
if [ "$AGENT_COUNT" -eq 0 ]; then
|
||||
echo " (no agents registered)"
|
||||
else
|
||||
jq -r '
|
||||
.agents[] |
|
||||
" [\(.state | ascii_upcase)] \(.window) \(.worktree)/\(.branch)",
|
||||
" \(.objective // "" | .[0:70])"
|
||||
' "$STATE_FILE"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Live pane_current_command for non-done agents
|
||||
while IFS= read -r WINDOW; do
|
||||
[ -z "$WINDOW" ] && continue
|
||||
CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "unreachable")
|
||||
echo " $WINDOW live: $CMD"
|
||||
done < <(jq -r '.agents[] | select(.state != "done") | .window' "$STATE_FILE" 2>/dev/null || true)
|
||||
@@ -1,180 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# verify-complete.sh — verify a PR task is truly done before marking the agent done
|
||||
#
|
||||
# Check order matters:
|
||||
# 1. Checkpoints — did the agent do all required steps?
|
||||
# 2. CI complete — no pending (bots post comments AFTER their check runs, must wait)
|
||||
# 3. CI passing — no failures (agent must fix before done)
|
||||
# 4. spawned_at — a new CI run was triggered after agent spawned (proves real work)
|
||||
# 5. Unresolved threads — checked AFTER CI so bot-posted comments are included
|
||||
# 6. CHANGES_REQUESTED — checked AFTER CI so bot reviews are included
|
||||
#
|
||||
# Usage: verify-complete.sh WINDOW
|
||||
# Exit 0 = verified complete; exit 1 = not complete (stderr has reason)
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
WINDOW="$1"
|
||||
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
|
||||
|
||||
PR_NUMBER=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .pr_number // ""' "$STATE_FILE" 2>/dev/null)
|
||||
STEPS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .steps // [] | .[]' "$STATE_FILE" 2>/dev/null || true)
|
||||
CHECKPOINTS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .checkpoints // [] | .[]' "$STATE_FILE" 2>/dev/null || true)
|
||||
WORKTREE_PATH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .worktree_path // ""' "$STATE_FILE" 2>/dev/null)
|
||||
BRANCH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .branch // ""' "$STATE_FILE" 2>/dev/null)
|
||||
SPAWNED_AT=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .spawned_at // "0"' "$STATE_FILE" 2>/dev/null || echo "0")
|
||||
|
||||
# No PR number = cannot verify
|
||||
if [ -z "$PR_NUMBER" ]; then
|
||||
echo "NOT COMPLETE: no pr_number in state — set pr_number or mark done manually" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Check 1: all required steps are checkpointed ---
|
||||
MISSING=""
|
||||
while IFS= read -r step; do
|
||||
[ -z "$step" ] && continue
|
||||
if ! echo "$CHECKPOINTS" | grep -qFx "$step"; then
|
||||
MISSING="$MISSING $step"
|
||||
fi
|
||||
done <<< "$STEPS"
|
||||
|
||||
if [ -n "$MISSING" ]; then
|
||||
echo "NOT COMPLETE: missing checkpoints:$MISSING on PR #$PR_NUMBER" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Resolve repo for all GitHub checks below
|
||||
REPO=$(jq -r '.repo // ""' "$STATE_FILE" 2>/dev/null || echo "")
|
||||
if [ -z "$REPO" ] && [ -n "$WORKTREE_PATH" ] && [ -d "$WORKTREE_PATH" ]; then
|
||||
REPO=$(git -C "$WORKTREE_PATH" remote get-url origin 2>/dev/null \
|
||||
| sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
|
||||
fi
|
||||
|
||||
if [ -z "$REPO" ]; then
|
||||
echo "Warning: cannot resolve repo — skipping CI/thread checks" >&2
|
||||
echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓ (CI/thread checks skipped — no repo)"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
CI_BUCKETS=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket 2>/dev/null || echo "[]")
|
||||
|
||||
# --- Check 2: CI fully complete — no pending checks ---
|
||||
# Pending checks MUST finish before we check threads/reviews:
|
||||
# bots (Seer, Check PR Status, etc.) post comments and CHANGES_REQUESTED AFTER their CI check runs.
|
||||
PENDING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "pending")] | length' 2>/dev/null || echo "0")
|
||||
if [ "$PENDING" -gt 0 ]; then
|
||||
PENDING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \
|
||||
| jq -r '[.[] | select(.bucket == "pending") | .name] | join(", ")' 2>/dev/null || echo "unknown")
|
||||
echo "NOT COMPLETE: $PENDING CI checks still pending on PR #$PR_NUMBER ($PENDING_NAMES)" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Check 3: CI passing — no failures ---
|
||||
FAILING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "fail")] | length' 2>/dev/null || echo "0")
|
||||
if [ "$FAILING" -gt 0 ]; then
|
||||
FAILING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \
|
||||
| jq -r '[.[] | select(.bucket == "fail") | .name] | join(", ")' 2>/dev/null || echo "unknown")
|
||||
echo "NOT COMPLETE: $FAILING failing CI checks on PR #$PR_NUMBER ($FAILING_NAMES)" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Check 4: a new CI run was triggered AFTER the agent spawned ---
|
||||
if [ -n "$BRANCH" ] && [ "${SPAWNED_AT:-0}" -gt 0 ]; then
|
||||
LATEST_RUN_AT=$(gh run list --repo "$REPO" --branch "$BRANCH" \
|
||||
--json createdAt --limit 1 2>/dev/null | jq -r '.[0].createdAt // ""')
|
||||
if [ -n "$LATEST_RUN_AT" ]; then
|
||||
if date --version >/dev/null 2>&1; then
|
||||
LATEST_RUN_EPOCH=$(date -d "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0")
|
||||
else
|
||||
LATEST_RUN_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0")
|
||||
fi
|
||||
if [ "$LATEST_RUN_EPOCH" -le "$SPAWNED_AT" ]; then
|
||||
echo "NOT COMPLETE: latest CI run on $BRANCH predates agent spawn — agent may not have pushed yet" >&2
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
OWNER=$(echo "$REPO" | cut -d/ -f1)
|
||||
REPONAME=$(echo "$REPO" | cut -d/ -f2)
|
||||
|
||||
# --- Check 5: no unresolved review threads (checked AFTER CI — bots post after their check) ---
|
||||
UNRESOLVED=$(gh api graphql -f query="
|
||||
{ repository(owner: \"${OWNER}\", name: \"${REPONAME}\") {
|
||||
pullRequest(number: ${PR_NUMBER}) {
|
||||
reviewThreads(first: 50) { nodes { isResolved } }
|
||||
}
|
||||
}
|
||||
}
|
||||
" --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' 2>/dev/null || echo "0")
|
||||
|
||||
if [ "$UNRESOLVED" -gt 0 ]; then
|
||||
echo "NOT COMPLETE: $UNRESOLVED unresolved review threads on PR #$PR_NUMBER" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Check 6: no CHANGES_REQUESTED (checked AFTER CI — bots post reviews after their check) ---
|
||||
# A CHANGES_REQUESTED review is stale if the latest commit was pushed AFTER the review was submitted.
|
||||
# Stale reviews (pre-dating the fixing commits) should not block verification.
|
||||
#
|
||||
# Fetch commits and latestReviews in a single call and fail closed — if gh fails,
|
||||
# treat that as NOT COMPLETE rather than silently passing.
|
||||
# Use latestReviews (not reviews) so each reviewer's latest state is used — superseded
|
||||
# CHANGES_REQUESTED entries are automatically excluded when the reviewer later approved.
|
||||
# Note: we intentionally use committedDate (not PR updatedAt) because updatedAt changes on any
|
||||
# PR activity (bot comments, label changes) which would create false negatives.
|
||||
PR_REVIEW_METADATA=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
|
||||
--json commits,latestReviews 2>/dev/null) || {
|
||||
echo "NOT COMPLETE: unable to fetch PR review metadata for PR #$PR_NUMBER" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
LATEST_COMMIT_DATE=$(jq -r '.commits[-1].committedDate // ""' <<< "$PR_REVIEW_METADATA")
|
||||
CHANGES_REQUESTED_REVIEWS=$(jq '[.latestReviews[]? | select(.state == "CHANGES_REQUESTED")]' <<< "$PR_REVIEW_METADATA")
|
||||
|
||||
BLOCKING_CHANGES_REQUESTED=0
|
||||
BLOCKING_REQUESTERS=""
|
||||
|
||||
if [ -n "$LATEST_COMMIT_DATE" ] && [ "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length)" -gt 0 ]; then
|
||||
if date --version >/dev/null 2>&1; then
|
||||
LATEST_COMMIT_EPOCH=$(date -d "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
|
||||
else
|
||||
LATEST_COMMIT_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
|
||||
fi
|
||||
|
||||
while IFS= read -r review; do
|
||||
[ -z "$review" ] && continue
|
||||
REVIEW_DATE=$(echo "$review" | jq -r '.submittedAt // ""')
|
||||
REVIEWER=$(echo "$review" | jq -r '.author.login // "unknown"')
|
||||
if [ -z "$REVIEW_DATE" ]; then
|
||||
# No submission date — treat as fresh (conservative: blocks verification)
|
||||
BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
|
||||
BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
|
||||
else
|
||||
if date --version >/dev/null 2>&1; then
|
||||
REVIEW_EPOCH=$(date -d "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
|
||||
else
|
||||
REVIEW_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
|
||||
fi
|
||||
if [ "$REVIEW_EPOCH" -gt "$LATEST_COMMIT_EPOCH" ]; then
|
||||
# Review was submitted AFTER latest commit — still fresh, blocks verification
|
||||
BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
|
||||
BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
|
||||
fi
|
||||
# Review submitted BEFORE latest commit — stale, skip
|
||||
fi
|
||||
done <<< "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -c '.[]')"
|
||||
else
|
||||
# No commit date or no changes_requested — check raw count as fallback
|
||||
BLOCKING_CHANGES_REQUESTED=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length 2>/dev/null || echo "0")
|
||||
BLOCKING_REQUESTERS=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -r '[.[].author.login] | join(", ")' 2>/dev/null || echo "unknown")
|
||||
fi
|
||||
|
||||
if [ "$BLOCKING_CHANGES_REQUESTED" -gt 0 ]; then
|
||||
echo "NOT COMPLETE: CHANGES_REQUESTED (after latest commit) from ${BLOCKING_REQUESTERS} on PR #$PR_NUMBER" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓, CI complete + green, 0 unresolved threads, no CHANGES_REQUESTED"
|
||||
exit 0
|
||||
@@ -90,12 +90,10 @@ Address comments **one at a time**: fix → commit → push → inline reply →
|
||||
2. Commit and push the fix
|
||||
3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
|
||||
|
||||
Use a **markdown commit link** so GitHub renders it as a clickable reference. Get the full SHA with `git rev-parse HEAD` after committing:
|
||||
|
||||
| Comment type | How to reply |
|
||||
|---|---|
|
||||
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <description>"` |
|
||||
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <description>"` |
|
||||
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
|
||||
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
|
||||
|
||||
## Codecov coverage
|
||||
|
||||
|
||||
@@ -530,9 +530,19 @@ After showing all screenshots, output a **detailed** summary table:
|
||||
# but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
|
||||
# plain variable with a lookup function instead.
|
||||
declare -A SCREENSHOT_EXPLANATIONS=(
|
||||
["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
|
||||
["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
|
||||
# ... one entry per screenshot, using the same explanations you showed the user above
|
||||
# Each explanation MUST answer three things:
|
||||
# 1. FLOW: Which test scenario / user journey is this part of?
|
||||
# 2. STEPS: What exact actions were taken to reach this state?
|
||||
# 3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
|
||||
#
|
||||
# Good example:
|
||||
# ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
|
||||
#
|
||||
# Bad example (too vague — never do this):
|
||||
# ["03-cost-log.png"]="Shows the cost log table."
|
||||
["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
|
||||
["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
|
||||
# ... one entry per screenshot using the flow/steps/evidence format above
|
||||
)
|
||||
|
||||
TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
|
||||
@@ -547,7 +557,8 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
|
||||
|
||||
**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
|
||||
|
||||
**CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.** Every screenshot MUST appear as `` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
|
||||
|
||||
```bash
|
||||
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
|
||||
@@ -584,11 +595,11 @@ for img in "${SCREENSHOT_FILES[@]}"; do
|
||||
done
|
||||
TREE_JSON+=']'
|
||||
|
||||
# Step 2: Create tree, commit, and branch ref
|
||||
# Step 2: Create tree, commit (with parent), and branch ref
|
||||
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
|
||||
|
||||
# Resolve parent commit so screenshots are chained, not orphan root commits
|
||||
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || echo "")
|
||||
# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
|
||||
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
|
||||
if [ -n "$PARENT_SHA" ]; then
|
||||
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
|
||||
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
|
||||
@@ -596,6 +607,7 @@ if [ -n "$PARENT_SHA" ]; then
|
||||
-f "parents[]=$PARENT_SHA" \
|
||||
--jq '.sha')
|
||||
else
|
||||
# First commit on this branch — no parent
|
||||
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
|
||||
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
|
||||
-f tree="$TREE_SHA" \
|
||||
@@ -606,7 +618,7 @@ gh api "repos/${REPO}/git/refs" \
|
||||
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
|
||||
-f sha="$COMMIT_SHA" 2>/dev/null \
|
||||
|| gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
|
||||
-X PATCH -f sha="$COMMIT_SHA" -F force=true
|
||||
-X PATCH -f sha="$COMMIT_SHA" -f force=true
|
||||
```
|
||||
|
||||
Then post the comment with **inline images AND explanations for each screenshot**:
|
||||
@@ -670,122 +682,122 @@ ${IMAGE_MARKDOWN}
|
||||
${FAILED_SECTION}
|
||||
INNEREOF
|
||||
|
||||
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
|
||||
POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
|
||||
rm -f "$COMMENT_FILE"
|
||||
|
||||
# Verify the posted comment contains inline images — exit 1 if none found
|
||||
# Use separate --paginate + jq pipe: --jq applies per-page, not to the full list
|
||||
LAST_COMMENT=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" --paginate 2>/dev/null | jq -r '.[-1].body // ""')
|
||||
if ! echo "$LAST_COMMENT" | grep -q '!\['; then
|
||||
echo "ERROR: Posted comment contains no inline images (![). Bare directory links are not acceptable." >&2
|
||||
exit 1
|
||||
fi
|
||||
echo "✓ Inline images verified in posted comment"
|
||||
```
|
||||
|
||||
**The PR comment MUST include:**
|
||||
1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
|
||||
2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
|
||||
3. A 1-2 sentence explanation below each screenshot describing what it proves
|
||||
3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.
|
||||
|
||||
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
|
||||
|
||||
## Step 8: Evaluate and post a formal PR review
|
||||
|
||||
After the test comment is posted, evaluate whether the run was thorough enough to make a merge decision, then post a formal GitHub review (approve or request changes). **This step is mandatory — every test run MUST end with a formal review decision.**
|
||||
|
||||
### Evaluation criteria
|
||||
|
||||
Re-read the PR description:
|
||||
```bash
|
||||
gh pr view "$PR_NUMBER" --json body --jq '.body' --repo "$REPO"
|
||||
```
|
||||
|
||||
Score the run against each criterion:
|
||||
|
||||
| Criterion | Pass condition |
|
||||
|-----------|---------------|
|
||||
| **Coverage** | Every feature/change described in the PR has at least one test scenario |
|
||||
| **All scenarios pass** | No FAIL rows in the results table |
|
||||
| **Negative tests** | At least one failure-path test per feature (invalid input, unauthorized, edge case) |
|
||||
| **Before/after evidence** | Every state-changing API call has before/after values logged |
|
||||
| **Screenshots are meaningful** | Screenshots show the actual state change, not just a loading spinner or blank page |
|
||||
| **No regressions** | Existing core flows (login, agent create/run) still work |
|
||||
|
||||
### Decision logic
|
||||
|
||||
```
|
||||
ALL criteria pass → APPROVE
|
||||
Any scenario FAIL or missing PR feature → REQUEST_CHANGES (list gaps)
|
||||
Evidence weak (no before/after, vague shots) → REQUEST_CHANGES (list what's missing)
|
||||
```
|
||||
|
||||
### Post the review
|
||||
**Verify inline rendering after posting — this is required, not optional:**
|
||||
|
||||
```bash
|
||||
REVIEW_FILE=$(mktemp)
|
||||
# 1. Confirm the posted comment body contains inline image markdown syntax
|
||||
if ! echo "$POSTED_BODY" | grep -q '!\['; then
|
||||
echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Count results
|
||||
PASS_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "PASS" || true)
|
||||
FAIL_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "FAIL" || true)
|
||||
TOTAL=$(( PASS_COUNT + FAIL_COUNT ))
|
||||
|
||||
# List any coverage gaps found during evaluation (populate this array as you assess)
|
||||
# e.g. COVERAGE_GAPS=("PR claims to add X but no test covers it")
|
||||
COVERAGE_GAPS=()
|
||||
# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
|
||||
FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
|
||||
if [ -n "$FIRST_IMG_URL" ]; then
|
||||
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
|
||||
if [ "$HTTP_STATUS" = "200" ]; then
|
||||
echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
|
||||
else
|
||||
echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
|
||||
echo " URL: $FIRST_IMG_URL"
|
||||
echo " Check branch name, path, and that the push succeeded."
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
echo "⚠️ Could not extract a raw URL from the comment — verify manually."
|
||||
fi
|
||||
```
|
||||
|
||||
**If APPROVING** — all criteria met, zero failures, full coverage:
|
||||
## Step 8: Evaluate test completeness and post a GitHub review
|
||||
|
||||
After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
|
||||
|
||||
### 8a. Evaluate against the test plan
|
||||
|
||||
Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5). For each scenario in the plan, answer:
|
||||
|
||||
> **Note:** `test-report.md` is written in Step 5. If it doesn't exist, write it before proceeding here — see the Step 5 template. Do not skip evaluation because the file is missing; create it from your notes instead.
|
||||
|
||||
| Question | Pass criteria |
|
||||
|----------|--------------|
|
||||
| Was it tested? | Explicit steps were executed, not just described |
|
||||
| Is there screenshot evidence? | At least one before/after screenshot per scenario |
|
||||
| Did the core feature work correctly? | Expected state matches actual state |
|
||||
| Were negative cases tested? | At least one failure/rejection case per feature |
|
||||
| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
|
||||
|
||||
Build a verdict:
|
||||
- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
|
||||
- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
|
||||
|
||||
### 8b. Post the GitHub review
|
||||
|
||||
```bash
|
||||
cat > "$REVIEW_FILE" <<REVIEWEOF
|
||||
## E2E Test Evaluation — APPROVED
|
||||
EVAL_FILE=$(mktemp)
|
||||
|
||||
**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed.
|
||||
# === STEP A: Write header ===
|
||||
cat > "$EVAL_FILE" << 'ENDEVAL'
|
||||
## 🧪 Test Evaluation
|
||||
|
||||
**Coverage:** All features described in the PR were exercised.
|
||||
### Coverage checklist
|
||||
ENDEVAL
|
||||
|
||||
**Evidence:** Before/after API values logged for all state-changing operations; screenshots show meaningful state transitions.
|
||||
# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
|
||||
# Format: "- ✅ **Scenario N – name**: <what was done and verified>"
|
||||
# or "- ❌ **Scenario N – name**: <what is missing or broken>"
|
||||
# Examples:
|
||||
# echo "- ✅ **Scenario 1 – Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
|
||||
# echo "- ❌ **Scenario 3 – Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
|
||||
#
|
||||
# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
|
||||
|
||||
**Negative tests:** Failure paths tested for each feature.
|
||||
# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
|
||||
FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
|
||||
if [ "$FAIL_COUNT" -eq 0 ]; then
|
||||
VERDICT="APPROVE"
|
||||
else
|
||||
VERDICT="REQUEST_CHANGES"
|
||||
fi
|
||||
|
||||
No regressions observed on core flows.
|
||||
REVIEWEOF
|
||||
# === STEP D: Append verdict section ===
|
||||
cat >> "$EVAL_FILE" << ENDVERDICT
|
||||
|
||||
gh pr review "$PR_NUMBER" --repo "$REPO" --approve --body "$(cat "$REVIEW_FILE")"
|
||||
echo "✅ PR approved"
|
||||
```
|
||||
### Verdict
|
||||
ENDVERDICT
|
||||
|
||||
**If REQUESTING CHANGES** — any failure, coverage gap, or missing evidence:
|
||||
if [ "$VERDICT" = "APPROVE" ]; then
|
||||
echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
|
||||
else
|
||||
echo "❌ $FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
|
||||
echo "" >> "$EVAL_FILE"
|
||||
echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
|
||||
fi
|
||||
|
||||
```bash
|
||||
FAIL_LIST=$(echo "$TEST_RESULTS_TABLE" | grep "FAIL" | awk -F'|' '{print "- Scenario" $2 "failed"}' || true)
|
||||
# === STEP E: Post the review ===
|
||||
gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
|
||||
--method POST \
|
||||
-f body="$(cat "$EVAL_FILE")" \
|
||||
-f event="$VERDICT"
|
||||
|
||||
cat > "$REVIEW_FILE" <<REVIEWEOF
|
||||
## E2E Test Evaluation — Changes Requested
|
||||
|
||||
**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed, ${FAIL_COUNT} failed.
|
||||
|
||||
### Required before merge
|
||||
|
||||
${FAIL_LIST}
|
||||
$(for gap in "${COVERAGE_GAPS[@]}"; do echo "- $gap"; done)
|
||||
|
||||
Please fix the above and re-run the E2E tests.
|
||||
REVIEWEOF
|
||||
|
||||
gh pr review "$PR_NUMBER" --repo "$REPO" --request-changes --body "$(cat "$REVIEW_FILE")"
|
||||
echo "❌ Changes requested"
|
||||
```
|
||||
|
||||
```bash
|
||||
rm -f "$REVIEW_FILE"
|
||||
rm -f "$EVAL_FILE"
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
- In `--fix` mode, fix all failures before posting the review — the review reflects the final state after fixes
|
||||
- Never approve if any scenario failed, even if it seems like a flake — rerun that scenario first
|
||||
- Never request changes for issues already fixed in this run
|
||||
- Never auto-approve without checking every scenario in the test plan
|
||||
- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
|
||||
- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
|
||||
- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
|
||||
|
||||
## Fix mode (--fix flag)
|
||||
|
||||
|
||||
@@ -16,7 +16,6 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator
|
||||
from backend.copilot import service as chat_service
|
||||
from backend.copilot import stream_registry
|
||||
from backend.copilot.config import ChatConfig, CopilotMode
|
||||
from backend.copilot.db import get_chat_messages_paginated
|
||||
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
|
||||
from backend.copilot.model import (
|
||||
ChatMessage,
|
||||
@@ -156,8 +155,6 @@ class SessionDetailResponse(BaseModel):
|
||||
user_id: str | None
|
||||
messages: list[dict]
|
||||
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
|
||||
has_more_messages: bool = False
|
||||
oldest_sequence: int | None = None
|
||||
total_prompt_tokens: int = 0
|
||||
total_completion_tokens: int = 0
|
||||
metadata: ChatSessionMetadata = ChatSessionMetadata()
|
||||
@@ -397,78 +394,60 @@ async def update_session_title_route(
|
||||
async def get_session(
|
||||
session_id: str,
|
||||
user_id: Annotated[str, Security(auth.get_user_id)],
|
||||
limit: int = Query(default=50, ge=1, le=200),
|
||||
before_sequence: int | None = Query(default=None, ge=0),
|
||||
) -> SessionDetailResponse:
|
||||
"""
|
||||
Retrieve the details of a specific chat session.
|
||||
|
||||
Supports cursor-based pagination via ``limit`` and ``before_sequence``.
|
||||
When no pagination params are provided, returns the most recent messages.
|
||||
Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
|
||||
If there's an active stream for this session, returns active_stream info for reconnection.
|
||||
|
||||
Args:
|
||||
session_id: The unique identifier for the desired chat session.
|
||||
user_id: The authenticated user's ID.
|
||||
limit: Maximum number of messages to return (1-200, default 50).
|
||||
before_sequence: Return messages with sequence < this value (cursor).
|
||||
user_id: The optional authenticated user ID, or None for anonymous access.
|
||||
|
||||
Returns:
|
||||
SessionDetailResponse: Details for the requested session, including
|
||||
active_stream info and pagination metadata.
|
||||
SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
|
||||
|
||||
"""
|
||||
page = await get_chat_messages_paginated(
|
||||
session_id, limit, before_sequence, user_id=user_id
|
||||
)
|
||||
if page is None:
|
||||
session = await get_chat_session(session_id, user_id)
|
||||
if not session:
|
||||
raise NotFoundError(f"Session {session_id} not found.")
|
||||
messages = [message.model_dump() for message in page.messages]
|
||||
|
||||
# Only check active stream on initial load (not on "load more" requests)
|
||||
messages = [message.model_dump() for message in session.messages]
|
||||
|
||||
# Check if there's an active stream for this session
|
||||
active_stream_info = None
|
||||
if before_sequence is None:
|
||||
active_session, last_message_id = await stream_registry.get_active_session(
|
||||
session_id, user_id
|
||||
)
|
||||
logger.info(
|
||||
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
|
||||
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
|
||||
)
|
||||
if active_session:
|
||||
active_stream_info = ActiveStreamInfo(
|
||||
turn_id=active_session.turn_id,
|
||||
last_message_id=last_message_id,
|
||||
)
|
||||
|
||||
# Skip session metadata on "load more" — frontend only needs messages
|
||||
if before_sequence is not None:
|
||||
return SessionDetailResponse(
|
||||
id=page.session.session_id,
|
||||
created_at=page.session.started_at.isoformat(),
|
||||
updated_at=page.session.updated_at.isoformat(),
|
||||
user_id=page.session.user_id or None,
|
||||
messages=messages,
|
||||
active_stream=None,
|
||||
has_more_messages=page.has_more,
|
||||
oldest_sequence=page.oldest_sequence,
|
||||
total_prompt_tokens=0,
|
||||
total_completion_tokens=0,
|
||||
active_session, last_message_id = await stream_registry.get_active_session(
|
||||
session_id, user_id
|
||||
)
|
||||
logger.info(
|
||||
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
|
||||
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
|
||||
)
|
||||
if active_session:
|
||||
# Keep the assistant message (including tool_calls) so the frontend can
|
||||
# render the correct tool UI (e.g. CreateAgent with mini game).
|
||||
# convertChatSessionToUiMessages handles isComplete=false by setting
|
||||
# tool parts without output to state "input-available".
|
||||
active_stream_info = ActiveStreamInfo(
|
||||
turn_id=active_session.turn_id,
|
||||
last_message_id=last_message_id,
|
||||
)
|
||||
|
||||
total_prompt = sum(u.prompt_tokens for u in page.session.usage)
|
||||
total_completion = sum(u.completion_tokens for u in page.session.usage)
|
||||
# Sum token usage from session
|
||||
total_prompt = sum(u.prompt_tokens for u in session.usage)
|
||||
total_completion = sum(u.completion_tokens for u in session.usage)
|
||||
|
||||
return SessionDetailResponse(
|
||||
id=page.session.session_id,
|
||||
created_at=page.session.started_at.isoformat(),
|
||||
updated_at=page.session.updated_at.isoformat(),
|
||||
user_id=page.session.user_id or None,
|
||||
id=session.session_id,
|
||||
created_at=session.started_at.isoformat(),
|
||||
updated_at=session.updated_at.isoformat(),
|
||||
user_id=session.user_id or None,
|
||||
messages=messages,
|
||||
active_stream=active_stream_info,
|
||||
has_more_messages=page.has_more,
|
||||
oldest_sequence=page.oldest_sequence,
|
||||
total_prompt_tokens=total_prompt,
|
||||
total_completion_tokens=total_completion,
|
||||
metadata=page.session.metadata,
|
||||
metadata=session.metadata,
|
||||
)
|
||||
|
||||
|
||||
|
||||
@@ -146,6 +146,32 @@ class ChatConfig(BaseSettings):
|
||||
description="Use --resume for multi-turn conversations instead of "
|
||||
"history compression. Falls back to compression when unavailable.",
|
||||
)
|
||||
claude_agent_fallback_model: str = Field(
|
||||
default="claude-sonnet-4-20250514",
|
||||
description="Fallback model when the primary model is unavailable (e.g. 529 "
|
||||
"overloaded). The SDK automatically retries with this cheaper model.",
|
||||
)
|
||||
claude_agent_max_turns: int = Field(
|
||||
default=1000,
|
||||
ge=1,
|
||||
le=10000,
|
||||
description="Maximum number of agentic turns (tool-use loops) per query. "
|
||||
"Prevents runaway tool loops from burning budget.",
|
||||
)
|
||||
claude_agent_max_budget_usd: float = Field(
|
||||
default=100.0,
|
||||
ge=0.01,
|
||||
le=1000.0,
|
||||
description="Maximum spend in USD per SDK query. The CLI aborts the "
|
||||
"request if this budget is exceeded.",
|
||||
)
|
||||
claude_agent_max_transient_retries: int = Field(
|
||||
default=3,
|
||||
ge=0,
|
||||
le=10,
|
||||
description="Maximum number of retries for transient API errors "
|
||||
"(429, 5xx, ECONNRESET) before surfacing the error to the user.",
|
||||
)
|
||||
use_openrouter: bool = Field(
|
||||
default=True,
|
||||
description="Enable routing API calls through the OpenRouter proxy. "
|
||||
|
||||
@@ -44,12 +44,31 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
|
||||
# Transient Anthropic API error detection
|
||||
# ---------------------------------------------------------------------------
|
||||
# Patterns in error text that indicate a transient Anthropic API error
|
||||
# (ECONNRESET / dropped TCP connection) which is retryable.
|
||||
# which is retryable. Covers:
|
||||
# - Connection-level: ECONNRESET, dropped TCP connections
|
||||
# - HTTP 429: rate-limit / too-many-requests
|
||||
# - HTTP 5xx: server errors
|
||||
#
|
||||
# Prefer specific status-code patterns over natural-language phrases
|
||||
# (e.g. "overloaded", "bad gateway") — those phrases can appear in
|
||||
# application-level SDK messages and would trigger spurious retries.
|
||||
_TRANSIENT_ERROR_PATTERNS = (
|
||||
# Connection-level
|
||||
"socket connection was closed unexpectedly",
|
||||
"ECONNRESET",
|
||||
"connection was forcibly closed",
|
||||
"network socket disconnected",
|
||||
# 429 rate-limit patterns
|
||||
"rate limit",
|
||||
"rate_limit",
|
||||
"too many requests",
|
||||
"status code 429",
|
||||
# 5xx server error patterns (status-code-specific to avoid false positives)
|
||||
"status code 529",
|
||||
"status code 500",
|
||||
"status code 502",
|
||||
"status code 503",
|
||||
"status code 504",
|
||||
)
|
||||
|
||||
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
|
||||
|
||||
@@ -14,7 +14,6 @@ from prisma.types import (
|
||||
ChatSessionUpdateInput,
|
||||
ChatSessionWhereInput,
|
||||
)
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.data import db
|
||||
from backend.util.json import SafeJson, sanitize_string
|
||||
@@ -31,15 +30,6 @@ from .model import get_chat_session as get_chat_session_cached
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class PaginatedMessages(BaseModel):
|
||||
"""Result of a paginated message query."""
|
||||
|
||||
messages: list[ChatMessage]
|
||||
has_more: bool
|
||||
oldest_sequence: int | None
|
||||
session: ChatSessionInfo
|
||||
|
||||
|
||||
async def get_chat_session(session_id: str) -> ChatSession | None:
|
||||
"""Get a chat session by ID from the database."""
|
||||
session = await PrismaChatSession.prisma().find_unique(
|
||||
@@ -49,116 +39,6 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
|
||||
return ChatSession.from_db(session) if session else None
|
||||
|
||||
|
||||
async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
|
||||
"""Get chat session metadata (without messages) for ownership validation."""
|
||||
session = await PrismaChatSession.prisma().find_unique(
|
||||
where={"id": session_id},
|
||||
)
|
||||
return ChatSessionInfo.from_db(session) if session else None
|
||||
|
||||
|
||||
async def get_chat_messages_paginated(
|
||||
session_id: str,
|
||||
limit: int = 50,
|
||||
before_sequence: int | None = None,
|
||||
user_id: str | None = None,
|
||||
) -> PaginatedMessages | None:
|
||||
"""Get paginated messages for a session, newest first.
|
||||
|
||||
Verifies session existence (and ownership when ``user_id`` is provided)
|
||||
in parallel with the message query. Returns ``None`` when the session
|
||||
is not found or does not belong to the user.
|
||||
|
||||
Args:
|
||||
session_id: The chat session ID.
|
||||
limit: Max messages to return.
|
||||
before_sequence: Cursor — return messages with sequence < this value.
|
||||
user_id: If provided, filters via ``Session.userId`` so only the
|
||||
session owner's messages are returned (acts as an ownership guard).
|
||||
"""
|
||||
# Build session-existence / ownership check
|
||||
session_where: ChatSessionWhereInput = {"id": session_id}
|
||||
if user_id is not None:
|
||||
session_where["userId"] = user_id
|
||||
|
||||
# Build message include — fetch paginated messages in the same query
|
||||
msg_include: dict[str, Any] = {
|
||||
"order_by": {"sequence": "desc"},
|
||||
"take": limit + 1,
|
||||
}
|
||||
if before_sequence is not None:
|
||||
msg_include["where"] = {"sequence": {"lt": before_sequence}}
|
||||
|
||||
# Single query: session existence/ownership + paginated messages
|
||||
session = await PrismaChatSession.prisma().find_first(
|
||||
where=session_where,
|
||||
include={"Messages": msg_include},
|
||||
)
|
||||
|
||||
if session is None:
|
||||
return None
|
||||
|
||||
session_info = ChatSessionInfo.from_db(session)
|
||||
results = list(session.Messages) if session.Messages else []
|
||||
|
||||
has_more = len(results) > limit
|
||||
results = results[:limit]
|
||||
|
||||
# Reverse to ascending order
|
||||
results.reverse()
|
||||
|
||||
# Tool-call boundary fix: if the oldest message is a tool message,
|
||||
# expand backward to include the preceding assistant message that
|
||||
# owns the tool_calls, so convertChatSessionMessagesToUiMessages
|
||||
# can pair them correctly.
|
||||
_BOUNDARY_SCAN_LIMIT = 10
|
||||
if results and results[0].role == "tool":
|
||||
boundary_where: dict[str, Any] = {
|
||||
"sessionId": session_id,
|
||||
"sequence": {"lt": results[0].sequence},
|
||||
}
|
||||
if user_id is not None:
|
||||
boundary_where["Session"] = {"is": {"userId": user_id}}
|
||||
extra = await PrismaChatMessage.prisma().find_many(
|
||||
where=boundary_where,
|
||||
order={"sequence": "desc"},
|
||||
take=_BOUNDARY_SCAN_LIMIT,
|
||||
)
|
||||
# Find the first non-tool message (should be the assistant)
|
||||
boundary_msgs = []
|
||||
found_owner = False
|
||||
for msg in extra:
|
||||
boundary_msgs.append(msg)
|
||||
if msg.role != "tool":
|
||||
found_owner = True
|
||||
break
|
||||
boundary_msgs.reverse()
|
||||
if not found_owner:
|
||||
logger.warning(
|
||||
"Boundary expansion did not find owning assistant message "
|
||||
"for session=%s before sequence=%s (%d msgs scanned)",
|
||||
session_id,
|
||||
results[0].sequence,
|
||||
len(extra),
|
||||
)
|
||||
if boundary_msgs:
|
||||
results = boundary_msgs + results
|
||||
# Only mark has_more if the expanded boundary isn't the
|
||||
# very start of the conversation (sequence 0).
|
||||
if boundary_msgs[0].sequence > 0:
|
||||
has_more = True
|
||||
|
||||
messages = [ChatMessage.from_db(m) for m in results]
|
||||
oldest_sequence = messages[0].sequence if messages else None
|
||||
|
||||
return PaginatedMessages(
|
||||
messages=messages,
|
||||
has_more=has_more,
|
||||
oldest_sequence=oldest_sequence,
|
||||
session=session_info,
|
||||
)
|
||||
|
||||
|
||||
async def create_chat_session(
|
||||
session_id: str,
|
||||
user_id: str,
|
||||
|
||||
@@ -1,341 +1,7 @@
|
||||
"""Unit tests for copilot.db — paginated message queries."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import UTC, datetime
|
||||
from typing import Any
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
from prisma.models import ChatMessage as PrismaChatMessage
|
||||
from prisma.models import ChatSession as PrismaChatSession
|
||||
|
||||
from backend.copilot.db import (
|
||||
PaginatedMessages,
|
||||
get_chat_messages_paginated,
|
||||
set_turn_duration,
|
||||
)
|
||||
from backend.copilot.model import ChatMessage as CopilotChatMessage
|
||||
from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
|
||||
|
||||
|
||||
def _make_msg(
|
||||
sequence: int,
|
||||
role: str = "assistant",
|
||||
content: str | None = "hello",
|
||||
tool_calls: Any = None,
|
||||
) -> PrismaChatMessage:
|
||||
"""Build a minimal PrismaChatMessage for testing."""
|
||||
return PrismaChatMessage(
|
||||
id=f"msg-{sequence}",
|
||||
createdAt=datetime.now(UTC),
|
||||
sessionId="sess-1",
|
||||
role=role,
|
||||
content=content,
|
||||
sequence=sequence,
|
||||
toolCalls=tool_calls,
|
||||
name=None,
|
||||
toolCallId=None,
|
||||
refusal=None,
|
||||
functionCall=None,
|
||||
)
|
||||
|
||||
|
||||
def _make_session(
|
||||
session_id: str = "sess-1",
|
||||
user_id: str = "user-1",
|
||||
messages: list[PrismaChatMessage] | None = None,
|
||||
) -> PrismaChatSession:
|
||||
"""Build a minimal PrismaChatSession for testing."""
|
||||
now = datetime.now(UTC)
|
||||
session = PrismaChatSession.model_construct(
|
||||
id=session_id,
|
||||
createdAt=now,
|
||||
updatedAt=now,
|
||||
userId=user_id,
|
||||
credentials={},
|
||||
successfulAgentRuns={},
|
||||
successfulAgentSchedules={},
|
||||
totalPromptTokens=0,
|
||||
totalCompletionTokens=0,
|
||||
title=None,
|
||||
metadata={},
|
||||
Messages=messages or [],
|
||||
)
|
||||
return session
|
||||
|
||||
|
||||
SESSION_ID = "sess-1"
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def mock_db():
|
||||
"""Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
|
||||
|
||||
find_first is used for the main query (session + included messages).
|
||||
find_many is used only for boundary expansion queries.
|
||||
"""
|
||||
with (
|
||||
patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
|
||||
patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
|
||||
):
|
||||
find_first = AsyncMock()
|
||||
mock_session_prisma.return_value.find_first = find_first
|
||||
|
||||
find_many = AsyncMock(return_value=[])
|
||||
mock_msg_prisma.return_value.find_many = find_many
|
||||
|
||||
yield find_first, find_many
|
||||
|
||||
|
||||
# ---------- Basic pagination ----------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_basic_page_returns_messages_ascending(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Messages are returned in ascending sequence order."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
|
||||
)
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert isinstance(page, PaginatedMessages)
|
||||
assert [m.sequence for m in page.messages] == [1, 2, 3]
|
||||
assert page.has_more is False
|
||||
assert page.oldest_sequence == 1
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_has_more_when_results_exceed_limit(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""has_more is True when DB returns more than limit items."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
|
||||
)
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=2)
|
||||
|
||||
assert page is not None
|
||||
assert page.has_more is True
|
||||
assert len(page.messages) == 2
|
||||
assert [m.sequence for m in page.messages] == [2, 3]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_session_returns_no_messages(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[])
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
assert page is not None
|
||||
assert page.messages == []
|
||||
assert page.has_more is False
|
||||
assert page.oldest_sequence is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_before_sequence_filters_correctly(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""before_sequence is passed as a where filter inside the Messages include."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(2), _make_msg(1)],
|
||||
)
|
||||
|
||||
await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
|
||||
|
||||
call_kwargs = find_first.call_args
|
||||
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
|
||||
assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_where_on_messages_without_before_sequence(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Without before_sequence, the Messages include has no where clause."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[_make_msg(1)])
|
||||
|
||||
await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
call_kwargs = find_first.call_args
|
||||
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
|
||||
assert "where" not in include["Messages"]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_user_id_filter_applied_to_session_where(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""user_id adds a userId filter to the session-level where clause."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[_make_msg(1)])
|
||||
|
||||
await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
|
||||
|
||||
call_kwargs = find_first.call_args
|
||||
where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where")
|
||||
assert where["userId"] == "user-abc"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_session_not_found_returns_none(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Returns None when session doesn't exist or user doesn't own it."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = None
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
assert page is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_session_info_included_in_result(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""PaginatedMessages includes session metadata."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[_make_msg(1)])
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
assert page is not None
|
||||
assert page.session.session_id == SESSION_ID
|
||||
|
||||
|
||||
# ---------- Backward boundary expansion ----------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_includes_assistant(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""When page starts with a tool message, expand backward to include
|
||||
the owning assistant message."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(3, role="assistant")]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert [m.sequence for m in page.messages] == [3, 4, 5]
|
||||
assert page.messages[0].role == "assistant"
|
||||
assert page.oldest_sequence == 3
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_includes_multiple_tool_msgs(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Boundary expansion scans past consecutive tool messages to find
|
||||
the owning assistant."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(7, role="tool")],
|
||||
)
|
||||
find_many.return_value = [
|
||||
_make_msg(6, role="tool"),
|
||||
_make_msg(5, role="tool"),
|
||||
_make_msg(4, role="assistant"),
|
||||
]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
|
||||
assert page.messages[0].role == "assistant"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_sets_has_more_when_not_at_start(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(2, role="assistant")]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert page.has_more is True
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_no_has_more_at_conversation_start(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""has_more stays False when boundary expansion reaches seq 0."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(1, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(0, role="assistant")]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert page.has_more is False
|
||||
assert page.oldest_sequence == 0
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_boundary_expansion_when_first_msg_not_tool(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""No boundary expansion when the first message is not a tool message."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
|
||||
)
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert find_many.call_count == 0
|
||||
assert [m.sequence for m in page.messages] == [2, 3]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_warns_when_no_owner_found(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""When boundary scan doesn't find a non-tool message, a warning is logged
|
||||
and the boundary messages are still included."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(10, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
|
||||
|
||||
with patch("backend.copilot.db.logger") as mock_logger:
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
mock_logger.warning.assert_called_once()
|
||||
|
||||
assert page is not None
|
||||
assert page.messages[0].role == "tool"
|
||||
assert len(page.messages) > 1
|
||||
|
||||
|
||||
# ---------- Turn duration (integration tests) ----------
|
||||
from .db import set_turn_duration
|
||||
from .model import ChatMessage, ChatSession, get_chat_session, upsert_chat_session
|
||||
|
||||
|
||||
@pytest.mark.asyncio(loop_scope="session")
|
||||
@@ -349,8 +15,8 @@ async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_us
|
||||
"""
|
||||
session = ChatSession.new(user_id=test_user_id, dry_run=False)
|
||||
session.messages = [
|
||||
CopilotChatMessage(role="user", content="hello"),
|
||||
CopilotChatMessage(role="assistant", content="hi there"),
|
||||
ChatMessage(role="user", content="hello"),
|
||||
ChatMessage(role="assistant", content="hi there"),
|
||||
]
|
||||
session = await upsert_chat_session(session)
|
||||
|
||||
@@ -375,7 +41,7 @@ async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user
|
||||
"""set_turn_duration is a no-op when there are no assistant messages."""
|
||||
session = ChatSession.new(user_id=test_user_id, dry_run=False)
|
||||
session.messages = [
|
||||
CopilotChatMessage(role="user", content="hello"),
|
||||
ChatMessage(role="user", content="hello"),
|
||||
]
|
||||
session = await upsert_chat_session(session)
|
||||
|
||||
|
||||
@@ -64,7 +64,6 @@ class ChatMessage(BaseModel):
|
||||
refusal: str | None = None
|
||||
tool_calls: list[dict] | None = None
|
||||
function_call: dict | None = None
|
||||
sequence: int | None = None
|
||||
duration_ms: int | None = None
|
||||
|
||||
@staticmethod
|
||||
@@ -78,7 +77,6 @@ class ChatMessage(BaseModel):
|
||||
refusal=prisma_message.refusal,
|
||||
tool_calls=_parse_json_field(prisma_message.toolCalls),
|
||||
function_call=_parse_json_field(prisma_message.functionCall),
|
||||
sequence=prisma_message.sequence,
|
||||
duration_ms=prisma_message.durationMs,
|
||||
)
|
||||
|
||||
|
||||
@@ -26,14 +26,14 @@ def build_sdk_env(
|
||||
|
||||
Three modes (checked in order):
|
||||
1. **Subscription** — clears all keys; CLI uses ``claude login`` auth.
|
||||
2. **Direct Anthropic** — returns ``{}``; subprocess inherits
|
||||
``ANTHROPIC_API_KEY`` from the parent environment.
|
||||
2. **Direct Anthropic** — subprocess inherits ``ANTHROPIC_API_KEY``
|
||||
from the parent environment (no overrides needed).
|
||||
3. **OpenRouter** (default) — overrides base URL and auth token to
|
||||
route through the proxy, with Langfuse trace headers.
|
||||
|
||||
When *sdk_cwd* is provided, ``CLAUDE_CODE_TMPDIR`` is set so that
|
||||
the CLI writes temp/sub-agent output inside the per-session workspace
|
||||
directory rather than an inaccessible system temp path.
|
||||
All modes receive workspace isolation (``CLAUDE_CODE_TMPDIR``) and
|
||||
security hardening env vars to prevent .claude.md loading, prompt
|
||||
history persistence, auto-memory writes, and non-essential traffic.
|
||||
"""
|
||||
# --- Mode 1: Claude Code subscription auth ---
|
||||
if config.use_claude_code_subscription:
|
||||
@@ -43,40 +43,46 @@ def build_sdk_env(
|
||||
"ANTHROPIC_AUTH_TOKEN": "",
|
||||
"ANTHROPIC_BASE_URL": "",
|
||||
}
|
||||
if sdk_cwd:
|
||||
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
|
||||
return env
|
||||
|
||||
# --- Mode 2: Direct Anthropic (no proxy hop) ---
|
||||
if not config.openrouter_active:
|
||||
elif not config.openrouter_active:
|
||||
env = {}
|
||||
if sdk_cwd:
|
||||
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
|
||||
return env
|
||||
|
||||
# --- Mode 3: OpenRouter proxy ---
|
||||
base = (config.base_url or "").rstrip("/")
|
||||
if base.endswith("/v1"):
|
||||
base = base[:-3]
|
||||
env = {
|
||||
"ANTHROPIC_BASE_URL": base,
|
||||
"ANTHROPIC_AUTH_TOKEN": config.api_key or "",
|
||||
"ANTHROPIC_API_KEY": "", # force CLI to use AUTH_TOKEN
|
||||
}
|
||||
else:
|
||||
base = (config.base_url or "").rstrip("/")
|
||||
if base.endswith("/v1"):
|
||||
base = base[:-3]
|
||||
env = {
|
||||
"ANTHROPIC_BASE_URL": base,
|
||||
"ANTHROPIC_AUTH_TOKEN": config.api_key or "",
|
||||
"ANTHROPIC_API_KEY": "", # force CLI to use AUTH_TOKEN
|
||||
}
|
||||
|
||||
# Inject broadcast headers so OpenRouter forwards traces to Langfuse.
|
||||
def _safe(v: str) -> str:
|
||||
return v.replace("\r", "").replace("\n", "").strip()[:128]
|
||||
# Inject broadcast headers so OpenRouter forwards traces to Langfuse.
|
||||
def _safe(v: str) -> str:
|
||||
return v.replace("\r", "").replace("\n", "").strip()[:128]
|
||||
|
||||
parts = []
|
||||
if session_id:
|
||||
parts.append(f"x-session-id: {_safe(session_id)}")
|
||||
if user_id:
|
||||
parts.append(f"x-user-id: {_safe(user_id)}")
|
||||
if parts:
|
||||
env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
|
||||
parts = []
|
||||
if session_id:
|
||||
parts.append(f"x-session-id: {_safe(session_id)}")
|
||||
if user_id:
|
||||
parts.append(f"x-user-id: {_safe(user_id)}")
|
||||
if parts:
|
||||
env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
|
||||
|
||||
# --- Common: workspace isolation + security hardening (all modes) ---
|
||||
# Route subagent temp files into the per-session workspace so output
|
||||
# files are accessible (fixes /tmp/claude-0/ permission errors in E2B).
|
||||
if sdk_cwd:
|
||||
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
|
||||
|
||||
# Harden multi-tenant deployment: prevent loading untrusted workspace
|
||||
# .claude.md files, persisting prompt history, writing auto-memory,
|
||||
# and sending non-essential telemetry traffic.
|
||||
env["CLAUDE_CODE_DISABLE_CLAUDE_MDS"] = "1"
|
||||
env["CLAUDE_CODE_SKIP_PROMPT_HISTORY"] = "1"
|
||||
env["CLAUDE_CODE_DISABLE_AUTO_MEMORY"] = "1"
|
||||
env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"
|
||||
|
||||
return env
|
||||
|
||||
@@ -41,11 +41,9 @@ class TestBuildSdkEnvSubscription:
|
||||
|
||||
result = build_sdk_env()
|
||||
|
||||
assert result == {
|
||||
"ANTHROPIC_API_KEY": "",
|
||||
"ANTHROPIC_AUTH_TOKEN": "",
|
||||
"ANTHROPIC_BASE_URL": "",
|
||||
}
|
||||
assert result["ANTHROPIC_API_KEY"] == ""
|
||||
assert result["ANTHROPIC_AUTH_TOKEN"] == ""
|
||||
assert result["ANTHROPIC_BASE_URL"] == ""
|
||||
mock_validate.assert_called_once()
|
||||
|
||||
@patch(
|
||||
@@ -68,18 +66,20 @@ class TestBuildSdkEnvSubscription:
|
||||
|
||||
|
||||
class TestBuildSdkEnvDirectAnthropic:
|
||||
"""When OpenRouter is inactive, return empty dict (inherit parent env)."""
|
||||
"""When OpenRouter is inactive, no ANTHROPIC_* overrides (inherit parent env)."""
|
||||
|
||||
def test_returns_empty_dict_when_openrouter_inactive(self):
|
||||
def test_no_anthropic_key_overrides_when_openrouter_inactive(self):
|
||||
cfg = _make_config(use_openrouter=False)
|
||||
with patch("backend.copilot.sdk.env.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
result = build_sdk_env()
|
||||
|
||||
assert result == {}
|
||||
assert "ANTHROPIC_API_KEY" not in result
|
||||
assert "ANTHROPIC_AUTH_TOKEN" not in result
|
||||
assert "ANTHROPIC_BASE_URL" not in result
|
||||
|
||||
def test_returns_empty_dict_when_openrouter_flag_true_but_no_key(self):
|
||||
def test_no_anthropic_key_overrides_when_openrouter_flag_true_but_no_key(self):
|
||||
"""OpenRouter flag is True but no api_key => openrouter_active is False."""
|
||||
cfg = _make_config(use_openrouter=True, base_url="https://openrouter.ai/api/v1")
|
||||
# Force api_key to None after construction (field_validator may pick up env vars)
|
||||
@@ -90,7 +90,9 @@ class TestBuildSdkEnvDirectAnthropic:
|
||||
|
||||
result = build_sdk_env()
|
||||
|
||||
assert result == {}
|
||||
assert "ANTHROPIC_API_KEY" not in result
|
||||
assert "ANTHROPIC_AUTH_TOKEN" not in result
|
||||
assert "ANTHROPIC_BASE_URL" not in result
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -234,12 +236,12 @@ class TestBuildSdkEnvModePriority:
|
||||
|
||||
result = build_sdk_env()
|
||||
|
||||
# Should get subscription result, not OpenRouter
|
||||
assert result == {
|
||||
"ANTHROPIC_API_KEY": "",
|
||||
"ANTHROPIC_AUTH_TOKEN": "",
|
||||
"ANTHROPIC_BASE_URL": "",
|
||||
}
|
||||
# Should get subscription result (blanked keys), not OpenRouter proxy
|
||||
assert result["ANTHROPIC_API_KEY"] == ""
|
||||
assert result["ANTHROPIC_AUTH_TOKEN"] == ""
|
||||
assert result["ANTHROPIC_BASE_URL"] == ""
|
||||
# OpenRouter-specific key must NOT be present
|
||||
assert "ANTHROPIC_CUSTOM_HEADERS" not in result
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@@ -0,0 +1,442 @@
|
||||
"""Tests for P0 guardrails: _resolve_fallback_model, security env vars, TMPDIR."""
|
||||
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
from pydantic import ValidationError
|
||||
|
||||
from backend.copilot.config import ChatConfig
|
||||
from backend.copilot.constants import is_transient_api_error
|
||||
|
||||
|
||||
def _make_config(**overrides) -> ChatConfig:
|
||||
"""Create a ChatConfig with safe defaults, applying *overrides*."""
|
||||
defaults = {
|
||||
"use_claude_code_subscription": False,
|
||||
"use_openrouter": False,
|
||||
"api_key": None,
|
||||
"base_url": None,
|
||||
}
|
||||
defaults.update(overrides)
|
||||
return ChatConfig(**defaults)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _resolve_fallback_model
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_SVC = "backend.copilot.sdk.service"
|
||||
_ENV = "backend.copilot.sdk.env"
|
||||
|
||||
|
||||
class TestResolveFallbackModel:
|
||||
"""Provider-aware fallback model resolution."""
|
||||
|
||||
def test_returns_none_when_empty(self):
|
||||
cfg = _make_config(claude_agent_fallback_model="")
|
||||
with patch(f"{_SVC}.config", cfg):
|
||||
from backend.copilot.sdk.service import _resolve_fallback_model
|
||||
|
||||
assert _resolve_fallback_model() is None
|
||||
|
||||
def test_strips_provider_prefix(self):
|
||||
"""OpenRouter-style 'anthropic/claude-sonnet-4-...' is stripped."""
|
||||
cfg = _make_config(
|
||||
claude_agent_fallback_model="anthropic/claude-sonnet-4-20250514",
|
||||
use_openrouter=True,
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
)
|
||||
with patch(f"{_SVC}.config", cfg):
|
||||
from backend.copilot.sdk.service import _resolve_fallback_model
|
||||
|
||||
result = _resolve_fallback_model()
|
||||
|
||||
assert result == "claude-sonnet-4-20250514"
|
||||
assert "/" not in result
|
||||
|
||||
def test_dots_replaced_for_direct_anthropic(self):
|
||||
"""Direct Anthropic requires hyphen-separated versions."""
|
||||
cfg = _make_config(
|
||||
claude_agent_fallback_model="claude-sonnet-4.5-20250514",
|
||||
use_openrouter=False,
|
||||
)
|
||||
with patch(f"{_SVC}.config", cfg):
|
||||
from backend.copilot.sdk.service import _resolve_fallback_model
|
||||
|
||||
result = _resolve_fallback_model()
|
||||
|
||||
assert result is not None
|
||||
assert "." not in result
|
||||
assert result == "claude-sonnet-4-5-20250514"
|
||||
|
||||
def test_dots_preserved_for_openrouter(self):
|
||||
"""OpenRouter uses dot-separated versions — don't normalise."""
|
||||
cfg = _make_config(
|
||||
claude_agent_fallback_model="claude-sonnet-4.5-20250514",
|
||||
use_openrouter=True,
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
)
|
||||
with patch(f"{_SVC}.config", cfg):
|
||||
from backend.copilot.sdk.service import _resolve_fallback_model
|
||||
|
||||
result = _resolve_fallback_model()
|
||||
|
||||
assert result == "claude-sonnet-4.5-20250514"
|
||||
|
||||
def test_default_value(self):
|
||||
"""Default fallback model resolves to a valid string."""
|
||||
cfg = _make_config()
|
||||
with patch(f"{_SVC}.config", cfg):
|
||||
from backend.copilot.sdk.service import _resolve_fallback_model
|
||||
|
||||
result = _resolve_fallback_model()
|
||||
|
||||
assert result is not None
|
||||
assert "sonnet" in result.lower() or "claude" in result.lower()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Security & isolation env vars
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
_SECURITY_VARS = (
|
||||
"CLAUDE_CODE_DISABLE_CLAUDE_MDS",
|
||||
"CLAUDE_CODE_SKIP_PROMPT_HISTORY",
|
||||
"CLAUDE_CODE_DISABLE_AUTO_MEMORY",
|
||||
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
|
||||
)
|
||||
|
||||
|
||||
class TestSecurityEnvVars:
|
||||
"""Verify security env vars are set in the returned dict for every auth mode.
|
||||
|
||||
Tests call ``build_sdk_env()`` directly and assert the vars are present
|
||||
in the returned dict — not just present somewhere in the source file.
|
||||
"""
|
||||
|
||||
def test_security_vars_set_in_openrouter_mode(self):
|
||||
"""Mode 3 (OpenRouter): security vars must be in the returned env."""
|
||||
cfg = _make_config(
|
||||
use_claude_code_subscription=False,
|
||||
use_openrouter=True,
|
||||
api_key="sk-or-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env(session_id="s1", user_id="u1")
|
||||
|
||||
for var in _SECURITY_VARS:
|
||||
assert env.get(var) == "1", f"{var} not set in OpenRouter mode"
|
||||
|
||||
def test_security_vars_set_in_direct_anthropic_mode(self):
|
||||
"""Mode 2 (direct Anthropic): security vars must be in the returned env."""
|
||||
cfg = _make_config(use_claude_code_subscription=False, use_openrouter=False)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env()
|
||||
|
||||
for var in _SECURITY_VARS:
|
||||
assert env.get(var) == "1", f"{var} not set in direct Anthropic mode"
|
||||
|
||||
def test_security_vars_set_in_subscription_mode(self):
|
||||
"""Mode 1 (subscription): security vars must be in the returned env."""
|
||||
cfg = _make_config(use_claude_code_subscription=True)
|
||||
with (
|
||||
patch(f"{_ENV}.config", cfg),
|
||||
patch(f"{_ENV}.validate_subscription"),
|
||||
):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env(session_id="s1", user_id="u1")
|
||||
|
||||
for var in _SECURITY_VARS:
|
||||
assert env.get(var) == "1", f"{var} not set in subscription mode"
|
||||
|
||||
def test_tmpdir_set_when_sdk_cwd_provided(self):
|
||||
"""CLAUDE_CODE_TMPDIR must be set when sdk_cwd is provided."""
|
||||
cfg = _make_config(use_openrouter=False)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env(sdk_cwd="/workspace/session-1")
|
||||
|
||||
assert env.get("CLAUDE_CODE_TMPDIR") == "/workspace/session-1"
|
||||
|
||||
def test_tmpdir_absent_when_sdk_cwd_not_provided(self):
|
||||
"""CLAUDE_CODE_TMPDIR must NOT be set when sdk_cwd is None."""
|
||||
cfg = _make_config(use_openrouter=False)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env()
|
||||
|
||||
assert "CLAUDE_CODE_TMPDIR" not in env
|
||||
|
||||
def test_home_not_overridden(self):
|
||||
"""HOME must NOT be overridden — would break git/ssh/npm in subprocesses."""
|
||||
cfg = _make_config(use_openrouter=False)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env()
|
||||
|
||||
assert "HOME" not in env
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config defaults
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestConfigDefaults:
|
||||
"""Verify ChatConfig P0 fields have correct defaults."""
|
||||
|
||||
def test_fallback_model_default(self):
|
||||
cfg = _make_config()
|
||||
assert cfg.claude_agent_fallback_model
|
||||
assert "sonnet" in cfg.claude_agent_fallback_model.lower()
|
||||
|
||||
def test_max_turns_default(self):
|
||||
cfg = _make_config()
|
||||
assert cfg.claude_agent_max_turns == 1000
|
||||
|
||||
def test_max_budget_usd_default(self):
|
||||
cfg = _make_config()
|
||||
assert cfg.claude_agent_max_budget_usd == 100.0
|
||||
|
||||
def test_max_transient_retries_default(self):
|
||||
cfg = _make_config()
|
||||
assert cfg.claude_agent_max_transient_retries == 3
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# build_sdk_env — all 3 auth modes
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestBuildSdkEnv:
|
||||
"""Verify build_sdk_env returns correct dicts for each auth mode."""
|
||||
|
||||
def test_subscription_mode_clears_keys(self):
|
||||
"""Mode 1: subscription clears API key / auth token / base URL."""
|
||||
cfg = _make_config(use_claude_code_subscription=True)
|
||||
with (
|
||||
patch(f"{_ENV}.config", cfg),
|
||||
patch(f"{_ENV}.validate_subscription"),
|
||||
):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env(session_id="s1", user_id="u1")
|
||||
|
||||
assert env["ANTHROPIC_API_KEY"] == ""
|
||||
assert env["ANTHROPIC_AUTH_TOKEN"] == ""
|
||||
assert env["ANTHROPIC_BASE_URL"] == ""
|
||||
|
||||
def test_direct_anthropic_inherits_api_key(self):
|
||||
"""Mode 2: direct Anthropic doesn't set ANTHROPIC_* keys (inherits from parent)."""
|
||||
cfg = _make_config(
|
||||
use_claude_code_subscription=False,
|
||||
use_openrouter=False,
|
||||
)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env()
|
||||
|
||||
assert "ANTHROPIC_API_KEY" not in env
|
||||
assert "ANTHROPIC_AUTH_TOKEN" not in env
|
||||
assert "ANTHROPIC_BASE_URL" not in env
|
||||
|
||||
def test_openrouter_sets_base_url_and_auth(self):
|
||||
"""Mode 3: OpenRouter sets base URL, auth token, and clears API key."""
|
||||
cfg = _make_config(
|
||||
use_claude_code_subscription=False,
|
||||
use_openrouter=True,
|
||||
api_key="sk-or-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env(session_id="sess-1", user_id="user-1")
|
||||
|
||||
assert env["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
|
||||
assert env["ANTHROPIC_AUTH_TOKEN"] == "sk-or-test"
|
||||
assert env["ANTHROPIC_API_KEY"] == ""
|
||||
assert "x-session-id: sess-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
|
||||
assert "x-user-id: user-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
|
||||
|
||||
def test_openrouter_no_headers_when_ids_empty(self):
|
||||
"""Mode 3: No custom headers when session_id/user_id are not given."""
|
||||
cfg = _make_config(
|
||||
use_claude_code_subscription=False,
|
||||
use_openrouter=True,
|
||||
api_key="sk-or-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
)
|
||||
with patch(f"{_ENV}.config", cfg):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env()
|
||||
|
||||
assert "ANTHROPIC_CUSTOM_HEADERS" not in env
|
||||
|
||||
def test_all_modes_return_mutable_dict(self):
|
||||
"""build_sdk_env must return a mutable dict (not None) in every mode."""
|
||||
for cfg in (
|
||||
_make_config(use_claude_code_subscription=True),
|
||||
_make_config(use_openrouter=False),
|
||||
_make_config(
|
||||
use_openrouter=True,
|
||||
api_key="k",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
),
|
||||
):
|
||||
with (
|
||||
patch(f"{_ENV}.config", cfg),
|
||||
patch(f"{_ENV}.validate_subscription"),
|
||||
):
|
||||
from backend.copilot.sdk.env import build_sdk_env
|
||||
|
||||
env = build_sdk_env()
|
||||
|
||||
assert isinstance(env, dict)
|
||||
env["CLAUDE_CODE_TMPDIR"] = "/tmp/test"
|
||||
assert env["CLAUDE_CODE_TMPDIR"] == "/tmp/test"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# is_transient_api_error
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestIsTransientApiError:
|
||||
"""Verify that is_transient_api_error detects all transient patterns."""
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"error_text",
|
||||
[
|
||||
"socket connection was closed unexpectedly",
|
||||
"ECONNRESET",
|
||||
"connection was forcibly closed",
|
||||
"network socket disconnected",
|
||||
],
|
||||
)
|
||||
def test_connection_level_errors(self, error_text: str):
|
||||
assert is_transient_api_error(error_text)
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"error_text",
|
||||
[
|
||||
"rate limit exceeded",
|
||||
"rate_limit_error",
|
||||
"Too Many Requests",
|
||||
"status code 429",
|
||||
],
|
||||
)
|
||||
def test_429_rate_limit_errors(self, error_text: str):
|
||||
assert is_transient_api_error(error_text)
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"error_text",
|
||||
[
|
||||
# Status-code-specific patterns (preferred — no false-positive risk)
|
||||
"status code 529",
|
||||
"status code 500",
|
||||
"status code 502",
|
||||
"status code 503",
|
||||
"status code 504",
|
||||
],
|
||||
)
|
||||
def test_5xx_server_errors(self, error_text: str):
|
||||
assert is_transient_api_error(error_text)
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"error_text",
|
||||
[
|
||||
"invalid_api_key",
|
||||
"Authentication failed",
|
||||
"prompt is too long",
|
||||
"model not found",
|
||||
"",
|
||||
# Natural-language phrases intentionally NOT matched — they are too
|
||||
# broad and could appear in application-level SDK messages unrelated
|
||||
# to Anthropic API transient conditions.
|
||||
"API is overloaded",
|
||||
"Internal Server Error",
|
||||
"Bad Gateway",
|
||||
"Service Unavailable",
|
||||
"Gateway Timeout",
|
||||
],
|
||||
)
|
||||
def test_non_transient_errors(self, error_text: str):
|
||||
assert not is_transient_api_error(error_text)
|
||||
|
||||
def test_case_insensitive(self):
|
||||
assert is_transient_api_error("SOCKET CONNECTION WAS CLOSED UNEXPECTEDLY")
|
||||
assert is_transient_api_error("econnreset")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config validators for max_turns / max_budget_usd
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestConfigValidators:
|
||||
"""Verify ge/le bounds on max_turns and max_budget_usd."""
|
||||
|
||||
def test_max_turns_rejects_zero(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_turns=0)
|
||||
|
||||
def test_max_turns_rejects_negative(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_turns=-1)
|
||||
|
||||
def test_max_turns_rejects_above_10000(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_turns=10001)
|
||||
|
||||
def test_max_turns_accepts_boundary_values(self):
|
||||
cfg_low = _make_config(claude_agent_max_turns=1)
|
||||
assert cfg_low.claude_agent_max_turns == 1
|
||||
cfg_high = _make_config(claude_agent_max_turns=10000)
|
||||
assert cfg_high.claude_agent_max_turns == 10000
|
||||
|
||||
def test_max_budget_rejects_zero(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_budget_usd=0.0)
|
||||
|
||||
def test_max_budget_rejects_negative(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_budget_usd=-1.0)
|
||||
|
||||
def test_max_budget_rejects_above_1000(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_budget_usd=1000.01)
|
||||
|
||||
def test_max_budget_accepts_boundary_values(self):
|
||||
cfg_low = _make_config(claude_agent_max_budget_usd=0.01)
|
||||
assert cfg_low.claude_agent_max_budget_usd == 0.01
|
||||
cfg_high = _make_config(claude_agent_max_budget_usd=1000.0)
|
||||
assert cfg_high.claude_agent_max_budget_usd == 1000.0
|
||||
|
||||
def test_max_transient_retries_rejects_negative(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_transient_retries=-1)
|
||||
|
||||
def test_max_transient_retries_rejects_above_10(self):
|
||||
with pytest.raises(ValidationError):
|
||||
_make_config(claude_agent_max_transient_retries=11)
|
||||
|
||||
def test_max_transient_retries_accepts_boundary_values(self):
|
||||
cfg_low = _make_config(claude_agent_max_transient_retries=0)
|
||||
assert cfg_low.claude_agent_max_transient_retries == 0
|
||||
cfg_high = _make_config(claude_agent_max_transient_retries=10)
|
||||
assert cfg_high.claude_agent_max_transient_retries == 10
|
||||
@@ -260,13 +260,13 @@ def test_result_error_emits_error_and_finish():
|
||||
is_error=True,
|
||||
num_turns=0,
|
||||
session_id="s1",
|
||||
result="API rate limited",
|
||||
result="Invalid API key provided",
|
||||
)
|
||||
results = adapter.convert_message(msg)
|
||||
# No step was open, so no FinishStep — just Error + Finish
|
||||
assert len(results) == 2
|
||||
assert isinstance(results[0], StreamError)
|
||||
assert "API rate limited" in results[0].errorText
|
||||
assert "Invalid API key provided" in results[0].errorText
|
||||
assert isinstance(results[1], StreamFinish)
|
||||
|
||||
|
||||
|
||||
@@ -105,6 +105,10 @@ def test_agent_options_accepts_all_our_fields():
|
||||
"env",
|
||||
"resume",
|
||||
"max_buffer_size",
|
||||
"stderr",
|
||||
"fallback_model",
|
||||
"max_turns",
|
||||
"max_budget_usd",
|
||||
]
|
||||
sig = inspect.signature(ClaudeAgentOptions)
|
||||
for field in fields_we_use:
|
||||
|
||||
@@ -545,17 +545,34 @@ async def _iter_sdk_messages(
|
||||
pass
|
||||
|
||||
|
||||
def _normalize_model_name(raw_model: str) -> str:
|
||||
"""Normalize a model name for the current routing configuration.
|
||||
|
||||
Applies two transformations shared by both the primary and fallback
|
||||
model resolution paths:
|
||||
|
||||
1. **Strip provider prefix** — OpenRouter-style names like
|
||||
``"anthropic/claude-opus-4.6"`` are reduced to ``"claude-opus-4.6"``.
|
||||
2. **Dot-to-hyphen conversion** — when *not* routing through OpenRouter
|
||||
the direct Anthropic API requires hyphen-separated versions
|
||||
(``"claude-opus-4-6"``), so dots are replaced with hyphens.
|
||||
"""
|
||||
model = raw_model
|
||||
if "/" in model:
|
||||
model = model.split("/", 1)[1]
|
||||
# OpenRouter uses dots in versions (claude-opus-4.6) but the direct
|
||||
# Anthropic API requires hyphens (claude-opus-4-6). Only normalise
|
||||
# when NOT routing through OpenRouter.
|
||||
if not config.openrouter_active:
|
||||
model = model.replace(".", "-")
|
||||
return model
|
||||
|
||||
|
||||
def _resolve_sdk_model() -> str | None:
|
||||
"""Resolve the model name for the Claude Agent SDK CLI.
|
||||
|
||||
Uses `config.claude_agent_model` if set, otherwise derives from
|
||||
`config.model` by stripping the OpenRouter provider prefix (e.g.,
|
||||
`"anthropic/claude-opus-4.6"` → `"claude-opus-4-6"`).
|
||||
|
||||
OpenRouter uses dot-separated versions (`claude-opus-4.6`) while the
|
||||
direct Anthropic API uses hyphen-separated versions (`claude-opus-4-6`).
|
||||
Normalisation is only applied when the SDK will actually talk to
|
||||
Anthropic directly (not through OpenRouter).
|
||||
`config.model` via :func:`_normalize_model_name`.
|
||||
|
||||
When `use_claude_code_subscription` is enabled and no explicit
|
||||
`claude_agent_model` is set, returns `None` so the CLI uses the
|
||||
@@ -565,15 +582,18 @@ def _resolve_sdk_model() -> str | None:
|
||||
return config.claude_agent_model
|
||||
if config.use_claude_code_subscription:
|
||||
return None
|
||||
model = config.model
|
||||
if "/" in model:
|
||||
model = model.split("/", 1)[1]
|
||||
# OpenRouter uses dots in versions (claude-opus-4.6) but the direct
|
||||
# Anthropic API requires hyphens (claude-opus-4-6). Only normalise
|
||||
# when NOT routing through OpenRouter.
|
||||
if not config.openrouter_active:
|
||||
model = model.replace(".", "-")
|
||||
return model
|
||||
return _normalize_model_name(config.model)
|
||||
|
||||
|
||||
def _resolve_fallback_model() -> str | None:
|
||||
"""Resolve the fallback model name via :func:`_normalize_model_name`.
|
||||
|
||||
Returns ``None`` when no fallback is configured (empty string).
|
||||
"""
|
||||
raw = config.claude_agent_fallback_model
|
||||
if not raw:
|
||||
return None
|
||||
return _normalize_model_name(raw)
|
||||
|
||||
|
||||
def _make_sdk_cwd(session_id: str) -> str:
|
||||
@@ -1960,10 +1980,29 @@ async def stream_chat_completion_sdk(
|
||||
allowed = get_copilot_tool_names(use_e2b=use_e2b)
|
||||
disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)
|
||||
|
||||
# Flag set by _on_stderr when the SDK logs that it switched to the
|
||||
# fallback model (e.g. on a 529 overloaded error). Checked once per
|
||||
# heartbeat cycle and emitted as a StreamStatus notification.
|
||||
fallback_model_activated = False
|
||||
|
||||
def _on_stderr(line: str) -> None:
|
||||
"""Log a stderr line emitted by the Claude CLI subprocess."""
|
||||
nonlocal fallback_model_activated
|
||||
sid = session_id[:12] if session_id else "?"
|
||||
logger.info("[SDK] [%s] CLI stderr: %s", sid, line.rstrip())
|
||||
# Detect SDK fallback-model activation. The CLI logs a
|
||||
# message containing "fallback" when it switches models
|
||||
# after a 529/overloaded error. Only match "fallback" —
|
||||
# "overloaded" alone indicates a transient error, not that
|
||||
# the SDK actually switched to the fallback model.
|
||||
lower = line.lower()
|
||||
if not fallback_model_activated and "fallback" in lower:
|
||||
fallback_model_activated = True
|
||||
logger.warning(
|
||||
"[SDK] [%s] Fallback model activated — primary model "
|
||||
"overloaded, switching to fallback",
|
||||
sid,
|
||||
)
|
||||
|
||||
sdk_options_kwargs: dict[str, Any] = {
|
||||
"system_prompt": system_prompt,
|
||||
@@ -1974,6 +2013,15 @@ async def stream_chat_completion_sdk(
|
||||
"cwd": sdk_cwd,
|
||||
"max_buffer_size": config.claude_agent_max_buffer_size,
|
||||
"stderr": _on_stderr,
|
||||
# --- P0 guardrails ---
|
||||
# fallback_model: SDK auto-retries with this cheaper model on
|
||||
# 529 (overloaded) errors, avoiding user-visible failures.
|
||||
"fallback_model": _resolve_fallback_model(),
|
||||
# max_turns: hard cap on agentic tool-use loops per query to
|
||||
# prevent runaway execution from burning budget.
|
||||
"max_turns": config.claude_agent_max_turns,
|
||||
# max_budget_usd: per-query spend ceiling enforced by the CLI.
|
||||
"max_budget_usd": config.claude_agent_max_budget_usd,
|
||||
}
|
||||
if sdk_model:
|
||||
sdk_options_kwargs["model"] = sdk_model
|
||||
@@ -2060,6 +2108,26 @@ async def stream_chat_completion_sdk(
|
||||
attempts_exhausted = False
|
||||
stream_err: Exception | None = None
|
||||
|
||||
# Transient retry helper — deduplicates the logic shared between
|
||||
# _HandledStreamError and the generic except-Exception handler.
|
||||
transient_retries = 0
|
||||
max_transient_retries = config.claude_agent_max_transient_retries
|
||||
|
||||
def _next_transient_backoff() -> int | None:
|
||||
"""Return the next backoff delay in seconds, or ``None`` to surface the error.
|
||||
|
||||
Returns the backoff seconds if a retry should be attempted,
|
||||
or ``None`` if retries are exhausted or events were already
|
||||
yielded. Mutates outer ``transient_retries`` via nonlocal.
|
||||
"""
|
||||
nonlocal transient_retries
|
||||
if events_yielded > 0:
|
||||
return None
|
||||
transient_retries += 1
|
||||
if transient_retries > max_transient_retries:
|
||||
return None
|
||||
return 2 ** (transient_retries - 1) # 1s, 2s, 4s, ...
|
||||
|
||||
state = _RetryState(
|
||||
options=options,
|
||||
query_message=query_message,
|
||||
@@ -2072,7 +2140,19 @@ async def stream_chat_completion_sdk(
|
||||
usage=_TokenUsage(),
|
||||
)
|
||||
|
||||
for attempt in range(_MAX_STREAM_ATTEMPTS):
|
||||
attempt = 0
|
||||
_last_reset_attempt = -1
|
||||
while attempt < _MAX_STREAM_ATTEMPTS:
|
||||
# Reset transient retry counter per context-level attempt so
|
||||
# each attempt (original, compacted, no-transcript) gets the
|
||||
# full retry budget for transient errors.
|
||||
# Only reset when the attempt number actually changes —
|
||||
# transient retries `continue` back to the loop top without
|
||||
# incrementing `attempt`, so resetting unconditionally would
|
||||
# create an infinite retry loop.
|
||||
if attempt != _last_reset_attempt:
|
||||
transient_retries = 0
|
||||
_last_reset_attempt = attempt
|
||||
# Clear any stale stash signal from the previous attempt so
|
||||
# wait_for_stash() doesn't fire prematurely on a leftover event.
|
||||
reset_stash_event()
|
||||
@@ -2127,7 +2207,15 @@ async def stream_chat_completion_sdk(
|
||||
state.usage.reset()
|
||||
|
||||
pre_attempt_msg_count = len(session.messages)
|
||||
# Snapshot transcript builder state — it maintains an
|
||||
# independent _entries list from session.messages, so rolling
|
||||
# back session.messages alone would leave duplicate entries
|
||||
# from the failed attempt in the uploaded transcript.
|
||||
pre_transcript_entries = list(state.transcript_builder._entries)
|
||||
pre_transcript_uuid = state.transcript_builder._last_uuid
|
||||
events_yielded = 0
|
||||
fallback_model_activated = False
|
||||
fallback_notified = False
|
||||
|
||||
try:
|
||||
async for event in _run_stream_attempt(stream_ctx, state):
|
||||
@@ -2143,9 +2231,24 @@ async def stream_chat_completion_sdk(
|
||||
StreamToolInputStart,
|
||||
StreamToolInputAvailable,
|
||||
StreamToolOutputAvailable,
|
||||
# Transient StreamError and StreamStatus are
|
||||
# ephemeral notifications, not content. Counting
|
||||
# them would prevent the backoff retry from firing
|
||||
# because _next_transient_backoff() returns None
|
||||
# when events_yielded > 0.
|
||||
StreamError,
|
||||
StreamStatus,
|
||||
),
|
||||
):
|
||||
events_yielded += 1
|
||||
# Emit a one-time StreamStatus when the SDK switches
|
||||
# to the fallback model (detected via stderr).
|
||||
if fallback_model_activated and not fallback_notified:
|
||||
fallback_notified = True
|
||||
yield StreamStatus(
|
||||
message="Primary model overloaded — "
|
||||
"using fallback model for this request"
|
||||
)
|
||||
yield event
|
||||
break # Stream completed — exit retry loop
|
||||
except asyncio.CancelledError:
|
||||
@@ -2162,6 +2265,31 @@ async def stream_chat_completion_sdk(
|
||||
# session messages and set the error flag — do NOT set
|
||||
# stream_err so the post-loop code won't emit a
|
||||
# duplicate StreamError.
|
||||
session.messages = session.messages[:pre_attempt_msg_count]
|
||||
state.transcript_builder._entries = pre_transcript_entries
|
||||
state.transcript_builder._last_uuid = pre_transcript_uuid
|
||||
# Check if this is a transient error we can retry with backoff.
|
||||
# exc.code is the only reliable signal — str(exc) is always the
|
||||
# static "Stream error handled — StreamError already yielded" message.
|
||||
if exc.code == "transient_api_error":
|
||||
backoff = _next_transient_backoff()
|
||||
if backoff is not None:
|
||||
logger.warning(
|
||||
"%s Transient error — retrying in %ds (%d/%d)",
|
||||
log_prefix,
|
||||
backoff,
|
||||
transient_retries,
|
||||
max_transient_retries,
|
||||
)
|
||||
yield StreamStatus(
|
||||
message=f"Connection interrupted, retrying in {backoff}s…"
|
||||
)
|
||||
await asyncio.sleep(backoff)
|
||||
state.adapter = SDKResponseAdapter(
|
||||
message_id=message_id, session_id=session_id
|
||||
)
|
||||
state.usage.reset()
|
||||
continue # retry the same context-level attempt
|
||||
logger.warning(
|
||||
"%s Stream error handled in attempt "
|
||||
"(attempt %d/%d, code=%s, events_yielded=%d)",
|
||||
@@ -2171,7 +2299,6 @@ async def stream_chat_completion_sdk(
|
||||
exc.code or "transient",
|
||||
events_yielded,
|
||||
)
|
||||
session.messages = session.messages[:pre_attempt_msg_count]
|
||||
# transcript_builder still contains entries from the aborted
|
||||
# attempt that no longer match session.messages. Skip upload
|
||||
# so a future --resume doesn't replay rolled-back content.
|
||||
@@ -2186,22 +2313,29 @@ async def stream_chat_completion_sdk(
|
||||
retryable=True,
|
||||
)
|
||||
ended_with_stream_error = True
|
||||
# _run_stream_attempt already yielded a StreamError to the
|
||||
# client before raising _HandledStreamError — do NOT yield
|
||||
# another one here or the client will see a duplicate.
|
||||
break
|
||||
except Exception as e:
|
||||
stream_err = e
|
||||
is_context_error = _is_prompt_too_long(e)
|
||||
is_transient = is_transient_api_error(str(e))
|
||||
logger.warning(
|
||||
"%s Stream error (attempt %d/%d, context_error=%s, "
|
||||
"events_yielded=%d): %s",
|
||||
"transient=%s, events_yielded=%d): %s",
|
||||
log_prefix,
|
||||
attempt + 1,
|
||||
_MAX_STREAM_ATTEMPTS,
|
||||
is_context_error,
|
||||
is_transient,
|
||||
events_yielded,
|
||||
stream_err,
|
||||
exc_info=True,
|
||||
)
|
||||
session.messages = session.messages[:pre_attempt_msg_count]
|
||||
state.transcript_builder._entries = pre_transcript_entries
|
||||
state.transcript_builder._last_uuid = pre_transcript_uuid
|
||||
if events_yielded > 0:
|
||||
# Events were already sent to the frontend and cannot be
|
||||
# unsent. Retrying would produce duplicate/inconsistent
|
||||
@@ -2214,16 +2348,40 @@ async def stream_chat_completion_sdk(
|
||||
skip_transcript_upload = True
|
||||
ended_with_stream_error = True
|
||||
break
|
||||
# Transient API errors (ECONNRESET, 429, 5xx) — retry
|
||||
# with exponential backoff via the shared helper.
|
||||
if is_transient:
|
||||
backoff = _next_transient_backoff()
|
||||
if backoff is not None:
|
||||
logger.warning(
|
||||
"%s Transient exception — retrying in %ds (%d/%d)",
|
||||
log_prefix,
|
||||
backoff,
|
||||
transient_retries,
|
||||
max_transient_retries,
|
||||
)
|
||||
yield StreamStatus(
|
||||
message=f"Connection interrupted, retrying "
|
||||
f"in {backoff}s…"
|
||||
)
|
||||
await asyncio.sleep(backoff)
|
||||
state.adapter = SDKResponseAdapter(
|
||||
message_id=message_id, session_id=session_id
|
||||
)
|
||||
state.usage.reset()
|
||||
continue # retry same context-level attempt
|
||||
|
||||
if not is_context_error:
|
||||
# Non-context errors (network, auth, rate-limit) should
|
||||
# not trigger compaction — surface the error immediately.
|
||||
# Non-context, non-transient errors (auth, fatal)
|
||||
# should not trigger compaction — surface immediately.
|
||||
skip_transcript_upload = True
|
||||
ended_with_stream_error = True
|
||||
break
|
||||
attempt += 1 # advance to next context-level attempt
|
||||
continue
|
||||
else:
|
||||
# All retry attempts exhausted (loop ended without break)
|
||||
# skip_transcript_upload is already set by _reduce_context
|
||||
# while condition became False — all attempts exhausted without
|
||||
# break. skip_transcript_upload is already set by _reduce_context
|
||||
# when the transcript was dropped (transcript_lost=True).
|
||||
ended_with_stream_error = True
|
||||
attempts_exhausted = True
|
||||
|
||||
@@ -10,6 +10,7 @@ import pytest
|
||||
|
||||
from .service import (
|
||||
_is_sdk_disconnect_error,
|
||||
_normalize_model_name,
|
||||
_prepare_file_attachments,
|
||||
_resolve_sdk_model,
|
||||
_safe_close_sdk_client,
|
||||
@@ -405,6 +406,49 @@ def _clean_config_env(monkeypatch: pytest.MonkeyPatch) -> None:
|
||||
monkeypatch.delenv(var, raising=False)
|
||||
|
||||
|
||||
class TestNormalizeModelName:
|
||||
"""Tests for _normalize_model_name — shared provider-aware normalization."""
|
||||
|
||||
def test_strips_provider_prefix(self, monkeypatch, _clean_config_env):
|
||||
from backend.copilot import config as cfg_mod
|
||||
|
||||
cfg = cfg_mod.ChatConfig(
|
||||
use_openrouter=False,
|
||||
api_key=None,
|
||||
base_url=None,
|
||||
use_claude_code_subscription=False,
|
||||
)
|
||||
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
|
||||
assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4-6"
|
||||
|
||||
def test_dots_preserved_for_openrouter(self, monkeypatch, _clean_config_env):
|
||||
from backend.copilot import config as cfg_mod
|
||||
|
||||
cfg = cfg_mod.ChatConfig(
|
||||
use_openrouter=True,
|
||||
api_key="or-key",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
use_claude_code_subscription=False,
|
||||
)
|
||||
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
|
||||
assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4.6"
|
||||
|
||||
def test_no_prefix_no_dots(self, monkeypatch, _clean_config_env):
|
||||
from backend.copilot import config as cfg_mod
|
||||
|
||||
cfg = cfg_mod.ChatConfig(
|
||||
use_openrouter=False,
|
||||
api_key=None,
|
||||
base_url=None,
|
||||
use_claude_code_subscription=False,
|
||||
)
|
||||
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
|
||||
assert (
|
||||
_normalize_model_name("claude-sonnet-4-20250514")
|
||||
== "claude-sonnet-4-20250514"
|
||||
)
|
||||
|
||||
|
||||
class TestResolveSdkModel:
|
||||
"""Tests for _resolve_sdk_model — model ID resolution for the SDK CLI."""
|
||||
|
||||
|
||||
@@ -89,10 +89,6 @@ export function CopilotPage() {
|
||||
isUploadingFiles,
|
||||
isUserLoading,
|
||||
isLoggedIn,
|
||||
// Pagination
|
||||
hasMoreMessages,
|
||||
isLoadingMore,
|
||||
loadMore,
|
||||
// Mobile drawer
|
||||
isMobile,
|
||||
isDrawerOpen,
|
||||
@@ -201,9 +197,6 @@ export function CopilotPage() {
|
||||
onSend={onSend}
|
||||
onStop={stop}
|
||||
isUploadingFiles={isUploadingFiles}
|
||||
hasMoreMessages={hasMoreMessages}
|
||||
isLoadingMore={isLoadingMore}
|
||||
onLoadMore={loadMore}
|
||||
droppedFiles={droppedFiles}
|
||||
onDroppedFilesConsumed={handleDroppedFilesConsumed}
|
||||
historicalDurations={historicalDurations}
|
||||
|
||||
@@ -27,9 +27,6 @@ export interface ChatContainerProps {
|
||||
onSend: (message: string, files?: File[]) => void | Promise<void>;
|
||||
onStop: () => void;
|
||||
isUploadingFiles?: boolean;
|
||||
hasMoreMessages?: boolean;
|
||||
isLoadingMore?: boolean;
|
||||
onLoadMore?: () => void;
|
||||
/** Files dropped onto the chat window. */
|
||||
droppedFiles?: File[];
|
||||
/** Called after droppedFiles have been consumed by ChatInput. */
|
||||
@@ -51,9 +48,6 @@ export const ChatContainer = ({
|
||||
onSend,
|
||||
onStop,
|
||||
isUploadingFiles,
|
||||
hasMoreMessages,
|
||||
isLoadingMore,
|
||||
onLoadMore,
|
||||
droppedFiles,
|
||||
onDroppedFilesConsumed,
|
||||
historicalDurations,
|
||||
@@ -108,9 +102,6 @@ export const ChatContainer = ({
|
||||
error={error}
|
||||
isLoading={isLoadingSession}
|
||||
sessionID={sessionId}
|
||||
hasMoreMessages={hasMoreMessages}
|
||||
isLoadingMore={isLoadingMore}
|
||||
onLoadMore={onLoadMore}
|
||||
onRetry={handleRetry}
|
||||
historicalDurations={historicalDurations}
|
||||
/>
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
import { useMemo, useState } from "react";
|
||||
import { useEffect, useMemo, useRef } from "react";
|
||||
import {
|
||||
Conversation,
|
||||
ConversationContent,
|
||||
@@ -11,8 +11,6 @@ import {
|
||||
} from "@/components/ai-elements/message";
|
||||
import { LoadingSpinner } from "@/components/atoms/LoadingSpinner/LoadingSpinner";
|
||||
import { FileUIPart, UIDataTypes, UIMessage, UITools } from "ai";
|
||||
import { useEffect, useLayoutEffect, useRef } from "react";
|
||||
import { useStickToBottomContext } from "use-stick-to-bottom";
|
||||
import { TOOL_PART_PREFIX } from "../JobStatsBar/constants";
|
||||
import { TurnStatsBar } from "../JobStatsBar/TurnStatsBar";
|
||||
import { useElapsedTimer } from "../JobStatsBar/useElapsedTimer";
|
||||
@@ -39,9 +37,6 @@ interface Props {
|
||||
error: Error | undefined;
|
||||
isLoading: boolean;
|
||||
sessionID?: string | null;
|
||||
hasMoreMessages?: boolean;
|
||||
isLoadingMore?: boolean;
|
||||
onLoadMore?: () => void;
|
||||
onRetry?: () => void;
|
||||
historicalDurations?: Map<string, number>;
|
||||
}
|
||||
@@ -111,120 +106,15 @@ function extractGraphExecId(
|
||||
return null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Triggers `onLoadMore` when scrolled near the top, and preserves the
|
||||
* user's scroll position after older messages are prepended to the DOM.
|
||||
*
|
||||
* Scroll preservation works by:
|
||||
* 1. Capturing `scrollHeight` / `scrollTop` in the observer callback
|
||||
* (synchronous, before React re-renders).
|
||||
* 2. Restoring `scrollTop` in a `useLayoutEffect` keyed on
|
||||
* `messageCount` so it only fires when messages actually change
|
||||
* (not on intermediate renders like the loading-spinner toggle).
|
||||
*/
|
||||
function LoadMoreSentinel({
|
||||
hasMore,
|
||||
isLoading,
|
||||
messageCount,
|
||||
onLoadMore,
|
||||
}: {
|
||||
hasMore: boolean;
|
||||
isLoading: boolean;
|
||||
messageCount: number;
|
||||
onLoadMore: () => void;
|
||||
}) {
|
||||
const sentinelRef = useRef<HTMLDivElement>(null);
|
||||
const onLoadMoreRef = useRef(onLoadMore);
|
||||
onLoadMoreRef.current = onLoadMore;
|
||||
// Pre-mutation scroll snapshot, written synchronously before onLoadMore
|
||||
const scrollSnapshotRef = useRef({ scrollHeight: 0, scrollTop: 0 });
|
||||
const { scrollRef } = useStickToBottomContext();
|
||||
|
||||
// IntersectionObserver to trigger load when sentinel is near viewport.
|
||||
// Only fires when the container is actually scrollable to prevent
|
||||
// exhausting all pages when content fits without scrolling.
|
||||
useEffect(() => {
|
||||
if (!sentinelRef.current || !hasMore || isLoading) return;
|
||||
const observer = new IntersectionObserver(
|
||||
([entry]) => {
|
||||
if (!entry.isIntersecting) return;
|
||||
const scrollParent =
|
||||
sentinelRef.current?.closest('[role="log"]') ??
|
||||
sentinelRef.current?.parentElement;
|
||||
if (
|
||||
scrollParent &&
|
||||
scrollParent.scrollHeight <= scrollParent.clientHeight
|
||||
)
|
||||
return;
|
||||
// Capture scroll metrics *before* the state update
|
||||
const el = scrollRef.current;
|
||||
if (el) {
|
||||
scrollSnapshotRef.current = {
|
||||
scrollHeight: el.scrollHeight,
|
||||
scrollTop: el.scrollTop,
|
||||
};
|
||||
}
|
||||
onLoadMoreRef.current();
|
||||
},
|
||||
{ rootMargin: "200px 0px 0px 0px" },
|
||||
);
|
||||
observer.observe(sentinelRef.current);
|
||||
return () => observer.disconnect();
|
||||
}, [hasMore, isLoading, scrollRef]);
|
||||
|
||||
// After React commits new DOM nodes (prepended messages), adjust
|
||||
// scrollTop so the user stays at the same visual position.
|
||||
// Keyed on messageCount so it only fires when messages actually
|
||||
// change — NOT on intermediate renders (loading spinner, etc.)
|
||||
// that would consume the snapshot too early.
|
||||
useLayoutEffect(() => {
|
||||
const el = scrollRef.current;
|
||||
const { scrollHeight: prevHeight, scrollTop: prevTop } =
|
||||
scrollSnapshotRef.current;
|
||||
if (!el || prevHeight === 0) return;
|
||||
const delta = el.scrollHeight - prevHeight;
|
||||
if (delta > 0) {
|
||||
el.scrollTop = prevTop + delta;
|
||||
}
|
||||
scrollSnapshotRef.current = { scrollHeight: 0, scrollTop: 0 };
|
||||
}, [messageCount, scrollRef]);
|
||||
|
||||
return (
|
||||
<div ref={sentinelRef} className="flex justify-center py-1">
|
||||
{isLoading && <LoadingSpinner className="h-5 w-5 text-neutral-400" />}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
export function ChatMessagesContainer({
|
||||
messages,
|
||||
status,
|
||||
error,
|
||||
isLoading,
|
||||
sessionID,
|
||||
hasMoreMessages,
|
||||
isLoadingMore,
|
||||
onLoadMore,
|
||||
onRetry,
|
||||
historicalDurations,
|
||||
}: Props) {
|
||||
// Hide the container for one frame when messages first load so
|
||||
// StickToBottom can scroll to the bottom before the user sees it.
|
||||
const [settled, setSettled] = useState(false);
|
||||
const [prevSessionID, setPrevSessionID] = useState(sessionID);
|
||||
if (sessionID !== prevSessionID) {
|
||||
setPrevSessionID(sessionID);
|
||||
if (settled) setSettled(false);
|
||||
}
|
||||
const messagesReady = messages.length > 0 || !isLoading;
|
||||
useEffect(() => {
|
||||
if (settled || !messagesReady) return;
|
||||
const raf = requestAnimationFrame(() => setSettled(true));
|
||||
return () => cancelAnimationFrame(raf);
|
||||
}, [settled, messagesReady]);
|
||||
// opacity-0 only during the single frame between messages arriving and scroll settling
|
||||
const hideForScroll = messagesReady && !settled;
|
||||
|
||||
const lastMessage = messages[messages.length - 1];
|
||||
const graphExecId = useMemo(() => extractGraphExecId(messages), [messages]);
|
||||
|
||||
@@ -272,27 +162,13 @@ export function ChatMessagesContainer({
|
||||
});
|
||||
|
||||
return (
|
||||
<Conversation
|
||||
key={sessionID ?? "new"}
|
||||
resize={settled ? "smooth" : "instant"}
|
||||
className={
|
||||
"min-h-0 flex-1 " +
|
||||
(hideForScroll
|
||||
? "opacity-0"
|
||||
: "opacity-100 transition-opacity duration-100 ease-out")
|
||||
}
|
||||
>
|
||||
<ConversationContent className="flex min-h-full flex-1 flex-col gap-6 px-3 py-6">
|
||||
{hasMoreMessages && onLoadMore && (
|
||||
<LoadMoreSentinel
|
||||
hasMore={hasMoreMessages}
|
||||
isLoading={!!isLoadingMore}
|
||||
messageCount={messages.length}
|
||||
onLoadMore={onLoadMore}
|
||||
/>
|
||||
)}
|
||||
<Conversation className="min-h-0 flex-1">
|
||||
<ConversationContent className="flex flex-1 flex-col gap-6 px-3 py-6">
|
||||
{isLoading && messages.length === 0 && (
|
||||
<div className="flex flex-1 items-center justify-center">
|
||||
<div
|
||||
className="flex flex-1 items-center justify-center"
|
||||
style={{ minHeight: "calc(100vh - 12rem)" }}
|
||||
>
|
||||
<LoadingSpinner className="text-neutral-600" />
|
||||
</div>
|
||||
)}
|
||||
|
||||
@@ -6,7 +6,6 @@ interface SessionChatMessage {
|
||||
content: string | null;
|
||||
tool_call_id: string | null;
|
||||
tool_calls: unknown[] | null;
|
||||
sequence: number | null;
|
||||
duration_ms: number | null;
|
||||
}
|
||||
|
||||
@@ -36,7 +35,6 @@ function coerceSessionChatMessages(
|
||||
? null
|
||||
: String(msg.tool_call_id),
|
||||
tool_calls: Array.isArray(msg.tool_calls) ? msg.tool_calls : null,
|
||||
sequence: typeof msg.sequence === "number" ? msg.sequence : null,
|
||||
duration_ms:
|
||||
typeof msg.duration_ms === "number" ? msg.duration_ms : null,
|
||||
};
|
||||
@@ -103,67 +101,10 @@ function toToolInput(rawArguments: unknown): unknown {
|
||||
return {};
|
||||
}
|
||||
|
||||
/**
|
||||
* Concatenate two UIMessage arrays, merging consecutive assistant messages
|
||||
* at the join point so that reasoning + response parts stay in a single bubble.
|
||||
*
|
||||
* Within each page, `convertChatSessionMessagesToUiMessages` already merges
|
||||
* consecutive assistant DB rows. This handles the boundary between pages
|
||||
* (or between older-pages and the current/streaming page).
|
||||
*/
|
||||
export function concatWithAssistantMerge(
|
||||
a: UIMessage<unknown, UIDataTypes, UITools>[],
|
||||
b: UIMessage<unknown, UIDataTypes, UITools>[],
|
||||
): UIMessage<unknown, UIDataTypes, UITools>[] {
|
||||
if (a.length === 0) return b;
|
||||
if (b.length === 0) return a;
|
||||
const last = a[a.length - 1];
|
||||
const first = b[0];
|
||||
if (last.role === "assistant" && first.role === "assistant") {
|
||||
return [
|
||||
...a.slice(0, -1),
|
||||
{ ...last, parts: [...last.parts, ...first.parts] },
|
||||
...b.slice(1),
|
||||
];
|
||||
}
|
||||
return [...a, ...b];
|
||||
}
|
||||
|
||||
/**
|
||||
* Extract a toolCallId → output map from raw API messages.
|
||||
* Used to provide cross-page tool output context when converting
|
||||
* older pages that may have assistant tool_calls whose results
|
||||
* are in a newer page.
|
||||
*/
|
||||
export function extractToolOutputsFromRaw(
|
||||
rawMessages: unknown[],
|
||||
): Map<string, unknown> {
|
||||
const map = new Map<string, unknown>();
|
||||
for (const raw of rawMessages) {
|
||||
if (!raw || typeof raw !== "object") continue;
|
||||
const msg = raw as Record<string, unknown>;
|
||||
if (
|
||||
msg.role === "tool" &&
|
||||
typeof msg.tool_call_id === "string" &&
|
||||
msg.content != null
|
||||
) {
|
||||
map.set(
|
||||
msg.tool_call_id,
|
||||
typeof msg.content === "string" ? msg.content : String(msg.content),
|
||||
);
|
||||
}
|
||||
}
|
||||
return map;
|
||||
}
|
||||
|
||||
export function convertChatSessionMessagesToUiMessages(
|
||||
sessionId: string,
|
||||
rawMessages: unknown[],
|
||||
options?: {
|
||||
isComplete?: boolean;
|
||||
/** Tool outputs from adjacent pages, for cross-page tool_call matching. */
|
||||
extraToolOutputs?: Map<string, unknown>;
|
||||
},
|
||||
options?: { isComplete?: boolean },
|
||||
): {
|
||||
messages: UIMessage<unknown, UIDataTypes, UITools>[];
|
||||
durations: Map<string, number>;
|
||||
@@ -171,14 +112,6 @@ export function convertChatSessionMessagesToUiMessages(
|
||||
const messages = coerceSessionChatMessages(rawMessages);
|
||||
const toolOutputsByCallId = new Map<string, unknown>();
|
||||
|
||||
// Seed with extra tool outputs from adjacent pages first;
|
||||
// outputs from this page will override if present in both.
|
||||
if (options?.extraToolOutputs) {
|
||||
for (const [id, output] of options.extraToolOutputs) {
|
||||
toolOutputsByCallId.set(id, output);
|
||||
}
|
||||
}
|
||||
|
||||
for (const msg of messages) {
|
||||
if (msg.role !== "tool") continue;
|
||||
if (!msg.tool_call_id) continue;
|
||||
@@ -189,7 +122,7 @@ export function convertChatSessionMessagesToUiMessages(
|
||||
const uiMessages: UIMessage<unknown, UIDataTypes, UITools>[] = [];
|
||||
const durations = new Map<string, number>();
|
||||
|
||||
messages.forEach((msg) => {
|
||||
messages.forEach((msg, index) => {
|
||||
if (msg.role === "tool") return;
|
||||
if (msg.role !== "user" && msg.role !== "assistant") return;
|
||||
|
||||
@@ -267,7 +200,7 @@ export function convertChatSessionMessagesToUiMessages(
|
||||
return;
|
||||
}
|
||||
|
||||
const msgId = `${sessionId}-seq-${msg.sequence}`;
|
||||
const msgId = `${sessionId}-${index}`;
|
||||
uiMessages.push({
|
||||
id: msgId,
|
||||
role: msg.role,
|
||||
|
||||
@@ -15,7 +15,7 @@ export function useChatSession() {
|
||||
const [sessionId, setSessionId] = useQueryState("sessionId", parseAsString);
|
||||
const queryClient = useQueryClient();
|
||||
|
||||
const sessionQuery = useGetV2GetSession(sessionId ?? "", undefined, {
|
||||
const sessionQuery = useGetV2GetSession(sessionId ?? "", {
|
||||
query: {
|
||||
enabled: !!sessionId,
|
||||
staleTime: Infinity, // Manual invalidation on session switch
|
||||
@@ -57,17 +57,6 @@ export function useChatSession() {
|
||||
return !!sessionQuery.data.data.active_stream;
|
||||
}, [sessionQuery.data, sessionQuery.isFetching, sessionId]);
|
||||
|
||||
// Pagination metadata from the initial page load
|
||||
const hasMoreMessages = useMemo(() => {
|
||||
if (sessionQuery.data?.status !== 200) return false;
|
||||
return !!sessionQuery.data.data.has_more_messages;
|
||||
}, [sessionQuery.data]);
|
||||
|
||||
const oldestSequence = useMemo(() => {
|
||||
if (sessionQuery.data?.status !== 200) return null;
|
||||
return sessionQuery.data.data.oldest_sequence ?? null;
|
||||
}, [sessionQuery.data]);
|
||||
|
||||
// Memoize so the effect in useCopilotPage doesn't infinite-loop on a new
|
||||
// array reference every render. Re-derives only when query data changes.
|
||||
// When the session is complete (no active stream), mark dangling tool
|
||||
@@ -138,22 +127,12 @@ export function useChatSession() {
|
||||
}
|
||||
}
|
||||
|
||||
// Raw messages from the initial page — exposed for cross-page
|
||||
// tool output matching by useLoadMoreMessages.
|
||||
const rawSessionMessages =
|
||||
sessionQuery.data?.status === 200
|
||||
? ((sessionQuery.data.data.messages ?? []) as unknown[])
|
||||
: [];
|
||||
|
||||
return {
|
||||
sessionId,
|
||||
setSessionId,
|
||||
hydratedMessages,
|
||||
rawSessionMessages,
|
||||
historicalDurations,
|
||||
hasActiveStream,
|
||||
hasMoreMessages,
|
||||
oldestSequence,
|
||||
isLoadingSession: sessionQuery.isLoading,
|
||||
isSessionError: sessionQuery.isError,
|
||||
createSession,
|
||||
|
||||
@@ -12,12 +12,10 @@ import { useQueryClient } from "@tanstack/react-query";
|
||||
import type { FileUIPart } from "ai";
|
||||
import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag";
|
||||
import { useEffect, useRef, useState } from "react";
|
||||
import { concatWithAssistantMerge } from "./helpers/convertChatSessionToUiMessages";
|
||||
import { useCopilotUIStore } from "./store";
|
||||
import { useChatSession } from "./useChatSession";
|
||||
import { useCopilotNotifications } from "./useCopilotNotifications";
|
||||
import { useCopilotStream } from "./useCopilotStream";
|
||||
import { useLoadMoreMessages } from "./useLoadMoreMessages";
|
||||
import { useWorkflowImportAutoSubmit } from "./useWorkflowImportAutoSubmit";
|
||||
|
||||
const TITLE_POLL_INTERVAL_MS = 2_000;
|
||||
@@ -49,11 +47,8 @@ export function useCopilotPage() {
|
||||
sessionId,
|
||||
setSessionId,
|
||||
hydratedMessages,
|
||||
rawSessionMessages,
|
||||
historicalDurations,
|
||||
hasActiveStream,
|
||||
hasMoreMessages,
|
||||
oldestSequence,
|
||||
isLoadingSession,
|
||||
isSessionError,
|
||||
createSession,
|
||||
@@ -62,7 +57,7 @@ export function useCopilotPage() {
|
||||
} = useChatSession();
|
||||
|
||||
const {
|
||||
messages: currentMessages,
|
||||
messages,
|
||||
sendMessage,
|
||||
stop,
|
||||
status,
|
||||
@@ -80,19 +75,6 @@ export function useCopilotPage() {
|
||||
copilotMode: isModeToggleEnabled ? copilotMode : undefined,
|
||||
});
|
||||
|
||||
const { olderMessages, hasMore, isLoadingMore, loadMore } =
|
||||
useLoadMoreMessages({
|
||||
sessionId,
|
||||
initialOldestSequence: oldestSequence,
|
||||
initialHasMore: hasMoreMessages,
|
||||
initialPageRawMessages: rawSessionMessages,
|
||||
});
|
||||
|
||||
// Combine older (paginated) messages with current page messages,
|
||||
// merging consecutive assistant UIMessages at the page boundary so
|
||||
// reasoning + response parts stay in a single bubble.
|
||||
const messages = concatWithAssistantMerge(olderMessages, currentMessages);
|
||||
|
||||
useCopilotNotifications(sessionId);
|
||||
|
||||
// --- Delete session ---
|
||||
@@ -389,10 +371,6 @@ export function useCopilotPage() {
|
||||
isLoggedIn,
|
||||
createSession,
|
||||
onSend,
|
||||
// Pagination
|
||||
hasMoreMessages: hasMore,
|
||||
isLoadingMore,
|
||||
loadMore,
|
||||
// Mobile drawer
|
||||
isMobile,
|
||||
isDrawerOpen,
|
||||
|
||||
@@ -1,161 +0,0 @@
|
||||
import { getV2GetSession } from "@/app/api/__generated__/endpoints/chat/chat";
|
||||
import type { UIDataTypes, UIMessage, UITools } from "ai";
|
||||
import { useEffect, useMemo, useRef, useState } from "react";
|
||||
import {
|
||||
convertChatSessionMessagesToUiMessages,
|
||||
extractToolOutputsFromRaw,
|
||||
} from "./helpers/convertChatSessionToUiMessages";
|
||||
|
||||
interface UseLoadMoreMessagesArgs {
|
||||
sessionId: string | null;
|
||||
initialOldestSequence: number | null;
|
||||
initialHasMore: boolean;
|
||||
/** Raw messages from the initial page, used for cross-page tool output matching. */
|
||||
initialPageRawMessages: unknown[];
|
||||
}
|
||||
|
||||
const MAX_CONSECUTIVE_ERRORS = 3;
|
||||
const MAX_OLDER_MESSAGES = 2000;
|
||||
|
||||
export function useLoadMoreMessages({
|
||||
sessionId,
|
||||
initialOldestSequence,
|
||||
initialHasMore,
|
||||
initialPageRawMessages,
|
||||
}: UseLoadMoreMessagesArgs) {
|
||||
// Store accumulated raw messages from all older pages (in ascending order).
|
||||
// Re-converting them all together ensures tool outputs are matched across
|
||||
// inter-page boundaries.
|
||||
const [olderRawMessages, setOlderRawMessages] = useState<unknown[]>([]);
|
||||
const [oldestSequence, setOldestSequence] = useState<number | null>(
|
||||
initialOldestSequence,
|
||||
);
|
||||
const [hasMore, setHasMore] = useState(initialHasMore);
|
||||
const [isLoadingMore, setIsLoadingMore] = useState(false);
|
||||
const isLoadingMoreRef = useRef(false);
|
||||
const consecutiveErrorsRef = useRef(0);
|
||||
// Epoch counter to discard stale loadMore responses after a reset
|
||||
const epochRef = useRef(0);
|
||||
|
||||
// Track the sessionId and initial cursor to reset state on change
|
||||
const prevSessionIdRef = useRef(sessionId);
|
||||
const prevInitialOldestRef = useRef(initialOldestSequence);
|
||||
|
||||
// Sync initial values from parent when they change
|
||||
useEffect(() => {
|
||||
if (prevSessionIdRef.current !== sessionId) {
|
||||
// Session changed — full reset
|
||||
prevSessionIdRef.current = sessionId;
|
||||
prevInitialOldestRef.current = initialOldestSequence;
|
||||
setOlderRawMessages([]);
|
||||
setOldestSequence(initialOldestSequence);
|
||||
setHasMore(initialHasMore);
|
||||
setIsLoadingMore(false);
|
||||
isLoadingMoreRef.current = false;
|
||||
consecutiveErrorsRef.current = 0;
|
||||
epochRef.current += 1;
|
||||
} else if (
|
||||
prevInitialOldestRef.current !== initialOldestSequence &&
|
||||
olderRawMessages.length > 0
|
||||
) {
|
||||
// Same session but initial window shifted (e.g. new messages arrived) —
|
||||
// clear paged state to avoid gaps/duplicates
|
||||
prevInitialOldestRef.current = initialOldestSequence;
|
||||
setOlderRawMessages([]);
|
||||
setOldestSequence(initialOldestSequence);
|
||||
setHasMore(initialHasMore);
|
||||
setIsLoadingMore(false);
|
||||
isLoadingMoreRef.current = false;
|
||||
consecutiveErrorsRef.current = 0;
|
||||
epochRef.current += 1;
|
||||
} else {
|
||||
// Update from parent when initial data changes (e.g. refetch)
|
||||
prevInitialOldestRef.current = initialOldestSequence;
|
||||
setOldestSequence(initialOldestSequence);
|
||||
setHasMore(initialHasMore);
|
||||
}
|
||||
}, [sessionId, initialOldestSequence, initialHasMore]);
|
||||
|
||||
// Convert all accumulated raw messages in one pass so tool outputs
|
||||
// are matched across inter-page boundaries. Initial page tool outputs
|
||||
// are included via extraToolOutputs to handle the boundary between
|
||||
// the last older page and the initial/streaming page.
|
||||
const olderMessages: UIMessage<unknown, UIDataTypes, UITools>[] =
|
||||
useMemo(() => {
|
||||
if (!sessionId || olderRawMessages.length === 0) return [];
|
||||
const extraToolOutputs =
|
||||
initialPageRawMessages.length > 0
|
||||
? extractToolOutputsFromRaw(initialPageRawMessages)
|
||||
: undefined;
|
||||
return convertChatSessionMessagesToUiMessages(
|
||||
sessionId,
|
||||
olderRawMessages,
|
||||
{ isComplete: true, extraToolOutputs },
|
||||
).messages;
|
||||
}, [sessionId, olderRawMessages, initialPageRawMessages]);
|
||||
|
||||
async function loadMore() {
|
||||
if (
|
||||
!sessionId ||
|
||||
!hasMore ||
|
||||
isLoadingMoreRef.current ||
|
||||
oldestSequence === null
|
||||
)
|
||||
return;
|
||||
|
||||
const requestEpoch = epochRef.current;
|
||||
isLoadingMoreRef.current = true;
|
||||
setIsLoadingMore(true);
|
||||
try {
|
||||
const response = await getV2GetSession(sessionId, {
|
||||
limit: 50,
|
||||
before_sequence: oldestSequence,
|
||||
});
|
||||
|
||||
// Discard response if session/pagination was reset while awaiting
|
||||
if (epochRef.current !== requestEpoch) return;
|
||||
|
||||
if (response.status !== 200) {
|
||||
consecutiveErrorsRef.current += 1;
|
||||
console.warn(
|
||||
`[loadMore] Failed to load messages (status=${response.status}, attempt=${consecutiveErrorsRef.current})`,
|
||||
);
|
||||
if (consecutiveErrorsRef.current >= MAX_CONSECUTIVE_ERRORS) {
|
||||
setHasMore(false);
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
consecutiveErrorsRef.current = 0;
|
||||
|
||||
const newRaw = (response.data.messages ?? []) as unknown[];
|
||||
setOlderRawMessages((prev) => {
|
||||
const merged = [...newRaw, ...prev];
|
||||
if (merged.length > MAX_OLDER_MESSAGES) {
|
||||
return merged.slice(merged.length - MAX_OLDER_MESSAGES);
|
||||
}
|
||||
return merged;
|
||||
});
|
||||
setOldestSequence(response.data.oldest_sequence ?? null);
|
||||
if (newRaw.length + olderRawMessages.length >= MAX_OLDER_MESSAGES) {
|
||||
setHasMore(false);
|
||||
} else {
|
||||
setHasMore(!!response.data.has_more_messages);
|
||||
}
|
||||
} catch (error) {
|
||||
if (epochRef.current !== requestEpoch) return;
|
||||
consecutiveErrorsRef.current += 1;
|
||||
console.warn("[loadMore] Network error:", error);
|
||||
if (consecutiveErrorsRef.current >= MAX_CONSECUTIVE_ERRORS) {
|
||||
setHasMore(false);
|
||||
}
|
||||
} finally {
|
||||
if (epochRef.current === requestEpoch) {
|
||||
isLoadingMoreRef.current = false;
|
||||
setIsLoadingMore(false);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return { olderMessages, hasMore, isLoadingMore, loadMore };
|
||||
}
|
||||
@@ -2,6 +2,7 @@ import { GraphExecution } from "@/app/api/__generated__/models/graphExecution";
|
||||
import { LibraryAgent } from "@/app/api/__generated__/models/libraryAgent";
|
||||
import { Button } from "@/components/atoms/Button/Button";
|
||||
import { LoadingSpinner } from "@/components/atoms/LoadingSpinner/LoadingSpinner";
|
||||
import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag";
|
||||
import {
|
||||
ArrowBendLeftUpIcon,
|
||||
ArrowBendRightDownIcon,
|
||||
@@ -46,6 +47,7 @@ export function SelectedRunActions({
|
||||
onSelectRun: onSelectRun,
|
||||
});
|
||||
|
||||
const shareExecutionResultsEnabled = useGetFlag(Flag.SHARE_EXECUTION_RESULTS);
|
||||
const isRunning = run?.status === "RUNNING";
|
||||
|
||||
if (!run || !agent) return null;
|
||||
@@ -102,12 +104,14 @@ export function SelectedRunActions({
|
||||
<EyeIcon weight="bold" size={18} className="text-zinc-700" />
|
||||
</Button>
|
||||
) : null}
|
||||
<ShareRunButton
|
||||
graphId={agent.graph_id}
|
||||
executionId={run.id}
|
||||
isShared={run.is_shared}
|
||||
shareToken={run.share_token}
|
||||
/>
|
||||
{shareExecutionResultsEnabled && (
|
||||
<ShareRunButton
|
||||
graphId={agent.graph_id}
|
||||
executionId={run.id}
|
||||
isShared={run.is_shared}
|
||||
shareToken={run.share_token}
|
||||
/>
|
||||
)}
|
||||
{canRunManually && (
|
||||
<>
|
||||
<Button
|
||||
|
||||
@@ -1134,7 +1134,7 @@
|
||||
"get": {
|
||||
"tags": ["v2", "chat", "chat"],
|
||||
"summary": "Get Session",
|
||||
"description": "Retrieve the details of a specific chat session.\n\nSupports cursor-based pagination via ``limit`` and ``before_sequence``.\nWhen no pagination params are provided, returns the most recent messages.\n\nArgs:\n session_id: The unique identifier for the desired chat session.\n user_id: The authenticated user's ID.\n limit: Maximum number of messages to return (1-200, default 50).\n before_sequence: Return messages with sequence < this value (cursor).\n\nReturns:\n SessionDetailResponse: Details for the requested session, including\n active_stream info and pagination metadata.",
|
||||
"description": "Retrieve the details of a specific chat session.\n\nLooks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.\nIf there's an active stream for this session, returns active_stream info for reconnection.\n\nArgs:\n session_id: The unique identifier for the desired chat session.\n user_id: The optional authenticated user ID, or None for anonymous access.\n\nReturns:\n SessionDetailResponse: Details for the requested session, including active_stream info if applicable.",
|
||||
"operationId": "getV2GetSession",
|
||||
"security": [{ "HTTPBearerJWT": [] }],
|
||||
"parameters": [
|
||||
@@ -1143,30 +1143,6 @@
|
||||
"in": "path",
|
||||
"required": true,
|
||||
"schema": { "type": "string", "title": "Session Id" }
|
||||
},
|
||||
{
|
||||
"name": "limit",
|
||||
"in": "query",
|
||||
"required": false,
|
||||
"schema": {
|
||||
"type": "integer",
|
||||
"maximum": 200,
|
||||
"minimum": 1,
|
||||
"default": 50,
|
||||
"title": "Limit"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "before_sequence",
|
||||
"in": "query",
|
||||
"required": false,
|
||||
"schema": {
|
||||
"anyOf": [
|
||||
{ "type": "integer", "minimum": 0 },
|
||||
{ "type": "null" }
|
||||
],
|
||||
"title": "Before Sequence"
|
||||
}
|
||||
}
|
||||
],
|
||||
"responses": {
|
||||
@@ -12468,15 +12444,6 @@
|
||||
{ "type": "null" }
|
||||
]
|
||||
},
|
||||
"has_more_messages": {
|
||||
"type": "boolean",
|
||||
"title": "Has More Messages",
|
||||
"default": false
|
||||
},
|
||||
"oldest_sequence": {
|
||||
"anyOf": [{ "type": "integer" }, { "type": "null" }],
|
||||
"title": "Oldest Sequence"
|
||||
},
|
||||
"total_prompt_tokens": {
|
||||
"type": "integer",
|
||||
"title": "Total Prompt Tokens",
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
"use client";
|
||||
|
||||
import { Button } from "@/components/ui/button";
|
||||
import { scrollbarStyles } from "@/components/styles/scrollbars";
|
||||
import { cn } from "@/lib/utils";
|
||||
import { ArrowDownIcon } from "lucide-react";
|
||||
import type { ComponentProps } from "react";
|
||||
@@ -11,8 +12,12 @@ export type ConversationProps = ComponentProps<typeof StickToBottom>;
|
||||
|
||||
export const Conversation = ({ className, ...props }: ConversationProps) => (
|
||||
<StickToBottom
|
||||
className={cn("relative flex-1 overflow-y-hidden", className)}
|
||||
initial="instant"
|
||||
className={cn(
|
||||
"relative flex-1 overflow-y-hidden",
|
||||
scrollbarStyles,
|
||||
className,
|
||||
)}
|
||||
initial="smooth"
|
||||
resize="smooth"
|
||||
role="log"
|
||||
{...props}
|
||||
@@ -25,15 +30,10 @@ export type ConversationContentProps = ComponentProps<
|
||||
|
||||
export const ConversationContent = ({
|
||||
className,
|
||||
scrollClassName,
|
||||
...props
|
||||
}: ConversationContentProps) => (
|
||||
<StickToBottom.Content
|
||||
className={cn("flex flex-col gap-8 p-4", className)}
|
||||
scrollClassName={cn(
|
||||
"scrollbar-thin scrollbar-track-transparent scrollbar-thumb-zinc-300",
|
||||
scrollClassName,
|
||||
)}
|
||||
{...props}
|
||||
/>
|
||||
);
|
||||
|
||||
@@ -6,6 +6,11 @@ import { useFlags } from "launchdarkly-react-client-sdk";
|
||||
|
||||
export enum Flag {
|
||||
BETA_BLOCKS = "beta-blocks",
|
||||
NEW_BLOCK_MENU = "new-block-menu",
|
||||
GRAPH_SEARCH = "graph-search",
|
||||
ENABLE_ENHANCED_OUTPUT_HANDLING = "enable-enhanced-output-handling",
|
||||
SHARE_EXECUTION_RESULTS = "share-execution-results",
|
||||
AGENT_FAVORITING = "agent-favoriting",
|
||||
MARKETPLACE_SEARCH_TERMS = "marketplace-search-terms",
|
||||
ENABLE_PLATFORM_PAYMENT = "enable-platform-payment",
|
||||
ARTIFACTS = "artifacts",
|
||||
@@ -16,6 +21,11 @@ const isPwMockEnabled = process.env.NEXT_PUBLIC_PW_TEST === "true";
|
||||
|
||||
const defaultFlags = {
|
||||
[Flag.BETA_BLOCKS]: [],
|
||||
[Flag.NEW_BLOCK_MENU]: false,
|
||||
[Flag.GRAPH_SEARCH]: false,
|
||||
[Flag.ENABLE_ENHANCED_OUTPUT_HANDLING]: false,
|
||||
[Flag.SHARE_EXECUTION_RESULTS]: false,
|
||||
[Flag.AGENT_FAVORITING]: false,
|
||||
[Flag.MARKETPLACE_SEARCH_TERMS]: DEFAULT_SEARCH_TERMS,
|
||||
[Flag.ENABLE_PLATFORM_PAYMENT]: false,
|
||||
[Flag.ARTIFACTS]: false,
|
||||
|
||||
BIN
test-screenshots/PR-12636/01-after-login.png
Normal file
|
After Width: | Height: | Size: 67 KiB |
BIN
test-screenshots/PR-12636/02-copilot-page.png
Normal file
|
After Width: | Height: | Size: 67 KiB |
BIN
test-screenshots/PR-12636/03-copilot-response.png
Normal file
|
After Width: | Height: | Size: 59 KiB |
BIN
test-screenshots/PR-12636/03-onboarding.png
Normal file
|
After Width: | Height: | Size: 36 KiB |
BIN
test-screenshots/PR-12636/03-post-onboarding.png
Normal file
|
After Width: | Height: | Size: 37 KiB |
BIN
test-screenshots/PR-12636/04-chat-interface.png
Normal file
|
After Width: | Height: | Size: 37 KiB |
BIN
test-screenshots/PR-12636/04-copilot-direct.png
Normal file
|
After Width: | Height: | Size: 32 KiB |
BIN
test-screenshots/PR-12636/04-multiturn-response.png
Normal file
|
After Width: | Height: | Size: 61 KiB |
BIN
test-screenshots/PR-12636/05-launch-autopilot.png
Normal file
|
After Width: | Height: | Size: 36 KiB |
BIN
test-screenshots/PR-12636/06-after-onboarding.png
Normal file
|
After Width: | Height: | Size: 37 KiB |
BIN
test-screenshots/PR-12636/07-copilot-forced.png
Normal file
|
After Width: | Height: | Size: 32 KiB |
BIN
test-screenshots/PR-12636/08-copilot-loaded.png
Normal file
|
After Width: | Height: | Size: 85 KiB |
BIN
test-screenshots/PR-12636/09-message-sent.png
Normal file
|
After Width: | Height: | Size: 67 KiB |
BIN
test-screenshots/PR-12636/10-copilot-response.png
Normal file
|
After Width: | Height: | Size: 70 KiB |
BIN
test-screenshots/PR-12636/11-second-message.png
Normal file
|
After Width: | Height: | Size: 70 KiB |
BIN
test-screenshots/PR-12636/12-multi-turn-complete.png
Normal file
|
After Width: | Height: | Size: 70 KiB |