mirror of
https://github.com/openclaw/openclaw.git
synced 2026-02-19 18:39:20 -05:00
feat: add configurable tool loop detection
@@ -1417,6 +1417,39 @@ Controls elevated (host) exec access:
 }
 ```

+### `tools.loopDetection`
+
+Tool-loop safety checks are **disabled by default**. Set `enabled: true` to activate detection.
+Settings can be defined globally in `tools.loopDetection` and overridden per-agent at `agents.list[].tools.loopDetection`.
+
+```json5
+{
+  tools: {
+    loopDetection: {
+      enabled: true,
+      historySize: 30,
+      warningThreshold: 10,
+      criticalThreshold: 20,
+      globalCircuitBreakerThreshold: 30,
+      detectors: {
+        genericRepeat: true,
+        knownPollNoProgress: true,
+        pingPong: true,
+      },
+    },
+  },
+}
+```
+
+- `historySize`: max tool-call history retained for loop analysis.
+- `warningThreshold`: repeating no-progress pattern threshold for warnings.
+- `criticalThreshold`: higher repeating threshold for blocking critical loops.
+- `globalCircuitBreakerThreshold`: hard stop threshold for any no-progress run.
+- `detectors.genericRepeat`: warn on repeated same-tool/same-args calls.
+- `detectors.knownPollNoProgress`: warn/block on known poll tools (`process.poll`, `command_status`, etc.).
+- `detectors.pingPong`: warn/block on alternating no-progress pair patterns.
+- If `warningThreshold >= criticalThreshold` or `criticalThreshold >= globalCircuitBreakerThreshold`, validation fails.
+
 ### `tools.web`

 ```json5
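The ordering rule in the last bullet can be illustrated with a small check. This is a minimal sketch, not OpenClaw's config validator: `validateLoopThresholds` and `DOCUMENTED_DEFAULTS` are hypothetical names, and the sketch assumes unset fields take the documented defaults (10, 20, 30) before comparison, which the real schema may not do.

```typescript
// Hypothetical validator mirroring the documented ordering rule; not OpenClaw's schema code.
type LoopThresholds = {
  warningThreshold?: number;
  criticalThreshold?: number;
  globalCircuitBreakerThreshold?: number;
};

// Assumption: unset fields are filled with the documented defaults before comparing.
const DOCUMENTED_DEFAULTS = {
  warningThreshold: 10,
  criticalThreshold: 20,
  globalCircuitBreakerThreshold: 30,
} as const;

function validateLoopThresholds(cfg: LoopThresholds): string[] {
  const warning = cfg.warningThreshold ?? DOCUMENTED_DEFAULTS.warningThreshold;
  const critical = cfg.criticalThreshold ?? DOCUMENTED_DEFAULTS.criticalThreshold;
  const breaker =
    cfg.globalCircuitBreakerThreshold ?? DOCUMENTED_DEFAULTS.globalCircuitBreakerThreshold;
  const errors: string[] = [];
  // Each level must be strictly below the next one.
  if (warning >= critical) {
    errors.push(`warningThreshold (${warning}) must be below criticalThreshold (${critical})`);
  }
  if (critical >= breaker) {
    errors.push(
      `criticalThreshold (${critical}) must be below globalCircuitBreakerThreshold (${breaker})`,
    );
  }
  return errors;
}

console.log(validateLoopThresholds({ warningThreshold: 10, criticalThreshold: 20 })); // []
console.log(validateLoopThresholds({ warningThreshold: 25 })); // one error: 25 is not below 20
```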
@@ -224,6 +224,35 @@ Notes:
 - `log` supports line-based `offset`/`limit` (omit `offset` to grab the last N lines).
 - `process` is scoped per agent; sessions from other agents are not visible.

+### `loop-detection` (tool-call loop guardrails)
+
+OpenClaw tracks recent tool-call history and blocks or warns when it detects repetitive no-progress loops.
+Enable with `tools.loopDetection.enabled: true` (default is `false`).
+
+```json5
+{
+  tools: {
+    loopDetection: {
+      enabled: true,
+      warningThreshold: 10,
+      criticalThreshold: 20,
+      globalCircuitBreakerThreshold: 30,
+      historySize: 30,
+      detectors: {
+        genericRepeat: true,
+        knownPollNoProgress: true,
+        pingPong: true,
+      },
+    },
+  },
+}
+```
+
+- `genericRepeat`: repeated same tool + same params call pattern.
+- `knownPollNoProgress`: repeating poll-like tools with identical outputs.
+- `pingPong`: alternating `A/B/A/B` no-progress patterns.
+- Per-agent override: `agents.list[].tools.loopDetection`.
+
 ### `web_search`

 Search the web using Brave Search API.
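The `pingPong` detector looks for alternating `A/B/A/B` call patterns. A minimal sketch of counting such a trailing alternation, assuming each call has already been reduced to a signature string such as `tool:argsHash`; `trailingAlternationCount` is a hypothetical helper, not OpenClaw's `getPingPongStreak`, and it ignores the no-progress (identical output) evidence the real detector also weighs.

```typescript
// Hypothetical sketch: length of the trailing run that alternates between two signatures.
function trailingAlternationCount(signatures: readonly string[]): number {
  const n = signatures.length;
  // An A/B pattern needs two adjacent, different signatures at the end.
  if (n < 2 || signatures[n - 1] === signatures[n - 2]) {
    return 0;
  }
  // Any two differing trailing entries form a baseline pair of 2.
  let count = 2;
  // Walk backwards while each entry equals the one two positions later (A ? A, B ? B).
  for (let i = n - 3; i >= 0; i -= 1) {
    if (signatures[i] !== signatures[i + 2]) {
      break;
    }
    count += 1;
  }
  return count;
}

const callSignatures = ["read:/a", "list:/w", "read:/a", "list:/w", "read:/a", "list:/w"];
console.log(trailingAlternationCount(callSignatures)); // 6
```

Three identical calls in a row return 0 here, because that pattern belongs to `genericRepeat` rather than `pingPong`.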
98
docs/tools/loop-detection.md
Normal file
@@ -0,0 +1,98 @@
+---
+title: "Tool-loop detection"
+description: "Configure optional guardrails for preventing repetitive or stalled tool-call loops"
+read_when:
+  - A user reports agents getting stuck repeating tool calls
+  - You need to tune repetitive-call protection
+  - You are editing agent tool/runtime policies
+---
+
+# Tool-loop detection
+
+OpenClaw can keep agents from getting stuck in repeated tool-call patterns.
+The guard is **disabled by default**.
+
+Enable it only where needed, because it can block legitimate repeated calls with strict settings.
+
+## Why this exists
+
+- Detect repetitive sequences that do not make progress.
+- Detect high-frequency no-result loops (same tool, same inputs, repeated errors).
+- Detect specific repeated-call patterns for known polling tools.
+
+## Configuration block
+
+Global defaults:
+
+```json5
+{
+  tools: {
+    loopDetection: {
+      enabled: false,
+      historySize: 30,
+      warningThreshold: 10,
+      criticalThreshold: 20,
+      globalCircuitBreakerThreshold: 30,
+      detectors: {
+        genericRepeat: true,
+        knownPollNoProgress: true,
+        pingPong: true,
+      },
+    },
+  },
+}
+```
+
+Per-agent override (optional):
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "safe-runner",
+        tools: {
+          loopDetection: {
+            enabled: true,
+            warningThreshold: 2,
+            criticalThreshold: 5,
+          },
+        },
+      },
+    ],
+  },
+}
+```
+
+### Field behavior
+
+- `enabled`: Master switch. `false` means no loop detection is performed.
+- `historySize`: number of recent tool calls kept for analysis.
+- `warningThreshold`: number of repeated no-progress calls before a warning is issued.
+- `criticalThreshold`: higher threshold at which critical loops are blocked.
+- `globalCircuitBreakerThreshold`: hard-stop threshold for any no-progress streak, applied even when individual detectors are disabled.
+- `detectors.genericRepeat`: detects repeated calls with the same tool and the same arguments.
+- `detectors.knownPollNoProgress`: detects known polling tools that keep returning identical output.
+- `detectors.pingPong`: detects alternating `A/B/A/B` call pairs that make no progress.
+
+## Recommended setup
+
+- Start with `enabled: true`, defaults unchanged.
+- If false positives occur:
+  - raise `warningThreshold` and/or `criticalThreshold`
+  - disable only the detector causing issues
+  - reduce `historySize` for less strict historical context
+
+## Logs and expected behavior
+
+When a loop is detected, OpenClaw reports a loop event and blocks or dampens the next tool-cycle depending on severity.
+This protects users from runaway token spend and lockups while preserving normal tool access.
+
+- Prefer warning and temporary suppression first.
+- Escalate only when repeated evidence accumulates.
+
+## Notes
+
+- `tools.loopDetection` is merged with agent-level overrides.
+- Per-agent values override the matching global fields; `detectors` flags are merged key by key.
+- If no config exists, guardrails stay off.
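The merge rule in the Notes can be sketched as follows. It follows the shape of `resolveToolLoopDetectionConfig` added elsewhere in this commit, but takes the global and agent blocks directly instead of resolving the agent by id; `LoopDetectionConfig` is a simplified stand-in for `ToolLoopDetectionConfig`, and `mergeLoopDetection` is a hypothetical name.

```typescript
// Simplified stand-in for ToolLoopDetectionConfig (assumption: every field optional).
type LoopDetectionDetectors = {
  genericRepeat?: boolean;
  knownPollNoProgress?: boolean;
  pingPong?: boolean;
};

type LoopDetectionConfig = {
  enabled?: boolean;
  historySize?: number;
  warningThreshold?: number;
  criticalThreshold?: number;
  globalCircuitBreakerThreshold?: number;
  detectors?: LoopDetectionDetectors;
};

// Hypothetical helper with the same merge shape as resolveToolLoopDetectionConfig.
function mergeLoopDetection(
  globalCfg: LoopDetectionConfig | undefined,
  agentCfg: LoopDetectionConfig | undefined,
): LoopDetectionConfig | undefined {
  if (!agentCfg) {
    return globalCfg; // no override: global block as-is (undefined means guardrails stay off)
  }
  if (!globalCfg) {
    return agentCfg;
  }
  return {
    ...globalCfg, // scalar fields: agent values win
    ...agentCfg,
    // detectors are merged key by key instead of being replaced wholesale
    detectors: { ...globalCfg.detectors, ...agentCfg.detectors },
  };
}

const merged = mergeLoopDetection(
  { enabled: true, warningThreshold: 10, detectors: { pingPong: false } },
  { criticalThreshold: 5, detectors: { genericRepeat: false } },
);
console.log(merged);
```

Here `merged` keeps `enabled` and `warningThreshold` from the global block, takes `criticalThreshold: 5` from the agent, and ends up with both `pingPong: false` and `genericRepeat: false`. A plain top-level spread would have replaced the global `detectors` object outright and dropped `pingPong: false`.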
@@ -49,7 +49,7 @@ import {
   resolveCompactionReserveTokensFloor,
 } from "../../pi-settings.js";
 import { toClientToolDefinitions } from "../../pi-tool-definition-adapter.js";
-import { createOpenClawCodingTools } from "../../pi-tools.js";
+import { createOpenClawCodingTools, resolveToolLoopDetectionConfig } from "../../pi-tools.js";
 import { resolveSandboxContext } from "../../sandbox.js";
 import { resolveSandboxRuntimeStatus } from "../../sandbox/runtime-status.js";
 import { repairSessionFileIfNeeded } from "../../session-file-repair.js";
@@ -544,6 +544,10 @@ export async function runEmbeddedAttempt(

     // Add client tools (OpenResponses hosted tools) to customTools
     let clientToolCallDetected: { name: string; params: Record<string, unknown> } | null = null;
+    const clientToolLoopDetection = resolveToolLoopDetectionConfig({
+      cfg: params.config,
+      agentId: sessionAgentId,
+    });
     const clientToolDefs = params.clientTools
       ? toClientToolDefinitions(
           params.clientTools,
@@ -553,6 +557,7 @@ export async function runEmbeddedAttempt(
           {
             agentId: sessionAgentId,
             sessionKey: params.sessionKey,
+            loopDetection: clientToolLoopDetection,
           },
         )
       : [];
@@ -5,6 +5,7 @@ import type {
 } from "@mariozechner/pi-agent-core";
 import type { ToolDefinition } from "@mariozechner/pi-coding-agent";
 import type { ClientToolDefinition } from "./pi-embedded-runner/run/params.js";
+import type { HookContext } from "./pi-tools.before-tool-call.js";
 import { logDebug, logError } from "../logger.js";
 import { getGlobalHookRunner } from "../plugins/hook-runner-global.js";
 import { isPlainObject } from "../utils.js";
@@ -190,7 +191,7 @@ export function toToolDefinitions(tools: AnyAgentTool[]): ToolDefinition[] {
 export function toClientToolDefinitions(
   tools: ClientToolDefinition[],
   onClientToolCall?: (toolName: string, params: Record<string, unknown>) => void,
-  hookContext?: { agentId?: string; sessionKey?: string },
+  hookContext?: HookContext,
 ): ToolDefinition[] {
   return tools.map((tool) => {
     const func = tool.function;
@@ -19,7 +19,17 @@ describe("before_tool_call loop detection behavior", () => {
     hasHooks: ReturnType<typeof vi.fn>;
     runBeforeToolCall: ReturnType<typeof vi.fn>;
   };
-  const defaultToolContext = { agentId: "main", sessionKey: "main" };
+  const enabledLoopDetectionContext = {
+    agentId: "main",
+    sessionKey: "main",
+    loopDetection: { enabled: true },
+  };
+
+  const disabledLoopDetectionContext = {
+    agentId: "main",
+    sessionKey: "main",
+    loopDetection: { enabled: false },
+  };

   beforeEach(() => {
     resetDiagnosticSessionStateForTest();
@@ -33,10 +43,14 @@ describe("before_tool_call loop detection behavior", () => {
     hookRunner.hasHooks.mockReturnValue(false);
   });

-  function createWrappedTool(name: string, execute: ReturnType<typeof vi.fn>) {
+  function createWrappedTool(
+    name: string,
+    execute: ReturnType<typeof vi.fn>,
+    loopDetectionContext = enabledLoopDetectionContext,
+  ) {
     return wrapToolWithBeforeToolCallHook(
       { name, execute } as unknown as AnyAgentTool,
-      defaultToolContext,
+      loopDetectionContext,
     );
   }

@@ -95,7 +109,6 @@ describe("before_tool_call loop detection behavior", () => {
         }
       }
     }

   it("blocks known poll loops when no progress repeats", async () => {
     const execute = vi.fn().mockResolvedValue({
       content: [{ type: "text", text: "(no new output)\n\nProcess still running." }],
@@ -113,6 +126,22 @@ describe("before_tool_call loop detection behavior", () => {
     ).rejects.toThrow("CRITICAL");
   });

+  it("does nothing when loopDetection.enabled is false", async () => {
+    const execute = vi.fn().mockResolvedValue({
+      content: [{ type: "text", text: "(no new output)\n\nProcess still running." }],
+      details: { status: "running", aggregated: "steady" },
+    });
+    // oxlint-disable-next-line typescript/no-explicit-any
+    const tool = wrapToolWithBeforeToolCallHook({ name: "process", execute } as any, {
+      ...disabledLoopDetectionContext,
+    });
+    const params = { action: "poll", sessionId: "sess-off" };
+
+    for (let i = 0; i < CRITICAL_THRESHOLD; i += 1) {
+      await expect(tool.execute(`poll-${i}`, params, undefined, undefined)).resolves.toBeDefined();
+    }
+  });
+
   it("does not block known poll loops when output progresses", async () => {
     const execute = vi.fn().mockImplementation(async (toolCallId: string) => {
       return {
@@ -1,3 +1,4 @@
+import type { ToolLoopDetectionConfig } from "../config/types.tools.js";
 import type { SessionState } from "../logging/diagnostic-session-state.js";
 import type { AnyAgentTool } from "./tools/common.js";
 import { createSubsystemLogger } from "../logging/subsystem.js";
@@ -5,9 +6,10 @@ import { getGlobalHookRunner } from "../plugins/hook-runner-global.js";
 import { isPlainObject } from "../utils.js";
 import { normalizeToolName } from "./tool-policy.js";

-type HookContext = {
+export type HookContext = {
   agentId?: string;
   sessionKey?: string;
+  loopDetection?: ToolLoopDetectionConfig;
 };

 type HookOutcome = { blocked: true; reason: string } | { blocked: false; params: unknown };
@@ -62,6 +64,7 @@ async function recordLoopOutcome(args: {
       toolCallId: args.toolCallId,
       result: args.result,
       error: args.error,
+      config: args.ctx.loopDetection,
     });
   } catch (err) {
     log.warn(`tool loop outcome tracking failed: tool=${args.toolName} error=${String(err)}`);
@@ -87,7 +90,7 @@ export async function runBeforeToolCallHook(args: {
       sessionId: args.ctx?.agentId,
     });

-    const loopResult = detectToolCallLoop(sessionState, toolName, params);
+    const loopResult = detectToolCallLoop(sessionState, toolName, params, args.ctx.loopDetection);

     if (loopResult.stuck) {
       if (loopResult.level === "critical") {
@@ -126,7 +129,7 @@ export async function runBeforeToolCallHook(args: {
       }
     }

-    recordToolCall(sessionState, toolName, params, args.toolCallId);
+    recordToolCall(sessionState, toolName, params, args.toolCallId, args.ctx.loopDetection);
   }

   const hookRunner = getGlobalHookRunner();
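The hunks above thread `loopDetection` through `HookContext` so that `runBeforeToolCallHook` can run detection before a tool executes and record the call afterwards. A minimal sketch of that wrapper pattern, assuming a single identical-call rule and a synchronous tool; `wrapWithLoopGuard`, `GuardHookContext` and `GuardLoopConfig` are hypothetical names, and `JSON.stringify` stands in for OpenClaw's `hashToolCall`.

```typescript
// Hypothetical types; OpenClaw's HookContext carries agentId, sessionKey and loopDetection.
type GuardLoopConfig = { enabled: boolean; criticalThreshold: number };
type GuardHookContext = { agentId?: string; sessionKey?: string; loopDetection?: GuardLoopConfig };

function wrapWithLoopGuard<A, R>(
  toolName: string,
  execute: (args: A) => R,
  ctx: GuardHookContext,
): (args: A) => R {
  // Per-wrapper history for illustration only; OpenClaw keeps history per diagnostic session.
  const history: string[] = [];
  return (args: A): R => {
    // Stand-in for hashToolCall; note that JSON.stringify is sensitive to key order.
    const key = `${toolName}:${JSON.stringify(args)}`;
    const cfg = ctx.loopDetection;
    // 1. Detect before running: count identical calls already in history.
    if (cfg?.enabled) {
      const repeats = history.filter((k) => k === key).length;
      if (repeats >= cfg.criticalThreshold) {
        throw new Error(`CRITICAL: ${toolName} repeated ${repeats} times with identical arguments`);
      }
    }
    // 2. Record the call that is allowed to proceed, then run it.
    history.push(key);
    return execute(args);
  };
}

const poll = wrapWithLoopGuard("process", (args: { action: string }) => `polled: ${args.action}`, {
  agentId: "main",
  sessionKey: "main",
  loopDetection: { enabled: true, criticalThreshold: 3 },
});
poll({ action: "poll" });
poll({ action: "poll" });
poll({ action: "poll" });
// A fourth identical poll({ action: "poll" }) would throw "CRITICAL: process repeated 3 times ...".
```

In this sketch a blocked call is not recorded, so the history only reflects calls that actually ran. With `enabled: false` the wrapper passes every call through, which is the behavior the new `does nothing when loopDetection.enabled is false` test checks for the real hook.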
@@ -6,6 +6,7 @@ import {
   readTool,
 } from "@mariozechner/pi-coding-agent";
 import type { OpenClawConfig } from "../config/config.js";
+import type { ToolLoopDetectionConfig } from "../config/types.tools.js";
 import type { ModelAuthMode } from "./model-auth.js";
 import type { AnyAgentTool } from "./pi-tools.types.js";
 import type { SandboxContext } from "./sandbox.js";
@@ -124,6 +125,33 @@ function resolveFsConfig(params: { cfg?: OpenClawConfig; agentId?: string }) {
   };
 }

+export function resolveToolLoopDetectionConfig(params: {
+  cfg?: OpenClawConfig;
+  agentId?: string;
+}): ToolLoopDetectionConfig | undefined {
+  const global = params.cfg?.tools?.loopDetection;
+  const agent =
+    params.agentId && params.cfg
+      ? resolveAgentConfig(params.cfg, params.agentId)?.tools?.loopDetection
+      : undefined;
+
+  if (!agent) {
+    return global;
+  }
+  if (!global) {
+    return agent;
+  }
+
+  return {
+    ...global,
+    ...agent,
+    detectors: {
+      ...global.detectors,
+      ...agent.detectors,
+    },
+  };
+}
+
 export const __testing = {
   cleanToolSchemaForGemini,
   normalizeToolParams,
@@ -451,6 +479,7 @@ export function createOpenClawCodingTools(options?: {
     wrapToolWithBeforeToolCallHook(tool, {
       agentId,
       sessionKey: options?.sessionKey,
+      loopDetection: resolveToolLoopDetectionConfig({ cfg: options?.config, agentId }),
     }),
   );
   const withAbort = options?.abortSignal
@@ -1,4 +1,5 @@
 import { describe, expect, it } from "vitest";
+import type { ToolLoopDetectionConfig } from "../config/types.tools.js";
 import type { SessionState } from "../logging/diagnostic-session-state.js";
 import {
   CRITICAL_THRESHOLD,
@@ -20,6 +21,13 @@ function createState(): SessionState {
   };
 }

+const enabledLoopDetectionConfig: ToolLoopDetectionConfig = { enabled: true };
+
+const shortHistoryLoopConfig: ToolLoopDetectionConfig = {
+  enabled: true,
+  historySize: 4,
+};
+
 function recordSuccessfulCall(
   state: SessionState,
   toolName: string,
@@ -111,9 +119,31 @@ describe("tool-loop-detection", () => {
       expect(timestamp).toBeGreaterThanOrEqual(before);
       expect(timestamp).toBeLessThanOrEqual(after);
     });
+
+    it("respects configured historySize", () => {
+      const state = createState();
+
+      for (let i = 0; i < 10; i += 1) {
+        recordToolCall(state, "tool", { iteration: i }, `call-${i}`, shortHistoryLoopConfig);
+      }
+
+      expect(state.toolCallHistory).toHaveLength(4);
+      expect(state.toolCallHistory?.[0]?.argsHash).toBe(hashToolCall("tool", { iteration: 6 }));
+    });
   });

   describe("detectToolCallLoop", () => {
+    it("is disabled by default", () => {
+      const state = createState();
+
+      for (let i = 0; i < 20; i += 1) {
+        recordToolCall(state, "read", { path: "/same.txt" }, `default-${i}`);
+      }
+
+      const loopResult = detectToolCallLoop(state, "read", { path: "/same.txt" });
+      expect(loopResult.stuck).toBe(false);
+    });
+
     it("does not flag unique tool calls", () => {
       const state = createState();

@@ -121,7 +151,12 @@ describe("tool-loop-detection", () => {
         recordToolCall(state, "read", { path: `/file${i}.txt` }, `call-${i}`);
       }

-      const result = detectToolCallLoop(state, "read", { path: "/new-file.txt" });
+      const result = detectToolCallLoop(
+        state,
+        "read",
+        { path: "/new-file.txt" },
+        enabledLoopDetectionConfig,
+      );
       expect(result.stuck).toBe(false);
     });

@@ -131,7 +166,12 @@ describe("tool-loop-detection", () => {
         recordToolCall(state, "read", { path: "/same.txt" }, `warn-${i}`);
       }

-      const result = detectToolCallLoop(state, "read", { path: "/same.txt" });
+      const result = detectToolCallLoop(
+        state,
+        "read",
+        { path: "/same.txt" },
+        enabledLoopDetectionConfig,
+      );

       expect(result.stuck).toBe(true);
       if (result.stuck) {
@@ -155,13 +195,74 @@ describe("tool-loop-detection", () => {
         recordSuccessfulCall(state, "read", params, result, i);
       }

-      const loopResult = detectToolCallLoop(state, "read", params);
+      const loopResult = detectToolCallLoop(state, "read", params, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("warning");
       }
     });

+    it("applies custom thresholds when detection is enabled", () => {
+      const state = createState();
+      const params = { action: "poll", sessionId: "sess-custom" };
+      const result = {
+        content: [{ type: "text", text: "(no new output)\n\nProcess still running." }],
+        details: { status: "running", aggregated: "steady" },
+      };
+      const config: ToolLoopDetectionConfig = {
+        enabled: true,
+        warningThreshold: 2,
+        criticalThreshold: 4,
+        detectors: {
+          genericRepeat: false,
+          knownPollNoProgress: true,
+          pingPong: false,
+        },
+      };
+
+      for (let i = 0; i < 2; i += 1) {
+        recordSuccessfulCall(state, "process", params, result, i);
+      }
+      const warningResult = detectToolCallLoop(state, "process", params, config);
+      expect(warningResult.stuck).toBe(true);
+      if (warningResult.stuck) {
+        expect(warningResult.level).toBe("warning");
+      }
+
+      recordSuccessfulCall(state, "process", params, result, 2);
+      recordSuccessfulCall(state, "process", params, result, 3);
+      const criticalResult = detectToolCallLoop(state, "process", params, config);
+      expect(criticalResult.stuck).toBe(true);
+      if (criticalResult.stuck) {
+        expect(criticalResult.level).toBe("critical");
+      }
+      expect(criticalResult.detector).toBe("known_poll_no_progress");
+    });
+
+    it("can disable specific detectors", () => {
+      const state = createState();
+      const params = { action: "poll", sessionId: "sess-no-detectors" };
+      const result = {
+        content: [{ type: "text", text: "(no new output)\n\nProcess still running." }],
+        details: { status: "running", aggregated: "steady" },
+      };
+      const config: ToolLoopDetectionConfig = {
+        enabled: true,
+        detectors: {
+          genericRepeat: false,
+          knownPollNoProgress: false,
+          pingPong: false,
+        },
+      };
+
+      for (let i = 0; i < CRITICAL_THRESHOLD; i += 1) {
+        recordSuccessfulCall(state, "process", params, result, i);
+      }
+
+      const loopResult = detectToolCallLoop(state, "process", params, config);
+      expect(loopResult.stuck).toBe(false);
+    });
+
     it("warns for known polling no-progress loops", () => {
       const state = createState();
       const params = { action: "poll", sessionId: "sess-1" };
@@ -174,7 +275,7 @@ describe("tool-loop-detection", () => {
         recordSuccessfulCall(state, "process", params, result, i);
       }

-      const loopResult = detectToolCallLoop(state, "process", params);
+      const loopResult = detectToolCallLoop(state, "process", params, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("warning");
@@ -195,7 +296,7 @@ describe("tool-loop-detection", () => {
         recordSuccessfulCall(state, "process", params, result, i);
       }

-      const loopResult = detectToolCallLoop(state, "process", params);
+      const loopResult = detectToolCallLoop(state, "process", params, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("critical");
@@ -216,7 +317,7 @@ describe("tool-loop-detection", () => {
         recordSuccessfulCall(state, "process", params, result, i);
       }

-      const loopResult = detectToolCallLoop(state, "process", params);
+      const loopResult = detectToolCallLoop(state, "process", params, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(false);
     });

@@ -232,7 +333,7 @@ describe("tool-loop-detection", () => {
         recordSuccessfulCall(state, "read", params, result, i);
       }

-      const loopResult = detectToolCallLoop(state, "read", params);
+      const loopResult = detectToolCallLoop(state, "read", params, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("critical");
@@ -254,7 +355,7 @@ describe("tool-loop-detection", () => {
         }
       }

-      const loopResult = detectToolCallLoop(state, "list", listParams);
+      const loopResult = detectToolCallLoop(state, "list", listParams, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("warning");
@@ -289,7 +390,7 @@ describe("tool-loop-detection", () => {
         }
       }

-      const loopResult = detectToolCallLoop(state, "list", listParams);
+      const loopResult = detectToolCallLoop(state, "list", listParams, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("critical");
@@ -325,7 +426,7 @@ describe("tool-loop-detection", () => {
         }
       }

-      const loopResult = detectToolCallLoop(state, "list", listParams);
+      const loopResult = detectToolCallLoop(state, "list", listParams, enabledLoopDetectionConfig);
       expect(loopResult.stuck).toBe(true);
       if (loopResult.stuck) {
         expect(loopResult.level).toBe("warning");
@@ -341,7 +442,12 @@ describe("tool-loop-detection", () => {
       recordToolCall(state, "read", { path: "/a.txt" }, "a2");
       recordToolCall(state, "write", { path: "/tmp/out.txt" }, "c1"); // breaks alternation

-      const loopResult = detectToolCallLoop(state, "list", { dir: "/workspace" });
+      const loopResult = detectToolCallLoop(
+        state,
+        "list",
+        { dir: "/workspace" },
+        enabledLoopDetectionConfig,
+      );
       expect(loopResult.stuck).toBe(false);
     });

@@ -368,7 +474,7 @@ describe("tool-loop-detection", () => {
     it("handles empty history", () => {
       const state = createState();

-      const result = detectToolCallLoop(state, "tool", { arg: 1 });
+      const result = detectToolCallLoop(state, "tool", { arg: 1 }, enabledLoopDetectionConfig);
       expect(result.stuck).toBe(false);
     });
   });
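The `respects configured historySize` test records ten calls with `historySize: 4` and expects the oldest surviving entry to be iteration 6. A minimal sketch of that bounded window over a plain array; `pushBounded` is a hypothetical helper, and it uses the `splice` trim from `recordToolCallOutcome` rather than the single `shift()` used in `recordToolCall`.

```typescript
// Hypothetical helper: append an entry, then keep only the newest `historySize` entries.
function pushBounded<T>(history: T[], entry: T, historySize: number): void {
  history.push(entry);
  if (history.length > historySize) {
    // Drop everything older than the window in one call.
    history.splice(0, history.length - historySize);
  }
}

const recent: number[] = [];
for (let i = 0; i < 10; i += 1) {
  pushBounded(recent, i, 4);
}
console.log(recent); // [ 6, 7, 8, 9 ]
```

`splice` also handles a window that was shrunk between calls, whereas a single `shift()` removes only one entry per push.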
@@ -1,4 +1,5 @@
 import { createHash } from "node:crypto";
+import type { ToolLoopDetectionConfig } from "../config/types.tools.js";
 import type { SessionState } from "../logging/diagnostic-session-state.js";
 import { createSubsystemLogger } from "../logging/subsystem.js";
 import { isPlainObject } from "../utils.js";
@@ -27,6 +28,76 @@ export const TOOL_CALL_HISTORY_SIZE = 30;
 export const WARNING_THRESHOLD = 10;
 export const CRITICAL_THRESHOLD = 20;
 export const GLOBAL_CIRCUIT_BREAKER_THRESHOLD = 30;
+const DEFAULT_LOOP_DETECTION_CONFIG = {
+  enabled: false,
+  historySize: TOOL_CALL_HISTORY_SIZE,
+  warningThreshold: WARNING_THRESHOLD,
+  criticalThreshold: CRITICAL_THRESHOLD,
+  globalCircuitBreakerThreshold: GLOBAL_CIRCUIT_BREAKER_THRESHOLD,
+  detectors: {
+    genericRepeat: true,
+    knownPollNoProgress: true,
+    pingPong: true,
+  },
+};
+
+type ResolvedLoopDetectionConfig = {
+  enabled: boolean;
+  historySize: number;
+  warningThreshold: number;
+  criticalThreshold: number;
+  globalCircuitBreakerThreshold: number;
+  detectors: {
+    genericRepeat: boolean;
+    knownPollNoProgress: boolean;
+    pingPong: boolean;
+  };
+};
+
+function asPositiveInt(value: number | undefined, fallback: number): number {
+  if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
+    return fallback;
+  }
+  return value;
+}
+
+function resolveLoopDetectionConfig(config?: ToolLoopDetectionConfig): ResolvedLoopDetectionConfig {
+  let warningThreshold = asPositiveInt(
+    config?.warningThreshold,
+    DEFAULT_LOOP_DETECTION_CONFIG.warningThreshold,
+  );
+  let criticalThreshold = asPositiveInt(
+    config?.criticalThreshold,
+    DEFAULT_LOOP_DETECTION_CONFIG.criticalThreshold,
+  );
+  let globalCircuitBreakerThreshold = asPositiveInt(
+    config?.globalCircuitBreakerThreshold,
+    DEFAULT_LOOP_DETECTION_CONFIG.globalCircuitBreakerThreshold,
+  );
+
+  if (criticalThreshold <= warningThreshold) {
+    criticalThreshold = warningThreshold + 1;
+  }
+  if (globalCircuitBreakerThreshold <= criticalThreshold) {
+    globalCircuitBreakerThreshold = criticalThreshold + 1;
+  }
+
+  return {
+    enabled: config?.enabled ?? DEFAULT_LOOP_DETECTION_CONFIG.enabled,
+    historySize: asPositiveInt(config?.historySize, DEFAULT_LOOP_DETECTION_CONFIG.historySize),
+    warningThreshold,
+    criticalThreshold,
+    globalCircuitBreakerThreshold,
+    detectors: {
+      genericRepeat:
+        config?.detectors?.genericRepeat ?? DEFAULT_LOOP_DETECTION_CONFIG.detectors.genericRepeat,
+      knownPollNoProgress:
+        config?.detectors?.knownPollNoProgress ??
+        DEFAULT_LOOP_DETECTION_CONFIG.detectors.knownPollNoProgress,
+      pingPong: config?.detectors?.pingPong ?? DEFAULT_LOOP_DETECTION_CONFIG.detectors.pingPong,
+    },
+  };
+}

 /**
  * Hash a tool call for pattern matching.
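`resolveLoopDetectionConfig` falls back to the defaults for missing or non-positive-integer values and then forces `warningThreshold < criticalThreshold < globalCircuitBreakerThreshold` by bumping the higher thresholds. A standalone sketch of just that normalization, with the defaults 10, 20 and 30 inlined; `positiveIntOr` and `normalizeThresholds` are hypothetical names.

```typescript
type ThresholdInput = {
  warningThreshold?: number;
  criticalThreshold?: number;
  globalCircuitBreakerThreshold?: number;
};

// Hypothetical helper: accept only positive integers, otherwise use the fallback.
function positiveIntOr(value: number | undefined, fallback: number): number {
  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : fallback;
}

function normalizeThresholds(input: ThresholdInput) {
  const warning = positiveIntOr(input.warningThreshold, 10);
  let critical = positiveIntOr(input.criticalThreshold, 20);
  let breaker = positiveIntOr(input.globalCircuitBreakerThreshold, 30);
  // Keep the ladder strictly increasing: warning < critical < breaker.
  if (critical <= warning) {
    critical = warning + 1;
  }
  if (breaker <= critical) {
    breaker = critical + 1;
  }
  return { warning, critical, breaker };
}

console.log(normalizeThresholds({ warningThreshold: 25 }));
// { warning: 25, critical: 26, breaker: 30 }
console.log(normalizeThresholds({ warningThreshold: 0, criticalThreshold: 40 }));
// { warning: 10, critical: 40, breaker: 41 }
```

So a misordered config that reaches this function is repaired rather than rejected at detection time.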
@@ -302,7 +373,12 @@ export function detectToolCallLoop(
|
||||
state: SessionState,
|
||||
toolName: string,
|
||||
params: unknown,
|
||||
config?: ToolLoopDetectionConfig,
|
||||
): LoopDetectionResult {
|
||||
const resolvedConfig = resolveLoopDetectionConfig(config);
|
||||
if (!resolvedConfig.enabled) {
|
||||
return { stuck: false };
|
||||
}
|
||||
const history = state.toolCallHistory ?? [];
|
||||
const currentHash = hashToolCall(toolName, params);
|
||||
const noProgress = getNoProgressStreak(history, toolName, currentHash);
|
||||
@@ -310,7 +386,7 @@ export function detectToolCallLoop(
|
||||
const knownPollTool = isKnownPollToolCall(toolName, params);
|
||||
const pingPong = getPingPongStreak(history, currentHash);
|
||||
|
||||
if (noProgressStreak >= GLOBAL_CIRCUIT_BREAKER_THRESHOLD) {
|
||||
if (noProgressStreak >= resolvedConfig.globalCircuitBreakerThreshold) {
|
||||
log.error(
|
||||
`Global circuit breaker triggered: ${toolName} repeated ${noProgressStreak} times with no progress`,
|
||||
);
|
||||
@@ -324,7 +400,11 @@ export function detectToolCallLoop(
|
||||
};
|
||||
}
|
||||
|
||||
if (knownPollTool && noProgressStreak >= CRITICAL_THRESHOLD) {
|
||||
if (
|
||||
knownPollTool &&
|
||||
resolvedConfig.detectors.knownPollNoProgress &&
|
||||
noProgressStreak >= resolvedConfig.criticalThreshold
|
||||
) {
|
||||
log.error(`Critical polling loop detected: ${toolName} repeated ${noProgressStreak} times`);
|
||||
return {
|
||||
stuck: true,
|
||||
@@ -336,7 +416,11 @@ export function detectToolCallLoop(
|
||||
};
|
||||
}
|
||||
|
||||
if (knownPollTool && noProgressStreak >= WARNING_THRESHOLD) {
|
||||
if (
|
||||
knownPollTool &&
|
||||
resolvedConfig.detectors.knownPollNoProgress &&
|
||||
noProgressStreak >= resolvedConfig.warningThreshold
|
||||
) {
|
||||
log.warn(`Polling loop warning: ${toolName} repeated ${noProgressStreak} times`);
|
||||
return {
|
||||
stuck: true,
|
||||
@@ -352,7 +436,11 @@ export function detectToolCallLoop(
|
||||
? `pingpong:${canonicalPairKey(currentHash, pingPong.pairedSignature)}`
|
||||
: `pingpong:${toolName}:${currentHash}`;
|
||||
|
||||
if (pingPong.count >= CRITICAL_THRESHOLD && pingPong.noProgressEvidence) {
|
||||
if (
|
||||
resolvedConfig.detectors.pingPong &&
|
||||
pingPong.count >= resolvedConfig.criticalThreshold &&
|
||||
pingPong.noProgressEvidence
|
||||
) {
|
||||
log.error(
|
||||
`Critical ping-pong loop detected: alternating calls count=${pingPong.count} currentTool=${toolName}`,
|
||||
);
|
||||
@@ -367,7 +455,7 @@ export function detectToolCallLoop(
|
||||
};
|
||||
}
|
||||
|
||||
if (pingPong.count >= WARNING_THRESHOLD) {
|
||||
if (resolvedConfig.detectors.pingPong && pingPong.count >= resolvedConfig.warningThreshold) {
|
||||
log.warn(
|
||||
`Ping-pong loop warning: alternating calls count=${pingPong.count} currentTool=${toolName}`,
|
||||
);
|
||||
@@ -387,7 +475,11 @@ export function detectToolCallLoop(
|
||||
(h) => h.toolName === toolName && h.argsHash === currentHash,
|
||||
).length;
|
||||
|
||||
if (!knownPollTool && recentCount >= WARNING_THRESHOLD) {
|
||||
if (
|
||||
!knownPollTool &&
|
||||
resolvedConfig.detectors.genericRepeat &&
|
||||
recentCount >= resolvedConfig.warningThreshold
|
||||
) {
|
||||
log.warn(`Loop warning: ${toolName} called ${recentCount} times with identical arguments`);
|
||||
return {
|
||||
stuck: true,
|
||||
@@ -411,7 +503,9 @@ export function recordToolCall(
|
||||
toolName: string,
|
||||
params: unknown,
|
||||
toolCallId?: string,
|
||||
config?: ToolLoopDetectionConfig,
|
||||
): void {
|
||||
const resolvedConfig = resolveLoopDetectionConfig(config);
|
||||
if (!state.toolCallHistory) {
|
||||
state.toolCallHistory = [];
|
||||
}
|
||||
@@ -423,7 +517,7 @@ export function recordToolCall(
|
||||
timestamp: Date.now(),
|
||||
});
|
||||
|
||||
if (state.toolCallHistory.length > TOOL_CALL_HISTORY_SIZE) {
|
||||
if (state.toolCallHistory.length > resolvedConfig.historySize) {
|
||||
state.toolCallHistory.shift();
|
||||
}
|
||||
}
|
||||
@@ -439,8 +533,10 @@ export function recordToolCallOutcome(
|
||||
toolCallId?: string;
|
||||
result?: unknown;
|
||||
error?: unknown;
|
||||
config?: ToolLoopDetectionConfig;
|
||||
},
|
||||
): void {
|
||||
const resolvedConfig = resolveLoopDetectionConfig(params.config);
|
||||
const resultHash = hashToolOutcome(
|
||||
params.toolName,
|
||||
params.toolParams,
|
||||
@@ -486,8 +582,8 @@ export function recordToolCallOutcome(
});
}

if (state.toolCallHistory.length > TOOL_CALL_HISTORY_SIZE) {
state.toolCallHistory.splice(0, state.toolCallHistory.length - TOOL_CALL_HISTORY_SIZE);
if (state.toolCallHistory.length > resolvedConfig.historySize) {
state.toolCallHistory.splice(0, state.toolCallHistory.length - resolvedConfig.historySize);
}
}

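With `historySize` now supplied per call, the two trimming sites behave differently. `recordToolCall` removes at most one entry per call with `shift()`. `recordToolCallOutcome` uses `splice()` to cut straight back to the limit. The difference only shows up when the history is already more than one entry over the limit, for example if the configured `historySize` is lowered while a session's history is full. The small sketch below contrasts the two patterns. The helper names are hypothetical.

```typescript
// Pattern used in recordToolCall: drop at most one oldest entry per call.
function trimByShift<T>(history: T[], historySize: number): void {
  if (history.length > historySize) {
    history.shift();
  }
}

// Pattern used in recordToolCallOutcome: drop as many oldest entries as
// needed so that exactly historySize entries remain.
function trimBySplice<T>(history: T[], historySize: number): void {
  if (history.length > historySize) {
    history.splice(0, history.length - historySize);
  }
}
```

Starting from `[1, 2, 3, 4, 5]` with a limit of 3, one call to `trimByShift` leaves `[2, 3, 4, 5]`, which is still over the limit. One call to `trimBySplice` leaves `[3, 4, 5]`.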
@@ -66,6 +66,20 @@ export const FIELD_HELP: Record<string, string> = {
"Restrict apply_patch paths to the workspace directory (default: true). Set false to allow writing outside the workspace (dangerous).",
"tools.exec.applyPatch.allowModels":
'Optional allowlist of model ids (e.g. "gpt-5.2" or "openai/gpt-5.2").',
"tools.loopDetection.enabled":
"Enable repetitive tool-call loop detection and backoff safety checks (default: false).",
"tools.loopDetection.historySize": "Tool history window size for loop detection (default: 30).",
"tools.loopDetection.warningThreshold":
"Warning threshold for repetitive patterns when detector is enabled (default: 10).",
"tools.loopDetection.criticalThreshold":
"Critical threshold for repetitive patterns when detector is enabled (default: 20).",
"tools.loopDetection.globalCircuitBreakerThreshold":
"Global no-progress breaker threshold (default: 30).",
"tools.loopDetection.detectors.genericRepeat":
"Enable generic repeated same-tool/same-params loop detection (default: true).",
"tools.loopDetection.detectors.knownPollNoProgress":
"Enable known poll tool no-progress loop detection (default: true).",
"tools.loopDetection.detectors.pingPong": "Enable ping-pong loop detection (default: true).",
"tools.exec.notifyOnExit":
"When true (default), backgrounded exec sessions enqueue a system event and request a heartbeat on exit.",
"tools.exec.notifyOnExitEmptySuccess":

@@ -73,6 +73,14 @@ export const FIELD_LABELS: Record<string, string> = {
"tools.exec.applyPatch.enabled": "Enable apply_patch",
"tools.exec.applyPatch.workspaceOnly": "apply_patch Workspace-Only",
"tools.exec.applyPatch.allowModels": "apply_patch Model Allowlist",
"tools.loopDetection.enabled": "Tool-loop Detection",
"tools.loopDetection.historySize": "Tool-loop History Size",
"tools.loopDetection.warningThreshold": "Tool-loop Warning Threshold",
"tools.loopDetection.criticalThreshold": "Tool-loop Critical Threshold",
"tools.loopDetection.globalCircuitBreakerThreshold": "Tool-loop Global Circuit Breaker Threshold",
"tools.loopDetection.detectors.genericRepeat": "Tool-loop Generic Repeat Detection",
"tools.loopDetection.detectors.knownPollNoProgress": "Tool-loop Poll No-Progress Detection",
"tools.loopDetection.detectors.pingPong": "Tool-loop Ping-Pong Detection",
"tools.fs.workspaceOnly": "Workspace-only FS tools",
"tools.sessions.visibility": "Session Tools Visibility",
"tools.exec.notifyOnExit": "Exec Notify On Exit",

@@ -138,6 +138,30 @@ export type MediaToolsConfig = {

export type ToolProfileId = "minimal" | "coding" | "messaging" | "full";

export type ToolLoopDetectionDetectorConfig = {
/** Enable warning/blocking for repeated identical calls to the same tool/params. */
genericRepeat?: boolean;
/** Enable warning/blocking for known no-progress polling loops. */
knownPollNoProgress?: boolean;
/** Enable warning/blocking for no-progress ping-pong alternating patterns. */
pingPong?: boolean;
};

export type ToolLoopDetectionConfig = {
/** Enable tool-loop protection (default: false). */
enabled?: boolean;
/** Maximum tool call history entries retained for loop detection (default: 30). */
historySize?: number;
/** Warning threshold before a warning-only loop classification (default: 10). */
warningThreshold?: number;
/** Critical threshold for blocking repetitive loops (default: 20). */
criticalThreshold?: number;
/** Global no-progress breaker threshold (default: 30). */
globalCircuitBreakerThreshold?: number;
/** Detector toggles. */
detectors?: ToolLoopDetectionDetectorConfig;
};

export type SessionsToolsVisibility = "self" | "tree" | "agent" | "all";

export type ToolPolicyConfig = {
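`resolveLoopDetectionConfig` is called throughout the diff, but its body isn't shown. The sketch below resolves these optional fields against the documented defaults (`enabled: false`, `historySize: 30`, thresholds 10/20/30, all detectors on) with a plain field-by-field fallback. `resolveWithDefaults`, `DEFAULTS`, and `ResolvedLoopDetectionConfig` are hypothetical, and the real resolver may differ.

```typescript
type ToolLoopDetectionDetectorConfig = {
  genericRepeat?: boolean;
  knownPollNoProgress?: boolean;
  pingPong?: boolean;
};

type ToolLoopDetectionConfig = {
  enabled?: boolean;
  historySize?: number;
  warningThreshold?: number;
  criticalThreshold?: number;
  globalCircuitBreakerThreshold?: number;
  detectors?: ToolLoopDetectionDetectorConfig;
};

// Fully populated form: every optional field replaced by a concrete value.
type ResolvedLoopDetectionConfig = Required<Omit<ToolLoopDetectionConfig, "detectors">> & {
  detectors: Required<ToolLoopDetectionDetectorConfig>;
};

// Defaults as documented in the FIELD_HELP strings and doc comments above.
const DEFAULTS: ResolvedLoopDetectionConfig = {
  enabled: false,
  historySize: 30,
  warningThreshold: 10,
  criticalThreshold: 20,
  globalCircuitBreakerThreshold: 30,
  detectors: { genericRepeat: true, knownPollNoProgress: true, pingPong: true },
};

// Hypothetical resolver: `??` falls back only on undefined/null, so an
// explicit `false` toggle or numeric threshold is kept as written.
function resolveWithDefaults(config?: ToolLoopDetectionConfig): ResolvedLoopDetectionConfig {
  const d = config?.detectors;
  return {
    enabled: config?.enabled ?? DEFAULTS.enabled,
    historySize: config?.historySize ?? DEFAULTS.historySize,
    warningThreshold: config?.warningThreshold ?? DEFAULTS.warningThreshold,
    criticalThreshold: config?.criticalThreshold ?? DEFAULTS.criticalThreshold,
    globalCircuitBreakerThreshold:
      config?.globalCircuitBreakerThreshold ?? DEFAULTS.globalCircuitBreakerThreshold,
    detectors: {
      genericRepeat: d?.genericRepeat ?? DEFAULTS.detectors.genericRepeat,
      knownPollNoProgress: d?.knownPollNoProgress ?? DEFAULTS.detectors.knownPollNoProgress,
      pingPong: d?.pingPong ?? DEFAULTS.detectors.pingPong,
    },
  };
}
```

Using `??` rather than `||` matters for the detector toggles. With `||`, an explicit `pingPong: false` would be silently replaced by the default `true`.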
@@ -235,6 +259,8 @@ export type AgentToolsConfig = {
exec?: ExecToolConfig;
/** Filesystem tool path guards. */
fs?: FsToolsConfig;
/** Runtime loop detection for repetitive/ stuck tool-call patterns. */
loopDetection?: ToolLoopDetectionConfig;
sandbox?: {
tools?: {
allow?: string[];
@@ -497,6 +523,8 @@ export type ToolsConfig = {
exec?: ExecToolConfig;
/** Filesystem tool path guards. */
fs?: FsToolsConfig;
/** Runtime loop detection for repetitive/ stuck tool-call patterns. */
loopDetection?: ToolLoopDetectionConfig;
/** Sub-agent tool policy defaults (deny wins). */
subagents?: {
/** Default model selection for spawned sub-agents (string or {primary,fallbacks}). */

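The same `loopDetection` field now appears on both `AgentToolsConfig` and `ToolsConfig`. The docs describe the per-agent block as overriding the global one, but the merge rules aren't part of these hunks. The sketch below assumes a field-by-field override in which detector toggles are merged individually. `mergeLoopDetection` and `LoopDetection` are hypothetical names.

```typescript
type Detectors = { genericRepeat?: boolean; knownPollNoProgress?: boolean; pingPong?: boolean };

type LoopDetection = {
  enabled?: boolean;
  historySize?: number;
  warningThreshold?: number;
  criticalThreshold?: number;
  globalCircuitBreakerThreshold?: number;
  detectors?: Detectors;
};

// Hypothetical merge: agent-level fields win; fields the agent leaves unset
// fall through to the global tools.loopDetection block.
function mergeLoopDetection(
  globalCfg?: LoopDetection,
  agentCfg?: LoopDetection,
): LoopDetection | undefined {
  if (!globalCfg && !agentCfg) return undefined;
  // Later spreads override earlier ones, so agent values take precedence.
  const merged: LoopDetection = { ...globalCfg, ...agentCfg };
  // Merge detector toggles key by key instead of replacing the whole object.
  if (globalCfg?.detectors || agentCfg?.detectors) {
    merged.detectors = { ...globalCfg?.detectors, ...agentCfg?.detectors };
  }
  return merged;
}
```

Object spread also copies keys that are present but set to `undefined`, so such a key on the agent side would mask the global value. Configs parsed from JSON or JSON5 cannot contain `undefined`, so the sketch does not guard against it.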
@@ -358,6 +358,52 @@ const ToolFsSchema = z
.strict()
.optional();

const ToolLoopDetectionDetectorSchema = z
.object({
genericRepeat: z.boolean().optional(),
knownPollNoProgress: z.boolean().optional(),
pingPong: z.boolean().optional(),
})
.strict()
.optional();

const ToolLoopDetectionSchema = z
.object({
enabled: z.boolean().optional(),
historySize: z.number().int().positive().optional(),
warningThreshold: z.number().int().positive().optional(),
criticalThreshold: z.number().int().positive().optional(),
globalCircuitBreakerThreshold: z.number().int().positive().optional(),
detectors: ToolLoopDetectionDetectorSchema,
})
.strict()
.superRefine((value, ctx) => {
if (
value.warningThreshold !== undefined &&
value.criticalThreshold !== undefined &&
value.warningThreshold >= value.criticalThreshold
) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ["criticalThreshold"],
message: "tools.loopDetection.warningThreshold must be lower than criticalThreshold.",
});
}
if (
value.criticalThreshold !== undefined &&
value.globalCircuitBreakerThreshold !== undefined &&
value.criticalThreshold >= value.globalCircuitBreakerThreshold
) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
path: ["globalCircuitBreakerThreshold"],
message:
"tools.loopDetection.criticalThreshold must be lower than globalCircuitBreakerThreshold.",
});
}
})
.optional();

export const AgentSandboxSchema = z
.object({
mode: z.union([z.literal("off"), z.literal("non-main"), z.literal("all")]).optional(),
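The `superRefine` checks compare two thresholds only when both are set explicitly in the same block. As a result, a partial config such as `{ warningThreshold: 25 }` passes schema validation even though it exceeds the documented default `criticalThreshold` of 20. Also, `warningThreshold` is never compared with `globalCircuitBreakerThreshold` directly. Whether the runtime resolver handles such combinations is not visible in this diff. A dependency-free mirror of the two rules (the name `thresholdIssues` is hypothetical) makes these cases easy to check:

```typescript
type Thresholds = {
  warningThreshold?: number;
  criticalThreshold?: number;
  globalCircuitBreakerThreshold?: number;
};

// Mirrors the two superRefine rules above without depending on zod:
// each ordering rule fires only when BOTH of its thresholds are set.
function thresholdIssues(value: Thresholds): string[] {
  const issues: string[] = [];
  if (
    value.warningThreshold !== undefined &&
    value.criticalThreshold !== undefined &&
    value.warningThreshold >= value.criticalThreshold
  ) {
    issues.push("tools.loopDetection.warningThreshold must be lower than criticalThreshold.");
  }
  if (
    value.criticalThreshold !== undefined &&
    value.globalCircuitBreakerThreshold !== undefined &&
    value.criticalThreshold >= value.globalCircuitBreakerThreshold
  ) {
    issues.push(
      "tools.loopDetection.criticalThreshold must be lower than globalCircuitBreakerThreshold.",
    );
  }
  return issues;
}
```

For instance, `thresholdIssues({ warningThreshold: 40, globalCircuitBreakerThreshold: 30 })` returns an empty array, because no rule pairs those two fields.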
@@ -389,6 +435,7 @@ export const AgentToolsSchema = z
.optional(),
exec: AgentToolExecSchema,
fs: ToolFsSchema,
loopDetection: ToolLoopDetectionSchema,
sandbox: z
.object({
tools: ToolPolicySchema,
@@ -587,6 +634,7 @@ export const ToolsSchema = z
})
.strict()
.optional(),
loopDetection: ToolLoopDetectionSchema,
message: z
.object({
allowCrossContextSend: z.boolean().optional(),