mirror of
https://github.com/openclaw/openclaw.git
synced 2026-02-19 18:39:20 -05:00
| summary | owner | status | last_updated | title |
|---|---|---|---|---|
| Production plan for reliable interactive process supervision (PTY + non-PTY) with explicit ownership, unified lifecycle, and deterministic cleanup | openclaw | in-progress | 2026-02-15 | PTY and Process Supervision Plan |
PTY and Process Supervision Plan
1. Problem and goal
We need one reliable lifecycle for long-running command execution across:
- `exec` foreground runs
- `exec` background runs
- `process` follow-up actions (`poll`, `log`, `send-keys`, `paste`, `submit`, `kill`, `remove`)
- CLI agent runner subprocesses
The goal is not just to support PTY. The goal is predictable ownership, cancellation, timeout, and cleanup with no unsafe process matching heuristics.
2. Scope and boundaries
- Keep implementation internal in `src/process/supervisor`.
- Do not create a new package for this.
- Keep current behavior compatibility where practical.
- Do not broaden scope to terminal replay or tmux-style session persistence.
3. Implemented in this branch
Supervisor baseline already present
- Supervisor module is in place under `src/process/supervisor/*`.
- Exec runtime and CLI runner are already routed through supervisor spawn and wait.
- Registry finalization is idempotent.
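
Idempotent finalization means a run's terminal state is recorded once, and any later finalize call for the same run is ignored. A minimal sketch of that property, assuming a hypothetical in-memory registry; `RunRegistry`, `RunRecord`, and the `finalize` signature are illustrative names, not the actual `registry.ts` API:

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the real registry API.
type RunState = "running" | "exited";

interface RunRecord {
  runId: string;
  state: RunState;
  exitCode: number | null;
  reason?: string;
}

class RunRegistry {
  private readonly runs = new Map<string, RunRecord>();

  register(runId: string): RunRecord {
    const record: RunRecord = { runId, state: "running", exitCode: null };
    this.runs.set(runId, record);
    return record;
  }

  // Returns true only for the call that actually performed the transition.
  finalize(runId: string, exitCode: number | null, reason: string): boolean {
    const record = this.runs.get(runId);
    if (!record || record.state === "exited") {
      return false; // unknown run or already finalized: no-op
    }
    record.state = "exited";
    record.exitCode = exitCode;
    record.reason = reason;
    return true;
  }

  get(runId: string): RunRecord | undefined {
    return this.runs.get(runId);
  }
}
```

With this shape, an exit handler and a timeout handler can both call `finalize` for the same run without the later call overwriting the first recorded outcome.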
This pass completed
- Explicit PTY command contract
  - `SpawnInput` is now a discriminated union in `src/process/supervisor/types.ts`.
  - PTY runs require `ptyCommand` instead of reusing generic `argv`.
  - Supervisor no longer rebuilds PTY command strings from argv joins in `src/process/supervisor/supervisor.ts`.
  - Exec runtime now passes `ptyCommand` directly in `src/agents/bash-tools.exec-runtime.ts`.
- Process layer type decoupling
  - Supervisor types no longer import `SessionStdin` from agents.
  - Process local stdin contract lives in `src/process/supervisor/types.ts` (`ManagedRunStdin`).
  - Adapters now depend only on process level types:
    - `src/process/supervisor/adapters/child.ts`
    - `src/process/supervisor/adapters/pty.ts`
- Process tool lifecycle ownership improvement
  - `src/agents/bash-tools.process.ts` now requests cancellation through supervisor first.
  - `process kill/remove` now use process-tree fallback termination when supervisor lookup misses.
  - `remove` keeps deterministic remove behavior by dropping running session entries immediately after termination is requested.
- Single source watchdog defaults
  - Added shared defaults in `src/agents/cli-watchdog-defaults.ts`.
  - `src/agents/cli-backends.ts` consumes the shared defaults.
  - `src/agents/cli-runner/reliability.ts` consumes the same shared defaults.
- Dead helper cleanup
  - Removed unused `killSession` helper path from `src/agents/bash-tools.shared.ts`.
- Direct supervisor path tests added
  - Added `src/agents/bash-tools.process.supervisor.test.ts` to cover kill and remove routing through supervisor cancellation.
- Reliability gap fixes completed
  - `src/agents/bash-tools.process.ts` now falls back to real OS-level process termination when supervisor lookup misses.
  - `src/process/supervisor/adapters/child.ts` now uses process-tree termination semantics for default cancel/timeout kill paths.
  - Added shared process-tree utility in `src/process/kill-tree.ts`.
- PTY contract edge-case coverage added
  - Added `src/process/supervisor/supervisor.pty-command.test.ts` for verbatim PTY command forwarding and empty-command rejection.
  - Added `src/process/supervisor/adapters/child.test.ts` for process-tree kill behavior in child adapter cancellation.
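
The explicit PTY command contract described in this list can be sketched as a discriminated union. Only `SpawnInput`, `argv`, and `ptyCommand` come from this plan; the `mode` discriminant, the `cwd` field, and the `resolveCommand` helper are assumptions made for illustration and may not match `types.ts`:

```typescript
// Hypothetical shape: the `mode` discriminant, `cwd`, and `resolveCommand` are assumptions.
interface SpawnBase {
  cwd?: string;
}

interface ChildSpawnInput extends SpawnBase {
  mode: "child";
  argv: string[];
}

interface PtySpawnInput extends SpawnBase {
  mode: "pty";
  ptyCommand: string;
}

type SpawnInput = ChildSpawnInput | PtySpawnInput;

// Returns the exact command an adapter would receive. The PTY branch forwards
// the string verbatim and never rebuilds it by joining argv.
function resolveCommand(input: SpawnInput): string | string[] {
  switch (input.mode) {
    case "pty": {
      if (input.ptyCommand.trim().length === 0) {
        throw new Error("ptyCommand must not be empty");
      }
      return input.ptyCommand;
    }
    case "child": {
      if (input.argv.length === 0) {
        throw new Error("argv must not be empty");
      }
      return input.argv;
    }
    default: {
      // Compile-time exhaustiveness check: new variants must be handled above.
      const unreachable: never = input;
      throw new Error(`unsupported spawn input: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because the compiler narrows on the discriminant, a PTY caller cannot omit `ptyCommand`, and quoting inside the command string survives untouched since no argv join takes place.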
4. Remaining gaps and decisions
Reliability status
The two required reliability gaps for this pass are now closed:
- `process kill/remove` now has a real OS termination fallback when supervisor lookup misses.
- Child cancel/timeout now uses process-tree kill semantics for default kill path.
- Regression tests were added for both behaviors.
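
The ordering behind these two fixes (supervisor first, process-tree termination only when the lookup misses) can be sketched as follows. The interface and function names are hypothetical, and the tree-kill function is injected so the sketch makes no claim about how `src/process/kill-tree.ts` is implemented:

```typescript
// Hypothetical sketch: interface and function names are illustrative assumptions.
interface SupervisorLike {
  // Returns true when the supervisor owned the run and accepted the cancel request.
  cancel(runId: string, reason: string): boolean;
}

type KillTree = (pid: number) => void;

type TerminationPath = "supervisor" | "process-tree" | "none";

function requestTermination(
  supervisor: SupervisorLike,
  killTree: KillTree,
  session: { runId: string; pid?: number },
): TerminationPath {
  // 1. Prefer the lifecycle owner so timers, listeners, and registry state settle together.
  if (supervisor.cancel(session.runId, "manual-cancel")) {
    return "supervisor";
  }
  // 2. Supervisor lookup missed: terminate the recorded PID and its descendants.
  if (typeof session.pid === "number") {
    killTree(session.pid);
    return "process-tree";
  }
  // 3. Nothing to terminate; the caller can still drop the session entry.
  return "none";
}
```

The fallback acts only on the PID recorded for the session, which is consistent with the stated goal of avoiding process matching heuristics.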
Durability and startup reconciliation
Restart behavior is now explicitly defined as in-memory lifecycle only.
- `reconcileOrphans()` remains a no-op in `src/process/supervisor/supervisor.ts` by design.
- Active runs are not recovered after process restart.
- This boundary is intentional for this implementation pass to avoid partial persistence risks.
Maintainability follow-ups
- `runExecProcess` in `src/agents/bash-tools.exec-runtime.ts` still handles multiple responsibilities and can be split into focused helpers in a follow-up.
5. Implementation plan
The implementation pass for required reliability and contract items is complete.
Completed:
- `process kill/remove` fallback real termination
- process-tree cancellation for child adapter default kill path
- regression tests for fallback kill and child adapter kill path
- PTY command edge-case tests under explicit `ptyCommand`
- explicit in-memory restart boundary with `reconcileOrphans()` no-op by design
Optional follow-up:
- split `runExecProcess` into focused helpers with no behavior drift
6. File map
Process supervisor
- `src/process/supervisor/types.ts` updated with discriminated spawn input and process local stdin contract.
- `src/process/supervisor/supervisor.ts` updated to use explicit `ptyCommand`.
- `src/process/supervisor/adapters/child.ts` and `src/process/supervisor/adapters/pty.ts` decoupled from agent types.
- `src/process/supervisor/registry.ts` idempotent finalize unchanged and retained.
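
The process-local stdin contract can be pictured as a small interface owned by the process layer, which agent-side objects satisfy structurally without the supervisor importing any agent type. A minimal sketch; only the name `ManagedRunStdin` comes from this plan, while its members and the `sendLine` helper are assumptions:

```typescript
// Hypothetical members: only the interface name appears in the plan.
interface ManagedRunStdin {
  write(data: string): void;
  end(): void;
  readonly destroyed: boolean;
}

// An adapter-side helper that depends only on the process-level contract.
function sendLine(stdin: ManagedRunStdin, line: string): boolean {
  if (stdin.destroyed) {
    return false; // stream already closed; nothing is written
  }
  stdin.write(line.endsWith("\n") ? line : `${line}\n`);
  return true;
}
```

Any object with matching members is accepted, so the dependency direction stays from agents toward the process layer and never the reverse.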
Exec and process integration
- `src/agents/bash-tools.exec-runtime.ts` updated to pass PTY command explicitly and keep fallback path.
- `src/agents/bash-tools.process.ts` updated to cancel via supervisor with real process-tree fallback termination.
- `src/agents/bash-tools.shared.ts` removed direct kill helper path.
CLI reliability
- `src/agents/cli-watchdog-defaults.ts` added as shared baseline.
- `src/agents/cli-backends.ts` and `src/agents/cli-runner/reliability.ts` now consume same defaults.
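
A single source of watchdog defaults typically means one frozen constants object that every consumer reads instead of redefining numbers locally. A sketch of that pattern; the field names, values, and both consumer functions are placeholders and not the contents of `cli-watchdog-defaults.ts`:

```typescript
// Hypothetical sketch: field names and values are placeholders, not the real defaults.
interface WatchdogDefaults {
  readonly noOutputTimeoutMs: number;
  readonly minTimeoutMs: number;
  readonly maxTimeoutMs: number;
}

const CLI_WATCHDOG_DEFAULTS: WatchdogDefaults = Object.freeze({
  noOutputTimeoutMs: 60_000,
  minTimeoutMs: 5_000,
  maxTimeoutMs: 600_000,
});

// Consumer 1: backend config resolution fills gaps from the shared defaults.
function resolveBackendWatchdog(overrides: Partial<WatchdogDefaults> = {}): WatchdogDefaults {
  return { ...CLI_WATCHDOG_DEFAULTS, ...overrides };
}

// Consumer 2: reliability logic clamps a requested timeout using the same bounds.
function clampNoOutputTimeout(requestedMs: number | undefined): number {
  const base = requestedMs ?? CLI_WATCHDOG_DEFAULTS.noOutputTimeoutMs;
  return Math.min(
    CLI_WATCHDOG_DEFAULTS.maxTimeoutMs,
    Math.max(CLI_WATCHDOG_DEFAULTS.minTimeoutMs, base),
  );
}
```

Changing a value in the shared object then affects both consumers at once, which removes the risk of the two modules drifting apart.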
7. Validation run in this pass
Unit tests:
- `pnpm vitest src/process/supervisor/registry.test.ts`
- `pnpm vitest src/process/supervisor/supervisor.test.ts`
- `pnpm vitest src/process/supervisor/supervisor.pty-command.test.ts`
- `pnpm vitest src/process/supervisor/adapters/child.test.ts`
- `pnpm vitest src/agents/cli-backends.test.ts`
- `pnpm vitest src/agents/bash-tools.exec.pty-cleanup.test.ts`
- `pnpm vitest src/agents/bash-tools.process.poll-timeout.test.ts`
- `pnpm vitest src/agents/bash-tools.process.supervisor.test.ts`
- `pnpm vitest src/process/exec.test.ts`
E2E targets:
- `pnpm test:e2e src/agents/cli-runner.e2e.test.ts`
- `pnpm test:e2e src/agents/bash-tools.exec.pty-fallback.e2e.test.ts src/agents/bash-tools.exec.background-abort.e2e.test.ts src/agents/bash-tools.process.send-keys.e2e.test.ts`
Typecheck note:
- `pnpm tsgo` currently fails in this repo due to a pre-existing UI typing dependency issue (`@vitest/browser-playwright` resolution), unrelated to this process supervision work.
8. Operational guarantees preserved
- Exec env hardening behavior is unchanged.
- Approval and allowlist flow is unchanged.
- Output sanitization and output caps are unchanged.
- PTY adapter still guarantees wait settlement on forced kill and listener disposal.
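
The last guarantee (wait settlement on forced kill plus listener disposal) follows a settle-once pattern: whichever of the exit event or the forced-kill path arrives first resolves the wait promise and disposes listeners, and the other becomes a no-op. A sketch under assumed names; `ListenerHandle`, `ExitInfo`, and `createWaitSettlement` are hypothetical and not taken from the PTY adapter:

```typescript
// Hypothetical sketch: type and function names are illustrative assumptions.
interface ListenerHandle {
  dispose(): void;
}

interface ExitInfo {
  exitCode: number | null;
  forced: boolean;
}

function createWaitSettlement(listeners: ListenerHandle[]) {
  let settled = false;
  let resolveWait!: (info: ExitInfo) => void;
  const wait = new Promise<ExitInfo>((resolve) => {
    resolveWait = resolve;
  });

  // Called from the exit handler or from the forced-kill path.
  // Only the first call disposes listeners and resolves the promise.
  const settle = (info: ExitInfo): boolean => {
    if (settled) {
      return false;
    }
    settled = true;
    for (const listener of listeners) {
      listener.dispose();
    }
    resolveWait(info);
    return true;
  };

  return { wait, settle, isSettled: () => settled };
}
```

In the forced-kill path, a caller would invoke `settle({ exitCode: null, forced: true })` after issuing the kill, so anything awaiting `wait` is released even if the PTY never emits an exit event.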
9. Definition of done
- Supervisor is lifecycle owner for managed runs.
- PTY spawn uses explicit command contract with no argv reconstruction.
- Process layer has no type dependency on agent layer for supervisor stdin contracts.
- Watchdog defaults are single source.
- Targeted unit and e2e tests remain green.
- Restart durability boundary is explicitly documented or fully implemented.
10. Summary
The branch now has a coherent and safer supervision shape:
- explicit PTY contract
- cleaner process layering
- supervisor-driven cancellation path for process operations
- real fallback termination when supervisor lookup misses
- process-tree cancellation for child-run default kill paths
- unified watchdog defaults
- explicit in-memory restart boundary (no orphan reconciliation across restart in this pass)