diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index 582ce91de9..90ec0fb5fb 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -597,6 +597,20 @@ Max total characters injected across all workspace bootstrap files. Default: `15
 }
 ```
 
+### `agents.defaults.imageMaxDimensionPx`
+
+Max pixel size for the longest image side in transcript/tool image blocks before provider calls.
+Default: `1200`.
+
+Lower values usually reduce vision-token usage and request payload size for screenshot-heavy runs.
+Higher values preserve more visual detail.
+
+```json5
+{
+  agents: { defaults: { imageMaxDimensionPx: 1200 } },
+}
+```
+
 ### `agents.defaults.userTimezone`
 
 Timezone for system prompt context (not message timestamps). Falls back to host timezone.
diff --git a/docs/gateway/configuration.md b/docs/gateway/configuration.md
index bfce441c30..1a5e921c01 100644
--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -126,7 +126,7 @@ When validation fails:
 
  - `agents.defaults.models` defines the model catalog and acts as the allowlist for `/model`.
  - Model refs use `provider/model` format (e.g. `anthropic/claude-opus-4-6`).
- - `agents.defaults.imageMaxDimensionPx` controls transcript/tool image downscaling (default `1200`).
+ - `agents.defaults.imageMaxDimensionPx` controls transcript/tool image downscaling (default `1200`); lower values usually reduce vision-token usage on screenshot-heavy runs.
  - See [Models CLI](/concepts/models) for switching models in chat and [Model Failover](/concepts/model-failover) for auth rotation and fallback behavior.
  - For custom/self-hosted providers, see [Custom providers](/gateway/configuration-reference#custom-providers-and-base-urls) in the reference.
 
diff --git a/docs/reference/token-use.md b/docs/reference/token-use.md
index 827a4b588d..96a096259e 100644
--- a/docs/reference/token-use.md
+++ b/docs/reference/token-use.md
@@ -36,6 +36,12 @@ Everything the model receives counts toward the context limit:
 - Compaction summaries and pruning artifacts
 - Provider wrappers or safety headers (not visible, but still counted)
 
+For images, OpenClaw downscales transcript/tool image payloads before provider calls.
+Use `agents.defaults.imageMaxDimensionPx` (default: `1200`) to tune this:
+
+- Lower values usually reduce vision-token usage and payload size.
+- Higher values preserve more visual detail for OCR/UI-heavy screenshots.
+
 For a practical breakdown (per injected file, tools, skills, and system prompt size), use `/context list` or `/context detail`. See [Context](/concepts/context).
 
 ## How to see current token usage
@@ -106,6 +112,7 @@ agents:
 
 - Use `/compact` to summarize long sessions.
 - Trim large tool outputs in your workflows.
+- Lower `agents.defaults.imageMaxDimensionPx` for screenshot-heavy sessions.
 - Keep skill descriptions short (skill list is injected into the prompt).
 - Prefer smaller models for verbose, exploratory work.
 
diff --git a/docs/reference/transcript-hygiene.md b/docs/reference/transcript-hygiene.md
index 95e029aec7..1321175ded 100644
--- a/docs/reference/transcript-hygiene.md
+++ b/docs/reference/transcript-hygiene.md
@@ -53,6 +53,9 @@ Separate from transcript hygiene, session files are repaired (if needed) before
 
 Image payloads are always sanitized to prevent provider-side rejection due to size limits (downscale/recompress oversized base64 images).
 
+This also helps control image-driven token pressure for vision-capable models.
+Lower max dimensions generally reduce token usage; higher dimensions preserve detail.
+
 Implementation:
 
 - `sanitizeSessionMessagesImages` in `src/agents/pi-embedded-helpers/images.ts`
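
Review note: the docs added above describe `agents.defaults.imageMaxDimensionPx` as a cap on the longest image side. The sketch below shows one way such a cap could translate into target dimensions. It is an illustration only: `fitWithinMaxDimension` and the `Size` type are hypothetical names, the rounding and aspect-ratio behavior are assumptions, and none of this is taken from `sanitizeSessionMessagesImages`.

```typescript
// Hypothetical helper for illustration; not OpenClaw's actual implementation.
interface Size {
  width: number;
  height: number;
}

// Returns the dimensions an image would have after capping its longest side
// at maxDimensionPx while keeping the aspect ratio (assumed behavior).
function fitWithinMaxDimension(size: Size, maxDimensionPx: number = 1200): Size {
  if (maxDimensionPx <= 0 || size.width <= 0 || size.height <= 0) {
    throw new RangeError("dimensions and maxDimensionPx must be positive");
  }

  const longest = Math.max(size.width, size.height);

  // Images already within the cap are left at their original size.
  if (longest <= maxDimensionPx) {
    return { width: size.width, height: size.height };
  }

  // Scale both sides by the same factor so the longest side equals the cap.
  const scale = maxDimensionPx / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

// Example: a 2400x1350 screenshot under the documented default of 1200.
const resized = fitWithinMaxDimension({ width: 2400, height: 1350 });
console.log(`${resized.width}x${resized.height}`);
```

Under these assumptions, halving the cap roughly quarters the pixel count of oversized screenshots, which is consistent with the docs' claim that lower values usually reduce payload size; the effect on vision-token usage depends on the provider.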