feat (memory): Implement new (opt-in) QMD memory backend

This commit is contained in:
Vignesh Natarajan
2026-01-27 21:57:15 -08:00
committed by vignesh07
parent e25fedf932
commit 8af37aa15f
25 changed files with 1622 additions and 97 deletions

View File

@@ -118,6 +118,83 @@ Notes:
- Only Markdown files are indexed.
- Symlinks are ignored (files or directories).
### QMD backend (experimental)
Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for
[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines
BM25 + vectors + reranking. Markdown stays the source of truth; OpenClaw shells
out to QMD for retrieval. Key points:
**Prereqs**
- Disabled by default. Opt in per-config (`memory.backend = "qmd"`).
- Install the QMD CLI separately (`bun install -g github.com/tobi/qmd` or grab
a release) and make sure the `qmd` binary is on the gateways `PATH`.
- QMD needs an SQLite build that allows extensions (`brew install sqlite` on
macOS). The gateway sets `INDEX_PATH`/`QMD_CONFIG_DIR` automatically.
**How the sidecar runs**
- The gateway writes a self-contained QMD home under
`~/.clawdbot/agents/<agentId>/qmd/` (config + cache + sqlite DB).
- Collections are rewritten from `memory.qmd.paths` (plus default workspace
memory files) into `index.yml`, then `qmd update` + `qmd embed` run on boot and
on a configurable interval (`memory.qmd.update.interval`, default 5m).
- Searches run via `qmd query --json`. If QMD fails or the binary is missing,
OpenClaw automatically falls back to the builtin SQLite manager so memory tools
keep working.
**Config surface (`memory.qmd.*`)**
- `command` (default `qmd`): override the executable path.
- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`.
- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional
stable `name`).
- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`,
`exportDir`, `redactToolOutputs`—defaults to redacting tool payloads).
- `update`: controls refresh cadence (`interval`, `debounceMs`, `onBoot`).
- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`,
`maxInjectedChars`, `timeoutMs`).
- `scope`: same schema as [`session.sendPolicy`](/reference/configuration#session-sendpolicy).
Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD
hits in groups/channels.
- Snippets sourced outside the workspace show up as
`qmd/<collection>/<relative-path>` in `memory_search` results; `memory_get`
understands that prefix and reads from the configured QMD collection root.
- When `memory.qmd.sessions.enabled = true`, OpenClaw exports sanitized session
transcripts (User/Assistant turns) into a dedicated QMD collection under
`~/.clawdbot/agents/<id>/qmd/sessions/`, so `memory_search` can recall recent
conversations without touching the builtin SQLite index.
- `memory_search` snippets now include a `Source: <path#line>` footer when
`memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep
the path metadata internal (the agent still receives the path for
`memory_get`, but the snippet text omits the footer and the system prompt
warns the agent not to cite it).
**Example**
```json5
memory: {
backend: "qmd",
citations: "auto",
qmd: {
includeDefaultMemory: true,
update: { interval: "5m", debounceMs: 15000 },
limits: { maxResults: 6, timeoutMs: 4000 },
scope: {
default: "deny",
rules: [{ action: "allow", match: { chatType: "direct" } }]
},
paths: [
{ name: "docs", path: "~/notes", pattern: "**/*.md" }
]
}
}
```
**Citations & fallback**
- `memory.citations` applies regardless of backend (`auto`/`on`/`off`).
- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which engine served the results.
If the QMD subprocess exits or JSON output cant be parsed, the search manager logs a warning and
returns the builtin provider (existing Markdown embeddings) until QMD recovers.
### Gemini embeddings (native)
Set the provider to `gemini` to use the Gemini embeddings API directly: