improvement(kb): improve chunkers, respect user-specified chunk configurations, added tests (#2539)

* improvement(kb): improve chunkers, respect user-specified chunk configurations, added tests

* ack PR comments

* updated docs

* cleanup
Waleed
2025-12-22 20:47:29 -08:00
committed by GitHub
parent e0d96e2126
commit 37443a7b77
20 changed files with 583 additions and 219 deletions


@@ -34,9 +34,15 @@ Once your documents are processed, you can view and edit the individual chunks.
<Image src="/static/knowledgebase/knowledgebase.png" alt="Document chunks view showing processed content" width={800} height={500} />
### Chunk Configuration
- **Default chunk size**: 1,024 characters
- **Configurable range**: 100-4,000 characters per chunk
- **Smart overlap**: 200 characters by default for context preservation
When creating a knowledge base, you can configure how documents are split into chunks:
| Setting | Unit | Default | Range | Description |
|---------|------|---------|-------|-------------|
| **Max Chunk Size** | tokens | 1,024 | 100-4,000 | Maximum size of each chunk (1 token ≈ 4 characters) |
| **Min Chunk Size** | characters | 1 | 1-2,000 | Minimum chunk size to avoid tiny fragments |
| **Overlap** | characters | 200 | 0-500 | Context overlap between consecutive chunks |
- **Hierarchical splitting**: Respects document structure (sections, paragraphs, sentences)
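To make the interaction between the three settings concrete, here is a minimal sketch of a fixed-window splitter. It is not the chunker from this commit: the names `ChunkConfig` and `chunk_text` are hypothetical, it ignores hierarchical splitting entirely, and dropping a trailing fragment shorter than the minimum is an assumption about what "avoid tiny fragments" means. It only shows how a token-based maximum converts to characters (using the 1 token ≈ 4 characters approximation from the table) and how a character-based overlap shifts each window.

```python
from dataclasses import dataclass
from typing import List

CHARS_PER_TOKEN = 4  # approximation taken from the settings table


@dataclass
class ChunkConfig:
    """Hypothetical container mirroring the documented settings."""
    max_size_tokens: int = 1024  # documented range: 100-4,000 tokens
    min_size_chars: int = 1      # documented range: 1-2,000 characters
    overlap_chars: int = 200     # documented range: 0-500 characters


def validate(config: ChunkConfig) -> int:
    """Check the documented ranges and return the max chunk size in characters."""
    if not 100 <= config.max_size_tokens <= 4000:
        raise ValueError("max chunk size must be 100-4,000 tokens")
    if not 1 <= config.min_size_chars <= 2000:
        raise ValueError("min chunk size must be 1-2,000 characters")
    if not 0 <= config.overlap_chars <= 500:
        raise ValueError("overlap must be 0-500 characters")
    max_chars = config.max_size_tokens * CHARS_PER_TOKEN
    # Assumption: the overlap must be smaller than the window, otherwise
    # the window would never advance.
    if config.overlap_chars >= max_chars:
        raise ValueError("overlap must be smaller than the max chunk size")
    return max_chars


def chunk_text(text: str, config: ChunkConfig) -> List[str]:
    """Split text into overlapping fixed-size windows (no structure awareness)."""
    max_chars = validate(config)
    step = max_chars - config.overlap_chars
    chunks: List[str] = []
    start = 0
    while start < len(text):
        piece = text[start:start + max_chars]
        # Keep the first chunk unconditionally; drop later fragments that
        # fall below the minimum size (an assumption, see lead-in).
        if not chunks or len(piece) >= config.min_size_chars:
            chunks.append(piece)
        if start + max_chars >= len(text):
            break
        start += step
    return chunks
```

With the defaults, each window is 4,096 characters wide and advances by 3,896 characters, so consecutive chunks share their last and first 200 characters.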
### Editing Capabilities


@@ -1,6 +1,6 @@
---
title: Memory
description: Store and retrieve conversation history
description: Add memory store
---
import { BlockInfoCard } from "@/components/ui/block-info-card"
@@ -10,94 +10,95 @@ import { BlockInfoCard } from "@/components/ui/block-info-card"
color="#F64F9E"
/>
## Overview
## Usage Instructions
Integrate Memory into the workflow. You can add a memory, get a memory, get all memories, and delete memories.
The Memory block stores conversation history for agents. Each memory is identified by a `conversationId` that you provide. Multiple agents can share the same memory by using the same `conversationId`.
Memory stores only user and assistant messages. System messages are not stored—they are configured in the Agent block and prefixed at runtime.
## Tools
### `memory_add`
Add a message to memory. Creates a new memory if the `conversationId` doesn't exist, or appends to existing memory.
Add a new memory to the database or append to existing memory with the same ID.
#### Input
| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | Yes | Unique identifier for the conversation (e.g., `user-123`, `session-abc`) |
| `role` | string | Yes | Message role: `user` or `assistant` |
| `content` | string | Yes | Message content |
| `conversationId` | string | No | Conversation identifier \(e.g., user-123, session-abc\). If a memory with this conversationId already exists, the new message will be appended to it. |
| `id` | string | No | Legacy parameter for conversation identifier. Use conversationId instead. Provided for backwards compatibility. |
| `role` | string | Yes | Role for agent memory \(user, assistant, or system\) |
| `content` | string | Yes | Content for agent memory |
#### Output
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the operation succeeded |
| `memories` | array | Updated memory array |
| `error` | string | Error message if failed |
| `success` | boolean | Whether the memory was added successfully |
| `memories` | array | Array of memory objects including the new or updated memory |
| `error` | string | Error message if operation failed |
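The create-or-append behavior of `memory_add` can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the block's implementation: `MemoryStore` and its method are hypothetical names, the store is a plain dictionary rather than a database, and the accepted roles follow the table that lists `user`, `assistant`, and `system`. It also shows the legacy `id` parameter acting as a fallback when `conversationId` is absent.

```python
from typing import Dict, List, Optional

# Roles taken from the input table that lists user, assistant, or system.
ALLOWED_ROLES = {"user", "assistant", "system"}


class MemoryStore:
    """Hypothetical in-memory stand-in for the memory backend."""

    def __init__(self) -> None:
        self._data: Dict[str, List[dict]] = {}

    def add(
        self,
        role: str,
        content: str,
        conversation_id: Optional[str] = None,
        legacy_id: Optional[str] = None,
    ) -> dict:
        # Prefer conversationId; fall back to the legacy id parameter.
        key = conversation_id or legacy_id
        if not key:
            return {"success": False, "error": "conversationId or id is required"}
        if role not in ALLOWED_ROLES:
            return {"success": False, "error": f"invalid role: {role}"}
        # Create the conversation on first use, otherwise append to it.
        self._data.setdefault(key, []).append({"role": role, "content": content})
        memory = {"conversationId": key, "data": list(self._data[key])}
        return {"success": True, "memories": [memory]}
```

Calling `add` twice with the same identifier, once through `conversation_id` and once through `legacy_id`, yields a single conversation holding two messages.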
### `memory_get`
Retrieve memory by conversation ID.
Retrieve memory by conversationId. Returns matching memories.
#### Input
| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | Yes | Conversation identifier |
| `conversationId` | string | No | Conversation identifier \(e.g., user-123, session-abc\). Returns memories for this conversation. |
| `id` | string | No | Legacy parameter for conversation identifier. Use conversationId instead. Provided for backwards compatibility. |
#### Output
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the operation succeeded |
| `memories` | array | Array of messages with `role` and `content` |
| `error` | string | Error message if failed |
| `success` | boolean | Whether the memory was retrieved successfully |
| `memories` | array | Array of memory objects with conversationId and data fields |
| `message` | string | Success or error message |
| `error` | string | Error message if operation failed |
### `memory_get_all`
Retrieve all memories for the current workspace.
#### Output
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the operation succeeded |
| `memories` | array | All memory objects with `conversationId` and `data` fields |
| `error` | string | Error message if failed |
### `memory_delete`
Delete memory by conversation ID.
Retrieve all memories from the database.
#### Input
| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | Yes | Conversation identifier to delete |
#### Output
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the operation succeeded |
| `message` | string | Confirmation message |
| `error` | string | Error message if failed |
| `success` | boolean | Whether all memories were retrieved successfully |
| `memories` | array | Array of all memory objects with key, conversationId, and data fields |
| `message` | string | Success or error message |
| `error` | string | Error message if operation failed |
## Agent Memory Types
### `memory_delete`
Delete memories by conversationId.
#### Input
| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | No | Conversation identifier \(e.g., user-123, session-abc\). Deletes all memories for this conversation. |
| `id` | string | No | Legacy parameter for conversation identifier. Use conversationId instead. Provided for backwards compatibility. |
#### Output
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the memory was deleted successfully |
| `message` | string | Success or error message |
| `error` | string | Error message if operation failed |
When using memory with an Agent block, you can configure how conversation history is managed:
| Type | Description |
| ---- | ----------- |
| **Full Conversation** | Stores all messages, limited by model's context window (uses 90% to leave room for response) |
| **Sliding Window (Messages)** | Keeps the last N messages (default: 10) |
| **Sliding Window (Tokens)** | Keeps messages that fit within a token limit (default: 4000) |
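The three memory types in the table can be sketched as simple trimming functions over a message list. Everything here is illustrative: the function names are hypothetical, and the token estimate of roughly 4 characters per token is an assumption borrowed for the example, since the table does not say how tokens are counted. The defaults (10 messages, 4000 tokens, 90% of the context window) come from the table.

```python
import math
from typing import Dict, List

Message = Dict[str, str]  # expects "role" and "content" keys


def estimate_tokens(text: str) -> int:
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return math.ceil(len(text) / 4)


def sliding_window_messages(messages: List[Message], limit: int = 10) -> List[Message]:
    """Keep only the last `limit` messages."""
    return messages[-limit:] if limit > 0 else []


def sliding_window_tokens(messages: List[Message], max_tokens: int = 4000) -> List[Message]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def full_conversation(messages: List[Message], context_window: int) -> List[Message]:
    """Keep as much history as fits in 90% of the model's context window."""
    budget = context_window * 9 // 10  # integer arithmetic avoids float rounding
    return sliding_window_tokens(messages, budget)
```

In this sketch the full-conversation mode is just the token window with a budget derived from the model, which is one plausible reading of "limited by model's context window".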
## Notes
- Memory is scoped per workspace—workflows in the same workspace share the memory store
- Use unique `conversationId` values to keep conversations separate (e.g., session IDs, user IDs, or UUIDs)
- System messages belong in the Agent block configuration, not in memory
- Category: `blocks`
- Type: `memory`