mirror of
https://github.com/simstudioai/sim.git
synced 2026-02-18 10:22:00 -05:00
improvement(kb): improve chunkers, respect user-specified chunk configurations, added tests (#2539)
* improvement(kb): improve chunkers, respect user-specified chunk configurations, added tests
* ack PR comments
* updated docs
* cleanup
@@ -34,9 +34,15 @@ Once your documents are processed, you can view and edit the individual chunks.
<Image src="/static/knowledgebase/knowledgebase.png" alt="Document chunks view showing processed content" width={800} height={500} />

### Chunk Configuration

When creating a knowledge base, you can configure how documents are split into chunks:

| Setting | Unit | Default | Range | Description |
|---------|------|---------|-------|-------------|
| **Max Chunk Size** | tokens | 1,024 | 100-4,000 | Maximum size of each chunk (1 token ≈ 4 characters) |
| **Min Chunk Size** | characters | 1 | 1-2,000 | Minimum chunk size, to avoid tiny fragments |
| **Overlap** | characters | 200 | 0-500 | Context overlap between consecutive chunks |
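The following is a minimal TypeScript sketch of how these settings might be modelled and range-checked before a knowledge base is created. It assumes the defaults and ranges in the table. `ChunkingConfig`, `validateChunkingConfig`, and `approxMaxChunkChars` are hypothetical names and are not part of Sim's API. The check that overlap must be smaller than a chunk is an added assumption, not a documented rule.

```typescript
// Hypothetical settings shape; field names are illustrative, not Sim's actual API.
interface ChunkingConfig {
  maxChunkSizeTokens: number; // Max Chunk Size (tokens)
  minChunkSizeChars: number; // Min Chunk Size (characters)
  overlapChars: number; // Overlap (characters)
}

// Defaults from the documented table.
const DEFAULT_CHUNKING: ChunkingConfig = {
  maxChunkSizeTokens: 1024,
  minChunkSizeChars: 1,
  overlapChars: 200,
};

// Documented inclusive ranges.
const RANGES: Record<keyof ChunkingConfig, [number, number]> = {
  maxChunkSizeTokens: [100, 4000],
  minChunkSizeChars: [1, 2000],
  overlapChars: [0, 500],
};

// Rough heuristic from the docs: 1 token ≈ 4 characters (not a real tokenizer).
const CHARS_PER_TOKEN = 4;

function approxMaxChunkChars(config: ChunkingConfig): number {
  return config.maxChunkSizeTokens * CHARS_PER_TOKEN;
}

// Returns human-readable problems; an empty array means the config passes.
function validateChunkingConfig(config: ChunkingConfig): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(RANGES) as Array<keyof ChunkingConfig>) {
    const [lo, hi] = RANGES[key];
    const value = config[key];
    if (!Number.isFinite(value) || value < lo || value > hi) {
      errors.push(`${key} must be between ${lo} and ${hi} (got ${value})`);
    }
  }
  // Assumed sanity check (not documented): the overlap must be shorter than a full chunk,
  // otherwise consecutive chunks could not advance through the text.
  if (config.overlapChars >= approxMaxChunkChars(config)) {
    errors.push("overlapChars must be smaller than the approximate max chunk size in characters");
  }
  return errors;
}
```

With the defaults, `approxMaxChunkChars` gives 1,024 × 4 = 4,096 characters. Every setting falls inside its documented range.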
- **Hierarchical splitting**: Respects document structure (sections, paragraphs, sentences)
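One way hierarchical splitting can work is to try the coarsest boundary first (blank lines between paragraphs). Line breaks, sentences, and words are used only for pieces that are still too large. The sketch below is a hypothetical illustration, not Sim's chunker:

- Sizes are in characters, so convert the token setting first (for example, with the ~4 characters per token rule).
- Overlap is taken from the end of the previous chunk.
- Fragments shorter than the minimum are simply dropped, whereas a production chunker might merge them instead.

```typescript
// Hypothetical hierarchical chunker for illustration; not Sim's implementation.
// Boundaries from coarsest to finest: paragraphs, lines, sentences, words.
const SEPARATORS = ["\n\n", "\n", ". ", " "];

// Split text after each occurrence of sep, keeping sep on the preceding piece,
// so the pieces concatenate back to the original text.
function splitKeepingSeparator(text: string, sep: string): string[] {
  const pieces: string[] = [];
  let start = 0;
  let idx = text.indexOf(sep, start);
  while (idx !== -1) {
    pieces.push(text.slice(start, idx + sep.length));
    start = idx + sep.length;
    idx = text.indexOf(sep, start);
  }
  if (start < text.length) pieces.push(text.slice(start));
  return pieces;
}

// Pack pieces into chunks of at most maxChars, recursing to a finer separator
// only for pieces that are still too large.
function splitHierarchically(text: string, maxChars: number, level = 0): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  if (text.length <= maxChars) return [text];
  if (level >= SEPARATORS.length) {
    // No structure left: fall back to fixed-size slices.
    const slices: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) slices.push(text.slice(i, i + maxChars));
    return slices;
  }
  const chunks: string[] = [];
  let current = "";
  for (const piece of splitKeepingSeparator(text, SEPARATORS[level])) {
    if (current.length + piece.length <= maxChars) {
      current += piece; // piece still fits in the current chunk
      continue;
    }
    if (current) chunks.push(current);
    if (piece.length > maxChars) {
      const sub = splitHierarchically(piece, maxChars, level + 1);
      current = sub.pop() ?? ""; // let the remainder keep packing with later pieces
      chunks.push(...sub);
    } else {
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Sizes are in characters; convert a token budget first (e.g. ~4 characters per token).
function chunkText(text: string, maxChars: number, overlapChars: number, minChars: number): string[] {
  if (overlapChars < 0 || overlapChars >= maxChars) {
    throw new Error("overlapChars must be in [0, maxChars)");
  }
  // Reserve room for the overlap so that every final chunk stays within maxChars.
  const base = splitHierarchically(text, maxChars - overlapChars);
  const chunks: string[] = [];
  base.forEach((piece, i) => {
    // Repeat the tail of the previous piece for context.
    const overlap = i > 0 && overlapChars > 0 ? base[i - 1].slice(-overlapChars) : "";
    const chunk = overlap + piece;
    // Drop fragments below the minimum size (a real chunker might merge them instead).
    if (chunk.trim().length >= minChars) chunks.push(chunk);
  });
  return chunks;
}
```

Because separators stay attached to their pieces, each chunk in this sketch is a contiguous slice of the source text. With zero overlap, the chunks reassemble the original exactly.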
### Editing Capabilities
@@ -1,6 +1,6 @@
---
title: Memory
description: Store and retrieve conversation history
---

import { BlockInfoCard } from "@/components/ui/block-info-card"
@@ -10,94 +10,95 @@ import { BlockInfoCard } from "@/components/ui/block-info-card"
color="#F64F9E"
/>

## Usage Instructions

Integrate Memory into the workflow to add a memory, get a memory, get all memories, or delete memories.

The Memory block stores conversation history for agents. Each memory is identified by a `conversationId` that you provide. Multiple agents can share the same memory by using the same `conversationId`.

When an Agent block uses memory, only user and assistant messages are stored. System messages are not stored; they are configured in the Agent block and prefixed at runtime.
## Tools

### `memory_add`

Add a message to memory. If a memory with the given `conversationId` already exists, the message is appended to it; otherwise, a new memory is created.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | No | Conversation identifier (e.g., `user-123`, `session-abc`) |
| `id` | string | No | Legacy parameter for the conversation identifier, kept for backwards compatibility. Use `conversationId` instead. |
| `role` | string | Yes | Message role: `user`, `assistant`, or `system` |
| `content` | string | Yes | Message content |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the memory was added successfully |
| `memories` | array | Array of memory objects, including the new or updated memory |
| `error` | string | Error message if the operation failed |
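The append-or-create behavior of `memory_add` can be modelled with an in-memory map. So can the fallback from `conversationId` to the legacy `id`. This is a hypothetical sketch, not Sim's implementation. `memoryAdd` and the `MemoryRecord` shape are illustrative only. So is the assumption that `data` holds the conversation's list of messages.

```typescript
// Hypothetical model of memory_add semantics; names and shapes are illustrative, not Sim's API.
type Role = "user" | "assistant" | "system";

interface Message {
  role: Role;
  content: string;
}

// Assumed record shape, loosely based on the documented `conversationId` and `data` fields.
interface MemoryRecord {
  conversationId: string;
  data: Message[];
}

interface MemoryAddParams {
  conversationId?: string;
  id?: string; // legacy alias for conversationId
  role: Role;
  content: string;
}

interface MemoryAddResult {
  success: boolean;
  memories: MemoryRecord[];
  error?: string;
}

const records = new Map<string, MemoryRecord>();

function memoryAdd(params: MemoryAddParams): MemoryAddResult {
  // Prefer conversationId; fall back to the legacy id for older workflows.
  const conversationId = params.conversationId || params.id;
  if (!conversationId) {
    return { success: false, memories: [], error: "conversationId is required" };
  }
  let record = records.get(conversationId);
  if (!record) {
    record = { conversationId, data: [] }; // no memory yet: create one
    records.set(conversationId, record);
  }
  record.data.push({ role: params.role, content: params.content }); // append the message
  return { success: true, memories: [record] };
}
```

In this model, a second call with the same identifier (passed as either `conversationId` or `id`) extends the existing record instead of replacing it.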
### `memory_get`

Retrieve memories by `conversationId`.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | No | Conversation identifier (e.g., `user-123`, `session-abc`). Returns the memories for this conversation. |
| `id` | string | No | Legacy parameter for the conversation identifier, kept for backwards compatibility. Use `conversationId` instead. |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the memory was retrieved successfully |
| `memories` | array | Array of memory objects with `conversationId` and `data` fields |
| `message` | string | Success or error message |
| `error` | string | Error message if the operation failed |
### `memory_get_all`

Retrieve all memories in the current workspace.

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether all memories were retrieved successfully |
| `memories` | array | Array of all memory objects with `key`, `conversationId`, and `data` fields |
| `message` | string | Success or error message |
| `error` | string | Error message if the operation failed |

### `memory_delete`

Delete all memories for a `conversationId`.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `conversationId` | string | No | Conversation identifier (e.g., `user-123`, `session-abc`). Deletes all memories for this conversation. |
| `id` | string | No | Legacy parameter for the conversation identifier, kept for backwards compatibility. Use `conversationId` instead. |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the memory was deleted successfully |
| `message` | string | Confirmation or error message |
| `error` | string | Error message if the operation failed |

## Agent Memory Types
When using memory with an Agent block, you can configure how conversation history is managed:

| Type | Description |
| ---- | ----------- |
| **Full Conversation** | Stores all messages, limited by the model's context window (90% of the window is used, leaving room for the response) |
| **Sliding Window (Messages)** | Keeps the last N messages (default: 10) |
| **Sliding Window (Tokens)** | Keeps the most recent messages that fit within a token limit (default: 4,000) |
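The three strategies can be sketched as trimming functions over the stored history. These are hypothetical helpers, not Sim's code. They assume that the oldest messages are dropped first. They also estimate tokens with the rough 4-characters-per-token rule rather than the model's real tokenizer.

```typescript
// Hypothetical helpers illustrating the Agent memory types; not Sim's implementation.
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Rough estimate using ~4 characters per token (assumption; real code would use the model's tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sliding Window (Messages): keep only the last n messages.
function slidingWindowMessages(history: Message[], n = 10): Message[] {
  return n > 0 ? history.slice(-n) : [];
}

// Sliding Window (Tokens): walk back from the newest message, keeping messages while they fit.
function slidingWindowTokens(history: Message[], maxTokens = 4000): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break; // stop at the first message that would overflow
    used += cost;
    kept.unshift(history[i]);
  }
  return kept;
}

// Full Conversation: keep as much as fits in 90% of the model's context window,
// leaving the remaining 10% for the response.
function fullConversation(history: Message[], contextWindowTokens: number): Message[] {
  return slidingWindowTokens(history, Math.floor(contextWindowTokens * 0.9));
}
```

For example, with a 128,000-token context window, the Full Conversation budget in this sketch is 115,200 tokens.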
## Notes

- Memory is scoped per workspace: workflows in the same workspace share the memory store.
- Use unique `conversationId` values to keep conversations separate (e.g., session IDs, user IDs, or UUIDs).
- System messages belong in the Agent block configuration, not in memory.
- Category: `blocks`
- Type: `memory`
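The scoping rules in these notes can be modelled as a two-level map, keyed first by workspace and then by `conversationId`. `WorkspaceMemoryStore` is a hypothetical class for illustration only, not Sim's storage schema. It shows why two workflows in one workspace see the same conversation while another workspace does not. It also roughly mirrors the get, get-all, and delete operations.

```typescript
// Hypothetical model of workspace-scoped memory for illustration; not Sim's storage schema.
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

class WorkspaceMemoryStore {
  // Outer key: workspace ID. Inner key: conversationId.
  private readonly byWorkspace = new Map<string, Map<string, StoredMessage[]>>();

  add(workspaceId: string, conversationId: string, message: StoredMessage): void {
    let conversations = this.byWorkspace.get(workspaceId);
    if (!conversations) {
      conversations = new Map<string, StoredMessage[]>();
      this.byWorkspace.set(workspaceId, conversations);
    }
    // Append to the conversation, creating it on first use.
    const history = conversations.get(conversationId) ?? [];
    history.push(message);
    conversations.set(conversationId, history);
  }

  // Roughly like memory_get: returns [] when the conversation is absent from this workspace.
  get(workspaceId: string, conversationId: string): StoredMessage[] {
    return this.byWorkspace.get(workspaceId)?.get(conversationId) ?? [];
  }

  // Roughly like memory_get_all: every conversation in one workspace, nothing from others.
  getAll(workspaceId: string): { conversationId: string; data: StoredMessage[] }[] {
    const conversations = this.byWorkspace.get(workspaceId);
    if (!conversations) return [];
    return Array.from(conversations, ([conversationId, data]) => ({ conversationId, data }));
  }

  // Roughly like memory_delete: removes one conversation; returns false if it was not found.
  delete(workspaceId: string, conversationId: string): boolean {
    return this.byWorkspace.get(workspaceId)?.delete(conversationId) ?? false;
  }
}
```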