---
title: Mistral Parser
description: Extract text from PDF documents
---

import { BlockInfoCard } from "@/components/ui/block-info-card"

<BlockInfoCard
  type="mistral_parse_v2"
  color="#000000"
/>

{/* MANUAL-CONTENT-START:intro */}
The Mistral Parse tool extracts and processes content from PDF documents using [Mistral's OCR API](https://mistral.ai/). It uses optical character recognition to extract text and structure from PDF files, so you can incorporate document data into your agent workflows.

With the Mistral Parse tool, you can:

- **Extract text from PDFs**: Convert PDF content to text, Markdown, or JSON
- **Process PDFs from URLs**: Extract content directly from PDFs hosted online by providing their URLs
- **Maintain document structure**: Preserve formatting, tables, and layout from the original PDFs
- **Extract images**: Optionally include embedded images from the PDFs
- **Select specific pages**: Process only the pages you need from multi-page documents

The Mistral Parse tool is particularly useful when your agents need to work with PDF content, such as analyzing reports, extracting data from forms, or processing text from scanned documents. It makes information stored in PDFs available to your agents as readily as direct text input.
{/* MANUAL-CONTENT-END */}

## Usage Instructions

Integrate Mistral Parse into your workflow. It can extract text from uploaded PDF documents or from a PDF hosted at a URL.

## Tools

### `mistral_parser`

Parse PDF documents using the Mistral OCR API

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `filePath` | string | Yes | URL to a PDF document to be processed |
| `fileUpload` | object | No | File upload data from file-upload component |
| `resultType` | string | No | Type of parsed result \(markdown, text, or json\). Defaults to markdown. |
| `includeImageBase64` | boolean | No | Include base64-encoded images in the response |
| `pages` | array | No | Specific pages to process \(array of page numbers, starting from 0\) |
| `imageLimit` | number | No | Maximum number of images to extract from the PDF |
| `imageMinSize` | number | No | Minimum height and width of images to extract from the PDF |
| `apiKey` | string | Yes | Mistral API key \(MISTRAL_API_KEY\) |

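
As a rough illustration of the input parameters, the sketch below models them as a TypeScript type and checks the two required fields plus the zero-based page numbers. The names `MistralParserInput` and `validateInput` are hypothetical and are not exported by Sim; the URL and API key are placeholders.

```typescript
// Hypothetical type mirroring the input parameters; not an official Sim export.
type ResultType = "markdown" | "text" | "json";

interface MistralParserInput {
  filePath: string; // URL of the PDF to process (required)
  fileUpload?: object; // data from the file-upload component
  resultType?: ResultType; // defaults to "markdown" when omitted
  includeImageBase64?: boolean;
  pages?: number[]; // zero-based page numbers
  imageLimit?: number;
  imageMinSize?: number;
  apiKey: string; // Mistral API key (required)
}

// Illustrative validator: returns a list of problems, empty when the input looks usable.
function validateInput(input: MistralParserInput): string[] {
  const problems: string[] = [];
  if (input.filePath.trim() === "") problems.push("filePath is required");
  if (input.apiKey.trim() === "") problems.push("apiKey is required");
  if (input.pages !== undefined) {
    for (const page of input.pages) {
      // Page numbers must be non-negative integers because pages start at 0.
      if (Math.floor(page) !== page || page < 0) {
        problems.push(`invalid page number: ${page}`);
      }
    }
  }
  return problems;
}

const example: MistralParserInput = {
  filePath: "https://example.com/report.pdf", // placeholder URL
  resultType: "markdown",
  pages: [0, 1], // first two pages only
  apiKey: "YOUR_MISTRAL_API_KEY", // placeholder value
};

const problems = validateInput(example);
```

Checking inputs before the block runs is only one possible design; it assumes you would rather catch a missing key or a one-based page number locally than wait for an API error.
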
#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `pages` | array | Array of page objects from Mistral OCR |
| ↳ `index` | number | Page index \(zero-based\) |
| ↳ `markdown` | string | Extracted markdown content |
| ↳ `images` | array | Images extracted from this page with bounding boxes |
| ↳ `id` | string | Image identifier \(e.g., img-0.jpeg\) |
| ↳ `top_left_x` | number | Top-left X coordinate in pixels |
| ↳ `top_left_y` | number | Top-left Y coordinate in pixels |
| ↳ `bottom_right_x` | number | Bottom-right X coordinate in pixels |
| ↳ `bottom_right_y` | number | Bottom-right Y coordinate in pixels |
| ↳ `image_base64` | string | Base64-encoded image data \(when include_image_base64=true\) |
| ↳ `dimensions` | object | Page dimensions |
| ↳ `dpi` | number | Dots per inch |
| ↳ `height` | number | Page height in pixels |
| ↳ `width` | number | Page width in pixels |
| ↳ `tables` | array | Extracted tables as HTML/markdown \(when table_format is set\). Referenced via placeholders like \[tbl-0.html\] |
| ↳ `hyperlinks` | array | Array of URL strings detected in the page \(e.g., \["https://...", "mailto:..."\]\) |
| ↳ `header` | string | Page header content \(when extract_header=true\) |
| ↳ `footer` | string | Page footer content \(when extract_footer=true\) |
| `model` | string | Mistral OCR model identifier \(e.g., mistral-ocr-latest\) |
| `usage_info` | object | Usage and processing statistics |
| ↳ `pages_processed` | number | Total number of pages processed |
| ↳ `doc_size_bytes` | number | Document file size in bytes |
| `document_annotation` | string | Structured annotation data as JSON string \(when applicable\) |

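
To show how a downstream step might consume this output, here is a minimal sketch that joins the per-page markdown in page order and gathers the distinct hyperlinks. The types cover only a subset of the output fields, treating `hyperlinks` as optional is an assumption, and the names `OcrPage`, `MistralParserOutput`, `joinMarkdown`, and `collectLinks` are hypothetical. The sample values are made up for illustration.

```typescript
// Hypothetical types covering a subset of the output fields; not an official Sim export.
interface OcrPage {
  index: number; // zero-based page index
  markdown: string; // extracted markdown content
  hyperlinks?: string[]; // assumed optional in this sketch
}

interface MistralParserOutput {
  pages: OcrPage[];
  model: string;
  usage_info: { pages_processed: number; doc_size_bytes: number };
}

// Join the markdown of every page in page order, separated by blank lines.
function joinMarkdown(output: MistralParserOutput): string {
  return output.pages
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.index - b.index)
    .map((page) => page.markdown)
    .join("\n\n");
}

// Collect each distinct hyperlink across all pages, in first-seen order.
function collectLinks(output: MistralParserOutput): string[] {
  const found: string[] = [];
  for (const page of output.pages) {
    for (const link of page.hyperlinks ?? []) {
      if (found.indexOf(link) === -1) found.push(link);
    }
  }
  return found;
}

// Sample data shaped like the output; the values are invented.
const sample: MistralParserOutput = {
  pages: [
    { index: 1, markdown: "Second page", hyperlinks: ["https://example.com"] },
    { index: 0, markdown: "# Title", hyperlinks: ["https://example.com"] },
  ],
  model: "mistral-ocr-latest",
  usage_info: { pages_processed: 2, doc_size_bytes: 1024 },
};

const fullText = joinMarkdown(sample);
const allLinks = collectLinks(sample);
```

Sorting by `index` before joining guards against pages arriving out of order, which matters when only selected pages were requested.
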