feat(tools): added textract, added v2 for mistral, updated tag dropdown (#2904)

* feat(tools): added textract

* cleanup

* ack pr comments

* reorder

* removed upload for textract async version

* fix additional fields dropdown in editor, update parser to leave validation to be done on the server

* added mistral v2, files v2, and finalized textract

* updated the rest of the old file patterns, updated mistral outputs for v2

* updated tag dropdown to parse non-operation fields as well

* updated extension finder

* cleanup

* added description for inputs to workflow

* use helper for internal route check

* fix tag dropdown merge conflict change

* remove duplicate code

---------

Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
This commit is contained in:
Waleed
2026-01-20 18:41:26 -08:00
committed by GitHub
parent 1f1f015031
commit 563098ca0a
49 changed files with 3360 additions and 377 deletions

View File

@@ -6,7 +6,7 @@ description: Interact with Confluence
import { BlockInfoCard } from "@/components/ui/block-info-card"
<BlockInfoCard
type="confluence"
type="confluence_v2"
color="#E0E0E0"
/>

View File

@@ -6,7 +6,7 @@ description: Read and parse multiple files
import { BlockInfoCard } from "@/components/ui/block-info-card"
<BlockInfoCard
type="file"
type="file_v2"
color="#40916C"
/>
@@ -48,7 +48,7 @@ Parse one or more uploaded files or files from URLs (text, PDF, CSV, images, etc
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `files` | array | Array of parsed files |
| `combinedContent` | string | Combined content of all parsed files |
| `files` | array | Array of parsed files with content, metadata, and file properties |
| `combinedContent` | string | All file contents merged into a single text string |

View File

@@ -106,6 +106,7 @@
"supabase",
"tavily",
"telegram",
"textract",
"tinybird",
"translate",
"trello",

View File

@@ -6,7 +6,7 @@ description: Extract text from PDF documents
import { BlockInfoCard } from "@/components/ui/block-info-card"
<BlockInfoCard
type="mistral_parse"
type="mistral_parse_v2"
color="#000000"
/>
@@ -54,18 +54,37 @@ Parse PDF documents using Mistral OCR API
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the PDF was parsed successfully |
| `content` | string | Extracted content in the requested format \(markdown, text, or JSON\) |
| `metadata` | object | Processing metadata including jobId, fileType, pageCount, and usage info |
| ↳ `jobId` | string | Unique job identifier |
| ↳ `fileType` | string | File type \(e.g., pdf\) |
| ↳ `fileName` | string | Original file name |
| ↳ `source` | string | Source type \(url\) |
| ↳ `pageCount` | number | Number of pages processed |
| ↳ `model` | string | Mistral model used |
| ↳ `resultType` | string | Output format \(markdown, text, json\) |
| ↳ `processedAt` | string | Processing timestamp |
| ↳ `sourceUrl` | string | Source URL if applicable |
| ↳ `usageInfo` | object | Usage statistics from OCR processing |
| `pages` | array | Array of page objects from Mistral OCR |
| ↳ `index` | number | Page index \(zero-based\) |
| ↳ `markdown` | string | Extracted markdown content |
| ↳ `images` | array | Images extracted from this page with bounding boxes |
| ↳ `id` | string | Image identifier \(e.g., img-0.jpeg\) |
| ↳ `top_left_x` | number | Top-left X coordinate in pixels |
| ↳ `top_left_y` | number | Top-left Y coordinate in pixels |
| ↳ `bottom_right_x` | number | Bottom-right X coordinate in pixels |
| ↳ `bottom_right_y` | number | Bottom-right Y coordinate in pixels |
| ↳ `image_base64` | string | Base64-encoded image data \(when include_image_base64=true\) |
| ↳ `id` | string | Image identifier \(e.g., img-0.jpeg\) |
| ↳ `top_left_x` | number | Top-left X coordinate in pixels |
| ↳ `top_left_y` | number | Top-left Y coordinate in pixels |
| ↳ `bottom_right_x` | number | Bottom-right X coordinate in pixels |
| ↳ `bottom_right_y` | number | Bottom-right Y coordinate in pixels |
| ↳ `image_base64` | string | Base64-encoded image data \(when include_image_base64=true\) |
| ↳ `dimensions` | object | Page dimensions |
| ↳ `dpi` | number | Dots per inch |
| ↳ `height` | number | Page height in pixels |
| ↳ `width` | number | Page width in pixels |
| ↳ `dpi` | number | Dots per inch |
| ↳ `height` | number | Page height in pixels |
| ↳ `width` | number | Page width in pixels |
| ↳ `tables` | array | Extracted tables as HTML/markdown \(when table_format is set\). Referenced via placeholders like \[tbl-0.html\] |
| ↳ `hyperlinks` | array | Array of URL strings detected in the page \(e.g., \[ |
| ↳ `header` | string | Page header content \(when extract_header=true\) |
| ↳ `footer` | string | Page footer content \(when extract_footer=true\) |
| `model` | string | Mistral OCR model identifier \(e.g., mistral-ocr-latest\) |
| `usage_info` | object | Usage and processing statistics |
| ↳ `pages_processed` | number | Total number of pages processed |
| ↳ `doc_size_bytes` | number | Document file size in bytes |
| `document_annotation` | string | Structured annotation data as JSON string \(when applicable\) |

View File

@@ -58,6 +58,7 @@ Upload a file to an AWS S3 bucket
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `url` | string | URL of the uploaded S3 object |
| `uri` | string | S3 URI of the uploaded object \(s3://bucket/key\) |
| `metadata` | object | Upload metadata including ETag and location |
### `s3_get_object`
@@ -149,6 +150,7 @@ Copy an object within or between AWS S3 buckets
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `url` | string | URL of the copied S3 object |
| `uri` | string | S3 URI of the copied object \(s3://bucket/key\) |
| `metadata` | object | Copy operation metadata |

View File

@@ -0,0 +1,120 @@
---
title: AWS Textract
description: Extract text, tables, and forms from documents
---
import { BlockInfoCard } from "@/components/ui/block-info-card"
<BlockInfoCard
type="textract"
color="linear-gradient(135deg, #055F4E 0%, #56C0A7 100%)"
/>
{/* MANUAL-CONTENT-START:intro */}
[AWS Textract](https://aws.amazon.com/textract/) is a powerful AI service from Amazon Web Services designed to automatically extract printed text, handwriting, tables, forms, key-value pairs, and other structured data from scanned documents and images. Textract leverages advanced optical character recognition (OCR) and document analysis to transform documents into actionable data, enabling automation, analytics, compliance, and more.
With AWS Textract, you can:
- **Extract text from images and documents**: Recognize printed text and handwriting in formats such as PDF, JPEG, PNG, or TIFF
- **Detect and extract tables**: Automatically find tables and output their structured content
- **Parse forms and key-value pairs**: Pull structured data from forms, including fields and their corresponding values
- **Identify signatures and layout features**: Detect signatures, geometric layout, and relationships between document elements
- **Customize extraction with queries**: Extract specific fields and answers using query-based extraction (e.g., "What is the invoice number?")
In Sim, the AWS Textract integration empowers your agents to intelligently process documents as part of their workflows. This unlocks automation scenarios such as data entry from invoices, onboarding documents, contracts, receipts, and more. Your agents can extract relevant data, analyze structured forms, and generate summaries or reports directly from document uploads or URLs. By connecting Sim with AWS Textract, you can reduce manual effort, improve data accuracy, and streamline your business processes with robust document understanding.
{/* MANUAL-CONTENT-END */}
## Usage Instructions
Integrate AWS Textract into your workflow to extract text, tables, forms, and key-value pairs from documents. Single-page mode supports JPEG, PNG, and single-page PDF. Multi-page mode supports multi-page PDF and TIFF.
## Tools
### `textract_parser`
Parse documents using AWS Textract OCR and document analysis
#### Input
| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `accessKeyId` | string | Yes | AWS Access Key ID |
| `secretAccessKey` | string | Yes | AWS Secret Access Key |
| `region` | string | Yes | AWS region for Textract service \(e.g., us-east-1\) |
| `processingMode` | string | No | Document type: single-page or multi-page. Defaults to single-page. |
| `filePath` | string | No | URL to a document to be processed \(JPEG, PNG, or single-page PDF\). |
| `s3Uri` | string | No | S3 URI for multi-page processing \(s3://bucket/key\). |
| `fileUpload` | object | No | File upload data from file-upload component |
| `featureTypes` | array | No | Feature types to detect: TABLES, FORMS, QUERIES, SIGNATURES, LAYOUT. If not specified, only text detection is performed. |
| `items` | string | No | Feature type |
| `queries` | array | No | Custom queries to extract specific information. Only used when featureTypes includes QUERIES. |
| `items` | object | No | Query configuration |
| `properties` | string | No | The query text |
| `Text` | string | No | No description |
| `Alias` | string | No | No description |
#### Output
| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `blocks` | array | Array of Block objects containing detected text, tables, forms, and other elements |
| ↳ `BlockType` | string | Type of block \(PAGE, LINE, WORD, TABLE, CELL, KEY_VALUE_SET, etc.\) |
| ↳ `Id` | string | Unique identifier for the block |
| ↳ `Text` | string | Query text |
| ↳ `TextType` | string | Type of text \(PRINTED or HANDWRITING\) |
| ↳ `Confidence` | number | Confidence score \(0-100\) |
| ↳ `Page` | number | Page number |
| ↳ `Geometry` | object | Location and bounding box information |
| ↳ `BoundingBox` | object | Height as ratio of document height |
| ↳ `Height` | number | Height as ratio of document height |
| ↳ `Left` | number | Left position as ratio of document width |
| ↳ `Top` | number | Top position as ratio of document height |
| ↳ `Width` | number | Width as ratio of document width |
| ↳ `Height` | number | Height as ratio of document height |
| ↳ `Left` | number | Left position as ratio of document width |
| ↳ `Top` | number | Top position as ratio of document height |
| ↳ `Width` | number | Width as ratio of document width |
| ↳ `Polygon` | array | Polygon coordinates |
| ↳ `X` | number | X coordinate |
| ↳ `Y` | number | Y coordinate |
| ↳ `X` | number | X coordinate |
| ↳ `Y` | number | Y coordinate |
| ↳ `BoundingBox` | object | Height as ratio of document height |
| ↳ `Height` | number | Height as ratio of document height |
| ↳ `Left` | number | Left position as ratio of document width |
| ↳ `Top` | number | Top position as ratio of document height |
| ↳ `Width` | number | Width as ratio of document width |
| ↳ `Height` | number | Height as ratio of document height |
| ↳ `Left` | number | Left position as ratio of document width |
| ↳ `Top` | number | Top position as ratio of document height |
| ↳ `Width` | number | Width as ratio of document width |
| ↳ `Polygon` | array | Polygon coordinates |
| ↳ `X` | number | X coordinate |
| ↳ `Y` | number | Y coordinate |
| ↳ `X` | number | X coordinate |
| ↳ `Y` | number | Y coordinate |
| ↳ `Relationships` | array | Relationships to other blocks |
| ↳ `Type` | string | Relationship type \(CHILD, VALUE, ANSWER, etc.\) |
| ↳ `Ids` | array | IDs of related blocks |
| ↳ `Type` | string | Relationship type \(CHILD, VALUE, ANSWER, etc.\) |
| ↳ `Ids` | array | IDs of related blocks |
| ↳ `EntityTypes` | array | Entity types for KEY_VALUE_SET \(KEY or VALUE\) |
| ↳ `SelectionStatus` | string | For checkboxes: SELECTED or NOT_SELECTED |
| ↳ `RowIndex` | number | Row index for table cells |
| ↳ `ColumnIndex` | number | Column index for table cells |
| ↳ `RowSpan` | number | Row span for merged cells |
| ↳ `ColumnSpan` | number | Column span for merged cells |
| ↳ `Query` | object | Query information for QUERY blocks |
| ↳ `Text` | string | Query text |
| ↳ `Alias` | string | Query alias |
| ↳ `Pages` | array | Pages to search |
| ↳ `Alias` | string | Query alias |
| ↳ `Pages` | array | Pages to search |
| `documentMetadata` | object | Metadata about the analyzed document |
| ↳ `pages` | number | Number of pages in the document |
| `modelVersion` | string | Version of the Textract model used for processing |

View File

@@ -6,7 +6,7 @@ description: Generate videos from text using AI
import { BlockInfoCard } from "@/components/ui/block-info-card"
<BlockInfoCard
type="video_generator"
type="video_generator_v2"
color="#181C1E"
/>