<!-- Clearly explain the need for these changes: -->
The generated block docs didn't match the structure the GitBook docs site expects; we hit those mismatches when merging, and this PR fixes them.
### Changes 🏗️
Updates the generator's output paths to match the docs site layout and adds new provider guides.
<!-- Concisely describe all of the changes made in this pull request: -->
- Generator now outputs to `docs/integrations/block-integrations` and writes the overview `README.md` and `SUMMARY.md` at `docs/integrations/`
- Adds GitBook frontmatter and `SUMMARY.md` navigation generation
- Adds `guides/llm-providers.md` and `guides/voice-providers.md`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Deploy the docs and validate the rendered output
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Aligns block integrations documentation with GitBook.
>
> - Changes generator default output to `docs/integrations/block-integrations` and writes overview `README.md` and `SUMMARY.md` at `docs/integrations/`
> - Adds GitBook frontmatter and hint syntax to overview; prefixes block links with `block-integrations/`
> - Introduces `generate_summary_md` to build GitBook navigation (including optional `guides/`)
> - Preserves per-block manual sections and adds optional `extras` + file-level `additional_content`
> - Updates sync checker to validate parent `README.md` and `SUMMARY.md`
> - Rewrites `docs/integrations/README.md` with GitBook frontmatter and updated links; adds `docs/integrations/SUMMARY.md`
> - Adds new guides: `guides/llm-providers.md`, `guides/voice-providers.md`
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit fdb7ff8111. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: bobby.gaffin <bobby.gaffin@agpt.co>
# Firecrawl Extract

Blocks for extracting structured data from web pages using Firecrawl's AI extraction.
## What it is
Firecrawl crawls websites to extract comprehensive data while bypassing blockers.
## How it works
This block uses Firecrawl's extraction API to pull structured data from web pages based on a prompt or schema. It crawls the specified URLs and uses AI to extract information matching your requirements.
Define the data structure you want using a JSON schema for precise extraction, or use natural language prompts for flexible extraction. Wildcards in URLs allow extracting data from multiple pages matching a pattern.
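To make the schema option concrete, here is a hedged sketch of an `output_schema` value, written as the Python dict the block's `Dict[str, Any]` input expects. The field names (`products`, `name`, `price`, `rating`) are invented for this example and are not part of the block's API; any valid JSON Schema should work.

```python
# Hypothetical JSON Schema for extracting product listings.
# Field names ("products", "name", "price", "rating") are
# illustrative, not required by the block.
product_schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "rating": {"type": "number"},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["products"],
}
```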
## Inputs
| Input | Description | Type | Required |
|---|---|---|---|
| urls | The URLs to crawl; at least one is required. Wildcards (`/*`) are supported. | List[str] | Yes |
| prompt | Natural-language prompt describing what to extract | str | No |
| output_schema | A JSON Schema describing the output structure, if a more rigid structure is desired | Dict[str, Any] | No |
| enable_web_search | When true, extraction can follow links outside the specified domain | bool | No |
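Putting the inputs together, a sketch of one possible input set, reusing the schema above. The domain, prompt text, and flag values are placeholders for illustration, not defaults from the block:

```python
# Illustrative inputs for the block; the domain is a placeholder.
inputs = {
    # Wildcard matches every page under /products/ on this domain
    "urls": ["https://example-shop.com/products/*"],
    "prompt": "Extract the name, price, and rating of every product.",
    "output_schema": product_schema,  # the schema sketched above
    "enable_web_search": False,  # stay within the specified domain
}
```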
## Outputs
| Output | Description | Type |
|---|---|---|
| error | Error message if the extraction failed | str |
| data | The structured data extracted from the crawled pages | Dict[str, Any] |
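For the schema sketched earlier, a successful run could produce a `data` payload shaped like the following; all values here are fabricated for illustration:

```python
# Example shape of the `data` output for the schema above;
# every value is invented for this example.
data = {
    "products": [
        {"name": "Wireless Mouse", "price": 24.99, "rating": 4.5},
        {"name": "USB-C Hub", "price": 39.00, "rating": 4.2},
    ]
}
```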
## Possible use case
- **Product Data Extraction**: Extract structured product information (prices, specs, reviews) from e-commerce sites.
- **Contact Scraping**: Pull business contact information from company websites in a structured format.
- **Data Pipeline Input**: Automatically extract and structure web data for analysis or database population.