AutoGPT/docs/integrations/block-integrations/firecrawl/extract.md
Nicholas Tindle 90466908a8 refactor(docs): restructure platform docs for GitBook and remove MkDo… (#11825)
2026-01-23 06:18:16 +00:00


# Firecrawl Extract
<!-- MANUAL: file_description -->
Blocks for extracting structured data from web pages using Firecrawl's AI extraction.
<!-- END MANUAL -->
## Firecrawl Extract
### What it is
Firecrawl crawls websites to extract comprehensive data while bypassing blockers.
### How it works
<!-- MANUAL: how_it_works -->
This block uses Firecrawl's extraction API to pull structured data from web pages based on a prompt or schema. It crawls the specified URLs and uses AI to extract information matching your requirements.
Define the data structure you want using a JSON schema for precise extraction, or use natural language prompts for flexible extraction. Wildcards in URLs allow extracting data from multiple pages matching a pattern.
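
As a rough illustration of the two input styles described above, the sketch below shows one schema-based and one prompt-based set of inputs. The field names (`urls`, `prompt`, `output_schema`) follow this block's input table; the URL, the schema contents, and the variable names are hypothetical placeholders, not values taken from Firecrawl or AutoGPT.

```python
# Hypothetical example values for illustration only.

# A JSON Schema describing the structure we would like back for each product.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

# Schema-based extraction: rigid output structure, wildcard URL pattern.
schema_based_inputs = {
    "urls": ["https://example.com/products/*"],
    "output_schema": product_schema,
}

# Prompt-based extraction: flexible, described in natural language.
prompt_based_inputs = {
    "urls": ["https://example.com/products/*"],
    "prompt": "Extract each product's name, price, and availability.",
}
```

A schema is likely the better fit when downstream blocks expect fixed keys, while a prompt may be enough for exploratory extraction.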
<!-- END MANUAL -->
### Inputs
| Input | Description | Type | Required |
|-------|-------------|------|----------|
| urls | The URLs to crawl; at least one is required. Wildcards (`/*`) are supported. | List[str] | Yes |
| prompt | The prompt to use for the crawl | str | No |
| output_schema | A JSON Schema describing the output structure, for cases where a more rigid structure is desired. | Dict[str, Any] | No |
| enable_web_search | When true, extraction can follow links outside the specified domain. | bool | No |
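
The table above can be read as a small contract: `urls` is required and non-empty, and everything else is optional. A minimal sketch of that contract follows. `build_extract_inputs` is a hypothetical helper written for this page, not part of AutoGPT or the Firecrawl SDK, and it assumes the inputs are passed around as a plain dictionary.

```python
from typing import Any, Dict, List, Optional


def build_extract_inputs(
    urls: List[str],
    prompt: Optional[str] = None,
    output_schema: Optional[Dict[str, Any]] = None,
    enable_web_search: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the inputs listed in the table.

    Optional fields are left out when not provided, so no default
    value is assumed on the block's behalf.
    """
    if not urls:
        raise ValueError("urls must contain at least one URL")

    inputs: Dict[str, Any] = {"urls": list(urls)}
    if prompt is not None:
        inputs["prompt"] = prompt
    if output_schema is not None:
        inputs["output_schema"] = output_schema
    if enable_web_search is not None:
        inputs["enable_web_search"] = enable_web_search
    return inputs
```

For example, `build_extract_inputs(["https://example.com/blog/*"], prompt="List post titles")` yields a dictionary with only `urls` and `prompt` set.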
### Outputs
| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the extraction failed | str |
| data | The result of the crawl | Dict[str, Any] |
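
Downstream logic will usually want to check `error` before reading `data`. The sketch below shows one way to do that, assuming the block's outputs have been collected into a dictionary keyed by output name. Both that shape and the `unwrap_extract_result` helper are assumptions made for illustration, not an AutoGPT API.

```python
from typing import Any, Dict


def unwrap_extract_result(outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return `data`, or raise if `error` is set."""
    error = outputs.get("error")
    if error:
        raise RuntimeError(f"Firecrawl extract failed: {error}")
    # Fall back to an empty dict when no data was produced.
    return outputs.get("data", {})
```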
### Possible use case
<!-- MANUAL: use_case -->
**Product Data Extraction**: Extract structured product information (prices, specs, reviews) from e-commerce sites.

**Contact Scraping**: Pull business contact information from company websites in a structured format.

**Data Pipeline Input**: Automatically extract and structure web data for analysis or database population.
<!-- END MANUAL -->
---