<!-- Clearly explain the need for these changes: -->
We ran into a few mismatches with the docs site when merging the generated docs in; this fixes them.
### Changes 🏗️
Updates paths and adds some guides.
<!-- Concisely describe all of the changes made in this pull request: -->
Updates the generated docs to match the actual structure of the docs site.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] deploy it and validate
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Aligns block integrations documentation with GitBook.
>
> - Changes generator default output to `docs/integrations/block-integrations` and writes overview `README.md` and `SUMMARY.md` at `docs/integrations/`
> - Adds GitBook frontmatter and hint syntax to overview; prefixes block links with `block-integrations/`
> - Introduces `generate_summary_md` to build GitBook navigation (including optional `guides/`)
> - Preserves per-block manual sections and adds optional `extras` + file-level `additional_content`
> - Updates sync checker to validate parent `README.md` and `SUMMARY.md`
> - Rewrites `docs/integrations/README.md` with GitBook frontmatter and updated links; adds `docs/integrations/SUMMARY.md`
> - Adds new guides: `guides/llm-providers.md`, `guides/voice-providers.md`
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit fdb7ff8111. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: bobby.gaffin <bobby.gaffin@agpt.co>
# Firecrawl Extract

<!-- MANUAL: file_description -->
Blocks for extracting structured data from web pages using Firecrawl's AI extraction.
<!-- END MANUAL -->

## Firecrawl Extract

### What it is

Firecrawl crawls websites to extract comprehensive data while bypassing blockers.

### How it works

<!-- MANUAL: how_it_works -->
This block uses Firecrawl's extraction API to pull structured data from web pages based on a prompt or schema. It crawls the specified URLs and uses AI to extract information matching your requirements.

Define the data structure you want using a JSON Schema for precise extraction, or use natural language prompts for flexible extraction. Wildcards in URLs allow extracting data from multiple pages matching a pattern.
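
For illustration, the two styles look roughly like this when written out as block inputs. The URLs, prompt text, and schema below are invented examples; only the field names (`urls`, `prompt`, `output_schema`) correspond to the block's actual inputs listed further down.

```python
# Invented example values; only the field names match the block's inputs.

# Flexible, prompt-driven extraction:
prompt_inputs = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract the name and monthly price of every plan on this page.",
}

# Precise, schema-driven extraction (output_schema is an ordinary JSON Schema):
schema_inputs = {
    "urls": ["https://example.com/pricing"],
    "output_schema": {
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "monthly_price": {"type": "number"},
                    },
                },
            }
        },
    },
}
```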
<!-- END MANUAL -->

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| urls | The URLs to crawl - at least one is required. Wildcards (`/*`) are supported. | List[str] | Yes |
| prompt | The prompt to use for the crawl | str | No |
| output_schema | A JSON Schema describing the output structure, if a more rigid structure is desired | Dict[str, Any] | No |
| enable_web_search | When true, extraction can follow links outside the specified domain | bool | No |

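As a sketch of the wildcard and web-search options (the values are hypothetical; the input names are the ones documented above):

```python
# Hypothetical values; input names are the ones documented above.
wildcard_inputs = {
    "urls": ["https://example.com/blog/*"],  # "/*" matches every page under /blog/
    "prompt": "Collect the title and author of each blog post.",
    "enable_web_search": True,               # lets extraction follow links beyond example.com
}
```
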
### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the extraction failed | str |
| data | The result of the crawl | Dict[str, Any] |

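On success, `data` holds a dictionary shaped by your prompt or schema; on failure, `error` carries the message instead. A purely hypothetical result for the pricing-plan schema sketched earlier:

```python
# Purely hypothetical output; real contents depend on the crawled pages
# and the prompt or output_schema you supplied.
data = {
    "plans": [
        {"name": "Starter", "monthly_price": 0},
        {"name": "Pro", "monthly_price": 29},
    ]
}

error = "Extraction failed: ..."  # only emitted when the extraction fails
```
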
### Possible use case

<!-- MANUAL: use_case -->
**Product Data Extraction**: Extract structured product information (prices, specs, reviews) from e-commerce sites.

**Contact Scraping**: Pull business contact information from company websites in a structured format.

**Data Pipeline Input**: Automatically extract and structure web data for analysis or database population.
<!-- END MANUAL -->

---