# Firecrawl Scrape

Blocks for scraping individual web pages and extracting content using Firecrawl.

## Firecrawl Scrape

### What it is

Firecrawl scrapes a single web page to extract comprehensive data while bypassing anti-bot blockers.

### How it works

This block uses Firecrawl's scraping API to extract content from a single URL. It handles JavaScript rendering, bypasses anti-bot measures, and can return content in multiple formats including markdown, HTML, and screenshots.

Configure output formats, filter to main content only, and set wait times for dynamic pages. The block returns comprehensive results including extracted content, links found on the page, and optional change tracking data.
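
Under the hood, this corresponds to a single request against Firecrawl's scrape API. The sketch below shows roughly what that request looks like if made directly (assuming the v1 REST endpoint, the `requests` library, and a `FIRECRAWL_API_KEY` environment variable; the block issues the equivalent call for you):

```python
import os

import requests

# Rough sketch of a direct Firecrawl scrape request, for illustration only.
payload = {
    "url": "https://example.com/article",   # the page to scrape
    "formats": ["markdown", "links"],       # which outputs to return
    "onlyMainContent": True,                # drop headers, navs, footers
    "waitFor": 2000,                        # let a dynamic page load for 2s
}

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",  # assumed v1 endpoint
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    timeout=60,
)
resp.raise_for_status()

data = resp.json()["data"]                  # results keyed by format
print(data["markdown"][:200])
print(len(data["links"]), "links found")
```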

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| url | The URL to scrape | str | Yes |
| limit | The number of pages to crawl | int | No |
| only_main_content | Only return the main content of the page, excluding headers, navs, footers, etc. | bool | No |
| max_age | The maximum age of the page in milliseconds (default: 1 hour) | int | No |
| wait_for | Delay in milliseconds before fetching the content, allowing the page sufficient time to load | int | No |
| formats | The output formats to return | List["markdown" \| "html" \| "rawHtml" \| "links" \| "screenshot" \| "screenshot@fullPage" \| "json" \| "changeTracking"] | No |
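
For illustration, a typical set of input values might look like the following (field names are from the table above; the URL and timing values are placeholders):

```python
# Placeholder input values for this block; only `url` is required.
scrape_inputs = {
    "url": "https://example.com/pricing",
    "only_main_content": True,        # strip page chrome
    "wait_for": 3000,                 # allow 3s for client-side rendering
    "formats": ["markdown", "html", "links"],
}
```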

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the scrape failed | str |
| data | The full result of the scrape | Dict[str, Any] |
| markdown | The scraped content as markdown | str |
| html | The scraped content as processed HTML | str |
| raw_html | The raw, unprocessed HTML of the page | str |
| links | The links found on the page | List[str] |
| screenshot | A screenshot of the page | str |
| screenshot_full_page | A full-page screenshot | str |
| json_data | Structured JSON data extracted from the page | Dict[str, Any] |
| change_tracking | Change tracking data for the page | Dict[str, Any] |
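
A minimal sketch of consuming these outputs, assuming they arrive as a plain dict keyed as in the table above:

```python
from typing import Any


def handle_scrape(outputs: dict[str, Any]) -> str:
    # Check for failure first; `error` is only set when the scrape failed.
    if outputs.get("error"):
        raise RuntimeError(f"Scrape failed: {outputs['error']}")

    for link in outputs.get("links", []):
        print("found link:", link)

    # Fall back to the full result dict if markdown was not requested.
    return outputs.get("markdown") or str(outputs.get("data", {}))
```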

### Possible use case

**Article Extraction:** Scrape news articles or blog posts to extract clean, readable content.

**Price Monitoring:** Regularly scrape product pages to track price changes over time.

**Content Backup:** Create markdown backups of important web pages for offline reference.
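
As a toy sketch of the price-monitoring idea, one could hash each scrape's markdown and alert on changes (`scrape_page` is a hypothetical helper returning a page's markdown, e.g. built on the request sketch above; the block's `changeTracking` format offers a built-in alternative):

```python
import hashlib
import time


def watch_page(url: str, interval_s: int = 3600) -> None:
    """Re-scrape `url` on a schedule and report content changes."""
    last_digest = None
    while True:
        markdown = scrape_page(url)  # hypothetical helper returning markdown
        digest = hashlib.sha256(markdown.encode()).hexdigest()
        if last_digest and digest != last_digest:
            print(f"{url} changed at {time.ctime()}")
        last_digest = digest
        time.sleep(interval_s)
```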