- Add generate_block_docs.py script that introspects block code to generate markdown
- Support manual content preservation via <!-- MANUAL: --> markers (see the sketch after this list)
- Add migrate_block_docs.py to preserve existing manual content from git HEAD
- Add CI workflow (docs-block-sync.yml) to fail if docs drift from code
- Add Claude PR review workflow (docs-claude-review.yml) for doc changes
- Add manual LLM enhancement workflow (docs-enhance.yml)
- Add GitBook configuration (.gitbook.yaml, SUMMARY.md)
- Fix non-deterministic category ordering (categories is a set)
- Add comprehensive test suite (32 tests)
- Generate docs for 444 blocks with 66 preserved manual sections
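
A rough sketch of the marker-based preservation idea (the closing-tag syntax and helper names here are illustrative, not the script's actual API):

```python
import re

# Hypothetical marker pair; the real script's closing-tag syntax may differ.
MANUAL_RE = re.compile(
    r"<!-- MANUAL: (?P<name>[\w-]+) -->\n(?P<body>.*?)\n<!-- /MANUAL -->",
    re.DOTALL,
)

def extract_manual_sections(markdown: str) -> dict[str, str]:
    """Collect hand-written sections keyed by their marker name."""
    return {m["name"]: m["body"] for m in MANUAL_RE.finditer(markdown)}

def merge_manual_sections(generated: str, preserved: dict[str, str]) -> str:
    """Re-insert preserved bodies into freshly generated markdown."""
    def fill(m: re.Match) -> str:
        body = preserved.get(m["name"], m["body"])
        return f"<!-- MANUAL: {m['name']} -->\n{body}\n<!-- /MANUAL -->"
    return MANUAL_RE.sub(fill, generated)

# Usage: carry manual notes from the old revision into the regenerated doc.
old_doc = "<!-- MANUAL: notes -->\nKeep this hand-written tip.\n<!-- /MANUAL -->"
regenerated = "<!-- MANUAL: notes -->\n\n<!-- /MANUAL -->"
print(merge_manual_sections(regenerated, extract_manual_sections(old_doc)))
```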
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
<!-- Clearly explain the need for these changes: -->
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request: -->
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Extensively test code generation for the docs pages
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces an automated documentation pipeline for blocks and integrates it into CI.
>
> - Adds `scripts/generate_block_docs.py` (+ tests) to introspect blocks and generate `docs/integrations/**`, preserving `<!-- MANUAL: -->` sections
> - New CI workflows: **docs-block-sync** (fails if docs drift), **docs-claude-review** (AI review for block/docs PRs), and **docs-enhance** (optional LLM improvements)
> - Updates existing Claude workflows to use `CLAUDE_CODE_OAUTH_TOKEN` instead of `ANTHROPIC_API_KEY`
> - Improves numerous block descriptions/typos and links across backend blocks to standardize docs output
> - Commits initial generated docs including `docs/integrations/README.md` and many provider/category pages
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 631e53e0f6. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
# Firecrawl Scrape

Blocks for scraping individual web pages and extracting content using Firecrawl.

## Firecrawl Scrape

### What it is

Firecrawl scrapes a single web page to extract comprehensive data while bypassing anti-bot blockers.

### How it works

This block uses Firecrawl's scraping API to extract content from a single URL. It handles JavaScript rendering, bypasses anti-bot measures, and can return content in multiple formats including markdown, HTML, and screenshots.
Configure output formats, filter to main content only, and set wait times for dynamic pages. The block returns comprehensive results including extracted content, links found on the page, and optional change tracking data.
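
For reference, the underlying call is roughly equivalent to a direct request against Firecrawl's v1 scrape endpoint. This is a minimal sketch, not the block's actual implementation; the field names follow Firecrawl's public API docs, and the API key is a placeholder:

```python
import requests

API_KEY = "fc-..."  # placeholder; use your Firecrawl API key

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/article",
        "formats": ["markdown", "links", "screenshot"],
        "onlyMainContent": True,  # strip headers, navs, footers
        "waitFor": 2000,          # ms to let dynamic content load
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json().get("data", {})
print(data.get("markdown", "")[:200])
```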
### Inputs

| Input | Description | Type | Required |
|---|---|---|---|
| url | The URL to scrape | str | Yes |
| limit | The number of pages to crawl | int | No |
| only_main_content | Only return the main content of the page, excluding headers, navs, footers, etc. | bool | No |
| max_age | The maximum acceptable age of a cached page in milliseconds; defaults to 1 hour | int | No |
| wait_for | Delay in milliseconds before fetching the content, allowing the page sufficient time to load | int | No |
| formats | The output formats to return | List["markdown" \| "html" \| "rawHtml" \| "links" \| "screenshot" \| "screenshot@fullPage" \| "json" \| "changeTracking"] | No |
### Outputs

| Output | Description | Type |
|---|---|---|
| error | Error message if the scrape failed | str |
| data | The full result payload of the scrape | Dict[str, Any] |
| markdown | Markdown content of the scraped page | str |
| html | Processed HTML content of the scraped page | str |
| raw_html | Raw HTML of the scraped page | str |
| links | Links found on the page | List[str] |
| screenshot | Screenshot of the page | str |
| screenshot_full_page | Full-page screenshot of the page | str |
| json_data | Structured JSON data extracted from the page | Dict[str, Any] |
| change_tracking | Change-tracking data for the page | Dict[str, Any] |
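
The camelCase fields in the API response map onto the snake_case outputs above. A hedged sketch of that fan-out (the block's real implementation may name or structure things differently):

```python
from typing import Any

def to_outputs(data: dict[str, Any]) -> dict[str, Any]:
    """Map a Firecrawl scrape response onto the block's outputs."""
    return {
        "markdown": data.get("markdown"),
        "html": data.get("html"),
        "raw_html": data.get("rawHtml"),
        "links": data.get("links", []),
        # Assumption: "screenshot@fullPage" results also arrive under "screenshot".
        "screenshot": data.get("screenshot"),
        "json_data": data.get("json"),
        "change_tracking": data.get("changeTracking"),
        "data": data,  # the full payload is also exposed as-is
    }
```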
### Possible use case

- **Article Extraction:** Scrape news articles or blog posts to extract clean, readable content.
- **Price Monitoring:** Regularly scrape product pages to track price changes over time.
- **Content Backup:** Create markdown backups of important web pages for offline reference.