- Add generate_block_docs.py script that introspects block code to generate markdown
- Support manual content preservation via <!-- MANUAL: --> markers
- Add migrate_block_docs.py to preserve existing manual content from git HEAD
- Add CI workflow (docs-block-sync.yml) to fail if docs drift from code
- Add Claude PR review workflow (docs-claude-review.yml) for doc changes
- Add manual LLM enhancement workflow (docs-enhance.yml)
- Add GitBook configuration (.gitbook.yaml, SUMMARY.md)
- Fix non-deterministic category ordering (categories is a set)
- Add comprehensive test suite (32 tests)
- Generate docs for 444 blocks with 66 preserved manual sections
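
For reference, the preservation mechanism works roughly like this: hand-written text wrapped in the markers survives regeneration while everything outside them is rewritten from code. The exact closing-tag shape shown here is an assumption based on the `<!-- MANUAL: -->` bullet above, not a confirmed syntax:

```markdown
<!-- MANUAL: use-cases -->
Hand-written notes placed here are carried over
when generate_block_docs.py rewrites this page.
<!-- END MANUAL -->
```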
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
<!-- Clearly explain the need for these changes: -->
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request: -->
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Extensively test code generation for the docs pages
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces an automated documentation pipeline for blocks and integrates it into CI.
>
> - Adds `scripts/generate_block_docs.py` (+ tests) to introspect blocks and generate `docs/integrations/**`, preserving `<!-- MANUAL: -->` sections
> - New CI workflows: **docs-block-sync** (fails if docs drift), **docs-claude-review** (AI review for block/docs PRs), and **docs-enhance** (optional LLM improvements)
> - Updates existing Claude workflows to use `CLAUDE_CODE_OAUTH_TOKEN` instead of `ANTHROPIC_API_KEY`
> - Fixes numerous typos and improves block descriptions and links across backend blocks to standardize docs output
> - Commits initial generated docs, including `docs/integrations/README.md` and many provider/category pages
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 631e53e0f6. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
# Exa Contents

Blocks for retrieving and extracting content from web pages using Exa's contents API.

## Exa Contents

### What it is

Retrieves document contents using Exa's contents API.
### How it works
This block retrieves full content from web pages using Exa's contents API. You can provide URLs directly or document IDs from previous searches. The API supports live crawling to fetch fresh content and can extract text, highlights, and AI-generated summaries.
The block supports subpage crawling to gather related content and offers various content retrieval options including full text extraction, relevant highlights, and customizable summary generation. Results are formatted for easy use with LLMs.
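
As a rough illustration of what the block does under the hood, here is a minimal sketch of building and sending a contents request. The endpoint URL, `x-api-key` header, and field names are assumptions based on Exa's public REST API and the block inputs described below, not the block's actual implementation:

```python
import json
import os
import urllib.request

API_URL = "https://api.exa.ai/contents"  # assumed Exa contents endpoint


def build_contents_payload(urls, text=True, livecrawl="fallback", subpages=0):
    """Build a request body mirroring the block's inputs (sketch only)."""
    payload = {"urls": urls, "text": text, "livecrawl": livecrawl}
    if subpages:
        payload["subpages"] = subpages
    return payload


def fetch_contents(payload, api_key):
    """POST the payload to Exa; needs a real API key and network access."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_contents_payload(["https://example.com"], livecrawl="always")
print(json.dumps(payload, indent=2))
```

Only the payload is constructed here; `fetch_contents` is left uncalled so the sketch runs without credentials.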
### Inputs
| Input | Description | Type | Required |
|---|---|---|---|
| urls | Array of URLs to crawl (preferred over 'ids') | List[str] | No |
| ids | [DEPRECATED - use 'urls' instead] Array of document IDs obtained from searches | List[str] | No |
| text | Retrieve text content from pages | bool | No |
| highlights | The most relevant text snippets from each page | HighlightSettings | No |
| summary | LLM-generated summary of the webpage | SummarySettings | No |
| livecrawl | Livecrawling options: never, fallback (default), always, preferred | "never" \| "fallback" \| "always" \| "preferred" | No |
| livecrawl_timeout | Timeout for livecrawling in milliseconds | int | No |
| subpages | Number of subpages to crawl | int | No |
| subpage_target | Keyword(s) to find specific subpages of search results | str \| List[str] | No |
| extras | Extra parameters for additional content | ExtrasSettings | No |
### Outputs
| Output | Description | Type |
|---|---|---|
| error | Error message if the request failed | str |
| results | List of document contents with metadata | List[ExaSearchResults] |
| result | Single document content result | ExaSearchResults |
| context | A formatted string of the results ready for LLMs | str |
| request_id | Unique identifier for the request | str |
| statuses | Status information for each requested URL | List[ContentStatus] |
| cost_dollars | Cost breakdown for the request | CostDollars |
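
The `context` output is the piece most commonly wired straight into an LLM prompt. A small sketch of consuming the outputs follows; the `response` dict is invented sample data shaped like the table above, not a real API response:

```python
# Invented sample of the block's outputs, for illustration only.
response = {
    "results": [
        {"url": "https://example.com", "title": "Example", "text": "Hello world."},
    ],
    "context": "Example (https://example.com): Hello world.",
    "request_id": "abc123",
}


def summarize_outputs(outputs):
    """Pull the LLM-ready context string and per-result titles."""
    titles = [r.get("title", "") for r in outputs.get("results", [])]
    return outputs.get("context", ""), titles


context, titles = summarize_outputs(response)
print(titles)
```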
### Possible use cases

- **Content Aggregation:** Retrieve full article content from multiple URLs for analysis or summarization.
- **Competitive Research:** Crawl competitor websites to extract product information, pricing, or feature details.
- **Data Enrichment:** Fetch detailed content from URLs discovered through Exa searches to build comprehensive datasets.