mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-31 09:58:19 -05:00
- Add generate_block_docs.py script that introspects block code to
generate markdown
- Support manual content preservation via <!-- MANUAL: --> markers
- Add migrate_block_docs.py to preserve existing manual content from git
HEAD
- Add CI workflow (docs-block-sync.yml) to fail if docs drift from code
- Add Claude PR review workflow (docs-claude-review.yml) for doc changes
- Add manual LLM enhancement workflow (docs-enhance.yml)
- Add GitBook configuration (.gitbook.yaml, SUMMARY.md)
- Fix non-deterministic category ordering (categories is a set)
- Add comprehensive test suite (32 tests)
- Generate docs for 444 blocks with 66 preserved manual sections
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
<!-- Clearly explain the need for these changes: -->
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request:
-->
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Extensively test code generation for the docs pages
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces an automated documentation pipeline for blocks and
integrates it into CI.
>
> - Adds `scripts/generate_block_docs.py` (+ tests) to introspect blocks
and generate `docs/integrations/**`, preserving `<!-- MANUAL: -->`
sections
> - New CI workflows: **docs-block-sync** (fails if docs drift),
**docs-claude-review** (AI review for block/docs PRs), and
**docs-enhance** (optional LLM improvements)
> - Updates existing Claude workflows to use `CLAUDE_CODE_OAUTH_TOKEN`
instead of `ANTHROPIC_API_KEY`
> - Improves numerous block descriptions/typos and links across backend
blocks to standardize docs output
> - Commits initial generated docs including
`docs/integrations/README.md` and many provider/category pages
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
631e53e0f6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
31 lines
2.1 KiB
Markdown
# Data Sampling

## What it is
The Data Sampling block selects a subset of items from a larger dataset using one of several sampling methods.

## What it does
This block takes a dataset as input and returns a smaller sample of that data based on specified criteria. It supports multiple sampling methods, allowing users to choose the most appropriate technique for their needs.

## How it works
The block applies the chosen sampling method to the input data and returns the selected items. It accepts a single dictionary, a list of dictionaries, or a list of lists, and it can accumulate incoming data before sampling for cases where data arrives in batches.
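
As an illustration of the selection step, here is a minimal sketch of four of the listed methods (random, systematic, top, bottom) working on indices. The helper name `sample_indices` and its signature are hypothetical and are not taken from the block's source; the block's actual behavior may differ in details such as how the systematic start offset is chosen.

```python
import random
from typing import List, Optional


def sample_indices(n: int, sample_size: int, method: str,
                   seed: Optional[int] = None) -> List[int]:
    """Pick `sample_size` indices out of range(n) (hypothetical helper)."""
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and the dataset size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    if method == "random":
        # Uniform draw without replacement.
        return rng.sample(range(n), sample_size)
    if method == "systematic":
        # Every `step`-th item, starting from a random offset below `step`.
        step = n // sample_size
        start = rng.randrange(step)
        return [start + i * step for i in range(sample_size)]
    if method == "top":
        return list(range(sample_size))
    if method == "bottom":
        return list(range(n - sample_size, n))
    raise ValueError(f"unsupported method: {method}")


data = [{"id": i} for i in range(10)]
indices = sample_indices(len(data), 3, "bottom")
sampled = [data[i] for i in indices]  # ids 7, 8, 9
```

Working on indices first and then looking up the items keeps the sampled data and the sample indices consistent with each other.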

## Inputs
| Input | Description |
|-------|-------------|
| Data | The dataset to sample from. This can be a single dictionary, a list of dictionaries, or a list of lists. |
| Sample Size | The number of items to select from the dataset. |
| Sampling Method | The technique used to select the sample. Options include random, systematic, top, bottom, stratified, weighted, reservoir, and cluster sampling. |
| Accumulate | A flag indicating whether to accumulate data before sampling. This is useful for scenarios where data is received in batches. |
| Random Seed | An optional value to ensure reproducible random sampling. |
| Stratify Key | The key to use for stratified sampling (required when using the stratified sampling method). |
| Weight Key | The key to use for weighted sampling (required when using the weighted sampling method). |
| Cluster Key | The key to use for cluster sampling (required when using the cluster sampling method). |
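
To show how a key input such as Weight Key interacts with the data, here is a hedged sketch of weighted sampling. The function `weighted_sample` and the `score` field are hypothetical, and this version draws with replacement via `random.choices`; whether the block itself samples with or without replacement is not stated in this document.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


def weighted_sample(data: List[Dict[str, Any]], sample_size: int,
                    weight_key: str,
                    seed: Optional[int] = None
                    ) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Draw items with probability proportional to row[weight_key]."""
    if not weight_key:
        raise ValueError("weight_key is required for weighted sampling")
    weights = []
    for row in data:
        if weight_key not in row:
            raise ValueError(f"missing weight key: {weight_key}")
        weights.append(float(row[weight_key]))
    rng = random.Random(seed)
    # choices() samples WITH replacement, so an index can repeat.
    indices = rng.choices(range(len(data)), weights=weights, k=sample_size)
    return [data[i] for i in indices], indices


rows = [{"name": "a", "score": 0}, {"name": "b", "score": 0},
        {"name": "c", "score": 5}]
picked, picked_indices = weighted_sample(rows, 2, "score", seed=42)
```

Because only `c` has a non-zero weight in this toy input, every draw selects index 2.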

## Outputs
| Output | Description |
|--------|-------------|
| Sampled Data | The selected subset of the input data. |
| Sample Indices | The indices of the sampled items in the original dataset. |

## Possible use case
A data scientist working with a large customer dataset wants to create a representative sample for analysis. They could use this Data Sampling block to select a smaller subset of customers using stratified sampling, ensuring that the sample maintains the same proportions of different customer segments as the full dataset.
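
The use case above can be sketched as proportional stratified sampling. This is an illustration only: `stratified_sample` and the `segment` field are hypothetical names, and the largest-remainder rounding used to split the sample across segments is an assumption, not necessarily what the block does.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


def stratified_sample(data: List[Dict[str, Any]], sample_size: int,
                      stratify_key: str,
                      seed: Optional[int] = None
                      ) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Sample so each stratum keeps roughly its share of the dataset."""
    n = len(data)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and the dataset size")
    # Group row indices by the value found under stratify_key.
    strata: Dict[Any, List[int]] = defaultdict(list)
    for idx, row in enumerate(data):
        strata[row[stratify_key]].append(idx)
    # Ideal (fractional) share per stratum, rounded down first.
    quotas = {k: len(v) * sample_size / n for k, v in strata.items()}
    counts = {k: int(q) for k, q in quotas.items()}
    # Hand leftover slots to the strata with the largest remainders.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    rng = random.Random(seed)
    indices: List[int] = []
    for k, members in strata.items():
        indices.extend(rng.sample(members, counts[k]))
    indices.sort()
    return [data[i] for i in indices], indices


customers = ([{"segment": "retail"} for _ in range(6)]
             + [{"segment": "business"} for _ in range(4)])
sample, sample_idx = stratified_sample(customers, 5, "segment", seed=7)
```

With a 60/40 split between the two segments, a sample of five contains three retail and two business customers, matching the proportions of the full dataset.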