# Data Sampling
## What It Is <a href="#what-it-is" id="what-it-is"></a>
The Data Sampling block is a tool for selecting a subset of data from a larger dataset using various sampling methods.
## What It Does <a href="#what-it-does" id="what-it-does"></a>
This block takes a dataset as input and returns a smaller sample of that data based on specified criteria. It supports multiple sampling methods, allowing users to choose the most appropriate technique for their needs.
## How It Works <a href="#how-it-works" id="how-it-works"></a>
The block processes the input data and applies the chosen sampling method to select a subset of items. It accepts a single dictionary, a list of dictionaries, or a list of lists, and it can accumulate data before sampling when the input arrives in batches.
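
The accumulation behavior described above can be sketched as follows. This is a minimal illustration, not the block's actual implementation: the class name `AccumulatingSampler`, its `add_batch` method, and the choice to clear the buffer after each sample are all assumptions made for the example.

```python
import random
from typing import Any, List, Optional


class AccumulatingSampler:
    """Hypothetical helper: buffers batches until a sample can be drawn."""

    def __init__(self, sample_size: int, seed: Optional[int] = None) -> None:
        self.sample_size = sample_size
        self.buffer: List[Any] = []
        # A dedicated generator keeps results reproducible for a given seed.
        self.rng = random.Random(seed)

    def add_batch(self, batch: List[Any]) -> Optional[List[Any]]:
        """Append a batch; return a sample once enough items are buffered."""
        self.buffer.extend(batch)
        if len(self.buffer) < self.sample_size:
            return None  # not enough data yet, keep accumulating
        sample = self.rng.sample(self.buffer, self.sample_size)
        self.buffer.clear()  # assumption: start fresh after each sample
        return sample


sampler = AccumulatingSampler(sample_size=3, seed=0)
first = sampler.add_batch([1, 2])   # None: only two items buffered so far
second = sampler.add_batch([3, 4])  # three items drawn from [1, 2, 3, 4]
```

Without accumulation, each incoming batch would be sampled on its own, so a batch smaller than the sample size could not produce a full sample.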
## Inputs <a href="#inputs" id="inputs"></a>
| Input           | Description |
| --------------- | ----------- |
| Data            | The dataset to sample from. This can be a single dictionary, a list of dictionaries, or a list of lists. |
| Sample Size     | The number of items to select from the dataset. |
| Sampling Method | The technique used to select the sample. Options include random, systematic, top, bottom, stratified, weighted, reservoir, and cluster sampling. |
| Accumulate      | A flag indicating whether to accumulate data before sampling. This is useful for scenarios where data is received in batches. |
| Random Seed     | An optional value to ensure reproducible random sampling. |
| Stratify Key    | The key to use for stratified sampling (required when using the stratified sampling method). |
| Weight Key      | The key to use for weighted sampling (required when using the weighted sampling method). |
| Cluster Key     | The key to use for cluster sampling (required when using the cluster sampling method). |
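
To make the difference between some of these sampling methods concrete, here is a rough sketch of how four of them (random, systematic, top, and bottom) might pick item positions. The function `sample_indices` and its signature are hypothetical and written only for illustration; the block's real code may differ, for example in how it chooses the systematic step.

```python
import random
from typing import List, Optional


def sample_indices(
    n: int, k: int, method: str, seed: Optional[int] = None
) -> List[int]:
    """Return k positions out of n items using the named method (sketch)."""
    if not 0 < k <= n:
        raise ValueError("sample size must be between 1 and the dataset size")
    rng = random.Random(seed)  # seeded generator for reproducible draws
    if method == "random":
        # Simple random sampling without replacement.
        return rng.sample(range(n), k)
    if method == "systematic":
        # Every step-th item, starting from a random offset within one step.
        step = n // k
        start = rng.randrange(step)
        return [start + i * step for i in range(k)]
    if method == "top":
        return list(range(k))  # first k items
    if method == "bottom":
        return list(range(n - k, n))  # last k items
    raise ValueError(f"unsupported method: {method}")


print(sample_indices(10, 3, "top"))     # [0, 1, 2]
print(sample_indices(10, 3, "bottom"))  # [7, 8, 9]
```

Passing the same seed twice yields the same random or systematic selection, which is the purpose of the Random Seed input.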
## Outputs
| Output         | Description |
| -------------- | ----------- |
| Sampled Data   | The selected subset of the input data. |
| Sample Indices | The indices of the sampled items in the original dataset. |
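
The relationship between the two outputs can be illustrated with a small sketch: each sampled item is the element of the original dataset at the corresponding index. The function `sample_with_indices` is a hypothetical stand-in, and returning the indices in ascending order is an assumption of this example rather than documented behavior.

```python
import random
from typing import Any, Dict, List, Tuple


def sample_with_indices(
    data: List[Dict[str, Any]], sample_size: int, seed: int = 0
) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Return (sampled_data, sample_indices) for a simple random sample."""
    rng = random.Random(seed)
    # Draw positions first, then look the items up, so both outputs agree.
    indices = sorted(rng.sample(range(len(data)), sample_size))
    return [data[i] for i in indices], indices


rows = [{"id": i, "score": i * 10} for i in range(8)]
sampled, indices = sample_with_indices(rows, sample_size=3)

# Each sampled item can be traced back to its position in the input.
for item, index in zip(sampled, indices):
    assert item is rows[index]
```

Keeping the indices alongside the data makes it possible to join the sample back to the source or to exclude already-sampled items in a later step.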
## Possible Use Case
A data scientist working with a large customer dataset wants to create a representative sample for analysis. They could use this Data Sampling block to select a smaller subset of customers using stratified sampling, ensuring that the sample maintains the same proportions of different customer segments as the full dataset.
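
As a sketch of the idea behind this use case, the snippet below draws a proportional stratified sample from a made-up customer list. The function `stratified_sample`, the `segment` field, and the customer records are all invented for illustration, and proportional allocation with rounding is an assumption about how strata might be sized; with rounding, the total can differ slightly from the requested sample size when proportions do not divide evenly.

```python
import random
from collections import Counter, defaultdict
from typing import Any, Dict, List, Optional


def stratified_sample(
    data: List[Dict[str, Any]],
    sample_size: int,
    stratify_key: str,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Sample each stratum in proportion to its share of the dataset."""
    rng = random.Random(seed)
    strata: Dict[Any, List[int]] = defaultdict(list)
    for index, row in enumerate(data):
        strata[row[stratify_key]].append(index)

    chosen: List[int] = []
    for indices in strata.values():
        # Proportional allocation: stratum share times the sample size.
        quota = round(sample_size * len(indices) / len(data))
        chosen.extend(rng.sample(indices, min(quota, len(indices))))
    return [data[i] for i in sorted(chosen)]


# Hypothetical dataset: 60% basic, 30% pro, 10% enterprise customers.
customers = (
    [{"id": i, "segment": "basic"} for i in range(60)]
    + [{"id": i, "segment": "pro"} for i in range(60, 90)]
    + [{"id": i, "segment": "enterprise"} for i in range(90, 100)]
)
sample = stratified_sample(customers, 10, "segment", seed=7)
print(Counter(row["segment"] for row in sample))
# 6 basic, 3 pro, 1 enterprise
```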