GITBOOK-73: No subject

2026-02-07 13:25:01 -05:00 · 2026-01-22 18:59:20 +00:00
parent 446c71fec8
commit f2d82d8802
46 changed files with 6386 additions and 0 deletions
--- a/docs/integrations/block-integrations/sampling.md
+++ b/docs/integrations/block-integrations/sampling.md
@@ -0,0 +1,37 @@
+# Data Sampling
+
+## What It Is <a href="#what-it-is" id="what-it-is"></a>
+
+The Data Sampling block selects a subset of items from a larger dataset using one of several sampling methods.
+
+## What It Does <a href="#what-it-does" id="what-it-does"></a>
+
+This block takes a dataset as input and returns a sample of the requested size, together with the indices of the selected items in the original dataset. It supports several sampling methods, so users can choose the technique that best fits their needs.
+
+## How It Works <a href="#how-it-works" id="how-it-works"></a>
+
+The block applies the chosen sampling method to the input data to select a subset of items. The input can be a single dictionary, a list of dictionaries, or a list of lists. When data arrives in batches, the block can accumulate it before sampling.
+
+## Inputs <a href="#inputs" id="inputs"></a>
+
+| Input           | Description                                                                                                                                      |
+| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Data            | The dataset to sample from. This can be a single dictionary, a list of dictionaries, or a list of lists.                                         |
+| Sample Size     | The number of items to select from the dataset.                                                                                                  |
+| Sampling Method | The technique used to select the sample. Options include random, systematic, top, bottom, stratified, weighted, reservoir, and cluster sampling. |
+| Accumulate      | A flag indicating whether to accumulate data before sampling. This is useful for scenarios where data is received in batches.                    |
+| Random Seed     | An optional value to ensure reproducible random sampling.                                                                                        |
+| Stratify Key    | The key to use for stratified sampling (required when using the stratified sampling method).                                                     |
+| Weight Key      | The key to use for weighted sampling (required when using the weighted sampling method).                                                         |
+| Cluster Key     | The key to use for cluster sampling (required when using the cluster sampling method).                                                           |
+
+## Outputs <a href="#outputs" id="outputs"></a>
+
+| Output         | Description                                               |
+| -------------- | --------------------------------------------------------- |
+| Sampled Data   | The selected subset of the input data.                    |
+| Sample Indices | The indices of the sampled items in the original dataset. |
+
+## Possible Use Case <a href="#possible-use-case" id="possible-use-case"></a>
+
+A data scientist working with a large customer dataset wants a representative sample for analysis. They could use the Data Sampling block with the stratified sampling method, setting the Stratify Key to the customer-segment field, so that the sample keeps the same proportions of customer segments as the full dataset.
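
The stratified use case above can be illustrated with a minimal Python sketch. It is not the block's actual implementation: the function name `stratified_sample`, its parameters, and the proportional allocation with round-down are all assumptions made for illustration. It returns both the sampled items and their original indices, mirroring the Sampled Data and Sample Indices outputs, and takes an optional seed in the spirit of the Random Seed input.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(data, sample_size, stratify_key, seed=None):
    """Hypothetical helper: proportional stratified sampling.

    Returns (sampled_items, sampled_indices); indices refer to `data`.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible

    # Group the original indices by the value found under stratify_key.
    strata = defaultdict(list)
    for index, item in enumerate(data):
        strata[item[stratify_key]].append(index)

    total = len(data)
    chosen = []
    for indices in strata.values():
        # Proportional share of the sample, rounded down (an assumption),
        # and never more than the stratum actually contains.
        share = min(len(indices), sample_size * len(indices) // total)
        chosen.extend(rng.sample(indices, share))

    # Rounding down can leave the sample short; top it up at random
    # from the items that have not been selected yet.
    chosen_set = set(chosen)
    remaining = [i for i in range(total) if i not in chosen_set]
    shortfall = min(sample_size, total) - len(chosen)
    chosen.extend(rng.sample(remaining, shortfall))

    chosen.sort()
    return [data[i] for i in chosen], chosen


# Illustrative data: 60 "basic", 30 "plus", and 10 "pro" customers.
customers = (
    [{"id": i, "segment": "basic"} for i in range(60)]
    + [{"id": i, "segment": "plus"} for i in range(60, 90)]
    + [{"id": i, "segment": "pro"} for i in range(90, 100)]
)
sample, indices = stratified_sample(customers, 10, "segment", seed=42)
print(Counter(c["segment"] for c in sample))
```

With this illustrative data, a sample of 10 contains 6 "basic", 3 "plus", and 1 "pro" customer, matching the 60/30/10 split of the full dataset. When the proportions do not divide evenly, the top-up step means the split is only approximately proportional.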