## Data Sampling

### What it is
A versatile data sampling tool that selects a subset of items from a dataset using one of several sampling methods.

### What it does
Takes a collection of data items and returns a smaller subset based on the sample size and sampling method you choose. It works with different types of data collections and offers multiple ways to decide which items go into the sample.

### How it works
The tool examines your data collection and selects items according to the chosen sampling method. It can pick items:
- Completely at random (random)
- At regular intervals (systematic)
- Proportionally from specific groups (stratified)
- According to importance weights (weighted)
- In whole clusters or groups (cluster)
- From the beginning or end of the dataset (top, bottom)
- Using reservoir sampling, which suits streaming data (reservoir)

### Inputs
- Data: The collection of items you want to sample from
- Sample Size: The number of items you want in the final selection
- Sampling Method: How the items are chosen (random, systematic, top, bottom, stratified, weighted, reservoir, or cluster)
- Accumulate: Whether to collect data over time before sampling
- Random Seed: An optional number that makes the selection reproducible, so the same inputs give the same sample each time
- Stratify Key: The key whose value gives each item's category, used to keep groups proportionally represented (stratified sampling)
- Weight Key: The key whose value gives each item's importance weight (weighted sampling)
- Cluster Key: The key whose value gives each item's group identifier (cluster sampling)

### Outputs
- Sampled Data: The selected items from your dataset
- Sample Indices: The positions of the selected items in the original dataset

### Possible use cases
- Quality control in manufacturing: Randomly selecting products for inspection
- Market research: Selecting a representative group of customers to survey
- Data analysis: Creating balanced training datasets for machine learning
- Scientific research: Selecting specimens for detailed analysis
- Social studies: Choosing participants for a study while maintaining demographic balance
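The sampling methods described under How it works can be sketched in Python. This is a minimal, hypothetical implementation, not the tool's actual code: the function name `sample_data`, its parameter names, the assumption that each item is a dict looked up by the stratify, weight, and cluster keys, and the edge-case behaviour (for example, how cluster sampling treats Sample Size) are all assumptions. It returns both outputs listed above, the sampled items and their indices, and uses a seeded `random.Random` so that a fixed Random Seed reproduces the same sample. Weighted sampling here uses the Efraimidis-Spirakis method (sampling without replacement), and reservoir sampling uses Algorithm R; the Accumulate input is left out.

```python
import heapq
import random
from collections import defaultdict


def sample_data(data, sample_size, method="random", seed=None,
                stratify_key=None, weight_key=None, cluster_key=None):
    """Hypothetical sampler returning (sampled_items, indices_into_data)."""
    rng = random.Random(seed)  # same seed -> same selection
    n = len(data)
    k = min(sample_size, n)  # never ask for more items than exist
    if k <= 0:
        return [], []

    if method == "random":
        idx = rng.sample(range(n), k)
    elif method == "systematic":
        step = n // k  # fixed interval; random start within the first interval
        start = rng.randrange(step)
        idx = [start + i * step for i in range(k)]
    elif method == "top":
        idx = list(range(k))
    elif method == "bottom":
        idx = list(range(n - k, n))
    elif method == "stratified":
        groups = defaultdict(list)
        for i, item in enumerate(data):
            groups[item[stratify_key]].append(i)
        # Proportional allocation: round every stratum's share down first...
        quotas = {g: k * len(m) // n for g, m in groups.items()}
        leftover = k - sum(quotas.values())
        # ...then hand the leftover slots to the largest remainders.
        by_remainder = sorted(groups, key=lambda g: k * len(groups[g]) % n,
                              reverse=True)
        for g in by_remainder[:leftover]:
            quotas[g] += 1
        idx = [i for g, m in groups.items() for i in rng.sample(m, quotas[g])]
    elif method == "weighted":
        # Efraimidis-Spirakis: give each item the key u ** (1 / w) and keep
        # the k largest keys. Assumes every weight is strictly positive.
        keys = [rng.random() ** (1.0 / data[i][weight_key]) for i in range(n)]
        idx = heapq.nlargest(k, range(n), key=keys.__getitem__)
    elif method == "cluster":
        clusters = defaultdict(list)
        for i, item in enumerate(data):
            clusters[item[cluster_key]].append(i)
        order = list(clusters)
        rng.shuffle(order)
        # Assumption: add whole clusters, in random order, until at least
        # k items are collected (the result may exceed k).
        idx = []
        for c in order:
            if len(idx) >= k:
                break
            idx.extend(clusters[c])
    elif method == "reservoir":
        # Algorithm R: one pass, O(k) memory, so it also works on a stream.
        idx = []
        for i, _ in enumerate(data):
            if i < k:
                idx.append(i)
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    idx[j] = i
    else:
        raise ValueError(f"unknown sampling method: {method!r}")

    idx.sort()  # report indices in original dataset order
    return [data[i] for i in idx], idx
```

For example, `sample_data(rows, 5, "stratified", seed=0, stratify_key="group")` on ten rows split 6/4 between two groups returns three rows from the first group and two from the second. One consequence of using `n // k` as the systematic interval is that the last few items can be unreachable when `n` is not a multiple of `k`; a fractional interval would avoid that at the cost of slightly uneven spacing.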