diff --git a/docs/docs/icicle/rust-bindings/multi-gpu.md b/docs/docs/icicle/rust-bindings/multi-gpu.md
index a7dac3ab..685d5cd4 100644
--- a/docs/docs/icicle/rust-bindings/multi-gpu.md
+++ b/docs/docs/icicle/rust-bindings/multi-gpu.md
@@ -4,6 +4,54 @@ To learn more about the theory of Multi GPU programming refer to [this part](../
 
 Here we will cover the core multi GPU apis and a [example](#a-multi-gpu-example)
 
+
+## A Multi GPU example
+
+In this example we will show how you can:
+
+1. Fetch the number of devices installed on a machine
+2. Launch a thread for every GPU and set an active device per thread.
+3. Execute an MSM on each GPU
+
+
+
+```rust
+
+...
+
+let device_count = get_device_count().unwrap();
+
+(0..device_count)
+    .into_par_iter()
+    .for_each(move |device_id| {
+        set_device(device_id).unwrap();
+
+        // you can allocate points and scalars_d here
+
+        let mut cfg = MSMConfig::default_for_device(device_id);
+        cfg.ctx.stream = &stream;
+        cfg.is_async = true;
+        cfg.are_scalars_montgomery_form = true;
+        msm(&scalars_d, &HostOrDeviceSlice::on_host(points), &cfg, &mut msm_results).unwrap();
+
+        // collect and process results
+    })
+
+...
+```
+
+
+We use `get_device_count` to fetch the number of connected devices; device IDs will be `0, 1, 2, ..., device_count - 1`.
+
+[`into_par_iter`](https://docs.rs/rayon/latest/rayon/iter/trait.IntoParallelIterator.html#tymethod.into_par_iter) creates a parallel iterator; you should expect it to launch a thread for every iteration.
+
+We then call `set_device(device_id).unwrap();`, which sets the active device of that thread to the selected `device_id`.
+
+Any data you now allocate from this thread will be linked to `device_id`. We create our `MSMConfig` with the selected device ID, `let mut cfg = MSMConfig::default_for_device(device_id);`; behind the scenes this creates a `DeviceContext` configured for that specific GPU.
+
+Finally, we call our `msm` method.
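+
+For completeness, below is a hedged sketch of the work each thread could do for a given `device_id`. It assumes hypothetical host vectors `scalars` and `points` prepared earlier, and uses the `CudaStream` and `HostOrDeviceSlice::cuda_malloc` / `copy_from_host` helpers from `icicle-cuda-runtime`; module paths and exact signatures may differ between ICICLE versions, so treat this as an illustration rather than copy-paste code.
+
+```rust
+use icicle_bn254::curve::{G1Projective, ScalarField};
+use icicle_core::msm::{msm, MSMConfig};
+use icicle_cuda_runtime::device::set_device;
+use icicle_cuda_runtime::memory::HostOrDeviceSlice;
+use icicle_cuda_runtime::stream::CudaStream;
+
+// `device_id` is the index assigned to this thread; `scalars` and `points`
+// are hypothetical host vectors prepared earlier.
+set_device(device_id).unwrap();
+
+// One CUDA stream per thread keeps each GPU's work independent.
+let stream = CudaStream::create().unwrap();
+
+// Allocate memory on the active GPU and upload this thread's scalars.
+let mut scalars_d: HostOrDeviceSlice<'_, ScalarField> =
+    HostOrDeviceSlice::cuda_malloc(scalars.len()).unwrap();
+scalars_d.copy_from_host(&scalars).unwrap();
+
+// Device buffer that will receive the single MSM result.
+let mut msm_results: HostOrDeviceSlice<'_, G1Projective> =
+    HostOrDeviceSlice::cuda_malloc(1).unwrap();
+
+let mut cfg = MSMConfig::default_for_device(device_id);
+cfg.ctx.stream = &stream;
+cfg.is_async = true;
+msm(&scalars_d, &HostOrDeviceSlice::on_host(points.clone()), &cfg, &mut msm_results).unwrap();
+
+// The MSM was launched asynchronously; wait for it before reading results back.
+stream.synchronize().unwrap();
+// ... copy `msm_results` back to the host and process it
+```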
+
+
 ## Device management API
 
 To streamline device management we offer as part of `icicle-cuda-runtime` package methods for dealing with devices.
@@ -152,50 +200,3 @@ let device_id: i32 = 0; // Example device ID
 check_device(device_id); // Ensures that the current context is correctly set for the specified device ID.
 ```
-
-
-## A Multi GPU example
-
-In this example we will display how you can
-
-1. Fetch the number of devices installed on a machine
-2. For every GPU launch a thread and set a active device per thread.
-3. Execute a MSM on each GPU
-
-
-
-```rust
-
-...
-
-let device_count = get_device_count().unwrap();
-
-(0..device_count)
-    .into_par_iter()
-    .for_each(move |device_id| {
-        set_device(device_id).unwrap();
-
-        // you can allocate points and scalars_d here
-
-        let mut cfg = MSMConfig::default_for_device(device_id);
-        cfg.ctx.stream = &stream;
-        cfg.is_async = true;
-        cfg.are_scalars_montgomery_form = true;
-        msm(&scalars_d, &HostOrDeviceSlice::on_host(points), &cfg, &mut msm_results).unwrap();
-
-        // collect and process results
-    })
-
-...
-```
-
-
-We use `get_device_count` to fetch the number of connected devices, device IDs will be `0...device_count-1`
-
-[`into_par_iter`](https://docs.rs/rayon/latest/rayon/iter/trait.IntoParallelIterator.html#tymethod.into_par_iter) is a parallel iterator, you should expect it to launch a thread for every iteration.
-
-We then call `set_device(device_id).unwrap();` it should set the context of that thread to the selected `device_id`.
-
-Any data you now allocate from the context of this thread will be linked to the `device_id`. We create our `MSMConfig` with the selected device ID `let mut cfg = MSMConfig::default_for_device(device_id);`, behind the scene this will create for us a `DeviceContext` configured for that specific GPU.
-
-We finally call our `msm` method.
diff --git a/docs/docs/icicle/rust-bindings/vec-ops.md b/docs/docs/icicle/rust-bindings/vec-ops.md
new file mode 100644
index 00000000..0f537edc
--- /dev/null
+++ b/docs/docs/icicle/rust-bindings/vec-ops.md
@@ -0,0 +1,159 @@
+# Vector Operations API
+
+Our vector operations API, which is part of the `icicle-core` crate, includes fundamental methods for addition, subtraction, and multiplication of vectors, with support for both host and device memory.
+
+
+## Supported curves
+
+Vector operations are supported on the following curves:
+
+`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`, `grumpkin`
+
+## Examples
+
+### Addition of Scalars
+
+```rust
+use icicle_bn254::curve::{ScalarCfg, ScalarField};
+use icicle_core::traits::{FieldImpl, GenerateRandom};
+use icicle_core::vec_ops::{add_scalars, VecOpsConfig};
+use icicle_cuda_runtime::memory::HostOrDeviceSlice;
+
+let test_size = 1 << 18;
+
+let a: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
+let b: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
+let mut result: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::zero(); test_size]);
+
+let cfg = VecOpsConfig::default();
+add_scalars(&a, &b, &mut result, &cfg).unwrap();
+```
+
+### Subtraction of Scalars
+
+```rust
+use icicle_bn254::curve::{ScalarCfg, ScalarField};
+use icicle_core::traits::{FieldImpl, GenerateRandom};
+use icicle_core::vec_ops::{sub_scalars, VecOpsConfig};
+use icicle_cuda_runtime::memory::HostOrDeviceSlice;
+
+let test_size = 1 << 18;
+
+let a: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
+let b: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
+let mut result: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::zero(); test_size]);
+
+let cfg = VecOpsConfig::default();
+sub_scalars(&a, &b, &mut result, &cfg).unwrap();
+```
+
+### Multiplication of Scalars
+
+```rust
+use icicle_bn254::curve::{ScalarCfg, ScalarField};
+use icicle_core::traits::{FieldImpl, GenerateRandom};
+use icicle_core::vec_ops::{mul_scalars, VecOpsConfig};
+use icicle_cuda_runtime::memory::HostOrDeviceSlice;
+
+let test_size = 1 << 18;
+
+let a: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
+let ones: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::one(); test_size]);
+let mut result: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::zero(); test_size]);
+
+let cfg = VecOpsConfig::default();
+mul_scalars(&a, &ones, &mut result, &cfg).unwrap();
+```
+
+
+## Vector Operations Configuration
+
+The `VecOpsConfig` struct encapsulates the settings for vector operations, including device context and operation modes.
+
+### `VecOpsConfig`
+
+Defines configuration parameters for vector operations.
+
+```rust
+pub struct VecOpsConfig<'a> {
+    pub ctx: DeviceContext<'a>,
+    is_a_on_device: bool,
+    is_b_on_device: bool,
+    is_result_on_device: bool,
+    is_result_montgomery_form: bool,
+    pub is_async: bool,
+}
+```
+
+#### Fields
+
+- **`ctx: DeviceContext<'a>`**: Specifies the device context for the operation, including the device ID and memory pool.
+- **`is_a_on_device`**: Indicates if the first operand vector resides in device memory.
+- **`is_b_on_device`**: Indicates if the second operand vector resides in device memory.
+- **`is_result_on_device`**: Specifies if the result vector should be stored in device memory.
+- **`is_result_montgomery_form`**: Determines if the result should be in Montgomery form.
+- **`is_async`**: Enables asynchronous operation. If `true`, operations are non-blocking; otherwise, they block the current thread.
+
+### Default Configuration
+
+`VecOpsConfig` can be initialized with default settings for the default device, or tailored for a specific device with `default_for_device`:
+
+```rust
+let cfg = VecOpsConfig::default();
+```
+
+These are the default settings:
+
+```rust
+impl<'a> Default for VecOpsConfig<'a> {
+    fn default() -> Self {
+        Self::default_for_device(DEFAULT_DEVICE_ID)
+    }
+}
+
+impl<'a> VecOpsConfig<'a> {
+    pub fn default_for_device(device_id: usize) -> Self {
+        VecOpsConfig {
+            ctx: DeviceContext::default_for_device(device_id),
+            is_a_on_device: false,
+            is_b_on_device: false,
+            is_result_on_device: false,
+            is_result_montgomery_form: false,
+            is_async: false,
+        }
+    }
+}
+```
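+
+To target a specific GPU or run asynchronously, the two public fields can be adjusted after construction. The sketch below is illustrative only: it assumes `CudaStream` from `icicle-cuda-runtime` and reuses the `a`, `b` and `result` slices from the examples above; check the crate documentation for the exact types and signatures.
+
+```rust
+use icicle_core::vec_ops::{add_scalars, VecOpsConfig};
+use icicle_cuda_runtime::stream::CudaStream;
+
+// Hypothetical: build a config for device 0 and launch the operation
+// asynchronously on a user-managed stream.
+let stream = CudaStream::create().unwrap();
+let mut cfg = VecOpsConfig::default_for_device(0);
+cfg.ctx.stream = &stream;
+cfg.is_async = true;
+
+add_scalars(&a, &b, &mut result, &cfg).unwrap();
+
+// The call returns without blocking; synchronize the stream before reading `result`.
+stream.synchronize().unwrap();
+```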
+
+## Vector Operations
+
+Vector operations are implemented through the `VecOps` trait. This trait is implemented for all [supported curves](#supported-curves), providing methods for addition, subtraction, and multiplication of vectors.
+
+### `VecOps` Trait
+
+```rust
+pub trait VecOps {
+    fn add(
+        a: &HostOrDeviceSlice,
+        b: &HostOrDeviceSlice,
+        result: &mut HostOrDeviceSlice,
+        cfg: &VecOpsConfig,
+    ) -> IcicleResult<()>;
+
+    fn sub(
+        a: &HostOrDeviceSlice,
+        b: &HostOrDeviceSlice,
+        result: &mut HostOrDeviceSlice,
+        cfg: &VecOpsConfig,
+    ) -> IcicleResult<()>;
+
+    fn mul(
+        a: &HostOrDeviceSlice,
+        b: &HostOrDeviceSlice,
+        result: &mut HostOrDeviceSlice,
+        cfg: &VecOpsConfig,
+    ) -> IcicleResult<()>;
+}
+```
+
+#### Methods
+
+All operations are element-wise, and the results are placed into the `result` parameter. These operations are not in-place.
+
+- **`add`**: Computes the element-wise sum of two vectors.
+- **`sub`**: Computes the element-wise difference between two vectors.
+- **`mul`**: Performs element-wise multiplication of two vectors.
diff --git a/docs/docusaurus.config.js b/docs/docusaurus.config.js
index 25ee1f3f..c7699ea4 100644
--- a/docs/docusaurus.config.js
+++ b/docs/docusaurus.config.js
@@ -29,13 +29,13 @@ const config = {
         remarkPlugins: [math, require('mdx-mermaid')],
         rehypePlugins: [katex],
         sidebarPath: require.resolve('./sidebars.js'),
-        editUrl: 'https://github.com/ingonyama-zk/developer-docs/tree/main',
+        editUrl: 'https://github.com/ingonyama-zk/icicle/tree/main',
       },
       blog: {
         remarkPlugins: [math, require('mdx-mermaid')],
         rehypePlugins: [katex],
         showReadingTime: true,
-        editUrl: 'https://github.com/ingonyama-zk/developer-docs/tree/main',
+        editUrl: 'https://github.com/ingonyama-zk/icicle/tree/main',
       },
       pages: {},
       theme: {
diff --git a/docs/sidebars.js b/docs/sidebars.js
index eae710fa..943d20b2 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -42,6 +42,11 @@ module.exports = {
         type: "doc",
         label: "Multi GPU Support",
         id: "icicle/rust-bindings/multi-gpu",
+      },
+      {
+        type: "doc",
+        label: "Vector operations",
+        id: "icicle/rust-bindings/vec-ops",
       }
     ]
   },