Compare commits

..

20 Commits

Author SHA1 Message Date
Ethan-000
621676bd41 feat: add num_bits() function (#570)
## Describe the changes

adding a `num_bits()` function similar to
dcf73a5f96/ff/src/biginteger/mod.rs (L482)

this could be useful for small field optimizations

## Linked Issues

Resolves #
2024-08-07 09:37:16 +03:00
Otsar
badb8c5d68 Removed ZK containers from docs sidebar (#571)
## Describe the changes

This PR...

## Linked Issues

Resolves #
2024-08-04 18:38:37 +03:00
Otsar
1300434bbe Removed ZK containers from docs sidebar 2024-08-04 11:14:06 +03:00
yshekel
6a67893773 remove the recommnedation to use zk-contariners in examples (#569) 2024-08-01 14:58:02 +03:00
ChickenLover
0cb0b49be9 Add Sha3 (#560)
## Describe the changes

This PR...

## Linked Issues

Resolves #
2024-07-28 15:31:28 +07:00
Vlad
8411ed1451 Feat/vlad/refactor from affine (#554)
## Describe the changes

This PR refactors the different affine to projective conversion
functions using the C function

also small bug fix for ProjectiveToAffine() function in Go

## Linked Issues

Resolves #
2024-07-22 10:37:24 +02:00
omahs
53f34aade5 Fix typos (#558)
Fix typos
2024-07-18 11:58:04 +03:00
release-bot
aacec3f72f Bump rust crates' version
icicle-babybear@2.8.0
icicle-bls12-377@2.8.0
icicle-bls12-381@2.8.0
icicle-bn254@2.8.0
icicle-bw6-761@2.8.0
icicle-core@2.8.0
icicle-cuda-runtime@2.8.0
icicle-grumpkin@2.8.0
icicle-hash@2.8.0
icicle-m31@2.8.0
icicle-stark252@2.8.0

Generated by cargo-workspaces
2024-07-16 13:57:56 +00:00
ChickenLover
a8fa05d0e3 Feat/roman/hash docs (#556)
## Describe the changes

This PR...

## Linked Issues

Resolves #

---------

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2024-07-16 16:39:35 +03:00
Vlad
877018c84c more go fmt 2024-07-15 16:55:40 +02:00
Vlad
91ac666e06 Merge branch 'feat/vlad/refactor-from-affine' of github.com:ingonyama-zk/icicle into feat/vlad/refactor-from-affine 2024-07-15 16:48:25 +02:00
Vlad
46e6c20440 go fmt 2024-07-15 16:47:40 +02:00
Vlad
e4eda8938d go fmt 2024-07-05 21:29:44 +02:00
Vlad
fb707d5350 Merge branch 'main' into feat/vlad/refactor-from-affine 2024-07-05 15:40:34 +02:00
Vlad
6336e74d5a refactor from_affine with C link 2024-07-04 11:03:58 +02:00
Vlad
279cdc66e0 generated go files 2024-07-03 10:41:32 +02:00
Vlad
81644fc28c use zero method of projective in toProjective
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2024-07-03 10:37:02 +02:00
Vlad
17732ea013 use zero method of projective in fromAffine
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2024-07-03 10:36:14 +02:00
Vlad
9e057c835d fixed to_projective in rust 2024-07-03 09:18:41 +02:00
Vlad
f08b5bb49d fixed fromAffine and toProj in golang 2024-07-03 09:07:43 +02:00
59 changed files with 650 additions and 628 deletions

View File

@@ -1,6 +1,6 @@
# Contributor's Guide
We welcome all contributions with open arms. At Ingonyama we take a village approach, believing it takes many hands and minds to build a ecosystem.
We welcome all contributions with open arms. At Ingonyama we take a village approach, believing it takes many hands and minds to build an ecosystem.
## Contributing to ICICLE
@@ -14,9 +14,9 @@ We welcome all contributions with open arms. At Ingonyama we take a village appr
When opening a [pull request](https://github.com/ingonyama-zk/icicle/pulls) please keep the following in mind.
- `Clear Purpose` - The pull request should solve a single issue and be clean of any unrelated changes.
- `Clear description` - If the pull request is for a new feature describe what you built, why you added it and how its best that we test it. For bug fixes please describe the issue and the solution.
- `Clear description` - If the pull request is for a new feature describe what you built, why you added it and how it's best that we test it. For bug fixes please describe the issue and the solution.
- `Consistent style` - Rust and Golang code should be linted by the official linters (golang fmt and rust fmt) and maintain a proper style. For CUDA and C++ code we use [`clang-format`](https://github.com/ingonyama-zk/icicle/blob/main/.clang-format), [here](https://github.com/ingonyama-zk/icicle/blob/605c25f9d22135c54ac49683b710fe2ce06e2300/.github/workflows/main-format.yml#L46) you can see how we run it.
- `Minimal Tests` - please add test which cover basic usage of your changes .
- `Minimal Tests` - please add test which cover basic usage of your changes.
## Questions?

View File

@@ -47,7 +47,7 @@ type NTTConfig[T any] struct {
- **`areInputsOnDevice`**: Indicates if input scalars are located on the device.
- **`areOutputsOnDevice`**: Indicates if results are stored on the device.
- **`IsAsync`**: Controls whether the NTT operation runs asynchronously.
- **`NttAlgorithm`**: Explicitly select the NTT algorithm. ECNTT supports running on `Radix2` algoruithm.
- **`NttAlgorithm`**: Explicitly select the NTT algorithm. ECNTT supports running on `Radix2` algorithm.
### Default Configuration

View File

@@ -139,7 +139,7 @@ cfg.Ctx.IsBigTriangle = true
Toggling between MSM modes occurs automatically based on the number of results you are expecting from the `MSM` function.
The number of results is interpreted from the size of `var out core.DeviceSlice`. Thus its important when allocating memory for `var out core.DeviceSlice` to make sure that you are allocating `<number of results> X <size of a single point>`.
The number of results is interpreted from the size of `var out core.DeviceSlice`. Thus it's important when allocating memory for `var out core.DeviceSlice` to make sure that you are allocating `<number of results> X <size of a single point>`.
```go
...
@@ -168,7 +168,7 @@ import (
)
```
This package include `G2Projective` and `G2Affine` points as well as a `G2Msm` method.
This package includes `G2Projective` and `G2Affine` points as well as a `G2Msm` method.
```go
package main

View File

@@ -171,7 +171,7 @@ Polynomial& add_monomial_inplace(Coeff monomial_coeff, uint64_t monomial = 0);
Polynomial& sub_monomial_inplace(Coeff monomial_coeff, uint64_t monomial = 0);
```
The ability to add or subtract monomials directly and in-place is an efficient way to manipualte polynomials.
The ability to add or subtract monomials directly and in-place is an efficient way to manipulate polynomials.
Example:

View File

@@ -12,11 +12,68 @@ At its core, Keccak consists of a permutation function operating on a state arra
- **Chi:** This step applies a nonlinear mixing operation to each lane of the state array.
- **Iota:** This step introduces a round constant to the state array.
## Keccak vs Sha3
There exists a [confusion](https://www.cybertest.com/blog/keccak-vs-sha3) between what is called `Keccak` and `Sha3`. In ICICLE we support both. `Keccak256` relates to the old hash function used in Ethereum, and `Sha3-256` relates to the modern hash function.
## Using Keccak
ICICLE Keccak supports batch hashing, which can be utilized for constructing a merkle tree.
ICICLE Keccak supports batch hashing, which can be utilized for constructing a merkle tree or running multiple hashes in parallel.
### Supported Bindings
- [Golang](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/hash/keccak)
- [Rust](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-hash)
- [Rust](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-hash)
### Example usage
This is an example of running 1024 Keccak-256 hashes in parallel, where input strings are of size 136 bytes:
```rust
use icicle_core::hash::HashConfig;
use icicle_cuda_runtime::memory::HostSlice;
use icicle_hash::keccak::keccak256;
let config = HashConfig::default();
let input_block_len = 136;
let number_of_hashes = 1024;
let preimages = vec![1u8; number_of_hashes * input_block_len];
let mut digests = vec![0u8; number_of_hashes * 32];
let preimages_slice = HostSlice::from_slice(&preimages);
let digests_slice = HostSlice::from_mut_slice(&mut digests);
keccak256(
preimages_slice,
input_block_len as u32,
number_of_hashes as u32,
digests_slice,
&config,
)
.unwrap();
```
### Merkle Tree
You can build a keccak merkle tree using the corresponding functions:
```rust
use icicle_core::tree::{merkle_tree_digests_len, TreeBuilderConfig};
use icicle_cuda_runtime::memory::HostSlice;
use icicle_hash::keccak::build_keccak256_merkle_tree;
let mut config = TreeBuilderConfig::default();
config.arity = 2;
let height = 22;
let input_block_len = 136;
let leaves = vec![1u8; (1 << height) * input_block_len];
let mut digests = vec![0u64; merkle_tree_digests_len((height + 1) as u32, 2, 1)];
let leaves_slice = HostSlice::from_slice(&leaves);
let digests_slice = HostSlice::from_mut_slice(&mut digests);
build_keccak256_merkle_tree(leaves_slice, digests_slice, height, input_block_len, &config).unwrap();
```
In the example above, a binary tree of height 22 is being built. Each leaf is considered to be a 136 byte long array. The leaves and digests are aligned in a flat array. You can also use keccak512 in `build_keccak512_merkle_tree` function.
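To put the sizes in that example into perspective, here is a small back-of-the-envelope sketch in Rust. It only works out the tree geometry; the exact digests length comes from `merkle_tree_digests_len` and also depends on the digest size and how many rows are kept.
```rust
fn main() {
    let height = 22u32;             // as in the example above: 2^22 leaves
    let arity: u64 = 2;             // binary tree
    let input_block_len: u64 = 136; // bytes per leaf

    let num_leaves = arity.pow(height);              // 4,194,304 leaves
    let leaves_bytes = num_leaves * input_block_len; // flat leaves array, ~570 MB

    // Total node count of a full binary tree with that many leaves,
    // if every row were kept.
    let total_nodes = 2 * num_leaves - 1;

    println!("{num_leaves} leaves, {leaves_bytes} bytes of leaves, {total_nodes} nodes");
}
```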

View File

@@ -1,6 +1,6 @@
# MSM - Multi scalar multiplication
MSM stands for Multi scalar multiplication, its defined as:
MSM stands for Multi scalar multiplication, it's defined as:
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mi>M</mi>
@@ -43,7 +43,7 @@ $a_0, \ldots, a_n$ - Scalars
$MSM(a, G) \in G$ - a single EC (elliptic curve) point
In words, MSM is the sum of scalar and EC point multiplications. We can see from this definition that the core operations occurring are Modular Multiplication and Elliptic curve point addition. Its obvious that multiplication can be computed in parallel and then the products summed, making MSM inherently parallelizable.
In words, MSM is the sum of scalar and EC point multiplications. We can see from this definition that the core operations occurring are Modular Multiplication and Elliptic curve point addition. It's obvious that multiplication can be computed in parallel and then the products summed, making MSM inherently parallelizable.
Accelerating MSM is crucial to a ZK protocol's performance due to the [large percent of run time](https://hackmd.io/@0xMonia/SkQ6-oRz3#Hardware-acceleration-in-action) they take when generating proofs.
@@ -131,7 +131,7 @@ Large buckets exist in two cases:
2. When `c` does not divide the scalar bit-size.
`large_bucket_factor` that is equal to 10 yields good results for most cases, but it's best to fine tune this parameter per `c` and per scalar distribution.
The two most important parameters for performance are `c` and the `precompute_factor`. They affect the number of EC additions as well as the memory size. When the points are not known in advance we cannot use precomputation. In this case the best `c` value is usually around $log_2(msmSize) - 4$. However, in most protocols the points are known in advanced and precomputation can be used unless limited by memory. Usually it's best to use maximum precomputation (such that we end up with only a single bucket module) combined we a `c` value around $log_2(msmSize) - 1$.
The two most important parameters for performance are `c` and the `precompute_factor`. They affect the number of EC additions as well as the memory size. When the points are not known in advance we cannot use precomputation. In this case the best `c` value is usually around $log_2(msmSize) - 4$. However, in most protocols the points are known in advance and precomputation can be used unless limited by memory. Usually it's best to use maximum precomputation (such that we end up with only a single bucket module) combined with a `c` value around $log_2(msmSize) - 1$.
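As a quick illustration of those rules of thumb (a standalone sketch; real tuning still depends on the scalar distribution and available memory), the suggested `c` values follow directly from the MSM size:
```rust
fn main() {
    let msm_size: u32 = 1 << 20; // e.g. 2^20 points
    let log_size = 32 - (msm_size - 1).leading_zeros(); // ceil(log2(msm_size))

    let c_no_precompute = log_size.saturating_sub(4);  // points not known in advance
    let c_max_precompute = log_size.saturating_sub(1); // maximum precomputation

    println!("c ~ {c_no_precompute} without precompute, ~ {c_max_precompute} with max precompute");
}
```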
## Memory usage estimation

View File

@@ -56,7 +56,7 @@ Choosing an algorithm is heavily dependent on your use case. For example Cooley-
NTT also supports two different modes `Batch NTT` and `Single NTT`
Deciding weather to use `batch NTT` vs `single NTT` is highly dependent on your application and use case.
Deciding whether to use `batch NTT` vs `single NTT` is highly dependent on your application and use case.
#### Single NTT

View File

@@ -1,6 +1,6 @@
# Poseidon
[Poseidon](https://eprint.iacr.org/2019/458.pdf) is a popular hash in the ZK ecosystem primarily because its optimized to work over large prime fields, a common setting for ZK proofs, thereby minimizing the number of multiplicative operations required.
[Poseidon](https://eprint.iacr.org/2019/458.pdf) is a popular hash in the ZK ecosystem primarily because it's optimized to work over large prime fields, a common setting for ZK proofs, thereby minimizing the number of multiplicative operations required.
Poseidon has also been specifically designed to be efficient when implemented within ZK circuits, Poseidon uses far less constraints compared to other hash functions like Keccak or SHA-256 in the context of ZK circuits.
@@ -42,7 +42,7 @@ To generate a secure hash output, the algorithm goes through a series of "full r
**Linear Transformation and Round Constants:** A linear transformation is performed and round constants are added. The linear transformation in partial rounds can be designed to be less computationally intensive (this is done by using a sparse matrix) than in full rounds, further optimizing the function's efficiency.
The user of Poseidon can often choose how many partial or full rounds he wishes to apply; more full rounds will increase security but degrade performance. The choice and balance is highly dependent on the use case.
The user of Poseidon can often choose how many partial or full rounds he wishes to apply; more full rounds will increase security but degrade performance. The choice and balance are highly dependent on the use case.
## Using Poseidon
@@ -53,13 +53,14 @@ So for Poseidon of arity 2 and input of size 1024 * 2, we would expect 1024 elem
### Supported Bindings
[`Go`](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/golang/curves/bn254/poseidon/poseidon.go)
[`Rust`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-core/src/poseidon)
### Constants
Poseidon is extremely customizable and using different constants will produce different hashes, security levels and performance results.
We support pre-calculated and optimized constants for each of the [supported curves](../core#supported-curves-and-operations).The constants can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/include/poseidon/constants) and are labeled clearly per curve `<curve_name>_poseidon.h`.
We support pre-calculated and optimized constants for each of the [supported curves](../core#supported-curves-and-operations). The constants can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/include/poseidon/constants) and are labeled clearly per curve `<curve_name>_poseidon.h`.
If you wish to generate your own constants you can use our python script which can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/include/poseidon/constants/generate_parameters.py).
@@ -91,8 +92,6 @@ primitive_element = 7 # bls12-381
# primitive_element = 15 # bw6-761
```
We only support `alpha = 5` so if you want to use another alpha for S-box please reach out on discord or open a github issue.
### Rust API
This is the most basic way to use the Poseidon API.
@@ -101,71 +100,58 @@ This is the most basic way to use the Poseidon API.
let test_size = 1 << 10;
let arity = 2u32;
let ctx = get_default_device_context();
let constants = load_optimized_poseidon_constants::<F>(arity, &ctx).unwrap();
let config = PoseidonConfig::default();
let poseidon = Poseidon::load(arity, &ctx).unwrap();
let config = HashConfig::default();
let inputs = vec![F::one(); test_size * arity as usize];
let outputs = vec![F::zero(); test_size];
let mut input_slice = HostOrDeviceSlice::on_host(inputs);
let mut output_slice = HostOrDeviceSlice::on_host(outputs);
poseidon_hash_many::<F>(
poseidon.hash_many::<F>(
&mut input_slice,
&mut output_slice,
test_size as u32,
arity as u32,
&constants,
1, // Output length
&config,
)
.unwrap();
```
The `PoseidonConfig::default()` can be modified, by default the inputs and outputs are set to be on `Host` for example.
The `HashConfig` can be modified, by default the inputs and outputs are set to be on `Host` for example.
```rust
impl<'a> Default for PoseidonConfig<'a> {
impl<'a> Default for HashConfig<'a> {
fn default() -> Self {
let ctx = get_default_device_context();
Self {
ctx,
are_inputs_on_device: false,
are_outputs_on_device: false,
input_is_a_state: false,
aligned: false,
loop_state: false,
is_async: false,
}
}
}
```
In the example above `load_optimized_poseidon_constants::<F>(arity, &ctx).unwrap();` is used which will load the correct constants based on arity and curve. Its possible to [generate](#constants) your own constants and load them.
In the example above `Poseidon::load(arity, &ctx).unwrap();` is used which will load the correct constants based on arity and curve. It's possible to [generate](#constants) your own constants and load them.
```rust
let ctx = get_default_device_context();
let cargo_manifest_dir = env!("CARGO_MANIFEST_DIR");
let constants_file = PathBuf::from(cargo_manifest_dir)
.join("tests")
.join(format!("{}_constants.bin", field_prefix));
let mut constants_buf = vec![];
File::open(constants_file)
.unwrap()
.read_to_end(&mut constants_buf)
.unwrap();
let mut custom_constants = vec![];
for chunk in constants_buf.chunks(field_bytes) {
custom_constants.push(F::from_bytes_le(chunk));
}
let custom_constants = create_optimized_poseidon_constants::<F>(
arity as u32,
&ctx,
full_rounds_half,
partial_rounds,
&mut custom_constants,
)
.unwrap();
let custom_poseidon = Poseidon::new(
arity, // The arity of poseidon hash. The width will be equal to arity + 1
alpha, // The S-box power
full_rounds_half,
partial_rounds,
round_constants,
mds_matrix,
non_sparse_matrix,
sparse_matrices,
domain_tag,
ctx,
)
.unwrap();
```
## The Tree Builder
@@ -175,21 +161,34 @@ The tree builder allows you to build Merkle trees using Poseidon.
You can define both the tree's `height` and its `arity`. The tree `height` determines the number of layers in the tree, including the root and the leaf layer. The `arity` determines how many children each internal node can have.
```rust
let height = 20;
let arity = 2;
let leaves = vec![F::one(); 1 << (height - 1)];
let mut digests = vec![F::zero(); merkle_tree_digests_len(height, arity)];
let mut leaves_slice = HostOrDeviceSlice::on_host(leaves);
let ctx = get_default_device_context();
let constants = load_optimized_poseidon_constants::<F>(arity, &ctx).unwrap()
use icicle_bn254::tree::Bn254TreeBuilder;
use icicle_bn254::poseidon::Poseidon;
let mut config = TreeBuilderConfig::default();
config.keep_rows = 1;
build_poseidon_merkle_tree::<F>(&mut leaves_slice, &mut digests, height, arity, &constants, &config).unwrap();
let arity = 2;
config.arity = arity as u32;
let input_block_len = arity;
let leaves = vec![F::one(); (1 << height) * arity];
let mut digests = vec![F::zero(); merkle_tree_digests_len((height + 1) as u32, arity as u32, 1)];
println!("Root: {:?}", digests[0..1][0]);
let leaves_slice = HostSlice::from_slice(&leaves);
let digests_slice = HostSlice::from_mut_slice(&mut digests);
let ctx = device_context::DeviceContext::default();
let hash = Poseidon::load(2, &ctx).unwrap();
let mut config = TreeBuilderConfig::default();
config.keep_rows = 5;
Bn254TreeBuilder::build_merkle_tree(
leaves_slice,
digests_slice,
height,
input_block_len,
&hash,
&hash,
&config,
)
.unwrap();
```
Similar to Poseidon, you can also configure the Tree Builder `TreeBuilderConfig::default()`
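For reference, a minimal sketch of adjusting that configuration, using only the fields that already appear in the example above:
```rust
use icicle_core::tree::TreeBuilderConfig;

let mut config = TreeBuilderConfig::default();
config.arity = 2;     // tree arity used in the example above
config.keep_rows = 5; // limits how many digest rows are written out, as above
```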

View File

@@ -0,0 +1,88 @@
# Poseidon2
[Poseidon2](https://eprint.iacr.org/2023/323) is a recently released optimized version of Poseidon1. The two versions differ in two crucial points. First, Poseidon is a sponge hash function, while Poseidon2 can be either a sponge or a compression function depending on the use case. Secondly, Poseidon2 is instantiated by new and more efficient linear layers with respect to Poseidon. These changes decrease the number of multiplications in the linear layer by up to 90% and the number of constraints in Plonk circuits by up to 70%. This makes Poseidon2 currently the fastest arithmetization-oriented hash function without lookups.
## Using Poseidon2
ICICLE Poseidon2 is implemented for GPU and parallelization is performed for each state.
We calculate multiple hash-sums over multiple pre-images in parallel, rather than going block by block over the input vector.
For example, for Poseidon2 of width 16, input rate 8, output elements 8 and input of size 1024 * 8, we would expect 1024 * 8 elements of output. Which means each input block would be of size 8, resulting in 1024 Poseidon2 hashes being performed.
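As a sanity check of that arithmetic (a standalone sketch, unrelated to the Poseidon2 API itself):
```rust
fn main() {
    let rate = 8usize;       // input elements absorbed per hash
    let output_len = 8usize; // output elements produced per hash
    let total_input = 1024 * rate;

    let number_of_hashes = total_input / rate;        // 1024 independent hashes
    let total_output = number_of_hashes * output_len; // 1024 * 8 output elements

    assert_eq!(number_of_hashes, 1024);
    assert_eq!(total_output, 1024 * 8);
}
```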
### Supported Bindings
[`Rust`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-core/src/poseidon2)
### Constants
Poseidon2 is also extremely customizable and using different constants will produce different hashes, security levels and performance results.
We support pre-calculated constants for each of the [supported curves](../core#supported-curves-and-operations). The constants can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/include/poseidon2/constants) and are labeled clearly per curve `<curve_name>_poseidon2.h`.
You can also use your own set of constants as shown [here](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/rust/icicle-fields/icicle-babybear/src/poseidon2/mod.rs#L290)
### Rust API
This is the most basic way to use the Poseidon2 API.
```rust
let test_size = 1 << 10;
let width = 16;
let rate = 8;
let ctx = get_default_device_context();
let poseidon = Poseidon2::load(width, rate, MdsType::Default, DiffusionStrategy::Default, &ctx).unwrap();
let config = HashConfig::default();
let inputs = vec![F::one(); test_size * rate as usize];
let outputs = vec![F::zero(); test_size];
let mut input_slice = HostOrDeviceSlice::on_host(inputs);
let mut output_slice = HostOrDeviceSlice::on_host(outputs);
poseidon.hash_many::<F>(
&mut input_slice,
&mut output_slice,
test_size as u32,
rate as u32,
8, // Output length
&config,
)
.unwrap();
```
In the example above `Poseidon2::load(width, rate, MdsType::Default, DiffusionStrategy::Default, &ctx).unwrap();` is used to load the correct constants based on width and curve. Here, the default MDS matrices and diffusion are used. If you want to get a Plonky3 compliant version, set them to `MdsType::Plonky` and `DiffusionStrategy::Montgomery` respectively.
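For example, a minimal sketch of loading the Plonky3-compatible variant, reusing the names from the snippet above (setup and imports as in that snippet):
```rust
let ctx = get_default_device_context();
let poseidon_plonky3 =
    Poseidon2::load(width, rate, MdsType::Plonky, DiffusionStrategy::Montgomery, &ctx).unwrap();
```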
## The Tree Builder
Similar to Poseidon1, you can use Poseidon2 in a tree builder.
```rust
use icicle_bn254::tree::Bn254TreeBuilder;
use icicle_bn254::poseidon2::Poseidon2;
let mut config = TreeBuilderConfig::default();
let arity = 2;
config.arity = arity as u32;
let input_block_len = arity;
let leaves = vec![F::one(); (1 << height) * arity];
let mut digests = vec![F::zero(); merkle_tree_digests_len((height + 1) as u32, arity as u32, 1)];
let leaves_slice = HostSlice::from_slice(&leaves);
let digests_slice = HostSlice::from_mut_slice(&mut digests);
let ctx = device_context::DeviceContext::default();
let hash = Poseidon2::load(arity, arity, MdsType::Default, DiffusionStrategy::Default, &ctx).unwrap();
let mut config = TreeBuilderConfig::default();
config.keep_rows = 5;
Bn254TreeBuilder::build_merkle_tree(
leaves_slice,
digests_slice,
height,
input_block_len,
&hash,
&hash,
&config,
)
.unwrap();
```

docs/package-lock.json generated
View File

@@ -3680,6 +3680,8 @@
"version": "8.12.0",
"resolved": "https://registry.npmjs.org/ajv/-/ajv-8.12.0.tgz",
"integrity": "sha512-sRu1kpcO9yLtYxBKvqfTeh9KzZEwO3STyX1HT+4CaDzC6HpTGYhIhPIzj9XuKU7KYDwnaeh5hcOwjy1QuJzBPA==",
"optional": true,
"peer": true,
"dependencies": {
"fast-deep-equal": "^3.1.1",
"json-schema-traverse": "^1.0.0",
@@ -3694,7 +3696,9 @@
"node_modules/ajv-formats/node_modules/json-schema-traverse": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz",
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==",
"optional": true,
"peer": true
},
"node_modules/ajv-keywords": {
"version": "3.5.2",
@@ -16340,14 +16344,13 @@
"version": "2.1.1",
"resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-2.1.1.tgz",
"integrity": "sha512-Wx0Kx52hxE7C18hkMEggYlEifqWZtYaRgouJor+WMdPnQyEK13vgEWyVNup7SoeeoLMsr4kf5h6dOW11I15MUA==",
"requires": {
"ajv": "^8.0.0"
},
"requires": {},
"dependencies": {
"ajv": {
"version": "8.12.0",
"resolved": "https://registry.npmjs.org/ajv/-/ajv-8.12.0.tgz",
"version": "https://registry.npmjs.org/ajv/-/ajv-8.12.0.tgz",
"integrity": "sha512-sRu1kpcO9yLtYxBKvqfTeh9KzZEwO3STyX1HT+4CaDzC6HpTGYhIhPIzj9XuKU7KYDwnaeh5hcOwjy1QuJzBPA==",
"optional": true,
"peer": true,
"requires": {
"fast-deep-equal": "^3.1.1",
"json-schema-traverse": "^1.0.0",
@@ -16358,7 +16361,9 @@
"json-schema-traverse": {
"version": "1.0.0",
"resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz",
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==",
"optional": true,
"peer": true
}
}
},

View File

@@ -53,6 +53,11 @@ module.exports = {
label: "Poseidon Hash",
id: "icicle/primitives/poseidon",
},
{
type: "doc",
label: "Poseidon2 Hash",
id: "icicle/primitives/poseidon2",
},
],
},
{
@@ -186,11 +191,6 @@ module.exports = {
},
]
},
{
type: "doc",
label: "ZK Containers",
id: "ZKContainers",
},
{
type: "doc",
label: "Ingonyama Grant program",

View File

@@ -19,5 +19,5 @@ add_executable(
example.cu
)
target_include_directories(example PRIVATE "../../../icicle/include")
target_link_libraries(example ${CMAKE_SOURCE_DIR}/build/icicle/lib/libingo_curve_bn254.a ${CMAKE_SOURCE_DIR}/build/icicle/lib/libingo_curve_bls12_377.a)
target_link_libraries(example ${CMAKE_SOURCE_DIR}/build/icicle/lib/libingo_curve_bn254.a)
set_target_properties(example PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

View File

@@ -1,9 +1,5 @@
# Icicle example: Muli-Scalar Multiplication (MSM)
## Best-Practices
We recommend to run our examples in [ZK-containers](../../ZK-containers.md) to save your time and mental energy.
## Key-Takeaway
`Icicle` provides CUDA C++ template function `MSM` to accelerate [Multi-Scalar Multiplication](https://github.com/ingonyama-zk/ingopedia/blob/master/src/msm.md).

View File

@@ -10,9 +10,6 @@ mkdir -p build/icicle
cmake -S ../../../icicle/ -B build/icicle -DCMAKE_BUILD_TYPE=Release -DCURVE=bn254 -DG2=ON
cmake --build build/icicle
cmake -S ../../../icicle/ -B build/icicle -DCMAKE_BUILD_TYPE=Release -DCURVE=bls12_377 -DG2=ON
cmake --build build/icicle
# Configure and build the example application
cmake -S . -B build/example
cmake --build build/example

View File

@@ -3,23 +3,9 @@
#include <iomanip>
#include "api/bn254.h"
#include "api/bls12_377.h"
using namespace bn254;
// using namespace bn254;
typedef bn254::scalar_t scalar_bn254;
typedef bn254::affine_t affine_bn254;
typedef bn254::g2_affine_t g2_affine_bn254;
typedef bn254::projective_t projective_bn254;
typedef bn254::g2_projective_t g2_projective_bn254;
typedef bls12_377::scalar_t scalar_bls12377;
typedef bls12_377::affine_t affine_bls12377;
typedef bls12_377::g2_affine_t g2_affine_bls12377;
typedef bls12_377::projective_t projective_bls12377;
typedef bls12_377::g2_projective_t g2_projective_bls12377;
int msm_bn254(int argc, char* argv[])
int main(int argc, char* argv[])
{
std::cout << "Icicle example: Muli-Scalar Multiplication (MSM)" << std::endl;
std::cout << "Example parameters" << std::endl;
@@ -32,11 +18,11 @@ int msm_bn254(int argc, char* argv[])
std::cout << "Part I: use G1 points" << std::endl;
std::cout << "Generating random inputs on-host" << std::endl;
scalar_bn254* scalars = new scalar_bn254[N];
affine_bn254* points = new affine_bn254[N];
projective_bn254 result;
scalar_bn254::rand_host_many(scalars, N);
projective_bn254::rand_host_many_affine(points, N);
scalar_t* scalars = new scalar_t[N];
affine_t* points = new affine_t[N];
projective_t result;
scalar_t::rand_host_many(scalars, N);
projective_t::rand_host_many_affine(points, N);
std::cout << "Using default MSM configuration with on-host inputs" << std::endl;
device_context::DeviceContext ctx = device_context::get_default_device_context();
@@ -62,17 +48,17 @@ int msm_bn254(int argc, char* argv[])
cudaStream_t stream = config.ctx.stream;
// Execute the MSM kernel
bn254_msm_cuda(scalars, points, msm_size, config, &result);
std::cout << projective_bn254::to_affine(result) << std::endl;
std::cout << projective_t::to_affine(result) << std::endl;
std::cout << "Copying inputs on-device" << std::endl;
scalar_bn254* scalars_d;
affine_bn254* points_d;
projective_bn254* result_d;
cudaMalloc(&scalars_d, sizeof(scalar_bn254) * N);
cudaMalloc(&points_d, sizeof(affine_bn254) * N);
cudaMalloc(&result_d, sizeof(projective_bn254));
cudaMemcpy(scalars_d, scalars, sizeof(scalar_bn254) * N, cudaMemcpyHostToDevice);
cudaMemcpy(points_d, points, sizeof(affine_bn254) * N, cudaMemcpyHostToDevice);
scalar_t* scalars_d;
affine_t* points_d;
projective_t* result_d;
cudaMalloc(&scalars_d, sizeof(scalar_t) * N);
cudaMalloc(&points_d, sizeof(affine_t) * N);
cudaMalloc(&result_d, sizeof(projective_t));
cudaMemcpy(scalars_d, scalars, sizeof(scalar_t) * N, cudaMemcpyHostToDevice);
cudaMemcpy(points_d, points, sizeof(affine_t) * N, cudaMemcpyHostToDevice);
std::cout << "Reconfiguring MSM to use on-device inputs" << std::endl;
config.are_results_on_device = true;
@@ -84,9 +70,9 @@ int msm_bn254(int argc, char* argv[])
bn254_msm_cuda(scalars_d, points_d, msm_size, config, result_d);
// Copy the result back to the host
cudaMemcpy(&result, result_d, sizeof(projective_bn254), cudaMemcpyDeviceToHost);
cudaMemcpy(&result, result_d, sizeof(projective_t), cudaMemcpyDeviceToHost);
// Print the result
std::cout << projective_bn254::to_affine(result) << std::endl;
std::cout << projective_t::to_affine(result) << std::endl;
// Free the device memory
cudaFree(scalars_d);
cudaFree(points_d);
@@ -98,25 +84,25 @@ int msm_bn254(int argc, char* argv[])
std::cout << "Generating random inputs on-host" << std::endl;
// use the same scalars
g2_affine_bn254* g2_points = new g2_affine_bn254[N];
g2_projective_bn254::rand_host_many_affine(g2_points, N);
g2_affine_t* g2_points = new g2_affine_t[N];
g2_projective_t::rand_host_many_affine(g2_points, N);
std::cout << "Reconfiguring MSM to use on-host inputs" << std::endl;
config.are_results_on_device = false;
config.are_scalars_on_device = false;
config.are_points_on_device = false;
g2_projective_bn254 g2_result;
g2_projective_t g2_result;
bn254_g2_msm_cuda(scalars, g2_points, msm_size, config, &g2_result);
std::cout << g2_projective_bn254::to_affine(g2_result) << std::endl;
std::cout << g2_projective_t::to_affine(g2_result) << std::endl;
std::cout << "Copying inputs on-device" << std::endl;
g2_affine_bn254* g2_points_d;
g2_projective_bn254* g2_result_d;
cudaMalloc(&scalars_d, sizeof(scalar_bn254) * N);
cudaMalloc(&g2_points_d, sizeof(g2_affine_bn254) * N);
cudaMalloc(&g2_result_d, sizeof(g2_projective_bn254));
cudaMemcpy(scalars_d, scalars, sizeof(scalar_bn254) * N, cudaMemcpyHostToDevice);
cudaMemcpy(g2_points_d, g2_points, sizeof(g2_affine_bn254) * N, cudaMemcpyHostToDevice);
g2_affine_t* g2_points_d;
g2_projective_t* g2_result_d;
cudaMalloc(&scalars_d, sizeof(scalar_t) * N);
cudaMalloc(&g2_points_d, sizeof(g2_affine_t) * N);
cudaMalloc(&g2_result_d, sizeof(g2_projective_t));
cudaMemcpy(scalars_d, scalars, sizeof(scalar_t) * N, cudaMemcpyHostToDevice);
cudaMemcpy(g2_points_d, g2_points, sizeof(g2_affine_t) * N, cudaMemcpyHostToDevice);
std::cout << "Reconfiguring MSM to use on-device inputs" << std::endl;
config.are_results_on_device = true;
@@ -125,140 +111,14 @@ int msm_bn254(int argc, char* argv[])
std::cout << "Running MSM kernel with on-device inputs" << std::endl;
bn254_g2_msm_cuda(scalars_d, g2_points_d, msm_size, config, g2_result_d);
cudaMemcpy(&g2_result, g2_result_d, sizeof(g2_projective_bn254), cudaMemcpyDeviceToHost);
std::cout << g2_projective_bn254::to_affine(g2_result) << std::endl;
cudaMemcpy(&g2_result, g2_result_d, sizeof(g2_projective_t), cudaMemcpyDeviceToHost);
std::cout << g2_projective_t::to_affine(g2_result) << std::endl;
cudaFree(scalars_d);
cudaFree(g2_points_d);
cudaFree(g2_result_d);
delete[] g2_points;
delete[] scalars;
return 0;
}
int msm_bls12_377(int argc, char* argv[])
{
std::cout << "Icicle example: Muli-Scalar Multiplication (MSM)" << std::endl;
std::cout << "Example parameters" << std::endl;
int batch_size = 1;
std::cout << "Batch size: " << batch_size << std::endl;
unsigned msm_size = 1048576;
std::cout << "MSM size: " << msm_size << std::endl;
int N = batch_size * msm_size;
std::cout << "Part I: use G1 points" << std::endl;
std::cout << "Generating random inputs on-host" << std::endl;
scalar_bls12377* scalars = new scalar_bls12377[N];
affine_bls12377* points = new affine_bls12377[N];
projective_bls12377 result;
scalar_bls12377::rand_host_many(scalars, N);
projective_bls12377::rand_host_many_affine(points, N);
std::cout << "Using default MSM configuration with on-host inputs" << std::endl;
device_context::DeviceContext ctx = device_context::get_default_device_context();
msm::MSMConfig config = {
ctx, // ctx
0, // points_size
1, // precompute_factor
0, // c
0, // bitsize
10, // large_bucket_factor
1, // batch_size
false, // are_scalars_on_device
false, // are_scalars_montgomery_form
false, // are_points_on_device
false, // are_points_montgomery_form
false, // are_results_on_device
false, // is_big_triangle
false, // is_async
};
config.batch_size = batch_size;
std::cout << "Running MSM kernel with on-host inputs" << std::endl;
cudaStream_t stream = config.ctx.stream;
cudaStreamCreate(&stream);
// Execute the MSM kernel
bls12_377_msm_cuda(scalars, points, msm_size, config, &result);
std::cout << projective_bls12377::to_affine(result) << std::endl;
std::cout << "Copying inputs on-device" << std::endl;
scalar_bls12377* scalars_d_bls;
affine_bls12377* points_d_bls;
projective_bls12377* result_d_bls;
cudaMalloc(&scalars_d_bls, sizeof(scalar_bls12377) * N);
cudaMalloc(&points_d_bls, sizeof(affine_bls12377) * N);
cudaMalloc(&result_d_bls, sizeof(projective_bls12377));
cudaMemcpy(scalars_d_bls, scalars, sizeof(scalar_bls12377) * N, cudaMemcpyHostToDevice);
cudaMemcpy(points_d_bls, points, sizeof(affine_bls12377) * N, cudaMemcpyHostToDevice);
std::cout << "Reconfiguring MSM to use on-device inputs" << std::endl;
config.are_results_on_device = true;
config.are_scalars_on_device = true;
config.are_points_on_device = true;
std::cout << "Running MSM kernel with on-device inputs" << std::endl;
// Execute the MSM kernel
bls12_377_msm_cuda(scalars_d_bls, points_d_bls, msm_size, config, result_d_bls);
// Copy the result back to the host
cudaMemcpy(&result, result_d_bls, sizeof(projective_bls12377), cudaMemcpyDeviceToHost);
// Print the result
std::cout << projective_bls12377::to_affine(result) << std::endl;
// Free the device memory
cudaFree(scalars_d_bls);
cudaFree(points_d_bls);
cudaFree(result_d_bls);
// Free the host memory, keep scalars for G2 example
delete[] points;
std::cout << "Part II: use G2 points" << std::endl;
std::cout << "Generating random inputs on-host" << std::endl;
// use the same scalars
g2_affine_bls12377* g2_points = new g2_affine_bls12377[N];
g2_projective_bls12377::rand_host_many_affine(g2_points, N);
std::cout << "Reconfiguring MSM to use on-host inputs" << std::endl;
config.are_results_on_device = false;
config.are_scalars_on_device = false;
config.are_points_on_device = false;
g2_projective_bls12377 g2_result;
bls12_377_g2_msm_cuda(scalars, g2_points, msm_size, config, &g2_result);
std::cout << g2_projective_bls12377::to_affine(g2_result) << std::endl;
std::cout << "Copying inputs on-device" << std::endl;
g2_affine_bls12377* g2_points_d;
g2_projective_bls12377* g2_result_d;
cudaMalloc(&scalars_d_bls, sizeof(scalar_bls12377) * N);
cudaMalloc(&g2_points_d, sizeof(g2_affine_bls12377) * N);
cudaMalloc(&g2_result_d, sizeof(g2_projective_bls12377));
cudaMemcpy(scalars_d_bls, scalars, sizeof(scalar_bls12377) * N, cudaMemcpyHostToDevice);
cudaMemcpy(g2_points_d, g2_points, sizeof(g2_affine_bls12377) * N, cudaMemcpyHostToDevice);
std::cout << "Reconfiguring MSM to use on-device inputs" << std::endl;
config.are_results_on_device = true;
config.are_scalars_on_device = true;
config.are_points_on_device = true;
std::cout << "Running MSM kernel with on-device inputs" << std::endl;
bls12_377_g2_msm_cuda(scalars_d_bls, g2_points_d, msm_size, config, g2_result_d);
cudaMemcpy(&g2_result, g2_result_d, sizeof(g2_projective_bn254), cudaMemcpyDeviceToHost);
std::cout << g2_projective_bls12377::to_affine(g2_result) << std::endl;
cudaFree(scalars_d_bls);
cudaFree(g2_points_d);
cudaFree(g2_result_d);
delete[] g2_points;
delete[] scalars;
return 0;
}
int main(int argc, char* argv[])
{
std::cout << "Starting BN254 MSM" << std::endl;
msm_bn254(argc, argv);
std::cout << "Starting BLS12-377 MSM" << std::endl;
msm_bls12_377(argc, argv);
cudaStreamDestroy(stream);
return 0;
}

View File

@@ -18,7 +18,7 @@ add_executable(
example.cu
)
target_include_directories(example PRIVATE "../../../icicle/include")
target_link_libraries(example ${CMAKE_SOURCE_DIR}/build/icicle/lib/libingo_field_bn254.a ${CMAKE_SOURCE_DIR}/build/icicle/lib/libingo_field_bls12_377.a)
target_link_libraries(example ${CMAKE_SOURCE_DIR}/build/icicle/lib/libingo_field_bn254.a)
find_library(NVML_LIBRARY nvidia-ml PATHS /usr/local/cuda/targets/x86_64-linux/lib/stubs/ )
target_link_libraries(example ${NVML_LIBRARY})
set_target_properties(example PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

View File

@@ -1,9 +1,5 @@
# Icicle example: Multiplication
## Best-Practices
We recommend to run our examples in [ZK-containers](../../ZK-containers.md) to save your time and mental energy.
## Key-Takeaway
`Icicle` accelerates multiplication operation `*` using [Karatsuba algorithm](https://en.wikipedia.org/wiki/Karatsuba_algorithm)

View File

@@ -7,13 +7,9 @@ mkdir -p build/example
mkdir -p build/icicle
# Configure and build Icicle
cmake -S ../../../icicle/ -B build/icicle -DMSM=OFF -DCMAKE_BUILD_TYPE=Release -DCURVE=bls12_377
cmake --build build/icicle
rm build/icicle/CMakeCache.txt
cmake -S ../../../icicle/ -B build/icicle -DMSM=OFF -DCMAKE_BUILD_TYPE=Release -DCURVE=bn254
cmake --build build/icicle
# Configure and build the example application
cmake -S . -B build/example
cmake --build build/example

View File

@@ -4,17 +4,14 @@
#include <nvml.h>
#include "api/bn254.h"
#include "api/bls12_377.h"
#include "vec_ops/vec_ops.cuh"
using namespace vec_ops;
// using namespace bn254;
typedef bn254::scalar_t T;
using namespace bn254;
typedef bls12_377::scalar_t T_bls;
typedef scalar_t T;
int vector_mult_bn254(T* vec_b, T* vec_a, T* vec_result, size_t n_elments, device_context::DeviceContext ctx)
int vector_mult(T* vec_b, T* vec_a, T* vec_result, size_t n_elments, device_context::DeviceContext ctx)
{
vec_ops::VecOpsConfig config = vec_ops::DefaultVecOpsConfig();
config.is_a_on_device = true;
@@ -28,24 +25,10 @@ int vector_mult_bn254(T* vec_b, T* vec_a, T* vec_result, size_t n_elments, devic
return 0;
}
int vector_mult_bls12377(T_bls* vec_b, T_bls* vec_a, T_bls* vec_result, size_t n_elments, device_context::DeviceContext ctx)
{
vec_ops::VecOpsConfig config = vec_ops::DefaultVecOpsConfig();
config.is_a_on_device = true;
config.is_b_on_device = true;
config.is_result_on_device = true;
cudaError_t err = bls12_377_mul_cuda(vec_a, vec_b, n_elments, config, vec_result);
if (err != cudaSuccess) {
std::cerr << "Failed to multiply vectors - " << cudaGetErrorString(err) << std::endl;
return 0;
}
return 0;
}
int main(int argc, char** argv)
{
const unsigned vector_size = 1 << 15;
const unsigned repetitions = 1 ;
const unsigned repetitions = 1 << 15;
cudaError_t err;
nvmlInit();
@@ -78,53 +61,41 @@ int main(int argc, char** argv)
// host data
T* host_in1 = (T*)malloc(vector_size * sizeof(T));
T* host_in2 = (T*)malloc(vector_size * sizeof(T));
T_bls* host_in1_bls12377 = (T_bls*)malloc(vector_size * sizeof(T_bls));
T_bls* host_in2_bls12377 = (T_bls*)malloc(vector_size * sizeof(T_bls));
std::cout << "Initializing vectors with random data" << std::endl;
T::rand_host_many(host_in1, vector_size);
T::rand_host_many(host_in2, vector_size);
T_bls::rand_host_many(host_in1_bls12377, vector_size);
T_bls::rand_host_many(host_in2_bls12377, vector_size);
// device data
device_context::DeviceContext ctx = device_context::get_default_device_context();
T* device_in1_bn254;
T* device_in2_bn254;
T* device_out_bn254;
T_bls* device_in1_bls12377;
T_bls* device_in2_bls12377;
T_bls* device_out_bls12377;
T* device_in1;
T* device_in2;
T* device_out;
err = cudaMalloc((void**)&device_in1_bn254, vector_size * sizeof(T));
err = cudaMalloc((void**)&device_in1_bls12377, vector_size * sizeof(T_bls));
err = cudaMalloc((void**)&device_in1, vector_size * sizeof(T));
if (err != cudaSuccess) {
std::cerr << "Failed to allocate device memory - " << cudaGetErrorString(err) << std::endl;
return 0;
}
err = cudaMalloc((void**)&device_in2_bn254, vector_size * sizeof(T));
err = cudaMalloc((void**)&device_in2_bls12377, vector_size * sizeof(T_bls));
err = cudaMalloc((void**)&device_in2, vector_size * sizeof(T));
if (err != cudaSuccess) {
std::cerr << "Failed to allocate device memory - " << cudaGetErrorString(err) << std::endl;
return 0;
}
err = cudaMalloc((void**)&device_out_bn254, vector_size * sizeof(T));
err = cudaMalloc((void**)&device_out_bls12377, vector_size * sizeof(T_bls));
err = cudaMalloc((void**)&device_out, vector_size * sizeof(T));
if (err != cudaSuccess) {
std::cerr << "Failed to allocate device memory - " << cudaGetErrorString(err) << std::endl;
return 0;
}
// copy from host to device
err = cudaMemcpy(device_in1_bn254, host_in1, vector_size * sizeof(T), cudaMemcpyHostToDevice);
err = cudaMemcpy(device_in1_bls12377, host_in1_bls12377, vector_size * sizeof(T_bls), cudaMemcpyHostToDevice);
err = cudaMemcpy(device_in1, host_in1, vector_size * sizeof(T), cudaMemcpyHostToDevice);
if (err != cudaSuccess) {
std::cerr << "Failed to copy data from host to device - " << cudaGetErrorString(err) << std::endl;
return 0;
}
err = cudaMemcpy(device_in2_bn254, host_in2, vector_size * sizeof(T), cudaMemcpyHostToDevice);
err = cudaMemcpy(device_in2_bls12377, host_in2_bls12377, vector_size * sizeof(T_bls), cudaMemcpyHostToDevice);
err = cudaMemcpy(device_in2, host_in2, vector_size * sizeof(T), cudaMemcpyHostToDevice);
if (err != cudaSuccess) {
std::cerr << "Failed to copy data from host to device - " << cudaGetErrorString(err) << std::endl;
return 0;
@@ -133,10 +104,7 @@ int main(int argc, char** argv)
std::cout << "Starting warm-up" << std::endl;
// Warm-up loop
for (int i = 0; i < repetitions; i++) {
std::cout << "bn254 mult" << std::endl;
vector_mult_bn254(device_in1_bn254, device_in2_bn254, device_out_bn254, vector_size, ctx);
std::cout << "bls12-377 mult" << std::endl;
vector_mult_bls12377(device_in1_bls12377, device_in2_bls12377, device_out_bls12377, vector_size, ctx);
vector_mult(device_in1, device_in2, device_out, vector_size, ctx);
}
std::cout << "Starting benchmarking" << std::endl;
@@ -154,7 +122,7 @@ int main(int argc, char** argv)
auto start_time = std::chrono::high_resolution_clock::now();
// Benchmark loop
for (int i = 0; i < repetitions; i++) {
vector_mult_bn254(device_in1_bn254, device_in2_bn254, device_out_bn254, vector_size, ctx);
vector_mult(device_in1, device_in2, device_out, vector_size, ctx);
}
auto end_time = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
@@ -178,7 +146,7 @@ int main(int argc, char** argv)
// Optional: validate multiplication
T* host_out = (T*)malloc(vector_size * sizeof(T));
cudaMemcpy(host_out, device_out_bn254, vector_size * sizeof(T), cudaMemcpyDeviceToHost);
cudaMemcpy(host_out, device_out, vector_size * sizeof(T), cudaMemcpyDeviceToHost);
// validate multiplication here...
@@ -186,9 +154,9 @@ int main(int argc, char** argv)
free(host_in1);
free(host_in2);
free(host_out);
cudaFree(device_in1_bn254);
cudaFree(device_in2_bn254);
cudaFree(device_out_bn254);
cudaFree(device_in1);
cudaFree(device_in2);
cudaFree(device_out);
nvmlShutdown();
return 0;
}

View File

@@ -1,9 +1,5 @@
# Icicle example: Number-Theoretical Transform (NTT)
## Best-Practices
We recommend to run our examples in [ZK-containers](../../ZK-containers.md) to save your time and mental energy.
## Key-Takeaway
`Icicle` provides CUDA C++ template function NTT for [Number Theoretical Transform](https://github.com/ingonyama-zk/ingopedia/blob/master/src/fft.md), also known as Discrete Fourier Transform.

View File

@@ -1,9 +1,5 @@
# ICICLE example: Pedersen Commitment
## Best-Practices
We recommend to run our examples in [ZK-containers](../../ZK-containers.md) to save your time and mental energy.
## Key-Takeaway
A Pedersen Commitment is a cryptographic primitive to commit to a value or a vector of values while keeping it hidden, yet enabling the committer to reveal the value later. It provides both hiding (the commitment does not reveal any information about the value) and binding properties (once a value is committed, it cannot be changed without detection).

View File

@@ -1,9 +1,5 @@
# ICICLE examples: computations with polynomials
## Best-Practices
We recommend to run our examples in [ZK-containers](../../ZK-containers.md) to save your time and mental energy.
## Key-Takeaway
Polynomials are crucial for Zero-Knowledge Proofs (ZKPs): they enable efficient representation and verification of computational statements, facilitate privacy-preserving protocols, and support complex mathematical operations essential for constructing and verifying proofs without revealing underlying data. Polynomial API is documented [here](https://dev.ingonyama.com/icicle/polynomials/overview)

View File

@@ -1,9 +1,5 @@
# Icicle example: build a Merkle tree using Poseidon hash
## Best-Practices
We recommend to run our examples in [ZK-containers](../../ZK-containers.md) to save your time and mental energy.
## Key-Takeaway
`Icicle` provides CUDA C++ template `poseidon_hash` to accelerate the popular [Poseidon hash function](https://www.poseidon-hash.info/).

View File

@@ -2,10 +2,6 @@
`ICICLE` provides Rust bindings to CUDA-accelerated C++ implementation of [Multi-Scalar Multiplication](https://github.com/ingonyama-zk/ingopedia/blob/master/src/msm.md).
## Best Practices
In order to save time and setting up prerequisites manually, we recommend running this example in our [ZKContainer](../../ZKContainer.md).
## Usage
```rust

View File

@@ -4,10 +4,6 @@
`ICICLE` provides Rust bindings to CUDA-accelerated C++ implementation of [Number Theoretic Transform](https://github.com/ingonyama-zk/ingopedia/blob/master/src/fft.md).
## Best Practices
In order to save time and setting up prerequisites manually, we recommend running this example in our [ZKContainer](../../ZKContainer.md).
## Usage
```rust

View File

@@ -124,6 +124,19 @@ public:
*/
static constexpr HOST_DEVICE_INLINE unsigned num_of_reductions() { return CONFIG::num_of_reductions; }
// count number of bits of the field element without leading zeros.
static constexpr HOST_DEVICE_INLINE unsigned num_bits(const Field& x)
{
size_t size = sizeof(x.limbs_storage.limbs[0]) * 8;
unsigned ret = size * TLC;
for (unsigned i = TLC; i-- > 0;) {
int leading = __clz(x.limbs_storage.limbs[i]);
ret -= leading;
if (leading != size) { break; }
}
return ret;
}
static constexpr unsigned slack_bits = 32 * TLC - NBITS;
struct Wide {
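For intuition, here is a host-side Rust sketch of the same counting strategy used by `num_bits()` above (and by the arkworks `num_bits()` the commit description references); it is an illustration only, not the library code:
```rust
// Scan limbs from the most significant down, subtracting leading zeros
// until the first non-zero limb is found.
fn num_bits(limbs: &[u32]) -> u32 {
    let limb_bits = u32::BITS; // 32 bits per limb
    let mut bits = limb_bits * limbs.len() as u32;
    for &limb in limbs.iter().rev() {
        let leading = limb.leading_zeros();
        bits -= leading;
        if leading != limb_bits {
            break; // found the top set bit
        }
    }
    bits
}

fn main() {
    assert_eq!(num_bits(&[0b101, 0]), 3); // value 5 -> 3 significant bits
    assert_eq!(num_bits(&[0, 1]), 33);    // value 2^32 -> 33 significant bits
    assert_eq!(num_bits(&[0, 0]), 0);     // zero -> 0 bits
}
```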

View File

@@ -11,9 +11,25 @@
using namespace hash;
namespace keccak {
// Input rate in bytes
const int KECCAK_256_RATE = 136;
const int KECCAK_512_RATE = 72;
// Digest size in u64
const int KECCAK_256_DIGEST = 4;
const int KECCAK_512_DIGEST = 8;
// Number of state elements in u64
const int KECCAK_STATE_SIZE = 25;
const int KECCAK_PADDING_CONST = 1;
const int SHA3_PADDING_CONST = 6;
class Keccak : public Hasher<uint8_t, uint64_t>
{
public:
const int PADDING_CONST;
cudaError_t run_hash_many_kernel(
const uint8_t* input,
uint64_t* output,
@@ -22,7 +38,34 @@ namespace keccak {
unsigned int output_len,
const device_context::DeviceContext& ctx) const override;
Keccak(unsigned int rate) : Hasher<uint8_t, uint64_t>(25, 25, rate, 0) {}
Keccak(unsigned int rate, unsigned int padding_const)
: Hasher<uint8_t, uint64_t>(KECCAK_STATE_SIZE, KECCAK_STATE_SIZE, rate, 0), PADDING_CONST(padding_const)
{
}
};
class Keccak256 : public Keccak
{
public:
Keccak256() : Keccak(KECCAK_256_RATE, KECCAK_PADDING_CONST) {}
};
class Keccak512 : public Keccak
{
public:
Keccak512() : Keccak(KECCAK_512_RATE, KECCAK_PADDING_CONST) {}
};
class Sha3_256 : public Keccak
{
public:
Sha3_256() : Keccak(KECCAK_256_RATE, SHA3_PADDING_CONST) {}
};
class Sha3_512 : public Keccak
{
public:
Sha3_512() : Keccak(KECCAK_512_RATE, SHA3_PADDING_CONST) {}
};
} // namespace keccak

View File

@@ -20,6 +20,11 @@ extern "C" void CONCAT_EXPAND(CURVE, to_affine)(projective_t* point, affine_t* p
*point_out = projective_t::to_affine(*point);
}
extern "C" void CONCAT_EXPAND(CURVE, from_affine)(affine_t* point, projective_t* point_out)
{
*point_out = projective_t::from_affine(*point);
}
extern "C" void CONCAT_EXPAND(CURVE, generate_projective_points)(projective_t* points, int size)
{
projective_t::rand_host_many(points, size);

View File

@@ -20,6 +20,11 @@ extern "C" void CONCAT_EXPAND(CURVE, g2_to_affine)(g2_projective_t* point, g2_af
*point_out = g2_projective_t::to_affine(*point);
}
extern "C" void CONCAT_EXPAND(CURVE, g2_from_affine)(g2_affine_t* point, g2_projective_t* point_out)
{
*point_out = g2_projective_t::from_affine(*point);
}
extern "C" void CONCAT_EXPAND(CURVE, g2_generate_projective_points)(g2_projective_t* points, int size)
{
g2_projective_t::rand_host_many(points, size);

View File

@@ -11,13 +11,29 @@ namespace keccak {
extern "C" cudaError_t
keccak256_cuda(uint8_t* input, int input_block_size, int number_of_blocks, uint8_t* output, HashConfig& config)
{
return Keccak(136).hash_many(input, (uint64_t*)output, number_of_blocks, input_block_size, 4, config);
return Keccak256().hash_many(
input, (uint64_t*)output, number_of_blocks, input_block_size, KECCAK_256_DIGEST, config);
}
extern "C" cudaError_t
keccak512_cuda(uint8_t* input, int input_block_size, int number_of_blocks, uint8_t* output, HashConfig& config)
{
return Keccak(72).hash_many(input, (uint64_t*)output, number_of_blocks, input_block_size, 8, config);
return Keccak512().hash_many(
input, (uint64_t*)output, number_of_blocks, input_block_size, KECCAK_512_DIGEST, config);
}
extern "C" cudaError_t
sha3_256_cuda(uint8_t* input, int input_block_size, int number_of_blocks, uint8_t* output, HashConfig& config)
{
return Sha3_256().hash_many(
input, (uint64_t*)output, number_of_blocks, input_block_size, KECCAK_256_DIGEST, config);
}
extern "C" cudaError_t
sha3_512_cuda(uint8_t* input, int input_block_size, int number_of_blocks, uint8_t* output, HashConfig& config)
{
return Sha3_512().hash_many(
input, (uint64_t*)output, number_of_blocks, input_block_size, KECCAK_512_DIGEST, config);
}
extern "C" cudaError_t build_keccak256_merkle_tree_cuda(
@@ -27,7 +43,7 @@ namespace keccak {
unsigned int input_block_len,
const merkle_tree::TreeBuilderConfig& tree_config)
{
Keccak keccak(136);
Keccak256 keccak;
return merkle_tree::build_merkle_tree<uint8_t, uint64_t>(
leaves, digests, height, input_block_len, keccak, keccak, tree_config);
}
@@ -39,7 +55,31 @@ namespace keccak {
unsigned int input_block_len,
const merkle_tree::TreeBuilderConfig& tree_config)
{
Keccak keccak(72);
Keccak512 keccak;
return merkle_tree::build_merkle_tree<uint8_t, uint64_t>(
leaves, digests, height, input_block_len, keccak, keccak, tree_config);
}
extern "C" cudaError_t build_sha3_256_merkle_tree_cuda(
const uint8_t* leaves,
uint64_t* digests,
unsigned int height,
unsigned int input_block_len,
const merkle_tree::TreeBuilderConfig& tree_config)
{
Sha3_256 keccak;
return merkle_tree::build_merkle_tree<uint8_t, uint64_t>(
leaves, digests, height, input_block_len, keccak, keccak, tree_config);
}
extern "C" cudaError_t build_sha3_512_merkle_tree_cuda(
const uint8_t* leaves,
uint64_t* digests,
unsigned int height,
unsigned int input_block_len,
const merkle_tree::TreeBuilderConfig& tree_config)
{
Sha3_512 keccak;
return merkle_tree::build_merkle_tree<uint8_t, uint64_t>(
leaves, digests, height, input_block_len, keccak, keccak, tree_config);
}

View File

@@ -161,7 +161,7 @@ namespace keccak {
0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008};
__device__ void keccakf(u64 s[25])
__device__ void keccakf(u64 s[KECCAK_STATE_SIZE])
{
u64 t0, t1, t2, t3, t4;
@@ -180,15 +180,20 @@ namespace keccak {
}
template <const int R>
__global__ void
keccak_hash_blocks(const uint8_t* input, int input_block_size, int output_len, int number_of_blocks, uint64_t* output)
__global__ void keccak_hash_blocks(
const uint8_t* input,
int input_block_size,
int output_len,
int number_of_blocks,
uint64_t* output,
int padding_const)
{
int sid = (blockIdx.x * blockDim.x) + threadIdx.x;
if (sid >= number_of_blocks) { return; }
const uint8_t* b_input = input + sid * input_block_size;
uint64_t* b_output = output + sid * output_len;
uint64_t state[25] = {}; // Initialize with zeroes
uint64_t state[KECCAK_STATE_SIZE] = {}; // Initialize with zeroes
int input_len = input_block_size;
@@ -209,7 +214,7 @@ namespace keccak {
}
// pad 10*1
last_block[input_len] = 1;
last_block[input_len] = padding_const;
for (int i = 0; i < R - input_len - 1; i++) {
last_block[input_len + i + 1] = 0;
}
@@ -238,13 +243,13 @@ namespace keccak {
int number_of_gpu_blocks = (number_of_states - 1) / number_of_threads + 1;
switch (rate) {
case 136:
keccak_hash_blocks<136><<<number_of_gpu_blocks, number_of_threads, 0, ctx.stream>>>(
input, input_len, output_len, number_of_states, output);
case KECCAK_256_RATE:
keccak_hash_blocks<KECCAK_256_RATE><<<number_of_gpu_blocks, number_of_threads, 0, ctx.stream>>>(
input, input_len, output_len, number_of_states, output, PADDING_CONST);
break;
case 72:
keccak_hash_blocks<72><<<number_of_gpu_blocks, number_of_threads, 0, ctx.stream>>>(
input, input_len, output_len, number_of_states, output);
case KECCAK_512_RATE:
keccak_hash_blocks<KECCAK_512_RATE><<<number_of_gpu_blocks, number_of_threads, 0, ctx.stream>>>(
input, input_len, output_len, number_of_states, output, PADDING_CONST);
break;
default:
THROW_ICICLE_ERR(IcicleError_t::InvalidArgument, "KeccakHash: #rate must be one of [136, 72]");
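For reference, a rough Rust sketch of the domain-separated pad10*1 padding that the `padding_const` argument selects: 0x01 for the legacy Keccak variants and 0x06 for SHA-3. The final 0x80 bit on the last byte of the rate block follows the Keccak specification and is assumed here rather than taken from the kernel above.
```rust
fn pad_block(msg_tail: &[u8], rate: usize, padding_const: u8) -> Vec<u8> {
    assert!(msg_tail.len() < rate);
    let mut block = vec![0u8; rate];
    block[..msg_tail.len()].copy_from_slice(msg_tail);
    block[msg_tail.len()] = padding_const; // 0x01 (Keccak) or 0x06 (SHA-3)
    block[rate - 1] |= 0x80;               // closing bit of pad10*1
    block
}

fn main() {
    let keccak_block = pad_block(b"abc", 136, 0x01);
    let sha3_block = pad_block(b"abc", 136, 0x06);
    assert_ne!(keccak_block, sha3_block); // only the padding byte differs
}
```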

View File

@@ -129,8 +129,9 @@ namespace merkle_tree {
while (number_of_states > 0) {
CHK_IF_RETURN(compression.run_hash_many_kernel(
(L*)prev_layer, next_layer, number_of_states, tree_config.digest_elements * tree_config.arity,
tree_config.digest_elements, hash_config.ctx));
(L*)prev_layer, next_layer, number_of_states,
tree_config.digest_elements * tree_config.arity * (sizeof(D) / sizeof(L)), tree_config.digest_elements,
hash_config.ctx));
if (!keep_rows || subtree_height < keep_rows) {
D* digests_with_offset =
@@ -298,8 +299,9 @@ namespace merkle_tree {
size_t segment_offset = start_segment_offset;
while (number_of_states > 0) {
CHK_IF_RETURN(compression.run_hash_many_kernel(
(L*)prev_layer, next_layer, number_of_states, tree_config.digest_elements * tree_config.arity,
tree_config.digest_elements, tree_config.ctx));
(L*)prev_layer, next_layer, number_of_states,
tree_config.digest_elements * tree_config.arity * (sizeof(D) / sizeof(L)), tree_config.digest_elements,
tree_config.ctx));
if (!tree_config.keep_rows || cap_height < tree_config.keep_rows + (int)caps_mode) {
D* digests_with_offset = digests + segment_offset;
CHK_IF_RETURN(cudaMemcpyAsync(
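To see what this change does in numbers, a Rust sketch of the scaling it introduces (assuming the Keccak case above: `L = u8` leaves, `D = u64` digests, 4 digest elements, arity 2):
```rust
use std::mem::size_of;

// The compression hash consumes the child digests as elements of the leaf
// type L, so the input length is scaled by sizeof(D) / sizeof(L).
fn compression_input_len<L, D>(digest_elements: usize, arity: usize) -> usize {
    digest_elements * arity * (size_of::<D>() / size_of::<L>())
}

fn main() {
    let len = compression_input_len::<u8, u64>(4, 2);
    assert_eq!(len, 64); // two 32-byte child digests = 64 bytes of input
}
```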

View File

@@ -28,21 +28,6 @@ func (p *MockProjective) FromLimbs(x, y, z []uint32) MockProjective {
return *p
}
func (p *MockProjective) FromAffine(a MockAffine) MockProjective {
z := MockBaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
return *p
}
type MockAffine struct {
X, Y MockBaseField
}
@@ -68,18 +53,3 @@ func (a *MockAffine) FromLimbs(x, y []uint32) MockAffine {
return *a
}
func (a MockAffine) ToProjective() MockProjective {
var z MockBaseField
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p MockProjective
return p.Zero()
}
return MockProjective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
}

View File

@@ -40,17 +40,10 @@ func (p *Projective) FromLimbs(x, y, z []uint32) Projective {
}
func (p *Projective) FromAffine(a Affine) Projective {
z := BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bls12_377_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *Projective) ProjectiveToAffine() Affine {
var a Affine
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bls12_377_to_affine(cP, cA)
return a
}
@@ -111,18 +104,12 @@ func (a *Affine) FromLimbs(x, y []uint32) Affine {
}
func (a Affine) ToProjective() Projective {
var z BaseField
var p Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p Projective
return p.Zero()
}
return Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
C.bls12_377_from_affine(cA, cP)
return p
}
func AffineFromProjective(p *Projective) Affine {
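The same refactor recurs for every curve and G2 group in the hunks below: the hand-rolled zero check and field copy on the Go side are dropped, and `FromAffine`/`ToProjective` simply reinterpret the Go structs as `affine_t`/`projective_t` and let the curve's C `from_affine` decide how the point at infinity maps, so Go and the C/CUDA backend share one code path. A self-contained cgo sketch of that pattern (a toy C function, not the real ICICLE bindings):

```go
package main

/*
typedef struct { unsigned long long x, y; }    affine_t;
typedef struct { unsigned long long x, y, z; } projective_t;

// Toy stand-in for <curve>_from_affine: the C side owns the identity-point
// rule, so the Go wrapper stays branch-free.
static void demo_from_affine(const affine_t* a, projective_t* p) {
	if (a->x == 0 && a->y == 0) { p->x = 0; p->y = 1; p->z = 0; return; }
	p->x = a->x; p->y = a->y; p->z = 1;
}
*/
import "C"

import (
	"fmt"
	"unsafe"
)

type Affine struct{ X, Y uint64 }
type Projective struct{ X, Y, Z uint64 }

// ToProjective forwards raw struct memory to C, just like the wrappers above
// do with (*C.affine_t)(unsafe.Pointer(&a)).
func (a Affine) ToProjective() Projective {
	var p Projective
	cA := (*C.affine_t)(unsafe.Pointer(&a))
	cP := (*C.projective_t)(unsafe.Pointer(&p))
	C.demo_from_affine(cA, cP)
	return p
}

func main() {
	fmt.Println(Affine{}.ToProjective())           // identity resolved in C: {0 1 0}
	fmt.Println(Affine{X: 3, Y: 5}.ToProjective()) // {3 5 1}
}
```

Note also the one-character fix in `ProjectiveToAffine` in the same hunk: the receiver `p` already has pointer type, so the old `unsafe.Pointer(&p)` handed C a pointer to the Go pointer rather than to the point itself; the new `unsafe.Pointer(p)` passes the point's memory directly.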


@@ -40,17 +40,10 @@ func (p *G2Projective) FromLimbs(x, y, z []uint32) G2Projective {
}
func (p *G2Projective) FromAffine(a G2Affine) G2Projective {
z := G2BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bls12_377_g2_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *G2Projective) ProjectiveToAffine() G2Affine {
var a G2Affine
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bls12_377_g2_to_affine(cP, cA)
return a
}
@@ -111,18 +104,12 @@ func (a *G2Affine) FromLimbs(x, y []uint32) G2Affine {
}
func (a G2Affine) ToProjective() G2Projective {
var z G2BaseField
var p G2Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p G2Projective
return p.Zero()
}
return G2Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
C.bls12_377_g2_from_affine(cA, cP)
return p
}
func G2AffineFromProjective(p *G2Projective) G2Affine {


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bls12_377_g2_eq(g2_projective_t* point1, g2_projective_t* point2);
void bls12_377_g2_to_affine(g2_projective_t* point, g2_affine_t* point_out);
void bls12_377_g2_from_affine(g2_affine_t* point, g2_projective_t* point_out);
void bls12_377_g2_generate_projective_points(g2_projective_t* points, int size);
void bls12_377_g2_generate_affine_points(g2_affine_t* points, int size);
cudaError_t bls12_377_g2_affine_convert_montgomery(g2_affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bls12_377_eq(projective_t* point1, projective_t* point2);
void bls12_377_to_affine(projective_t* point, affine_t* point_out);
void bls12_377_from_affine(affine_t* point, projective_t* point_out);
void bls12_377_generate_projective_points(projective_t* points, int size);
void bls12_377_generate_affine_points(affine_t* points, int size);
cudaError_t bls12_377_affine_convert_montgomery(affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -40,17 +40,10 @@ func (p *Projective) FromLimbs(x, y, z []uint32) Projective {
}
func (p *Projective) FromAffine(a Affine) Projective {
z := BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bls12_381_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *Projective) ProjectiveToAffine() Affine {
var a Affine
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bls12_381_to_affine(cP, cA)
return a
}
@@ -111,18 +104,12 @@ func (a *Affine) FromLimbs(x, y []uint32) Affine {
}
func (a Affine) ToProjective() Projective {
var z BaseField
var p Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p Projective
return p.Zero()
}
return Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
C.bls12_381_from_affine(cA, cP)
return p
}
func AffineFromProjective(p *Projective) Affine {


@@ -40,17 +40,10 @@ func (p *G2Projective) FromLimbs(x, y, z []uint32) G2Projective {
}
func (p *G2Projective) FromAffine(a G2Affine) G2Projective {
z := G2BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bls12_381_g2_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *G2Projective) ProjectiveToAffine() G2Affine {
var a G2Affine
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bls12_381_g2_to_affine(cP, cA)
return a
}
@@ -111,18 +104,13 @@ func (a *G2Affine) FromLimbs(x, y []uint32) G2Affine {
}
func (a G2Affine) ToProjective() G2Projective {
var z G2BaseField
var p G2Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p G2Projective
return p.Zero()
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
C.bls12_381_g2_from_affine(cA, cP)
return p
return G2Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
}
func G2AffineFromProjective(p *G2Projective) G2Affine {


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bls12_381_g2_eq(g2_projective_t* point1, g2_projective_t* point2);
void bls12_381_g2_to_affine(g2_projective_t* point, g2_affine_t* point_out);
void bls12_381_g2_from_affine(g2_affine_t* point, g2_projective_t* point_out);
void bls12_381_g2_generate_projective_points(g2_projective_t* points, int size);
void bls12_381_g2_generate_affine_points(g2_affine_t* points, int size);
cudaError_t bls12_381_g2_affine_convert_montgomery(g2_affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bls12_381_eq(projective_t* point1, projective_t* point2);
void bls12_381_to_affine(projective_t* point, affine_t* point_out);
void bls12_381_from_affine(affine_t* point, projective_t* point_out);
void bls12_381_generate_projective_points(projective_t* points, int size);
void bls12_381_generate_affine_points(affine_t* points, int size);
cudaError_t bls12_381_affine_convert_montgomery(affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -40,17 +40,10 @@ func (p *Projective) FromLimbs(x, y, z []uint32) Projective {
}
func (p *Projective) FromAffine(a Affine) Projective {
z := BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bn254_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *Projective) ProjectiveToAffine() Affine {
var a Affine
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bn254_to_affine(cP, cA)
return a
}
@@ -111,18 +104,13 @@ func (a *Affine) FromLimbs(x, y []uint32) Affine {
}
func (a Affine) ToProjective() Projective {
var z BaseField
var p Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p Projective
return p.Zero()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
C.bn254_from_affine(cA, cP)
return p
return Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
}
func AffineFromProjective(p *Projective) Affine {


@@ -40,17 +40,10 @@ func (p *G2Projective) FromLimbs(x, y, z []uint32) G2Projective {
}
func (p *G2Projective) FromAffine(a G2Affine) G2Projective {
z := G2BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bn254_g2_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *G2Projective) ProjectiveToAffine() G2Affine {
var a G2Affine
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bn254_g2_to_affine(cP, cA)
return a
}
@@ -111,18 +104,12 @@ func (a *G2Affine) FromLimbs(x, y []uint32) G2Affine {
}
func (a G2Affine) ToProjective() G2Projective {
var z G2BaseField
var p G2Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p G2Projective
return p.Zero()
}
return G2Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
C.bn254_g2_from_affine(cA, cP)
return p
}
func G2AffineFromProjective(p *G2Projective) G2Affine {


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bn254_g2_eq(g2_projective_t* point1, g2_projective_t* point2);
void bn254_g2_to_affine(g2_projective_t* point, g2_affine_t* point_out);
void bn254_g2_from_affine(g2_affine_t* point, g2_projective_t* point_out);
void bn254_g2_generate_projective_points(g2_projective_t* points, int size);
void bn254_g2_generate_affine_points(g2_affine_t* points, int size);
cudaError_t bn254_g2_affine_convert_montgomery(g2_affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bn254_eq(projective_t* point1, projective_t* point2);
void bn254_to_affine(projective_t* point, affine_t* point_out);
void bn254_from_affine(affine_t* point, projective_t* point_out);
void bn254_generate_projective_points(projective_t* points, int size);
void bn254_generate_affine_points(affine_t* points, int size);
cudaError_t bn254_affine_convert_montgomery(affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -40,17 +40,10 @@ func (p *Projective) FromLimbs(x, y, z []uint32) Projective {
}
func (p *Projective) FromAffine(a Affine) Projective {
z := BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bw6_761_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *Projective) ProjectiveToAffine() Affine {
var a Affine
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.bw6_761_to_affine(cP, cA)
return a
}
@@ -111,18 +104,13 @@ func (a *Affine) FromLimbs(x, y []uint32) Affine {
}
func (a Affine) ToProjective() Projective {
var z BaseField
var p Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p Projective
return p.Zero()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
C.bw6_761_from_affine(cA, cP)
return p
return Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
}
func AffineFromProjective(p *Projective) Affine {


@@ -40,17 +40,10 @@ func (p *G2Projective) FromLimbs(x, y, z []uint32) G2Projective {
}
func (p *G2Projective) FromAffine(a G2Affine) G2Projective {
z := G2BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bw6_761_g2_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *G2Projective) ProjectiveToAffine() G2Affine {
var a G2Affine
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
cP := (*C.g2_projective_t)(unsafe.Pointer(p))
C.bw6_761_g2_to_affine(cP, cA)
return a
}
@@ -111,18 +104,12 @@ func (a *G2Affine) FromLimbs(x, y []uint32) G2Affine {
}
func (a G2Affine) ToProjective() G2Projective {
var z G2BaseField
var p G2Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p G2Projective
return p.Zero()
}
return G2Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
cA := (*C.g2_affine_t)(unsafe.Pointer(&a))
cP := (*C.g2_projective_t)(unsafe.Pointer(&p))
C.bw6_761_g2_from_affine(cA, cP)
return p
}
func G2AffineFromProjective(p *G2Projective) G2Affine {


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bw6_761_g2_eq(g2_projective_t* point1, g2_projective_t* point2);
void bw6_761_g2_to_affine(g2_projective_t* point, g2_affine_t* point_out);
void bw6_761_g2_from_affine(g2_affine_t* point, g2_projective_t* point_out);
void bw6_761_g2_generate_projective_points(g2_projective_t* points, int size);
void bw6_761_g2_generate_affine_points(g2_affine_t* points, int size);
cudaError_t bw6_761_g2_affine_convert_montgomery(g2_affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool bw6_761_eq(projective_t* point1, projective_t* point2);
void bw6_761_to_affine(projective_t* point, affine_t* point_out);
void bw6_761_from_affine(affine_t* point, projective_t* point_out);
void bw6_761_generate_projective_points(projective_t* points, int size);
void bw6_761_generate_affine_points(affine_t* points, int size);
cudaError_t bw6_761_affine_convert_montgomery(affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -40,17 +40,10 @@ func (p *Projective) FromLimbs(x, y, z []uint32) Projective {
}
func (p *Projective) FromAffine(a Affine) Projective {
z := BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
} else {
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.grumpkin_from_affine(cA, cP)
return *p
}
@@ -65,7 +58,7 @@ func (p *Projective) ProjectiveToAffine() Affine {
var a Affine
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
cP := (*C.projective_t)(unsafe.Pointer(p))
C.grumpkin_to_affine(cP, cA)
return a
}
@@ -111,18 +104,13 @@ func (a *Affine) FromLimbs(x, y []uint32) Affine {
}
func (a Affine) ToProjective() Projective {
var z BaseField
var p Projective
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p Projective
return p.Zero()
}
cA := (*C.affine_t)(unsafe.Pointer(&a))
cP := (*C.projective_t)(unsafe.Pointer(&p))
C.grumpkin_from_affine(cA, cP)
return p
return Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
}
func AffineFromProjective(p *Projective) Affine {


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool grumpkin_eq(projective_t* point1, projective_t* point2);
void grumpkin_to_affine(projective_t* point, affine_t* point_out);
void grumpkin_from_affine(affine_t* point, projective_t* point_out);
void grumpkin_generate_projective_points(projective_t* points, int size);
void grumpkin_generate_affine_points(affine_t* points, int size);
cudaError_t grumpkin_affine_convert_montgomery(affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -39,21 +39,17 @@ func (p *{{.CurvePrefix}}Projective) FromLimbs(x, y, z []uint32) {{.CurvePrefix}
return *p
}
{{if ne .CurvePrefix "Mock"}}
func (p *{{.CurvePrefix}}Projective) FromAffine(a {{.CurvePrefix}}Affine) {{.CurvePrefix}}Projective {
z := {{.CurvePrefix}}BaseField{}
z.One()
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
p.Zero()
}else{
p.X = a.X
p.Y = a.Y
p.Z = z.One()
}
cA := (*C.{{toCName .CurvePrefix}}affine_t)(unsafe.Pointer(&a))
cP := (*C.{{toCName .CurvePrefix}}projective_t)(unsafe.Pointer(p))
C.{{.Curve}}{{toCNameBackwards .CurvePrefix}}_from_affine(cA, cP)
return *p
}
{{if ne .CurvePrefix "Mock"}}
func (p {{.CurvePrefix}}Projective) ProjectiveEq(p2 *{{.CurvePrefix}}Projective) bool {
cP := (*C.{{toCName .CurvePrefix}}projective_t)(unsafe.Pointer(&p))
cP2 := (*C.{{toCName .CurvePrefix}}projective_t)(unsafe.Pointer(&p2))
@@ -65,7 +61,7 @@ func (p *{{.CurvePrefix}}Projective) ProjectiveToAffine() {{.CurvePrefix}}Affine
var a {{.CurvePrefix}}Affine
cA := (*C.{{toCName .CurvePrefix}}affine_t)(unsafe.Pointer(&a))
cP := (*C.{{toCName .CurvePrefix}}projective_t)(unsafe.Pointer(&p))
cP := (*C.{{toCName .CurvePrefix}}projective_t)(unsafe.Pointer(p))
C.{{.Curve}}{{toCNameBackwards .CurvePrefix}}_to_affine(cP, cA)
return a
}
@@ -110,21 +106,17 @@ func (a *{{.CurvePrefix}}Affine) FromLimbs(x, y []uint32) {{.CurvePrefix}}Affine
return *a
}
func (a {{.CurvePrefix}}Affine) ToProjective() {{.CurvePrefix}}Projective {
var z {{.CurvePrefix}}BaseField
if (a.X == z.Zero()) && (a.Y == z.Zero()) {
var p {{.CurvePrefix}}Projective
return p.Zero()
}
return {{.CurvePrefix}}Projective{
X: a.X,
Y: a.Y,
Z: z.One(),
}
}
{{if ne .CurvePrefix "Mock"}}
func (a {{.CurvePrefix}}Affine) ToProjective() {{.CurvePrefix}}Projective {
var p {{.CurvePrefix}}Projective
cA := (*C.{{toCName .CurvePrefix}}affine_t)(unsafe.Pointer(&a))
cP := (*C.{{toCName .CurvePrefix}}projective_t)(unsafe.Pointer(&p))
C.{{.Curve}}{{toCNameBackwards .CurvePrefix}}_from_affine(cA, cP)
return p
}
func {{.CurvePrefix}}AffineFromProjective(p *{{.CurvePrefix}}Projective) {{.CurvePrefix}}Affine {
return p.ProjectiveToAffine()
}
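The per-curve wrapper files above are not edited by hand; they are rendered from this text/template, with `{{.CurvePrefix}}` empty for G1 and `G2` for the G2 variants, and helpers such as `toCName`/`toCNameBackwards` producing the `g2_`/`_g2` pieces of the C identifiers. A minimal sketch of that expansion, where the helper implementations are guesses chosen only to reproduce the identifiers visible in the generated files (the real generator may differ):

```go
// Codegen sketch under assumptions: toCName and toCNameBackwards below are
// hypothetical, picked to yield e.g. bls12_377_from_affine(affine_t*, ...)
// for G1 and bls12_377_g2_from_affine(g2_affine_t*, ...) for G2.
package main

import (
	"os"
	"strings"
	"text/template"
)

var funcs = template.FuncMap{
	// "G2" -> "g2_", "" -> ""   (prefix for C type names)
	"toCName": func(p string) string {
		if p == "" {
			return ""
		}
		return strings.ToLower(p) + "_"
	},
	// "G2" -> "_g2", "" -> ""   (infix for C function names)
	"toCNameBackwards": func(p string) string {
		if p == "" {
			return ""
		}
		return "_" + strings.ToLower(p)
	},
}

const decl = `void {{.Curve}}{{toCNameBackwards .CurvePrefix}}_from_affine({{toCName .CurvePrefix}}affine_t* point, {{toCName .CurvePrefix}}projective_t* point_out);
`

func main() {
	t := template.Must(template.New("decl").Funcs(funcs).Parse(decl))
	for _, d := range []struct{ Curve, CurvePrefix string }{
		{"bls12_377", ""},
		{"bls12_377", "G2"},
	} {
		_ = t.Execute(os.Stdout, d) // prints the G1 and G2 declarations
	}
}
```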


@@ -14,6 +14,7 @@ typedef struct DeviceContext DeviceContext;
bool {{.Curve}}{{toCNameBackwards .CurvePrefix}}_eq({{toCName .CurvePrefix}}projective_t* point1, {{toCName .CurvePrefix}}projective_t* point2);
void {{.Curve}}{{toCNameBackwards .CurvePrefix}}_to_affine({{toCName .CurvePrefix}}projective_t* point, {{toCName .CurvePrefix}}affine_t* point_out);
void {{.Curve}}{{toCNameBackwards .CurvePrefix}}_from_affine({{toCName .CurvePrefix}}affine_t* point, {{toCName .CurvePrefix}}projective_t* point_out);
void {{.Curve}}{{toCNameBackwards .CurvePrefix}}_generate_projective_points({{toCName .CurvePrefix}}projective_t* points, int size);
void {{.Curve}}{{toCNameBackwards .CurvePrefix}}_generate_affine_points({{toCName .CurvePrefix}}affine_t* points, int size);
cudaError_t {{.Curve}}{{toCNameBackwards .CurvePrefix}}_affine_convert_montgomery({{toCName .CurvePrefix}}affine_t* points, size_t n, bool is_into, DeviceContext* ctx);


@@ -18,7 +18,7 @@ exclude = [
]
[workspace.package]
version = "2.7.1"
version = "2.8.0"
edition = "2021"
authors = [ "Ingonyama" ]
homepage = "https://www.ingonyama.com"


@@ -22,6 +22,8 @@ pub trait Curve: Debug + PartialEq + Copy + Clone {
#[doc(hidden)]
fn to_affine(point: *const Projective<Self>, point_aff: *mut Affine<Self>);
#[doc(hidden)]
fn from_affine(point: *const Affine<Self>, point_proj: *mut Projective<Self>);
#[doc(hidden)]
fn generate_random_projective_points(size: usize) -> Vec<Projective<Self>>;
#[doc(hidden)]
fn generate_random_affine_points(size: usize) -> Vec<Affine<Self>>;
@@ -79,27 +81,17 @@ impl<C: Curve> Affine<C> {
}
pub fn to_projective(&self) -> Projective<C> {
if *self == Self::zero() {
return Projective::<C>::zero();
}
Projective {
x: self.x,
y: self.y,
z: C::BaseField::one(),
}
let mut proj = Projective::<C>::zero();
C::from_affine(self as *const Self, &mut proj as *mut Projective<C>);
proj
}
}
impl<C: Curve> From<Affine<C>> for Projective<C> {
fn from(item: Affine<C>) -> Self {
if item == (Affine::<C>::zero()) {
return Self::zero();
}
Self {
x: item.x,
y: item.y,
z: C::BaseField::one(),
}
let mut proj = Self::zero();
C::from_affine(&item as *const Affine<C>, &mut proj as *mut Self);
proj
}
}
@@ -282,6 +274,8 @@ macro_rules! impl_curve {
pub(crate) fn eq(point1: *const $projective_type, point2: *const $projective_type) -> bool;
#[link_name = concat!($curve_prefix, "_to_affine")]
pub(crate) fn proj_to_affine(point: *const $projective_type, point_out: *mut $affine_type);
#[link_name = concat!($curve_prefix, "_from_affine")]
pub(crate) fn proj_from_affine(point: *const $affine_type, point_out: *mut $projective_type);
#[link_name = concat!($curve_prefix, "_generate_projective_points")]
pub(crate) fn generate_projective_points(points: *mut $projective_type, size: usize);
#[link_name = concat!($curve_prefix, "_generate_affine_points")]
@@ -315,6 +309,10 @@ macro_rules! impl_curve {
unsafe { $curve_prefix_ident::proj_to_affine(point, point_out) };
}
fn from_affine(point: *const $affine_type, point_out: *mut $projective_type) {
unsafe { $curve_prefix_ident::proj_from_affine(point, point_out) };
}
fn generate_random_projective_points(size: usize) -> Vec<$projective_type> {
let mut res = vec![$projective_type::zero(); size];
unsafe {


@@ -24,6 +24,6 @@ pub(crate) mod tests {
let ctx = device_context::DeviceContext::default();
let sponge = Poseidon::load(2, &ctx).unwrap();
check_build_field_merkle_tree::<_, _, Bls12_377TreeBuilder>(25, 2, &sponge, &sponge, ScalarField::zero());
check_build_field_merkle_tree::<_, _, Bls12_377TreeBuilder>(18, 2, &sponge, &sponge, ScalarField::zero());
}
}


@@ -24,6 +24,6 @@ pub(crate) mod tests {
let ctx = device_context::DeviceContext::default();
let sponge = Poseidon::load(2, &ctx).unwrap();
check_build_field_merkle_tree::<_, _, Bls12_381TreeBuilder>(25, 2, &sponge, &sponge, ScalarField::zero());
check_build_field_merkle_tree::<_, _, Bls12_381TreeBuilder>(18, 2, &sponge, &sponge, ScalarField::zero());
}
}


@@ -25,6 +25,22 @@ extern "C" {
config: &HashConfig,
) -> CudaError;
pub(crate) fn sha3_256_cuda(
input: *const u8,
input_block_size: u32,
number_of_blocks: u32,
output: *mut u8,
config: &HashConfig,
) -> CudaError;
pub(crate) fn sha3_512_cuda(
input: *const u8,
input_block_size: u32,
number_of_blocks: u32,
output: *mut u8,
config: &HashConfig,
) -> CudaError;
pub(crate) fn build_keccak256_merkle_tree_cuda(
leaves: *const u8,
digests: *mut u64,
@@ -40,6 +56,22 @@ extern "C" {
input_block_len: u32,
config: &TreeBuilderConfig,
) -> CudaError;
pub(crate) fn build_sha3_256_merkle_tree_cuda(
leaves: *const u8,
digests: *mut u64,
height: u32,
input_block_len: u32,
config: &TreeBuilderConfig,
) -> CudaError;
pub(crate) fn build_sha3_512_merkle_tree_cuda(
leaves: *const u8,
digests: *mut u64,
height: u32,
input_block_len: u32,
config: &TreeBuilderConfig,
) -> CudaError;
}
pub fn keccak256(
@@ -86,6 +118,50 @@ pub fn keccak512(
}
}
pub fn sha3_256(
input: &(impl HostOrDeviceSlice<u8> + ?Sized),
input_block_size: u32,
number_of_blocks: u32,
output: &mut (impl HostOrDeviceSlice<u8> + ?Sized),
config: &HashConfig,
) -> IcicleResult<()> {
let mut local_cfg = config.clone();
local_cfg.are_inputs_on_device = input.is_on_device();
local_cfg.are_outputs_on_device = output.is_on_device();
unsafe {
sha3_256_cuda(
input.as_ptr(),
input_block_size,
number_of_blocks,
output.as_mut_ptr(),
&local_cfg,
)
.wrap()
}
}
pub fn sha3_512(
input: &(impl HostOrDeviceSlice<u8> + ?Sized),
input_block_size: u32,
number_of_blocks: u32,
output: &mut (impl HostOrDeviceSlice<u8> + ?Sized),
config: &HashConfig,
) -> IcicleResult<()> {
let mut local_cfg = config.clone();
local_cfg.are_inputs_on_device = input.is_on_device();
local_cfg.are_outputs_on_device = output.is_on_device();
unsafe {
sha3_512_cuda(
input.as_ptr(),
input_block_size,
number_of_blocks,
output.as_mut_ptr(),
&local_cfg,
)
.wrap()
}
}
pub fn build_keccak256_merkle_tree(
leaves: &(impl HostOrDeviceSlice<u8> + ?Sized),
digests: &mut (impl HostOrDeviceSlice<u64> + ?Sized),
@@ -123,3 +199,41 @@ pub fn build_keccak512_merkle_tree(
.wrap()
}
}
pub fn build_sha3_256_merkle_tree(
leaves: &(impl HostOrDeviceSlice<u8> + ?Sized),
digests: &mut (impl HostOrDeviceSlice<u64> + ?Sized),
height: usize,
input_block_len: usize,
config: &TreeBuilderConfig,
) -> IcicleResult<()> {
unsafe {
build_sha3_256_merkle_tree_cuda(
leaves.as_ptr(),
digests.as_mut_ptr(),
height as u32,
input_block_len as u32,
config,
)
.wrap()
}
}
pub fn build_sha3_512_merkle_tree(
leaves: &(impl HostOrDeviceSlice<u8> + ?Sized),
digests: &mut (impl HostOrDeviceSlice<u64> + ?Sized),
height: usize,
input_block_len: usize,
config: &TreeBuilderConfig,
) -> IcicleResult<()> {
unsafe {
build_sha3_512_merkle_tree_cuda(
leaves.as_ptr(),
digests.as_mut_ptr(),
height as u32,
input_block_len as u32,
config,
)
.wrap()
}
}
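The new `sha3_256`/`sha3_512` entry points and merkle-tree builders mirror the existing keccak256/keccak512 wrappers one-to-one, presumably differing only in the padding constant threaded through the kernel earlier in this changeset. When wiring these up it can be handy to have a host-side reference to compare GPU digests against; a small Go check using golang.org/x/crypto/sha3 (not part of this changeset) that produces both variants:

```go
// Host-side reference digests: x/crypto/sha3 exposes both NIST SHA3-256
// (pad byte 0x06) and the legacy 0x01-padded Keccak-256.
package main

import (
	"fmt"

	"golang.org/x/crypto/sha3"
)

func main() {
	msg := make([]byte, 136) // one full rate-136 block of 0x01 bytes
	for i := range msg {
		msg[i] = 1
	}

	sha := sha3.New256()                // SHA3-256: domain byte 0x06
	keccak := sha3.NewLegacyKeccak256() // Keccak-256: domain byte 0x01
	sha.Write(msg)
	keccak.Write(msg)

	fmt.Printf("sha3-256:   %x\n", sha.Sum(nil))    // 32-byte digest
	fmt.Printf("keccak-256: %x\n", keccak.Sum(nil)) // 32-byte digest, different value
}
```

Both 256-bit variants emit 32-byte digests, which is presumably why the test buffer in the hunk below shrinks from 64 to 32 bytes per hash.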


@@ -15,7 +15,7 @@ pub(crate) mod tests {
let number_of_hashes = 1024;
let preimages = vec![1u8; number_of_hashes * input_block_len];
let mut digests = vec![0u8; number_of_hashes * 64];
let mut digests = vec![0u8; number_of_hashes * 32];
let preimages_slice = HostSlice::from_slice(&preimages);
let digests_slice = HostSlice::from_mut_slice(&mut digests);