Mirror of https://github.com/pseXperiments/icicle.git (synced 2026-01-08 23:17:54 -05:00)

[DOCS]: Tidy up docs (#502)

## Describe the changes

This PR tidies up docs and updates golang build instructions.
@@ -2,34 +2,54 @@
ICICLE Core is a library written in C++/CUDA. All the ICICLE primitives are implemented within ICICLE Core.

The Core is split into logical modules that can be compiled into static libraries using different [strategies](#compilation-strategies). You can then [link](#linking) these libraries with your C++ project or write your own [bindings](#writing-new-bindings-for-icicle) for other programming languages. If you want to use ICICLE with existing bindings please refer to the [Rust](/icicle/rust-bindings) or [Golang](/icicle/golang-bindings) bindings documentation.

## Supported curves, fields and operations

### Supported curves and operations

| Operation\Curve | [bn254](https://neuromancer.sk/std/bn/bn254) | [bls12-377](https://neuromancer.sk/std/bls/BLS12-377) | [bls12-381](https://neuromancer.sk/std/bls/BLS12-381) | [bw6-761](https://eprint.iacr.org/2020/351) | grumpkin |
| --- | :---: | :---: | :---: | :---: | :---: |
| [MSM][MSM_DOCS] | ✅ | ✅ | ✅ | ✅ | ✅ |
| G2 | ✅ | ✅ | ✅ | ✅ | ❌ |
| [NTT][NTT_DOCS] | ✅ | ✅ | ✅ | ✅ | ❌ |
| ECNTT | ✅ | ✅ | ✅ | ✅ | ❌ |
| [VecOps][VECOPS_CODE] | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Polynomials][POLY_DOCS] | ✅ | ✅ | ✅ | ✅ | ❌ |
| [Poseidon](primitives/poseidon) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Merkle Tree](primitives/poseidon#the-tree-builder) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Supported fields and operations

| Operation\Field | [babybear](https://eprint.iacr.org/2023/824.pdf) | [Stark252](https://docs.starknet.io/documentation/architecture_and_concepts/Cryptography/p-value/) |
| --- | :---: | :---: |
| [VecOps][VECOPS_CODE] | ✅ | ✅ |
| [Polynomials][POLY_DOCS] | ✅ | ✅ |
| [NTT][NTT_DOCS] | ✅ | ✅ |
| Extension Field | ✅ | ❌ |

### Supported hashes

| Hash | Sizes |
| --- | :---: |
| Keccak | 256, 512 |

## Compilation strategies

Most of the codebase is curve/field agnostic, which means it can be compiled for different curves and fields. When you build ICICLE Core you choose a single curve or field. If you need multiple curves or fields, you compile ICICLE once per curve or field that is needed. It's that simple. Currently, the following choices are supported:

- [Field mode][COMPILE_FIELD_MODE] - used for STARK fields like BabyBear / Mersenne / Goldilocks. Includes field arithmetic, NTT, Poseidon, Extension fields and other primitives.
- [Curve mode][COMPILE_CURVE_MODE] - used for SNARK curves like BN254 / BLS curves / Grumpkin / etc. Curve mode is built upon field mode, so it includes everything that field mode does. It also includes curve operations / MSM / ECNTT / G2 and other curve-related primitives.

:::info

If you only want to use a curve's scalar or base field, you still need to use curve mode. You can disable MSM with [options](#compilation-options).

:::

### Compiling for a field

ICICLE supports the following STARK fields:

- [BabyBear](https://eprint.iacr.org/2023/824.pdf)

Field mode includes:

- [Field arithmetic](https://github.com/ingonyama-zk/icicle/blob/main/icicle/include/fields/field.cuh) - field multiplication, addition, subtraction
- [NTT](icicle/primitives/ntt) - FFT / iFFT
- [Poseidon Hash](icicle/primitives/poseidon)
- [Vector operations](https://github.com/ingonyama-zk/icicle/blob/main/icicle/include/vec_ops/vec_ops.cuh)
- [Polynomial](#) - structs and methods to work with polynomials

You can compile ICICLE for a field using this command:

```sh
cd icicle
@@ -38,24 +58,10 @@ cmake -DFIELD=<FIELD> -S . -B build
cmake --build build -j
```

ICICLE supports the following values for `<FIELD>`:

- `babybear`

This command will output `libingo_field_<FIELD>.a` into `build/lib`.

### Compiling for a curve

ICICLE supports the following SNARK curves:

- [BN254](https://neuromancer.sk/std/bn/bn254)
- [BLS12-377](https://neuromancer.sk/std/bls/BLS12-377)
- [BLS12-381](https://neuromancer.sk/std/bls/BLS12-381)
- [BW6-761](https://eprint.iacr.org/2020/351)
- Grumpkin

Curve mode includes everything you can find in field mode, in addition to:

- [MSM](icicle/primitives/msm) - MSM / Batched MSM
- [ECNTT](#)

:::note

Field related primitives will be compiled for the scalar field of the curve.
@@ -81,7 +87,7 @@ There exist multiple options that allow you to customize your build or enable ad
#### EXT_FIELD

Used only in [field mode][COMPILE_FIELD_MODE] to add an Extension field. Adds all supported field operations for the extension field.

Default: `OFF`
@@ -89,7 +95,7 @@ Usage: `-DEXT_FIELD=ON`
#### G2

Used only in [curve mode][COMPILE_CURVE_MODE] to add G2 definitions. Also adds G2 MSM.

Default: `OFF`
@@ -97,7 +103,7 @@ Usage: `-DG2=ON`
#### ECNTT

Used only in [curve mode][COMPILE_CURVE_MODE] to add the ECNTT function.

Default: `OFF`
@@ -105,7 +111,7 @@ Usage: `-DECNTT=ON`
#### MSM

Used only in [curve mode][COMPILE_CURVE_MODE] to add the MSM function. As MSM takes a lot of time to build, you can disable it with this option to reduce compilation time.

Default: `ON`
@@ -149,14 +155,13 @@ To link ICICLE with your project you first need to compile ICICLE with options o
Refer to our [c++ examples](https://github.com/ingonyama-zk/icicle/tree/main/examples/c%2B%2B) for more info. Take a look at this [CMakeLists.txt](https://github.com/ingonyama-zk/icicle/blob/main/examples/c%2B%2B/msm/CMakeLists.txt#L22).

## Writing new bindings for ICICLE

Since ICICLE Core is written in CUDA / C++, it is straightforward to generate static libraries. These static libraries can be installed on any system and called by higher level languages such as Golang.

Static libraries can be loaded into memory once and used by multiple programs, reducing memory usage and potentially improving performance. They also allow you to separate functionality into distinct modules, so your static library may compile only the specific features that you want to use.

Let's review the [Golang bindings][GOLANG_BINDINGS], since they are a fairly verbose example of using static libraries (compared to Rust, which hides it well). Golang provides `cgo`, which can be used to link static libraries. Here's a basic example of how you can use cgo to link these libraries:

```go
/*
@@ -178,4 +183,14 @@ func main() {
The comments on the first line tell `CGO` which libraries to import as well as which header files to include. You can then call methods which are part of the static library and defined in the header file; `C.projective_from_affine_bn254` is an example.
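
As a rough sketch of this pattern (the library name, header file, and link flags below are illustrative assumptions, not the exact ones used by ICICLE):

```go
package main

/*
// The cgo preamble: -L points at the directory containing the compiled ICICLE
// static libraries and -l names the libraries to link (names are illustrative).
#cgo LDFLAGS: -L/path/to/icicle/build/lib -lingo_curve_bn254 -lstdc++ -lm
// The header declares the C functions exported by the static library.
#include "projective.h"
*/
import "C"

func main() {
	// Functions declared in the header are now callable through the C pseudo-package,
	// e.g. C.projective_from_affine_bn254(...) as mentioned in the text above.
}
```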
If you wish to create your own bindings for a language of your choice we suggest you start by investigating how you can call static libraries.

<!-- Begin Links -->
[GOLANG_BINDINGS]: golang-bindings.md
[COMPILE_CURVE_MODE]: #compiling-for-a-curve
[COMPILE_FIELD_MODE]: #compiling-for-a-field
[NTT_DOCS]: primitives/ntt
[MSM_DOCS]: primitives/msm
[POLY_DOCS]: polynomials/overview
[VECOPS_CODE]: https://github.com/ingonyama-zk/icicle/blob/main/icicle/include/vec_ops/vec_ops.cuh
<!-- End Links -->
@@ -1,7 +1,7 @@
# Golang bindings

Golang bindings allow you to use ICICLE as a golang library.
The source code for all Golang packages can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang).

The Golang bindings are comprised of multiple packages.
@@ -9,7 +9,7 @@ The Golang bindings are comprised of multiple packages.
[`cuda-runtime`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/cuda_runtime) which defines abstractions for CUDA methods for allocating memory, initializing and managing streams, and `DeviceContext` which enables users to define and keep track of devices.

Each supported curve, field, and hash has its own package which you can find in the respective directories [here](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang). If your project uses BN254 you only need to import that single package named [`bn254`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/curves/bn254).
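
For example, a minimal program that imports only the BN254 package (the import path follows the `v2` module layout used in the examples later in these docs):

```go
package main

import (
	// Only the curve package your project actually needs has to be imported.
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"
)

func main() {
	// Generate a few scalars just to confirm the package is wired up.
	scalars := bn254.GenerateScalars(4)
	_ = scalars
}
```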

## Using ICICLE Golang bindings in your project
@@ -31,22 +31,30 @@ For a specific commit
go get github.com/ingonyama-zk/icicle@<commit_id>
```

To build the shared libraries you can run [this](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/build.sh) script:

```sh
./build.sh [-curve=<curve>] [-field=<field>] [-hash=<hash>] [-cuda_version=<version>] [-g2] [-ecntt] [-devmode]

curve - The name of the curve to build or "all" to build all supported curves
field - The name of the field to build or "all" to build all supported fields
hash - The name of the hash to build or "all" to build all supported hashes
-g2 - Optional - build with G2 enabled
-ecntt - Optional - build with ECNTT enabled
-devmode - Optional - build in devmode
-help - Optional - Displays usage information
```

:::note

If more than one curve or more than one field or more than one hash is supplied, the last one supplied will be built.

:::

To build ICICLE libraries for all supported curves with G2 and ECNTT enabled:

```bash
./build.sh -curve=all -g2 -ecntt
```

If you wish to build for a specific curve, for example bn254, without G2 or ECNTT enabled:
@@ -73,11 +81,9 @@ import (
To run all tests, for all curves:

```bash
go test ./... -count=1
```

If you wish to run tests for a specific curve:

```bash
@@ -106,3 +112,25 @@ func main() {
```

Replace `/path/to/shared/libs` with the actual path where the shared libraries are located on your system.

## Supported curves, fields and operations

### Supported curves and operations

| Operation\Curve | bn254 | bls12_377 | bls12_381 | bw6-761 | grumpkin |
| --- | :---: | :---: | :---: | :---: | :---: |
| MSM | ✅ | ✅ | ✅ | ✅ | ✅ |
| G2 | ✅ | ✅ | ✅ | ✅ | ❌ |
| NTT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ECNTT | ✅ | ✅ | ✅ | ✅ | ❌ |
| VecOps | ✅ | ✅ | ✅ | ✅ | ✅ |
| Polynomials | ✅ | ✅ | ✅ | ✅ | ❌ |

### Supported fields and operations

| Operation\Field | babybear |
| --- | :---: |
| VecOps | ✅ |
| Polynomials | ✅ |
| NTT | ✅ |
| Extension Field | ✅ |
@@ -1,9 +1,5 @@
# ECNTT

## ECNTT Method

The `ECNtt[T any]()` function performs the Elliptic Curve Number Theoretic Transform (EC-NTT) on the input points slice, using the provided `dir` (direction) and `cfg` (configuration), and stores the results in the `results` slice.
@@ -12,14 +8,13 @@ The `ECNtt[T any]()` function performs the Elliptic Curve Number Theoretic Trans
func ECNtt[T any](points core.HostOrDeviceSlice, dir core.NTTDir, cfg *core.NTTConfig[T], results core.HostOrDeviceSlice) core.IcicleError
```

### Parameters

- **`points`**: A slice of elliptic curve points (in projective coordinates) that will be transformed. The slice can be stored on the host or the device, as indicated by the `core.HostOrDeviceSlice` type.
- **`dir`**: The direction of the EC-NTT transform, either `core.KForward` or `core.KInverse`.
- **`cfg`**: A pointer to an `NTTConfig` object, containing configuration options for the NTT operation.
- **`results`**: A slice that will store the transformed elliptic curve points (in projective coordinates). The slice can be stored on the host or the device, as indicated by the `core.HostOrDeviceSlice` type.

### Return Value

- **`core.IcicleError`**: A `core.IcicleError` value, which will be `core.IcicleErrorCode(0)` if the EC-NTT operation was successful, or an error if something went wrong.
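
A rough usage sketch follows; it assumes the NTT domain has already been initialized (see the NTT docs), and the `GenerateProjectivePoints` helper is an assumption modeled on the other generators shown in these docs, not a verbatim API listing:

```go
package main

import (
	"github.com/ingonyama-zk/icicle/v2/wrappers/golang/core"
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"
)

func main() {
	cfg := bn254.GetDefaultNttConfig()
	size := 1 << 10

	// Input points and an equally sized output slice, both on the host here.
	points := bn254.GenerateProjectivePoints(size)
	results := make(core.HostSlice[bn254.Projective], size)

	// Forward EC-NTT over the points.
	err := bn254.ECNtt(points, core.KForward, &cfg, results)
	if err.IcicleErrorCode != core.IcicleErrorCode(0) {
		panic("ECNTT operation failed")
	}
}
```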
@@ -94,4 +89,4 @@ func Main() {
		panic("ECNTT operation failed")
	}
}
```
@@ -2,15 +2,11 @@
To understand the theory behind the MSM pre computation technique refer to Niall Emmart's [talk](https://youtu.be/KAWlySN7Hm8?feature=shared&t=1734).

## Core package

### MSM PrecomputeBases

`PrecomputeBases` and `G2PrecomputeBases` exist for all supported curves.

#### Description
@@ -1,62 +1,57 @@
# MSM

## MSM Example

```go
package main

import (
	"github.com/ingonyama-zk/icicle/v2/wrappers/golang/core"
	cr "github.com/ingonyama-zk/icicle/v2/wrappers/golang/cuda_runtime"
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"
)

func main() {
	// Obtain the default MSM configuration.
	cfg := bn254.GetDefaultMSMConfig()

	// Define the size of the problem, here 2^18.
	size := 1 << 18

	// Generate scalars and points for the MSM operation.
	scalars := bn254.GenerateScalars(size)
	points := bn254.GenerateAffinePoints(size)

	// Create a CUDA stream for asynchronous operations.
	stream, _ := cr.CreateStream()
	var p bn254.Projective

	// Allocate memory on the device for the result of the MSM operation.
	var out core.DeviceSlice
	_, e := out.MallocAsync(p.Size(), p.Size(), stream)

	if e != cr.CudaSuccess {
		panic(e)
	}

	// Set the CUDA stream in the MSM configuration.
	cfg.Ctx.Stream = &stream
	cfg.IsAsync = true

	// Perform the MSM operation.
	e = bn254.Msm(scalars, points, &cfg, out)

	if e != cr.CudaSuccess {
		panic(e)
	}

	// Allocate host memory for the results and copy the results from the device.
	outHost := make(core.HostSlice[bn254.Projective], 1)
	cr.SynchronizeStream(&stream)
	outHost.CopyFromDevice(&out)

	// Free the device memory allocated for the results.
	out.Free()
}
```
@@ -124,7 +119,6 @@ Use `GetDefaultMSMConfig` to obtain a default configuration, which can then be c
func GetDefaultMSMConfig() MSMConfig
```

## How do I toggle between the supported algorithms?

When creating your MSM Config you may state which algorithm you wish to use. `cfg.Ctx.IsBigTriangle = true` will activate Large triangle accumulation and `cfg.Ctx.IsBigTriangle = false` will activate Bucket accumulation.
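
For example, a minimal sketch of toggling the flag on a config obtained from `GetDefaultMSMConfig`:

```go
package main

import (
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"
)

func main() {
	cfg := bn254.GetDefaultMSMConfig()

	// Activate Large triangle accumulation...
	cfg.Ctx.IsBigTriangle = true

	// ...or switch to Bucket accumulation.
	cfg.Ctx.IsBigTriangle = false

	_ = cfg
}
```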
@@ -161,13 +155,11 @@ out.Malloc(batchSize*p.Size(), p.Size())
To activate G2 support first you must make sure you are building the static libraries with the G2 feature enabled, as described in the [Golang building instructions](../golang-bindings.md#using-icicle-golang-bindings-in-your-project).

Now you may import the `g2` package of the specified curve.

```go
import (
	"github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254/g2"
)
```
@@ -177,23 +169,23 @@ This package include `G2Projective` and `G2Affine` points as well as a `G2Msm` m
package main

import (
	"github.com/ingonyama-zk/icicle/v2/wrappers/golang/core"
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"
	g2 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254/g2"
)

func main() {
	cfg := bn254.GetDefaultMSMConfig()
	size := 1 << 12
	batchSize := 3
	totalSize := size * batchSize
	scalars := bn254.GenerateScalars(totalSize)
	points := g2.G2GenerateAffinePoints(totalSize)

	var p g2.G2Projective
	var out core.DeviceSlice
	out.Malloc(batchSize*p.Size(), p.Size())
	g2.G2Msm(scalars, points, &cfg, out)
}
```
@@ -2,8 +2,7 @@
To learn more about the theory of Multi GPU programming refer to [this part](../multi-gpu.md) of the documentation.

Here we will cover the core multi GPU APIs and an [example](#a-multi-gpu-example).

## A Multi GPU example
@@ -13,7 +12,6 @@ In this example we will display how you can
2. For every GPU launch a thread and set an active device per thread.
3. Execute an MSM on each GPU.

```go
package main
@@ -79,13 +77,13 @@ To streamline device management we offer as part of `cuda_runtime` package metho
Runs a given function on a specific GPU device, ensuring that all CUDA calls within the function are executed on the selected device.

In Go, most concurrency can be done via Goroutines. However, there is no guarantee that a goroutine stays on a specific host thread.

`RunOnDevice` was designed to solve this caveat and ensure that the goroutine will stay on a specific host thread.

`RunOnDevice` locks a goroutine into a specific host thread, sets a current GPU device, runs a provided function, and unlocks the goroutine from the host thread after the provided function finishes.

While the goroutine is locked to the host thread, the Go runtime will not assign other goroutines to that host thread.

**Parameters:**
@@ -96,7 +94,10 @@ While the goroutine is locked to the host thread, the Go runtime will not assign
**Behavior:**

- The function `funcToRun` is executed in a new goroutine that is locked to a specific OS thread to ensure that all CUDA calls within the function target the specified device.

:::note
Any goroutines launched within `funcToRun` are not automatically bound to the same GPU device. If necessary, `RunOnDevice` should be called again within such goroutines with the same `deviceId`.
:::

**Example:**
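
As a sketch of the pattern this function enables (one locked goroutine per device; the `cuda_runtime` import alias `cr` follows the other examples in these docs, and the exact `RunOnDevice` signature is assumed from the call shown further below):

```go
package main

import (
	"sync"

	cr "github.com/ingonyama-zk/icicle/v2/wrappers/golang/cuda_runtime"
)

func main() {
	numDevices := 2 // assume two GPUs for this sketch
	wg := sync.WaitGroup{}

	for deviceID := 0; deviceID < numDevices; deviceID++ {
		wg.Add(1)
		// Each call locks a goroutine to a host thread and sets the active device,
		// so every CUDA call inside the closure targets that device.
		cr.RunOnDevice(deviceID, func(args ...any) {
			defer wg.Done()
			// ... perform device work here, e.g. an MSM as in the MSM docs ...
		})
	}
	wg.Wait()
}
```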
@@ -111,6 +112,10 @@ RunOnDevice(0, func(args ...any) {
Sets the active device for the current host thread. All subsequent CUDA calls made from this thread will target the specified device.

:::warning
This function should not be used directly in conjunction with goroutines. If you want to run multi-GPU scenarios with goroutines you should use [RunOnDevice](#runondevice).
:::

**Parameters:**

- **`device int`**: The ID of the device to set as the current device.
@@ -147,4 +152,4 @@ Retrieves the device associated with a given pointer.
- **`int`**: The device ID associated with the memory pointed to by `ptr`.

This documentation should provide a clear understanding of how to effectively manage multiple GPUs in Go applications using CUDA, with a particular emphasis on the `RunOnDevice` function for executing tasks on specific GPUs.
@@ -1,58 +1,54 @@
# NTT

## NTT Example

```go
package main

import (
	"github.com/ingonyama-zk/icicle/v2/wrappers/golang/core"
	cr "github.com/ingonyama-zk/icicle/v2/wrappers/golang/cuda_runtime"
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"

	"github.com/consensys/gnark-crypto/ecc/bn254/fr/fft"
)

func init() {
	cfg := bn254.GetDefaultNttConfig()
	initDomain(18, cfg)
}

func initDomain[T any](largestTestSize int, cfg core.NTTConfig[T]) core.IcicleError {
	rouMont, _ := fft.Generator(uint64(1 << largestTestSize))
	rou := rouMont.Bits()
	rouIcicle := bn254.ScalarField{}

	rouIcicle.FromLimbs(rou[:])
	e := bn254.InitDomain(rouIcicle, cfg.Ctx, false)
	return e
}

func main() {
	// Obtain the default NTT configuration with a predefined coset generator.
	cfg := bn254.GetDefaultNttConfig()

	// Define the size of the input scalars.
	size := 1 << 18

	// Generate scalars for the NTT operation.
	scalars := bn254.GenerateScalars(size)

	// Set the direction of the NTT (forward or inverse).
	dir := core.KForward

	// Allocate memory for the results of the NTT operation.
	results := make(core.HostSlice[bn254.ScalarField], size)

	// Perform the NTT operation.
	err := bn254.Ntt(scalars, dir, &cfg, results)
	if err.CudaErrorCode != cr.CudaSuccess {
		panic("NTT operation failed")
	}
}
```
@@ -146,10 +142,10 @@ import (
)

func example() {
	cfg := GetDefaultNttConfig()
	err := ReleaseDomain(cfg.Ctx)
	if err != nil {
		// Handle the error
	}
}
```
@@ -1,12 +1,14 @@
# Vector Operations

## Overview

Icicle exposes a number of vector operations which a user can use:

* The VecOps API provides efficient vector operations such as addition, subtraction, and multiplication.
* The MatrixTranspose API allows a user to perform a transpose on a vector representation of a matrix.

## VecOps API Documentation

### Example

#### Vector addition
@@ -183,4 +185,4 @@ if err.IcicleErrorCode != core.IcicleErrorCode(0) {
// ...
```

In this example, the `TransposeMatrix` function is used to transpose a 5x4 matrix stored in a 1D slice. The input and output slices are stored on the host (CPU), and the operation is executed synchronously.
@@ -165,7 +165,36 @@ cargo bench
#### ICICLE Golang

The Golang bindings require compiling ICICLE Core first. We supply a [build script](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/golang/build.sh) to help build what you need.

Script usage:

```sh
./build.sh [-curve=<curve>] [-field=<field>] [-hash=<hash>] [-cuda_version=<version>] [-g2] [-ecntt] [-devmode]

curve - The name of the curve to build or "all" to build all supported curves
field - The name of the field to build or "all" to build all supported fields
hash - The name of the hash to build or "all" to build all supported hashes
-g2 - Optional - build with G2 enabled
-ecntt - Optional - build with ECNTT enabled
-devmode - Optional - build in devmode
```

:::note

If more than one curve or more than one field or more than one hash is supplied, the last one supplied will be built.

:::

Once the library has been built, you can use and test the Golang bindings.

To test a specific curve, field or hash, change to its directory and then run:

```sh
go test ./tests -count=1 -failfast -timeout 60m -p 2 -v
```

You will be able to see each test that runs, how long it takes and whether it passed or failed.

### Running ICICLE examples
@@ -185,8 +214,8 @@ Read through the compile.sh and CMakeLists.txt to understand how to link your ow
:::

#### Running with Docker

In each example directory, ZK-container files are located in a subdirectory `.devcontainer`.

```sh
@@ -215,4 +244,4 @@ Inside the container you can run the same commands:
./run.sh
```

You can now experiment with our other examples, perhaps try to run a rust or golang example next.
@@ -2,7 +2,7 @@
:::info

If you are looking for the Multi GPU API documentation refer [here](./rust-bindings/multi-gpu.md) for Rust and [here](./golang-bindings/multi-gpu.md) for Golang.

:::
@@ -10,12 +10,11 @@ One common challenge with Zero-Knowledge computation is managing the large input
Multi-GPU programming involves developing software to operate across multiple GPU devices. Let's first explore different approaches to Multi-GPU programming, then we will cover how ICICLE allows you to easily develop your ZK computations to run across many GPUs.

## Approaches to Multi GPU programming

There are many [different strategies](https://github.com/NVIDIA/multi-gpu-programming-models) available for implementing multi GPU applications; however, they can be split into two categories.

### GPU Server approach

This approach usually involves a single or multiple CPUs opening threads to read / write from multiple GPUs. You can think about it as a scaled-up Host-Device model.
@@ -23,8 +22,7 @@ This approach usually involves a single or multiple CPUs opening threads to read
This approach won't let us tackle larger computation sizes but it will allow us to compute multiple computations which we wouldn't be able to load onto a single GPU.

For example, let's say that you had to compute two MSMs of size 2^26 on a 16GB VRAM GPU; you would normally have to perform them asynchronously. However, if you double the number of GPUs in your system you can now run them in parallel.

### Inter GPU approach
@@ -32,18 +30,17 @@ This approach involves a more sophisticated approach to multi GPU computation. U
This approach requires redesigning the algorithm at the software level to be compatible with splitting amongst devices. In some cases, to lower latency to a minimum, special inter GPU connections would be installed on a server to allow direct communication between multiple GPUs.

## Writing ICICLE Code for Multi GPUs

The approach we have taken for the moment is a GPU Server approach; we assume you have a machine with multiple GPUs and you wish to run some computation on each GPU.

To dive deeper and learn about the API, check out the docs for our different ICICLE APIs:

- [Rust Multi GPU APIs](./rust-bindings/multi-gpu.md)
- [Golang Multi GPU APIs](./golang-bindings/multi-gpu.md)
- C++ Multi GPU APIs

## Best practices

- Never hardcode device IDs; if you want your software to take advantage of all GPUs on a machine, use methods such as `get_device_count` to support an arbitrary number of GPUs.
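
For instance, a sketch of enumerating devices with the Go bindings; the `GetDeviceCount` helper name in the `cuda_runtime` package is an assumption here, so check the package for the exact API:

```go
package main

import (
	"fmt"

	cr "github.com/ingonyama-zk/icicle/v2/wrappers/golang/cuda_runtime"
)

func main() {
	// Query how many GPUs are present instead of hardcoding device IDs.
	// NOTE: the helper name/signature below is an assumption, not a verbatim API listing.
	count, _ := cr.GetDeviceCount()
	for deviceID := 0; deviceID < count; deviceID++ {
		fmt.Println("found device", deviceID)
	}
}
```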
@@ -57,7 +54,7 @@ Multi GPU support should work with ZK-Containers by simply defining which device
docker run -it --gpus '"device=0,2"' zk-container-image
```

If you wish to expose all GPUs:

```sh
docker run --gpus all zk-container-image
@@ -2,10 +2,6 @@
[ICICLE](https://github.com/ingonyama-zk/icicle) is a cryptography library for ZK using GPUs. ICICLE implements blazing fast cryptographic primitives such as EC operations, MSM, NTT, Poseidon hash and more on GPU.

ICICLE allows developers with minimal GPU experience to effortlessly accelerate their ZK application; from our experiments, even the most naive implementation may yield 10X improvement in proving times.
@@ -17,28 +13,26 @@ ICICLE has been used by many leading ZK companies such as [Celer Network](https:
We understand that not all developers have access to a GPU and we don't want this to limit anyone from developing with ICICLE.
Here are some ways we can help you gain access to GPUs:

:::note

If none of the following options suit your needs, contact us on [telegram](https://t.me/RealElan) for assistance. We're committed to ensuring that a lack of a GPU doesn't become a bottleneck for you. If you need help with setup or any other issues, we're here to help you.

:::

### Grants

At Ingonyama we are interested in accelerating the progress of ZK and cryptography. If you are an engineer, developer or an academic researcher, we invite you to check out [our grant program](https://www.ingonyama.com/blog/icicle-for-researchers-grants-challenges). We will give you access to GPUs and even pay you to do your dream research!

### Google Colab

This is a great way to get started with ICICLE instantly. Google Colab offers free GPU access to an NVIDIA T4 instance with 16 GB of memory, which should be enough for experimenting and even prototyping with ICICLE.

For an extensive guide on how to set up Google Colab with ICICLE, refer to [this article](./colab-instructions.md).

### Vast.ai

[Vast.ai](https://vast.ai/) is a global GPU marketplace where you can rent many different types of GPUs by the hour for [competitive pricing](https://vast.ai/pricing). They provide on-demand and interruptible rentals depending on your need or use case; you can learn more about their rental types [here](https://vast.ai/faq#rental-types).

## What can you do with ICICLE?

[ICICLE](https://github.com/ingonyama-zk/icicle) can be used in the same way you would use any other cryptography library. While developing and integrating ICICLE into many proof systems, we found some use case categories:
@@ -7,6 +7,7 @@ The Polynomial API offers a robust framework for polynomial operations within a
## Key Features

### Backend Agnostic Architecture

Our API is structured to be independent of any specific computational backend. While a CUDA backend is currently implemented, the architecture facilitates easy integration of additional backends. This capability allows users to perform polynomial operations without the need to tailor their code to specific hardware, enhancing code portability and scalability.

### Templating in the Polynomial API
@@ -27,15 +28,19 @@ In this template:
- **`Image`**: Defines the type of the output values of the polynomial. This is typically the same as the coefficients.

#### Default instantiation

```cpp
extern template class Polynomial<scalar_t>;
```

#### Extended use cases

The templated nature of the Polynomial API also supports more complex scenarios. For example, coefficients and images could be points on an elliptic curve (EC points), which are useful in cryptographic applications and advanced algebraic structures. This approach allows the API to be extended easily to support new algebraic constructions without modifying the core implementation.

### Supported Operations

The Polynomial class encapsulates a polynomial, providing a variety of operations:

- **Construction**: Create polynomials from coefficients or evaluations on roots-of-unity domains.
- **Arithmetic Operations**: Perform addition, subtraction, multiplication, and division.
- **Evaluation**: Directly evaluate polynomials at specific points or across a domain.
@@ -47,6 +52,7 @@ The Polynomial class encapsulates a polynomial, providing a variety of operation
This section outlines how to use the Polynomial API in C++. Bindings for Rust and Go are detailed under the Bindings sections.

### Backend Initialization

Initialization with an appropriate factory is required to configure the computational context and backend.

```cpp
@@ -57,10 +63,12 @@ Initialization with an appropriate factory is required to configure the computat
Polynomial::initialize(std::make_shared<CUDAPolynomialFactory>());
```

:::note
Initialization of a factory must be done per linked curve or field.
:::

### Construction

Polynomials can be constructed from coefficients, from evaluations on roots-of-unity domains, or by cloning existing polynomials.

```cpp
@@ -80,10 +88,11 @@ auto p_cloned = p.clone(); // p_cloned and p do not share memory
```

:::note
The coefficients or evaluations may be allocated either on host or device memory. In both cases the memory is copied to the backend device.
:::

### Arithmetic

Constructed polynomials can be used for various arithmetic operations:

```cpp
@@ -105,7 +114,8 @@ Polynomial operator%(const Polynomial& rhs) const; // returns remainder R(x)
Polynomial divide_by_vanishing_polynomial(uint64_t degree) const; // division by the vanishing polynomial V(x)=X^N-1
```

#### Example

Given polynomials A(x), B(x), C(x) and the vanishing polynomial V(x):

$$
@@ -117,6 +127,7 @@ auto H = (A*B-C).divide_by_vanishing_polynomial(N);
```

### Evaluation

Evaluate polynomials at arbitrary domain points or across a domain.

```cpp
@@ -138,7 +149,9 @@ auto evaluations = std::make_unique<scalar_t[]>(domain_size); // can be device m
f.evaluate_on_domain(domain, domain_size, evaluations);
```

:::note
For special domains such as roots of unity, this method is not the most efficient for two reasons:

- The domain of size N needs to be built.
- The implementation does not try to identify this special domain.
@@ -146,11 +159,12 @@ Therefore the computation is typically $O(n^2)$ rather than $O(nlogn)$.
See the 'device views' section for more details.
:::

### Manipulations

Beyond arithmetic, the API supports efficient polynomial manipulations:

#### Monomials

```cpp
// Monomial operations
Polynomial& add_monomial_inplace(Coeff monomial_coeff, uint64_t monomial = 0);
@@ -160,31 +174,35 @@ Polynomial& sub_monomial_inplace(Coeff monomial_coeff, uint64_t monomial = 0);
The ability to add or subtract monomials directly and in-place is an efficient way to manipulate polynomials.

Example:

```cpp
f.add_monomial_in_place(scalar_t::from(5)); // f(x) += 5
f.sub_monomial_in_place(scalar_t::from(3), 8); // f(x) -= 3x^8
```

#### Computing the degree of a Polynomial

```cpp
// Degree computation
int64_t degree();
```

The degree of a polynomial is a fundamental characteristic that describes the highest power of the variable in the polynomial expression with a non-zero coefficient.
The `degree()` function in the API returns the degree of the polynomial, corresponding to the highest exponent with a non-zero coefficient.

- For the polynomial $f(x) = x^5 + 2x^3 + 4$, the degree is 5 because the highest power of $x$ with a non-zero coefficient is 5.
- For a scalar value such as a constant term (e.g., $f(x) = 7$), the degree is considered 0, as it corresponds to $x^0$.
- The degree of the zero polynomial, $f(x) = 0$, where there are no non-zero coefficients, is defined as -1. This special case often represents an "empty" or undefined state in many mathematical contexts.

Example:

```cpp
auto f = /*some expression*/;
auto degree_of_f = f.degree();
```

#### Slicing

```cpp
// Slicing and selecting even or odd components.
Polynomial slice(uint64_t offset, uint64_t stride, uint64_t size = 0 /*0 means take all elements*/);
@@ -195,6 +213,7 @@ Polynomial odd();
The Polynomial API provides methods for slicing polynomials and selecting specific components, such as even or odd indexed terms. Slicing allows extracting specific sections of a polynomial based on an offset, stride, and size.

The following examples demonstrate folding a polynomial's even and odd parts and arbitrary slicing:

```cpp
// folding a polynomial's even and odd parts with randomness
auto x = rand();
@@ -207,13 +226,15 @@ auto first_quarter = f.slice(0 /*offset*/, 1 /*stride*/, f.degree()/4 /*size*/);
```

### Memory access (copy/view)

Access to the polynomial's internal state can be vital for operations like commitment schemes or when more efficient custom operations are necessary. This can be done either by copying or viewing the polynomial.

#### Copying

Copies the polynomial coefficients to either host or device allocated memory.

:::note
Copying to host memory is backend agnostic while copying to device memory requires the memory to be allocated on the corresponding backend.
:::

```cpp
@@ -222,6 +243,7 @@ uint64_t copy_coeffs(Coeff* coeffs, uint64_t start_idx, uint64_t end_idx) const;
```

Example:

```cpp
auto coeffs_device = /*allocate CUDA or host memory*/
f.copy_coeffs(coeffs_device, 0/*start*/, f.degree());
@@ -232,7 +254,8 @@ auto rv = msm::MSM(coeffs_device, points, msm_size, cfg, results);
```

#### Views

The Polynomial API supports efficient data handling through the use of memory views. These views provide direct access to the polynomial's internal state, such as coefficients or evaluations, without the need to copy data. This feature is particularly useful for operations that require direct access to device memory, enhancing both performance and memory efficiency.

##### What is a Memory View?
@@ -268,6 +291,7 @@ gpu_accelerated_function(coeffs_view.get(),...);
```

##### Integrity-Pointer: Managing Memory Views

Within the Polynomial API, memory views are managed through a specialized tool called the Integrity-Pointer. This pointer type is designed to safeguard operations by monitoring the validity of the memory it points to. It can detect if the memory has been modified or released, thereby preventing unsafe access to stale or non-existent data.
The Integrity-Pointer not only acts as a regular pointer but also provides additional functionality to ensure the integrity of the data it references. Here are its key features:
@@ -305,8 +329,10 @@ if (coeff_view.isValid()) {
```

#### Evaluations View: Accessing Polynomial Evaluations Efficiently

The Polynomial API offers a specialized method, `get_rou_evaluations_view(...)`, which facilitates direct access to the evaluations of a polynomial. This method is particularly useful for scenarios where polynomial evaluations need to be accessed frequently or manipulated externally without the overhead of copying data.
This method provides a memory view into the device memory where polynomial evaluations are stored. It allows for efficient interpolation on larger domains, leveraging the raw evaluations directly from memory.

:::warning
Invalid request: requesting evaluations on a domain smaller than the degree of the polynomial is not supported and is considered invalid.
:::
@@ -334,7 +360,9 @@ cudaSetDevice(int deviceID);
This function sets the active CUDA device. All subsequent operations that allocate or deal with polynomial data will be performed on this device.

### Allocation Consistency

Polynomials are always allocated on the current CUDA device at the time of their creation. It is crucial to ensure that the device context is correctly set before initiating any operation that involves memory allocation:

```cpp
// Set the device before creating polynomials
cudaSetDevice(0);
@@ -345,6 +373,7 @@ Polynomial p2 = Polynomial::from_coefficients(coeffs, size);
```

### Matching Devices for Operations

When performing operations that result in the creation of new polynomials (such as addition or multiplication), it is imperative that both operands are on the same CUDA device. If the operands reside on different devices, an exception is thrown:

```cpp
@@ -354,7 +383,9 @@ auto p3 = p1 + p2; // Throws an exception if p1 and p2 are not on the same devic
```

### Device-Agnostic Operations

Operations that do not involve the creation of new polynomials, such as computing the degree of a polynomial or performing in-place modifications, can be executed regardless of the current device setting:

```cpp
// 'degree' and in-place operations do not require device matching
int deg = p1.degree();
@@ -362,9 +393,11 @@ p1 += p2; // Valid if p1 and p2 are on the same device, throws otherwise
```

### Error Handling

The API is designed to throw exceptions if operations are attempted across polynomials that are not located on the same GPU. This ensures that all polynomial operations are performed consistently and without data integrity issues due to device mismatches.

### Best Practices

To maximize the performance and avoid runtime errors in a multi-GPU setup, always ensure that:

- The CUDA device is set correctly before polynomial allocation.
@@ -49,13 +49,6 @@ Accelerating MSM is crucial to a ZK protocol's performance due to the [large per
You can learn more about how MSMs work from this [video](https://www.youtube.com/watch?v=Bl5mQA7UL2I) and from our resource list on [Ingopedia](https://www.ingonyama.com/ingopedia/msm).

## Supported Bindings

- [Golang](../golang-bindings/msm.md)
@@ -81,16 +74,16 @@ Large Triangle Accumulation is a method for optimizing MSM which focuses on redu
#### When should I use Large triangle accumulation?

The Large Triangle Accumulation algorithm is more sequential in nature, as it builds upon each step sequentially (accumulating sums and then performing doubling). This structure can make it less suitable for parallelization but potentially more efficient for a **large batch of smaller MSM computations**.

## MSM Modes

ICICLE MSM also supports two different modes: `Batch MSM` and `Single MSM`.

Batch MSM allows you to run many MSMs with a single API call, while Single MSM launches a single MSM computation.
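
For example, a sketch of batch mode using the Go bindings, modeled on the batched G2 example in the Golang MSM docs (three MSMs of 2^12 points each in one call; the flat layout of inputs and the result-per-batch sizing follow that example):

```go
package main

import (
	"github.com/ingonyama-zk/icicle/v2/wrappers/golang/core"
	bn254 "github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bn254"
)

func main() {
	cfg := bn254.GetDefaultMSMConfig()
	size := 1 << 12
	batchSize := 3
	totalSize := size * batchSize

	// All batches' scalars and points are laid out contiguously.
	scalars := bn254.GenerateScalars(totalSize)
	points := bn254.GenerateAffinePoints(totalSize)

	// One result per MSM in the batch.
	var p bn254.Projective
	var out core.DeviceSlice
	out.Malloc(batchSize*p.Size(), p.Size())
	bn254.Msm(scalars, points, &cfg, out)
}
```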
### Which mode should I use?

This decision is highly dependent on your use case and design. However, if your design allows for it, using batch mode can significantly improve efficiency. Batch processing allows you to perform multiple MSMs simultaneously, leveraging the parallel processing capabilities of GPUs.

Single MSM mode should be used when batching isn't possible or when you have to run a single MSM.
@@ -11,24 +11,19 @@ A_k = \sum_{n=0}^{N-1} a_n \cdot \omega^{nk} \mod p
|
||||
$$
|
||||
|
||||
where:
|
||||
|
||||
- $N$ is the size of the input sequence and is a power of 2,
|
||||
- $p$ is a prime number such that $p = kN + 1$ for some integer $k$, ensuring that $p$ supports the existence of $N$th roots of unity,
|
||||
- $\omega$ is a primitive $N$th root of unity modulo $p$, meaning $\omega^N \equiv 1 \mod p$ and no smaller positive power of $\omega$ is congruent to 1 modulo $p$,
|
||||
- $k$ ranges from 0 to $N-1$, and it indexes the output sequence.
|
||||
|
||||
The NTT is particularly useful because it enables efficient polynomial multiplication under modulo arithmetic, crucial for algorithms in cryptographic protocols, and other areas requiring fast modular arithmetic operations.
|
||||
NTT is particularly useful because it enables efficient polynomial multiplication under modulo arithmetic, crucial for algorithms in cryptographic protocols and other areas requiring fast modular arithmetic operations.
|
||||
|
||||
There also exists INTT, the inverse operation of NTT. INTT takes as input the output sequence of an NTT and reconstructs the original sequence.
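As a concrete illustration (toy parameters chosen for this example only): take $N = 4$, $p = 17 = 4 \cdot 4 + 1$ and $\omega = 4$, which is a primitive $4$th root of unity modulo $17$. For the input sequence $a = (1, 2, 3, 4)$, reducing each power $\omega^{nk}$ modulo $17$ before multiplying gives:

$$
\begin{aligned}
A_0 &= 1 + 2 + 3 + 4 \equiv 10 \pmod{17}\\
A_1 &= 1 + 2 \cdot 4 + 3 \cdot 16 + 4 \cdot 13 = 109 \equiv 7 \pmod{17}\\
A_2 &= 1 + 2 \cdot 16 + 3 \cdot 1 + 4 \cdot 16 = 100 \equiv 15 \pmod{17}\\
A_3 &= 1 + 2 \cdot 13 + 3 \cdot 16 + 4 \cdot 4 = 91 \equiv 6 \pmod{17}
\end{aligned}
$$

so $\text{NTT}(1, 2, 3, 4) = (10, 7, 15, 6)$, and running INTT on $(10, 7, 15, 6)$ with the same $p$ and $\omega$ recovers the original sequence.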
|
||||
|
||||
# Using NTT
|
||||
## Using NTT
|
||||
|
||||
### Supported curves
|
||||
|
||||
NTT supports the following curves:
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`
|
||||
|
||||
## Supported Bindings
|
||||
### Supported Bindings
|
||||
|
||||
- [Golang](../golang-bindings/ntt.md)
|
||||
- [Rust](../rust-bindings/ntt.md)
|
||||
@@ -61,19 +56,17 @@ Choosing an algorithm is heavily dependent on your use case. For example Cooley-
|
||||
|
||||
NTT also supports two different modes: `Batch NTT` and `Single NTT`.
|
||||
|
||||
Batch NTT allows you to run many NTTs with a single API call, Single MSM will launch a single MSM computation.
|
||||
|
||||
Deciding whether to use `batch NTT` or `single NTT` is highly dependent on your application and use case.
|
||||
|
||||
**Single NTT Mode**
|
||||
#### Single NTT
|
||||
|
||||
- Choose this mode when your application requires processing individual NTT operations in isolation.
|
||||
Single NTT will launch a single NTT computation.
|
||||
|
||||
**Batch NTT Mode**
|
||||
Choose this mode when your application requires processing individual NTT operations in isolation.
|
||||
|
||||
- Batch NTT mode can significantly reduce read/write as well as computation overhead by executing multiple NTT operations in parallel.
|
||||
#### Batch NTT Mode
|
||||
|
||||
- Batch mode may also offer better utilization of computational resources (memory and compute).
|
||||
Batch NTT allows you to run many NTTs with a single API call. Batch NTT mode can significantly reduce read/write times as well as computation overhead by executing multiple NTT operations in parallel. Batch mode may also offer better utilization of computational resources (memory and compute).
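As a rough sketch of what batch mode looks like from the Rust bindings (described later in these docs): the config constructor and the `batch_size` field below are assumptions based on the Rust bindings pages, and NTT domain initialization is omitted for brevity.

```rust
use icicle_bn254::curve::{ScalarCfg, ScalarField};
use icicle_core::ntt::{self, NTTConfig, NTTDir};
use icicle_core::traits::{FieldImpl, GenerateRandom};
use icicle_cuda_runtime::memory::HostSlice;

fn batch_ntt_sketch() {
    let batch_size = 8;
    let ntt_size = 1 << 10;

    // One contiguous buffer holding `batch_size` inputs of `ntt_size` elements each.
    let inputs: Vec<ScalarField> = ScalarCfg::generate_random(batch_size * ntt_size);
    let mut outputs = vec![ScalarField::zero(); batch_size * ntt_size];

    let mut cfg = NTTConfig::default(); // assumed default constructor
    cfg.batch_size = batch_size as i32; // assumed field controlling how many NTTs run per call

    // A single API call computes all `batch_size` NTTs.
    ntt::ntt(
        HostSlice::from_slice(&inputs),
        NTTDir::kForward,
        &cfg,
        HostSlice::from_mut_slice(&mut outputs),
    )
    .unwrap();
}
```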
|
||||
|
||||
## Supported algorithms
|
||||
|
||||
@@ -90,8 +83,8 @@ At its core, the Radix-2 NTT algorithm divides the problem into smaller sub-prob
|
||||
The algorithm recursively divides the input sequence into smaller sequences. At each step, it separates the sequence into even-indexed and odd-indexed elements, forming two subsequences that are then processed independently.
|
||||
|
||||
3. **Butterfly Operations:**
|
||||
The core computational element of the Radix-2 NTT is the "butterfly" operation, which combines pairs of elements from the sequences obtained in the decomposition step.
|
||||
|
||||
The core computational element of the Radix-2 NTT is the "butterfly" operation, which combines pairs of elements from the sequences obtained in the decomposition step.
|
||||
|
||||
Each butterfly operation involves multiplication by a "twiddle factor," which is a root of unity in the finite field, and addition or subtraction of the results, all performed modulo the prime modulus.
|
||||
|
||||
$$
|
||||
@@ -108,7 +101,6 @@ At its core, the Radix-2 NTT algorithm divides the problem into smaller sub-prob
|
||||
|
||||
$k$ - The index of the current operation within the butterfly or the transform stage
|
||||
|
||||
|
||||
The twiddle factors are precomputed to save runtime and improve performance.
|
||||
|
||||
4. **Bit-Reversal Permutation:**
|
||||
@@ -116,7 +108,7 @@ At its core, the Radix-2 NTT algorithm divides the problem into smaller sub-prob
|
||||
|
||||
### Mixed Radix
|
||||
|
||||
The Mixed Radix NTT algorithm extends the concepts of the Radix-2 algorithm by allowing the decomposition of the input sequence based on various factors of its length. Specifically ICICLEs implementation splits the input into blocks of sizes 16,32,64 compared to radix2 which is always splitting such that we end with NTT of size 2. This approach offers enhanced flexibility and efficiency, especially for input sizes that are composite numbers, by leveraging the "divide and conquer" strategy across multiple radixes.
|
||||
The Mixed Radix NTT algorithm extends the concepts of the Radix-2 algorithm by allowing the decomposition of the input sequence based on various factors of its length. Specifically, ICICLE's implementation splits the input into blocks of sizes 16, 32, or 64, whereas Radix-2 always splits until we end up with NTTs of size 2. This approach offers enhanced flexibility and efficiency, especially for input sizes that are composite numbers, by leveraging the "divide and conquer" strategy across multiple radices.
|
||||
|
||||
The NTT blocks in Mixed Radix are implemented more efficiently, based on the Winograd NTT, and memory and register usage is also better optimized compared to Radix-2.
|
||||
|
||||
@@ -126,11 +118,11 @@ Mixed Radix can reduce the number of stages required to compute for large inputs
|
||||
The input to the Mixed Radix NTT is a sequence of integers $a_0, a_1, \ldots, a_{N-1}$, where $N$ is not strictly required to be a power of two. Instead, $N$ can be any composite number, ideally factorized into primes or powers of primes.
|
||||
|
||||
2. **Factorization and Decomposition:**
|
||||
Unlike the Radix-2 algorithm, which strictly divides the computational problem into halves, the Mixed Radix NTT algorithm implements a flexible decomposition approach which isn't limited to prime factorization.
|
||||
|
||||
Unlike the Radix-2 algorithm, which strictly divides the computational problem into halves, the Mixed Radix NTT algorithm implements a flexible decomposition approach which isn't limited to prime factorization.
|
||||
|
||||
For example, an NTT of size 256 can be decomposed into two stages of $16 \times \text{NTT}_{16}$, leveraging a composite factorization strategy rather than decomposing into eight stages of $\text{NTT}_{2}$. This exemplifies the use of composite factors (in this case, $256 = 16 \times 16$) to apply smaller NTT transforms, optimizing computational efficiency by adapting the decomposition strategy to the specific structure of $N$.
|
||||
|
||||
3. **Butterfly Operations with Multiple Radixes:**
|
||||
3. **Butterfly Operations with Multiple Radices:**
|
||||
The Mixed Radix algorithm utilizes butterfly operations for various radix sizes. Each sub-transform involves specific butterfly operations characterized by multiplication with twiddle factors appropriate for the radix in question.
|
||||
|
||||
The generalized butterfly operation for a radix-$r$ element can be expressed as:
|
||||
@@ -139,7 +131,15 @@ Mixed Radix can reduce the number of stages required to compute for large inputs
|
||||
X_{k,r} = \sum_{j=0}^{r-1} (A_{j,k} \cdot W^{jk}) \mod p
|
||||
$$
|
||||
|
||||
where $X_{k,r}$ is the output of the $radix-r$ butterfly operation for the $k-th$ set of inputs, $A_{j,k}$ represents the $j-th$ input element for the $k-th$ operation, $W$ is the twiddle factor, and $p$ is the prime modulus.
|
||||
where:
|
||||
|
||||
$X_{k,r}$ - is the output of the radix-$r$ butterfly operation for the $k$-th set of inputs
|
||||
|
||||
$A_{j,k}$ - represents the $j$-th input element for the $k$-th operation
|
||||
|
||||
$W$ - is the twiddle factor
|
||||
|
||||
$p$ - is the prime modulus
|
||||
|
||||
4. **Recombination and Reordering:**
|
||||
After applying the appropriate butterfly operations across all decomposition levels, the Mixed Radix algorithm recombines the results into a single output sequence. Due to the varied sizes of the sub-transforms, a more complex reordering process may be required compared to Radix-2. This involves digit-reversal permutations to ensure that the final output sequence is correctly ordered.
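Sticking with the $256 = 16 \times 16$ decomposition from the earlier example, an input index written in base 16 as $n = 16\,d_1 + d_0$ ends up, after the two stages, at the digit-reversed position

$$
n = 16\,d_1 + d_0 \;\longmapsto\; 16\,d_0 + d_1, \qquad d_0, d_1 \in \{0, \ldots, 15\},
$$

which is the mixed-radix analogue of the bit-reversal permutation used by Radix-2.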
|
||||
@@ -154,6 +154,6 @@ Mixed radix on the other hand works better for larger NTTs with larger input siz
|
||||
|
||||
Performance really depends on logn size, batch size, ordering, inverse, coset, coeff-field and which GPU you are using.
|
||||
|
||||
For this reason we implemented our [heuristic auto-selection](https://github.com/ingonyama-zk/icicle/blob/774250926c00ffe84548bc7dd97aea5227afed7e/icicle/appUtils/ntt/ntt.cu#L474) which should choose the most efficient algorithm in most cases.
|
||||
For this reason we implemented our [heuristic auto-selection](https://github.com/ingonyama-zk/icicle/blob/main/icicle/src/ntt/ntt.cu#L573) which should choose the most efficient algorithm in most cases.
|
||||
|
||||
We still recommend you benchmark for your specific use case if you think a different configuration would yield better results.
|
||||
|
||||
@@ -8,39 +8,38 @@ Poseidon has been used in many popular ZK protocols such as Filecoin and [Plonk]
|
||||
|
||||
Our implementation of Poseidon is implemented in accordance with the optimized [Filecoin version](https://spec.filecoin.io/algorithms/crypto/poseidon/).
|
||||
|
||||
Let understand how Poseidon works.
|
||||
Let's understand how Poseidon works.
|
||||
|
||||
### Initialization
|
||||
## Initialization
|
||||
|
||||
Poseidon starts with the initialization of its internal state, which is composed of the input elements and some pregenerated constants. An initial round constant is added to each element of the internal state. Adding The round constants ensure the state is properly mixed from the outset.
|
||||
Poseidon starts with the initialization of its internal state, which is composed of the input elements and some pre-generated constants. An initial round constant is added to each element of the internal state. Adding the round constants ensures the state is properly mixed from the beginning.
|
||||
|
||||
This is done to prevent collisions and to prevent certain cryptographic attacks by ensuring that the internal state is sufficiently mixed and unpredictable.
|
||||
|
||||

|
||||
|
||||
### Applying full and partial rounds
|
||||
## Applying full and partial rounds
|
||||
|
||||
To generate a secure hash output, the algorithm goes through a series of "full rounds" and "partial rounds" as well as transformations between these sets of rounds.
|
||||
To generate a secure hash output, the algorithm goes through a series of "full rounds" and "partial rounds" as well as transformations between these sets of rounds in the following order:
|
||||
|
||||
First full rounds => apply SBox and Round constants => partial rounds => Last full rounds => Apply SBox
|
||||
```First full rounds -> apply S-box and Round constants -> partial rounds -> Last full rounds -> Apply S-box```
|
||||
|
||||
#### Full rounds
|
||||
### Full rounds
|
||||
|
||||

|
||||
|
||||
**Uniform Application of S-Box:** In full rounds, the S-box (a non-linear transformation) is applied uniformly to every element of the hash function's internal state. This ensures a high degree of mixing and diffusion, contributing to the hash function's security. The functions S-box involves raising each element of the state to a certain power denoted by `α` a member of the finite field defined by the prime `p`, `α` can be different depending on the the implementation and user configuration.
|
||||
**Uniform Application of S-box:** In full rounds, the S-box (a non-linear transformation) is applied uniformly to every element of the hash function's internal state. This ensures a high degree of mixing and diffusion, contributing to the hash function's security. The S-box involves raising each element of the state to a certain power denoted by `α`, a member of the finite field defined by the prime `p`; `α` can differ depending on the implementation and user configuration.
|
||||
|
||||
**Linear Transformation:** After applying the S-box, a linear transformation is performed on the state. This involves multiplying the state by an MDS (Maximum Distance Separable) matrix, which further diffuses the transformations applied by the S-box across the entire state.
|
||||
|
||||
**Addition of Round Constants:** Each element of the state is then modified by adding a unique round constant. These constants are different for each round and are precomputed as part of the hash function's initialization. The addition of round constants ensures that even minor changes to the input produce significant differences in the output.
|
||||
|
||||
#### Partial Rounds
|
||||
### Partial Rounds
|
||||
|
||||
**Selective Application of S-Box:** Partial rounds apply the S-box transformation to only one element of the internal state per round, rather than to all elements. This selective application significantly reduces the computational complexity of the hash function without compromising its security. The choice of which element to apply the S-box to can follow a specific pattern or be fixed, depending on the design of the hash function.
|
||||
|
||||
**Linear Transformation and Round Constants:** A linear transformation is performed and round constants are added. The linear transformation in partial rounds can be designed to be less computationally intensive (this is done by using a sparse matrix) than in full rounds, further optimizing the function's efficiency.
|
||||
|
||||
|
||||
The user of Poseidon can often choose how many partial or full rounds to apply; more full rounds increase security but degrade performance. The right balance is highly dependent on the use case.
|
||||
|
||||

|
||||
@@ -52,25 +51,20 @@ What that means is we calculate multiple hash-sums over multiple pre-images in p
|
||||
|
||||
So for Poseidon of arity 2 and an input of size 1024 * 2, we would expect 1024 elements of output, which means each block would be of size 2 and 1024 Poseidon hashes would be performed.
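The arithmetic above, spelled out as a small standalone snippet (plain Rust, no ICICLE calls):

```rust
fn main() {
    let arity = 2;               // input elements consumed per hash
    let number_of_hashes = 1024; // hash-sums computed in the batch
    let input_len = number_of_hashes * arity;

    // The input buffer is split into `number_of_hashes` blocks of `arity` elements,
    // and each block produces exactly one output element.
    assert_eq!(input_len, 2048);
    assert_eq!(input_len / arity, number_of_hashes);
    println!("{} blocks of size {} -> {} outputs", input_len / arity, arity, number_of_hashes);
}
```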
|
||||
|
||||
### Supported API
|
||||
### Supported Bindings
|
||||
|
||||
[`Rust`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-core/src/poseidon), [`C++`](https://github.com/ingonyama-zk/icicle/tree/main/icicle/appUtils/poseidon)
|
||||
|
||||
### Supported curves
|
||||
|
||||
Poseidon supports the following curves:
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`
|
||||
[`Rust`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-core/src/poseidon)
|
||||
|
||||
### Constants
|
||||
|
||||
Poseidon is extremely customizable and using different constants will produce different hashes, security levels and performance results.
|
||||
|
||||
We support pre-calculated and optimized constants for each of the [supported curves](#supported-curves).The constants can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/appUtils/poseidon/constants) and are labeled clearly per curve `<curve_name>_poseidon.h`.
|
||||
We support pre-calculated and optimized constants for each of the [supported curves](#supported-curves). The constants can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/include/poseidon/constants) and are labeled clearly per curve as `<curve_name>_poseidon.h`.
|
||||
|
||||
If you wish to generate your own constants you can use our python script which can be found [here](https://github.com/ingonyama-zk/icicle/blob/b6dded89cdef18348a5d4e2748b71ce4211c63ad/icicle/appUtils/poseidon/constants/generate_parameters.py#L1).
|
||||
If you wish to generate your own constants you can use our python script which can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/include/poseidon/constants/generate_parameters.py).
|
||||
|
||||
Prerequisites:
|
||||
|
||||
- Install python 3
|
||||
- `pip install poseidon-hash`
|
||||
- `pip install galois==0.3.7`
|
||||
@@ -97,7 +91,7 @@ primitive_element = 7 # bls12-381
|
||||
# primitive_element = 15 # bw6-761
|
||||
```
|
||||
|
||||
We only support `alpha = 5` so if you want to use another alpha for SBox please reach out on discord or open a github issue.
|
||||
We only support `alpha = 5` so if you want to use another alpha for S-box please reach out on discord or open a github issue.
|
||||
|
||||
### Rust API
|
||||
|
||||
@@ -128,8 +122,7 @@ poseidon_hash_many::<F>(
|
||||
|
||||
`PoseidonConfig::default()` can be modified; by default, for example, the inputs and outputs are set to be on the `Host`.
|
||||
|
||||
|
||||
```
|
||||
```rust
|
||||
impl<'a> Default for PoseidonConfig<'a> {
|
||||
fn default() -> Self {
|
||||
let ctx = get_default_device_context();
|
||||
@@ -174,11 +167,10 @@ let ctx = get_default_device_context();
|
||||
)
|
||||
.unwrap();
|
||||
```
|
||||
For more examples using different configurations refer here.
|
||||
|
||||
## The Tree Builder
|
||||
|
||||
The tree builder allows you to build Merkle trees using Poseidon.
|
||||
The tree builder allows you to build Merkle trees using Poseidon.
|
||||
|
||||
You can define both the tree's `height` and its `arity`. The tree `height` determines the number of layers in the tree, including the root and the leaf layer. The `arity` determines how many children each internal node can have.
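Under this definition (height counts every layer, including the root and the leaf layer), the size of the leaf layer follows directly:

$$
\text{number of leaves} = \text{arity}^{\,\text{height} - 1}
$$

so, for example, a binary tree (`arity = 2`) of `height = 4` has $2^{3} = 8$ leaves.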
|
||||
|
||||
@@ -206,9 +198,9 @@ Similar to Poseidon, you can also configure the Tree Builder `TreeBuilderConfig:
|
||||
- `are_inputs_on_device`: Have the inputs been loaded to device memory?
|
||||
- `is_async`: Should the TreeBuilder run asynchronously? `False` will block the current CPU thread. `True` will require you to call `cudaStreamSynchronize` or `cudaDeviceSynchronize` to retrieve the result.
|
||||
|
||||
### Benchmarks
|
||||
### Benchmarks
|
||||
|
||||
We ran the Poseidon tree builder on:
|
||||
We ran the Poseidon tree builder on:
|
||||
|
||||
**CPU**: 12th Gen Intel(R) Core(TM) i9-12900K
|
||||
|
||||
@@ -218,9 +210,8 @@ We ran the Poseidon tree builder on:
|
||||
|
||||
The benchmarks include copying data from and to the device.
|
||||
|
||||
|
||||
| Rows to keep parameter | Run time, Icicle | Supranational PC2
|
||||
| ----------- | ----------- | ----------- |
|
||||
| ----------- | ----------- | -----------
|
||||
| 10 | 9.4 seconds | 13.6 seconds
|
||||
| 20 | 9.5 seconds | 13.6 seconds
|
||||
| 29 | 13.7 seconds | 13.6 seconds
|
||||
|
||||
@@ -12,7 +12,7 @@ Rust bindings allow you to use ICICLE as a rust library.
|
||||
|
||||
Simply add the following to your `Cargo.toml`.
|
||||
|
||||
```
|
||||
```toml
|
||||
# GPU Icicle integration
|
||||
icicle-cuda-runtime = { git = "https://github.com/ingonyama-zk/icicle.git" }
|
||||
icicle-core = { git = "https://github.com/ingonyama-zk/icicle.git" }
|
||||
@@ -25,7 +25,7 @@ If you wish to point to a specific ICICLE branch add `branch = "<name_of_branch>
|
||||
|
||||
When you build your project ICICLE will be built as part of the build command.
|
||||
|
||||
# How do the rust bindings work?
|
||||
## How do the rust bindings work?
|
||||
|
||||
The rust bindings are just rust wrappers for ICICLE Core static libraries which can be compiled. We integrate the compilation of the static libraries into rusts toolchain to make usage seamless and easy. This is achieved by [extending rusts build command](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/rust/icicle-curves/icicle-bn254/build.rs).
|
||||
|
||||
@@ -55,3 +55,33 @@ fn main() {
|
||||
println!("cargo:rustc-link-lib=cudart");
|
||||
}
|
||||
```
|
||||
|
||||
## Supported curves, fields and operations
|
||||
|
||||
### Supported curves and operations
|
||||
|
||||
| Operation\Curve | bn254 | bls12_377 | bls12_381 | bw6-761 | grumpkin |
|
||||
| --- | :---: | :---: | :---: | :---: | :---: |
|
||||
| MSM | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| G2 | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| NTT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| ECNTT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| VecOps | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Polynomials | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| Poseidon | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Merkle Tree | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
### Supported fields and operations
|
||||
|
||||
| Operation\Field | babybear | stark252 |
|
||||
| --- | :---: | :---: |
|
||||
| VecOps | ✅ | ✅ |
|
||||
| Polynomials | ✅ | ✅ |
|
||||
| NTT | ✅ | ✅ |
|
||||
| Extension Field | ✅ | ❌ |
|
||||
|
||||
### Supported hashes
|
||||
|
||||
| Hash | Sizes |
|
||||
| --- | :---: |
|
||||
| Keccak | 256, 512 |
|
||||
|
||||
@@ -1,9 +1,5 @@
|
||||
# ECNTT
|
||||
|
||||
### Supported curves
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn254`
|
||||
|
||||
## ECNTT Method
|
||||
|
||||
The `ecntt` function computes the Elliptic Curve Number Theoretic Transform (EC-NTT) or its inverse on a batch of points of a curve.
|
||||
@@ -25,7 +21,7 @@ where
|
||||
|
||||
## Parameters
|
||||
|
||||
- **`input`**: The input data as a slice of `Projective<C>`. This represents points on a specific elliptic curve `C`.
|
||||
- **`input`**: The input data as a slice of `Projective<C>`. This represents points on a specific elliptic curve `C`.
|
||||
- **`dir`**: The direction of the NTT. It can be `NTTDir::kForward` for forward NTT or `NTTDir::kInverse` for inverse NTT.
|
||||
- **`cfg`**: The NTT configuration object of type `NTTConfig<C::ScalarField>`. This object specifies parameters for the NTT computation, such as the batch size and algorithm to use.
|
||||
- **`output`**: The output buffer to write the results into. This should be a slice of `Projective<C>` with the same size as the input.
|
||||
|
||||
@@ -2,11 +2,7 @@
|
||||
|
||||
To understand the theory behind the MSM precomputation technique, refer to Niall Emmart's [talk](https://youtu.be/KAWlySN7Hm8?feature=shared&t=1734).
|
||||
|
||||
### Supported curves
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`, `Grumpkin`
|
||||
|
||||
### `precompute_bases`
|
||||
## `precompute_bases`
|
||||
|
||||
Precomputes bases for the multi-scalar multiplication (MSM) by extending each base point with its multiples, facilitating more efficient MSM calculations.
|
||||
|
||||
@@ -20,8 +16,7 @@ pub fn precompute_bases<C: Curve + MSM<C>>(
|
||||
) -> IcicleResult<()>
|
||||
```
|
||||
|
||||
|
||||
#### Parameters
|
||||
### Parameters
|
||||
|
||||
- **`points`**: The original set of affine points (\(P_1, P_2, ..., P_n\)) to be used in the MSM. For batch MSM operations, this should include all unique points concatenated together.
|
||||
- **`precompute_factor`**: Specifies the total number of points to precompute for each base, including the base point itself. This parameter directly influences the memory requirements and the potential speedup of the MSM operation.
|
||||
|
||||
@@ -1,9 +1,5 @@
|
||||
# MSM
|
||||
|
||||
### Supported curves
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`, `grumpkin`
|
||||
|
||||
## Example
|
||||
|
||||
```rust
|
||||
@@ -84,7 +80,7 @@ pub struct MSMConfig<'a> {
|
||||
```
|
||||
|
||||
- **`ctx: DeviceContext`**: Specifies the device context, device id and the CUDA stream for asynchronous execution.
|
||||
- **`point_size: i32`**:
|
||||
- **`point_size: i32`**:
|
||||
- **`precompute_factor: i32`**: Determines the number of extra points to pre-compute for each point, affecting memory footprint and performance.
|
||||
- **`c: i32`**: The "window bitsize," a parameter controlling the computational complexity and memory footprint of the MSM operation.
|
||||
- **`bitsize: i32`**: The number of bits of the largest scalar, typically equal to the bit size of the scalar field.
|
||||
@@ -120,7 +116,6 @@ msm::msm(&scalars, &points, &cfg, &mut msm_results).unwrap();
|
||||
|
||||
You may reference the rust code [here](https://github.com/ingonyama-zk/icicle/blob/77a7613aa21961030e4e12bf1c9a78a2dadb2518/wrappers/rust/icicle-core/src/msm/mod.rs#L54).
|
||||
|
||||
|
||||
## How do I toggle between MSM modes?
|
||||
|
||||
Toggling between MSM modes occurs automatically based on the number of results you are expecting from the `msm::msm` function. If you are expecting an array of `msm_results`, ICICLE will automatically split `scalars` and `points` into equal parts and run them as multiple MSMs in parallel.
|
||||
@@ -136,7 +131,6 @@ msm::msm(&scalars, &points, &cfg, &mut msm_result).unwrap();
|
||||
|
||||
In the example above we allocate a single expected result which the MSM method will interpret as `batch_size=1` and run a single MSM.
|
||||
|
||||
|
||||
In the next example, we are expecting 10 results which sets `batch_size=10` and runs 10 MSMs in batch mode.
|
||||
|
||||
```rust
|
||||
@@ -152,7 +146,7 @@ Here is a [reference](https://github.com/ingonyama-zk/icicle/blob/77a7613aa21961
|
||||
|
||||
## Support for G2 group
|
||||
|
||||
MSM also supports G2 group.
|
||||
MSM also supports G2 group.
|
||||
|
||||
Using MSM in G2 requires a G2 config, and of course your Points should also be G2 Points.
|
||||
|
||||
|
||||
@@ -1,10 +1,6 @@
|
||||
# NTT
|
||||
|
||||
### Supported curves
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`
|
||||
|
||||
## Example
|
||||
## Example
|
||||
|
||||
```rust
|
||||
use icicle_bn254::curve::{ScalarCfg, ScalarField};
|
||||
@@ -61,14 +57,13 @@ pub fn ntt<F>(
|
||||
|
||||
`ntt::ntt` expects:
|
||||
|
||||
- **`input`** - buffer to read the inputs of the NTT from. <br/>
|
||||
- **`dir`** - whether to compute forward or inverse NTT. <br/>
|
||||
- **`cfg`** - config used to specify extra arguments of the NTT. <br/>
|
||||
- **`input`** - buffer to read the inputs of the NTT from.
|
||||
- **`dir`** - whether to compute forward or inverse NTT.
|
||||
- **`cfg`** - config used to specify extra arguments of the NTT.
|
||||
- **`output`** - buffer to write the NTT outputs into. Must be of the same size as input.
|
||||
|
||||
The `input` and `output` buffers can be on device or on host. Being on host means that they will be transferred to device during runtime.
|
||||
|
||||
|
||||
### NTT Config
|
||||
|
||||
```rust
|
||||
@@ -107,8 +102,7 @@ The `NTTConfig` struct is a configuration object used to specify parameters for
|
||||
|
||||
- **`ntt_algorithm: NttAlgorithm`**: Can be one of `Auto`, `Radix2`, `MixedRadix`.
|
||||
`Auto` will select `Radix 2` or `Mixed Radix` algorithm based on heuristics.
|
||||
`Radix2` and `MixedRadix` will force the use of an algorithm regardless of the input size or other considerations. You should use one of these options when you know for sure that you want to
|
||||
|
||||
`Radix2` and `MixedRadix` will force the use of that algorithm regardless of the input size or other considerations. You should use one of these options when you know for sure that you want a specific algorithm.
|
||||
|
||||
#### Usage
|
||||
|
||||
@@ -134,7 +128,6 @@ let custom_config = NTTConfig {
|
||||
};
|
||||
```
|
||||
|
||||
|
||||
### Modes
|
||||
|
||||
NTT supports two different modes: `Batch NTT` and `Single NTT`.
|
||||
@@ -205,4 +198,3 @@ where
|
||||
#### Returns
|
||||
|
||||
The function returns an `IcicleResult<()>`, which represents the result of the operation. If the operation is successful, the function returns `Ok(())`, otherwise it returns an error.
|
||||
|
||||
|
||||
@@ -1,14 +1,16 @@
|
||||
:::note Please refer to the Polynomials overview page for a deep overview. This section is a brief description of the Rust FFI bindings.
|
||||
# Rust FFI Bindings for Univariate Polynomial
|
||||
|
||||
:::note
|
||||
Please refer to the Polynomials overview page for a deep overview. This section is a brief description of the Rust FFI bindings.
|
||||
:::
|
||||
|
||||
# Rust FFI Bindings for Univariate Polynomial
|
||||
This documentation is designed to provide developers with a clear understanding of how to utilize the Rust bindings for polynomial operations efficiently and effectively, leveraging the robust capabilities of both Rust and C++ in their applications.
|
||||
|
||||
## Introduction
|
||||
|
||||
The Rust FFI bindings for the Univariate Polynomial serve as a "shallow wrapper" around the underlying C++ implementation. These bindings provide a straightforward Rust interface that directly calls functions from a C++ library, effectively bridging Rust and C++ operations. The Rust layer handles simple interface translations without delving into complex logic or data structures, which are managed on the C++ side. This design ensures efficient data handling, memory management, and execution of polynomial operations directly via C++.
|
||||
Currently, these bindings are tailored specifically for polynomials where the coefficients, domain, and images are represented as scalar fields.
|
||||
|
||||
|
||||
## Initialization Requirements
|
||||
|
||||
Before utilizing any functions from the polynomial API, it is mandatory to initialize the appropriate polynomial backend (e.g., CUDA). Additionally, the NTT (Number Theoretic Transform) domain must also be initialized, as the CUDA backend relies on this for certain operations. Failing to properly initialize these components can result in errors.
|
||||
@@ -19,12 +21,12 @@ Before utilizing any functions from the polynomial API, it is mandatory to initi
|
||||
The ICICLE library is structured such that each field or curve has its dedicated library implementation. As a result, initialization must be performed individually for each field or curve to ensure the correct setup and functionality of the library.
|
||||
:::
|
||||
|
||||
|
||||
## Core Trait: `UnivariatePolynomial`
|
||||
|
||||
The `UnivariatePolynomial` trait encapsulates the essential functionalities required for managing univariate polynomials in the Rust ecosystem. This trait standardizes the operations that can be performed on polynomials, regardless of the underlying implementation details. It allows for a unified approach to polynomial manipulation, providing a suite of methods that are fundamental to polynomial arithmetic.
|
||||
|
||||
### Trait Definition
|
||||
|
||||
```rust
|
||||
pub trait UnivariatePolynomial
|
||||
where
|
||||
@@ -77,6 +79,7 @@ where
|
||||
```
|
||||
|
||||
## `DensePolynomial` Struct
|
||||
|
||||
The DensePolynomial struct represents a dense univariate polynomial in Rust, leveraging a handle to manage its underlying memory within the CUDA device context. This struct acts as a high-level abstraction over complex C++ memory management practices, facilitating the integration of high-performance polynomial operations through Rust's Foreign Function Interface (FFI) bindings.
|
||||
|
||||
```rust
|
||||
@@ -88,15 +91,19 @@ pub struct DensePolynomial {
|
||||
### Traits implementation and methods
|
||||
|
||||
#### `Drop`
|
||||
|
||||
Ensures proper resource management by releasing the CUDA memory when a DensePolynomial instance goes out of scope. This prevents memory leaks and ensures that resources are cleaned up correctly, adhering to Rust's RAII (Resource Acquisition Is Initialization) principles.
|
||||
|
||||
#### `Clone`
|
||||
|
||||
Provides a way to create a new instance of a DensePolynomial with its own unique handle, thus duplicating the polynomial data in the CUDA context. Cloning is essential since the DensePolynomial manages external resources, which cannot be safely shared across instances without explicit duplication.
|
||||
|
||||
#### Operator Overloading: `Add`, `Sub`, `Mul`, `Rem`, `Div`
|
||||
|
||||
These traits are implemented for references to DensePolynomial (i.e., &DensePolynomial), enabling natural mathematical operations such as addition (+), subtraction (-), multiplication (*), division (/), and remainder (%). This syntactic convenience allows users to compose complex polynomial expressions in a way that is both readable and expressive.
|
||||
|
||||
#### Key Methods
|
||||
|
||||
In addition to the traits, the following methods are implemented:
|
||||
|
||||
```rust
|
||||
@@ -107,16 +114,16 @@ impl DensePolynomial {
|
||||
}
|
||||
```
|
||||
|
||||
:::note Might be consolidated with `UnivariatePolynomial` trait
|
||||
:::
|
||||
|
||||
## Flexible Memory Handling With `HostOrDeviceSlice`
|
||||
|
||||
The DensePolynomial API is designed to accommodate a wide range of computational environments by supporting both host and device memory through the `HostOrDeviceSlice` trait. This approach ensures that polynomial operations can be seamlessly executed regardless of where the data resides, making the API highly adaptable and efficient for various hardware configurations.
|
||||
|
||||
### Overview of `HostOrDeviceSlice`
|
||||
|
||||
The HostOrDeviceSlice is a Rust trait that abstracts over slices of memory that can either be on the host (CPU) or the device (GPU), as managed by CUDA. This abstraction is crucial for high-performance computing scenarios where data might need to be moved between different memory spaces depending on the operations being performed and the specific hardware capabilities available.
|
||||
|
||||
### Usage in API Functions
|
||||
|
||||
Functions within the DensePolynomial API that deal with polynomial coefficients or evaluations use the HostOrDeviceSlice trait to accept inputs. This design allows the functions to be agnostic of the actual memory location of the data, whether it's in standard system RAM accessible by the CPU or in GPU memory accessible by CUDA cores.
|
||||
|
||||
```rust
|
||||
@@ -132,10 +139,13 @@ let p_from_evals = PolynomialBabyBear::from_rou_evals(&evals, evals.len());
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
This section outlines practical examples demonstrating how to utilize the `DensePolynomial` Rust API. The API is flexible, supporting multiple scalar fields. Below are examples showing how to use polynomials defined over different fields and perform a variety of operations.
|
||||
|
||||
### Initialization and Basic Operations
|
||||
|
||||
First, choose the appropriate field implementation for your polynomial operations, initializing the CUDA backend if necessary:
|
||||
|
||||
```rust
|
||||
use icicle_babybear::polynomials::DensePolynomial as PolynomialBabyBear;
|
||||
|
||||
@@ -151,10 +161,10 @@ use icicle_bn254::polynomials::DensePolynomial as PolynomialBn254;
|
||||
```
|
||||
|
||||
### Creation
|
||||
|
||||
Polynomials can be created from coefficients or evaluations:
|
||||
|
||||
```rust
|
||||
// Assume F is the field type (e.g. icicle_bn254::curve::ScalarField or a type parameter)
|
||||
let coeffs = ...;
|
||||
let p_from_coeffs = PolynomialBabyBear::from_coeffs(HostSlice::from_slice(&coeffs), size);
|
||||
|
||||
@@ -164,6 +174,7 @@ let p_from_evals = PolynomialBabyBear::from_rou_evals(HostSlice::from_slice(&eva
|
||||
```
|
||||
|
||||
### Arithmetic Operations
|
||||
|
||||
Utilize overloaded operators for intuitive mathematical expressions:
|
||||
|
||||
```rust
|
||||
@@ -174,6 +185,7 @@ let mul_scalar = &f * &scalar; // Scalar multiplication
|
||||
```
|
||||
|
||||
### Division and Remainder
|
||||
|
||||
Compute quotient and remainder or perform division by a vanishing polynomial:
|
||||
|
||||
```rust
|
||||
@@ -186,6 +198,7 @@ let h = f.div_by_vanishing(N); // Division by V(x) = X^N - 1
|
||||
```
|
||||
|
||||
### Monomial Operations
|
||||
|
||||
Add or subtract monomials in-place for efficient polynomial manipulation:
|
||||
|
||||
```rust
|
||||
f.sub_monomial_inplace(&one, 0 /*monomial*/); // Subtracts 1 from f
|
||||
```
|
||||
|
||||
### Slicing
|
||||
|
||||
Extract specific components:
|
||||
|
||||
```rust
|
||||
@@ -203,6 +217,7 @@ let arbitrary_slice = f.slice(offset, stride, size);
|
||||
```
|
||||
|
||||
### Evaluate
|
||||
|
||||
Evaluate the polynomial:
|
||||
|
||||
```rust
|
||||
@@ -216,6 +231,7 @@ f.eval_on_domain(HostSlice::from_slice(&domain), HostSlice::from_mut_slice(&mut
|
||||
```
|
||||
|
||||
### Read coefficients
|
||||
|
||||
Read or copy polynomial coefficients for further processing:
|
||||
|
||||
```rust
|
||||
@@ -227,6 +243,7 @@ f.copy_coeffs(0, &mut device_mem[..]);
|
||||
```
|
||||
|
||||
### Polynomial Degree
|
||||
|
||||
Determine the highest power of the variable with a non-zero coefficient:
|
||||
|
||||
```rust
|
||||
@@ -234,6 +251,7 @@ let deg = f.degree(); // Degree of the polynomial
|
||||
```
|
||||
|
||||
### Memory Management: Views (rust slices)
|
||||
|
||||
Rust enforces correct usage of views at compile time, eliminating the need for runtime checks:
|
||||
|
||||
```rust
|
||||
|
||||
@@ -1,13 +1,6 @@
|
||||
# Vector Operations API
|
||||
|
||||
Our vector operations API which is part of `icicle-cuda-runtime` package, includes fundamental methods for addition, subtraction, and multiplication of vectors, with support for both host and device memory.
|
||||
|
||||
|
||||
## Supported curves
|
||||
|
||||
Vector operations are supported on the following curves:
|
||||
|
||||
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`, `grumpkin`
|
||||
Our vector operations API, which is part of the `icicle-cuda-runtime` package, includes fundamental methods for addition, subtraction, and multiplication of vectors, with support for both host and device memory.
|
||||
|
||||
## Examples
|
||||
|
||||
@@ -59,7 +52,6 @@ let cfg = VecOpsConfig::default();
|
||||
mul_scalars(&a, &ones, &mut result, &cfg).unwrap();
|
||||
```
|
||||
|
||||
|
||||
## Vector Operations Configuration
|
||||
|
||||
The `VecOpsConfig` struct encapsulates the settings for vector operations, including device context and operation modes.
|
||||
@@ -90,7 +82,7 @@ pub struct VecOpsConfig<'a> {
|
||||
|
||||
`VecOpsConfig` can be initialized with default settings tailored for a specific device:
|
||||
|
||||
```
|
||||
```rust
|
||||
let cfg = VecOpsConfig::default();
|
||||
```
|
||||
|
||||
@@ -118,7 +110,7 @@ impl<'a> VecOpsConfig<'a> {
|
||||
|
||||
## Vector Operations
|
||||
|
||||
Vector operations are implemented through the `VecOps` trait, these traits are implemented for all [supported curves](#supported-curves) providing methods for addition, subtraction, and multiplication of vectors.
|
||||
Vector operations are implemented through the `VecOps` trait, providing methods for addition, subtraction, and multiplication of vectors.
|
||||
|
||||
### `VecOps` Trait
|
||||
|
||||
@@ -155,7 +147,6 @@ All operations are element-wise operations, and the results placed into the `res
|
||||
- **`sub`**: Computes the element-wise difference between two vectors.
|
||||
- **`mul`**: Performs element-wise multiplication of two vectors.
|
||||
|
||||
|
||||
## MatrixTranspose API Documentation
|
||||
|
||||
This section describes the functionality of the `TransposeMatrix` function used for matrix transposition.
|
||||
@@ -186,8 +177,8 @@ where
|
||||
- **`column_size`**: The number of columns in the input matrix.
|
||||
- **`output`**: A mutable slice to store the transposed matrix. The slice can be stored on either the host or the device.
|
||||
- **`ctx`**: A reference to the `DeviceContext`, which provides information about the device where the operation will be performed.
|
||||
- **`on_device`**: A boolean flag indicating whether the inputs and outputs are on the device.
|
||||
- **`is_async`**: A boolean flag indicating whether the operation should be performed asynchronously.
|
||||
- **`on_device`**: A boolean flag indicating whether the inputs and outputs are on the device.
|
||||
- **`is_async`**: A boolean flag indicating whether the operation should be performed asynchronously.
|
||||
|
||||
### Return Value
|
||||
|
||||
@@ -209,9 +200,8 @@ transpose_matrix(&input, 5, 4, &mut output, &ctx, true, false)
|
||||
.expect("Failed to transpose matrix");
|
||||
```
|
||||
|
||||
|
||||
The function takes a matrix represented as a 1D slice, transposes it, and stores the result in another 1D slice. The input and output slices can be stored on either the host or the device, and the operation can be performed synchronously or asynchronously.
|
||||
|
||||
The function is generic and can work with any type `F` that implements the `FieldImpl` trait. The `<F as FieldImpl>::Config` type must also implement the `VecOps<F>` trait, which provides the `transpose` method used to perform the actual transposition.
|
||||
|
||||
The function returns an `IcicleResult<()>`, indicating whether the operation was successful or not.
|
||||
The function returns an `IcicleResult<()>`, indicating whether the operation was successful or not.
|
||||
|
||||
@@ -11,7 +11,7 @@ Ingonyama is a next-generation semiconductor company, focusing on Zero-Knowledge
|
||||
Currently our flagship products are:
|
||||
|
||||
- **ICICLE**:
|
||||
[ICICLE](https://github.com/ingonyama-zk/icicle) is a fully featured GPU accelerated cryptography library for building ZK provers. ICICLE allows you to accelerate your ZK existing protocols in a matter of hours or implement your protocol from scratch on GPU.
|
||||
[ICICLE](https://github.com/ingonyama-zk/icicle) is a fully featured GPU accelerated cryptography library for building ZK provers. ICICLE allows you to accelerate your existing ZK protocols in a matter of hours or implement your protocol from scratch on GPU.
|
||||
|
||||
---
|
||||
|
||||
@@ -39,7 +39,7 @@ Learn more about ICICLE and GPUs [here][ICICLE-OVERVIEW].
|
||||
|
||||
## Get in Touch
|
||||
|
||||
If you have any questions, ideas, or are thinking of building something in this space join the discussion on [Discord]. You can explore our code on [github](https://github.com/ingonyama-zk) or read some of [our research papers](https://github.com/ingonyama-zk/papers).
|
||||
If you have any questions, ideas, or are thinking of building something in this space, join the discussion on [Discord]. You can explore our code on [github](https://github.com/ingonyama-zk) or read some of [our research papers](https://github.com/ingonyama-zk/papers).
|
||||
|
||||
Follow us on [Twitter](https://x.com/Ingo_zk) and [YouTube](https://www.youtube.com/@ingo_ZK) and sign up for our [mailing list](https://wkf.ms/3LKCbdj) to get our latest announcements.
|
||||
|
||||
|
||||
19
docs/package-lock.json
generated
@@ -3680,8 +3680,6 @@
|
||||
"version": "8.12.0",
|
||||
"resolved": "https://registry.npmjs.org/ajv/-/ajv-8.12.0.tgz",
|
||||
"integrity": "sha512-sRu1kpcO9yLtYxBKvqfTeh9KzZEwO3STyX1HT+4CaDzC6HpTGYhIhPIzj9XuKU7KYDwnaeh5hcOwjy1QuJzBPA==",
|
||||
"optional": true,
|
||||
"peer": true,
|
||||
"dependencies": {
|
||||
"fast-deep-equal": "^3.1.1",
|
||||
"json-schema-traverse": "^1.0.0",
|
||||
@@ -3696,9 +3694,7 @@
|
||||
"node_modules/ajv-formats/node_modules/json-schema-traverse": {
|
||||
"version": "1.0.0",
|
||||
"resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz",
|
||||
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==",
|
||||
"optional": true,
|
||||
"peer": true
|
||||
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="
|
||||
},
|
||||
"node_modules/ajv-keywords": {
|
||||
"version": "3.5.2",
|
||||
@@ -16344,13 +16340,14 @@
|
||||
"version": "2.1.1",
|
||||
"resolved": "https://registry.npmjs.org/ajv-formats/-/ajv-formats-2.1.1.tgz",
|
||||
"integrity": "sha512-Wx0Kx52hxE7C18hkMEggYlEifqWZtYaRgouJor+WMdPnQyEK13vgEWyVNup7SoeeoLMsr4kf5h6dOW11I15MUA==",
|
||||
"requires": {},
|
||||
"requires": {
|
||||
"ajv": "^8.0.0"
|
||||
},
|
||||
"dependencies": {
|
||||
"ajv": {
|
||||
"version": "https://registry.npmjs.org/ajv/-/ajv-8.12.0.tgz",
|
||||
"version": "8.12.0",
|
||||
"resolved": "https://registry.npmjs.org/ajv/-/ajv-8.12.0.tgz",
|
||||
"integrity": "sha512-sRu1kpcO9yLtYxBKvqfTeh9KzZEwO3STyX1HT+4CaDzC6HpTGYhIhPIzj9XuKU7KYDwnaeh5hcOwjy1QuJzBPA==",
|
||||
"optional": true,
|
||||
"peer": true,
|
||||
"requires": {
|
||||
"fast-deep-equal": "^3.1.1",
|
||||
"json-schema-traverse": "^1.0.0",
|
||||
@@ -16361,9 +16358,7 @@
|
||||
"json-schema-traverse": {
|
||||
"version": "1.0.0",
|
||||
"resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz",
|
||||
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==",
|
||||
"optional": true,
|
||||
"peer": true
|
||||
"integrity": "sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug=="
|
||||
}
|
||||
}
|
||||
},
|
||||
|
||||
@@ -24,6 +24,42 @@ module.exports = {
|
||||
label: "ICICLE Core",
|
||||
id: "icicle/core",
|
||||
},
|
||||
{
|
||||
type: "category",
|
||||
label: "Primitives",
|
||||
link: {
|
||||
type: `doc`,
|
||||
id: 'icicle/primitives/overview',
|
||||
},
|
||||
collapsed: true,
|
||||
items: [
|
||||
{
|
||||
type: "doc",
|
||||
label: "MSM",
|
||||
id: "icicle/primitives/msm",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "NTT",
|
||||
id: "icicle/primitives/ntt",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Poseidon Hash",
|
||||
id: "icicle/primitives/poseidon",
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Polynomials",
|
||||
id: "icicle/polynomials/overview",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Multi GPU Support",
|
||||
id: "icicle/multi-gpu",
|
||||
},
|
||||
{
|
||||
type: "category",
|
||||
label: "Golang bindings",
|
||||
@@ -123,42 +159,6 @@ module.exports = {
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
type: "category",
|
||||
label: "Primitives",
|
||||
link: {
|
||||
type: `doc`,
|
||||
id: 'icicle/primitives/overview',
|
||||
},
|
||||
collapsed: true,
|
||||
items: [
|
||||
{
|
||||
type: "doc",
|
||||
label: "MSM",
|
||||
id: "icicle/primitives/msm",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "NTT",
|
||||
id: "icicle/primitives/ntt",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Poseidon Hash",
|
||||
id: "icicle/primitives/poseidon",
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Polynomials",
|
||||
id: "icicle/polynomials/overview",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Multi GPU Support",
|
||||
id: "icicle/multi-gpu",
|
||||
},
|
||||
{
|
||||
type: "doc",
|
||||
label: "Google Colab Instructions",
|
||||
@@ -190,6 +190,7 @@ module.exports = {
|
||||
type: "category",
|
||||
label: "Additional Resources",
|
||||
collapsed: false,
|
||||
collapsible: false,
|
||||
items: [
|
||||
{
|
||||
type: "link",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Golang Bindings
|
||||
|
||||
In order to build the underlying ICICLE libraries you should run the build script `build.sh` found in the `wrappers/golang` directory.
|
||||
In order to build the underlying ICICLE libraries you should run the build script found [here](./build.sh).
|
||||
|
||||
Build script USAGE
|
||||
|
||||
@@ -16,35 +16,50 @@ field - The name of the field to build or "all" to build all fields
|
||||
|
||||
To build ICICLE libraries for all supported curves with G2 and ECNTT enabled:
|
||||
|
||||
```
|
||||
./build.sh all -g2 -ecntt
|
||||
```sh
|
||||
./build.sh -curve=all -g2 -ecntt
|
||||
```
|
||||
|
||||
If you wish to build for a specific curve, for example bn254, without G2 or ECNTT enabled:
|
||||
|
||||
```
|
||||
./build.sh bn254
|
||||
```sh
|
||||
./build.sh -curve=bn254
|
||||
```
|
||||
|
||||
>[!NOTE]
|
||||
>Current supported curves are `bn254`, `bls12_381`, `bls12_377`, `bw6_671` and `grumpkin`
|
||||
>Current supported fields are `babybear`
|
||||
## Supported curves, fields and operations
|
||||
|
||||
>[!NOTE]
|
||||
>G2 and ECNTT are located in nested packages
|
||||
### Supported curves and operations
|
||||
|
||||
| Operation\Curve | bn254 | bls12_377 | bls12_381 | bw6-761 | grumpkin |
|
||||
| --- | :---: | :---: | :---: | :---: | :---: |
|
||||
| MSM | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| G2 | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| NTT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| ECNTT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| VecOps | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Polynomials | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
|
||||
### Supported fields and operations
|
||||
|
||||
| Operation\Field | babybear |
|
||||
| --- | :---: |
|
||||
| VecOps | ✅ |
|
||||
| Polynomials | ✅ |
|
||||
| NTT | ✅ |
|
||||
| Extension Field | ✅ |
|
||||
|
||||
## Running golang tests
|
||||
|
||||
To run the tests for curve bn254.
|
||||
|
||||
```bash
|
||||
go test ./wrappers/golang/curves/bn254 -count=1
|
||||
```sh
|
||||
go test ./wrappers/golang/curves/bn254/tests -count=1 -v
|
||||
```
|
||||
|
||||
To run all the tests in the golang bindings
|
||||
|
||||
```bash
|
||||
go test ./... -count=1
|
||||
```sh
|
||||
go test ./... -count=1 -v
|
||||
```
|
||||
|
||||
## How do Golang bindings work?
|
||||
@@ -76,7 +91,7 @@ Replace `/path/to/shared/libs` with the actual path where the shared libraries a
|
||||
|
||||
In some cases you may encounter the following error, despite exporting the correct `LD_LIBRARY_PATH`.
|
||||
|
||||
```
|
||||
```sh
|
||||
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
|
||||
/usr/bin/ld: cannot find -lbn254: No such file or directory
|
||||
/usr/bin/ld: cannot find -lbn254: No such file or directory
|
||||
@@ -90,7 +105,7 @@ This is normally fixed by exporting the path to the shared library location in t
|
||||
|
||||
### cuda_runtime.h: No such file or directory
|
||||
|
||||
```
|
||||
```sh
|
||||
# github.com/ingonyama-zk/icicle/v2/wrappers/golang/curves/bls12381
|
||||
In file included from wrappers/golang/curves/bls12381/curve.go:5:
|
||||
wrappers/golang/curves/bls12381/include/curve.h:1:10: fatal error: cuda_runtime.h: No such file or directory
|
||||
@@ -101,6 +116,6 @@ compilation terminated.
|
||||
|
||||
Our golang bindings rely on CUDA headers and require that they can be found as system headers. Make sure to add the `cuda/include` directory of your CUDA installation to your `CPATH`.
|
||||
|
||||
```
|
||||
```sh
|
||||
export CPATH=$CPATH:<path/to/cuda/include>
|
||||
```
|
||||
|
||||