Compare commits

242 Commits

Author SHA1 Message Date
Soowon Jeong
e02de2f485 Specify curve as bn254 2024-08-13 17:59:48 +09:00
Ido Atlas
7a32a88d6d integration with V2 2024-04-17 11:19:38 +03:00
Ido Atlas
4cc5d2d71c Merge branch 'V2' into sumcheck-experiments 2024-04-15 17:43:10 +03:00
yshekel
6e3eecd8ca Polynomial API and CUDA backend
(1) C++ template API that is backend (GPU, ZPU or other) agnostic
(polynomials.h)
(2) concrete CUDA implementation (polynomial_cuda_backend.cu/cuh)
(3) C API for FFI
(4) Groth16 example (test)
(5) icicle library is now built for tests as well. Polynomial tests are
linked to icicle lib.
2024-04-14 16:31:49 +03:00
DmytroTym
0c340924eb Baby bear Rust wrappers (#472)
This PR introduces Rust wrappers for the baby bear finite field,
including NTTs and vector operations (for both native and extension
fields).
2024-04-14 18:23:43 +07:00
Ido Atlas
9d461be0bd small fix 2024-04-14 11:58:24 +03:00
DmytroTym
c123940abb Tests and benches for extension fields (#470)
- C++ side tests and benchmarks refactored using templates;
- Extension field tests and benchmarks added;
- README added for benches;
- Some small additions to extension field API for convenience;
2024-04-13 20:07:30 +07:00
nonam3e
af4ab88f3a Feat/golang/v2 bindings (#468)
## Describe the changes

This PR updates current golang bindings to icicle v2
2024-04-11 22:44:53 +07:00
nonam3e
1ae1fa892c build templates 2024-04-11 14:25:57 +00:00
nonam3e
bf89f96dff Update wrappers/golang/internal/generator/templates/ecntt_test.go.tmpl
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2024-04-11 19:44:03 +07:00
nonam3e
ebf9fc4740 fmt template 2024-04-11 11:31:17 +00:00
nonam3e
b56187452f fmt 2024-04-11 11:17:28 +00:00
nonam3e
ba41d3e97b merge uint64 2024-04-11 11:08:39 +00:00
nonam3e
7fe2a11941 fmt 2024-04-11 10:03:48 +00:00
nonam3e
5144c15845 fmt 2024-04-11 10:02:49 +00:00
nonam3e
013f1efa09 review fixes 2024-04-11 09:52:42 +00:00
Jeremy Felder
b2b3702b20 [BREAKING] Change golang field representation to uint64 (#400)
## Describe the changes

This PR changes the field representation in Golang bindings to use
uint64 limbs instead of uint32 for easier conversion with other
libraries
2024-04-11 11:55:43 +03:00
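
To illustrate the interop cost this removes: with uint32 limbs, every boundary with a uint64-based library (such as gnark-crypto) needs a repacking step like the minimal sketch below. Names and layout here are assumptions for illustration, not icicle's actual types.

```cuda
#include <cstdint>

// A field element stored as 4 little-endian 64-bit limbs, matching what
// many other libraries (e.g. gnark-crypto) use natively.
struct Fe64 { uint64_t limbs[4]; };

// With uint32 limbs, a pack/unpack like this was needed on every boundary
// crossing; with uint64 limbs it becomes a plain copy.
Fe64 from_u32_limbs(const uint32_t (&l)[8]) {
  Fe64 out;
  for (int i = 0; i < 4; i++)
    out.limbs[i] = uint64_t(l[2 * i]) | (uint64_t(l[2 * i + 1]) << 32);
  return out;
}
```
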
ChickenLover
5b8e22953b run releasedomain last and add assertion to test (#469)
## Describe the changes

This PR...

## Linked Issues

Resolves #

---------

Co-authored-by: LeonHibnik <leon@ingonyama.com>
2024-04-11 13:26:27 +07:00
nonam3e
452aea71af update golang readme 2024-04-10 23:17:23 +00:00
nonam3e
356b13f650 update CI 2024-04-10 23:05:46 +00:00
nonam3e
2952401e40 g2 ecntt packages 2024-04-10 23:00:59 +00:00
Ido Atlas
cf9579d4f9 separate single round 2024-04-10 15:02:28 +03:00
nonam3e
a50af2f06e ecntt build tag 2024-04-09 20:39:13 +00:00
Ido Atlas
bdf4f1ee24 separate/unified option 2024-04-09 16:29:18 +03:00
ChickenLover
9af6bb5666 Merge branch 'main' into V2 2024-04-09 18:47:07 +07:00
ChickenLover
da31adefce v2 new design (#443)
Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
Co-authored-by: Otsar <122266060+Otsar-Raikou@users.noreply.github.com>
Co-authored-by: VitaliiH <vhnatyk@gmail.com>
Co-authored-by: release-bot <release-bot@ingonyama.com>
Co-authored-by: Stas <spolonsky@icloud.com>
Co-authored-by: Yuval Shekel <yshekel@gmail.com>
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: JimmyHongjichuan <45908291+JimmyHongjichuan@users.noreply.github.com>
Co-authored-by: pierre <pierreuu@gmail.com>
Co-authored-by: Leon Hibnik <107353745+LeonHibnik@users.noreply.github.com>
Co-authored-by: nonam3e <timur@ingonyama.com>
Co-authored-by: Vlad <88586482+vladfdp@users.noreply.github.com>
Co-authored-by: LeonHibnik <leon@ingonyama.com>
2024-04-09 18:27:35 +07:00
Leon Hibnik
a7b0dc40c1 [FEAT] ReleaseDomain API (#465)
## Describe the changes

This PR adds a NTT ReleaseDomain API in Golang and Rust

## Linked Issues

Resolves #

---------

Co-authored-by: Yuval Shekel <yshekel@gmail.com>
2024-04-09 12:58:19 +03:00
Ido Atlas
9b39e771ea double round separate kernels 2024-04-09 12:37:10 +03:00
Vlad
4a35eece51 transpose kernel in vec_ops and rust binding (#462)
## Describe the changes

This PR adds an extern C link to the transpose kernel, now in
vec_ops.cu. It also adds a Rust binding, and updates the test
check_ntt_batch to use the new transpose function.
The test passes.

## Linked Issues

Resolves #

---------

Co-authored-by: LeonHibnik <leon@ingonyama.com>
2024-04-09 08:47:33 +03:00
VitaliiH
4c9b3c00a5 Devmode to Reduce compilation time (including G2 and ECNTT) (#395)
devmode to reduce compilation time
2024-04-09 06:09:04 +02:00
Jeremy Felder
c6719167ce [FEAT]: golang device slice ranges (#463)
## Describe the changes

This PR adds the capability to slice a DeviceSlice, allowing portions of
data that are already on the device to be reused.

Additionally, this PR removes the need for a HostSlice underlying type
to implement a Size function and uses unsafe.Sizeof instead. This
together with #407 will allow direct usage of gnark-crypto types with
HostSlice without the need for converting to ICICLE types

---------

Co-authored-by: nonam3e <timur@ingonyama.com>
2024-04-08 19:42:03 +03:00
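
A conceptual C++ analogue of the slicing described above (not the Go API; just the idea that a sub-range view over device memory needs no allocation or copy):

```cuda
#include <cstddef>

// Non-owning view over device-resident data that can be re-sliced
// without copying anything back to the host.
template <typename T>
struct DeviceSpan {
  T* d_ptr;       // device pointer
  size_t len;     // number of elements
  int device_id;  // which GPU the data lives on

  // Sub-range view over data already on the device (assumes
  // offset + count <= len); no allocation, no copy.
  DeviceSpan sub(size_t offset, size_t count) const {
    return DeviceSpan{d_ptr + offset, count, device_id};
  }
};
```
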
Leon Hibnik
cd3769d6b7 Fix Golang TestNttDeviceAsync (#461)
## Describe the changes

This PR fixes TestNttDeviceAsync by adding a missing call to initDomain

## Linked Issues

Resolves #
2024-04-08 17:47:10 +03:00
DmytroTym
b93b1d0aaf NTT inplace in Rust (#453)
## Describe the changes

Due to Rust's ownership rules, we can't run NTT inplace using the
[`ntt`](https://github.com/ingonyama-zk/icicle/blob/v1.9.1/wrappers/rust/icicle-core/src/ntt/mod.rs#L139)
function, which is why the need for a separate function came up a couple
of times.

Incidentally, an issue with the radix-2 NTT was found when it was run inplace:
`__syncthreads()` was used in the reverse-order kernel as if it were a global
barrier for all blocks and not a block-local one. The resulting data race
is fixed by this PR.
2024-04-08 10:04:04 +03:00
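
A schematic of the race described above, showing both the buggy pattern and a safe one (toy kernels for illustration, not the actual icicle code):

```cuda
// BUGGY pattern: __syncthreads() only synchronizes threads *within one
// block*. When pairs (i, rev(i)) cross block boundaries, the read below
// can land before or after the partner block's write -- a data race.
__global__ void reverse_order_buggy(int* data, int log_n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  int r = __brev(i) >> (32 - log_n); // bit-reversed index for n = 2^log_n
  int tmp = (i < (1 << log_n)) ? data[r] : 0;
  __syncthreads(); // block-local barrier, NOT a grid-wide one
  if (i < (1 << log_n)) data[i] = tmp;
}

// Race-free pattern: exactly one thread owns each pair and performs the
// whole swap itself, so no cross-block ordering is required.
__global__ void reverse_order_fixed(int* data, int log_n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= (1 << log_n)) return;
  int r = __brev(i) >> (32 - log_n);
  if (i < r) { int t = data[i]; data[i] = data[r]; data[r] = t; }
}
```
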
Leon Hibnik
6a96eef8dc add golang multigpu to sidebar (#449)
This PR adds multi GPU golang documentation to dev sidebar
2024-04-08 09:20:29 +03:00
Ido Atlas
828fc9c006 double round verified for 1 poly 2024-04-07 18:42:17 +03:00
Ido Atlas
53395312e3 add reference for double round 2024-04-07 14:42:01 +03:00
JimmyHongjichuan
95ab6de059 fix: use the log2 in lib std explicitly to prevent makefile from linking other log2 func (#459)

## Describe the changes

This PR adds "std" as prefix on log2 function of
icicle/appUtils/msm/msm.cu to explicitly use std::log2 for MSM
calculatation.

## Linked Issues

https://github.com/ingonyama-zk/icicle/issues/458

Resolves #

Co-authored-by: pierre <pierreuu@gmail.com>
2024-04-07 07:58:53 +03:00
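
For context, the ambiguity being fixed looks roughly like this (a generic illustration, not the exact icicle code):

```cuda
#include <cmath>

// Depending on headers and link order, an unqualified log2(x) can resolve
// to the C library's double log2(double) or to another log2 symbol pulled
// in at link time. Qualifying the call pins it down explicitly:
unsigned msm_log_size(unsigned n) {
  return static_cast<unsigned>(std::log2(static_cast<double>(n)));
}
```
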
Yuval Shekel
9c9311bee0 golang multi-device MSM test temporarily disabled due to issues related to golang tests env 2024-04-04 23:23:18 +03:00
Yuval Shekel
406020bda6 fix: NTT release domain linkage 2024-04-04 23:23:18 +03:00
Ido Atlas
709e6a12e5 really verify 2 and 4 2024-04-03 13:27:55 +03:00
Ido Atlas
0e067b296f adding templated version 2024-04-03 13:25:17 +03:00
Ido Atlas
b9409d2109 generalized reference 2024-04-03 12:57:44 +03:00
Ido Atlas
19fe6f6d5d generalized verified for 1 and 3 polys 2024-04-03 12:46:35 +03:00
Ido Atlas
be4b636494 generalized 2024-04-02 13:31:20 +03:00
Ido Atlas
9a63f17e1b ALL VERIFIED 2024-04-01 22:37:33 +03:00
Ido Atlas
425dd2c38d fix alg1unified with new kernel 2024-04-01 22:01:02 +03:00
Ido Atlas
0c6fe543fa new test vector 2024-04-01 20:05:26 +03:00
Ido Atlas
c5ed01b52c alg3 simple works and verified 2024-04-01 16:32:43 +03:00
Ido Atlas
57ac6a13dc inplace sum reduction 2024-03-31 14:29:22 +03:00
Ido Atlas
7833324d9c sumcheck 3 - first round passes 2024-03-31 12:24:04 +03:00
Ido Atlas
2111c3a91a order change, alg1 unified works without bugs 2024-03-27 21:49:27 +02:00
release-bot
25ac705c3b Bump rust crates' version
icicle-bls12-377@1.9.1
icicle-bls12-381@1.9.1
icicle-bn254@1.9.1
icicle-bw6-761@1.9.1
icicle-core@1.9.1
icicle-cuda-runtime@1.9.1
icicle-grumpkin@1.9.1

Generated by cargo-workspaces
2024-03-27 19:00:07 +00:00
VitaliiH
ef757e8210 hotfix for large ecntt (#448)
hotfix for large ECNTTs
2024-03-27 18:31:50 +01:00
Ido Atlas
009a17af11 before rewriting 2024-03-27 17:23:38 +02:00
Otsar
2c1431d904 Update Hall of fame in 'README.md' (#445)
Added v1.8's contributors to hall of fame
2024-03-27 16:57:41 +02:00
Yuval Shekel
4b50103151 enable CI for V2 branch 2024-03-27 12:26:45 +02:00
Ido Atlas
e24f74c97a adding test vecs 2024-03-26 15:59:38 +02:00
ImmanuelSegol
77ebc4848e Docs 1.8 (#436) 2024-03-25 08:54:17 -04:00
Ido Atlas
fa5064784f update kernels for alg3 2024-03-24 17:42:44 +02:00
Yuval Shekel
919ff42f49 fix: NTT input is const 2024-03-24 16:26:10 +02:00
Ido Atlas
f0b86b9c06 updated kernels for product. problem with n>16 for unified 2024-03-21 16:56:09 +02:00
Ido Atlas
79719fcb0f unified kernel works 2024-03-21 15:36:59 +02:00
release-bot
a1ff989740 Bump rust crates' version
icicle-bls12-377@1.9.0
icicle-bls12-381@1.9.0
icicle-bn254@1.9.0
icicle-bw6-761@1.9.0
icicle-core@1.9.0
icicle-cuda-runtime@1.9.0
icicle-grumpkin@1.9.0

Generated by cargo-workspaces
2024-03-21 07:11:47 +00:00
Otsar
1f2144a57c Removed "machines using ICICLE" static badge (#442) 2024-03-21 09:04:19 +02:00
Jeremy Felder
db4c07dcaf Golang bindings for ECNTT (#433) 2024-03-21 09:04:00 +02:00
ChickenLover
d4f39efea3 Add Keccak hash function (#435)
This PR adds support for Keccak-256 and Keccak-512. It only adds them in
C++. There is no way of adding Rust or golang wrappers right now, as it
requires having an `icicle-common` crate / mod
2024-03-20 22:30:19 +02:00
Yuval Shekel
7293058246 fix: (golang) MSM multi device test reset to original device after test is done 2024-03-20 16:27:11 +02:00
Yuval Shekel
03136f1074 fix: (golang) add missing NttAlgorithm field in NTTConfig 2024-03-20 16:27:11 +02:00
Yuval Shekel
3ef0d0c66e MSM scalars and points params are const
- This is required to be able to compute MSM on polynomial coefficients that are accessible by const only.
2024-03-20 16:27:11 +02:00
Stas
0dff1f9302 Use multi-threaded CUDA compilation to speed up compilation (#439)
## Describe the changes

Speed up CUDA C++ compile time using multi-threaded compilation
(--split-compile flag).
Tests on an 8-core machine show ~2x acceleration.

## Linked Issues

Compiling C++ takes a long time
2024-03-18 16:40:30 -04:00
ChickenLover
0d806d96ca tidy (#437) 2024-03-19 00:59:10 +07:00
hadaringonyama
792a2e5f97 updated verification env 2024-03-17 14:02:43 +02:00
release-bot
b6b5011a47 Bump rust crates' version
icicle-bls12-377@1.8.0
icicle-bls12-381@1.8.0
icicle-bn254@1.8.0
icicle-bw6-761@1.8.0
icicle-core@1.8.0
icicle-cuda-runtime@1.8.0
icicle-grumpkin@1.8.0

Generated by cargo-workspaces
2024-03-13 21:38:17 +00:00
DmytroTym
7ac463c3d9 MSM pre-computation (#427)
## Brief description

This PR adds pre-computation to the MSM; for some theory, see
[this](https://youtu.be/KAWlySN7Hm8?si=XeR-htjbnK_ySbUo&t=1734) timecode
of Niall Emmart's talk.
In terms of public APIs, one method is added. It does the
pre-computation on-device, leaving the resulting data on-device as well. No
extra structures are added; only `precompute_factor` from `MSMConfig` is
now activated.

## Performance

While performance gains are for now often limited by our inflexibility
in choice of `c` (for example, very large MSMs get basically no speedup
from pre-compute because currently `c` cannot be larger than 16),
there's still a number of MSM sizes which get noticeable improvement:

| Pre-computation factor | bn254 size `2^20` MSM, ms. | bn254 size `2^12` MSM, size `2^10` batch, ms. | bls12-381 size `2^20` MSM, ms. | bls12-381 size `2^12` MSM, size `2^10` batch, ms. |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| 1 | 14.1 | 82.8 | 25.5 | 136.7 |
| 2 | 11.8 | 76.6 | 20.3 | 123.8 |
| 4 | 10.9 | 73.8 | 18.1 | 117.8 |
| 8 | 10.6 | 73.7 | 17.2 | 116.0 |

Here for example pre-computation factor = 4 means that alongside each
original base point, we pre-compute and pass into the MSM 3 of its
"shifted" versions. Pre-computation factor = 1 means no pre-computation.
GPU used for benchmarks is a 3090Ti.

## TODOs and open questions

- Golang APIs are missing;
- I mentioned that to utilise pre-compute to its full potential we need
an arbitrary choice of `c`. One issue with this is that pre-compute will
become dependent on `c`. For now this is not the case, as `c` can only be
a power of 2 and powers of 2 can always share the same pre-computation.
So apparently we need to make `c` a parameter of the precompute function
to future-proof it against a breaking change. This is pretty unnatural and
counterintuitive, as `c` is typically chosen at runtime after pre-compute
is done, but I don't really see another way; please let me know if you do.
UPD: `c` is added into the pre-compute function; for now it's unused and
it's documented how it will change in the future.

Resolves https://github.com/ingonyama-zk/icicle/issues/147
Co-authored with @ChickenLover

---------

Co-authored-by: ChickenLover <romangg81@gmail.com>
Co-authored-by: nonam3e <timur@ingonyama.com>
Co-authored-by: nonam3e <71525212+nonam3e@users.noreply.github.com>
Co-authored-by: LeonHibnik <leon@ingonyama.com>
2024-03-13 23:25:16 +02:00
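
A host-side sketch of the "shifted versions" idea described above (illustrative only; `Point` and `dbl` are placeholder names, not icicle's API):

```cuda
#include <vector>

// For pre-computation factor k and window size c, alongside each base P
// we store P, 2^c * P, 2^(2c) * P, ..., 2^((k-1)c) * P, trading (k - 1)
// extra points per base for fewer bucket accumulations at MSM time.
template <typename Point>
std::vector<Point> precompute_bases(const std::vector<Point>& bases, int c,
                                    int k, Point (*dbl)(const Point&)) {
  std::vector<Point> out;
  out.reserve(bases.size() * k);
  for (const Point& p : bases) {
    Point shifted = p;
    for (int j = 0; j < k; j++) {
      out.push_back(shifted);      // holds 2^(j*c) * p at this point
      for (int d = 0; d < c; d++)  // advance by another factor of 2^c
        shifted = dbl(shifted);
    }
  }
  return out;
}
```

With factor 4 and c = 16 this stores, for each base, three extra points shifted by 2^16, 2^32 and 2^48 -- the "3 of its 'shifted' versions" mentioned above.
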
HadarIngonyama
287f53ff16 NTT columns batch (#424)
This PR adds the columns batch feature - enabling batch NTT computation
to be performed directly on the columns of a matrix without having to
transpose it beforehand, as requested in issue #264.

Also, some small fixes to the reordering kernels were added and some
unnecessary parameters were removed from function interfaces.

---------

Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
2024-03-13 18:46:47 +02:00
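
The indexing difference the columns batch implies, sketched generically (assumed row-major layout; not the actual kernel code):

```cuda
// Data of shape (n, batch_size) stored row-major. In columns-batch mode,
// NTT instance b reads its i-th element with stride batch_size, so no
// pre-transpose of the matrix is needed:
__device__ __forceinline__ int columns_batch_index(int i, int b, int batch_size) {
  return i * batch_size + b; // element i of column b
}

// In the usual rows mode, instance b owns a contiguous block instead:
__device__ __forceinline__ int rows_batch_index(int i, int b, int n) {
  return b * n + i; // element i of row b
}
```
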
Jeremy Felder
89082fb561 FEAT: MultiGPU for golang bindings (#417)
## Describe the changes

This PR adds multi gpu support in the golang bindings.

The main changes are to DeviceSlice, which now includes a `deviceId`
attribute specifying which device the underlying data resides on, and
checks for the correct deviceId and current device when DeviceSlices are
used in any operation.

In Go, most concurrency is done via goroutines (described as
lightweight threads - in reality, more of a threadpool manager);
however, there is no guarantee that a goroutine stays on a specific host
thread. Therefore, a function `RunOnDevice` was added to the
cuda_runtime package which locks a goroutine to a specific host
thread, sets the current GPU device, runs the provided function, and unlocks
the goroutine from the host thread after the provided function finishes.
While the goroutine is locked to the host thread, the Go runtime will
not assign other goroutines to that host thread
2024-03-13 16:19:45 +02:00
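
`RunOnDevice` is needed because CUDA's current-device setting is per host thread. A C++ analogue of the underlying pattern, as a sketch (not icicle's API):

```cuda
#include <cuda_runtime.h>
#include <functional>
#include <thread>

// Work targeting a given GPU must run on a host thread that has called
// cudaSetDevice for it; a dedicated std::thread plays the role that a
// locked goroutine plays in the Go bindings.
void run_on_device(int device_id, const std::function<void()>& fn) {
  std::thread t([&] {
    cudaSetDevice(device_id); // binds this host thread to the GPU
    fn();                     // all CUDA calls inside target device_id
  });
  t.join();
}
```
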
hadaringonyama
f6fee8e201 sumcheck verified 2024-03-13 10:33:38 +02:00
hadaringonyama
f5492a97fb bug fix 2024-03-12 16:28:31 +02:00
hadaringonyama
2ffda0ffac accumulate function verified 2024-03-12 15:50:30 +02:00
hhh_QC
08ec0b1ff6 update go install source in Dockerfile (#428) 2024-03-10 10:47:08 +02:00
Jeremy Felder
fa219d9c95 Fix release flow with deploy key and caching (#425)
## Describe the changes

This PR fixes the release flow action
2024-03-10 08:57:35 +02:00
DmytroTym
0e84fb4b76 feat: add warmup for CudaStream (#422)
## Describe the changes

Add a non-blocking `warmup` function to `CudaStream` 

> when you run the benchmark (e.g. the msm example you have) the first
instance is always slow, with a constant overhead of 200~300ms of cuda
stream warmup, and I want to get rid of that in my application by
warming it up in parallel while my host does something else.
2024-03-07 19:11:34 +02:00
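
What a non-blocking stream warmup amounts to, sketched under assumptions (the actual `warmup` binding may differ):

```cuda
#include <cuda_runtime.h>

__global__ void noop_kernel() {}

// The first launch on a fresh context pays a one-time initialization cost
// (the ~200-300ms mentioned above). Issuing a trivial launch on a side
// stream at startup, without synchronizing, hides that cost from the
// first real call while the host does other work.
void warmup(cudaStream_t stream) {
  noop_kernel<<<1, 1, 0, stream>>>();
  // intentionally no cudaStreamSynchronize here: warmup stays non-blocking
}
```
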
Alex Xiong
d8059a2a4e Merge pull request #1 from ingonyama-zk/feat/warmup
Warmup function added
2024-03-07 18:18:18 +08:00
DmytroTym
4a65758408 Merge branch 'main' into feat/warmup 2024-03-06 22:08:45 +02:00
Jeremy Felder
1abd2ef9c9 Bump rust crates' version
icicle-bls12-377@1.7.0
icicle-bls12-381@1.7.0
icicle-bn254@1.7.0
icicle-bw6-761@1.7.0
icicle-core@1.7.0
icicle-cuda-runtime@1.7.0
icicle-grumpkin@1.7.0

Generated by cargo-workspaces
2024-03-06 22:05:10 +02:00
Jeremy Felder
9d402df0cf Release flow CI (#423)
## Describe the changes

This PR:
- Moves common crate attributes to the workspace Cargo.toml. 
- Adds a manual release flow for bumping, tagging, and draft release
2024-03-06 21:41:48 +02:00
DmytroTym
7185657ff7 Warmup function 2024-03-06 18:13:23 +02:00
Alex Xiong
b22aa02e91 fix: cargo fmt 2024-03-06 13:10:12 +00:00
ImmanuelSegol
af6bfc9ab0 golang docs (#413)
## Describe the changes

This PR...

## Linked Issues

Resolves #

---------

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
2024-03-06 08:59:32 -04:00
Alex Xiong
b108c71bdd feat: add rust api for cudaFreeAsync 2024-03-06 12:44:43 +00:00
ChickenLover
9fc083916d Small features (#415)
This PR is a compilation of small improvements

 - Lock bindgen version for `icicle-cuda-runtime`
- Add an error message when trying to build on Mac (or any non
windows/linux machine)
 - Add documentation and template files for adding new curve
 - Add documentation on _params.cuh contents
- Add the script to bump all the rust crates versions to the same
version

Resolves #313
2024-03-06 13:48:34 +02:00
Jeremy Felder
87ccd62976 Fix go setup in CI (#420)
## Describe the changes

This PR adds the use of setup-go in the CI to ensure that `go` is
installed properly and caches dependencies and build outputs by default
2024-03-06 12:31:24 +02:00
Jeremy Felder
d8f2313a01 Fix the rust changed files glob pattern (#419)
## Describe the changes

This PR fixes the glob pattern of changed files for rust
2024-03-05 12:27:08 +02:00
ChickenLover
a2ae7a9e2f minor changes to cuda (#414)
## Describe the changes

 - Fix include statements to use absolute path
 - Remove stale comments and code parts
 - Fix test_kernels.cu bug
2024-03-04 16:52:03 +07:00
ImmanuelSegol
d98b851d62 fix-primitives-links (#411) 2024-03-03 13:42:28 +07:00
ImmanuelSegol
4d19ca0b98 Vec ops docs (#410)
* fix examples

* revert

* refactor

* refactor

* refactor

* refactor

* refactor
2024-02-29 13:26:17 -05:00
Otsar
2e20be56f7 Added badges (#409)
* Update overview.md

Changed and updated static badges

* Update README.md

Added badge to readme
2024-02-29 15:26:12 +02:00
ImmanuelSegol
afa61c64f4 fix docs base url (#408)
refactor
2024-02-29 14:18:53 +02:00
Jeremy Felder
7934e15768 Release v1.6.0 (#406)
Release v1.6.0

- Add vector operations to golang bindings #399 
- Add Pederson commitment example in c++ #397
- Update CI for faster/better flow #398 
- Fix dev docs CI #405
2024-02-29 08:27:10 +02:00
ImmanuelSegol
76939f34e0 fix docs deploy github action (#405) 2024-02-28 18:09:04 +02:00
Jeremy Felder
1c1b2bab64 CI: move to language specific flows (#398)
Updates the CI to:
- run per supported language
- conditional run logic
- pipelined jobs for failing fast
- additional parallelization
- run golang build on windows
- reuse the check-changed-files workflow
2024-02-28 18:09:03 +02:00
Stas
d90081926f Pedersen commitment example in c++ (#397)
* initial commit

* random elliptic points

* initial complete example

* public random seed to prevent knowing dlogs

* cleaned up code

* add README

* Update examples/c++/pedersen-commitment/README.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* updates to PR comments

* codespell compliance

* corrected terminology in README

---------

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2024-02-28 18:09:03 +02:00
Jeremy Felder
656dd18cf8 Add vector operations for golang bindings (#399) 2024-02-28 18:09:03 +02:00
Jeremy Felder
40309329fb Migrate docs website + improved docs (#389) (#403)
migrate docs website + improved docs (#389)

* Update README.md (#385)

* refactor

* refactor

* refactor

* rename task

* update codespell

* multi gpu docs (#391)

* Refactor

* refactor

* fix typo

* Apply suggestions from code review



* refactor

* refactor

---------

Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
Co-authored-by: ChickenLover <Romangg81@gmail.com>
2024-02-28 14:40:04 +02:00
hadaringonyama
2af9e44219 sumcheck design 2024-02-25 11:18:07 +02:00
Jeremy Felder
e6035698b5 Release v1.5.0 (#393)
# Contents of this release

Examples: multi-gpu example #381
Examples: updates example compares Radix2 and MixedRadix NTTs #383
Feat: add vector operations bindings to Rust #384 
Examples: update examples with new vec ops #388 
Feat: Grumpkin curve implementation #379 
Feat: mixed-radix NTT fast twiddles mode #382 
Docs: Update README.md #385 #387 
README: Update Hall of Fame section #394 
Examples: add rust poseidon example #392 
Feat: GoLang bindings for v1.x #386
2024-02-23 10:15:18 +02:00
Jeremy Felder
e8cd2d7a98 GoLang bindings for v1.x (#386) 2024-02-22 20:52:48 +02:00
ChickenLover
efda4de48f add rust poseidon example (#392)
add rust poseidon example
2024-02-22 19:47:40 +02:00
ChickenLover
402c9dfb53 Update Hall of Fame section (#394)
Add nonam3e as contributor
2024-02-22 19:39:58 +02:00
ChickenLover
9a6ab924c2 Update README.md (#385) (#387) 2024-02-22 11:39:15 +07:00
yshekel
275b2f4958 feature: mixed-radix NTT fast twiddles mode (#382)
- this mode allocates an additional 4N twiddle factors to achieve faster computation
- enabled by a flag to initDomain(); defaults to false.

Co-authored-by: hadaringonyama <hadar@ingonyama.com>
2024-02-22 00:02:02 +02:00
nonam3e
4b221e9665 Grumpkin curve implementation (#379) 2024-02-21 23:20:28 +07:00
ChickenLover
965bf757f9 update examples with new vec ops (#388) 2024-02-21 22:30:40 +07:00
ChickenLover
f9755980f0 add vector operations bindings to Rust (#384)
* add vector operations bindings to Rust
2024-02-21 21:17:10 +07:00
Stas
bb62e716b4 Temp/stas/muli gpu example (#381)
## Describe the changes

This PR adds Multi-GPU Poseidon example

## Linked Issues

Some minor on-device memory issues require attention from devs, please
help
2024-02-20 19:45:44 -06:00
stas
c046fd17c6 removed my comments from poseidon.cuh 2024-02-20 20:43:59 -05:00
stas
82d1ff4769 fixed spelling in poseidon.cuh 2024-02-20 20:40:45 -05:00
Stas
d1f19af64d Merge branch 'dev' into temp/stas/muli-gpu-example 2024-02-20 19:07:48 -06:00
stas
b1af193f6f fixed spelling 2024-02-20 19:14:25 -05:00
Stas
49c7fb0db1 updates example compares Radix2 and MixedRadix NTTs (#383)
## Describe the changes

Update to cover new NTT algorithms
2024-02-20 18:05:39 -06:00
stas
4664cfded5 complied with reviewer's comments 2024-02-19 15:59:49 -05:00
ChickenLover
fc6badcb35 Update README.md (#385) 2024-02-19 18:54:10 +07:00
stas
fb9e5c8511 updates example compares Radix2 and MixedRadix NTTs 2024-02-18 18:40:01 -05:00
Stas Polonsky
518a3ad9b6 ready for PR 2024-02-17 00:18:21 +00:00
Stas Polonsky
6681be549a fixed on-device memory issue 2024-02-16 19:43:58 +00:00
Stas Polonsky
319358427f cudaSetDevice in the thread function 2024-02-16 16:35:04 +00:00
Stas Polonsky
8dd52306dc update README 2024-02-15 23:07:33 +00:00
Stas Polonsky
418c3d844b completed example 2024-02-15 22:10:15 +00:00
DmytroTym
15a63cc549 Release v1.4.0 (#378)
## Contents of this release

[FEAT]: support for multi-device execution:
https://github.com/ingonyama-zk/icicle/pull/356
[FEAT]: full support for new mixed-radix NTT:
https://github.com/ingonyama-zk/icicle/pull/367,
https://github.com/ingonyama-zk/icicle/pull/368 and
https://github.com/ingonyama-zk/icicle/pull/371
[FEAT]: examples for Poseidon hash and tree builder based on it
(currently only on C++ side):
https://github.com/ingonyama-zk/icicle/pull/375
[PERF]: MSM performance upgrades & zero point handling:
https://github.com/ingonyama-zk/icicle/pull/372
2024-02-15 22:32:56 +02:00
ImmanuelSegol
275eaa9904 bump version 2024-02-15 19:36:18 +00:00
DmytroTym
a91397e2c1 MSM improvements (#372)
* Improved MSM

* Zero point handling in large buckets

* Fixed affine zero point conversion for arkworks

* cargo fmt

* Addressed comments

* MSM comments

* All zero scalars case handled

* clang format
2024-02-15 20:02:10 +02:00
ImmanuelSegol
29675bb40d executes without errors 2024-02-15 16:45:33 +00:00
ChickenLover
3b48af55d7 fix declarator (#376)
Co-authored-by: Leon Hibnik <107353745+LeonHibnik@users.noreply.github.com>
2024-02-15 22:46:57 +07:00
ImmanuelSegol
481f144dc8 debug 2024-02-15 15:11:20 +00:00
ImmanuelSegol
086d36dd42 Merge remote-tracking branch 'origin/dev' into temp/stas/muli-gpu-example 2024-02-15 15:03:41 +00:00
DmytroTym
6854dbf06a Fix conflicts main (#377)
This PR fixes the conflicts between main and dev
2024-02-15 15:37:15 +02:00
ChickenLover
0929161a26 Merge branch 'dev' into fix-conflicts-main 2024-02-15 16:10:09 +07:00
ChickenLover
4f471ffa2a remove rust toolchains 2024-02-15 16:08:22 +07:00
DmytroTym
27e85d400a Poseidon examples C++ only (#375)
https://github.com/ingonyama-zk/icicle/pull/365 but without Rust, which
for now doesn't work.
2024-02-15 10:46:09 +02:00
Yuval Shekel
ba6c3ae59c merge NTT part 2024-02-15 10:28:02 +02:00
ChickenLover
fd08925ed4 merge WIP 2024-02-15 14:57:09 +07:00
ChickenLover
66018f2367 rename example folder 2024-02-15 14:13:18 +07:00
stas
62cf733c5f answers Roman's comments 2024-02-15 14:07:48 +07:00
DmytroTym
76c3b4ba01 Merge branch 'dev' into poseidon-examples-no-cuda 2024-02-15 08:27:28 +02:00
ImmanuelSegol
4d75fbac93 issue with init_optimized_poseidon_constants 2024-02-14 22:28:17 +00:00
VitaliiH
774250926c multi card support (#356)
multi-GPU support
2024-02-14 22:29:30 +01:00
DmytroTym
2008259adc Merge branch 'dev' into poseidon-examples-no-cuda 2024-02-14 21:10:25 +02:00
DmytroTym
303a3b8770 Temporarily removed Rust Poseidon example 2024-02-14 21:06:10 +02:00
yshekel
0d70a0c003 fix: verify NTT size is a power of two (#374) 2024-02-14 17:54:31 +02:00
yshekel
c9e1d96b65 fix: wrong type in comment (#373)
NTTConfig -> PoseidonConfig
2024-02-14 15:51:25 +02:00
yshekel
a02459c64d Mixed-radix NTT support all orderings (#371)
- Mixed-radix NTT orderings support
- radix-2 small refactor: split core logic into a function and renamed ct_butterfly to dit
- testing both radix2 and mixed-radix algs for all ntt tests
2024-02-13 15:49:24 +02:00
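
For reference, the `dit` (decimation-in-time) butterfly named in that rename, in its standard Cooley-Tukey form (a generic sketch, not the icicle kernel):

```cuda
// Standard DIT butterfly on inputs a, b with twiddle w:
//   a' = a + w*b
//   b' = a - w*b
// (ct_butterfly referred to the same Cooley-Tukey step.)
template <typename F>
__device__ void dit_butterfly(F& a, F& b, const F& w) {
  F t = w * b;
  b = a - t;
  a = a + t;
}
```
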
yshekel
ae060313db Mixed radix NTT coset support (#368) 2024-02-12 18:30:09 +02:00
yshekel
e16ce1026d Mixed-radix NTT batch support (#367)
Co-authored-by: hadaringonyama <hadar@ingonyama.com>
2024-02-12 14:50:22 +02:00
Leon Hibnik
d84ffd2679 Release/1.3.0 (#370)
Release 1.3.0:
* [FEAT] Mixed-radix NTT design
* [FEAT] example paths changed
2024-02-09 13:53:27 +02:00
ChickenLover
a65f44ad31 fix versioning problems 2024-02-09 15:55:36 +07:00
ChickenLover
8c1750ea97 Feat/roman/display functions (#366)
* fix display and debug traits

* leave only one impl for printing scalars
2024-02-09 14:40:07 +07:00
stas
582107fc7c added c++ example Poseidon-hash 2024-02-08 17:36:56 -05:00
Jeremy Felder
b5923a1791 Add concurrency group to examples workflow (#361) 2024-02-08 20:43:12 +00:00
Jeremy Felder
18fdd059da Fix: examples path deps (#363)
Change rust example deps to use paths

Co-authored-by: Leon Hibnik <107353745+LeonHibnik@users.noreply.github.com>
2024-02-08 20:43:12 +00:00
yshekel
382bec4ad3 Mixed-radix NTT algorithm
Co-authored-by: hadaringonyama <hadar@ingonyama.com>
2024-02-08 20:43:12 +00:00
Jeremy Felder
d367a8c1e0 Bump for release 2024-02-08 20:43:12 +00:00
Jeremy Felder
3cbdfe7f07 Add concurrency group to examples workflow (#361) 2024-02-08 16:35:48 +02:00
Jeremy Felder
e77173f266 Fix: examples path deps (#363)
Change rust example deps to use paths

Co-authored-by: Leon Hibnik <107353745+LeonHibnik@users.noreply.github.com>
2024-02-08 16:23:44 +02:00
yshekel
3582df2669 Mixed-radix NTT algorithm
Co-authored-by: hadaringonyama <hadar@ingonyama.com>
2024-02-08 13:52:00 +02:00
ImmanuelSegol
04b1b3dda5 refactor: add a basic example 2024-02-07 20:07:21 -04:00
Jeremy Felder
b6dded89cd Release v1.2.0 (#364)
Release v1.2.0:

- [FEAT] Add Poseidon hash as primitive
- [FEAT] Add Merkle tree using poseidon hash
- [BUG] Fix NTT overflow when using large cosets
2024-02-07 17:01:09 +02:00
Jeremy Felder
5a138367f8 (chore): bump rust crate versions (#362)
bump rust crate versions
2024-02-07 14:49:54 +02:00
ChickenLover
a3fc01d88d Implement Poseidon and TreeBuilder (#352)
* BW scalar field is now the same as BLS base field

* add poseidon

* add merkle tree builder

* poseidon rust bindings

* implement rust bindings

* add doc comments

* remove global poseidon constants

* add custom constants API and script for generating new constants

* add the rest of the curves for poseidon

* add all the curves for real

* misname bls12-377

* typo

* partial rounds

* minor fixes

* small tweak for big performance boost

* add CHK_INIT_IF_RETURN

---------

Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
2024-02-07 14:49:54 +02:00
DmytroTym
d84cab79fd Changed long to int64 2024-02-07 14:49:54 +02:00
DmytroTym
46d7b88f6e Fixed overflow in large coset NTTs 2024-02-07 14:49:54 +02:00
Jeremy Felder
b20ef93c2d Merge branch 'main' into dev 2024-02-07 14:34:59 +02:00
Jeremy Felder
6b1b735576 (chore): bump rust crate versions (#362)
bump rust crate versions
2024-02-07 14:29:21 +02:00
ChickenLover
b2eecd02af Implement Poseidon and TreeBuilder (#352)
* BW scalar field is now the same as BLS base field

* add poseidon

* add merkle tree builder

* poseidon rust bindings

* implement rust bindings

* add doc comments

* remove global poseidon constants

* add custom constants API and script for generating new constants

* add the rest of the curves for poseidon

* add all the curves for real

* misname bls12-377

* typo

* partial rounds

* minor fixes

* small tweak for big performance boost

* add CHK_INIT_IF_RETURN

---------

Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
2024-02-07 00:31:49 +07:00
DmytroTym
b13d993f5d Fixed overflow in large coset NTTs (#358)
If the domain has size 2^17, NTT on a size-2^16 coset generated by `-1` fails. This happens due to an index in `BatchMulKernel` overflowing; fixed by using `long` instead of `int`.
2024-02-05 18:53:26 +02:00
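
The overflow mechanism in a toy kernel (the real `BatchMulKernel` indexing is not shown in this compare, so the shape below is an assumption):

```cuda
#include <cstdint>

__global__ void batch_mul_toy(const int* in, int* out, int64_t n, int step) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  // BUGGY: tid and step are 32-bit, so tid * step is computed in 32-bit
  // arithmetic and wraps past 2^31 - 1 before it is ever widened:
  //   int64_t idx = tid * step;       // overflow already happened
  // FIX (what the long/int64 change amounts to): widen an operand first.
  int64_t idx = int64_t(tid) * step;   // full-width product
  if (idx < n) out[idx] = in[idx];
}
```
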
DmytroTym
4f6b4f7dcf Merge branch 'dev' into coset_overflow_fix 2024-02-05 16:44:06 +02:00
Jeremy Felder
19721cfab6 Add missed bump to dev (#359)
Add missed bump to dev
2024-02-05 15:44:37 +02:00
DmytroTym
3c068ae4e7 Changed long to int64 2024-02-05 14:17:00 +02:00
Jeremy Felder
bfd510b3bb Bump for release 2024-02-05 13:33:00 +02:00
DmytroTym
d2b9ec1908 Fixed overflow in large coset NTTs 2024-02-05 13:14:03 +02:00
Jeremy Felder
77a7613aa2 Release v1.1.0 (#357)
Release v1.1.0:
- Updated examples to use the new API
- [c++] Curve specific functions using macros
- [c++] Consolidate MSM and Batch MSM to single function
- [CI] Add codespell in CI
- [FIX] Windows rust build
- [FIX] G2 on rust bindings
- [FIX] Bw6 using bls12377
2024-01-31 20:30:44 +02:00
Jeremy Felder
5a96f9937d Bump for release 2024-01-31 16:34:14 +02:00
Stas
aaa3808c81 Update examples for new api (#355)
## Describe the changes

Make sure the examples comply with new API
2024-01-31 08:59:21 +02:00
stas
759b7b26d6 Update examples to use latest API 2024-01-31 08:59:21 +02:00
Leon Hibnik
1874ade68a update readme links (#346)
Update README.md
2024-01-24 12:26:38 +02:00
DmytroTym
96fe5bf283 G2 fix and BW6 scalar field on the Rust side (#341)
* BW scalar field is now the same as BLS base field in Rust

* G2 fixed and added into Rust
2024-01-24 11:51:22 +02:00
Jeremy Felder
f0a0bb5974 Fix/windows build failing (#345)
- Make the curve config's omegas_count conditionally accessed when creating fields
- Remove the extern C function that returns a UDT containing non-POD types and replace it with a default_config function on the Rust bindings side
2024-01-23 10:51:16 +02:00
Jeremy Felder
69af0bef91 [FEAT]: Add codespell to CI and pre-commit hooks (#344)
Add codespell to pre-commit hook/CI and fix typos
2024-01-22 14:27:52 +02:00
Stas
45f6db666b c++ msm uses new API (#338)
## Describe the changes

This PR allows the c++ MSM example to compile with the new API

## Linked Issues

Resolves #
2024-01-18 11:31:30 -05:00
Stas
4c235bf5f5 Merge branch 'dev' into examples-cpp-msm-new-API 2024-01-18 11:29:54 -05:00
stas
a0f35ea8cd comply with reviewer's comments 2024-01-18 11:13:12 -05:00
yshekel
56fcd2e156 refactor: consolidate msm and batch-msm implementations to one function (#342)
refactor: consolidate msm and batch-msm implementations to one function
- now batch-msm supports parallel BM accumulation in addition to large triangle accumulation
2024-01-18 14:20:34 +02:00
Otsar
c1a32a9879 Update README.md (#339)
Added badge
2024-01-11 18:34:32 +02:00
yshekel
67586e01f8 refactor: generate curve-specific function names with macro (#337)
For exposed functions that are curve-specific, the expected symbol is
based on the curve name.
For example MSMCuda becomes bn254MSMCuda for CURVE=bn254.

Currently it is implemented via objcopy, by redefining symbols after
compilation.

This PR modifies the function names at preprocessing time instead.
For shared objects, objcopy doesn't work (since it cannot modify the
dynsym section).
2024-01-11 15:04:40 +02:00
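
The preprocessing-time renaming can be done with standard two-level token pasting (the macro names below are assumptions, not necessarily those used in the PR):

```cuda
// Two levels of indirection so that CURVE expands before ## is applied:
#define CONCAT_EXPAND(a, b) a##b
#define CONCAT(a, b) CONCAT_EXPAND(a, b)
#define CURVE_API(fn) CONCAT(CURVE, fn)

// With -DCURVE=bn254, this declaration is emitted as bn254MSMCuda, with
// no post-compilation objcopy pass -- which also makes the scheme work
// for shared objects, where objcopy cannot rewrite the dynsym section.
extern "C" void CURVE_API(MSMCuda)(/* ... */);
```
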
Yuval Shekel
9823f87fe2 fix: enable ci for dev branch 2024-01-11 13:49:41 +02:00
Yuval Shekel
5fda751e93 refactor: generate curve-specific function names with macro instead of using objcopy to modify the symbols 2024-01-11 10:54:31 +02:00
stas
0c58fe2b83 c++ msm uses new API 2024-01-09 11:06:10 -05:00
Leon Hibnik
b184befc99 Update readme v1.0.0 (#335)
* Update readme v1.0.0

* update

* update example readme
2024-01-08 23:48:24 +02:00
DmytroTym
91471fbbc6 New API for Version 1.0.0 (#326) 2024-01-08 20:42:16 +02:00
Jeremy Felder
4cc30474d4 Added conditional PIC for compilation 2024-01-08 18:22:50 +02:00
Jeremy Felder
ad932c6b4a Merge branch 'main' into dev 2024-01-08 17:44:03 +02:00
Jeremy Felder
392fff4e8f fix: remove examples directory from clang formatting 2024-01-08 17:40:40 +02:00
DmytroTym
dc3518c0de Smart pointers and documentation (#333)
* Safer smart pointer that covers host and device

* Fixed MSM test

* Scalars and points in MSM are non-mutable in all cases

* change mont API (#332)

* Some Rust doc comments

---------

Co-authored-by: ChickenLover <Romangg81@gmail.com>
2024-01-08 17:35:27 +02:00
Jeremy Felder
79b86e0edf Added examples for rust and c++ 2024-01-08 17:31:46 +02:00
Jeremy Felder
709009a96a Formatting, fixes tests, and general cleanup 2024-01-08 17:31:46 +02:00
Jeremy Felder
b92bc707d1 Fix main build badge to be on push and not pr 2024-01-08 17:28:11 +02:00
Jeremy Felder
a87f2251da Moved ingo_<curve> output to same directory as original library output allowing rust build script to find it when crate is used as a dependency 2024-01-08 17:28:03 +02:00
ImmanuelSegol
31108ee31d Update READMEs and docs (#303)
---------

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2024-01-08 17:22:36 +02:00
DmytroTym
5f7b36d427 Rust NTT updates (#305)
* NTT and MSM bugs fixed and functionality extended

* More Rust tests, refactored Rust wrappers

* Reworked MSM and NTT public functions on the Rust side

---------

Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
Co-authored-by: BigSky77 <simonjudd2@gmail.com>
Co-authored-by: yanziseeker <153156292+AdventureSeeker987@users.noreply.github.com>
Co-authored-by: BigSky <77446076+bigsky77@users.noreply.github.com>
2024-01-08 17:20:39 +02:00
ChickenLover
41affcdadf Add Montgomery conversions API (#321) 2024-01-08 17:19:46 +02:00
VitaliiH
a22c046410 Adds proper error handling for GPU errors and c++ icicle errors 2024-01-08 17:19:45 +02:00
ChickenLover
0eb8560c63 Feat/roman/msm ntt generics (#299)
Add curves to Rust, use generics for NTT, MSM and tests
2024-01-08 17:19:02 +02:00
Jeremy Felder
605c25f9d2 Fix main build badge to be on push and not pr (#325)
## Describe the changes

This PR fixes the build badge to show the build status of the `main`
branch instead of the status of a PR branch against `main`
2024-01-01 18:45:14 +02:00
Jeremy Felder
5ccf7ff378 Fix main build badge to be on push and not pr 2023-12-31 12:34:17 +02:00
Jeremy Felder
4beda3a900 Fix curve crates not building when used as dependency (#320)
## Describe the changes

This PR moves `ingo_<curve>` archive output to the same directory as the
original library output allowing rust build scripts to find it when a
curve crate is used as a dependency
2023-12-25 14:19:07 +02:00
Jeremy Felder
da122e7ec9 Moved ingo_<curve> output to same directory as original library output allowing rust build script to find it when crate is used as a dependency 2023-12-25 10:11:50 +02:00
Jeremy Felder
f9e46d158c Fix typos (#319)
## Describe the changes
Fix some typos in comments
2023-12-24 09:25:07 +02:00
BigSky
65d4f2ba2e Add Nix Shell Environment Configuration for CUDA (#318)
This PR introduces a Nix Shell configuration for easing the development
of the ICICLE Core using Nix or NixOS.

Changes:

1. Added instructions on how to use Nix Shell to create a development
environment with all required dependencies and environmental variables.
2. Created `cuda-shell.nix` to configure Nix Shell with necessary CUDA
related packages and environmental variables. Specifically, setting the
`PATH`, `LD_LIBRARY_PATH`, `CUDA_PATH`, `CPATH`, `LIBRARY_PATH`, and
`CMAKE_CUDA_COMPILER` associated with the CUDA toolkit.

This configuration allows us to build and run ICICLE Core more
efficiently in a Nix environment.
2023-12-22 10:30:36 -07:00
yanziseeker
263b5d95b5 Fix typos 2023-12-22 06:30:28 +00:00
BigSky77
64c8c89424 Update Nix Shell, add Nix instructions 2023-12-19 12:23:04 -07:00
ImmanuelSegol
dcaa0b4513 update readme (#303)
* refactor

* refactor

* refactor: add rust docs

* Update README.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* Update README.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* refactor

* Update icicle/README.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* Update wrappers/rust/README.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* Update README.md

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>

* refactor

* remove

---------

Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
2023-12-19 09:17:30 -04:00
Jeremy Felder
d5dd16dd9b Fix the need for a dummy bindings.rs file included in the repo (#296) 2023-12-13 12:45:00 +02:00
Jeremy Felder
644aa93d3f New API (#293)
This PR revamps the API and implements the rust bindings for the new API.
2023-12-12 22:30:53 +02:00
Jeremy Felder
73e39c60f6 Fix bw6 test in CI 2023-12-12 21:33:23 +02:00
Jeremy Felder
6d992a9909 Fix CI format workflow 2023-12-12 21:12:20 +02:00
Jeremy Felder
0df6262961 Update CI and prepush hooks 2023-12-12 20:49:57 +02:00
DmytroTym
401ec5afac Merge in 'main' and fmt 2023-12-12 20:08:38 +02:00
DmytroTym
629d8cb309 Update Rust APIs (#292)
* NTT reworked on CUDA side, Rust API updated

* Updates to the Rust-side CUDA runtime wrapper

* Merged in main
2023-12-12 19:31:36 +02:00
DmytroTym
dfa5b10adb Update Rust apis (#262)
* fix memory error in single_stage_multi_reduction_kernel (#235)

* refactor

* refactor

* revert

* refactor: clang format

* Update icicle/appUtils/msm/msm.cu

* Added separate device context struct, returned lde

* wip - msm and eq

* added lde to cmake

* Montgomery param added in lde.cu mul function

* fixed on_device for ntt and lde

* CamelCase

* fixed msm_test, int unification, google guide

* wip - ntt crash debugging

* async MSM with a rust wrapper

* wip ntt tests with correctness

* hotfix for correctness > 2^9

* wip on device inout mixing with correctness

* cleanup

* preserving twiddles after first call

* fixed twiddles preserving

* formatting

* removed some printing

* disable ecntt temporarily

* format

* rust fmt

* exclude target from format

* passing ntt after merge

* hotfix for linking issue

* format

* format

* draft of pr comments + correctness restored

* wip refactor + format

* domain wip

* rust format

* Merged feature branch in and Rust MSM correctness

* rust build for correct curve

* Slowdown fixed by passing release flag to cmake

* WIP field and curve

* still wip field and curve

* field and curve in rust 1.0

* Refactored rust into several crates

* Arkworks is now an option, bn254 crate created

* Rust msm and ntt wip

* A version of rust msm done, cuda runtime wrapped

* refactor rust by creating a curve folder

* vec_ops instead of lde for now

* format

---------

Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: Vitalii <vitalii@ingonyama.com>
2023-12-03 13:32:50 +02:00
Vitalii
133a1b28bc format 2023-10-24 23:49:19 +02:00
Vitalii
c666786fa1 exclude target from format 2023-10-24 23:47:32 +02:00
Vitalii
b108234ce4 formatting 2023-10-24 22:11:06 +02:00
DmytroTym
a646c81aaa Linting as per google guide (#241)
* fixed msm_test, int unification, google guide

* Fixed warnings
2023-10-12 20:22:24 +03:00
DmytroTym
d4dd6d3923 Cuda refactoring (#240)
* fix memory error in single_stage_multi_reduction_kernel (#235)

* Added separate device context struct, returned lde

* Montgomery param added in lde.cu mul function

---------

Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
2023-10-04 14:51:59 +03:00
DmytroTym
fc73f274fa remove boost from cmake 2023-10-02 12:13:37 +03:00
DmytroTym
c1dfbd28ad Merge branch 'main' into cuda_refactoring 2023-10-02 12:06:23 +03:00
DmytroTym
2d3ab73eca error codes and extern ntt functions added 2023-09-29 23:10:36 +03:00
DmytroTym
9f884f3022 functional NTT 1.0 2023-09-28 22:28:42 +03:00
DmytroTym
a9c846c44c Draft ntt.cuh 2023-09-27 10:28:02 +03:00
DmytroTym
8068f16d82 Cosmetic changes to the MSM API and build process 2023-09-23 16:50:15 +03:00
DmytroTym
82743a098f 1.0 MSM refactoring on CUDA side, cmake build 2023-09-20 19:39:09 +03:00
DmytroTym
d8d2df1cd0 Merge branch 'main' into cuda_refactoring 2023-09-13 15:59:05 +03:00
DmytroTym
323430cddc MSM API prototype 2023-09-13 15:58:48 +03:00
709 changed files with 379304 additions and 40055 deletions


@@ -26,6 +26,7 @@ KeepEmptyLinesAtTheStartOfBlocks: false
MaxEmptyLinesToKeep: 1
NamespaceIndentation: All
PointerAlignment: Left
SortIncludes: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false

.codespellignore (new file)

@@ -0,0 +1,5 @@
inout
crate
lmit
mut
uint


@@ -1,14 +1,15 @@
golang:
  - goicicle/**/*.go'
  - wrappers/golang/**/*.go
  - wrappers/golang/**/*.h
  - wrappers/golang/**/*.tmpl
  - go.mod
rust:
  - src/**/*.rs
  - build.rs
  - Cargo.toml
  - wrappers/rust/**/*
cpp:
  - icicle/**/*.cu
  - icicle/**/*.cuh
  - icicle/**/*.cpp
  - icicle/**/*.hpp
  - icicle/**/*.c
  - icicle/**/*.h
  - icicle/**/*.h
  - icicle/CMakeLists.txt


@@ -0,0 +1,39 @@
name: Check Changed Files
on:
  workflow_call:
    outputs:
      golang:
        description: "Flag for if GoLang files changed"
        value: ${{ jobs.check-changed-files.outputs.golang }}
      rust:
        description: "Flag for if Rust files changed"
        value: ${{ jobs.check-changed-files.outputs.rust }}
      cpp_cuda:
        description: "Flag for if C++/CUDA files changed"
        value: ${{ jobs.check-changed-files.outputs.cpp_cuda }}
jobs:
  check-changed-files:
    name: Check Changed Files
    runs-on: ubuntu-22.04
    outputs:
      golang: ${{ steps.changed_files.outputs.golang }}
      rust: ${{ steps.changed_files.outputs.rust }}
      cpp_cuda: ${{ steps.changed_files.outputs.cpp_cuda }}
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
      - name: Get all changed files
        id: changed-files-yaml
        uses: tj-actions/changed-files@v39
        # https://github.com/tj-actions/changed-files#input_files_yaml_from_source_file
        with:
          files_yaml_from_source_file: .github/changed-files.yml
      - name: Run Changed Files script
        id: changed_files
        # https://github.com/tj-actions/changed-files#outputs-
        run: |
          echo "golang=${{ steps.changed-files-yaml.outputs.golang_any_modified }}" >> "$GITHUB_OUTPUT"
          echo "rust=${{ steps.changed-files-yaml.outputs.rust_any_modified }}" >> "$GITHUB_OUTPUT"
          echo "cpp_cuda=${{ steps.changed-files-yaml.outputs.cpp_any_modified }}" >> "$GITHUB_OUTPUT"

.github/workflows/codespell.yml (new file)

@@ -0,0 +1,20 @@
name: Check Spelling
on:
  pull_request:
    branches:
      - main
      - V2
jobs:
  spelling-checker:
    name: Check Spelling
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: codespell-project/actions-codespell@v2
        with:
          # https://github.com/codespell-project/actions-codespell?tab=readme-ov-file#parameter-skip
          skip: ./**/target,./**/build,./docs/*.js,./docs/*.json
          # https://github.com/codespell-project/actions-codespell?tab=readme-ov-file#parameter-ignore_words_file
          ignore_words_file: .codespellignore

.github/workflows/cpp_cuda.yml (new file)

@@ -0,0 +1,74 @@
name: C++/CUDA
on:
  pull_request:
    branches:
      - main
      - V2
  push:
    branches:
      - main
      - V2
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  check-changed-files:
    uses: ./.github/workflows/check-changed-files.yml
  check-format:
    name: Check Code Format
    runs-on: ubuntu-22.04
    needs: check-changed-files
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Check clang-format
        if: needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: if [[ $(find ./ \( -path ./icicle/build -prune -o -path ./**/target -prune -o -path ./examples -prune \) -iname *.h -or -iname *.cuh -or -iname *.cu -or -iname *.c -or -iname *.cpp | xargs clang-format --dry-run -ferror-limit=1 -style=file 2>&1) ]]; then echo "Please run clang-format"; exit 1; fi
  test-linux-curve:
    name: Test on Linux
    runs-on: [self-hosted, Linux, X64, icicle]
    needs: [check-changed-files, check-format]
    strategy:
      matrix:
        curve: [bn254, bls12_381, bls12_377, bw6_761]
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
      - name: Build curve
        working-directory: ./icicle
        if: needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: |
          mkdir -p build && rm -rf build/*
          cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTS=ON -DCURVE=${{ matrix.curve }} -DG2=ON -S . -B build
          cmake --build build -j
      - name: Run C++ curve Tests
        working-directory: ./icicle/build/tests
        if: needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: ctest
  test-linux-field:
    name: Test on Linux
    runs-on: [self-hosted, Linux, X64, icicle]
    needs: [check-changed-files, check-format]
    strategy:
      matrix:
        field: [babybear]
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
      - name: Build field
        working-directory: ./icicle
        if: needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: |
          mkdir -p build && rm -rf build/*
          cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTS=ON -DFIELD=${{ matrix.field }} -DEXT_FIELD=ON -S . -B build
          cmake --build build -j
      - name: Run C++ field Tests
        working-directory: ./icicle/build/tests
        if: needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: ctest

.github/workflows/deploy-docs.yml (new file)

@@ -0,0 +1,46 @@
name: Deploy to GitHub Pages
on:
  push:
    branches:
      - main
    paths:
      - 'docs/**'
permissions:
  contents: write
jobs:
  deploy:
    name: Deploy to GitHub Pages
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          path: 'repo'
      - uses: actions/setup-node@v3
        with:
          node-version: 18
          cache: npm
          cache-dependency-path: ./repo/docs/package-lock.json
      - name: Install dependencies
        run: npm install --frozen-lockfile
        working-directory: ./repo/docs
      - name: Build website
        run: npm run build
        working-directory: ./repo/docs
      - name: Copy CNAME to build directory
        run: echo "dev.ingonyama.com" > ./build/CNAME
        working-directory: ./repo/docs
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./repo/docs/build
          user_name: github-actions[bot]
          user_email: 41898282+github-actions[bot]@users.noreply.github.com
        working-directory: ./repo/docs

.github/workflows/examples.yml (new file)

@@ -0,0 +1,60 @@
# This workflow is a demo of how to run all examples in the Icicle repository.
# For each language directory (c++, Rust, etc.) the workflow
# (1) loops over all examples (msm, ntt, etc.) and
# (2) runs ./compile.sh and ./run.sh in each directory.
# The script ./compile.sh should compile the example and ./run.sh should run it.
# Each script should return 0 for success and 1 otherwise.
name: Examples
on:
  pull_request:
    branches:
      - main
      - V2
  push:
    branches:
      - main
      - V2
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  check-changed-files:
    uses: ./.github/workflows/check-changed-files.yml
  run-examples:
    runs-on: [self-hosted, Linux, X64, icicle, examples]
    needs: check-changed-files
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: c++ examples
        working-directory: ./examples/c++
        if: needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: |
          # loop over all directories in the current directory
          for dir in $(find . -mindepth 1 -maxdepth 1 -type d); do
            if [ -d "$dir" ]; then
              echo "Running command in $dir"
              cd $dir
              ./compile.sh
              ./run.sh
              cd -
            fi
          done
      - name: Rust examples
        working-directory: ./examples/rust
        if: needs.check-changed-files.outputs.rust == 'true'
        run: |
          # loop over all directories in the current directory
          for dir in $(find . -mindepth 1 -maxdepth 1 -type d); do
            if [ -d "$dir" ]; then
              echo "Running command in $dir"
              cd $dir
              cargo run --release
              cd -
            fi
          done

.github/workflows/golang.yml (new file)

@@ -0,0 +1,121 @@
name: GoLang
on:
  pull_request:
    branches:
      - main
      - V2
  push:
    branches:
      - main
      - V2
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  check-changed-files:
    uses: ./.github/workflows/check-changed-files.yml
  check-format:
    name: Check Code Format
    runs-on: ubuntu-22.04
    needs: check-changed-files
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup go
        uses: actions/setup-go@v5
        with:
          go-version: '1.20.0'
      - name: Check gofmt
        if: needs.check-changed-files.outputs.golang == 'true'
        run: if [[ $(go list ./... | xargs go fmt) ]]; then echo "Please run go fmt"; exit 1; fi
  build-linux:
    name: Build on Linux
    runs-on: [self-hosted, Linux, X64, icicle]
    needs: [check-changed-files, check-format]
    strategy:
      matrix:
        curve: [bn254, bls12_381, bls12_377, bw6_761]
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
      - name: Setup go
        uses: actions/setup-go@v5
        with:
          go-version: '1.20.0'
      - name: Build
        working-directory: ./wrappers/golang
        if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: ./build.sh ${{ matrix.curve }} -g2 -ecntt # builds a single curve with G2 and ECNTT enabled
      - name: Upload ICICLE lib artifacts
        uses: actions/upload-artifact@v4
        if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        with:
          name: icicle-builds-${{ matrix.curve }}-${{ github.workflow }}-${{ github.sha }}
          path: |
            icicle/build/src/curves/libingo_curve_${{ matrix.curve }}.a
            icicle/build/src/fields/libingo_field_${{ matrix.curve }}.a
          retention-days: 1
  test-linux:
    name: Test on Linux
    runs-on: [self-hosted, Linux, X64, icicle]
    needs: [check-changed-files, build-linux]
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
      - name: Setup go
        uses: actions/setup-go@v5
        with:
          go-version: '1.20.0'
      - name: Download ICICLE lib artifacts
        uses: actions/download-artifact@v4
        if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        with:
          path: ./icicle/build/src
          merge-multiple: true
      - name: Run Tests
        working-directory: ./wrappers/golang
        if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        # -count ensures the test results are not cached
        # -p controls the number of programs that can be run in parallel
        run: |
          export CPATH=$CPATH:/usr/local/cuda/include
          go test --tags=g2 ./... -count=1 -failfast -p 2 -timeout 60m
  # TODO: bw6 on windows requires more memory than the standard runner has
  # Add a large runner and then enable this job
  # build-windows:
  #   name: Build on Windows
  #   runs-on: windows-2022
  #   needs: [check-changed-files, check-format]
  #   strategy:
  #     matrix:
  #       curve: [bn254, bls12_381, bls12_377, bw6_761]
  #   steps:
  #     - name: Checkout Repo
  #       uses: actions/checkout@v4
  #     - name: Setup go
  #       uses: actions/setup-go@v5
  #       with:
  #         go-version: '1.20.0'
  #     - name: Download and Install Cuda
  #       if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
  #       id: cuda-toolkit
  #       uses: Jimver/cuda-toolkit@v0.2.11
  #       with:
  #         cuda: '12.0.0'
  #         method: 'network'
  #         # https://docs.nvidia.com/cuda/archive/12.0.0/cuda-installation-guide-microsoft-windows/index.html
  #         sub-packages: '["cudart", "nvcc", "thrust", "visual_studio_integration"]'
  #     - name: Build libs
  #       if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
  #       working-directory: ./wrappers/golang
  #       env:
  #         CUDA_PATH: ${{ steps.cuda-toolkit.outputs.CUDA_PATH }}
  #       shell: pwsh
  #       run: ./build.ps1 ${{ matrix.curve }} ON # builds a single curve with G2 enabled


@@ -1,104 +0,0 @@
name: Build
on:
  pull_request:
    branches:
      - main
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
env:
  CARGO_TERM_COLOR: always
  ARCH_TYPE: native
jobs:
  check-changed-files:
    name: Check Changed Files
    runs-on: ubuntu-22.04
    outputs:
      golang: ${{ steps.changed_files.outputs.golang }}
      rust: ${{ steps.changed_files.outputs.rust }}
      cpp_cuda: ${{ steps.changed_files.outputs.cpp_cuda }}
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
      - name: Get all changed files
        id: changed-files-yaml
        uses: tj-actions/changed-files@v39
        # https://github.com/tj-actions/changed-files#input_files_yaml_from_source_file
        with:
          files_yaml_from_source_file: .github/changed-files.yml
      - name: Run Changed Files script
        id: changed_files
        # https://github.com/tj-actions/changed-files#outputs-
        run: |
          echo "golang=${{ steps.changed-files-yaml.outputs.golang_any_modified }}" >> "$GITHUB_OUTPUT"
          echo "rust=${{ steps.changed-files-yaml.outputs.rust_any_modified }}" >> "$GITHUB_OUTPUT"
          echo "cpp_cuda=${{ steps.changed-files-yaml.outputs.cpp_any_modified }}" >> "$GITHUB_OUTPUT"
  build-rust-linux:
    name: Build Rust on Linux
    runs-on: [self-hosted, Linux, X64, icicle]
    needs: check-changed-files
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
      - name: Build Rust
        if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: cargo build --release --verbose
  build-rust-windows:
    name: Build Rust on Windows
    runs-on: windows-2022
    needs: check-changed-files
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
      - name: Download and Install Cuda
        if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        uses: Jimver/cuda-toolkit@v0.2.11
        with:
          cuda: '12.0.0'
          method: 'network'
          # https://docs.nvidia.com/cuda/archive/12.0.0/cuda-installation-guide-microsoft-windows/index.html
          sub-packages: '["cudart", "nvcc", "thrust", "visual_studio_integration"]'
      - name: Build Rust Targets
        if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        env:
          CUDA_PATH: ${{steps.cuda-toolkit.outputs.CUDA_PATH}}
        run: cargo build --release --verbose
  build-golang-linux:
    name: Build Golang on Linux
    runs-on: [self-hosted, Linux, X64, icicle]
    needs: check-changed-files
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
      - name: Build CUDA libs
        if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
        run: make all
        working-directory: ./goicicle
  # TODO: Add once Golang make file supports building for Windows
  # build-golang-windows:
  #   name: Build Golang on Windows
  #   runs-on: windows-2022
  #   needs: check-changed-files
  #   steps:
  #     - name: Checkout Repo
  #       uses: actions/checkout@v3
  #     - name: Download and Install Cuda
  #       if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
  #       uses: Jimver/cuda-toolkit@v0.2.11
  #       with:
  #         cuda: '12.0.0'
  #         method: 'network'
  #         # https://docs.nvidia.com/cuda/archive/12.0.0/cuda-installation-guide-microsoft-windows/index.html
  #         sub-packages: '["cudart", "nvcc", "thrust"]'
  #     - name: Build cpp libs
  #       if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
  #       run: make all
  #       working-directory: ./goicicle


@@ -1,41 +0,0 @@
name: Format
on:
  pull_request:
    branches:
      - main
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  formatting-rust:
    name: Check Rust Code Formatting
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Check rustfmt
        run: if [[ $(cargo fmt --check) ]]; then echo "Please run cargo fmt"; exit 1; fi
      # - name: Check clippy
      #   run: cargo clippy --no-deps --all-features --all-targets
  formatting-golang:
    name: Check Golang Code Formatting
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Check gofmt
        run: if [[ $(go list ./... | xargs go fmt) ]]; then echo "Please run go fmt"; exit 1; fi
  formatting-cpp-cuda:
    name: Check C++/CUDA Code Formatting
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Check clang-format
        run: unformatted_files=$(find ./ -path ./icicle/build -prune -o -path ./target -prune -iname *.h -or -iname *.cuh -or -iname *.cu -or -iname *.c -or -iname *.cpp | xargs clang-format --dry-run -ferror-limit=1 -style=file >&2); if [[ $unformatted_files ]]; then echo $unformatted_files; echo "Please run clang-format"; exit 1; fi


@@ -1,89 +0,0 @@
name: Test
on:
pull_request:
branches:
- main
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
CARGO_TERM_COLOR: always
ARCH_TYPE: native
jobs:
check-changed-files:
name: Check Changed Files
runs-on: ubuntu-22.04
outputs:
golang: ${{ steps.changed_files.outputs.golang }}
rust: ${{ steps.changed_files.outputs.rust }}
cpp_cuda: ${{ steps.changed_files.outputs.cpp_cuda }}
steps:
- name: Checkout Repo
uses: actions/checkout@v3
- name: Get all changed files
id: changed-files-yaml
uses: tj-actions/changed-files@v39
# https://github.com/tj-actions/changed-files#input_files_yaml_from_source_file
with:
files_yaml_from_source_file: .github/changed-files.yml
- name: Run Changed Files script
id: changed_files
# https://github.com/tj-actions/changed-files#outputs-
run: |
echo "golang=${{ steps.changed-files-yaml.outputs.golang_any_modified }}" >> "$GITHUB_OUTPUT"
echo "rust=${{ steps.changed-files-yaml.outputs.rust_any_modified }}" >> "$GITHUB_OUTPUT"
echo "cpp_cuda=${{ steps.changed-files-yaml.outputs.cpp_any_modified }}" >> "$GITHUB_OUTPUT"
test-rust-linux:
name: Test Rust on Linux
runs-on: [self-hosted, Linux, X64, icicle]
needs: check-changed-files
steps:
- name: Checkout Repo
uses: actions/checkout@v3
- name: Run Rust Tests
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
run: cargo test --release --verbose -- --test-threads=1
test-cpp-linux:
name: Test C++ on Linux
runs-on: [self-hosted, Linux, X64, icicle]
needs: check-changed-files
steps:
- name: Checkout Repo
uses: actions/checkout@v3
- name: Build C++
working-directory: ./icicle
if: needs.check-changed-files.outputs.cpp_cuda == 'true'
run: |
mkdir -p build
cmake -DBUILD_TESTS=ON -S . -B build
cmake --build build
- name: Run C++ Tests
working-directory: ./icicle/build
if: needs.check-changed-files.outputs.cpp_cuda == 'true'
run: ctest
test-golang-linux:
name: Test Golang on Linux
runs-on: [self-hosted, Linux, X64, icicle]
needs: check-changed-files
steps:
- name: Checkout Repo
uses: actions/checkout@v3
- name: Build CUDA libs
working-directory: ./goicicle
if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
run: make libbn254.so
- name: Run Golang Tests
if: needs.check-changed-files.outputs.golang == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
run: |
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/goicicle
go test ./goicicle/curves/bn254 -count=1
# TODO: Fix tests for bls12377
# TODO: Fix tests for bls12381
# run: go test ./goicicle/curves/bn254 ./goicicle/curves/bls12377 ./goicicle/curves/bls12381 -count=1

.github/workflows/release.yml

@@ -0,0 +1,50 @@
name: Release
on:
workflow_dispatch:
inputs:
releaseType:
description: 'Release type'
required: true
default: 'minor'
type: choice
options:
- patch
- minor
- major
jobs:
release:
name: Release
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
ssh-key: ${{ secrets.DEPLOY_KEY }}
- name: Setup Cache
id: cache
uses: actions/cache@v4
with:
path: |
~/.cargo/bin/
~/.cargo/registry/index/
~/.cargo/registry/cache/
~/.cargo/git/db/
key: ${{ runner.os }}-cargo-${{ hashFiles('~/.cargo/bin/cargo-workspaces') }}
- name: Install cargo-workspaces
if: steps.cache.outputs.cache-hit != 'true'
run: cargo install cargo-workspaces
- name: Bump rust crate versions, commit, and tag
working-directory: wrappers/rust
# https://github.com/pksunkara/cargo-workspaces?tab=readme-ov-file#version
run: |
git config user.name release-bot
git config user.email release-bot@ingonyama.com
cargo workspaces version ${{ inputs.releaseType }} -y --no-individual-tags -m "Bump rust crates' version"
- name: Create draft release
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
LATEST_TAG=$(git describe --tags --abbrev=0)
gh release create $LATEST_TAG --generate-notes -d --verify-tag -t "Release $LATEST_TAG"

.github/workflows/rust.yml

@@ -0,0 +1,91 @@
name: Rust
on:
pull_request:
branches:
- main
- V2
push:
branches:
- main
- V2
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
check-changed-files:
uses: ./.github/workflows/check-changed-files.yml
check-format:
name: Check Code Format
runs-on: ubuntu-22.04
needs: check-changed-files
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Check rustfmt
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
working-directory: ./wrappers/rust
# "-name target -prune" removes searching in any directory named "target"
# Formatting by single file is necessary due to generated files not being present
# before building the project.
# e.g. icicle-cuda-runtime/src/bindings.rs is generated and icicle-cuda-runtime/src/lib.rs includes that module
# causing rustfmt to fail.
run: if [[ $(find . -path ./icicle-curves/icicle-curve-template -prune -o -name target -prune -o -iname *.rs -print | xargs cargo fmt --check --) ]]; then echo "Please run cargo fmt"; exit 1; fi
build-linux:
name: Build on Linux
runs-on: [self-hosted, Linux, X64, icicle]
needs: [check-changed-files, check-format]
steps:
- name: Checkout Repo
uses: actions/checkout@v4
- name: Build
working-directory: ./wrappers/rust
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
# Building from the root workspace will build all members of the workspace by default
run: cargo build --release --verbose
test-linux:
name: Test on Linux
runs-on: [self-hosted, Linux, X64, icicle]
needs: [check-changed-files, build-linux]
steps:
- name: Checkout Repo
uses: actions/checkout@v4
- name: Run tests
working-directory: ./wrappers/rust
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
# Running tests from the root workspace will run all workspace members' tests by default
# We need to limit the number of threads to avoid running out of memory on weaker machines
run: cargo test --workspace --exclude icicle-babybear --release --verbose --features=g2 -- --test-threads=2
- name: Run baby bear tests
working-directory: ./wrappers/rust/icicle-fields/icicle-babybear
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
run: cargo test --release --verbose
build-windows:
name: Build on Windows
runs-on: windows-2022
needs: check-changed-files
steps:
- name: Checkout Repo
uses: actions/checkout@v4
- name: Download and Install Cuda
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
id: cuda-toolkit
uses: Jimver/cuda-toolkit@v0.2.11
with:
cuda: '12.0.0'
method: 'network'
# https://docs.nvidia.com/cuda/archive/12.0.0/cuda-installation-guide-microsoft-windows/index.html
sub-packages: '["cudart", "nvcc", "thrust", "visual_studio_integration"]'
- name: Build targets
working-directory: ./wrappers/rust
if: needs.check-changed-files.outputs.rust == 'true' || needs.check-changed-files.outputs.cpp_cuda == 'true'
env:
CUDA_PATH: ${{ steps.cuda-toolkit.outputs.CUDA_PATH }}
# Building from the root workspace will build all members of the workspace by default
run: cargo build --release --verbose

.github/workflows/test-deploy-docs.yml

@@ -0,0 +1,29 @@
name: Test Deploy to GitHub Pages
on:
pull_request:
branches:
- main
paths:
- 'docs/*'
jobs:
test-deploy:
name: Test deployment of docs website
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
path: 'repo'
- uses: actions/setup-node@v3
with:
node-version: 18
cache: npm
cache-dependency-path: ./repo/docs/package-lock.json
- name: Install dependencies
run: npm install --frozen-lockfile
working-directory: ./repo/docs
- name: Test build website
run: npm run build
working-directory: ./repo/docs

.gitignore

@@ -15,3 +15,7 @@
**/.DS_Store
**/Cargo.lock
**/icicle/build/
**/wrappers/rust/icicle-cuda-runtime/src/bindings.rs
**/build*
**/icicle/appUtils/large_ntt/work
icicle/appUtils/large_ntt/work/test_ntt


@@ -2,7 +2,7 @@ cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ingonyama"
title: "Icicle: GPU Library for ZK Acceleration"
version: 0.1.0
date-released: 2023-03-08
title: "ICICLE: GPU Library for ZK Acceleration"
version: 1.0.0
date-released: 2024-01-04
url: "https://github.com/ingonyama-zk/icicle"


@@ -1,50 +0,0 @@
[package]
name = "icicle"
version = "0.1.0"
edition = "2021"
authors = [ "Ingonyama" ]
description = "An implementation of the Ingonyama CUDA Library"
homepage = "https://www.ingonyama.com"
repository = "https://github.com/ingonyama-zk/icicle"
[[bench]]
name = "ntt"
path = "benches/ntt.rs"
harness = false
[[bench]]
name = "msm"
path = "benches/msm.rs"
harness = false
[dependencies]
hex = "*"
ark-std = "0.3.0"
ark-ff = "0.3.0"
ark-poly = "0.3.0"
ark-ec = { version = "0.3.0", features = [ "parallel" ] }
ark-bls12-381 = "0.3.0"
ark-bls12-377 = "0.3.0"
ark-bn254 = "0.3.0"
serde = { version = "1.0", features = ["derive"] }
serde_derive = "1.0"
serde_cbor = "0.11.2"
rustacuda = "0.1"
rustacuda_core = "0.1"
rustacuda_derive = "0.1"
rand = "0.8.5" #TODO: move rand and ark dependencies to dev once random scalar/point generation is done "natively"
[build-dependencies]
cc = { version = "1.0", features = ["parallel"] }
cmake = "0.1.50"
[dev-dependencies]
"criterion" = "0.4.0"
[features]
default = ["bls12_381"]
bls12_381 = ["ark-bls12-381/curve"]
g2 = []


@@ -15,7 +15,7 @@ ENV PATH="/root/.cargo/bin:${PATH}"
# Install Golang
ENV GOLANG_VERSION 1.21.1
RUN curl -L https://golang.org/dl/go${GOLANG_VERSION}.linux-amd64.tar.gz | tar -xz -C /usr/local
RUN curl -L https://go.dev/dl/go${GOLANG_VERSION}.linux-amd64.tar.gz | tar -xz -C /usr/local
ENV PATH="/usr/local/go/bin:${PATH}"
# Set the working directory in the container

README.md

@@ -1,193 +1,84 @@
# ICICLE
**<div align="center">Icicle is a library for ZK acceleration using CUDA-enabled GPUs.</div>**
![image (4)](https://user-images.githubusercontent.com/2446179/223707486-ed8eb5ab-0616-4601-8557-12050df8ccf7.png)
<div align="center">ICICLE is a library for ZK acceleration using CUDA-enabled GPUs.</div>
<p align="center">
<img src="https://github.com/ingonyama-zk/icicle/actions/workflows/main-build.yml/badge.svg" alt="Build status">
<img alt="ICICLE" width="300" height="300" src="https://user-images.githubusercontent.com/2446179/223707486-ed8eb5ab-0616-4601-8557-12050df8ccf7.png"/>
</p>
<p align="center">
<a href="https://discord.gg/EVVXTdt6DF">
<img src="https://img.shields.io/discord/1063033227788423299?logo=discord" alt="Chat with us on Discord">
</a>
<a href="https://twitter.com/intent/follow?screen_name=Ingo_zk">
<img src="https://img.shields.io/twitter/follow/Ingo_zk?style=social&logo=twitter" alt="Follow us on Twitter">
<a href="https://github.com/ingonyama-zk/icicle/releases">
<img src="https://img.shields.io/github/v/release/ingonyama-zk/icicle" alt="GitHub Release">
</a>
</p>
## Background
Zero Knowledge Proofs (ZKPs) are considered one of the greatest achievements of modern cryptography. Accordingly, ZKPs are expected to disrupt a number of industries and to usher in an era of trustless, privacy-preserving services and infrastructure.
For ZK hardware today, the practical options are FPGAs and GPUs, both relatively inexpensive. The biggest selling point of GPUs is the software: CUDA makes it easy to write code that runs on Nvidia GPUs and takes advantage of their highly parallel architecture. Combined with the wide availability of these devices, getting GPUs to work on ZK workloads is a giant step towards accessible and efficient ZK provers.
We believe GPUs are as important for ZK as they are for AI.
## Zero Knowledge on GPU
- GPUs are a perfect match for ZK compute - around 97% of ZK protocol runtime is parallel by nature.
- GPUs are simple for developers to use and scale compared to other hardware platforms.
- GPUs are extremely competitive in terms of power / performance and price (3x cheaper).
- GPUs are popular and readily available.
ICICLE is a CUDA implementation of general functions widely used in ZKP. ICICLE currently provides support for MSM, NTT, and ECNTT, with plans to support Hash functions soon.
## Getting Started
### Supported primitives
ICICLE is a CUDA implementation of general functions widely used in ZKP.
- Fields
- Scalars
- Points
- Projective: {x, y, z}
- Affine: {x, y}
- Curves
- [BLS12-381]
- [BLS12-377]
- [BN254]
- [BW6-671]
## Build and usage
> [!NOTE]
> Developers: We highly recommend reading our [documentation]
> [!TIP]
> Try out ICICLE by running some [examples] using ICICLE in C++ and our Rust bindings
### Prerequisites
- [NVCC] (version 12.0 or newer)
- cmake 3.18 and above
- follow [these instructions](https://github.com/ingonyama-zk/icicle/tree/main/icicle#prerequisites-on-ubuntu)
- Any Nvidia GPU
- [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) version 12.0 or newer.
- [CMake](https://cmake.org/files/), version 3.18 and above. The latest version is recommended.
- [GCC](https://gcc.gnu.org/install/download.html) version 9 or newer; the latest version is recommended.
- Any Nvidia GPU (which supports CUDA Toolkit version 12.0 or above).
If you don't have access to an Nvidia GPU, check out [google-colab](#google-colab). If you require more compute power and are looking to build or do research with ICICLE, refer to our [grant program][GRANT_PROGRAM].
> [!NOTE]
> It is possible to use CUDA 11 for cards which don't support CUDA 12; however, we don't officially support CUDA 11, and issues may arise in the future.
### Accessing Hardware
If you don't have access to an Nvidia GPU, we have some options for you.
Check out [Google Colab](https://colab.google/). Google Colab offers a free [T4 GPU](https://www.nvidia.com/en-us/data-center/tesla-t4/) instance that ICICLE can run on; reference this guide for setting up your [Google Colab workspace][GOOGLE-COLAB-ICICLE].
If you require more compute and have an interesting research project, we have [bounty and grant programs][GRANT_PROGRAM].
### Steps
### Build systems
1. Define or select a curve for your application; we've provided a [template][CRV_TEMPLATE] for defining a curve
2. Include the curve in [`curve_config.cuh`][CRV_CONFIG]
3. Now you can build the ICICLE library using nvcc
ICICLE has three build systems.
```sh
mkdir -p build
nvcc -o build/<binary_name> ./icicle/curves/index.cu -lib -arch=native
```
- [ICICLE core][ICICLE-CORE], C++ and CUDA
- [ICICLE Rust][ICICLE-RUST] bindings, requires [Rust](https://www.rust-lang.org/) version 1.70 and above
- [ICICLE Golang][ICICLE-GO] bindings, requires [Go](https://go.dev/) version 1.20 and above
### Testing the CUDA code
ICICLE core always needs to be built as part of the other build systems as it contains the core ICICLE primitives implemented in CUDA. Reference these guides for the different build systems, [ICICLE core guide][ICICLE-CORE-README], [ICICLE Rust guide][ICICLE-RUST-README] and [ICICLE Golang guide][ICICLE-GO-README].
We are using the [googletest] library for testing. To build and run [the test suite](./icicle/README.md) for finite field and elliptic curve arithmetic, run from the `icicle` folder:
### Compiling ICICLE
For testing, ensure the `BUILD_TESTS` option is enabled in cmake. If not, toggle it on by adding `-DBUILD_TESTS=ON` in the cmake configuration command:
Running ICICLE via Rust bindings is highly recommended and simple:
- Clone this repo
- Go to our [Rust bindings][ICICLE-RUST]
- Enter a [curve](./wrappers/rust/icicle-curves) implementation
- Run `cargo build --release` to build, or `cargo test` to build and execute tests
```sh
cmake -S . -B build -DBUILD_TESTS=ON
```
Proceed with the following commands:
```sh
mkdir -p build
cmake -S . -B build
cmake --build build
cd build && ctest
```
NOTE: If you are using cmake versions < 3.24, add `-DCUDA_ARCH=<target_compute_arch>` to the command `cmake -S . -B build`
### Rust Bindings
For convenience, we also provide rust bindings to the ICICLE library for the following primitives (a usage sketch follows this list):
- MSM
- NTT
- Forward NTT
- Inverse NTT
- ECNTT
- Forward ECNTT
- Inverse ECNTT
- Scalar Vector Multiplication
- Point Vector Multiplication
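The sketch below is a minimal, hedged illustration of driving a single MSM through these bindings. It is built from the helper functions used by this crate's own benchmarks (`set_up_scalars_bls12_381`, `generate_random_points_bls12_381`, `get_rng`, `commit_batch_bls12_381`); treat the exact signatures as assumptions taken from that bench code rather than a stable API.
```rust
use icicle::test_bls12_381::{
    commit_batch_bls12_381, generate_random_points_bls12_381, set_up_scalars_bls12_381,
};
use icicle::utils::*;
use rustacuda::prelude::*;

fn main() {
    let msm_size = 1 << 12;
    // Generate random scalars on the host; the benches create device buffers right
    // after this call, so it presumably also sets up the CUDA context.
    let (scalars, _, _) = set_up_scalars_bls12_381(msm_size, 0, false);
    let points = generate_random_points_bls12_381(msm_size, get_rng(None));
    // Copy the inputs to the device; the MSM itself runs on the GPU.
    let mut d_scalars = DeviceBuffer::from_slice(&scalars[..]).unwrap();
    let mut d_points = DeviceBuffer::from_slice(&points[..]).unwrap();
    // A batch size of 1 computes one MSM over the 2^12 points.
    let _commitment = commit_batch_bls12_381(&mut d_points, &mut d_scalars, 1);
}
```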
A custom [build script][B_SCRIPT] is used to compile and link the ICICLE library. The environment variable `ARCH_TYPE` determines which GPU architecture the library is compiled for; it defaults to `native` when unset, letting the compiler detect the installed GPU type.
> NOTE: If `ARCH_TYPE` is not set, a GPU must be installed and detectable on the build machine.
Once you have your parameters set, run:
```sh
cargo build --release
```
You'll find a release-ready library at `target/release/libicicle_utils.rlib`.
To benchmark and test the functionality available in Rust, run:
```
cargo bench
cargo test -- --test-threads=1
```
The flag `--test-threads=1` is needed because currently some tests might interfere with one another inside the GPU.
### Example Usage
An example of using the Rust bindings library can be found in our [fast-danksharding implementation][FDI]
### Supporting Additional Curves
Supporting additional curves can be done as follows:
Create a JSON file with the curve parameters. The curve is defined by the following parameters:
- ``curve_name`` - e.g. ``bls12_381``.
- ``modulus_p`` - scalar field modulus (in decimal).
- ``bit_count_p`` - number of bits needed to represent ``modulus_p``.
- ``limb_p`` - number of (32-bit) limbs needed to represent ``modulus_p`` (rounded up).
- ``ntt_size`` - log of the size of the maximal subgroup of the scalar field.
- ``modulus_q`` - base field modulus (in decimal).
- ``bit_count_q`` - number of bits needed to represent ``modulus_q``.
- ``limb_q`` - number of (32-bit) limbs needed to represent ``modulus_q`` (rounded up).
- ``weierstrass_b`` - `b` of the curve in Weierstrass form.
- ``weierstrass_b_g2_re`` - real part of the `b` value of the g2 curve in Weierstrass form.
- ``weierstrass_b_g2_im`` - imaginary part of the `b` value of the g2 curve in Weierstrass form.
- ``gen_x`` - `x` coordinate of a generator element for the curve.
- ``gen_y`` - `y` coordinate of a generator element for the curve.
- ``gen_x_re`` - real part of the `x` coordinate of a generator element for the g2 curve.
- ``gen_x_im`` - imaginary part of the `x` coordinate of a generator element for the g2 curve.
- ``gen_y_re`` - real part of the `y` coordinate of a generator element for the g2 curve.
- ``gen_y_im`` - imaginary part of the `y` coordinate of a generator element for the g2 curve.
- ``nonresidue`` - nonresidue (`i^2` or `u^2`) - the square of the element that generates the quadratic extension field of the base field.
Here's an example for BLS12-381.
```
{
"curve_name" : "bls12_381",
"modulus_p" : 52435875175126190479447740508185965837690552500527637822603658699938581184513,
"bit_count_p" : 255,
"limb_p" : 8,
"ntt_size" : 32,
"modulus_q" : 4002409555221667393417789825735904156556882819939007885332058136124031650490837864442687629129015664037894272559787,
"bit_count_q" : 381,
"limb_q" : 12,
"weierstrass_b" : 4,
"weierstrass_b_g2_re" : 4,
"weierstrass_b_g2_im" : 4,
"gen_x" : 3685416753713387016781088315183077757961620795782546409894578378688607592378376318836054947676345821548104185464507,
"gen_y" : 1339506544944476473020471379941921221584933875938349620426543736416511423956333506472724655353366534992391756441569,
"gen_x_re" : 352701069587466618187139116011060144890029952792775240219908644239793785735715026873347600343865175952761926303160,
"gen_x_im" : 3059144344244213709971259814753781636986470325476647558659373206291635324768958432433509563104347017837885763365758,
"gen_y_re" : 1985150602287291935568054521177171638300868978215655730859378665066344726373823718423869104263333984641494340347905,
"gen_y_im" : 927553665492332455747201965776037880757740193453592970025027978793976877002675564980949289727957565575433344219582,
"nonresidue" : -1
}
```
Save the parameters JSON file under the ``curve_parameters`` directory.
Then run the Python script ``new_curve_script.py`` from the root folder:
```
python3 ./curve_parameters/new_curve_script.py ./curve_parameters/bls12_381.json
```
The script does the following:
- Creates a folder in ``icicle/curves`` named after the curve, containing all of the files needed for the supported operations in CUDA.
- Adds the curve's exported operations to ``icicle/curves/index.cu``.
- Creates a file named after the curve in ``src/curves`` with the relevant objects for the curve.
- Creates a test file named after the curve in ``src``.
The files from ``./icicle/curves/<curve_name>/supported_operations.cu`` should also be added individually to the ``add_library`` section of [``./icicle/CMakeLists.txt``][CMAKELISTS].
The new curve can be tested by running the tests in ``tests_<curve_name>`` (e.g. ``tests_bls12_381``).
Whenever you want to compile and run the core ICICLE C++ tests, follow these steps:
- Clone this repo
- Go to [ICICLE core][ICICLE-CORE]
- Execute the small [script](https://github.com/ingonyama-zk/icicle/tree/main/icicle#running-tests) that compiles via cmake and runs the C++ and CUDA tests
## Docker
@@ -198,14 +89,6 @@ docker build -t <name_of_your_choice> .
docker run --gpus all -it <name_of_your_choice> /bin/bash
```
## Google Colab
[Colab](https://colab.google/) is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs!
You can easily run ICICLE in Google Colab on a free GPU instance; this is a great option for anyone who wants to get started with ICICLE instantly, without any local setup or GPU.
Follow this [guide][GOOGLE_COLAB_ICICLE] for more details.
## Contributions
Join our [Discord Server][DISCORD] and find us on the icicle channel. We will be happy to work together to support your use case and talk features, bugs and design.
@@ -224,13 +107,18 @@ In case `clang-format` is missing on your system, you can install it using the
sudo apt install clang-format
```
You will also need to install [codespell](https://github.com/codespell-project/codespell?tab=readme-ov-file#installation) to check for typos.
This will ensure our custom hooks are run and will make it easier to follow our coding guidelines.
### Hall of Fame
- [Robik](https://github.com/robik75), for his ongoing support and mentorship
- [liuxiao](https://github.com/liuxiaobleach), for being a top notch bug smasher
- [gkigiermo](https://github.com/gkigiermo), for making it intuitive to use ICICLE in Google Colab.
- [gkigiermo](https://github.com/gkigiermo), for making it intuitive to use ICICLE in Google Colab
- [nonam3e](https://github.com/nonam3e), for adding Grumpkin curve support into ICICLE
- [alxiong](https://github.com/alxiong), for adding warmup for CudaStream
- [cyl19970726](https://github.com/cyl19970726), for updating go install source in Dockerfile
## Help & Support
@@ -244,22 +132,26 @@ ICICLE is distributed under the terms of the MIT License.
See [LICENSE-MIT][LMIT] for details.
<!-- Begin Links -->
[BLS12-381]: ./icicle/curves/bls12_381/supported_operations.cu
[BLS12-377]: ./icicle/curves/bls12_377/supported_operations.cu
[BN254]: ./icicle/curves/bn254/supported_operations.cu
[BW6-671]: ./icicle/curves/bw6_671/supported_operations.cu
[BLS12-381]: ./icicle/curves/
[BLS12-377]: ./icicle/curves/
[BN254]: ./icicle/curves/
[BW6-671]: ./icicle/curves/
[NVCC]: https://docs.nvidia.com/cuda/#installation-guides
[CRV_TEMPLATE]: ./icicle/curves/curve_template/
[CRV_CONFIG]: ./icicle/curves/index.cu
[B_SCRIPT]: ./build.rs
[FDI]: https://github.com/ingonyama-zk/fast-danksharding
[LMIT]: ./LICENSE
[DISCORD]: https://discord.gg/Y4SkbDf2Ff
[googletest]: https://github.com/google/googletest/
[HOOKS_DOCS]: https://git-scm.com/docs/githooks
[HOOKS_PATH]: ./scripts/hooks/
[CMAKELISTS]: https://github.com/ingonyama-zk/icicle/blob/f0e6b465611227b858ec4590f4de5432e892748d/icicle/CMakeLists.txt#L28
[GOOGLE_COLAB_ICICLE]: https://github.com/gkigiermo/rust-cuda-colab
[GRANT_PROGRAM]: https://docs.google.com/forms/d/e/1FAIpQLSc967TnNwxZZ4akejcSi4KOUmGrEc68ZZV-FHLfo8KnP1wbpg/viewform
[GOOGLE-COLAB-ICICLE]: https://dev.ingonyama.com/icicle/colab-instructions
[GRANT_PROGRAM]: https://medium.com/@ingonyama/icicle-for-researchers-grants-challenges-9be1f040998e
[ICICLE-CORE]: ./icicle/
[ICICLE-RUST]: ./wrappers/rust/
[ICICLE-GO]: ./wrappers/golang/
[ICICLE-CORE-README]: ./icicle/README.md
[ICICLE-RUST-README]: ./wrappers/rust/README.md
[ICICLE-GO-README]: ./wrappers/golang/README.md
[documentation]: https://dev.ingonyama.com/icicle/overview
[examples]: ./examples/
<!-- End Links -->


@@ -1,54 +0,0 @@
extern crate criterion;
use criterion::{criterion_group, criterion_main, Criterion};
use icicle::test_bls12_381::{commit_batch_bls12_381, generate_random_points_bls12_381, set_up_scalars_bls12_381};
use icicle::utils::*;
#[cfg(feature = "g2")]
use icicle::{commit_batch_g2, field::ExtensionField};
use rustacuda::prelude::*;
const LOG_MSM_SIZES: [usize; 1] = [12];
const BATCH_SIZES: [usize; 2] = [128, 256];
fn bench_msm(c: &mut Criterion) {
let mut group = c.benchmark_group("MSM");
for log_msm_size in LOG_MSM_SIZES {
for batch_size in BATCH_SIZES {
let msm_size = 1 << log_msm_size;
let (scalars, _, _) = set_up_scalars_bls12_381(msm_size, 0, false);
let batch_scalars = vec![scalars; batch_size].concat();
let mut d_scalars = DeviceBuffer::from_slice(&batch_scalars[..]).unwrap();
let points = generate_random_points_bls12_381(msm_size, get_rng(None));
let batch_points = vec![points; batch_size].concat();
let mut d_points = DeviceBuffer::from_slice(&batch_points[..]).unwrap();
#[cfg(feature = "g2")]
let g2_points = generate_random_points::<ExtensionField>(msm_size, get_rng(None));
#[cfg(feature = "g2")]
let g2_batch_points = vec![g2_points; batch_size].concat();
#[cfg(feature = "g2")]
let mut d_g2_points = DeviceBuffer::from_slice(&g2_batch_points[..]).unwrap();
group
.sample_size(30)
.bench_function(
&format!("MSM of size 2^{} in batch {}", log_msm_size, batch_size),
|b| b.iter(|| commit_batch_bls12_381(&mut d_points, &mut d_scalars, batch_size)),
);
#[cfg(feature = "g2")]
group
.sample_size(10)
.bench_function(
&format!("G2 MSM of size 2^{} in batch {}", log_msm_size, batch_size),
|b| b.iter(|| commit_batch_g2(&mut d_g2_points, &mut d_scalars, batch_size)),
);
}
}
}
criterion_group!(msm_benches, bench_msm);
criterion_main!(msm_benches);


@@ -1,85 +0,0 @@
extern crate criterion;
use criterion::{criterion_group, criterion_main, Criterion};
use icicle::test_bls12_381::*;
const LOG_NTT_SIZES: [usize; 3] = [20, 9, 10];
const BATCH_SIZES: [usize; 3] = [1, 512, 1024];
fn bench_ntt(c: &mut Criterion) {
let mut group = c.benchmark_group("NTT");
for log_ntt_size in LOG_NTT_SIZES {
for batch_size in BATCH_SIZES {
let ntt_size = 1 << log_ntt_size;
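// Skip configurations whose total size is (presumably) too large to fit comfortably in GPU memory.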
if ntt_size * batch_size > 1 << 25 {
continue;
}
let scalar_samples = 20;
let (_, mut d_evals, mut d_domain) = set_up_scalars_bls12_381(ntt_size * batch_size, log_ntt_size, true);
group
.sample_size(scalar_samples)
.bench_function(
&format!("Scalar NTT of size 2^{} in batch {}", log_ntt_size, batch_size),
|b| b.iter(|| evaluate_scalars_batch_bls12_381(&mut d_evals, &mut d_domain, batch_size)),
);
group
.sample_size(scalar_samples)
.bench_function(
&format!("Scalar iNTT of size 2^{} in batch {}", log_ntt_size, batch_size),
|b| b.iter(|| interpolate_scalars_batch_bls12_381(&mut d_evals, &mut d_domain, batch_size)),
);
group
.sample_size(scalar_samples)
.bench_function(
&format!("Scalar inplace NTT of size 2^{} in batch {}", log_ntt_size, batch_size),
|b| b.iter(|| ntt_inplace_batch_bls12_381(&mut d_evals, &mut d_domain, batch_size, false, 0)),
);
group
.sample_size(scalar_samples)
.bench_function(
&format!("Scalar inplace iNTT of size 2^{} in batch {}", log_ntt_size, batch_size),
|b| b.iter(|| ntt_inplace_batch_bls12_381(&mut d_evals, &mut d_domain, batch_size, true, 0)),
);
drop(d_evals);
drop(d_domain);
if ntt_size * batch_size > 1 << 18 {
continue;
}
let point_samples = 10;
let (_, mut d_points_evals, mut d_domain) =
set_up_points_bls12_381(ntt_size * batch_size, log_ntt_size, true);
group
.sample_size(point_samples)
.bench_function(
&format!("EC NTT of size 2^{} in batch {}", log_ntt_size, batch_size),
|b| b.iter(|| interpolate_points_batch_bls12_381(&mut d_points_evals, &mut d_domain, batch_size)),
);
group
.sample_size(point_samples)
.bench_function(
&format!("EC iNTT of size 2^{} in batch {}", log_ntt_size, batch_size),
|b| b.iter(|| evaluate_points_batch_bls12_381(&mut d_points_evals, &mut d_domain, batch_size)),
);
drop(d_points_evals);
drop(d_domain);
}
}
}
criterion_group!(ntt_benches, bench_ntt);
criterion_main!(ntt_benches);


@@ -1,53 +0,0 @@
use std::env::{self, var};
use cmake::Config;
fn main() {
let cargo_dir = var("CARGO_MANIFEST_DIR").unwrap();
let profile = var("PROFILE").unwrap();
let target_output_dir = format!("{}/target/{}", cargo_dir, profile);
let build_output_dir = format!("{}/build", target_output_dir);
println!("cargo:rerun-if-env-changed=CXXFLAGS");
println!("cargo:rerun-if-changed=./icicle");
println!("cargo:rerun-if-changed=./target/{}", profile); // without this it ignores manual changes to build folder
let mut cmake = Config::new("./icicle");
cmake
.define("BUILD_TESTS", "OFF")
.out_dir(&target_output_dir)
.build_target("icicle");
let target_profile: &str = if profile == "release" { "Release" } else { "Debug" };
cmake.define("CMAKE_BUILD_TYPE", target_profile);
if cfg!(feature = "g2") {
cmake.define("G2_DEFINED", "");
}
cmake.build();
if cfg!(unix) {
if let Ok(cuda_path) = var("CUDA_HOME") {
println!("cargo:rustc-link-search=native={}/lib64", cuda_path);
} else {
println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
}
} else if cfg!(windows) {
let build_output_dir_cmake = format!("{}/{}", build_output_dir, target_profile);
println!("cargo:rustc-link-search={}", &build_output_dir_cmake);
}
println!("cargo:rustc-link-search={}", &build_output_dir);
println!("cargo:rustc-link-search={}", &target_output_dir);
println!("cargo:rustc-link-lib=ingo_icicle");
println!("cargo:rustc-link-lib=dylib=cuda");
println!("cargo:rustc-link-lib=dylib=cudart");
if cfg!(unix) {
println!("cargo:rustc-link-lib=dylib=stdc++");
}
}


@@ -1,21 +0,0 @@
{
"curve_name" : "bls12_377",
"modulus_p" : 8444461749428370424248824938781546531375899335154063827935233455917409239041,
"bit_count_p" : 253,
"limb_p" : 8,
"ntt_size" : 47,
"modulus_q" : 258664426012969094010652733694893533536393512754914660539884262666720468348340822774968888139573360124440321458177,
"bit_count_q" : 377,
"limb_q" : 12,
"root_of_unity" : 8065159656716812877374967518403273466521432693661810619979959746626482506078,
"weierstrass_b" : 1,
"weierstrass_b_g2_re" : 0,
"weierstrass_b_g2_im" : 155198655607781456406391640216936120121836107652948796323930557600032281009004493664981332883744016074664192874906,
"g1_gen_x" : 81937999373150964239938255573465948239988671502647976594219695644855304257327692006745978603320413799295628339695,
"g1_gen_y" : 241266749859715473739788878240585681733927191168601896383759122102112907357779751001206799952863815012735208165030,
"g2_gen_x_re" : 233578398248691099356572568220835526895379068987715365179118596935057653620464273615301663571204657964920925606294,
"g2_gen_x_im" : 140913150380207355837477652521042157274541796891053068589147167627541651775299824604154852141315666357241556069118,
"g2_gen_y_re" : 63160294768292073209381361943935198908131692476676907196754037919244929611450776219210369229519898517858833747423,
"g2_gen_y_im" : 149157405641012693445398062341192467754805999074082136895788947234480009303640899064710353187729182149407503257491,
"nonresidue" : -5
}


@@ -1,21 +0,0 @@
{
"curve_name" : "bls12_381",
"modulus_p" : 52435875175126190479447740508185965837690552500527637822603658699938581184513,
"bit_count_p" : 255,
"limb_p" : 8,
"ntt_size" : 32,
"modulus_q" : 4002409555221667393417789825735904156556882819939007885332058136124031650490837864442687629129015664037894272559787,
"bit_count_q" : 381,
"limb_q" : 12,
"root_of_unity" : 937917089079007706106976984802249742464848817460758522850752807661925904159,
"weierstrass_b" : 4,
"weierstrass_b_g2_re":4,
"weierstrass_b_g2_im":4,
"g1_gen_x" : 3685416753713387016781088315183077757961620795782546409894578378688607592378376318836054947676345821548104185464507,
"g1_gen_y" : 1339506544944476473020471379941921221584933875938349620426543736416511423956333506472724655353366534992391756441569,
"g2_gen_x_re" : 352701069587466618187139116011060144890029952792775240219908644239793785735715026873347600343865175952761926303160,
"g2_gen_x_im" : 3059144344244213709971259814753781636986470325476647558659373206291635324768958432433509563104347017837885763365758,
"g2_gen_y_re" : 1985150602287291935568054521177171638300868978215655730859378665066344726373823718423869104263333984641494340347905,
"g2_gen_y_im" : 927553665492332455747201965776037880757740193453592970025027978793976877002675564980949289727957565575433344219582,
"nonresidue" : -1
}


@@ -1,21 +0,0 @@
{
"curve_name" : "bn254",
"modulus_p" : 21888242871839275222246405745257275088548364400416034343698204186575808495617,
"bit_count_p" : 254,
"limb_p" : 8,
"ntt_size" : 28,
"modulus_q" : 21888242871839275222246405745257275088696311157297823662689037894645226208583,
"bit_count_q" : 254,
"limb_q" : 8,
"root_of_unity": 19103219067921713944291392827692070036145651957329286315305642004821462161904,
"weierstrass_b" : 3,
"weierstrass_b_g2_re" : 19485874751759354771024239261021720505790618469301721065564631296452457478373,
"weierstrass_b_g2_im" : 266929791119991161246907387137283842545076965332900288569378510910307636690,
"g1_gen_x" : 1,
"g1_gen_y" : 2,
"g2_gen_x_re" : 10857046999023057135944570762232829481370756359578518086990519993285655852781,
"g2_gen_x_im" : 11559732032986387107991004021392285783925812861821192530917403151452391805634,
"g2_gen_y_re" : 8495653923123431417604973247489272438418190587263600148770280649306958101930,
"g2_gen_y_im" : 4082367875863433681332203403145435568316851327593401208105741076214120093531,
"nonresidue" : -1
}


@@ -1,21 +0,0 @@
{
"curve_name" : "bw6_761",
"modulus_p" : 258664426012969094010652733694893533536393512754914660539884262666720468348340822774968888139573360124440321458177,
"bit_count_p" : 377,
"limb_p" : 12,
"ntt_size" : 46,
"modulus_q" : 6891450384315732539396789682275657542479668912536150109513790160209623422243491736087683183289411687640864567753786613451161759120554247759349511699125301598951605099378508850372543631423596795951899700429969112842764913119068299,
"bit_count_q" : 761,
"limb_q" : 24,
"root_of_unity" : 32863578547254505029601261939868325669770508939375122462904745766352256812585773382134936404344547323199885654433,
"weierstrass_b" : 6891450384315732539396789682275657542479668912536150109513790160209623422243491736087683183289411687640864567753786613451161759120554247759349511699125301598951605099378508850372543631423596795951899700429969112842764913119068298,
"weierstrass_b_g2_re" : 4,
"weierstrass_b_g2_im" : 0,
"g1_gen_x" : 6238772257594679368032145693622812838779005809760824733138787810501188623461307351759238099287535516224314149266511977132140828635950940021790489507611754366317801811090811367945064510304504157188661901055903167026722666149426237,
"g1_gen_y" : 2101735126520897423911504562215834951148127555913367997162789335052900271653517958562461315794228241561913734371411178226936527683203879553093934185950470971848972085321797958124416462268292467002957525517188485984766314758624099,
"g2_gen_x_re" : 6445332910596979336035888152774071626898886139774101364933948236926875073754470830732273879639675437155036544153105017729592600560631678554299562762294743927912429096636156401171909259073181112518725201388196280039960074422214428,
"g2_gen_x_im" : 1,
"g2_gen_y_re" : 562923658089539719386922163444547387757586534741080263946953401595155211934630598999300396317104182598044793758153214972605680357108252243146746187917218885078195819486220416605630144001533548163105316661692978285266378674355041,
"g2_gen_y_im" : 1,
"nonresidue" : -1
}


@@ -1,346 +0,0 @@
import json
import math
import os
from string import Template
import sys
argv_list = ['thisfile', 'curve_json', 'command']
new_curve_args = dict(zip(argv_list, sys.argv[:len(argv_list)] + [""]*(len(argv_list) - len(sys.argv))))
def to_hex(val: int, length):
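# Render val as comma-separated 32-bit hex limbs in little-endian limb order, zero-padded to 'length' hex characters.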
x = hex(val)[2:]
if len(x) % 8 != 0:
x = "0" * (8-len(x) % 8) + x
if len(x) != length:
x = "0" * (length-len(x)) + x
n = 8
chunks = [x[i:i+n] for i in range(0, len(x), n)][::-1]
s = ""
for c in chunks[:length // n]:
s += f'0x{c}, '
return s[:-2]
def compute_values(modulus, modulus_bit_count, limbs):
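# Precompute field constants: small multiples of the modulus, the Barrett constant m = floor(2^(2*bit_count)/modulus), and the Montgomery constant R = 2^(32*limbs) mod modulus together with its inverse.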
limb_size = 8*limbs
bit_size = 4*limb_size
modulus_ = to_hex(modulus,limb_size)
modulus_2 = to_hex(modulus*2,limb_size)
modulus_4 = to_hex(modulus*4,limb_size)
modulus_wide = to_hex(modulus,limb_size*2)
modulus_squared = to_hex(modulus*modulus,limb_size*2)
modulus_squared_2 = to_hex(modulus*modulus*2,limb_size*2)
modulus_squared_4 = to_hex(modulus*modulus*4,limb_size*2)
m_raw = int(math.floor(int(pow(2,2*modulus_bit_count) // modulus)))
m = to_hex(m_raw,limb_size)
one = to_hex(1,limb_size)
zero = to_hex(0,limb_size)
montgomery_r = to_hex(pow(2,bit_size,modulus),limb_size)
montgomery_r_inv = to_hex(pow(2,-bit_size,modulus),limb_size)
return (
modulus_,
modulus_2,
modulus_4,
modulus_wide,
modulus_squared,
modulus_squared_2,
modulus_squared_4,
m,
one,
zero,
montgomery_r,
montgomery_r_inv
)
def get_fq_params(modulus, modulus_bit_count, limbs, nonresidue):
(
modulus,
modulus_2,
modulus_4,
modulus_wide,
modulus_squared,
modulus_squared_2,
modulus_squared_4,
m,
one,
zero,
montgomery_r,
montgomery_r_inv
) = compute_values(modulus, modulus_bit_count, limbs)
limb_size = 8*limbs
nonresidue_is_negative = str(nonresidue < 0).lower()
nonresidue = abs(nonresidue)
return {
'fq_modulus': modulus,
'fq_modulus_2': modulus_2,
'fq_modulus_4': modulus_4,
'fq_modulus_wide': modulus_wide,
'fq_modulus_squared': modulus_squared,
'fq_modulus_squared_2': modulus_squared_2,
'fq_modulus_squared_4': modulus_squared_4,
'fq_m': m,
'fq_one': one,
'fq_zero': zero,
'fq_montgomery_r': montgomery_r,
'fq_montgomery_r_inv': montgomery_r_inv,
'nonresidue': nonresidue,
'nonresidue_is_negative': nonresidue_is_negative
}
def get_fp_params(modulus, modulus_bit_count, limbs, root_of_unity, size=0):
(
modulus_,
modulus_2,
modulus_4,
modulus_wide,
modulus_squared,
modulus_squared_2,
modulus_squared_4,
m,
one,
zero,
montgomery_r,
montgomery_r_inv
) = compute_values(modulus, modulus_bit_count, limbs)
limb_size = 8*limbs
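# Build NTT tables by repeated squaring of root_of_unity: roots of unity, their inverses, and inverses of powers of two (presumably for iNTT scaling).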
if size > 0:
omega = ''
omega_inv = ''
inv = ''
omegas = []
omegas_inv = []
for k in range(size):
if k == 0:
om = root_of_unity
else:
om = pow(om, 2, modulus)
omegas.append(om)
omegas_inv.append(pow(om, -1, modulus))
omegas.reverse()
omegas_inv.reverse()
for k in range(size):
omega += "\n {"+ to_hex(omegas[k],limb_size)+"}," if k>0 else " {"+ to_hex(omegas[k],limb_size)+"},"
omega_inv += "\n {"+ to_hex(omegas_inv[k],limb_size)+"}," if k>0 else " {"+ to_hex(omegas_inv[k],limb_size)+"},"
inv += "\n {"+ to_hex(pow(int(pow(2,k+1)), -1, modulus),limb_size)+"}," if k>0 else " {"+ to_hex(pow(int(pow(2,k+1)), -1, modulus),limb_size)+"},"
return {
'fp_modulus': modulus_,
'fp_modulus_2': modulus_2,
'fp_modulus_4': modulus_4,
'fp_modulus_wide': modulus_wide,
'fp_modulus_squared': modulus_squared,
'fp_modulus_squared_2': modulus_squared_2,
'fp_modulus_squared_4': modulus_squared_4,
'fp_m': m,
'fp_one': one,
'fp_zero': zero,
'fp_montgomery_r': montgomery_r,
'fp_montgomery_r_inv': montgomery_r_inv,
'omega': omega[:-1],
'omega_inv': omega_inv[:-1],
'inv': inv[:-1],
}
def get_generators(g1_gen_x, g1_gen_y, g2_gen_x_re, g2_gen_x_im, g2_gen_y_re, g2_gen_y_im, size):
return {
'fq_gen_x': to_hex(g1_gen_x, size),
'fq_gen_y': to_hex(g1_gen_y, size),
'fq_gen_x_re': to_hex(g2_gen_x_re, size),
'fq_gen_x_im': to_hex(g2_gen_x_im, size),
'fq_gen_y_re': to_hex(g2_gen_y_re, size),
'fq_gen_y_im': to_hex(g2_gen_y_im, size)
}
def get_weier_params(weierstrass_b, weierstrass_b_g2_re, weierstrass_b_g2_im, size):
return {
'weier_b': to_hex(weierstrass_b, size),
'weier_b_g2_re': to_hex(weierstrass_b_g2_re, size),
'weier_b_g2_im': to_hex(weierstrass_b_g2_im, size),
}
def get_params(config):
global ntt_size
curve_name = config["curve_name"]
modulus_p = config["modulus_p"]
bit_count_p = config["bit_count_p"]
limb_p = config["limb_p"]
ntt_size = config["ntt_size"]
modulus_q = config["modulus_q"]
bit_count_q = config["bit_count_q"]
limb_q = config["limb_q"]
root_of_unity = config["root_of_unity"]
nonresidue = config["nonresidue"]
if root_of_unity == modulus_p:
sys.exit("Invalid root_of_unity value; please update in curve parameters")
weierstrass_b = config["weierstrass_b"]
weierstrass_b_g2_re = config["weierstrass_b_g2_re"]
weierstrass_b_g2_im = config["weierstrass_b_g2_im"]
g1_gen_x = config["g1_gen_x"]
g1_gen_y = config["g1_gen_y"]
g2_generator_x_re = config["g2_gen_x_re"]
g2_generator_x_im = config["g2_gen_x_im"]
g2_generator_y_re = config["g2_gen_y_re"]
g2_generator_y_im = config["g2_gen_y_im"]
params = {
'curve_name_U': curve_name.upper(),
'fp_num_limbs': limb_p,
'fq_num_limbs': limb_q,
'fp_modulus_bit_count': bit_count_p,
'fq_modulus_bit_count': bit_count_q,
'num_omegas': ntt_size
}
fp_params = get_fp_params(modulus_p, bit_count_p, limb_p, root_of_unity, ntt_size)
fq_params = get_fq_params(modulus_q, bit_count_q, limb_q, nonresidue)
generators = get_generators(g1_gen_x, g1_gen_y, g2_generator_x_re, g2_generator_x_im, g2_generator_y_re, g2_generator_y_im, 8*limb_q)
weier_params = get_weier_params(weierstrass_b, weierstrass_b_g2_re, weierstrass_b_g2_im, 8*limb_q)
return {
**params,
**fp_params,
**fq_params,
**generators,
**weier_params
}
config = None
with open(new_curve_args['curve_json']) as json_file:
config = json.load(json_file)
curve_name_lower = config["curve_name"].lower()
curve_name_upper = config["curve_name"].upper()
limb_q = config["limb_q"]
limb_p = config["limb_p"]
# Create Cuda interface
newpath = f'./icicle/curves/{curve_name_lower}'
if not os.path.exists(newpath):
os.makedirs(newpath)
with open("./icicle/curves/curve_template/params.cuh.tmpl", "r") as params_file:
params_file_template = Template(params_file.read())
params = get_params(config)
params_content = params_file_template.safe_substitute(params)
with open(f'./icicle/curves/{curve_name_lower}/params.cuh', 'w') as f:
f.write(params_content)
if new_curve_args['command'] != '-update':
with open("./icicle/curves/curve_template/lde.cu.tmpl", "r") as lde_file:
template_content = Template(lde_file.read())
lde_content = template_content.safe_substitute(
CURVE_NAME_U=curve_name_upper,
CURVE_NAME_L=curve_name_lower
)
with open(f'./icicle/curves/{curve_name_lower}/lde.cu', 'w') as f:
f.write(lde_content)
with open("./icicle/curves/curve_template/msm.cu.tmpl", "r") as msm_file:
template_content = Template(msm_file.read())
msm_content = template_content.safe_substitute(
CURVE_NAME_U=curve_name_upper,
CURVE_NAME_L=curve_name_lower
)
with open(f'./icicle/curves/{curve_name_lower}/msm.cu', 'w') as f:
f.write(msm_content)
with open("./icicle/curves/curve_template/ve_mod_mult.cu.tmpl", "r") as ve_mod_mult_file:
template_content = Template(ve_mod_mult_file.read())
ve_mod_mult_content = template_content.safe_substitute(
CURVE_NAME_U=curve_name_upper,
CURVE_NAME_L=curve_name_lower
)
with open(f'./icicle/curves/{curve_name_lower}/ve_mod_mult.cu', 'w') as f:
f.write(ve_mod_mult_content)
with open(f'./icicle/curves/curve_template/curve_config.cuh.tmpl', 'r') as cc:
template_content = Template(cc.read())
cc_content = template_content.safe_substitute(
CURVE_NAME_U=curve_name_upper,
)
with open(f'./icicle/curves/{curve_name_lower}/curve_config.cuh', 'w') as f:
f.write(cc_content)
with open(f'./icicle/curves/curve_template/projective.cu.tmpl', 'r') as proj:
template_content = Template(proj.read())
proj_content = template_content.safe_substitute(
CURVE_NAME_U=curve_name_upper,
CURVE_NAME_L=curve_name_lower
)
with open(f'./icicle/curves/{curve_name_lower}/projective.cu', 'w') as f:
f.write(proj_content)
with open(f'./icicle/curves/curve_template/supported_operations.cu.tmpl', 'r') as supp_ops:
template_content = Template(supp_ops.read())
supp_ops_content = template_content.safe_substitute()
with open(f'./icicle/curves/{curve_name_lower}/supported_operations.cu', 'w') as f:
f.write(supp_ops_content)
with open('./icicle/curves/index.cu', 'r+') as f:
index_text = f.read()
if index_text.find(curve_name_lower) == -1:
f.write(f'\n#include "{curve_name_lower}/supported_operations.cu"')
# Create Rust interface and tests
if limb_p == limb_q:
with open("./src/curve_templates/curve_same_limbs.rs", "r") as curve_file:
content = curve_file.read()
content = content.replace("CURVE_NAME_U",curve_name_upper)
content = content.replace("CURVE_NAME_L",curve_name_lower)
content = content.replace("_limbs_p",str(limb_p * 8 * 4))
content = content.replace("limbs_p",str(limb_p))
text_file = open("./src/curves/"+curve_name_lower+".rs", "w")
n = text_file.write(content)
text_file.close()
else:
with open("./src/curve_templates/curve_different_limbs.rs", "r") as curve_file:
content = curve_file.read()
content = content.replace("CURVE_NAME_U",curve_name_upper)
content = content.replace("CURVE_NAME_L",curve_name_lower)
content = content.replace("_limbs_p",str(limb_p * 8 * 4))
content = content.replace("limbs_p",str(limb_p))
content = content.replace("_limbs_q",str(limb_q * 8 * 4))
content = content.replace("limbs_q",str(limb_q))
text_file = open("./src/curves/"+curve_name_lower+".rs", "w")
n = text_file.write(content)
text_file.close()
with open("./src/curve_templates/test.rs", "r") as test_file:
content = test_file.read()
content = content.replace("CURVE_NAME_U",curve_name_upper)
content = content.replace("CURVE_NAME_L",curve_name_lower)
text_file = open("./src/test_"+curve_name_lower+".rs", "w")
n = text_file.write(content)
text_file.close()
with open('./src/curves/mod.rs', 'r+') as f:
mod_text = f.read()
if mod_text.find(curve_name_lower) == -1:
f.write('\npub mod ' + curve_name_lower + ';')
with open('./src/lib.rs', 'r+') as f:
lib_text = f.read()
if lib_text.find(curve_name_lower) == -1:
f.write('\npub mod ' + curve_name_lower + ';')

docs/.codespellignore

@@ -0,0 +1 @@
ICICLE

docs/.gitignore

@@ -0,0 +1,17 @@
.docusaurus/
node_modules/
yarn.lock
.DS_Store
# tex build artifacts
.aux
.bbl
.bcf
.blg
.fdb_latexmk
.fls
.log
.out
.xml
.gz
.toc

docs/.prettierignore

@@ -0,0 +1,17 @@
.docusaurus/
node_modules/
yarn.lock
.DS_Store
# tex build artifacts
.aux
.bbl
.bcf
.blg
.fdb_latexmk
.fls
.log
.out
.xml
.gz
.toc

docs/.prettierrc

@@ -0,0 +1,10 @@
{
"semi": false,
"singleQuote": true,
"trailingComma": "es5",
"printWidth": 80,
"tabWidth": 2,
"useTabs": false,
"proseWrap": "preserve",
"endOfLine": "lf"
}

docs/CNAME

@@ -0,0 +1 @@
dev.ingonyama.com

docs/README.md

@@ -0,0 +1,39 @@
# Website
This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
### Installation
```
$ npm i
```
### Local Development
```
$ npm start
```
This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
### Build
```
$ npm run build
```
This command generates static content into the `build` directory and can be served using any static content hosting service.
### Deployment
Using SSH:
```
$ USE_SSH=true npm run deploy
```
Not using SSH:
```
$ GIT_USER=<Your GitHub username> npm run deploy
```

docs/babel.config.js

@@ -0,0 +1,3 @@
module.exports = {
presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};

docs/docs/ZKContainers.md

@@ -0,0 +1,12 @@
# ZKContainer
We found that developing ZK provers with ICICLE gives developers the ability to scale a prover across many machines and many GPUs. To make this possible we developed the ZKContainer.
## What is a ZKContainer?
A ZKContainer is a standardized, optimized and secure Docker container that we configured with ICICLE applications in mind. A developer using our ZKContainer can deploy an ICICLE application on a single machine or on a thousand GPU machines in a data center with minimal concerns regarding compatibility.
ZKContainer has been used by Ingonyama clients to achieve scalability across large data centers.
We suggest you read our [article](https://medium.com/@ingonyama/product-announcement-zk-containers-0e2a1f2d0a2b) regarding ZKContainers to understand the benefits of using them.
![ZKContainer inside a ZK data center](../static/img/architecture-zkcontainer.png)


@@ -0,0 +1,23 @@
# Contributor's Guide
We welcome all contributions with open arms. At Ingonyama we take a village approach, believing it takes many hands and minds to build an ecosystem.
## Contributing to ICICLE
- Make suggestions or report bugs via [GitHub issues](https://github.com/ingonyama-zk/icicle/issues)
- Contribute to ICICLE by opening a [pull request](https://github.com/ingonyama-zk/icicle/pulls).
- Contribute to our [documentation](https://github.com/ingonyama-zk/icicle/tree/main/docs) and [examples](https://github.com/ingonyama-zk/icicle/tree/main/examples).
- Ask questions on Discord
### Opening a pull request
When opening a [pull request](https://github.com/ingonyama-zk/icicle/pulls) please keep the following in mind.
- `Clear Purpose` - The pull request should solve a single issue and be clean of any unrelated changes.
- `Clear description` - If the pull request is for a new feature, describe what you built, why you added it, and how best to test it. For bug fixes, please describe the issue and the solution.
- `Consistent style` - Rust and Golang code should be linted by the official linters (golang fmt and rust fmt) and maintain a proper style. For CUDA and C++ code we use [`clang-format`](https://github.com/ingonyama-zk/icicle/blob/main/.clang-format), [here](https://github.com/ingonyama-zk/icicle/blob/605c25f9d22135c54ac49683b710fe2ce06e2300/.github/workflows/main-format.yml#L46) you can see how we run it.
- `Minimal Tests` - Please add tests that cover basic usage of your changes.
## Questions?
Find us on [Discord](https://discord.gg/6vYrE7waPj).

docs/docs/grants.md

@@ -0,0 +1,23 @@
# Ingonyama Grant programs
Ingonyama understands the importance of supporting and fostering a vibrant community of researchers and builders to advance ZK. To encourage progress, we are not only developing in the open but also sharing resources with researchers and builders through various programs.
## ICICLE ZK-GPU Ecosystem Grant
Ingonyama invites researchers and practitioners to collaborate in advancing ZK acceleration. We are allocating $100,000 for grants to support this initiative.
### Bounties & Grants
Eligibility for grants includes:
1. **Students**: Utilize ICICLE in your research.
2. **Performance Improvement**: Enhance the performance of accelerated primitives in ICICLE.
3. **Protocol Porting**: Migrate existing ZK protocols to ICICLE.
4. **New Primitives**: Contribute new primitives to ICICLE.
5. **Benchmarking**: Compare ZK benchmarks against ICICLE.
## Contact
For questions or submissions: [grants@ingonyama.com](mailto:grants@ingonyama.com)
**Read the full article [here](https://www.ingonyama.com/blog/icicle-for-researchers-grants-challenges)**


@@ -0,0 +1,138 @@
# Run ICICLE on Google Colab
Google Colab lets you use a GPU free of charge: an Nvidia T4 GPU with 16 GB of memory, capable of running the latest CUDA (tested on CUDA 12.2).
Since Colab can run shell commands, a user can also install frameworks and load git repositories into the Colab workspace.
## Prepare Colab environment
The first thing to do in a notebook is to set the runtime type to a T4 GPU.
- In the upper corner, click the dropdown menu and select "Change runtime type"
![Change runtime](../../static/img/colab_change_runtime.png)
- In the window select "T4 GPU" and press Save
![T4 GPU](../../static/img/t4_gpu.png)
Installing Rust is rather simple; just execute the following command:
```sh
!apt install rustc cargo
```
To test the installation of Rust:
```sh
!rustc --version
!cargo --version
```
A successful installation will print the rustc and cargo versions; a faulty installation will look like this:
```sh
/bin/bash: line 1: rustc: command not found
/bin/bash: line 1: cargo: command not found
```
Now we will check the environment:
```sh
!nvcc --version
!gcc --version
!cmake --version
!nvidia-smi
```
A correct environment should print the results with no bash errors for the `nvidia-smi` command and report a **Tesla T4 GPU**:
```sh
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
cmake version 3.27.9
CMake suite maintained and supported by Kitware (kitware.com/cmake).
Wed Jan 17 13:10:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 39C P8 9W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
```
## Cloning ICICLE and running a test
Now we are ready to clone the ICICLE repository:
```sh
!git clone https://github.com/ingonyama-zk/icicle.git
```
We can now browse the repository and run tests to check the runtime environment:
```sh
!ls -la
%cd icicle
```
Let's run a test!
Navigate to `icicle/wrappers/rust/icicle-curves/icicle-bn254` and run `cargo test`:
```sh
%cd wrappers/rust/icicle-curves/icicle-bn254/
!cargo test --release
```
:::note
Compiling the first time may take a while
:::
The test run should end like this:
```sh
running 15 tests
test curve::tests::test_ark_point_convert ... ok
test curve::tests::test_ark_scalar_convert ... ok
test curve::tests::test_affine_projective_convert ... ok
test curve::tests::test_point_equality ... ok
test curve::tests::test_field_convert_montgomery ... ok
test curve::tests::test_scalar_equality ... ok
test curve::tests::test_points_convert_montgomery ... ok
test msm::tests::test_msm ... ok
test msm::tests::test_msm_skewed_distributions ... ok
test ntt::tests::test_ntt ... ok
test ntt::tests::test_ntt_arbitrary_coset ... ok
test msm::tests::test_msm_batch has been running for over 60 seconds
test msm::tests::test_msm_batch ... ok
test ntt::tests::test_ntt_coset_from_subgroup ... ok
test ntt::tests::test_ntt_device_async ... ok
test ntt::tests::test_ntt_batch ... ok
test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 99.39s
```
Voilà, ICICLE in Colab!

# Golang bindings
Golang bindings allow you to use ICICLE as a golang library.
The source code for all Golang libraries can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang).
The Golang bindings are comprised of multiple packages:
- [`core`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/core), which defines all shared methods and structures, such as configuration structures or memory slices.
- [`cuda-runtime`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/cuda_runtime), which defines abstractions for CUDA methods for allocating memory, initializing and managing streams, and `DeviceContext`, which enables users to define and keep track of devices.
Each curve has its own package which you can find [here](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/curves). If your project uses BN254 you only need to install that single package named [`bn254`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/golang/curves/bn254).
## Using ICICLE Golang bindings in your project
To add ICICLE to your `go.mod` file:
```bash
go get github.com/ingonyama-zk/icicle
```
If you want to use a specific branch:
```bash
go get github.com/ingonyama-zk/icicle@<branch_name>
```
For a specific commit:
```bash
go get github.com/ingonyama-zk/icicle@<commit_id>
```
To build the shared libraries you can run this script:
```
./build <curve> [G2_enabled]
curve - The name of the curve to build or "all" to build all curves
G2_enabled - Optional - To build with G2 enabled
```
For example if you want to build all curves with G2 enabled you would run:
```bash
./build.sh all ON
```
If you are interested in building a specific curve you would run:
```bash
./build.sh bls12_381 ON
```
Now you can import ICICLE into your project
```golang
import (
"github.com/stretchr/testify/assert"
"testing"
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
cr "github.com/ingonyama-zk/icicle/wrappers/golang/cuda_runtime"
)
...
```
## Running tests
To run all tests, for all curves:
```bash
go test --tags=g2 ./... -count=1
```
If you don't want to include g2 tests then drop `--tags=g2`.
If you wish to run tests for a specific curve:
```bash
go test <path_to_curve> -count=1
```
## How do Golang bindings work?
The libraries produced from the CUDA code compilation are used to bind Golang to ICICLE's CUDA code.
1. These libraries (named `libingo_<curve>.a`) can be imported in your Go project to leverage the GPU accelerated functionalities provided by ICICLE.
2. In your Go project, you can use `cgo` to link these libraries. Here's a basic example on how you can use `cgo` to link these libraries:
```go
/*
#cgo LDFLAGS: -L/path/to/shared/libs -lingo_bn254
#include "icicle.h" // make sure you use the correct header file(s)
*/
import "C"
func main() {
// Now you can call the C functions from the ICICLE libraries.
// Note that C function calls are prefixed with 'C.' in Go code.
}
```
Replace `/path/to/shared/libs` with the actual path where the shared libraries are located on your system.

# MSM Pre-computation
To understand the theory behind MSM pre computation technique refer to Niall Emmart's [talk](https://youtu.be/KAWlySN7Hm8?feature=shared&t=1734).
### Supported curves
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`
## Core package
## MSM `PrecomputeBases`
`PrecomputeBases` and `G2PrecomputeBases` exist for all supported curves.
#### Description
This function extends each provided base point $P$ with its multiples $2^lP, 2^{2l}P, \ldots, 2^{(\text{precompute\_factor} - 1) \cdot l}P$, where $l$ is a level of precomputation determined by the `precompute_factor`. The extended set of points allows the MSM algorithm to leverage precomputed multiples of the base points, reducing the number of point additions required during the computation.
The precomputation process is crucial for optimizing MSM operations, especially when dealing with large sets of points and scalars. By precomputing and storing multiples of the base points, the MSM function can more efficiently compute the scalar-point multiplications.
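For intuition, here is a small worked illustration, under the assumption (for this sketch only, not a statement of the exact kernel behavior) that $l = \lceil \text{bitsize} / \text{precompute\_factor} \rceil$: with a 254-bit scalar field and `precompute_factor = 4`, we get $l = 64$, so each base $P$ is stored alongside

$$
2^{64}P, \quad 2^{128}P, \quad 2^{192}P
$$

Each 254-bit scalar $a = a_0 + a_1 2^{64} + a_2 2^{128} + a_3 2^{192}$ then satisfies $aP = a_0 P + a_1 (2^{64}P) + a_2 (2^{128}P) + a_3 (2^{192}P)$, turning one multiplication by a wide scalar into four multiplications by narrow scalars against precomputed points.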
#### `PrecomputeBases`
Precomputes bases for MSM by extending each base point with its multiples.
```go
func PrecomputeBases(points core.HostOrDeviceSlice, precomputeFactor int32, c int32, ctx *cr.DeviceContext, outputBases core.DeviceSlice) cr.CudaError
```
##### Parameters
- **`points`**: A slice of the original affine points to be extended with their multiples.
- **`precomputeFactor`**: Determines the total number of points to precompute for each base point.
- **`c`**: Currently unused; reserved for future compatibility.
- **`ctx`**: CUDA device context specifying the execution environment.
- **`outputBases`**: The device slice allocated for storing the extended bases.
##### Example
```go
cfg := GetDefaultMSMConfig()
points := GenerateAffinePoints(1024)
precomputeFactor := int32(8)
var precomputeOut core.DeviceSlice
_, e := precomputeOut.Malloc(points[0].Size()*points.Len()*int(precomputeFactor), points[0].Size())
if e != cr.CudaSuccess {
	log.Fatalf("Malloc failed: %v", e)
}
err := PrecomputeBases(points, precomputeFactor, 0, &cfg.Ctx, precomputeOut)
if err != cr.CudaSuccess {
log.Fatalf("PrecomputeBases failed: %v", err)
}
```
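Once the extended bases are on the device they are meant to be fed back into MSM. A hedged sketch, assuming `Msm` accepts the extended bases when `cfg.PrecomputeFactor` matches the factor used above:
```go
...
// reuse the extended bases in a subsequent MSM call; scalars and out
// are prepared exactly as in the MSM examples
cfg.PrecomputeFactor = precomputeFactor
err = Msm(scalars, precomputeOut, &cfg, out)
...
```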
#### `G2PrecomputeBases`
This method is the same as `PrecomputeBases` but for G2 points. Extends each G2 curve base point with its multiples for optimized MSM computations.
```go
func G2PrecomputeBases(points core.HostOrDeviceSlice, precomputeFactor int32, c int32, ctx *cr.DeviceContext, outputBases core.DeviceSlice) cr.CudaError
```
##### Parameters
- **`points`**: A slice of G2 curve points to be extended.
- **`precomputeFactor`**: The total number of points to precompute for each base.
- **`c`**: Reserved for future use to ensure compatibility with MSM operations.
- **`ctx`**: Specifies the CUDA device context for execution.
- **`outputBases`**: Allocated device slice for the extended bases.
##### Example
```go
cfg := G2GetDefaultMSMConfig()
points := G2GenerateAffinePoints(1024)
precomputeFactor := int32(8)
var precomputeOut core.DeviceSlice
_, e := precomputeOut.Malloc(points[0].Size()*points.Len()*int(precomputeFactor), points[0].Size())
if e != cr.CudaSuccess {
	log.Fatalf("Malloc failed: %v", e)
}
err := G2PrecomputeBases(points, precomputeFactor, 0, &cfg.Ctx, precomputeOut)
if err != cr.CudaSuccess {
log.Fatalf("G2PrecomputeBases failed: %v", err)
}
```
### Benchmarks
Benchmarks were performed on an Nvidia RTX 3090Ti.
| Pre-computation factor | bn254 size `2^20` MSM, ms. | bn254 size `2^12` MSM, size `2^10` batch, ms. | bls12-381 size `2^20` MSM, ms. | bls12-381 size `2^12` MSM, size `2^10` batch, ms. |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| 1 | 14.1 | 82.8 | 25.5 | 136.7 |
| 2 | 11.8 | 76.6 | 20.3 | 123.8 |
| 4 | 10.9 | 73.8 | 18.1 | 117.8 |
| 8 | 10.6 | 73.7 | 17.2 | 116.0 |

# MSM
### Supported curves
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`
## MSM Example
```go
package main
import (
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
cr "github.com/ingonyama-zk/icicle/wrappers/golang/cuda_runtime"
)
func main() {
// Obtain the default MSM configuration.
cfg := GetDefaultMSMConfig()
// Define the size of the problem, here 2^18.
size := 1 << 18
// Generate scalars and points for the MSM operation.
scalars := GenerateScalars(size)
points := GenerateAffinePoints(size)
// Create a CUDA stream for asynchronous operations.
stream, _ := cr.CreateStream()
var p Projective
// Allocate memory on the device for the result of the MSM operation.
var out core.DeviceSlice
_, e := out.MallocAsync(p.Size(), p.Size(), stream)
if e != cr.CudaSuccess {
panic(e)
}
// Set the CUDA stream in the MSM configuration.
cfg.Ctx.Stream = &stream
cfg.IsAsync = true
// Perform the MSM operation.
e = Msm(scalars, points, &cfg, out)
if e != cr.CudaSuccess {
panic(e)
}
// Allocate host memory for the results and copy the results from the device.
outHost := make(core.HostSlice[Projective], 1)
cr.SynchronizeStream(&stream)
outHost.CopyFromDevice(&out)
// Free the device memory allocated for the results.
out.Free()
}
```
## MSM Method
```go
func Msm(scalars core.HostOrDeviceSlice, points core.HostOrDeviceSlice, cfg *core.MSMConfig, results core.HostOrDeviceSlice) cr.CudaError
```
### Parameters
- **scalars**: A slice containing the scalars for multiplication. It can reside either in host memory or device memory.
- **points**: A slice containing the points to be multiplied with scalars. Like scalars, these can also be in host or device memory.
- **cfg**: A pointer to an `MSMConfig` object, which contains various configuration options for the MSM operation.
- **results**: A slice where the results of the MSM operation will be stored. This slice can be in host or device memory.
### Return Value
- **CudaError**: Returns a CUDA error code indicating the success or failure of the MSM operation.
## MSMConfig
The `MSMConfig` structure holds configuration parameters for the MSM operation, allowing customization of its behavior to optimize performance based on the specifics of the operation or the underlying hardware.
```go
type MSMConfig struct {
Ctx cr.DeviceContext
PrecomputeFactor int32
C int32
Bitsize int32
LargeBucketFactor int32
batchSize int32
areScalarsOnDevice bool
AreScalarsMontgomeryForm bool
arePointsOnDevice bool
ArePointsMontgomeryForm bool
areResultsOnDevice bool
IsBigTriangle bool
IsAsync bool
}
```
### Fields
- **Ctx**: Device context containing details like device id and stream.
- **PrecomputeFactor**: Controls the number of extra points to pre-compute.
- **C**: Window bitsize, a key parameter in the "bucket method" for MSM.
- **Bitsize**: Number of bits of the largest scalar.
- **LargeBucketFactor**: Sensitivity to frequently occurring buckets.
- **batchSize**: Number of results to compute in one batch.
- **areScalarsOnDevice**: Indicates if scalars are located on the device.
- **AreScalarsMontgomeryForm**: True if scalars are in Montgomery form.
- **arePointsOnDevice**: Indicates if points are located on the device.
- **ArePointsMontgomeryForm**: True if point coordinates are in Montgomery form.
- **areResultsOnDevice**: Indicates if results are stored on the device.
- **IsBigTriangle**: If `true` MSM will run in Large triangle accumulation if `false` Bucket accumulation will be chosen. Default value: false.
- **IsAsync**: If true, runs MSM asynchronously.
### Default Configuration
Use `GetDefaultMSMConfig` to obtain a default configuration, which can then be customized as needed.
```go
func GetDefaultMSMConfig() MSMConfig
```
## How do I toggle between the supported algorithms?
When creating your MSM config you may state which algorithm you wish to use: `cfg.IsBigTriangle = true` will activate Large triangle accumulation and `cfg.IsBigTriangle = false` (the default) will activate Bucket accumulation.
```go
...
// Obtain the default MSM configuration.
cfg := GetDefaultMSMConfig()
cfg.IsBigTriangle = true
...
```
## How do I toggle between MSM modes?
Toggling between MSM modes occurs automatically based on the number of results you are expecting from the `MSM` function.
The number of results is interpreted from the size of `var out core.DeviceSlice`. Thus it's important when allocating memory for `var out core.DeviceSlice` to make sure that you are allocating `<number of results> x <size of a single point>`.
```go
...
batchSize := 3
var p G2Projective
var out core.DeviceSlice
out.Malloc(batchSize*p.Size(), p.Size())
...
```
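To read the batch results back, the pattern from the single-result example above extends naturally:
```go
...
// copy all batchSize results from the device and release the device buffer
outHost := make(core.HostSlice[G2Projective], batchSize)
outHost.CopyFromDevice(&out)
out.Free()
...
```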
## Support for G2 group
To activate G2 support first you must make sure you are building the static libraries with G2 feature enabled.
```bash
./build.sh bls12_381 ON
```
Now when importing `icicle`, you should have access to G2 features.
```go
import (
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
)
```
These features include `G2Projective` and `G2Affine` points as well as a `G2Msm` method.
```go
...
cfg := GetDefaultMSMConfig()
size := 1 << 12
batchSize := 3
totalSize := size * batchSize
scalars := GenerateScalars(totalSize)
points := G2GenerateAffinePoints(totalSize)
var p G2Projective
var out core.DeviceSlice
out.Malloc(batchSize*p.Size(), p.Size())
G2Msm(scalars, points, &cfg, out)
...
```
`G2Msm` works the same way as normal MSM; the difference is that it uses G2 points.
Additionally, when you are building your application, make sure to use the g2 build tag:
```bash
go build -tags=g2
```

# Multi GPU APIs
To learn more about the theory of Multi GPU programming refer to [this part](../multi-gpu.md) of documentation.
Here we will cover the core multi GPU APIs and an [example](#a-multi-gpu-example).
## A Multi GPU example
In this example we will display how you can:
1. Fetch the number of devices installed on a machine.
2. For every GPU, launch a thread and set an active device per thread.
3. Execute an MSM on each GPU.
```go
func main() {
	numDevices, _ := cr.GetDeviceCount()
	fmt.Println("There are ", numDevices, " devices available")
	wg := sync.WaitGroup{}

	for i := 0; i < numDevices; i++ {
		wg.Add(1)
		// RunOnDevice makes sure each MSM runs on a single host thread
		cr.RunOnDevice(i, func(args ...any) {
			defer wg.Done()
			cfg := GetDefaultMSMConfig()
			cfg.IsAsync = true
			for _, power := range []int{10, 18} {
				size := 1 << power // 2^power

				// generate random scalars
				scalars := GenerateScalars(size)
				points := GenerateAffinePoints(size)

				// create a stream and allocate the result slice
				stream, _ := cr.CreateStream()
				var p Projective
				var out core.DeviceSlice
				_, e := out.MallocAsync(p.Size(), p.Size(), stream)
				if e != cr.CudaSuccess {
					panic("allocating device memory failed")
				}
				// assign the stream to the device context
				cfg.Ctx.Stream = &stream

				// execute MSM
				e = Msm(scalars, points, &cfg, out)
				if e != cr.CudaSuccess {
					panic("MSM failed")
				}
				// read the result from the device
				outHost := make(core.HostSlice[Projective], 1)
				outHost.CopyFromDeviceAsync(&out, stream)
				out.FreeAsync(stream)

				// sync the stream
				cr.SynchronizeStream(&stream)
			}
		})
	}
	wg.Wait()
}
```
This example demonstrates a basic pattern for distributing tasks across multiple GPUs. The `RunOnDevice` function ensures that each goroutine is executed on its designated GPU and a corresponding thread.
## Device Management API
To streamline device management, the `cuda_runtime` package offers methods for dealing with devices.
### `RunOnDevice`
Runs a given function on a specific GPU device, ensuring that all CUDA calls within the function are executed on the selected device.
In Go, most concurrency is handled via goroutines. However, there is no guarantee that a goroutine stays on a specific host thread.
`RunOnDevice` was designed to solve this caveat and ensure that the goroutine stays on a specific host thread.
`RunOnDevice` locks a goroutine to a specific host thread, sets the current GPU device, runs the provided function, and unlocks the goroutine from the host thread after the provided function finishes.
While the goroutine is locked to the host thread, the Go runtime will not assign other goroutines to that host thread.
**Parameters:**
- `deviceId int`: The ID of the device on which to run the provided function. Device IDs start from 0.
- `funcToRun func(args ...any)`: The function to be executed on the specified device.
- `args ...any`: Arguments to be passed to `funcToRun`.
**Behavior:**
- The function `funcToRun` is executed in a new goroutine that is locked to a specific OS thread to ensure that all CUDA calls within the function target the specified device.
- It's important to note that any goroutines launched within `funcToRun` are not automatically bound to the same GPU device. If necessary, `RunOnDevice` should be called again within such goroutines with the same `deviceId`.
**Example:**
```go
RunOnDevice(0, func(args ...any) {
fmt.Println("This runs on GPU 0")
// CUDA-related operations here will target GPU 0
}, nil)
```
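Because `funcToRun` is launched in a new goroutine, `RunOnDevice` does not block the caller; if you need to wait for the work to finish, wrap the call with a `sync.WaitGroup`, as in the example at the top of this page:
```go
var wg sync.WaitGroup
wg.Add(1)
RunOnDevice(0, func(args ...any) {
	defer wg.Done()
	// CUDA work here targets GPU 0
})
wg.Wait()
```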
### `SetDevice`
Sets the active device for the current host thread. All subsequent CUDA calls made from this thread will target the specified device.
**Parameters:**
- `device int`: The ID of the device to set as the current device.
**Returns:**
- `CudaError`: Error code indicating the success or failure of the operation.
### `GetDeviceCount`
Retrieves the number of CUDA-capable devices available on the host.
**Returns:**
- `(int, CudaError)`: The number of devices and an error code indicating the success or failure of the operation.
### `GetDevice`
Gets the ID of the currently active device for the calling host thread.
**Returns:**
- `(int, CudaError)`: The ID of the current device and an error code indicating the success or failure of the operation.
### `GetDeviceFromPointer`
Retrieves the device associated with a given pointer.
**Parameters:**
- `ptr unsafe.Pointer`: Pointer to query.
**Returns:**
- `int`: The device ID associated with the memory pointed to by `ptr`.
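Putting these together, a minimal hedged sketch (assuming the `cuda_runtime` package is imported as `cr`) that enumerates the devices and activates each one in turn:
```go
// list every CUDA device and make each one current in turn
count, err := cr.GetDeviceCount()
if err != cr.CudaSuccess {
	panic("GetDeviceCount failed")
}
for i := 0; i < count; i++ {
	if e := cr.SetDevice(i); e != cr.CudaSuccess {
		panic("SetDevice failed")
	}
	current, _ := cr.GetDevice()
	fmt.Println("active device:", current)
}
```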
This documentation should provide a clear understanding of how to effectively manage multiple GPUs in Go applications using CUDA, with a particular emphasis on the `RunOnDevice` function for executing tasks on specific GPUs.

# NTT
### Supported curves
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`
## NTT Example
```go
package main
import (
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
cr "github.com/ingonyama-zk/icicle/wrappers/golang/cuda_runtime"
)
func main() {
// Obtain the default NTT configuration with a predefined coset generator.
cfg := GetDefaultNttConfig()
// Define the size of the input scalars.
size := 1 << 18
// Generate scalars for the NTT operation.
scalars := GenerateScalars(size)
// Set the direction of the NTT (forward or inverse).
dir := core.KForward
// Allocate memory for the results of the NTT operation.
results := make(core.HostSlice[ScalarField], size)
// Perform the NTT operation.
	err := Ntt(scalars, dir, &cfg, results)
	if err.CudaErrorCode != cr.CudaSuccess {
		panic("NTT operation failed")
	}
}
```
## NTT Method
```go
func Ntt[T any](scalars core.HostOrDeviceSlice, dir core.NTTDir, cfg *core.NTTConfig[T], results core.HostOrDeviceSlice) core.IcicleError
```
### Parameters
- **scalars**: A slice containing the input scalars for the transform. It can reside either in host memory or device memory.
- **dir**: The direction of the NTT operation (`KForward` or `KInverse`).
- **cfg**: A pointer to an `NTTConfig` object, containing configuration options for the NTT operation.
- **results**: A slice where the results of the NTT operation will be stored. This slice can be in host or device memory.
### Return Value
- **IcicleError**: Returns an ICICLE error (wrapping the underlying CUDA error code) indicating the success or failure of the NTT operation.
## NTT Configuration (NTTConfig)
The `NTTConfig` structure holds configuration parameters for the NTT operation, allowing customization of its behavior to optimize performance based on the specifics of your protocol.
```go
type NTTConfig[T any] struct {
Ctx cr.DeviceContext
CosetGen T
BatchSize int32
ColumnsBatch bool
Ordering Ordering
areInputsOnDevice bool
areOutputsOnDevice bool
IsAsync bool
NttAlgorithm NttAlgorithm
}
```
### Fields
- **Ctx**: Device context containing details like device ID and stream ID.
- **CosetGen**: Coset generator used for coset (i)NTTs, defaulting to no coset being used.
- **BatchSize**: The number of NTTs to compute in one operation, defaulting to 1.
- **ColumnsBatch**: If true the function will compute the NTTs over the columns of the input matrix and not over the rows. Defaults to `false`.
- **Ordering**: Ordering of inputs and outputs (`KNN`, `KNR`, `KRN`, `KRR`, `KMN`, `KNM`), affecting how data is arranged.
- **areInputsOnDevice**: Indicates if input scalars are located on the device.
- **areOutputsOnDevice**: Indicates if results are stored on the device.
- **IsAsync**: Controls whether the NTT operation runs asynchronously.
- **NttAlgorithm**: Explicitly select the NTT algorithm. Default value: Auto (the implementation selects radix-2 or mixed-radix algorithm based on heuristics).
### Default Configuration
Use `GetDefaultNTTConfig` to obtain a default configuration, customizable as needed.
```go
func GetDefaultNTTConfig[T any](cosetGen T) NTTConfig[T]
```
### Initializing the NTT Domain
Before performing NTT operations, it's necessary to initialize the NTT domain; it only needs to be called once per GPU since the twiddles are cached.
```go
func InitDomain(primitiveRoot ScalarField, ctx cr.DeviceContext, fastTwiddles bool) core.IcicleError
```
This function initializes the domain with a given primitive root, optionally using fast twiddle factors to optimize the computation.
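A minimal hedged sketch of the call order, assuming `primitiveRoot` already holds a suitable root of unity for your maximum NTT size and that your version of the bindings exposes `cr.GetDefaultDeviceContext`:
```go
// once per GPU, before any Ntt calls; `false` opts out of fast twiddles
ctx, _ := cr.GetDefaultDeviceContext()
InitDomain(primitiveRoot, ctx, false)
```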

# Vector Operations
## Overview
The VecOps API provides efficient vector operations such as addition, subtraction, and multiplication.
## Example
### Vector addition
```go
package main
import (
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
cr "github.com/ingonyama-zk/icicle/wrappers/golang/cuda_runtime"
)
func main() {
testSize := 1 << 12
a := GenerateScalars(testSize)
b := GenerateScalars(testSize)
out := make(core.HostSlice[ScalarField], testSize)
cfg := core.DefaultVecOpsConfig()
// Perform vector addition
err := VecOp(a, b, out, cfg, core.Add)
if err != cr.CudaSuccess {
panic("Vector addition failed")
}
}
```
### Vector Subtraction
```go
package main
import (
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
cr "github.com/ingonyama-zk/icicle/wrappers/golang/cuda_runtime"
)
func main() {
testSize := 1 << 12
a := GenerateScalars(testSize)
b := GenerateScalars(testSize)
out := make(core.HostSlice[ScalarField], testSize)
cfg := core.DefaultVecOpsConfig()
// Perform vector subtraction
err := VecOp(a, b, out, cfg, core.Sub)
if err != cr.CudaSuccess {
panic("Vector subtraction failed")
}
}
```
### Vector Multiplication
```go
package main
import (
"github.com/ingonyama-zk/icicle/wrappers/golang/core"
cr "github.com/ingonyama-zk/icicle/wrappers/golang/cuda_runtime"
)
func main() {
testSize := 1 << 12
a := GenerateScalars(testSize)
b := GenerateScalars(testSize)
out := make(core.HostSlice[ScalarField], testSize)
cfg := core.DefaultVecOpsConfig()
// Perform vector multiplication
err := VecOp(a, b, out, cfg, core.Mul)
if err != cr.CudaSuccess {
panic("Vector multiplication failed")
}
}
```
## VecOps Method
```go
func VecOp(a, b, out core.HostOrDeviceSlice, config core.VecOpsConfig, op core.VecOps) (ret cr.CudaError)
```
### Parameters
- **a**: The first input vector.
- **b**: The second input vector.
- **out**: The output vector where the result of the operation will be stored.
- **config**: A `VecOpsConfig` object containing various configuration options for the vector operations.
- **op**: The operation to perform, specified as one of the constants (`Sub`, `Add`, `Mul`) from the `VecOps` type.
### Return Value
- **CudaError**: Returns a CUDA error code indicating the success or failure of the vector operation.
## VecOpsConfig
The `VecOpsConfig` structure holds configuration parameters for the vector operations, allowing customization of its behavior.
```go
type VecOpsConfig struct {
Ctx cr.DeviceContext
isAOnDevice bool
isBOnDevice bool
isResultOnDevice bool
IsResultMontgomeryForm bool
IsAsync bool
}
```
### Fields
- **Ctx**: Device context containing details like device ID and stream ID.
- **isAOnDevice**: Indicates if vector `a` is located on the device.
- **isBOnDevice**: Indicates if vector `b` is located on the device.
- **isResultOnDevice**: Specifies where the result vector should be stored (device or host memory).
- **IsResultMontgomeryForm**: Determines if the result vector should be in Montgomery form.
- **IsAsync**: Controls whether the vector operation runs asynchronously.
### Default Configuration
Use `DefaultVecOpsConfig` to obtain a default configuration, customizable as needed.
```go
func DefaultVecOpsConfig() VecOpsConfig
```
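The config can be reused across calls, so composite expressions are just chained `VecOp` invocations. A short sketch reusing `a` and `b` from the examples above plus a third vector `c`:
```go
// compute (a + b) * c element-wise by chaining two VecOp calls
c := GenerateScalars(testSize)
sum := make(core.HostSlice[ScalarField], testSize)
out := make(core.HostSlice[ScalarField], testSize)
cfg := core.DefaultVecOpsConfig()
VecOp(a, b, sum, cfg, core.Add)
VecOp(sum, c, out, cfg, core.Mul)
```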

New binary file added: docs/docs/icicle/image.png (35 KiB).
# ICICLE integrated provers
ICICLE has been used by companies and projects such as [Celer Network](https://github.com/celer-network), [Consensys Gnark](https://github.com/Consensys/gnark), [EZKL](https://blog.ezkl.xyz/post/acceleration/) and others to accelerate their ZK proving pipeline.
Many of these integrations have been a collaboration between Ingonyama and the integrating company. We have learned a lot about designing GPU based ZK provers.
If you're interested in understanding these integrations better or learning how you can use ICICLE to accelerate your existing ZK proving pipeline this is the place for you.
## A primer to building your own integrations
Let's illustrate an ICICLE integration, so you can understand the core API and design overview of ICICLE.
![ICICLE architecture](../../static/img/architecture-high-level.png)
Engineers usually use a cryptographic library to implement their ZK protocols. These libraries implement efficient primitives which are used as building blocks for the protocol; ICICLE is such a library. The difference is that ICICLE is designed from the start to run on GPUs; the Rust and Golang APIs abstract away all low level CUDA details. Our goal was to allow developers with no GPU experience to quickly get started with ICICLE.
A developer may use ICICLE with two main approaches in mind.
1. Drop-in replacement approach.
2. End-to-End GPU replacement approach.
The first approach for GPU-accelerating your Prover with ICICLE is quick to implement, but it has limitations, such as reduced memory optimization and limited protocol tuning for GPUs. It's a solid starting point, but those committed to fully leveraging GPU acceleration should consider a more comprehensive approach.
An end-to-end GPU replacement means performing the entire ZK proof on the GPU. This approach will reduce latency to a minimum and requires you to change the way you implement the protocol to be more GPU friendly, taking full advantage of GPU acceleration. Redesigning your prover this way may take more engineering effort, but we promise you that it's worth it!
## Using ICICLE integrated provers
Here we cover how a developer can run existing circuits on ICICLE integrated provers.
### Gnark
[Gnark](https://github.com/Consensys/gnark) officially supports GPU proving with ICICLE. Currently only Groth16 on curve `BN254` is supported. This means that if you are currently using Gnark to write your circuits you can enjoy GPU acceleration without making many changes.
:::info
Currently ICICLE has been merged to Gnark [master branch](https://github.com/Consensys/gnark), however the [latest release](https://github.com/Consensys/gnark/releases/tag/v0.9.1) is from October 2023.
:::
Make sure your golang circuit project has `gnark` as a dependency and that you are using the master branch for now.
```bash
go get github.com/consensys/gnark@master
```
You should see two indirect dependencies added.
```
...
github.com/ingonyama-zk/icicle v0.1.0 // indirect
github.com/ingonyama-zk/iciclegnark v0.1.1 // indirect
...
```
:::info
As you may notice, we are using ICICLE v0.1 here since golang bindings are only supported in ICICLE v0.1 for the time being.
:::
To switch over to ICICLE proving, make sure to change the backend you are using; below is an example of how this should be done.
```go
// toggle on
proofIci, err := groth16.Prove(ccs, pk, secretWitness, backend.WithIcicleAcceleration())
// toggle off
proof, err := groth16.Prove(ccs, pk, secretWitness)
```
Now that you have enabled the `WithIcicleAcceleration` backend, simply change the way you run your circuits:
```bash
go run -tags=icicle main.go
```
Your logs should look something like this if everything went as expected.
```
13:12:05 INF compiling circuit
13:12:05 INF parsed circuit inputs nbPublic=1 nbSecret=1
13:12:05 INF building constraint builder nbConstraints=3
13:12:05 DBG precomputing proving key in GPU acceleration=icicle backend=groth16 curve=bn254 nbConstraints=3
13:12:05 DBG constraint system solver done nbConstraints=3 took=0.070259
13:12:05 DBG prover done acceleration=icicle backend=groth16 curve=bn254 nbConstraints=3 took=80.356684
13:12:05 DBG verifier done backend=groth16 curve=bn254 took=1.843888
```
`acceleration=icicle` indicates that the prover is running in acceleration mode with ICICLE.
You can reference the [Gnark docs](https://github.com/Consensys/gnark?tab=readme-ov-file#gpu-support) for further information.
### Halo2
A [Halo2](https://github.com/zkonduit/halo2) fork is integrated with ICICLE for GPU acceleration. This means that you can run your existing Halo2 circuits with GPU acceleration just by activating a feature flag.
To enable GPU acceleration just enable `icicle_gpu` [feature flag](https://github.com/zkonduit/halo2/blob/3d7b5e61b3052680ccb279e05bdcc21dd8a8fedf/halo2_proofs/Cargo.toml#L102).
This feature flag will seamlessly toggle on GPU acceleration for you.

# Getting started with ICICLE
This guide is oriented towards developers who want to start writing code with the ICICLE libraries. If you just want to run your existing ZK circuits on GPU, please refer to [this guide](./integrations.md#using-icicle-integrations).
## ICICLE repository overview
![ICICLE API overview](../../static/img/apilevels.png)
The diagram above displays the general architecture of ICICLE and the API layers that exist. The CUDA API, which we also call ICICLE Core, is the lowest level and is comprised of CUDA kernels which implement all primitives such as MSM as well as C++ wrappers which expose these methods for different curves.
ICICLE Core compiles into a static library. This library can be used with our official Golang and Rust wrappers or you can implement a wrapper for it in any language.
Based on this dependency architecture, the ICICLE repository has three main sections, each of which is independent from the other.
- ICICLE core
- ICICLE Rust bindings
- ICICLE Golang bindings
### ICICLE Core
[ICICLE core](https://github.com/ingonyama-zk/icicle/tree/main/icicle) contains all the low level CUDA code implementing primitives such as [points](https://github.com/ingonyama-zk/icicle/tree/main/icicle/primitives) and [MSM](https://github.com/ingonyama-zk/icicle/tree/main/icicle/appUtils/msm). There also exist higher level C++ wrappers to expose the low level CUDA primitives ([example](https://github.com/ingonyama-zk/icicle/blob/c1a32a9879a7612916e05aa3098f76144de4109e/icicle/appUtils/msm/msm.cu#L1)).
ICICLE Core would typically be compiled into a static library and used in a third party language such as Rust or Golang.
### ICICLE Rust and Golang bindings
- [ICICLE Rust bindings](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust)
- [ICICLE Golang bindings](https://github.com/ingonyama-zk/icicle/tree/main/goicicle)
These bindings allow you to easily use ICICLE in a Rust or Golang project. Setting up the Golang bindings requires a few extra steps compared to the Rust bindings, which utilize the `cargo build` tool.
## Running ICICLE
This guide assumes that you have a Linux or Windows machine with an Nvidia GPU installed. If you don't have access to an Nvidia GPU you can access one for free on [Google Colab](https://colab.google/).
### Prerequisites
- NVCC (version 12.0 or newer)
- cmake 3.18 and above
- GCC - version 9 or newer is recommended.
- Any Nvidia GPU
- Linux or Windows operating system.
#### Optional Prerequisites
- Docker, latest version.
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html)
If you don't wish to install these prerequisites you can follow this tutorial using a [ZK-Container](https://github.com/ingonyama-zk/icicle/blob/main/Dockerfile) (docker container). To learn more about using ZK-Containers [read this](../ZKContainers.md).
### Setting up ICICLE and running tests
The objective of this guide is to make sure you can run the ICICLE Core, Rust and Golang tests. Achieving this will ensure you know how to set up ICICLE and run an ICICLE program. For simplicity, we will be using the ICICLE docker container as our environment; however, you may install the prerequisites on your machine and follow the same commands in your terminal.
#### Setting up our environment
Let's begin by cloning the ICICLE repository:
```sh
git clone https://github.com/ingonyama-zk/icicle
```
We will proceed to build the docker image [found here](https://github.com/ingonyama-zk/icicle/blob/main/Dockerfile):
```sh
docker build -t icicle-demo .
docker run -it --runtime=nvidia --gpus all --name icicle_container icicle-demo
```
- `-it` runs the container in interactive mode with a terminal.
- `--gpus all` allocates all available GPUs to the container. You can also specify which GPUs to use if you don't want to allocate all of them.
- `--runtime=nvidia` uses the NVIDIA runtime, necessary for GPU support.
To read more about these settings reference this [article](https://developer.nvidia.com/nvidia-container-runtime).
If you accidentally close your terminal and want to reconnect, just call:
```sh
docker exec -it icicle_container bash
```
Let's make sure that we have the correct CUDA version before proceeding:
```sh
nvcc --version
```
You should see something like this:
```sh
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
```
Make sure the release version is at least 12.0.
#### ICICLE Core
ICICLE Core is found under [`<project_root>/icicle`](https://github.com/ingonyama-zk/icicle/tree/main/icicle). To build and run the tests first:
```sh
cd icicle
```
We are going to compile ICICLE for a specific curve:
```sh
mkdir -p build
cmake -S . -B build -DCURVE=bn254 -DBUILD_TESTS=ON
cmake --build build
```
`-DBUILD_TESTS=ON` compiles the tests, without this flag `ctest` won't work.
`-DCURVE=bn254` tells the compiler which curve to build. You can find a list of supported curves [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/curves).
The output in `build` folder should include the static libraries for the compiled curve.
:::info
Make sure to only use `-DBUILD_TESTS=ON` for running tests as the archive output will only be available when `-DBUILD_TESTS=ON` is not supplied.
:::
To run the tests:
```sh
cd build
ctest
```
#### ICICLE Rust
The rust bindings work by first compiling the CUDA static libraries as seen [here](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/rust/icicle-curves/icicle-bn254/build.rs). The compilation of CUDA and the Rust library is all handled by the rust build toolchain.
Similar to ICICLE Core, here we also have to compile per curve.
Let's compile for curve `bn254`:
```sh
cd wrappers/rust/icicle-curves/icicle-bn254
```
Now let's build our library:
```sh
cargo build --release
```
This may take a couple of minutes since we are compiling both the CUDA and Rust code.
To run the tests
```sh
cargo test
```
We also include some benchmarks:
```sh
cargo bench
```
#### ICICLE Golang
Golang is WIP in v1, coming soon. Please check out a previous [release v0.1.0](https://github.com/ingonyama-zk/icicle/releases/tag/v0.1.0) for golang bindings.
### Running ICICLE examples
ICICLE examples can be found [here](https://github.com/ingonyama-zk/icicle-examples); these examples cover some simple use cases using C++, Rust and Golang.
In each example directory, ZK-container files are located in a subdirectory `.devcontainer`.
```sh
msm/
├── .devcontainer
├── devcontainer.json
└── Dockerfile
```
Let's run one of our C++ examples, in this case the [MSM example](https://github.com/ingonyama-zk/icicle-examples/blob/main/c%2B%2B/msm/example.cu).
Clone the repository:
```sh
git clone https://github.com/ingonyama-zk/icicle-examples.git
cd icicle-examples
```
Enter the example directory:
```sh
cd c++/msm
```
Now let's build our Docker image and run the example inside it. Make sure you have installed the [optional prerequisites](#optional-prerequisites).
```sh
docker build -t icicle-example-msm -f .devcontainer/Dockerfile .
```
Let's start and enter the container:
```sh
docker run -it --rm --gpus all -v .:/icicle-example icicle-example-msm
```
To run the example:
```sh
rm -rf build
mkdir -p build
cmake -S . -B build
cmake --build build
./build/example
```
You can now experiment with our other examples; perhaps try running a Rust or Golang example next.
## Writing new bindings for ICICLE
Since ICICLE Core is written in CUDA / C++, it's really simple to generate static libraries. These static libraries can be installed on any system and called by higher level languages such as Golang.
Static libraries can be loaded into memory once and used by multiple programs, reducing memory usage and potentially improving performance. They also allow you to separate functionality into distinct modules, so your static library may need to compile only the specific features that you want to use.
Let's review the Golang bindings, since they are a fairly verbose example of using static libraries (compared to Rust, which hides the details well). Golang has a mechanism named `cgo` which can be used to link static libraries. Here's a basic example of how you can use cgo to link these libraries:
```go
/*
#cgo LDFLAGS: -L/path/to/shared/libs -lbn254 -lbls12_381 -lbls12_377 -lbw6_761
#include "icicle.h" // make sure you use the correct header file(s)
*/
import "C"
func main() {
// Now you can call the C functions from the ICICLE libraries.
// Note that C function calls are prefixed with 'C.' in Go code.
out := (*C.BN254_projective_t)(unsafe.Pointer(p))
in := (*C.BN254_affine_t)(unsafe.Pointer(affine))
C.projective_from_affine_bn254(out, in)
}
```
The comments on the first lines tell `cgo` which libraries to link as well as which header files to include. You can then call methods which are part of the static library and defined in the header file; `C.projective_from_affine_bn254` is an example.
If you wish to create your own bindings for a language of your choice we suggest you start by investigating how you can call static libraries.
### ICICLE Adapters
One of the core ideas behind ICICLE is that developers can gradually accelerate their provers. Many protocols are written using other cryptographic libraries and completely replacing them may be complex and time consuming.
Therefore we offer adapters for various popular libraries, these adapters allow us to convert points and scalars between different formats defined by various libraries. Here is a list:
Golang adapters:
- [Gnark crypto adapter](https://github.com/ingonyama-zk/iciclegnark)

# Multi GPU with ICICLE
:::info
If you are looking for the Multi GPU API documentation refer here for [Rust](./rust-bindings/multi-gpu.md).
:::
One common challenge with Zero-Knowledge computation is managing the large input sizes. It's not uncommon to encounter circuits surpassing 2^25 constraints, pushing the capabilities of even advanced GPUs to their limits. To effectively scale and process such large circuits, leveraging multiple GPUs in tandem becomes a necessity.
Multi-GPU programming involves developing software to operate across multiple GPU devices. Let's first explore different approaches to multi-GPU programming, then we will cover how ICICLE allows you to easily develop your ZK computations to run across many GPUs.
## Approaches to Multi GPU programming
There are many [different strategies](https://github.com/NVIDIA/multi-gpu-programming-models) available for implementing multi GPU; however, they can be split into two categories.
### GPU Server approach
This approach usually involves a single or multiple CPUs opening threads to read / write from multiple GPUs. You can think of it as a scaled-up Host-Device model.
![GPU server approach](image.png)
This approach won't let us tackle larger computation sizes but it will allow us to compute multiple computations which we wouldn't be able to load onto a single GPU.
For example, let's say that you had to compute two MSMs of size 2^26 on a GPU with 16GB of VRAM; you would normally have to perform them one after the other. However, if you double the number of GPUs in your system, you can now run them in parallel.
### Inter GPU approach
This approach involves a more sophisticated approach to multi GPU computation. Using technologies such as [GPUDirect, NCCL, NVSHMEM](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-cwes1084/) and NVLink it's possible to combine multiple GPUs and split a computation among different devices.
This approach requires redesigning the algorithm at the software level to be compatible with splitting amongst devices. In some cases, to lower latency to a minimum, special inter GPU connections would be installed on a server to allow direct communication between multiple GPUs.
## Writing ICICLE Code for Multi GPUs
The approach we have taken for the moment is a GPU Server approach; we assume you have a machine with multiple GPUs and you wish to run some computation on each GPU.
To dive deeper and learn about the API check out the docs for our different ICICLE API
- [Rust Multi GPU APIs](./rust-bindings/multi-gpu.md)
- C++ Multi GPU APIs
## Best practices
- Never hardcode device IDs. If you want your software to take advantage of all GPUs on a machine, use methods such as `get_device_count` to support an arbitrary number of GPUs.
- Launch one CPU thread per GPU. To avoid [nasty errors](https://developer.nvidia.com/blog/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/) and hard to read code we suggest that for every GPU you create a dedicated thread. Within a CPU thread you should be able to launch as many tasks as you wish for a GPU as long as they all run on the same GPU id. This will make your code way more manageable, easy to read and performant.
## ZKContainer support for multi GPUs
Multi GPU support should work with ZK-Containers by simply defining which devices the docker container should interact with:
```sh
docker run -it --gpus '"device=0,2"' zk-container-image
```
If you wish to expose all GPUs
```sh
docker run --gpus all zk-container-image
```

# What is ICICLE?
[![GitHub Release](https://img.shields.io/github/v/release/ingonyama-zk/icicle)](https://github.com/ingonyama-zk/icicle/releases)
[ICICLE](https://github.com/ingonyama-zk/icicle) is a cryptography library for ZK using GPUs. ICICLE implements blazing fast cryptographic primitives such as EC operations, MSM, NTT, Poseidon hash and more on GPU.
ICICLE allows developers with minimal GPU experience to effortlessly accelerate their ZK application; from our experiments, even the most naive implementation may yield 10X improvement in proving times.
ICICLE has been used by many leading ZK companies such as [Celer Network](https://github.com/celer-network), [Gnark](https://github.com/Consensys/gnark) and others to accelerate their ZK proving pipeline.
## Don't have access to a GPU?
We understand that not all developers have access to a GPU and we don't want this to limit anyone from developing with ICICLE.
Here are some ways we can help you gain access to GPUs:
### Grants
At Ingonyama we are interested in accelerating the progress of ZK and cryptography. If you are an engineer, developer or an academic researcher, we invite you to check out [our grant program](https://www.ingonyama.com/blog/icicle-for-researchers-grants-challenges). We will give you access to GPUs and even pay you to do your dream research!
### Google Colab
This is a great way to get started with ICICLE instantly. Google Colab offers free access to an NVIDIA T4 GPU instance with 16 GB of memory, which should be enough for experimenting and even prototyping with ICICLE.
For an extensive guide on how to setup Google Colab with ICICLE refer to [this article](./colab-instructions.md).
### Vast.ai
[Vast.ai](https://vast.ai/) is a global GPU marketplace where you can rent many different types of GPUs by the hour for [competitive pricing](https://vast.ai/pricing). They provide on-demand and interruptible rentals depending on your need or use case; you can learn more about their rental types [here](https://vast.ai/faq#rental-types).
:::note
If none of these options suit your needs, contact us on [telegram](https://t.me/RealElan) for assistance. We're committed to ensuring that a lack of a GPU doesn't become a bottleneck for you. If you need help with setup or any other issues, we're here to do our best to help you.
:::
## What can you do with ICICLE?
[ICICLE](https://github.com/ingonyama-zk/icicle) can be used in the same way you would use any other cryptography library. While developing and integrating ICICLE into many proof systems, we found some use case categories:
### Circuit developers
If you are a circuit developer and are experiencing bottlenecks while running your circuits, an ICICLE integrated prover may be the solution.
ICICLE has been integrated into a number of popular ZK provers including [Gnark prover](https://github.com/Consensys/gnark) and [Halo2](https://github.com/zkonduit/halo2). This means that you can enjoy GPU acceleration for your existing circuits immediately without writing a single line of code by simply switching on the GPU prover flag!
### Integrating into existing ZK provers
From our collaborations we have learned that it's possible to accelerate a specific part of your prover to solve for a specific bottleneck.
ICICLE can be used to accelerate specific parts of your prover without completely rewriting your ZK prover.
### Developing your own ZK provers
If your goal is to build a ZK prover from the ground up, ICICLE is an ideal tool for creating a highly optimized and scalable ZK prover. A key benefit of using GPUs with ICICLE is the ability to scale your ZK prover efficiently across multiple machines within a data center.
### Developing proof of concepts
ICICLE is also ideal for developing small prototypes. ICICLE has Golang and Rust bindings so you can easily develop a library implementing a specific primitive using ICICLE. An example would be developing a KZG commitment library using ICICLE.

@startuml
skinparam componentStyle uml2
' Define Components
component "C++ Template\nComponent" as CppTemplate {
[Parameterizable Interface]
}
component "C API Wrapper\nComponent" as CApiWrapper {
[C API Interface]
}
component "Rust Code\nComponent" as RustCode {
[Macro Interface\n(Template Instantiation)]
}
' Define Artifact
artifact "Static Library\n«artifact»" as StaticLib
' Connections
CppTemplate -down-> CApiWrapper : Instantiates
CApiWrapper .down.> StaticLib : Compiles into
RustCode -left-> StaticLib : Links against\nand calls via FFI
' Notes
note right of CppTemplate : Generic C++\ntemplate implementation
note right of CApiWrapper : Exposes C API for FFI\nto Rust/Go
note right of RustCode : Uses macros to\ninstantiate templates
@enduml

@startuml
' Define Interface for Polynomial Backend Operations
interface IPolynomialBackend {
+add()
+subtract()
+multiply()
+divide()
+evaluate()
}
' Define Interface for Polynomial Context (State Management)
interface IPolynomialContext {
+initFromCoeffs()
+initFromEvals()
+getCoeffs()
+getEvals()
}
' PolynomialAPI now uses two strategies: Backend and Context
class PolynomialAPI {
-backendStrategy: IPolynomialBackend
-contextStrategy: IPolynomialContext
-setBackendStrategy(IPolynomialBackend)
-setContextStrategy(IPolynomialContext)
+add()
+subtract()
+multiply()
+divide()
+evaluate()
}
' Backend Implementations
class GPUPolynomialBackend implements IPolynomialBackend {
#gpuResources: Resource
+add()
+subtract()
+multiply()
+divide()
+evaluate()
}
class ZPUPolynomialBackend implements IPolynomialBackend {
#zpuResources: Resource
+add()
+subtract()
+multiply()
+divide()
+evaluate()
}
class TracerPolynomialBackend implements IPolynomialBackend {
#traceData: Data
+add()
+subtract()
+multiply()
+divide()
+evaluate()
}
' Context Implementations (Placeholder for actual implementation)
class GPUContext implements IPolynomialContext {
+initFromCoeffs()
+initFromEvals()
+getCoeffs()
+getEvals()
}
class ZPUContext implements IPolynomialContext {
+initFromCoeffs()
+initFromEvals()
+getCoeffs()
+getEvals()
}
class TracerContext implements IPolynomialContext {
+initFromCoeffs()
+initFromEvals()
+getCoeffs()
+getEvals()
}
' Relationships
PolynomialAPI o-- IPolynomialBackend : uses
PolynomialAPI o-- IPolynomialContext : uses
@enduml

# MSM - Multi scalar multiplication
MSM stands for Multi scalar multiplication, and it's defined as:
$$
MSM(a, G) = \sum_{j=0}^{n-1} a_j \cdot G_j
$$
Where:
- $G_j \in G$ are points from an elliptic curve group,
- $a_0, \ldots, a_{n-1}$ are scalars,
- $MSM(a, G) \in G$ is a single EC (elliptic curve) point.
In words, MSM is the sum of scalar-by-EC-point multiplications. We can see from this definition that the core operations are modular multiplication and elliptic curve point addition. The multiplications can clearly be computed in parallel and the products then summed, making MSM inherently parallelizable.
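As a hedged illustration of that parallel structure (not ICICLE's actual implementation, which runs on the GPU): split the scalar-point pairs across workers, accumulate partial sums, then combine them. `Scalar`, `Point`, `Identity`, `PointAdd` and `ScalarMul` are hypothetical stand-ins for a concrete curve's group operations.
```go
// Illustrative sketch only: the types and group operations below are
// hypothetical placeholders, not part of the ICICLE API.
func parallelMSM(scalars []Scalar, points []Point, workers int) Point {
	partial := make([]Point, workers)
	var wg sync.WaitGroup
	chunk := (len(points) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			acc := Identity()
			end := (w + 1) * chunk
			if end > len(points) {
				end = len(points)
			}
			// each product a_j * G_j is independent of the others
			for j := w * chunk; j < end; j++ {
				acc = PointAdd(acc, ScalarMul(scalars[j], points[j]))
			}
			partial[w] = acc
		}(w)
	}
	wg.Wait()
	// the partial sums combine at the end
	res := Identity()
	for _, p := range partial {
		res = PointAdd(res, p)
	}
	return res
}
```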
Accelerating MSM is crucial to a ZK protocol's performance due to the [large percentage of run time](https://hackmd.io/@0xMonia/SkQ6-oRz3#Hardware-acceleration-in-action) MSMs take when generating proofs.
You can learn more about how MSMs work from this [video](https://www.youtube.com/watch?v=Bl5mQA7UL2I) and from our resource list on [Ingopedia](https://www.ingonyama.com/ingopedia/msm).
## Supported curves
MSM supports the following curves:
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`, `grumpkin`
## Supported Bindings
- [Golang](../golang-bindings/msm.md)
- [Rust](../rust-bindings//msm.md)
## Supported algorithms
Our MSM implementation supports two algorithms `Bucket accumulation` and `Large triangle accumulation`.
### Bucket accumulation
The Bucket Accumulation algorithm is a method of dividing the overall MSM task into smaller, more manageable sub-tasks. It involves partitioning scalars and their corresponding points into different "buckets" based on the scalar values.
Bucket Accumulation can be more parallel-friendly because it involves dividing the computation into smaller, independent tasks, distributing scalar-point pairs into buckets and summing points within each bucket. This division makes it well suited for parallel processing on GPUs.
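Concretely, in the standard Pippenger-style formulation (stated here for intuition, not as a description of ICICLE's exact kernels): with window size $c$, each scalar is split into $c$-bit windows, and bucket $B_{w,k}$ collects the points whose $w$-th window equals $k$:

$$
MSM(a, G) = \sum_{w} 2^{cw} \sum_{k=1}^{2^c - 1} k \cdot B_{w,k},
\qquad B_{w,k} = \sum_{j \,:\, a_{j,w} = k} G_j
$$

where $a_{j,w}$ denotes the $w$-th $c$-bit window of scalar $a_j$. Every bucket sum is independent of the others, which is exactly what makes the method parallel-friendly.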
#### When should I use Bucket accumulation?
In scenarios involving large MSM computations with many scalar-point pairs, the ability to parallelize operations makes Bucket Accumulation more efficient. The larger the MSM task, the more significant the potential gains from parallelization.
### Large triangle accumulation
Large Triangle Accumulation is a method for optimizing MSM which focuses on reducing the number of point doublings in the computation. This algorithm is based on the observation that the number of point doublings can be minimized by structuring the computation in a specific manner.
#### When should I use Large triangle accumulation?
The Large Triangle Accumulation algorithm is more sequential in nature, as it builds upon each step sequentially (accumulating sums and then performing doubling). This structure can make it less suitable for parallelization but potentially more efficient for a <b>large batch of smaller MSM computations</b>.
## MSM Modes
ICICLE MSM also supports two different modes `Batch MSM` and `Single MSM`
Batch MSM allows you to run many MSMs with a single API call; Single MSM will launch a single MSM computation.
### Which mode should I use?
This decision is highly dependent on your use case and design. However, if your design allows for it, using batch mode can significantly improve efficiency. Batch processing allows you to perform multiple MSMs leveraging the parallel processing capabilities of GPUs.
Single MSM mode should be used when batching isn't possible or when you have to run a single MSM.

# NTT - Number Theoretic Transform
The Number Theoretic Transform (NTT) is a variant of the Fourier Transform used over finite fields, particularly those of integers modulo a prime number. NTT operates in a discrete domain and is used primarily in applications requiring modular arithmetic, such as cryptography and polynomial multiplication.
NTT is defined similarly to the Discrete Fourier Transform (DFT), but instead of using complex roots of unity, it uses roots of unity within a finite field. The definition hinges on the properties of the finite field, specifically the existence of a primitive root of unity of order $N$ (where $N$ is typically a power of 2), and the modulo operation is performed with respect to a specific prime number that supports these roots.
Formally, given a sequence of integers $a_0, a_1, ..., a_{N-1}$, the NTT of this sequence is another sequence of integers $A_0, A_1, ..., A_{N-1}$, computed as follows:
$$
A_k = \sum_{n=0}^{N-1} a_n \cdot \omega^{nk} \mod p
$$
where:
- $N$ is the size of the input sequence and is a power of 2,
- $p$ is a prime number such that $p = kN + 1$ for some integer $k$, ensuring that $p$ supports the existence of $N$th roots of unity,
- $\omega$ is a primitive $N$th root of unity modulo $p$, meaning $\omega^N \equiv 1 \mod p$ and no smaller positive power of $\omega$ is congruent to 1 modulo $p$,
- $k$ ranges from 0 to $N-1$, and it indexes the output sequence.
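As a tiny worked example: take $N = 4$ and $p = 17$ (so $p = 4N + 1$), and let $\omega = 4$, which is a primitive 4th root of unity modulo 17 since $4^4 = 256 \equiv 1 \pmod{17}$ while $4^2 \equiv 16 \not\equiv 1$. For the input sequence $a = (1, 2, 3, 4)$:

$$
A_k = \sum_{n=0}^{3} a_n \cdot 4^{nk} \bmod 17
\quad\Longrightarrow\quad
A = (10, 7, 15, 6)
$$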
The NTT is particularly useful because it enables efficient polynomial multiplication under modulo arithmetic, crucial for algorithms in cryptographic protocols, and other areas requiring fast modular arithmetic operations.
There exists also INTT which is the inverse operation of NTT. INTT can take as input an output sequence of integers from an NTT and reconstruct the original sequence.
# Using NTT
### Supported curves
NTT supports the following curves:
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`
## Supported Bindings
- [Golang](../golang-bindings/ntt.md)
- [Rust](../rust-bindings/ntt.md)
### Examples
- [Rust API examples](https://github.com/ingonyama-zk/icicle/blob/d84ffd2679a4cb8f8d1ac2ad2897bc0b95f4eeeb/examples/rust/ntt/src/main.rs#L1)
- [C++ API examples](https://github.com/ingonyama-zk/icicle/blob/d84ffd2679a4cb8f8d1ac2ad2897bc0b95f4eeeb/examples/c%2B%2B/ntt/example.cu#L1)
### Ordering
The `Ordering` enum defines how inputs and outputs are arranged for the NTT operation, offering flexibility in handling data according to different algorithmic needs or compatibility requirements. It primarily affects the sequencing of data points for the transform, which can influence both performance and the compatibility with certain algorithmic approaches. The available ordering options are:
- **`kNN` (Natural-Natural):** Both inputs and outputs are in their natural order. This is the simplest form of ordering, where data is processed in the sequence it is given, without any rearrangement.
- **`kNR` (Natural-Reversed):** Inputs are in natural order, while outputs are in bit-reversed order. This ordering is typically used in algorithms that benefit from having the output in a bit-reversed pattern.
- **`kRN` (Reversed-Natural):** Inputs are in bit-reversed order, and outputs are in natural order. This is often used with the Cooley-Tukey FFT algorithm.
- **`kRR` (Reversed-Reversed):** Both inputs and outputs are in bit-reversed order.
- **`kNM` (Natural-Mixed):** Inputs are provided in their natural order, while outputs are arranged in a digit-reversed (mixed) order. This ordering is good for mixed radix NTT operations, where the mixed or digit-reversed ordering of outputs is a generalization of the bit-reversal pattern seen in simpler, radix-2 cases.
- **`kMN` (Mixed-Natural):** Inputs are in a digit-reversed (mixed) order, while outputs are restored to their natural order. This ordering would primarily be used for mixed radix NTT operations.
Choosing an algorithm is heavily dependent on your use case. For example, Cooley-Tukey will often use `kRN`, and Gentleman-Sande often uses `kNR`.
### Modes
NTT also supports two different modes `Batch NTT` and `Single NTT`
Batch NTT allows you to run many NTTs with a single API call; Single NTT will launch a single NTT computation.
Deciding whether to use `batch NTT` vs `single NTT` is highly dependent on your application and use case.
**Single NTT Mode**
- Choose this mode when your application requires processing individual NTT operations in isolation.
**Batch NTT Mode**
- Batch NTT mode can significantly reduce read/write as well as computation overhead by executing multiple NTT operations in parallel.
- Batch mode may also offer better utilization of computational resources (memory and compute).
## Supported algorithms
Our NTT implementation supports two algorithms `radix-2` and `mixed-radix`.
### Radix 2
At its core, the Radix-2 NTT algorithm divides the problem into smaller sub-problems, leveraging the properties of "divide and conquer" to reduce the overall computational complexity. The algorithm operates on sequences whose lengths are powers of two.
1. **Input Preparation:**
The input is a sequence of integers $a_0, a_1, \ldots, a_{N-1}, \text{ where } N$ is a power of two.
2. **Recursive Decomposition:**
The algorithm recursively divides the input sequence into smaller sequences. At each step, it separates the sequence into even-indexed and odd-indexed elements, forming two subsequences that are then processed independently.
3. **Butterfly Operations:**
The core computational element of the Radix-2 NTT is the "butterfly" operation, which combines pairs of elements from the sequences obtained in the decomposition step.
Each butterfly operation involves multiplication by a "twiddle factor," which is a root of unity in the finite field, and addition or subtraction of the results, all performed modulo the prime modulus.
$$
X_k = (A_k + B_k \cdot W^k) \mod p
$$
$X_k$ - The output of the butterfly operation for the $k$-th element
$A_k$ - an element from the even-indexed subset
$B_k$ - an element from the odd-indexed subset
$W$ - the twiddle factor, a root of unity in the finite field
$p$ - prime modulus
$k$ - The index of the current operation within the butterfly or the transform stage
The twiddle factors are precomputed to save runtime and improve performance.
4. **Bit-Reversal Permutation:**
A final step involves rearranging the output sequence into the correct order. Due to the halving process in the decomposition steps, the elements of the transformed sequence are initially in a bit-reversed order. A bit-reversal permutation is applied to obtain the final sequence in natural order.
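Putting the four steps together, here is a compact recursive sketch of the radix-2 transform, reusing the toy `P` and `pow_mod` helpers from the sketch in the definition section (illustrative only). Note that this recursive form already returns output in natural order; the explicit bit-reversal step appears when the recursion is unrolled into the iterative, in-place form used in practice.
```rust
// Recursive radix-2 NTT (Cooley-Tukey), assuming `P` and `pow_mod` from above.
fn ntt_radix2(a: &[u64], omega: u64) -> Vec<u64> {
    let n = a.len();
    if n == 1 {
        return a.to_vec();
    }
    // Step 2: split into even- and odd-indexed subsequences and recurse
    let even: Vec<u64> = a.iter().step_by(2).copied().collect();
    let odd: Vec<u64> = a.iter().skip(1).step_by(2).copied().collect();
    let omega_sq = omega * omega % P;
    let (e, o) = (ntt_radix2(&even, omega_sq), ntt_radix2(&odd, omega_sq));
    // Step 3: butterflies X_k = E_k + W^k * O_k and X_{k+n/2} = E_k - W^k * O_k (mod p)
    let mut out = vec![0u64; n];
    for k in 0..n / 2 {
        let t = pow_mod(omega, k as u64) * o[k] % P;
        out[k] = (e[k] + t) % P;
        out[k + n / 2] = (e[k] + P - t) % P;
    }
    out
}
```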
### Mixed Radix
The Mixed Radix NTT algorithm extends the concepts of the Radix-2 algorithm by allowing the decomposition of the input sequence based on various factors of its length. Specifically, ICICLE's implementation splits the input into blocks of sizes 16, 32, or 64, whereas radix-2 always splits until we end up with NTTs of size 2. This approach offers enhanced flexibility and efficiency, especially for input sizes that are composite numbers, by leveraging the "divide and conquer" strategy across multiple radixes.
The NTT blocks in Mixed Radix are implemented more efficiently, based on the Winograd NTT, and memory and register usage are also better optimized than in Radix-2.
Mixed Radix can reduce the number of stages required to compute large NTTs.
1. **Input Preparation:**
The input to the Mixed Radix NTT is a sequence of integers $a_0, a_1, \ldots, a_{N-1}$, where $N$ is not strictly required to be a power of two. Instead, $N$ can be any composite number, ideally factorized into primes or powers of primes.
2. **Factorization and Decomposition:**
Unlike the Radix-2 algorithm, which strictly divides the computational problem into halves, the Mixed Radix NTT algorithm implements a flexible decomposition approach which isn't limited to prime factorization.
For example, an NTT of size 256 can be decomposed into two stages of $16 \times \text{NTT}_{16}$, leveraging a composite factorization strategy rather than decomposing into eight stages of $\text{NTT}_{2}$. This exemplifies the use of composite factors (in this case, $256 = 16 \times 16$) to apply smaller NTT transforms, optimizing computational efficiency by adapting the decomposition strategy to the specific structure of $N$.
3. **Butterfly Operations with Multiple Radixes:**
The Mixed Radix algorithm utilizes butterfly operations for various radix sizes. Each sub-transform involves specific butterfly operations characterized by multiplication with twiddle factors appropriate for the radix in question.
The generalized butterfly operation for a radix-$r$ element can be expressed as:
$$
X_{k,r} = \sum_{j=0}^{r-1} (A_{j,k} \cdot W^{jk}) \mod p
$$
where $X_{k,r}$ is the output of the radix-$r$ butterfly operation for the $k$-th set of inputs, $A_{j,k}$ represents the $j$-th input element for the $k$-th operation, $W$ is the twiddle factor, and $p$ is the prime modulus (see the sketch after this list).
4. **Recombination and Reordering:**
After applying the appropriate butterfly operations across all decomposition levels, the Mixed Radix algorithm recombines the results into a single output sequence. Due to the varied sizes of the sub-transforms, a more complex reordering process may be required compared to Radix-2. This involves digit-reversal permutations to ensure that the final output sequence is correctly ordered.
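As referenced in step 3, the generalized radix-$r$ butterfly is simply a size-$r$ NTT applied to one set of inputs. A toy rendering of the formula, again reusing the `P` and `pow_mod` helpers from the radix-2 sketch (illustrative only, not ICICLE's optimized Winograd kernels):
```rust
// X_{k,r} = sum_{j=0}^{r-1} A_{j,k} * W^{jk} mod p, for one set of r inputs.
fn butterfly_radix_r(a: &[u64], omega_r: u64) -> Vec<u64> {
    let r = a.len() as u64;
    (0..r)
        .map(|k| (0..r).map(|j| a[j as usize] * pow_mod(omega_r, j * k) % P).sum::<u64>() % P)
        .collect()
}
```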
### Which algorithm should I choose ?
Both algorithms work only on inputs whose size is a power of 2 (e.g., 256, 512, 1024).
Radix 2 is faster for small NTTs. A small NTT would be around `logn = 16` with batch size 1. Radix 2 won't necessarily perform better for smaller `logn` with larger batches.
Mixed radix on the other hand works better for larger NTTs with larger input sizes.
Performance really depends on logn size, batch size, ordering, inverse, coset, coeff-field and which GPU you are using.
For this reason we implemented our [heuristic auto-selection](https://github.com/ingonyama-zk/icicle/blob/774250926c00ffe84548bc7dd97aea5227afed7e/icicle/appUtils/ntt/ntt.cu#L474) which should choose the most efficient algorithm in most cases.
We still recommend you benchmark for your specific use case if you think a different configuration would yield better results.


@@ -0,0 +1,11 @@
# ICICLE Primitives
This section of the documentation is dedicated to the ICICLE primitives; we will cover the usage and internal details of primitives such as hashing algorithms, MSM, and NTT.
## Supported primitives
- [MSM](./msm.md)
- [NTT](./ntt.md)
- [Poseidon Hash](./poseidon.md)


@@ -0,0 +1,226 @@
# Poseidon
[Poseidon](https://eprint.iacr.org/2019/458.pdf) is a popular hash in the ZK ecosystem, primarily because it's optimized to work over large prime fields, a common setting for ZK proofs, thereby minimizing the number of multiplicative operations required.
Poseidon has also been specifically designed to be efficient when implemented within ZK circuits; it uses far fewer constraints than other hash functions like Keccak or SHA-256 in that context.
Poseidon has been used in many popular ZK protocols such as Filecoin and [Plonk](https://drive.google.com/file/d/1bZZvKMQHaZGA4L9eZhupQLyGINkkFG_b/view?usp=drive_open).
Our implementation of Poseidon follows the optimized [Filecoin version](https://spec.filecoin.io/algorithms/crypto/poseidon/).
Let's understand how Poseidon works.
### Initialization
Poseidon starts with the initialization of its internal state, which is composed of the input elements and some pregenerated constants. An initial round constant is added to each element of the internal state. Adding the round constants ensures the state is properly mixed from the outset.
This is done to prevent collisions and to prevent certain cryptographic attacks by ensuring that the internal state is sufficiently mixed and unpredictable.
![Alt text](image.png)
### Applying full and partial rounds
To generate a secure hash output, the algorithm goes through a series of "full rounds" and "partial rounds" as well as transformations between these sets of rounds.
First full rounds => apply SBox and Round constants => partial rounds => Last full rounds => Apply SBox
#### Full rounds
![Alt text](image-1.png)
**Uniform Application of S-Box:** In full rounds, the S-box (a non-linear transformation) is applied uniformly to every element of the hash function's internal state. This ensures a high degree of mixing and diffusion, contributing to the hash function's security. The S-box involves raising each element of the state to a certain power `α`, a member of the finite field defined by the prime `p`; `α` can differ depending on the implementation and user configuration.
**Linear Transformation:** After applying the S-box, a linear transformation is performed on the state. This involves multiplying the state by an MDS (Maximum Distance Separable) matrix, which further diffuses the transformations applied by the S-box across the entire state.
**Addition of Round Constants:** Each element of the state is then modified by adding a unique round constant. These constants are different for each round and are precomputed as part of the hash function's initialization. The addition of round constants ensures that even minor changes to the input produce significant differences in the output.
#### Partial Rounds
**Selective Application of S-Box:** Partial rounds apply the S-box transformation to only one element of the internal state per round, rather than to all elements. This selective application significantly reduces the computational complexity of the hash function without compromising its security. The choice of which element to apply the S-box to can follow a specific pattern or be fixed, depending on the design of the hash function.
**Linear Transformation and Round Constants:** A linear transformation is performed and round constants are added. The linear transformation in partial rounds can be designed to be less computationally intensive (this is done by using a sparse matrix) than in full rounds, further optimizing the function's efficiency.
The user of Poseidon can often choose how many partial or full rounds they wish to apply; more full rounds increase security but degrade performance. The choice and balance are highly dependent on the use case.
![Alt text](image-2.png)
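To summarize the round schedule, here is a self-contained, schematic sketch. Everything in it is illustrative: the toy prime, the round counts, the constants, and the matrix are made up and are not secure Poseidon parameters; the real implementation uses curve-specific constants and a proper MDS matrix.
```rust
// Schematic Poseidon-style permutation: full rounds, partial rounds, full rounds.
// All parameters below are made up for illustration; NOT secure constants.
const P: u64 = 1_000_000_007; // toy prime, not a real Poseidon field

fn sbox(x: u64) -> u64 {
    // alpha = 5 S-box: x^5 mod p
    let x2 = x * x % P;
    let x4 = x2 * x2 % P;
    x4 * x % P
}

fn linear_layer(state: [u64; 3]) -> [u64; 3] {
    // stand-in for the MDS matrix multiplication (this matrix is NOT MDS)
    let m: [[u64; 3]; 3] = [[2, 1, 1], [1, 2, 1], [1, 1, 2]];
    let mut out = [0u64; 3];
    for i in 0..3 {
        for j in 0..3 {
            out[i] = (out[i] + m[i][j] * state[j]) % P;
        }
    }
    out
}

fn permutation(mut state: [u64; 3], round_constants: &[u64]) -> [u64; 3] {
    let (half_full, partial) = (4, 8); // 4 full, 8 partial, 4 full rounds
    let mut rc = round_constants.iter().copied();
    for round in 0..(2 * half_full + partial) {
        let is_full = round < half_full || round >= half_full + partial;
        if is_full {
            // full round: add a constant and apply the S-box to EVERY element
            for s in state.iter_mut() {
                *s = sbox((*s + rc.next().unwrap()) % P);
            }
        } else {
            // partial round: constant + S-box on a single element only
            state[0] = sbox((state[0] + rc.next().unwrap()) % P);
        }
        state = linear_layer(state); // linear layer after every round
    }
    state
}

fn main() {
    // 4*3 + 8 + 4*3 = 32 made-up round constants
    let rcs: Vec<u64> = (1..=32u64).map(|i| i * 7919 % P).collect();
    println!("{:?}", permutation([1, 2, 3], &rcs));
}
```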
## Using Poseidon
ICICLE Poseidon is implemented for GPU, and parallelization is performed per element of the state rather than per state.
What that means is that we calculate multiple hash-sums over multiple pre-images in parallel, rather than going block by block over the input vector.
So for a Poseidon hash of arity 2 and an input of size 1024 * 2, we would expect 1024 elements of output: each block is of size 2, resulting in 1024 Poseidon hashes being performed.
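In code terms, the size bookkeeping looks like this (a hedged sketch; the variable names are illustrative, not API names):
```rust
let arity = 2usize;            // elements consumed per hash
let number_of_hashes = 1024;   // hashes computed in parallel
let input_len = number_of_hashes * arity; // 1024 * 2 = 2048 input elements
let output_len = number_of_hashes;        // one digest per block of `arity` elements
assert_eq!(input_len / arity, output_len);
```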
### Supported API
[`Rust`](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-core/src/poseidon), [`C++`](https://github.com/ingonyama-zk/icicle/tree/main/icicle/appUtils/poseidon)
### Supported curves
Poseidon supports the following curves:
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`
### Constants
Poseidon is extremely customizable and using different constants will produce different hashes, security levels and performance results.
We support pre-calculated and optimized constants for each of the [supported curves](#supported-curves). The constants can be found [here](https://github.com/ingonyama-zk/icicle/tree/main/icicle/appUtils/poseidon/constants) and are clearly labeled per curve as `<curve_name>_poseidon.h`.
If you wish to generate your own constants you can use our python script which can be found [here](https://github.com/ingonyama-zk/icicle/blob/b6dded89cdef18348a5d4e2748b71ce4211c63ad/icicle/appUtils/poseidon/constants/generate_parameters.py#L1).
Prerequisites:
- Install python 3
- `pip install poseidon-hash`
- `pip install galois==0.3.7`
- `pip install numpy`
You will then need to modify the following values before running the script.
```python
# Modify these
arity = 11 # we support arity 2, 4, 8 and 11.
p = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001 # bls12-381
# p = 0x12ab655e9a2ca55660b44d1e5c37b00159aa76fed00000010a11800000000001 # bls12-377
# p = 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000001 # bn254
# p = 0x1ae3a4617c510eac63b05c06ca1493b1a22d9f300f5138f1ef3622fba094800170b5d44300000008508c00000000001 # bw6-761
prime_bit_len = 255
field_bytes = 32
...
# primitive_element = None
primitive_element = 7 # bls12-381
# primitive_element = 22 # bls12-377
# primitive_element = 5 # bn254
# primitive_element = 15 # bw6-761
```
We only support `alpha = 5`, so if you want to use a different alpha for the S-box, please reach out on Discord or open a GitHub issue.
### Rust API
This is the most basic way to use the Poseidon API.
```rust
let test_size = 1 << 10;
let arity = 2u32;
let ctx = get_default_device_context();
let constants = load_optimized_poseidon_constants::<F>(arity, &ctx).unwrap();
let config = PoseidonConfig::default();
let inputs = vec![F::one(); test_size * arity as usize];
let outputs = vec![F::zero(); test_size];
let mut input_slice = HostOrDeviceSlice::on_host(inputs);
let mut output_slice = HostOrDeviceSlice::on_host(outputs);
poseidon_hash_many::<F>(
&mut input_slice,
&mut output_slice,
test_size as u32,
arity as u32,
&constants,
&config,
)
.unwrap();
```
The `PoseidonConfig::default()` can be modified; for example, by default the inputs and outputs are set to be on the `Host`.
```rust
impl<'a> Default for PoseidonConfig<'a> {
fn default() -> Self {
let ctx = get_default_device_context();
Self {
ctx,
are_inputs_on_device: false,
are_outputs_on_device: false,
input_is_a_state: false,
aligned: false,
loop_state: false,
is_async: false,
}
}
}
```
In the example above, `load_optimized_poseidon_constants::<F>(arity, &ctx).unwrap();` is used, which loads the correct constants based on arity and curve. It's also possible to [generate](#constants) your own constants and load them.
```rust
let ctx = get_default_device_context();
let cargo_manifest_dir = env!("CARGO_MANIFEST_DIR");
let constants_file = PathBuf::from(cargo_manifest_dir)
.join("tests")
.join(format!("{}_constants.bin", field_prefix));
let mut constants_buf = vec![];
File::open(constants_file)
.unwrap()
.read_to_end(&mut constants_buf)
.unwrap();
let mut custom_constants = vec![];
for chunk in constants_buf.chunks(field_bytes) {
custom_constants.push(F::from_bytes_le(chunk));
}
let custom_constants = create_optimized_poseidon_constants::<F>(
arity as u32,
&ctx,
full_rounds_half,
partial_rounds,
&mut custom_constants,
)
.unwrap();
```
For more examples using different configurations, refer here.
## The Tree Builder
The tree builder allows you to build Merkle trees using Poseidon.
You can define both the tree's `height` and its `arity`. The tree `height` determines the number of layers in the tree, including the root and the leaf layer. The `arity` determines how many children each internal node can have.
```rust
let height = 20;
let arity = 2;
let leaves = vec![F::one(); 1 << (height - 1)];
let mut digests = vec![F::zero(); merkle_tree_digests_len(height, arity)];
let mut leaves_slice = HostOrDeviceSlice::on_host(leaves);
let ctx = get_default_device_context();
let constants = load_optimized_poseidon_constants::<F>(arity, &ctx).unwrap();
let mut config = TreeBuilderConfig::default();
config.keep_rows = 1;
build_poseidon_merkle_tree::<F>(&mut leaves_slice, &mut digests, height, arity, &constants, &config).unwrap();
println!("Root: {:?}", digests[0..1][0]);
```
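For intuition about the size of the digests buffer: a full tree with `height` layers and branching factor `arity` holds $1 + a + a^2 + \ldots + a^{height-1}$ nodes. A hedged sketch of that sum (the actual `merkle_tree_digests_len` may differ, e.g. it also accounts for `keep_rows`):
```rust
// Total nodes in a full tree with `height` layers, leaf layer included.
fn full_tree_digests_len(height: u32, arity: u64) -> u64 {
    (0..height).map(|l| arity.pow(l)).sum()
}

fn main() {
    // height 3, arity 2: 4 leaves + 2 internal nodes + 1 root = 7 digests
    assert_eq!(full_tree_digests_len(3, 2), 7);
}
```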
Similar to Poseidon, you can also configure the Tree Builder via `TreeBuilderConfig::default()`:
- `keep_rows`: The number of rows which will be written to output; 0 will write all rows.
- `are_inputs_on_device`: Have the inputs been loaded to device memory?
- `is_async`: Should the TreeBuilder run asynchronously? `False` will block the current CPU thread. `True` will require you to call `cudaStreamSynchronize` or `cudaDeviceSynchronize` to retrieve the result.
### Benchmarks
We ran the Poseidon tree builder on:
**CPU**: 12th Gen Intel(R) Core(TM) i9-12900K
**GPU**: RTX 3090 Ti
**Tree height**: 30 (2^29 elements)
The benchmarks include copying data from and to the device.
| Rows to keep parameter | Run time, ICICLE | Supranational PC2 |
| ----------- | ----------- | ----------- |
| 10 | 9.4 seconds | 13.6 seconds |
| 20 | 9.5 seconds | 13.6 seconds |
| 29 | 13.7 seconds | 13.6 seconds |


@@ -0,0 +1,57 @@
# Rust bindings
Rust bindings allow you to use ICICLE as a rust library.
`icicle-core` defines all interfaces, macros and common methods.
`icicle-cuda-runtime` defines DeviceContext which can be used to manage a specific GPU as well as wrapping common CUDA methods.
`icicle-curves` implements all interfaces and macros from icicle-core for each curve. For example icicle-bn254 implements curve bn254. Each curve has its own build script which will build the CUDA libraries for that curve as part of the rust-toolchain build.
## Using ICICLE Rust bindings in your project
Simply add the following to your `Cargo.toml`.
```toml
# GPU Icicle integration
icicle-cuda-runtime = { git = "https://github.com/ingonyama-zk/icicle.git" }
icicle-core = { git = "https://github.com/ingonyama-zk/icicle.git" }
icicle-bn254 = { git = "https://github.com/ingonyama-zk/icicle.git" }
```
`icicle-bn254` is the curve you wish to use, while `icicle-core` and `icicle-cuda-runtime` contain ICICLE utilities and CUDA wrappers.
If you wish to point to a specific ICICLE branch add `branch = "<name_of_branch>"` or `tag = "<name_of_tag>"` to the ICICLE dependency. For a specific commit add `rev = "<commit_id>"`.
When you build your project ICICLE will be built as part of the build command.
# How do the Rust bindings work?
The Rust bindings are just Rust wrappers for the ICICLE Core static libraries. We integrate the compilation of the static libraries into Rust's toolchain to make usage seamless and easy. This is achieved by [extending Rust's build command](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/rust/icicle-curves/icicle-bn254/build.rs).
```rust
use cmake::Config;
use std::env::var;
fn main() {
println!("cargo:rerun-if-env-changed=CXXFLAGS");
println!("cargo:rerun-if-changed=../../../../icicle");
let cargo_dir = var("CARGO_MANIFEST_DIR").unwrap();
let profile = var("PROFILE").unwrap();
let out_dir = Config::new("../../../../icicle")
.define("BUILD_TESTS", "OFF") //TODO: feature
.define("CURVE", "bn254")
.define("CMAKE_BUILD_TYPE", "Release")
.build_target("icicle")
.build();
println!("cargo:rustc-link-search={}/build", out_dir.display());
println!("cargo:rustc-link-lib=ingo_bn254");
println!("cargo:rustc-link-lib=stdc++");
// println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
println!("cargo:rustc-link-lib=cudart");
}
```


@@ -0,0 +1,63 @@
# MSM Pre-computation
To understand the theory behind MSM pre computation technique refer to Niall Emmart's [talk](https://youtu.be/KAWlySN7Hm8?feature=shared&t=1734).
### Supported curves
`bls12-377`, `bls12-381`, `bn254`, `bw6-761`, `Grumpkin`
### `precompute_bases`
Precomputes bases for the multi-scalar multiplication (MSM) by extending each base point with its multiples, facilitating more efficient MSM calculations.
```rust
pub fn precompute_bases<C: Curve + MSM<C>>(
points: &HostOrDeviceSlice<Affine<C>>,
precompute_factor: i32,
_c: i32,
ctx: &DeviceContext,
output_bases: &mut HostOrDeviceSlice<Affine<C>>,
) -> IcicleResult<()>
```
#### Parameters
- **`points`**: The original set of affine points ($P_1, P_2, ..., P_n$) to be used in the MSM. For batch MSM operations, this should include all unique points concatenated together.
- **`precompute_factor`**: Specifies the total number of points to precompute for each base, including the base point itself. This parameter directly influences the memory requirements and the potential speedup of the MSM operation.
- **`_c`**: Currently unused. Intended for future use to align with the `c` parameter in `MSMConfig`, ensuring the precomputation is compatible with the bucket method's window size used in MSM.
- **`ctx`**: The device context specifying the device ID and stream for execution. This context determines where the precomputation is performed (e.g., on a specific GPU).
- **`output_bases`**: The output buffer for the extended bases. Its size must be `points.len() * precompute_factor`. This buffer should be allocated on the device for GPU computations.
#### Returns
`Ok(())` if the operation is successful, or an `IcicleResult` error otherwise.
#### Description
This function extends each provided base point $P$ with its multiples $2^{l}P, 2^{2l}P, \ldots, 2^{(\text{precompute\_factor} - 1) \cdot l}P$, where $l$ is a level of precomputation determined by the `precompute_factor`. The extended set of points facilitates faster MSM computations by allowing the MSM algorithm to leverage precomputed multiples of base points, reducing the number of point additions required during the computation.
The precomputation process is crucial for optimizing MSM operations, especially when dealing with large sets of points and scalars. By precomputing and storing multiples of the base points, the MSM function can more efficiently compute the scalar-point multiplications.
#### Example Usage
```rust
let device_context = DeviceContext::default_for_device(0); // Use the default device
let precompute_factor = 4; // Number of points to precompute for each base
// `points` is the original HostOrDeviceSlice of affine bases (defined elsewhere);
// the output buffer must hold points.len() * precompute_factor elements
let expected_size = points.len() * precompute_factor as usize;
let mut extended_bases = HostOrDeviceSlice::cuda_malloc(expected_size)
    .expect("Failed to allocate memory for extended bases");
// Precompute the bases using the specified factor
precompute_bases(&points, precompute_factor, 0, &device_context, &mut extended_bases)
    .expect("Failed to precompute bases");
```
### Benchmarks
Benchmarks were performed on an Nvidia RTX 3090 Ti.
| Pre-computation factor | bn254 size `2^20` MSM, ms. | bn254 size `2^12` MSM, size `2^10` batch, ms. | bls12-381 size `2^20` MSM, ms. | bls12-381 size `2^12` MSM, size `2^10` batch, ms. |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| 1 | 14.1 | 82.8 | 25.5 | 136.7 |
| 2 | 11.8 | 76.6 | 20.3 | 123.8 |
| 4 | 10.9 | 73.8 | 18.1 | 117.8 |
| 8 | 10.6 | 73.7 | 17.2 | 116.0 |


@@ -0,0 +1,172 @@
# MSM
### Supported curves
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`, `grumpkin`
## Example
```rust
use icicle_bn254::curve::{CurveCfg, G1Projective, ScalarCfg};
use icicle_core::{curve::Curve, msm, traits::GenerateRandom};
use icicle_cuda_runtime::{memory::HostOrDeviceSlice, stream::CudaStream};
fn main() {
let size: usize = 1 << 10; // Define the number of points and scalars
// Generate random points and scalars
println!("Generating random G1 points and scalars for BN254...");
let points = CurveCfg::generate_random_affine_points(size);
let scalars = ScalarCfg::generate_random(size);
// Wrap points and scalars in HostOrDeviceSlice for MSM
let points_host = HostOrDeviceSlice::Host(points);
let scalars_host = HostOrDeviceSlice::Host(scalars);
// Allocate memory on the CUDA device for MSM results
let mut msm_results: HostOrDeviceSlice<'_, G1Projective> = HostOrDeviceSlice::cuda_malloc(1).expect("Failed to allocate CUDA memory for MSM results");
// Create a CUDA stream for asynchronous execution
let stream = CudaStream::create().expect("Failed to create CUDA stream");
let mut cfg = msm::MSMConfig::default();
cfg.ctx.stream = &stream;
cfg.is_async = true; // Enable asynchronous execution
// Execute MSM on the device
println!("Executing MSM on device...");
msm::msm(&scalars_host, &points_host, &cfg, &mut msm_results).expect("Failed to execute MSM");
// Synchronize CUDA stream to ensure MSM execution is complete
stream.synchronize().expect("Failed to synchronize CUDA stream");
// Optionally, move results to host for further processing or printing
println!("MSM execution complete.");
}
```
## MSM API Overview
```rust
pub fn msm<C: Curve>(
scalars: &HostOrDeviceSlice<C::ScalarField>,
points: &HostOrDeviceSlice<Affine<C>>,
cfg: &MSMConfig,
results: &mut HostOrDeviceSlice<Projective<C>>,
) -> IcicleResult<()>
```
### Parameters
- **`scalars`**: A buffer containing the scalar values to be multiplied with corresponding points.
- **`points`**: A buffer containing the points to be multiplied by the scalars.
- **`cfg`**: MSM configuration specifying additional parameters for the operation.
- **`results`**: A buffer where the results of the MSM operations will be stored.
### MSM Config
```rust
pub struct MSMConfig<'a> {
pub ctx: DeviceContext<'a>,
points_size: i32,
pub precompute_factor: i32,
pub c: i32,
pub bitsize: i32,
pub large_bucket_factor: i32,
batch_size: i32,
are_scalars_on_device: bool,
pub are_scalars_montgomery_form: bool,
are_points_on_device: bool,
pub are_points_montgomery_form: bool,
are_results_on_device: bool,
pub is_big_triangle: bool,
pub is_async: bool,
}
```
- **`ctx: DeviceContext`**: Specifies the device context, device id and the CUDA stream for asynchronous execution.
- **`points_size: i32`**: Internal field set from the length of the `points` buffer; you do not set it directly.
- **`precompute_factor: i32`**: Determines the number of extra points to pre-compute for each point, affecting memory footprint and performance.
- **`c: i32`**: The "window bitsize," a parameter controlling the computational complexity and memory footprint of the MSM operation.
- **`bitsize: i32`**: The number of bits of the largest scalar, typically equal to the bit size of the scalar field.
- **`large_bucket_factor: i32`**: Adjusts the algorithm's sensitivity to frequently occurring buckets, useful for non-uniform scalar distributions.
- **`batch_size: i32`**: The number of MSMs to compute in a single batch, for leveraging parallelism.
- **`are_scalars_montgomery_form`**: Set to `true` if scalars are in Montgomery form.
- **`are_points_montgomery_form`**: Set to `true` if points are in Montgomery form.
- **`are_scalars_on_device: bool`**, **`are_points_on_device: bool`**, **`are_results_on_device: bool`**: Indicate whether the corresponding buffers are in device memory.
- **`is_big_triangle`**: If `true`, MSM will run with large triangle accumulation; if `false`, bucket accumulation will be chosen. Default value: `false`.
- **`is_async: bool`**: Whether to perform the MSM operation asynchronously.
### Usage
The `msm` function is designed to compute the sum of multiple scalar-point multiplications efficiently. It supports both single MSM operations and batched operations for increased performance. The configuration allows for detailed control over the execution environment and performance characteristics of the MSM operation.
When performing MSM operations, it's crucial to match the size of the `scalars` and `points` arrays correctly and ensure that the `results` buffer is appropriately sized to hold the output. The `MSMConfig` should be set up to reflect the specifics of the operation, including whether the operation should be asynchronous and any device-specific settings.
## How do I toggle between the supported algorithms?
When creating your MSM Config you may state which algorithm you wish to use. `is_big_triangle=true` will activate Large triangle accumulation and `is_big_triangle=false` will activate Bucket accumulation.
```rust
...
let mut cfg_bls12377 = msm::get_default_msm_config::<BLS12377CurveCfg>();
// is_big_triangle will determine which algorithm to use
cfg_bls12377.is_big_triangle = true;
msm::msm(&scalars, &points, &cfg_bls12377, &mut msm_results).unwrap();
...
```
You may reference the rust code [here](https://github.com/ingonyama-zk/icicle/blob/77a7613aa21961030e4e12bf1c9a78a2dadb2518/wrappers/rust/icicle-core/src/msm/mod.rs#L54).
## How do I toggle between MSM modes?
Toggling between MSM modes occurs automatically based on the number of results you are expecting from the `msm::msm` function. If you are expecting an array of `msm_results`, ICICLE will automatically split `scalars` and `points` into equal parts and run them as multiple MSMs in parallel.
```rust
...
let mut msm_result: HostOrDeviceSlice<'_, G1Projective> = HostOrDeviceSlice::cuda_malloc(1).unwrap();
msm::msm(&scalars, &points, &cfg, &mut msm_result).unwrap();
...
```
In the example above we allocate a single expected result which the MSM method will interpret as `batch_size=1` and run a single MSM.
In the next example, we are expecting 10 results which sets `batch_size=10` and runs 10 MSMs in batch mode.
```rust
...
let mut msm_results: HostOrDeviceSlice<'_, G1Projective> = HostOrDeviceSlice::cuda_malloc(10).unwrap();
msm::msm(&scalars, &points, &cfg, &mut msm_results).unwrap();
...
```
Here is a [reference](https://github.com/ingonyama-zk/icicle/blob/77a7613aa21961030e4e12bf1c9a78a2dadb2518/wrappers/rust/icicle-core/src/msm/mod.rs#L108) to the code which automatically sets the batch size. For more MSM examples have a look [here](https://github.com/ingonyama-zk/icicle/blob/77a7613aa21961030e4e12bf1c9a78a2dadb2518/examples/rust/msm/src/main.rs#L1).
## Support for G2 group
MSM also supports the G2 group.
Using MSM in G2 requires a G2 config, and of course your points should also be G2 points.
```rust
...
let scalars = HostOrDeviceSlice::Host(upper_scalars[..size].to_vec());
let g2_points = HostOrDeviceSlice::Host(g2_upper_points[..size].to_vec());
let mut g2_msm_results: HostOrDeviceSlice<'_, G2Projective> = HostOrDeviceSlice::cuda_malloc(1).unwrap();
let mut g2_cfg = msm::get_default_msm_config::<G2CurveCfg>();
msm::msm(&scalars, &g2_points, &g2_cfg, &mut g2_msm_results).unwrap();
...
```
Here you can [find an example](https://github.com/ingonyama-zk/icicle/blob/5a96f9937d0a7176d88c766bd3ef2062b0c26c37/examples/rust/msm/src/main.rs#L114) of MSM on G2 Points.


@@ -0,0 +1,202 @@
# Multi GPU APIs
To learn more about the theory of Multi GPU programming refer to [this part](../multi-gpu.md) of documentation.
Here we will cover the core multi GPU APIs and an [example](#a-multi-gpu-example).
## A Multi GPU example
In this example we will show how you can:
1. Fetch the number of devices installed on a machine.
2. Launch a thread for every GPU and set an active device per thread.
3. Execute an MSM on each GPU.
```rust
...
let device_count = get_device_count().unwrap();
(0..device_count)
.into_par_iter()
.for_each(move |device_id| {
set_device(device_id).unwrap();
// you can allocate points and scalars_d here
let mut cfg = MSMConfig::default_for_device(device_id);
cfg.ctx.stream = &stream;
cfg.is_async = true;
cfg.are_scalars_montgomery_form = true;
msm(&scalars_d, &HostOrDeviceSlice::on_host(points), &cfg, &mut msm_results).unwrap();
// collect and process results
})
...
```
We use `get_device_count` to fetch the number of connected devices; device IDs will be `0, 1, 2, ..., device_count - 1`.
[`into_par_iter`](https://docs.rs/rayon/latest/rayon/iter/trait.IntoParallelIterator.html#tymethod.into_par_iter) is a parallel iterator; you should expect it to launch a thread for every iteration.
We then call `set_device(device_id).unwrap();`, which sets the context of that thread to the selected `device_id`.
Any data you now allocate from the context of this thread will be linked to the `device_id`. We create our `MSMConfig` with the selected device ID, `let mut cfg = MSMConfig::default_for_device(device_id);`; behind the scenes this creates a `DeviceContext` configured for that specific GPU.
We finally call our `msm` method.
## Device management API
To streamline device management we offer as part of `icicle-cuda-runtime` package methods for dealing with devices.
#### [`set_device`](https://github.com/ingonyama-zk/icicle/blob/e6035698b5e54632f2c44e600391352ccc11cad4/wrappers/rust/icicle-cuda-runtime/src/device.rs#L6)
Sets the current CUDA device by its ID; calling `set_device` binds the current thread to that CUDA device.
**Parameters:**
- `device_id: usize`: The ID of the device to set as the current device. Device IDs start from 0.
**Returns:**
- `CudaResult<()>`: An empty result indicating success if the device is set successfully. In case of failure, returns a `CudaError`.
**Errors:**
- Returns a `CudaError` if the specified device ID is invalid or if a CUDA-related error occurs during the operation.
**Example:**
```rust
let device_id = 0; // Device ID to set
match set_device(device_id) {
Ok(()) => println!("Device set successfully."),
Err(e) => eprintln!("Failed to set device: {:?}", e),
}
```
#### [`get_device_count`](https://github.com/ingonyama-zk/icicle/blob/e6035698b5e54632f2c44e600391352ccc11cad4/wrappers/rust/icicle-cuda-runtime/src/device.rs#L10)
Retrieves the number of CUDA devices available on the machine.
**Returns:**
- `CudaResult<usize>`: The number of available CUDA devices. On success, contains the count of CUDA devices. On failure, returns a `CudaError`.
**Errors:**
- Returns a `CudaError` if a CUDA-related error occurs during the retrieval of the device count.
**Example:**
```rust
match get_device_count() {
Ok(count) => println!("Number of devices available: {}", count),
Err(e) => eprintln!("Failed to get device count: {:?}", e),
}
```
#### [`get_device`](https://github.com/ingonyama-zk/icicle/blob/e6035698b5e54632f2c44e600391352ccc11cad4/wrappers/rust/icicle-cuda-runtime/src/device.rs#L15)
Retrieves the ID of the current CUDA device.
**Returns:**
- `CudaResult<usize>`: The ID of the current CUDA device. On success, contains the device ID. On failure, returns a `CudaError`.
**Errors:**
- Returns a `CudaError` if a CUDA-related error occurs during the retrieval of the current device ID.
**Example:**
```rust
match get_device() {
Ok(device_id) => println!("Current device ID: {}", device_id),
Err(e) => eprintln!("Failed to get current device: {:?}", e),
}
```
## Device context API
The `DeviceContext` is embedded into `NTTConfig`, `MSMConfig` and `PoseidonConfig`, meaning you can simply pass a `device_id` to your existing config and the same computation will be triggered on a different device.
#### [`DeviceContext`](https://github.com/ingonyama-zk/icicle/blob/e6035698b5e54632f2c44e600391352ccc11cad4/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L11)
Represents the configuration of a CUDA device, encapsulating the device's stream, ID, and memory pool. The default device is always `0`.
```rust
pub struct DeviceContext<'a> {
pub stream: &'a CudaStream,
pub device_id: usize,
pub mempool: CudaMemPool,
}
```
##### Fields
- **`stream: &'a CudaStream`**
A reference to a `CudaStream`. This stream is used for executing CUDA operations. By default, it points to the null stream, CUDA's default execution stream.
- **`device_id: usize`**
The index of the GPU currently in use. The default value is `0`, indicating the first GPU in the system.
Note that if `CUDA_VISIBLE_DEVICES` is configured, for example as `CUDA_VISIBLE_DEVICES=2,3,7` on a system with 8 GPUs, then `device_id=0` will correspond to the GPU with id 2. So the mapping may not always be a direct reflection of the GPUs installed on a system.
- **`mempool: CudaMemPool`**
Represents the memory pool used for CUDA memory allocations. The default is set to a null pointer, which signifies the use of the default CUDA memory pool.
##### Implementation Notes
- The `DeviceContext` structure is cloneable and can be debugged, facilitating easier logging and duplication of contexts when needed.
#### [`DeviceContext::default_for_device(device_id: usize) -> DeviceContext<'static>`](https://github.com/ingonyama-zk/icicle/blob/e6035698b5e54632f2c44e600391352ccc11cad4/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L30)
Provides a default `DeviceContext` for the given device, ideal for straightforward setups.
#### Parameters
- **`device_id: usize`**: The ID of the device for which to create the context.
#### Returns
A `DeviceContext` instance configured with:
- The default stream (`null_mut()`).
- The provided `device_id`.
- The default memory pool (`null_mut()`).
#### [`check_device(device_id: i32)`](https://github.com/vhnatyk/icicle/blob/eef6876b037a6b0797464e7cdcf9c1ecfcf41808/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L42)
Validates that the specified `device_id` matches the ID of the currently active device, ensuring operations are targeted correctly.
#### Parameters
- **`device_id: i32`**: The device ID to verify against the currently active device.
#### Behavior
- **Panics** if the `device_id` does not match the active device's ID, preventing cross-device operation errors.
#### Example
```rust
let device_id: i32 = 0; // Example device ID
check_device(device_id);
// Ensures that the current context is correctly set for the specified device ID.
```


@@ -0,0 +1,199 @@
# NTT
### Supported curves
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`
## Example
```rust
use ark_bn254::Fr as Bn254Fr; // assumes the ark-bn254 and ark-ff crates are available
use ark_ff::FftField;
use icicle_bn254::curve::{ScalarCfg, ScalarField};
use icicle_core::{ntt::{self, NTT}, traits::{ArkConvertible, GenerateRandom}};
use icicle_cuda_runtime::{device_context::DeviceContext, memory::HostOrDeviceSlice, stream::CudaStream};
fn main() {
let size = 1 << 12; // Define the size of your input, e.g., 2^12
// Get a root of unity of order `size` from arkworks, used to initialize the NTT domain
let icicle_omega = <Bn254Fr as FftField>::get_root_of_unity(size.try_into().unwrap()).unwrap();
// Generate random inputs
println!("Generating random inputs...");
let scalars = HostOrDeviceSlice::Host(ScalarCfg::generate_random(size));
// Allocate memory on CUDA device for NTT results
let mut ntt_results: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::cuda_malloc(size).expect("Failed to allocate CUDA memory");
// Create a CUDA stream
let stream = CudaStream::create().expect("Failed to create CUDA stream");
let ctx = DeviceContext::default(); // Assuming default device context
ScalarCfg::initialize_domain(ScalarField::from_ark(icicle_omega), &ctx).unwrap();
// Configure NTT
let mut cfg = ntt::NTTConfig::default();
cfg.ctx.stream = &stream;
cfg.is_async = true; // Set to true for asynchronous execution
// Execute NTT on device
println!("Executing NTT on device...");
ntt::ntt(&scalars, ntt::NTTDir::kForward, &cfg, &mut ntt_results).expect("Failed to execute NTT");
// Synchronize CUDA stream to ensure completion
stream.synchronize().expect("Failed to synchronize CUDA stream");
// Optionally, move results to host for further processing or verification
println!("NTT execution complete.");
}
```
## NTT API overview
```rust
pub fn ntt<F>(
input: &HostOrDeviceSlice<F>,
dir: NTTDir,
cfg: &NTTConfig<F>,
output: &mut HostOrDeviceSlice<F>,
) -> IcicleResult<()>
```
`ntt::ntt` expects:
`input` - buffer to read the inputs of the NTT from. <br/>
`dir` - whether to compute forward or inverse NTT. <br/>
`cfg` - config used to specify extra arguments of the NTT. <br/>
`output` - buffer to write the NTT outputs into. Must be of the same size as input.
The `input` and `output` buffers can be on device or on host. Being on host means that they will be transferred to device during runtime.
### NTT Config
```rust
pub struct NTTConfig<'a, S> {
pub ctx: DeviceContext<'a>,
pub coset_gen: S,
pub batch_size: i32,
pub columns_batch: bool,
pub ordering: Ordering,
are_inputs_on_device: bool,
are_outputs_on_device: bool,
pub is_async: bool,
pub ntt_algorithm: NttAlgorithm,
}
```
The `NTTConfig` struct is a configuration object used to specify parameters for an NTT instance.
#### Fields
- **`ctx: DeviceContext<'a>`**: Specifies the device context, including the device ID and the stream ID.
- **`coset_gen: S`**: Defines the coset generator used for coset (i)NTTs. By default, this is set to `S::one()`, indicating that no coset is being used.
- **`batch_size: i32`**: Determines the number of NTTs to compute in a single batch. The default value is 1, meaning that operations are performed on individual inputs without batching. Batch processing can significantly improve performance by leveraging parallelism in GPU computations.
- **`columns_batch`**: If true the function will compute the NTTs over the columns of the input matrix and not over the rows. Defaults to `false`.
- **`ordering: Ordering`**: Controls the ordering of inputs and outputs for the NTT operation. This field can be used to specify decimation strategies (in time or in frequency) and the type of butterfly algorithm (Cooley-Tukey or Gentleman-Sande). The ordering is crucial for compatibility with various algorithmic approaches and can impact the efficiency of the NTT.
- **`are_inputs_on_device: bool`**: Indicates whether the input data has been preloaded on the device memory. If `false` inputs will be copied from host to device.
- **`are_outputs_on_device: bool`**: Indicates whether the output buffer is in device memory. If `false`, outputs will be copied from device to host after computation. If the inputs and outputs are the same pointer, the NTT will be computed in place.
- **`is_async: bool`**: Specifies whether the NTT operation should be performed asynchronously. When set to `true`, the NTT function will not block the CPU, allowing other operations to proceed concurrently. Asynchronous execution requires careful synchronization to ensure data integrity and correctness.
- **`ntt_algorithm: NttAlgorithm`**: Can be one of `Auto`, `Radix2`, `MixedRadix`.
`Auto` will select `Radix 2` or `Mixed Radix` algorithm based on heuristics.
`Radix2` and `MixedRadix` will force the use of that algorithm regardless of the input size or other considerations. You should use one of these options when you know for sure which algorithm you want to run.
#### Usage
Example initialization with default settings:
```rust
let default_config = NTTConfig::default();
```
Customizing the configuration:
```rust
let custom_config = NTTConfig {
ctx: custom_device_context,
coset_gen: my_coset_generator,
batch_size: 10,
columns_batch: false,
ordering: Ordering::kRN,
are_inputs_on_device: true,
are_outputs_on_device: true,
is_async: false,
ntt_algorithm: NttAlgorithm::MixedRadix,
};
```
### Modes
NTT supports two different modes: `Batch NTT` and `Single NTT`.
You may toggle between single and batch NTT by simply configuring `batch_size` to be larger than 1 in your `NTTConfig`.
```rust
let mut cfg = ntt::get_default_ntt_config::<ScalarField>();
cfg.batch_size = 10; // your NTT using this config will run in batch mode
```
`batch_size=1` would keep our NTT in single NTT mode.
Deciding whether to use `batch NTT` vs `single NTT` is highly dependent on your application and use case.
### Initializing the NTT Domain
Before performing NTT operations, it's necessary to initialize the NTT domain. It only needs to be called once per GPU since the twiddles are cached.
```rust
ScalarCfg::initialize_domain(ScalarField::from_ark(icicle_omega), &ctx).unwrap();
```
### `initialize_domain`
```rust
pub fn initialize_domain<F>(primitive_root: F, ctx: &DeviceContext) -> IcicleResult<()>
where
F: FieldImpl,
<F as FieldImpl>::Config: NTT<F>;
```
#### Parameters
- **`primitive_root`**: The primitive root of unity, chosen based on the maximum NTT size required for the computations. It must be of an order that is a power of two. This root is used to generate twiddle factors that are essential for the NTT operations.
- **`ctx`**: A reference to a `DeviceContext` specifying which device and stream the computation should be executed on.
#### Returns
- **`IcicleResult<()>`**: Will return an error if the operation fails.
### `initialize_domain_fast_twiddles_mode`
Similar to `initialize_domain`, `initialize_domain_fast_twiddles_mode` is a faster implementation and can be used for larger NTTs.
```rust
pub fn initialize_domain_fast_twiddles_mode<F>(primitive_root: F, ctx: &DeviceContext) -> IcicleResult<()>
where
F: FieldImpl,
<F as FieldImpl>::Config: NTT<F>;
```
#### Parameters
- **`primitive_root`**: The primitive root of unity, chosen based on the maximum NTT size required for the computations. It must be of an order that is a power of two. This root is used to generate twiddle factors that are essential for the NTT operations.
- **`ctx`**: A reference to a `DeviceContext` specifying which device and stream the computation should be executed on.
#### Returns
- **`IcicleResult<()>`**: Will return an error if the operation fails.


@@ -0,0 +1,159 @@
# Vector Operations API
Our vector operations API, which is part of the `icicle-cuda-runtime` package, includes fundamental methods for addition, subtraction, and multiplication of vectors, with support for both host and device memory.
## Supported curves
Vector operations are supported on the following curves:
`bls12-377`, `bls12-381`, `bn-254`, `bw6-761`, `grumpkin`
## Examples
### Addition of Scalars
```rust
use icicle_bn254::curve::{ScalarCfg, ScalarField};
use icicle_core::traits::{FieldImpl, GenerateRandom};
use icicle_core::vec_ops::{add_scalars, VecOpsConfig};
use icicle_cuda_runtime::memory::HostOrDeviceSlice;
let test_size = 1 << 18;
let a: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
let b: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
let mut result: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::zero(); test_size]);
let cfg = VecOpsConfig::default();
add_scalars(&a, &b, &mut result, &cfg).unwrap();
```
### Subtraction of Scalars
```rust
use icicle_bn254::curve::{ScalarCfg, ScalarField};
use icicle_core::traits::{FieldImpl, GenerateRandom};
use icicle_core::vec_ops::{sub_scalars, VecOpsConfig};
use icicle_cuda_runtime::memory::HostOrDeviceSlice;
let test_size = 1 << 18;
let a: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
let b: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
let mut result: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::zero(); test_size]);
let cfg = VecOpsConfig::default();
sub_scalars(&a, &b, &mut result, &cfg).unwrap();
```
### Multiplication of Scalars
```rust
use icicle_bn254::curve::{ScalarCfg, ScalarField};
use icicle_core::traits::{FieldImpl, GenerateRandom};
use icicle_core::vec_ops::{mul_scalars, VecOpsConfig};
use icicle_cuda_runtime::memory::HostOrDeviceSlice;
let test_size = 1 << 18;
let a: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(ScalarCfg::generate_random(test_size));
let ones: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::one(); test_size]);
let mut result: HostOrDeviceSlice<'_, ScalarField> = HostOrDeviceSlice::on_host(vec![ScalarField::zero(); test_size]);
let cfg = VecOpsConfig::default();
mul_scalars(&a, &ones, &mut result, &cfg).unwrap();
```
## Vector Operations Configuration
The `VecOpsConfig` struct encapsulates the settings for vector operations, including device context and operation modes.
### `VecOpsConfig`
Defines configuration parameters for vector operations.
```rust
pub struct VecOpsConfig<'a> {
pub ctx: DeviceContext<'a>,
is_a_on_device: bool,
is_b_on_device: bool,
is_result_on_device: bool,
is_result_montgomery_form: bool,
pub is_async: bool,
}
```
#### Fields
- **`ctx: DeviceContext<'a>`**: Specifies the device context for the operation, including the device ID and memory pool.
- **`is_a_on_device`**: Indicates if the first operand vector resides in device memory.
- **`is_b_on_device`**: Indicates if the second operand vector resides in device memory.
- **`is_result_on_device`**: Specifies if the result vector should be stored in device memory.
- **`is_result_montgomery_form`**: Determines if the result should be in Montgomery form.
- **`is_async`**: Enables asynchronous operation. If `true`, operations are non-blocking; otherwise, they block the current thread.
### Default Configuration
`VecOpsConfig` can be initialized with default settings tailored for a specific device:
```rust
let cfg = VecOpsConfig::default();
```
These are the default settings.
```rust
impl<'a> Default for VecOpsConfig<'a> {
fn default() -> Self {
Self::default_for_device(DEFAULT_DEVICE_ID)
}
}
impl<'a> VecOpsConfig<'a> {
pub fn default_for_device(device_id: usize) -> Self {
VecOpsConfig {
ctx: DeviceContext::default_for_device(device_id),
is_a_on_device: false,
is_b_on_device: false,
is_result_on_device: false,
is_result_montgomery_form: false,
is_async: false,
}
}
}
```
## Vector Operations
Vector operations are implemented through the `VecOps` trait. This trait is implemented for all [supported curves](#supported-curves), providing methods for addition, subtraction, and multiplication of vectors.
### `VecOps` Trait
```rust
pub trait VecOps<F> {
fn add(
a: &HostOrDeviceSlice<F>,
b: &HostOrDeviceSlice<F>,
result: &mut HostOrDeviceSlice<F>,
cfg: &VecOpsConfig,
) -> IcicleResult<()>;
fn sub(
a: &HostOrDeviceSlice<F>,
b: &HostOrDeviceSlice<F>,
result: &mut HostOrDeviceSlice<F>,
cfg: &VecOpsConfig,
) -> IcicleResult<()>;
fn mul(
a: &HostOrDeviceSlice<F>,
b: &HostOrDeviceSlice<F>,
result: &mut HostOrDeviceSlice<F>,
cfg: &VecOpsConfig,
) -> IcicleResult<()>;
}
```
#### Methods
All operations are element-wise, and the results are placed into the `result` param. These operations are not in place.
- **`add`**: Computes the element-wise sum of two vectors.
- **`sub`**: Computes the element-wise difference between two vectors.
- **`mul`**: Performs element-wise multiplication of two vectors.


@@ -0,0 +1,117 @@
# Supporting Additional Curves
We understand the need for ZK developers to use different curves, some common, some more exotic. For this reason we designed ICICLE to allow developers to add any curve they desire.
## ICICLE Core
ICICLE Core is very generic by design, so all algorithms and primitives work based on configuration files [selected at compile time](https://github.com/ingonyama-zk/icicle/blob/main/icicle/curves/curve_config.cuh). This is why we compile ICICLE Core per curve.
To add support for a new curve you must create a new file under [`icicle/curves`](https://github.com/ingonyama-zk/icicle/tree/main/icicle/curves). The file should be named `<curve_name>_params.cuh`.
### Adding curve_name_params.cuh
Start by copying the contents of `bn254_params.cuh` into your params file. Params should include:
- **fq_config** - parameters of the Base field.
- **limbs_count** - `ceil(field_byte_size / 4)`.
- **modulus_bit_count** - bit-size of the modulus.
- **num_of_reductions** - the number of times to reduce in the reduce function. Use 2 if not sure.
- **modulus** - modulus of the field.
- **modulus_2** - modulus * 2.
- **modulus_4** - modulus * 4.
- **neg_modulus** - negated modulus.
- **modulus_wide** - modulus represented as a double-sized integer.
- **modulus_squared** - `modulus**2` represented as a double-sized integer.
- **modulus_squared_2** - `2 * modulus**2` represented as a double-sized integer.
- **modulus_squared_4** - `4 * modulus**2` represented as a double-sized integer.
- **m** - value used in multiplication. Can be computed as `2**(2*modulus_bit_count) // modulus`.
- **one** - multiplicative identity.
- **zero** - additive identity.
- **montgomery_r** - `2 ** M % modulus`, where M is the modulus bit-size rounded up to the nearest multiple of 32, e.g. 384 or 768 for the BLS and BW curves respectively.
- **montgomery_r_inv** - `2 ** (-M) % modulus`
- **fp_config** - parameters of the Scalar field.
Same as fq_config, but with additional arguments:
- **omegas_count** - [two-adicity](https://cryptologie.net/article/559/whats-two-adicity/) of the field. And thus the maximum size of NTT.
- **omegas** - an array of omegas for NTTs, of size `omegas_count`. The i-th element is equal to `1.nth_root(2**(2**(omegas_count-i)))`.
- **inv** - an array of inverses of powers of two in the field. The i-th element is equal to `(2 ** (i+1)) ** -1`.
- **G1 generators points** - affine coordinates of the generator point.
- **G2 generators points** - affine coordinates of the extension generator. Remove these if `G2` is not supported.
- **Weierstrass b value** - base field element equal to value of `b` in the curve equation.
- **Weierstrass b value G2** - base field element equal to value of `b` for the extension. Remove this if `G2` is not supported.
:::note
None of the params are in Montgomery form.
:::
:::note
To convert number values into the `storage` type you can use the following Python function:
```python
import struct

def unpack(x, field_size):
    # split x into 32-bit little-endian limbs, formatted as hex literals
    limbs = struct.unpack('I' * (field_size // 4), int(x).to_bytes(field_size, 'little'))
    return ', '.join("0x" + format(limb, '08x') for limb in limbs)

# example: unpack(p, 32) for bn254 yields "0xf0000001, 0x43e1f593, ..."
```
:::
We also require some changes to [`curve_config.cuh`](https://github.com/ingonyama-zk/icicle/blob/main/icicle/curves/curve_config.cuh#L16-L29), we need to add a new curve id.
```cpp
...
#define BN254 1
#define BLS12_381 2
#define BLS12_377 3
#define BW6_761 4
#define GRUMPKIN 5
#define <curve_name> 6
...
```
Make sure to modify the [rest of the file](https://github.com/ingonyama-zk/icicle/blob/4beda3a900eda961f39af3a496f8184c52bf3b41/icicle/curves/curve_config.cuh#L16-L29) accordingly.
Finally we must modify the [`make` file](https://github.com/ingonyama-zk/icicle/blob/main/icicle/CMakeLists.txt#L64) to make sure we can compile our new curve.
```cmake
set(SUPPORTED_CURVES bn254;bls12_381;bls12_377;bw6_761;grumpkin;<curve_name>)
```
### Adding Poseidon support
If you want your curve to implement a Poseidon hash function or a tree builder, you will need to pre-calculate its optimized parameters.
Copy [constants_template.h](https://github.com/ingonyama-zk/icicle/blob/main/icicle/appUtils/poseidon/constants/constants_template.h) into `icicle/appUtils/poseidon/constants/<CURVE>_poseidon.h`. Run the [constants generation script](https://dev.ingonyama.com/icicle/primitives/poseidon#constants). The script will print the number of partial rounds and generate a `constants.bin` file. Use `xxd -i constants.bin` to parse the file into C declarations. Copy the `unsigned char constants_bin[]` contents inside your new file. Repeat this process for arities 2, 4, 8 and 11.
After you've generated the constants, add your curve in this [SUPPORTED_CURVES_WITH_POSEIDON](https://github.com/ingonyama-zk/icicle/blob/main/icicle/CMakeLists.txt#L72) in the `CMakeLists.txt`.
## Bindings
In order to support a new curve in the binding libraries you first must support it in ICICLE core.
### Rust
Go to [rust curves folder](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/icicle-curves) and copy `icicle-curve-template` to a new folder named `icicle-<curve_name>`.
Find all the occurrences of `<CURVE>` placeholder inside the crate. (You can use `Ctrl+Shift+F` in VS Code or `grep -nr "<CURVE>"` in bash). You will then need to replace each occurrence with your new curve name.
#### Limbs
Go to your curve's `curve.rs` file and set `SCALAR_LIMBS`, `BASE_LIMBS` and `G2_BASE_LIMBS` (if G2 is needed) to the minimum number of `u64` limbs required to store a single scalar field / base field element respectively.
e.g. for bn254, the scalar field is 254 bits, so `SCALAR_LIMBS` is set to 4.
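The rule of thumb, as a small illustrative helper (not part of the crate):
```rust
// Minimum number of u64 limbs needed to store a `bits`-bit field element.
const fn limbs_for(bits: u32) -> u32 {
    (bits + 63) / 64
}
// bn254 scalar field: limbs_for(254) == 4; bw6-761 base field: limbs_for(761) == 12.
```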
#### Primitives
If your curve doesn't support some of the primitives (NTT / MSM / Poseidon / Merkle tree), or you simply don't want to include one, just remove the corresponding module from `src` and then from `lib.rs`.
#### G2
If your curve doesn't support G2 - remove all the code under `#[cfg(feature = "g2")]` and remove the feature from [Cargo.toml](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/rust/icicle-curves/icicle-bn254/Cargo.toml#L29) and [build.rs](https://github.com/ingonyama-zk/icicle/blob/main/wrappers/rust/icicle-curves/icicle-bn254/build.rs#L15).
After this is done, add your new crate in the [global Cargo.toml](https://github.com/ingonyama-zk/icicle/tree/main/wrappers/rust/Cargo.toml).
### Golang
Golang is WIP in v1, coming soon. Please checkout a previous [release v0.1.0](https://github.com/ingonyama-zk/icicle/releases/tag/v0.1.0) for golang bindings.

docs/docs/introduction.md

@@ -0,0 +1,47 @@
---
slug: /
displayed_sidebar: GettingStartedSidebar
title: ''
---
# Welcome to Ingonyama's Developer Documentation
Ingonyama is a next-generation semiconductor company, focusing on Zero-Knowledge Proof hardware acceleration. We build accelerators for advanced cryptography, unlocking real-time applications. Our focus is on democratizing access to compute intensive cryptography and making it accessible for developers to build on top of.
Currently our flagship products are:
- **ICICLE**:
[ICICLE](https://github.com/ingonyama-zk/icicle) is a fully featured GPU accelerated cryptography library for building ZK provers. ICICLE allows you to accelerate your existing ZK protocols in a matter of hours, or implement your protocol from scratch on GPU.
---
## Our current take on hardware acceleration
We believe GPUs are as important for ZK as for AI.
- GPUs are a perfect match for ZK compute - around 97% of ZK protocol runtime is parallel by nature.
- GPUs are simple for developers to use and scale compared to other hardware platforms.
- GPUs are extremely competitive in terms of power / performance and price (3x cheaper compared to FPGAs).
- GPUs are popular and readily available.
For a more in-depth understanding on this topic we suggest you read [our article on the subject](https://www.ingonyama.com/blog/revisiting-paradigm-hardware-acceleration-for-zero-knowledge-proofs).
Despite our current focus on GPUs we are still hard at work developing a ZPU (ZK Processing Unit), with the goal of offering a programmable hardware platform for ZK. To read more about ZPUs we suggest you read this [article](https://medium.com/@ingonyama/zpu-the-zero-knowledge-processing-unit-f886a48e00e0).
## ICICLE
[ICICLE](https://github.com/ingonyama-zk/icicle) is a cryptography library for ZK using GPUs.
ICICLE implements blazing fast cryptographic primitives such as EC operations, MSM, NTT, Poseidon hash and more on GPU.
ICICLE is designed to be easy to use; developers don't have to touch a single line of CUDA code. Our Rust and Golang bindings allow your team to transition from CPU to GPU with minimal changes.
Learn more about ICICLE and GPUs [here][ICICLE-OVERVIEW].
## Get in Touch
If you have any questions, ideas, or are thinking of building something in this space, join the discussion on [Discord]. You can explore our code on [GitHub](https://github.com/ingonyama-zk) or read some of [our research papers](https://github.com/ingonyama-zk/papers).
Follow us on [Twitter](https://x.com/Ingo_zk) and [YouTube](https://www.youtube.com/@ingo_ZK) and sign up for our [mailing list](https://wkf.ms/3LKCbdj) to get our latest announcements.
[ICICLE-OVERVIEW]: ./icicle/overview.md
[Discord]: https://discord.gg/6vYrE7waPj

171
docs/docusaurus.config.js Normal file

@@ -0,0 +1,171 @@
// @ts-check
const lightCodeTheme = require('prism-react-renderer/themes/github');
const darkCodeTheme = require('prism-react-renderer/themes/dracula');
const math = require('remark-math');
const katex = require('rehype-katex');
/** @type {import('@docusaurus/types').Config} */
const config = {
title: 'Ingonyama Developer Documentation',
tagline: 'Ingonyama is a next-generation semiconductor company, focusing on Zero-Knowledge Proof hardware acceleration. We build accelerators for advanced cryptography, unlocking real-time applications.',
url: 'https://dev.ingonyama.com/',
baseUrl: '/',
onBrokenLinks: 'throw',
onBrokenMarkdownLinks: 'warn',
favicon: 'img/logo.png',
organizationName: 'ingonyama-zk',
projectName: 'developer-docs',
trailingSlash: false,
deploymentBranch: "main",
presets: [
[
'classic',
/** @type {import('@docusaurus/preset-classic').Options} */
({
docs: {
showLastUpdateAuthor: true,
showLastUpdateTime: true,
routeBasePath: '/',
remarkPlugins: [math, require('mdx-mermaid')],
rehypePlugins: [katex],
sidebarPath: require.resolve('./sidebars.js'),
editUrl: 'https://github.com/ingonyama-zk/icicle/tree/main',
},
blog: {
remarkPlugins: [math, require('mdx-mermaid')],
rehypePlugins: [katex],
showReadingTime: true,
editUrl: 'https://github.com/ingonyama-zk/icicle/tree/main',
},
pages: {},
theme: {
customCss: require.resolve('./src/css/custom.css'),
},
}),
],
],
stylesheets: [
{
href: 'https://cdn.jsdelivr.net/npm/katex@0.13.24/dist/katex.min.css',
type: 'text/css',
integrity:
'sha384-odtC+0UGzzFL/6PNoE8rX/SPcQDXBJ+uRepguP4QkPCm2LBxH3FA3y+fKSiJ+AmM',
crossorigin: 'anonymous',
},
],
scripts: [
{
src: 'https://plausible.io/js/script.js',
'data-domain':'ingonyama.com',
defer: true,
},
],
themeConfig:
/** @type {import('@docusaurus/preset-classic').ThemeConfig} */
({
metadata: [
{name: 'twitter:card', content: 'summary_large_image'},
{name: 'twitter:site', content: '@Ingo_zk'},
{name: 'twitter:title', content: 'Ingonyama Developer Documentation'},
{name: 'twitter:description', content: 'Ingonyama is a next-generation semiconductor company focusing on Zero-Knowledge Proof hardware acceleration...'},
{name: 'twitter:image', content: 'https://dev.ingonyama.com/img/logo.png'},
// title
{name: 'og:title', content: 'Ingonyama Developer Documentation'},
{name: 'og:description', content: 'Ingonyama is a next-generation semiconductor company focusing on Zero-Knowledge Proof hardware acceleration...'},
{name: 'og:image', content: 'https://dev.ingonyama.com/img/logo.png'},
],
hideableSidebar: true,
colorMode: {
defaultMode: 'dark',
respectPrefersColorScheme: false,
},
algolia: {
// The application ID provided by Algolia
appId: 'PZY4KJBBBK',
// Public API key: it is safe to commit it
apiKey: '2cc940a6e0ef5c117f4f44e7f4e6e20b',
indexName: 'ingonyama',
// Optional: see doc section below
contextualSearch: true,
// Optional: Specify domains where the navigation should occur through window.location instead on history.push. Useful when our Algolia config crawls multiple documentation sites and we want to navigate with window.location.href to them.
externalUrlRegex: 'external\\.com|domain\\.com',
// Optional: Replace parts of the item URLs from Algolia. Useful when using the same search index for multiple deployments using a different baseUrl. You can use regexp or string in the `from` param. For example: localhost:3000 vs myCompany.com/docs
replaceSearchResultPathname: {
from: '/docs/', // or as RegExp: /\/docs\//
to: '/',
},
// Optional: Algolia search parameters
searchParameters: {},
// Optional: path for search page that enabled by default (`false` to disable it)
searchPagePath: 'search',
},
navbar: {
title: 'Ingonyama Developer Documentation',
logo: {
alt: 'Ingonyama Logo',
src: 'img/logo.png',
},
items: [
{
position: 'left',
label: 'Docs',
to: '/',
},
{
href: 'https://github.com/ingonyama-zk',
position: 'right',
label: 'GitHub',
},
{
href: 'https://www.ingonyama.com/ingopedia/glossary',
position: 'right',
label: 'Ingopedia',
},
{
type: 'dropdown',
position: 'right',
label: 'Community',
items: [
{
label: 'Discord',
href: 'https://discord.gg/6vYrE7waPj',
},
{
label: 'Twitter',
href: 'https://x.com/Ingo_zk',
},
{
label: 'YouTube',
href: 'https://www.youtube.com/@ingo_ZK'
},
{
label: 'Mailing List',
href: 'https://wkf.ms/3LKCbdj',
}
]
},
],
},
footer: {
copyright: `Copyright © ${new Date().getFullYear()} Ingonyama, Inc. Built with Docusaurus.`,
},
prism: {
theme: lightCodeTheme,
darkTheme: darkCodeTheme,
},
image: 'img/logo.png',
}),
};
module.exports = config;

13681
docs/package-lock.json generated Normal file

File diff suppressed because it is too large

48
docs/package.json Normal file

@@ -0,0 +1,48 @@
{
"name": "docusaurus",
"version": "0.0.0",
"private": true,
"description": "Ingonyama - developer docs",
"scripts": {
"docusaurus": "docusaurus",
"start": "docusaurus start",
"build": "docusaurus build",
"swizzle": "docusaurus swizzle",
"deploy": "docusaurus deploy",
"clear": "docusaurus clear",
"serve": "docusaurus serve",
"write-translations": "docusaurus write-translations",
"write-heading-ids": "docusaurus write-heading-ids",
"dev": "docusaurus start",
"format": "prettier --write '**/*.md'"
},
"dependencies": {
"@docusaurus/core": "2.0.0-beta.18",
"@docusaurus/preset-classic": "2.0.0-beta.18",
"@mdx-js/react": "^1.6.22",
"clsx": "^1.1.1",
"hast-util-is-element": "1.1.0",
"mdx-mermaid": "^1.2.2",
"mermaid": "^9.1.2",
"prism-react-renderer": "^1.3.1",
"react": "^17.0.2",
"react-dom": "^17.0.2",
"rehype-katex": "5",
"remark-math": "3"
},
"browserslist": {
"production": [
">0.5%",
"not dead",
"not op_mini all"
],
"development": [
"last 1 chrome version",
"last 1 firefox version",
"last 1 safari version"
]
},
"devDependencies": {
"prettier": "^3.2.4"
}
}

10
docs/sandbox.config.json Normal file

@@ -0,0 +1,10 @@
{
"infiniteLoopProtection": true,
"hardReloadOnChange": true,
"view": "browser",
"template": "docusaurus",
"node": "14",
"container": {
"node": "14"
}
}

197
docs/sidebars.js Normal file

@@ -0,0 +1,197 @@
module.exports = {
GettingStartedSidebar: [
{
type: "doc",
label: "Introduction",
id: "introduction",
},
{
type: "category",
label: "ICICLE",
link: {
type: `doc`,
id: 'icicle/overview',
},
collapsed: false,
items: [
{
type: "doc",
label: "Getting started",
id: "icicle/introduction"
},
{
type: "doc",
label: "ICICLE Provers",
id: "icicle/integrations"
},
{
type: "category",
label: "Golang bindings",
link: {
type: `doc`,
id: "icicle/golang-bindings",
},
collapsed: true,
items: [
{
type: "category",
label: "MSM",
link: {
type: `doc`,
id: "icicle/golang-bindings/msm",
},
collapsed: true,
items: [
{
type: "doc",
label: "MSM pre computation",
id: "icicle/golang-bindings/msm-pre-computation",
}
]
},
{
type: "doc",
label: "NTT",
id: "icicle/golang-bindings/ntt",
},
{
type: "doc",
label: "Vector operations",
id: "icicle/golang-bindings/vec-ops",
},
{
type: "doc",
label: "Multi GPU Support",
id: "icicle/golang-bindings/multi-gpu",
},
]
},
{
type: "category",
label: "Rust bindings",
link: {
type: `doc`,
id: "icicle/rust-bindings",
},
collapsed: true,
items: [
{
type: "category",
label: "MSM",
link: {
type: `doc`,
id: "icicle/rust-bindings/msm",
},
collapsed: true,
items: [
{
type: "doc",
label: "MSM pre computation",
id: "icicle/rust-bindings/msm-pre-computation",
}
]
},
{
type: "doc",
label: "NTT",
id: "icicle/rust-bindings/ntt",
},
{
type: "doc",
label: "Vector operations",
id: "icicle/rust-bindings/vec-ops",
},
{
type: "doc",
label: "Multi GPU Support",
id: "icicle/rust-bindings/multi-gpu",
},
],
},
{
type: "category",
label: "Primitives",
link: {
type: `doc`,
id: 'icicle/primitives/overview',
},
collapsed: true,
items: [
{
type: "doc",
label: "MSM",
id: "icicle/primitives/msm",
},
{
type: "doc",
label: "NTT",
id: "icicle/primitives/ntt",
},
{
type: "doc",
label: "Poseidon Hash",
id: "icicle/primitives/poseidon",
},
],
},
{
type: "doc",
label: "Multi GPU Support",
id: "icicle/multi-gpu",
},
{
type: "doc",
label: "Supporting additional curves",
id: "icicle/supporting-additional-curves",
},
{
type: "doc",
label: "Google Colab Instructions",
id: "icicle/colab-instructions",
},
]
},
{
type: "doc",
label: "ZK Containers",
id: "ZKContainers",
},
{
type: "doc",
label: "Ingonyama Grant program",
id: "grants",
},
{
type: "doc",
label: "Contributor guide",
id: "contributor-guide",
},
{
type: "category",
label: "Additional Resources",
collapsed: false,
items: [
{
type: "link",
label: "YouTube",
href: "https://www.youtube.com/@ingo_ZK"
},
{
type: "link",
label: "Ingonyama Blog",
href: "https://www.ingonyama.com/blog"
},
{
type: "link",
label: "Ingopedia",
href: "https://www.ingonyama.com/ingopedia"
},
{
href: 'https://github.com/ingonyama-zk',
type: "link",
label: 'GitHub',
}
]
}
],
};

59
docs/src/css/custom.css Normal file

@@ -0,0 +1,59 @@
/**
* Any CSS included here will be global. The classic template
* bundles Infima by default. Infima is a CSS framework designed to
* work well for content-centric websites.
*/
/* You can override the default Infima variables here. */
:root {
--ifm-color-primary: #FFCB00;
--ifm-color-primary-dark: #FFCB00;
--ifm-color-primary-darker: #FFCB00;
--ifm-color-primary-darkest: #FFCB00;
--ifm-color-primary-light: #FFCB00;
--ifm-color-primary-lighter: #FFCB00;
--ifm-color-primary-lightest: #FFCB00;
--ifm-code-font-size: 95%;
}
/* For readability concerns, you should choose a lighter palette in dark mode. */
[data-theme='dark'] {
--ifm-color-primary: #FFCB00;
--ifm-color-primary-dark: #FFCB00;
--ifm-color-primary-darker:#FFCB00;
--ifm-color-primary-darkest: #FFCB00;
--ifm-color-primary-light:#FFCB00;
--ifm-color-primary-lighter: #FFCB00;
--ifm-color-primary-lightest: #FFCB00;
}
.docusaurus-highlight-code-line {
background-color: rgba(0, 0, 0, 0.1);
display: block;
margin: 0 calc(-1 * var(--ifm-pre-padding));
padding: 0 var(--ifm-pre-padding);
}
[data-theme='dark'] .docusaurus-highlight-code-line {
background-color: rgba(0, 0, 0, 0.3);
}
/* Mermaid elements must be changed to be visible in dark mode */
[data-theme='dark'] .mermaid .messageLine0, .messageLine1 {
filter: invert(51%) sepia(84%) saturate(405%) hue-rotate(21deg) brightness(94%) contrast(91%) !important;
}
/* NOTE Must be a separate specification from the above or it won't toggle off */
[data-theme='dark'] .mermaid .flowchart-link {
filter: invert(51%) sepia(84%) saturate(405%) hue-rotate(21deg) brightness(94%) contrast(91%) !important;
}
[data-theme='dark'] .mermaid .cluster-label {
filter: invert(51%) sepia(84%) saturate(405%) hue-rotate(21deg) brightness(94%) contrast(91%) !important;
}
[data-theme='dark'] .mermaid .messageText {
stroke:none !important; fill:white !important;
}
/* Our additions */
.anchor {
scroll-margin-top: 50pt;
}

0
docs/static/.nojekyll vendored Normal file

BIN
docs/static/img/apilevels.png vendored Normal file (170 KiB)

BIN
(unnamed binary image, 103 KiB)

BIN
(unnamed binary image, 76 KiB)

BIN
(unnamed binary image, 204 KiB)

BIN
docs/static/img/colab_change_runtime.png vendored Normal file (26 KiB)

BIN
docs/static/img/logo.png vendored Normal file (116 KiB)

BIN
docs/static/img/t4_gpu.png vendored Normal file (34 KiB)

23
examples/ZKContainer.md Normal file

@@ -0,0 +1,23 @@
# ZKContainer
To run the Icicle examples, we recommend using a [ZKContainer](https://ingonyama.com/blog/Immanuel-ZKDC), where we have already preinstalled all the required dependencies.
To use our containers you will need [Docker](https://www.docker.com/) and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html).
In each example directory, ZKContainer files are located in a subdirectory `.devcontainer`.
- File `Dockerfile` specifies how to build an image of a ZKContainer.
- File `devcontainer.json` enables running ZKContainer from Visual Studio Code.
## Running ZKContainer from a shell
First, build the container image:
```sh
docker build -t icicle-example-poseidon -f .devcontainer/Dockerfile .
```
To run the example interactively, start the container:
```sh
docker run -it --rm --gpus all -v .:/icicle-example icicle-example-poseidon
```
Inside the container, run the commands for building the library for whichever [build system](../README.md#build-systems) you choose to use.
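For example, with the CMake-based examples the in-container build mirrors each example's `compile.sh` and `run.sh` scripts (shown here as a sketch; see the scripts themselves for the exact commands):
```sh
cmake -S . -B build
cmake --build build
./build/example
```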


@@ -0,0 +1,25 @@
# Make sure NVIDIA Container Toolkit is installed on your host
# Use the specified base image
FROM nvidia/cuda:12.0.0-devel-ubuntu22.04
# Update and install dependencies
RUN apt-get update && apt-get install -y \
cmake \
curl \
build-essential \
git \
libboost-all-dev \
&& rm -rf /var/lib/apt/lists/*
# Clone Icicle from a GitHub repository
RUN git clone https://github.com/ingonyama-zk/icicle.git /opt/icicle
# Set the working directory in the container
WORKDIR /icicle-example
# Specify the default command for the container
CMD ["/bin/bash"]


@@ -0,0 +1,21 @@
{
"name": "Icicle Examples: msm",
"build": {
"dockerfile": "Dockerfile"
},
"runArgs": [
"--gpus",
"all"
],
"postCreateCommand": [
"nvidia-smi"
],
"customizations": {
"vscode": {
"extensions": [
"ms-vscode.cmake-tools",
"ms-python.python"
]
}
}
}


@@ -0,0 +1,25 @@
cmake_minimum_required(VERSION 3.18)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED TRUE)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
if (${CMAKE_VERSION} VERSION_LESS "3.24.0")
set(CMAKE_CUDA_ARCHITECTURES ${CUDA_ARCH})
else()
set(CMAKE_CUDA_ARCHITECTURES native) # 'native' requires CMake 3.24+; on earlier versions it is ignored and no architecture is passed to the target
endif ()
project(icicle LANGUAGES CUDA CXX)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
set(CMAKE_CUDA_FLAGS_RELEASE "")
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG} -g -G -O0")
# change the path to your Icicle location
include_directories("../../../icicle")
add_executable(
example
example.cu
)
find_library(NVML_LIBRARY nvidia-ml PATHS /usr/local/cuda-12.0/targets/x86_64-linux/lib/stubs/ )
target_link_libraries(example ${NVML_LIBRARY})
set_target_properties(example PROPERTIES CUDA_SEPARABLE_COMPILATION ON)


@@ -0,0 +1,52 @@
# Icicle example: Multi-Scalar Multiplication (MSM)
## Best-Practices
We recommend running our examples in a [ZKContainer](../../ZKContainer.md) to save your time and mental energy.
## Key-Takeaway
`Icicle` provides the CUDA C++ template function `MSM` to accelerate [Multi-Scalar Multiplication](https://github.com/ingonyama-zk/ingopedia/blob/master/src/msm.md).
## Concise Usage Explanation
1. Select the curve
2. Include an MSM template
3. Configure MSM
4. Call the template
```c++
#define CURVE_ID 1
#include "icicle/appUtils/msm/msm.cu"
...
msm::MSMConfig config = {...};
...
msm::MSM<scalar_t, affine_t, projective_t>(scalars, points, size, config, &result);
```
In this example we use the `BN254` curve (`CURVE_ID=1`). The function computes $result = \sum_{i=0}^{size-1} scalars[i] \cdot points[i]$, where the input `points[]` use affine coordinates and `result` uses projective coordinates.
**Parameters:**
The configuration is passed to the kernel as a structure of type `msm::MSMConfig`. Some of the most important fields are listed below:
- `are_scalars_on_device`, `are_points_on_device`, `are_results_on_device`: location of the data
- `is_async`: blocking vs. non-blocking kernel call
- `large_bucket_factor`: distinguishes between large and normal bucket sizes. If the scalar distribution is heavily skewed towards a few values, those buckets can be processed separately from the rest. The ideal value varies by circuit (it depends on the scalar distribution); start with 10 and adjust to see if performance improves. A configuration sketch follows this list.
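For reference, this is how the example program below fills the configuration field-by-field; the comments on `c` and `bitsize` are our reading of the defaults, not an authoritative API description:
```c++
device_context::DeviceContext ctx = device_context::get_default_device_context();
msm::MSMConfig config = {
  ctx,   // device context (device id and stream)
  0,     // points_size
  1,     // precompute_factor
  0,     // c (bucket window width; 0 picks a default)
  0,     // bitsize (0 uses the full scalar bitsize)
  10,    // large_bucket_factor: starting point, tune per scalar distribution
  1,     // batch_size
  false, // are_scalars_on_device
  false, // are_scalars_montgomery_form
  false, // are_points_on_device
  false, // are_points_montgomery_form
  false, // are_results_on_device
  false, // is_big_triangle
  false, // is_async: false means the call blocks until the kernel completes
};
```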
## Running the example
- `cd` to your example directory
- compile with `./compile.sh`
- run with `./run.sh`
## What's in the example
1. Define the parameters of MSM
2. Generate random inputs on-host
3. Configure and execute MSM using on-host data
4. Copy inputs on-device
5. Configure and execute MSM using on-device data
6. Repeat the above steps for G2 points

9
examples/c++/msm/compile.sh Executable file

@@ -0,0 +1,9 @@
#!/bin/bash
# Exit immediately on error
set -e
rm -rf build
mkdir -p build
cmake -S . -B build
cmake --build build

180
examples/c++/msm/example.cu Normal file

@@ -0,0 +1,180 @@
#include <fstream>
#include <iostream>
#include <iomanip>
#define G2_DEFINED
#define CURVE_ID 1
// include MSM template
#include "appUtils/msm/msm.cu"
using namespace curve_config;
int main(int argc, char* argv[])
{
std::cout << "Icicle example: Muli-Scalar Multiplication (MSM)" << std::endl;
std::cout << "Example parameters" << std::endl;
int batch_size = 1;
std::cout << "Batch size: " << batch_size << std::endl;
unsigned msm_size = 1048576;
std::cout << "MSM size: " << msm_size << std::endl;
int N = batch_size * msm_size;
std::cout << "Part I: use G1 points" << std::endl;
std::cout << "Generating random inputs on-host" << std::endl;
scalar_t* scalars = new scalar_t[N];
affine_t* points = new affine_t[N];
projective_t result;
scalar_t::RandHostMany(scalars, N);
projective_t::RandHostManyAffine(points, N);
std::cout << "Using default MSM configuration with on-host inputs" << std::endl;
// auto config = msm::DefaultMSMConfig();
device_context::DeviceContext ctx = device_context::get_default_device_context();
msm::MSMConfig config = {
ctx, // ctx
0, // points_size
1, // precompute_factor
0, // c
0, // bitsize
10, // large_bucket_factor
1, // batch_size
false, // are_scalars_on_device
false, // are_scalars_montgomery_form
false, // are_points_on_device
false, // are_points_montgomery_form
false, // are_results_on_device
false, // is_big_triangle
false, // is_async
};
config.batch_size = batch_size;
std::cout << "Running MSM kernel with on-host inputs" << std::endl;
// Create two events to time the MSM kernel
cudaStream_t stream = config.ctx.stream;
cudaEvent_t start, stop;
float time;
cudaEventCreate(&start);
cudaEventCreate(&stop);
// Record the start event on the stream
cudaEventRecord(start, stream);
// Execute the MSM kernel
msm::MSM<scalar_t, affine_t, projective_t>(scalars, points, msm_size, config, &result);
// Record the stop event on the stream
cudaEventRecord(stop, stream);
// Wait for the stop event to complete
cudaEventSynchronize(stop);
// Calculate the elapsed time between the start and stop events
cudaEventElapsedTime(&time, start, stop);
// Destroy the events
cudaEventDestroy(start);
cudaEventDestroy(stop);
// Print the elapsed time
std::cout << "Kernel runtime: " << std::fixed << std::setprecision(3) << time * 1e-3 << " sec." << std::endl;
// Print the result
std::cout << projective_t::to_affine(result) << std::endl;
std::cout << "Copying inputs on-device" << std::endl;
scalar_t* scalars_d;
affine_t* points_d;
projective_t* result_d;
cudaMalloc(&scalars_d, sizeof(scalar_t) * N);
cudaMalloc(&points_d, sizeof(affine_t) * N);
cudaMalloc(&result_d, sizeof(projective_t));
cudaMemcpy(scalars_d, scalars, sizeof(scalar_t) * N, cudaMemcpyHostToDevice);
cudaMemcpy(points_d, points, sizeof(affine_t) * N, cudaMemcpyHostToDevice);
std::cout << "Reconfiguring MSM to use on-device inputs" << std::endl;
config.are_results_on_device = true;
config.are_scalars_on_device = true;
config.are_points_on_device = true;
std::cout << "Running MSM kernel with on-device inputs" << std::endl;
// Create two events to time the MSM kernel
cudaEventCreate(&start);
cudaEventCreate(&stop);
// Record the start event on the stream
cudaEventRecord(start, stream);
// Execute the MSM kernel
msm::MSM<scalar_t, affine_t, projective_t>(scalars_d, points_d, msm_size, config, result_d);
// Record the stop event on the stream
cudaEventRecord(stop, stream);
// Wait for the stop event to complete
cudaEventSynchronize(stop);
// Calculate the elapsed time between the start and stop events
cudaEventElapsedTime(&time, start, stop);
// Destroy the events
cudaEventDestroy(start);
cudaEventDestroy(stop);
// Print the elapsed time
std::cout << "Kernel runtime: " << std::fixed << std::setprecision(3) << time * 1e-3 << " sec." << std::endl;
// Copy the result back to the host
cudaMemcpy(&result, result_d, sizeof(projective_t), cudaMemcpyDeviceToHost);
// Print the result
std::cout << projective_t::to_affine(result) << std::endl;
// Free the device memory
cudaFree(scalars_d);
cudaFree(points_d);
cudaFree(result_d);
// Free the host memory, keep scalars for G2 example
delete[] points;
std::cout << "Part II: use G2 points" << std::endl;
std::cout << "Generating random inputs on-host" << std::endl;
// use the same scalars
g2_affine_t* g2_points = new g2_affine_t[N];
g2_projective_t::RandHostManyAffine(g2_points, N);
std::cout << "Reconfiguring MSM to use on-host inputs" << std::endl;
config.are_results_on_device = false;
config.are_scalars_on_device = false;
config.are_points_on_device = false;
g2_projective_t g2_result;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, stream);
msm::MSM<scalar_t, g2_affine_t, g2_projective_t>(scalars, g2_points, msm_size, config, &g2_result);
cudaEventRecord(stop, stream);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
std::cout << "Kernel runtime: " << std::fixed << std::setprecision(3) << time * 1e-3 << " sec." << std::endl;
std::cout << g2_projective_t::to_affine(g2_result) << std::endl;
std::cout << "Copying inputs on-device" << std::endl;
g2_affine_t* g2_points_d;
g2_projective_t* g2_result_d;
cudaMalloc(&scalars_d, sizeof(scalar_t) * N);
cudaMalloc(&g2_points_d, sizeof(g2_affine_t) * N);
cudaMalloc(&g2_result_d, sizeof(g2_projective_t));
cudaMemcpy(scalars_d, scalars, sizeof(scalar_t) * N, cudaMemcpyHostToDevice);
cudaMemcpy(g2_points_d, g2_points, sizeof(g2_affine_t) * N, cudaMemcpyHostToDevice);
std::cout << "Reconfiguring MSM to use on-device inputs" << std::endl;
config.are_results_on_device = true;
config.are_scalars_on_device = true;
config.are_points_on_device = true;
std::cout << "Running MSM kernel with on-device inputs" << std::endl;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, stream);
msm::MSM<scalar_t, g2_affine_t, g2_projective_t>(scalars_d, g2_points_d, msm_size, config, g2_result_d);
cudaEventRecord(stop, stream);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
std::cout << "Kernel runtime: " << std::fixed << std::setprecision(3) << time * 1e-3 << " sec." << std::endl;
cudaMemcpy(&g2_result, g2_result_d, sizeof(g2_projective_t), cudaMemcpyDeviceToHost);
std::cout << g2_projective_t::to_affine(g2_result) << std::endl;
cudaFree(scalars_d);
cudaFree(g2_points_d);
cudaFree(g2_result_d);
delete[] g2_points;
delete[] scalars;
cudaStreamDestroy(stream);
return 0;
}

2
examples/c++/msm/run.sh Executable file

@@ -0,0 +1,2 @@
#!/bin/bash
./build/example


@@ -0,0 +1,25 @@
cmake_minimum_required(VERSION 3.18)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED TRUE)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
if (${CMAKE_VERSION} VERSION_LESS "3.24.0")
set(CMAKE_CUDA_ARCHITECTURES ${CUDA_ARCH})
else()
set(CMAKE_CUDA_ARCHITECTURES native) # 'native' requires CMake 3.24+; on earlier versions it is ignored and no architecture is passed to the target
endif ()
project(icicle LANGUAGES CUDA CXX)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
set(CMAKE_CUDA_FLAGS_RELEASE "")
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG} -g -G -O0")
# change the path to your Icicle location
include_directories("../../../icicle")
add_executable(
example
example.cu
)
find_library(NVML_LIBRARY nvidia-ml PATHS /usr/local/cuda/targets/x86_64-linux/lib/stubs/ )
target_link_libraries(example ${NVML_LIBRARY})
set_target_properties(example PROPERTIES CUDA_SEPARABLE_COMPILATION ON)


@@ -0,0 +1,52 @@
# Icicle example: using multiple GPUs to hash a large dataset
## Best-Practices
This example builds on the [single-GPU Poseidon example](../poseidon/README.md), so we recommend running it first.
## Key-Takeaway
Use a `device_context::DeviceContext` variable to select the GPU to use.
Use C++ threads to compute `Icicle` primitives on different GPUs in parallel.
## Concise Usage Explanation
1. Include C++ threads
```c++
#include <thread>
```
2. Define a __thread function__; importantly, the device context `ctx` holds the GPU id. Each thread must also bind to its GPU before doing any CUDA work (see the sketch after this list).
```c++
void threadPoseidon(device_context::DeviceContext ctx, ...) {...}
```
3. Initialize device contexts for different GPUs
```c++
device_context::DeviceContext ctx0 = device_context::get_default_device_context();
ctx0.device_id=0;
device_context::DeviceContext ctx1 = device_context::get_default_device_context();
ctx1.device_id=1;
```
4. Finally, spawn the threads and wait for their completion
```c++
std::thread thread0(threadPoseidon, ctx0, ...);
std::thread thread1(threadPoseidon, ctx1, ...);
thread0.join();
thread1.join();
```
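A minimal sketch of the binding step inside the thread function, mirroring the full example program below (plain `cudaSetDevice` is used here; the example wraps it in an error-checking macro):
```c++
void threadPoseidon(device_context::DeviceContext ctx, ...) {
  // bind this host thread to the GPU selected via the context
  cudaError_t err = cudaSetDevice(ctx.device_id);
  if (err != cudaSuccess) {
    std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
    return;
  }
  // ... hash this thread's partition on the selected GPU ...
}
```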
## What's in the example
This is a **toy** example executing the first step of Filecoin's Pre-Commit 2 phase: computing $2^{30}$ Poseidon hashes, one for each column of an $11 \times 2^{30}$ matrix.
1. Define the size of the example: $2^{30}$ columns won't fit on a typical machine, so we partition the problem into `nof_partitions` pieces (see the arithmetic after this list)
2. Hash two partitions in parallel on two GPUs
3. Hash two partitions in series on one GPU
4. Compare execution times
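Working out the sizes used in the example program: with `nof_partitions = 64`, each partition covers $2^{30} / 64 = 2^{24}$ columns, so a partition holds $11 \cdot 2^{24}$ input field elements plus $2^{24}$ output hashes, and the two GPUs each process one partition at a time.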


@@ -0,0 +1,9 @@
#!/bin/bash
# Exit immediately on error
set -e
rm -rf build
mkdir -p build
cmake -S . -B build
cmake --build build


@@ -0,0 +1,148 @@
#include <iostream>
#include <thread>
#include <chrono>
#include <nvml.h>
// select the curve
#define CURVE_ID 2
#include "appUtils/poseidon/poseidon.cu"
#include "utils/error_handler.cuh"
using namespace poseidon;
using namespace curve_config;
void checkCudaError(cudaError_t error) {
if (error != cudaSuccess) {
std::cerr << "CUDA error: " << cudaGetErrorString(error) << std::endl;
// Handle the error, e.g., exit the program or throw an exception.
}
}
// these global constants go into template calls
const int size_col = 11;
// this function executes the Poseidon thread
void threadPoseidon(device_context::DeviceContext ctx, unsigned size_partition, scalar_t * layers, scalar_t * column_hashes, PoseidonConstants<scalar_t> * constants) {
cudaError_t err_result = CHK_STICKY(cudaSetDevice(ctx.device_id));
if (err_result != cudaSuccess) {
std::cerr << "CUDA error: " << cudaGetErrorString(err_result) << std::endl;
return;
}
// CHK_IF_RETURN(); I can't use it in a standard thread function
PoseidonConfig column_config = {
ctx, // ctx
false, // are_inputs_on_device
false, // are_outputs_on_device
false, // input_is_a_state
false, // aligned
false, // loop_state
false, // is_async
};
cudaError_t err = poseidon_hash<scalar_t, size_col+1>(layers, column_hashes, (size_t) size_partition, *constants, column_config);
checkCudaError(err);
}
using FpMilliseconds = std::chrono::duration<float, std::chrono::milliseconds::period>;
#define START_TIMER(timer) auto timer##_start = std::chrono::high_resolution_clock::now();
#define END_TIMER(timer, msg) printf("%s: %.0f ms\n", msg, FpMilliseconds(std::chrono::high_resolution_clock::now() - timer##_start).count());
#define CHECK_ALLOC(ptr) if ((ptr) == nullptr) { \
std::cerr << "Memory allocation for '" #ptr "' failed." << std::endl; \
exit(EXIT_FAILURE); \
}
int main() {
const unsigned size_row = (1<<30);
const unsigned nof_partitions = 64;
const unsigned size_partition = size_row / nof_partitions;
// layers is allocated only for one partition, need to reuse for different partitions
const uint32_t size_layers = size_col * size_partition;
nvmlInit();
unsigned int deviceCount;
nvmlDeviceGetCount(&deviceCount);
std::cout << "Available GPUs: " << deviceCount << std::endl;
for (unsigned int i = 0; i < deviceCount; ++i) {
nvmlDevice_t device;
nvmlMemory_t memory;
char name[NVML_DEVICE_NAME_BUFFER_SIZE];
nvmlDeviceGetHandleByIndex(i, &device);
nvmlDeviceGetName(device, name, NVML_DEVICE_NAME_BUFFER_SIZE);
nvmlDeviceGetMemoryInfo(device, &memory);
std::cout << "Device ID: " << i << ", Type: " << name << ", Memory Total/Free (MiB) " << memory.total/1024/1024 << "/" << memory.free/1024/1024 << std::endl;
}
const unsigned memory_partition = sizeof(scalar_t)*(size_col+1)*size_partition/1024/1024;
std::cout << "Required Memory (MiB) " << memory_partition << std::endl;
//===============================================================================
// Key: multiple devices are supported by device context
//===============================================================================
device_context::DeviceContext ctx0 = device_context::get_default_device_context();
ctx0.device_id=0;
device_context::DeviceContext ctx1 = device_context::get_default_device_context();
ctx1.device_id=1;
std::cout << "Allocate and initialize the memory for layers and hashes" << std::endl;
scalar_t* layers0 = static_cast<scalar_t*>(malloc(size_layers * sizeof(scalar_t)));
CHECK_ALLOC(layers0);
scalar_t s = scalar_t::zero();
for (unsigned i = 0; i < size_col*size_partition ; i++) {
layers0[i] = s;
s = s + scalar_t::one();
}
scalar_t* layers1 = static_cast<scalar_t*>(malloc(size_layers * sizeof(scalar_t)));
CHECK_ALLOC(layers1);
s = scalar_t::zero() + scalar_t::one();
for (unsigned i = 0; i < size_col*size_partition ; i++) {
layers1[i] = s;
s = s + scalar_t::one();
}
scalar_t* column_hash0 = static_cast<scalar_t*>(malloc(size_partition * sizeof(scalar_t)));
CHECK_ALLOC(column_hash0);
scalar_t* column_hash1 = static_cast<scalar_t*>(malloc(size_partition * sizeof(scalar_t)));
CHECK_ALLOC(column_hash1);
PoseidonConstants<scalar_t> column_constants0, column_constants1;
init_optimized_poseidon_constants<scalar_t>(size_col, ctx0, &column_constants0);
cudaError_t err_result = CHK_STICKY(cudaSetDevice(ctx1.device_id));
if (err_result != cudaSuccess) {
std::cerr << "CUDA error: " << cudaGetErrorString(err_result) << std::endl;
return 1; // cannot continue without the second GPU
}
init_optimized_poseidon_constants<scalar_t>(size_col, ctx1, &column_constants1);
std::cout << "Parallel execution of Poseidon threads" << std::endl;
START_TIMER(parallel);
std::thread thread0(threadPoseidon, ctx0, size_partition, layers0, column_hash0, &column_constants0);
std::thread thread1(threadPoseidon, ctx1, size_partition, layers1, column_hash1, &column_constants1);
// Wait for the threads to finish
thread0.join();
thread1.join();
END_TIMER(parallel,"2 GPUs");
std::cout << "Output Data from Thread 0: ";
std::cout << column_hash0[0] << std::endl;
std::cout << "Output Data from Thread 1: ";
std::cout << column_hash1[0] << std::endl;
std::cout << "Sequential execution of Poseidon threads" << std::endl;
START_TIMER(sequential);
std::thread thread2(threadPoseidon, ctx0, size_partition, layers0, column_hash0, &column_constants0);
thread2.join();
std::thread thread3(threadPoseidon, ctx0, size_partition, layers1, column_hash1, &column_constants0);
thread3.join();
END_TIMER(sequential,"1 GPU");
std::cout << "Output Data from Thread 2: ";
std::cout << column_hash0[0] << std::endl;
std::cout << "Output Data from Thread 3: ";
std::cout << column_hash1[0] << std::endl;
nvmlShutdown();
return 0;
}


@@ -0,0 +1,2 @@
#!/bin/bash
./build/example


@@ -0,0 +1,23 @@
# Make sure NVIDIA Container Toolkit is installed on your host
# Use NVIDIA base image
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
# Update and install dependencies
RUN apt-get update && apt-get install -y \
nsight-systems-12.2 \
cmake \
protobuf-compiler \
curl \
build-essential \
git \
&& rm -rf /var/lib/apt/lists/*
# Clone Icicle from a GitHub repository
RUN git clone https://github.com/ingonyama-zk/icicle.git /icicle
# Set the working directory in the container
WORKDIR /icicle-example
# Specify the default command for the container
CMD ["/bin/bash"]


@@ -0,0 +1,24 @@
{
"name": "Icicle Examples - Multiply",
"build": {
"dockerfile": "Dockerfile"
},
"workspaceMount": "source=${localWorkspaceFolder}/.,target=/icicle-example,type=bind",
"workspaceFolder": "/icicle-example",
"runArgs": [
"--gpus",
"all"
],
"postCreateCommand": [
"nvidia-smi"
],
"customizations": {
"vscode": {
"extensions": [
"ms-vscode.cmake-tools",
"ms-azuretools.vscode-docker",
"ms-vscode.cpptools-extension-pack"
]
}
}
}


@@ -0,0 +1,25 @@
cmake_minimum_required(VERSION 3.18)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED TRUE)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
if (${CMAKE_VERSION} VERSION_LESS "3.24.0")
set(CMAKE_CUDA_ARCHITECTURES ${CUDA_ARCH})
else()
set(CMAKE_CUDA_ARCHITECTURES native) # 'native' requires CMake 3.24+; on earlier versions it is ignored and no architecture is passed to the target
endif ()
project(icicle LANGUAGES CUDA CXX)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
set(CMAKE_CUDA_FLAGS_RELEASE "")
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG} -g -G -O0")
# change the path to your Icicle location
include_directories("../../../icicle")
add_executable(
example
example.cu
)
find_library(NVML_LIBRARY nvidia-ml PATHS /usr/local/cuda/targets/x86_64-linux/lib/stubs/ )
target_link_libraries(example ${NVML_LIBRARY})
set_target_properties(example PROPERTIES CUDA_SEPARABLE_COMPILATION ON)


@@ -0,0 +1,41 @@
# Icicle example: Multiplication
## Best-Practices
We recommend running our examples in a [ZKContainer](../../ZKContainer.md) to save your time and mental energy.
## Key-Takeaway
`Icicle` accelerates the multiplication operation `*` using the [Karatsuba algorithm](https://en.wikipedia.org/wiki/Karatsuba_algorithm).
## Concise Usage Explanation
Define a `CURVE_ID` and include the curve configuration header:
```c++
#define CURVE_ID 1
#include "curves/curve_config.cuh"
```
The values of `CURVE_ID` for different curves are listed in the above header. Multiplication is accelerated both for scalar fields and for point (base) fields; a usage sketch follows the declarations below.
```c++
using namespace curve_config;
scalar_t a;
point_field_t b;
```
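A minimal usage sketch (assuming the `scalar_t` operators used elsewhere in these examples; `operator*` is the accelerated path):
```c++
scalar_t a = scalar_t::one() + scalar_t::one(); // the field element 2
scalar_t b = a * a;                             // 4, via the Karatsuba-based multiplier
```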
## Running the example
- `cd` to your example directory
- compile with `./compile.sh`
- run with `./run.sh`
## What's in the example
1. Define the parameters for the example such as vector size
2. Generate random vectors on-host
3. Copy them on-device
4. Execute element-wise vector multiplication on-device
5. Copy results on-host


@@ -0,0 +1,9 @@
#!/bin/bash
# Exit immediately on error
set -e
rm -rf build
mkdir -p build
cmake -S . -B build
cmake --build build

Some files were not shown because too many files have changed in this diff