Commit Graph

89 Commits

Author SHA1 Message Date
release-bot
76a82bf88e Bump rust crates' version
icicle-babybear@2.3.0
icicle-bls12-377@2.3.0
icicle-bls12-381@2.3.0
icicle-bn254@2.3.0
icicle-bw6-761@2.3.0
icicle-core@2.3.0
icicle-cuda-runtime@2.3.0
icicle-grumpkin@2.3.0
icicle-hash@2.3.0
icicle-stark252@2.3.0

Generated by cargo-workspaces
2024-05-17 04:42:17 +00:00
Vlad
b8310d577e Feat/vlad/poseidon go binding (#513) 2024-05-17 07:20:15 +03:00
yshekel
9c1afe8a44 Polynomial API views replaced by evaluation on rou domain (#514)
- removed the poly API for accessing a view of the evaluations. This API is problematic since it cannot handle small domains, and for large domains it requires the polynomial to use more memory than needed.
- added an evaluate_on_rou_domain() API instead, which supports any domain size (any power of two).
- the new API can compute to HOST or DEVICE memory
- Rust wrapper for evaluate_on_rou_domain()
- updated documentation: overview and Rust wrappers
- faster division by the vanishing poly for the common case where the numerator is of size 2N and the vanishing poly is of degree N.
- allow division a/b where deg(a)<deg(b) instead of throwing an error.
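The sketch below is a toy illustration of the two polynomial ideas above, and it rests on a few assumptions. It works over the small prime field F_17 instead of ICICLE's curve fields. The names `evaluateOnRouDomain` and `divByVanishing` are hypothetical Go stand-ins, not the library's API.

Evaluation on a roots-of-unity domain works for any power-of-two size, including domains smaller than the polynomial. Dividing a numerator of at most 2N coefficients by the vanishing polynomial x^N - 1 needs no long division, because a = lo + x^N·hi = hi·(x^N - 1) + (lo + hi).

```go
package main

import "fmt"

// Toy prime field F_17, a hypothetical stand-in for ICICLE's scalar fields.
// Its multiplicative group has order 16 and is generated by 3, so it has
// roots of unity of every power-of-two order up to 16.
const (
	p         = 17
	groupOrd  = 16
	generator = 3
)

func mulMod(a, b uint64) uint64 { return a * b % p }

func powMod(base, exp uint64) uint64 {
	result := uint64(1)
	base %= p
	for exp > 0 {
		if exp&1 == 1 {
			result = mulMod(result, base)
		}
		base = mulMod(base, base)
		exp >>= 1
	}
	return result
}

// evalHorner evaluates coeffs (lowest degree first) at x.
func evalHorner(coeffs []uint64, x uint64) uint64 {
	acc := uint64(0)
	for i := len(coeffs) - 1; i >= 0; i-- {
		acc = (mulMod(acc, x) + coeffs[i]) % p
	}
	return acc
}

// evaluateOnRouDomain evaluates coeffs at 1, w, ..., w^(n-1), where w is a
// primitive n-th root of unity. n may be smaller or larger than len(coeffs).
func evaluateOnRouDomain(coeffs []uint64, n uint64) []uint64 {
	if n == 0 || groupOrd%n != 0 {
		panic("domain size must be a power of two dividing 16")
	}
	w := powMod(generator, groupOrd/n)
	out := make([]uint64, n)
	x := uint64(1)
	for i := range out {
		out[i] = evalHorner(coeffs, x)
		x = mulMod(x, w)
	}
	return out
}

// divByVanishing divides a (at most 2n coefficients) by x^n - 1 using
// a = lo + x^n*hi = hi*(x^n - 1) + (lo + hi): quotient hi, remainder lo + hi.
// If len(a) <= n, the quotient is empty (zero) and the remainder is a itself.
func divByVanishing(a []uint64, n int) (q, r []uint64) {
	if len(a) > 2*n {
		panic("numerator must have at most 2n coefficients")
	}
	r = make([]uint64, n)
	for i, c := range a {
		if i < n {
			r[i] = (r[i] + c) % p
		} else {
			q = append(q, c%p)
			r[i-n] = (r[i-n] + c) % p
		}
	}
	return q, r
}

func main() {
	a := []uint64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(evaluateOnRouDomain(a, 2)) // domain smaller than the polynomial
	q, r := divByVanishing(a, 4)
	fmt.Println(q, r)
}
```

Splitting into `lo` and `hi` is one plausible reason this case can be cheap. The actual ICICLE implementation may differ.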
2024-05-15 14:06:23 +03:00
release-bot
940b283c47 Bump rust crates' version
icicle-babybear@2.2.0
icicle-bls12-377@2.2.0
icicle-bls12-381@2.2.0
icicle-bn254@2.2.0
icicle-bw6-761@2.2.0
icicle-core@2.2.0
icicle-cuda-runtime@2.2.0
icicle-grumpkin@2.2.0
icicle-hash@2.2.0
icicle-stark252@2.2.0

Generated by cargo-workspaces
2024-05-09 12:27:17 +00:00
ChickenLover
9da52bc09f Feat/roman/poseidon2 (#510)
# This PR

1. Adds C++ API
2. Renames a lot of API functions
3. Adds in-place poseidon2
4. Makes the input const in all poseidon functions
5. Adds benchmark for poseidon2
2024-05-09 19:19:55 +07:00
VitaliiH
49079d0d2a rust ecntt hotfix (#509)
## Describe the changes

This PR fixes Rust ECNTT benches and tests


---------

Co-authored-by: VitaliiH <Vitaliy@ingo>
2024-05-09 11:21:21 +03:00
ChickenLover
094683d291 Feat/roman/poseidon2 (#507)
This PR adds support for poseidon2 permutation function as described in
https://eprint.iacr.org/2023/323.pdf

Reference implementations used (and compared against):
https://github.com/HorizenLabs/poseidon2/tree/main
https://github.com/Plonky3/Plonky3/tree/main

Tasks:

- [x] Remove commented code and prints
- [ ] Add doc-comments to functions and structs
- [x] Fix possible issue with Plonky3 imports
- [x] Update NTT/Plonky3 test
- [x] Add Plonky3-bn254 test (impossible)
2024-05-09 15:13:43 +07:00
nonam3e
c30e333819 keccak docs (#508)
This PR adds keccak docs

---------

Co-authored-by: Leon Hibnik <107353745+LeonHibnik@users.noreply.github.com>
2024-05-08 23:18:59 +03:00
Jeremy Felder
14997566ff [FIX]: Fix releasing device set on host thread during multigpu call (#501)
## Describe the changes

This PR fixes an issue when `RunOnDevice` is called for multi-gpu while
other goroutines calling device operations are run outside of
`RunOnDevice`. The issue comes from setting a device other than the
default device (device 0) on a host thread within `RunOnDevice` and not
unsetting that host thread's device when `RunOnDevice` finishes.

When `RunOnDevice` locks a host thread to ensure that all other calls in
the goroutine are on the same device, it never unsets that thread's
device. Once the thread is unlocked, other goroutines can get scheduled
onto it, but it still has whatever device was set while it was locked,
so the following sequence is possible:

1. NTT domain is initialized on thread 2 via a goroutine on device 0
2. MSM multiGPU test runs and is locked on thread 3 setting its device
to 1
3. Other tests run concurrently on threads other than 3 (since it is
locked)
4. MSM multiGPU test finishes and releases thread 3 back to the pool, but
its device is still 1
5. NTT test runs and is assigned to thread 3 --> this will fail because
the thread's device was never reset

We really only want to set a thread's device while the thread is locked.
Once we unlock a thread, its device should return to whatever it
was originally set to. In theory, it should always be 0 if `SetDevice`
is never used outside of `RunOnDevice`, which it shouldn't be in most
situations.
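The fix described above amounts to a save-and-restore pattern. The Go sketch below is a minimal version of it. It assumes a hypothetical `deviceAPI` interface in place of the real CUDA runtime calls, plus a `fakeDevice` test double. It is not the actual `RunOnDevice` code.

```go
package main

import (
	"fmt"
	"runtime"
)

// deviceAPI is a hypothetical stand-in for the CUDA runtime calls that read
// and set the current device of the calling host thread.
type deviceAPI interface {
	GetDevice() int
	SetDevice(id int)
}

// runOnDevice pins the goroutine to its OS thread, switches that thread to
// deviceID, runs fn, and restores the thread's original device before
// unpinning. Deferred calls run last-in first-out, so the restore happens
// before the unlock, even if fn panics.
func runOnDevice(api deviceAPI, deviceID int, fn func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	original := api.GetDevice()
	api.SetDevice(deviceID)
	defer api.SetDevice(original)

	fn()
}

// fakeDevice is a hypothetical test double modelling one thread's current device.
type fakeDevice struct{ current int }

func (f *fakeDevice) GetDevice() int   { return f.current }
func (f *fakeDevice) SetDevice(id int) { f.current = id }

func main() {
	dev := &fakeDevice{}
	runOnDevice(dev, 1, func() { fmt.Println("inside:", dev.GetDevice()) })
	fmt.Println("after:", dev.GetDevice())
}
```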
2024-05-08 14:07:29 +03:00
Jeremy Felder
6134cfe177 [DOCS]: Tidy up docs (#502)
## Describe the changes

This PR tidies up docs and updates golang build instructions
2024-05-06 15:35:19 +03:00
VitaliiH
34f0212c0d rust classic benches with Criterion for ecntt/msm/ntt (#499)
Rust idiomatic benches for EC NTT, NTT, MSM
2024-05-05 10:28:41 +02:00
release-bot
f6758f3447 Bump rust crates' version
icicle-babybear@2.1.0
icicle-bls12-377@2.1.0
icicle-bls12-381@2.1.0
icicle-bn254@2.1.0
icicle-bw6-761@2.1.0
icicle-core@2.1.0
icicle-cuda-runtime@2.1.0
icicle-grumpkin@2.1.0
icicle-hash@2.1.0
icicle-stark252@2.1.0

Generated by cargo-workspaces
2024-05-01 20:11:42 +00:00
nonam3e
e2ad621f97 Nonam3e/golang/keccak (#496)
## Describe the changes

This PR adds keccak bindings and passes cfg by reference in the keccak cuda functions
2024-05-01 14:08:33 +03:00
PatStiles
bdc3da98d6 FEAT(stark252 field): Adds Stark252 curve (#494)
## Describe the changes

Adds support for the stark252 base field.
2024-05-01 14:08:05 +03:00
yshekel
36e288c1fa fix: bug regarding MixedRadix coset (I)NTT for NM/MN ordering (#497)
The bug is in how the twiddles array is indexed when multiplied by a mixed
(M) vector to implement (I)NTT on cosets.
The fix is to use the DIF-digit-reverse to compute the index of the element in the
natural (N) vector that moved to index 'i' in the M vector. This
emulates reordering the twiddles array with a DIT-digit-reverse (which mixes
like a DIF-compute) followed by element-wise multiplication, without
actually reordering the twiddles in memory.
2024-04-25 18:09:27 +03:00
nonam3e
f8d15e2613 update imports in golang bindings (#498)
## Describe the changes

This PR updates imports in golang bindings to the v2 version
2024-04-25 03:46:14 +07:00
release-bot
14b39b57cc Bump rust crates' version
icicle-babybear@2.0.1
icicle-bls12-377@2.0.1
icicle-bls12-381@2.0.1
icicle-bn254@2.0.1
icicle-bw6-761@2.0.1
icicle-core@2.0.1
icicle-cuda-runtime@2.0.1
icicle-grumpkin@2.0.1
icicle-hash@2.0.1

Generated by cargo-workspaces
2024-04-24 07:13:05 +00:00
release-bot
ff374fcac7 Bump rust crates' version
icicle-babybear@2.0.0
icicle-bls12-377@2.0.0
icicle-bls12-381@2.0.0
icicle-bn254@2.0.0
icicle-bw6-761@2.0.0
icicle-core@2.0.0
icicle-cuda-runtime@2.0.0
icicle-grumpkin@2.0.0
icicle-hash@2.0.0

Generated by cargo-workspaces
2024-04-23 02:30:18 +00:00
ChickenLover
7265d18d48 ICICLE V2 Release (#492)
This PR introduces major updates for ICICLE Core, Rust and Golang
bindings

---------

Co-authored-by: Yuval Shekel <yshekel@gmail.com>
Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
Co-authored-by: Otsar <122266060+Otsar-Raikou@users.noreply.github.com>
Co-authored-by: VitaliiH <vhnatyk@gmail.com>
Co-authored-by: release-bot <release-bot@ingonyama.com>
Co-authored-by: Stas <spolonsky@icloud.com>
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: JimmyHongjichuan <45908291+JimmyHongjichuan@users.noreply.github.com>
Co-authored-by: pierre <pierreuu@gmail.com>
Co-authored-by: Leon Hibnik <107353745+LeonHibnik@users.noreply.github.com>
Co-authored-by: nonam3e <timur@ingonyama.com>
Co-authored-by: Vlad <88586482+vladfdp@users.noreply.github.com>
Co-authored-by: LeonHibnik <leon@ingonyama.com>
Co-authored-by: nonam3e <71525212+nonam3e@users.noreply.github.com>
Co-authored-by: vladfdp <vlad.heintz@gmail.com>
2024-04-23 05:26:40 +03:00
release-bot
a1dc0539ce Bump rust crates' version
icicle-bls12-377@1.10.1
icicle-bls12-381@1.10.1
icicle-bn254@1.10.1
icicle-bw6-761@1.10.1
icicle-core@1.10.1
icicle-cuda-runtime@1.10.1
icicle-grumpkin@1.10.1

Generated by cargo-workspaces
2024-04-11 07:56:32 +00:00
release-bot
8498a962f9 Bump rust crates' version
icicle-bls12-377@1.10.0
icicle-bls12-381@1.10.0
icicle-bn254@1.10.0
icicle-bw6-761@1.10.0
icicle-core@1.10.0
icicle-cuda-runtime@1.10.0
icicle-grumpkin@1.10.0

Generated by cargo-workspaces
2024-04-09 10:02:34 +00:00
Leon Hibnik
a7b0dc40c1 [FEAT] ReleaseDomain API (#465)
## Describe the changes

This PR adds an NTT ReleaseDomain API in Golang and Rust

## Linked Issues

Resolves #

---------

Co-authored-by: Yuval Shekel <yshekel@gmail.com>
2024-04-09 12:58:19 +03:00
Vlad
4a35eece51 transpose kernel in vec_ops and rust binding (#462)
## Describe the changes

This PR adds an extern C link to the transpose kernel, which now lives in
vec_ops.cu.
It also adds a Rust binding, and I updated the check_ntt_batch test to use
the new transpose function.
The test passes.
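For reference, the Go sketch below shows what transposing a flat row-major matrix computes. It is a plain host-side loop with a hypothetical name, not the CUDA kernel or its Rust binding.

```go
package main

import "fmt"

// transpose turns a row-major rows x cols matrix stored in a flat slice into
// a row-major cols x rows matrix: element (r, c) moves to position (c, r).
func transpose[T any](in []T, rows, cols int) []T {
	if len(in) != rows*cols {
		panic("transpose: len(in) != rows*cols")
	}
	out := make([]T, len(in))
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			out[c*rows+r] = in[r*cols+c]
		}
	}
	return out
}

func main() {
	fmt.Println(transpose([]int{1, 2, 3, 4, 5, 6}, 2, 3))
}
```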

## Linked Issues

Resolves #

---------

Co-authored-by: LeonHibnik <leon@ingonyama.com>
2024-04-09 08:47:33 +03:00
VitaliiH
4c9b3c00a5 Devmode to Reduce compilation time (including G2 and ECNTT) (#395)
devmode to reduce compilation time
2024-04-09 06:09:04 +02:00
Jeremy Felder
c6719167ce [FEAT]: golang device slice ranges (#463)
## Describe the changes

This PR adds the capability to slice a DeviceSlice, allowing portions of
data that are already on the device to be reused.

Additionally, this PR removes the need for a HostSlice's underlying type
to implement a Size function and uses unsafe.Sizeof instead. Together
with #407, this will allow gnark-crypto types to be used directly with
HostSlice without converting them to ICICLE types.

---------

Co-authored-by: nonam3e <timur@ingonyama.com>
2024-04-08 19:42:03 +03:00
Leon Hibnik
cd3769d6b7 Fix Golang TestNttDeviceAsync (#461)
## Describe the changes

This PR fixes TestNttDeviceAsync by adding a missing call to initDomain

## Linked Issues

Resolves #
2024-04-08 17:47:10 +03:00
DmytroTym
b93b1d0aaf NTT inplace in Rust (#453)
## Describe the changes

Due to Rust's ownership rules, we can't run NTT in place using the
[`ntt`](https://github.com/ingonyama-zk/icicle/blob/v1.9.1/wrappers/rust/icicle-core/src/ntt/mod.rs#L139)
function, which is why the need for a separate function has come up a
couple of times.

Incidentally, an issue with the radix-2 NTT was found when it was run in
place: `__syncthreads()` was used in the reverse-order kernel as if it were
a global barrier for all blocks rather than a block-local one. This caused
a data race, which this PR fixes.
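The Go sketch below is a host-side analogue of a race-free in-place reorder, not the CUDA kernel. It assumes a bit-reversal permutation split across goroutines. Each swap pair is owned by exactly one worker, the one holding the smaller index, so no cross-worker barrier is needed. The bug above came from relying on `__syncthreads()` for exactly such a cross-block guarantee, which it does not provide.

```go
package main

import (
	"fmt"
	"sync"
)

// bitReverse reverses the low logN bits of i.
func bitReverse(i, logN uint) uint {
	var r uint
	for b := uint(0); b < logN; b++ {
		r = (r << 1) | ((i >> b) & 1)
	}
	return r
}

// bitReverseInPlace reorders a (len(a) == 2^logN) into bit-reversed order in
// place, splitting the index range across the given number of goroutines.
func bitReverseInPlace(a []uint64, logN uint, workers int) {
	n := len(a)
	if n != 1<<logN || workers < 1 {
		panic("len(a) must be 2^logN and workers must be >= 1")
	}
	chunk := (n + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				r := int(bitReverse(uint(i), logN))
				if i < r { // only the smaller index of each pair swaps, so no element is shared
					a[i], a[r] = a[r], a[i]
				}
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	a := []uint64{0, 1, 2, 3, 4, 5, 6, 7}
	bitReverseInPlace(a, 3, 3)
	fmt.Println(a)
}
```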
2024-04-08 10:04:04 +03:00
Yuval Shekel
9c9311bee0 golang multi-device MSM test temporarily disabled due to issues related to golang tests env 2024-04-04 23:23:18 +03:00
release-bot
25ac705c3b Bump rust crates' version
icicle-bls12-377@1.9.1
icicle-bls12-381@1.9.1
icicle-bn254@1.9.1
icicle-bw6-761@1.9.1
icicle-core@1.9.1
icicle-cuda-runtime@1.9.1
icicle-grumpkin@1.9.1

Generated by cargo-workspaces
2024-03-27 19:00:07 +00:00
Yuval Shekel
919ff42f49 fix: NTT input is const 2024-03-24 16:26:10 +02:00
release-bot
a1ff989740 Bump rust crates' version
icicle-bls12-377@1.9.0
icicle-bls12-381@1.9.0
icicle-bn254@1.9.0
icicle-bw6-761@1.9.0
icicle-core@1.9.0
icicle-cuda-runtime@1.9.0
icicle-grumpkin@1.9.0

Generated by cargo-workspaces
2024-03-21 07:11:47 +00:00
Jeremy Felder
db4c07dcaf Golang bindings for ECNTT (#433) 2024-03-21 09:04:00 +02:00
Yuval Shekel
7293058246 fix: (golang) MSM multi device test reset to original device after test is done 2024-03-20 16:27:11 +02:00
Yuval Shekel
03136f1074 fix: (golang) add missing NttAlgorithm field in NTTConfig 2024-03-20 16:27:11 +02:00
Yuval Shekel
3ef0d0c66e MSM scalars and points params are const
- This is required to be able to compute MSM on polynomial coefficients that are accessible by const only.
2024-03-20 16:27:11 +02:00
release-bot
b6b5011a47 Bump rust crates' version
icicle-bls12-377@1.8.0
icicle-bls12-381@1.8.0
icicle-bn254@1.8.0
icicle-bw6-761@1.8.0
icicle-core@1.8.0
icicle-cuda-runtime@1.8.0
icicle-grumpkin@1.8.0

Generated by cargo-workspaces
2024-03-13 21:38:17 +00:00
DmytroTym
7ac463c3d9 MSM pre-computation (#427)
## Brief description

This PR adds pre-computation to the MSM, for some theory see
[this](https://youtu.be/KAWlySN7Hm8?si=XeR-htjbnK_ySbUo&t=1734) timecode
of Niall Emmart's talk.
In terms of public APIs, one method is added. It performs the
pre-computation on-device and leaves the resulting data on-device as well. No
extra structures are added; only `precompute_factor` from `MSMConfig` is
now activated.

## Performance

While performance gains are for now often limited by our inflexibility
in choice of `c` (for example, very large MSMs get basically no speedup
from pre-compute because currently `c` cannot be larger than 16),
there's still a number of MSM sizes which get noticeable improvement:

| Pre-computation factor | bn254 size `2^20` MSM, ms. | bn254 size `2^12` MSM, size `2^10` batch, ms. | bls12-381 size `2^20` MSM, ms. | bls12-381 size `2^12` MSM, size `2^10` batch, ms. |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| 1 | 14.1 | 82.8 | 25.5 | 136.7 |
| 2 | 11.8 | 76.6 | 20.3 | 123.8 |
| 4 | 10.9 | 73.8 | 18.1 | 117.8 |
| 8 | 10.6 | 73.7 | 17.2 | 116.0 |

Here, for example, pre-computation factor = 4 means that, alongside each
original base point, we pre-compute and pass into the MSM 3 of its
"shifted" versions. Pre-computation factor = 1 means no pre-computation.
The GPU used for the benchmarks is a 3090Ti.
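A toy sketch can check the identity that makes pre-computation work. It assumes the "shifted" versions are the point multiplied by 2^(j·s), with s = ceil(bits/factor). It also replaces curve points with integers modulo a prime, so it shows the arithmetic only, and all function names are hypothetical.

Splitting each scalar k into s-bit chunks k_j gives k·P = Σ_j k_j·(2^(j·s)·P). An MSM over factor times as many points can therefore use scalars that are factor times shorter.

```go
package main

import "fmt"

// Toy additive group Z_q standing in for elliptic-curve points (an
// assumption for illustration only; real MSMs operate on curve points).
const q = 1_000_003

// scalarMul computes k*P in the toy group.
func scalarMul(k, point uint64) uint64 { return k % q * point % q }

// msm computes sum_i scalars[i]*points[i].
func msm(scalars, points []uint64) uint64 {
	var acc uint64
	for i := range scalars {
		acc = (acc + scalarMul(scalars[i], points[i])) % q
	}
	return acc
}

// chunkBits is the chunk width s when bits-wide scalars are split into f parts.
func chunkBits(bits, f uint) uint { return (bits + f - 1) / f }

// precompute returns, for each base point P, the f points
// P, 2^s*P, 2^(2s)*P, ... (the original plus f-1 "shifted" copies).
func precompute(points []uint64, bits, f uint) []uint64 {
	s := chunkBits(bits, f)
	out := make([]uint64, 0, len(points)*int(f))
	for _, pt := range points {
		cur := pt % q
		for j := uint(0); j < f; j++ {
			out = append(out, cur)
			for t := uint(0); t < s; t++ {
				cur = cur * 2 % q // one doubling per bit of shift
			}
		}
	}
	return out
}

// msmPrecomputed splits each scalar into f chunks of s bits,
// k = sum_j k_j*2^(j*s), and runs a plain MSM over the precomputed points.
func msmPrecomputed(scalars, pre []uint64, bits, f uint) uint64 {
	s := chunkBits(bits, f)
	mask := uint64(1)<<s - 1
	split := make([]uint64, 0, len(pre))
	for _, k := range scalars {
		for j := uint(0); j < f; j++ {
			split = append(split, (k>>(j*s))&mask)
		}
	}
	return msm(split, pre)
}

func main() {
	scalars := []uint64{0xBEEF, 0x1234, 0xFFFF}
	points := []uint64{5, 77, 123456}
	for _, f := range []uint{1, 2, 4, 8} {
		same := msmPrecomputed(scalars, precompute(points, 16, f), 16, f) == msm(scalars, points)
		fmt.Println("factor", f, "matches plain MSM:", same)
	}
}
```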

## TODOs and open questions

- Golang APIs are missing;
- I mentioned that to utilise pre-compute to its full potential we need
arbitrary choice of `c`. One issue with this is that pre-compute will
become dependent on `c`. For now this is not the case as `c` can only be
a power of 2 and powers of 2 can always share the same pre-computation.
So apparently we need to make `c` a parameter of the precompute function
to future-proof it from a breaking change. This is pretty unnatural and
counterintuitive as `c` is typically chosen in runtime after pre-compute
is done but I don't really see another way, pls let me know if you do.
UPD: `c` is added into pre-compute function, for now it's unused and
it's documented how it will change in the future.

Resolves https://github.com/ingonyama-zk/icicle/issues/147
Co-authored with @ChickenLover

---------

Co-authored-by: ChickenLover <romangg81@gmail.com>
Co-authored-by: nonam3e <timur@ingonyama.com>
Co-authored-by: nonam3e <71525212+nonam3e@users.noreply.github.com>
Co-authored-by: LeonHibnik <leon@ingonyama.com>
2024-03-13 23:25:16 +02:00
HadarIngonyama
287f53ff16 NTT columns batch (#424)
This PR adds the columns batch feature - enabling batch NTT computation
to be performed directly on the columns of a matrix without having to
transpose it beforehand, as requested in issue #264.

Also, some small fixes to the reordering kernels were added, and some
unnecessary parameters were removed from function interfaces.
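The Go sketch below is a naive illustration of the columns-batch idea over the toy field F_17; the field and the function names are assumptions. Column c of a row-major n × batch matrix is read with stride `batch` starting at offset c. Each column can therefore be transformed without first transposing the matrix. A real NTT would use an O(n log n) butterfly rather than this O(n²) DFT.

```go
package main

import "fmt"

// Toy field F_17 (an assumption for illustration); 3 generates its
// multiplicative group, which has order 16.
const (
	p         = 17
	groupOrd  = 16
	generator = 3
)

func powMod(base, exp uint64) uint64 {
	result := uint64(1)
	base %= p
	for exp > 0 {
		if exp&1 == 1 {
			result = result * base % p
		}
		base = base * base % p
		exp >>= 1
	}
	return result
}

// dftStrided computes a naive O(n^2) DFT of the n elements
// m[offset], m[offset+stride], ..., m[offset+(n-1)*stride].
func dftStrided(m []uint64, offset, stride, n int) []uint64 {
	if n <= 0 || groupOrd%n != 0 {
		panic("n must be a power of two dividing 16")
	}
	w := powMod(generator, uint64(groupOrd/n)) // primitive n-th root of unity
	out := make([]uint64, n)
	for k := 0; k < n; k++ {
		for i := 0; i < n; i++ {
			x := m[offset+i*stride] % p
			out[k] = (out[k] + x*powMod(w, uint64(i*k))) % p
		}
	}
	return out
}

// dftColumns transforms every column of a row-major n x batch matrix without
// transposing it: column c is read with stride batch from offset c, and its
// outputs are written back to the same strided positions.
func dftColumns(m []uint64, n, batch int) []uint64 {
	out := make([]uint64, len(m))
	for c := 0; c < batch; c++ {
		for k, v := range dftStrided(m, c, batch, n) {
			out[k*batch+c] = v
		}
	}
	return out
}

func main() {
	m := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} // 4 rows x 3 columns
	fmt.Println(dftColumns(m, 4, 3))
}
```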

---------

Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
2024-03-13 18:46:47 +02:00
Jeremy Felder
89082fb561 FEAT: MultiGPU for golang bindings (#417)
## Describe the changes

This PR adds multi gpu support in the golang bindings.

The main changes are to DeviceSlice, which now includes a `deviceId`
attribute specifying which device the underlying data resides on, and
checks for the correct deviceId and current device when using DeviceSlices
in any operation.

In Go, most concurrency can be done via Goroutines (described as
lightweight threads - in reality, more of a threadpool manager),
however, there is no guarantee that a goroutine stays on a specific host
thread. Therefore, a function `RunOnDevice` was added to the
cuda_runtime package which locks a goroutine into a specific host
thread, sets a current GPU device, runs a provided function, and unlocks
the goroutine from the host thread after the provided function finishes.
While the goroutine is locked to the host thread, the Go runtime will
not assign other goroutines to that host thread.
2024-03-13 16:19:45 +02:00
DmytroTym
0e84fb4b76 feat: add warmup for CudaStream (#422)
## Describe the changes

Add a non-blocking `warmup` function to `CudaStream` 

> when you run the benchmark (e.g. the msm example you have) the first
instance is always slow, with a constant overhead of 200~300ms cuda
stream warmup. and I want to get rid of that in my application by
warming it up in parallel while my host do something else.
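The non-blocking pattern from the quoted request can be sketched in Go as follows. It assumes a hypothetical `warmup` callback in place of the actual `CudaStream` warmup. The warmup runs on its own goroutine while the host keeps working, and the caller waits on a channel only when it needs the stream.

```go
package main

import (
	"fmt"
	"time"
)

// warmupAsync starts warmup on its own goroutine and returns a channel that
// is closed when it completes, so the host can do other work meanwhile.
// warmup is a hypothetical stand-in for the CudaStream warmup call.
func warmupAsync(warmup func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		warmup()
	}()
	return done
}

func main() {
	start := time.Now()
	done := warmupAsync(func() { time.Sleep(50 * time.Millisecond) }) // simulated warmup
	// Host-side work (e.g. preparing MSM inputs) would run here in the meantime.
	<-done
	fmt.Println("warmed up after", time.Since(start).Round(time.Millisecond))
}
```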
2024-03-07 19:11:34 +02:00
DmytroTym
4a65758408 Merge branch 'main' into feat/warmup 2024-03-06 22:08:45 +02:00
Jeremy Felder
1abd2ef9c9 Bump rust crates' version
icicle-bls12-377@1.7.0
icicle-bls12-381@1.7.0
icicle-bn254@1.7.0
icicle-bw6-761@1.7.0
icicle-core@1.7.0
icicle-cuda-runtime@1.7.0
icicle-grumpkin@1.7.0

Generated by cargo-workspaces
2024-03-06 22:05:10 +02:00
Jeremy Felder
9d402df0cf Release flow CI (#423)
## Describe the changes

This PR:
- Moves common crate attributes to the workspace Cargo.toml. 
- Adds a manual release flow for bumping, tagging, and draft release
2024-03-06 21:41:48 +02:00
DmytroTym
7185657ff7 Warmup function 2024-03-06 18:13:23 +02:00
Alex Xiong
b22aa02e91 fix: cargo fmt 2024-03-06 13:10:12 +00:00
Alex Xiong
b108c71bdd feat: add rust api for cudaFreeAsync 2024-03-06 12:44:43 +00:00
ChickenLover
9fc083916d Small features (#415)
This PR is a compilation of small improvements:

- Lock bindgen version for `icicle-cuda-runtime`
- Add an error message when trying to build on Mac (or any non-Windows/Linux
machine)
- Add documentation and template files for adding a new curve
- Add documentation on _params.cuh contents
- Add a script to bump all the Rust crates to the same version

Resolves #313
2024-03-06 13:48:34 +02:00
Jeremy Felder
1c1b2bab64 CI: move to language specific flows (#398)
Updates the CI to:
- run per supported language
- conditional run logic
- pipelined jobs for failing fast
- additional parallelization
- run golang build on windows
- reuse the check-changed-files workflow
2024-02-28 18:09:03 +02:00
Jeremy Felder
656dd18cf8 Add vector operations for golang bindings (#399) 2024-02-28 18:09:03 +02:00
Jeremy Felder
40309329fb Migrate docs website + improved docs (#389) (#403)
migrate docs website + improved docs (#389)

* Update README.md (#385)

* refactor

* refactor

* refactor

* rename task

* update codespell

* multi gpu docs (#391)

* Refactor

* refactor

* fix typo

* Apply suggestions from code review



* refactor

* refactor

---------

Co-authored-by: ImmanuelSegol <3ditds@gmail.com>
Co-authored-by: DmytroTym <dmytrotym1@gmail.com>
Co-authored-by: ChickenLover <Romangg81@gmail.com>
2024-02-28 14:40:04 +02:00