# Updates:
## Hashing
- Added SpongeHasher class
- Can be used to accept any hash function as an argument
- Absorb and squeeze are now separated
- Memory management is now mostly handled by the SpongeHasher class; each hash
function only describes its permutation kernels
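
To illustrate the split described above, here is a minimal host-side sketch of a sponge whose state handling lives in one class while the hash itself is supplied only as a permutation. All names (`SpongeSketch`, `toy_permutation`, `WIDTH`, `RATE`) are hypothetical and do not reflect the actual SpongeHasher interface; the permutation is a toy and is not cryptographically secure, and padding is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, not the real SpongeHasher API.
constexpr std::size_t WIDTH = 4; // total state words
constexpr std::size_t RATE = 2;  // words absorbed/squeezed per permutation call
using State = std::array<std::uint64_t, WIDTH>;
using Permutation = std::function<void(State&)>;

// Toy permutation for illustration only; NOT cryptographically secure.
inline void toy_permutation(State& s) {
  for (std::uint64_t round = 0; round < 4; ++round) {
    for (std::size_t i = 0; i < WIDTH; ++i) {
      s[i] += s[(i + 1) % WIDTH] * 0x9E3779B97F4A7C15ULL + round;
      s[i] ^= s[i] >> 29;
    }
  }
}

class SpongeSketch {
public:
  // The hash function is passed in as an argument; the class owns the state.
  explicit SpongeSketch(Permutation p) : permute_(std::move(p)) {
    assert(permute_ && "permutation must be callable");
  }

  // Absorb: add up to RATE words into the state, then permute, per chunk.
  void absorb(const std::vector<std::uint64_t>& input) {
    for (std::size_t off = 0; off < input.size(); off += RATE) {
      for (std::size_t i = 0; i < RATE && off + i < input.size(); ++i)
        state_[i] += input[off + i];
      permute_(state_);
    }
  }

  // Squeeze: read up to RATE words, permuting between chunks.
  std::vector<std::uint64_t> squeeze(std::size_t n) {
    std::vector<std::uint64_t> out;
    while (out.size() < n) {
      for (std::size_t i = 0; i < RATE && out.size() < n; ++i)
        out.push_back(state_[i]);
      if (out.size() < n) permute_(state_);
    }
    return out;
  }

private:
  State state_{};       // state storage is managed by the sponge, not the hash
  Permutation permute_; // the only part a concrete hash has to provide
};
```

Because absorb and squeeze are separate calls, a caller can absorb input in several steps before requesting any output.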
## Tree builder
- Tree builder is now hash-agnostic.
- Tree builder now supports 2D input (matrices)
- Tree builder can now use two different hash functions for layer 0 and
compression layers
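
As a rough sketch of what "hash-agnostic" with separate layer-0 and compression hashes can look like, the following host-side code builds a binary Merkle tree over a 2D input (one row per leaf). The names `build_tree`, `toy_leaf_hash` and `toy_compress` are hypothetical, the toy hashes are placeholders with no security, and the real tree builder's interface and arity may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Digest = std::uint64_t;
using Row = std::vector<std::uint64_t>;
using LeafHash = std::function<Digest(const Row&)>;        // used for layer 0
using CompressHash = std::function<Digest(Digest, Digest)>; // used above layer 0

// Placeholder hashes for illustration only (hypothetical, insecure).
inline Digest toy_leaf_hash(const Row& row) {
  Digest h = 0;
  for (std::uint64_t v : row) h += v;
  return h;
}
inline Digest toy_compress(Digest left, Digest right) { return left * 31 + right; }

// Returns all layers, layer 0 first; the last layer holds the root.
inline std::vector<std::vector<Digest>> build_tree(const std::vector<Row>& rows,
                                                   const LeafHash& leaf_hash,
                                                   const CompressHash& compress) {
  // This sketch only handles a power-of-two number of rows.
  assert(!rows.empty() && (rows.size() & (rows.size() - 1)) == 0);

  std::vector<std::vector<Digest>> layers;
  std::vector<Digest> layer0;
  for (const Row& row : rows) layer0.push_back(leaf_hash(row)); // one digest per row
  layers.push_back(std::move(layer0));

  while (layers.back().size() > 1) {
    std::vector<Digest> next;
    const std::vector<Digest>& prev = layers.back();
    for (std::size_t i = 0; i < prev.size(); i += 2)
      next.push_back(compress(prev[i], prev[i + 1]));
    layers.push_back(std::move(next)); // prev is not used after this point
  }
  return layers;
}
```

Passing the two hashes as independent arguments is what lets layer 0 consume wide rows while the upper layers use a cheaper 2-to-1 compression.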
## Poseidon1
- Interface changed to classes
- Now allows for any alpha
- Now allows passing constants not in a single vector
- Now allows for any domain tag
- Constants are now released upon going out of scope
- Rust wrappers changed to Poseidon struct
## Poseidon2
- Interface changed to classes
- Constants are now released upon going out of scope
- Rust wrappers changed to Poseidon2 struct
## Keccak
- Added Keccak class which inherits SpongeHasher
- No longer uses GPU registers to store state
To do:
- [x] Update poseidon1 golang bindings
- [x] Update poseidon1 examples
- [x] Fix poseidon2 cuda test
- [x] Fix poseidon2 merkle tree builder test
- [x] Update keccak class with new design
- [x] Update keccak test
- [x] Check keccak correctness
- [x] Update tree builder rust wrappers
- [x] Leave doc comments
Future work:
- [ ] Add keccak merkle tree builder externs
- [ ] Add keccak rust tree builder wrappers
- [ ] Write docs
- [ ] Add example
- [ ] Fix device output for tree builder
---------
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
Co-authored-by: nonam3e <71525212+nonam3e@users.noreply.github.com>
Params files for fields now only require the user to specify the modulus
(plus the twiddle generator and/or non-residue when either or both are
needed). Everything else is generated by a macro.
This PR solves an issue for large ECNTT where CUDA blocks are too large
and cannot be assigned to SMs. The fix is to reduce the thread count per
block and increase the block count in that case.
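
The idea can be sketched on the host side as follows. This is only an illustration of trading threads per block for more blocks; `make_launch_config` and its parameters are hypothetical, and the actual cap used by the PR is not stated here.

```cpp
#include <algorithm>
#include <cassert>

struct LaunchConfig {
  unsigned blocks;
  unsigned threads_per_block;
};

// Hypothetical helper: cap the threads per block and grow the block count so
// that the same total number of threads is still covered.
inline LaunchConfig make_launch_config(unsigned total_threads,
                                       unsigned preferred_threads_per_block,
                                       unsigned max_threads_per_block) {
  assert(total_threads > 0 && preferred_threads_per_block > 0 && max_threads_per_block > 0);
  const unsigned tpb = std::min(preferred_threads_per_block, max_threads_per_block);
  const unsigned blocks = (total_threads + tpb - 1) / tpb; // ceiling division
  return {blocks, tpb};
}
```

For example, covering 4096 threads with a preferred block size of 1024 but a cap of 256 yields 16 blocks of 256 threads instead of 4 blocks of 1024.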
## Describe the changes
This PR fixes the affine-to-projective functions in the bindings by adding a
check: if the point in affine form is zero, the projective zero is returned.
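
A minimal sketch of that check is shown below. It assumes the affine zero is encoded as (0, 0) and the projective zero as (0, 1, 0); the types and the `to_projective` name are hypothetical, and plain integers stand in for field elements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in types; real coordinates would be field elements.
struct Affine {
  std::uint64_t x, y;
  bool is_zero() const { return x == 0 && y == 0; } // assumed encoding of zero
};

struct Projective {
  std::uint64_t x, y, z;
};

inline Projective to_projective(const Affine& p) {
  // Without this branch the zero point would become (0, 0, 1), which is not
  // the projective zero.
  if (p.is_zero()) return Projective{0, 1, 0}; // assumed encoding of projective zero
  return Projective{p.x, p.y, 1};
}
```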
---------
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
This PR fixes a bug in the iterative reduction algorithm.
Unsynchronized threads were reading from and writing to the same
addresses, which caused MSM to fail a small percentage of the time.
## Describe the changes
This PR adds the capability to pin host memory in the golang bindings,
which makes data transfers faster. Memory can be pinned once for
multiple devices by passing the flag
`cuda_runtime.CudaHostRegisterPortable` or
`cuda_runtime.CudaHostAllocPortable`, depending on whether the memory is
pinned by registering it or by allocating it.
This PR enables using MSM with any value of c.
Note: the default c isn't necessarily optimal; the user is expected to
choose the c and the precomputation factor that give the best results for
the relevant case.
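
For context, in the plain windowed (bucket) method a scalar of b bits is split into ceil(b / c) windows of c bits each, so a c that does not divide b simply leaves a shorter top window. The sketch below shows that decomposition for a 64-bit scalar; `num_windows` and `split_scalar` are hypothetical names, and the library's MSM may use a different digit representation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of c-bit windows needed to cover a scalar of scalar_bits bits.
inline unsigned num_windows(unsigned scalar_bits, unsigned c) {
  assert(c > 0);
  return (scalar_bits + c - 1) / c; // ceiling division
}

// Split a scalar into c-bit digits, least significant window first.
// Sketch limited to scalars that fit in 64 bits.
inline std::vector<std::uint64_t> split_scalar(std::uint64_t scalar, unsigned scalar_bits, unsigned c) {
  assert(c > 0 && c < 64 && scalar_bits > 0 && scalar_bits <= 64);
  const std::uint64_t mask = (1ULL << c) - 1;
  std::vector<std::uint64_t> digits;
  for (unsigned w = 0; w < num_windows(scalar_bits, c); ++w)
    digits.push_back((scalar >> (w * c)) & mask); // w * c < scalar_bits <= 64
  return digits;
}
```

With a 7-bit scalar and c = 3, the scalar 109 splits into the digits 5, 5, 1, since 5 + 5·8 + 1·64 = 109.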
---------
Co-authored-by: Jeremy Felder <jeremy.felder1@gmail.com>
This PR fixes 2 things:
1. Removes the assertion regarding the precompute factor needing to be a
power of 2. There is no such requirement and it works just fine for
other values too.
2. Fixes the average bucket size for the large buckets threshold - it
depends on the precompute factor.
## Describe the changes
This PR modifies `icicle/cmake/Common.cmake` to set
`CMAKE_CUDA_ARCHITECTURES` to `${CUDA_ARCH}` if the user defines the arch,
and otherwise to `native` if the CMake version is greater than or equal
to 3.24.0. This change has been successfully tested with CMake 3.22.0
and 3.25.2.
## Linked Issues
Resolves #167.
## Describe the changes
Icicle examples: Concurrent Data Transfer and NTT Computation
This PR introduces a Best Practice series of examples in C++.
Specifically, the example shows how to transfer data to/from the device
concurrently with executing an NTT.
## Linked Issues
Resolves #
- removed the poly API for accessing a view of the evaluations. This API is problematic since it cannot handle small domains, and for large domains it requires the polynomial to use more memory than it needs to.
- added the evaluate_on_rou_domain() API instead, which supports any power-of-two domain size.
- the new API can compute to HOST or DEVICE memory
- Rust wrapper for evaluate_on_rou_domain()
- updated documentation: overview and Rust wrappers
- faster division by vanishing poly for common case where numerator is 2N and vanishing poly is of degree N.
- allow division a/b where deg(a)<deg(b) instead of throwing an error.
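
The last two items can be illustrated with a small sketch. Writing a(x) = a_lo(x) + x^N · a_hi(x) for a numerator with at most 2N coefficients gives a(x) = a_hi(x) · (x^N − 1) + (a_lo(x) + a_hi(x)), so the quotient by the vanishing polynomial x^N − 1 is a_hi and the remainder is a_lo + a_hi; when deg(a) < N the quotient is zero and the remainder is a. The code below assumes that reading of "2N", uses a small hypothetical prime modulus `P` as a stand-in field, and is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t P = 97; // hypothetical toy modulus standing in for the field

struct DivResult {
  std::vector<std::uint64_t> quotient;  // coefficients, lowest degree first
  std::vector<std::uint64_t> remainder; // coefficients, lowest degree first
};

// Divide a(x) by x^n - 1, where a has at most 2n coefficients (already reduced mod P).
inline DivResult divide_by_vanishing(const std::vector<std::uint64_t>& a, std::size_t n) {
  assert(n > 0 && a.size() <= 2 * n);
  DivResult r;
  if (a.size() <= n) { // deg(a) < deg(b): zero quotient, remainder is a itself
    r.remainder = a;
    return r;
  }
  r.remainder.assign(a.begin(), a.begin() + n); // a_lo
  r.quotient.assign(a.begin() + n, a.end());    // a_hi
  for (std::size_t i = 0; i < r.quotient.size(); ++i)
    r.remainder[i] = (r.remainder[i] + r.quotient[i]) % P; // a_lo + a_hi
  return r;
}
```

For a(x) = 1 + 2x + 3x^2 + 4x^3 and N = 2 this gives quotient 3 + 4x and remainder 4 + 6x.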