Agnes Leroy
a07837a798
fix(cuda): fix scratch functions to avoid misaligned pointers
2023-03-01 17:08:04 +01:00
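The fix above concerns misaligned pointers returned by the scratch functions. One way such misalignment can arise is when a single allocation is split into sub-buffers. If a sub-buffer holding a 16-byte-aligned complex type (like CUDA's `double2`) starts right after an odd number of 8-byte Torus words, its pointer is not 16-byte aligned. The sketch below is a minimal, hypothetical illustration of rounding sub-buffer offsets up to the required alignment. The names `align_up`, `plan_scratch`, `ScratchLayout` and `Complex` are assumptions for illustration only, not the backend's actual code.

```cpp
#include <cstddef>
#include <cstdint>

// Round `offset` up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical 16-byte-aligned complex type, standing in for double2.
struct alignas(16) Complex {
  double re, im;
};

// Hypothetical layout of one scratch allocation: a buffer of 64-bit Torus
// values followed by a buffer of Complex values.
struct ScratchLayout {
  std::size_t torus_offset;
  std::size_t complex_offset;
  std::size_t total_bytes;
};

constexpr ScratchLayout plan_scratch(std::size_t num_torus, std::size_t num_complex) {
  ScratchLayout l{};
  l.torus_offset = 0;
  std::size_t end_torus = num_torus * sizeof(std::uint64_t);
  // Without align_up, an odd num_torus would place the Complex buffer at an
  // 8-byte (not 16-byte) aligned offset.
  l.complex_offset = align_up(end_torus, alignof(Complex));
  l.total_bytes = l.complex_offset + num_complex * sizeof(Complex);
  return l;
}
```

For example, `plan_scratch(3, 4)` places the complex buffer at byte 32 rather than byte 24.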
Agnes Leroy
a11a009df6
chore(cuda): replace synchronous copies/allocs with async ones in the private cuda backend
2023-03-01 09:15:24 +01:00
Pedro Alves
e7e6c8fb53
chore(cuda): Reduces shared memory consumption in the amortized PBS and improves loop unrolling.
2023-02-28 16:28:59 -03:00
Pedro Alves
400786f3f9
refactor(cuda): Implements support for k>1 on the Wop-PBS.
2023-02-27 17:14:30 +01:00
Pedro Alves
b0a362af6d
refactor(cuda): Implements support for k>1 on cbs+vp.
2023-02-27 17:14:30 +01:00
Pedro Alves
7896d96a49
refactor(cuda): Implements support for k>1 on bit extraction.
2023-02-27 17:14:30 +01:00
Pedro Alves
eb8aeb5a01
refactor(cuda): Implements support for k>1 on the cmux tree.
2023-02-27 17:14:30 +01:00
Pedro Alves
184d453387
refactor(cuda): Implements support for N=256 in the cmux tree, bit extraction, and cbs.
2023-02-27 17:14:30 +01:00
Agnes Leroy
75e9baae78
refactor(cuda): introduce scratch for low latency pbs
2023-02-22 16:49:43 +01:00
Agnes Leroy
5cd0cb5d19
refactor(cuda): introduce scratch for amortized PBS
2023-02-17 13:51:41 +01:00
Agnes Leroy
2a487ffbfd
refactor(cuda): introduce scratch for blind rotation and sample extraction
2023-02-16 09:31:49 +01:00
Agnes Leroy
870d896ad9
refactor(cuda): introduce cmux tree scratch
2023-02-15 17:32:12 +01:00
Agnes Leroy
e6dfb588db
refactor(cuda): prepare to introduce cmux tree scratch
2023-02-15 17:32:12 +01:00
Pedro Alves
bfb07b961d
feat(cuda): Add support for the classical PBS for polynomial_size=256.
2023-02-15 09:15:21 +01:00
Agnes Leroy
730274f156
refactor(cuda): create scratch function and cleanup for wop pbs
2023-02-14 09:21:30 +01:00
Agnes Leroy
e9243bce6f
refactor(cuda): change lut_vector_indexes type to Torus
2023-02-14 09:21:30 +01:00
Pedro Alves
db83fd7649
refactor(cuda): Refactor the low latency PBS.
2023-02-13 09:55:15 +01:00
Pedro Alves
41be2b4832
refactor(cuda): Refactor the amortized PBS.
2023-02-09 17:14:47 +01:00
Agnes Leroy
2a299664e7
chore(cuda): refactor cuda errors, remove deprecated files
2023-02-08 14:12:55 +01:00
Beka Barbakadze
3cd48f0de2
feat(cuda): add a new fft algorithm.
- FFT can work for any polynomial size, as long as twiddles are provided.
- All the twiddles fit in the constant memory.
- Bit reversal is no longer used, so the sw1 and sw2 arrays are no longer stored in constant memory.
- The real-to-complex compression algorithm is changed.
- Twiddle initialization functions are removed.
2023-02-08 00:49:44 +04:00
Agnes Leroy
bd9cbbc7af
fix(cuda): fix asynchronous behaviour for pbs and wop pbs
2023-01-28 14:20:34 +01:00
Agnes Leroy
2fcc5b2d0f
feat(cuda): add missing boolean gates
2023-01-25 09:30:50 +01:00
Beka Barbakadze
bc90576454
docs(cuda): add Rust doc for all concrete-cuda entry points
2023-01-25 09:30:26 +01:00
Agnes Leroy
8327cd7fff
feat(cuda): add NOT and AND gates to the library
2023-01-09 15:28:52 -03:00
Pedro Alves
c4f0daa203
fix(cuda): Fix the CUDA test for CBS+VP when tau > 1 and tau * p == log2(N).
2023-01-09 15:38:20 +01:00
Pedro Alves
e82a8d4e81
fix(cuda): Fix an error in an assert.
2022-12-20 12:49:16 -03:00
Agnes Leroy
29284b4260
chore(cuda): split wop pbs file and add entry point for wop pbs
2022-12-20 12:49:16 -03:00
Pedro Alves
e324f14c6b
chore(cuda): Modifies the CBS+VP host function to fully parallelize the cmux tree and blind rotation. Also changes how the CMUX Tree handles the input LUTs to match the CPU version.
2022-12-16 16:29:59 +01:00
Agnes Leroy
0a0c45338c
fix(cuda): fix the assert on the number of inputs in the low lat pbs
2022-12-15 14:51:29 +01:00
Pedro Alves
9bcf0f8a70
chore(cuda): Refactor device.cu functions to take pointers to cudaStream_t instead of void pointers
2022-12-15 10:40:19 +01:00
Agnes Leroy
e4ba380594
fix(cuda): remove u32 support for cbs+vp entry point
2022-12-14 13:32:01 +01:00
Agnes Leroy
4da789abda
feat(cuda): add a cbs+vp entry point
- fix bug in CBS as well
- update cuda benchmarks
2022-12-14 13:32:01 +01:00
Quentin Bourgerie
2db1ef6a56
fix(cuda): Include cuda_runtime.h in device.h to include the definition of cudaStream_t
2022-12-06 11:31:47 +01:00
Beka Barbakadze
0aedb1a4f4
feat(cuda): Add circuit bootstrap in the cuda backend
- Add FP-Keyswitch.
- Add entry points for cuda fp ksk in the public API.
- Add test for fp_ksk in cuda backend.
- Add fixture for bit extract
Co-authored-by: agnesLeroy <agnes.leroy@zama.ai>
2022-12-05 22:00:43 +01:00
Agnes Leroy
e10c2936d1
feat(cuda): support N=4096 and 8192 for the low latency bootstrap
2022-12-02 09:41:32 +01:00
Beka Barbakadze
c1f1b533ea
fix(cuda): fix pbs for 8192 polynomial_size
2022-12-01 13:05:28 +01:00
Agnes Leroy
921c0a6306
fix(cuda): fix N = 8192 support
2022-11-29 09:00:35 +01:00
Pedro Alves
68866766a4
feat(cuda): Adds a parameter to the CUDA host functions that passes the index of the GPU to use.
2022-11-28 15:11:46 +01:00
Agnes Leroy
f04a29aea4
chore(cuda): fix asserts regarding the base log value
2022-11-28 13:58:57 +01:00
Pedro Alves
739db73d46
feat(cuda): batch_fft_ggsw_vector uses global memory in case there is not enough space in the shared memory
2022-11-28 11:48:16 +01:00
Beka Barbakadze
56b986da8b
feat(cuda): new decomposition algorithm for pbs.
- removes the 16-bit limitation on base_log
- optimizes shared memory use: decomposition buffers are no longer used, and in the amortized PBS the rotated buffers are reused as the state buffer for the decomposition.
- Add a private test for the cuda PBS, as exists for the fft backend.
2022-11-28 11:48:16 +01:00
Pedro Alves
d59b2f6dda
feat(cuda): Check for errors after each kernel launch.
2022-11-28 09:52:19 +01:00
Pedro Alves
9d25f9248d
feat(cuda): Implements vertical packing's blind rotation and sample extraction on CUDA backend. Implements a private test for the CUDA vertical packing + blind rotation.
2022-11-21 09:30:26 +01:00
Agnes Leroy
f36f565b75
chore(cuda): replace casts with cuda intrinsics
2022-11-16 14:22:03 +01:00
Agnes Leroy
da654ee9cb
chore(core): fix clippy error in cuda backend and fix formatting
2022-11-10 14:46:49 +01:00
Pedro Alves
80f4ca7338
fix(cuda): Checks the cudaDevAttrMemoryPoolsSupported property to ensure that asynchronous allocation is supported
2022-11-10 14:46:49 +01:00
Agnes Leroy
553c2e6948
feat(cuda): add lwe / cleartext multiplication GPU acceleration
2022-11-09 15:31:36 +01:00
Pedro Alves
25f103f62d
feat(cuda): Refactor the low latency PBS to use asynchronous allocation.
2022-11-09 09:44:25 +01:00
Pedro Alves
0b58741fd4
feat(cuda): Refactor the amortized PBS to use asynchronous allocation.
2022-11-09 09:44:25 +01:00
Pedro Alves
cf222e9176
feat(cuda): encapsulate asynchronous allocation methods.
2022-11-09 09:44:25 +01:00