Commit Graph

32 Commits

Author SHA1 Message Date
Andrei Stoian
e43528db71 feat(gpu): support keyswitch 64/32 in PBS 2026-01-05 09:48:00 +01:00
Andrei Stoian
78d1ce18c1 feat(gpu): support keyswitch 64/32 2025-12-12 22:01:49 +01:00
Andrei Stoian
e2063c8ef4 chore(gpu): bench KS latency batches 2025-11-27 17:32:44 +01:00
Arthur Meyre
9fdaa983e3 chore: fix october typos 2025-10-01 14:32:41 +02:00
Andrei Stoian
1dcc3c8c89 chore(gpu): structure to encapsulate streams 2025-09-18 09:43:17 +02:00
Pedro Alves
57ea3e3e88 chore(gpu): refactor the entry points for PBS in the backend 2025-08-29 16:46:27 -03:00
Andrei Stoian
71f427de9e chore(gpu): add assert macro 2025-08-27 10:32:43 +02:00
cryptoraph
d78266e141 fix(cuda): correct typo in keyswitch error message 2025-07-28 17:09:56 +02:00
Agnes Leroy
48dfeb21dc chore(gpu): refactor size tracker to avoid future bugs 2025-07-04 14:37:02 +01:00
Guillermo Oyarzun
981083360e feat(gpu): increase keyswitch occupancy 2025-07-01 09:54:14 +02:00
Agnes Leroy
8a2d93aaa8 fix(gpu): compression memory check bug, size computation was incorrect 2025-06-20 15:45:01 +02:00
Pedro Alves
53845b298a fix(gpu): fix the packing keyswitch buffer not being allocated on large parameter sets 2025-06-11 08:58:09 +02:00
Agnes Leroy
9eaa77ddef feat(gpu): make all scratch functions return the amount of memory consumed for temporary buffers 2025-04-30 10:48:03 +02:00
Guillermo Oyarzun
0f44ffdf30 fix(gpu): enable larger number of samples in the keyswitch 2025-02-17 19:34:26 -03:00
Pedro Alves
3c88574a52 chore(gpu): encapsulate cudaSetDevice 2025-01-31 09:08:30 +01:00
Andrei Stoian
298fd66631 feat(gpu): optimize packing keyswitch on gpu 2025-01-13 09:18:53 -03:00
Andrei Stoian
2c8f0ce7de feat(gpu): optimize packing keyswitch in ML special case 2024-12-23 10:32:23 -03:00
Arthur Meyre
d28040342c chore(gpu): use same balanced decomposition code as in the CPU code 2024-11-13 14:26:13 +01:00
Arthur Meyre
615ed3d5db refactor(tfhe)!: update key level order for better performance
- use natural order for decomposition levels in bsk

co-authored-by: Agnes Leroy <agnes.leroy@zama.ai>
2024-11-05 17:23:57 +01:00
Arthur Meyre
5a54cf678f chore(data)!: breaking data changes for future compatibility
- invert the LweKeyswitchKey level order and propagate change
- remove dependency on unsupported wopbs keys for the HL keys
2024-10-22 10:23:21 +02:00
Guillermo Oyarzun
d780276ae6 fix(gpu): add template parameter to packing keyswitch calls 2024-10-16 09:30:38 +02:00
Agnes Leroy
e698d18242 chore(gpu): automatically generate rust bindings for cuda functions, except device.cu 2024-10-14 17:07:57 +02:00
Pedro Alves
faf200218b chore(gpu): add checks to ensure limits for compression 2024-09-19 15:57:16 -03:00
Pedro Alves
fe5641ef6d feat(gpu): implement CUDA-based Radix Integer compression and public functional packing keyswitch 2024-08-16 15:44:34 -03:00
Agnes Leroy
d9eca01631 fix(gpu): dispatch/gather inputs and outputs to the ks and pbs on all GPUs 2024-07-23 08:48:48 +02:00
Guillermo Oyarzun
c1fcd95d72 refactor(gpu): add restrict keyword 2024-07-19 13:08:39 +02:00
Agnes Leroy
b8991229ec feat(gpu): make PBS and ks execution parallel over available GPUs
Only GPUs with peer access to GPU 0 can be used for this at the moment.
Peer to peer copy is used if different GPUs are passed to memcpy_gpu_to_gpu
A gpu offset is passed as new parameter to pbs and keyswitch to adjust the input/output index user per gpu.
bsk and ksk are copied to all GPUs.
The CI now tests & runs benchmarks on p3.8xlarge aws instances
2024-06-10 15:05:42 +02:00
Guillermo Oyarzun
019efb7fef chore(gpu): parallelize keyswitch further 2024-06-05 11:23:53 +02:00
Agnes Leroy
fedd1ca7b2 chore(gpu): change the number of threads used in the keyswitch 2024-05-24 12:03:53 +02:00
Agnes Leroy
4f29db404c feat(gpu): prepare code base for multi-gpu support 2024-05-06 09:41:54 +02:00
Agnes Leroy
c1c56ab770 fix(gpu): fix memory bug in multi-bit PBS 2024-03-06 14:18:29 +01:00
Agnes Leroy
548f2e5d05 chore(gpu): fix gpu package for publication 2024-01-22 15:29:26 +01:00