Andrei Stoian
e43528db71
feat(gpu): support keyswitch 64/32 in PBS
2026-01-05 09:48:00 +01:00
Andrei Stoian
78d1ce18c1
feat(gpu): support keyswitch 64/32
2025-12-12 22:01:49 +01:00
Andrei Stoian
e2063c8ef4
chore(gpu): bench KS latency batches
2025-11-27 17:32:44 +01:00
Arthur Meyre
9fdaa983e3
chore: fix october typos
2025-10-01 14:32:41 +02:00
Andrei Stoian
1dcc3c8c89
chore(gpu): structure to encapsulate streams
2025-09-18 09:43:17 +02:00
Pedro Alves
57ea3e3e88
chore(gpu): refactor the entry points for PBS in the backend
2025-08-29 16:46:27 -03:00
Andrei Stoian
71f427de9e
chore(gpu): add assert macro
2025-08-27 10:32:43 +02:00
cryptoraph
d78266e141
fix(cuda): correct typo in keyswitch error message
2025-07-28 17:09:56 +02:00
Agnes Leroy
48dfeb21dc
chore(gpu): refactor size tracker to avoid future bugs
2025-07-04 14:37:02 +01:00
Guillermo Oyarzun
981083360e
feat(gpu): increase keyswitch occupancy
2025-07-01 09:54:14 +02:00
Agnes Leroy
8a2d93aaa8
fix(gpu): compression memory check bug, size computation was incorrect
2025-06-20 15:45:01 +02:00
Pedro Alves
53845b298a
fix(gpu): fix the packing keyswitch buffer not being allocated on large parameter sets
2025-06-11 08:58:09 +02:00
Agnes Leroy
9eaa77ddef
feat(gpu): make all scratch functions return the amount of memory consumed for temporary buffers
2025-04-30 10:48:03 +02:00
Guillermo Oyarzun
0f44ffdf30
fix(gpu): enable larger number of samples in the keyswitch
2025-02-17 19:34:26 -03:00
Pedro Alves
3c88574a52
chore(gpu): encapsulate cudaSetDevice
2025-01-31 09:08:30 +01:00
Andrei Stoian
298fd66631
feat(gpu): optimize packing keyswitch on gpu
2025-01-13 09:18:53 -03:00
Andrei Stoian
2c8f0ce7de
feat(gpu): optimize packing keyswitch in ML special case
2024-12-23 10:32:23 -03:00
Arthur Meyre
d28040342c
chore(gpu): use same balanced decomposition code as in the CPU code
2024-11-13 14:26:13 +01:00
Arthur Meyre
615ed3d5db
refactor(tfhe)!: update key level order for better performance
...
- use natural order for decomposition levels in bsk
co-authored-by: Agnes Leroy <agnes.leroy@zama.ai>
2024-11-05 17:23:57 +01:00
Arthur Meyre
5a54cf678f
chore(data)!: breaking data changes for future compatibility
...
- invert the LweKeyswitchKey level order and propagate change
- remove dependency on unsupported wopbs keys for the HL keys
2024-10-22 10:23:21 +02:00
Guillermo Oyarzun
d780276ae6
fix(gpu): add template parameter to packing keyswitch calls
2024-10-16 09:30:38 +02:00
Agnes Leroy
e698d18242
chore(gpu): automatically generate rust bindings for cuda functions, except device.cu
2024-10-14 17:07:57 +02:00
Pedro Alves
faf200218b
chore(gpu): add checks to ensure limits for compression
2024-09-19 15:57:16 -03:00
Pedro Alves
fe5641ef6d
feat(gpu): implement CUDA-based Radix Integer compression and public functional packing keyswitch
2024-08-16 15:44:34 -03:00
Agnes Leroy
d9eca01631
fix(gpu): dispatch/gather inputs and outputs to the ks and pbs on all GPUs
2024-07-23 08:48:48 +02:00
Guillermo Oyarzun
c1fcd95d72
refactor(gpu): add restrict keyword
2024-07-19 13:08:39 +02:00
Agnes Leroy
b8991229ec
feat(gpu): make PBS and ks execution parallel over available GPUs
...
Only GPUs with peer access to GPU 0 can be used for this at the moment.
Peer to peer copy is used if different GPUs are passed to memcpy_gpu_to_gpu.
A gpu offset is passed as a new parameter to pbs and keyswitch to adjust the input/output index used per gpu.
bsk and ksk are copied to all GPUs.
The CI now tests & runs benchmarks on p3.8xlarge aws instances
2024-06-10 15:05:42 +02:00
Guillermo Oyarzun
019efb7fef
chore(gpu): parallelize keyswitch further
2024-06-05 11:23:53 +02:00
Agnes Leroy
fedd1ca7b2
chore(gpu): change the number of threads used in the keyswitch
2024-05-24 12:03:53 +02:00
Agnes Leroy
4f29db404c
feat(gpu): prepare code base for multi-gpu support
2024-05-06 09:41:54 +02:00
Agnes Leroy
c1c56ab770
fix(gpu): fix memory bug in multi-bit PBS
2024-03-06 14:18:29 +01:00
Agnes Leroy
548f2e5d05
chore(gpu): fix gpu package for publication
2024-01-22 15:29:26 +01:00