Commit Graph

441 Commits

Author SHA1 Message Date
Pedro Alves
becd08db71 fix(gpu): fix an overflow that may happen when the user tries to allocate a huge amount of blocks 2025-09-16 16:17:32 -03:00
Pedro Alves
6b94872a00 fix(gpu): add an assert to be sure the carry part has correct size in expand 2025-09-15 12:57:11 -03:00
Pedro Alves
b2624d1a76 chore(gpu): refactor the indexing logic for the LWE expand 2025-09-11 13:10:18 -03:00
Pedro Alves
c78cc2d2e9 chore(gpu): add a benchmark for 128-bit multi-bit noise squashing
- Also, remove the lut indexes concept from the 128-bit multi-bit pbs. It's assumed not to exist by the entire backend (as it doesn't for classical PBS). So to keep it here would be a bit error prone.
2025-09-09 07:51:35 -03:00
Himess
6fde90ad9c chore(clap): Replace use of deprecated attributes
Replace deprecated #[clap(...)] attributes to #[arg]/#[command] and remove redundant use of value_parser
2025-09-09 09:35:59 +02:00
Agnes Leroy
5d70ae4232 fix(gpu): add missing broadcast lut 2025-09-09 08:47:53 +02:00
Guillermo Oyarzun
a3168eb1b5 feat(gpu): enable lut generation with preallocated buffers 2025-09-08 10:01:34 +02:00
pgardratzama
0a1651adf3 fix(hpu): update firmware in bitstream to allow SIMD operations 2025-09-05 10:42:36 +02:00
pgardratzama
2279d0deb8 chore(hpu): update hpu firmware (fix 2 bits operations issue) 2025-09-05 10:42:36 +02:00
Helder Campos
d3a867ecfe feat(hpu): High bandwidth HPU 2025-09-05 10:42:36 +02:00
Helder Campos
a83c92f28f feat(hpu): Soft Reset Support and fix some runtime registers 2025-09-05 10:42:36 +02:00
Helder Campos
3b48ef301e feat(hpu): Made two SIMD IOPs, ADD and ERC20. 2025-09-05 10:42:36 +02:00
Helder Campos
827a6e912c feat(hpu): Adding a massively parallel multiplier operation 2025-09-05 10:42:36 +02:00
Guillermo Oyarzun
eeccace7b3 fix(gpu): add missing syncs when releasing scalar ops and returning to old lut release 2025-09-05 09:53:00 +02:00
Guillermo Oyarzun
60d137de6e feat(gpu): use mempools to optimize mem reuse 2025-09-04 13:23:18 +02:00
Guillermo Oyarzun
c2e816a86c fix(gpu): change minimum number of elements in benches 2025-09-04 11:03:27 +02:00
Guillermo Oyarzun
baad6a6b49 feat(gpu): change broadcast lut to communicate the minimum possible 2025-09-03 15:20:58 +02:00
Guillermo Oyarzun
88c3df8331 feat(gpu): improve communication scheme 2025-09-03 15:20:58 +02:00
Pedro Alves
57ea3e3e88 chore(gpu): refactor the entry points for PBS in the backend 2025-08-29 16:46:27 -03:00
Pedro Alves
cad4070ebe fix(gpu): fix the decompression function signature in the backend 2025-08-29 21:09:40 +02:00
Pedro Alves
94d24e1f8b feat(gpu): implement the centered modulus switch technique to classical PBS 2025-08-29 11:38:26 -03:00
Pedro Alves
9a1c0f48f4 feat(gpu): implement 128-bit compression and add it to the integer API 2025-08-29 11:26:07 -03:00
Guillermo Oyarzun
ff29535eb0 feat(gpu): enable specialized pbs for 4_1_1 params 2025-08-29 10:19:45 +02:00
Andrei Stoian
c06b513182 chore(gpu): add valgrind and fix leaks 2025-08-28 14:21:57 +02:00
Nicolas Sarlin
fa48444611 chore(ci): update toolchain to nightly-2025-08-26 2025-08-28 08:41:48 +02:00
Andrei Stoian
71f427de9e chore(gpu): add assert macro 2025-08-27 10:32:43 +02:00
Enzo Di Maria
14063ca3b3 fix(gpu): fix perf of ilog2 backend 2025-08-26 14:53:08 +02:00
Andrei Stoian
f776c737a1 chore(gpu): fix typos 2025-08-25 10:02:07 +02:00
Guillermo Oyarzun
c1c7fe78ed fix(gpu): fix memory leak in count consecutive bits 2025-08-22 17:39:32 +02:00
Guillermo Oyarzun
827cea966b chore(gpu): fix nvtx labels and a comment 2025-08-21 18:02:53 +02:00
Enzo Di Maria
e5e54be4a4 refactor(gpu): moving unchecked_ilog2_async to the backend 2025-08-12 09:05:29 +02:00
Guillermo Oyarzun
4a3be71bd7 fix(gpu): create message extract lut only when needed 2025-08-11 10:38:31 +02:00
pgardratzama
afd8f58a8d feat(hpu): update backend to support multiple V80 device, id of v80 is its serial number
- update psi64 to replace fw with stable version (3.1.0), remove psi16.hpu
2025-08-07 14:58:39 +02:00
Guillermo Oyarzun
1b92bcf476 feat(gpu): extra optimizations for 2_2 params kernels and bugs fixes 2025-08-07 09:34:32 +02:00
Guillermo Oyarzun
79d5db66d4 feat(gpu): use warp level optimizations for fft 2025-08-07 09:34:32 +02:00
Guillermo Oyarzun
d741e55218 feat(gpu): write specialized pbs keybundle for 2_2 params 2025-08-07 09:34:32 +02:00
Guillermo Oyarzun
ef5a391dc2 feat(gpu): write specialized pbs accumulate for 2_2 params 2025-08-07 09:34:32 +02:00
Enzo Di Maria
d1c417bf71 refactor(gpu): cleaning compression 2025-08-07 09:31:55 +02:00
Enzo Di Maria
852a06b330 refactor(gpu): orpf with grouped processing and for multi-gpu 2025-08-05 09:58:25 +02:00
Mayeul@Zama
fe2dde0e0c chore(gpu): fix index type 2025-08-01 10:38:09 +02:00
Afounso Souza
e7e095b924 fix(gpu): fix typo
2025-08-01 10:21:54 +02:00
Andrei Stoian
7bf2ec6ff2 chore(gpu): fix warnings detection 2025-07-31 18:47:08 +02:00
Agnes Leroy
2d7e1b2293 chore(gpu): change active gpu count logic 2025-07-31 16:10:45 +01:00
Guillermo Oyarzun
a411e5720d fix(gpu): update soon deprecated version nvtx 2025-07-31 16:52:05 +02:00
Agnes Leroy
54d038ef30 chore(gpu): enhance scatter to check gpu count is ok 2025-07-31 13:11:52 +01:00
Guillermo Oyarzun
908922171d fix(gpu): remove unused pointer in squash and add some extra checks 2025-07-31 09:52:34 +01:00
Kendra Karol Sevilla
84f6a8082d fix(cuda): correct radix block mismatch check in LWE array validation 2025-07-31 09:28:48 +01:00
otc group
0bc59dca59 chore: fix typo in comment
2025-07-31 09:20:49 +01:00
Agnes Leroy
09ffc39b15 fix(gpu): fix inconsistent types 2025-07-31 08:14:45 +01:00
Andrei Stoian
36eceaf05e feat(gpu): utility debug workflows in ci 2025-07-30 12:55:40 +01:00