Pedro Alves
becd08db71
fix(gpu): fix an overflow that may occur when the user tries to allocate a huge number of blocks
2025-09-16 16:17:32 -03:00
Pedro Alves
6b94872a00
fix(gpu): add an assert to ensure the carry part has the correct size in expand
2025-09-15 12:57:11 -03:00
Pedro Alves
b2624d1a76
chore(gpu): refactor the indexing logic for the LWE expand
2025-09-11 13:10:18 -03:00
Pedro Alves
c78cc2d2e9
chore(gpu): add a benchmark for 128-bit multi-bit noise squashing
- Also, remove the lut indexes concept from the 128-bit multi-bit pbs. The entire backend assumes they don't exist (as is already the case for classical PBS), so keeping them here would be error-prone.
2025-09-09 07:51:35 -03:00
Himess
6fde90ad9c
chore(clap): Replace use of deprecated attributes
Replace deprecated #[clap(...)] attributes with #[arg]/#[command] and remove redundant uses of value_parser
2025-09-09 09:35:59 +02:00
Agnes Leroy
5d70ae4232
fix(gpu): add missing broadcast lut
2025-09-09 08:47:53 +02:00
Guillermo Oyarzun
a3168eb1b5
feat(gpu): enable lut generation with preallocated buffers
2025-09-08 10:01:34 +02:00
pgardratzama
0a1651adf3
fix(hpu): update firmware in bitstream to allow SIMD operations
2025-09-05 10:42:36 +02:00
pgardratzama
2279d0deb8
chore(hpu): update hpu firmware (fix 2 bits operations issue)
2025-09-05 10:42:36 +02:00
Helder Campos
d3a867ecfe
feat(hpu): High bandwidth HPU
2025-09-05 10:42:36 +02:00
Helder Campos
a83c92f28f
feat(hpu): Soft Reset Support and fix some runtime registers
2025-09-05 10:42:36 +02:00
Helder Campos
3b48ef301e
feat(hpu): Made two SIMD IOPs, ADD and ERC20.
2025-09-05 10:42:36 +02:00
Helder Campos
827a6e912c
feat(hpu): Adding a massively parallel multiplier operation
2025-09-05 10:42:36 +02:00
Guillermo Oyarzun
eeccace7b3
fix(gpu): add missing syncs when releasing scalar ops and returning to old lut release
2025-09-05 09:53:00 +02:00
Guillermo Oyarzun
60d137de6e
feat(gpu): use mempools to optimize mem reuse
2025-09-04 13:23:18 +02:00
Guillermo Oyarzun
c2e816a86c
fix(gpu): change minimum number of elements in benches
2025-09-04 11:03:27 +02:00
Guillermo Oyarzun
baad6a6b49
feat(gpu): change broadcast lut to communicate as little as possible
2025-09-03 15:20:58 +02:00
Guillermo Oyarzun
88c3df8331
feat(gpu): improve communication scheme
2025-09-03 15:20:58 +02:00
Pedro Alves
57ea3e3e88
chore(gpu): refactor the entry points for PBS in the backend
2025-08-29 16:46:27 -03:00
Pedro Alves
cad4070ebe
fix(gpu): fix the decompression function signature in the backend
2025-08-29 21:09:40 +02:00
Pedro Alves
94d24e1f8b
feat(gpu): implement the centered modulus switch technique to classical PBS
2025-08-29 11:38:26 -03:00
Pedro Alves
9a1c0f48f4
feat(gpu): implement 128-bit compression and add it to the integer API
2025-08-29 11:26:07 -03:00
Guillermo Oyarzun
ff29535eb0
feat(gpu): enable specialized pbs for 4_1_1 params
2025-08-29 10:19:45 +02:00
Andrei Stoian
c06b513182
chore(gpu): add valgrind and fix leaks
2025-08-28 14:21:57 +02:00
Nicolas Sarlin
fa48444611
chore(ci): update toolchain to nightly-2025-08-26
2025-08-28 08:41:48 +02:00
Andrei Stoian
71f427de9e
chore(gpu): add assert macro
2025-08-27 10:32:43 +02:00
Enzo Di Maria
14063ca3b3
fix(gpu): fix perf of ilog2 backend
2025-08-26 14:53:08 +02:00
Andrei Stoian
f776c737a1
chore(gpu): fix typos
2025-08-25 10:02:07 +02:00
Guillermo Oyarzun
c1c7fe78ed
fix(gpu): fix memory leak in count consecutive bits
2025-08-22 17:39:32 +02:00
Guillermo Oyarzun
827cea966b
chore(gpu): fix nvtx labels and a comment
2025-08-21 18:02:53 +02:00
Enzo Di Maria
e5e54be4a4
refactor(gpu): move unchecked_ilog2_async to the backend
2025-08-12 09:05:29 +02:00
Guillermo Oyarzun
4a3be71bd7
fix(gpu): create message extract lut only when needed
2025-08-11 10:38:31 +02:00
pgardratzama
afd8f58a8d
feat(hpu): update backend to support multiple V80 devices; the id of each V80 is its serial number
- update psi64 to replace fw with stable version (3.1.0), remove psi16.hpu
2025-08-07 14:58:39 +02:00
Guillermo Oyarzun
1b92bcf476
feat(gpu): extra optimizations for 2_2 params kernels and bug fixes
2025-08-07 09:34:32 +02:00
Guillermo Oyarzun
79d5db66d4
feat(gpu): use warp level optimizations for fft
2025-08-07 09:34:32 +02:00
Guillermo Oyarzun
d741e55218
feat(gpu): write specialized pbs keybundle for 2_2 params
2025-08-07 09:34:32 +02:00
Guillermo Oyarzun
ef5a391dc2
feat(gpu): write specialized pbs accumulate for 2_2 params
2025-08-07 09:34:32 +02:00
Enzo Di Maria
d1c417bf71
refactor(gpu): clean up compression
2025-08-07 09:31:55 +02:00
Enzo Di Maria
852a06b330
refactor(gpu): orpf with grouped processing and for multi-gpu
2025-08-05 09:58:25 +02:00
Mayeul@Zama
fe2dde0e0c
chore(gpu): fix index type
2025-08-01 10:38:09 +02:00
Afounso Souza
e7e095b924
fix(gpu): fix typo
2025-08-01 10:21:54 +02:00
Andrei Stoian
7bf2ec6ff2
chore(gpu): fix warnings detection
2025-07-31 18:47:08 +02:00
Agnes Leroy
2d7e1b2293
chore(gpu): change active gpu count logic
2025-07-31 16:10:45 +01:00
Guillermo Oyarzun
a411e5720d
fix(gpu): update soon-to-be-deprecated nvtx version
2025-07-31 16:52:05 +02:00
Agnes Leroy
54d038ef30
chore(gpu): enhance scatter to check that the gpu count is valid
2025-07-31 13:11:52 +01:00
Guillermo Oyarzun
908922171d
fix(gpu): remove unused pointer in squash and add some extra checks
2025-07-31 09:52:34 +01:00
Kendra Karol Sevilla
84f6a8082d
fix(cuda): correct radix block mismatch check in LWE array validation
2025-07-31 09:28:48 +01:00
otc group
0bc59dca59
chore: fix typo in comment
2025-07-31 09:20:49 +01:00
Agnes Leroy
09ffc39b15
fix(gpu): fix inconsistent types
2025-07-31 08:14:45 +01:00
Andrei Stoian
36eceaf05e
feat(gpu): utility debug workflows in ci
2025-07-30 12:55:40 +01:00