Arthur Meyre
eafc61423d
chore: fix typos
2025-12-11 16:03:31 +01:00
Arthur Meyre
ebbf5563c7
chore: constrain bytemuck < 1.24.0 as we don't have avx512 updated code
2025-12-11 16:03:31 +01:00
Arthur Meyre
fd63db1f6f
chore: make 1.4.x compile properly even if some deps are updated to 2024
2025-12-11 16:03:31 +01:00
Beka Barbakadze
7549474aac
feat(gpu): Implements optimized division algorithm for message_2_carry_2, when 4 or more gpus are used
2025-09-29 15:16:34 +02:00
Agnes Leroy
23d46ba2bc
fix(gpu): fix oprf output degree
2025-09-29 08:33:25 +02:00
Agnes Leroy
daf0e79e4a
fix(gpu): fix get oprf size on gpu
2025-09-29 08:33:25 +02:00
Arthur Meyre
c5ad73865c
chore: prepare alpha.2
- bump tfhe-cuda-backend to 0.12.0-alpha.2
- bump tfhe to 1.4.0-alpha.2
2025-09-27 11:35:27 +02:00
Agnes Leroy
9aab79e23a
chore(gpu): fix compilation warning
2025-09-26 17:04:17 +02:00
Agnes Leroy
f53c75636d
chore(gpu): refactor oprf test, remove unused arg and fix multi-GPU for oprf
2025-09-26 13:19:34 +02:00
Arthur Meyre
ce63cabc05
chore: bump tfhe-cuda-backend to 0.12.0-alpha.1
2025-09-26 10:39:24 +02:00
JJ-hw
3680f796af
feat(hpu): Now the mockup takes into account the field position from the regmap toml to generate its register read and write answers.
2025-09-25 14:00:07 +02:00
JJ-hw
3ded3fe7c9
fix(hpu): (From hcampos-zama) Missing a field.
2025-09-25 14:00:07 +02:00
Arthur Meyre
d60028c47c
chore: bump tfhe-cuda-backend to 0.12.0-alpha.0
2025-09-24 15:57:30 +02:00
Agnes Leroy
4b0623da4a
chore(gpu): remove unused variable
2025-09-22 16:36:34 +02:00
Guillermo Oyarzun
d415d47894
chore(gpu): remove unnecessary nvtx lib dependency
2025-09-22 16:34:57 +02:00
Nicolas Sarlin
4d02d3abb4
fix(hpu): clippy lint
2025-09-22 14:02:41 +02:00
Guillermo Oyarzun
022cb3b18a
fix(gpu): avoid out of memory when benchmarking throughput
2025-09-19 14:44:12 +02:00
Agnes Leroy
fe6e81ff78
chore(gpu): post hackathon cleanup
2025-09-18 16:30:45 +02:00
Andrei Stoian
87c0d646a4
fix(gpu): coprocessor bench
2025-09-18 13:56:55 +02:00
Agnes Leroy
e5b39a6d4d
fix(gpu): fix memory leak in multi-gpu calculations
2025-09-18 13:55:03 +02:00
Andrei Stoian
1dcc3c8c89
chore(gpu): structure to encapsulate streams
2025-09-18 09:43:17 +02:00
Pedro Alves
becd08db71
fix(gpu): fix an overflow that may happen when the user tries to allocate a huge amount of blocks
2025-09-16 16:17:32 -03:00
Pedro Alves
6b94872a00
fix(gpu): add an assert to be sure the carry part has correct size in expand
2025-09-15 12:57:11 -03:00
Pedro Alves
b2624d1a76
chore(gpu): refactor the indexing logic for the LWE expand
2025-09-11 13:10:18 -03:00
Pedro Alves
c78cc2d2e9
chore(gpu): add a benchmark for 128-bit multi-bit noise squashing
- Also, remove the LUT indexes concept from the 128-bit multi-bit PBS. The entire backend assumes it does not exist (as is the case for classical PBS), so keeping it here would be error-prone.
2025-09-09 07:51:35 -03:00
Himess
6fde90ad9c
chore(clap): Replace use of deprecated attributes
Replace deprecated #[clap(...)] attributes with #[arg]/#[command] and remove redundant use of value_parser
2025-09-09 09:35:59 +02:00
Agnes Leroy
5d70ae4232
fix(gpu): add missing broadcast lut
2025-09-09 08:47:53 +02:00
Guillermo Oyarzun
a3168eb1b5
feat(gpu): enable lut generation with preallocated buffers
2025-09-08 10:01:34 +02:00
pgardratzama
0a1651adf3
fix(hpu): update firmware in bitstream to allow SIMD operations
2025-09-05 10:42:36 +02:00
pgardratzama
2279d0deb8
chore(hpu): update hpu firmware (fix 2 bits operations issue)
2025-09-05 10:42:36 +02:00
Helder Campos
d3a867ecfe
feat(hpu): High bandwidth HPU
2025-09-05 10:42:36 +02:00
Helder Campos
a83c92f28f
feat(hpu): Soft Reset Support and fix some runtime registers
2025-09-05 10:42:36 +02:00
Helder Campos
3b48ef301e
feat(hpu): Made two SIMD IOPs, ADD and ERC20.
2025-09-05 10:42:36 +02:00
Helder Campos
827a6e912c
feat(hpu): Adding a massively parallel multiplier operation
2025-09-05 10:42:36 +02:00
Guillermo Oyarzun
eeccace7b3
fix(gpu): add missing syncs when releasing scalar ops and returning to old lut release
2025-09-05 09:53:00 +02:00
Guillermo Oyarzun
60d137de6e
feat(gpu): use mempools to optimize mem reuse
2025-09-04 13:23:18 +02:00
Guillermo Oyarzun
c2e816a86c
fix(gpu): change minimum number of elements in benches
2025-09-04 11:03:27 +02:00
Guillermo Oyarzun
baad6a6b49
feat(gpu): change broadcast lut to communicate the minimum possible
2025-09-03 15:20:58 +02:00
Guillermo Oyarzun
88c3df8331
feat(gpu): improve communication scheme
2025-09-03 15:20:58 +02:00
Pedro Alves
57ea3e3e88
chore(gpu): refactor the entry points for PBS in the backend
2025-08-29 16:46:27 -03:00
Pedro Alves
cad4070ebe
fix(gpu): fix the decompression function signature in the backend
2025-08-29 21:09:40 +02:00
Pedro Alves
94d24e1f8b
feat(gpu): implement the centered modulus switch technique to classical PBS
2025-08-29 11:38:26 -03:00
Pedro Alves
9a1c0f48f4
feat(gpu): implement 128-bit compression and add it to the integer API
2025-08-29 11:26:07 -03:00
Guillermo Oyarzun
ff29535eb0
feat(gpu): enable specialized pbs for 4_1_1 params
2025-08-29 10:19:45 +02:00
Andrei Stoian
c06b513182
chore(gpu): add valgrind and fix leaks
2025-08-28 14:21:57 +02:00
Nicolas Sarlin
fa48444611
chore(ci): update toolchain to nightly-2025-08-26
2025-08-28 08:41:48 +02:00
Andrei Stoian
71f427de9e
chore(gpu): add assert macro
2025-08-27 10:32:43 +02:00
Enzo Di Maria
14063ca3b3
fix(gpu): fix perf of ilog2 backend
2025-08-26 14:53:08 +02:00
Andrei Stoian
f776c737a1
chore(gpu): fix typos
2025-08-25 10:02:07 +02:00
Guillermo Oyarzun
c1c7fe78ed
fix(gpu): fix memory leak in count consecutive bits
2025-08-22 17:39:32 +02:00