Commit Graph

462 Commits

Author SHA1 Message Date
Arthur Meyre
eafc61423d chore: fix typos 2025-12-11 16:03:31 +01:00
Arthur Meyre
ebbf5563c7 chore: constrain bytemuck < 1.24.0 as we don't have avx512 updated code 2025-12-11 16:03:31 +01:00
Arthur Meyre
fd63db1f6f chore: make 1.4.x compile properly even if some deps are updated to 2024 2025-12-11 16:03:31 +01:00
Beka Barbakadze
7549474aac feat(gpu): Implements optimized division algorithm for message_2_carry_2, when 4 or more gpus are used 2025-09-29 15:16:34 +02:00
Agnes Leroy
23d46ba2bc fix(gpu): fix oprf output degree 2025-09-29 08:33:25 +02:00
Agnes Leroy
daf0e79e4a fix(gpu): fix get oprf size on gpu 2025-09-29 08:33:25 +02:00
Arthur Meyre
c5ad73865c chore: prepare alpha.2
- bump tfhe-cuda-backend to 0.12.0-alpha.2
- bump tfhe to 1.4.0-alpha.2
2025-09-27 11:35:27 +02:00
Agnes Leroy
9aab79e23a chore(gpu): fix compilation warning 2025-09-26 17:04:17 +02:00
Agnes Leroy
f53c75636d chore(gpu): refactor oprf test, remove unused arg and fix multi-GPU for oprf 2025-09-26 13:19:34 +02:00
Arthur Meyre
ce63cabc05 chore: bump tfhe-cuda-backend to 0.12.0-alpha.1 2025-09-26 10:39:24 +02:00
JJ-hw
3680f796af feat(hpu): Now the mockup takes into account the field position from the regmap toml to generate its register read and write answers. 2025-09-25 14:00:07 +02:00
JJ-hw
3ded3fe7c9 fix(hpu): (From hcampos-zama) Missing a field. 2025-09-25 14:00:07 +02:00
Arthur Meyre
d60028c47c chore: bump tfhe-cuda-backend to 0.12.0-alpha.0 2025-09-24 15:57:30 +02:00
Agnes Leroy
4b0623da4a chore(gpu): remove unused variable 2025-09-22 16:36:34 +02:00
Guillermo Oyarzun
d415d47894 chore(gpu): remove unnecessary nvtx lib dependency 2025-09-22 16:34:57 +02:00
Nicolas Sarlin
4d02d3abb4 fix(hpu): clippy lint 2025-09-22 14:02:41 +02:00
Guillermo Oyarzun
022cb3b18a fix(gpu): avoid out of memory when benchmarking throughput 2025-09-19 14:44:12 +02:00
Agnes Leroy
fe6e81ff78 chore(gpu): post hackathon cleanup 2025-09-18 16:30:45 +02:00
Andrei Stoian
87c0d646a4 fix(gpu): coprocessor bench 2025-09-18 13:56:55 +02:00
Agnes Leroy
e5b39a6d4d fix(gpu): fix memory leak in multi-gpu calculations 2025-09-18 13:55:03 +02:00
Andrei Stoian
1dcc3c8c89 chore(gpu): structure to encapsulate streams 2025-09-18 09:43:17 +02:00
Pedro Alves
becd08db71 fix(gpu): fix an overflow that may happen when the user tries to allocate a huge amount of blocks 2025-09-16 16:17:32 -03:00
Pedro Alves
6b94872a00 fix(gpu): add an assert to be sure the carry part has correct size in expand 2025-09-15 12:57:11 -03:00
Pedro Alves
b2624d1a76 chore(gpu): refactor the indexing logic for the LWE expand 2025-09-11 13:10:18 -03:00
Pedro Alves
c78cc2d2e9 chore(gpu): add a benchmark for 128-bit multi-bit noise squashing
- Also, remove the lut indexes concept from the 128-bit multi-bit pbs. It's assumed not to exist by the entire backend (as it doesn't for classical PBS). So to keep it here would be a bit error prone.
2025-09-09 07:51:35 -03:00
Himess
6fde90ad9c chore(clap): Replace use of deprecated attributes
Replace deprecated #[clap(...)] attributes to #[arg]/#[command] and remove redundant use of value_parser
2025-09-09 09:35:59 +02:00
Agnes Leroy
5d70ae4232 fix(gpu): add missing broadcast lut 2025-09-09 08:47:53 +02:00
Guillermo Oyarzun
a3168eb1b5 feat(gpu): enable lut generation with preallocated buffers 2025-09-08 10:01:34 +02:00
pgardratzama
0a1651adf3 fix(hpu): update firmware in bitstream to allow SIMD operations 2025-09-05 10:42:36 +02:00
pgardratzama
2279d0deb8 chore(hpu): update hpu firmware (fix 2 bits operations issue) 2025-09-05 10:42:36 +02:00
Helder Campos
d3a867ecfe feat(hpu): High bandwidth HPU 2025-09-05 10:42:36 +02:00
Helder Campos
a83c92f28f feat(hpu): Soft Reset Support and fix some runtime registers 2025-09-05 10:42:36 +02:00
Helder Campos
3b48ef301e feat(hpu): Made two SIMD IOPs, ADD and ERC20. 2025-09-05 10:42:36 +02:00
Helder Campos
827a6e912c feat(hpu): Adding a massively parallel multiplier operation 2025-09-05 10:42:36 +02:00
Guillermo Oyarzun
eeccace7b3 fix(gpu): add missing syncs when releasing scalar ops and returning to old lut release 2025-09-05 09:53:00 +02:00
Guillermo Oyarzun
60d137de6e feat(gpu): use mempools to optimize mem reuse 2025-09-04 13:23:18 +02:00
Guillermo Oyarzun
c2e816a86c fix(gpu): change mininum number of elements in benches 2025-09-04 11:03:27 +02:00
Guillermo Oyarzun
baad6a6b49 feat(gpu): change broadcast lut to communicate the minimum possible 2025-09-03 15:20:58 +02:00
Guillermo Oyarzun
88c3df8331 feat(gpu): improve communication scheme 2025-09-03 15:20:58 +02:00
Pedro Alves
57ea3e3e88 chore(gpu): refactor the entry points for PBS in the backend 2025-08-29 16:46:27 -03:00
Pedro Alves
cad4070ebe fix(gpu): fix the decompression function signature in the backend 2025-08-29 21:09:40 +02:00
Pedro Alves
94d24e1f8b feat(gpu): implement the centered modulus switch technique to classical PBS 2025-08-29 11:38:26 -03:00
Pedro Alves
9a1c0f48f4 feat(gpu): implement 128-bit compression and add it to the integer API 2025-08-29 11:26:07 -03:00
Guillermo Oyarzun
ff29535eb0 feat(gpu): enable specialized pbs for 4_1_1 params 2025-08-29 10:19:45 +02:00
Andrei Stoian
c06b513182 chore(gpu): add valgrind and fix leaks 2025-08-28 14:21:57 +02:00
Nicolas Sarlin
fa48444611 chore(ci): update toolchain to nightly-2025-08-26 2025-08-28 08:41:48 +02:00
Andrei Stoian
71f427de9e chore(gpu): add assert macro 2025-08-27 10:32:43 +02:00
Enzo Di Maria
14063ca3b3 fix(gpu): fix perf of ilog2 backend 2025-08-26 14:53:08 +02:00
Andrei Stoian
f776c737a1 chore(gpu): fix typos 2025-08-25 10:02:07 +02:00
Guillermo Oyarzun
c1c7fe78ed fix(gpu): fix memory leak in count consecutive bits 2025-08-22 17:39:32 +02:00