Commit Graph

503 Commits

Author SHA1 Message Date
Agnes Leroy
df73c36cbf fix(gpu): fix decomposition algorithm not matching the theory 2025-11-14 16:36:35 +01:00
Agnes Leroy
4f9f4982f6 fix(gpu): fix memory leak in rerand 2025-11-14 14:00:01 +01:00
pgardratzama
d38df76eb6 chore(hpu): adds a page about HPU PBS performances 2025-11-10 18:43:50 +01:00
pgardratzama
afaf761cdd chore(hpu): adds 3 custom IOp to measure PBS performance on HPU and update trace parser to handle 32b timestamp wrap 2025-11-10 18:43:50 +01:00
pgardratzama
4eb4fa95e3 feat(hpu): new HPU bitstream with few optimizations (GRAM arb, ALU nb, BSK manager) 2025-11-10 09:14:18 +01:00
Guillermo Oyarzun
12426573fa fix(gpu): add upper bound to lwe_chunk_size calculation 2025-11-07 09:29:40 +01:00
Guillermo Oyarzun
6f105cd82e fix(gpu): fix out of bounds in specialized classical pbs 2025-11-06 15:35:04 +01:00
Enzo Di Maria
4ff95e3a42 feat(gpu): AES 256 2025-11-05 13:37:08 +01:00
Baptiste Roux
f970031d33 chore(hpu): Update version of hw_regmap deps
This new version update rust MSRV.
2025-11-04 15:26:27 +01:00
Arthur Meyre
00ce0deec9 chore: make typos version fixed
- add a script to properly install the correct version
- correct new typos
2025-11-03 14:58:23 +01:00
Enzo Di Maria
026cc376ed refactor(gpu): multibit decompression 2025-10-30 08:59:10 +01:00
Pedro Alves
867f8fb579 feat(gpu): implement re-randomization
- exposed to integer and HL API
- test on the HL API
- benchmarks for GPU and CPU implementation
2025-10-29 17:55:45 -03:00
Guillermo Oyarzun
62780ac500 fix(gpu): fix decompression mem leak 2025-10-24 13:02:41 +02:00
Guillermo Oyarzun
e12638dabe feat(gpu): extend specialized version to classical pbs 2025-10-22 09:20:40 +02:00
pgardratzama
79f1d22573 fix(hpu): scalar rot & shift were not doing anything and not tested in test/hpu.rs 2025-10-21 13:29:59 +02:00
pgardratzama
b918f77859 chore(hpu): add force_reload option in v80 config, remove added line in sim config 2025-10-21 13:29:59 +02:00
Helder Campos
054c5028a1 feat(hpu): Added the option to forcefully reload the HPU 2025-10-21 13:29:59 +02:00
Helder Campos
7b621e57b0 feat(hpu): LLT ROT/SHIFT IOPs 2025-10-21 13:29:59 +02:00
Agnes Leroy
b4b6275ca5 chore(gpu): remove device synchronize in drop for cudavec 2025-10-21 11:33:46 +02:00
Beka Barbakadze
39862c2861 fix(gpu): fix bug in are_all_comparison_blocks_true when number of blocks is 0 2025-10-20 13:26:50 +02:00
Guillermo Oyarzun
c22e63895e fix(gpu): fix multi-gpu throughput benches with classical pbs 2025-10-16 17:55:10 +02:00
Enzo Di Maria
126e779533 refactor(gpu): oprf_unsigned_custom_range + tests 2025-10-16 09:31:01 +02:00
Enzo Di Maria
353237c0d6 refactor(gpu): oprf_unsigned_custom_range 2025-10-16 09:31:01 +02:00
Agnes Leroy
7bad509f9a fix(gpu): fix perf regression introduced in cf3f25efdd 2025-10-16 09:21:05 +02:00
Agnes Leroy
cf3f25efdd chore(gpu): add missing syncs in linearalgebra functions and aes 2025-10-14 09:23:11 +02:00
Agnes Leroy
c3ed1a7558 chore(gpu): internal renaming 2025-10-14 09:23:11 +02:00
Agnes Leroy
6347f25668 chore(gpu): synchronize after every release 2025-10-14 09:23:11 +02:00
Agnes Leroy
91b263d480 chore(gpu): split integer utilities file 2025-10-10 14:49:02 +02:00
Andrei Stoian
30938eec74 chore(gpu): use active streams in int_radix_lut 2025-10-09 21:59:15 +02:00
pgardratzama
ca4159f123 fix(hpu): fix overflow flag of OVF_MUL & OVF_MULS, also update simulation HPU config 2025-10-07 10:14:43 +02:00
Arthur Meyre
e07f07c4c8 chore: bump tfhe-cuda-backend to 0.12.0 2025-10-06 13:26:54 +02:00
Arthur Meyre
81cc0c31b4 chore: constrain bytemuck < 1.24.0 as we don't have avx512 updated code 2025-10-06 13:24:16 +02:00
Enzo Di Maria
f0f3dd76eb feat(gpu): aes 128 2025-10-06 09:31:36 +02:00
Andrei Stoian
0604d237eb chore(gpu): multi-gpu debug target 2025-10-03 16:48:42 +02:00
Agnes Leroy
f9e876730a chore(gpu): remove support for drift noise reduction 2025-10-03 09:45:20 +02:00
pgardratzama
602c6faf8a chore(hpu): update hpu-backend dependencies, fix pcc 2025-10-02 13:20:36 +02:00
pgardratzama
563502a6a6 chore(hpu): update tfhe-hpu-backend version, readme and run-on-hpu doc 2025-10-02 13:20:36 +02:00
pgardratzama
5f30569452 fix(hpu): update AMC firmware in bitstream to lower polling period 2025-10-02 13:20:36 +02:00
pgardratzama
39b81a8ded feat(hpu): move to new bitstream at 400Mhz with GRAM_NB 3
- update SIMD_N and min_batch_size to 12 which seems to give better
  latency and ERC20 throughput
- support IOp on several lines in ami /proc file
- reduce amount of ERC_20_SIMD per batch in HLAPI bench
2025-10-02 13:20:36 +02:00
pgardratzama
da223b36b6 fix(hpu): reduce polling period of backend on iop ack file from 10 to 2us 2025-10-02 13:20:36 +02:00
JJ-hw
db16276715 chore(hpu): Remove all references to U55C, which is not supported anymore. 2025-10-02 13:20:36 +02:00
pgardratzama
a59742f518 fix(hpu): uuid comparison is now done in lower case from both value (metadata, ami read) 2025-10-02 13:20:36 +02:00
Arthur Meyre
9fdaa983e3 chore: fix october typos 2025-10-01 14:32:41 +02:00
Agnes Leroy
71b45c14da chore(gpu): refactor subset_first and subset 2025-09-30 12:21:39 +02:00
Beka Barbakadze
7549474aac feat(gpu): Implements optimized division algorithm for message_2_carry_2, when 4 or more gpus are used 2025-09-29 15:16:34 +02:00
Agnes Leroy
23d46ba2bc fix(gpu): fix oprf output degree 2025-09-29 08:33:25 +02:00
Agnes Leroy
daf0e79e4a fix(gpu): fix get oprf size on gpu 2025-09-29 08:33:25 +02:00
Arthur Meyre
c5ad73865c chore: prepare alpha.2
- bump tfhe-cuda-backend to 0.12.0-alpha.2
- bump tfhe to 1.4.0-alpha.2
2025-09-27 11:35:27 +02:00
Agnes Leroy
9aab79e23a chore(gpu): fix compilation warning 2025-09-26 17:04:17 +02:00
Agnes Leroy
f53c75636d chore(gpu): refactor oprf test, remove unused arg and fix multi-GPU for oprf 2025-09-26 13:19:34 +02:00