Commit Graph

521 Commits

Author SHA1 Message Date
Agnes Leroy
eb3b988380 Use internal streams 2025-12-05 15:28:07 +01:00
Agnes Leroy
8ed3b4b59d chore(gpu): reuse CPU LUT buffer to generate accumulators 2025-12-05 15:23:07 +01:00
Agnes Leroy
20daf182f0 Experiment with erc20 in the backend 2025-12-05 15:23:07 +01:00
Enzo Di Maria
5273f61593 refactor(gpu): creating InternalCudaStreams to improve the management of multiple streams per GPU 2025-12-05 10:13:18 +01:00
Enzo Di Maria
a96d68323b fix(gpu): No broadcast is needed because full prop is done on 1 single GPU 2025-12-04 09:17:40 +01:00
pgardratzama
1f48456b17 feat(hpu): adds a PSI64 HPU now using ISC IOP Ack interrupt to internal core
AMC fw version update from 3.1.0 to 3.1.1
2025-12-03 09:33:44 +01:00
Enzo Di Maria
6752249f7f refactor(gpu): moving vector_comparisons's functions to the backend 2025-12-03 09:11:00 +01:00
Enzo Di Maria
cc33161b97 refactor(gpu): moving cast_to_signed to the backend 2025-12-02 12:40:47 +01:00
Enzo Di Maria
3cfbaa40c3 refactor(gpu): unchecked_index_of_clear to backend 2025-11-28 14:34:53 +01:00
Andrei Stoian
e2063c8ef4 chore(gpu): bench KS latency batches 2025-11-27 17:32:44 +01:00
Enzo Di Maria
0aa0918fea refactor(gpu): vector_find's functions to backend 2025-11-27 13:36:10 +01:00
Enzo Di Maria
b6fffa3d86 fix(gpu): wrong number of blocks in tmp_match_result 2025-11-25 10:50:30 +01:00
Guillermo Oyarzun
28ea0bc086 fix(gpu): force uint64 when calculating lwe chunksize 2025-11-25 09:52:16 +01:00
Enzo Di Maria
32b1a7ab1d refactor(gpu): unchecked_match_value_or to backend 2025-11-24 13:50:15 +01:00
Guillermo Oyarzun
02312e23ea fix(gpu): fix pbs128 selection for small num samples 2025-11-21 10:57:14 +01:00
Enzo Di Maria
c6709a82c0 refactor(gpu): match_value to backend with multiple streams 2025-11-20 17:11:15 +01:00
Beka Barbakadze
80cacbd079 feat(gpu): add boolean bitops in cuda backend 2025-11-20 14:56:21 +01:00
Nicolas Sarlin
edb435bd46 chore: update msrv to 1.91.1 2025-11-20 09:29:37 +01:00
Agnes Leroy
df73c36cbf fix(gpu): fix decomposition algorithm not matching the theory 2025-11-14 16:36:35 +01:00
Agnes Leroy
4f9f4982f6 fix(gpu): fix memory leak in rerand 2025-11-14 14:00:01 +01:00
pgardratzama
d38df76eb6 chore(hpu): adds a page about HPU PBS performances 2025-11-10 18:43:50 +01:00
pgardratzama
afaf761cdd chore(hpu): adds 3 custom IOp to measure PBS performance on HPU and update trace parser to handle 32b timestamp wrap 2025-11-10 18:43:50 +01:00
pgardratzama
4eb4fa95e3 feat(hpu): new HPU bitstream with few optimizations (GRAM arb, ALU nb, BSK manager) 2025-11-10 09:14:18 +01:00
Guillermo Oyarzun
12426573fa fix(gpu): add upper bound to lwe_chunk_size calculation 2025-11-07 09:29:40 +01:00
Guillermo Oyarzun
6f105cd82e fix(gpu): fix out of bounds in specialized classical pbs 2025-11-06 15:35:04 +01:00
Enzo Di Maria
4ff95e3a42 feat(gpu): AES 256 2025-11-05 13:37:08 +01:00
Baptiste Roux
f970031d33 chore(hpu): Update version of hw_regmap deps
This new version update rust MSRV.
2025-11-04 15:26:27 +01:00
Arthur Meyre
00ce0deec9 chore: make typos version fixed
- add a script to properly install the correct version
- correct new typos
2025-11-03 14:58:23 +01:00
Enzo Di Maria
026cc376ed refactor(gpu): multibit decompression 2025-10-30 08:59:10 +01:00
Pedro Alves
867f8fb579 feat(gpu): implement re-randomization
- exposed to integer and HL API
- test on the HL API
- benchmarks for GPU and CPU implementation
2025-10-29 17:55:45 -03:00
Guillermo Oyarzun
62780ac500 fix(gpu): fix decompression mem leak 2025-10-24 13:02:41 +02:00
Guillermo Oyarzun
e12638dabe feat(gpu): extend specialized version to classical pbs 2025-10-22 09:20:40 +02:00
pgardratzama
79f1d22573 fix(hpu): scalar rot & shift were not doing anything and not tested in test/hpu.rs 2025-10-21 13:29:59 +02:00
pgardratzama
b918f77859 chore(hpu): add force_reload option in v80 config, remove added line in sim config 2025-10-21 13:29:59 +02:00
Helder Campos
054c5028a1 feat(hpu): Added the option to forcefully reload the HPU 2025-10-21 13:29:59 +02:00
Helder Campos
7b621e57b0 feat(hpu): LLT ROT/SHIFT IOPs 2025-10-21 13:29:59 +02:00
Agnes Leroy
b4b6275ca5 chore(gpu): remove device synchronize in drop for cudavec 2025-10-21 11:33:46 +02:00
Beka Barbakadze
39862c2861 fix(gpu): fix bug in are_all_comparison_blocks_true when number of blocks is 0 2025-10-20 13:26:50 +02:00
Guillermo Oyarzun
c22e63895e fix(gpu): fix multi-gpu throughput benches with classical pbs 2025-10-16 17:55:10 +02:00
Enzo Di Maria
126e779533 refactor(gpu): oprf_unsigned_custom_range + tests 2025-10-16 09:31:01 +02:00
Enzo Di Maria
353237c0d6 refactor(gpu): oprf_unsigned_custom_range 2025-10-16 09:31:01 +02:00
Agnes Leroy
7bad509f9a fix(gpu): fix perf regression introduced in cf3f25efdd 2025-10-16 09:21:05 +02:00
Agnes Leroy
cf3f25efdd chore(gpu): add missing syncs in linearalgebra functions and aes 2025-10-14 09:23:11 +02:00
Agnes Leroy
c3ed1a7558 chore(gpu): internal renaming 2025-10-14 09:23:11 +02:00
Agnes Leroy
6347f25668 chore(gpu): synchronize after every release 2025-10-14 09:23:11 +02:00
Agnes Leroy
91b263d480 chore(gpu): split integer utilities file 2025-10-10 14:49:02 +02:00
Andrei Stoian
30938eec74 chore(gpu): use active streams in int_radix_lut 2025-10-09 21:59:15 +02:00
pgardratzama
ca4159f123 fix(hpu): fix overflow flag of OVF_MUL & OVF_MULS, also update simulation HPU config 2025-10-07 10:14:43 +02:00
Arthur Meyre
e07f07c4c8 chore: bump tfhe-cuda-backend to 0.12.0 2025-10-06 13:26:54 +02:00
Arthur Meyre
81cc0c31b4 chore: constrain bytemuck < 1.24.0 as we don't have avx512 updated code 2025-10-06 13:24:16 +02:00