Baptiste Roux
52b8e81ccb
fix(hpu): Correctly select adder configuration in ERC_20/ERC_20_SIMD
Add knobs to select the ripple-carry or Kogge-Stone adder in ERC_20/ERC_20_SIMD.
Previously, the adder was hardcoded to ripple carry, which degraded ERC_20
latency.
2025-12-24 10:38:38 +01:00
Guillermo Oyarzun
92df46f8f2
fix(gpu): return to 64 regs in multi-bit pbs
2025-12-23 11:51:00 +01:00
Agnes Leroy
49be544297
fix(gpu): fix cpu memory leak in expand and rerand
2025-12-18 16:33:23 +01:00
Enzo Di Maria
ca2a79f1fb
refactor(gpu): Threshold for multi-GPU with Classical PBS
2025-12-18 09:27:09 +01:00
Enzo Di Maria
0a59e86675
fix(gpu): Using tbc for classical 64 bits pbs on H100
2025-12-17 19:18:01 +01:00
Agnes Leroy
cfa53682ae
fix(gpu): add missing sync before free in oprf
2025-12-16 09:42:11 +01:00
Agnes Leroy
006d6cc300
fix(gpu): fix some cpu memory leaks
2025-12-15 14:27:35 -03:00
Andrei Stoian
78d1ce18c1
feat(gpu): support keyswitch 64/32
2025-12-12 22:01:49 +01:00
Guillermo Oyarzun
11579bd3d0
feat(gpu): create noise and pfail tests for pbs + ks + ms
2025-12-12 09:41:11 +01:00
Arthur Meyre
a5aa3c366f
chore: bump tfhe-hpu-backend
2025-12-11 13:12:36 +01:00
Arthur Meyre
5818b08f2c
chore: bump tfhe-cuda-backend to 0.13.0
2025-12-11 13:12:36 +01:00
Arthur Meyre
de439ff6a1
chore: fix repo name in hpu README
2025-12-11 13:12:36 +01:00
Agnes Leroy
1f0a83e4bb
fix(gpu): fix some CPU memory leaks due to the use of new without delete
2025-12-11 08:58:13 +01:00
Enzo Di Maria
5273f61593
refactor(gpu): creating InternalCudaStreams to improve the management of multiple streams per GPU
2025-12-05 10:13:18 +01:00
Enzo Di Maria
a96d68323b
fix(gpu): No broadcast is needed because full prop is done on 1 single GPU
2025-12-04 09:17:40 +01:00
pgardratzama
1f48456b17
feat(hpu): adds a PSI64 HPU now using ISC IOP Ack interrupt to internal core
AMC firmware version updated from 3.1.0 to 3.1.1
2025-12-03 09:33:44 +01:00
Enzo Di Maria
6752249f7f
refactor(gpu): moving vector_comparisons's functions to the backend
2025-12-03 09:11:00 +01:00
Enzo Di Maria
cc33161b97
refactor(gpu): moving cast_to_signed to the backend
2025-12-02 12:40:47 +01:00
Enzo Di Maria
3cfbaa40c3
refactor(gpu): unchecked_index_of_clear to backend
2025-11-28 14:34:53 +01:00
Andrei Stoian
e2063c8ef4
chore(gpu): bench KS latency batches
2025-11-27 17:32:44 +01:00
Enzo Di Maria
0aa0918fea
refactor(gpu): vector_find's functions to backend
2025-11-27 13:36:10 +01:00
Enzo Di Maria
b6fffa3d86
fix(gpu): wrong number of blocks in tmp_match_result
2025-11-25 10:50:30 +01:00
Guillermo Oyarzun
28ea0bc086
fix(gpu): force uint64 when calculating lwe chunksize
2025-11-25 09:52:16 +01:00
Enzo Di Maria
32b1a7ab1d
refactor(gpu): unchecked_match_value_or to backend
2025-11-24 13:50:15 +01:00
Guillermo Oyarzun
02312e23ea
fix(gpu): fix pbs128 selection for small num samples
2025-11-21 10:57:14 +01:00
Enzo Di Maria
c6709a82c0
refactor(gpu): match_value to backend with multiple streams
2025-11-20 17:11:15 +01:00
Beka Barbakadze
80cacbd079
feat(gpu): add boolean bitops in cuda backend
2025-11-20 14:56:21 +01:00
Nicolas Sarlin
edb435bd46
chore: update msrv to 1.91.1
2025-11-20 09:29:37 +01:00
Agnes Leroy
df73c36cbf
fix(gpu): fix decomposition algorithm not matching the theory
2025-11-14 16:36:35 +01:00
Agnes Leroy
4f9f4982f6
fix(gpu): fix memory leak in rerand
2025-11-14 14:00:01 +01:00
pgardratzama
d38df76eb6
chore(hpu): adds a page about HPU PBS performances
2025-11-10 18:43:50 +01:00
pgardratzama
afaf761cdd
chore(hpu): adds 3 custom IOp to measure PBS performance on HPU and update trace parser to handle 32b timestamp wrap
2025-11-10 18:43:50 +01:00
pgardratzama
4eb4fa95e3
feat(hpu): new HPU bitstream with few optimizations (GRAM arb, ALU nb, BSK manager)
2025-11-10 09:14:18 +01:00
Guillermo Oyarzun
12426573fa
fix(gpu): add upper bound to lwe_chunk_size calculation
2025-11-07 09:29:40 +01:00
Guillermo Oyarzun
6f105cd82e
fix(gpu): fix out of bounds in specialized classical pbs
2025-11-06 15:35:04 +01:00
Enzo Di Maria
4ff95e3a42
feat(gpu): AES 256
2025-11-05 13:37:08 +01:00
Baptiste Roux
f970031d33
chore(hpu): Update version of hw_regmap deps
This new version updates the Rust MSRV.
2025-11-04 15:26:27 +01:00
Arthur Meyre
00ce0deec9
chore: make typos version fixed
- add a script to install the correct (pinned) version
- fix newly detected typos
2025-11-03 14:58:23 +01:00
Enzo Di Maria
026cc376ed
refactor(gpu): multibit decompression
2025-10-30 08:59:10 +01:00
Pedro Alves
867f8fb579
feat(gpu): implement re-randomization
- expose it in the integer and HL APIs
- add tests on the HL API
- add benchmarks for the GPU and CPU implementations
2025-10-29 17:55:45 -03:00
Guillermo Oyarzun
62780ac500
fix(gpu): fix decompression mem leak
2025-10-24 13:02:41 +02:00
Guillermo Oyarzun
e12638dabe
feat(gpu): extend specialized version to classical pbs
2025-10-22 09:20:40 +02:00
pgardratzama
79f1d22573
fix(hpu): scalar rot & shift were not doing anything and not tested in test/hpu.rs
2025-10-21 13:29:59 +02:00
pgardratzama
b918f77859
chore(hpu): add force_reload option in v80 config, remove added line in sim config
2025-10-21 13:29:59 +02:00
Helder Campos
054c5028a1
feat(hpu): Added the option to forcefully reload the HPU
2025-10-21 13:29:59 +02:00
Helder Campos
7b621e57b0
feat(hpu): LLT ROT/SHIFT IOPs
2025-10-21 13:29:59 +02:00
Agnes Leroy
b4b6275ca5
chore(gpu): remove device synchronize in drop for cudavec
2025-10-21 11:33:46 +02:00
Beka Barbakadze
39862c2861
fix(gpu): fix bug in are_all_comparison_blocks_true when number of blocks is 0
2025-10-20 13:26:50 +02:00
Guillermo Oyarzun
c22e63895e
fix(gpu): fix multi-gpu throughput benches with classical pbs
2025-10-16 17:55:10 +02:00
Enzo Di Maria
126e779533
refactor(gpu): oprf_unsigned_custom_range + tests
2025-10-16 09:31:01 +02:00