Agnes Leroy | eb3b988380 | Use internal streams | 2025-12-05 15:28:07 +01:00
Agnes Leroy | 8ed3b4b59d | chore(gpu): reuse CPU LUT buffer to generate accumulators | 2025-12-05 15:23:07 +01:00
Agnes Leroy | 20daf182f0 | Experiment with erc20 in the backend | 2025-12-05 15:23:07 +01:00
Enzo Di Maria | 5273f61593 | refactor(gpu): creating InternalCudaStreams to improve the management of multiple streams per GPU | 2025-12-05 10:13:18 +01:00
Enzo Di Maria | a96d68323b | fix(gpu): No broadcast is needed because full prop is done on 1 single GPU | 2025-12-04 09:17:40 +01:00
pgardratzama | 1f48456b17 | feat(hpu): adds a PSI64 HPU now using ISC IOP Ack interrupt to internal core; AMC fw version update from 3.1.0 to 3.1.1 | 2025-12-03 09:33:44 +01:00
Enzo Di Maria | 6752249f7f | refactor(gpu): moving vector_comparisons's functions to the backend | 2025-12-03 09:11:00 +01:00
Enzo Di Maria | cc33161b97 | refactor(gpu): moving cast_to_signed to the backend | 2025-12-02 12:40:47 +01:00
Enzo Di Maria | 3cfbaa40c3 | refactor(gpu): unchecked_index_of_clear to backend | 2025-11-28 14:34:53 +01:00
Andrei Stoian | e2063c8ef4 | chore(gpu): bench KS latency batches | 2025-11-27 17:32:44 +01:00
Enzo Di Maria | 0aa0918fea | refactor(gpu): vector_find's functions to backend | 2025-11-27 13:36:10 +01:00
Enzo Di Maria | b6fffa3d86 | fix(gpu): wrong number of blocks in tmp_match_result | 2025-11-25 10:50:30 +01:00
Guillermo Oyarzun | 28ea0bc086 | fix(gpu): force uint64 when calculating lwe chunksize | 2025-11-25 09:52:16 +01:00
Enzo Di Maria | 32b1a7ab1d | refactor(gpu): unchecked_match_value_or to backend | 2025-11-24 13:50:15 +01:00
Guillermo Oyarzun | 02312e23ea | fix(gpu): fix pbs128 selection for small num samples | 2025-11-21 10:57:14 +01:00
Enzo Di Maria | c6709a82c0 | refactor(gpu): match_value to backend with multiple streams | 2025-11-20 17:11:15 +01:00
Beka Barbakadze | 80cacbd079 | feat(gpu): add boolean bitops in cuda backend | 2025-11-20 14:56:21 +01:00
Nicolas Sarlin | edb435bd46 | chore: update msrv to 1.91.1 | 2025-11-20 09:29:37 +01:00
Agnes Leroy | df73c36cbf | fix(gpu): fix decomposition algorithm not matching the theory | 2025-11-14 16:36:35 +01:00
Agnes Leroy | 4f9f4982f6 | fix(gpu): fix memory leak in rerand | 2025-11-14 14:00:01 +01:00
pgardratzama | d38df76eb6 | chore(hpu): adds a page about HPU PBS performances | 2025-11-10 18:43:50 +01:00
pgardratzama | afaf761cdd | chore(hpu): adds 3 custom IOp to measure PBS performance on HPU and update trace parser to handle 32b timestamp wrap | 2025-11-10 18:43:50 +01:00
pgardratzama | 4eb4fa95e3 | feat(hpu): new HPU bitstream with few optimizations (GRAM arb, ALU nb, BSK manager) | 2025-11-10 09:14:18 +01:00
Guillermo Oyarzun | 12426573fa | fix(gpu): add upper bound to lwe_chunk_size calculation | 2025-11-07 09:29:40 +01:00
Guillermo Oyarzun | 6f105cd82e | fix(gpu): fix out of bounds in specialized classical pbs | 2025-11-06 15:35:04 +01:00
Enzo Di Maria | 4ff95e3a42 | feat(gpu): AES 256 | 2025-11-05 13:37:08 +01:00
Baptiste Roux | f970031d33 | chore(hpu): Update version of hw_regmap deps; This new version update rust MSRV. | 2025-11-04 15:26:27 +01:00
Arthur Meyre | 00ce0deec9 | chore: make typos version fixed; add a script to properly install the correct version; correct new typos | 2025-11-03 14:58:23 +01:00
Enzo Di Maria | 026cc376ed | refactor(gpu): multibit decompression | 2025-10-30 08:59:10 +01:00
Pedro Alves | 867f8fb579 | feat(gpu): implement re-randomization; exposed to integer and HL API; test on the HL API; benchmarks for GPU and CPU implementation | 2025-10-29 17:55:45 -03:00
Guillermo Oyarzun | 62780ac500 | fix(gpu): fix decompression mem leak | 2025-10-24 13:02:41 +02:00
Guillermo Oyarzun | e12638dabe | feat(gpu): extend specialized version to classical pbs | 2025-10-22 09:20:40 +02:00
pgardratzama | 79f1d22573 | fix(hpu): scalar rot & shift were not doing anything and not tested in test/hpu.rs | 2025-10-21 13:29:59 +02:00
pgardratzama | b918f77859 | chore(hpu): add force_reload option in v80 config, remove added line in sim config | 2025-10-21 13:29:59 +02:00
Helder Campos | 054c5028a1 | feat(hpu): Added the option to forcefully reload the HPU | 2025-10-21 13:29:59 +02:00
Helder Campos | 7b621e57b0 | feat(hpu): LLT ROT/SHIFT IOPs | 2025-10-21 13:29:59 +02:00
Agnes Leroy | b4b6275ca5 | chore(gpu): remove device synchronize in drop for cudavec | 2025-10-21 11:33:46 +02:00
Beka Barbakadze | 39862c2861 | fix(gpu): fix bug in are_all_comparison_blocks_true when number of blocks is 0 | 2025-10-20 13:26:50 +02:00
Guillermo Oyarzun | c22e63895e | fix(gpu): fix multi-gpu throughput benches with classical pbs | 2025-10-16 17:55:10 +02:00
Enzo Di Maria | 126e779533 | refactor(gpu): oprf_unsigned_custom_range + tests | 2025-10-16 09:31:01 +02:00
Enzo Di Maria | 353237c0d6 | refactor(gpu): oprf_unsigned_custom_range | 2025-10-16 09:31:01 +02:00
Agnes Leroy | 7bad509f9a | fix(gpu): fix perf regression introduced in cf3f25efdd | 2025-10-16 09:21:05 +02:00
Agnes Leroy | cf3f25efdd | chore(gpu): add missing syncs in linearalgebra functions and aes | 2025-10-14 09:23:11 +02:00
Agnes Leroy | c3ed1a7558 | chore(gpu): internal renaming | 2025-10-14 09:23:11 +02:00
Agnes Leroy | 6347f25668 | chore(gpu): synchronize after every release | 2025-10-14 09:23:11 +02:00
Agnes Leroy | 91b263d480 | chore(gpu): split integer utilities file | 2025-10-10 14:49:02 +02:00
Andrei Stoian | 30938eec74 | chore(gpu): use active streams in int_radix_lut | 2025-10-09 21:59:15 +02:00
pgardratzama | ca4159f123 | fix(hpu): fix overflow flag of OVF_MUL & OVF_MULS, also update simulation HPU config | 2025-10-07 10:14:43 +02:00
Arthur Meyre | e07f07c4c8 | chore: bump tfhe-cuda-backend to 0.12.0 | 2025-10-06 13:26:54 +02:00
Arthur Meyre | 81cc0c31b4 | chore: constrain bytemuck < 1.24.0 as we don't have avx512 updated code | 2025-10-06 13:24:16 +02:00