* Added the Value type name to the crate::integer::KVStore impl of the
  Named trait, as well as a bool to check that we deserialize the correct
  value type (Radix vs SignedRadix); see the first sketch after this list.
* Add KVStore to high_level_api
* Add KVStore hlapi benches
* Remove the specialized `[add,mul,sub]_to_slot` methods, as `map` is now
  the intended API (see the plaintext sketch after this list).
  - `mul_to_slot` was much slower than using `map`
  - `add/sub_to_slot` were a bit faster (~5% latency-wise), but returned
    less information (no old_value, no new_value, no boolean to check if
    the key matched)
  - Some known improvements can still be made to `map`, which should
    result in it being better than `add/sub_to_slot`
* Add the FheIntegerType trait to make the KVStore generic over
  FheUint/FheInt; it should also make GPU integration "easy"
- update SIMD_N and min_batch_size to 12, which seems to give better
  latency and ERC20 throughput
- support IOp on several lines in the ami /proc file
- reduce the number of ERC_20_SIMD per batch in the HLAPI bench
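As a rough illustration of the deserialization check from the first item,
here is a minimal sketch; the trait shape, the `IS_SIGNED` flag, and
`check_value_type` are illustrative assumptions, not the actual tfhe-rs
definitions:

```rust
/// Minimal sketch of the idea: the serialized header carries the value
/// type's name plus a signedness flag, so a store of signed values cannot
/// be deserialized as a store of unsigned ones (Radix vs SignedRadix).
trait NamedValue {
    const NAME: &'static str;
    const IS_SIGNED: bool;
}

struct Radix;
struct SignedRadix;

impl NamedValue for Radix {
    const NAME: &'static str = "Radix";
    const IS_SIGNED: bool = false;
}

impl NamedValue for SignedRadix {
    const NAME: &'static str = "SignedRadix";
    const IS_SIGNED: bool = true;
}

/// Run at deserialization time against the stored header.
fn check_value_type<V: NamedValue>(
    stored_name: &str,
    stored_signed: bool,
) -> Result<(), String> {
    if stored_name != V::NAME || stored_signed != V::IS_SIGNED {
        return Err(format!(
            "expected value type {} (signed = {}), found {} (signed = {})",
            V::NAME, V::IS_SIGNED, stored_name, stored_signed
        ));
    }
    Ok(())
}
```

And a plaintext analogue of why `map` supersedes the specialized
`*_to_slot` methods (the real KVStore operates on encrypted keys and
values; the names and signatures here are illustrative):

```rust
use std::collections::HashMap;

struct PlainKvStore {
    inner: HashMap<u64, u64>,
}

impl PlainKvStore {
    /// Specialized update: does the addition, but tells the caller nothing.
    fn add_to_slot(&mut self, key: u64, rhs: u64) {
        if let Some(v) = self.inner.get_mut(&key) {
            *v = v.wrapping_add(rhs);
        }
    }

    /// `map`-style update: applies an arbitrary closure and also reports
    /// whether the key matched, along with the old and new values.
    fn map(&mut self, key: u64, f: impl Fn(u64) -> u64) -> Option<(u64, u64)> {
        self.inner.get_mut(&key).map(|v| {
            let old = *v;
            *v = f(old);
            (old, *v)
        })
    }
}
```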
Ciphertext sizes are measured at the HLAPI layer with several parameter
sets. Key sizes are measured at the shortint level.
This benchmark now has its own dedicated GitHub workflow that runs at
least once a month, on the 24th.
Add the flip(condition: BooleanBlock, a: T, b: T) -> (T, T) operation,
which homomorphically flips/swaps the two values if the given encrypted
boolean encrypts true.
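A plaintext model of the intended semantics (the homomorphic version does
the same with an encrypted boolean, so the swap/no-swap decision stays
hidden):

```rust
/// Plaintext analogue of flip: returns the pair unchanged when the
/// condition is false, and swapped when it is true.
fn flip<T>(condition: bool, a: T, b: T) -> (T, T) {
    if condition { (b, a) } else { (a, b) }
}

fn main() {
    assert_eq!(flip(false, 1, 2), (1, 2));
    assert_eq!(flip(true, 1, 2), (2, 1));
}
```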
- Also, remove the lut indexes concept from the 128-bit multi-bit PBS.
  The entire backend assumes it does not exist (as it doesn't for the
  classical PBS), so keeping it here would be error prone.
The GPU backend cannot accept fewer than 2 blocks for integer
benchmarks. Since 2-bit precision benchmarks are run with
*_MESSAGE_2_CARRY_2_* parameters, they create only one block of
ciphertext, making these benchmarks unsuitable for the GPU backend.
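The block count follows from the radix decomposition: with
*_MESSAGE_2_CARRY_2_* parameters each block carries 2 message bits, so
2 bits of precision need only a single block. A quick sanity check:

```rust
/// Number of radix blocks needed for a given precision.
fn num_blocks(precision_bits: u32, message_bits_per_block: u32) -> u32 {
    precision_bits.div_ceil(message_bits_per_block)
}

fn main() {
    // 2-bit precision with MESSAGE_2 parameters: one block, which the
    // GPU backend rejects (it needs at least 2 blocks).
    assert_eq!(num_blocks(2, 2), 1);
    // 4-bit precision: 2 blocks, acceptable for the GPU backend.
    assert_eq!(num_blocks(4, 2), 2);
}
```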
This new benchmark is extracted from a use case: starting from a
compressed ciphertext, it measures the decompression, then the noise
squashing, and finally the re-compression of the result.
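The shape of one iteration, sketched with placeholder traits rather than
the actual tfhe-rs API (which is not pinned down here):

```rust
/// Placeholder traits standing in for the real operations, so the
/// three-step pipeline can be written down without assuming exact APIs.
trait Decompress { type Out; fn decompress(&self) -> Self::Out; }
trait SquashNoise { type Out; fn squash_noise(&self) -> Self::Out; }
trait Compress { type Out; fn compress(&self) -> Self::Out; }

/// One benchmark iteration: decompress, noise-squash, re-compress.
fn bench_iteration<C>(input: &C) -> <<C::Out as SquashNoise>::Out as Compress>::Out
where
    C: Decompress,
    C::Out: SquashNoise,
    <C::Out as SquashNoise>::Out: Compress,
{
    input.decompress().squash_noise().compress()
}
```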
It seems that in
```rust
bench_group.bench_function(&bench_id, |b| {
    // some code
    b.iter(|| {
        // function to bench
    })
});
```
if we put code in the '// some code' part, it affects the measurements:
the slower this code is, the worse the measurements can be.
For many operations the gap is small (a few ms, or no gap at all), but
for the division the gap was around 500ms.
To reduce this, we move out what we can; moving out the keycache access
is the most important change, as it costs around 70ms to 100ms.
A LazyCell is used so that the keycache is only accessed if the bench is
not filtered out. This is the behaviour we had before this commit, and
the behaviour we want to keep, so that running specific benches via
regex selection stays fast.
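A minimal sketch of the pattern; `load_keys`, `operation_under_bench`,
`bench_group`, and `bench_id` are placeholders, not the actual bench code:

```rust
use std::cell::LazyCell;

// Creating the LazyCell is essentially free: the closure only runs on the
// first dereference. `load_keys` stands in for the 70ms-100ms keycache access.
let keys = LazyCell::new(load_keys);

bench_group.bench_function(&bench_id, |b| {
    // The first access here triggers the keycache load, so benches that
    // were filtered out by the regex selection never pay for it.
    let keys = &*keys;
    b.iter(|| operation_under_bench(keys));
});
```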
Also, for clean input benches, we use `iter` instead of `iter_batched`,
as it makes more sense and should give more accurate results:
`iter_batched` timings include other things than just the timing of the
function under bench.
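For reference, the two shapes (`input` and the operations are
placeholders; `BatchSize` and both `Bencher` methods are criterion's):

```rust
use criterion::BatchSize;

// `iter_batched` keeps per-iteration setup out of the measured closure,
// but the batching machinery itself surrounds the measured call:
b.iter_batched(
    || input.clone(),      // setup, not measured
    |x| operation(x),      // measured routine
    BatchSize::SmallInput,
);

// When the operation does not consume or dirty its input, plain `iter`
// measures the call and nothing else:
b.iter(|| operation_by_ref(&input));
```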
Add new fields to HpuPBSParameters (log2_pfail and modulus_switch_type).
Also add a new parameter set definition in shortint for benchmark matching.
Remove the use of the use_mean_compensation register; this information is
now embedded inside the parameter set definition.
Update psi64.hpu archive with newest bitstream
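A hypothetical sketch of the parameter-struct change (the field names come
from the commit message; the real HpuPBSParameters and ModulusSwitchType
definitions in tfhe-rs differ and have more members):

```rust
/// Illustrative only: the real struct has many more fields.
pub struct HpuPBSParameters {
    // ... existing PBS parameters ...
    /// Failure probability of the PBS, stored as log2(p_fail).
    pub log2_pfail: f64,
    /// Replaces the standalone use_mean_compensation register: the modulus
    /// switch behaviour is now part of the parameter set definition itself.
    pub modulus_switch_type: ModulusSwitchType,
}

/// Placeholder for the corresponding enum in shortint parameters.
pub enum ModulusSwitchType {
    Standard,
    CenteredMeanNoiseReduction,
}
```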