tfhe-rs

mirror of https://github.com/zama-ai/tfhe-rs.git synced 2026-01-08 22:28:01 -05:00

Author	SHA1	Message	Date
Nicolas Sarlin	312ce494bf	chore(zk): add 1 * 64 benches with production CRS	2025-12-17 15:06:37 +01:00
Enzo Di Maria	cf969ff930	refactor(gpu): creating benchmarks for match_value	2025-12-11 12:01:43 +01:00
Nicolas Sarlin	01367368ed	chore(zk): do not bench zkv1 at the integer level	2025-11-25 17:20:06 +01:00
Nicolas Sarlin	33f77458e9	chore(zk): fix elements count for zk throughput benches	2025-11-25 17:20:06 +01:00
Arthur Meyre	caf5e9d879	chore: fix scalar benchmarks generating fixed values - this would not give an average runtime for scalar benchmarks and for small precisions could give super good timings (for lucky values) - the timings for other precisions could still be favorable or unfavorable depending on the value that was drawn	2025-11-25 14:23:55 +01:00
David Testé	b0393c0acb	chore(bench): run scalar ops in integer deduplicated cpu bench	2025-11-24 14:03:08 +01:00
David Testé	58378b7972	chore(bench): add dedicated targets for aes cuda benchmarks	2025-11-20 16:58:06 +01:00
David Testé	071e70c037	chore(bench): fix benchmark id pattern for aes and aes256	2025-11-19 17:23:05 +01:00
Mayeul@Zama	f9268b889f	chore(bench): revert print bench id This reverts commit `ef07963767`.	2025-11-17 11:23:50 +01:00
Enzo Di Maria	54c8c5e020	chore(gpu): no crash with aes benches if oom error	2025-11-14 17:02:33 +01:00
David Testé	ef07963767	chore(bench): print bench id before running the benchmark Done to circumvent criterion limitation regarding automatic truncation of long benchmark ID. Using a println() call we ensure the complete name is displayed before benchmark execution to ease manual parsing and debugging.	2025-11-14 13:45:04 +01:00
Enzo Di Maria	4ff95e3a42	feat(gpu): AES 256	2025-11-05 13:37:08 +01:00
Pedro Alves	867f8fb579	feat(gpu): implement re-randomization - exposed to integer and HL API - test on the HL API - benchmarks for GPU and CPU implementation	2025-10-29 17:55:45 -03:00
Pedro Alves	70773e442c	fix(gpu): fix 128-bit compression benchmark	2025-10-27 17:06:45 +01:00
Mayeul@Zama	777bbe437a	feat(shortint): add multi bit decompression	2025-10-24 09:28:17 +02:00
Arthur Meyre	23246f63f7	chore: update fast_dedup opset to match the latency benchmarks in the docs - signed bench update	2025-10-23 10:42:19 +02:00
Arthur Meyre	11c79b5237	chore: update fast_dedup opset to match the latency benchmarks in the docs	2025-10-23 10:42:19 +02:00
pgardratzama	f9c89212ea	fix(hpu): display name on shift looked wrong	2025-10-21 13:29:59 +02:00
Thomas Montaigu	0dd0ead4e2	chore(bench): remove trivial encryptions It makes benches not accurate	2025-10-20 12:26:44 +02:00
Agnes Leroy	c30835fc30	chore(gpu): remove async entry points for abs, add, sub, aes	2025-10-17 15:42:06 +02:00
pgardratzama	3073d60f11	fix(hpu): work-around a criterion assert by reducing number of elements on division & modulus throughput bench	2025-10-07 14:23:07 +02:00
pgardratzama	ab25919187	fix(hpu): throughput benchmarks were done 1 IOp per 1 IOp...	2025-10-07 10:14:43 +02:00
Enzo Di Maria	f0f3dd76eb	feat(gpu): aes 128	2025-10-06 09:31:36 +02:00
pgardratzama	2bf595d0e2	fix(hpu): missing bench numbers for less_than & less_or_equal because lower != less	2025-10-02 13:20:36 +02:00
David Testé	4ba1787e12	chore(bench): add crs size in zk-pke benchmark names This is done to get more details about the benchmarks when parsing results.	2025-09-16 16:06:41 +02:00
David Testé	366d359441	chore(bench): measure ciphertext and key sizes at a large scale Ciphertext sizes are measured at the HLAPI layer with several parameter sets. Key sizes are measured at the shortint level. This benchmark now has its dedicated GitHub workflow that runs at least on the 24th of each month.	2025-09-16 15:43:36 +02:00
pgardratzama	4ff0d6cac2	feat(hpu): integer bench update (adds mod, div -> div_mod), erc20_simd simd batch size read from iop prototype	2025-09-10 22:24:31 +02:00
tmontaigu	e8dc403ebd	feat(integer): add flip operation Add the flip(condition: BooleanBlock, a: T, b: T) -> (T, T) operation that homomorphically swaps two values if the given encrypted boolean encrypts true	2025-09-10 09:44:28 +02:00
pgardratzama	6fe24c6ab3	chore(hpu): update hpu integer bench scalar op names	2025-09-05 10:42:36 +02:00
pgardratzama	c6aa1adbe7	chore(hpu): update benches to run new operations	2025-09-05 10:42:36 +02:00
David Testé	4a0658389e	chore(bench): make bits to prove customizable in zk benchmarks Some applications, like blockchain, may want to prove fewer bits than the CRS size allows.	2025-09-05 09:03:24 +02:00
Pedro Alves	9a1c0f48f4	feat(gpu): implement 128-bit compression and add it to the integer API	2025-08-29 11:26:07 -03:00
David Testé	4b6942a0f8	chore(bench): add unbounded oprf integer benchmarks Also move Cuda OPRF benchmark into the same file as CPU implementation	2025-08-22 15:01:53 +02:00
David Testé	3b42f9873a	chore(bench): write params to file for each zk benchmark on gpu To be parsable each benchmark criterion ID must have their crypto details written to a file.	2025-08-07 15:17:33 +02:00
tmontaigu	8c838da209	chore(integer): improve measurements It seems that in `bench_group.bench_function(&bench_id, |b| { /* some code */ b.iter(|| { /* function to bench */ }) });` any code put in the `/* some code */` part affects the measurements: the slower this code is, the worse the measurements can be. For many operations the gap is small (a few ms or no gap), but for the division the gap was around 500ms. To reduce this, we move out what we can; moving the keycache access is the most important aspect, as it costs around 70ms to 100ms. A LazyCell is used in order to only access the keycache if the bench is not filtered out. This is the behaviour we had before this commit, and the behaviour we want to keep so that running specific benches via regex selection stays fast. Also, for clean input benches, we use `iter` instead of `iter_batched`, as it makes more sense and should give more accurate results, since iter_batched timing includes other things than just the timing of the function.	2025-07-15 12:46:38 +02:00
Pedro Alves	d3dd010deb	fix(gpu): reduces number of elements in the ZK throughput benchmark	2025-07-11 08:57:01 +01:00
Pedro Alves	9960f5e8b6	fix(gpu): Fix expand bench on multi-gpus	2025-07-09 09:17:55 +01:00
Pedro Alves	23ebd42209	fix(gpu): fix compression throughput benchmark	2025-07-07 16:30:24 +01:00
David Testé	11c0340eca	chore(bench): plug server-side proof in zk benchmarks	2025-06-10 18:00:39 +02:00
Baptiste Roux	443e02215f	feat(hpu): Add recent IOp in integer benchmarks	2025-06-10 17:43:35 +02:00
Pedro Alves	408e81c45a	feat(gpu): add support for GPU-accelerated expand on the HL Api - includes documentation about GPU's accelerated expand on the HL API - rework CudaKeySwitchingKey - Cloning the key is no longer necessary on the HL API	2025-05-23 11:54:29 +02:00
David Testé	e29d615b9d	chore(bench): add suitable heuristic for zk throughput The heuristic based on PBS count was flawed, since a ZK verification operation will use up to 32 threads on the machine. The previous heuristic could generate an input data vector far bigger than the total number of threads divided by 32. This in turn led to long benchmark execution times and generated bad results.	2025-05-20 15:02:59 +02:00
Baptiste Roux	9ee8259002	feat(hpu): Add Hpu backend implementation This backend abstracts communication with the Hpu Fpga hardware. It defines its own entities to prevent circular dependencies with tfhe-rs. Object lifetime is handled through an Arc<Mutex<T>> wrapper, which enforces that all objects currently alive in the Hpu Hw are also kept valid on the host side. It contains the second version of the HPU instruction set (HIS_V2.0): * DOp have the following properties: + Template as first class citizen + Support of Immediate template + Direct parser and conversion between Asm/Hex + Replace deku (and its associated endianness limitation) by bitfield_struct and manual parsing * IOp have the following properties: + Support a variable number of Destinations + Support a variable number of Sources + Support a variable number of Immediate values + Support of multiple bitwidths (not implemented yet in the Fpga firmware) Details can be found in `backends/tfhe-hpu-backend/Readme.md`	2025-05-16 16:30:23 +02:00
David Testé	67ec4a28c1	chore(bench): move benchmarks to their own crate This is done to speed-up compilation duration by avoiding recompiling tfhe each time a modification is made in a benchmark file.	2025-05-09 13:46:27 +02:00

44 Commits
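
Commit 8c838da209 above describes using a LazyCell so that the keycache is only accessed when a benchmark is not filtered out. The following is a minimal sketch of that lazy-initialization idea using only the standard library. It is not the code from the commit: `load_keys`, `bench_function`, and `run` are hypothetical stand-ins (the real benches use criterion and the tfhe-rs keycache), and the sketch illustrates only when the expensive setup runs, not its effect on timings.

```rust
use std::cell::{Cell, LazyCell};

// Hypothetical stand-in for an expensive setup step such as a keycache
// lookup. It counts how many times it has been called.
fn load_keys(loads: &Cell<u32>) -> Vec<u64> {
    loads.set(loads.get() + 1);
    (0..4).collect()
}

// Hypothetical stand-in for a benchmark runner with name filtering:
// the routine runs only if the benchmark name contains the filter.
fn bench_function(name: &str, filter: &str, routine: impl FnOnce()) {
    if name.contains(filter) {
        routine();
    }
}

// Runs three named "benchmarks" and returns (number of key loads, checksum).
fn run(filter: &str) -> (u32, u64) {
    let loads = Cell::new(0);
    // The setup is declared once, outside the benchmark closures, but it is
    // not executed until the first dereference of `keys`.
    let keys = LazyCell::new(|| load_keys(&loads));
    let mut checksum = 0u64;
    for name in ["add", "mul", "div"] {
        bench_function(name, filter, || {
            // The first dereference triggers `load_keys`; later ones reuse it.
            checksum += keys.iter().sum::<u64>();
        });
    }
    (loads.get(), checksum)
}
```

With this sketch, `run("zzz")` never calls `load_keys` because every benchmark is filtered out, while `run("div")` and `run("")` each call it exactly once, regardless of how many benchmarks match.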