tfhe-rs

mirror of https://github.com/zama-ai/tfhe-rs.git synced 2026-01-09 14:47:56 -05:00

Author	SHA1	Message	Date
pgardratzama	bd7df4a03b	chore(hpu): enable hpu hlapi workflow and throughput bench in integer workflow	2025-09-05 10:42:36 +02:00
pgardratzama	6fe24c6ab3	chore(hpu): update hpu integer bench scalar op names	2025-09-05 10:42:36 +02:00
pgardratzama	c6aa1adbe7	chore(hpu): update benches to run new operations	2025-09-05 10:42:36 +02:00
David Testé	4a0658389e	chore(bench): make bits to prove customizable in zk benchmarks Some applications, like blockchains, may want to prove fewer bits than the CRS size allows.	2025-09-05 09:03:24 +02:00
David Testé	97574bdae8	chore(bench): add noise squash benchmark with compressions This new benchmark is extracted from a use case. Starting from a compressed ciphertext, it measures decompression, then noise squashing, and finally recompression of the result.	2025-09-04 15:13:08 +02:00
Guillermo Oyarzun	c2e816a86c	fix(gpu): change minimum number of elements in benches	2025-09-04 11:03:27 +02:00
Pedro Alves	cad4070ebe	fix(gpu): fix the decompression function signature in the backend	2025-08-29 21:09:40 +02:00
Pedro Alves	94d24e1f8b	feat(gpu): implement the centered modulus switch technique to classical PBS	2025-08-29 11:38:26 -03:00
Pedro Alves	9a1c0f48f4	feat(gpu): implement 128-bit compression and add it to the integer API	2025-08-29 11:26:07 -03:00
Guillermo Oyarzun	a8f391a442	chore(gpu): update 4_1_1 params to match specialized pbs	2025-08-28 17:54:59 +02:00
David Testé	4b6942a0f8	chore(bench): add unbounded oprf integer benchmarks Also move Cuda OPRF benchmark into the same file as CPU implementation	2025-08-22 15:01:53 +02:00
David Testé	1647ec8f21	chore(bench): add 2 bits integer to full benchmarks This is done to measure the execution time of an FheBool equivalent on all operations.	2025-08-19 09:54:03 +02:00
David Testé	b3f1a85e1d	chore(bench): write parameters to disk for hlapi operations	2025-08-13 18:34:26 +02:00
Antoniu Pop	9316922e81	fix(benches): fix hlapi dex benchmark transfer function	2025-08-12 17:28:40 +01:00
Nicolas Sarlin	bc5c2f51ff	fix(bench): store correct pfail from params	2025-08-12 09:44:37 +02:00
Mayeul@Zama	4d1b917045	feat(shortint): add multibit noise squashing	2025-08-11 16:30:59 +02:00
David Testé	3b42f9873a	chore(bench): write params to file for each zk benchmark on gpu To be parsable, each benchmark criterion ID must have its crypto details written to a file.	2025-08-07 15:17:33 +02:00
tmontaigu	8c838da209	chore(integer): improve measurements It seems that in `bench_group.bench_function(&bench_id, |b| { /* some code */ b.iter(|| { /* function to bench */ }) });`, if we put code in the `/* some code */` part, it affects the measurements: the slower this code is, the worse the measurements can be. For many operations the gap is small (a few ms or no gap), but for division the gap was around 500ms. To reduce this, we move out what we can; moving the keycache access out is the most important part, as it costs around 70ms to 100ms. A LazyCell is used so that the keycache is only accessed if the bench is not filtered out. This is the behaviour we had before this commit, and the behaviour we want to keep so that running specific benches via regex selection stays fast. Also, for clean input benches, we use `iter` instead of `iter_batched`, as it makes more sense and should give more accurate results: `iter_batched` timings include other things than just the timing of the function.	2025-07-15 12:46:38 +02:00
Agnes Leroy	068cbc0f41	chore(gpu): add hl api noise squash latency and throughput bench	2025-07-11 14:04:32 +01:00
Pedro Alves	1b98312e2c	fix(gpu): fix regression on ERC20 throughput - partially revert changes done in `fd79c4f972` - transfers for the GPU case should be measured using sequential operations (without rayon!)	2025-07-11 08:57:19 +01:00
Pedro Alves	d3dd010deb	fix(gpu): reduce number of elements in the ZK throughput benchmark	2025-07-11 08:57:01 +01:00
Pedro Alves	9960f5e8b6	fix(gpu): Fix expand bench on multi-gpus	2025-07-09 09:17:55 +01:00
Pedro Alves	23ebd42209	fix(gpu): fix compression throughput benchmark	2025-07-07 16:30:24 +01:00
Pedro Alves	8c88678ee8	feat(gpu): implement 128-bit multi-bit PBS	2025-07-03 20:34:32 -03:00
Agnes Leroy	e4d856afdf	chore(gpu): update noise squashing parameters	2025-07-03 12:51:19 +01:00
Pedro Alves	22ddba7145	fix(gpu): refactor the (128-bit and regular) classical PBS entry point to remove the num_samples parameter - fixes the throughput for those PBSs - also fixes the throughput benchmark for regular PBSs	2025-07-03 08:23:09 -03:00
David Testé	d955696fe0	chore(bench): reduce number of bit sizes to benchmark This is done to reduce execution time since 4 bits precision is not useful to measure.	2025-07-03 12:45:02 +02:00
Baptiste Roux	24572edb1c	feat(hpu): Add support for centered modswitch. Add new fields in HpuPBSParameters (log2_pfail and modulus_switch_type). Also add new parameter set definitions in shortint for benchmark matching. Remove the use of the use_mean_compensation register; this information is now embedded in the parameter set definition. Update psi64.hpu archive with newest bitstream	2025-07-02 14:41:41 +02:00
Arthur Meyre	86a40bcea9	chore: move gated import to section with feature gate in HL erc20 bench	2025-07-02 13:14:31 +02:00
Mayeul@Zama	e1620d4087	feat(shortint): add support for centered modulus switch in parameters	2025-07-01 14:18:10 +02:00
pgardratzama	702989f796	fix(hpu): it seems transfer_safe is not totally safe with HPU	2025-06-20 10:04:16 +02:00
Nicolas Sarlin	343cad641c	chore: TFHE-rs 1.3.0	2025-06-18 10:20:49 +02:00
David Testé	39d77299ed	chore(bench): harmonize dex benchmark function names	2025-06-18 09:47:57 +02:00
Andrei Stoian	7986e0bf1d	chore(gpu): skip packing ks test if it needs more ram than available	2025-06-12 17:47:10 +02:00
David Testé	11c0340eca	chore(bench): plug server-side proof in zk benchmarks	2025-06-10 18:00:39 +02:00
Baptiste Roux	443e02215f	feat(hpu): Add recent IOp in integer benchmarks	2025-06-10 17:43:35 +02:00
Baptiste Roux	96c8c44c71	feat(hpu): Enable some erc20 impl With support for overflowing ops, these implementations are now available on Hpu	2025-06-10 17:43:35 +02:00
Guillermo Oyarzun	0d81623a23	feat(gpu): add squash noise in the hlapi	2025-06-10 13:14:29 +02:00
Agnes Leroy	3bfacc1e9d	chore(bench): add swap throughput benchmark	2025-05-27 12:08:31 +02:00
Agnes Leroy	a47a418d41	chore(gpu): rework dex bench to prepare throughput benchmark	2025-05-27 12:08:31 +02:00
Nicolas Sarlin	f51c70d536	feat(shortint): adds generic client key for atomic pattern support	2025-05-26 16:53:35 +02:00
Pedro Alves	408e81c45a	feat(gpu): add support for GPU-accelerated expand on the HL Api - includes documentation about GPU's accelerated expand on the HL API - rework CudaKeySwitchingKey - Cloning the key is no longer necessary on the HL API	2025-05-23 11:54:29 +02:00
Nicolas Sarlin	25d008bae8	fix(bench): add missing internal keycache feature	2025-05-22 16:14:30 +02:00
Pedro Alves	259d125434	fix(gpu): fix pbs and ks benchmarks	2025-05-20 17:37:48 +02:00
David Testé	e29d615b9d	chore(bench): add suitable heuristic for zk throughput The heuristic based on PBS count was flawed, since a ZK verification operation will use up to 32 threads on the machine. The previous heuristic could generate an input data vector much bigger than the total number of threads divided by 32. This in turn led to long benchmark execution times and bad results.	2025-05-20 15:02:59 +02:00
Nicolas Sarlin	a01949e630	fix(bench): compilation error without the internal-keycache feature	2025-05-19 09:50:29 +02:00
Baptiste Roux	9ee8259002	feat(hpu): Add Hpu backend implementation This backend abstracts communication with the Hpu Fpga hardware. It defines its own entities to prevent circular dependencies with tfhe-rs. Object lifetime is handled through an Arc<Mutex<T>> wrapper, which enforces that all objects currently alive in Hpu Hw are also kept valid on the host side. It contains the second version of the HPU instruction set (HIS_V2.0): * DOp have the following properties: + Template as first class citizen + Support of Immediate template + Direct parser and conversion between Asm/Hex + Replace deku (and its associated endianness limitation) by bitfield_struct and manual parsing * IOp have the following properties: + Support various number of Destinations + Support various number of Sources + Support various number of Immediate values + Support of multiple bitwidths (not implemented yet in the Fpga firmware) Details can be found in `backends/tfhe-hpu-backend/Readme.md`	2025-05-16 16:30:23 +02:00
David Testé	97b5973e4c	chore(bench): store object measurements results in tfhe-benchmark	2025-05-13 16:05:16 +02:00
Agnes Leroy	fd79c4f972	chore(bench): parallelize transfer bench	2025-05-13 10:45:48 +02:00
David Testé	a96970e8c3	chore: update clap dependency version to 4.5.30	2025-05-13 10:35:51 +02:00

52 Commits