Commit Graph

52 Commits

Author SHA1 Message Date
pgardratzama
bd7df4a03b chore(hpu): enable hpu hlapi workflow and throughput bench in integer workflow 2025-09-05 10:42:36 +02:00
pgardratzama
6fe24c6ab3 chore(hpu): update hpu integer bench scalar op names 2025-09-05 10:42:36 +02:00
pgardratzama
c6aa1adbe7 chore(hpu): update benches to run new operations 2025-09-05 10:42:36 +02:00
David Testé
4a0658389e chore(bench): make bits to prove customizable in zk benchmarks
Some application like blockchain, may wants to prove less bits
than CRS size allows to.
2025-09-05 09:03:24 +02:00
David Testé
97574bdae8 chore(bench): add noise squash benchmark with compressions
This new benchmark is extracted from a use case.
From a compressed ciphertext, it measures the decompression, then noise squashes it and finally compresses again the result.
2025-09-04 15:13:08 +02:00
Guillermo Oyarzun
c2e816a86c fix(gpu): change mininum number of elements in benches 2025-09-04 11:03:27 +02:00
Pedro Alves
cad4070ebe fix(gpu): fix the decompression function signature in the backend 2025-08-29 21:09:40 +02:00
Pedro Alves
94d24e1f8b feat(gpu): implement the centered modulus switch technique to classical PBS 2025-08-29 11:38:26 -03:00
Pedro Alves
9a1c0f48f4 feat(gpu): implement 128-bit compression and add it to the integer API 2025-08-29 11:26:07 -03:00
Guillermo Oyarzun
a8f391a442 chore(gpu): update 4_1_1 params to match specialized pbs 2025-08-28 17:54:59 +02:00
David Testé
4b6942a0f8 chore(bench): add unbounded oprf integer benchmarks
Also move Cuda OPRF benchmark into the same file as CPU implementation
2025-08-22 15:01:53 +02:00
David Testé
1647ec8f21 chore(bench): add 2 bits integer to full benchmarks
This is done to measure execution time on FheBool equivalent on all operations.
2025-08-19 09:54:03 +02:00
David Testé
b3f1a85e1d chore(bench): write parameters to disk for hlapi operations 2025-08-13 18:34:26 +02:00
Antoniu Pop
9316922e81 fix(benches): fix hlapi dex benchmark transfer function 2025-08-12 17:28:40 +01:00
Nicolas Sarlin
bc5c2f51ff fix(bench): store correct pfail from params 2025-08-12 09:44:37 +02:00
Mayeul@Zama
4d1b917045 feat(shortint): add multibit noise squashing 2025-08-11 16:30:59 +02:00
David Testé
3b42f9873a chore(bench): write params to file for each zk benchmark on gpu
To be parsable each benchmark criterion ID must have their crypto
details written to a file.
2025-08-07 15:17:33 +02:00
tmontaigu
8c838da209 chore(integer): improve measurements
It seems that in
```rust
bench_group.bench_function(&bench_id, |b| {
  // some code
  b.iter(|| {
      // function to bench
  })
});
```
If we put code in the '// some code' part, it affects the measurements
the slower this code is the worse the measurements can be.

For many operations the gap is small (a few ms or no gap),
but for the division the gap was around 500ms.

So to reduce this, we move out what we can, moving
the keycache access is the most important aspect as it
cost around 70ms to 100ms.

A LazyCell is used in order only access the keycache is the bench is not
filtered out. Which is the behaviour we had before this commit, and the
behaviour we want to keep so that running specific benches via regex
selection stay fast.

Also, for clean input benches, we use `iter` instead of `iter_batched`
as it makes more sense and should give more accurate results as
iter_batched timing include other things that just the timing of the
function.
2025-07-15 12:46:38 +02:00
Agnes Leroy
068cbc0f41 chore(gpu): add hl api noise squash latency and throughput bench 2025-07-11 14:04:32 +01:00
Pedro Alves
1b98312e2c fix(gpu): fix regression on ERC20 throughput
- partially revert changes done in fd79c4f972
- transfers for the GPU case should be measured using sequential
  operations (without rayon!)
2025-07-11 08:57:19 +01:00
Pedro Alves
d3dd010deb fix(gpu): reduces number of elements in the ZK throughput benchmark 2025-07-11 08:57:01 +01:00
Pedro Alves
9960f5e8b6 fix(gpu): Fix expand bench on multi-gpus 2025-07-09 09:17:55 +01:00
Pedro Alves
23ebd42209 fix(gpu): fix compression throughput benchmark 2025-07-07 16:30:24 +01:00
Pedro Alves
8c88678ee8 feat(gpu): implement 128-bit multi-bit PBS 2025-07-03 20:34:32 -03:00
Agnes Leroy
e4d856afdf chore(gpu): update noise squashing parameters 2025-07-03 12:51:19 +01:00
Pedro Alves
22ddba7145 fix(gpu): refactor the (128-bit and regular) classical PBS entry point to remove the num_samples parameter
- fixes the throughput for those PBSs
- also fixes the throughput benchmark for regular PBSs
2025-07-03 08:23:09 -03:00
David Testé
d955696fe0 chore(bench): reduce number of bit sizes to benchmark
This is done to reduce execution time since 4 bits precision is not useful to measure.
2025-07-03 12:45:02 +02:00
Baptiste Roux
24572edb1c feat(hpu): Add support for centered modswitch.
Add new field in HpuPBSParameters (log2_pfail and modulus_switch_type).
Also add new parameters set definition in shortint for benchmark matching.

Remove the used of use_mean_compensation register, this information is now embedded inside the parameters set definition.
Update psi64.hpu archive with newest bitstream
2025-07-02 14:41:41 +02:00
Arthur Meyre
86a40bcea9 chore: move gated import to section with feature gate in HL erc20 bench 2025-07-02 13:14:31 +02:00
Mayeul@Zama
e1620d4087 feat(shortint): add support for centered modulus switch in parameters 2025-07-01 14:18:10 +02:00
pgardratzama
702989f796 fix(hpu): it seems transfer_safe is not totally safe with HPU 2025-06-20 10:04:16 +02:00
Nicolas Sarlin
343cad641c chore: TFHE-rs 1.3.0 2025-06-18 10:20:49 +02:00
David Testé
39d77299ed chore(bench): harmonize dex benchmark function names 2025-06-18 09:47:57 +02:00
Andrei Stoian
7986e0bf1d chore(gpu): skip packing ks test if it needs more ram than available 2025-06-12 17:47:10 +02:00
David Testé
11c0340eca chore(bench): plug server-side proof in zk benchmarks 2025-06-10 18:00:39 +02:00
Baptiste Roux
443e02215f feat(hpu): Add recent IOp in integer benchmarks 2025-06-10 17:43:35 +02:00
Baptiste Roux
96c8c44c71 feat(hpu): Enable some erc20 impl
With the support of overflowing ops, those impl are now available to Hpu
2025-06-10 17:43:35 +02:00
Guillermo Oyarzun
0d81623a23 feat(gpu): add squash noise in the hlapi 2025-06-10 13:14:29 +02:00
Agnes Leroy
3bfacc1e9d chore(bench): add swap throughput benchmark 2025-05-27 12:08:31 +02:00
Agnes Leroy
a47a418d41 chore(gpu): rework dex bench to prepare throughput benchmark 2025-05-27 12:08:31 +02:00
Nicolas Sarlin
f51c70d536 feat(shortint): adds generic client key for atomic pattern support 2025-05-26 16:53:35 +02:00
Pedro Alves
408e81c45a feat(gpu): add support for GPU-accelerated expand on the HL Api
- includes documentation about GPU's accelerated expand on the HL API
- rework CudaKeySwitchingKey
- Cloning the key is no longer necessary on the HL API
2025-05-23 11:54:29 +02:00
Nicolas Sarlin
25d008bae8 fix(bench): add missing internal keycache feature 2025-05-22 16:14:30 +02:00
Pedro Alves
259d125434 fix(gpu): fix pbs and ks benchmarks 2025-05-20 17:37:48 +02:00
David Testé
e29d615b9d chore(bench): add suitable heuristic for zk throughput
Heuristic based on PBS count was flawed since a ZK verification operation will eat up to 32 threads on the machine. The previous heuristic could generate an input data vector way bigger than the total of threads divided by 32. This in turn lead to long execution time for benchmark and generate bad results.
2025-05-20 15:02:59 +02:00
Nicolas Sarlin
a01949e630 fix(bench): compilation error without the internal-keycache feature 2025-05-19 09:50:29 +02:00
Baptiste Roux
9ee8259002 feat(hpu): Add Hpu backend implementation
This backend abstract communication with Hpu Fpga hardware.
It define it's proper entities to prevent circular dependencies with
tfhe-rs.
Object lifetime is handle through Arc<Mutex<T>> wrapper, and enforce
that all objects currently alive in Hpu Hw are also kept valid on the
host side.

It contains the second version of HPU instruction set (HIS_V2.0):
* DOp have following properties:
  + Template as first class citizen
  + Support of Immediate template
  + Direct parser and conversion between Asm/Hex
  + Replace deku (and it's associated endianess limitation) by
  + bitfield_struct and manual parsing

* IOp have following properties:
  + Support various number of Destination
  + Support various number of Sources
  + Support various number of Immediat values
  + Support of multiple bitwidth (Not implemented yet in the Fpga
    firmware)

Details could be view in `backends/tfhe-hpu-backend/Readme.md`
2025-05-16 16:30:23 +02:00
David Testé
97b5973e4c chore(bench): store object measurements results in tfhe-benchmark 2025-05-13 16:05:16 +02:00
Agnes Leroy
fd79c4f972 chore(bench): parallelize transfer bench 2025-05-13 10:45:48 +02:00
David Testé
a96970e8c3 chore: update clap dependency version to 4.5.30 2025-05-13 10:35:51 +02:00