44 Commits

Author SHA1 Message Date
Nicolas Sarlin
312ce494bf chore(zk): add 1 * 64 benches with production CRS 2025-12-17 15:06:37 +01:00
Enzo Di Maria
cf969ff930 refactor(gpu): creating benchmarks for match_value 2025-12-11 12:01:43 +01:00
Nicolas Sarlin
01367368ed chore(zk): do not bench zkv1 at the integer level 2025-11-25 17:20:06 +01:00
Nicolas Sarlin
33f77458e9 chore(zk): fix elements count for zk throughput benches 2025-11-25 17:20:06 +01:00
Arthur Meyre
caf5e9d879 chore: fix scalar benchmarks generating fixed values
- this would not give an average runtime for scalar benchmarks and for
small precisions could give super good timings (for lucky values)
- the timings for other precisions could still be favorable or unfavorable
depending on the value that was drawn
2025-11-25 14:23:55 +01:00
David Testé
b0393c0acb chore(bench): run scalar ops in integer deduplicated cpu bench 2025-11-24 14:03:08 +01:00
David Testé
58378b7972 chore(bench): add dedicated targets for aes cuda benchmarks 2025-11-20 16:58:06 +01:00
David Testé
071e70c037 chore(bench): fix benchmark id pattern for aes and aes256 2025-11-19 17:23:05 +01:00
Mayeul@Zama
f9268b889f chore(bench): revert print bench id
This reverts commit ef07963767.
2025-11-17 11:23:50 +01:00
Enzo Di Maria
54c8c5e020 chore(gpu): no crash with aes benches if oom error 2025-11-14 17:02:33 +01:00
David Testé
ef07963767 chore(bench): print bench id before running the benchmark
Done to circumvent criterion limitation regarding automatic
truncation of long benchmark ID.
Using a println() call we ensure the complete name is displayed
before benchmark execution to ease manual parsing and debugging.
2025-11-14 13:45:04 +01:00
Enzo Di Maria
4ff95e3a42 feat(gpu): AES 256 2025-11-05 13:37:08 +01:00
Pedro Alves
867f8fb579 feat(gpu): implement re-randomization
- exposed to integer and HL API
- test on the HL API
- benchmarks for GPU and CPU implementation
2025-10-29 17:55:45 -03:00
Pedro Alves
70773e442c fix(gpu): fix 128-bit compression benchmark 2025-10-27 17:06:45 +01:00
Mayeul@Zama
777bbe437a feat(shortint): add multi bit decompression 2025-10-24 09:28:17 +02:00
Arthur Meyre
23246f63f7 chore: update fast_dedup opset to match the latency benchmarks in the docs
- signed bench update
2025-10-23 10:42:19 +02:00
Arthur Meyre
11c79b5237 chore: update fast_dedup opset to match the latency benchmarks in the docs 2025-10-23 10:42:19 +02:00
pgardratzama
f9c89212ea fix(hpu): display name on shift looked wrong 2025-10-21 13:29:59 +02:00
Thomas Montaigu
0dd0ead4e2 chore(bench): remove trivial encryptions
It makes benches not accurate
2025-10-20 12:26:44 +02:00
Agnes Leroy
c30835fc30 chore(gpu): remove async entry points for abs, add, sub, aes 2025-10-17 15:42:06 +02:00
pgardratzama
3073d60f11 fix(hpu): work-around a criterion assert by reducing number of elements on division & modulus throughput bench 2025-10-07 14:23:07 +02:00
pgardratzama
ab25919187 fix(hpu): throughput benchmarks were done 1 IOp per 1 IOp... 2025-10-07 10:14:43 +02:00
Enzo Di Maria
f0f3dd76eb feat(gpu): aes 128 2025-10-06 09:31:36 +02:00
pgardratzama
2bf595d0e2 fix(hpu): missing bench numbers for less_than & less_or_equal because lower != less 2025-10-02 13:20:36 +02:00
David Testé
4ba1787e12 chore(bench): add crs size in zk-pke benchmark names
This is done get more details about the benchmarks when parsing
results.
2025-09-16 16:06:41 +02:00
David Testé
366d359441 chore(bench): measure ciphertext and key sizes at a large scale
Ciphertext sizes are measured at HLAPI layer with several
parameters set.
Keys sizes are measured at shortint level.
This benchmark has now its dedicated GitHub workflow that would
run, at least, each 24th of the month.
2025-09-16 15:43:36 +02:00
pgardratzama
4ff0d6cac2 feat(hpu): integer bench update (adds mod, div -> div_mod), erc20_simd simd batch size read from iop prototype 2025-09-10 22:24:31 +02:00
tmontaigu
e8dc403ebd feat(integer): add flip operation
Add the flip(condition: BooleanBlock, a: T, b: T) -> (T, T)
operation that homomorphically flip/swap two values if the
given encrypted boolean encrypts true
2025-09-10 09:44:28 +02:00
pgardratzama
6fe24c6ab3 chore(hpu): update hpu integer bench scalar op names 2025-09-05 10:42:36 +02:00
pgardratzama
c6aa1adbe7 chore(hpu): update benches to run new operations 2025-09-05 10:42:36 +02:00
David Testé
4a0658389e chore(bench): make bits to prove customizable in zk benchmarks
Some application like blockchain, may wants to prove less bits
than CRS size allows to.
2025-09-05 09:03:24 +02:00
Pedro Alves
9a1c0f48f4 feat(gpu): implement 128-bit compression and add it to the integer API 2025-08-29 11:26:07 -03:00
David Testé
4b6942a0f8 chore(bench): add unbounded oprf integer benchmarks
Also move Cuda OPRF benchmark into the same file as CPU implementation
2025-08-22 15:01:53 +02:00
David Testé
3b42f9873a chore(bench): write params to file for each zk benchmark on gpu
To be parsable each benchmark criterion ID must have their crypto
details written to a file.
2025-08-07 15:17:33 +02:00
tmontaigu
8c838da209 chore(integer): improve measurements
It seems that in
```rust
bench_group.bench_function(&bench_id, |b| {
  // some code
  b.iter(|| {
      // function to bench
  })
});
```
If we put code in the '// some code' part, it affects the measurements
the slower this code is the worse the measurements can be.

For many operations the gap is small (a few ms or no gap),
but for the division the gap was around 500ms.

So to reduce this, we move out what we can, moving
the keycache access is the most important aspect as it
cost around 70ms to 100ms.

A LazyCell is used in order only access the keycache is the bench is not
filtered out. Which is the behaviour we had before this commit, and the
behaviour we want to keep so that running specific benches via regex
selection stay fast.

Also, for clean input benches, we use `iter` instead of `iter_batched`
as it makes more sense and should give more accurate results as
iter_batched timing include other things that just the timing of the
function.
2025-07-15 12:46:38 +02:00
Pedro Alves
d3dd010deb fix(gpu): reduces number of elements in the ZK throughput benchmark 2025-07-11 08:57:01 +01:00
Pedro Alves
9960f5e8b6 fix(gpu): Fix expand bench on multi-gpus 2025-07-09 09:17:55 +01:00
Pedro Alves
23ebd42209 fix(gpu): fix compression throughput benchmark 2025-07-07 16:30:24 +01:00
David Testé
11c0340eca chore(bench): plug server-side proof in zk benchmarks 2025-06-10 18:00:39 +02:00
Baptiste Roux
443e02215f feat(hpu): Add recent IOp in integer benchmarks 2025-06-10 17:43:35 +02:00
Pedro Alves
408e81c45a feat(gpu): add support for GPU-accelerated expand on the HL Api
- includes documentation about GPU's accelerated expand on the HL API
- rework CudaKeySwitchingKey
- Cloning the key is no longer necessary on the HL API
2025-05-23 11:54:29 +02:00
David Testé
e29d615b9d chore(bench): add suitable heuristic for zk throughput
Heuristic based on PBS count was flawed since a ZK verification operation will eat up to 32 threads on the machine. The previous heuristic could generate an input data vector way bigger than the total of threads divided by 32. This in turn lead to long execution time for benchmark and generate bad results.
2025-05-20 15:02:59 +02:00
Baptiste Roux
9ee8259002 feat(hpu): Add Hpu backend implementation
This backend abstract communication with Hpu Fpga hardware.
It define it's proper entities to prevent circular dependencies with
tfhe-rs.
Object lifetime is handle through Arc<Mutex<T>> wrapper, and enforce
that all objects currently alive in Hpu Hw are also kept valid on the
host side.

It contains the second version of HPU instruction set (HIS_V2.0):
* DOp have following properties:
  + Template as first class citizen
  + Support of Immediate template
  + Direct parser and conversion between Asm/Hex
  + Replace deku (and it's associated endianess limitation) by
  + bitfield_struct and manual parsing

* IOp have following properties:
  + Support various number of Destination
  + Support various number of Sources
  + Support various number of Immediat values
  + Support of multiple bitwidth (Not implemented yet in the Fpga
    firmware)

Details could be view in `backends/tfhe-hpu-backend/Readme.md`
2025-05-16 16:30:23 +02:00
David Testé
67ec4a28c1 chore(bench): move benchmarks to their own crate
This is done to speed-up compilation duration by avoiding
recompiling tfhe each time a modification is made in a benchmark
file.
2025-05-09 13:46:27 +02:00