fix: SUMMARY links were missing, two line jump in HPU text

chore(hpu): adds a few words on HPU ERC20 based on GPU words
doc: add svg tables for erc20 benchmarks for all backends
2026-01-14 00:58:13 -05:00 · 2026-01-13 09:29:29 +01:00 · 2026-01-13 09:13:38 +01:00 · 2026-01-13 09:13:31 +01:00 · 2025-12-18 17:44:42 +01:00
11 changed files with 175 additions and 0 deletions
--- a/tfhe/docs/.gitbook/assets/cpu-hlapi-erc20-benchmark-latency-throughput.svg
+++ b/tfhe/docs/.gitbook/assets/cpu-hlapi-erc20-benchmark-latency-throughput.svg
@@ -0,0 +1,26 @@
+<?xml version="1.0" ?>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 160" preserveAspectRatio="meet" width="100%" height="160">
+	<rect x="0" y="0" width="720" height="40" fill="black"/>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="6" y="20.0">Transfer implementation</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="405.0" y="20.0">Latency</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="615.0" y="20.0">Throughput</text>
+	<rect x="0" y="40" width="300" height="120" fill="#fbbc04"/>
+	<rect x="300" y="40" width="420" height="120" fill="#f3f3f3"/>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="60.0">whitepaper</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="60.0">276 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="60.0">23.0 ops/s</text>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="100.0">no_cmux</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="100.0">238 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="100.0">24.0 ops/s</text>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="140.0">overflow</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="140.0">225 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="140.0">21.3 ops/s</text>
+	<line stroke="white" stroke-width="2" x1="0" y1="0" x2="720" y2="0"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="40" x2="720" y2="40"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="80" x2="720" y2="80"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="120" x2="720" y2="120"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="0" x2="0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="300.0" y1="0" x2="300.0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="510.0" y1="0" x2="510.0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="720.0" y1="0" x2="720.0" y2="160"/>
+</svg>
--- a/tfhe/docs/.gitbook/assets/gpu-hlapi-erc20-benchmark-h100x8-sxm5-latency-throughput.svg
+++ b/tfhe/docs/.gitbook/assets/gpu-hlapi-erc20-benchmark-h100x8-sxm5-latency-throughput.svg
@@ -0,0 +1,26 @@
+<?xml version="1.0" ?>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 160" preserveAspectRatio="meet" width="100%" height="160">
+	<rect x="0" y="0" width="720" height="40" fill="black"/>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="6" y="20.0">Transfer implementation</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="405.0" y="20.0">Latency</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="615.0" y="20.0">Throughput</text>
+	<rect x="0" y="40" width="300" height="120" fill="#fbbc04"/>
+	<rect x="300" y="40" width="420" height="120" fill="#f3f3f3"/>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="60.0">whitepaper</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="60.0">30.2 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="60.0">174 ops/s</text>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="100.0">no_cmux</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="100.0">26.8 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="100.0">195 ops/s</text>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="140.0">overflow</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="140.0">23.2 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="140.0">232 ops/s</text>
+	<line stroke="white" stroke-width="2" x1="0" y1="0" x2="720" y2="0"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="40" x2="720" y2="40"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="80" x2="720" y2="80"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="120" x2="720" y2="120"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="0" x2="0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="300.0" y1="0" x2="300.0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="510.0" y1="0" x2="510.0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="720.0" y1="0" x2="720.0" y2="160"/>
+</svg>
--- a/tfhe/docs/.gitbook/assets/hpu-hlapi-erc20-benchmark-hpux1-latency-throughput.svg
+++ b/tfhe/docs/.gitbook/assets/hpu-hlapi-erc20-benchmark-hpux1-latency-throughput.svg
@@ -0,0 +1,26 @@
+<?xml version="1.0" ?>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 160" preserveAspectRatio="meet" width="100%" height="160">
+	<rect x="0" y="0" width="720" height="40" fill="black"/>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="6" y="20.0">Transfer implementation</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="405.0" y="20.0">Latency</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="bold" fill="white" x="615.0" y="20.0">Throughput</text>
+	<rect x="0" y="40" width="300" height="120" fill="#fbbc04"/>
+	<rect x="300" y="40" width="420" height="120" fill="#f3f3f3"/>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="60.0">whitepaper</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="60.0">24.9 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="60.0">41.2 ops/s</text>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="100.0">hpu_optim</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="100.0">24.1 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="100.0">42.7 ops/s</text>
+	<text dominant-baseline="middle" text-anchor="start" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="6" y="140.0">hpu_simd</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="405.0" y="140.0">138 ms</text>
+	<text dominant-baseline="middle" text-anchor="middle" font-family="Arial" font-size="14" font-weight="normal" fill="black" x="615.0" y="140.0">87.6 ops/s</text>
+	<line stroke="white" stroke-width="2" x1="0" y1="0" x2="720" y2="0"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="40" x2="720" y2="40"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="80" x2="720" y2="80"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="120" x2="720" y2="120"/>
+	<line stroke="white" stroke-width="2" x1="0" y1="0" x2="0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="300.0" y1="0" x2="300.0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="510.0" y1="0" x2="510.0" y2="160"/>
+	<line stroke="white" stroke-width="2" x1="720.0" y1="0" x2="720.0" y2="160"/>
+</svg>
--- a/tfhe/docs/SUMMARY.md
+++ b/tfhe/docs/SUMMARY.md
@@ -10,12 +10,15 @@
 * [Benchmarks](getting-started/benchmarks/README.md)
  * [CPU Benchmarks](getting-started/benchmarks/cpu/README.md)
    * [Integer](getting-started/benchmarks/cpu/cpu-integer-operations.md)
+    * [ERC20](getting-started/benchmarks/cpu/cpu-erc20.md)
    * [Programmable bootstrapping](getting-started/benchmarks/cpu/cpu-programmable-bootstrapping.md)
  * [GPU Benchmarks](getting-started/benchmarks/gpu/README.md)
    * [Integer](getting-started/benchmarks/gpu/gpu-integer-operations.md)
+    * [ERC20](getting-started/benchmarks/gpu/gpu-erc20.md)
    * [Programmable bootstrapping](getting-started/benchmarks/gpu/gpu-programmable-bootstrapping.md)
  * [HPU Benchmarks](getting-started/benchmarks/hpu/README.md)
    * [Integer](getting-started/benchmarks/hpu/hpu-integer-operations.md)
+    * [ERC20](getting-started/benchmarks/hpu/hpu-erc20.md)
    * [Programmable bootstrapping](getting-started/benchmarks/hpu/hpu-programmable-bootstrapping.md)
  * [Zero-knowledge proof benchmarks](getting-started/benchmarks/zk-proof-benchmarks.md)
 * [Security and cryptography](getting-started/security-and-cryptography.md)
--- a/tfhe/docs/getting-started/benchmarks/cpu/README.md
+++ b/tfhe/docs/getting-started/benchmarks/cpu/README.md
@@ -9,4 +9,5 @@ All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped wi
 {% endhint %}

 * [Integer operations](cpu-integer-operations.md)
+* [ERC20](cpu-erc20.md)
 * [Programmable Bootstrapping](cpu-programmable-bootstrapping.md)
--- a/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md
+++ b/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md
@@ -0,0 +1,68 @@
+TFHE-rs is the underlying library of the Zama Confidential Blockchain Protocol. To illustrate its real-world performance,
+consider an ERC20 transfer, which requires executing the following sequence of operations:
+```rust
+use tfhe::{prelude::*, FheUint64};
+fn erc20_transfer_whitepaper(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let has_enough_funds = (from_amount).ge(amount);
+    let zero_amount = FheUint64::encrypt_trivial(0u64);
+    let amount_to_transfer = has_enough_funds.select(amount, &zero_amount);
+
+    let new_to_amount = to_amount + &amount_to_transfer;
+    let new_from_amount = from_amount - &amount_to_transfer;
+
+    (new_from_amount, new_to_amount)
+}
+```
+This is one way to compute an encrypted ERC20 transfer, but it is not the most efficient.
+The same transfer can be computed more efficiently by replacing the `select` operation with a multiplication by the encrypted comparison result:
+```rust
+use tfhe::{prelude::*, FheUint64};
+fn erc20_transfer_no_cmux(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let has_enough_funds = (from_amount).ge(amount);
+
+    let amount = amount * FheUint64::cast_from(has_enough_funds);
+
+    let new_to_amount = to_amount + &amount;
+    let new_from_amount = from_amount - &amount;
+
+    (new_from_amount, new_to_amount)
+}
+```
+An even more efficient way to compute an encrypted ERC20 transfer is to use the `overflowing_sub` operation as follows:
+```rust
+use tfhe::{prelude::*, FheUint64};
+fn erc20_transfer_overflow(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let (new_from, did_not_have_enough) = (from_amount).overflowing_sub(amount);
+    let did_not_have_enough = &did_not_have_enough;
+    let had_enough_funds = !did_not_have_enough;
+
+    let (new_from_amount, new_to_amount) = rayon::join(
+        || did_not_have_enough.if_then_else(from_amount, &new_from),
+        || to_amount + (amount * FheUint64::cast_from(had_enough_funds)),
+    );
+    (new_from_amount, new_to_amount)
+}
+```
+In a blockchain protocol, the FHE operations would not be the only ones used to compute the transfer:
+ciphertext compression and decompression, as well as rerandomization, would also be used. 
+Network communications would also introduce significant overhead.
+For simplicity, the focus here is only on the performance of the FHE operations.
+The latency and throughput of these three ERC20 FHE transfer implementations are compared in the following table:
+
+![](../../../.gitbook/assets/cpu-hlapi-erc20-benchmark-latency-throughput.svg)
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on CPU, in an ideal scenario where all transactions are independent. 
+In a blockchain protocol, the throughput would be limited by the latency of the network, but also by the necessity to apply other operations 
+(compression, decompression, ciphertext rerandomization).
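Review note on `cpu-erc20.md`: the three variants above are meant to produce the same balances. A minimal plaintext sketch, assuming `u64` as a stand-in for `FheUint64` and wrapping addition for the recipient balance (both are assumptions for illustration, not part of the patch), makes the equivalence easy to check. The function names are hypothetical.

```rust
// Plaintext model of the three transfer variants.
// Assumption: u64 stands in for FheUint64, and the recipient balance wraps on overflow.

/// Variant 1: compare, then select between `amount` and 0.
fn transfer_whitepaper(from_amount: u64, to_amount: u64, amount: u64) -> (u64, u64) {
    let has_enough_funds = from_amount >= amount;
    let amount_to_transfer = if has_enough_funds { amount } else { 0 };
    // Cannot underflow: amount_to_transfer is either 0 or <= from_amount.
    (from_amount - amount_to_transfer, to_amount.wrapping_add(amount_to_transfer))
}

/// Variant 2: replace the select with a multiplication by the comparison bit.
fn transfer_no_cmux(from_amount: u64, to_amount: u64, amount: u64) -> (u64, u64) {
    let has_enough_funds = from_amount >= amount;
    let amount = amount * u64::from(has_enough_funds);
    (from_amount - amount, to_amount.wrapping_add(amount))
}

/// Variant 3: let the subtraction itself report whether funds were insufficient.
fn transfer_overflow(from_amount: u64, to_amount: u64, amount: u64) -> (u64, u64) {
    let (new_from, did_not_have_enough) = from_amount.overflowing_sub(amount);
    let had_enough_funds = !did_not_have_enough;
    // Keep the old sender balance when the subtraction overflowed.
    let new_from_amount = if did_not_have_enough { from_amount } else { new_from };
    let new_to_amount = to_amount.wrapping_add(amount * u64::from(had_enough_funds));
    (new_from_amount, new_to_amount)
}
```

With these definitions, a funded request such as `(10, 5, 3)` yields `(7, 8)` in all three functions, while an overdrawn request such as `(2, 5, 3)` leaves both balances unchanged at `(2, 5)`.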
--- a/tfhe/docs/getting-started/benchmarks/gpu/README.md
+++ b/tfhe/docs/getting-started/benchmarks/gpu/README.md
@@ -9,4 +9,5 @@ All GPU benchmarks were launched on H100 GPUs, and rely on the multithreaded PBS
 {% endhint %}

 * [Integer operations](gpu-integer-operations.md)
+* [ERC20](gpu-erc20.md)
 * [Programmable Bootstrapping](gpu-programmable-bootstrapping.md)
--- a/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md
+++ b/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md
@@ -0,0 +1,7 @@
+Similarly to the [CPU benchmarks](../cpu/cpu-erc20.md), the latency and throughput of a confidential ERC20 token transfer can be measured.
+
+![](../../../.gitbook/assets/gpu-hlapi-erc20-benchmark-h100x8-sxm5-latency-throughput.svg)
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on an 8xH100 GPU node, in an ideal scenario where all transactions are independent.
+In a blockchain protocol, the throughput would be limited by the latency of the network and the necessity to apply 
+other operations (compression, decompression, rerandomization).
--- a/tfhe/docs/getting-started/benchmarks/hpu/README.md
+++ b/tfhe/docs/getting-started/benchmarks/hpu/README.md
@@ -10,3 +10,4 @@ All HPU benchmarks were launched on AMD Alveo v80 FPGAs.

 * [Integer operations](hpu-integer-operations.md)
 * [Programmable Bootstrapping](hpu-programmable-bootstrapping.md)
+* [ERC20](hpu-erc20.md)
--- a/tfhe/docs/getting-started/benchmarks/hpu/hpu-erc20.md
+++ b/tfhe/docs/getting-started/benchmarks/hpu/hpu-erc20.md
@@ -0,0 +1,10 @@
+Similarly to the [CPU benchmarks](../cpu/cpu-erc20.md), the latency and throughput of a confidential ERC20 token transfer can be measured.
+
+![](../../../.gitbook/assets/hpu-hlapi-erc20-benchmark-hpux1-latency-throughput.svg)
+
+The whitepaper version of the ERC20 transfer is the same implementation as on CPU or GPU: a comparison between the amount to be transferred and `from_amount` selects either that amount or 0, followed by an addition and a subtraction. The hpu_optim version performs essentially the same processing, but with a single HPU instruction (IOp) that does the complete ERC20 processing.
+
+The hpu_simd version is a different measurement, which uses another HPU IOp called ERC20_SIMD (for Single Instruction Multiple Data). ERC20_SIMD takes 12 triplets (from, to, amount) as input and returns 12 pairs (new_from, new_to); it is particularly efficient when there are many independent transfers to execute.
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on a single FPGA node running the HPU, in an ideal scenario where all transactions are independent.
+In a blockchain protocol, the throughput would be limited by the latency of the network and the necessity to apply other operations (compression, decompression, rerandomization).
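Review note on the HPU table: the hpu_simd row shows both a higher latency and a higher throughput than the other two rows. A rough sanity check follows, assuming the 138 ms latency covers one ERC20_SIMD call on 12 transfers (the patch does not state this explicitly). The helper `ops_per_second` is hypothetical.

```rust
/// Operations per second if `batch` independent transfers complete every `latency_ms`.
/// Assumption: no overlap between consecutive calls, so this is only a rough estimate.
fn ops_per_second(batch: u32, latency_ms: f64) -> f64 {
    f64::from(batch) * 1000.0 / latency_ms
}
```

Under that assumption, `ops_per_second(12, 138.0)` gives about 87 ops/s, close to the 87.6 ops/s in the table, and `ops_per_second(1, 24.9)` gives about 40 ops/s against the reported 41.2 ops/s for the whitepaper row.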
--- a/tfhe/src/test_user_docs.rs
+++ b/tfhe/src/test_user_docs.rs
@@ -15,6 +15,12 @@ mod test_cpu_doc {
        configuration_rust_configuration
    );

+    // BENCHMARKS
+    doctest!(
+        "../docs/getting-started/benchmarks/cpu/cpu-erc20.md",
+        benchmarks_cpu_erc20
+    );
+
    // FHE COMPUTATION

    // ADVANCED FEATURES
Author	SHA1	Message	Date
pgardratzama	6a12d25522	fix: SUMMARY links were missing, two line jump in HPU text	2026-01-13 09:29:29 +01:00
pgardratzama	bc8a2d05cd	chore(hpu): adds a few words on HPU ERC20 based on GPU words	2026-01-13 09:13:38 +01:00
David Testé	1eef7a9d2b	doc: add svg tables for erc20 benchmarks for all backends	2026-01-13 09:13:31 +01:00
Agnes Leroy	069c7334a9	doc: start adding erc20 benchmarks	2025-12-18 17:44:42 +01:00