diff --git a/tfhe/docs/getting-started/benchmarks/cpu/README.md b/tfhe/docs/getting-started/benchmarks/cpu/README.md
index 357bc2387..dc64af748 100644
--- a/tfhe/docs/getting-started/benchmarks/cpu/README.md
+++ b/tfhe/docs/getting-started/benchmarks/cpu/README.md
@@ -9,4 +9,5 @@ All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped wi
 {% endhint %}
 
 * [Integer operations](cpu-integer-operations.md)
+* [ERC20](cpu-erc20.md)
 * [Programmable Bootstrapping](cpu-programmable-bootstrapping.md)
diff --git a/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md b/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md
new file mode 100644
index 000000000..514f3a424
--- /dev/null
+++ b/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md
@@ -0,0 +1,68 @@
+As TFHE-rs is the underlying library of the Zama Confidential Blockchain Protocol, to illustrate real-world performance,
+consider an ERC20 transfer that requires executing the following sequence of operations:
+```rust
+use tfhe::FheUint64;
+fn erc20_transfer_whitepaper(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let has_enough_funds = (from_amount).ge(amount);
+    let zero_amount = FheUint64::encrypt_trivial(0u64);
+    let amount_to_transfer = has_enough_funds.select(amount, &zero_amount);
+
+    let new_to_amount = to_amount + &amount_to_transfer;
+    let new_from_amount = from_amount - &amount_to_transfer;
+
+    (new_from_amount, new_to_amount)
+}
+```
+This is one way to compute an encrypted ERC20 transfer, but it is not the most efficient.
+Instead, it is possible to compute the same transfer in a more efficient way by not using the `select` operation:
+```rust
+use tfhe::FheUint64;
+fn erc20_transfer_no_cmux(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let has_enough_funds = (from_amount).ge(amount);
+
+    let amount = amount * FheUint64::cast_from(has_enough_funds);
+
+    let new_to_amount = to_amount + &amount;
+    let new_from_amount = from_amount - &amount;
+
+    (new_from_amount, new_to_amount)
+}
+```
+An even more efficient way to compute an encrypted ERC20 transfer is to use the `overflowing_sub` operation as follows:
+```rust
+use tfhe::FheUint64;
+fn erc20_transfer_overflow(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let (new_from, did_not_have_enough) = (from_amount).overflowing_sub(amount);
+    let did_not_have_enough = &did_not_have_enough;
+    let had_enough_funds = !did_not_have_enough;
+
+    let (new_from_amount, new_to_amount) = rayon::join(
+        || did_not_have_enough.if_then_else(from_amount, &new_from),
+        || to_amount + (amount * FheUint64::cast_from(had_enough_funds)),
+    );
+    (new_from_amount, new_to_amount)
+}
+```
+In a blockchain protocol, the FHE operations would not be the only ones used to compute the transfer:
+ciphertext compression and decompression, as well as rerandomization, would also be used.
+Network communications would also introduce significant overhead.
+For the sake of simplicity, here the focus is only placed on the performance of the FHE operations.
+The latency and throughput of these three ERC20 FHE transfer implementations are compared in the following table:
+
+TODO add SVG
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on CPU, in an ideal scenario where all transactions are independent.
+In a blockchain protocol, the throughput would be limited by the latency of the network, but also by the necessity to apply other operations
+(compression, decompression, ciphertext rerandomization).
diff --git a/tfhe/docs/getting-started/benchmarks/gpu/README.md b/tfhe/docs/getting-started/benchmarks/gpu/README.md
index 26b137008..d3ed9ee59 100644
--- a/tfhe/docs/getting-started/benchmarks/gpu/README.md
+++ b/tfhe/docs/getting-started/benchmarks/gpu/README.md
@@ -9,4 +9,5 @@ All GPU benchmarks were launched on H100 GPUs, and rely on the multithreaded PBS
 {% endhint %}
 
 * [Integer operations](gpu-integer-operations.md)
+* [ERC20](gpu-erc20.md)
 * [Programmable Bootstrapping](gpu-programmable-bootstrapping.md)
diff --git a/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md b/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md
new file mode 100644
index 000000000..d470353d7
--- /dev/null
+++ b/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md
@@ -0,0 +1,7 @@
+Similarly to the [CPU benchmarks](../cpu/cpu-erc20.md), the latency and throughput of a confidential ERC20 token transfer can be measured.
+
+TODO add SVG
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on an 8xH100 GPU node, in an ideal scenario where all transactions are independent.
+In a blockchain protocol, the throughput would be limited by the latency of the network and the necessity to apply
+other operations (compression, decompression, rerandomization).
diff --git a/tfhe/docs/getting-started/benchmarks/hpu/README.md b/tfhe/docs/getting-started/benchmarks/hpu/README.md
index eb1c30ac7..1f90fa056 100644
--- a/tfhe/docs/getting-started/benchmarks/hpu/README.md
+++ b/tfhe/docs/getting-started/benchmarks/hpu/README.md
@@ -10,3 +10,4 @@ All HPU benchmarks were launched on AMD Alveo v80 FPGAs.
 
 * [Integer operations](hpu-integer-operations.md)
 * [Programmable Bootstrapping](hpu-programmable-bootstrapping.md)
+* [ERC20](hpu-erc20.md)
diff --git a/tfhe/docs/getting-started/benchmarks/hpu/hpu-erc20.md b/tfhe/docs/getting-started/benchmarks/hpu/hpu-erc20.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/tfhe/src/test_user_docs.rs b/tfhe/src/test_user_docs.rs
index bd9e5857d..90f54b2df 100644
--- a/tfhe/src/test_user_docs.rs
+++ b/tfhe/src/test_user_docs.rs
@@ -15,6 +15,12 @@ mod test_cpu_doc {
         configuration_rust_configuration
     );
 
+    // BENCHMARKS
+    doctest!(
+        "../docs/getting-started/benchmarks/cpu/cpu-erc20.md",
+        benchmarks_cpu_erc20
+    );
+
     // FHE COMPUTATION
 
     // ADVANCED FEATURES
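
Note on the three transfer variants added in `cpu-erc20.md`: they are meant to compute the same result. A minimal plaintext sketch of that equivalence is given below, assuming that `FheUint64` arithmetic behaves like wrapping `u64` arithmetic. The function names (`transfer_select`, `transfer_no_cmux`, `transfer_overflow`, `variants_agree`) are hypothetical and are not part of TFHE-rs; the code only mirrors the control flow of the encrypted versions on clear values.

```rust
/// Plaintext model of the `select`-based variant (assumed wrapping arithmetic).
fn transfer_select(from: u64, to: u64, amount: u64) -> (u64, u64) {
    let has_enough_funds = from >= amount;
    // `select` picks `amount` when funds suffice, otherwise zero.
    let amount_to_transfer = if has_enough_funds { amount } else { 0 };
    (
        from.wrapping_sub(amount_to_transfer),
        to.wrapping_add(amount_to_transfer),
    )
}

/// Plaintext model of the variant that multiplies by the comparison result.
fn transfer_no_cmux(from: u64, to: u64, amount: u64) -> (u64, u64) {
    let has_enough_funds = from >= amount;
    // Casting the boolean to 0 or 1 and multiplying replaces the selection.
    let amount = amount.wrapping_mul(u64::from(has_enough_funds));
    (from.wrapping_sub(amount), to.wrapping_add(amount))
}

/// Plaintext model of the `overflowing_sub` variant.
fn transfer_overflow(from: u64, to: u64, amount: u64) -> (u64, u64) {
    let (new_from, did_not_have_enough) = from.overflowing_sub(amount);
    let had_enough_funds = !did_not_have_enough;
    // Keep the old balance if the subtraction underflowed.
    let new_from_amount = if did_not_have_enough { from } else { new_from };
    let new_to_amount = to.wrapping_add(amount.wrapping_mul(u64::from(had_enough_funds)));
    (new_from_amount, new_to_amount)
}

/// Returns true when all three plaintext models agree on the given inputs.
fn variants_agree(from: u64, to: u64, amount: u64) -> bool {
    let reference = transfer_select(from, to, amount);
    reference == transfer_no_cmux(from, to, amount)
        && reference == transfer_overflow(from, to, amount)
}
```

Calling `variants_agree` on a case with sufficient funds, one with insufficient funds, and one where the balance equals the amount exercises both branches of each variant. This checks only the clear-value logic and says nothing about the latency or throughput of the encrypted versions.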