docs(frontend-python): bit extract performance

2026-04-17 03:00:54 -04:00 · 2024-01-09 14:33:51 +01:00
parent 019538e84b
commit 240ae2d78b
1 changed files with 35 additions and 0 deletions
--- a/docs/tutorial/bit_extraction.md
+++ b/docs/tutorial/bit_extraction.md
@@ -122,3 +122,38 @@ prints

 - Bits of floats cannot be extracted.
  - Floats are partially supported but extracting their bits is not supported at all.
+
+## Performance Considerations
+
+### A Chain of Individual Bit Extractions
+
+**Key Concept**: Extracting a specific bit requires clearing all the preceding lower bits. This involves extracting these previous bits as intermediate values and then subtracting them from the input.
+
+**Implications:**
+
+* Bits are extracted sequentially, from the least significant bit toward the more significant ones. The cost is proportional to the index of the highest extracted bit, plus one: extracting bit `k` costs about `k + 1` single-bit extractions.
+* No parallelization is possible. The computation time is proportional to the cost, independent of the number of CPUs.
+
+**Examples:**
+
+* Extracting `fhe.bits(x)[4]` is approximately five times costlier than extracting `fhe.bits(x)[0]`.
+* Extracting `fhe.bits(x)[4]` takes around five times more wall clock time than `fhe.bits(x)[0]`.
+* The cost of extracting `fhe.bits(x)[0:5]` is almost the same as that of `fhe.bits(x)[4]`, because the slice covers bits 0 through 4.
+
+### Reuse of Intermediate Extracted Bits
+
+**Key Concept**: Common sub-expression elimination is applied to intermediate extracted bits.
+
+**Implications:**
+
+* The overall cost of a series of `fhe.bits(x)[m:n]` calls on the same input `x` is almost the same as the cost of the single most expensive extraction in the series, i.e. the one that reaches the highest bit index.
+* The order of extraction in that series does not affect the overall cost.
+
+**Example**:
+
+The combined operation `fhe.bits(x)[3] + fhe.bits(x)[2] + fhe.bits(x)[1]` has almost the same cost as `fhe.bits(x)[3]`.
+
+### TLUs of 1-bit Input Precision
+
+Each extracted bit costs approximately one table lookup (TLU) with 1-bit input precision. Therefore, `fhe.bits(x)[0]` is generally faster than any other TLU operation.
+
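The cost rules described in the added "Performance Considerations" section can be illustrated with a small plain-Python model. This is only a sketch on unencrypted integers: `BitExtractionModel` and `cost` are hypothetical names, nothing here calls Concrete, and each loop iteration merely stands in for one TLU of 1-bit input precision. The cache of already extracted bits plays the role of common sub-expression elimination.

```python
class BitExtractionModel:
    """Plain-integer cost model of chained bit extraction (illustrative only)."""

    def __init__(self, x: int) -> None:
        self.remaining = x  # input with the already extracted low bits cleared
        self.bits = []      # cached intermediate bits, list index = bit position
        self.steps = 0      # number of single-bit extraction steps so far

    def bit(self, index: int) -> int:
        # Extract any missing lower bits first, lowest first.
        # Bits that are already cached are reused at no extra cost.
        for i in range(len(self.bits), index + 1):
            extracted = (self.remaining >> i) & 1
            self.remaining -= extracted << i  # clear the bit just extracted
            self.bits.append(extracted)
            self.steps += 1
        return self.bits[index]


def cost(indices, x: int = 0b10110) -> int:
    """Count the extraction steps needed to serve all requested bit indices."""
    model = BitExtractionModel(x)
    for index in indices:
        model.bit(index)
    return model.steps


print(cost([0]))          # 1
print(cost([4]))          # 5
print(cost(range(0, 5)))  # 5, same as bit 4 alone
print(cost([3, 2, 1]))    # 4, same as bit 3 alone
print(cost([1, 2, 3]))    # 4, order does not matter
```

In this model the step count depends only on the highest requested index, which mirrors the claims that a slice costs about as much as its highest bit and that the order of extraction is irrelevant.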