docs: re-write documentation

aquint-zama
2022-06-01 15:41:37 +02:00
committed by Umut
parent 546ed48765
commit 35e46aca69
53 changed files with 1505 additions and 1673 deletions


@@ -1,45 +1,14 @@
# Compilation Pipeline In Depth
# Compilation
## What is **concrete-numpy**?
**concrete-numpy** is a convenient Python package, built on top of **Concrete compiler** and **Concrete library**, for developing homomorphic applications. One of its essential functionalities is to transform Python functions to their `MLIR` equivalent. Unfortunately, not all Python functions can be converted, either because of the limits of the current product (we are in the alpha stage) or because of inherent restrictions of FHE itself. However, you can already build interesting and impressive use cases, and more will be available in further versions of the framework.
## How can I use it?
```python
# Import necessary Concrete components
import concrete.numpy as cnp
# Define the function to homomorphize
def f(x, y):
    return (2 * x) + y
# Create a Compiler
compiler = cnp.Compiler(f, {"x": "encrypted", "y": "encrypted"})
# Compile to a Circuit using an inputset
inputset = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
circuit = compiler.compile(inputset)
# Make homomorphic inference
circuit.encrypt_run_decrypt(1, 0)
```
## Overview of the numpy compilation process
The compilation journey begins with tracing to get an easy to understand and manipulate representation of the function. We call this representation `Computation Graph` which is basically a Directed Acyclic Graph (DAG) containing nodes representing the computations done in the function. Working with graphs is good because they have been studied extensively over the years and there are a lot of algorithms to manipulate them. Internally, we use [networkx](https://networkx.org) which is an excellent graph library for Python.
The compilation journey begins with tracing to get an easy-to-manipulate representation of the function. We call this representation a `Computation Graph`, which is basically a Directed Acyclic Graph (DAG) containing nodes representing the computations done in the function. Working with graphs is good because they have been studied extensively over the years and there are a lot of algorithms to manipulate them. Internally, we use [networkx](https://networkx.org), which is an excellent graph library for Python.
The next step in the compilation is transforming the computation graph. There are many transformations we perform, and they will be discussed in their own sections. In any case, the result of transformations is just another computation graph.
After transformations are applied, we need to determine the bounds (i.e., the minimum and the maximum values) of each intermediate node. This is required because FHE currently allows a limited precision for computations. Bound measurement is our way to know what is the needed precision for the function.
After transformations are applied, we need to determine the bounds (i.e., the minimum and the maximum values) of each intermediate node. This is required because FHE currently allows a limited precision for computations. Bound measurement is our way to know what is the required precision for the function.
The final step is to transform the computation graph to equivalent `MLIR` code. How this is done will be explained in detail in its own chapter.
Once the MLIR is prepared, the rest of the stack, which you can learn more about [here](http://docs.zama.ai/), takes over and completes the compilation process.
Here is the visual representation of the pipeline:
![Frontend Flow](../_static/compilation-pipeline/frontend_flow.svg)
Once the MLIR is generated, we send it to the **Concrete Compiler**, and it completes the compilation process.
## Tracing
@@ -52,13 +21,11 @@ def f(x):
the goal of tracing is to create the following computation graph without needing any change from the user.
![](../_static/compilation-pipeline/two_x_plus_three.png)
![](../\_static/compilation-pipeline/two\_x\_plus\_three.png)
(Note that the edge labels are for non-commutative operations. To give an example, a subtraction node represents `(predecessor with edge label 0) - (predecessor with edge label 1)`)
To do this, we make use of Tracers, which are objects that record the operation performed during their creation. We create a `Tracer` for each argument of the function and call the function with those tracers. Tracers make use of operator overloading feature of Python to achieve their goal.
Here is an example:
To do this, we make use of `Tracer`s, which are objects that record the operation performed during their creation. We create a `Tracer` for each argument of the function and call the function with those tracers. `Tracer`s make use of the operator overloading feature of Python to achieve their goal:
```
def f(x, y):
@@ -70,11 +37,11 @@ y = Tracer(computation=Input("y"))
resulting_tracer = f(x, y)
```
`2 * y` will be performed first, and `*` is overloaded for `Tracer` to return another tracer: `Tracer(computation=Multiply(Constant(2), self.computation))` which is equal to: `Tracer(computation=Multiply(Constant(2), Input("y")))`
`2 * y` will be performed first, and `*` is overloaded for `Tracer` to return another tracer: `Tracer(computation=Multiply(Constant(2), self.computation))`, which is equal to `Tracer(computation=Multiply(Constant(2), Input("y")))`
`x + (2 * y)` will be performed next, and `+` is overloaded for `Tracer` to return another tracer: `Tracer(computation=Add(self.computation, (2 * y).computation))` which is equal to: `Tracer(computation=Add(Input("x"), Multiply(Constant(2), Input("y"))))`
`x + (2 * y)` will be performed next, and `+` is overloaded for `Tracer` to return another tracer: `Tracer(computation=Add(self.computation, (2 * y).computation))`, which is equal to `Tracer(computation=Add(Input("x"), Multiply(Constant(2), Input("y"))))`
In the end, we will have output Tracers that can be used to create the computation graph. The implementation is a bit more complex than this, but the idea is the same.
In the end, we will have output tracers that can be used to create the computation graph. The implementation is a bit more complex than this, but the idea is the same.
Tracing is also responsible for indicating whether the values in the node would be encrypted or not, and the rule for that is if a node has an encrypted predecessor, it is encrypted as well.
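The tracing idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the actual **concrete-numpy** implementation: the node classes (`Input`, `Constant`, `Add`, `Multiply`), the `BinaryOp` base class and the `_node` helper are hypothetical names chosen to mirror the notation used in the text.

```python
from dataclasses import dataclass


# Hypothetical node classes, named after the notation used in the text.
@dataclass(frozen=True)
class Input:
    name: str
    encrypted: bool = True


@dataclass(frozen=True)
class Constant:
    value: int
    encrypted: bool = False


@dataclass(frozen=True)
class BinaryOp:
    lhs: object
    rhs: object

    @property
    def encrypted(self) -> bool:
        # A node is encrypted as soon as one of its predecessors is encrypted.
        return self.lhs.encrypted or self.rhs.encrypted


class Add(BinaryOp):
    pass


class Multiply(BinaryOp):
    pass


class Tracer:
    """Records the operations applied to it instead of computing values."""

    def __init__(self, computation):
        self.computation = computation

    @staticmethod
    def _node(value):
        # Plain Python numbers become Constant nodes (an assumption of this sketch).
        return value.computation if isinstance(value, Tracer) else Constant(value)

    def __add__(self, other):
        return Tracer(Add(self.computation, Tracer._node(other)))

    def __radd__(self, other):
        return Tracer(Add(Tracer._node(other), self.computation))

    def __mul__(self, other):
        return Tracer(Multiply(self.computation, Tracer._node(other)))

    def __rmul__(self, other):
        return Tracer(Multiply(Tracer._node(other), self.computation))


def f(x, y):
    return x + (2 * y)


x = Tracer(computation=Input("x"))
y = Tracer(computation=Input("y"))
resulting_tracer = f(x, y)

print(resulting_tracer.computation)
print(resulting_tracer.computation.encrypted)
```

The reflected methods (`__radd__`, `__rmul__`) are what make `2 * y` work: Python first tries `int.__mul__`, which does not know about `Tracer`, and then falls back to `Tracer.__rmul__`.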
@@ -86,33 +53,33 @@ With the current version of **Concrete Numpy**, floating point inputs and floati
Let's take a closer look at the transforms we can currently perform.
### Fusing floating point operations
### Fusing
We have allocated a whole new chapter to explaining float fusing. You can find it [here](float-fusing.md).
We have allocated a whole new chapter to explaining fusing. You can find it after this chapter.
## Bounds measurement
Given a computation graph, the goal of the bound measurement step is to assign the minimal data type to each node in the graph.
Let's say we have an encrypted input that is always between `0` and `10`, we should assign the type `Encrypted<uint4>` to node of this input as `Encrypted<uint4>` is the minimal encrypted integer that supports all the values between `0` and `10`.
Let's say we have an encrypted input that is always between `0` and `10`. We should assign the type `Encrypted<uint4>` to the node of this input as `Encrypted<uint4>` is the minimal encrypted integer that supports all values between `0` and `10`.
If there were negative values in the range, we could have used `intX` instead of `uintX`.
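As a rough sketch of how a minimal integer type could be derived from a pair of bounds, consider the helper below. The function name `minimal_integer_type` is hypothetical, and the signed case assumes a two's complement range, which is an assumption of this example rather than something stated by the documentation.

```python
def minimal_integer_type(minimum: int, maximum: int) -> str:
    """Return the smallest `uintX` / `intX` type name covering [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")

    if minimum >= 0:
        # Unsigned: enough bits to represent the maximum, at least one bit.
        return f"uint{max(1, maximum.bit_length())}"

    # Signed (assumed two's complement): b bits cover [-2**(b - 1), 2**(b - 1) - 1].
    bits = max((-minimum - 1).bit_length(), maximum.bit_length()) + 1
    return f"int{bits}"


print(minimal_integer_type(0, 10))   # uint4
print(minimal_integer_type(-3, 10))  # int5
```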
Bounds measurement is necessary because FHE supports limited precision, and we don't want unexpected behaviour during evaluation of the compiled functions.
Bounds measurement is necessary because FHE supports limited precision, and we don't want unexpected behaviour while evaluating the compiled functions.
Let's take a closer look at how we perform bounds measurement.
### Inputset evaluation
### Inputset evaluation
This is a simple approach that requires an inputset to be provided by the user.
The inputset is not to be confused with the dataset which is classical in ML, as it doesn't require labels. Rather, it is a set of values which are typical inputs of the function.
The inputset is not to be confused with the dataset, which is classical in ML, as it doesn't require labels. Rather, it is a set of values which are typical inputs of the function.
The idea is to evaluate each input in the inputset and record the result of each operation in the computation graph. Then we compare the evaluation results with the current minimum/maximum values of each node and update the minimum/maximum accordingly. After the entire inputset is evaluated, we assign a data type to each node using the minimum and the maximum value it contains.
The idea is to evaluate each input in the inputset and record the result of each operation in the computation graph. Then we compare the evaluation results with the current minimum/maximum values of each node and update the minimum/maximum accordingly. After the entire inputset is evaluated, we assign a data type to each node using the minimum and the maximum values it contains.
Here is an example, given this computation graph where `x` is encrypted:
![](../_static/compilation-pipeline/two_x_plus_three.png)
![](../\_static/compilation-pipeline/two\_x\_plus\_three.png)
and this inputset:
@@ -178,156 +145,4 @@ Assigned Data Types:
## MLIR conversion
The actual compilation will be done by the **Concrete** compiler, which is expecting an MLIR input. The MLIR conversion goes from a computation graph to its MLIR equivalent. You can read more about it [here](mlir.md)
## Example walkthrough #1
### Function to homomorphize
```
def f(x):
    return (2 * x) + 3
```
### Parameters
```
x = "encrypted"
```
#### Corresponding computation graph
![](../_static/compilation-pipeline/two_x_plus_three.png)
### Topological transforms
#### Fusing floating point operations
This transform isn't applied since the computation doesn't involve any floating point operations.
### Bounds measurement using \[2, 3, 1] as inputset (same settings as above)
Data Types:
* `x`: Encrypted<**uint2**>
* `2`: Clear<**uint2**>
* `*`: Encrypted<**uint3**>
* `3`: Clear<**uint2**>
* `+`: Encrypted<**uint4**>
### MLIR lowering
```
module {
  func @main(%arg0: !FHE.eint<4>) -> !FHE.eint<4> {
    %c3_i5 = constant 3 : i5
    %c2_i5 = constant 2 : i5
    %0 = "FHE.mul_eint_int"(%arg0, %c2_i5) : (!FHE.eint<4>, i5) -> !FHE.eint<4>
    %1 = "FHE.add_eint_int"(%0, %c3_i5) : (!FHE.eint<4>, i5) -> !FHE.eint<4>
    return %1 : !FHE.eint<4>
  }
}
```
## Example walkthrough #2
### Function to homomorphize
```
def f(x, y):
    return (42 - x) + (y * 2)
```
### Parameters
```
x = "encrypted"
y = "encrypted"
```
#### Corresponding computation graph
![](../_static/compilation-pipeline/forty_two_minus_x_plus_y_times_two.png)
### Topological transforms
#### Fusing floating point operations
This transform isn't applied since the computation doesn't involve any floating point operations.
### Bounds measurement using \[(6, 0), (5, 1), (3, 0), (4, 1)] as inputset
Evaluation Result of `(6, 0)`:
* `42`: 42
* `x`: 6
* `y`: 0
* `2`: 2
* `-`: 36
* `*`: 0
* `+`: 36
Evaluation Result of `(5, 1)`:
* `42`: 42
* `x`: 5
* `y`: 1
* `2`: 2
* `-`: 37
* `*`: 2
* `+`: 39
Evaluation Result of `(3, 0)`:
* `42`: 42
* `x`: 3
* `y`: 0
* `2`: 2
* `-`: 39
* `*`: 0
* `+`: 39
Evaluation Result of `(4, 1)`:
* `42`: 42
* `x`: 4
* `y`: 1
* `2`: 2
* `-`: 38
* `*`: 2
* `+`: 40
Bounds:
* `42`: \[42, 42]
* `x`: \[3, 6]
* `y`: \[0, 1]
* `2`: \[2, 2]
* `-`: \[36, 39]
* `*`: \[0, 2]
* `+`: \[36, 40]
Data Types:
* `42`: Clear<**uint6**>
* `x`: Encrypted<**uint3**>
* `y`: Encrypted<**uint1**>
* `2`: Clear<**uint2**>
* `-`: Encrypted<**uint6**>
* `*`: Encrypted<**uint2**>
* `+`: Encrypted<**uint6**>
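The evaluation results, bounds and bit widths listed above can be reproduced with a short script. This is a hand-written sketch for this specific function, not how the library walks its computation graph: `evaluate` and `measure_bounds` are hypothetical helpers, and the bit-width shortcut assumes all values are non-negative, which holds for this inputset.

```python
def evaluate(x, y):
    """Evaluate every node of f(x, y) = (42 - x) + (y * 2) for one sample."""
    sub = 42 - x
    mul = y * 2
    return {"42": 42, "x": x, "y": y, "2": 2, "-": sub, "*": mul, "+": sub + mul}


def measure_bounds(inputset):
    """Track the minimum and maximum value seen at each node over the inputset."""
    bounds = {}
    for sample in inputset:
        for node, value in evaluate(*sample).items():
            lo, hi = bounds.get(node, (value, value))
            bounds[node] = (min(lo, value), max(hi, value))
    return bounds


inputset = [(6, 0), (5, 1), (3, 0), (4, 1)]
bounds = measure_bounds(inputset)

# Unsigned bit width needed for each node (valid because every bound is >= 0).
bit_widths = {node: max(1, hi.bit_length()) for node, (lo, hi) in bounds.items()}

for node in bounds:
    print(node, bounds[node], f"uint{bit_widths[node]}")
```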
### MLIR lowering
```
module {
  func @main(%arg0: !FHE.eint<6>, %arg1: !FHE.eint<6>) -> !FHE.eint<6> {
    %c42_i7 = constant 42 : i7
    %c2_i7 = constant 2 : i7
    %0 = "FHE.sub_int_eint"(%c42_i7, %arg0) : (i7, !FHE.eint<6>) -> !FHE.eint<6>
    %1 = "FHE.mul_eint_int"(%arg1, %c2_i7) : (!FHE.eint<6>, i7) -> !FHE.eint<6>
    %2 = "FHE.add_eint"(%0, %1) : (!FHE.eint<6>, !FHE.eint<6>) -> !FHE.eint<6>
    return %2 : !FHE.eint<6>
  }
}
```
The actual compilation will be done by the **Concrete Compiler**, which is expecting an MLIR input. The MLIR conversion goes from a computation graph to its MLIR equivalent. You can read more about it [here](mlir.md).