From 6242be7a332f8e0f9d52068607a2dfc647d113a8 Mon Sep 17 00:00:00 2001
From: jfrery
Date: Thu, 16 Dec 2021 13:20:56 +0100
Subject: [PATCH] docs: update use_quantization.md

---
 docs/user/howto/use_quantization.md | 127 ++++++++++++++++++++++++++--
 1 file changed, 122 insertions(+), 5 deletions(-)

diff --git a/docs/user/howto/use_quantization.md b/docs/user/howto/use_quantization.md
index 187189ba4..b6f88c125 100644
--- a/docs/user/howto/use_quantization.md
+++ b/docs/user/howto/use_quantization.md
@@ -4,7 +4,7 @@ In this section we detail some usage of [quantization](../explanation/quantizati

## Quantization Basics

-The very basic purpose of the quantization is to convert floating point values to integers. We can apply such conversion using `QuantizedArray` available in `concrete.quantization`.
+Concrete Framework implements some basic concepts of quantization. The primary purpose of quantization is to convert floating point values to integers. We can apply such a conversion using `QuantizedArray`, available in `concrete.quantization`.

`QuantizedArray` takes 2 arguments:

- `n_bits` that defines the precision of the quantization. Currently, `n_bits` is limited to 7, due to some ConcreteLib limits.

@@ -32,10 +32,127 @@ q_A.dequant() # Dequantized values.
```

-## Operations in the Quantized Realm
+## Neural Networks in the Quantized Realm

-## Why are the scale and zero point not a problem ?

Neural networks are implemented with a diverse set of operations, such as convolutions, linear transformations, activation functions and element-wise operations. When working with quantized values, these operations cannot be carried out the same way as with floating point values: with quantization, it is necessary to re-scale the input and output values of each operation so that they fit in the quantized domain.

-## Compilation

Re-scaling raw input values to the quantized domain implies that we need to make use of floating point operations. In the FHE setting, where we only work with integers, this could be a problem, but luckily the FHE implementation behind Concrete Framework provides a workaround. We essentially make use of a [table lookup](../tutorial/table_lookup.md), which is later translated into a [PBS](https://whitepaper.zama.ai).

Of course, having a PBS for every quantized addition is not recommended for computational cost reasons. Also, Concrete Framework allows a PBS only for univariate operations (i.e. a matrix multiplication cannot be done inside a PBS). Therefore, our quantized modules split the computation between floating point values and unsigned integers, as is currently done in `concrete.quantization.QuantizedLinear`.

The above operations are all implemented in Concrete Framework and are transparent to the user via our Quantized Modules.

Concrete Framework allows you to convert Numpy operations to their FHE counterparts. This essentially opens the door to any Python computing framework, such as [PyTorch](https://pytorch.org/). Concrete Framework implements a Torch to Numpy converter that makes it easy to use a torch model.
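To make the split between integer and floating point computation more tangible, here is a minimal, purely illustrative sketch in plain Numpy. It does not use the Concrete API: the `affine_quantize` helper and the variable names are only there for illustration. It quantizes an input and a weight matrix with an affine scheme, performs the matrix multiplication on integers only (the kind of computation that can stay in the encrypted domain), and applies a single floating point re-scaling at the end, roughly the split performed by `concrete.quantization.QuantizedLinear`.

```python
import numpy

numpy.random.seed(0)


def affine_quantize(values, n_bits):
    """Quantize floats to unsigned integers with a simple affine scheme (illustration only)."""
    vmin, vmax = values.min(), values.max()
    scale = (vmax - vmin) / (2**n_bits - 1)
    zero_point = int(numpy.round(-vmin / scale))
    q_values = numpy.clip(numpy.round(values / scale) + zero_point, 0, 2**n_bits - 1)
    return q_values.astype(numpy.int64), scale, zero_point


x = numpy.random.uniform(-1, 1, size=(10, 14))  # input with 14 features
w = numpy.random.uniform(-1, 1, size=(14, 1))  # weights of a small linear layer

q_x, s_x, z_x = affine_quantize(x, n_bits=7)
q_w, s_w, z_w = affine_quantize(w, n_bits=7)

# Integer-only part: an accumulator computed purely on integers.
int_accumulator = (q_x - z_x) @ (q_w - z_w)

# Floating point part: a single re-scaling back to the real domain.
y_approx = s_x * s_w * int_accumulator

# The quantization error should be small compared to the float result.
print(numpy.abs(x @ w - y_approx).max())
```

With this picture in mind, let's now see how a Torch model can be converted and quantized.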
First we define a model:

```python
import torch
from torch import nn


class LogisticRegression(nn.Module):
    """LogisticRegression with Torch"""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=14, out_features=1)
        self.sigmoid1 = nn.Sigmoid()

    def forward(self, x):
        """Forward pass."""
        out = self.fc1(x)
        out = self.sigmoid1(out)
        return out


torch_model = LogisticRegression()
```

We then convert this model to numpy-only operations:

```python
from concrete.torch import NumpyModule

numpy_model = NumpyModule(torch_model)
```

The `NumpyModule` allows us to run inference just as we would with an `nn.Module`; here, the predictions of the numpy module should be exactly the same as those of the torch model.

We can then quantize the numpy module with `PostTrainingAffineQuantization` as follows:

```python
import numpy

from concrete.quantization import PostTrainingAffineQuantization

numpy_input = numpy.random.uniform(-1, 1, size=(10, 14))  # some input with 14 features to calibrate the quantization
n_bits = 2  # number of bits of precision for the weights, activations, inputs and outputs
post_training_quant = PostTrainingAffineQuantization(n_bits, numpy_model)
quantized_numpy_module = post_training_quant.quantize_module(numpy_input)
```

The quantized module takes a quantized array and runs inference in the quantized paradigm.

We can then easily verify that all three models give similar predictions. Obviously, the `n_bits` chosen may adversely affect the predictions of the `quantized_numpy_module`. You can try increasing this parameter to see its effect on your model, but keep in mind that compilation requires all the values of your network to fit in less than 7 bits of precision.

```python
from concrete.quantization import QuantizedArray

torch_model(torch.from_numpy(numpy_input).float())
# tensor([[-0.0690],
#         [-0.1108],
#         [-0.0743],
#         [-0.0464],
#         [ 0.0261],
#         [-0.1380],
#         [-0.0941],
#         [-0.1589],
#         [ 0.0374],
#         [-0.1088]])
numpy_model(numpy_input)
# array([[-0.06901879],
#        [-0.11081327],
#        [-0.07429631],
#        [-0.04636377],
#        [ 0.02613242],
#        [-0.13795333],
#        [-0.09408965],
#        [-0.15885062],
#        [ 0.03735061],
#        [-0.10878125]])
quantized_numpy_module.forward_and_dequant(QuantizedArray(2, numpy_input))
# array([[-0.03792994],
#        [-0.15551274],
#        [-0.03792994],
#        [ 0.08154936],
#        [ 0.08154936],
#        [-0.15551274],
#        [-0.03792994],
#        [-0.15551274],
#        [ 0.08154936],
#        [-0.15551274]])
```

```{warning}
The current implementation of the framework parses the layers in the order of their definition in the nn.Module. Thus, the order in which the layers are instantiated in the constructor (the __init__ function) is crucial for the conversion to numpy to work properly.
```

```{warning}
Do not reuse a layer or an activation multiple times in the forward pass (e.g. a single self.sigmoid shared by every layer), and always declare the layers in the __init__ function in their order of appearance in the forward function.
```

It is now possible to compile the `quantized_numpy_module`. Details on how to compile the model are available in the [torch compilation documentation].
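As suggested above, you can re-run the post-training quantization with a larger `n_bits` to see how precision affects the predictions. Below is a minimal sketch that reuses only the objects and classes defined above; the 6-bit setting is just an illustration (it stays under the 7-bit compilation limit), and it assumes the input is quantized with the same number of bits as the module, as in the 2-bit example.

```python
# Quantize the same numpy model again, this time with 6 bits of precision.
post_training_quant_6b = PostTrainingAffineQuantization(6, numpy_model)
quantized_numpy_module_6b = post_training_quant_6b.quantize_module(numpy_input)

# Compare the dequantized predictions against the floating point numpy model.
predictions_float = numpy_model(numpy_input)
predictions_6b = quantized_numpy_module_6b.forward_and_dequant(QuantizedArray(6, numpy_input))

# The mean absolute difference should shrink as n_bits grows.
print(numpy.abs(predictions_float - predictions_6b).mean())
```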
## Building your own QuantizedModule

Concrete Framework also offers the possibility to build your own models and use them in the FHE setting. The `QuantizedModule` is a very simple abstraction that allows you to create any model using the available operators:

- `QuantizedSigmoid`, the quantized version of `nn.Sigmoid`
- `QuantizedLinear`, the quantized version of `nn.Linear`
- `QuantizedReLU6`, the quantized version of `nn.ReLU6`

A detailed example is available in the [QuantizedLinearRegression](../advanced_examples/QuantizedLinearRegression.ipynb) notebook.

## Future releases

Currently, quantization is only available via `PostTrainingAffineQuantization`, which is a [popular](https://arxiv.org/pdf/1712.05877.pdf) approach to quantization but comes with some constraints.

In future releases, we plan to offer users the possibility to apply quantization beforehand and convert the model directly to our `QuantizedModule`. This will allow users to take advantage of Quantization Aware Training (QAT), which allows neural networks to reach better accuracies.

-## Future releases
\ No newline at end of file