docs: document some of the new features, improve the parts of the old documentation

2026-02-09 03:55:04 -05:00 · 2022-11-22 16:50:15 +01:00
parent 73a0b5c78a
commit 5a8fe7cdda
17 changed files with 552 additions and 44 deletions
--- a/README.md
+++ b/README.md
@@ -71,26 +71,6 @@ You can find more detailed installation instructions in [installing.md](docs/get
 ```python
 import concrete.numpy as cnp

-@cnp.compiler({"x": "encrypted", "y": "encrypted"})
-def add(x, y):
-    return x + y
-
-inputset = [(2, 3), (0, 0), (1, 6), (7, 7), (7, 1), (3, 2), (6, 1), (1, 7), (4, 5), (5, 4)]
-
-print(f"Compiling...")
-circuit = add.compile(inputset)
-
-examples = [(3, 4), (1, 2), (7, 7), (0, 0)]
-for example in examples:
-    result = circuit.encrypt_run_decrypt(*example)
-    print(f"Evaluation of {' + '.join(map(str, example))} homomorphically = {result}")
-```
-
-if you have a function object that you cannot decorate, you can use the explicit `Compiler` API instead
-
-```python
-import concrete.numpy as cnp
-
 def add(x, y):
    return x + y

@@ -100,6 +80,31 @@ inputset = [(2, 3), (0, 0), (1, 6), (7, 7), (7, 1), (3, 2), (6, 1), (1, 7), (4,
 print(f"Compiling...")
 circuit = compiler.compile(inputset)

+print(f"Generating keys...")
+circuit.keygen()
+
+examples = [(3, 4), (1, 2), (7, 7), (0, 0)]
+for example in examples:
+    encrypted_example = circuit.encrypt(*example)
+    encrypted_result = circuit.run(encrypted_example)
+    result = circuit.decrypt(encrypted_result)
+    print(f"Evaluation of {' + '.join(map(str, example))} homomorphically = {result}")
+```
+
+Or, if you have a simple function that you can decorate and you don't need explicit key generation, encryption, evaluation, and decryption steps:
+
+```python
+import concrete.numpy as cnp
+
+@cnp.compiler({"x": "encrypted", "y": "encrypted"})
+def add(x, y):
+    return x + y
+
+inputset = [(2, 3), (0, 0), (1, 6), (7, 7), (7, 1), (3, 2), (6, 1), (1, 7), (4, 5), (5, 4)]
+
+print(f"Compiling...")
+circuit = add.compile(inputset)
+
 examples = [(3, 4), (1, 2), (7, 7), (0, 0)]
 for example in examples:
    result = circuit.encrypt_run_decrypt(*example)
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -7,14 +7,20 @@
 * [Installation](getting-started/installing.md)
 * [Quick Start](getting-started/quick\_start.md)
 * [Compatibility](getting-started/compatibility.md)
+* [Exactness](getting-started/exactness.md)
+* [Performance](getting-started/performance.md)

 ## Tutorials

 * [Decorator](tutorial/decorator.md)
+* [Formatting](tutorial/formatting.md)
+* [Tagging](tutorial/tagging.md)
 * [Extensions](tutorial/extensions.md)
-* [Table Lookups](tutorial/table\_lookup.md)
+* [Table Lookups](tutorial/table\_lookups.md)
+* [Rounded Table Lookups](tutorial/rounded\_table\_lookups.md)
 * [Floating Points](tutorial/floating\_points.md)
-* [Format](tutorial/formatting.md)
+* [Virtual Circuits](tutorial/virtual\_circuits.md)
+* [Direct Circuits](tutorial/direct\_circuits.md)

 ## How To

--- a/docs/_static/rounded-tlu/10-bits-removed.png
+++ b/docs/_static/rounded-tlu/10-bits-removed.png
--- a/docs/_static/rounded-tlu/12-bits-removed.png
+++ b/docs/_static/rounded-tlu/12-bits-removed.png
--- a/docs/_static/rounded-tlu/4-bits-kept.png
+++ b/docs/_static/rounded-tlu/4-bits-kept.png
--- a/docs/_static/rounded-tlu/6-bits-kept.png
+++ b/docs/_static/rounded-tlu/6-bits-kept.png
--- a/docs/_static/rounded-tlu/relu.png
+++ b/docs/_static/rounded-tlu/relu.png
--- a/docs/getting-started/compatibility.md
+++ b/docs/getting-started/compatibility.md
@@ -164,28 +164,14 @@ Some of these operations are not supported between two encrypted values. A detai

 ## Limitations

-### Operational constraints.
+### Control flow constraints.

 Some Python control flow statements are not supported. For example, you cannot have an `if` statement or a `while` statement for which the condition depends on an encrypted value. However, such statements are supported with constant values (e.g., `for i in range(SOME_CONSTANT)`, `if os.environ.get("SOME_FEATURE") == "ON":`).

+### Type constraints.
+
 Another constraint is that you cannot have floating-point inputs or floating-point outputs. You can have floating-point intermediate values as long as they can be converted to an integer Table Lookup (e.g., `(60 * np.sin(x)).astype(np.int64)`).

 ### Bit width constraints.

 There is a limit on the bit width of encrypted values. We are constantly working on increasing this bit width. If you go above the limit, you will get an error.
-
-### Computation constraints.
-
-One of the most common operations in **Concrete-Numpy** is `Table Lookups` (TLUs). TLUs are performed with an FHE operation called `Programmable Bootstrapping` (PBS). PBSes have a certain probability of error, which, when triggered, result in inaccurate results.
-
-Let's say you have the table:
-
-```python
-[1, 4, 9, 16, 25, 36, 49, 64]
-```
-
-And you performed a table lookup using `4`. The result you should get is `16`, but because of the possibility of error, you can sometimes get `9` or `25`.
-
-{% hint style="info" %}
-The probability of this error can be configured through the `p_error` configuration option, which has the default value of `0.000063342483999973` (i.e., probability of success is `99.993`%). Keep in mind that changing it could affect compilation and key generation times.
-{% endhint %}
--- a/docs/getting-started/exactness.md
+++ b/docs/getting-started/exactness.md
@@ -0,0 +1,27 @@
+# Exactness
+
+One of the most common operations in **Concrete-Numpy** is `Table Lookups` (TLUs). TLUs are performed with an FHE operation called `Programmable Bootstrapping` (PBS). PBSes have a certain probability of error which, when triggered, produces inaccurate results.
+
+Let's say you have the table:
+
+```python
+[0, 1, 4, 9, 16, 25, 36, 49, 64]
+```
+
+And you perform a table lookup using `4`. The result you should get is `16`, but because of the possibility of error, you can sometimes get `9` or `25`, or even `4` or `36` if the probability of error is high.
+
+The probability of this error can be configured through the `p_error` and `global_p_error` configuration options. The difference between the two is that `p_error` applies to individual TLUs, whereas `global_p_error` applies to the whole circuit.
+
+For example, if you set `p_error` to `0.01`, every TLU in the circuit has a 1% chance of being inexact and a 99% chance of being exact. With a single TLU in the circuit, `global_p_error` would be 1% as well. With 2 TLUs, `global_p_error` would be almost 2% (`1 - (0.99 * 0.99)`).
+
+However, if you set `global_p_error` to `0.01`, the whole circuit will have a 1% probability of being inexact, no matter how many table lookups there are.
+
+If you set both of them, both will be satisfied. Essentially, the stricter one will be used.
+
+By default, `p_error` is set to `None` and `global_p_error` is set to `1 / 100_000`. Feel free to play with these configuration options to pick the values best suited to your needs! For example, in some machine learning use cases, off-by-one or off-by-two errors don't affect the result much. In such cases, `p_error` could be set to a relatively high value to improve performance without losing accuracy.
+
+See [How to Configure](../howto/configure.md) to learn how you can set a custom `p_error` and/or `global_p_error`.
+
+{% hint style="info" %}
+Configuring either of these options affects computation time (compilation, key generation, circuit execution) and space requirements (size of the keys on disk and in memory). A lower error probability results in longer computation time and larger space requirements.
+{% endhint %}
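Reviewer's sketch for `docs/getting-started/exactness.md` (not part of the commit): the relationship between `p_error` and `global_p_error` described above can be worked through in plain Python. It assumes every TLU fails independently with the same probability, which is the assumption behind the `1 - (0.99 * 0.99)` figure. The function names are hypothetical and are not part of the Concrete-Numpy API.

```python
def global_error(p_error: float, tlu_count: int) -> float:
    """Probability that at least one of `tlu_count` independent TLUs is inexact."""
    return 1.0 - (1.0 - p_error) ** tlu_count


def per_tlu_error(global_p_error: float, tlu_count: int) -> float:
    """Per-TLU error probability that gives `global_p_error` for the whole circuit."""
    return 1.0 - (1.0 - global_p_error) ** (1.0 / tlu_count)


# How a fixed per-TLU error accumulates as the circuit grows.
for count in (1, 2, 10, 100):
    print(f"{count:>3} TLUs at p_error=0.01 -> global error {global_error(0.01, count):.4f}")

# How strict each TLU must be to meet a fixed whole-circuit target.
print(f"10 TLUs with global_p_error=0.01 -> per-TLU error {per_tlu_error(0.01, 10):.6f}")
```

The second helper shows why a fixed `global_p_error` becomes more demanding per lookup as the number of TLUs grows.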
--- a/docs/getting-started/performance.md
+++ b/docs/getting-started/performance.md
@@ -0,0 +1,104 @@
+# Performance
+
+The most important operation in Concrete-Numpy is the table lookup operation. All operations except addition, subtraction, multiplication with non-encrypted values, and a few operations built with those primitive operations (e.g. matmul, conv) are converted to table lookups under the hood:
+
+```python
+import concrete.numpy as cnp
+
+@cnp.compiler({"x": "encrypted"})
+def f(x):
+    return x ** 2
+
+inputset = range(2 ** 4)
+circuit = f.compile(inputset)
+```
+
+is exactly the same as
+
+```python
+import concrete.numpy as cnp
+
+table = cnp.LookupTable([x ** 2 for x in range(2 ** 4)])
+
+@cnp.compiler({"x": "encrypted"})
+def f(x):
+    return table[x]
+
+inputset = range(2 ** 4)
+circuit = f.compile(inputset)
+```
+
+Table lookups are very flexible, and they allow Concrete-Numpy to support many operations, but they are expensive! Therefore, you should try to avoid them as much as possible. In most cases, it's not possible to avoid them completely, but you might be able to reduce the number of TLUs or replace some of them with other primitive operations.
+
+The exact cost depends on many variables (machine configuration, error probability, etc.), but you can develop some intuition for single-threaded CPU execution performance using:
+
+```python
+import time
+
+import concrete.numpy as cnp
+import numpy as np
+
+WARMUP = 3
+SAMPLES = 8
+BITWIDTHS = range(1, 15)
+CONFIGURATION = cnp.Configuration(
+    enable_unsafe_features=True,
+    use_insecure_key_cache=True,
+    insecure_key_cache_location=".keys",
+)
+
+timings = {}
+for n in BITWIDTHS:
+    @cnp.compiler({"x": "encrypted"})
+    def base(x):
+        return x
+
+    table = cnp.LookupTable([np.sqrt(x).round().astype(np.int64) for x in range(2 ** n)])
+
+    @cnp.compiler({"x": "encrypted"})
+    def tlu(x):
+        return table[x]
+
+    inputset = [0, 2**n - 1]
+
+    base_circuit = base.compile(inputset, CONFIGURATION)
+    tlu_circuit = tlu.compile(inputset, CONFIGURATION)
+
+    print()
+    print(f"Generating keys for n={n}...")
+
+    base_circuit.keygen()
+    tlu_circuit.keygen()
+
+    timings[n] = []
+    for i in range(SAMPLES + WARMUP):
+        sample = np.random.randint(0, 2 ** n)
+
+        encrypted_sample = base_circuit.encrypt(sample)
+        start = time.time()
+        encrypted_result = base_circuit.run(encrypted_sample)
+        end = time.time()
+        assert base_circuit.decrypt(encrypted_result) == sample
+
+        base_time = end - start
+
+        encrypted_sample = tlu_circuit.encrypt(sample)
+        start = time.time()
+        encrypted_result = tlu_circuit.run(encrypted_sample)
+        end = time.time()
+        assert tlu_circuit.decrypt(encrypted_result) == np.sqrt(sample).round().astype(np.int64)
+
+        tlu_time = end - start
+
+        if i >= WARMUP:
+            timings[n].append(tlu_time - base_time)
+            print(f"Sample #{i - WARMUP + 1} took {timings[n][-1] * 1000:.3f}ms")
+
+print()
+for n, times in timings.items():
+    print(f"{n}-bits -> {np.mean(times) * 1000:.3f}ms")
+```
+
+{% hint style="info" %}
+Concrete-Numpy automatically parallelizes execution when TLUs are applied to tensors.
+{% endhint %}
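Reviewer's sketch for `docs/getting-started/performance.md` (not part of the commit): a plain-Python analogy of the claim that `x ** 2` and `table[x]` are the same thing over a bounded input range. No encryption is involved, and `table` here is an ordinary list rather than a `cnp.LookupTable`.

```python
# A univariate function over a bounded integer range can be replaced
# by a precomputed table indexed by the input.
BIT_WIDTH = 4


def f(x: int) -> int:
    return x ** 2


table = [f(x) for x in range(2 ** BIT_WIDTH)]

# Every input in the range gives the same answer either way.
assert all(table[x] == f(x) for x in range(2 ** BIT_WIDTH))

# The table needs one entry per possible input, so it doubles with each extra bit.
table_sizes = {bits: 2 ** bits for bits in (4, 8, 14)}
print(table_sizes)
```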
--- a/docs/getting-started/quick_start.md
+++ b/docs/getting-started/quick_start.md
@@ -63,6 +63,10 @@ It should be an iterable, yielding tuples of the same length as the number of ar
 inputset = [(2, 3), (0, 0), (1, 6), (7, 7), (7, 1)]
 ```

+{% hint style="warning" %}
+All inputs in the inputset will be evaluated in the graph, which takes time. If you're experiencing long compilation times, consider providing a smaller inputset.
+{% endhint %}
+
 ## Compiling the function

 You can use the `compile` method of a `Compiler` class with an inputset to perform the compilation and get the resulting circuit back:
--- a/docs/tutorial/direct_circuits.md
+++ b/docs/tutorial/direct_circuits.md
@@ -0,0 +1,83 @@
+# Direct Circuits
+
+{% hint style="warning" %}
+Direct circuits are still experimental, and it's very easy to shoot yourself in the foot while using them (e.g., there are no overflow checks and no type coercion), so use them with care.
+{% endhint %}
+
+For some applications, the data types of inputs, intermediate values, and outputs are known (e.g., for manipulating bytes, you would want to use uint8). In such cases, using inputsets to determine bounds is unnecessary, or even error-prone. Therefore, another interface for defining such circuits is introduced:
+
+```python
+import concrete.numpy as cnp
+
+@cnp.circuit({"x": "encrypted"})
+def circuit(x: cnp.uint8):
+    return x + 42
+
+assert circuit.encrypt_run_decrypt(10) == 52
+```
+
+There are a few differences between direct circuits and traditional circuits though:
+
+- You need to remember that the resulting dtype of each operation is determined by its inputs. This can lead to unexpected results if you're not careful (e.g., if you do `-x` where `x: cnp.uint8`, you won't get the negative value, because the result will be `cnp.uint8` as well).
+- You need to use cnp types in `.astype(...)` calls (e.g., `np.sqrt(x).astype(cnp.uint4)`). This is because there is no inputset evaluation, so the bit-width of the output cannot be determined automatically.
+- You need to specify the resulting data type in the [univariate](./extensions.md#cnpunivariatefunction) extension (e.g., `cnp.univariate(function, outputs=cnp.uint4)(x)`), for the same reason as above.
+- You need to be careful with overflows. With inputset evaluation, you get bigger bit-widths but no overflows. With direct definition, you're responsible for ensuring there aren't any overflows!
+
+Let's go over a more complicated example to see how direct circuits behave:
+
+```python
+import concrete.numpy as cnp
+import numpy as np
+
+def square(value):
+    return value ** 2
+
+@cnp.circuit({"x": "encrypted", "y": "encrypted"})
+def circuit(x: cnp.uint8, y: cnp.int2):
+    a = x + 10
+    b = y + 10
+
+    c = np.sqrt(a).round().astype(cnp.uint4)
+    d = cnp.univariate(square, outputs=cnp.uint8)(b)
+
+    return d - c
+
+print(circuit)
+```
+prints
+```
+%0 = x                       # EncryptedScalar<uint8>
+%1 = y                       # EncryptedScalar<int2>
+%2 = 10                      # ClearScalar<uint4>
+%3 = add(%0, %2)             # EncryptedScalar<uint8>
+%4 = 10                      # ClearScalar<uint4>
+%5 = add(%1, %4)             # EncryptedScalar<int4>
+%6 = subgraph(%3)            # EncryptedScalar<uint4>
+%7 = square(%5)              # EncryptedScalar<uint8>
+%8 = subtract(%7, %6)        # EncryptedScalar<uint8>
+return %8
+
+Subgraphs:
+
+    %6 = subgraph(%3):
+
+        %0 = input                         # EncryptedScalar<uint8>
+        %1 = sqrt(%0)                      # EncryptedScalar<float64>
+        %2 = around(%1, decimals=0)        # EncryptedScalar<float64>
+        %3 = astype(%2)                    # EncryptedScalar<uint4>
+        return %3
+```
+And here is the breakdown of assigned data types:
+```
+%0 is uint8 because it's specified in the definition
+%1 is  int2 because it's specified in the definition
+%2 is uint4 because it's the constant 10
+%3 is uint8 because it's the addition between uint8 and uint4
+%4 is uint4 because it's the constant 10
+%5 is  int4 because it's the addition between int2 and uint4
+%6 is uint4 because it's specified in astype
+%7 is uint8 because it's specified in univariate
+%8 is uint8 because it's subtraction between uint8 and uint4
+```
+
+As you can see, `%8` is the subtraction of two unsigned values, and it's unsigned as well. In an overflow condition where `c > d`, it'll result in undefined behavior.
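Reviewer's sketch for `docs/tutorial/direct_circuits.md` (not part of the commit): since direct circuits skip inputset evaluation, it can help to check value ranges against the assigned dtypes by hand. The helper below is hypothetical plain Python, not a Concrete-Numpy function, and it uses the dtypes listed in the breakdown above.

```python
def fits(lo: int, hi: int, bits: int, signed: bool) -> bool:
    """Return True if every value in [lo, hi] is representable in the given integer type."""
    if signed:
        return -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1
    return 0 <= lo and hi <= 2 ** bits - 1


# The constant 10 fits in uint4, as listed for %2 and %4.
print(fits(10, 10, bits=4, signed=False))

# x: uint8 covers [0, 255], so a = x + 10 covers [10, 265], yet %3 is uint8.
print(fits(10, 265, bits=8, signed=False))

# y: int2 covers [-2, 1], so b = y + 10 covers [8, 11], yet %5 is int4.
print(fits(8, 11, bits=4, signed=True))

# d - c is negative whenever c > d, which no unsigned type can hold.
print(fits(-1, -1, bits=8, signed=False))
```

Only the first check succeeds. The other three are the kind of range that inputset evaluation would have widened and that a direct circuit leaves to the author.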
--- a/docs/tutorial/formatting.md
+++ b/docs/tutorial/formatting.md
@@ -1,8 +1,4 @@
-# Format
-
-Sometimes, it can be useful to print circuits. We provide methods to just do that.
-
-## Formatting
+# Formatting

 You can convert your compiled circuit into its textual representation by converting it to string:

@@ -17,3 +13,7 @@ If you just want to see the output on your terminal, you can directly print it a
 ```python
 print(circuit)
 ```
+
+{% hint style="warning" %}
+Formatting is just for debugging. It's not possible to serialize the circuit back from its textual representation. See [How to Deploy](../howto/deploy.md) if that's your goal.
+{% endhint %}
--- a/docs/tutorial/rounded_table_lookups.md
+++ b/docs/tutorial/rounded_table_lookups.md
@@ -0,0 +1,183 @@
+# Rounded Table Lookups
+
+{% hint style="warning" %}
+Rounded table lookups are only available in [virtual circuits](./virtual_circuits.md) for the time being.
+{% endhint %}
+
+Table lookups have a strict constraint on the number of bits they support. This can be quite limiting, especially if you don't need exact precision.
+
+To overcome this shortcoming, the rounded table lookup operation is introduced. It extracts the most significant bits of a large integer and then applies the table lookup to those bits.
+
+Imagine you have an 8-bit value, but you want a 5-bit table lookup. You can call `cnp.round_bit_pattern(input, lsbs_to_remove=3)` and use the value you get in the table lookup.
+
+In Python, evaluation will work like the following:
+```
+0b_0000_0000 => 0b_0000_0000
+0b_0000_0001 => 0b_0000_0000
+0b_0000_0010 => 0b_0000_0000
+0b_0000_0011 => 0b_0000_0000
+0b_0000_0100 => 0b_0000_1000
+0b_0000_0101 => 0b_0000_1000
+0b_0000_0110 => 0b_0000_1000
+0b_0000_0111 => 0b_0000_1000
+
+0b_1010_0000 => 0b_1010_0000
+0b_1010_0001 => 0b_1010_0000
+0b_1010_0010 => 0b_1010_0000
+0b_1010_0011 => 0b_1010_0000
+0b_1010_0100 => 0b_1010_1000
+0b_1010_0101 => 0b_1010_1000
+0b_1010_0110 => 0b_1010_1000
+0b_1010_0111 => 0b_1010_1000
+
+0b_1010_1000 => 0b_1010_1000
+0b_1010_1001 => 0b_1010_1000
+0b_1010_1010 => 0b_1010_1000
+0b_1010_1011 => 0b_1010_1000
+0b_1010_1100 => 0b_1011_0000
+0b_1010_1101 => 0b_1011_0000
+0b_1010_1110 => 0b_1011_0000
+0b_1010_1111 => 0b_1011_0000
+
+0b_1011_1000 => 0b_1011_1000
+0b_1011_1001 => 0b_1011_1000
+0b_1011_1010 => 0b_1011_1000
+0b_1011_1011 => 0b_1011_1000
+0b_1011_1100 => 0b_1100_0000
+0b_1011_1101 => 0b_1100_0000
+0b_1011_1110 => 0b_1100_0000
+0b_1011_1111 => 0b_1100_0000
+```
+
+and during homomorphic execution, it'll be converted like this:
+```
+0b_0000_0000 => 0b_00000
+0b_0000_0001 => 0b_00000
+0b_0000_0010 => 0b_00000
+0b_0000_0011 => 0b_00000
+0b_0000_0100 => 0b_00001
+0b_0000_0101 => 0b_00001
+0b_0000_0110 => 0b_00001
+0b_0000_0111 => 0b_00001
+
+0b_1010_0000 => 0b_10100
+0b_1010_0001 => 0b_10100
+0b_1010_0010 => 0b_10100
+0b_1010_0011 => 0b_10100
+0b_1010_0100 => 0b_10101
+0b_1010_0101 => 0b_10101
+0b_1010_0110 => 0b_10101
+0b_1010_0111 => 0b_10101
+
+0b_1010_1000 => 0b_10101
+0b_1010_1001 => 0b_10101
+0b_1010_1010 => 0b_10101
+0b_1010_1011 => 0b_10101
+0b_1010_1100 => 0b_10110
+0b_1010_1101 => 0b_10110
+0b_1010_1110 => 0b_10110
+0b_1010_1111 => 0b_10110
+
+0b_1011_1000 => 0b_10111
+0b_1011_1001 => 0b_10111
+0b_1011_1010 => 0b_10111
+0b_1011_1011 => 0b_10111
+0b_1011_1100 => 0b_11000
+0b_1011_1101 => 0b_11000
+0b_1011_1110 => 0b_11000
+0b_1011_1111 => 0b_11000
+```
+
+and then a modified table lookup would be applied to the resulting 5 bits.
+
+Here is a concrete example. Let's say you want to apply ReLU to an 18-bit value. Let's see what the original ReLU looks like first:
+
+```python
+import matplotlib.pyplot as plt
+
+def relu(x):
+    return x if x >= 0 else 0
+
+xs = range(-100_000, 100_000)
+ys = [relu(x) for x in xs]
+
+plt.plot(xs, ys)
+plt.show()
+```
+
+![](../_static/rounded-tlu/relu.png)
+
+The input range is [-100_000, 100_000), which means 18-bit table lookups are required, but they are not supported yet. Instead, you can apply the rounding operation to the input before passing it to the `ReLU` function:
+
+```python
+import concrete.numpy as cnp
+import matplotlib.pyplot as plt
+import numpy as np
+
+def relu(x):
+    return x if x >= 0 else 0
+
+@cnp.compiler({"x": "encrypted"})
+def f(x):
+    x = cnp.round_bit_pattern(x, lsbs_to_remove=10)
+    return cnp.univariate(relu)(x)
+
+inputset = [-100_000, (100_000 - 1)]
+circuit = f.compile(inputset, enable_unsafe_features=True, virtual=True)
+
+xs = range(-100_000, 100_000)
+ys = [circuit.encrypt_run_decrypt(x) for x in xs]
+
+plt.plot(xs, ys)
+plt.show()
+```
+
+In this case, we've removed the 10 least significant bits of the input and then applied the ReLU function to this value to get:
+
+![](../_static/rounded-tlu/10-bits-removed.png)
+
+which is close enough to the original ReLU for some cases. If your application is more flexible, you could remove more bits, say 12, to get:
+
+![](../_static/rounded-tlu/12-bits-removed.png)
+
+This is very useful, but in some cases, you don't know how many bits your input has, so it's not reliable to specify `lsbs_to_remove` manually. For this reason, the `AutoRounder` class is introduced.
+
+```python
+import concrete.numpy as cnp
+import matplotlib.pyplot as plt
+import numpy as np
+
+rounder = cnp.AutoRounder(target_msbs=6)
+
+def relu(x):
+    return x if x >= 0 else 0
+
+@cnp.compiler({"x": "encrypted"})
+def f(x):
+    x = cnp.round_bit_pattern(x, lsbs_to_remove=rounder)
+    return cnp.univariate(relu)(x)
+
+inputset = [-100_000, (100_000 - 1)]
+cnp.AutoRounder.adjust(f, inputset)  # alternatively, you can use `auto_adjust_rounders=True` below
+circuit = f.compile(inputset, enable_unsafe_features=True, virtual=True)
+
+xs = range(-100_000, 100_000)
+ys = [circuit.encrypt_run_decrypt(x) for x in xs]
+
+plt.plot(xs, ys)
+plt.show()
+```
+
+`AutoRounder`s allow you to set how many of the most significant bits to keep, but they need to be adjusted using an inputset to determine how many of the least significant bits to remove. This can be done manually using `cnp.AutoRounder.adjust(function, inputset)`, or by setting `auto_adjust_rounders` to `True` during compilation.
+
+In the example above, `6` of the most significant bits are kept to get:
+
+![](../_static/rounded-tlu/6-bits-kept.png)
+
+You can adjust `target_msbs` depending on your requirements. If you set it to `4` for example, you'd get:
+
+![](../_static/rounded-tlu/4-bits-kept.png)
+
+{% hint style="warning" %}
+`AutoRounder`s should be defined outside the function being compiled. They are used to store the result of the adjustment process, so they shouldn't be created each time the function is called.
+{% endhint %}
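Reviewer's sketch for `docs/tutorial/rounded_table_lookups.md` (not part of the commit): a plain-Python model that reproduces the two mapping tables listed above for `lsbs_to_remove=3`. It is an illustration of the rounding rule implied by those tables, not the library's implementation, and `table_index` is a hypothetical helper name.

```python
def round_bit_pattern(x: int, lsbs_to_remove: int) -> int:
    """Round x to the nearest multiple of 2**lsbs_to_remove (ties round up)."""
    if lsbs_to_remove == 0:
        return x
    half = 1 << (lsbs_to_remove - 1)
    return ((x + half) >> lsbs_to_remove) << lsbs_to_remove


def table_index(x: int, lsbs_to_remove: int) -> int:
    """Value left after dropping the removed bits, as in the homomorphic mapping."""
    return round_bit_pattern(x, lsbs_to_remove) >> lsbs_to_remove


# Print a few rows in the same spirit as the tables in the tutorial.
for x in (0b0000_0011, 0b0000_0100, 0b1010_1011, 0b1010_1100, 0b1011_1111):
    rounded = round_bit_pattern(x, 3)
    index = table_index(x, 3)
    print(f"0b{x:08b} => 0b{rounded:08b} (table index 0b{index:05b})")
```

Note that the result can be one bit wider than the input's top bits suggest when rounding carries upward, as in `0b_1011_1100 => 0b_1100_0000`.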
--- a/docs/tutorial/table_lookups.md
+++ b/docs/tutorial/table_lookups.md
--- a/docs/tutorial/tagging.md
+++ b/docs/tutorial/tagging.md
@@ -0,0 +1,56 @@
+# Tagging
+
+When you have big circuits, keeping track of which node corresponds to which part of your code becomes very hard. The tagging system can simplify such situations:
+
+```python
+import concrete.numpy as cnp
+import numpy as np
+
+
+def g(z):
+    with cnp.tag("def"):
+        a = 120 - z
+        b = a // 4
+    return b
+
+
+def f(x):
+    with cnp.tag("abc"):
+        x = x * 2
+        with cnp.tag("foo"):
+            y = x + 42
+        z = np.sqrt(y).astype(np.int64)
+
+    return g(z + 3) * 2
+```
+
+When you compile `f` with an inputset of `range(10)`, you get the following graph:
+
+```
+ %0 = x                            # EncryptedScalar<uint4>        ∈ [0, 9]
+ %1 = 2                            # ClearScalar<uint2>            ∈ [2, 2]            @ abc
+ %2 = multiply(%0, %1)             # EncryptedScalar<uint5>        ∈ [0, 18]           @ abc
+ %3 = 42                           # ClearScalar<uint6>            ∈ [42, 42]          @ abc.foo
+ %4 = add(%2, %3)                  # EncryptedScalar<uint6>        ∈ [42, 60]          @ abc.foo
+ %5 = subgraph(%4)                 # EncryptedScalar<uint3>        ∈ [6, 7]            @ abc
+ %6 = 3                            # ClearScalar<uint2>            ∈ [3, 3]
+ %7 = add(%5, %6)                  # EncryptedScalar<uint4>        ∈ [9, 10]
+ %8 = 120                          # ClearScalar<uint7>            ∈ [120, 120]        @ def
+ %9 = subtract(%8, %7)             # EncryptedScalar<uint7>        ∈ [110, 111]        @ def
+%10 = 4                            # ClearScalar<uint3>            ∈ [4, 4]            @ def
+%11 = floor_divide(%9, %10)        # EncryptedScalar<uint5>        ∈ [27, 27]          @ def
+%12 = 2                            # ClearScalar<uint2>            ∈ [2, 2]
+%13 = multiply(%11, %12)           # EncryptedScalar<uint6>        ∈ [54, 54]
+return %13
+
+Subgraphs:
+
+    %5 = subgraph(%4):
+
+        %0 = input                         # EncryptedScalar<uint2>          @ abc.foo
+        %1 = sqrt(%0)                      # EncryptedScalar<float64>        @ abc
+        %2 = astype(%1, dtype=int_)        # EncryptedScalar<uint1>          @ abc
+        return %2
+```
+
+If you get an error, you'll see precisely where it occurred (e.g., which layer of the neural network, if you tag layers).
+
+{% hint style="info" %}
+In the future, we're planning to use tags for other features as well (e.g., to measure performance of tagged regions), so it's a good idea to start utilizing them for big circuits.
+{% endhint %}
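Reviewer's sketch for `docs/tutorial/tagging.md` (not part of the commit): a toy model of how nested tags compose into dotted names such as `abc.foo` in the graph above. This is plain Python with hypothetical names and says nothing about how `cnp.tag` is actually implemented.

```python
from contextlib import contextmanager

_tag_stack = []


@contextmanager
def tag(name):
    """Push a tag for the duration of the `with` block, then pop it."""
    _tag_stack.append(name)
    try:
        yield
    finally:
        _tag_stack.pop()


def current_tag():
    """Join the active tags from outermost to innermost with dots."""
    return ".".join(_tag_stack)


recorded = []
with tag("abc"):
    recorded.append(("multiply", current_tag()))
    with tag("foo"):
        recorded.append(("add", current_tag()))
    recorded.append(("sqrt", current_tag()))
recorded.append(("outside", current_tag()))

for operation, active in recorded:
    print(f"{operation:<10} @ {active or '(untagged)'}")
```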
--- a/docs/tutorial/virtual_circuits.md
+++ b/docs/tutorial/virtual_circuits.md
@@ -0,0 +1,54 @@
+# Virtual Circuits
+
+During development, the speed of homomorphic execution is a big blocker for fast prototyping. Furthermore, it might be desirable to experiment with larger bit-widths, even though they are not supported yet, to get insights about the requirements of your system (e.g., we would have an XYZ model with 95% accuracy if we had 25 bits).
+
+To simplify this process, we've introduced virtual circuits:
+
+```python
+import concrete.numpy as cnp
+import numpy as np
+
+@cnp.compiler({"x": "encrypted"})
+def f(x):
+    return np.sqrt(x * 100_000).round().astype(np.int64)
+
+inputset = range(100_000, 101_000)
+circuit = f.compile(inputset, enable_unsafe_features=True, virtual=True)
+
+print(circuit)
+print(circuit.encrypt_run_decrypt(100_500), "~=", np.sqrt(100_500 * 100_000))
+```
+
+prints
+
+```
+%0 = x                       # EncryptedScalar<uint17>        ∈ [100000, 100999]
+%1 = 100000                  # ClearScalar<uint17>            ∈ [100000, 100000]
+%2 = multiply(%0, %1)        # EncryptedScalar<uint34>        ∈ [10000000000, 10099900000]
+%3 = subgraph(%2)            # EncryptedScalar<uint17>        ∈ [100000, 100498]
+return %3
+
+Subgraphs:
+
+    %3 = subgraph(%2):
+
+        %0 = input                         # EncryptedScalar<uint1>
+        %1 = sqrt(%0)                      # EncryptedScalar<float64>
+        %2 = around(%1, decimals=0)        # EncryptedScalar<float64>
+        %3 = astype(%2, dtype=int_)        # EncryptedScalar<uint1>
+        return %3
+        
+100250 ~= 100249.6882788171
+```
+
+and it doesn't perform any homomorphic computation. It just simulates execution.
+
+The keyword arguments `enable_unsafe_features=True` and `virtual=True` passed to `compile` are configuration options. `virtual=True` makes the circuit virtual, and because virtual circuits are highly experimental, unsafe features must be enabled using `enable_unsafe_features=True` to use them. See [How to Configure](../howto/configure.md) to learn more about configuration options.
+
+{% hint style="info" %}
+Virtual circuits still check for operational constraints and type constraints, which means you cannot have floating-point inputs or outputs, or unsupported operations. They just ignore bit-width constraints.
+{% endhint %}
+
+{% hint style="warning" %}
+Virtual circuits are still experimental, and they don't properly consider [error probability](../getting-started/exactness.md) for example. That's why you need to enable unsafe features to use them. Use them with care!
+{% endhint %}
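Reviewer's sketch for `docs/tutorial/virtual_circuits.md` (not part of the commit): the bit-widths and the result printed in the example above can be checked with ordinary integer arithmetic. The helper name is hypothetical, and nothing here uses Concrete-Numpy.

```python
import math


def unsigned_bit_width(value: int) -> int:
    """Smallest number of bits that can hold a non-negative integer."""
    return max(1, value.bit_length())


x_max = 100_999                  # largest value in range(100_000, 101_000)
product_max = x_max * 100_000    # upper bound of the multiply node

print(unsigned_bit_width(x_max))             # 17, matching uint17 for %0
print(unsigned_bit_width(product_max))       # 34, matching uint34 for %2
print(round(math.sqrt(product_max)))         # 100498, the upper bound of %3
print(round(math.sqrt(100_500 * 100_000)))   # 100250, the value printed above
```

The 34-bit intermediate value is what makes this example a virtual-only circuit.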