Until now, the pass hoisting `RT.await_future` operations only
supports `tensor.parallel_insert_slice` operations that use loop
induction variables directly as indexes. Any more complex indexing
expressions produce a domination error, since a
`tensor.parallel_insert_slice` cloned by the pass into an additional
parallel for loop is left with references to values from the original
loop.
This change properly clones operations producing intermediate values
within the original parallel for loop and thus adds support for
indexing expressions that reference loops IVs only indirectly.
The tiling infrastructure preserves attributes of tiled
`linalg.generic` operations, such that the attribute for the tile
sizes specified for the `linalg.generic` operation before tiling is
copied to the `linalg.generic` operation that is part of the generated
IR for a single tile.
This change causes the attribute to be removed after tiling, since it
does not make sense to preserve the attribute for per-tile operations.
This adds support for the tiling of `linalg.generic` operations that
have only parallel iterators or only parallel iterators and a single
reduction dimension via the linalg tiling infrastructure (i.e.,
`mlir::linalg::tileToForallOpUsingTileSizes()` and
`mlir::linalg::tileReductionUsingForall()`).
This allows for the tiling of FHELinalg operations by first replacing
them with appropriate `linalg.generic` oeprations and then invoking
the tiling pass in the pipeline. In order for the tiling to take
place, tile sizes must be specified using the `tile-sizes` operation
attribute, either directly for `linalg.generic` operations or
indirectly for the FHELinalg operation, e.g.,
"FHELinalg.matmul_eint_int"(%a, %b) { "tile-sizes" = [0, 0, 7] } : ...
Tiling of operations with a reduction dimension is currently limited
to tiling of the reduction dimension, i.e., the tile sizes for the
parallel dimensions must be zero.
The tests `end_to_end_leveled.jit.loop_dagmulti.mul_eint_10bits.0` and
`end_to_end_leveled.jit.loop_dagmulti.signed_mul_eint_10bits.0`
reproducibly fail on the mac2.metal instance used by the CI due to
insufficient memory. This change disables all instances of these tests
with 10 or more bits of precision on MacOS.
On Mac arm, the c api backing the python bindings does not propagate the
exceptions properly to the concretelang python module. This makes all
exceptions raised through `CompilerEngine.cpp` fall in the catch-all
case of the pybind exceptions handler.
Since there is no particular need for a public c api, we just remove it
from the bindings, and move all the content of `CompilerEngine.cpp`
directly in the `CompilerAPIModule.cpp` file.
Some of the tests use lookup tables whose numbers of elements do not
match the sizes of the polynoms of the bootstrap operations they are
passed to. This commit replaces these lookup tables with tables of the
right size.
In the optimizer, nodes without consumers are identified as outputs.
Since we can now return multiple values, this is inherently buggy,
since a value can then be both returned, and consumed to create another
input.
This commit fixes this by allowing the compiler to tag nodes as being
outputs.