This adds support for the tiling of `linalg.generic` operations that
have only parallel iterators or only parallel iterators and a single
reduction dimension via the linalg tiling infrastructure (i.e.,
`mlir::linalg::tileToForallOpUsingTileSizes()` and
`mlir::linalg::tileReductionUsingForall()`).
This allows for the tiling of FHELinalg operations by first replacing
them with appropriate `linalg.generic` oeprations and then invoking
the tiling pass in the pipeline. In order for the tiling to take
place, tile sizes must be specified using the `tile-sizes` operation
attribute, either directly for `linalg.generic` operations or
indirectly for the FHELinalg operation, e.g.,
"FHELinalg.matmul_eint_int"(%a, %b) { "tile-sizes" = [0, 0, 7] } : ...
Tiling of operations with a reduction dimension is currently limited
to tiling of the reduction dimension, i.e., the tile sizes for the
parallel dimensions must be zero.
This adds an invocation of the `SCFForallToSCFFor` pass to the
compilation pipeline before lowering to the LLVM dialect as a
sequential fallback path to future passes exploiting the parallelism
of `scf.forall` further up in the pipeline.
This adds a new pass that converts `scf.forall` loops into nested
`scf.for` operations. The conversion carries parallel output tensors
from the original loop as dependencies through the loop nest and
replaces any occurrence of `tensor.parallel_insert_slice` operations
in the `scf.forall.in_parallel` terminator with equivalent
`tensor.insert_slice` operations.
This makes the trip counts of operations in the TFHE statistics pass
as well as the per-location memory usage statistics in the memory
usage statistics pass optional. These values are unset if the trip
count could not be determined statically.
The tests `end_to_end_leveled.jit.loop_dagmulti.mul_eint_10bits.0` and
`end_to_end_leveled.jit.loop_dagmulti.signed_mul_eint_10bits.0`
reproducibly fail on the mac2.metal instance used by the CI due to
insufficient memory. This change disables all instances of these tests
with 10 or more bits of precision on MacOS.