For `ExtractSliceOp` and `InsertSliceOp`, the code performing the
hoisting of indexed operations in the batching pass derives the
indexes of the hoisted operation from the indexes provided by
`ExtractSliceOp::getOffsets()` and
`InsertSliceOp::getOffsets()`. However, these methods only return
dynamic indexes, such that operations with mixed, dynamic and static
offsets are hoisted incorrectly.
This patch replaces the invocations of `ExtractSliceOp::getOffsets()`
and `InsertSliceOp::getOffsets()` with invocations of
`ExtractSliceOp::getMixedOffsets()` and
`InsertSliceOp::getMixedOffsets()`, respectively in order to take into
account both static and dynamic indexes.
The IR generated by the batching pattern introduces intermediate
tensor values omitting the batched dimensions for batched
operands. This happens uncondiationally, leading to the generation of
`tensor.collapse_shape` operations with the same output and inout
shape. However, verification for such operation fails, since the
verifier assumes that the rank of the resulting tensor is reduced at
least by one.
This commit modified the check in `flattenTensor`, such that no
flattening operation is generated if the input and output shapes would
be identical.
Currently, the cleanup pattern matches sequences of `tensor.extract` /
`tensor.extract_slice`, `tensor.insert` / `tensor.insert_slice` and
`scf.yield` operations that are embedded into perfect loop nests whose
IVs are referenced in quasi-affine expressions with constant
steps. The pattern ensures that the extraction and insertion operation
use the same IVs for indexing, but does not check whether they appear
in the same order.
However, the order in which IVs are used for indexing is crucial when
replacing the operations with `tensor.extract_slice` and
`tensor.insert_slice` operations in order to preserve the shape of the
slices and the order of elements.
For example, when the cleanup pattern is applied to the following IR:
```
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%c3 = arith.constant 3 : index
scf.for %i = %c0 to %c3 step %c1 ... {
scf.for %j = %c0 to %c2 step %c1 ... {
%v = tensor.extract ... [%i, %j]
%t = tensor.insert ... [%j, %i]
scf.yield %t
}
...
}
```
The extracted slice has a shape of 3x2, while the insertion should be
happening with a slice with the shape 2x3.
This commit adds an additional check to the cleanup pattern that
ensures that loop IVs are used for indexing in the same order and
appear the same number of times.
Normalization of indexes in the batching pass currently
unconditionally emits arithmetic operations to shift and scale indexes
regardless of whether these indexes are already normalized. This leads
to unnecessary subtractions with 0 and divisions by 1.
This commit introduces two additional checks to the code normalizing
indexes that prevent arithmetic operations to be emitted for indexes
that do not need to be shifted or scaled.
- added --compress-input compiler option which forces the use of seeded
bootstrap keys and keyswitch keys
- replaced the concrete-cpu FHE implementation with tfhe-rs
Co-authored-by: Nikita Frolov <nf@mkmks.org>