This adds support for the operations `RT.await_future`,
`RT.build_return_ptr_placeholder`, `RT.create_async_task`,
`RT.deref_return_ptr_placeholder`,
`RT.deref_work_function_argument_ptr_placeholder`,
`RT.make_ready_future`, `RT.register_task_work_function`, and
`RT.work_function_return`, the RT types `RT.ptr` and `RT.future`, as
well as the auxiliary operation `arith.select`.
A function may be referenced at multiple sites, including at locations
that have not been visited by the rewriter when the function itself is
rewritten. By using `IRRewriter::replaceOp` after rewriting a
function, the function is erased immediately, which may cause the
rewriting to fail due to remaining uses of the function.
Deferring the erasure of functions to the end of the entire rewriting
process ensures that all uses of the original functions have been
updated to the rewritten functions.
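The ordering problem can be illustrated with a small, self-contained
sketch that does not use MLIR at all. `Function`, `Module`, and
`rewriteAll` are hypothetical stand-ins invented for this example, and
the `.rewritten` suffix is an arbitrary naming choice. The sketch only
shows why erasing an original immediately can fail while deferred
erasure succeeds.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function: a name and the names it calls.
struct Function {
  std::string name;
  std::vector<std::string> callees;
};

// Hypothetical stand-in for a module: a symbol table of functions.
struct Module {
  std::map<std::string, Function> functions;

  // True if any function in the module still references `name`.
  bool hasUses(const std::string &name) const {
    for (const auto &entry : functions)
      for (const std::string &callee : entry.second.callees)
        if (callee == name)
          return true;
    return false;
  }
};

// Rewrites every function `f` into `f.rewritten`. Only the call sites of
// the function currently being visited are redirected, so functions that
// have not been visited yet still reference the originals.
// Returns false if an original is erased while it still has uses.
bool rewriteAll(Module &module, bool deferErasure) {
  std::vector<std::string> originals;
  for (const auto &entry : module.functions)
    originals.push_back(entry.first);

  std::vector<std::string> pendingErasure;
  for (const std::string &name : originals) {
    Function rewritten = module.functions.at(name);
    rewritten.name = name + ".rewritten";
    for (std::string &callee : rewritten.callees)
      callee += ".rewritten";
    assert(module.functions.count(rewritten.name) == 0 && "name clash");
    module.functions[rewritten.name] = rewritten;

    if (deferErasure) {
      pendingErasure.push_back(name); // erase after the whole walk
      continue;
    }
    // Immediate erasure: fails if an unvisited function still uses `name`.
    module.functions.erase(name);
    if (module.hasUses(name))
      return false;
  }

  // Deferred erasure: all call sites have been redirected by now.
  for (const std::string &name : pendingErasure)
    module.functions.erase(name);
  for (const std::string &name : pendingErasure)
    if (module.hasUses(name))
      return false;
  return true;
}
```

With a module in which `main` calls `helper` and `helper` is visited
first, `rewriteAll(module, false)` reports a failure, whereas
`rewriteAll(module, true)` succeeds and leaves only `helper.rewritten`
and `main.rewritten` behind.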
Operations with regions containing multiple blocks are currently not
handled correctly, since the list of successors of an operation is not
updated with rewritten blocks when the operation is
rewritten. Furthermore, functions are assumed to have only a single
block.
This change supports operations with multiple blocks by fixing the
issues above.
Updates of the types inferred for the values related to an operation
only propagate correctly to the operation if there is a direct
producer-consumer relationship or an indirect producer-consumer
relationship via some additional mechanism (e.g., region
successors). However, this is not sufficient to ensure updates of
values and operations that are related otherwise.
This change explicitly models the dependencies between related values
and an operation via `DataFlowAnalysis::addDependency` in order to
guarantee that all updates of the types of values are propagated to
all operations that the type resolver has designated as related
operations.
Furthermore, all operations are visited once initially in order to
guarantee that updates propagate to operations for which the dataflow
framework does not invoke `visitOperation`.
This delegates the enumeration of values related to an operation to
the class `TypeResolver`. This allows for the customization of this
process via a class inheriting `TypeResolver`.
Return-like operations require special treatment by the type inference
rewriter, since their operand types are tied both to the result types
of their producers and to the result types of their parent operations.
The inference scheme for ordinary operations, in which the initial
local inference state is composed of the operand types of the
rewritten producers and the old types of related operations before
rewriting, is insufficient, since this may result in a mismatch between
the inferred types and the actual types of the already rewritten
parent operation.
Until now, precedence of the new result types of the parent operation
has been implemented by simply designating these types as the operand
types of a return-like operation. However, while this works as
intended for return-like operations that simply forward values
(e.g., `func.return`), it creates invalid IR for other return-like
operations (e.g., `tensor.yield`).
This change implements precedence of the result types of the parent
operation of a return-like operation by adding the return types of the
already rewritten parent operation to the initial local inference
state before final invocation of type inference.
The type inference rewriter changes the name of the rewritten function
to the name of the original function when the rewriting process is
complete. However, the name is retrieved from the original function
operation after the operation has already been replaced and thus
destroyed, resulting in a null pointer dereference.
This change retrieves the name of the original function before it is
replaced and saves it in a copy, which is then used to safely assign
the new name to the rewritten function.
The current scheme used by reinstantiating conversion patterns in
`lib/Conversion/Utils/Dialects` for operations with blocks is to
create a new operation with empty blocks, to move the operations from
the old blocks and then to replace any references to block
arguments. However, such in-place updates of the types of block
arguments leave conversion patterns for operations nested in the
blocks without the ability to determine the original types of values
from before the update.
This change uses proper signature conversion for block arguments, such
that the original types of block arguments with converted types are
preserved, while the new types are made available through the dialect
conversion infrastructure via the respective adaptors.
This introduces a new function `normalizeInductionVar()` in the static
loop utility code in `concretelang/Analysis/StaticLoops.h`, containing
the IV normalization code extracted from the batching code, and changes
the batching code to use the factored-out function.
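As a rough illustration of what IV normalization usually means, the
sketch below maps an induction variable running over `lb`, `lb + step`,
... to the zero-based, unit-step index 0, 1, 2, ... It works on plain
integers and covers positive steps only. The actual
`normalizeInductionVar()` operates on IR values, and its exact
semantics are not described here, so the names `normalizedIV` and
`tripCount` and the formulas are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Zero-based iteration index of `iv` for a loop starting at `lb` with a
// positive stride `step` (hypothetical helper, plain integers only).
int64_t normalizedIV(int64_t iv, int64_t lb, int64_t step) {
  assert(step > 0 && "sketch only covers positive steps");
  assert(iv >= lb && (iv - lb) % step == 0 && "iv must be reachable");
  return (iv - lb) / step;
}

// Number of iterations of `for (iv = lb; iv < ub; iv += step)`.
int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "sketch only covers positive steps");
  if (ub <= lb)
    return 0;
  return (ub - lb + step - 1) / step; // ceiling division
}
```

For a loop from 1 to 10 with step 3, the induction variable takes the
values 1, 4, and 7, which normalize to 0, 1, and 2.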
This adds support for the tiling of `linalg.generic` operations that
have only parallel iterators or only parallel iterators and a single
reduction dimension via the linalg tiling infrastructure (i.e.,
`mlir::linalg::tileToForallOpUsingTileSizes()` and
`mlir::linalg::tileReductionUsingForall()`).
This allows for the tiling of FHELinalg operations by first replacing
them with appropriate `linalg.generic` operations and then invoking
the tiling pass in the pipeline. In order for the tiling to take
place, tile sizes must be specified using the `tile-sizes` operation
attribute, either directly for `linalg.generic` operations or
indirectly for the FHELinalg operation, e.g.,
    "FHELinalg.matmul_eint_int"(%a, %b) { "tile-sizes" = [0, 0, 7] } : ...
Tiling of operations with a reduction dimension is currently limited
to tiling of the reduction dimension, i.e., the tile sizes for the
parallel dimensions must be zero.
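The restriction can be written down as a small check. This is only a
sketch of the rule as stated above, not the code used by the pass:
`IteratorKind` and `tileSizesSupported` are hypothetical names, and a
tile size of zero is taken to mean that the dimension is not tiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class IteratorKind { Parallel, Reduction };

// Returns true if `tileSizes` respects the stated limitation:
//  - only parallel iterators: any tile sizes are accepted,
//  - exactly one reduction iterator: all parallel tile sizes must be 0,
//  - more than one reduction iterator: not supported.
bool tileSizesSupported(const std::vector<IteratorKind> &iterators,
                        const std::vector<int64_t> &tileSizes) {
  assert(iterators.size() == tileSizes.size() &&
         "one tile size per iteration dimension");

  std::size_t numReductions = 0;
  for (IteratorKind kind : iterators)
    if (kind == IteratorKind::Reduction)
      ++numReductions;

  if (numReductions == 0)
    return true;
  if (numReductions > 1)
    return false;

  // Single reduction dimension: parallel dimensions must stay untiled.
  for (std::size_t i = 0; i < iterators.size(); ++i)
    if (iterators[i] == IteratorKind::Parallel && tileSizes[i] != 0)
      return false;
  return true;
}
```

Assuming the matrix multiplication above has the iterators (parallel,
parallel, reduction), the tile sizes `[0, 0, 7]` pass this check,
while `[2, 0, 7]` would be rejected.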
This adds a new pass that converts `scf.forall` loops into nested
`scf.for` operations. The conversion carries parallel output tensors
from the original loop as dependencies through the loop nest and
replaces any occurrence of `tensor.parallel_insert_slice` operations
in the `scf.forall.in_parallel` terminator with equivalent
`tensor.insert_slice` operations.
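The shape of the resulting loop nest can be mimicked in plain C++ with
value semantics. In the sketch below, which is an analogy and not the
pass itself, `Tensor`, `insertSlice`, and `loopNest` are hypothetical
names: the output tensor is threaded through both loops as a
loop-carried value, and each iteration yields the tensor with one more
slice inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Tensor = std::vector<int>;

// Analogue of `tensor.insert_slice`: returns `dest` with `src` written
// at `offset`, leaving the caller's original value untouched.
Tensor insertSlice(Tensor dest, const Tensor &src, std::size_t offset) {
  assert(offset + src.size() <= dest.size() && "slice out of bounds");
  for (std::size_t k = 0; k < src.size(); ++k)
    dest[offset + k] = src[k];
  return dest;
}

// Sequential analogue of a two-dimensional parallel loop in which
// iteration (i, j) produces one element of the shared output.
Tensor loopNest(std::size_t rows, std::size_t cols, Tensor init) {
  Tensor outer = std::move(init); // value carried by the outer loop
  for (std::size_t i = 0; i < rows; ++i) {
    Tensor inner = std::move(outer); // inner loop starts from outer value
    for (std::size_t j = 0; j < cols; ++j) {
      Tensor tile = {static_cast<int>(i * 10 + j)};
      // "Yield" the updated tensor to the next inner iteration.
      inner = insertSlice(std::move(inner), tile, i * cols + j);
    }
    outer = std::move(inner); // "yield" to the next outer iteration
  }
  return outer;
}
```

Calling `loopNest(2, 3, Tensor(6, -1))` fills all six positions, one
slice per iteration, without any iteration writing to a shared buffer.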
This makes the trip counts of operations in the TFHE statistics pass
as well as the per-location memory usage statistics in the memory
usage statistics pass optional. These values are unset if the trip
count could not be determined statically.
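One way to model such optional values is `std::optional`. The sketch
below assumes that the trip count of an operation is the product of
the trip counts of its enclosing loops, which is not spelled out
above, and `combinedTripCount` is a hypothetical helper, not a
function of the statistics passes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Product of the trip counts of all enclosing loops, or std::nullopt as
// soon as one of them is not known statically.
std::optional<int64_t>
combinedTripCount(const std::vector<std::optional<int64_t>> &loops) {
  int64_t total = 1;
  for (const std::optional<int64_t> &tripCount : loops) {
    if (!tripCount)
      return std::nullopt; // unknown loop bound: leave the value unset
    assert(*tripCount >= 0 && "trip counts cannot be negative");
    total *= *tripCount;
  }
  return total;
}
```

Consumers of the statistics then have to test for the presence of a
value before using it, instead of relying on a sentinel such as zero.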