This adds a new pattern to the batching pass that folds operations
applied to the elements of constant tensors into new constant
tensors. E.g.,
  %cst = arith.constant dense<...> : tensor<Nxi9>
  %res = scf.for %i = %c0 to %cN step %c1 {
    %cst_i9 = tensor.extract %cst[%i] : tensor<Nxi9>
    %cst_i64 = arith.extui %cst_i9 : i9 to i64
    ...
  }
becomes:
  %cst = arith.constant dense<...> : tensor<Nxi64>
  %res = scf.for %i = %c0 to %cN step %c1 {
    %cst_i64 = tensor.extract %cst[%i] : tensor<Nxi64>
    ...
  }
The pattern only applies to static loops, to indexes that are
quasi-affine expressions of a single loop induction variable with a
constant step size across iterations, and to foldable operations with
a single result.
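As an illustration, a minimal C++ sketch of such a folding pattern for
the `arith.extui` case from the example above could look as follows.
It omits the loop and index analysis described above, and the pattern
and helper names are illustrative rather than the actual
implementation:

  #include "mlir/Dialect/Arith/IR/Arith.h"
  #include "mlir/Dialect/Tensor/IR/Tensor.h"
  #include "mlir/IR/Matchers.h"
  #include "mlir/IR/PatternMatch.h"

  using namespace mlir;

  // Folds `arith.extui` applied to an element extracted from a
  // constant tensor into an extraction from a widened constant tensor.
  struct FoldExtUIOfConstantExtract
      : public OpRewritePattern<arith::ExtUIOp> {
    using OpRewritePattern::OpRewritePattern;

    LogicalResult matchAndRewrite(arith::ExtUIOp extOp,
                                  PatternRewriter &rewriter) const override {
      auto extract = extOp.getIn().getDefiningOp<tensor::ExtractOp>();
      if (!extract)
        return failure();

      DenseIntElementsAttr cst;
      if (!matchPattern(extract.getTensor(), m_Constant(&cst)))
        return failure();

      // Zero-extend every element of the constant tensor up front.
      auto dstTy = cast<IntegerType>(extOp.getType());
      SmallVector<APInt> widened;
      for (const APInt &v : cst.getValues<APInt>())
        widened.push_back(v.zext(dstTy.getWidth()));

      DenseElementsAttr newAttr =
          DenseElementsAttr::get(cst.getType().clone(dstTy), widened);
      auto newCst = rewriter.create<arith::ConstantOp>(
          extOp.getLoc(), cast<TypedAttr>(newAttr));

      // Replace the extension with an extraction from the widened
      // constant at the same index.
      rewriter.replaceOpWithNewOp<tensor::ExtractOp>(extOp, newCst,
                                                     extract.getIndices());
      return success();
    }
  };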
The current batching pass only supports batching of operations that
have a single batchable operand, that can be batched in only one way
and that operate on scalar values. However, this does not allow for
efficient batching of all arithmetic operations in TFHE, since these
are often applied to pairs of scalar values from tensors, to a tensor
and a scalar, or to tensors that can be grouped into
higher-dimensional tensors.
This commit introduces three new features for batching:
1. Support for multiple batchable operands
The operation interface for batching now allows for the
specification of multiple batchable operands. This set may be any
subset of an operation's operands, i.e., it is not limited to
operands with contiguous operand indexes.
2. Support for multiple batching variants
To account for multiple kinds of batching, the batching operation
interface `BatchableOpInterface` now supports variants. The
batching pass attempts to batch an operation by trying the
batching variants expressed via the interface in order until one
succeeds (see the sketch after this list).
3. Support for batching of tensor values
Some operations that could be batched already operate on tensor
values. The new batching pass detects those patterns and groups
the batchable tensors' values into higher-dimensional tensors.
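The following sketch is purely illustrative of how the pass iterates
over variants; `BatchableOpInterface` is real, but the interface and
helper methods named here are hypothetical stand-ins for the actual
API:

  #include "mlir/IR/PatternMatch.h"

  using namespace mlir;

  // Try each batching variant an op advertises, in order, until one
  // applies. Method and helper names are hypothetical.
  static LogicalResult tryBatch(BatchableOpInterface op,
                                PatternRewriter &rewriter) {
    for (unsigned v = 0, e = op.getNumBatchingVariants(); v != e; ++v) {
      // Each variant designates a set of batchable operands (possibly
      // with non-contiguous operand indexes) and a way to group their
      // values into higher-dimensional tensors.
      if (succeeded(applyBatchingVariant(op, v, rewriter))) // hypothetical
        return success();
    }
    return failure();
  }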
The batching pass erroneously assumes that any expression solely
composed of an induction variable has static bounds. This commit adds
checks ensuring that the lower bound, the upper bound and the step are
indeed static before attempting to determine their static values.
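A minimal sketch of such a check for an `scf.for` loop, assuming
MLIR's `getConstantIntValue()` helper; the function name is
illustrative:

  #include "mlir/Dialect/SCF/IR/SCF.h"
  #include "mlir/Dialect/Utils/StaticValueUtils.h"

  // Only treat the loop as static if lower bound, upper bound and
  // step all fold to compile-time constants.
  static bool hasStaticBounds(mlir::scf::ForOp forOp) {
    return mlir::getConstantIntValue(forOp.getLowerBound()).has_value() &&
           mlir::getConstantIntValue(forOp.getUpperBound()).has_value() &&
           mlir::getConstantIntValue(forOp.getStep()).has_value();
  }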
The batching pass only creates a batched version of a batchable
operation if all of its non-batchable operands are defined outside of
the outermost loop iterating over the values of the batchable operand.
This change also allows operations to be batched if their
non-batchable operands are produced by operations that are pure and
can thus be hoisted out of the outermost loop.
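A minimal sketch of this criterion, assuming `mlir::isPure()` from the
side effect interfaces; the function name is illustrative:

  #include "mlir/Dialect/SCF/IR/SCF.h"
  #include "mlir/Interfaces/SideEffectInterfaces.h"

  // A non-batchable operand is usable if it is defined outside the
  // outermost loop, or if it is produced by a pure operation that can
  // be hoisted out of that loop before batching.
  static bool isUsableNonBatchableOperand(mlir::scf::ForOp outermostLoop,
                                          mlir::Value operand) {
    if (!outermostLoop.getRegion().isAncestor(operand.getParentRegion()))
      return true;
    mlir::Operation *def = operand.getDefiningOp();
    return def && mlir::isPure(def);
  }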
An early test for a batchable operation checks whether the batchable
operand is produced by a `tensor.extract` operation and bails out if
this is not the case. However, the use of `llvm::dyn_cast<T>()` directly
on the defining operation of the batchable operand causes an attempt
to cast a null value for operands that are not produced by an
operation (e.g., block arguments).
Using `llvm::dyn_cast_or_null<T>()` fixes this issue.
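A minimal sketch of the corrected check (the helper name is
illustrative):

  #include "mlir/Dialect/Tensor/IR/Tensor.h"

  // `Value::getDefiningOp()` returns null for block arguments; a
  // plain llvm::dyn_cast would assert on that null, while
  // dyn_cast_or_null returns a null op instead.
  static mlir::tensor::ExtractOp getProducingExtract(mlir::Value value) {
    return llvm::dyn_cast_or_null<mlir::tensor::ExtractOp>(
        value.getDefiningOp());
  }

The shortcut `value.getDefiningOp<mlir::tensor::ExtractOp>()` handles
the null case in the same way.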
This commit rebases the compiler onto commit 465ee9bfb26d from
llvm-project with locally maintained patches on top, i.e.:
* 5d8669d669ee: Fix the element alignment (size) for memrefCopy
* 4239163ea337: fix: Do not fold the memref.subview if the offset are
!= 0 and strides != 1
* 72c5decfcc21: remove github stuff from llvm
* 8d0ce8f9eca1: Support arbitrary element types in named operations
via attributes
* 94f64805c38c: Copy attributes of scf.for on bufferization and make
it an allocation hoisting barrier
Main upstream changes from llvm-project that required modification of
concretecompiler:
* Switch to C++17
* Various changes in the interfaces for linalg named operations
* Transition from `llvm::Optional` to `std::optional`
* Use of enums instead of string values for iterator types in linalg
* Changed default naming convention of getter methods in
ODS-generated operation classes from `some_value()` to
`getSomeValue()`
* Renaming of Arithmetic dialect to Arith
* Refactoring of side effect interfaces (i.e., renaming from
`NoSideEffect` to `Pure`)
* Re-design of the data flow analysis framework
* Refactoring of build targets for Python bindings
* Refactoring of array attributes with integer values
* Renaming of `linalg.init_tensor` to `tensor.empty`
* Emission of `linalg.map` operations during bufferization of the
Tensor dialect, requiring an additional linalg conversion pass and
registration of the bufferization op interfaces for linalg
operations
* Refactoring of the one-shot bufferizer
* Necessity to run the expand-strided-metadata, affine-to-std and
finalize-memref-to-llvm passes before conversion to the LLVM
dialect
* Renaming of `BlockAndValueMapping` to `IRMapping`
* Changes in the build function of `LLVM::CallOp`
* Refactoring of the construction of `llvm::ArrayRef` and
`llvm::MutableArrayRef` (direct invocation of constructor instead
of builder functions for some cases)
* New naming conventions for generated SSA values requiring rewrite
of some check tests
* Refactoring of `mlir::LLVM::lookupOrCreateMallocFn()`
* Interface changes in generated type parsers
* New dependencies of the runtime on mlir_float16_utils and
MLIRSparseTensorRuntime
* Overhaul of the MLIR C API, deleting `mlir-c/Registration.h`
* Deletion of library MLIRLinalgToSPIRV
* Deletion of library MLIRLinalgAnalysis
* Deletion of library MLIRMemRefUtils
* Deletion of library MLIRQuantTransforms
* Deletion of library MLIRVectorToROCDL