The C struct now contains an additional `char*` pointer, which is either
NULL when there is no error, or a buffer containing the error message.
It is the responsibility of the destructor function to free that
memory.
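As a rough sketch, assuming hypothetical names for the struct and its
destructor, the pattern looks like this:

// Hypothetical sketch of the error-reporting pattern; the struct and
// function names are assumptions, not the actual API.
#include <stdlib.h>

typedef struct CompilationResult {
  void *value; // the actual result, only valid when error == NULL
  char *error; // NULL on success, otherwise a heap-allocated message
} CompilationResult;

// The destructor owns the error buffer and must free it.
void compilationResultDestroy(CompilationResult *result) {
  free(result->error); // free(NULL) is a no-op
  result->error = NULL;
  // ... release the wrapped value here as well ...
}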
The CAPI now covers a wider portion of the Support library's API.
Better error handling. This could be further improved by returning an
error message from C back to Rust (left as a TODO).
- Fix a missing offset in the woppbs routine
- Better error message for the tensor result check in the end-to-end fixture
- Modify the fixture generator for testing purposes
The current CAPI of CompilerEngine isn't really a CAPI. It initially
existed so that the python bindings could access the CompilerEngine
through a convenient API. We now make a clear separation between the
CAPI and the python wrappers: on one side, wrapper functions that can be
implemented using C/C++ and are exposed to python via pybind11; on the
other, a CAPI (which still needs fixing, as it still contains C++ code)
that can be used as is, or to build bindings for other languages (such
as Rust).
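As a rough sketch of the intended layering (all names here are
hypothetical, not the actual API):

// Hypothetical sketch of the two layers; names are assumptions.
#include <pybind11/pybind11.h>
#include <string>

// CAPI layer: a plain C signature, usable as is or to build bindings
// for other languages (e.g. through Rust FFI).
extern "C" const char *compilerEngineCompile(const char *source) {
  // ... compile `source` by calling into the C++ CompilerEngine ...
  return "result";
}

// Wrapper layer: implemented in C++ and exposed to python via pybind11.
std::string compileWrapper(const std::string &source) {
  return compilerEngineCompile(source.c_str());
}

PYBIND11_MODULE(compiler_engine, m) {
  m.def("compile", &compileWrapper, "Compile the given source");
}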
The batching pass hands operands to the batched operation as a flat,
one-dimensional vector produced through a `tensor.collapse_shape`
operation that collapses all dimensions of the original tensor of
operands. Similarly, the shape of the result vector of the batched
operation is afterwards expanded to the original shape using a
`tensor.expand_shape` operation.
The pass emits the `tensor.collapse_shape` and `tensor.expand_shape`
operations unconditionally, even for tensors that already have only a
single dimension. This causes the verifiers of these operations to fail
in some cases, aborting the entire compilation process.
This patch lets the batching pass emit `tensor.collapse_shape` and
`tensor.expand_shape` for batched operands and batched results only if
the rank of the corresponding tensors is greater than one.
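A sketch of the guard on the operand side, assuming standard MLIR C++
APIs (the helper itself is hypothetical, not the actual pass code):

// Hypothetical sketch; only the rank check reflects the actual change.
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/ReshapeOpsUtils.h"
#include "mlir/IR/Builders.h"

mlir::Value flattenOperands(mlir::OpBuilder &builder, mlir::Location loc,
                            mlir::Value operands) {
  auto tensorType = operands.getType().cast<mlir::RankedTensorType>();

  // Rank-1 tensors are already flat; collapsing them would make the
  // verifier of tensor.collapse_shape fail.
  if (tensorType.getRank() <= 1)
    return operands;

  // Collapse all dimensions into a single one.
  mlir::ReassociationIndices allDims;
  for (int64_t i = 0; i < tensorType.getRank(); ++i)
    allDims.push_back(i);

  return builder.create<mlir::tensor::CollapseShapeOp>(
      loc, operands, llvm::ArrayRef<mlir::ReassociationIndices>(allDims));
}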
The new option `--batch-concrete-ops` invokes the batching pass after
lowering to the Concrete dialect and after lowering linalg operations
with operations from the Concrete dialect to loops.
The new action `dump-concrete-with-loops` dumps the IR right before
batching.
This adds a new pass that is able to hoist operations implementing the
`BatchableOpInterface` out of a loop nest that applies the operation
to the elements of a tensor indexed by the loop induction variables.
Example:
scf.for %i = %c0 to %cN step %c1 {
  scf.for %j = %c0 to %cM step %c1 {
    scf.for %k = %c0 to %cK step %c1 {
      %s = tensor.extract %T[%i, %j, %k]
      %res = batchable_op %s
      ...
    }
  }
}
is replaced with:
%batchedSlice = tensor.extract_slice
    %T[%c0, %c0, %c0] [%cN, %cM, %cK] [%c1, %c1, %c1]
%flatSlice = tensor.collapse_shape %batchedSlice
%resTFlat = batchedOp %flatSlice
%resT = tensor.expand_shape %resTFlat
scf.for %i = %c0 to %cN step %c1 {
  scf.for %j = %c0 to %cM step %c1 {
    scf.for %k = %c0 to %cK step %c1 {
      %res = tensor.extract %resT[%i, %j, %k]
      ...
    }
  }
}
Every index of the tensor with the input values may be a quasi-affine
expression on a single loop induction variable, as long as the
difference between the results of the expression for any two
consecutive values of the referenced loop induction variable is
constant. For example, `2 * %i + 1` qualifies, while `%i mod 4` does
not, since its value does not change by a constant amount between
consecutive values of `%i`.
This adds a new operation interface that allows an operation to
specify that a batched version of the operation exists, one that
applies it to the elements of a flat tensor in parallel.
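Conceptually, such an interface might expose something like the
following (a hypothetical C++ sketch; the actual interface definition
and its method names are assumptions here):

// Hypothetical sketch of what the interface exposes; method names are
// assumptions, not the real definition.
#include "mlir/IR/Builders.h"
#include "mlir/IR/Value.h"

class BatchableOpConcept {
public:
  virtual ~BatchableOpConcept() = default;

  // The scalar operand that the batched version replaces with a flat
  // tensor of values.
  virtual mlir::Value getBatchableOperand() = 0;

  // Build the batched version of the operation, applying it in parallel
  // to all elements of `flatOperands` (a one-dimensional tensor).
  virtual mlir::Value createBatchedOperation(mlir::OpBuilder &builder,
                                             mlir::Location loc,
                                             mlir::Value flatOperands) = 0;
};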
Also updated the API to make it easier to use by:
- creating MLIR components from native Rust types instead of requiring
MLIR components in the API
- adding helpers around the creation of standard dialects

This required a CAPI that, when asked for types, returns a structure
that can report whether an error occurred during type creation. This is
required because a failure at that stage in the compiler would
otherwise lead to a segfault (in the python bindings, for example), and
we want to be able to handle this scenario gracefully.
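A minimal sketch of such a return structure (the struct, field, and
function names are assumptions):

// Hypothetical sketch; names are assumptions, not the actual CAPI.
#include "mlir-c/IR.h"

typedef struct TypeOrError {
  MlirType type; // only valid when error == NULL
  char *error;   // NULL on success, otherwise a message the caller frees
} TypeOrError;

// Callers can then handle failures gracefully instead of segfaulting:
//   TypeOrError t = someTypeCreate(ctx, /*width=*/8);
//   if (t.error != NULL) { /* surface the message to python/rust */ }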