This adds a new pass that is able to hoist operations implementing the
`BatchableOpInterface` out of a loop nest that applies the operation
to the elements of a tensor indexed by the loop induction variables.
Example:
    scf.for %i = %c0 to %cN step %c1 {
      scf.for %j = %c0 to %cM step %c1 {
        scf.for %k = %c0 to %cK step %c1 {
          %s = tensor.extract %T[%i, %j, %k]
          %res = batchable_op %s
          ...
        }
      }
    }
is replaced with:
    %batchedSlice = tensor.extract_slice
        %T[%c0, %c0, %c0] [%cN, %cM, %cK] [%c1, %c1, %c1]
    %flatSlice = tensor.collapse_shape %batchedSlice
    %resTFlat = batchedOp %flatSlice
    %resT = tensor.expand_shape %resTFlat
    scf.for %i = %c0 to %cN step %c1 {
      scf.for %j = %c0 to %cM step %c1 {
        scf.for %k = %c0 to %cK step %c1 {
          %res = tensor.extract %resT[%i, %j, %k]
          ...
        }
      }
    }
Each index into the input tensor may be a quasi-affine expression of a
single loop induction variable, provided that the difference between
the values of the expression for any two consecutive values of that
induction variable is constant.
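To make the condition above concrete, here is a brute-force Rust sketch that simply evaluates one index expression over the loop's iteration space and checks for a constant difference between consecutive results. The pass itself presumably reasons about the expressions symbolically; `strided_slice_for` and `SliceParams` are hypothetical names, and `floordiv`/`mod` are modelled with `div_euclid`/`rem_euclid`.

```rust
/// Offset, size and stride of a 1-D slice, i.e. one dimension of the
/// operand lists of `tensor.extract_slice`. (Hypothetical helper type.)
#[derive(Debug, PartialEq)]
pub struct SliceParams {
    pub offset: i64,
    pub size: i64,
    pub stride: i64,
}

/// Brute-force check of the hoisting condition for one tensor index.
///
/// Evaluates `index_expr` for every value of an induction variable running
/// from `lb` (inclusive) to `ub` (exclusive) with step `step`. If every pair
/// of consecutive results differs by the same amount, the accessed elements
/// form a strided slice and its parameters are returned; otherwise `None`.
pub fn strided_slice_for<F: Fn(i64) -> i64>(
    index_expr: F,
    lb: i64,
    ub: i64,
    step: i64,
) -> Option<SliceParams> {
    assert!(step > 0, "step must be positive");
    let values: Vec<i64> = (lb..ub)
        .step_by(step as usize)
        .map(|iv| index_expr(iv))
        .collect();
    // An empty loop accesses nothing, so there is no slice to extract.
    let first = *values.first()?;
    // With a single iteration only one element is accessed; any stride works.
    let stride = if values.len() > 1 { values[1] - values[0] } else { 1 };
    // The difference between consecutive results must be constant.
    if values.windows(2).any(|w| w[1] - w[0] != stride) {
        return None;
    }
    Some(SliceParams { offset: first, size: values.len() as i64, stride })
}
```

For example, `2 * i + 3` always qualifies, while `i floordiv 2` qualifies only when the loop step is even, and `i mod 4` stops qualifying once the loop runs past four iterations.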
This adds a new operation interface that allows an operation to
declare that a batched version of it exists, which applies the
operation to all elements of a flat tensor in parallel.
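The interface itself lives in the MLIR compiler; the following Rust trait is only an analogy for the contract it describes: an element-wise operation that also offers a batched form over a flat buffer, whose result must match applying the scalar form to each element. `Batchable`, `apply`, `apply_batched` and `AddConst` are hypothetical names, and the parallel split uses `std::thread::scope`.

```rust
use std::thread;

/// Hypothetical analogue of the interface contract: an operation on a
/// single element that can also be applied to a whole flat buffer.
pub trait Batchable: Sync {
    /// Applies the operation to one element.
    fn apply(&self, x: u64) -> u64;

    /// Batched form: applies `apply` to every element of a flat buffer,
    /// splitting the work across threads. Element `i` of the result must
    /// equal `apply(xs[i])`.
    fn apply_batched(&self, xs: &[u64]) -> Vec<u64> {
        let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        // At least one element per chunk, because `chunks(0)` panics.
        let chunk = ((xs.len() + workers - 1) / workers).max(1);
        let mut out = vec![0u64; xs.len()];
        thread::scope(|s| {
            // Each thread gets a disjoint input chunk and the matching output chunk.
            for (input, output) in xs.chunks(chunk).zip(out.chunks_mut(chunk)) {
                s.spawn(move || {
                    for (o, &x) in output.iter_mut().zip(input) {
                        *o = self.apply(x);
                    }
                });
            }
        });
        out
    }
}

/// Example scalar operation (hypothetical): add a constant, wrapping on overflow.
pub struct AddConst(pub u64);

impl Batchable for AddConst {
    fn apply(&self, x: u64) -> u64 {
        x.wrapping_add(self.0)
    }
}
```

Keeping the scalar form as the reference semantics is what makes hoisting safe: the pass can swap the loop of scalar calls for one batched call without changing results.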
This also updates the API to make it easier to use by:
- creating MLIR components from native Rust types instead of requiring
MLIR components in the API
- adding helpers for creating standard dialects
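As an illustration of the first point, the sketch below maps native Rust types to the textual MLIR types they would correspond to, so a caller names a Rust type instead of building an MLIR type by hand. `ToMlirType` and `tensor_type` are hypothetical and do not reflect the bindings' actual API; the scalar mapping (for example `bool` to `i1`) is an assumption of this sketch.

```rust
/// Hypothetical trait: the textual MLIR type matching a native Rust type.
pub trait ToMlirType {
    fn mlir_type() -> String;
}

/// Implements `ToMlirType` for a list of `rust type => mlir type` pairs.
macro_rules! impl_scalar {
    ($($rust:ty => $mlir:expr),* $(,)?) => {
        $(impl ToMlirType for $rust {
            fn mlir_type() -> String {
                $mlir.to_string()
            }
        })*
    };
}

impl_scalar!(
    bool => "i1",
    i8 => "i8",
    i16 => "i16",
    i32 => "i32",
    i64 => "i64",
    f32 => "f32",
    f64 => "f64",
);

/// Builds a ranked tensor type such as `tensor<2x3xf64>` from a Rust element
/// type and a static shape; an empty shape yields a rank-0 `tensor<T>`.
pub fn tensor_type<T: ToMlirType>(shape: &[usize]) -> String {
    if shape.is_empty() {
        return format!("tensor<{}>", T::mlir_type());
    }
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    format!("tensor<{}x{}>", dims.join("x"), T::mlir_type())
}
```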
Previously, we were sending parsed benchmark results to a
Prometheus instance. Due to its time-series nature, Prometheus would
downsample database content to avoid having too many data points
for a given range of time. While this behavior suits a continuous
stream of data, such as monitoring CPU load, it is not suited to
benchmarks. Benchmarks are discrete events that occur only once in a
while (e.g. once a day), so downsampling would, at some point, simply
omit some benchmark results.
Using a regular SQL database like PostgreSQL solves this issue.
This required a C API that, when asked to create a type, returns a
structure that reports whether an error occurred during type creation.
This is necessary because a failure at that stage in the compiler would
otherwise lead to, for example, a segfault in the Python bindings, and
we want to handle this scenario gracefully.
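The pattern can be sketched from the Rust side: a `#[repr(C)]` result struct carrying the created type plus an error flag, and a safe wrapper that turns it into a `Result` before any pointer is used. `TypeOrError`, `check_type` and `mock_create_integer_type` are hypothetical; the mock stands in for a real C function (which would be declared in an `extern "C"` block), and its failure condition is arbitrary.

```rust
use std::ffi::c_void;

/// Hypothetical result struct returned by a type-creating C API function:
/// `ptr` is the created type and `error` reports whether creation failed.
#[repr(C)]
pub struct TypeOrError {
    pub ptr: *const c_void,
    pub error: bool,
}

/// Error surfaced to binding users instead of a crash.
#[derive(Debug, PartialEq)]
pub struct TypeCreationError;

/// Safe wrapper: the flag is checked before the pointer is ever used.
pub fn check_type(raw: TypeOrError) -> Result<*const c_void, TypeCreationError> {
    if raw.error || raw.ptr.is_null() {
        Err(TypeCreationError)
    } else {
        Ok(raw.ptr)
    }
}

/// Placeholder object standing in for a real MLIR type.
static DUMMY_TYPE: u8 = 0;

/// Mock C API entry point (hypothetical). It fails for width 0 purely so
/// that the error path can be exercised.
pub extern "C" fn mock_create_integer_type(width: u32) -> TypeOrError {
    if width == 0 {
        TypeOrError { ptr: std::ptr::null(), error: true }
    } else {
        TypeOrError { ptr: &DUMMY_TYPE as *const u8 as *const c_void, error: false }
    }
}
```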
Instead of overriding the process stderr to get
the string representation from MLIR, we can
capture it directly into a string using MLIR's
printOperation.
Another problem with overriding stderr is that
each `#[test]` runs in a separate thread, so as
soon as there are two or more tests, they could
panic due to races between the different
overrides.
This also moves the expected string directly into
the test as a literal.
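A minimal sketch of capturing printer output into a `String`, assuming a callback-based printer in the style of the MLIR C API's `mlirOperationPrint`, which streams text in chunks (`MlirStringRef`) to a callback along with an opaque user-data pointer. To stay self-contained, `mock_print` (hypothetical) replaces the real print call.

```rust
use std::os::raw::{c_char, c_void};

/// Same layout as MLIR's `MlirStringRef`: a non-owning (data, length) view.
#[repr(C)]
pub struct MlirStringRef {
    pub data: *const c_char,
    pub length: usize,
}

/// Same shape as `MlirStringCallback`: one chunk plus the user-data pointer.
pub type StringCallback = extern "C" fn(MlirStringRef, *mut c_void);

/// Appends each chunk to the `String` that `user_data` points to.
extern "C" fn append_to_string(chunk: MlirStringRef, user_data: *mut c_void) {
    // SAFETY: callers pass a pointer to a live `String` and a valid
    // (data, length) pair that outlives this call.
    unsafe {
        let out = &mut *(user_data as *mut String);
        let bytes = std::slice::from_raw_parts(chunk.data as *const u8, chunk.length);
        out.push_str(&String::from_utf8_lossy(bytes));
    }
}

/// Mock printer (hypothetical) that emits its output in several chunks.
fn mock_print(callback: StringCallback, user_data: *mut c_void) {
    for piece in ["module {\n", "}\n"] {
        let chunk = MlirStringRef { data: piece.as_ptr() as *const c_char, length: piece.len() };
        callback(chunk, user_data);
    }
}

/// Captures everything the printer emits into a local `String`.
pub fn print_to_string() -> String {
    let mut out = String::new();
    mock_print(append_to_string, &mut out as *mut String as *mut c_void);
    out
}
```

Because each call writes into its own local buffer and no process-wide stream is touched, tests running on parallel threads cannot interfere with one another.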
Benchmark only one compilation option in automatic benchmarks. Only the
'loop' option is tested, to take advantage of hardware with many
available CPUs. Running benchmarks with the 'default' option is
suboptimal on such hardware since it uses only one CPU.
This also removes the time-consuming MNIST test, as it belongs in the ML
benchmarks.
Moreover, the Makefile is fixed to use the provided Python executable
instead of relying on the system one to generate MLIR YAML files.
The Rust bindings are intended to access both the LLVM/MLIR C API and
the concrete-compiler one. This initial commit provides the API for
LLVM/MLIR only. The tests serve as examples of how to generate a
valid DAG of operations in MLIR.