The current CAPI of CompilerEngine isn't really a C API. Its initial
purpose was to give the Python bindings access to the CompilerEngine
through a convenient API. We now make a clear separation between the
CAPI and the Python wrappers. The wrapper functions can be implemented
in C/C++ and are exposed to Python via pybind11. The CAPI (which still
needs fixing, as it still contains C++ code) can be used as is, or to
build bindings for other languages (such as Rust).
The batching pass passes operands to the batched operation as a flat,
one-dimensional vector produced through a `tensor.collapse_shape`
operation collapsing all dimensions of the original tensor of
operands. Similarly, the shape of the result vector of the batched
operation is expanded to the original shape afterwards using a
`tensor.expand_shape` operation.
The pass emits the `tensor.collapse_shape` and `tensor.expand_shape`
operations unconditionally, even for tensors that already have only
a single dimension. This causes the verifiers of these operations to
fail in some cases, aborting the entire compilation process.
This patch lets the batching pass emit `tensor.collapse_shape` and
`tensor.expand_shape` for batched operands and batched results only if
the rank of the corresponding tensors is greater than one.
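The guard can be illustrated with a minimal Python sketch. It is not the
pass itself: nested lists stand in for tensors, and `shape_of`,
`collapse`, `expand`, and `apply_batched` are hypothetical helper names
chosen for illustration. The `emitted` list only records which reshaping
operations the real pass would have emitted.

```python
from typing import Callable, List, Sequence


def shape_of(t) -> List[int]:
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0] if t else None
    return shape


def collapse(t) -> list:
    """Flatten a nested list into a one-dimensional list."""
    if not isinstance(t, list):
        return [t]
    flat = []
    for item in t:
        flat.extend(collapse(item))
    return flat


def expand(flat: Sequence, shape: Sequence[int]) -> list:
    """Rebuild a nested list of the given shape from a flat list."""
    if len(shape) == 1 or shape[0] == 0:
        return list(flat)
    step = len(flat) // shape[0]
    return [expand(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def apply_batched(batched_op: Callable[[list], list], operands: list,
                  emitted: List[str]) -> list:
    """Apply batched_op, reshaping only if the operand rank exceeds one."""
    shape = shape_of(operands)
    needs_reshape = len(shape) > 1
    flat = operands
    if needs_reshape:
        emitted.append("tensor.collapse_shape")
        flat = collapse(operands)
    result = batched_op(flat)
    if needs_reshape:
        emitted.append("tensor.expand_shape")
        result = expand(result, shape)
    return result
```

With a one-dimensional operand list, `emitted` stays empty and the
batched operation receives the operands unchanged. With a
two-dimensional one, both reshaping steps are recorded.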
This CI "feature" is meant to circumvent the 6-hour hard limit
for a job in GitHub Actions.
The benchmark is run using a matrix that is handled by Slab.
Here's the workflow:
1. ML benchmarks are started in a fire and forget fashion via
start_ml_benchmarks.yml
2. Slab will read ci/slab.toml to get the AWS EC2 configuration
and the matrix parameters
3. Slab will launch at most max_parallel_jobs EC2 instances in
parallel
4. Each job will trigger ml_benchmark_subset.yml, which will run
only one of the generated YAML files via make generate-mlbench,
based on the value of the matrix item it was given.
5. As soon as a job is completed, the next one in the matrix
will start promptly.
This is done until all the matrix items are exhausted.
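The scheduling behaviour described in steps 3 to 5 can be modelled
locally with a bounded worker pool. This is only a sketch of the idea,
not how Slab is implemented: threads stand in for EC2 instances, and
`run_matrix` and `run_job` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_matrix(items: Sequence[str], max_parallel_jobs: int,
               run_job: Callable[[str], str]) -> List[str]:
    """Run one job per matrix item, at most max_parallel_jobs at a time.

    A worker that finishes its job immediately picks up the next pending
    item, so the pool stays busy until the matrix is exhausted.
    """
    with ThreadPoolExecutor(max_workers=max_parallel_jobs) as pool:
        # map() returns the results in the order of the matrix items.
        return list(pool.map(run_job, items))
```

Because each matrix item is its own job, no single job has to run all
benchmarks, which is what keeps every job under the time limit.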
This adds a new end-to-end test `apply_lookup_table_batched`, which
forces batching of Concrete operations when invoking the compiler
engine, indirectly causing the `concrete.bootstrap_lwe` and
`concrete.keyswitch_lwe` operations generated from the
`FHELinalg.apply_lookup_table` operation of the test to be batched
into `concrete.batched_bootstrap_lwe` and
`concrete.batched_keyswitch_lwe` operations. The batched operations
trigger the generation of calls to batching wrapper functions further
down the pipeline, effectively testing the lowering and implementation
of batched operations altogether.
The new option `--batch-concrete-ops` invokes the batching pass after
lowering to the Concrete dialect and after lowering linalg operations
with operations from the Concrete dialect to loops.
The new action `dump-concrete-with-loops` dumps the IR right before
batching.
This adds a new pass that is able to hoist operations implementing the
`BatchableOpInterface` out of a loop nest that applies the operation
to the elements of a tensor indexed by the loop induction variables.
Example:
  scf.for %i = %c0 to %cN step %c1 {
    scf.for %j = %c0 to %cM step %c1 {
      scf.for %k = %c0 to %cK step %c1 {
        %s = tensor.extract %T[%i, %j, %k]
        %res = batchable_op %s
        ...
      }
    }
  }
is replaced with:
  %batchedSlice = tensor.extract_slice
      %T[%c0, %c0, %c0] [%cN, %cM, %cK] [%c1, %c1, %c1]
  %flatSlice = tensor.collapse_shape %batchedSlice
  %resTFlat = batchedOp %flatSlice
  %resT = tensor.expand_shape %resTFlat
  scf.for %i = %c0 to %cN step %c1 {
    scf.for %j = %c0 to %cM step %c1 {
      scf.for %k = %c0 to %cK step %c1 {
        %res = tensor.extract %resT[%i, %j, %k]
        ...
      }
    }
  }
Every index of the tensor with the input values may be a quasi-affine
expression on a single loop induction variable, as long as the
difference between the results of the expression for any two
consecutive values of the referenced loop induction variable is
constant.
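The constant-difference condition on an index expression can be
illustrated with a small Python check. This is a sketch under
assumptions: it enumerates the values of the induction variable, whereas
the pass presumably reasons about the expression statically, and
`constant_stride` is a hypothetical name.

```python
from typing import Callable, Optional


def constant_stride(index_expr: Callable[[int], int], lower: int,
                    upper: int, step: int) -> Optional[int]:
    """Return the stride of index_expr over the loop, or None.

    The expression is evaluated for every value of the induction
    variable in range(lower, upper, step). If the difference between
    consecutive results is always the same, that difference is returned.
    """
    values = [index_expr(iv) for iv in range(lower, upper, step)]
    if len(values) < 2:
        return 0  # zero or one iteration: trivially constant
    diffs = {b - a for a, b in zip(values, values[1:])}
    return diffs.pop() if len(diffs) == 1 else None
```

For example, `2 * i + 1` has a constant stride of 2 for any loop. The
quasi-affine expression `i // 2` qualifies for a loop with step 2
(stride 1), but not for a loop with step 1, where the results
0, 0, 1, 1, ... do not advance uniformly.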
This adds a new operation interface that allows an operation to
specify that a batched version of the operation exists that applies it
on the elements of a flat tensor in parallel.
Also updated the API to make it easier to use by:
- creating MLIR components from native Rust types instead of requiring
MLIR components in the API
- adding helpers around the creation of standard dialects
This required a CAPI that, when asked for types, returns a structure
that can report whether an error occurred during type creation.
This is required since a failure at that stage in the compiler would
lead to a segfault in the python bindings for example, and we want to be
able to handle this scenario gracefully.
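The shape of such a result structure can be sketched as follows. All
names here (`TypeOrError`, `make_integer_type`) are hypothetical and do
not reflect the actual CAPI. The point is only that the caller checks an
error flag instead of dereferencing an invalid type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeOrError:
    """Either a created type or a description of what went wrong."""
    value: Optional[str] = None
    error: Optional[str] = None

    @property
    def is_error(self) -> bool:
        return self.error is not None


def make_integer_type(width: int) -> TypeOrError:
    """Hypothetical type constructor that reports failures to the caller."""
    if width <= 0:
        return TypeOrError(error=f"invalid bit width: {width}")
    return TypeOrError(value=f"i{width}")
```

A binding layer can then turn `is_error` into an exception or an error
value in the host language rather than crashing.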
Instead of overriding the process stderr to get
the string representation from mlir, we can
directly capture it into a string using mlir's
printOperation.
Another problem with overriding stderr is that
each `#[test]` runs in a different thread, meaning that
as soon as we have 2+ tests the tests could panic
due to conflicts/races between the different overrides.
This also moves the expected string directly into the test
as a literal.
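The difference between the two approaches can be sketched in Python,
assuming the printer hands its output to a callback in chunks.
`print_operation` below is a stand-in written for this illustration, not
the mlir function, and `operation_to_string` is a hypothetical helper.

```python
from typing import Callable, List


def print_operation(op_text: str, callback: Callable[[str], None]) -> None:
    """Stand-in printer: passes the textual form to callback in chunks."""
    for line in op_text.splitlines(keepends=True):
        callback(line)


def operation_to_string(op_text: str) -> str:
    """Collect the printed chunks into a string local to this call.

    No process-wide state such as stderr is touched, so concurrent
    callers cannot interfere with each other.
    """
    chunks: List[str] = []
    print_operation(op_text, chunks.append)
    return "".join(chunks)
```

Since the buffer is local to each call, tests running in separate
threads can each compare their own captured string against an expected
literal.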