Add an install target in the Makefile to copy the necessary libs, bins, and
includes to an installation directory. Use this install target to
package deps into a tarball with new installation instructions.
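A minimal sketch of what such a target pair could look like; the `PREFIX`
variable, the `build/` layout, and the archive name are illustrative
assumptions, not the project's actual Makefile:

```makefile
# Illustrative sketch only: PREFIX, the build/ layout and the archive
# name are assumptions, not the project's actual Makefile.
PREFIX ?= $(PWD)/install

.PHONY: install
install:
	mkdir -p $(PREFIX)/bin $(PREFIX)/lib $(PREFIX)/include
	cp -R build/bin/. $(PREFIX)/bin/
	cp -R build/lib/. $(PREFIX)/lib/
	cp -R build/include/. $(PREFIX)/include/

# Package the installed tree into a tarball that ships with the
# installation instructions.
.PHONY: package
package: install
	tar -czf deps.tar.gz -C $(PREFIX) bin lib include
```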
When this type of benchmark is triggered, only the tests that
benefit from GPU acceleration are run, on a specific AWS EC2
instance. Note that this instance (p3.2xlarge) is not a bare-metal
one, so performance may vary because a hypervisor controls the
machine.
Compiler builds on AWS can now be requested in a Pull Request comment
using '@slab-ci <build_command>'. This sets up the environment needed
to trigger both CPU and GPU builds on AWS EC2.
* Update the Makefile to use the appropriate arguments to find executables on
macOS systems (see the sketch below).
* Add `set -e` to make sure the job fails if something goes wrong
in the build or test steps of the macOS CI job.
closes #783
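For context, a common portability issue behind this kind of fix is that BSD
find on macOS does not support GNU find's `-executable` flag. Here is a sketch
of how a Makefile can switch arguments based on the host OS; the variable and
target names are illustrative, not the project's actual ones:

```makefile
# Illustrative only: BSD find on macOS has no -executable flag, so pick
# the argument based on the host OS reported by uname.
UNAME_S := $(shell uname -s)

ifeq ($(UNAME_S),Darwin)
    FIND_EXEC_ARG := -perm +111
else
    FIND_EXEC_ARG := -executable
endif

.PHONY: list-executables
list-executables:
	find build/bin -type f $(FIND_EXEC_ARG)
```

Independently, `set -e` makes the job's shell script abort on the first
failing command instead of letting later steps mask the error.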
This CI "feature" is meant to circumvent the 6-hour hard limit
for a job in GitHub Actions.
The benchmark is done using a matrix handled by Slab.
Here's the workflow:
1. ML benchmarks are started in a fire-and-forget fashion via
start_ml_benchmarks.yml
2. Slab will read ci/slab.toml (sketched below) to get the AWS EC2
configuration and the matrix parameters
3. Slab will launch at most max_parallel_jobs EC2 instances in
parallel
4. Each job will trigger ml_benchmark_subset.yml, which will run
only one of the YAML files generated via make generate-mlbench,
based on the value of the matrix item it was given.
5. As soon as a job is completed, the next one in the matrix
starts.
This is done until all the matrix items are exhausted.
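As an illustration of step 2, a hypothetical ci/slab.toml could group the EC2
configuration and the matrix parameters roughly as follows; every key name and
value below is an assumption, not the repository's actual file:

```toml
# Hypothetical sketch of ci/slab.toml; key names and values are
# illustrative only and may differ from the real configuration.
[backend.aws]
instance_type = "<ec2-instance-type>"
region = "<aws-region>"

[benchmarks.ml]
# At most this many EC2 instances run matrix items in parallel.
max_parallel_jobs = 5
# One ml_benchmark_subset.yml run per matrix item.
matrix = [0, 1, 2, 3, 4, 5, 6, 7]
```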
Previously, we were sending parsed benchmark results to a
Prometheus instance. Due to its time-series nature, Prometheus would
downsample database content to avoid having too many data points
for a given range of time. While this behavior is good for a
continuous stream of data, like monitoring CPU load, it is not suited
for benchmarks. Indeed, benchmarks are discrete events that occur
once in a while (e.g. once a day). Downsampling would, at
some point, simply omit some of the benchmark results.
Using a regular SQL database like PostgreSQL solves this issue.
Bench just one compilation option for automatic benchmarks. Only the 'loop'
option is tested, to take advantage of hardware with many available
CPUs. Running benchmarks with the 'default' option is suboptimal for this
kind of hardware since it uses only one CPU.
This also removes the time-consuming MNIST test, as it belongs in the ML benchmarks.
Moreover, the Makefile is fixed to use the provided Python executable instead of
relying on the system one to generate the MLIR YAML files.
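A minimal sketch of the Makefile side of that last fix, assuming a PYTHON
variable passed in by the caller and a hypothetical script path (the real
target and script names may differ):

```makefile
# Illustrative only: accept the interpreter from the caller instead of
# relying on whatever `python` happens to be on the system PATH.
PYTHON ?= python3

.PHONY: generate-mlbench
generate-mlbench:
	$(PYTHON) scripts/generate_mlbench_yaml.py
```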
This includes several fixes and adds some functionality:
* EC2 instance type can be selected when the workflow is triggered manually
* benchmarks are run on each push to the main branch
* Docker is no longer used due to build issues
* 10 repetitions are made during the benchmarks, then results are aggregated
* more tags are used to identify the benchmark configuration