In case of benchmark_repetitions > 1 using 'run_name' will return
always the same name for a given benchmark case. It won't make the
distinction between the iterations and the aggregated values like
mean, median or stdev. Using 'name' field fix this behavior.
Now compiler builds on AWS can be requested in Pull Request comment
using '@slab-ci <build_command>'. This sets up environment to able
to trigger both CPU and GPU builds on AWS EC2.
This CI "feature" is meant to circumvent the 6 hours hard-limit
for a job in GitHub Action.
The benchmark is done using a matrix which is handled by Slab.
Here's the workflow:
1. ML benchmarks are started in a fire and forget fashion via
start_ml_benchmarks.yml
2. Slab will read ci/slab.toml to get the AWS EC2 configuration
and the matrix parameters
3. Slab will launch at most max_parallel_jobs EC2 instances in
parallel
4. Each job will trigger ml_benchmark_subset.yml which will run
only one of the generated YAML file via make generate-mlbench,
based on the value of the matrix item they were given.
5. As soon as a job is completed, the next one in the matrix
will start promptly.
This is done until all the matrix items are exhausted.
Previousely, we were sending parsed benchmark results to a
Prometheus instance. Do to its time-series nature, Prometheus would
downsample database content to avoid having to much data points
for a given range of time. While this behavior is good for a
continuous stream of data, like monitoring CPU load, it's not suited
for benchmarks. Indeed benchmarks are discrete events that would
occurr once in a while (i.e once a day). Downsampling would, at
some point, simply omit some of benchmarks results.
Using a regular SQL database like PostgreSQL solves this issue.
Bench just one compilation option for automatic benchmarks. Only 'loop'
option is tested to take advantage of hardware with a lot of available
CPUs. Running benchmarks with 'default' option is suboptimal for this
kind of hardware since it uses only one CPU.
This also remove time consuming MNIST test, as it should be in ML benchmarks.
Moreover Makefile is fixed to use provided Python executable instead of
relying on system one to generate MLIR Yaml files.