When benchmark_repetitions > 1, using 'run_name' always returns the
same name for a given benchmark case: it does not distinguish the
individual iterations from the aggregated values such as mean, median
or stddev. Using the 'name' field fixes this behavior.
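As an illustration, a minimal sketch of reading the two fields from
the standard Google Benchmark JSON report (the results.json path is
hypothetical):

    import json

    with open("results.json") as f:
        report = json.load(f)

    for entry in report["benchmarks"]:
        # With --benchmark_repetitions=3, 'run_name' stays e.g.
        # "BM_Foo/10" for every repetition and every aggregate,
        # whereas 'name' carries the distinguishing suffix, e.g.
        # "BM_Foo/10_mean", "BM_Foo/10_median", "BM_Foo/10_stddev".
        print(entry["name"], entry["run_name"], entry["real_time"])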
Previously, we were sending parsed benchmark results to a Prometheus
instance. Due to its time-series nature, Prometheus downsamples the
database content to avoid keeping too many data points for a given
time range. While this behavior is fine for a continuous stream of
data, such as monitoring CPU load, it is not suited to benchmarks:
benchmarks are discrete events that occur only once in a while
(e.g. once a day), and downsampling would, at some point, simply
drop some of the benchmark results.
Using a regular SQL database such as PostgreSQL solves this issue.
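A minimal sketch of what storing one parsed result in PostgreSQL
could look like, assuming psycopg2 as the driver and a hypothetical
benchmark_results table (not the actual schema):

    import psycopg2

    conn = psycopg2.connect("dbname=benchmarks user=ci")
    with conn, conn.cursor() as cur:
        # Each benchmark run simply appends a row; nothing is ever
        # downsampled or dropped, unlike a time-series database.
        cur.execute(
            """
            INSERT INTO benchmark_results
                (name, real_time, cpu_time, run_date)
            VALUES (%s, %s, %s, NOW())
            """,
            ("BM_Foo/10_mean", 123.4, 120.1),
        )
    conn.close()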
Benchmark only one compilation option in the automatic benchmarks.
Only the 'loop' option is tested, to take advantage of hardware with
many available CPUs. Running benchmarks with the 'default' option is
suboptimal on this kind of hardware since it uses only one CPU.
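As a rough illustration only (the option names come from the
paragraph above, the runner function is hypothetical):

    # 'default' pins the work to a single CPU, so on many-core
    # machines only the 'loop' option is worth benchmarking.
    COMPILATION_OPTIONS = ["loop"]  # was ["default", "loop"]

    def run_automatic_benchmarks(run_one_case):
        for option in COMPILATION_OPTIONS:
            run_one_case(compilation_option=option)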
The change also removes the time-consuming MNIST test, as it belongs
in the ML benchmarks.
Moreover, the Makefile is fixed to use the provided Python executable
instead of relying on the system one to generate the MLIR YAML files.