- this would not give an average runtime for scalar benchmarks, and for
small precisions could give unrealistically good timings (for lucky values)
- the timings for other precisions could still be favorable or unfavorable
depending on the single value that was drawn (see the sketch below)
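A minimal sketch, assuming criterion and a hypothetical `scalar_op` under test, of one way to avoid tying a scalar benchmark to a single drawn value: sample a pool of values up front and rotate through it inside the measured loop, so the reported time approximates an average over inputs rather than a single lucky draw.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use rand::Rng;

// Hypothetical operation under benchmark; stands in for e.g. a scalar add.
fn scalar_op(clear: u64) -> u64 {
    std::hint::black_box(clear).wrapping_mul(3)
}

fn bench_scalar_averaged(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    // Draw a pool of scalar values instead of a single "lucky" one.
    let values: Vec<u64> = (0..32).map(|_| rng.gen()).collect();
    let mut idx = 0usize;

    c.bench_function("scalar_op::averaged_over_values", |b| {
        b.iter(|| {
            // Rotate through the pool so the timing averages over inputs.
            let v = values[idx % values.len()];
            idx += 1;
            scalar_op(v)
        })
    });
}

criterion_group!(benches, bench_scalar_averaged);
criterion_main!(benches);
```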
This new workflow can trigger all the benchmarks required
to populate the benchmark tables in the documentation.
It can also generate SVG tables and store them as artifacts.
Optionally, it can open a pull request to update the current
tables in the documentation.
Done to circumvent a criterion limitation regarding the automatic
truncation of long benchmark IDs.
Using a println!() call, we ensure the complete name is displayed
before benchmark execution to ease manual parsing and debugging.
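A minimal sketch of the idea, assuming criterion; the long benchmark ID below is a made-up placeholder:

```rust
use criterion::Criterion;

fn bench_with_full_name(c: &mut Criterion) {
    // Hypothetical long benchmark ID that criterion may truncate in its output.
    let bench_id = "integer::unsigned::unchecked_add_parallelized::FheUint64::throughput";

    // Print the complete ID ourselves so logs always contain the full name,
    // even when criterion shortens it in its own report.
    println!("{bench_id}");

    c.bench_function(bench_id, |b| {
        b.iter(|| {
            // Placeholder body standing in for the operation under test.
            std::hint::black_box(1u64.wrapping_add(1))
        })
    });
}
```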
After running performance regression benchmarks, a performance
change check is executed. It fetches the results data with an
external tool, then looks for anomalies in the changes.
Finally, it produces a report as an issue comment, with any
anomalies displayed in a Markdown table. A folded section of the
report message contains all the results from the benchmark run.
Note that a fully custom benchmark triggered from an issue comment
would not generate a report. In addition, HPU performance
regression benchmarks are not supported yet.
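A rough sketch of the kind of check described above; the threshold, the data shape, and the field names are assumptions for illustration, not the actual tooling:

```rust
/// One benchmark comparison, as it might come back from the external results tool.
struct Change {
    name: String,
    baseline_ms: f64,
    current_ms: f64,
}

/// Flag changes whose relative slowdown exceeds a threshold (e.g. 0.05 for 5%).
fn anomalies(changes: &[Change], threshold: f64) -> Vec<&Change> {
    changes
        .iter()
        .filter(|c| (c.current_ms - c.baseline_ms) / c.baseline_ms > threshold)
        .collect()
}

/// Render the flagged anomalies as a Markdown table for the issue comment.
fn markdown_table(rows: &[&Change]) -> String {
    let mut out = String::from("| Benchmark | Baseline (ms) | Current (ms) |\n|---|---|---|\n");
    for c in rows {
        out.push_str(&format!(
            "| {} | {:.3} | {:.3} |\n",
            c.name, c.baseline_ms, c.current_ms
        ));
    }
    out
}
```

The full result set could then be appended inside a folded (`<details>`) block under the table, matching the folded section mentioned above.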
* Added the Value type name to the crate::integer::KVStore impl of the
  Named trait, as well as a bool to check that we deserialize the
  correct value type (Radix vs SignedRadix)
* Add KVStore to high_level_api
* Add KVStore hlapi benches
* Remove the specialized `[add,mul,sub]_to_slot` methods, as `map` is now
  the intended API.
  - mul_to_slot was significantly slower than using `map`
  - add/sub_to_slot were a bit faster (~5% latency-wise), but returned
    less information (no old_value, no new_value, no boolean to check
    whether the key matched)
  - Some known improvements can be made to `map`, which should result in
    it being better than add/sub_to_slot
* Add a FheIntegerType trait to make the KVStore generic over
  FheUint/FheInt, which should make GPU integration "easy"
  (a rough sketch of the resulting shape follows this list)
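A very rough sketch of the shape this gives, using plain Rust types; the trait bound, the `map` signature, and the `MapResult` fields are illustrative assumptions, not the actual tfhe-rs API:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Stand-in for the trait that makes the store generic over the integer type
/// (FheUint/FheInt in the real crate); here it only requires Clone.
trait IntegerLike: Clone {}
impl IntegerLike for u64 {}

/// What a `map`-style update can report, unlike the removed *_to_slot helpers.
struct MapResult<V> {
    key_matched: bool,
    old_value: Option<V>,
    new_value: Option<V>,
}

struct KvStore<K, V> {
    slots: HashMap<K, V>,
}

impl<K: Hash + Eq, V: IntegerLike> KvStore<K, V> {
    /// Apply `f` to the slot at `key`, reporting the old value, the new value,
    /// and whether the key matched; an addition becomes `map(&key, |v| v + x)`.
    fn map(&mut self, key: &K, f: impl FnOnce(&V) -> V) -> MapResult<V> {
        match self.slots.get_mut(key) {
            Some(slot) => {
                let old = slot.clone();
                let new = f(&old);
                *slot = new.clone();
                MapResult { key_matched: true, old_value: Some(old), new_value: Some(new) }
            }
            None => MapResult { key_matched: false, old_value: None, new_value: None },
        }
    }
}
```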
- update SIMD_N and min_batch_size to 12, which seems to give better
latency and ERC20 throughput
- support an IOp spanning several lines in the ami /proc file
- reduce the number of ERC_20_SIMD per batch in the HLAPI bench