- update SIMD_N and min_batch_size to 12 which seems to give better
latency and ERC20 throughput
- support IOp on several lines in ami /proc file
- reduce amount of ERC_20_SIMD per batch in HLAPI bench
Post-commit checks was the bottleneck regarding running duration.
It's now split into 7 batches to improve parallelism.
Builds that are specific to Ubuntu are run in their own jobs, so
that only build_tfhe_full recipe call remains in the os matrix.
A final check is performed to ensure all the checks have passed,
this very job is used as branch protection rule.
Regression benchmarks are meant to be run in pull-request. They
can be launched in two flavors:
* issue comment: using command like "/bench --backend cpu"
* adding a label: `bench-perfs-cpu` or `bench-perfs-gpu`
Benchmark definitions are written in TOML and located at
ci/regression.toml.
While not exhaustive, it can be easily modified by reading the
embbeded documentation.
"/bench" commands are parsed by a Python script located at
ci/perf_regression.py. This script produces output files that
contains cargo commands and a shell script generating custom
environment variables. The Python script and generated files are
meant to be used only by the workflow
benchmark_perf_regression.yml.
This adds a new kind of seed to the csprng
When created which such seed, the AES-CTR random generator
initialization changes:
- The AES-KEY used is initialized differently
- The AES-CTR starts with a CTR that may not be 0
The changes make it so that the counter still goes from 0..MAX,
but now the AES-CTR will encrypt the counter + some offset allowing
to keep the regular behavior and the new one
This commit fixes an endian (little) for the counter
representation of the counter used in the AES-CTR counter.
This is so that, the random bytes generated are the same not matter
the endian of the system.
A test case with known answers is added, as well as make command
to run the test in an emulated big-endian arch using the `cross`
utility.
This also include a small refactor where now the block cipher
do not encrypt `AesIndex`. This is done as it makes more sense
(AES encrypts bytes, not numbers), so this allows to move and centralize
the concept of endian as well a centralize where batch created.
Newer version of Zizmor can trigger errors due to new findings in workflows. To avoid breaking any ongoing pull-request, due to this unhandled update, zizmor version is locked.
Due to #[cfg] before the test_user_docs module, the module would
not actually be compiled (thus run user doc test) unless all required
features where activated when running.
So we remove these cfg, as each hardware doc supports its own set of
features and its better to have a test fail because a feature is
missing rather than silently not run anything
Also, add commands and ci stuff to check HPU docs
- includes documentation about GPU's accelerated expand on the HL API
- rework CudaKeySwitchingKey
- Cloning the key is no longer necessary on the HL API
This backend abstract communication with Hpu Fpga hardware.
It define it's proper entities to prevent circular dependencies with
tfhe-rs.
Object lifetime is handle through Arc<Mutex<T>> wrapper, and enforce
that all objects currently alive in Hpu Hw are also kept valid on the
host side.
It contains the second version of HPU instruction set (HIS_V2.0):
* DOp have following properties:
+ Template as first class citizen
+ Support of Immediate template
+ Direct parser and conversion between Asm/Hex
+ Replace deku (and it's associated endianess limitation) by
+ bitfield_struct and manual parsing
* IOp have following properties:
+ Support various number of Destination
+ Support various number of Sources
+ Support various number of Immediat values
+ Support of multiple bitwidth (Not implemented yet in the Fpga
firmware)
Details could be view in `backends/tfhe-hpu-backend/Readme.md`