Add new registers:
* bpip_use_opportunism: used to control the pep_pbs flush strategies
* counters/info in pe_pbs: used to enhance debug/analysis
Some cleanup: removed nonexistent fields and renamed bpip_used to bpip_use.
Use a dedicated keyswitch implementation that relies on an unbalanced
keyswitch. This makes it possible to generate bit-accurate stimulus
without needing a feature flag inside the decomposer implementation.
- restrict HPU bench to sizes 8, 16, 32 and 64 for now
- use Llt by default
- set min batch size to 10 to adapt to HPU RxPSI=128
- update Makefile to load HPU config
Ilp is now used in Rtl simulation. This prevents batches from being
triggered by timeout and should thus reduce the simulation time.
It should also improve IOp latency, but for latency-optimized IOps users
should use the Llt fw impl.
This upper bound prevents correct execution of the tfhe-rs integer bench.
The drawback is a potential perf degradation on huge defrag windows.
This still has to be checked on real HW.
Bypass Tfhe operations for fast simulation.
This obviously breaks the behavior but keeps the performance estimation
accurate.
For accurate behavior with a fast runtime, use the `fast` parameter set.
NB: The `fast` parameter set keeps the behavior correct but breaks the
performance estimation.
Not a perfect solution, but it should mitigate our runtime issue until
proper computation over trivial ciphertexts is supported.
It now prioritizes low latency instructions to make a better decision
later with the high latency instructions.
This only has an impact on low usage IOPs, such as the erc20. It gets
slightly better results than the hand scheduled code, although I won't
be enabling it right now since it requires precise flushing behavior,
which seems to elude us right now.
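
The prioritization described above can be illustrated with a minimal
sketch. This is not the firmware scheduler: the `Instr` type, the
`pick_next` and `issue_order` helpers, and the latency values are all
hypothetical and only show the "lowest latency first" selection among
ready instructions.

```rust
/// Hypothetical instruction descriptor (not the firmware's real type).
#[derive(Debug, Clone, PartialEq)]
struct Instr {
    name: &'static str,
    /// Estimated latency in arbitrary units (made-up values).
    latency: u32,
}

/// Remove and return the ready instruction with the lowest latency.
/// On a tie, the instruction that appears first in the list wins.
fn pick_next(ready: &mut Vec<Instr>) -> Option<Instr> {
    let idx = ready
        .iter()
        .enumerate()
        .min_by_key(|(_, instr)| instr.latency)
        .map(|(idx, _)| idx)?;
    Some(ready.remove(idx))
}

/// Drain the ready list and report the resulting issue order.
fn issue_order(mut ready: Vec<Instr>) -> Vec<&'static str> {
    let mut order = Vec::new();
    while let Some(instr) = pick_next(&mut ready) {
        order.push(instr.name);
    }
    order
}
```

With a ready list holding one high latency and two low latency
instructions, the low latency ones are issued first and the choice for
the high latency one is deferred.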
Hpu only supports whitepaper and a custom implementation.
There seems to be an allocation issue with the throughput tests, so
they are disabled for the moment.
NB: A small hack is used to bypass NamedParams ATM. We must implement it
properly once the Hw parameter set is fixed.
IF_THEN_ZERO is an altered version of IF_THEN_ELSE that takes 0 as the default value.
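
As a plaintext illustration of the relationship between the two IOps
(a sketch on clear `u64` values, not the HPU implementation; the
function names and signatures are assumptions made for this example):

```rust
/// Plaintext model of IF_THEN_ELSE: select one of two values.
fn if_then_else(cond: bool, then_val: u64, else_val: u64) -> u64 {
    if cond { then_val } else { else_val }
}

/// Plaintext model of IF_THEN_ZERO: same selection, but the value
/// returned when the condition is false is fixed to 0.
fn if_then_zero(cond: bool, then_val: u64) -> u64 {
    if_then_else(cond, then_val, 0)
}
```

In this model IF_THEN_ZERO needs one operand less than IF_THEN_ELSE,
since the default value is implicit.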
ERC_20 is a custom IOp dedicated to erc_20 computation. It's a first attempt and mainly a placeholder for future work.
It will be used to test various ways to call a custom IOp from HighLevelApi.
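
For reference, a plaintext sketch of the kind of computation such an
IOp could perform. The exact semantics of ERC_20 are not described
here, so this assumes the usual transfer rule (move `amount` only if
the sender balance covers it); `erc20_transfer` and its signature are
hypothetical.

```rust
/// Plaintext model of a transfer, assuming the usual ERC-20 rule.
/// Returns the two updated balances as (new_from, new_to).
fn erc20_transfer(from: u64, to: u64, amount: u64) -> (u64, u64) {
    // The transfer only happens when the sender has enough funds.
    let has_enough = amount <= from;
    // Select the amount actually moved, with 0 as the default.
    let moved = if has_enough { amount } else { 0 };
    // `moved <= from` always holds, so the subtraction cannot underflow.
    // The addition wraps, mirroring modular integer arithmetic.
    (from - moved, to.wrapping_add(moved))
}
```

Note that this model produces two outputs, one per updated balance.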
Change test macro to support multi-output IOp correctly.