Commit Graph

2642 Commits

Author SHA1 Message Date
Baptiste Roux
33aaaa7f17 feat(hpu,u55c) Update U55C bitstream
This bitstream adds support for Multi-width IOp and Flush configuration
hpu_u55c hpu_v80
2025-04-01 10:19:19 +02:00
JJ-hw
2b88097e47 feat(hpu): Update register map
Add new registers:
 * bpip_use_opportunism: used to control the pep_pbs flush strategies
 * counters/info in pe_pbs: used to enhance debug/analysis

Some cleanup: removed nonexistent fields and renamed bpip_used to bpip_use
2025-04-01 10:17:23 +02:00
JJ-hw
cf0b288b17 feat(hpu): Add bpip_use_opportunism register to select the usage of opportunism when BPIP.
Update regif toml (also contains new registers - WIP).
Remove access to registers that do not exist anymore
2025-04-01 10:17:23 +02:00
Baptiste Roux
cc968380f8 feat(hpu): Move opportunistic config in RtlConfig
Rename opportunistic to flush_opportunism and retrieve the value from
RtlConfig instead of HpuParameters
2025-04-01 10:17:23 +02:00
Baptiste Roux
d2554b273c feat(hpu): Add dedicated keyswitch implementation
Use a dedicated keyswitch implementation based on an unbalanced keyswitch.
This makes it possible to generate bit-accurate stimulus without a feature
flag inside the decomposer implementation.
hpu_v80_no_flush_barrier hpu_wo_bpip_opportunism
2025-03-31 21:40:56 +02:00
Baptiste Roux
3c78fdbdb0 bugfix(hpu): Correct minor error in hpu test
Add correct feature flag and update hpu_entities test
2025-03-31 21:37:30 +02:00
Helder Campos
f56f3efabe fix(hpu) Fixing the trace to work with the new register model 2025-03-27 20:25:16 +00:00
pgardratzama
c76147bab9 chore(ci): adds Makefile target for erc20 HPU bench & adds in the HPU workflow (not tested yet) 2025-03-27 15:56:12 +01:00
Baptiste Roux
016684a1da feat(hpu,bench) Enable throughput test in Erc20 bench
Also tweak the ERC_20 Fw flag to improve performance.
2025-03-27 15:21:33 +01:00
Helder Campos
c9a9467c08 feat(hpu) psi64 mockup config 2025-03-27 12:36:24 +00:00
Helder Campos
4498111ccb fix(hpu) Fixing Llt comparison operations 2025-03-27 10:00:45 +00:00
pgardratzama
31872b7f7c chore(ci): repeat all env variable in Makefile target or hpu integer bench 2025-03-26 22:59:31 +01:00
pgardratzama
acbbd7cf07 chore(ci): trying to setup HPU env for bench 2025-03-26 22:26:59 +01:00
pgardratzama
1ee89aa48c typo: integer bench was not compiling anymore 2025-03-26 18:43:43 +01:00
pgardratzama
50cee2a25f chore(ci):
- restrict HPU bench to size 8,16,32,64 for now
- use Llt by default
- set min batch size to 10 to adapt to HPU RxPSI=128
- update Makefile to load HPU config
2025-03-26 18:35:19 +01:00
David Testé
5714a15d2d chore(ci): fail workflow if job step fails 2025-03-26 15:24:47 +01:00
Baptiste Roux
61de84970a bugfix(hpu): Correctly handle rejected AMI request
AMI can reject requests when its queues are full. Handle this correctly with a retry loop.
2025-03-26 15:17:02 +01:00
David Testé
7a8a9337dd chore(ci): use ssh-agent to get hw_regmap dependency 2025-03-26 15:14:21 +01:00
Helder Campos
683b77190a feat(hpu) Additions and tiny improvements to the trace python library 2025-03-25 17:27:42 +00:00
Helder Campos
3aa7d86ce5 fix(hpu) Fixing SSUB on llt 2025-03-24 17:07:36 +00:00
David Testé
23f25d89cf chore(ci): add make recipe and workflow to run benchmarks 2025-03-24 17:18:56 +01:00
Baptiste Roux
e1b59bffdc bugfix(hpu,fw) Fix issue with pbs flushing in CMP IOp
Previously the case integer-w == 2 triggered underflow.
2025-03-24 14:17:40 +01:00
Helder Campos
9d8ef170f9 feat(hpu) Adding clear-text add/sub to llt 2025-03-24 12:35:50 +00:00
Baptiste Roux
84a9c1e945 feat(Hpu) Add support for IOpProto parsing through CLI
Makes it possible to correctly generate inputs for custom IOps
2025-03-24 10:52:13 +01:00
Baptiste Roux
cb8373a9f7 chore(hpu): Update hpu/utils CLI
Use the same name in all binaries
2025-03-21 15:50:39 +01:00
Baptiste Roux
5900d14fe0 feat(hpu,fw): Add flush in Ilp firmware
Ilp is used in Rtl simulation; flushing prevents batches from being triggered
by timeout and should therefore reduce the simulation time.
It should also improve IOp latency, but for latency-optimized IOps users
should use the Llt fw impl.
2025-03-21 15:07:46 +01:00
Baptiste Roux
17a757d1be chore(hpu): Fix warning
Should have been done in the previous commit
2025-03-21 14:33:39 +01:00
Baptiste Roux
f1ee51a182 feat(hpu,fw): Add multithreading for tr_table generation
Use rayon to generate each table entry in parallel
2025-03-20 16:21:18 +01:00
Baptiste Roux
875f6cd439 bugfix(hpu) Update fw
Add new entry in OpCfg and required parsing/format for CLI support
2025-03-19 17:32:03 +01:00
Baptiste Roux
f4884174b6 bugfix(hpu,benches) Correctly wait on results.
Blackbox is not enough with Hpu context, we must wait on result availability (i.e. synced back on Host)
2025-03-18 15:23:56 +01:00
Baptiste Roux
d1c98974e9 bugfix(hpu): Correctly activate hpu feature when some hpu-hw is enabled 2025-03-18 15:23:24 +01:00
Baptiste Roux
751f24d9d2 bugfix(hpu,benches) Fix issue with 128b tfhe-rs integer benches 2025-03-18 13:45:02 +01:00
Baptiste Roux
4eecd0adce bugfix(hpu): Remove the upper-bound in ct defrag algorithm
This upper bound prevented correct execution of the tfhe-rs integer bench.
The drawback is a potential perf degradation on huge defrag windows.
This has to be checked on real HW.
2025-03-18 13:42:50 +01:00
Baptiste Roux
ce3208f74c feat(hpu,mockup) Add a flag nops for fast simulation
Bypass Tfhe operations for fast simulation.
This obviously breaks the behavior but keeps the performance estimation accurate.

For accurate behavior with fast runtime, use the `fast` parameter set.
NB: This keeps the behavior correct but breaks the performance estimation.

Not a perfect solution, but it should mitigate our runtime issue until proper
computation over trivial ciphertexts is supported.
2025-03-18 09:15:45 +01:00
Helder Campos
ed6f74b468 chore(hpu) Small improvement to the rtl graph infrastructure
- This affects debugging features only
2025-03-17 17:25:01 +00:00
Baptiste Roux
1234d6e839 feat(hpu,benches): Add hpu support in benches-integer
Modify Integer keycache to be able to store HpuDevice alongside the key when needed.
2025-03-17 16:33:53 +01:00
Baptiste Roux
c9badb5d7b feat(hpu,benches): Add Hpu parameters in KeyCache
Also update hlapi benches to use the NamedParams traits with Hpu
2025-03-17 16:33:43 +01:00
Helder Campos
253a47ee7b feat(hpu) More scheduling control for Llt firmware. 2025-03-17 13:26:05 +00:00
Helder Campos
27a60ef00f fix(hpu) Adding the new registers to the sim config_store
- Fixing also the Llt firmware for a message width of 2.
2025-03-14 18:53:29 +00:00
Helder Campos
3f52267763 chore(hpu) Dividing Ilp and Llt firmware
- Llt stands for low latency. All instructions there should be optimized
  to have the least latency possible, at the expense of throughput.
2025-03-14 17:47:40 +01:00
Helder Campos
ca8a2dfc17 feat(hpu) Improving the comparison operations 2025-03-14 17:47:40 +01:00
Helder Campos
7933e68e9b feat(hpu) Improving the RTL scheduler
It now prioritizes low latency instructions to make a better decision
later with the high latency instructions.

This only has an impact on low usage IOPs, such as the erc20. It gets
slightly better results than the hand scheduled code, although I won't
be enabling it right now since it requires precise flushing behavior,
which seems to elude us right now.
2025-03-14 17:47:40 +01:00
Helder Campos
517c308d3c feat(hpu) Improving ERC20
Flushes currently have a negative impact, so they are disabled
2025-03-14 17:47:40 +01:00
Baptiste Roux
cf77292a41 feat(hpu,hlapi) Enable 128b operation in hlapi bench 2025-03-13 19:27:02 +01:00
Baptiste Roux
f8218375e6 bugfix(hpu): Disable MULF/MULSF
These currently trigger an overflow when used with 128b
2025-03-13 17:45:50 +01:00
Baptiste Roux
229b974859 feat(hpu) Add Hpu support in erc20 benchmark
Hpu only supports the whitepaper and a custom implementation.

There seem to be some allocation issues in the throughput tests.
They are disabled for the moment.

NB: A small hack is used to bypass NamedParams ATM. We must properly
    implement it once the Hw parameter set is fixed
2025-03-13 17:45:50 +01:00
Baptiste Roux
1242b5200e feat(hpu): Expose custom iop interface through the hl-api
Far from perfect, but makes it possible to start custom IOps on Hpu at the
high_level_api level.
2025-03-13 17:45:50 +01:00
Baptiste Roux
18acffc98a feat(hpu): Add flush in cmp
Flush in reduce/fold shouldn't have negative impact
2025-03-13 17:45:50 +01:00
Baptiste Roux
e5f07a933e feat(hpu,fw): Add flush in ERC_20 2025-03-13 17:45:50 +01:00
Baptiste Roux
a7b3032a1c feat(hpu): Add IOP_IF_THEN_ZERO and IOP_ERC_20
IF_THEN_ZERO is an altered version of IF_THEN_ELSE that takes 0 as default value.
ERC_20 is a custom IOp dedicated to erc_20 computation. It's a first attempt and mainly a placeholder for future work.
It will be used to test various ways to call custom IOps from the HighLevelApi.

Change test macro to support multi-output IOp correctly.
2025-03-13 17:45:50 +01:00
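
The relation between IF_THEN_ZERO and IF_THEN_ELSE described in commit a7b3032a1c can be sketched on clear-text values as below. This is only an illustrative model, not the HPU firmware: the function names `if_then_else` and `if_then_zero`, the `u64` operand type and the arithmetic select are assumptions made for the example.

```rust
/// Clear-text model of IF_THEN_ELSE: returns `a` when `cond` is true, `b` otherwise.
/// (Hypothetical helper; the real IOp operates on encrypted integers.)
fn if_then_else(cond: bool, a: u64, b: u64) -> u64 {
    // Arithmetic select: `c` is 0 or 1, so exactly one of the two terms is non-zero.
    let c = cond as u64;
    c * a + (1 - c) * b
}

/// Clear-text model of IF_THEN_ZERO: IF_THEN_ELSE with the else-branch fixed to 0.
/// (Hypothetical helper, named after the IOp for readability.)
fn if_then_zero(cond: bool, a: u64) -> u64 {
    if_then_else(cond, a, 0)
}
```

With this model, `if_then_zero(true, x)` returns `x` and `if_then_zero(false, x)` returns 0, which is the behavior the commit message describes.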