Compare commits


175 Commits

Author SHA1 Message Date
Baptiste Roux
f7995f591e refactor(hpu): Based on Arthur's review 2025-04-29 18:17:53 +02:00
Baptiste Roux
54b050f214 fix(hpu): Add clear compile error with hw-v80/hw-xrt
Only one of these features can be enabled at a time
2025-04-29 18:07:15 +02:00
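A minimal sketch of how such a guard is commonly written in Rust. The feature names come from the commit subject; the error text and the `selected_hw` helper are illustrative assumptions, not the code from this commit.

```rust
// Hypothetical guard: fails the build with a readable message when both
// hardware features are enabled together.
#[cfg(all(feature = "hw-v80", feature = "hw-xrt"))]
compile_error!("Features `hw-v80` and `hw-xrt` are mutually exclusive: enable only one.");

/// Name of the hardware backend selected at compile time (illustrative helper).
pub fn selected_hw() -> &'static str {
    if cfg!(feature = "hw-v80") {
        "v80"
    } else if cfg!(feature = "hw-xrt") {
        "xrt"
    } else {
        "none"
    }
}
```

Without the guard, enabling both features would typically surface as duplicate-definition errors far from the real cause.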
Baptiste Roux
f8d7e71df8 fix(hpu): Correctly set the input range for test/bench
The previously defined range excluded the upper bound. This mattered little
except for FheBool.
2025-04-29 17:41:28 +02:00
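One plausible reading of this fix, sketched with standard Rust ranges: an exclusive range `0..max` never yields `max`, so for a 1-bit type only `0` is ever tested. The function name and the `width` parameter are assumptions made for illustration.

```rust
use std::ops::RangeInclusive;

/// Inclusive clear-value range for an unsigned type of `width` bits (1..=64).
/// Hypothetical helper, not taken from the test suite.
pub fn input_range(width: u32) -> RangeInclusive<u64> {
    assert!((1..=64).contains(&width));
    let max = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    // `0..=max` includes the upper bound; `0..max` would silently drop it.
    0..=max
}
```

For a boolean (`width == 1`) the exclusive form `0..1` contains only `0`, which is why the missing bound matters most there.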
Baptiste Roux
aa6849bae5 fix(hpu): Add BufWriter in write_hex implementation
Improve io_dump execution time
2025-04-29 17:16:21 +02:00
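A sketch of the general technique, assuming a `write_hex` that emits one hex word per line. The signature and output format are hypothetical; only the use of `std::io::BufWriter` is taken from the commit subject.

```rust
use std::io::{self, BufWriter, Write};

/// Writes each word as one zero-padded hex line.
/// The signature is an assumption; the real `write_hex` may differ.
pub fn write_hex<W: Write>(sink: W, words: &[u32]) -> io::Result<()> {
    // Batch the many small writes into fewer, larger ones.
    let mut out = BufWriter::new(sink);
    for w in words {
        writeln!(out, "{w:08x}")?;
    }
    // Flush explicitly so I/O errors are reported instead of lost on drop.
    out.flush()
}
```

When the sink is a file, each unbuffered `writeln!` can turn into its own system call, which is the usual reason buffering shortens dump time.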
Baptiste Roux
c752720247 feat(hpu): Add io_dump support inside hpu test
Requires the hpu-debug feature and the HPU_IO_DUMP environment variable to be set
NB: This can generate large files and degrade test runtime.
    Use with caution.
2025-04-29 17:16:20 +02:00
Baptiste Roux
5b17f658bd fix(hpu): Let Hpu test be reproducible
Cargo test doesn't enforce test ordering, and in case of failure we want
to be able to rerun only the failing test.
For this purpose the shared RNG was replaced by a new one in each test bundle.
Thus two seeding environment variables are available:
* HPU_KEY_SEED -> Used for key generation, shared by all test bundles
* HPU_TEST_SEED -> Used for cleartext/noise generation, starts fresh in each test bundle
2025-04-29 17:16:12 +02:00
Baptiste Roux
d86cb6467b fix(hpu): Prevent ksk rounding to overflow
Without a mask, ksk coef rounding could overlap with the next coefs
when we pack them on lbz.
Fixes the error observed in hpu_loopback
2025-04-29 17:13:44 +02:00
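The failure mode described here can be sketched on plain integers. The bit widths, function names and packing layout below are assumptions chosen for illustration, not the backend's actual ksk format.

```rust
/// Rounds `coef` (a `total`-bit value) to its `width` most significant bits,
/// then masks the result so it cannot spill into the neighbouring field.
pub fn round_and_mask(coef: u64, total: u32, width: u32) -> u64 {
    assert!(width >= 1 && width < total && total <= 64);
    let shift = total - width;
    // Round to nearest: add half of the dropped range before shifting.
    // The sum is done in u128 so the addition itself cannot overflow.
    let rounded = ((coef as u128 + (1u128 << (shift - 1))) >> shift) as u64;
    // Without this mask, a coefficient near the top of the range rounds up
    // to 2^width, which is one bit too many.
    rounded & ((1u64 << width) - 1)
}

/// Packs coefficients back to back, `width` bits each, lowest index first.
pub fn pack(coefs: &[u64], total: u32, width: u32) -> u64 {
    assert!(coefs.len() as u32 * width <= 64);
    coefs.iter().enumerate().fold(0u64, |acc, (i, &c)| {
        acc | (round_and_mask(c, total, width) << (i as u32 * width))
    })
}
```

With `total = 8` and `width = 4`, the value `0xFF` rounds to 16; unmasked, that extra bit would be OR-ed into the next packed coefficient.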
Baptiste Roux
4bbe002d8b fix(hpu): Remove dedicated decomposition changes for RTL
Indeed, there is an error in the RTL implementation that prevents the decomposition
from being balanced.
The RTL now uses the same decomposition as SW, so there is no longer any need for a
dedicated keyswitch implementation
2025-04-29 09:23:59 +02:00
pgardratzama
1c0202562b feat(hpu): Add Hpu TUniform parameter inside shortint list.
Enables using this Hpu TUniform parameter set inside hpu benches.
Also update hpu targets in Makefile
2025-04-28 15:46:08 +02:00
Baptiste Roux
5e7546756d feat(hpu): Add support for SeedableRng in hpu testsuite.
Use the HPU_TEST_SEED environment variable to force a value (supports bin, oct, dec and hex formats)
2025-04-28 15:46:04 +02:00
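A sketch of how a seed in any of the four formats could be parsed. The `0b`/`0o`/`0x` prefix convention and the `u128` seed width are assumptions; only the variable name `HPU_TEST_SEED` comes from the commit message.

```rust
/// Parses a seed written in binary (`0b`), octal (`0o`), hexadecimal (`0x`)
/// or plain decimal. The prefix convention is an assumption.
pub fn parse_seed(text: &str) -> Option<u128> {
    let t = text.trim();
    let (digits, radix) = if let Some(d) = t.strip_prefix("0x") {
        (d, 16)
    } else if let Some(d) = t.strip_prefix("0o") {
        (d, 8)
    } else if let Some(d) = t.strip_prefix("0b") {
        (d, 2)
    } else {
        (t, 10)
    };
    u128::from_str_radix(digits, radix).ok()
}

/// Reads the seed from `HPU_TEST_SEED`; `None` if unset or malformed.
pub fn seed_from_env() -> Option<u128> {
    std::env::var("HPU_TEST_SEED").ok().and_then(|s| parse_seed(&s))
}
```

A caller would typically fall back to a random seed when `seed_from_env` returns `None` and log the value used, so that a failing run can be replayed.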
Baptiste Roux
67455b46f8 feat(hpu): Use IOP_MEMCPY for hl-api ciphertext clone
Implement clone on HPU with the same semantics as Cpu/Gpu (i.e. deep clone)
2025-04-28 14:00:58 +02:00
Baptiste Roux
9befb1dd64 feat(hpu): Add IOP MEMCPY
Dedicated IOP to copy a ciphertext from one memory slot to another.
This wasn't implemented before, since it could easily be avoided by
reworking the top level algorithm.
However, it's useful for enforcing the same clone semantics at the hl-api level
2025-04-28 09:57:44 +02:00
Helder Campos
01d38a65d5 chore(hpu): Add requirements.txt to the isctrace python lib
Adding a requirements and README to the isctrace python library.
The python library is supposed to use a python virtual environment.
2025-04-28 09:24:39 +02:00
Baptiste Roux
156bc8079b refactor(hpu): HpuDevice hold a ref to Params
Enables sharing a reference with multiple HpuVarWrapped instead of a value. Since HpuVarWrapped are attached to a device, it's meaningful to share a reference instead of duplicating the value.
2025-04-25 18:59:12 +02:00
tmontaigu
ebf3caee4f chore: add token to should-run job 2025-04-25 10:48:17 +02:00
Baptiste Roux
169cedd037 fix(hpu): Following review removed unused code 2025-04-25 09:31:11 +02:00
Baptiste Roux
3f9966504b fix(hpu): fix some lints issues
Fix lint issues that arise following retrieved on hw-team/hpu/dev
2025-04-24 20:16:23 +02:00
Baptiste Roux
73e69cb70a feat(hpu): Update mockup parameters file
Add TUniform parameters, and rename them to know the associated distribution.
Also update them with new parameters structure with Gaussian/TUniform enum.
2025-04-24 19:45:20 +02:00
pgardratzama
00baada810 feat(hpu): adds TUniform HPU parameter set and updates HpuParameters to support both distribution type 2025-04-24 19:45:19 +02:00
Helder Campos
d971fe2de3 chore(hpu): Adding mockup test targets to the main makefile 2025-04-24 19:45:19 +02:00
pgardratzama
0af36e441f chore(hpu): generate KSK with native modulus even in TUniform, use GLWE distribution on ciphertexts, re-enable check on KS variance 2025-04-24 19:45:19 +02:00
pgardratzama
d77fe311bb chore(hpu): uses max on minimal variance calculation, rename a few fct to match security level, remove assert on KS variance for now 2025-04-24 19:45:18 +02:00
pgardratzama
b375f59ac6 feat(hpu): adding support of 1st FPGA TUniform parameter set in HPU noise measurement test 2025-04-24 19:45:18 +02:00
Helder Campos
16d025b6d5 feat(hpu): Adding the option to use ripple carry for lower bit widths
- Don't forget to remove the old kogge_cfg.toml!
2025-04-24 19:45:17 +02:00
Baptiste Roux
b6268cb4b5 fix(hpu): Prevent lfs objects to be pulled by default
Should be the default behavior, but not working as expected initially
2025-04-24 19:36:08 +02:00
Baptiste Roux
0898fc3df1 fix(hpu): Add missing feature guard for hpu 2025-04-24 19:27:41 +02:00
Baptiste Roux
34c25ee9e1 doc(hpu): Move hpu modification in a dedicated PR 2025-04-24 19:26:30 +02:00
Baptiste Roux
0e3256f2cd refactor(hpu): Remove need of feature in ntt key conversion
Instead, use an explicit enum argument to select the Raw or Normalize option
2025-04-24 17:12:51 +02:00
Baptiste Roux
ee1bf0887c refactor(hpu): Use CreateFrom instead of FromWith
Those traits fulfill the same purpose and could be fused.

NB: CreateFrom previously imposed a Copy bound on the metadata that wasn't required; it was thus removed
2025-04-24 17:12:51 +02:00
tmontaigu
07fb2adff3 chore: fix some hpu lints 2025-04-24 17:12:51 +02:00
tmontaigu
2ff7a1364f chore: other fixes 2025-04-24 17:12:50 +02:00
tmontaigu
779da18634 chore: fix compilation errors for feature hpu+gpu 2025-04-24 17:12:50 +02:00
tmontaigu
129d60211d chore: fix clippy_gpu 2025-04-24 17:12:49 +02:00
Baptiste Roux
6342e8cbfb fix(hpu): Remove shortint reference in core_crypto 2025-04-24 17:12:49 +02:00
Baptiste Roux
988b8a551e fix(hpu): Some update following review 2025-04-24 17:12:48 +02:00
Baptiste Roux
b5467fc6fe chore(hpu): Remove author in backend crate 2025-04-24 17:12:48 +02:00
Baptiste Roux
1d93f9c7e1 refactor(hpu): Replace todo with clear panic
As suggested by Arthur, a todo that is not planned to be implemented for the release must panic instead
2025-04-24 17:12:48 +02:00
Baptiste Roux
bf3bd1b26f fix(hpu): Fix sync issue with assign operation
Invalidate the variable state early to prevent the Cpu from viewing the state of the source instead of the dst for assign operations
2025-04-24 17:12:47 +02:00
tmontaigu
8c5f933d08 chore: fix pcc issues 2025-04-24 17:12:47 +02:00
tmontaigu
2947f93ca2 chore: use https for hw_regmap
As the repo is public we can use the https
address, it should not require any kind of key/login
and thus work in the CI
2025-04-24 17:12:46 +02:00
tmontaigu
4096788769 chore: fix hpu bitwise tests 2025-04-24 17:12:46 +02:00
tmontaigu
8b68717e60 chore: add err messages for hpu bool ops 2025-04-24 17:12:46 +02:00
pgardratzama
1a89ee03fc chore(hpu): remove automatic launch of HPU run workflow when PR is updated 2025-04-24 17:12:45 +02:00
tmontaigu
c5e5cfc3f2 chore: small fixes 2025-04-24 17:12:45 +02:00
tmontaigu
bb1c9980c1 chore: fix pcc 2025-04-24 17:12:44 +02:00
tmontaigu
5d7f88a91e chore: fix typos 2025-04-24 17:12:44 +02:00
tmontaigu
59ef1b05f9 chore(hlapi): add more hpu tests 2025-04-24 17:12:44 +02:00
Baptiste Roux
9c979453c4 fix(hpu): First round of review by IceTDrinker
Still WIP, not all remarks are taken into account yet
2025-04-24 17:12:43 +02:00
Baptiste Roux
ba59619cbb refactor(hpu): Remove the need of serial_test crate
Merge hpu tests in the same file and rely on contention over the HpuDevice
mutex to serialize them.
2025-04-24 17:12:43 +02:00
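The serialization idea can be sketched with the standard library alone. `FakeDevice` is a stand-in invented for this example; the real `HpuDevice` and its initialization are not shown in this page.

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Stand-in for the real device handle; it only counts the tests it served.
pub struct FakeDevice {
    pub tests_run: u32,
}

// One shared device for the whole test binary, created on first use.
static DEVICE: OnceLock<Mutex<FakeDevice>> = OnceLock::new();

/// Every test takes the device through this function. Holding the guard
/// for the whole test body is what serializes the tests.
pub fn lock_device() -> MutexGuard<'static, FakeDevice> {
    DEVICE
        .get_or_init(|| Mutex::new(FakeDevice { tests_run: 0 }))
        .lock()
        // A panicking test poisons the mutex; recover so later tests still run.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Shape of a test body: the guard lives until the end of the function.
pub fn run_one_test() -> u32 {
    let mut dev = lock_device();
    dev.tests_run += 1;
    dev.tests_run
}
```

Because the test harness runs tests on several threads, whichever test grabs the mutex first runs to completion before the next one starts, with no extra crate needed.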
Baptiste Roux
02c41fcb05 fix(hpu): init_device return a Result
Prevent init of hpu device with incompatible key
2025-04-24 17:12:42 +02:00
Baptiste Roux
ad0b511510 refactor(hpu): move packed_struct inside isc_trace
There is no need for a separate crate, so fuse the code with isc_trace, the only module that uses it
2025-04-24 17:12:42 +02:00
Baptiste Roux
38a0d173d2 fix(hpu): Second round of review from tmontaigu 2025-04-24 17:12:42 +02:00
Baptiste Roux
d745a59dff fix(hpu): Fix issue with heap allocation
Heap reserved slots mustn't be viewed as available memory by the backend.
However, we keep them in the allocation process for correct simulation
through the mockup.
2025-04-24 17:12:41 +02:00
Baptiste Roux
ed72456b8a refactor(hpu): Move parameters definition in shortint 2025-04-24 17:12:41 +02:00
Baptiste Roux
42bc12bc0d chore(hpu): Add license file for backend and mockup 2025-04-24 17:12:40 +02:00
Baptiste Roux
ed1b34cffd refactor(hpu): Remove old NttArchitecture support
Also remove associated ordering/network that weren't needed anymore.
Integrate Thomas's rework on RadixBasisConv.
2025-04-24 17:12:40 +02:00
Baptiste Roux
48053a88a8 fix(hpu): First round of review 2025-04-24 17:12:39 +02:00
Baptiste Roux
09b3567bba chore(hpu): Correct EOF lint errors 2025-04-24 17:12:39 +02:00
Baptiste Roux
b3b9631725 fix(hpu): Correctly hide hpu related stuff behind feature flag 2025-04-24 17:12:39 +02:00
Baptiste Roux
9cf0b911d8 chore(hpu): Fix clippy warning 2025-04-24 17:12:38 +02:00
Baptiste Roux
1529d28011 chore(noise-formula): Fix fmt issue in variance_formula 2025-04-24 17:12:38 +02:00
Baptiste Roux
2a8af7a50e feat(hpu): Add makefile target to start CI test over Hpu
Also fix issue in integer_bench target (missing feature required after
the rebase)
2025-04-24 17:12:37 +02:00
Baptiste Roux
347858b2a0 chore(hpu): Change Qdma queue
Prevent conflict with queue used to upload tandem bitstream
2025-04-24 17:12:37 +02:00
Baptiste Roux
cd0c4589cc chore(hpu): Rename aved feature flag into v80
Reflect the target board name instead of the SW stack.
2025-04-24 17:12:37 +02:00
Baptiste Roux
21d7e26236 docs(hpu): Update Hpu backend documentation
Update hpu backend readme with new features name and examples
2025-04-24 17:12:36 +02:00
Baptiste Roux
d7a56cf91f fix(hpu): Various fixes on benches following rebase on main
Update parameters set definition to match new release number and update
benches body to fit with new structure
2025-04-24 17:12:36 +02:00
Baptiste Roux
5155763879 docs(hpu): Update Hpu mockup documentation
Update Readme.md and associated files
2025-04-24 17:12:35 +02:00
Baptiste Roux
c7c98f05e9 feat(hpu): Bind load_ksk_rcp_dur runtime counter in tfhe-hpu-backend 2025-04-24 17:12:35 +02:00
Baptiste Roux
7b75468e7e chore(hpu): Fix Clippy with tfhe-hpu-backend hw-aved features 2025-04-24 17:12:35 +02:00
Baptiste Roux
77e01b862e chore(hpu): Fix Clippy with tfhe-hpu-backend utils features 2025-04-24 17:12:34 +02:00
Baptiste Roux
5ce167cce1 chore(hpu): Add dedicated clippy_hpu_backend entry in makefile
Fix all generated clippy warning/errors
2025-04-24 17:12:34 +02:00
Baptiste Roux
36543536c0 test(hpu-noise): Fix issue with hpu_noise test
Since the split between ntt64/ntt64_bnf, hpu_noise tests must use the bnf version explicitly
2025-04-24 17:12:33 +02:00
Baptiste Roux
6118f0ba59 test(hpu): Correctly disable hpu tests when feature is disabled 2025-04-24 17:12:33 +02:00
Baptiste Roux
8e080afac2 test(hpu): Fix hpu_entities tests 2025-04-24 17:12:33 +02:00
Baptiste Roux
6833a5f9f9 chore(hpu): Fix clippy error in Hpu related files 2025-04-24 17:12:32 +02:00
Baptiste Roux
2067de2904 chore(hpu): Fix fmt issue within Hpu Backend and associated files 2025-04-24 17:12:32 +02:00
Baptiste Roux
b1cf20babd chore(hpu): Fix typos in Hpu backend and associated files 2025-04-24 17:12:31 +02:00
pgardratzama
386c454adf chore(ci): adds 128b to HPU integer bench, runs these benches with --quick 2025-04-24 17:12:31 +02:00
Helder Campos
2a303fb00d feat(hpu): Speeding up LLT generation
- The 128bit LLT FW generation speed is now bearable.
- The kogge stone config is now linked to the minimum batch size, not
  the maximum batch size.
- Enabled 128bit generation in hpu_config.toml for AVED and SIM.
2025-04-24 17:12:30 +02:00
Helder Campos
93fbc2bde7 feat(hpu): Adding a min batch size option for FW generation
- The old hardcoded min_batch_size is now configurable in the
  hpu_config.toml.
2025-04-24 17:12:30 +02:00
Helder Campos
8d2ada1f1f feat(hpu): Add demo FW and trace 2025-04-24 17:12:30 +02:00
Baptiste Roux
8ecbfeb1db feat(hpu): Add function to enforce Ciphertext ordering
Iterating over various integer widths on large datasets could introduce
some memory fragmentation.
Add a dedicated function to reorder the memory and prevent this effect,
which could marginally impact performance in benchmarks.
2025-04-24 17:12:29 +02:00
Baptiste Roux
47e7c00ec0 feat(hpu-xrt): Update U55C bitstream
This bitstream adds support for Multi-width IOp and Flush configuration
2025-04-24 17:12:29 +02:00
JJ-hw
b2e906710f feat(hpu): Update register map
Add new registers:
 * bpip_use_opportunism: used to control the pep_pbs flush strategies
 * counters/info in pe_pbs: used to enhance debug/analysis

Some cleanup: removed nonexistent fields and renamed bpip_used to bpip_use
2025-04-24 17:12:28 +02:00
JJ-hw
28b3630b22 feat(hpu): Add bpip_use_opportunism register to select the usage of opportunism when BPIP.
Update regif toml (also contains new registers - WIP).
Remove access to registers that do not exist anymore
2025-04-24 17:12:28 +02:00
Baptiste Roux
9b381bdf95 feat(hpu): Move opportunistic config in RtlConfig
Rename opportunistic to flush_opportunism and retrieve the value from
RtlConfig instead of HpuParameters
2025-04-24 17:12:27 +02:00
Baptiste Roux
b85d024648 feat(hpu): Add dedicated keyswitch implementation
Use a dedicated keyswitch implementation that uses an unbalanced keyswitch.
Enables generating bit-accurate stimulus without needing a feature
flag inside the decomposer implementation.
2025-04-24 17:12:27 +02:00
Baptiste Roux
f165e4061e fix(hpu): Correct minor error in hpu test
Add correct feature flag and update hpu_entities test
2025-04-24 17:12:26 +02:00
Helder Campos
2022410288 fix(hpu): Fixing the trace to work with the new register model 2025-04-24 17:12:26 +02:00
pgardratzama
c8e6d34141 chore(ci): adds Makefile target for erc20 HPU bench
Adds in the HPU workflow (not tested yet)
2025-04-24 17:12:26 +02:00
Baptiste Roux
b221f49cc7 feat(hpu): Benches enable throughput test in Erc20 bench
Also tweak ERC_20 Fw flag to enhance performances.
2025-04-24 17:12:25 +02:00
Helder Campos
d21993a08e feat(hpu): psi64 mockup config 2025-04-24 17:12:25 +02:00
Helder Campos
f33773f2eb fix(hpu): Fixing Llt comparison operations 2025-04-24 17:12:24 +02:00
pgardratzama
dfbd95a1dd chore(ci): repeat all env variable in Makefile target or hpu integer bench 2025-04-24 17:12:24 +02:00
pgardratzama
37d02f7f84 chore(ci): trying to setup HPU env for bench 2025-04-24 17:12:24 +02:00
pgardratzama
12078f4f6b fix(bench): integer bench was not compiling anymore
Fix typos that prevent compilation
2025-04-24 17:12:23 +02:00
pgardratzama
f0c7061df6 chore(ci): Update hpu CI entries
- restrict HPU bench to size 8,16,32,64 for now
- use Llt by default
- set min batch size to 10 to adapt to HPU RxPSI=128
- update Makefile to load HPU config
2025-04-24 17:12:23 +02:00
David Testé
39e016a564 chore(ci): fail workflow if job step fails 2025-04-24 17:12:22 +02:00
Baptiste Roux
f1f79e3427 fix(hpu): Correctly handle rejected AMI request
AMI can reject requests if its queues are full. Handle this correctly with a retry loop.
2025-04-24 17:12:22 +02:00
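A generic retry loop of the kind described, as a sketch. The `Submit` enum, the attempt limit and the fixed backoff are assumptions; the real AMI interface and its rejection signal are not shown in this page.

```rust
use std::{thread, time::Duration};

/// Outcome of one submission attempt. Both variants are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub enum Submit {
    Accepted,
    Rejected,
}

/// Calls `try_submit` until it is accepted or `max_attempts` is exhausted.
/// On success, returns the number of attempts used.
pub fn submit_with_retry<F>(
    mut try_submit: F,
    max_attempts: u32,
    backoff: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Submit,
{
    for attempt in 1..=max_attempts {
        if try_submit() == Submit::Accepted {
            return Ok(attempt);
        }
        // Give the queue time to drain before trying again.
        thread::sleep(backoff);
    }
    Err(format!("request still rejected after {max_attempts} attempts"))
}
```

Bounding the number of attempts turns a permanently full queue into a reported error rather than a hang.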
David Testé
5f3ccd1192 chore(ci): use ssh-agent to get hw_regmap dependency 2025-04-24 17:12:22 +02:00
Helder Campos
bda1ea170f feat(hpu): Additions and tiny improvements to the trace python library 2025-04-24 17:12:21 +02:00
Helder Campos
1e23982dd0 fix(hpu): Fixing SSUB on llt 2025-04-24 17:12:21 +02:00
David Testé
c70af4b3bc chore(ci): add make recipe and workflow to run benchmarks 2025-04-24 17:12:20 +02:00
Baptiste Roux
0ffcc511e2 fix(hpu): Fw fix issue with pbs flushing in CMP IOp
Previously the case integer-w == 2 triggered underflow.
2025-04-24 17:12:20 +02:00
Helder Campos
8e86b2beb7 feat(hpu): Adding clear-text add/sub to llt 2025-04-24 17:12:20 +02:00
Baptiste Roux
f30dfa80a9 feat(hpu): Add support for IOpProto parsing through CLI
Enables correct input generation for custom IOps
2025-04-24 17:12:19 +02:00
Baptiste Roux
b54d54149d chore(hpu): Update hpu/utils CLI
Use same name in all binary
2025-04-24 17:12:19 +02:00
Baptiste Roux
acc174a06f feat(hpu): Fw add flush in Ilp firmware
Ilp is used in Rtl simulation; this prevents batches from being triggered by timeout
and thus should reduce the simulation time.
It should also improve IOp latency, but for latency-optimized IOps users
should use the Llt fw impl.
2025-04-24 17:12:18 +02:00
Baptiste Roux
6a5cf3218c chore(hpu): Fix warning
Should have been done in the previous commit
2025-04-24 17:12:18 +02:00
Baptiste Roux
a65babadf4 feat(hpu): Fw add multithreading for tr_table generation
Use rayon to generate each table entry in parallel
2025-04-24 17:12:17 +02:00
Baptiste Roux
868a13bd63 fix(hpu): Update fw
Add new entry in OpCfg and required parsing/format for CLI support
2025-04-24 17:12:17 +02:00
Baptiste Roux
64792697f1 fix(hpu): Benches correctly wait on results.
Blackbox is not enough with Hpu context, we must wait on result availability (i.e. synced back on Host)
2025-04-24 17:12:17 +02:00
Baptiste Roux
0e243b0587 fix(hpu): Correctly activate hpu feature when some hpu-hw is enabled 2025-04-24 17:12:16 +02:00
Baptiste Roux
701641bf18 fix(hpu): Benches Fix issue with 128b tfhe-rs integer benches 2025-04-24 17:12:16 +02:00
Baptiste Roux
d2e84f9701 fix(hpu): Remove the upper-bound in ct defrag algorithm
This upper bound prevented correct tfhe-rs integer bench execution.
The drawback is a potential perf degradation on huge defrag windows.
Have to check on real HW.
2025-04-24 17:12:15 +02:00
Baptiste Roux
7043510322 feat(hpu): Mockup add a flag nops for fast simulation
Bypass Tfhe operations for fast simulation.
This obviously breaks the behavior but keeps the performance estimation accurate.

For accurate behavior with fast runtime, use the `fast` parameter set.
NB: This keeps the behavior correct but breaks the performance estimation.

Not a perfect solution, but it should mitigate our runtime issue until proper
computation over trivial ciphertexts is supported.
2025-04-24 17:12:15 +02:00
Helder Campos
1f82ed9ac6 chore(hpu): Small improvement to the rtl graph infrastructure
- This affects debugging features only
2025-04-24 17:12:15 +02:00
Baptiste Roux
22c917d9b0 feat(hpu): Benches add hpu support in benches-integer
Modify Integer keycache to be able to store HpuDevice alongside the key when needed.
2025-04-24 17:12:14 +02:00
Baptiste Roux
23338229fa feat(hpu): Benches add Hpu parameters in KeyCache
Also update hlapi benches to use the NamedParams traits with Hpu
2025-04-24 17:12:14 +02:00
Helder Campos
02d408a74a feat(hpu): More scheduling control for Llt firmware. 2025-04-24 17:12:13 +02:00
Helder Campos
8a38f70304 fix(hpu): Adding the new registers to the sim config_store
- Fixing also the Llt firmware for a message width of 2.
2025-04-24 17:12:13 +02:00
Helder Campos
7daa8e7d7d chore(hpu): Dividing Ilp and Llt firmware
- Llt stands for low latency. All instructions there should be optimized
  to have the least latency possible, at the expense of throughput.
2025-04-24 17:12:13 +02:00
Helder Campos
37a544ac4e feat(hpu): Improving the comparison operations 2025-04-24 17:12:12 +02:00
Helder Campos
d17c698b0f feat(hpu): Improving the RTL scheduler
It now prioritizes low latency instructions to make a better decision
later with the high latency instructions.

This only has an impact on low usage IOPs, such as the erc20. It gets
slightly better results than the hand scheduled code, although I won't
be enabling it right now since it requires precise flushing behavior,
which seems to elude us right now.
2025-04-24 17:12:12 +02:00
Helder Campos
df8c86367b feat(hpu): Improving ERC20
Flushes currently have a negative impact, so they are disabled
2025-04-24 17:12:11 +02:00
Baptiste Roux
4654dba69d feat(hpu): HlApi enable 128b operation in hlapi bench 2025-04-24 17:12:11 +02:00
Baptiste Roux
fbb55001fd fix(hpu): Disable MULF/MULSF
Currently triggers an overflow when used with 128b
2025-04-24 17:12:11 +02:00
Baptiste Roux
916b3fd9e8 feat(hpu): Add Hpu support in erc20 benchmark
Hpu only supports the whitepaper variant and a custom implementation.

There seem to be some allocation issues for throughput tests.
They are disabled for the moment.

NB: A small hack is used to bypass NamedParams ATM. We must properly
    implement it once the Hw parameter set is fixed
2025-04-24 17:12:10 +02:00
Baptiste Roux
8b6d97c043 feat(hpu): Expose custom iop interface through the hl-api
Far from perfect, but enables starting a custom IOp on Hpu at the
high_level_api level.
2025-04-24 17:12:10 +02:00
Baptiste Roux
9ed57aaa9f feat(hpu): Add flush in cmp
Flush in reduce/fold shouldn't have negative impact
2025-04-24 17:12:09 +02:00
Baptiste Roux
816260c6ec feat(hpu): Fw add flush in ERC_20 2025-04-24 17:12:09 +02:00
Baptiste Roux
82df829780 feat(hpu): Add IOP_IF_THEN_ZERO and IOP_ERC_20
IF_THEN_ZERO is an altered version of IF_THEN_ELSE that takes 0 as the default value.
ERC_20 is a custom iop dedicated to erc_20 computation. It's a first attempt and mainly a placeholder for future work.
It will be used to test various ways to call a custom iop from HighLevelApi.

Change test macro to support multi-output IOp correctly.
2025-04-24 17:12:09 +02:00
Baptiste Roux
64212e9537 feat(hpu): HlApi bind IfThenElse operation in the hl_api 2025-04-24 17:12:08 +02:00
Baptiste Roux
3b7a506d43 feat(hpu): HlApi bind Hpu cmp in the HighLevel API 2025-04-24 17:12:08 +02:00
Baptiste Roux
36c40c6800 chore(hpu): Reduce integer-w support in simulation
The aim is to reduce boot time.
2025-04-24 17:12:07 +02:00
Baptiste Roux
2b4b5bf316 feat(hpu): Change Cmp Iop prototype
These IOps now return a Boolean instead of a full-width ciphertext.
This raised an issue in fw tr_table selection, which was fixed in the mockup
model and in the arm fw.
2025-04-24 17:12:07 +02:00
Baptiste Roux
f83173f225 feat(hpu): Rework exposed API
IOps have a clear prototype to enable runtime checks of IOp arguments
and to generate accurate inputs for test/bench.

The new API supports:
 * Multi-output IOps
 * Asymmetric IOps (to some extent, i.e. only relative width)

Hpu examples/quick bench rely on the Integer abstraction. There is no more need for
the explicit API, which was only an interface showcase/proposal superseded by
the high_level_api integration.

Update test framework:
Now uses the reworked API at integer level (same as example/quick bench).
All tests are gathered in the same file and use a common backend to reduce init time.
Also adds a 128b test and an `if_then_else` one.

Feature hpu-xfer wasn't used anymore; remove it and its associated files
2025-04-24 17:12:06 +02:00
Baptiste Roux
ab6b398743 feat(hpu): Add IfThenElse IOp
Still WIP; must be hooked up in hlapi
2025-04-24 17:12:06 +02:00
Baptiste Roux
e0330c2306 refactor(hpu): Replace hpu example with all-in-one
All previous Hpu example applications can be superseded by an all-in-one
application, which reduces maintenance work.
Delete the old showcase application of the explicit xfer API (replaced by the hlapi showcase)
NB: The explicit API is currently kept for bench/test purposes only.
2025-04-24 17:12:06 +02:00
Baptiste Roux
3165f87831 feat(hpu): Add support for 128b IOp
Extend Fw table entry and update configuration
2025-04-24 17:12:05 +02:00
Baptiste Roux
d26bb98313 chore(hpu): Reduce log verbosity in fw generation
The previous level always cluttered the trace when used at backend level
2025-04-24 17:12:05 +02:00
Helder Campos
83fd943d64 fix(hpu): Fixing the mockup forced flushes 2025-04-24 17:12:04 +02:00
Helder Campos
dcea55688a fix(hpu): Changing the default MUL back to the legacy
This is until we have the flushes working correctly
2025-04-24 17:12:04 +02:00
Helder Campos
957eefe547 feat(hpu): A better mockup
- All the work done here aims at an accurate mockup. For
  that purpose, I ended up having to change the RTL scheduling
  infrastructure and most of the PE logic.
- Added a new parameter set reflecting the current RTL state in the
  mockup.
- Changed the python trace library to both read hw and mockup traces for
  better comparison and statistic collection.
2025-04-24 17:12:04 +02:00
Helder Campos
dce6136639 feat(hpu): Adapting the isc-trace format for the new ISC trace
- The new format is a consequence of the new flush barrier feature
2025-04-24 17:12:03 +02:00
Helder Campos
2a7bc57f97 feat(hpu): Compiled Multiplier
- The MUL/ADD/SUB IOPs now use the compiled rtl framework
- Fixed some bugs in the framework for it to be used with the
  multiplication.
- Added the option to have rtl configuration per IOP
2025-04-24 17:12:03 +02:00
Baptiste Roux
4537de5704 feat(hpu): HlApi enhance showcase example and add support for benchmark
Show that Hpu supports multiple integer widths at the same time.
Fix an issue with the `wait` implementation and expose it through a trait
to use it in benchmarks.

WARN: The benchmark currently disables unavailable operations
2025-04-24 17:12:02 +02:00
Baptiste Roux
c56fbd1329 feat(hpu): Rework memory allocation
Use lock-free structure instead of backend lock.
Enable retry in case of allocation failure
2025-04-24 17:12:02 +02:00
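To illustrate the idea of replacing a lock with atomic operations plus a retry, here is a toy slot allocator built on a single atomic bitmap. It is not the backend's allocator: the 64-slot limit, the type name and the retry policy are all assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Up to 64 slots tracked in one atomic bitmap: bit i set means slot i is taken.
pub struct SlotPool {
    used: AtomicU64,
    slots: u32,
}

impl SlotPool {
    pub fn new(slots: u32) -> Self {
        assert!((1..=64).contains(&slots));
        Self { used: AtomicU64::new(0), slots }
    }

    /// Claims the lowest free slot, or returns `None` when the pool is full.
    pub fn try_alloc(&self) -> Option<u32> {
        let mut cur = self.used.load(Ordering::Acquire);
        loop {
            let idx = (!cur).trailing_zeros();
            if idx >= self.slots {
                return None;
            }
            match self.used.compare_exchange_weak(
                cur,
                cur | (1 << idx),
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(idx),
                // Another thread changed the bitmap: retry with the fresh value.
                Err(seen) => cur = seen,
            }
        }
    }

    /// Releases a slot previously returned by `try_alloc`.
    pub fn free(&self, idx: u32) {
        self.used.fetch_and(!(1u64 << idx), Ordering::AcqRel);
    }

    /// Retries a failed allocation a bounded number of times, yielding between attempts.
    pub fn alloc_with_retry(&self, attempts: u32) -> Option<u32> {
        (0..attempts).find_map(|_| {
            self.try_alloc().or_else(|| {
                std::thread::yield_now();
                None
            })
        })
    }
}
```

The compare-and-swap loop never blocks other threads, and the outer retry gives concurrent users a chance to free a slot before the allocation is reported as failed.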
Baptiste Roux
32909b087f feat(hpu): Add support for multiple integer width
WARN: This requires an update of the ARM FW and breaks compatibility with u55c
Multiple translation tables can now be used at the same time.
Thus, with the same translation table we can use different integer_width values and
create applications that use different FheUint types at the same time.

NB: Only symmetric operations (i.e. where dst_align == src_align) are
    currently supported
2025-04-24 17:12:02 +02:00
JJ-hw
8f4dad0802 feat(hpu): Do not read workq register in aved configuration 2025-04-24 17:12:01 +02:00
JJ-hw
1ea69ff0d0 feat(hpu): Add regif toml for 4xregif HPU 2025-04-24 17:12:01 +02:00
JJ-hw
2b76e79338 feat(hpu): Update HPU bitstream and regif toml for u55c 2025-04-24 17:12:00 +02:00
JJ-hw
16813407e2 fix(hpu): Forgot to update this register's name 2025-04-24 17:12:00 +02:00
JJ-hw
4b635cf3e7 feat(hpu): Change register names as the one used on the .toml files. 2025-04-24 17:11:59 +02:00
Baptiste Roux
9d5ee60524 chore(hpu): Fix issue when used on computer without V80 2025-04-24 17:11:59 +02:00
Helder Campos
eac9e36b55 feat(hpu): ISC Trace support 2025-04-24 17:11:59 +02:00
lsainati
84cf06e4e6 docs(hpu): small syntax correction 2025-04-24 17:11:58 +02:00
lsainati
fc5c94d99a chore(hpu): Flow add a flexible way to fetch pcie device
Check that we have only one card.
Since we are initializing qdma here, let's load the kernel module as well
2025-04-24 17:11:58 +02:00
Baptiste Roux
92603a6334 chore(hpu): Reduce fw generation verbosity
The warning about the MAC anti-pattern is emitted in the asm file instead of as a WARN in the trace messages
2025-04-24 17:11:57 +02:00
Helder Campos
984a37a4ae fix(hpu): Bad carry input for the subtraction 2025-04-24 17:11:57 +02:00
Helder Campos
fae7df92a4 feat(hpu): Using Thomas's carry propagation trick on the kogge-stone
While this actually halves the PBSs required to merge the look-ahead
carry computation, it doesn't halve the latency because of the overhead
needed to extract and add carry. Still saves some time though.
2025-04-24 17:11:56 +02:00
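For readers unfamiliar with the Kogge-Stone structure, here is a plaintext sketch of its parallel-prefix carry computation on ordinary integers. It shows only the textbook network; the carry propagation trick and the PBS-based homomorphic version mentioned in the commit are not reproduced here.

```rust
/// Adds two `width`-bit numbers with a Kogge-Stone parallel-prefix carry network.
/// Returns (sum mod 2^width, carry out). Plaintext illustration only.
pub fn kogge_stone_add(a: u64, b: u64, width: u32) -> (u64, bool) {
    assert!((1..=63).contains(&width));
    let mask = (1u64 << width) - 1;
    let (a, b) = (a & mask, b & mask);
    let mut g = a & b; // generate: this bit produces a carry by itself
    let mut p = a ^ b; // propagate: this bit forwards an incoming carry
    let half_sum = p;
    let mut dist = 1;
    // log2(width) rounds; each round doubles the span covered by every bit.
    while dist < width {
        g |= p & (g << dist);
        p &= p << dist;
        dist *= 2;
    }
    // Bit i of `g` is now the carry out of position i.
    let carries = (g << 1) & mask;
    (half_sum ^ carries, (g >> (width - 1)) & 1 == 1)
}
```

The number of rounds grows with the logarithm of the width, which is why this structure suits wide integers while a ripple-carry chain can remain competitive for narrow ones.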
Baptiste Roux
630865925a feat(hpu-aved): Add support for Aved backend (i.e V80)
Configuration was changed to enable explicit selection of the memory kind
during allocation (i.e. DDR or HBM).
NB: HBM allocations use a custom allocator; DDR ones are based on offsets
    and leave full freedom to the user (no check on overlaps is done)

Add trace_mem in config and associated code to set its addr in the hpu registers

Runtime configuration has changed; now we only have:
* sim for mockup simulation
* u55c_gf64 for u55c gf64_msg2_carry2 execution
* aved for v80 execution
NB: Look at Readme.md in each configuration folder to find synthesis
details.

The associated `setup_hpu.sh` script has evolved to enable fine-grained
configuration of things such as the PCIe id for the Aved board.
The associated Toml file also supports ShellString instead of String and
interpolates the String entry with environment variables.
This makes it possible to have a configuration that works on multiple servers with
different PCIe ids and so on.
2025-04-24 17:11:56 +02:00
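The interpolation of environment variables into configuration strings can be sketched as follows. The `${NAME}` syntax, the function names and the error handling are assumptions; the actual `ShellString` type may follow different rules.

```rust
/// Expands `${NAME}` occurrences using `lookup`; unknown names are an error.
/// Hypothetical helper, not the backend's `ShellString` implementation.
pub fn expand<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| String::from("unterminated variable reference"))?;
        let name = &after[..end];
        let value = lookup(name).ok_or_else(|| format!("undefined variable `{name}`"))?;
        out.push_str(&value);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Same expansion, reading values from the process environment.
pub fn expand_env(input: &str) -> Result<String, String> {
    expand(input, |name| std::env::var(name).ok())
}
```

Passing the lookup as a closure keeps the expansion testable without touching the real environment; a configuration entry containing a placeholder would then resolve differently on each server.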
Helder Campos
26f09db623 feat(hpu): PBS Latency Emulation Improvement
- Improved the PBS latency emulation in the mockup by taking into
  account the size of the batch.
- Added an option to fill the PBS batch FIFO as much as possible in the
  kogge stone scheduling.
- Added an option to use the minimum batch size to schedule the kogge
  stone. This improves performance just because the allocation is a bit
  better, but it was mostly luck, as the architecture remained the same
  (ie the amount of work to do is the same).
  By manual inspection, there's an even better scheduling strategy by
  mixing minimum batch sizes with non minimum sizes though, which would
  save us one batch. However, doing this requires to write code that
  would keep track of the current scheduling and try to optimize it
  further, which would be a lot of work to save a single batch.
2025-04-24 17:11:55 +02:00
Helder Campos
7fb5f3baa3 chore(hpu): Cargo fmt and Clippy lint 2025-04-24 17:11:55 +02:00
Helder Campos
6100323d6a feat(hpu): PBS batch flush
- Fixed also a scheduler bug in the timeout handling
2025-04-24 17:11:54 +02:00
Baptiste Roux
6d720e054c feat(hpu): Update hw_regmap dependency
Use new regmap format and update interface when required.
Also fix hw_regmap revision instead of using a branch name.
2025-04-24 17:11:54 +02:00
Baptiste Roux
ffda733109 feat(hpu): Revamp configuration in Hpu backend and Mockup
Instead of a set of configuration files, configuration is now based on
one source of truth -> HpuParameters (extracted from RTL registers).
All other configuration structures are now derived from it.

This prevents the user from using mismatched configurations (e.g. fw generation
with a cost model that doesn't match the real Hw).
It also reduces the complexity of configuration file generation for Fpga
simulation.

Drop the ron dependency in favor of TOML, so that all our
configuration files rely on the same format.
Since TOML doesn't support numbers bigger than i64, we use an enum to pass
the prime in use instead of its raw value
2025-04-24 17:11:53 +02:00
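The enum workaround can be sketched as below. The variant name and the choice of the prime 2^64 - 2^32 + 1 are assumptions suggested by the `gf64` naming elsewhere in this list; the actual enum in the backend may list other primes.

```rust
/// Named prime selectable from a TOML string. Variant name and value are
/// assumptions made for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Prime {
    /// 2^64 - 2^32 + 1, which does not fit in a signed 64-bit TOML integer.
    Gf64,
}

impl Prime {
    /// Raw modulus associated with the named prime.
    pub const fn value(self) -> u64 {
        match self {
            Prime::Gf64 => 0xFFFF_FFFF_0000_0001,
        }
    }

    /// Maps the string found in the configuration file to a variant.
    pub fn from_name(name: &str) -> Option<Self> {
        match name {
            "GF64" => Some(Prime::Gf64),
            _ => None,
        }
    }
}
```

The configuration file then carries a short name instead of a literal that would overflow the TOML integer range, and the raw value lives in code where a `u64` is available.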
Helder Campos
86f8882fa9 feat(hpu): Add support for manyLUT and kogge-stone adder
* Kogge-stone adder required dynamic scheduling based on Hw parameters:
  - Moved isc/ from the mockup to the hpu backend as isc_sim
  - Extracted simulation parameters from the mockup to the backend
  - Made the rtl/ framework to use the new isc_sim framework
  - Many additions to the pe module to be able to plug it into the rtl/
    usecase
  - Bug fixes to rtl
  - Added a new PE report to the mockup with batch/issue/usage statistics

* Update config_store:
Remove old bitstream that couldn't be used with the current SW version.
Add New bitstream with HisV2+ManyLut feature.
Rename sim_pem2 to sim
2025-04-24 17:11:53 +02:00
Baptiste Roux
076272f256 feat(hpu): Implement new version of the Hpu Instruction Set
In-depth rework of the DOp/IOp definition.
The DOp structure is almost the same; however, the asm definition/parser
were reworked with the following objectives:
* Template as first class citizen
* Support of Immediate template
* Direct parser and conversion between Asm/Hex
* Replace deku (and its associated endianness limitation) by
  bitfield_struct and manual parsing

The IOp structure was deeply reworked. The aim was to introduce more
flexibility.
* Support a variable number of Destinations
* Support a variable number of Sources
* Support a variable number of Immediate values
* Support of multiple bitwidths (not implemented yet in the Fpga
  firmware)

Add a lot of information in the IOp prototype that could be used at
runtime for proper error handling.
2025-04-24 17:11:52 +02:00
pgardratzama
876ba9ad91 feat(hpu-noise): Add dedicated test for Hpu noise measurements
Use variance formulas from latest optimizer fpga model in wip/fpga-poc
Add Csv for report generation
Test currently activated on FPGA & CPU 132b parameters set
2025-04-24 17:11:52 +02:00
tmontaigu
016c3e3e61 feat(hlapi): start adding HPU 2025-04-24 17:11:51 +02:00
tmontaigu
64eeff0742 fix(hpu): fix invalid degree after copy from hpu 2025-04-24 17:11:51 +02:00
Baptiste Roux
5831c85279 fix(hpu): Generate bit-accurate stimulus
Some changes inside tfhe-rs to be bit-accurate with the HW.
Hacky approach that relies on a feature; must be reworked later.
2025-04-24 17:11:50 +02:00
Baptiste Roux
0e11ae30a7 feat(hpu): Add Hpu backend implementation
This backend abstracts communication with the Hpu Fpga hardware.
It defines its own entities to prevent circular dependencies with
tfhe-rs.
Object lifetime is handled through an Arc<Mutex<T>> wrapper, which enforces
that all objects currently alive in the Hpu Hw are also kept valid on the
host side.

Also add a Mockup implementation of the Hpu for simulation and debug purposes.
2025-04-24 17:11:50 +02:00
Baptiste Roux
e12397ad69 fix(shortint): Fix issue with LUT generation
Encoding with NonNative ciphertext wasn't correct
2025-04-22 15:47:10 +02:00
657 changed files with 10219 additions and 20774 deletions

View File

@@ -8,6 +8,9 @@ inputs:
gcc-version:
description: Version of GCC to use
required: true
cmake-version:
description: Version of cmake to use
default: 3.29.6
github-instance:
description: Instance is hosted on GitHub
default: 'false'
@@ -19,58 +22,41 @@ runs:
- name: Install dependencies
shell: bash
run: |
wget https://github.com/Kitware/CMake/releases/download/v"${CMAKE_VERSION}"/cmake-"${CMAKE_VERSION}"-linux-x86_64.sh
echo "${CMAKE_SCRIPT_SHA} cmake-${CMAKE_VERSION}-linux-x86_64.sh" > checksum
sha256sum -c checksum
sudo bash cmake-"${CMAKE_VERSION}"-linux-x86_64.sh --skip-license --prefix=/usr/ --exclude-subdir
sudo apt update
sudo apt install -y cmake-format libclang-dev
env:
CMAKE_VERSION: 3.29.6
CMAKE_SCRIPT_SHA: "6e4fada5cba3472ae503a11232b6580786802f0879cead2741672bf65d97488a"
curl -fsSL https://apt.kitware.com/keys/kitware-archive-latest.asc | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/kitware.gpg
sudo chmod 644 /etc/apt/trusted.gpg.d/kitware.gpg
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/kitware.gpg] https://apt.kitware.com/ubuntu/ jammy main' | sudo tee /etc/apt/sources.list.d/kitware.list >/dev/null
sudo apt update
sudo apt install -y cmake cmake-format libclang-dev
- name: Install CUDA
if: inputs.github-instance == 'true'
shell: bash
run: |
TOOLKIT_VERSION="$(echo ${CUDA_VERSION} | sed 's/\(.*\)\.\(.*\)/\1-\2/')"
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/${env.CUDA_KEYRING_PACKAGE}
echo "${CUDA_KEYRING_SHA} ${CUDA_KEYRING_PACKAGE}" > checksum
sha256sum -c checksum
sudo dpkg -i "${CUDA_KEYRING_PACKAGE}"
TOOLKIT_VERSION="$(echo ${{ inputs.cuda-version }} | sed 's/\(.*\)\.\(.*\)/\1-\2/')"
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt -y install cuda-toolkit-"${TOOLKIT_VERSION}"
env:
CUDA_VERSION: ${{ inputs.cuda-version }}
CUDA_KEYRING_PACKAGE: cuda-keyring_1.1-1_all.deb
CUDA_KEYRING_SHA: "d93190d50b98ad4699ff40f4f7af50f16a76dac3bb8da1eaaf366d47898ff8df"
sudo apt -y install cuda-toolkit-${TOOLKIT_VERSION}
- name: Export CUDA variables
shell: bash
run: |
CUDA_PATH=/usr/local/cuda-"${CUDA_VERSION}"
{
echo "CUDA_PATH=$CUDA_PATH";
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH";
echo "CUDA_MODULE_LOADER=EAGER";
} >> "${GITHUB_ENV}"
{
echo "PATH=$PATH:$CUDA_PATH/bin";
} >> "${GITHUB_PATH}"
env:
CUDA_VERSION: ${{ inputs.cuda-version }}
CUDA_PATH=/usr/local/cuda-${{ inputs.cuda-version }}
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "PATH=$PATH:$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDA_MODULE_LOADER=EAGER" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
shell: bash
run: |
{
echo "CC=/usr/bin/gcc-${GCC_VERSION}";
echo "CXX=/usr/bin/g++-${GCC_VERSION}";
echo "CUDAHOSTCXX=/usr/bin/g++-${GCC_VERSION}";
echo "CC=/usr/bin/gcc-${{ inputs.gcc-version }}";
echo "CXX=/usr/bin/g++-${{ inputs.gcc-version }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ inputs.gcc-version }}";
} >> "${GITHUB_ENV}"
env:
GCC_VERSION: ${{ inputs.gcc-version }}
- name: Check device is detected
shell: bash
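
The "Install CUDA" step above derives the apt package suffix from the CUDA version with a sed substitution. A minimal sketch of that conversion follows; the helper name `cuda_toolkit_suffix` and the sample version "12.2" are assumptions made for illustration, only the sed expression comes from the workflow.

```shell
#!/usr/bin/env bash
# Sketch only: cuda_toolkit_suffix is a hypothetical helper name.
# It rewrites a dotted version ("12.2") into the dashed form ("12-2")
# used in the cuda-toolkit-<major>-<minor> apt package name.
cuda_toolkit_suffix() {
  # \(.*\) is greedy, so only the last dot of the input is replaced.
  printf '%s\n' "$1" | sed 's/\(.*\)\.\(.*\)/\1-\2/'
}

# "12.2" is a sample value, not one taken from the workflow inputs.
TOOLKIT_VERSION="$(cuda_toolkit_suffix "12.2")"
echo "cuda-toolkit-${TOOLKIT_VERSION}"
```

Because the first group is greedy, a three-component version such as "12.2.1" would become "12.2-1", so the step presumably expects a plain major.minor input.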

View File

@@ -6,9 +6,6 @@ on:
pull_request_review:
types: [submitted]
permissions: {}
jobs:
trigger-tests:
runs-on: ubuntu-latest
@@ -37,10 +34,3 @@ jobs:
# We need to use a PAT to be able to trigger `labeled` event for the other workflow.
github_token: ${{ secrets.FHE_ACTIONS_TOKEN }}
labels: approved
- name: Check if maintainer needs to handle label manually
if: ${{ failure() }}
run: |
echo "Pull-request from an external contributor."
echo "A maintainer need to manually add/remove the 'approved' label."
exit 1

View File

@@ -23,9 +23,6 @@ on:
workflow_dispatch:
pull_request:
permissions:
contents: read
jobs:
setup-instance:
name: Setup instance (backward-compat-tests)
@@ -50,7 +47,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
backward-compat-tests:
name: Backward compatibility tests
@@ -86,12 +83,11 @@ jobs:
- name: Get backward compat branch head SHA
id: backward_compat_sha
run: |
SHA=$(git ls-remote "${REPO_URL}" refs/heads/"${BACKWARD_COMPAT_BRANCH}" | awk '{print $1}')
echo "sha=${SHA}" >> "${GITHUB_OUTPUT}"
env:
REPO_URL: "https://github.com/zama-ai/tfhe-backward-compat-data"
BACKWARD_COMPAT_BRANCH: ${{ steps.backward_compat_branch.outputs.branch }}
run: |
SHA=$(git ls-remote ${{ env.REPO_URL }} refs/heads/${{ steps.backward_compat_branch.outputs.branch }} | awk '{print $1}')
echo "sha=${SHA}" >> "${GITHUB_OUTPUT}"
- name: Retrieve data from cache
id: retrieve-data-cache
@@ -105,7 +101,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
repository: zama-ai/tfhe-backward-compat-data
path: tests/tfhe-backward-compat-data
lfs: 'true'
@@ -126,12 +121,10 @@ jobs:
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:
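
Several hunks in this comparison differ only in whether a `${{ ... }}` expression is pasted directly into the `run:` script or first bound to a variable under `env:` and then read as `"${VAR}"`. The sketch below imitates the two styles in plain bash to show why they can behave differently. The function names and the sample value are hypothetical, and `bash -c` only approximates how the workflow runner substitutes expressions into the script text before the shell parses it.

```shell
#!/usr/bin/env bash
# Sketch only: render_inline / render_env are hypothetical names.

# Inline style: the value becomes part of the script text itself,
# so quotes inside the value can end the string and start new commands.
render_inline() {
  bash -c "echo \"value=$1\""
}

# Env style: the script text is fixed; the value is passed as data.
render_env() {
  VALUE="$1" bash -c 'echo "value=${VALUE}"'
}

# Sample value containing shell metacharacters (an assumption).
untrusted='x"; echo INJECTED; echo "'

render_inline "${untrusted}"   # runs the embedded "echo INJECTED"
render_env "${untrusted}"      # prints the value verbatim on one line
```

With the env style the value never reaches the shell parser as code, which is the usual reason for routing expressions through `env:`.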

View File

@@ -24,9 +24,6 @@ on:
workflow_dispatch:
pull_request:
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -157,7 +154,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
fast-tests:
name: Fast CPU tests
@@ -272,9 +269,7 @@ jobs:
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() && env.SECRETS_AVAILABLE == 'true' }}
@@ -302,7 +297,7 @@ jobs:
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -30,9 +30,6 @@ on:
branches:
- main
permissions:
contents: read
jobs:
should-run:
if:
@@ -97,7 +94,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
unsigned-integer-tests:
name: Unsigned integer tests
@@ -137,17 +134,15 @@ jobs:
- name: Run unsigned integer tests
run: |
AVX512_SUPPORT=ON NO_BIG_PARAMS="${NO_BIG_PARAMS}" BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_ci
AVX512_SUPPORT=ON NO_BIG_PARAMS=${{ env.NO_BIG_PARAMS }} BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_ci
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -30,9 +30,6 @@ on:
branches:
- main
permissions:
contents: read
jobs:
should-run:
if:
@@ -98,7 +95,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
signed-integer-tests:
name: Signed integer tests
@@ -142,17 +139,15 @@ jobs:
- name: Run signed integer tests
run: |
AVX512_SUPPORT=ON NO_BIG_PARAMS="${NO_BIG_PARAMS}" BIG_TESTS_INSTANCE=TRUE make test_signed_integer_ci
AVX512_SUPPORT=ON NO_BIG_PARAMS=${{ env.NO_BIG_PARAMS }} BIG_TESTS_INSTANCE=TRUE make test_signed_integer_ci
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -27,9 +27,6 @@ on:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -166,7 +163,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cpu-tests:
name: CPU tests
@@ -254,12 +251,10 @@ jobs:
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -23,9 +23,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
setup-instance:
name: Setup instance (wasm-tests)
@@ -51,7 +48,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
wasm-tests:
name: WASM tests
@@ -123,12 +120,10 @@ jobs:
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -18,9 +18,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
setup-instance:
name: Setup instance (boolean-benchmarks)
@@ -48,6 +45,7 @@ jobs:
concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
@@ -75,17 +73,15 @@ jobs:
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512
env:
REF_NAME: ${{ github.ref_name }}
- name: Measure key sizes
run: |
@@ -93,7 +89,7 @@ jobs:
- name: Parse key sizes results
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/boolean_key_sizes.csv "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py tfhe/boolean_key_sizes.csv ${{ env.RESULTS_FILENAME }} \
--object-sizes \
--append-results
@@ -114,11 +110,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -18,9 +18,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
setup-instance:
name: Setup instance (core-crypto-benchmarks)
@@ -78,17 +75,15 @@ jobs:
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--name-suffix avx512 \
--walk-subdirs
env:
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -107,11 +102,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -1,152 +0,0 @@
# Run all DEX benchmarks on an AWS instance and return parsed results to Slab CI bot.
name: DEX benchmarks
on:
workflow_dispatch:
schedule:
# Weekly benchmarks will be triggered each Saturday at 5a.m.
- cron: '0 5 * * 6'
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
setup-instance:
name: Setup instance (dex-benchmarks)
runs-on: ubuntu-latest
if: github.event_name == 'workflow_dispatch' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@79939325c3c429837c10d6041e4fd8589d328bac
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
dex-benchmarks:
name: Execute DEX benchmarks
needs: setup-instance
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
timeout-minutes: 720 # 12 hours
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@888c2e1ea69ab0d4330cbf0af1ecc7b68f368cc1
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
repository: zama-ai/slab
path: slab
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Run benchmarks
run: |
make bench_hlapi_dex
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--walk-subdirs \
--name-suffix avx512
env:
REF_NAME: ${{ github.ref_name }}
- name: Parse swap request PBS counts
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/dex_swap_request_pbs_count.csv "${RESULTS_FILENAME}" \
--object-sizes \
--append-results
- name: Parse swap claim PBS counts
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/dex_swap_claim_pbs_count.csv "${RESULTS_FILENAME}" \
--object-sizes \
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
with:
name: ${{ github.sha }}_dex
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "DEX benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (dex-benchmarks)
if: ${{ always() && needs.setup-instance.result == 'success' }}
needs: [ setup-instance, dex-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@79939325c3c429837c10d6041e4fd8589d328bac
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (dex-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -18,9 +18,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
setup-instance:
name: Setup instance (erc20-benchmarks)
@@ -48,6 +45,7 @@ jobs:
concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
timeout-minutes: 720 # 12 hours
steps:
- name: Checkout tfhe-rs repo with tags
@@ -84,21 +82,19 @@ jobs:
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512
env:
REF_NAME: ${{ github.ref_name }}
- name: Parse PBS counts
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/erc20_pbs_count.csv "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py tfhe/erc20_pbs_count.csv ${{ env.RESULTS_FILENAME }} \
--object-sizes \
--append-results
@@ -111,11 +107,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -10,14 +10,13 @@ on:
type: choice
options:
- "l40 (n3-L40x1)"
- "4-l40 (n3-L40x4)"
- "multi-a100-nvlink (n3-A100x8-NVLink)"
- "single-h100 (n3-H100x1)"
- "2-h100 (n3-H100x2)"
- "4-h100 (n3-H100x4)"
- "multi-h100 (n3-H100x8)"
- "multi-h100-nvlink (n3-H100x8-NVLink)"
- "multi-h100-sxm5 (n3-H100x8-SXM5)"
- "multi-a100-nvlink (n3-A100x8-NVLink)"
command:
description: "Benchmark command to run"
type: choice
@@ -60,33 +59,22 @@ on:
- multi_bit
- both
permissions: {}
jobs:
parse-inputs:
runs-on: ubuntu-latest
outputs:
profile: ${{ steps.parse_profile.outputs.profile }}
hardware_name: ${{ steps.parse_hardware_name.outputs.name }}
env:
INPUTS_PROFILE: ${{ inputs.profile }}
steps:
- name: Parse profile
id: parse_profile
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
PROFILE=$(echo "${INPUTS_PROFILE}" | sed 's|\(.*\)[[:space:]](.*)|\1|')
echo "profile=${PROFILE}" >> "${GITHUB_OUTPUT}"
echo "profile=$(echo '${{ inputs.profile }}' | sed 's|\(.*\)[[:space:]](.*)|\1|')" >> "${GITHUB_OUTPUT}"
- name: Parse hardware name
id: parse_hardware_name
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
NAME=$(echo "${INPUTS_PROFILE}" | sed 's|.*[[:space:]](\(.*\))|\1|')
echo "name=${NAME}" >> "${GITHUB_OUTPUT}"
echo "name=$(echo '${{ inputs.profile }}' | sed 's|.*[[:space:]](\(.*\))|\1|')" >> "${GITHUB_OUTPUT}"
run-benchmarks:
name: Run benchmarks
@@ -100,12 +88,4 @@ jobs:
bench_type: ${{ inputs.bench_type }}
params_type: ${{ inputs.params_type }}
all_precisions: ${{ inputs.all_precisions }}
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
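
The "Parse profile" and "Parse hardware name" steps in the hunk above split an option such as "single-h100 (n3-H100x1)" into its two parts with sed. A minimal sketch of both extractions, reusing the sed expressions from the workflow; the wrapper function names are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: parse_profile / parse_hardware_name are hypothetical
# helper names wrapping the sed expressions used in the workflow.

# Keep the text before the whitespace and the parenthesised part.
parse_profile() {
  printf '%s\n' "$1" | sed 's|\(.*\)[[:space:]](.*)|\1|'
}

# Keep the text between the parentheses.
# In basic regular expressions, unescaped ( and ) are literal characters.
parse_hardware_name() {
  printf '%s\n' "$1" | sed 's|.*[[:space:]](\(.*\))|\1|'
}

INPUTS_PROFILE="single-h100 (n3-H100x1)"
echo "profile=$(parse_profile "${INPUTS_PROFILE}")"
echo "name=$(parse_hardware_name "${INPUTS_PROFILE}")"
```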

View File

@@ -22,9 +22,6 @@ on:
# Weekly benchmarks will be triggered each Friday at 9p.m.
- cron: "0 21 * * 5"
permissions:
contents: read
jobs:
cuda-integer-benchmarks:
name: Cuda integer benchmarks (RTX 4090)
@@ -72,17 +69,15 @@ jobs:
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "rtx4090" \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs
env:
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -93,11 +88,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:
@@ -150,14 +145,14 @@ jobs:
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "rtx4090" \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
@@ -170,11 +165,19 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

View File

@@ -58,9 +58,6 @@ env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
permissions: {}
jobs:
prepare-matrix:
name: Prepare operations matrix
@@ -70,56 +67,44 @@ jobs:
op_flavor: ${{ steps.set_op_flavor.outputs.op_flavor }}
bench_type: ${{ steps.set_bench_type.outputs.bench_type }}
params_type: ${{ steps.set_params_type.outputs.params_type }}
env:
INPUTS_COMMAND: ${{ inputs.command }}
INPUTS_OP_FLAVOR: ${{ inputs.op_flavor }}
steps:
- name: Set single command
if: ${{ !contains(inputs.command, ',')}}
run: |
echo "COMMAND=[\"${INPUTS_COMMAND}\"]" >> "${GITHUB_ENV}"
echo "COMMAND=[\"${{ inputs.command }}\"]" >> "${GITHUB_ENV}"
- name: Set multiple commands
if: ${{ contains(inputs.command, ',')}}
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
PARSED_COMMAND=$(echo "${INPUTS_COMMAND}" | sed 's/[[:space:]]*,[[:space:]]*/\\", \\"/g')
PARSED_COMMAND=$(echo "${{ inputs.command }}" | sed 's/[[:space:]]*,[[:space:]]*/\\", \\"/g')
echo "COMMAND=[\"${PARSED_COMMAND}\"]" >> "${GITHUB_ENV}"
- name: Set single operations flavor
if: ${{ !contains(inputs.op_flavor, ',')}}
run: |
echo "OP_FLAVOR=[\"${INPUTS_OP_FLAVOR}\"]" >> "${GITHUB_ENV}"
echo "OP_FLAVOR=[\"${{ inputs.op_flavor }}\"]" >> "${GITHUB_ENV}"
- name: Set multiple operations flavors
if: ${{ contains(inputs.op_flavor, ',')}}
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
PARSED_OP_FLAVOR=$(echo "${INPUTS_OP_FLAVOR}" | sed 's/[[:space:]]*,[[:space:]]*/", "/g')
PARSED_OP_FLAVOR=$(echo "${{ inputs.op_flavor }}" | sed 's/[[:space:]]*,[[:space:]]*/", "/g')
echo "OP_FLAVOR=[\"${PARSED_OP_FLAVOR}\"]" >> "${GITHUB_ENV}"
- name: Set benchmark types
run: |
if [[ "${INPUTS_BENCH_TYPE}" == "both" ]]; then
if [[ "${{ inputs.bench_type }}" == "both" ]]; then
echo "BENCH_TYPE=[\"latency\", \"throughput\"]" >> "${GITHUB_ENV}"
else
echo "BENCH_TYPE=[\"${INPUTS_BENCH_TYPE}\"]" >> "${GITHUB_ENV}"
echo "BENCH_TYPE=[\"${{ inputs.bench_type }}\"]" >> "${GITHUB_ENV}"
fi
env:
INPUTS_BENCH_TYPE: ${{ inputs.bench_type }}
- name: Set parameters types
run: |
if [[ "${INPUTS_PARAMS_TYPE}" == "both" ]]; then
if [[ "${{ inputs.params_type }}" == "both" ]]; then
echo "PARAMS_TYPE=[\"classical\", \"multi_bit\"]" >> "${GITHUB_ENV}"
else
echo "PARAMS_TYPE=[\"${INPUTS_PARAMS_TYPE}\"]" >> "${GITHUB_ENV}"
echo "PARAMS_TYPE=[\"${{ inputs.params_type }}\"]" >> "${GITHUB_ENV}"
fi
env:
INPUTS_PARAMS_TYPE: ${{ inputs.params_type }}
- name: Set command output
id: set_command
@@ -169,11 +154,9 @@ jobs:
if: steps.start-remote-instance.outcome == 'failure' &&
inputs.profile != 'single-h100'
run: |
echo "Remote instance instance has failed to start (profile provided: '${INPUTS_PROFILE}')"
echo "Remote instance instance has failed to start (profile provided: '${{ inputs.profile }}')"
echo "Permanent instance instance cannot be used as a substitute (profile needed: 'single-h100')"
exit 1
env:
INPUTS_PROFILE: ${{ inputs.profile }}
# This will allow to fallback on permanent instances running on Hyperstack.
- name: Use permanent remote instance
@@ -215,6 +198,7 @@ jobs:
needs: [ prepare-matrix, setup-instance, install-dependencies ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
timeout-minutes: 1440 # 24 hours
continue-on-error: true
strategy:
fail-fast: false
max-parallel: 1
@@ -275,30 +259,21 @@ jobs:
- name: Run benchmarks
run: |
make BENCH_OP_FLAVOR="${OP_FLAVOR}" BENCH_TYPE="${BENCH_TYPE}" BENCH_PARAM_TYPE="${BENCH_PARAMS_TYPE}" bench_"${BENCH_COMMAND}"_gpu
env:
OP_FLAVOR: ${{ matrix.op_flavor }}
BENCH_TYPE: ${{ matrix.bench_type }}
BENCH_PARAMS_TYPE: ${{ matrix.params_type }}
BENCH_COMMAND: ${{ matrix.command }}
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} BENCH_TYPE=${{ matrix.bench_type }} BENCH_PARAM_TYPE=${{ matrix.params_type }} bench_${{ matrix.command }}_gpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "${INPUTS_HARDWARE_NAME}" \
--hardware "${{ inputs.hardware_name }}" \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--bench-type "${BENCH_TYPE}"
env:
INPUTS_HARDWARE_NAME: ${{ inputs.hardware_name }}
REF_NAME: ${{ github.ref_name }}
BENCH_TYPE: ${{ matrix.bench_type }}
--bench-type ${{ matrix.bench_type }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -317,7 +292,7 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
slack-notify:

@@ -1,64 +0,0 @@
# Run CUDA DEX benchmarks on a Hyperstack VM and return parsed results to Slab CI bot.
name: Cuda DEX benchmarks
on:
workflow_dispatch:
inputs:
profile:
description: "Instance type"
required: true
type: choice
options:
- "l40 (n3-L40x1)"
- "4-l40 (n3-L40x4)"
- "multi-a100-nvlink (n3-A100x8-NVLink)"
- "single-h100 (n3-H100x1)"
- "2-h100 (n3-H100x2)"
- "4-h100 (n3-H100x4)"
- "multi-h100 (n3-H100x8)"
- "multi-h100-nvlink (n3-H100x8-NVLink)"
- "multi-h100-sxm5 (n3-H100x8-SXM5)"
permissions: {}
jobs:
parse-inputs:
runs-on: ubuntu-latest
outputs:
profile: ${{ steps.parse_profile.outputs.profile }}
hardware_name: ${{ steps.parse_hardware_name.outputs.name }}
env:
INPUTS_PROFILE: ${{ inputs.profile }}
steps:
- name: Parse profile
id: parse_profile
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
PROFILE=$(echo "${INPUTS_PROFILE}" | sed 's|\(.*\)[[:space:]](.*)|\1|')
echo "profile=${PROFILE}" >> "${GITHUB_OUTPUT}"
- name: Parse hardware name
id: parse_hardware_name
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
NAME=$(echo "${INPUTS_PROFILE}" | sed 's|.*[[:space:]](\(.*\))|\1|')
echo "name=${NAME}" >> "${GITHUB_OUTPUT}"
run-benchmarks:
name: Run benchmarks
needs: parse-inputs
uses: ./.github/workflows/benchmark_gpu_dex_common.yml
with:
profile: ${{ needs.parse-inputs.outputs.profile }}
hardware_name: ${{ needs.parse-inputs.outputs.hardware_name }}
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}

@@ -1,208 +0,0 @@
# Run DEX benchmarks on an instance with CUDA and return parsed results to Slab CI bot.
name: Cuda DEX benchmarks - common
on:
workflow_call:
inputs:
backend:
type: string
default: hyperstack
profile:
type: string
required: true
hardware_name:
type: string
required: true
secrets:
REPO_CHECKOUT_TOKEN:
required: true
SLAB_ACTION_TOKEN:
required: true
SLAB_BASE_URL:
required: true
SLAB_URL:
required: true
JOB_SECRET:
required: true
SLACK_CHANNEL:
required: true
BOT_USERNAME:
required: true
SLACK_WEBHOOK:
required: true
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
setup-instance:
name: Setup instance (cuda-dex-benchmarks)
runs-on: ubuntu-latest
if: github.event_name == 'workflow_dispatch' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
# Use permanent remote instance label first as on-demand remote instance label output is set before the end of start-remote-instance step.
# If the latter fails due to a failed GitHub action runner set up, we have to fallback on the permanent instance.
# Since the on-demand remote label is set before failure, we have to do the logical OR in this order,
# otherwise we'll try to run the next job on a non-existing on-demand instance.
runner-name: ${{ steps.use-permanent-instance.outputs.runner_group || steps.start-remote-instance.outputs.label }}
remote-instance-outcome: ${{ steps.start-remote-instance.outcome }}
steps:
- name: Start remote instance
id: start-remote-instance
continue-on-error: true
uses: zama-ai/slab-github-runner@79939325c3c429837c10d6041e4fd8589d328bac
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: ${{ inputs.backend }}
profile: ${{ inputs.profile }}
- name: Acknowledge remote instance failure
if: steps.start-remote-instance.outcome == 'failure' &&
inputs.profile != 'single-h100'
run: |
echo "Remote instance instance has failed to start (profile provided: '${INPUTS_PROFILE}')"
echo "Permanent instance instance cannot be used as a substitute (profile needed: 'single-h100')"
exit 1
env:
INPUTS_PROFILE: ${{ inputs.profile }}
# This will allow to fallback on permanent instances running on Hyperstack.
- name: Use permanent remote instance
id: use-permanent-instance
if: env.SECRETS_AVAILABLE == 'true' &&
steps.start-remote-instance.outcome == 'failure' &&
inputs.profile == 'single-h100'
run: |
echo "runner_group=h100x1" >> "$GITHUB_OUTPUT"
cuda-dex-benchmarks:
name: Cuda DEX benchmarks (${{ inputs.profile }})
needs: setup-instance
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# explicit include-based build matrix, of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Setup Hyperstack dependencies
if: needs.setup-instance.outputs.remote-instance-outcome == 'success'
uses: ./.github/actions/gpu_setup
with:
cuda-version: ${{ matrix.cuda }}
gcc-version: ${{ matrix.gcc }}
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@888c2e1ea69ab0d4330cbf0af1ecc7b68f368cc1
with:
toolchain: nightly
- name: Run benchmarks
run: |
make bench_hlapi_dex_gpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
--database tfhe_rs \
--hardware "${INPUTS_HARDWARE_NAME}" \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--walk-subdirs \
--name-suffix avx512
env:
INPUTS_HARDWARE_NAME: ${{ inputs.hardware_name }}
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
with:
name: ${{ github.sha }}_dex_${{ inputs.profile }}
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
repository: zama-ai/slab
path: slab
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-dex-benchmarks ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-dex-benchmarks.result != 'skipped' && failure() }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:
SLACK_COLOR: ${{ needs.cuda-dex-benchmarks.result }}
SLACK_MESSAGE: "Cuda DEX benchmarks (${{ inputs.profile }}) finished with status: ${{ needs.cuda-dex-benchmarks.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-dex-${{ inputs.profile }}-benchmarks)
if: ${{ always() && needs.setup-instance.outputs.remote-instance-outcome == 'success' }}
needs: [ setup-instance, cuda-dex-benchmarks, slack-notify ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@79939325c3c429837c10d6041e4fd8589d328bac
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-dex-${{ inputs.profile }}-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

@@ -1,61 +0,0 @@
# Run CUDA DEX benchmarks on multiple Hyperstack VMs and return parsed results to Slab CI bot.
name: Cuda DEX weekly benchmarks
on:
schedule:
# Weekly benchmarks will be triggered each Saturday at 9a.m.
- cron: '0 9 * * 6'
permissions: {}
jobs:
run-benchmarks-1-h100:
name: Run benchmarks (1xH100)
if: github.repository == 'zama-ai/tfhe-rs'
uses: ./.github/workflows/benchmark_gpu_dex_common.yml
with:
profile: single-h100
hardware_name: n3-H100x1
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
run-benchmarks-2-h100:
name: Run benchmarks (2xH100)
if: github.repository == 'zama-ai/tfhe-rs'
uses: ./.github/workflows/benchmark_gpu_dex_common.yml
with:
profile: 2-h100
hardware_name: n3-H100x2
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
run-benchmarks-8-h100:
name: Run benchmarks (8xH100)
if: github.repository == 'zama-ai/tfhe-rs'
uses: ./.github/workflows/benchmark_gpu_dex_common.yml
with:
profile: multi-h100
hardware_name: n3-H100x8
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}

@@ -10,8 +10,6 @@ on:
type: choice
options:
- "l40 (n3-L40x1)"
- "4-l40 (n3-L40x4)"
- "multi-a100-nvlink (n3-A100x8-NVLink)"
- "single-h100 (n3-H100x1)"
- "2-h100 (n3-H100x2)"
- "4-h100 (n3-H100x4)"
@@ -19,33 +17,22 @@ on:
- "multi-h100-nvlink (n3-H100x8-NVLink)"
- "multi-h100-sxm5 (n3-H100x8-SXM5)"
permissions: {}
jobs:
parse-inputs:
runs-on: ubuntu-latest
outputs:
profile: ${{ steps.parse_profile.outputs.profile }}
hardware_name: ${{ steps.parse_hardware_name.outputs.name }}
env:
INPUTS_PROFILE: ${{ inputs.profile }}
steps:
- name: Parse profile
id: parse_profile
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
PROFILE=$(echo "${INPUTS_PROFILE}" | sed 's|\(.*\)[[:space:]](.*)|\1|')
echo "profile=${PROFILE}" >> "${GITHUB_OUTPUT}"
echo "profile=$(echo '${{ inputs.profile }}' | sed 's|\(.*\)[[:space:]](.*)|\1|')" >> "${GITHUB_OUTPUT}"
- name: Parse hardware name
id: parse_hardware_name
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
NAME=$(echo "${INPUTS_PROFILE}" | sed 's|.*[[:space:]](\(.*\))|\1|')
echo "name=${NAME}" >> "${GITHUB_OUTPUT}"
echo "name=$(echo '${{ inputs.profile }}' | sed 's|.*[[:space:]](\(.*\))|\1|')" >> "${GITHUB_OUTPUT}"
run-benchmarks:
name: Run benchmarks
@@ -54,12 +41,4 @@ jobs:
with:
profile: ${{ needs.parse-inputs.outputs.profile }}
hardware_name: ${{ needs.parse-inputs.outputs.hardware_name }}
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit

@@ -43,9 +43,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
setup-instance:
name: Setup instance (cuda-erc20-benchmarks)
@@ -76,11 +73,9 @@ jobs:
if: steps.start-remote-instance.outcome == 'failure' &&
inputs.profile != 'single-h100'
run: |
echo "Remote instance instance has failed to start (profile provided: '${INPUTS_PROFILE}')"
echo "Remote instance instance has failed to start (profile provided: '${{ inputs.profile }}')"
echo "Permanent instance instance cannot be used as a substitute (profile needed: 'single-h100')"
exit 1
env:
INPUTS_PROFILE: ${{ inputs.profile }}
# This will allow to fallback on permanent instances running on Hyperstack.
- name: Use permanent remote instance
@@ -137,19 +132,16 @@ jobs:
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "${INPUTS_HARDWARE_NAME}" \
--hardware "${{ inputs.hardware_name }}" \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512
env:
INPUTS_HARDWARE_NAME: ${{ inputs.hardware_name }}
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -168,7 +160,7 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
slack-notify:

@@ -6,9 +6,6 @@ on:
# Weekly benchmarks will be triggered each Saturday at 5a.m.
- cron: '0 5 * * 6'
permissions: {}
jobs:
run-benchmarks-1-h100:
name: Run benchmarks (1xH100)
@@ -17,15 +14,7 @@ jobs:
with:
profile: single-h100
hardware_name: n3-H100x1
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
run-benchmarks-2-h100:
name: Run benchmarks (2xH100)
@@ -34,15 +23,7 @@ jobs:
with:
profile: 2-h100
hardware_name: n3-H100x2
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
run-benchmarks-8-h100:
name: Run benchmarks (8xH100)
@@ -51,12 +32,4 @@ jobs:
with:
profile: multi-h100
hardware_name: n3-H100x8
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit

@@ -6,9 +6,6 @@ on:
# Weekly benchmarks will be triggered each Saturday at 1a.m.
- cron: '0 1 * * 6'
permissions: {}
jobs:
run-benchmarks-1-h100:
name: Run integer benchmarks (1xH100)
@@ -21,15 +18,7 @@ jobs:
op_flavor: default
bench_type: latency
all_precisions: true
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
run-benchmarks-2-h100:
name: Run integer benchmarks (2xH100)
@@ -42,15 +31,7 @@ jobs:
op_flavor: default
bench_type: latency
all_precisions: true
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
run-benchmarks-8-h100:
name: Run integer benchmarks (8xH100)
@@ -63,15 +44,7 @@ jobs:
op_flavor: default
bench_type: latency
all_precisions: true
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
run-benchmarks-l40:
name: Run integer benchmarks (L40)
@@ -84,15 +57,7 @@ jobs:
op_flavor: default
bench_type: latency
all_precisions: true
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit
run-benchmarks-1-h100-core-crypto:
name: Run core-crypto benchmarks (1xH100)
@@ -103,12 +68,4 @@ jobs:
hardware_name: n3-H100x1
command: pbs,pbs128,ks,ks_pbs
bench_type: latency
secrets:
BOT_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
REPO_CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN }}
JOB_SECRET: ${{ secrets.JOB_SECRET }}
SLAB_ACTION_TOKEN: ${{ secrets.SLAB_ACTION_TOKEN }}
SLAB_URL: ${{ secrets.SLAB_URL }}
SLAB_BASE_URL: ${{ secrets.SLAB_BASE_URL }}
secrets: inherit

@@ -1,88 +0,0 @@
# Run all integer benchmarks on a permanent HPU instance and return parsed results to Slab CI bot.
name: Hpu Integer Benchmarks
on:
workflow_dispatch:
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
permissions: {}
jobs:
integer-benchmarks-hpu:
name: Execute integer & erc20 benchmarks for HPU backend
runs-on: v80-desktop
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
timeout-minutes: 1440 # 24 hours
steps:
# Needed as long as hw_regmap repository is private
- name: Configure SSH
uses: webfactory/ssh-agent@a6f90b1f127823b31d4d4a8d96047790581349bd # v0.9.1
with:
ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@a54c7afa936fefeb4456b2dd8068152669aa8203
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
repository: zama-ai/slab
path: slab
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Run benchmarks
run: |
make bench_integer_hpu
make bench_hlapi_erc20_hpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
--database tfhe_rs \
--hardware "hpu_x1" \
--backend hpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--walk-subdirs
env:
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08
with:
name: ${{ github.sha }}_integer_benchmarks
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"

@@ -36,9 +36,6 @@ env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
permissions: {}
jobs:
prepare-matrix:
name: Prepare operations matrix
@@ -63,13 +60,11 @@ jobs:
if: github.event_name == 'workflow_dispatch'
run: |
echo "OP_FLAVOR=[\"default\"]" >> "${GITHUB_ENV}"
if [[ "${INPUTS_BENCH_TYPE}" == "both" ]]; then
if [[ "${{ inputs.bench_type }}" == "both" ]]; then
echo "BENCH_TYPE=[\"latency\", \"throughput\"]" >> "${GITHUB_ENV}"
else
echo "BENCH_TYPE=[\"${INPUTS_BENCH_TYPE}\"]" >> "${GITHUB_ENV}"
echo "BENCH_TYPE=[\"${{ inputs.bench_type }}\"]" >> "${GITHUB_ENV}"
fi
env:
INPUTS_BENCH_TYPE: ${{ inputs.bench_type }}
- name: Default benchmark type
if: github.event_name != 'workflow_dispatch'
@@ -111,6 +106,7 @@ jobs:
concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
max-parallel: 1
@@ -154,35 +150,26 @@ jobs:
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR="${OP_FLAVOR}" BENCH_TYPE="${BENCH_TYPE}" bench_"${BENCH_COMMAND}"
env:
OP_FLAVOR: ${{ matrix.op_flavor }}
BENCH_TYPE: ${{ matrix.bench_type }}
BENCH_COMMAND: ${{ matrix.command }}
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} BENCH_TYPE=${{ matrix.bench_type }} bench_${{ matrix.command }}
# Run these benchmarks only once per benchmark type
- name: Run compression benchmarks with AVX512
if: matrix.op_flavor == 'default' && matrix.command == 'integer'
run: |
make BENCH_TYPE="${BENCH_TYPE}" bench_integer_compression
env:
BENCH_TYPE: ${{ matrix.bench_type }}
make BENCH_TYPE=${{ matrix.bench_type }} bench_integer_compression
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--bench-type "${BENCH_TYPE}"
env:
REF_NAME: ${{ github.ref_name }}
BENCH_TYPE: ${{ matrix.bench_type }}
--bench-type ${{ matrix.bench_type }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -193,11 +180,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

@@ -22,9 +22,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
prepare-matrix:
name: Prepare operations matrix
@@ -75,6 +72,7 @@ jobs:
concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
strategy:
max-parallel: 1
matrix:
@@ -110,23 +108,21 @@ jobs:
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR="${OP_FLAVOR}" bench_shortint
env:
OP_FLAVOR: ${{ matrix.op_flavor }}
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_shortint
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512
env:
REF_NAME: ${{ github.ref_name }}
# This small benchmark needs to be executed only once.
- name: Measure key sizes
@@ -137,7 +133,7 @@ jobs:
- name: Parse key sizes results
if: matrix.op_flavor == 'default'
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/shortint_key_sizes.csv "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py tfhe/shortint_key_sizes.csv ${{ env.RESULTS_FILENAME }} \
--object-sizes \
--append-results
@@ -150,11 +146,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

@@ -36,9 +36,6 @@ env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
permissions: {}
jobs:
prepare-matrix:
name: Prepare operations matrix
@@ -63,13 +60,11 @@ jobs:
if: github.event_name == 'workflow_dispatch'
run: |
echo "OP_FLAVOR=[\"default\"]" >> "${GITHUB_ENV}"
if [[ "${INPUTS_BENCH_TYPE}" == "both" ]]; then
if [[ "${{ inputs.bench_type }}" == "both" ]]; then
echo "BENCH_TYPE=[\"latency\", \"throughput\"]" >> "${GITHUB_ENV}"
else
echo "BENCH_TYPE=[\"${INPUTS_BENCH_TYPE}\"]" >> "${GITHUB_ENV}"
echo "BENCH_TYPE=[\"${{ inputs.bench_type }}\"]" >> "${GITHUB_ENV}"
fi
env:
INPUTS_BENCH_TYPE: ${{ inputs.bench_type }}
- name: Default benchmark type
if: github.event_name != 'workflow_dispatch'
@@ -111,6 +106,7 @@ jobs:
concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
max-parallel: 1
@@ -154,27 +150,20 @@ jobs:
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR="${OP_FLAVOR}" BENCH_TYPE="${BENCH_TYPE}" bench_signed_"${BENCH_COMMAND}"
env:
OP_FLAVOR: ${{ matrix.op_flavor }}
BENCH_TYPE: ${{ matrix.bench_type }}
BENCH_COMMAND: ${{ matrix.command }}
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} BENCH_TYPE=${{ matrix.bench_type }} bench_signed_${{ matrix.command }}
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--bench-type "${BENCH_TYPE}"
env:
REF_NAME: ${{ github.ref_name }}
BENCH_TYPE: ${{ matrix.bench_type }}
--bench-type ${{ matrix.bench_type }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -185,11 +174,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:

@@ -23,9 +23,6 @@ on:
# Job will be triggered each Thursday at 11p.m.
- cron: '0 23 * * 4'
permissions: {}
jobs:
setup-ec2:
name: Setup EC2 instance (fft-benchmarks)
@@ -56,8 +53,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Get benchmark details
run: |
@@ -79,16 +74,14 @@ jobs:
- name: Parse AVX512 results
run: |
python3 ./ci/fft_benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/fft_benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database concrete_fft \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--name-suffix avx512
env:
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -107,11 +100,19 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
echo "Computing HMac on downloaded artifact"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -23,9 +23,6 @@ on:
# Job will be triggered each Friday at 11p.m.
- cron: "0 23 * * 5"
permissions: {}
jobs:
setup-ec2:
name: Setup EC2 instance (ntt-benchmarks)
@@ -56,8 +53,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Get benchmark details
run: |
@@ -79,16 +74,14 @@ jobs:
- name: Parse results
run: |
python3 ./ci/ntt_benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/ntt_benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database concrete_ntt \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--name-suffix avx512
env:
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -107,11 +100,19 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
echo "Computing HMac on downloaded artifact"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -30,9 +30,6 @@ env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
BENCH_TYPE: ${{ inputs.bench_type || 'latency' }}
permissions: {}
jobs:
should-run:
runs-on: ubuntu-latest
@@ -45,8 +42,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Check for file changes
id: changed-files
@@ -119,24 +114,22 @@ jobs:
- name: Run benchmarks
run: |
make BENCH_TYPE="${BENCH_TYPE}" bench_tfhe_zk_pok
make BENCH_TYPE=${{ env.BENCH_TYPE }} bench_tfhe_zk_pok
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--crate tfhe-zk-pok \
--hardware "hpc7a.96xlarge" \
--backend cpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--bench-type "${BENCH_TYPE}"
env:
REF_NAME: ${{ github.ref_name }}
--bench-type ${{ env.BENCH_TYPE }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
@@ -155,11 +148,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -21,9 +21,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
should-run:
runs-on: ubuntu-latest
@@ -146,17 +143,15 @@ jobs:
- name: Parse results
run: |
make parse_wasm_benchmarks
python3 ./ci/benchmark_parser.py tfhe-benchmark/wasm_pk_gen.csv "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py tfhe/wasm_pk_gen.csv ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "m6i.4xlarge" \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--key-gen
rm tfhe-benchmark/wasm_pk_gen.csv
env:
REF_NAME: ${{ github.ref_name }}
rm tfhe/wasm_pk_gen.csv
# Run these benchmarks only once
- name: Measure public key and ciphertext sizes in HL Api
@@ -167,7 +162,7 @@ jobs:
- name: Parse key and ciphertext sizes results
if: matrix.browser == 'chrome'
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/hlapi_cpk_and_cctl_sizes.csv "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py tfhe/hlapi_cpk_and_cctl_sizes.csv ${{ env.RESULTS_FILENAME }} \
--key-gen \
--append-results
@@ -188,11 +183,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -31,9 +31,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
should-run:
runs-on: ubuntu-latest
@@ -77,13 +74,11 @@ jobs:
- name: Set benchmark types
if: github.event_name == 'workflow_dispatch'
run: |
if [[ "${INPUTS_BENCH_TYPE}" == "both" ]]; then
if [[ "${{ inputs.bench_type }}" == "both" ]]; then
echo "BENCH_TYPE=[\"latency\", \"throughput\"]" >> "${GITHUB_ENV}"
else
echo "BENCH_TYPE=[\"${INPUTS_BENCH_TYPE}\"]" >> "${GITHUB_ENV}"
echo "BENCH_TYPE=[\"${{ inputs.bench_type }}\"]" >> "${GITHUB_ENV}"
fi
env:
INPUTS_BENCH_TYPE: ${{ inputs.bench_type }}
- name: Default benchmark type
if: github.event_name != 'workflow_dispatch'
@@ -161,30 +156,25 @@ jobs:
- name: Run benchmarks with AVX512
run: |
make BENCH_TYPE="${BENCH_TYPE}" bench_integer_zk
env:
BENCH_TYPE: ${{ matrix.bench_type }}
make BENCH_TYPE=${{ matrix.bench_type }} bench_integer_zk
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--backend cpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--bench-type "${BENCH_TYPE}"
env:
REF_NAME: ${{ github.ref_name }}
BENCH_TYPE: ${{ matrix.bench_type }}
--bench-type ${{ matrix.bench_type }}
- name: Parse CRS sizes results
run: |
python3 ./ci/benchmark_parser.py tfhe-benchmark/pke_zk_crs_sizes.csv "${RESULTS_FILENAME}" \
python3 ./ci/benchmark_parser.py tfhe/pke_zk_crs_sizes.csv ${{ env.RESULTS_FILENAME }} \
--object-sizes \
--append-results
@@ -205,11 +195,11 @@ jobs:
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -8,15 +8,11 @@ env:
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
cargo-builds:
runs-on: ${{ matrix.os }}
@@ -30,9 +26,6 @@ jobs:
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install latest stable
uses: dtolnay/rust-toolchain@888c2e1ea69ab0d4330cbf0af1ecc7b68f368cc1
@@ -94,10 +87,5 @@ jobs:
run: |
make build_tfhe_coverage
- name: Run Hpu pcc checks
if: ${{ contains(matrix.os, 'ubuntu') }}
run: |
make pcc_hpu
# The wasm build check is a bit annoying to set-up here and is done during the tests in
# aws_tfhe_tests.yml


@@ -6,15 +6,11 @@ on:
env:
CARGO_TERM_COLOR: always
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
cargo-builds-fft:
runs-on: ${{ matrix.runner_type }}
@@ -26,9 +22,6 @@ jobs:
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af


@@ -6,15 +6,11 @@ on:
env:
CARGO_TERM_COLOR: always
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
cargo-builds-ntt:
runs-on: ${{ matrix.os }}
@@ -24,9 +20,6 @@ jobs:
fail-fast: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af


@@ -16,9 +16,6 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -55,9 +52,6 @@ jobs:
fail-fast: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
@@ -85,9 +79,6 @@ jobs:
runner_type: [ ubuntu-latest, macos-latest, windows-latest ]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
@@ -109,9 +100,6 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Test node js
run: |


@@ -16,9 +16,6 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -33,7 +30,7 @@ jobs:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@ed68ef82c095e0d48ec87eccea555d944a631a4c # v46.0.5
@@ -55,9 +52,6 @@ jobs:
fail-fast: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
@@ -80,9 +74,6 @@ jobs:
os: [ ubuntu-latest, macos-latest, windows-latest ]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af


@@ -3,10 +3,6 @@ name: Check commit and PR compliance
on:
pull_request:
permissions:
contents: read
pull-requests: read # Permission needed to scan commits in a pull-request
jobs:
check-commit-pr:
name: Check commit and PR


@@ -5,13 +5,9 @@ on:
pull_request:
env:
ACTIONLINT_VERSION: 1.7.7
ACTIONLINT_CHECKSUM: "023070a287cd8cccd71515fedc843f1985bf96c436b7effaecce67290e7e0757"
ACTIONLINT_VERSION: 1.6.27
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
permissions:
contents: read
jobs:
lint-check:
name: Lint and checks
@@ -25,20 +21,15 @@ jobs:
- name: Get actionlint
run: |
wget "https://github.com/rhysd/actionlint/releases/download/v${{ env.ACTIONLINT_VERSION }}/actionlint_${{ env.ACTIONLINT_VERSION }}_linux_amd64.tar.gz"
echo "${{ env.ACTIONLINT_CHECKSUM }} actionlint_${{ env.ACTIONLINT_VERSION }}_linux_amd64.tar.gz" > checksum
bash <(curl https://raw.githubusercontent.com/rhysd/actionlint/main/scripts/download-actionlint.bash) ${{ env.ACTIONLINT_VERSION }}
echo "f2ee6d561ce00fa93aab62a7791c1a0396ec7e8876b2a8f2057475816c550782 actionlint" > checksum
sha256sum -c checksum
tar -xf actionlint_${{ env.ACTIONLINT_VERSION }}_linux_amd64.tar.gz actionlint
ln -s "$(pwd)/actionlint" /usr/local/bin/
- name: Lint workflows
run: |
make lint_workflow
- name: Check workflows security
run: |
make check_workflow_security
- name: Ensure SHA pinned actions
uses: zgosalvez/github-actions-ensure-sha-pinned-actions@4830be28ce81da52ec70d65c552a7403821d98d4 # v3.0.23
with:


@@ -10,16 +10,12 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
# Code coverage workflow is only run via workflow_dispatch event since execution duration is not stabilized yet.
permissions:
contents: read
jobs:
setup-instance:
name: Setup instance (code-coverage)
@@ -49,9 +45,6 @@ jobs:
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install latest stable
uses: dtolnay/rust-toolchain@888c2e1ea69ab0d4330cbf0af1ecc7b68f368cc1
@@ -90,7 +83,7 @@ jobs:
make test_shortint_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@ad3126e916f78f00edff4ed0317cf185271ccc2d
uses: codecov/codecov-action@0565863a31f2c772f9f0395002a31e3f06189574
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}
@@ -104,7 +97,7 @@ jobs:
make test_integer_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@ad3126e916f78f00edff4ed0317cf185271ccc2d
uses: codecov/codecov-action@0565863a31f2c772f9f0395002a31e3f06189574
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}
@@ -113,7 +106,7 @@ jobs:
files: integer/cobertura.xml
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -21,9 +21,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
setup-instance:
name: Setup instance (csprng-randomness-tests)
@@ -49,7 +46,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
csprng-randomness-tests:
name: CSPRNG randomness tests
@@ -75,7 +72,7 @@ jobs:
make dieharder_csprng
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -25,9 +25,6 @@ on:
# the script will always return 0 because of the "echo EOF".
permissions: {}
jobs:
auto_close_job:
if: ${{ contains(github.event.pull_request.labels.*.name, 'data_PR') }}
@@ -42,17 +39,14 @@ jobs:
curl --fail-with-body --no-progress-meter -L -X GET \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"${TARGET_REPO_API_URL}"/pulls\?head="${REPO_OWNER}":"${PR_BRANCH}" | jq -e '.[0]' | sed 's/null/{ "message": "corresponding PR not found" }/'
${{ env.TARGET_REPO_API_URL }}/pulls\?head=${{ github.repository_owner }}:${{ env.PR_BRANCH }} | jq -e '.[0]' | sed 's/null/{ "message": "corresponding PR not found" }/'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
env:
REPO_OWNER: ${{ github.repository_owner }}
- name: Comment on the PR to indicate the reason of the close
run: |
BODY="'{ \"body\": \"PR ${CLOSE_TYPE}d because the corresponding PR in main repo was ${CLOSE_TYPE}d: ${REPO}#${EVENT_NUMBER}\" }'"
{
set +e
set -o pipefail
@@ -61,16 +55,12 @@ jobs:
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"${COMMENTS_URL}" \
-d "${BODY}"
${{ fromJson(env.TARGET_REPO_PR).comments_url }} \
-d '{ "body": "PR ${{ env.CLOSE_TYPE }}d because the corresponding PR in main repo was ${{ env.CLOSE_TYPE }}d: ${{ github.repository }}#${{ github.event.number }}" }'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
env:
REPO: ${{ github.repository }}
EVENT_NUMBER: ${{ github.event.number }}
COMMENTS_URL: ${{ fromJson(env.TARGET_REPO_PR).comments_url }}
- name: Merge the Pull Request in the data repo
if: ${{ github.event.pull_request.merged }}
@@ -83,14 +73,12 @@ jobs:
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"${TARGET_REPO_PR_URL}"/merge \
${{ fromJson(env.TARGET_REPO_PR).url }}/merge \
-d '{ "merge_method": "rebase" }'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
env:
TARGET_REPO_PR_URL: ${{ fromJson(env.TARGET_REPO_PR).url }}
- name: Close the Pull Request in the data repo
if: ${{ !github.event.pull_request.merged }}
@@ -103,14 +91,12 @@ jobs:
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"${TARGET_REPO_PR_URL}" \
${{ fromJson(env.TARGET_REPO_PR).url }} \
-d '{ "state": "closed" }'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
env:
TARGET_REPO_PR_URL: ${{ fromJson(env.TARGET_REPO_PR).url }}
- name: Delete the associated branch in the data repo
run: |
@@ -122,7 +108,7 @@ jobs:
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"${TARGET_REPO_API_URL}"/git/refs/heads/"${PR_BRANCH}"
${{ env.TARGET_REPO_API_URL }}/git/refs/heads/${{ env.PR_BRANCH }}
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"


@@ -22,9 +22,6 @@ on:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
permissions:
contents: read
jobs:
cuda-tests-linux:
name: CUDA tests (RTX 4090)
@@ -80,7 +77,7 @@ jobs:
github_token: ${{ secrets.GITHUB_TOKEN }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -25,9 +25,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -105,7 +102,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA H100 tests
@@ -172,9 +169,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -24,9 +24,6 @@ on:
workflow_dispatch:
pull_request:
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -90,7 +87,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA tests
@@ -156,9 +153,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -15,9 +15,6 @@ env:
on:
workflow_dispatch:
permissions: {}
jobs:
setup-instance:
name: Setup instance (cuda-h100-tests)
@@ -65,6 +62,18 @@ jobs:
cuda: "12.2"
gcc: 11
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev libclang-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:


@@ -25,9 +25,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -92,7 +89,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA multi-GPU tests
@@ -161,9 +158,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -10,7 +10,6 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -19,9 +18,6 @@ on:
# Nightly tests will be triggered each evening 8p.m.
- cron: "0 20 * * *"
permissions:
contents: read
jobs:
setup-instance:
name: Setup instance (gpu-tests)
@@ -61,9 +57,6 @@ jobs:
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Setup Hyperstack dependencies
uses: ./.github/actions/gpu_setup


@@ -17,15 +17,10 @@ env:
# Secrets will be available only to zama-ai organization members
SECRETS_AVAILABLE: ${{ secrets.JOB_SECRET != '' }}
EXTERNAL_CONTRIBUTION_RUNNER: "large_ubuntu_16-22.04"
CUDA_KEYRING_PACKAGE: cuda-keyring_1.1-1_all.deb
CUDA_KEYRING_SHA: "d93190d50b98ad4699ff40f4f7af50f16a76dac3bb8da1eaaf366d47898ff8df"
on:
pull_request:
permissions:
contents: read
jobs:
setup-instance:
name: Setup instance (cuda-pcc)
@@ -50,7 +45,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-pcc:
name: CUDA post-commit checks
@@ -82,10 +77,8 @@ jobs:
shell: bash
run: |
TOOLKIT_VERSION="$(echo ${{ matrix.cuda }} | sed 's/\(.*\)\.\(.*\)/\1-\2/')"
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/"${CUDA_KEYRING_PACKAGE}"
echo "${CUDA_KEYRING_SHA} ${CUDA_KEYRING_PACKAGE}" > checksum
sha256sum -c checksum
sudo dpkg -i "${CUDA_KEYRING_PACKAGE}"
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt -y install "cuda-toolkit-${TOOLKIT_VERSION}" cmake-format
@@ -123,9 +116,7 @@ jobs:
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Slack Notification
if: ${{ failure() && env.SECRETS_AVAILABLE == 'true' }}


@@ -25,9 +25,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -92,7 +89,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA signed integer tests with classical PBS
@@ -144,9 +141,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -25,8 +25,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
should-run:
@@ -105,7 +103,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA H100 signed integer tests
@@ -158,9 +156,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -29,9 +29,6 @@ on:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -96,7 +93,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-signed-integer-tests:
name: CUDA signed integer tests
@@ -156,9 +153,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -25,8 +25,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
should-run:
@@ -92,7 +90,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA unsigned integer tests with classical PBS
@@ -144,9 +142,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -25,9 +25,6 @@ on:
pull_request:
types: [ labeled ]
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -105,7 +102,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-tests-linux:
name: CUDA H100 unsigned integer tests
@@ -158,9 +155,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -29,9 +29,6 @@ on:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
permissions:
contents: read
jobs:
should-run:
runs-on: ubuntu-latest
@@ -96,7 +93,7 @@ jobs:
id: start-github-instance
if: env.SECRETS_AVAILABLE == 'false'
run: |
echo "runner_group=${EXTERNAL_CONTRIBUTION_RUNNER}" >> "$GITHUB_OUTPUT"
echo "runner_group=${{ env.EXTERNAL_CONTRIBUTION_RUNNER }}" >> "$GITHUB_OUTPUT"
cuda-unsigned-integer-tests:
name: CUDA unsigned integer tests
@@ -156,9 +153,7 @@ jobs:
- name: Set pull-request URL
if: env.SECRETS_AVAILABLE == 'true' && github.event_name == 'pull_request'
run: |
echo "PULL_REQUEST_MD_LINK=[pull-request](${PR_BASE_URL}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
env:
PR_BASE_URL: ${{ vars.PR_BASE_URL }}
echo "PULL_REQUEST_MD_LINK=[pull-request](${{ vars.PR_BASE_URL }}${{ github.event.pull_request.number }}), " >> "${GITHUB_ENV}"
- name: Send message
if: env.SECRETS_AVAILABLE == 'true'


@@ -1,73 +0,0 @@
# Test tfhe-fft
name: Cargo Test HLAPI HPU
on:
pull_request:
push:
branches:
- main
env:
CARGO_TERM_COLOR: always
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions: { }
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: read
outputs:
hpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.hpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@ed68ef82c095e0d48ec87eccea555d944a631a4c # v46.0.5
with:
files_yaml: |
hpu:
- tfhe/Cargo.toml
- Makefile
- backends/tfhe-hpu-backend/**
- mockups/tfhe-hpu-mockup/**
cargo-tests-hpu:
needs: should-run
if: needs.should-run.outputs.hpu_test == 'true'
runs-on: large_ubuntu_16
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
with:
toolchain: stable
override: true
- name: Install Just
run: |
cargo install just
- name: Test HLAPI HPU
run: |
source setup_hpu.sh
just -f mockups/tfhe-hpu-mockup/Justfile BUILD_PROFILE=release mockup &
make HPU_CONFIG=sim test_high_level_api_hpu


@@ -18,9 +18,6 @@ on:
# Weekly tests will be triggered each Friday at 9p.m.
- cron: "0 21 * * 5"
permissions: {}
jobs:
setup-instance:
name: Setup instance (cpu-tests)
@@ -66,7 +63,7 @@ jobs:
make test_integer_long_run
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661
env:


@@ -27,9 +27,6 @@ concurrency:
group: ${{ github.workflow_ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
cargo-builds-m1:
if: ${{ (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||


@@ -28,12 +28,6 @@ on:
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
NPM_TAG: ""
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
@@ -100,7 +94,7 @@ jobs:
run: |
echo "NPM_TAG=latest" >> "${GITHUB_ENV}"
- name: Download artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
uses: actions/download-artifact@95815c38cf2ff2164869cbab79da8d1f422bc89e # v4.2.1
with:
name: crate
path: target/package
@@ -110,10 +104,7 @@ jobs:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe --token "${CRATES_TOKEN}" ${DRY_RUN}
cargo publish -p tfhe --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Generate hash
id: published_hash
@@ -125,7 +116,11 @@ jobs:
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Build web package
if: ${{ inputs.push_web_package }}
@@ -161,9 +156,13 @@ jobs:
provenance: true
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "tfhe release failed: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -15,8 +15,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
uses: ./.github/workflows/verify_tagged_commit.yml
@@ -159,10 +157,7 @@ jobs:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-cuda-backend --token "${CRATES_TOKEN}" ${DRY_RUN}
cargo publish -p tfhe-cuda-backend --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Generate hash
id: published_hash
@@ -174,10 +169,14 @@ jobs:
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe-cuda-backend crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:


@@ -1,105 +0,0 @@
name: Publish HPU release
on:
workflow_dispatch:
inputs:
dry_run:
description: "Dry-run"
type: boolean
default: true
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
uses: ./.github/workflows/verify_tagged_commit.yml
secrets:
RELEASE_TEAM: ${{ secrets.RELEASE_TEAM }}
READ_ORG_TOKEN: ${{ secrets.READ_ORG_TOKEN }}
package:
runs-on: ubuntu-latest
needs: verify_tag
outputs:
hash: ${{ steps.hash.outputs.hash }}
steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-hpu-backend
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: crate
path: target/package/*.crate
- name: generate hash
id: hash
run: cd target/package && echo "hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
provenance:
if: ${{ !inputs.dry_run }}
needs: [package]
uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0
permissions:
# Needed to detect the GitHub Actions environment
actions: read
# Needed to create the provenance via GitHub OIDC
id-token: write
# Needed to upload assets/artifacts
contents: write
with:
# SHA-256 hashes of the Crate package.
base64-subjects: ${{ needs.package.outputs.hash }}
publish_release:
name: Publish tfhe-hpu-backend Release
runs-on: ubuntu-latest
needs: [verify_tag, package] # for comparing hashes
steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Publish crate.io package
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-hpu-backend --token "${CRATES_TOKEN}" ${DRY_RUN}
- name: Generate hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
- name: Slack notification (hashes comparison)
if: ${{ needs.package.outputs.hash != steps.published_hash.outputs.pub_hash }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_MESSAGE: "SLSA tfhe-hpu-backend crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "tfhe-hpu-backend release failed: (${{ env.ACTION_RUN_URL }})"


@@ -10,12 +10,6 @@ on:
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
@@ -33,8 +27,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-csprng
@@ -72,10 +64,9 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Download artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
uses: actions/download-artifact@95815c38cf2ff2164869cbab79da8d1f422bc89e # v4.2.1
with:
name: crate-tfhe-csprng
path: target/package
@@ -84,10 +75,7 @@ jobs:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-csprng --token "${CRATES_TOKEN}" ${DRY_RUN}
cargo publish -p tfhe-csprng --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Generate hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
@@ -97,11 +85,19 @@ jobs:
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe-csprng - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "tfhe-csprng release finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -11,12 +11,6 @@ on:
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
@@ -35,8 +29,7 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-fft
@@ -72,18 +65,14 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Publish crate.io package
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-fft --token "${CRATES_TOKEN}" ${DRY_RUN}
cargo publish -p tfhe-fft --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Generate hash
id: published_hash
@@ -95,12 +84,20 @@ jobs:
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe-fft crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "tfhe-fft release failed: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -11,12 +11,6 @@ on:
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
@@ -35,8 +29,7 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-ntt
@@ -72,18 +65,13 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Publish crate.io package
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-ntt --token "${CRATES_TOKEN}" ${DRY_RUN}
cargo publish -p tfhe-ntt --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Generate hash
id: published_hash
@@ -95,12 +83,20 @@ jobs:
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe-ntt crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "tfhe-ntt release failed: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -10,8 +10,6 @@ env:
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
uses: ./.github/workflows/verify_tagged_commit.yml
@@ -29,8 +27,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-versionable-derive
@@ -68,7 +64,7 @@ jobs:
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Download artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
uses: actions/download-artifact@95815c38cf2ff2164869cbab79da8d1f422bc89e # v4.2.1
with:
name: crate-tfhe-versionable-derive
path: target/package
@@ -76,7 +72,7 @@ jobs:
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
run: |
cargo publish -p tfhe-versionable-derive --token "${CRATES_TOKEN}"
cargo publish -p tfhe-versionable-derive --token ${{ env.CRATES_TOKEN }}
- name: Generate hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
@@ -88,7 +84,7 @@ jobs:
SLACK_COLOR: failure
SLACK_MESSAGE: "SLSA tfhe-versionable-derive - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
@@ -106,8 +102,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-versionable
@@ -142,10 +136,8 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Download artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
uses: actions/download-artifact@95815c38cf2ff2164869cbab79da8d1f422bc89e # v4.2.1
with:
name: crate-tfhe-versionable
path: target/package
@@ -153,7 +145,7 @@ jobs:
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
run: |
cargo publish -p tfhe-versionable --token "${CRATES_TOKEN}"
cargo publish -p tfhe-versionable --token ${{ env.CRATES_TOKEN }}
- name: Generate hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
@@ -165,7 +157,7 @@ jobs:
SLACK_COLOR: failure
SLACK_MESSAGE: "SLSA tfhe-versionable - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:


@@ -10,12 +10,6 @@ on:
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
package:
@@ -27,8 +21,6 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-zk-pok
@@ -72,7 +64,7 @@ jobs:
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Download artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
uses: actions/download-artifact@95815c38cf2ff2164869cbab79da8d1f422bc89e # v4.2.1
with:
name: crate-zk-pok
path: target/package
@@ -81,10 +73,7 @@ jobs:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-zk-pok --token "${CRATES_TOKEN}" ${DRY_RUN}
cargo publish -p tfhe-zk-pok --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Verify hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
@@ -94,11 +83,19 @@ jobs:
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe-zk-pok crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "tfhe-zk-pok release failed: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -12,17 +12,12 @@ on:
- "main"
workflow_dispatch:
permissions: {}
jobs:
params-curves-security-check:
runs-on: large_ubuntu_16-22.04
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Checkout lattice-estimator
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
@@ -30,7 +25,6 @@ jobs:
repository: malb/lattice-estimator
path: lattice_estimator
ref: 'e80ec6bbbba212428b0e92d0467c18629cf9ed67'
persist-credentials: 'false'
- name: Install Sage
run: |


@@ -1,16 +1,84 @@
# Placeholder workflow file allowing running it without having to merge to main first
# Run all integer benchmarks on a permanent HPU instance and return parsed results to Slab CI bot.
name: Placeholder Workflow
on:
workflow_dispatch:
permissions: {}
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
placeholder:
name: Placeholder
runs-on: ubuntu-latest
integer-benchmarks-hpu:
name: Execute integer & erc20 benchmarks for HPU backend
runs-on: v80-desktop
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
timeout-minutes: 1440 # 24 hours
steps:
- run: |
echo "Hello this is a Placeholder Workflow"
# Needed as long as hw_regmap repository is private
- name: Configure SSH
uses: webfactory/ssh-agent@a6f90b1f127823b31d4d4a8d96047790581349bd # v0.9.1
with:
ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@a54c7afa936fefeb4456b2dd8068152669aa8203
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
repository: zama-ai/slab
path: slab
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Run benchmarks
run: |
make bench_integer_hpu
make bench_hlapi_erc20_hpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpu_x1" \
--backend hpu \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs
- name: Upload parsed results artifact
uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08
with:
name: ${{ github.sha }}_integer_benchmarks
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py ${{ env.RESULTS_FILENAME }} "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"


@@ -7,8 +7,6 @@ on:
- 'main'
workflow_dispatch:
permissions: {}
jobs:
sync-repo:
if: ${{ github.repository == 'zama-ai/tfhe-rs' }}


@@ -9,8 +9,6 @@ on:
READ_ORG_TOKEN:
required: true
permissions: {}
jobs:
checks:
runs-on: ubuntu-latest
@@ -28,10 +26,7 @@ jobs:
- name: Actor authorized
run: |
if [ "${ACTOR_CHECK_OUTPUT}" == "false" ]; then
echo "Actor '${TRIGGERING_ACTOR}' is not authorized to perform release"
if [ "${{ steps.actor_check.outputs.authorized }}" == "false" ]; then
echo "Actor '${{ github.triggering_actor }}' is not authorized to perform release"
exit 1
fi
env:
TRIGGERING_ACTOR: ${{ github.triggering_actor }}
ACTOR_CHECK_OUTPUT: ${{ steps.actor_check.outputs.authorized }}


@@ -1,12 +0,0 @@
# Specifying a path without code owners means that path won't have owners and is akin to a negation
# i.e. the `core_crypto` dir is owned and needs owner approval/review, but not the `gpu` sub dir
# See https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners#example-of-a-codeowners-file
/tfhe/src/core_crypto/ @IceTDrinker
/tfhe/src/core_crypto/gpu
/tfhe/src/shortint/ @mayeul-zama
/tfhe/src/integer/ @tmontaigu
/tfhe/src/integer/gpu
/tfhe/src/high_level_api/ @tmontaigu


@@ -102,7 +102,8 @@ For example, if you made changes in `tfhe/src/integer/*`, you can test them with
## 4. Committing
**TFHE-rs** follows the conventional commit specification to maintain a consistent commit history, essential for Semantic Versioning ([semver.org](https://semver.org/)).
Commit messages are automatically checked in CI and will be rejected if they do not comply, so make sure that you follow the commit conventions detailed on [this page](https://www.conventionalcommits.org/en/v1.0.0/).
Commit messages are automatically checked in CI and will be rejected if they do not comply, so make sure that you follow the commit conventions detailed on [this page]
(https://www.conventionalcommits.org/en/v1.0.0/).
## 5. Rebasing
@@ -144,15 +145,12 @@ sequenceDiagram
Reviewer -->> GitHub: Merge if pipeline green
```
{% hint style="info" %}
## Useful details:
- pipeline is triggered by humans
- review team is located in Paris timezone, pipeline launch will most likely happen during office hours
- direct changes to CI related files are not allowed for external contributors
- run `make pcc` to fix any build errors before pushing commits
{% endhint %}
> [!Note]
>Useful details:
>* pipeline is triggered by humans
>* review team is located in Paris timezone, pipeline launch will most likely happen during office hours
>* direct changes to CI related files are not allowed for external contributors
>* run `make pcc` to fix any build errors before pushing commits
## 8. Data versioning


@@ -2,7 +2,6 @@
resolver = "2"
members = [
"tfhe",
"tfhe-benchmark",
"tfhe-fft",
"tfhe-ntt",
"tfhe-zk-pok",
@@ -12,9 +11,8 @@ members = [
"backends/tfhe-hpu-backend",
"utils/tfhe-versionable",
"utils/tfhe-versionable-derive",
"utils/param_dedup",
"tests",
"mockups/tfhe-hpu-mockup",
"tests",
]
exclude = [

169
Makefile

@@ -167,12 +167,6 @@ install_typos_checker: install_rs_build_toolchain
cargo $(CARGO_RS_BUILD_TOOLCHAIN) install typos-cli || \
( echo "Unable to install typos-cli, unknown error." && exit 1 )
.PHONY: install_zizmor # Install zizmor workflow security checker
install_zizmor: install_rs_build_toolchain
@zizmor --version > /dev/null 2>&1 || \
cargo $(CARGO_RS_BUILD_TOOLCHAIN) install zizmor || \
( echo "Unable to install zizmor, unknown error." && exit 1 )
.PHONY: setup_venv # Setup Python virtualenv for wasm tests
setup_venv:
python3 -m venv venv
@@ -324,10 +318,6 @@ check_newline: check_linelint_installed
lint_workflow: check_actionlint_installed
actionlint
.PHONY: check_workflow_security # Run zizmor security checker on GitHub workflows
check_workflow_security: install_zizmor
zizmor --persona pedantic .
.PHONY: clippy_core # Run clippy lints on core_crypto with and without experimental features
clippy_core: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
@@ -465,15 +455,10 @@ clippy_tfhe_lints: install_cargo_dylint # the toolchain is selected with toolcha
rustup toolchain install && \
cargo clippy --all-targets -- --no-deps -D warnings
.PHONY: clippy_param_dedup # Run clippy lints on param_dedup tool
clippy_param_dedup: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
-p param_dedup -- --no-deps -D warnings
.PHONY: clippy_all # Run all clippy targets
clippy_all: clippy_rustdoc clippy clippy_boolean clippy_shortint clippy_integer clippy_all_targets \
clippy_c_api clippy_js_wasm_api clippy_tasks clippy_core clippy_tfhe_csprng clippy_zk_pok clippy_trivium \
clippy_versionable clippy_tfhe_lints clippy_ws_tests clippy_bench clippy_param_dedup
clippy_versionable clippy_tfhe_lints clippy_ws_tests
.PHONY: clippy_fast # Run main clippy targets
clippy_fast: clippy_rustdoc clippy clippy_all_targets clippy_c_api clippy_js_wasm_api clippy_tasks \
@@ -726,18 +711,9 @@ test_integer_hpu_ci: install_rs_check_toolchain install_cargo_nextest
test_integer_hpu_mockup_ci: install_rs_check_toolchain install_cargo_nextest
source ./setup_hpu.sh --config sim ; \
cargo build --release --bin hpu_mockup; \
coproc target/release/hpu_mockup --params mockups/tfhe-hpu-mockup/params/tuniform_64b_pfail64_psi64.toml > mockup.log; \
coproc target/release/hpu_mockup --params mockups/tfhe-hpu-mockup/params/gaussian_64b_pfail64_psi64.toml > mockup.log; \
HPU_TEST_ITER=1 \
cargo test --profile devo -p $(TFHE_SPEC) --features hpu --test hpu -- u32 && \
kill %1
.PHONY: test_integer_hpu_mockup_ci_fast # Run the quick tests for integer ci on hpu backend and mockup.
test_integer_hpu_mockup_ci_fast: install_rs_check_toolchain install_cargo_nextest
source ./setup_hpu.sh --config sim ; \
cargo build --profile devo --bin hpu_mockup; \
coproc target/devo/hpu_mockup --params mockups/tfhe-hpu-mockup/params/tuniform_64b_fast.toml > mockup.log; \
HPU_TEST_ITER=1 \
cargo test --profile devo -p $(TFHE_SPEC) --features hpu --test hpu -- u32 && \
cargo test --profile devo -p $(TFHE_SPEC) --features hpu --test hpu -- alu_u32 && \
kill %1
.PHONY: test_boolean # Run the tests of the boolean module
@@ -896,19 +872,11 @@ test_high_level_api_gpu: install_rs_build_toolchain install_cargo_nextest
-E "test(/high_level_api::.*gpu.*/)"
test_high_level_api_hpu: install_rs_build_toolchain install_cargo_nextest
ifeq ($(HPU_CONFIG), v80)
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) nextest run --cargo-profile $(CARGO_PROFILE) \
--build-jobs=$(CARGO_BUILD_JOBS) \
--test-threads=1 \
--features=integer,internal-keycache,hpu,hpu-v80 -p $(TFHE_SPEC) \
-E "test(/high_level_api::.*hpu.*/)"
else
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) nextest run --cargo-profile $(CARGO_PROFILE) \
--build-jobs=$(CARGO_BUILD_JOBS) \
--test-threads=1 \
--features=integer,internal-keycache,hpu -p $(TFHE_SPEC) \
-E "test(/high_level_api::.*hpu.*/)"
endif
.PHONY: test_strings # Run the tests for strings ci
@@ -1028,8 +996,8 @@ lint_doc: install_rs_check_toolchain
lint_docs: lint_doc
.PHONY: format_doc_latex # Format the documentation latex equations to avoid broken rendering.
format_doc_latex: install_rs_build_toolchain
RUSTFLAGS="" cargo "$(CARGO_RS_BUILD_TOOLCHAIN)" xtask format_latex_doc
format_doc_latex:
RUSTFLAGS="" cargo xtask format_latex_doc
@"$(MAKE)" --no-print-directory fmt
@printf "\n===============================\n\n"
@printf "Please manually inspect changes made by format_latex_doc, rustfmt can break equations \
@@ -1037,8 +1005,8 @@ format_doc_latex: install_rs_build_toolchain
@printf "\n===============================\n"
.PHONY: check_md_docs_are_tested # Checks that the rust codeblocks in our .md files are tested
check_md_docs_are_tested: install_rs_build_toolchain
RUSTFLAGS="" cargo "$(CARGO_RS_BUILD_TOOLCHAIN)" xtask check_tfhe_docs_are_tested
check_md_docs_are_tested:
RUSTFLAGS="" cargo xtask check_tfhe_docs_are_tested
.PHONY: check_intra_md_links # Checks broken internal links in Markdown docs
check_intra_md_links: install_mlc
@@ -1142,24 +1110,6 @@ dieharder_csprng: install_dieharder build_tfhe_csprng
# Benchmarks
#
.PHONY: clippy_bench # Run clippy lints on tfhe-benchmark
clippy_bench: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=boolean,shortint,integer,internal-keycache,nightly-avx512,pbs-stats,zk-pok \
-p tfhe-benchmark -- --no-deps -D warnings
.PHONY: clippy_bench_gpu # Run clippy lints on tfhe-benchmark
clippy_bench_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=gpu,shortint,integer,internal-keycache,nightly-avx512,pbs-stats,zk-pok \
-p tfhe-benchmark -- --no-deps -D warnings
.PHONY: clippy_bench_hpu # Run clippy lints on tfhe-benchmark
clippy_bench_hpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=hpu,shortint,integer,internal-keycache,pbs-stats\
-p tfhe-benchmark -- --no-deps -D warnings
.PHONY: print_doc_bench_parameters # Print parameters used in doc benchmarks
print_doc_bench_parameters:
RUSTFLAGS="" cargo run --example print_doc_bench_parameters \
@@ -1170,28 +1120,28 @@ bench_integer: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_signed_integer # Run benchmarks for signed integer
bench_signed_integer: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-signed-bench \
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_integer_gpu # Run benchmarks for integer on GPU backend
bench_integer_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_signed_integer_gpu # Run benchmarks for signed integer on GPU backend
bench_signed_integer_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-signed-bench \
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_integer_hpu # Run benchmarks for integer on HPU backend
bench_integer_hpu: install_rs_check_toolchain
@@ -1199,28 +1149,28 @@ bench_integer_hpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=integer,internal-keycache,pbs-stats,hpu,hpu-v80 -p tfhe-benchmark -- --quick
--features=integer,internal-keycache,pbs-stats,hpu,hpu-v80 -p $(TFHE_SPEC) -- --quick
.PHONY: bench_integer_compression # Run benchmarks for unsigned integer compression
bench_integer_compression: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench glwe_packing_compression-integer-bench \
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_integer_compression_gpu
bench_integer_compression_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench glwe_packing_compression-integer-bench \
--features=integer,internal-keycache,gpu,pbs-stats -p tfhe-benchmark --
--features=integer,internal-keycache,gpu,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_integer_zk_gpu
bench_integer_zk_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench zk-pke-bench \
--features=integer,internal-keycache,gpu,pbs-stats,zk-pok -p tfhe-benchmark --
--features=integer,internal-keycache,gpu,pbs-stats,zk-pok -p $(TFHE_SPEC) --
.PHONY: bench_integer_multi_bit # Run benchmarks for unsigned integer using multi-bit parameters
bench_integer_multi_bit: install_rs_check_toolchain
@@ -1228,7 +1178,7 @@ bench_integer_multi_bit: install_rs_check_toolchain
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_signed_integer_multi_bit # Run benchmarks for signed integer using multi-bit parameters
bench_signed_integer_multi_bit: install_rs_check_toolchain
@@ -1236,7 +1186,7 @@ bench_signed_integer_multi_bit: install_rs_check_toolchain
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-signed-bench \
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_integer_multi_bit_gpu # Run benchmarks for integer on GPU backend using multi-bit parameters
bench_integer_multi_bit_gpu: install_rs_check_toolchain
@@ -1244,7 +1194,7 @@ bench_integer_multi_bit_gpu: install_rs_check_toolchain
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_signed_integer_multi_bit_gpu # Run benchmarks for signed integer on GPU backend using multi-bit parameters
bench_signed_integer_multi_bit_gpu: install_rs_check_toolchain
@@ -1252,7 +1202,7 @@ bench_signed_integer_multi_bit_gpu: install_rs_check_toolchain
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-signed-bench \
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p $(TFHE_SPEC) --
.PHONY: bench_integer_zk # Run benchmarks for integer encryption with ZK proofs
bench_integer_zk: install_rs_check_toolchain
@@ -1260,83 +1210,91 @@ bench_integer_zk: install_rs_check_toolchain
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench zk-pke-bench \
--features=integer,internal-keycache,zk-pok,nightly-avx512,pbs-stats \
-p tfhe-benchmark --
-p $(TFHE_SPEC) --
.PHONY: bench_shortint # Run benchmarks for shortint
bench_shortint: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench shortint-bench \
--features=shortint,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_shortint_oprf # Run benchmarks for shortint
bench_shortint_oprf: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) \
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench oprf-shortint-bench \
--features=shortint,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_shortint_multi_bit # Run benchmarks for shortint using multi-bit parameters
bench_shortint_multi_bit: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=MULTI_BIT \
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench shortint-bench \
--features=shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC) --
.PHONY: bench_boolean # Run benchmarks for boolean
bench_boolean: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench boolean-bench \
--features=boolean,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_ks # Run benchmarks for keyswitch
bench_ks: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=$(BENCH_PARAM_TYPE) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench ks-bench \
--features=boolean,shortint,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_ks_gpu # Run benchmarks for keyswitch on GPU backend
bench_ks_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=$(BENCH_PARAM_TYPE) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench ks-bench \
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_pbs # Run benchmarks for PBS
bench_pbs: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=$(BENCH_PARAM_TYPE) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs-bench \
--features=boolean,shortint,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_pbs_gpu # Run benchmarks for PBS on GPU backend
bench_pbs_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=$(BENCH_PARAM_TYPE) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs-bench \
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_ks_pbs # Run benchmarks for KS-PBS
bench_ks_pbs: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=$(BENCH_PARAM_TYPE) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench ks-pbs-bench \
--features=boolean,shortint,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_ks_pbs_gpu # Run benchmarks for KS-PBS on GPU backend
bench_ks_pbs_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_PARAM_TYPE=$(BENCH_PARAM_TYPE) __TFHE_RS_PARAMS_SET=$(BENCH_PARAMS_SET) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench ks-pbs-bench \
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_pbs128 # Run benchmarks for PBS using FFT 128 bits
bench_pbs128: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs128-bench \
--features=boolean,shortint,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_pbs128_gpu # Run benchmarks for PBS using FFT 128 bits on GPU
bench_pbs128_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs128-bench \
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p tfhe-benchmark
--features=boolean,shortint,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
bench_web_js_api_parallel_chrome: browser_path = "$(WEB_RUNNER_DIR)/chrome/chrome-linux64/chrome"
bench_web_js_api_parallel_chrome: driver_path = "$(WEB_RUNNER_DIR)/chrome/chromedriver-linux64/chromedriver"
@@ -1368,29 +1326,17 @@ bench_web_js_api_parallel_firefox_ci: setup_venv
nvm use $(NODE_VERSION) && \
$(MAKE) bench_web_js_api_parallel_firefox
.PHONY: bench_hlapi_erc20 # Run benchmarks for ERC20 operations
.PHONY: bench_hlapi_erc20 # Run benchmarks for ECR20 operations
bench_hlapi_erc20: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench hlapi-erc20 \
--features=integer,internal-keycache,pbs-stats,nightly-avx512 -p tfhe-benchmark --
--features=integer,internal-keycache,pbs-stats,nightly-avx512 -p $(TFHE_SPEC) --
.PHONY: bench_hlapi_erc20_gpu # Run benchmarks for ERC20 operations on GPU
.PHONY: bench_hlapi_erc20_gpu # Run benchmarks for ECR20 operations on GPU
bench_hlapi_erc20_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench hlapi-erc20 \
--features=integer,gpu,internal-keycache,pbs-stats,nightly-avx512 -p tfhe-benchmark --
.PHONY: bench_hlapi_dex # Run benchmarks for DEX operations
bench_hlapi_dex: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench hlapi-dex \
--features=integer,internal-keycache,pbs-stats,nightly-avx512 -p tfhe-benchmark --
.PHONY: bench_hlapi_dex_gpu # Run benchmarks for DEX operations on GPU
bench_hlapi_dex_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench hlapi-dex \
--features=integer,gpu,internal-keycache,pbs-stats,nightly-avx512 -p tfhe-benchmark --
--features=integer,gpu,internal-keycache,pbs-stats,nightly-avx512 -p $(TFHE_SPEC) --
.PHONY: bench_hlapi_erc20_hpu # Run benchmarks for ECR20 operations on HPU
bench_hlapi_erc20_hpu: install_rs_check_toolchain
@@ -1398,7 +1344,7 @@ bench_hlapi_erc20_hpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench hlapi-erc20 \
--features=integer,internal-keycache,hpu,hpu-v80 -p tfhe-benchmark -- --quick
--features=integer,internal-keycache,hpu,hpu-v80 -p $(TFHE_SPEC) -- --quick
.PHONY: bench_tfhe_zk_pok # Run benchmarks for the tfhe_zk_pok crate
bench_tfhe_zk_pok: install_rs_check_toolchain
@@ -1425,23 +1371,20 @@ gen_key_cache_core_crypto: install_rs_build_toolchain
.PHONY: measure_hlapi_compact_pk_ct_sizes # Measure sizes of public keys and ciphertext for high-level API
measure_hlapi_compact_pk_ct_sizes: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) run --profile $(CARGO_PROFILE) \
--bin hlapi_compact_pk_ct_sizes \
--features=integer,internal-keycache \
-p tfhe-benchmark
--example hlapi_compact_pk_ct_sizes \
--features=integer,internal-keycache
.PHONY: measure_shortint_key_sizes # Measure sizes of bootstrapping and key switching keys for shortint
measure_shortint_key_sizes: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) run --profile $(CARGO_PROFILE) \
--bin shortint_key_sizes \
--features=shortint,internal-keycache \
-p tfhe-benchmark
--example shortint_key_sizes \
--features=shortint,internal-keycache
.PHONY: measure_boolean_key_sizes # Measure sizes of bootstrapping and key switching keys for boolean
measure_boolean_key_sizes: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) run --profile $(CARGO_PROFILE) \
--bin boolean_key_sizes \
--features=boolean,internal-keycache \
-p tfhe-benchmark
--example boolean_key_sizes \
--features=boolean,internal-keycache
.PHONY: parse_integer_benches # Run python parser to output a csv containing integer benches data
parse_integer_benches:
@@ -1452,9 +1395,8 @@ parse_integer_benches:
.PHONY: parse_wasm_benchmarks # Parse benchmarks performed with WASM web client into a CSV file
parse_wasm_benchmarks: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) run --profile $(CARGO_PROFILE) \
--bin wasm_benchmarks_parser \
--example wasm_benchmarks_parser \
--features=shortint,internal-keycache \
-p tfhe-benchmark \
-- wasm_benchmark_results.json
.PHONY: write_params_to_file # Gather all crypto parameters into a file with a Sage readable format.
@@ -1497,10 +1439,7 @@ tfhe_lints
.PHONY: pcc_gpu # pcc stands for pre commit checks for GPU compilation
pcc_gpu: check_rust_bindings_did_not_change clippy_rustdoc_gpu \
clippy_gpu clippy_cuda_backend clippy_bench_gpu check_compile_tests_benches_gpu
.PHONY: pcc_hpu # pcc stands for pre commit checks for HPU compilation
pcc_hpu: clippy_hpu clippy_hpu_backend test_integer_hpu_mockup_ci_fast
clippy_gpu clippy_cuda_backend check_compile_tests_benches_gpu
.PHONY: fpcc # pcc stands for pre commit checks, the f stands for fast
fpcc: no_tfhe_typo no_dbg_log check_parameter_export_ok check_fmt check_typos lint_doc \


@@ -206,7 +206,6 @@ If you want to work within the IND-CPA security model, which is less strict than
The default parameters used in the High-Level API with the GPU backend are chosen considering the IND-CPA security model, and are selected with a bootstrapping failure probability fixed at $p_{error} \le 2^{-64}$. In particular, it is assumed that the results of decrypted computations are not shared by the secret key owner with any third parties, as such an action can lead to leakage of the secret encryption key. If you are designing an application where decryptions must be shared, you will need to craft custom encryption parameters which are chosen in consideration of the IND-CPA^D security model [2].
[1] Bernard, Olivier, et al. "Drifting Towards Better Error Probabilities in Fully Homomorphic Encryption Schemes". https://eprint.iacr.org/2024/1718.pdf
[2] Li, Baiyu, et al. "Securing approximate homomorphic encryption using differential privacy." Annual International Cryptology Conference. Cham: Springer Nature Switzerland, 2022. https://eprint.iacr.org/2022/816.pdf
#### Side-channel attacks


@@ -129,7 +129,7 @@ Other sizes than 64 bit are expected to be available in the future.
# FHE shortint Trivium implementation
The same implementation is also available for generic Ciphertexts representing bits (meant to be used with parameters `V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128`).
The same implementation is also available for generic Ciphertexts representing bits (meant to be used with parameters `V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128`).
It uses a lower level API of tfhe-rs, so the syntax is a little bit different. It also implements the `TransCiphering` trait. For optimization purposes, it does not internally run
on the same cryptographic parameters as the high level API of tfhe-rs. As such, it requires the usage of a casting key, to switch from one parameter space to another, which makes
its setup a little more intricate.
@@ -137,10 +137,10 @@ its setup a little more intricate.
Example code:
```rust
use tfhe::shortint::prelude::*;
use tfhe::shortint::parameters::v1_2::{
V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
use tfhe::shortint::parameters::v1_1::{
V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
};
use tfhe::{ConfigBuilder, generate_keys, FheUint64};
use tfhe::prelude::*;
@@ -148,17 +148,17 @@ use tfhe_trivium::TriviumStreamShortint;
fn test_shortint() {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB".to_string();


@@ -1,9 +1,9 @@
use criterion::Criterion;
use tfhe::prelude::*;
use tfhe::shortint::parameters::v1_2::{
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
use tfhe::shortint::parameters::v1_1::{
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
};
use tfhe::shortint::prelude::*;
use tfhe::{generate_keys, ConfigBuilder, FheUint64};
@@ -11,19 +11,19 @@ use tfhe_trivium::{KreyviumStreamShortint, TransCiphering};
pub fn kreyvium_shortint_warmup(c: &mut Criterion) {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -64,19 +64,19 @@ pub fn kreyvium_shortint_warmup(c: &mut Criterion) {
pub fn kreyvium_shortint_gen(c: &mut Criterion) {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -112,19 +112,19 @@ pub fn kreyvium_shortint_gen(c: &mut Criterion) {
pub fn kreyvium_shortint_trans(c: &mut Criterion) {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();


@@ -1,9 +1,9 @@
use criterion::Criterion;
use tfhe::prelude::*;
use tfhe::shortint::parameters::v1_2::{
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
use tfhe::shortint::parameters::v1_1::{
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
};
use tfhe::shortint::prelude::*;
use tfhe::{generate_keys, ConfigBuilder, FheUint64};
@@ -11,19 +11,19 @@ use tfhe_trivium::{TransCiphering, TriviumStreamShortint};
pub fn trivium_shortint_warmup(c: &mut Criterion) {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -64,19 +64,19 @@ pub fn trivium_shortint_warmup(c: &mut Criterion) {
pub fn trivium_shortint_gen(c: &mut Criterion) {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -112,19 +112,19 @@ pub fn trivium_shortint_gen(c: &mut Criterion) {
pub fn trivium_shortint_trans(c: &mut Criterion) {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB".to_string();


@@ -1,9 +1,9 @@
use crate::{KreyviumStream, KreyviumStreamByte, KreyviumStreamShortint, TransCiphering};
use tfhe::prelude::*;
use tfhe::shortint::parameters::v1_2::{
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
use tfhe::shortint::parameters::v1_1::{
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
};
use tfhe::{generate_keys, ConfigBuilder, FheBool, FheUint64, FheUint8};
// Values for these tests come from the github repo renaud1239/Kreyvium,
@@ -66,7 +66,7 @@ fn get_hexagonal_string_from_bytes(a: Vec<u8>) -> String {
assert!(a.len() % 8 == 0);
let mut hexadecimal: String = "".to_string();
for test in a {
hexadecimal.push_str(&format!("{test:02X?}"));
hexadecimal.push_str(&format!("{:02X?}", test));
}
hexadecimal
}
@@ -74,7 +74,7 @@ fn get_hexagonal_string_from_bytes(a: Vec<u8>) -> String {
fn get_hexagonal_string_from_u64(a: Vec<u64>) -> String {
let mut hexadecimal: String = "".to_string();
for test in a {
hexadecimal.push_str(&format!("{test:016X?}"));
hexadecimal.push_str(&format!("{:016X?}", test));
}
hexadecimal
}
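The two hunks above only change how the format arguments are written (inline `{test:02X?}` versus positional `{:02X?}`). As a rough illustration of what these helpers produce, here is a minimal sketch; the names `hex_from_bytes` and `hex_from_u64` are hypothetical stand-ins, and it uses the plain `X` specifier, which for integers should print the same digits as the `X?` form in the diff.

```rust
/// Hypothetical stand-in for `get_hexagonal_string_from_bytes`:
/// two upper-case hex digits per byte, zero-padded.
fn hex_from_bytes(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Inline format argument, the style used on one side of the diff.
        out.push_str(&format!("{b:02X}"));
    }
    out
}

/// Hypothetical stand-in for `get_hexagonal_string_from_u64`:
/// sixteen upper-case hex digits per word, zero-padded.
fn hex_from_u64(words: &[u64]) -> String {
    let mut out = String::with_capacity(words.len() * 16);
    for w in words {
        // Positional format argument, the style used on the other side.
        out.push_str(&format!("{:016X}", w));
    }
    out
}
```

Both argument styles compile to the same formatting call, so the change is purely stylistic and tied to which clippy lints the toolchain enables.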
@@ -221,19 +221,19 @@ use tfhe::shortint::prelude::*;
#[test]
fn kreyvium_test_shortint_long() {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
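Note on the `format!` changes in the hunks above: the two variants of each line differ only in how the argument is passed, either captured inline (`{test:02X?}`) or positionally (`{:02X?}`, `test`). The sketch below uses hypothetical helper names (`hex_inline`, `hex_positional`) that are not part of the repository, and assumes a toolchain with inline format arguments (Rust 1.58 or later). It suggests that both spellings should produce the same string.

```rust
/// Hypothetical helper: formats each byte as two uppercase hex digits,
/// capturing the loop variable inline in the format string.
fn hex_inline(bytes: &[u8]) -> String {
    let mut out = String::new();
    for b in bytes {
        out.push_str(&format!("{b:02X?}"));
    }
    out
}

/// Hypothetical helper: same formatting, argument passed positionally.
fn hex_positional(bytes: &[u8]) -> String {
    let mut out = String::new();
    for b in bytes {
        out.push_str(&format!("{:02X?}", b));
    }
    out
}

fn main() {
    // Bytes taken from the key string used in the tests above.
    let key = [0x00, 0x53, 0xA6, 0xF9, 0x4C, 0x9F, 0xF2, 0x45, 0x98, 0xEB];
    println!("{}", hex_inline(&key));
    println!("{}", hex_positional(&key));
}
```

Since the output is identical, the choice between the two forms is stylistic and depends mainly on the minimum supported compiler version.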


@@ -55,7 +55,7 @@ impl<const N: usize, T> Index<usize> for StaticDeque<N, T> {
/// 0 is youngest
fn index(&self, i: usize) -> &T {
if i >= N {
panic!("Index {i:?} too high for size {N:?}");
panic!("Index {:?} too high for size {:?}", i, N);
}
&self.arr[(N + self.cursor - i - 1) % N]
}
@@ -66,7 +66,7 @@ impl<const N: usize, T> IndexMut<usize> for StaticDeque<N, T> {
/// 0 is youngest
fn index_mut(&mut self, i: usize) -> &mut T {
if i >= N {
panic!("Index {i:?} too high for size {N:?}");
panic!("Index {:?} too high for size {:?}", i, N);
}
&mut self.arr[(N + self.cursor - i - 1) % N]
}
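Note on the indexing expression `(N + self.cursor - i - 1) % N` in the hunks above: it maps a logical index, where 0 is the youngest element, onto a slot of a fixed-size ring buffer. The sketch below is a minimal illustration, not the repository's `StaticDeque`. The `RingBuffer` type, its `push` method, and the assumption that `cursor` points at the next slot to overwrite are all hypothetical.

```rust
use std::ops::Index;

/// Hypothetical stand-in for the deque in the diff: a fixed-size ring buffer
/// where `cursor` is assumed to point at the slot the next push overwrites.
struct RingBuffer<const N: usize, T> {
    arr: [T; N],
    cursor: usize,
}

impl<const N: usize, T: Copy> RingBuffer<N, T> {
    fn new(fill: T) -> Self {
        Self { arr: [fill; N], cursor: 0 }
    }

    /// Overwrite the oldest element and advance the cursor.
    fn push(&mut self, value: T) {
        self.arr[self.cursor] = value;
        self.cursor = (self.cursor + 1) % N;
    }
}

impl<const N: usize, T> Index<usize> for RingBuffer<N, T> {
    type Output = T;

    /// Index 0 is the youngest element, N - 1 the oldest.
    fn index(&self, i: usize) -> &T {
        if i >= N {
            panic!("Index {i:?} too high for size {N:?}");
        }
        // Adding N first keeps the unsigned subtraction from underflowing.
        &self.arr[(N + self.cursor - i - 1) % N]
    }
}

fn main() {
    let mut rb = RingBuffer::<4, u32>::new(0);
    for v in 1..=6 {
        rb.push(v);
    }
    // After six pushes into four slots, the youngest value is 6 and the oldest is 3.
    println!("{} {} {}", rb[0], rb[1], rb[3]);
}
```

The bounds check before the arithmetic matters: with `i >= N` the expression `N + cursor - i - 1` could underflow `usize`, so panicking with a clear message first is the safer order.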


@@ -1,9 +1,9 @@
use crate::{TransCiphering, TriviumStream, TriviumStreamByte, TriviumStreamShortint};
use tfhe::prelude::*;
use tfhe::shortint::parameters::v1_2::{
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
use tfhe::shortint::parameters::v1_1::{
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128,
};
use tfhe::{generate_keys, ConfigBuilder, FheBool, FheUint64, FheUint8};
// Values for these tests come from the github repo cantora/avr-crypto-lib, commit 2a5b018,
@@ -66,7 +66,7 @@ fn get_hexagonal_string_from_bytes(a: Vec<u8>) -> String {
assert!(a.len() % 8 == 0);
let mut hexadecimal: String = "".to_string();
for test in a {
hexadecimal.push_str(&format!("{test:02X?}"));
hexadecimal.push_str(&format!("{:02X?}", test));
}
hexadecimal
}
@@ -74,7 +74,7 @@ fn get_hexagonal_string_from_bytes(a: Vec<u8>) -> String {
fn get_hexagonal_string_from_u64(a: Vec<u64>) -> String {
let mut hexadecimal: String = "".to_string();
for test in a {
hexadecimal.push_str(&format!("{test:016X?}"));
hexadecimal.push_str(&format!("{:016X?}", test));
}
hexadecimal
}
@@ -357,19 +357,19 @@ use tfhe::shortint::prelude::*;
#[test]
fn trivium_test_shortint_long() {
let config = ConfigBuilder::default()
.use_custom_parameters(V1_2_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.use_custom_parameters(V1_1_PARAM_MESSAGE_2_CARRY_2_KS_PBS_GAUSSIAN_2M128)
.build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
let (client_key, server_key): (ClientKey, ServerKey) =
gen_keys(V1_2_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
gen_keys(V1_1_PARAM_MESSAGE_1_CARRY_1_KS_PBS_GAUSSIAN_2M128);
let ksk = KeySwitchingKey::new(
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
V1_2_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
V1_1_PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS_GAUSSIAN_2M128,
);
let key_string = "0053A6F94C9FF24598EB".to_string();


@@ -1,6 +1,6 @@
[package]
name = "tfhe-cuda-backend"
version = "0.10.0"
version = "0.9.0"
edition = "2021"
authors = ["Zama team"]
license = "BSD-3-Clause-Clear"
@@ -18,4 +18,3 @@ bindgen = "0.71"
[features]
experimental-multi-arch = []
profile = []


@@ -45,13 +45,6 @@ fn main() {
} else {
cmake_config.define("MULTI_ARCH", "OFF");
}
// Conditionally pass the "USE_NVTOOLS" variable to CMake if the feature is enabled
if cfg!(feature = "profile") {
cmake_config.define("USE_NVTOOLS", "ON");
println!("cargo:rustc-link-lib=nvToolsExt");
} else {
cmake_config.define("USE_NVTOOLS", "OFF");
}
// Build the CMake project
let dest = cmake_config.build();
@@ -78,7 +71,7 @@ fn main() {
"cuda/include/integer/compression/compression.h",
"cuda/include/integer/integer.h",
"cuda/include/zk/zk.h",
"cuda/include/keyswitch/keyswitch.h",
"cuda/include/keyswitch.h",
"cuda/include/keyswitch/ks_enums.h",
"cuda/include/linear_algebra.h",
"cuda/include/fft/fft128.h",
@@ -93,7 +86,7 @@ fn main() {
};
let mut headers_modified = bindings_modified;
for header in headers {
println!("cargo:rerun-if-changed={header}");
println!("cargo:rerun-if-changed={}", header);
// Check modification times
let header_modified = std::fs::metadata(header).unwrap().modified().unwrap();
if header_modified > headers_modified {


@@ -88,18 +88,11 @@ else()
set(OPTIMIZATION_FLAGS "${OPTIMIZATION_FLAGS} -O3")
endif()
# Check if the USE_NVTOOLS environment variable is set
if(${USE_NVTOOLS})
message(STATUS "USE_NVTOOLS is enabled")
add_definitions(-DUSE_NVTOOLS)
endif()
# in production, should use -arch=sm_70 --ptxas-options=-v to see register spills -lineinfo for better debugging to use
# nvtx when profiling -lnvToolsExt
# in production, should use -arch=sm_70 --ptxas-options=-v to see register spills -lineinfo for better debugging
set(CMAKE_CUDA_FLAGS
"${CMAKE_CUDA_FLAGS} -ccbin ${CMAKE_CXX_COMPILER} ${OPTIMIZATION_FLAGS}\
-std=c++17 --no-exceptions --expt-relaxed-constexpr -rdc=true \
--use_fast_math -Xcompiler -fPIC ")
--use_fast_math -Xcompiler -fPIC")
set(INCLUDE_DIR include)


@@ -31,11 +31,6 @@ void cuda_improve_noise_modulus_switch_64(
void const *lwe_array_in, void const *encrypted_zeros, uint32_t lwe_size,
uint32_t num_lwes, uint32_t num_zeros, double input_variance,
double r_sigma, double bound, uint32_t log_modulus);
void cuda_glwe_sample_extract_128(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void const *glwe_array_in, uint32_t const *nth_array, uint32_t num_nths,
uint32_t lwe_per_glwe, uint32_t glwe_dimension, uint32_t polynomial_size);
}
#endif


@@ -8,6 +8,7 @@
#include <cuda_runtime.h>
#include <vector>
#define synchronize_threads_in_block() __syncthreads()
extern "C" {
#define check_cuda_error(ans) \
@@ -47,28 +48,13 @@ uint32_t cuda_is_available();
void *cuda_malloc(uint64_t size, uint32_t gpu_index);
void *cuda_malloc_with_size_tracking_async(uint64_t size, cudaStream_t stream,
uint32_t gpu_index,
uint64_t *size_tracker,
bool allocate_gpu_memory);
void *cuda_malloc_async(uint64_t size, cudaStream_t stream, uint32_t gpu_index);
bool cuda_check_valid_malloc(uint64_t size, uint32_t gpu_index);
void cuda_memcpy_with_size_tracking_async_to_gpu(void *dest, const void *src,
uint64_t size,
cudaStream_t stream,
uint32_t gpu_index,
bool gpu_memory_allocated);
void cuda_check_valid_malloc(uint64_t size, uint32_t gpu_index);
void cuda_memcpy_async_to_gpu(void *dest, const void *src, uint64_t size,
cudaStream_t stream, uint32_t gpu_index);
void cuda_memcpy_with_size_tracking_async_gpu_to_gpu(
void *dest, void const *src, uint64_t size, cudaStream_t stream,
uint32_t gpu_index, bool gpu_memory_allocated);
void cuda_memcpy_async_gpu_to_gpu(void *dest, void const *src, uint64_t size,
cudaStream_t stream, uint32_t gpu_index);
@@ -78,11 +64,6 @@ void cuda_memcpy_gpu_to_gpu(void *dest, void const *src, uint64_t size,
void cuda_memcpy_async_to_cpu(void *dest, const void *src, uint64_t size,
cudaStream_t stream, uint32_t gpu_index);
void cuda_memset_with_size_tracking_async(void *dest, uint64_t val,
uint64_t size, cudaStream_t stream,
uint32_t gpu_index,
bool gpu_memory_allocated);
void cuda_memset_async(void *dest, uint64_t val, uint64_t size,
cudaStream_t stream, uint32_t gpu_index);
@@ -92,10 +73,6 @@ void cuda_synchronize_device(uint32_t gpu_index);
void cuda_drop(void *ptr, uint32_t gpu_index);
void cuda_drop_with_size_tracking_async(void *ptr, cudaStream_t stream,
uint32_t gpu_index,
bool gpu_memory_allocated);
void cuda_drop_async(void *ptr, cudaStream_t stream, uint32_t gpu_index);
}
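Note on the `*_with_size_tracking_async` declarations above: one side of this diff threads a `size_tracker` pointer and a `gpu_memory_allocated` flag through the allocation helpers, and the matching `scratch_*` functions return `uint64_t` instead of `void`. The Rust sketch below is only a host-side analogy of that pattern as it appears from the signatures. Every name in it is hypothetical, and the assumption that the tracker is incremented even when no memory is allocated is inferred from the call sites, not confirmed by the source.

```rust
/// Hypothetical host-side stand-in for a device buffer.
struct Buffer {
    data: Option<Vec<u8>>,
}

/// Always record the requested size; only allocate when `allocate` is true.
/// (Assumed behaviour, modelled on the shape of the C declarations.)
fn alloc_with_size_tracking(size: u64, size_tracker: &mut u64, allocate: bool) -> Buffer {
    *size_tracker += size;
    let data = if allocate {
        Some(vec![0u8; size as usize])
    } else {
        None
    };
    Buffer { data }
}

/// Scratch-style setup that returns its buffers and the total bytes requested.
fn scratch(num_blocks: u64, block_bytes: u64, allocate: bool) -> (Vec<Buffer>, u64) {
    let mut tracker = 0u64;
    let buffers = vec![
        alloc_with_size_tracking(num_blocks * block_bytes, &mut tracker, allocate),
        alloc_with_size_tracking(num_blocks * 4, &mut tracker, allocate),
    ];
    (buffers, tracker)
}

fn main() {
    // Dry run: learn the memory footprint without allocating anything.
    let (dry, needed) = scratch(4, 16, false);
    println!("dry run needs {needed} bytes, allocated: {}", dry[0].data.is_some());

    // Real run: same accounting, buffers actually allocated.
    let (real, used) = scratch(4, 16, true);
    println!("real run used {used} bytes, allocated: {}", real[0].data.is_some());
}
```

The design point of such a pattern is that a caller can query the footprint of an operation before committing memory. The other side of the diff drops that ability and allocates unconditionally.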


@@ -4,7 +4,7 @@
#include "../../pbs/pbs_enums.h"
extern "C" {
uint64_t scratch_cuda_integer_compress_radix_ciphertext_64(
void scratch_cuda_integer_compress_radix_ciphertext_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t compression_glwe_dimension,
uint32_t compression_polynomial_size, uint32_t lwe_dimension,
@@ -13,7 +13,7 @@ uint64_t scratch_cuda_integer_compress_radix_ciphertext_64(
uint32_t lwe_per_glwe, uint32_t storage_log_modulus,
bool allocate_gpu_memory);
uint64_t scratch_cuda_integer_decompress_radix_ciphertext_64(
void scratch_cuda_integer_decompress_radix_ciphertext_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t encryption_glwe_dimension,
uint32_t encryption_polynomial_size, uint32_t compression_glwe_dimension,


@@ -14,44 +14,41 @@ template <typename Torus> struct int_compression {
int8_t *fp_ks_buffer;
Torus *tmp_lwe;
Torus *tmp_glwe_array_out;
bool gpu_memory_allocated;
int_compression(cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count, int_radix_params compression_params,
uint32_t num_radix_blocks, uint32_t lwe_per_glwe,
uint32_t storage_log_modulus, bool allocate_gpu_memory,
uint64_t *size_tracker) {
gpu_memory_allocated = allocate_gpu_memory;
uint32_t storage_log_modulus, bool allocate_gpu_memory) {
this->compression_params = compression_params;
this->lwe_per_glwe = lwe_per_glwe;
this->storage_log_modulus = storage_log_modulus;
this->body_count = num_radix_blocks;
Torus glwe_accumulator_size = (compression_params.glwe_dimension + 1) *
compression_params.polynomial_size;
if (allocate_gpu_memory) {
Torus glwe_accumulator_size = (compression_params.glwe_dimension + 1) *
compression_params.polynomial_size;
tmp_lwe = (Torus *)cuda_malloc_with_size_tracking_async(
num_radix_blocks * (compression_params.small_lwe_dimension + 1) *
sizeof(Torus),
streams[0], gpu_indexes[0], size_tracker, allocate_gpu_memory);
tmp_glwe_array_out = (Torus *)cuda_malloc_with_size_tracking_async(
lwe_per_glwe * glwe_accumulator_size * sizeof(Torus), streams[0],
gpu_indexes[0], size_tracker, allocate_gpu_memory);
tmp_lwe = (Torus *)cuda_malloc_async(
num_radix_blocks * (compression_params.small_lwe_dimension + 1) *
sizeof(Torus),
streams[0], gpu_indexes[0]);
tmp_glwe_array_out = (Torus *)cuda_malloc_async(
lwe_per_glwe * glwe_accumulator_size * sizeof(Torus), streams[0],
gpu_indexes[0]);
*size_tracker += scratch_packing_keyswitch_lwe_list_to_glwe_64(
streams[0], gpu_indexes[0], &fp_ks_buffer,
compression_params.small_lwe_dimension,
compression_params.glwe_dimension, compression_params.polynomial_size,
num_radix_blocks, allocate_gpu_memory);
scratch_packing_keyswitch_lwe_list_to_glwe_64(
streams[0], gpu_indexes[0], &fp_ks_buffer,
compression_params.small_lwe_dimension,
compression_params.glwe_dimension, compression_params.polynomial_size,
num_radix_blocks, true);
}
}
void release(cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count) {
cuda_drop_with_size_tracking_async(tmp_lwe, streams[0], gpu_indexes[0],
gpu_memory_allocated);
cuda_drop_with_size_tracking_async(tmp_glwe_array_out, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cleanup_packing_keyswitch_lwe_list_to_glwe(
streams[0], gpu_indexes[0], &fp_ks_buffer, gpu_memory_allocated);
cuda_drop_async(tmp_lwe, streams[0], gpu_indexes[0]);
cuda_drop_async(tmp_glwe_array_out, streams[0], gpu_indexes[0]);
cleanup_packing_keyswitch_lwe_list_to_glwe(streams[0], gpu_indexes[0],
&fp_ks_buffer);
}
};
@@ -69,71 +66,66 @@ template <typename Torus> struct int_decompression {
uint32_t *tmp_indexes_array;
int_radix_lut<Torus> *decompression_rescale_lut;
bool gpu_memory_allocated;
int_decompression(cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count, int_radix_params encryption_params,
int_radix_params compression_params,
uint32_t num_radix_blocks, uint32_t body_count,
uint32_t storage_log_modulus, bool allocate_gpu_memory,
uint64_t *size_tracker) {
gpu_memory_allocated = allocate_gpu_memory;
uint32_t storage_log_modulus, bool allocate_gpu_memory) {
this->encryption_params = encryption_params;
this->compression_params = compression_params;
this->storage_log_modulus = storage_log_modulus;
this->num_radix_blocks = num_radix_blocks;
this->body_count = body_count;
Torus glwe_accumulator_size = (compression_params.glwe_dimension + 1) *
compression_params.polynomial_size;
Torus lwe_accumulator_size = (compression_params.glwe_dimension *
compression_params.polynomial_size +
1);
decompression_rescale_lut = new int_radix_lut<Torus>(
streams, gpu_indexes, gpu_count, encryption_params, 1, num_radix_blocks,
allocate_gpu_memory, size_tracker);
if (allocate_gpu_memory) {
Torus glwe_accumulator_size = (compression_params.glwe_dimension + 1) *
compression_params.polynomial_size;
Torus lwe_accumulator_size = (compression_params.glwe_dimension *
compression_params.polynomial_size +
1);
decompression_rescale_lut = new int_radix_lut<Torus>(
streams, gpu_indexes, gpu_count, encryption_params, 1,
num_radix_blocks, allocate_gpu_memory);
tmp_extracted_glwe = (Torus *)cuda_malloc_with_size_tracking_async(
num_radix_blocks * glwe_accumulator_size * sizeof(Torus), streams[0],
gpu_indexes[0], size_tracker, allocate_gpu_memory);
tmp_indexes_array = (uint32_t *)cuda_malloc_with_size_tracking_async(
num_radix_blocks * sizeof(uint32_t), streams[0], gpu_indexes[0],
size_tracker, allocate_gpu_memory);
tmp_extracted_lwe = (Torus *)cuda_malloc_with_size_tracking_async(
num_radix_blocks * lwe_accumulator_size * sizeof(Torus), streams[0],
gpu_indexes[0], size_tracker, allocate_gpu_memory);
tmp_extracted_glwe = (Torus *)cuda_malloc_async(
num_radix_blocks * glwe_accumulator_size * sizeof(Torus), streams[0],
gpu_indexes[0]);
tmp_indexes_array = (uint32_t *)cuda_malloc_async(
num_radix_blocks * sizeof(uint32_t), streams[0], gpu_indexes[0]);
tmp_extracted_lwe = (Torus *)cuda_malloc_async(
num_radix_blocks * lwe_accumulator_size * sizeof(Torus), streams[0],
gpu_indexes[0]);
// Rescale is done using an identity LUT
// Here we do not divide by message_modulus
// Example: in the 2_2 case we are mapping a 2 bits message onto a 4 bits
// space, we want to keep the original 2 bits value in the 4 bits space,
// so we apply the identity and the encoding will rescale it for us.
auto decompression_rescale_f = [](Torus x) -> Torus { return x; };
// Rescale is done using an identity LUT
// Here we do not divide by message_modulus
// Example: in the 2_2 case we are mapping a 2 bits message onto a 4 bits
// space, we want to keep the original 2 bits value in the 4 bits space,
// so we apply the identity and the encoding will rescale it for us.
auto decompression_rescale_f = [](Torus x) -> Torus { return x; };
auto effective_compression_message_modulus =
encryption_params.carry_modulus;
auto effective_compression_carry_modulus = 1;
auto effective_compression_message_modulus =
encryption_params.carry_modulus;
auto effective_compression_carry_modulus = 1;
generate_device_accumulator_with_encoding<Torus>(
streams[0], gpu_indexes[0], decompression_rescale_lut->get_lut(0, 0),
decompression_rescale_lut->get_degree(0),
decompression_rescale_lut->get_max_degree(0),
encryption_params.glwe_dimension, encryption_params.polynomial_size,
effective_compression_message_modulus,
effective_compression_carry_modulus, encryption_params.message_modulus,
encryption_params.carry_modulus, decompression_rescale_f,
gpu_memory_allocated);
generate_device_accumulator_with_encoding<Torus>(
streams[0], gpu_indexes[0], decompression_rescale_lut->get_lut(0, 0),
decompression_rescale_lut->get_degree(0),
decompression_rescale_lut->get_max_degree(0),
encryption_params.glwe_dimension, encryption_params.polynomial_size,
effective_compression_message_modulus,
effective_compression_carry_modulus,
encryption_params.message_modulus, encryption_params.carry_modulus,
decompression_rescale_f);
decompression_rescale_lut->broadcast_lut(streams, gpu_indexes, 0);
decompression_rescale_lut->broadcast_lut(streams, gpu_indexes, 0);
}
}
void release(cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count) {
cuda_drop_with_size_tracking_async(tmp_extracted_glwe, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cuda_drop_with_size_tracking_async(tmp_extracted_lwe, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cuda_drop_with_size_tracking_async(tmp_indexes_array, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cuda_drop_async(tmp_extracted_glwe, streams[0], gpu_indexes[0]);
cuda_drop_async(tmp_extracted_lwe, streams[0], gpu_indexes[0]);
cuda_drop_async(tmp_indexes_array, streams[0], gpu_indexes[0]);
decompression_rescale_lut->release(streams, gpu_indexes, gpu_count);
delete decompression_rescale_lut;


@@ -48,7 +48,7 @@ typedef struct {
uint32_t lwe_dimension;
} CudaRadixCiphertextFFI;
uint64_t scratch_cuda_apply_univariate_lut_kb_64(
void scratch_cuda_apply_univariate_lut_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, void const *input_lut, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t ks_level,
@@ -56,7 +56,7 @@ uint64_t scratch_cuda_apply_univariate_lut_kb_64(
uint32_t grouping_factor, uint32_t input_lwe_ciphertext_count,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
uint64_t lut_degree, bool allocate_gpu_memory, bool allocate_ms_array);
uint64_t scratch_cuda_apply_many_univariate_lut_kb_64(
void scratch_cuda_apply_many_univariate_lut_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, void const *input_lut, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t ks_level,
@@ -78,7 +78,7 @@ void cleanup_cuda_apply_univariate_lut_kb_64(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_apply_bivariate_lut_kb_64(
void scratch_cuda_apply_bivariate_lut_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, void const *input_lut, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t ks_level,
@@ -109,7 +109,7 @@ void cuda_apply_many_univariate_lut_kb_64(
CudaModulusSwitchNoiseReductionKeyFFI const *ms_noise_reduction_key,
void *const *bsks, uint32_t num_luts, uint32_t lut_stride);
uint64_t scratch_cuda_full_propagation_64(
void scratch_cuda_full_propagation_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t ks_level, uint32_t ks_base_log,
@@ -127,7 +127,7 @@ void cleanup_cuda_full_propagation(void *const *streams,
uint32_t const *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_mult_radix_ciphertext_kb_64(
void scratch_cuda_integer_mult_radix_ciphertext_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, bool const is_boolean_left, bool const is_boolean_right,
uint32_t message_modulus, uint32_t carry_modulus, uint32_t glwe_dimension,
@@ -161,7 +161,7 @@ void cuda_scalar_addition_integer_radix_ciphertext_64_inplace(
void const *h_scalar_input, uint32_t num_scalars, uint32_t message_modulus,
uint32_t carry_modulus);
uint64_t scratch_cuda_integer_radix_logical_scalar_shift_kb_64(
void scratch_cuda_integer_radix_logical_scalar_shift_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -176,7 +176,7 @@ void cuda_integer_radix_logical_scalar_shift_kb_64_inplace(
void *const *bsks, void *const *ksks,
CudaModulusSwitchNoiseReductionKeyFFI const *ms_noise_reduction_key);
uint64_t scratch_cuda_integer_radix_arithmetic_scalar_shift_kb_64(
void scratch_cuda_integer_radix_arithmetic_scalar_shift_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -199,7 +199,7 @@ void cleanup_cuda_integer_radix_arithmetic_scalar_shift(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_radix_shift_and_rotate_kb_64(
void scratch_cuda_integer_radix_shift_and_rotate_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -219,7 +219,7 @@ void cleanup_cuda_integer_radix_shift_and_rotate(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_radix_comparison_kb_64(
void scratch_cuda_integer_radix_comparison_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -250,7 +250,7 @@ void cleanup_cuda_integer_comparison(void *const *streams,
uint32_t const *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_radix_bitop_kb_64(
void scratch_cuda_integer_radix_bitop_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -279,7 +279,7 @@ void cleanup_cuda_integer_bitop(void *const *streams,
uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_radix_cmux_kb_64(
void scratch_cuda_integer_radix_cmux_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -301,7 +301,7 @@ void cleanup_cuda_integer_radix_cmux(void *const *streams,
uint32_t const *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_radix_scalar_rotate_kb_64(
void scratch_cuda_integer_radix_scalar_rotate_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -321,7 +321,7 @@ void cleanup_cuda_integer_radix_scalar_rotate(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_propagate_single_carry_kb_64_inplace(
void scratch_cuda_propagate_single_carry_kb_64_inplace(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -330,7 +330,7 @@ uint64_t scratch_cuda_propagate_single_carry_kb_64_inplace(
uint32_t carry_modulus, PBS_TYPE pbs_type, uint32_t requested_flag,
uint32_t uses_carry, bool allocate_gpu_memory, bool allocate_ms_array);
uint64_t scratch_cuda_add_and_propagate_single_carry_kb_64_inplace(
void scratch_cuda_add_and_propagate_single_carry_kb_64_inplace(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -365,7 +365,7 @@ void cleanup_cuda_add_and_propagate_single_carry(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_overflowing_sub_kb_64_inplace(
void scratch_cuda_integer_overflowing_sub_kb_64_inplace(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -388,7 +388,7 @@ void cleanup_cuda_integer_overflowing_sub(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_radix_partial_sum_ciphertexts_vec_kb_64(
void scratch_cuda_integer_radix_partial_sum_ciphertexts_vec_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
@@ -408,7 +408,7 @@ void cleanup_cuda_integer_radix_partial_sum_ciphertexts_vec(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_scalar_mul_kb_64(
void scratch_cuda_integer_scalar_mul_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
@@ -429,7 +429,7 @@ void cleanup_cuda_integer_radix_scalar_mul(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_div_rem_radix_ciphertext_kb_64(
void scratch_cuda_integer_div_rem_radix_ciphertext_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
bool is_signed, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
@@ -450,7 +450,7 @@ void cleanup_cuda_integer_div_rem(void *const *streams,
uint32_t const *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_compute_prefix_sum_hillis_steele_64(
void scratch_cuda_integer_compute_prefix_sum_hillis_steele_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, void const *input_lut, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t ks_level,
@@ -476,7 +476,7 @@ void cuda_integer_reverse_blocks_64_inplace(void *const *streams,
uint32_t gpu_count,
CudaRadixCiphertextFFI *lwe_array);
uint64_t scratch_cuda_integer_abs_inplace_radix_ciphertext_kb_64(
void scratch_cuda_integer_abs_inplace_radix_ciphertext_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, bool is_signed, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
@@ -496,7 +496,7 @@ void cleanup_cuda_integer_abs_inplace(void *const *streams,
uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_are_all_comparisons_block_true_kb_64(
void scratch_cuda_integer_are_all_comparisons_block_true_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
@@ -517,7 +517,7 @@ void cleanup_cuda_integer_are_all_comparisons_block_true(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr_void);
uint64_t scratch_cuda_integer_is_at_least_one_comparisons_block_true_kb_64(
void scratch_cuda_integer_is_at_least_one_comparisons_block_true_kb_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,


@@ -3,8 +3,7 @@
void release_radix_ciphertext_async(cudaStream_t const stream,
uint32_t const gpu_index,
CudaRadixCiphertextFFI *data,
const bool gpu_memory_allocated);
CudaRadixCiphertextFFI *data);
void reset_radix_ciphertext_blocks(CudaRadixCiphertextFFI *data,
uint32_t new_num_blocks);


@@ -19,7 +19,7 @@ void cuda_keyswitch_lwe_ciphertext_vector_64(
uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
uint32_t num_samples);
uint64_t scratch_packing_keyswitch_lwe_list_to_glwe_64(
void scratch_packing_keyswitch_lwe_list_to_glwe_64(
void *stream, uint32_t gpu_index, int8_t **fp_ks_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t num_lwes, bool allocate_gpu_memory);
@@ -31,22 +31,9 @@ void cuda_packing_keyswitch_lwe_list_to_glwe_64(
uint32_t output_polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_lwes);
void scratch_packing_keyswitch_lwe_list_to_glwe_128(
void *stream, uint32_t gpu_index, int8_t **fp_ks_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t num_lwes, bool allocate_gpu_memory);
void cuda_packing_keyswitch_lwe_list_to_glwe_128(
void *stream, uint32_t gpu_index, void *glwe_array_out,
void const *lwe_array_in, void const *fp_ksk_array, int8_t *fp_ks_buffer,
uint32_t input_lwe_dimension, uint32_t output_glwe_dimension,
uint32_t output_polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_lwes);
void cleanup_packing_keyswitch_lwe_list_to_glwe(void *stream,
uint32_t gpu_index,
int8_t **fp_ks_buffer,
bool gpu_memory_allocated);
int8_t **fp_ks_buffer);
}
#endif // CNCRT_KS_H_


@@ -42,24 +42,6 @@ void cuda_mult_lwe_ciphertext_vector_cleartext_vector_64(
void const *lwe_array_in, void const *cleartext_array_in,
const uint32_t input_lwe_dimension,
const uint32_t input_lwe_ciphertext_count);
void scratch_wrapping_polynomial_mul_one_to_many_64(void *stream,
uint32_t gpu_index,
uint32_t polynomial_size,
int8_t **circulant_buf);
void cleanup_wrapping_polynomial_mul_one_to_many_64(void *stream,
uint32_t gpu_index,
int8_t *circulant_buf);
void cuda_wrapping_polynomial_mul_one_to_many_64(
void *stream, uint32_t gpu_index, void *result, void const *poly_lhs,
int8_t *circulant, void const *poly_rhs, uint32_t polynomial_size,
uint32_t n_rhs);
void cuda_glwe_wrapping_polynomial_mul_one_to_many_64(
void *stream, uint32_t gpu_index, void *result, void const *poly_lhs,
int8_t *circulant, void const *poly_rhs, uint32_t polynomial_size,
uint32_t glwe_dimension, uint32_t n_rhs);
void cuda_add_lwe_ciphertext_vector_plaintext_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void const *lwe_array_in, const uint64_t plaintext_in,


@@ -14,7 +14,7 @@ bool has_support_to_cuda_programmable_bootstrap_tbc_multi_bit(
#if CUDA_ARCH >= 900
template <typename Torus>
uint64_t scratch_cuda_tbc_multi_bit_programmable_bootstrap(
void scratch_cuda_tbc_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
@@ -32,7 +32,7 @@ void cuda_tbc_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
#endif
template <typename Torus>
uint64_t scratch_cuda_cg_multi_bit_programmable_bootstrap(
void scratch_cuda_cg_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
@@ -49,7 +49,7 @@ void cuda_cg_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
uint32_t num_many_lut, uint32_t lut_stride);
template <typename Torus>
uint64_t scratch_cuda_multi_bit_programmable_bootstrap(
void scratch_cuda_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
@@ -109,14 +109,11 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::MULTI_BIT> {
double2 *global_join_buffer;
PBS_VARIANT pbs_variant;
bool gpu_memory_allocated;
pbs_buffer(cudaStream_t stream, uint32_t gpu_index, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t lwe_chunk_size,
PBS_VARIANT pbs_variant, bool allocate_gpu_memory,
uint64_t *size_tracker) {
gpu_memory_allocated = allocate_gpu_memory;
PBS_VARIANT pbs_variant, bool allocate_gpu_memory) {
cuda_set_device(gpu_index);
this->pbs_variant = pbs_variant;
@@ -167,117 +164,107 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::MULTI_BIT> {
auto num_blocks_acc_tbc = num_blocks_acc_cg;
#endif
// Keybundle
if (max_shared_memory < full_sm_keybundle)
d_mem_keybundle = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_keybundle * full_sm_keybundle, stream, gpu_index,
size_tracker, allocate_gpu_memory);
if (allocate_gpu_memory) {
// Keybundle
if (max_shared_memory < full_sm_keybundle)
d_mem_keybundle = (int8_t *)cuda_malloc_async(
num_blocks_keybundle * full_sm_keybundle, stream, gpu_index);
switch (pbs_variant) {
case PBS_VARIANT::CG:
// Accumulator CG
if (max_shared_memory < partial_sm_cg_accumulate)
d_mem_acc_cg = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_cg * full_sm_cg_accumulate, stream, gpu_index,
size_tracker, allocate_gpu_memory);
else if (max_shared_memory < full_sm_cg_accumulate)
d_mem_acc_cg = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_cg * partial_sm_cg_accumulate, stream, gpu_index,
size_tracker, allocate_gpu_memory);
break;
case PBS_VARIANT::DEFAULT:
// Accumulator step one
if (max_shared_memory < partial_sm_accumulate_step_one)
d_mem_acc_step_one = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_step_one * full_sm_accumulate_step_one, stream,
gpu_index, size_tracker, allocate_gpu_memory);
else if (max_shared_memory < full_sm_accumulate_step_one)
d_mem_acc_step_one = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_step_one * partial_sm_accumulate_step_one, stream,
gpu_index, size_tracker, allocate_gpu_memory);
switch (pbs_variant) {
case PBS_VARIANT::CG:
// Accumulator CG
if (max_shared_memory < partial_sm_cg_accumulate)
d_mem_acc_cg = (int8_t *)cuda_malloc_async(
num_blocks_acc_cg * full_sm_cg_accumulate, stream, gpu_index);
else if (max_shared_memory < full_sm_cg_accumulate)
d_mem_acc_cg = (int8_t *)cuda_malloc_async(
num_blocks_acc_cg * partial_sm_cg_accumulate, stream, gpu_index);
break;
case PBS_VARIANT::DEFAULT:
// Accumulator step one
if (max_shared_memory < partial_sm_accumulate_step_one)
d_mem_acc_step_one = (int8_t *)cuda_malloc_async(
num_blocks_acc_step_one * full_sm_accumulate_step_one, stream,
gpu_index);
else if (max_shared_memory < full_sm_accumulate_step_one)
d_mem_acc_step_one = (int8_t *)cuda_malloc_async(
num_blocks_acc_step_one * partial_sm_accumulate_step_one, stream,
gpu_index);
// Accumulator step two
if (max_shared_memory < full_sm_accumulate_step_two)
d_mem_acc_step_two = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_step_two * full_sm_accumulate_step_two, stream,
gpu_index, size_tracker, allocate_gpu_memory);
break;
// Accumulator step two
if (max_shared_memory < full_sm_accumulate_step_two)
d_mem_acc_step_two = (int8_t *)cuda_malloc_async(
num_blocks_acc_step_two * full_sm_accumulate_step_two, stream,
gpu_index);
break;
#if CUDA_ARCH >= 900
case TBC:
// There is a minimum amount of memory we need to run the TBC PBS, which
// is minimum_sm_tbc. We know that minimum_sm_tbc bytes are available
// because otherwise the previous check would have redirected
// computation to some other variant. If over that we don't have more
// partial_sm_tbc_accumulate bytes, TBC PBS will run on NOSM. If we have
// partial_sm_tbc_accumulate but not full_sm_tbc_accumulate bytes, it
// will run on PARTIALSM. Otherwise, FULLSM.
//
// NOSM mode actually requires minimum_sm_tbc shared memory bytes.
case TBC:
// There is a minimum amount of memory we need to run the TBC PBS, which
// is minimum_sm_tbc. We know that minimum_sm_tbc bytes are available
// because otherwise the previous check would have redirected
// computation to some other variant. If over that we don't have more
// partial_sm_tbc_accumulate bytes, TBC PBS will run on NOSM. If we have
// partial_sm_tbc_accumulate but not full_sm_tbc_accumulate bytes, it
// will run on PARTIALSM. Otherwise, FULLSM.
//
// NOSM mode actually requires minimum_sm_tbc shared memory bytes.
// Accumulator TBC
if (max_shared_memory < partial_sm_tbc_accumulate + minimum_sm_tbc)
d_mem_acc_tbc = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_tbc * full_sm_tbc_accumulate, stream, gpu_index,
size_tracker, allocate_gpu_memory);
else if (max_shared_memory < full_sm_tbc_accumulate + minimum_sm_tbc)
d_mem_acc_tbc = (int8_t *)cuda_malloc_with_size_tracking_async(
num_blocks_acc_tbc * partial_sm_tbc_accumulate, stream, gpu_index,
size_tracker, allocate_gpu_memory);
break;
// Accumulator TBC
if (max_shared_memory < partial_sm_tbc_accumulate + minimum_sm_tbc)
d_mem_acc_tbc = (int8_t *)cuda_malloc_async(
num_blocks_acc_tbc * full_sm_tbc_accumulate, stream, gpu_index);
else if (max_shared_memory < full_sm_tbc_accumulate + minimum_sm_tbc)
d_mem_acc_tbc = (int8_t *)cuda_malloc_async(
num_blocks_acc_tbc * partial_sm_tbc_accumulate, stream,
gpu_index);
break;
#endif
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
}
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
}
keybundle_fft = (double2 *)cuda_malloc_with_size_tracking_async(
num_blocks_keybundle * (polynomial_size / 2) * sizeof(double2), stream,
gpu_index, size_tracker, allocate_gpu_memory);
global_accumulator = (Torus *)cuda_malloc_with_size_tracking_async(
input_lwe_ciphertext_count * (glwe_dimension + 1) * polynomial_size *
sizeof(Torus),
stream, gpu_index, size_tracker, allocate_gpu_memory);
global_join_buffer = (double2 *)cuda_malloc_with_size_tracking_async(
level_count * (glwe_dimension + 1) * input_lwe_ciphertext_count *
(polynomial_size / 2) * sizeof(double2),
stream, gpu_index, size_tracker, allocate_gpu_memory);
keybundle_fft = (double2 *)cuda_malloc_async(
num_blocks_keybundle * (polynomial_size / 2) * sizeof(double2),
stream, gpu_index);
global_accumulator = (Torus *)cuda_malloc_async(
input_lwe_ciphertext_count * (glwe_dimension + 1) * polynomial_size *
sizeof(Torus),
stream, gpu_index);
global_join_buffer = (double2 *)cuda_malloc_async(
level_count * (glwe_dimension + 1) * input_lwe_ciphertext_count *
(polynomial_size / 2) * sizeof(double2),
stream, gpu_index);
}
}
void release(cudaStream_t stream, uint32_t gpu_index) {
if (d_mem_keybundle)
cuda_drop_with_size_tracking_async(d_mem_keybundle, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(d_mem_keybundle, stream, gpu_index);
switch (pbs_variant) {
case DEFAULT:
if (d_mem_acc_step_one)
cuda_drop_with_size_tracking_async(d_mem_acc_step_one, stream,
gpu_index, gpu_memory_allocated);
cuda_drop_async(d_mem_acc_step_one, stream, gpu_index);
if (d_mem_acc_step_two)
cuda_drop_with_size_tracking_async(d_mem_acc_step_two, stream,
gpu_index, gpu_memory_allocated);
cuda_drop_async(d_mem_acc_step_two, stream, gpu_index);
break;
case CG:
if (d_mem_acc_cg)
cuda_drop_with_size_tracking_async(d_mem_acc_cg, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(d_mem_acc_cg, stream, gpu_index);
break;
#if CUDA_ARCH >= 900
case TBC:
if (d_mem_acc_tbc)
cuda_drop_with_size_tracking_async(d_mem_acc_tbc, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(d_mem_acc_tbc, stream, gpu_index);
break;
#endif
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
}
cuda_drop_with_size_tracking_async(keybundle_fft, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_with_size_tracking_async(global_accumulator, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_with_size_tracking_async(global_join_buffer, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(keybundle_fft, stream, gpu_index);
cuda_drop_async(global_accumulator, stream, gpu_index);
cuda_drop_async(global_join_buffer, stream, gpu_index);
}
};
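
The two sides of this hunk gate allocation differently: one passes `size_tracker` and `allocate_gpu_memory` into `cuda_malloc_with_size_tracking_async`, the other wraps plain `cuda_malloc_async` calls in `if (allocate_gpu_memory)`. Below is a host-only sketch of that difference. It assumes (the diff does not show the helper's body) that the tracking helper always adds the requested size to the tracker and allocates only when the flag is set. `mock_malloc_with_size_tracking`, `guarded_alloc` and `footprint_dry_run` are hypothetical stand-ins built on `std::malloc`, not functions from the backend.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a size-tracking allocation (host memory, no CUDA).
// Assumed behavior: always record the requested bytes in the tracker,
// and allocate only when `allocate` is true.
inline void *mock_malloc_with_size_tracking(uint64_t size,
                                            uint64_t *size_tracker,
                                            bool allocate) {
  if (size_tracker != nullptr)
    *size_tracker += size;
  return allocate ? std::malloc(size) : nullptr;
}

// Guarded style: nothing is allocated and nothing is measured when the
// flag is false.
inline void *guarded_alloc(uint64_t size, bool allocate) {
  void *p = nullptr;
  if (allocate)
    p = std::malloc(size);
  return p;
}

// Tracking style: a dry run (allocate == false) still reports the footprint.
inline uint64_t footprint_dry_run(uint64_t keybundle_bytes,
                                  uint64_t accumulator_bytes) {
  uint64_t tracker = 0;
  void *a = mock_malloc_with_size_tracking(keybundle_bytes, &tracker, false);
  void *b = mock_malloc_with_size_tracking(accumulator_bytes, &tracker, false);
  std::free(a); // both pointers are null here; free(nullptr) is a no-op
  std::free(b);
  return tracker;
}
```

Under this assumption, the tracking style lets a `scratch_*` function return a `uint64_t` byte count without touching the device, which would be consistent with the `uint64_t` versus `void` return types seen in the declarations.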


@@ -84,159 +84,155 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::CLASSICAL> {
PBS_VARIANT pbs_variant;
bool uses_noise_reduction;
bool gpu_memory_allocated;
pbs_buffer(cudaStream_t stream, uint32_t gpu_index, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t input_lwe_ciphertext_count,
PBS_VARIANT pbs_variant, bool allocate_gpu_memory,
bool allocate_ms_array, uint64_t *size_tracker) {
gpu_memory_allocated = allocate_gpu_memory;
bool allocate_ms_array) {
cuda_set_device(gpu_index);
this->uses_noise_reduction = allocate_ms_array;
this->pbs_variant = pbs_variant;
auto max_shared_memory = cuda_get_max_shared_memory(gpu_index);
this->temp_lwe_array_in = (Torus *)cuda_malloc_with_size_tracking_async(
(lwe_dimension + 1) * input_lwe_ciphertext_count * sizeof(Torus),
stream, gpu_index, size_tracker, allocate_ms_array);
switch (pbs_variant) {
case PBS_VARIANT::DEFAULT: {
uint64_t full_sm_step_one =
get_buffer_size_full_sm_programmable_bootstrap_step_one<Torus>(
polynomial_size);
uint64_t full_sm_step_two =
get_buffer_size_full_sm_programmable_bootstrap_step_two<Torus>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap<Torus>(
polynomial_size);
if (allocate_ms_array) {
this->temp_lwe_array_in = (Torus *)cuda_malloc_async(
(lwe_dimension + 1) * input_lwe_ciphertext_count * sizeof(Torus),
stream, gpu_index);
}
if (allocate_gpu_memory) {
switch (pbs_variant) {
case PBS_VARIANT::DEFAULT: {
uint64_t full_sm_step_one =
get_buffer_size_full_sm_programmable_bootstrap_step_one<Torus>(
polynomial_size);
uint64_t full_sm_step_two =
get_buffer_size_full_sm_programmable_bootstrap_step_two<Torus>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap<Torus>(
polynomial_size);
uint64_t partial_dm_step_one = full_sm_step_one - partial_sm;
uint64_t partial_dm_step_two = full_sm_step_two - partial_sm;
uint64_t full_dm = full_sm_step_one;
uint64_t partial_dm_step_one = full_sm_step_one - partial_sm;
uint64_t partial_dm_step_two = full_sm_step_two - partial_sm;
uint64_t full_dm = full_sm_step_one;
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_two) {
device_mem = (partial_dm_step_two + partial_dm_step_one * level_count) *
input_lwe_ciphertext_count * (glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_one) {
device_mem = partial_dm_step_one * input_lwe_ciphertext_count *
level_count * (glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_with_size_tracking_async(
device_mem, stream, gpu_index, size_tracker, allocate_gpu_memory);
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_two) {
device_mem =
(partial_dm_step_two + partial_dm_step_one * level_count) *
input_lwe_ciphertext_count * (glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_one) {
device_mem = partial_dm_step_one * input_lwe_ciphertext_count *
level_count * (glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_async(device_mem, stream, gpu_index);
global_join_buffer = (double2 *)cuda_malloc_with_size_tracking_async(
(glwe_dimension + 1) * level_count * input_lwe_ciphertext_count *
(polynomial_size / 2) * sizeof(double2),
stream, gpu_index, size_tracker, allocate_gpu_memory);
global_join_buffer = (double2 *)cuda_malloc_async(
(glwe_dimension + 1) * level_count * input_lwe_ciphertext_count *
(polynomial_size / 2) * sizeof(double2),
stream, gpu_index);
global_accumulator = (Torus *)cuda_malloc_with_size_tracking_async(
(glwe_dimension + 1) * input_lwe_ciphertext_count * polynomial_size *
sizeof(Torus),
stream, gpu_index, size_tracker, allocate_gpu_memory);
} break;
case PBS_VARIANT::CG: {
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_cg<Torus>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_cg<Torus>(
polynomial_size);
global_accumulator = (Torus *)cuda_malloc_async(
(glwe_dimension + 1) * input_lwe_ciphertext_count *
polynomial_size * sizeof(Torus),
stream, gpu_index);
} break;
case PBS_VARIANT::CG: {
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_cg<Torus>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_cg<Torus>(
polynomial_size);
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_with_size_tracking_async(
device_mem, stream, gpu_index, size_tracker, allocate_gpu_memory);
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_async(device_mem, stream, gpu_index);
global_join_buffer = (double2 *)cuda_malloc_with_size_tracking_async(
(glwe_dimension + 1) * level_count * input_lwe_ciphertext_count *
polynomial_size / 2 * sizeof(double2),
stream, gpu_index, size_tracker, allocate_gpu_memory);
} break;
global_join_buffer = (double2 *)cuda_malloc_async(
(glwe_dimension + 1) * level_count * input_lwe_ciphertext_count *
polynomial_size / 2 * sizeof(double2),
stream, gpu_index);
} break;
#if CUDA_ARCH >= 900
case PBS_VARIANT::TBC: {
case PBS_VARIANT::TBC: {
bool supports_dsm =
supports_distributed_shared_memory_on_classic_programmable_bootstrap<
Torus>(polynomial_size, max_shared_memory);
bool supports_dsm =
supports_distributed_shared_memory_on_classic_programmable_bootstrap<
Torus>(polynomial_size, max_shared_memory);
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_tbc<Torus>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_tbc<Torus>(
polynomial_size);
uint64_t minimum_sm_tbc = 0;
if (supports_dsm)
minimum_sm_tbc =
get_buffer_size_sm_dsm_plus_tbc_classic_programmable_bootstrap<
Torus>(polynomial_size);
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_tbc<Torus>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_tbc<Torus>(
polynomial_size);
uint64_t minimum_sm_tbc = 0;
if (supports_dsm)
minimum_sm_tbc =
get_buffer_size_sm_dsm_plus_tbc_classic_programmable_bootstrap<
Torus>(polynomial_size);
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
// There is a minimum amount of memory we need to run the TBC PBS, which
// is minimum_sm_tbc. We know that minimum_sm_tbc bytes are available
// because otherwise the previous check would have redirected
// computation to some other variant. If over that we don't have more
// partial_sm bytes, TBC PBS will run on NOSM. If we have partial_sm but
// not full_sm bytes, it will run on PARTIALSM. Otherwise, FULLSM.
//
// NOSM mode actually requires minimum_sm_tbc shared memory bytes.
if (max_shared_memory < partial_sm + minimum_sm_tbc) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm + minimum_sm_tbc) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
// There is a minimum amount of memory we need to run the TBC PBS, which
// is minimum_sm_tbc. We know that minimum_sm_tbc bytes are available
// because otherwise the previous check would have redirected
// computation to some other variant. If over that we don't have more
// partial_sm bytes, TBC PBS will run on NOSM. If we have partial_sm but
// not full_sm bytes, it will run on PARTIALSM. Otherwise, FULLSM.
//
// NOSM mode actually requires minimum_sm_tbc shared memory bytes.
if (max_shared_memory < partial_sm + minimum_sm_tbc) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm + minimum_sm_tbc) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_with_size_tracking_async(
device_mem, stream, gpu_index, size_tracker, allocate_gpu_memory);
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_async(device_mem, stream, gpu_index);
global_join_buffer = (double2 *)cuda_malloc_with_size_tracking_async(
(glwe_dimension + 1) * level_count * input_lwe_ciphertext_count *
polynomial_size / 2 * sizeof(double2),
stream, gpu_index, size_tracker, allocate_gpu_memory);
} break;
global_join_buffer = (double2 *)cuda_malloc_async(
(glwe_dimension + 1) * level_count * input_lwe_ciphertext_count *
polynomial_size / 2 * sizeof(double2),
stream, gpu_index);
} break;
#endif
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
}
}
}
void release(cudaStream_t stream, uint32_t gpu_index) {
cuda_drop_with_size_tracking_async(d_mem, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_with_size_tracking_async(global_join_buffer, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(d_mem, stream, gpu_index);
cuda_drop_async(global_join_buffer, stream, gpu_index);
if (pbs_variant == DEFAULT)
cuda_drop_with_size_tracking_async(global_accumulator, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(global_accumulator, stream, gpu_index);
if (uses_noise_reduction)
cuda_drop_with_size_tracking_async(temp_lwe_array_in, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(temp_lwe_array_in, stream, gpu_index);
}
};
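
The `PBS_VARIANT::CG` branch above sizes `d_mem` from how much of the kernel's working set fits in shared memory. The sketch below isolates that arithmetic in a standalone function. `cg_device_mem_bytes` is a hypothetical name introduced for illustration; the parameter names follow the diff, and the values of `full_sm` and `partial_sm` are taken as inputs rather than computed from `polynomial_size`.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the CG branch's device_mem computation.
// Returns the number of bytes of global memory needed as a fallback when
// shared memory cannot hold the full per-block working set.
inline uint64_t cg_device_mem_bytes(uint64_t max_shared_memory,
                                    uint64_t full_sm, uint64_t partial_sm,
                                    uint32_t input_lwe_ciphertext_count,
                                    uint32_t level_count,
                                    uint32_t glwe_dimension) {
  uint64_t partial_dm = full_sm - partial_sm; // spill when partial_sm fits
  uint64_t full_dm = full_sm;                 // spill when nothing fits
  uint64_t blocks = static_cast<uint64_t>(input_lwe_ciphertext_count) *
                    level_count * (glwe_dimension + 1);

  if (max_shared_memory < partial_sm)
    return full_dm * blocks; // everything lives in global memory
  if (max_shared_memory < full_sm)
    return partial_dm * blocks; // only the remainder spills
  return 0; // the kernel runs entirely in shared memory
}
```

With `full_sm = 100`, `partial_sm = 40`, two input ciphertexts, three levels and `glwe_dimension = 1`, the block count is 12, so the three tiers yield 1200, 720 and 0 bytes respectively.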
@@ -251,162 +247,153 @@ template <> struct pbs_buffer_128<PBS_TYPE::CLASSICAL> {
PBS_VARIANT pbs_variant;
bool uses_noise_reduction;
bool gpu_memory_allocated;
pbs_buffer_128(cudaStream_t stream, uint32_t gpu_index,
uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, PBS_VARIANT pbs_variant,
bool allocate_gpu_memory, bool allocate_ms_array,
uint64_t *size_tracker) {
gpu_memory_allocated = allocate_gpu_memory;
bool allocate_gpu_memory, bool allocate_ms_array) {
cuda_set_device(gpu_index);
this->pbs_variant = pbs_variant;
this->uses_noise_reduction = allocate_ms_array;
this->temp_lwe_array_in =
(__uint128_t *)cuda_malloc_with_size_tracking_async(
(lwe_dimension + 1) * input_lwe_ciphertext_count *
sizeof(__uint128_t),
stream, gpu_index, size_tracker, allocate_ms_array);
if (allocate_ms_array) {
this->temp_lwe_array_in = (__uint128_t *)cuda_malloc_async(
(lwe_dimension + 1) * input_lwe_ciphertext_count *
sizeof(__uint128_t),
stream, gpu_index);
}
auto max_shared_memory = cuda_get_max_shared_memory(gpu_index);
size_t global_join_buffer_size = (glwe_dimension + 1) * level_count *
input_lwe_ciphertext_count *
polynomial_size / 2 * sizeof(double) * 4;
switch (pbs_variant) {
case PBS_VARIANT::DEFAULT: {
uint64_t full_sm_step_one =
get_buffer_size_full_sm_programmable_bootstrap_step_one<__uint128_t>(
polynomial_size);
uint64_t full_sm_step_two =
get_buffer_size_full_sm_programmable_bootstrap_step_two<__uint128_t>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap<__uint128_t>(
polynomial_size);
uint64_t partial_dm_step_one = full_sm_step_one - partial_sm;
uint64_t partial_dm_step_two = full_sm_step_two - partial_sm;
uint64_t full_dm = full_sm_step_one;
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_two) {
device_mem = (partial_dm_step_two + partial_dm_step_one * level_count) *
input_lwe_ciphertext_count * (glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_one) {
device_mem = partial_dm_step_one * input_lwe_ciphertext_count *
level_count * (glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_with_size_tracking_async(
device_mem, stream, gpu_index, size_tracker, allocate_gpu_memory);
global_join_buffer = (double *)cuda_malloc_with_size_tracking_async(
global_join_buffer_size, stream, gpu_index, size_tracker,
allocate_gpu_memory);
global_accumulator = (__uint128_t *)cuda_malloc_with_size_tracking_async(
(glwe_dimension + 1) * input_lwe_ciphertext_count * polynomial_size *
sizeof(__uint128_t),
stream, gpu_index, size_tracker, allocate_gpu_memory);
} break;
case PBS_VARIANT::CG: {
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_cg<__uint128_t>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_cg<__uint128_t>(
polynomial_size);
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_with_size_tracking_async(
device_mem, stream, gpu_index, size_tracker, allocate_gpu_memory);
global_join_buffer = (double *)cuda_malloc_with_size_tracking_async(
global_join_buffer_size, stream, gpu_index, size_tracker,
allocate_gpu_memory);
} break;
#if CUDA_ARCH >= 900
case PBS_VARIANT::TBC: {
bool supports_dsm =
supports_distributed_shared_memory_on_classic_programmable_bootstrap<
__uint128_t>(polynomial_size, max_shared_memory);
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_tbc<__uint128_t>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_tbc<__uint128_t>(
polynomial_size);
uint64_t minimum_sm_tbc = 0;
if (supports_dsm)
minimum_sm_tbc =
get_buffer_size_sm_dsm_plus_tbc_classic_programmable_bootstrap<
if (allocate_gpu_memory) {
switch (pbs_variant) {
case PBS_VARIANT::DEFAULT: {
uint64_t full_sm_step_one =
get_buffer_size_full_sm_programmable_bootstrap_step_one<
__uint128_t>(polynomial_size);
uint64_t full_sm_step_two =
get_buffer_size_full_sm_programmable_bootstrap_step_two<
__uint128_t>(polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap<__uint128_t>(
polynomial_size);
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
uint64_t partial_dm_step_one = full_sm_step_one - partial_sm;
uint64_t partial_dm_step_two = full_sm_step_two - partial_sm;
uint64_t full_dm = full_sm_step_one;
// There is a minimum amount of memory we need to run the TBC PBS, which
// is minimum_sm_tbc. We know that minimum_sm_tbc bytes are available
// because otherwise the previous check would have redirected
// computation to some other variant. If over that we don't have more
// partial_sm bytes, TBC PBS will run on NOSM. If we have partial_sm but
// not full_sm bytes, it will run on PARTIALSM. Otherwise, FULLSM.
//
// NOSM mode actually requires minimum_sm_tbc shared memory bytes.
if (max_shared_memory < partial_sm + minimum_sm_tbc) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm + minimum_sm_tbc) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_two) {
device_mem =
(partial_dm_step_two + partial_dm_step_one * level_count) *
input_lwe_ciphertext_count * (glwe_dimension + 1);
} else if (max_shared_memory < full_sm_step_one) {
device_mem = partial_dm_step_one * input_lwe_ciphertext_count *
level_count * (glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_async(device_mem, stream, gpu_index);
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_with_size_tracking_async(
device_mem, stream, gpu_index, size_tracker, allocate_gpu_memory);
global_join_buffer = (double *)cuda_malloc_async(
global_join_buffer_size, stream, gpu_index);
global_join_buffer = (double *)cuda_malloc_with_size_tracking_async(
global_join_buffer_size, stream, gpu_index, size_tracker,
allocate_gpu_memory);
} break;
global_accumulator = (__uint128_t *)cuda_malloc_async(
(glwe_dimension + 1) * input_lwe_ciphertext_count *
polynomial_size * sizeof(__uint128_t),
stream, gpu_index);
} break;
case PBS_VARIANT::CG: {
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_cg<__uint128_t>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_cg<__uint128_t>(
polynomial_size);
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
if (max_shared_memory < partial_sm) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_async(device_mem, stream, gpu_index);
global_join_buffer = (double *)cuda_malloc_async(
global_join_buffer_size, stream, gpu_index);
} break;
#if CUDA_ARCH >= 900
case PBS_VARIANT::TBC: {
bool supports_dsm =
supports_distributed_shared_memory_on_classic_programmable_bootstrap<
__uint128_t>(polynomial_size, max_shared_memory);
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_tbc<__uint128_t>(
polynomial_size);
uint64_t partial_sm =
get_buffer_size_partial_sm_programmable_bootstrap_tbc<__uint128_t>(
polynomial_size);
uint64_t minimum_sm_tbc = 0;
if (supports_dsm)
minimum_sm_tbc =
get_buffer_size_sm_dsm_plus_tbc_classic_programmable_bootstrap<
__uint128_t>(polynomial_size);
uint64_t partial_dm = full_sm - partial_sm;
uint64_t full_dm = full_sm;
uint64_t device_mem = 0;
// There is a minimum amount of memory we need to run the TBC PBS, which
// is minimum_sm_tbc. We know that minimum_sm_tbc bytes are available
// because otherwise the previous check would have redirected
// computation to some other variant. If over that we don't have more
// partial_sm bytes, TBC PBS will run on NOSM. If we have partial_sm but
// not full_sm bytes, it will run on PARTIALSM. Otherwise, FULLSM.
//
// NOSM mode actually requires minimum_sm_tbc shared memory bytes.
if (max_shared_memory < partial_sm + minimum_sm_tbc) {
device_mem = full_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
} else if (max_shared_memory < full_sm + minimum_sm_tbc) {
device_mem = partial_dm * input_lwe_ciphertext_count * level_count *
(glwe_dimension + 1);
}
// Otherwise, both kernels run all in shared memory
d_mem = (int8_t *)cuda_malloc_async(device_mem, stream, gpu_index);
global_join_buffer = (double *)cuda_malloc_async(
global_join_buffer_size, stream, gpu_index);
} break;
#endif
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
}
}
}
void release(cudaStream_t stream, uint32_t gpu_index) {
cuda_drop_with_size_tracking_async(d_mem, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_with_size_tracking_async(global_join_buffer, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(d_mem, stream, gpu_index);
cuda_drop_async(global_join_buffer, stream, gpu_index);
if (pbs_variant == DEFAULT)
cuda_drop_with_size_tracking_async(global_accumulator, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(global_accumulator, stream, gpu_index);
if (uses_noise_reduction)
cuda_drop_with_size_tracking_async(temp_lwe_array_in, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_async(temp_lwe_array_in, stream, gpu_index);
}
};
@@ -477,7 +464,7 @@ void cuda_programmable_bootstrap_tbc_lwe_ciphertext_vector(
uint32_t lut_stride);
template <typename Torus>
uint64_t scratch_cuda_programmable_bootstrap_tbc(
void scratch_cuda_programmable_bootstrap_tbc(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, CLASSICAL> **pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t input_lwe_ciphertext_count,
@@ -485,14 +472,14 @@ uint64_t scratch_cuda_programmable_bootstrap_tbc(
#endif
template <typename Torus>
uint64_t scratch_cuda_programmable_bootstrap_cg(
void scratch_cuda_programmable_bootstrap_cg(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, CLASSICAL> **pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t input_lwe_ciphertext_count,
bool allocate_gpu_memory, bool allocate_ms_array);
template <typename Torus>
uint64_t scratch_cuda_programmable_bootstrap(
void scratch_cuda_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, CLASSICAL> **buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t input_lwe_ciphertext_count,


@@ -25,12 +25,12 @@ void cuda_convert_lwe_programmable_bootstrap_key_128(
uint32_t input_lwe_dim, uint32_t glwe_dim, uint32_t level_count,
uint32_t polynomial_size);
uint64_t scratch_cuda_programmable_bootstrap_amortized_32(
void scratch_cuda_programmable_bootstrap_amortized_32(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
uint64_t scratch_cuda_programmable_bootstrap_amortized_64(
void scratch_cuda_programmable_bootstrap_amortized_64(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
@@ -57,19 +57,19 @@ void cleanup_cuda_programmable_bootstrap_amortized(void *stream,
uint32_t gpu_index,
int8_t **pbs_buffer);
uint64_t scratch_cuda_programmable_bootstrap_32(
void scratch_cuda_programmable_bootstrap_32(
void *stream, uint32_t gpu_index, int8_t **buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory,
bool allocate_ms_array);
uint64_t scratch_cuda_programmable_bootstrap_64(
void scratch_cuda_programmable_bootstrap_64(
void *stream, uint32_t gpu_index, int8_t **buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory,
bool allocate_ms_array);
uint64_t scratch_cuda_programmable_bootstrap_128(
void scratch_cuda_programmable_bootstrap_128(
void *stream, uint32_t gpu_index, int8_t **buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory,
@@ -90,19 +90,18 @@ void cuda_programmable_bootstrap_lwe_ciphertext_vector_64(
void const *lut_vector_indexes, void const *lwe_array_in,
void const *lwe_input_indexes, void const *bootstrapping_key,
CudaModulusSwitchNoiseReductionKeyFFI const *ms_noise_reduction_key,
void *ms_noise_reduction_ptr, int8_t *buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t base_log,
uint32_t level_count, uint32_t num_samples, uint32_t num_many_lut,
uint32_t lut_stride);
int8_t *buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_samples, uint32_t num_many_lut, uint32_t lut_stride);
void cuda_programmable_bootstrap_lwe_ciphertext_vector_128(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void const *lut_vector, void const *lwe_array_in,
void const *bootstrapping_key,
CudaModulusSwitchNoiseReductionKeyFFI const *ms_noise_reduction_key,
void *ms_noise_reduction_ptr, int8_t *buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t base_log,
uint32_t level_count, uint32_t num_samples);
int8_t *buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_samples);
void cleanup_cuda_programmable_bootstrap(void *stream, uint32_t gpu_index,
int8_t **pbs_buffer);


@@ -15,7 +15,7 @@ void cuda_convert_lwe_multi_bit_programmable_bootstrap_key_64(
uint32_t input_lwe_dim, uint32_t glwe_dim, uint32_t level_count,
uint32_t polynomial_size, uint32_t grouping_factor);
uint64_t scratch_cuda_multi_bit_programmable_bootstrap_64(
void scratch_cuda_multi_bit_programmable_bootstrap_64(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);


@@ -13,7 +13,7 @@ void cuda_lwe_expand_64(void *const stream, uint32_t gpu_index,
const uint32_t *lwe_compact_input_indexes,
const uint32_t *output_body_id_per_compact_list);
uint64_t scratch_cuda_expand_without_verification_64(
void scratch_cuda_expand_without_verification_64(
void *const *streams, uint32_t const *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension,


@@ -20,19 +20,17 @@ template <typename Torus> struct zk_expand_mem {
uint32_t *d_lwe_compact_input_indexes;
uint32_t *d_body_id_per_compact_list;
bool gpu_memory_allocated;
zk_expand_mem(cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count, int_radix_params computing_params,
int_radix_params casting_params, KS_TYPE casting_key_type,
const uint32_t *num_lwes_per_compact_list,
const bool *is_boolean_array, uint32_t num_compact_lists,
bool allocate_gpu_memory, uint64_t *size_tracker)
bool allocate_gpu_memory)
: computing_params(computing_params), casting_params(casting_params),
num_compact_lists(num_compact_lists),
casting_key_type(casting_key_type) {
gpu_memory_allocated = allocate_gpu_memory;
num_lwes = 0;
for (int i = 0; i < num_compact_lists; i++) {
num_lwes += num_lwes_per_compact_list[i];
@@ -42,179 +40,173 @@ template <typename Torus> struct zk_expand_mem {
PANIC("GPU backend requires carry_modulus equal to message_modulus")
}
auto message_extract_lut_f = [casting_params](Torus x) -> Torus {
return x % casting_params.message_modulus;
};
auto carry_extract_lut_f = [casting_params](Torus x) -> Torus {
return (x / casting_params.carry_modulus) %
casting_params.message_modulus;
};
if (allocate_gpu_memory) {
// Booleans have to be sanitized
auto sanitize_bool_f = [](Torus x) -> Torus { return x == 0 ? 0 : 1; };
auto message_extract_and_sanitize_bool_lut_f =
[message_extract_lut_f, sanitize_bool_f](Torus x) -> Torus {
return sanitize_bool_f(message_extract_lut_f(x));
};
auto carry_extract_and_sanitize_bool_lut_f =
[carry_extract_lut_f, sanitize_bool_f](Torus x) -> Torus {
return sanitize_bool_f(carry_extract_lut_f(x));
};
auto message_extract_lut_f = [casting_params](Torus x) -> Torus {
return x % casting_params.message_modulus;
};
auto carry_extract_lut_f = [casting_params](Torus x) -> Torus {
return (x / casting_params.carry_modulus) %
casting_params.message_modulus;
};
/** In case the casting key casts from BIG to SMALL key we run a single KS
to expand using the casting key as ksk. Otherwise, in case the casting key
casts from SMALL to BIG key, we first keyswitch from SMALL to BIG using
the casting key as ksk, then we keyswitch from BIG to SMALL using the
computing ksk, and lastly we apply the PBS. The output is always on the
BIG key.
**/
auto params = casting_params;
if (casting_key_type == SMALL_TO_BIG) {
params = computing_params;
}
message_and_carry_extract_luts = new int_radix_lut<Torus>(
streams, gpu_indexes, gpu_count, params, 4, 2 * num_lwes,
allocate_gpu_memory, size_tracker);
// Booleans have to be sanitized
auto sanitize_bool_f = [](Torus x) -> Torus { return x == 0 ? 0 : 1; };
auto message_extract_and_sanitize_bool_lut_f =
[message_extract_lut_f, sanitize_bool_f](Torus x) -> Torus {
return sanitize_bool_f(message_extract_lut_f(x));
};
auto carry_extract_and_sanitize_bool_lut_f =
[carry_extract_lut_f, sanitize_bool_f](Torus x) -> Torus {
return sanitize_bool_f(carry_extract_lut_f(x));
};
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 0),
message_and_carry_extract_luts->get_degree(0),
message_and_carry_extract_luts->get_max_degree(0),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, message_extract_lut_f, gpu_memory_allocated);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 1),
message_and_carry_extract_luts->get_degree(1),
message_and_carry_extract_luts->get_max_degree(1),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, carry_extract_lut_f, gpu_memory_allocated);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 2),
message_and_carry_extract_luts->get_degree(2),
message_and_carry_extract_luts->get_max_degree(2),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, message_extract_and_sanitize_bool_lut_f,
gpu_memory_allocated);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 3),
message_and_carry_extract_luts->get_degree(3),
message_and_carry_extract_luts->get_max_degree(3),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, carry_extract_and_sanitize_bool_lut_f,
gpu_memory_allocated);
// Hint for future readers: if message_modulus == 4 then
// packed_messages_per_lwe becomes 2
auto packed_messages_per_lwe = log2_int(params.message_modulus);
// Adjust indexes to permute the output and access the correct LUT
auto h_indexes_in = static_cast<Torus *>(
malloc(packed_messages_per_lwe * num_lwes * sizeof(Torus)));
auto h_indexes_out = static_cast<Torus *>(
malloc(packed_messages_per_lwe * num_lwes * sizeof(Torus)));
auto h_lut_indexes = static_cast<Torus *>(
malloc(packed_messages_per_lwe * num_lwes * sizeof(Torus)));
auto h_body_id_per_compact_list =
static_cast<uint32_t *>(malloc(num_lwes * sizeof(uint32_t)));
auto h_lwe_compact_input_indexes =
static_cast<uint32_t *>(malloc(num_lwes * sizeof(uint32_t)));
d_body_id_per_compact_list =
static_cast<uint32_t *>(cuda_malloc_with_size_tracking_async(
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0],
size_tracker, allocate_gpu_memory));
d_lwe_compact_input_indexes =
static_cast<uint32_t *>(cuda_malloc_with_size_tracking_async(
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0],
size_tracker, allocate_gpu_memory));
auto compact_list_id = 0;
auto idx = 0;
auto count = 0;
for (int i = 0; i < num_lwes; i++) {
h_lwe_compact_input_indexes[i] = idx;
count++;
if (count == num_lwes_per_compact_list[compact_list_id]) {
compact_list_id++;
idx += casting_params.big_lwe_dimension + count;
count = 0;
/** In case the casting key casts from BIG to SMALL key we run a single KS
to expand using the casting key as ksk. Otherwise, in case the casting key
casts from SMALL to BIG key, we first keyswitch from SMALL to BIG using
the casting key as ksk, then we keyswitch from BIG to SMALL using the
computing ksk, and lastly we apply the PBS. The output is always on the
BIG key.
**/
auto params = casting_params;
if (casting_key_type == SMALL_TO_BIG) {
params = computing_params;
}
}
message_and_carry_extract_luts =
new int_radix_lut<Torus>(streams, gpu_indexes, gpu_count, params, 4,
2 * num_lwes, allocate_gpu_memory);
auto offset = 0;
for (int k = 0; k < num_compact_lists; k++) {
auto num_lwes_in_kth_compact_list = num_lwes_per_compact_list[k];
uint32_t body_count = 0;
for (int i = 0; i < num_lwes_in_kth_compact_list; i++) {
h_body_id_per_compact_list[i + offset] = body_count;
body_count++;
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 0),
message_and_carry_extract_luts->get_degree(0),
message_and_carry_extract_luts->get_max_degree(0),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, message_extract_lut_f);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 1),
message_and_carry_extract_luts->get_degree(1),
message_and_carry_extract_luts->get_max_degree(1),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, carry_extract_lut_f);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 2),
message_and_carry_extract_luts->get_degree(2),
message_and_carry_extract_luts->get_max_degree(2),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, message_extract_and_sanitize_bool_lut_f);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0],
message_and_carry_extract_luts->get_lut(0, 3),
message_and_carry_extract_luts->get_degree(3),
message_and_carry_extract_luts->get_max_degree(3),
params.glwe_dimension, params.polynomial_size, params.message_modulus,
params.carry_modulus, carry_extract_and_sanitize_bool_lut_f);
// Hint for future readers: if message_modulus == 4 then
// packed_messages_per_lwe becomes 2
auto packed_messages_per_lwe = log2_int(params.message_modulus);
// Adjust indexes to permute the output and access the correct LUT
auto h_indexes_in = static_cast<Torus *>(
malloc(packed_messages_per_lwe * num_lwes * sizeof(Torus)));
auto h_indexes_out = static_cast<Torus *>(
malloc(packed_messages_per_lwe * num_lwes * sizeof(Torus)));
auto h_lut_indexes = static_cast<Torus *>(
malloc(packed_messages_per_lwe * num_lwes * sizeof(Torus)));
auto h_body_id_per_compact_list =
static_cast<uint32_t *>(malloc(num_lwes * sizeof(uint32_t)));
auto h_lwe_compact_input_indexes =
static_cast<uint32_t *>(malloc(num_lwes * sizeof(uint32_t)));
d_body_id_per_compact_list = static_cast<uint32_t *>(cuda_malloc_async(
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0]));
d_lwe_compact_input_indexes = static_cast<uint32_t *>(cuda_malloc_async(
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0]));
auto compact_list_id = 0;
auto idx = 0;
auto count = 0;
for (int i = 0; i < num_lwes; i++) {
h_lwe_compact_input_indexes[i] = idx;
count++;
if (count == num_lwes_per_compact_list[compact_list_id]) {
compact_list_id++;
idx += casting_params.big_lwe_dimension + count;
count = 0;
}
}
offset += num_lwes_in_kth_compact_list;
}
offset = 0;
for (int k = 0; k < num_compact_lists; k++) {
auto num_lwes_in_kth_compact_list = num_lwes_per_compact_list[k];
for (int i = 0;
i < packed_messages_per_lwe * num_lwes_in_kth_compact_list; i++) {
Torus j = i % num_lwes_in_kth_compact_list;
h_indexes_in[i + packed_messages_per_lwe * offset] = j + offset;
h_indexes_out[i + packed_messages_per_lwe * offset] =
packed_messages_per_lwe * (j + offset) +
(i / num_lwes_in_kth_compact_list);
// If the input relates to a boolean, shift the LUT so the correct one
// with sanitization is used
h_lut_indexes[i + packed_messages_per_lwe * offset] =
(is_boolean_array[h_indexes_out[i +
packed_messages_per_lwe * offset]]
? packed_messages_per_lwe
: 0) +
i / num_lwes_in_kth_compact_list;
auto offset = 0;
for (int k = 0; k < num_compact_lists; k++) {
auto num_lwes_in_kth_compact_list = num_lwes_per_compact_list[k];
uint32_t body_count = 0;
for (int i = 0; i < num_lwes_in_kth_compact_list; i++) {
h_body_id_per_compact_list[i + offset] = body_count;
body_count++;
}
offset += num_lwes_in_kth_compact_list;
}
offset += num_lwes_in_kth_compact_list;
offset = 0;
for (int k = 0; k < num_compact_lists; k++) {
auto num_lwes_in_kth_compact_list = num_lwes_per_compact_list[k];
for (int i = 0;
i < packed_messages_per_lwe * num_lwes_in_kth_compact_list; i++) {
Torus j = i % num_lwes_in_kth_compact_list;
h_indexes_in[i + packed_messages_per_lwe * offset] = j + offset;
h_indexes_out[i + packed_messages_per_lwe * offset] =
packed_messages_per_lwe * (j + offset) +
(i / num_lwes_in_kth_compact_list);
// If the input relates to a boolean, shift the LUT so the correct one
// with sanitization is used
h_lut_indexes[i + packed_messages_per_lwe * offset] =
(is_boolean_array[h_indexes_out[i +
packed_messages_per_lwe * offset]]
? packed_messages_per_lwe
: 0) +
i / num_lwes_in_kth_compact_list;
}
offset += num_lwes_in_kth_compact_list;
}
message_and_carry_extract_luts->set_lwe_indexes(
streams[0], gpu_indexes[0], h_indexes_in, h_indexes_out);
auto lut_indexes = message_and_carry_extract_luts->get_lut_indexes(0, 0);
message_and_carry_extract_luts->broadcast_lut(streams, gpu_indexes, 0);
cuda_memcpy_async_to_gpu(
d_lwe_compact_input_indexes, h_lwe_compact_input_indexes,
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0]);
cuda_memcpy_async_to_gpu(lut_indexes, h_lut_indexes,
packed_messages_per_lwe * num_lwes *
sizeof(Torus),
streams[0], gpu_indexes[0]);
cuda_memcpy_async_to_gpu(
d_body_id_per_compact_list, h_body_id_per_compact_list,
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0]);
// The expanded LWEs will always be on the casting key format
tmp_expanded_lwes = (Torus *)cuda_malloc_async(
num_lwes * (casting_params.big_lwe_dimension + 1) * sizeof(Torus),
streams[0], gpu_indexes[0]);
tmp_ksed_small_to_big_expanded_lwes = (Torus *)cuda_malloc_async(
num_lwes * (casting_params.big_lwe_dimension + 1) * sizeof(Torus),
streams[0], gpu_indexes[0]);
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
free(h_indexes_in);
free(h_indexes_out);
free(h_lut_indexes);
free(h_body_id_per_compact_list);
free(h_lwe_compact_input_indexes);
}
message_and_carry_extract_luts->set_lwe_indexes(
streams[0], gpu_indexes[0], h_indexes_in, h_indexes_out);
auto lut_indexes = message_and_carry_extract_luts->get_lut_indexes(0, 0);
message_and_carry_extract_luts->broadcast_lut(streams, gpu_indexes, 0);
cuda_memcpy_with_size_tracking_async_to_gpu(
d_lwe_compact_input_indexes, h_lwe_compact_input_indexes,
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0],
allocate_gpu_memory);
cuda_memcpy_with_size_tracking_async_to_gpu(
lut_indexes, h_lut_indexes,
packed_messages_per_lwe * num_lwes * sizeof(Torus), streams[0],
gpu_indexes[0], allocate_gpu_memory);
cuda_memcpy_with_size_tracking_async_to_gpu(
d_body_id_per_compact_list, h_body_id_per_compact_list,
num_lwes * sizeof(uint32_t), streams[0], gpu_indexes[0],
allocate_gpu_memory);
// The expanded LWEs will always be on the casting key format
tmp_expanded_lwes = (Torus *)cuda_malloc_with_size_tracking_async(
num_lwes * (casting_params.big_lwe_dimension + 1) * sizeof(Torus),
streams[0], gpu_indexes[0], size_tracker, allocate_gpu_memory);
tmp_ksed_small_to_big_expanded_lwes =
(Torus *)cuda_malloc_with_size_tracking_async(
num_lwes * (casting_params.big_lwe_dimension + 1) * sizeof(Torus),
streams[0], gpu_indexes[0], size_tracker, allocate_gpu_memory);
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
free(h_indexes_in);
free(h_indexes_out);
free(h_lut_indexes);
free(h_body_id_per_compact_list);
free(h_lwe_compact_input_indexes);
}
void release(cudaStream_t const *streams, uint32_t const *gpu_indexes,
@@ -223,15 +215,11 @@ template <typename Torus> struct zk_expand_mem {
message_and_carry_extract_luts->release(streams, gpu_indexes, gpu_count);
delete message_and_carry_extract_luts;
cuda_drop_with_size_tracking_async(d_body_id_per_compact_list, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cuda_drop_with_size_tracking_async(d_lwe_compact_input_indexes, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cuda_drop_with_size_tracking_async(tmp_expanded_lwes, streams[0],
gpu_indexes[0], gpu_memory_allocated);
cuda_drop_with_size_tracking_async(tmp_ksed_small_to_big_expanded_lwes,
streams[0], gpu_indexes[0],
gpu_memory_allocated);
cuda_drop_async(d_body_id_per_compact_list, streams[0], gpu_indexes[0]);
cuda_drop_async(d_lwe_compact_input_indexes, streams[0], gpu_indexes[0]);
cuda_drop_async(tmp_expanded_lwes, streams[0], gpu_indexes[0]);
cuda_drop_async(tmp_ksed_small_to_big_expanded_lwes, streams[0],
gpu_indexes[0]);
}
};
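
The index bookkeeping in the `zk_expand_mem` constructor above (the loop that fills `h_indexes_in`, `h_indexes_out` and `h_lut_indexes`) can be checked on the host without a GPU. The following is a minimal sketch, not the backend's code: `ExpandIndexes` and `build_expand_indexes` are hypothetical names introduced here, `uint64_t` stands in for `Torus`, and `is_boolean` is assumed to hold one flag per output slot, i.e. `packed_messages_per_lwe * num_lwes` entries, as implied by the way the constructor indexes `is_boolean_array`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the three host-side index arrays.
struct ExpandIndexes {
  std::vector<uint64_t> indexes_in;  // which expanded LWE feeds each PBS
  std::vector<uint64_t> indexes_out; // where each PBS result is written
  std::vector<uint64_t> lut_indexes; // which of the 4 LUTs each PBS applies
};

// Host-only re-statement of the permutation loop, for illustration.
ExpandIndexes build_expand_indexes(
    const std::vector<uint32_t> &num_lwes_per_compact_list,
    const std::vector<bool> &is_boolean, uint32_t packed_messages_per_lwe) {
  uint32_t num_lwes = 0;
  for (uint32_t n : num_lwes_per_compact_list)
    num_lwes += n;

  const std::size_t total = std::size_t(packed_messages_per_lwe) * num_lwes;
  // Assumption of this sketch: one boolean flag per output slot.
  assert(is_boolean.size() == total);

  ExpandIndexes r;
  r.indexes_in.resize(total);
  r.indexes_out.resize(total);
  r.lut_indexes.resize(total);

  uint32_t offset = 0;
  for (uint32_t n : num_lwes_per_compact_list) {
    for (uint32_t i = 0; i < packed_messages_per_lwe * n; i++) {
      const uint64_t j = i % n;    // LWE position inside this compact list
      const uint64_t slot = i / n; // 0 = message part, 1 = carry part, ...
      const std::size_t pos =
          i + std::size_t(packed_messages_per_lwe) * offset;

      r.indexes_in[pos] = j + offset;
      // Interleave the outputs: message and carry of one LWE end up adjacent.
      r.indexes_out[pos] =
          uint64_t(packed_messages_per_lwe) * (j + offset) + slot;
      // Boolean outputs use the sanitizing LUTs, stored after the plain ones.
      r.lut_indexes[pos] =
          (is_boolean[r.indexes_out[pos]] ? packed_messages_per_lwe : 0) +
          slot;
    }
    offset += n;
  }
  return r;
}
```

For two compact lists holding 2 and 1 LWEs, `packed_messages_per_lwe == 2`, and only the last LWE flagged as boolean, the sketch yields `indexes_in = {0, 1, 0, 1, 2, 2}`, `indexes_out = {0, 2, 1, 3, 4, 5}` and `lut_indexes = {0, 0, 1, 1, 2, 3}`: the inputs are visited message-first within each list, while the outputs are interleaved per LWE.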


@@ -96,45 +96,3 @@ void cuda_improve_noise_modulus_switch_64(
static_cast<const uint64_t *>(encrypted_zeros), lwe_size, num_lwes,
num_zeros, input_variance, r_sigma, bound, log_modulus);
}
void cuda_glwe_sample_extract_128(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void const *glwe_array_in, uint32_t const *nth_array, uint32_t num_nths,
uint32_t lwe_per_glwe, uint32_t glwe_dimension, uint32_t polynomial_size) {
switch (polynomial_size) {
case 256:
host_sample_extract<__uint128_t, AmortizedDegree<256>>(
static_cast<cudaStream_t>(stream), gpu_index,
(__uint128_t *)lwe_array_out, (__uint128_t const *)glwe_array_in,
(uint32_t const *)nth_array, num_nths, lwe_per_glwe, glwe_dimension);
break;
case 512:
host_sample_extract<__uint128_t, AmortizedDegree<512>>(
static_cast<cudaStream_t>(stream), gpu_index,
(__uint128_t *)lwe_array_out, (__uint128_t const *)glwe_array_in,
(uint32_t const *)nth_array, num_nths, lwe_per_glwe, glwe_dimension);
break;
case 1024:
host_sample_extract<__uint128_t, AmortizedDegree<1024>>(
static_cast<cudaStream_t>(stream), gpu_index,
(__uint128_t *)lwe_array_out, (__uint128_t const *)glwe_array_in,
(uint32_t const *)nth_array, num_nths, lwe_per_glwe, glwe_dimension);
break;
case 2048:
host_sample_extract<__uint128_t, AmortizedDegree<2048>>(
static_cast<cudaStream_t>(stream), gpu_index,
(__uint128_t *)lwe_array_out, (__uint128_t const *)glwe_array_in,
(uint32_t const *)nth_array, num_nths, lwe_per_glwe, glwe_dimension);
break;
case 4096:
host_sample_extract<__uint128_t, AmortizedDegree<4096>>(
static_cast<cudaStream_t>(stream), gpu_index,
(__uint128_t *)lwe_array_out, (__uint128_t const *)glwe_array_in,
(uint32_t const *)nth_array, num_nths, lwe_per_glwe, glwe_dimension);
break;
default:
PANIC("Cuda error: unsupported polynomial size. Supported "
"N's are powers of two in the interval [256..4096].")
}
}


@@ -7,7 +7,6 @@
#include "gadget.cuh"
#include "helper_multi_gpu.h"
#include "keyswitch.cuh"
#include "linearalgebra/multiplication.cuh"
#include "polynomial/functions.cuh"
#include "polynomial/polynomial_math.cuh"
#include "torus.cuh"
@@ -18,12 +17,18 @@
#define CEIL_DIV(M, N) ((M) + (N)-1) / (N)
const int BLOCK_SIZE_GEMM = 64;
const int THREADS_GEMM = 8;
const int BLOCK_SIZE_DECOMP = 8;
template <typename Torus> uint64_t get_shared_mem_size_tgemm() {
return BLOCK_SIZE_GEMM * THREADS_GEMM * 2 * sizeof(Torus);
}
// Initialize decomposition by performing rounding
// and decomposing one level of an array of Torus LWEs. Only
// decomposes the mask elements of the incoming LWEs.
template <typename Torus>
template <typename Torus, typename TorusVec>
__global__ void decompose_vectorize_init(Torus const *lwe_in, Torus *lwe_out,
uint32_t lwe_dimension,
uint32_t num_lwe, uint32_t base_log,
@@ -51,14 +56,14 @@ __global__ void decompose_vectorize_init(Torus const *lwe_in, Torus *lwe_out,
Torus mod_b_mask = (1ll << base_log) - 1ll;
lwe_out[write_val_idx] = decompose_one<Torus>(state, mod_b_mask, base_log);
__syncthreads();
synchronize_threads_in_block();
lwe_out[write_state_idx] = state;
}
// Continue decomposition of an array of Torus elements in place. Supposes
// that the array contains already decomposed elements and
// computes the new decomposed level in place.
template <typename Torus>
template <typename Torus, typename TorusVec>
__global__ void
decompose_vectorize_step_inplace(Torus *buffer_in, uint32_t lwe_dimension,
uint32_t num_lwe, uint32_t base_log,
@@ -76,15 +81,115 @@ decompose_vectorize_step_inplace(Torus *buffer_in, uint32_t lwe_dimension,
auto state_idx = num_lwe * lwe_dimension + val_idx;
Torus state = buffer_in[state_idx];
__syncthreads();
synchronize_threads_in_block();
Torus mod_b_mask = (1ll << base_log) - 1ll;
buffer_in[val_idx] = decompose_one<Torus>(state, mod_b_mask, base_log);
__syncthreads();
synchronize_threads_in_block();
buffer_in[state_idx] = state;
}
// Multiply matrices A, B of size (M, K), (K, N) respectively
// with K as the inner dimension.
//
// A block of threads processes blocks of size (BLOCK_SIZE_GEMM,
// BLOCK_SIZE_GEMM) splitting them in multiple tiles: (BLOCK_SIZE_GEMM,
// THREADS_GEMM)-shaped tiles of values from A, and a (THREADS_GEMM,
// BLOCK_SIZE_GEMM)-shaped tiles of values from B.
//
// This code is adapted by generalizing the 1d block-tiling
// kernel from https://github.com/siboehm/SGEMM_CUDA
// to any matrix dimension
template <typename Torus, typename TorusVec>
__global__ void tgemm(int M, int N, int K, const Torus *A, const Torus *B,
int stride_B, Torus *C) {
const int BM = BLOCK_SIZE_GEMM;
const int BN = BLOCK_SIZE_GEMM;
const int BK = THREADS_GEMM;
const int TM = THREADS_GEMM;
const uint cRow = blockIdx.y;
const uint cCol = blockIdx.x;
const int threadCol = threadIdx.x % BN;
const int threadRow = threadIdx.x / BN;
// Allocate space for the current block tile in shared memory
__shared__ Torus As[BM * BK];
__shared__ Torus Bs[BK * BN];
// Initialize the pointers to the input blocks from A, B
// Tiles from these blocks are loaded to shared memory
A += cRow * BM * K;
B += cCol * BN;
// Each thread will handle multiple sub-blocks
const uint innerColA = threadIdx.x % BK;
const uint innerRowA = threadIdx.x / BK;
const uint innerColB = threadIdx.x % BN;
const uint innerRowB = threadIdx.x / BN;
// allocate a thread-local cache for results in the register file
Torus threadResults[TM] = {0};
auto row_A = cRow * BM + innerRowA;
auto col_B = cCol * BN + innerColB;
// For each thread, loop over block tiles
for (uint bkIdx = 0; bkIdx < K; bkIdx += BK) {
auto col_A = bkIdx + innerColA;
auto row_B = bkIdx + innerRowB;
if (row_A < M && col_A < K) {
As[innerRowA * BK + innerColA] = A[innerRowA * K + innerColA];
} else {
As[innerRowA * BK + innerColA] = 0;
}
if (col_B < N && row_B < K) {
Bs[innerRowB * BN + innerColB] = B[innerRowB * stride_B + innerColB];
} else {
Bs[innerRowB * BN + innerColB] = 0;
}
synchronize_threads_in_block();
// Advance blocktile for the next iteration of this loop
A += BK;
B += BK * stride_B;
// calculate per-thread results
for (uint dotIdx = 0; dotIdx < BK; ++dotIdx) {
// we make the dotproduct loop the outside loop, which facilitates
// reuse of the Bs entry, which we can cache in a tmp var.
Torus tmp = Bs[dotIdx * BN + threadCol];
for (uint resIdx = 0; resIdx < TM; ++resIdx) {
threadResults[resIdx] +=
As[(threadRow * TM + resIdx) * BK + dotIdx] * tmp;
}
}
synchronize_threads_in_block();
}
// Initialize the pointer to the output block of size (BLOCK_SIZE_GEMM,
// BLOCK_SIZE_GEMM)
C += cRow * BM * N + cCol * BN;
// write out the results
for (uint resIdx = 0; resIdx < TM; ++resIdx) {
int outRow = cRow * BM + threadRow * TM + resIdx;
int outCol = cCol * BN + threadCol;
if (outRow >= M)
continue;
if (outCol >= N)
continue;
C[(threadRow * TM + resIdx) * N + threadCol] += threadResults[resIdx];
}
}
// Finish the keyswitching operation and prepare GLWEs for accumulation.
// 1. Finish the keyswitching computation partially performed with a GEMM:
// - negate the dot product between the GLWE and KSK polynomial
@@ -146,8 +251,8 @@ __global__ void polynomial_accumulate_monic_monomial_mul_many_neg_and_add_C(
degree, coeffIdx, polynomial_size, 1, true);
}
template <typename Torus>
__host__ void host_packing_keyswitch_lwe_list_to_glwe(
template <typename Torus, typename TorusVec>
__host__ void host_fast_packing_keyswitch_lwe_list_to_glwe(
cudaStream_t stream, uint32_t gpu_index, Torus *glwe_out,
Torus const *lwe_array_in, Torus const *fp_ksk_array, int8_t *fp_ks_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
@@ -191,8 +296,10 @@ __host__ void host_packing_keyswitch_lwe_list_to_glwe(
dim3 threads_decomp(BLOCK_SIZE_DECOMP, BLOCK_SIZE_DECOMP);
// decompose first level
decompose_vectorize_init<Torus><<<grid_decomp, threads_decomp, 0, stream>>>(
lwe_array_in, d_mem_0, lwe_dimension, num_lwes, base_log, level_count);
decompose_vectorize_init<Torus, TorusVec>
<<<grid_decomp, threads_decomp, 0, stream>>>(lwe_array_in, d_mem_0,
lwe_dimension, num_lwes,
base_log, level_count);
check_cuda_error(cudaGetLastError());
// gemm to ks the individual LWEs to GLWEs
@@ -202,28 +309,24 @@ __host__ void host_packing_keyswitch_lwe_list_to_glwe(
auto stride_KSK_buffer = glwe_accumulator_size * level_count;
// Shared memory requirement is 8192 bytes for 64-bit Torus elements
uint32_t shared_mem_size = get_shared_mem_size_tgemm<Torus>();
if (shared_mem_size > 8192)
PANIC("GEMM kernel error: shared memory required might be too large");
tgemm<Torus><<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
tgemm<Torus, TorusVec><<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
num_lwes, glwe_accumulator_size, lwe_dimension, d_mem_0, fp_ksk_array,
stride_KSK_buffer, d_mem_1, glwe_accumulator_size);
stride_KSK_buffer, d_mem_1);
check_cuda_error(cudaGetLastError());
auto ksk_block_size = glwe_accumulator_size;
for (int li = 1; li < level_count; ++li) {
decompose_vectorize_step_inplace<Torus>
decompose_vectorize_step_inplace<Torus, TorusVec>
<<<grid_decomp, threads_decomp, 0, stream>>>(
d_mem_0, lwe_dimension, num_lwes, base_log, level_count);
check_cuda_error(cudaGetLastError());
tgemm<Torus><<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
num_lwes, glwe_accumulator_size, lwe_dimension, d_mem_0,
fp_ksk_array + li * ksk_block_size, stride_KSK_buffer, d_mem_1,
glwe_accumulator_size);
tgemm<Torus, TorusVec>
<<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
num_lwes, glwe_accumulator_size, lwe_dimension, d_mem_0,
fp_ksk_array + li * ksk_block_size, stride_KSK_buffer, d_mem_1);
check_cuda_error(cudaGetLastError());
}
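
The `tgemm` kernel in this file accumulates `C += A * B` tile by tile and zero-pads tiles that cross the matrix edge, which is what lets it handle dimensions that are not multiples of the tile size. A host-side reference of the same idea can help when testing it. The following is a minimal sketch under stated assumptions: `tiled_gemm_accumulate` and `TILE` are hypothetical names, `uint64_t` stands in for `Torus` (so products and sums wrap modulo 2^64), and the loops run sequentially on the CPU rather than reproducing the kernel's thread mapping.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile edge for this sketch. The kernel uses
// BLOCK_SIZE_GEMM = 64 and THREADS_GEMM = 8 instead.
constexpr int TILE = 4;

// CPU reference: C (M x N) += A (M x K) * B (K x N), all row-major.
// B rows are stride_B elements apart, mirroring the kernel's stride_B.
void tiled_gemm_accumulate(int M, int N, int K,
                           const std::vector<uint64_t> &A,
                           const std::vector<uint64_t> &B, int stride_B,
                           std::vector<uint64_t> &C) {
  for (int row0 = 0; row0 < M; row0 += TILE) {
    for (int col0 = 0; col0 < N; col0 += TILE) {
      for (int k0 = 0; k0 < K; k0 += TILE) {
        uint64_t As[TILE][TILE];
        uint64_t Bs[TILE][TILE];

        // Load one tile of A and one of B, writing zeros wherever the tile
        // sticks out of the matrix. Zeros do not change the dot products.
        for (int r = 0; r < TILE; ++r) {
          for (int c = 0; c < TILE; ++c) {
            const int a_row = row0 + r, a_col = k0 + c;
            As[r][c] = (a_row < M && a_col < K)
                           ? A[std::size_t(a_row) * K + a_col]
                           : 0;
            const int b_row = k0 + r, b_col = col0 + c;
            Bs[r][c] = (b_row < K && b_col < N)
                           ? B[std::size_t(b_row) * stride_B + b_col]
                           : 0;
          }
        }

        // Accumulate this tile's partial products into C, skipping
        // output positions outside the matrix.
        for (int r = 0; r < TILE; ++r) {
          for (int c = 0; c < TILE; ++c) {
            const int row = row0 + r, col = col0 + c;
            if (row >= M || col >= N)
              continue;
            uint64_t acc = 0;
            for (int d = 0; d < TILE; ++d)
              acc += As[r][d] * Bs[d][c]; // wraps modulo 2^64
            C[std::size_t(row) * N + col] += acc;
          }
        }
      }
    }
  }
}
```

Because the function adds into `C` rather than overwriting it, calling it once per decomposition level with a shifted `B` pointer would sum the contributions of all levels, which is how the host code above chains its `tgemm` launches on `d_mem_1`.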

Some files were not shown because too many files have changed in this diff