Compare commits

...

376 Commits

Author SHA1 Message Date
Guillermo Oyarzun
00ecb8306a feat(gpu): implement sub array search on GPU 2024-08-13 18:36:07 +02:00
Ben
2000feb87e chore(CI): update LE commit 2024-08-13 14:56:27 +01:00
tmontaigu
594a5cee25 fix(integer): remove double carry prop in sub
The subtraction is done via addition of the negation;
the negation is done via unchecked_neg, which makes the
first block have a carry.
We then called add_assign_with_carry_parallelized, which did
a carry propagation on the rhs (here, the negated value),
meaning the subtraction would do two carry propagations.

To fix that, we directly call the lower-level function.
2024-08-13 14:45:57 +02:00
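
A plain-Rust analogy of the limb arithmetic described in this commit (illustrative only, not TFHE-rs internals): negation leaves a carry in the first block, and a single propagation pass over the final sum is enough.

    // Blocks are little-endian digits modulo `msg_mod`, with carry space.
    fn negate_blocks(blocks: &[u8], msg_mod: u8) -> Vec<u8> {
        // Complement each block, then add 1 to the first one: this +1 is
        // why the first block can already hold a carry before the addition.
        let mut out: Vec<u8> = blocks.iter().map(|b| msg_mod - 1 - b).collect();
        out[0] += 1;
        out
    }

    fn propagate_carries(blocks: &mut [u8], msg_mod: u8) {
        let mut carry = 0;
        for b in blocks.iter_mut() {
            let v = *b + carry;
            *b = v % msg_mod;
            carry = v / msg_mod;
        }
    }

    fn sub(lhs: &[u8], rhs: &[u8], msg_mod: u8) -> Vec<u8> {
        let mut res: Vec<u8> = lhs
            .iter()
            .zip(negate_blocks(rhs, msg_mod))
            .map(|(a, b)| a + b)
            .collect();
        // One propagation over the sum suffices; propagating over the
        // negation first and then over the sum would do the work twice.
        propagate_carries(&mut res, msg_mod);
        res
    }

    fn main() {
        // 9 - 5 = 4 with 2-bit blocks, little-endian: 9 = [1, 2], 5 = [1, 1].
        assert_eq!(sub(&[1, 2], &[1, 1], 4), vec![0, 1]);
    }
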
Nicolas Sarlin
401cfc5fd0 feat(hl): add scalar bitslice operation 2024-08-13 10:07:36 +02:00
Nicolas Sarlin
769c725c67 feat(integer): Adds bitslice operation 2024-08-13 10:07:36 +02:00
dependabot[bot]
07d143e032 chore(deps): bump tj-actions/changed-files from 44.5.6 to 44.5.7
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44.5.6 to 44.5.7.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](6b2903bdce...c65cd88342)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-12 18:16:17 +02:00
dependabot[bot]
d88bba761b chore(deps): bump actions/upload-artifact from 4.3.4 to 4.3.6
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.4 to 4.3.6.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](0b2256b8c0...834a144ee9)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-12 15:06:40 +02:00
dependabot[bot]
eaa1d07f90 chore(deps): bump dtolnay/rust-toolchain
Bumps [dtolnay/rust-toolchain](https://github.com/dtolnay/rust-toolchain) from 21dc36fb71dd22e3317045c0c31a3f4249868b17 to 7b1c307e0dcbda6122208f10795a713336a9b35a.
- [Release notes](https://github.com/dtolnay/rust-toolchain/releases)
- [Commits](21dc36fb71...7b1c307e0d)

---
updated-dependencies:
- dependency-name: dtolnay/rust-toolchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-12 15:06:32 +02:00
Agnes Leroy
663322cfa5 chore(gpu): remove omp from div 2024-08-09 17:49:20 +02:00
Agnes Leroy
ddd6a6e136 chore(gpu): remove omp from signed overflow add_sub and scalar comparisons 2024-08-09 17:49:20 +02:00
Agnes Leroy
abc39f0a3e chore(gpu): remove omp loop from scalar_shift 2024-08-09 17:49:20 +02:00
Agnes Leroy
8b7556667b chore(gpu): remove omp in cmux 2024-08-09 17:49:20 +02:00
Guillermo Oyarzun
67b1607773 feat(gpu): implement ilog2, trailing and leading zeros and ones on GPU 2024-08-09 13:56:00 +02:00
Agnes Leroy
5340859003 chore(ci): transfer all GPU CI to hyperstack 2024-08-07 17:08:13 +02:00
Agnes Leroy
a26e68c3bc chore(gpu): remove some host decoration and duplicated def 2024-08-06 21:01:24 +02:00
Agnes Leroy
0dd622ebb9 chore(gpu): refactor tree_add_chunks 2024-08-06 14:31:19 +02:00
Agnes Leroy
d69dd20079 chore(gpu): define higher values for the sm size based on compute capability 2024-08-06 14:06:38 +02:00
Nicolas Sarlin
80fe45f354 test(versionable): test Versionize with various rust types 2024-08-05 18:21:07 +02:00
Nicolas Sarlin
33114e3946 feat(versionable): impl Versionize for Wrapping<T> 2024-08-05 18:21:07 +02:00
Nicolas Sarlin
ede0745b7f feat(versionable): Add support for statically sized arrays 2024-08-05 18:21:07 +02:00
Guillermo Oyarzun
bc4cd08e7a refactor(gpu): Specify launch bounds on kernels 2024-08-05 17:56:42 +02:00
Nicolas Sarlin
b03921f1ae chore(doc): ignore data repo in check_md_docs_are_tested 2024-08-05 16:01:39 +02:00
Agnes Leroy
70f7af06f5 refactor(gpu): configure GPU parameters automatically to multi-bit 2024-08-05 15:02:18 +02:00
Agnes Leroy
a9bb6eac5f fix(gpu): fix argument in scratch mul 2024-08-02 16:58:48 +02:00
Agnes Leroy
4fa9b243e0 fix(gpu): fix multi-gpu error in division 2024-08-02 15:36:43 +02:00
Agnes Leroy
b88f561358 fix(gpu): fix full prop with 1 radix block 2024-08-02 13:06:12 +02:00
Mayeul@Zama
0e71ca6c1c fix(hlapi): fix Client/Server Key versioning 2024-08-02 11:32:39 +02:00
Pedro Alves
3ba61c0694 refactor(gpu): fix sample extraction when nth > 0 and keep input unchanged 2024-08-02 11:10:04 +02:00
Nicolas Sarlin
781f78c442 feat(versionable): impl Versionize for Box<[T]> and ABox<[T]> 2024-08-02 10:53:39 +02:00
Nicolas Sarlin
ebfc1ea8ac feat(versionable): impl Versionize for HashSet/HashMap 2024-08-02 10:53:39 +02:00
Agnes Leroy
7fa9f33776 refactor(gpu): remove lwe chunk size argument 2024-08-02 09:12:00 +02:00
Kelong Cong
5547d92c79 refactor(gpu): remove max_shared_memory from pbs arguments
Always use max shared memory from device 0 to configure the
kernels, to avoid bugs with multi-GPU configurations
2024-08-01 11:18:52 +02:00
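
A minimal sketch of the rule adopted here (hypothetical device type; the real code queries the CUDA runtime, and at least one device is assumed): one shared-memory budget, read once from device 0, configures the kernels on every GPU.

    // Hypothetical device model; the cuda backend asks the runtime instead.
    struct Device {
        max_shared_memory: usize,
    }

    // Configure every kernel from device 0's limit rather than per device,
    // so multi-GPU machines get one consistent setting.
    fn kernel_shared_memory(devices: &[Device]) -> usize {
        devices[0].max_shared_memory
    }

    fn main() {
        let gpus = [
            Device { max_shared_memory: 48 * 1024 },
            Device { max_shared_memory: 64 * 1024 },
        ];
        assert_eq!(kernel_shared_memory(&gpus), 48 * 1024);
    }
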
Kelong Cong
351fc476b5 chore(versionable): add Send and Sync marker traits to Err type 2024-07-31 14:43:18 +02:00
Agnes Leroy
53cd3c8d0f chore(gpu): do not reset shared memory size for tree_add_chunks 2024-07-31 14:38:38 +02:00
Agnes Leroy
0a2ad8ca72 chore(gpu): remove remaining par_iter over gpu_indexes
Rename some variables to try and make the code clearer
2024-07-31 08:50:16 +02:00
Agnes Leroy
eba4f6a89c chore(gpu): use PCIe H100 for multi-gpu bench 2024-07-31 08:50:07 +02:00
Agnes Leroy
4b933cf421 chore(gpu): split hyperstack tests 2024-07-31 08:50:07 +02:00
Agnes Leroy
3303cd8568 chore(gpu): refactor template and clean arguments for the PBS 2024-07-31 08:49:39 +02:00
tmontaigu
f937524f64 feat(integer): improve carry propagation algorithm 2024-07-26 17:38:35 +02:00
Agnes Leroy
e7da96271c fix(gpu): fix scalar rotate and add some checks 2024-07-26 17:03:15 +02:00
Agnes Leroy
0cc716544b fix(gpu): fix scalar shifts 2024-07-26 17:03:15 +02:00
Arthur Meyre
f53087b5ed test(integer): add a case which was previously crashing 2024-07-26 16:41:43 +02:00
Arthur Meyre
bcefe977c9 chore(shortint): add more granular 2_2 TUniform parameters 2024-07-26 16:41:43 +02:00
Arthur Meyre
73ea24fd51 refactor(shortint): refactor the shortint keyswitching code
- this better manages all the cases we encounter; we force a refresh PBS in
all cases for now, which is less optimal in certain cases but keeps us
safe in cases where keyswitches might be chained
2024-07-26 16:41:43 +02:00
Agnes Leroy
6f1a9bdaa5 chore(gpu): simplify 4090 bench workflow 2024-07-26 14:17:40 +02:00
Agnes Leroy
7834f699d0 chore(gpu): add checks in hillis&steele to avoid wrong memory access 2024-07-26 13:56:32 +02:00
Beka Barbakadze
b81692b2df chore(gpu): add comments inside host_integer_sum_ciphertexts_vec_kb function 2024-07-26 13:55:23 +02:00
Mayeul@Zama
8748d1cc22 chore(hlapi): remove Wop 2024-07-26 12:03:13 +02:00
Mayeul@Zama
dbb13aa35e chore(trivium): remove Wop usage 2024-07-26 12:03:13 +02:00
Mayeul@Zama
53f4c9bfc7 feat(integer): add reverse_bits 2024-07-26 12:03:13 +02:00
Agnes Leroy
4021812248 fix(gpu): fix memory error in mul 2024-07-26 10:50:32 +02:00
Nicolas Sarlin
190c5e7bb7 fix(ci): auto merge job used wrong variable 2024-07-26 10:22:18 +02:00
Arthur Meyre
2004333d6e chore(tfhe): bump version to 0.8.0.alpha.2 2024-07-25 18:47:15 +02:00
Arthur Meyre
e7c06ef956 feat(integer): add raw parts API for the KeySwitchingKeyMaterial 2024-07-25 18:47:15 +02:00
Arthur Meyre
7b14fe6fee feat(shortint): add raw parts API for the KeySwitchingKeyMaterial 2024-07-25 18:47:15 +02:00
Arthur Meyre
55f4df97b4 chore(core): have the CiphertextModulusKind enum in the prelude
- makes working with CiphertextModulus and the kind method easier
2024-07-25 18:47:15 +02:00
Nicolas Sarlin
2144ec8107 chore(ci): automatically merge pr in the data repo 2024-07-25 16:49:05 +02:00
Nicolas Sarlin
fb862ddbbc chore(ci): use specific workflow for data compatibility tests 2024-07-25 16:49:05 +02:00
Nicolas Sarlin
ab0b01f7e1 chore(hl): add data tests for heterogeneous lists 2024-07-25 16:49:05 +02:00
Arthur Meyre
6c4318b8bb chore(ci): auto data branch 2024-07-25 16:49:05 +02:00
Agnes Leroy
d3f2ecd367 chore(gpu): add nvidia-smi call in all hyperstack workflows 2024-07-25 15:10:23 +02:00
Pedro Alves
19dc0f02f9 refactor(gpu): refactor sample extract and modulus switch to match CPU's version 2024-07-25 11:51:07 +02:00
Nicolas Sarlin
95d50368fa doc(integer): fix typo in shl doc 2024-07-25 11:43:47 +02:00
Mayeul@Zama
c117798b10 chore(integer): add compression benches 2024-07-25 11:41:02 +02:00
Mayeul@Zama
da0934d4bc refactor(integer): compression uses ClientKey instead of RadixClientKey 2024-07-25 11:41:02 +02:00
Agnes Leroy
b522de3273 fix(gpu): fix add with 1 block 2024-07-25 11:39:45 +02:00
Agnes Leroy
9205703454 chore(gpu): fix hardware name in multi-gpu workflows 2024-07-25 11:34:58 +02:00
Arthur Meyre
a1b92a6db8 chore(tfhe)!: remove dependency on the dynamic buffer lib
- this was required in a semver trick setting and is not needed anymore

BREAKING CHANGE:
the way to build the C API has changed and no longer requires the dynamic
buffer lib
2024-07-24 17:30:46 +02:00
Arthur Meyre
8d7c45bf17 chore(ci): remove semver-trick era version for TFHE_SPEC 2024-07-24 17:30:46 +02:00
Arthur Meyre
91f05b00b9 refactor(core): make GGSW encryption consistent
- functions take un-encoded values, reflect that by taking Cleartext
instead of Plaintext
2024-07-24 13:39:40 +02:00
Arthur Meyre
ebb11b15c4 chore(docs): add links to CompressedServerKey in several places
- a ServerKey can be fairly large, and users may want to send the key over
the network, so give indications about the CompressedServerKey
2024-07-23 19:18:53 +02:00
Arthur Meyre
18270714d8 chore(bench): record the size of the proof as well
- this is not perfect: one size is serialized, so compression can happen,
while the other is an in-memory size
2024-07-23 15:59:06 +02:00
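
An illustration of the caveat with a plain vector instead of a proof (the serializer is a stand-in, not the one used by the benches): the in-memory and serialized sizes of the same object only coincide when the wire format neither compresses nor adds framing.

    fn main() {
        let data: Vec<u64> = (0..1024).collect();
        // In-memory size of the payload (ignoring Vec's own bookkeeping).
        let in_memory = data.len() * std::mem::size_of::<u64>();
        // Stand-in serializer: raw little-endian bytes, no compression.
        let serialized: Vec<u8> = data.iter().flat_map(|v| v.to_le_bytes()).collect();
        // Equal here only because nothing is compressed; a compressing
        // format would make the serialized figure smaller.
        assert_eq!(in_memory, serialized.len());
    }
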
Arthur Meyre
6c6525b1ea chore: add the ability to get the in memory size of a proof in proven lists 2024-07-23 15:59:06 +02:00
Arthur Meyre
79f8971712 chore(ci): properly manage all events for our benchmarks 2024-07-23 15:59:00 +02:00
Arthur Meyre
17db09bf2a chore(ci): do not run schedule benchmarks not on our repo 2024-07-23 15:59:00 +02:00
Arthur Meyre
fc9bfcaf61 chore(ci): do not run CPU integer tests if not on our repo 2024-07-23 15:59:00 +02:00
Arthur Meyre
d93c412dc5 chore(ci): only run CUDA integer tests on schedule on our repo 2024-07-23 15:59:00 +02:00
Arthur Meyre
ea222007d8 chore(ci): run 4090 bench on schedule only on our repository 2024-07-23 15:59:00 +02:00
Titouan Tanguy
3470d6c2d8 chore: bump version to alpha.1 2024-07-23 10:00:03 +02:00
Titouan Tanguy
fffdc3862e feat(hlapi): Get num_bits from FheUint* types 2024-07-23 10:00:03 +02:00
Agnes Leroy
d9eca01631 fix(gpu): dispatch/gather inputs and outputs to the ks and pbs on all GPUs 2024-07-23 08:48:48 +02:00
Beka Barbakadze
95ef13f6ce feat(gpu): Add signed_overflowing_scalar_add and signed_overflowing_scalar_sub 2024-07-22 16:45:47 +02:00
Beka Barbakadze
230fa5a8f0 feat(gpu): implement signed_overflowing_sub 2024-07-22 09:31:54 +02:00
Agnes Leroy
b443855b8b fix(gpu): add missing delete 2024-07-22 09:30:48 +02:00
dependabot[bot]
ba80c33328 chore(deps): bump actions/download-artifact from 4.1.7 to 4.1.8
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.1.7 to 4.1.8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](65a9edc588...fa0a91b85d)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-22 09:11:34 +02:00
dependabot[bot]
e5dc45c084 chore(deps): bump tj-actions/changed-files from 44.5.5 to 44.5.6
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44.5.5 to 44.5.6.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](cc733854b1...6b2903bdce)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-22 09:11:22 +02:00
dependabot[bot]
b450f0eb30 chore(deps): bump actions/upload-artifact from 4.3.3 to 4.3.4
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4.3.3...0b2256b8c012f0828dc542b3febcab082c67f72b)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-22 09:11:10 +02:00
Arthur Meyre
7479cc826b chore(bench): increase wasm bench timeout 2024-07-19 14:00:20 +02:00
Arthur Meyre
b2beac2d2c chore(ci): do not trigger notification if bench did not run
- only trigger on failure if benchmarks ran
2024-07-19 13:55:53 +02:00
Arthur Meyre
b700416597 chore(bench): measure proof size for zk benchmarks
Also clean key sizes measurements since they are now done in
shortint_key_sizes.rs

Co-authored-by: David Testé <david.teste@zama.ai>
2024-07-19 13:41:29 +02:00
David Testé
42609987a1 chore(examples): measure more shortint key sizes 2024-07-19 13:41:29 +02:00
David Testé
5b37a838ba chore(shortint): add constructor for compressed key switching key 2024-07-19 13:41:29 +02:00
Guillermo Oyarzun
c1fcd95d72 refactor(gpu): add restrict keyword 2024-07-19 13:08:39 +02:00
Arthur Meyre
ffb8b4f930 chore(ci): fix nvm version usage for web parallel tests 2024-07-19 10:27:05 +02:00
Nicolas Sarlin
3b8dace975 chore(backward): allow custom data repo branch for tests 2024-07-19 09:42:03 +02:00
Agnes Leroy
44f326824f chore(gpu): remove stream callbacks 2024-07-19 09:05:24 +02:00
Nicolas Sarlin
f41d133fc7 fix(hl): wrong Named impl for CompressedCiphertextList 2024-07-18 15:23:55 +02:00
David Testé
52d43961b8 chore(ci): add should-run capability for gpu workflows 2024-07-18 15:21:38 +02:00
Arthur Meyre
35b89704aa chore(ci): do not block integer tests due to concurrency on push to main 2024-07-18 14:28:43 +02:00
Arthur Meyre
b578cf19c2 chore(ci): fix lockfile management 2024-07-18 13:03:53 +02:00
Arthur Meyre
dd68ce67ad chore(ci): pin node version, 22.5 is affected by a bug 2024-07-18 11:23:04 +02:00
David Testé
f8d8cc90fe chore(ci): adapt benchmarks workflows to use slab-github-runner 2024-07-18 10:18:34 +02:00
aquint-zama
eac37a7749 chore: add SLSA for tfhe crate 2024-07-17 18:15:41 +02:00
aquint-zama
4342efecc8 chore: add SLSA provenance for NPM artifacts
# Conflicts:
#	.github/workflows/make_release.yml
2024-07-17 18:15:41 +02:00
Nicolas Sarlin
a3ec84729d feat(hl): add serialize/versionize for hl KSK 2024-07-17 17:49:08 +02:00
Arthur Meyre
90d6b221d7 chore(tfhe): bump version to pre release 2024-07-17 16:52:52 +02:00
Arthur Meyre
b1491734b2 chore(cuda): bump version to pre-release 2024-07-17 16:52:52 +02:00
Arthur Meyre
436dd6a687 chore(zk): bump version to pre-release 2024-07-17 16:52:52 +02:00
Agnes Leroy
39534cb4c4 chore(gpu): avoid broadcasting lut twice for bitops 2024-07-17 16:34:52 +02:00
Agnes Leroy
723443589d chore(gpu): fix some int_radix_luts numbers of blocks 2024-07-17 16:34:52 +02:00
David Testé
d58a1b68cb chore(ci): update slab-github-runner action 2024-07-17 16:16:52 +02:00
Guillermo Oyarzun
b29c477462 feat(gpu): Add missing asserts 2024-07-17 15:26:06 +02:00
Agnes Leroy
bed3d88426 chore(gpu): remove unnecessary templates 2024-07-17 15:16:10 +02:00
Nicolas Sarlin
35201b06b6 chore(versionable): prepare release 0.2.0 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
c8ddc0f008 chore(versionable)!: Impl std::error::Error for UnversionizeError
BREAKING CHANGE: The `Upgrade` trait now requires to specify the Error type as
an associated type (similar to `TryFrom`)
2024-07-17 13:44:30 +02:00
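
A sketch of the new trait shape, modeled on the commit message rather than copied from tfhe-versionable (names and bounds are assumptions): like TryFrom, an upgrade now names its own error type.

    use std::convert::Infallible;

    // Assumed shape; see tfhe-versionable for the real definition.
    trait Upgrade<Target> {
        type Error: std::error::Error;

        fn upgrade(self) -> Result<Target, Self::Error>;
    }

    // Hypothetical versioned types for illustration.
    struct KeyV0 { modulus: u64 }
    struct KeyV1 { modulus: u64, seed: u64 }

    impl Upgrade<KeyV1> for KeyV0 {
        // This upgrade cannot fail; fallible ones pick a real error type.
        type Error = Infallible;

        fn upgrade(self) -> Result<KeyV1, Self::Error> {
            Ok(KeyV1 { modulus: self.modulus, seed: 0 })
        }
    }

    fn main() {
        let v1 = (KeyV0 { modulus: 64 }).upgrade().unwrap();
        assert_eq!(v1.seed, 0);
    }
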
Nicolas Sarlin
4d934f512a chore(backward): run custom tfhe-rs lints in the ci 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
52b0907c47 feat(all): versionize missing types 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
8ea647dc26 feat(versionable): impl Versionize for Arc 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
8f72677fa6 chore(backward): add exceptions to missing versioning lint 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
36a58cf16c chore(backward): add custom lint to detect missing Versionize implem 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
de79f3a280 feat(versionable): support more tuples 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
e9051419cd refactor(versionable)!: fix signature of versionize_owned
BREAKING CHANGE: `versionize_owned` now takes its argument by value.
2024-07-17 13:44:30 +02:00
Nicolas Sarlin
ac37c3883d chore(ci): allow '!' for breaking changes in commit messages 2024-07-17 13:44:30 +02:00
Nicolas Sarlin
72fb770308 chore(versionable): add automatically_derived attribute
For the generated code
2024-07-17 13:44:30 +02:00
Nicolas Sarlin
34d07f5558 chore(backward): use shallow clone for backward compat tests 2024-07-17 13:44:30 +02:00
Agnes Leroy
4176b3dcb5 chore(gpu): clean overflowing sub memory 2024-07-17 13:35:22 +02:00
Agnes Leroy
abf9c3efb7 fix(gpu): add missing delete 2024-07-17 09:21:05 +02:00
David Testé
ebf1fd9e84 chore(ci): fix test filtering for gpu multi-bit parameters set 2024-07-16 18:06:42 +02:00
Arthur Meyre
cef055b7f3 chore(ci): fix the no internal test patch
- condition was not precise enough and we were still running tests on push
to internal main
2024-07-16 17:48:37 +02:00
Beka Barbakadze
d65a4d8690 feat(gpu): implement signed_overflowing_add 2024-07-15 15:29:40 +02:00
Arthur Meyre
928bc13ed2 chore(bench): create modules to avoid headaches with features in utils 2024-07-15 13:54:47 +02:00
Arthur Meyre
c81abae989 chore(ci): do not send notification if H100 tests are skipped 2024-07-15 13:54:27 +02:00
Arthur Meyre
aff50fcb85 chore(ci): do not run integer tests on push if not on our repo 2024-07-15 13:54:27 +02:00
Agnes Leroy
757606fdb4 chore(gpu): pin bsk host memory 2024-07-15 11:03:21 +02:00
Agnes Leroy
7542c89679 chore(gpu): create lut once for all layers in sum_ct_vec 2024-07-15 11:03:06 +02:00
Agnes Leroy
dd74063959 refactor(gpu): make it possible to reuse memory in sum_ct_vec 2024-07-15 11:03:06 +02:00
dependabot[bot]
f6845a988b chore(deps): update zama-ai/slab-github-runner requirement to 9e939a10db25c698cddf0da0f4f015bd47bb6838
Updates the requirements on [zama-ai/slab-github-runner](https://github.com/zama-ai/slab-github-runner) to permit the latest version.
- [Commits](9e939a10db)

---
updated-dependencies:
- dependency-name: zama-ai/slab-github-runner
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-15 10:54:05 +02:00
Agnes Leroy
6a3ff21de2 chore(gpu): fix unsigned integer tests 2024-07-15 09:58:33 +02:00
David Testé
74cafd0e9d chore(bench): benchmark oprf function against all available precisions 2024-07-12 19:14:38 +02:00
David Testé
d8241942a6 chore(ci): run fast default ops for multi-bit gpu benchmarks 2024-07-12 18:03:13 +02:00
David Testé
46f0bf442a chore(bench): add target with deduplicated default integer ops
Some operations share the same underlying implementations (e.g.
max/min, left_shift/right_shift). Benchmarking these operations
can be considered duplicates by developers looking for fast
feedback on their changes.
Scalar operations are not included in this subset.
2024-07-12 18:03:13 +02:00
David Testé
81c837c837 chore(ci): check for files changes to run fast aws tests 2024-07-12 16:53:03 +02:00
David Testé
7b96f55900 chore(ci): run only fast integer tests in pull-request
A nightly run has been added to run tests against the message_3_carry_3* parameter sets.
2024-07-12 12:21:26 +02:00
David Testé
f19e892053 chore(ci): add nightly integer tests filter
Nightly integer tests will run only on the message_3_carry_3_* parameter sets.
2024-07-12 12:21:26 +02:00
David Testé
2a989d64f9 chore(ci): run integer fast tests only on message_2_carry_2 parameters 2024-07-12 12:21:26 +02:00
David Testé
eeb4accf66 chore(ci): do not run multi-bit tests in fast aws test workflow 2024-07-12 12:21:26 +02:00
Agnes Leroy
0370bf6a3f chore(gpu): reduce integer bench time 2024-07-11 17:37:35 +02:00
Arthur Meyre
a62c19b735 chore(tfhe): bump version to 0.8.0 and CUDA backend to 0.4.0 2024-07-11 13:00:27 +02:00
Arthur Meyre
721a5a57ba refactor(integer): improve CompressedPublicKey encryption performance 2024-07-11 12:58:37 +02:00
Arthur Meyre
3f101d5e8b refactor(tfhe): update native_crt encryption primitives to use new types 2024-07-11 12:58:37 +02:00
David Testé
e01f4abb65 chore(ci): update slab-github-runner to latest version 2024-07-11 12:16:07 +02:00
David Testé
a2ca189283 chore(ci): check labels to launch only on approval
Any label other than "approved" would trigger these
workflows, which is not desired.
2024-07-11 08:48:08 +02:00
David Testé
e0e9668b0b chore(ci): use large ubuntu runner to get more disk space 2024-07-10 16:44:42 +02:00
David Testé
bd23d18c9d chore(ci): do not cancel integer tests on main branch 2024-07-10 09:29:26 +02:00
Agnes Leroy
491112ffc1 refactor(gpu): start releasing memory before cleanup in scalar mul 2024-07-10 08:54:57 +02:00
David Testé
83c3dadb5d chore(ci): upgrade node packages to latest versions 2024-07-09 17:38:50 +02:00
David Testé
7692643ca4 chore(ci): upgrade node version to 22 2024-07-09 17:38:50 +02:00
David Testé
29cf2b83b8 chore(ci): lock version of wasm-pack to 0.13.0
It also fixes the import in the generated file; otherwise, usage of
wasm-pack would result in a broken build.
2024-07-09 17:38:50 +02:00
Arthur Meyre
1b47c74360 chore(doc): remove some leftover whitespace 2024-07-09 17:11:12 +02:00
David Testé
cd329729d7 chore(ci): force test-threads value on gpu integer tests 2024-07-09 17:08:15 +02:00
David Testé
8ec24d1bb7 chore(ci): optimize triggering of gpu workflows 2024-07-09 17:08:15 +02:00
Mayeul@Zama
13f61e4d67 chore(ci): add rustdoc clippy to clippy_all and clippy_fast target
- allows running the rustdoc check in pcc
2024-07-09 14:08:34 +02:00
Mayeul@Zama
72475a385e chore(all): add makefile command to clippy lint doctests
- The command exits with a warning on Windows as it does not work at the
moment
2024-07-09 14:08:34 +02:00
Mayeul@Zama
cc8f2cb4dc chore(all): fix doctests clippy lints 2024-07-09 14:08:34 +02:00
David Testé
a153ea98ae chore(bench): fix filtering for unsigned integer cuda benchmarks 2024-07-09 10:10:51 +02:00
David Testé
60773497fe chore(bench): add benchmarks for integer oprf 2024-07-09 09:48:50 +02:00
David Testé
d632c916c2 chore(ci): lock version of wasm-pack to fix tfhe-rs build 2024-07-08 18:25:08 +02:00
Arthur Meyre
6e4ea82db8 chore(cuda): disable build if cbindgen is running to side step build bug
- also improves performance; as long as the cuda backend has no macro usage
we are good to keep this trick
2024-07-08 14:25:03 +02:00
Arthur Meyre
a7df399de3 chore(ci): update CMake version to 3.29.6 in CI 2024-07-08 14:25:03 +02:00
Arthur Meyre
90dc9a004e chore(ci): use $(MAKE) in CMake to manage jobserver auth properly 2024-07-08 14:25:03 +02:00
Mayeul@Zama
a4508f8396 chore(core): fix __profiling flag 2024-07-08 14:25:03 +02:00
Mayeul@Zama
c8e1998167 chore(hlapi): fix clippy lints 2024-07-08 14:25:03 +02:00
Mayeul@Zama
85d3ba6238 fix(wasm): return error instead of unwrap 2024-07-08 14:25:03 +02:00
Mayeul@Zama
e9772953bf chore(core): fix tarpaulin flag 2024-07-08 14:25:03 +02:00
Mayeul@Zama
c407f3d5a6 chore(all): fix clippy lints 2024-07-08 14:25:03 +02:00
Mayeul@Zama
5f0bff98dd chore(all): fix clippy::doc_lazy_continuation 2024-07-08 14:25:03 +02:00
Mayeul@Zama
634b7ada32 chore(all): update nightly toolchain 2024-07-08 14:25:03 +02:00
David Testé
734edb3bdc chore(ci): run slack notification on ubuntu for hyperstack tests 2024-07-08 13:46:03 +02:00
David Testé
ee181506c4 chore(bench): fix naming pattern for cuda overflowing scalar add 2024-07-08 13:46:03 +02:00
yuxizama
cf1576efbd chore(doc): add the GPU video tutorial to doc 2024-07-08 09:47:49 +02:00
dependabot[bot]
d215359a75 chore(deps): bump actions/upload-artifact from 4.3.3 to 4.3.4
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](65462800fd...0b2256b8c0)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-08 09:28:54 +02:00
dependabot[bot]
1b5d5eeb94 chore(deps): bump actions/checkout from 4.1.6 to 4.1.7
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.6 to 4.1.7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.6...692973e3d937129bcbf40652eb9f2f61becf3332)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-08 09:28:42 +02:00
dependabot[bot]
bbaaa53656 chore(deps): update dtolnay/rust-toolchain requirement to 21dc36fb71dd22e3317045c0c31a3f4249868b17
Updates the requirements on [dtolnay/rust-toolchain](https://github.com/dtolnay/rust-toolchain) to permit the latest version.
- [Release notes](https://github.com/dtolnay/rust-toolchain/releases)
- [Commits](21dc36fb71)

---
updated-dependencies:
- dependency-name: dtolnay/rust-toolchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-08 09:28:35 +02:00
dependabot[bot]
88ad88e71c chore(deps): bump tj-actions/changed-files from 44.5.3 to 44.5.5
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44.5.3 to 44.5.5.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](eaf854ef0c...cc733854b1)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-08 09:28:25 +02:00
David Testé
f338df0079 chore(ci): fix conditional expression for integer workflows 2024-06-28 17:57:32 +02:00
Mads Marquart
1e0ed88767 chore(concrete-csprng): bump version to 0.4.1 2024-06-28 17:43:20 +02:00
Mads Marquart
97ebccbb5b fix(concrete-csprng): handle multi-valued target_family 2024-06-28 17:43:20 +02:00
sarah el kazdadi
2dc5c8a891 feat(perf): optimize some custom mod ops 2024-06-28 16:48:43 +02:00
David Testé
22e9505380 chore(ci): reduce ci duration by not running 4_4 parameters set
This only applies to CI triggered in pull requests. A nightly run is added that runs the 4-bit message/4-bit carry parameter sets.
2024-06-28 14:59:08 +02:00
Agnes Leroy
7c80f295f7 fix(gpu): fix to_boolean_block doc test 2024-06-28 10:30:17 +02:00
David Testé
a34ddd7b54 chore(ci): split cuda-pcc and cuda-tests jobs to parallelize execution 2024-06-28 10:30:17 +02:00
Agnes Leroy
3deff5fbfd chore(gpu): reduce core crypto and cuda backend test time 2024-06-28 10:30:17 +02:00
David Testé
05a0327874 chore(ci): split integer tests for gpu
This is done to parallelize integer tests on the GPU backend and thus
reduce iteration duration.
2024-06-28 10:30:17 +02:00
David Testé
879699c072 chore(ci): filter integer and shortint tests using python script
Backend support for GPU has been added to integer tests.
2024-06-28 10:30:17 +02:00
Agnes Leroy
e8b3617926 chore(ci): fix benchmarks 2024-06-27 18:59:11 +02:00
tmontaigu
d9701d99d3 refactor(integer): move sum functions into their own module 2024-06-27 18:44:50 +02:00
tmontaigu
a339025b48 refactor(shortint): move scalar_div_mod to its own module 2024-06-27 18:44:17 +02:00
Arthur Meyre
0e1a2ea7f6 chore(doc): update compression example to use default 2_2 params 2024-06-27 17:08:57 +02:00
Arthur Meyre
a44be90a44 chore(shortint): add gaussian compression parameters 2024-06-27 17:08:57 +02:00
J-B Orfila
f026fa5076 doc: update pfail README 2024-06-27 17:08:39 +02:00
Agnes Leroy
b06beabfa2 chore(gpu): run only unsigned bench in multi-bit GPU workflows 2024-06-27 16:56:31 +02:00
Agnes Leroy
773adcc26f chore(gpu): use wrapping byte add, update rust msrv 2024-06-27 15:20:33 +02:00
David Testé
ee1c90403c chore(bench): fix naming pattern for zk-pok benchmarks 2024-06-27 14:21:43 +02:00
Arthur Meyre
b9cedfec7f chore(ci): ignore some directories when checking if docs is tested 2024-06-27 11:23:29 +02:00
Arthur Meyre
3992aa7f15 chore(core): change creation metadata to struct with fields vs tuple struct
- prevents potential mistakes (like for the pseudo GGSW, where there are an
input and an output GLWE size)
2024-06-27 11:23:29 +02:00
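
A minimal example of the motivation (hypothetical names, not the core_crypto types): with a tuple struct, two fields of the same type can be swapped silently at the construction site; named fields make the intent, and any swap, visible.

    #[derive(Clone, Copy, Debug, PartialEq)]
    struct GlweSize(usize);

    // Before (tuple struct): which field is the input and which the output?
    struct MetadataTuple(GlweSize, GlweSize);

    // After (struct with named fields): a swap is visible at the call site.
    struct Metadata {
        input_glwe_size: GlweSize,
        output_glwe_size: GlweSize,
    }

    fn main() {
        let meta = Metadata {
            input_glwe_size: GlweSize(2),
            output_glwe_size: GlweSize(3),
        };
        assert_eq!(meta.input_glwe_size, GlweSize(2));
        // MetadataTuple(GlweSize(3), GlweSize(2)) would compile just as
        // happily with the two sizes accidentally swapped.
        let swapped = MetadataTuple(GlweSize(3), GlweSize(2));
        assert_eq!(swapped.0, GlweSize(3));
    }
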
David Testé
2b002f81ec chore(ci): run full multi-bit gpu benchmarks on demand 2024-06-27 10:18:49 +02:00
Arthur Meyre
2b695a9563 chore(zk): bump version to 0.2.1 for perf patch release 2024-06-27 10:13:44 +02:00
Arthur Meyre
fd72858c4d chore(bench): also bench the verification alone without the unpack/KS time 2024-06-27 10:13:10 +02:00
Agnes Leroy
3a2bb4470f fix(gpu): fix gpu index in casts, scalar comparison, scalar mul, etc. 2024-06-27 10:08:11 +02:00
Beka Barbakadze
6120fab886 feat(gpu): Implement propagate_single_carry_get_input_carries 2024-06-26 17:34:28 +02:00
Agnes Leroy
53b68619b0 chore(gpu): call nvidia-smi before launching tests on hyperstack 2024-06-26 16:47:29 +02:00
Guillermo Oyarzun
e854823233 refactor(gpu): speedup twiddles reads 2024-06-26 11:30:05 +02:00
sarah el kazdadi
19e00c484b feat(zk): zk perf improvements 2024-06-26 11:24:11 +02:00
David Testé
818e480dac chore(ci): publish only one tag for npm packages
NPM doesn't accept tags that are similar to a semantic-version
compatible string (e.g. 0.7.0 or v0.7). We only publish the "latest"
tag at the release manager's discretion.
2024-06-26 09:06:26 +02:00
David Testé
a7fc8a90e1 chore(ci): run build workflow on large windows instance 2024-06-25 18:17:26 +02:00
David Testé
3fad6d194c chore(ci): avoid cancel ongoing benchmarks on main branch 2024-06-25 17:46:24 +02:00
David Testé
23efcb8dd4 chore(bench): fix benchmark naming format for shortint 2024-06-25 17:46:07 +02:00
David Testé
33c69d9d1f chore(ci): update slab-github-runner action 2024-06-25 12:00:12 +02:00
David Testé
960d287e92 chore(bench): fix display name for gpu unsigned integer operations 2024-06-25 11:59:08 +02:00
Nicolas Sarlin
662e5402a3 chore(doc): add missing doc for a data breaking change 2024-06-24 16:09:26 +02:00
dependabot[bot]
bb7bdee25a chore(deps): bump actions/checkout from 4.1.6 to 4.1.7
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.6 to 4.1.7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.6...692973e3d937129bcbf40652eb9f2f61becf3332)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-24 11:18:24 +02:00
dependabot[bot]
3503d5b484 chore(deps): update dtolnay/rust-toolchain requirement to 21dc36fb71dd22e3317045c0c31a3f4249868b17
Updates the requirements on [dtolnay/rust-toolchain](https://github.com/dtolnay/rust-toolchain) to permit the latest version.
- [Release notes](https://github.com/dtolnay/rust-toolchain/releases)
- [Commits](21dc36fb71)

---
updated-dependencies:
- dependency-name: dtolnay/rust-toolchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-24 10:09:52 +02:00
dependabot[bot]
0390f1ce56 chore(deps): bump tj-actions/changed-files from 44.5.2 to 44.5.3
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44.5.2 to 44.5.3.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](d6babd6899...eaf854ef0c)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-24 10:09:39 +02:00
tmontaigu
d290935de1 feat(hlapi): bind match_value/match_value_or 2024-06-24 10:08:49 +02:00
tmontaigu
39dffcf742 fix(integer): match_value returns radix with correct number of blocks
match_value should return an output radix whose number of blocks
is based on the maximum possible output value given in the
MatchValues, instead of always returning the same number of
blocks as the input
2024-06-24 10:08:49 +02:00
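
A plain-integer sketch of the block-count rule the fix implements (not TFHE-rs code): the output radix only needs enough blocks to hold the largest value the match can produce.

    fn blocks_needed(max_output: u64, bits_per_block: u32) -> u32 {
        // Bits needed to represent the largest possible output...
        let bits = 64 - max_output.leading_zeros();
        // ...rounded up to whole blocks, with at least one block.
        bits.div_ceil(bits_per_block).max(1)
    }

    fn main() {
        // A match whose outputs are all <= 3 fits in one 2-bit block,
        // regardless of how many blocks the input ciphertext has.
        assert_eq!(blocks_needed(3, 2), 1);
        assert_eq!(blocks_needed(255, 2), 4);
    }
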
Arthur Meyre
89bb5756cc chore(tfhe): update multi bit parameters 2024-06-24 10:08:12 +02:00
J-B Orfila
501907498f doc: compression updates 2024-06-24 10:07:54 +02:00
Nicolas Sarlin
d712c0fcd0 doc(backward): document breaking changes 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
66bee500a1 doc(backward): add doc for versioning/backward compat 2024-06-24 10:07:14 +02:00
Arthur Meyre
0687d12459 feat(integer): add the ability to reinterpret the data of a compact list
- this can be useful to load old lists which did not have any type info
2024-06-24 10:07:14 +02:00
Arthur Meyre
ecfe6e9a09 chore(doc): fix zk doc links 2024-06-24 10:07:14 +02:00
Arthur Meyre
1366c33034 chore(ci): fix clippy lints 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
c2a57f15ab chore(all): upgrades data types 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
cb679fbbcb chore(hl): upgrade IntegerClientKey version 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
08ddabb3be chore(hl): add backward compatibilty tests for Hl types 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
892a6ae276 feat(core): versionize missing types in core_crypto 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
c37bac2438 chore(hl): restore deprecated CompactList for backward compat 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
e42733fb67 feat(c_api): add versioning serialization functions in C 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
a45f3d7435 feat(all): add safe_serialize_versioned 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
95a08ef0c2 chore(hl): versionize CompactCiphertextList 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
1da159f0f0 test(hl): Add backward compat tests for HL ct and client key 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
7c76ce2cfb feat(hl): versionize booleans 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
3f499c85b3 feat(hl): versionize the Config 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
d736aa170e feat(hl): versionize the FheUint/FheInt and their subtypes 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
9aa4e0f0b5 feat(hl): versionize {Client,Server,Public}Key and their subtypes 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
7cf4f0219f feat(versionable): impl Versionize for tuples 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
97c10df6c2 chore(versionable): ignore struct_field_names clippy lint 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
5b530152fe feat(versionable): Add versionize support for aligned-vec types 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
49ffeba87c feat(versionable): Add support for Vec of custom types 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
679d76e7a6 feat(versionable): Add support for additional bounds for Versionize 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
7613ef2ba9 feat(versionable): Add versionize support for Box<T> 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
6a9e959edf feat(versionable): Add versionize support for num_complex::Complex 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
034da67ff2 chore(backward): moved backward_compat files inside tfhe-rs modules 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
aac136909a chore(ci): uniformize ci Makefile targets names 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
54cf162db5 chore(ci): run backward compatibility tests in ci 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
ac211cf71f test(tfhe): add tests for types backward compatibility 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
a02896b9bc feat(shortint): versionize the shortint client key and its subtypes 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
4a4ad23cee feat(shortint): add versioning to ciphertext 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
fbf38a82ad refactor(core): add conversion between modulus and serializable modulus 2024-06-24 10:07:14 +02:00
Nicolas Sarlin
444ebbde57 feat(vers): add crate for types versioning/backward compatibility 2024-06-24 10:07:14 +02:00
Arthur Meyre
c227bf4a49 chore(doc): add doc for dedicated CPK parameters and Cast for faster ZK 2024-06-21 21:13:04 +02:00
Arthur Meyre
c1916b82ca fix(shortint): fix performance bug for compact list expand
- casting was done sequentially while it could be done in parallel
2024-06-21 21:13:04 +02:00
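
An illustration of this class of fix, assuming rayon (already a TFHE-rs dependency) as the parallelism layer; the cast function is a stand-in for the per-ciphertext keyswitch:

    use rayon::prelude::*;

    // Stand-in for the per-element cast work.
    fn cast(block: &u64) -> u64 {
        block.wrapping_mul(0x9e37_79b9_7f4a_7c15)
    }

    fn main() {
        let blocks: Vec<u64> = (0..1_000).collect();
        // Before: blocks.iter().map(cast).collect() casted one element at
        // a time. After: the independent casts run across all cores.
        let _out: Vec<u64> = blocks.par_iter().map(cast).collect();
    }
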
Agnes Leroy
4b8a3a15e8 doc(gpu): add multi-gpu doc 2024-06-21 17:49:51 +02:00
Mayeul@Zama
cd33712b43 chore(all): reword "in little memory" 2024-06-21 16:32:12 +02:00
Mayeul@Zama
e20114e8e2 doc(hlapi): add compression documentation
Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com>
2024-06-21 16:32:12 +02:00
Mayeul@Zama
1f7ef064fb feat(hlapi): add C-API for CompressedCiphertextList 2024-06-21 16:32:12 +02:00
Mayeul@Zama
f16458147b feat(hlapi): add CompressedCiphertextList 2024-06-21 16:32:12 +02:00
Mayeul@Zama
82b4b63d0e feat(hlapi): add compression keys to IntegerKeys 2024-06-21 16:32:12 +02:00
Mayeul@Zama
34616ae4f7 feat(integer): add CompressedCiphertextList 2024-06-21 16:32:12 +02:00
Mayeul@Zama
26aa4e3a61 refactor(integer): Expandable::from_expanded_blocks takes ownership to avoid clone 2024-06-21 16:32:12 +02:00
Mayeul@Zama
872d51f5f0 refactor(integer): move Expandable to utils module 2024-06-21 16:32:12 +02:00
Mayeul@Zama
69fb7aa7ae chore(shortint): add glwe packing benches 2024-06-21 16:32:12 +02:00
Mayeul@Zama
ea73ec0832 feat(shortint): add glwe_packing 2024-06-21 16:32:12 +02:00
Mayeul@Zama
d62e365bdc feat(core): add compressed_modulus_switched_glwe_ciphertext 2024-06-21 16:32:12 +02:00
Mayeul@Zama
1579fb249a feat(core): add glwe conformance 2024-06-21 16:32:12 +02:00
Mayeul@Zama
0624a5c5e2 refactor(shortint): apply_programmable_bootstrap takes a &GlweCt 2024-06-21 16:32:12 +02:00
Mayeul@Zama
ec4350edb4 refactor(shortint): isolate functions 2024-06-21 16:32:12 +02:00
Arthur Meyre
904cd00076 chore(tfhe): mark zk-pok as non experimental 2024-06-21 09:19:42 +02:00
sarah el kazdadi
44c64210ca feat(zk): add randomness to hash functions 2024-06-21 07:11:16 +02:00
Agnes Leroy
9cd7aeccf5 chore(gpu): bump cuda backend version 2024-06-20 16:20:27 +02:00
Arthur Meyre
987d68942d chore(ci): update npm packages 2024-06-20 15:35:56 +02:00
David Testé
2ff3b75ef7 chore(ci): update gpu ec2 ami with rustup snap package removed
Rustup was installed using snap and could clash in the workflow,
notably on cargo-clippy calls.
2024-06-20 14:42:02 +02:00
Arthur Meyre
9242b2a725 feat(high_level_api): add casting primitives for compact public key 2024-06-20 13:24:27 +02:00
Arthur Meyre
bd674fe5bc feat(tfhe): add CPK casting abilities in shortint and integer 2024-06-20 13:24:27 +02:00
Guillermo Oyarzun
4e5b9986b6 chore(gpu): Add bitnot operation without using pbs 2024-06-20 10:52:58 +02:00
tmontaigu
1e535c83a6 fix(shortint): count PBS in trivial_many_lut
We forgot to increase the number of PBSes when doing
a many_lut PBS on a trivial input.

This fixes that, and changes the many_lut test to
check the result on trivial ciphertexts and also check
that the PBS count is correct.
2024-06-20 10:44:57 +02:00
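
A toy model of the bookkeeping this commit restores (illustrative names, not shortint internals): the trivial fast path skips the bootstrap but must still bump the counter, or tests comparing PBS counts drift.

    use std::sync::atomic::{AtomicUsize, Ordering};

    // Toy counter standing in for shortint's PBS bookkeeping.
    static PBS_COUNT: AtomicUsize = AtomicUsize::new(0);

    fn trivial_many_lut(input: u64, luts: &[fn(u64) -> u64]) -> Vec<u64> {
        // Trivial inputs are evaluated in the clear, but the operation must
        // still be accounted as one many-lut PBS: this was the missing part.
        PBS_COUNT.fetch_add(1, Ordering::Relaxed);
        luts.iter().map(|f| f(input)).collect()
    }

    fn main() {
        let luts: [fn(u64) -> u64; 2] = [|x| x + 1, |x| x * 2];
        assert_eq!(trivial_many_lut(3, &luts), vec![4, 6]);
        assert_eq!(PBS_COUNT.load(Ordering::Relaxed), 1);
    }
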
Ben
915eafac15 chore(ci): fix non-float print 2024-06-20 08:35:37 +02:00
David Testé
1c760a31e2 chore(ci): change threat model in lattice estimator
Set the hard threshold to 128 bits of security and add a soft
threshold of 132 bits. This new threshold matches the security
level advertised for current cryptographic parameters in shortint.
2024-06-20 08:35:37 +02:00
David Testé
369d6df350 chore(ci): gather more parameters pk for curve security checks 2024-06-20 08:35:37 +02:00
David Testé
bbd12b8a30 chore(tfhe): update tuniform parameters and remove unused ones 2024-06-20 08:35:37 +02:00
sarah el kazdadi
deebe09a8c feat(zk): improve performance of zk pke proofs 2024-06-19 16:49:50 +02:00
David Testé
dcd8224a7e chore(doc): remove old cpu benchmarks arrays 2024-06-19 14:00:27 +02:00
Arthur Meyre
2f7ad4cdcd chore(ci): add server.PID from WASM tests to .gitignore 2024-06-19 10:51:03 +02:00
David Testé
4c8d791a2d chore(bench): measure object sizes in zk_pke benchmarks 2024-06-19 10:51:03 +02:00
David Testé
c60cb88367 chore(ci): add workflow for pke-zk benchmarks 2024-06-19 10:51:03 +02:00
David Testé
f53c0df449 chore(bench): write zk_pke benchmarks results to json file 2024-06-19 10:51:03 +02:00
David Testé
6be983db34 chore(bench): refactor core_crypto benchmarks to use TUniform 2024-06-19 10:51:03 +02:00
Arthur Meyre
8caa0f780e chore(bench): add ZK bench for integer 2024-06-19 10:51:03 +02:00
Arthur Meyre
75e2be2ca2 chore(bench): update zk wasm benchmarks
- add a parameter set for wasm to benchmark relevant ZK timings
- update benchmarking code to be more flexible
2024-06-19 10:51:03 +02:00
Arthur Meyre
cd40176a56 feat(zk): speed up CRS gen by parallelizing exponentiations 2024-06-19 10:51:03 +02:00
Arthur Meyre
65737e83db refactor(HL): disallow unpacked ZK proofs in the HL API and WASM API
- ZK timings being bad, we make the decision to always pack for ZKs
2024-06-19 10:51:03 +02:00
David Testé
e4643c7919 chore(doc): update benchmarks timings 2024-06-19 09:06:40 +02:00
tmontaigu
baa3075f19 feat(tfhe): add FheUint512, FheUint1024, FheUint2048 2024-06-18 10:06:28 +02:00
tmontaigu
9cc97f9ab5 feat(zk): impl CanonicalSerialize/Deserialize
This is to allow specifying whether data should be compressed,
as compression and validation add a very significant overhead,
especially in wasm, where deserialization goes from 6 min to 450 ms
2024-06-18 09:11:58 +02:00
David Testé
2bd9f7aab4 chore(shortint): remove compact pk t-uniform parameters set
Add new TUniform under classic/ that is not compact public key.
2024-06-17 16:33:32 +02:00
David Testé
833d52c1f1 chore(boolean): update parameters to security level of 132 bits 2024-06-17 16:33:32 +02:00
Agnes Leroy
4f2de51012 chore(gpu): add missing scalar rotate bench and update div bench 2024-06-17 15:50:14 +02:00
Agnes Leroy
134bec8f78 chore(gpu): fix multi-gpu bench workflow 2024-06-17 15:50:14 +02:00
Agnes Leroy
f2713a12c7 chore(gpu): fix if_then_else benchmark name, add bitnot benchmark 2024-06-17 15:50:14 +02:00
Mayeul@Zama
503fad69d2 chore(all): update SERIALIZATION_VERSION 2024-06-17 15:36:39 +02:00
Arthur Meyre
30ccb34ef9 chore(ci): manage the memory issues we are seeing
- shortint: reduce test threads because of large keys
- integer: clear the in-memory cache to avoid keeping two copies of keys per
process
2024-06-17 13:00:22 +02:00
Agnes Leroy
2ff64ccba0 chore(gpu): add apply bivariate lut as entry point on the rust side 2024-06-17 11:30:44 +02:00
dependabot[bot]
aeed5b70f3 chore(deps): bump dtolnay/rust-toolchain
Bumps [dtolnay/rust-toolchain](https://github.com/dtolnay/rust-toolchain) from d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a to 21dc36fb71dd22e3317045c0c31a3f4249868b17.
- [Release notes](https://github.com/dtolnay/rust-toolchain/releases)
- [Commits](d8352f6b1d...21dc36fb71)

---
updated-dependencies:
- dependency-name: dtolnay/rust-toolchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-17 09:18:06 +02:00
dependabot[bot]
8f707611a0 chore(deps): bump codecov/codecov-action from 4.4.1 to 4.5.0
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4.4.1 to 4.5.0.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](125fc84a9a...e28ff129e5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-17 09:17:51 +02:00
dependabot[bot]
2d0671cdd8 chore(deps): bump actions/checkout from 4.1.5 to 4.1.7
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.5 to 4.1.7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.5...692973e3d937129bcbf40652eb9f2f61becf3332)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-17 09:17:33 +02:00
David Testé
f9307754ef chore(ci): reduce number of keys generated in keycache
The keys/ folder contained lots of keys unused by tests and
thus was eating around 20 GB of disk space.
2024-06-14 19:38:58 +02:00
Beka Barbakadze
3af990b044 feat(gpu): Implement unsigned_overflowing_scalar_add for cuda backend 2024-06-14 15:41:24 +02:00
Arthur Meyre
0d8b1c6509 chore(zk): bump version to 0.2.0 2024-06-14 14:19:30 +02:00
Guillermo Oyarzun
c35cb4998d chore(gpu): update gpu parameters 2024-06-14 11:27:47 +02:00
Agnes Leroy
e825277219 chore(ci): reduce the number of cpu threads used in tests on big instances 2024-06-13 21:22:29 +02:00
Agnes Leroy
71112231b9 feat(gpu): unsigned scalar div 2024-06-13 21:22:29 +02:00
Agnes Leroy
b78c719816 chore(gpu): add benchmark workflow for multi-bit multi-GPU 2024-06-13 17:38:46 +02:00
David Testé
7152f9c5c9 chore(ci): update slab-github-runner action in recent workflows 2024-06-13 17:38:46 +02:00
Agnes Leroy
d3a6b4a7d8 chore(gpu): add p3.8xlarge hourly cost 2024-06-13 13:01:59 +02:00
Pedro Alves
f49684bdac feat(gpu): replicate luts and lut indexes to all available GPUs 2024-06-13 13:01:59 +02:00
Arthur Meyre
cf5fd87efb feat(core): add variable Scalar type to PBS for input and output 2024-06-13 09:08:35 +02:00
David Testé
179fbfc9bb chore(shortint): update default parameters
The default parameters now offer a security level of 132
bits and use a p-fail of 2**-64.
2024-06-12 17:22:24 +02:00
Arthur Meyre
ddf236ecbb chore(shortint): remove MaxNoiseLevel check in from_raw_parts
- MaxNoiseLevel could have been optimized in a particular way, not the one
coded by the from function here
2024-06-12 08:59:03 +02:00
Arthur Meyre
e3fdb961b6 chore(core): remove a lost TODO 2024-06-12 08:59:03 +02:00
Agnes Leroy
2185bcf80e chore(gpu): refactor signed overflow sub test to use FnExecutor 2024-06-12 08:44:48 +02:00
Agnes Leroy
418409231b chore(gpu): refactor signed overflowing add tests to use a FnExecutor 2024-06-12 08:44:48 +02:00
Arthur Meyre
ce27c7c44a refactor(tfhe): create associated CompactPrivateKey and prepare casting
- for casting from the CompactPublicKey parameter we need to add the
notion of a kind on the CompactCiphertextList, where one kind will need to
be cast thanks to an auxiliary keyswitching key and the other kind can just
be expanded as before
- to avoid weird situations/corner cases we remove the ability to encrypt a
"normal" ciphertext from a CompactPublicKey (which consisted in expanding
right after encryption)
2024-06-11 19:23:44 +02:00
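
A toy model of the "kind" introduced here (illustrative enum, not the TFHE-rs type): the kind records which expansion path a compact list needs.

    enum CompactListKind {
        // Encrypted under the dedicated compact-public-key parameters:
        // expansion must cast via the auxiliary keyswitching key.
        CastOnExpand,
        // Encrypted under the computation parameters: expands as before.
        ExpandOnly,
    }

    fn needs_casting_key(kind: &CompactListKind) -> bool {
        matches!(kind, CompactListKind::CastOnExpand)
    }

    fn main() {
        assert!(needs_casting_key(&CompactListKind::CastOnExpand));
        assert!(!needs_casting_key(&CompactListKind::ExpandOnly));
    }
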
dependabot[bot]
ccb6f98b09 chore(deps-dev): bump braces in /tfhe/web_wasm_parallel_tests
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-11 18:11:02 +02:00
Arthur Meyre
6014968655 chore(continuous-integration): change commit regex to allow hyphen in scope 2024-06-11 14:43:04 +02:00
Arthur Meyre
6687695d19 chore(gpu): removed unused dependency 2024-06-11 12:09:46 +02:00
Agnes Leroy
c7a0493715 chore(gpu): fix warnings in pcc_gpu 2024-06-11 11:22:33 +02:00
Arthur Meyre
24aeac7843 feat(core): add keyswitch that changes the scalar type from input to output 2024-06-10 18:19:38 +02:00
Arthur Meyre
21a749541a fix(integer): fix ZK packing chunk not being full 2024-06-10 18:19:11 +02:00
Arthur Meyre
b3b8f3273a fix(test): there was a typo in a feature name not picked up by clippy
- fixed the test according to the code that was merged
2024-06-10 18:19:11 +02:00
Agnes Leroy
f2b4ebb863 chore(gpu): use different streams in if_then_else 2024-06-10 17:33:35 +02:00
Agnes Leroy
919a40077c fix(gpu): use all gpus in omp loops 2024-06-10 16:05:36 +02:00
David Testé
ac6c90d13f chore(bench): fix naming pattern on if_then_else cuda benchmark 2024-06-10 15:36:04 +02:00
Agnes Leroy
b8991229ec feat(gpu): make PBS and ks execution parallel over available GPUs
Only GPUs with peer access to GPU 0 can be used for this at the moment.
Peer-to-peer copy is used if different GPUs are passed to memcpy_gpu_to_gpu.
A GPU offset is passed as a new parameter to pbs and keyswitch to adjust the input/output index used per GPU.
bsk and ksk are copied to all GPUs.
The CI now tests & runs benchmarks on p3.8xlarge aws instances.
2024-06-10 15:05:42 +02:00
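
A plain-threads sketch of the dispatch/gather pattern (CPU threads stand in for GPUs; none of this is the cuda backend's code, and at least one device is assumed): inputs are chunked per device, each device holds replicated keys, and the chunk offset plays the role of the per-GPU input/output index offset.

    fn run_on_all_gpus(inputs: &[u64], gpu_count: usize) -> Vec<u64> {
        let chunk = inputs.len().div_ceil(gpu_count).max(1);
        let mut outputs = vec![0u64; inputs.len()];
        std::thread::scope(|scope| {
            for (gpu_index, (in_chunk, out_chunk)) in
                inputs.chunks(chunk).zip(outputs.chunks_mut(chunk)).enumerate()
            {
                scope.spawn(move || {
                    // Each device would hold replicated bsk/ksk and work on
                    // its slice; `gpu_index * chunk` is the global offset.
                    let _ = gpu_index;
                    for (out, inp) in out_chunk.iter_mut().zip(in_chunk) {
                        *out = inp.wrapping_add(1); // stand-in for ks + pbs
                    }
                });
            }
        });
        outputs
    }

    fn main() {
        assert_eq!(run_on_all_gpus(&[1, 2, 3, 4], 2), vec![2, 3, 4, 5]);
    }
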
David Testé
5f0ca54150 chore(bench): add benchmarks for pbs-ntt64 2024-06-10 09:35:36 +02:00
dependabot[bot]
dddf85fb2c chore(deps): bump tj-actions/changed-files from 44.5.1 to 44.5.2
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44.5.1 to 44.5.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/changed-files/compare/v44.5.1...d6babd6899969df1a11d14c368283ea4436bca78)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 08:58:43 +02:00
dependabot[bot]
d000f8ddf7 chore(deps): bump actions/checkout from 4.1.4 to 4.1.6
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.4 to 4.1.6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.4...a5ac7e51b41094c92402da3b24376905380afc29)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 08:58:33 +02:00
Agnes Leroy
70b643a1db fix(gpu): fix cuda bench warnings 2024-06-07 13:33:37 +02:00
tmontaigu
3f9c1b0ca6 refactor(tfhe): Allow CompactCiphertextList to store heterogeneous types
This refactors the integer's CompactCiphertextList to allow storing
unsigned, signed (without necessarily the same number of blocks) and
booleans in a single compact list.

This is better as it's more flexible and allows for better compression
by not forcing the use of a list per data type. This is especially
interesting with zero-knowledge proofs as they are expensive to compute.

This also adds the ability to pack integer blocks by using the carry
space, but makes the expansion require a ServerKey to split blocks
via PBS.

BREAKING CHANGE: expand method from CompactCiphertextList returns a
                 CiphertextExpander
BREAKING CHANGE: Removes 'typed' CompactList and Compact types from the hlapi
                 (e.g. CompactFheUintList/CompactFheUintX)
2024-06-06 17:26:13 +02:00
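
A toy model of the heterogeneity this enables (plain enum, not the TFHE-rs API): a single list mixes unsigned, signed, and boolean entries, each remembering how many blocks it expands back into.

    // Illustrative only: real entries hold ciphertext blocks, not counts.
    enum CompactEntry {
        Unsigned { num_blocks: usize },
        Signed { num_blocks: usize },
        Boolean,
    }

    fn main() {
        // Before, each data type needed its own typed list; now a single
        // compact list (and a single ZK proof over it) can cover all three.
        let list = vec![
            CompactEntry::Unsigned { num_blocks: 16 }, // e.g. a 32-bit uint
            CompactEntry::Signed { num_blocks: 4 },    // e.g. an 8-bit int
            CompactEntry::Boolean,
        ];
        assert_eq!(list.len(), 3);
    }
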
David Testé
301537a81b chore(bench): add pbs128 to benchmarks suite 2024-06-06 10:23:25 +02:00
Beka Barbakadze
76338de99f feat(gpu): add overflowing_add in cuda_backend 2024-06-05 15:45:16 +04:00
Guillermo Oyarzun
019efb7fef chore(gpu): parallelize keyswitch further 2024-06-05 11:23:53 +02:00
David Testé
772a70d838 chore(ci): remove need for docker on hyperstack instance
By using a GitHub-hosted runner for the Slack notification action,
we remove the need to install Docker on the test instance.
2024-06-05 09:27:06 +02:00
David Testé
f024e8abae chore(ci): improve action skipping for internal repository 2024-06-05 09:27:06 +02:00
David Testé
31685387ea chore(ci): remove unused slab commands
All the deleted commands now have their workflows using the slab action
to spawn/teardown instances.
2024-06-04 09:38:56 +02:00
David Testé
4db77e236f chore(ci): refactor code coverage workflow to use slab action 2024-06-04 09:38:56 +02:00
David Testé
bc02216470 chore(ci): refactor wasm benchmarks workflow to use slab action 2024-06-04 09:38:56 +02:00
Agnes Leroy
228afe80e7 chore(gpu): change the number of threads in blocks_rotate, smart copy and pack blocks 2024-06-03 18:13:41 +02:00
yuxizama
e4a21db7ee chore(docs): update license FAQ 2024-06-03 17:12:18 +02:00
Beka Barbakadze
3e37759f5f fix(gpu): ensure single carry propagation returns carry 2024-06-03 15:55:25 +02:00
Arthur Meyre
dc0d72436d refactor(core): factorize multiplicative factor code for GGSW encryption
- some code was repeated several times; factorize it out into a function
2024-06-03 14:50:08 +02:00
Arthur Meyre
8a31abfca4 feat(core): add non mem optimized NTT64 primitives
- also add docstrings to ntt primitives
- export now-useful functions for decryption on non-native moduli
2024-06-03 14:50:08 +02:00
Arthur Meyre
154c2e61b8 feat(core): add NTT PBS and NTT conversion algorithms
Co-authored-by: sarah el kazdadi <sarah.elkazdadi@zama.ai>
2024-06-03 14:50:08 +02:00
Arthur Meyre
b3e6f8522f refactor(core): split LWE PBS algorithms depending on the polymul backend 2024-06-03 14:50:08 +02:00
Arthur Meyre
3d2e3b389a feat(core): add NTT entities
Co-authored-by: sarah el kazdadi <sarah.elkazdadi@zama.ai>
2024-06-03 14:50:08 +02:00
Arthur Meyre
f6f07714cb feat(tfhe): add support for non native moduli to some GLWE and GGSW algos
- update GLWE and GGSW encryption algorithms
2024-06-03 14:50:08 +02:00
Arthur Meyre
57bc1f5abe chore(core): remove some commas just lying there in macros + use div_ceil
- use div_ceil instead of doing it manually in parallel FFT conversion
2024-06-03 14:50:08 +02:00
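
The manual idiom being swapped out, next to the standard-library call that replaces it:

    fn main() {
        let (len, chunk) = (10usize, 3usize);
        // Manual ceiling division, as previously written inline...
        let manual = (len + chunk - 1) / chunk;
        // ...and the equivalent standard-library method.
        assert_eq!(manual, len.div_ceil(chunk));
    }
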
Arthur Meyre
fd88c3ead2 feat(tfhe): plug NTT primitives as designed by Sarah originally
Co-authored-by: sarah el kazdadi <sarah.elkazdadi@zama.ai>
2024-06-03 14:50:08 +02:00
Arthur Meyre
4bbb3570d1 refactor(core): remove row_count from ggsw row-like structures
- use size primitives in places where it makes sense
2024-06-03 14:50:08 +02:00
dependabot[bot]
e2413ff69e chore(deps): bump actions/checkout from 4.1.4 to 4.1.6
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.4 to 4.1.6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.4...a5ac7e51b41094c92402da3b24376905380afc29)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-03 14:49:24 +02:00
dependabot[bot]
82043fb7e2 chore(deps): bump tj-actions/changed-files from 44.5.1 to 44.5.2
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44.5.1 to 44.5.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](03334d095e...d6babd6899)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-03 14:47:46 +02:00
tmontaigu
3097c964e3 chore(tfhe): alias if_then_else to select 2024-05-31 22:56:56 +02:00
Arthur Meyre
e5d6f60a1b chore(tfhe): make sha3 an optional dependency enabled with shortint
- it is used as a Random Oracle for the PRF at the shortint level
2024-05-31 19:07:55 +02:00
David Testé
05105c9d9e chore(ci): update slab-github-runner to latest version 2024-05-31 18:05:50 +02:00
David Testé
a798e1fb52 chore(ci): build parallel wasm client upon npm release 2024-05-31 16:26:38 +02:00
David Testé
9cf4be09fb chore(ci): push tfhe-rs version as npm by default
NPM tag "latest" will be pushed only on demand.
2024-05-31 16:26:38 +02:00
Agnes Leroy
484bddfebd chore(gpu): rebuild cuda backend if files changed 2024-05-31 15:07:06 +02:00
772 changed files with 52375 additions and 36479 deletions

View File

@@ -3,6 +3,8 @@ self-hosted-runner:
labels:
- m1mac
- 4090-desktop
- large_windows_16_latest
- large_ubuntu_16
# Configuration variables in array of strings defined in your repository or
# organization. `null` means disabling configuration variables check.
# Empty array means no configuration variable is allowed.

View File

@@ -0,0 +1,120 @@
# Run backward compatibility tests
name: Backward compatibility Tests on CPU
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
jobs:
setup-instance:
name: Setup instance (backward-compat-tests)
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: cpu-small
backward-compat-tests:
name: Backward compatibility tests
needs: [ setup-instance ]
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: true
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Install git-lfs
run: |
sudo apt update && sudo apt -y install git-lfs
- name: Use specific data branch
if: ${{ contains(github.event.pull_request.labels.*.name, 'data_PR') }}
env:
PR_BRANCH: ${{ github.head_ref || github.ref_name }}
run: |
echo "BACKWARD_COMPAT_DATA_BRANCH=${PR_BRANCH}" >> "${GITHUB_ENV}"
- name: Get backward compat branch
id: backward_compat_branch
run: |
BRANCH="$(make backward_compat_branch)"
echo "branch=${BRANCH}" >> "${GITHUB_OUTPUT}"
- name: Clone test data
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
repository: zama-ai/tfhe-backward-compat-data
path: tfhe/tfhe-backward-compat-data
lfs: 'true'
ref: ${{ steps.backward_compat_branch.outputs.branch }}
- name: Run backward compatibility tests
run: |
make test_backward_compatibility_ci
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Backward compatibility tests finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (backward-compat-tests)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, backward-compat-tests ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (backward-compat-tests) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -1,4 +1,4 @@
# Run a small subset of shortint and integer tests to ensure quick feedback.
# Run a small subset of tests to ensure quick feedback.
name: Fast AWS Tests on CPU
env:
@@ -11,6 +11,7 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -18,15 +19,112 @@ on:
pull_request:
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
csprng_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.csprng_any_changed }}
zk_pok_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.zk_pok_any_changed }}
core_crypto_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.core_crypto_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
boolean_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.boolean_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
shortint_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.shortint_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
integer_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.integer_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
wasm_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.wasm_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
high_level_api_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.high_level_api_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
user_docs_test: ${{ env.IS_PULL_REQUEST == 'false' ||
steps.changed-files.outputs.user_docs_any_changed ||
steps.changed-files.outputs.dependencies_any_changed }}
any_file_changed: ${{ env.IS_PULL_REQUEST == 'false' || steps.aggregated-changes.outputs.any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
dependencies:
- tfhe/Cargo.toml
- concrete-csprng/**
- tfhe-zk-pok/**
csprng:
- concrete-csprng/**
zk_pok:
- tfhe-zk-pok/**
core_crypto:
- tfhe/src/core_crypto/**
boolean:
- tfhe/src/core_crypto/**
- tfhe/src/boolean/**
shortint:
- tfhe/src/core_crypto/**
- tfhe/src/shortint/**
integer:
- tfhe/src/core_crypto/**
- tfhe/src/shortint/**
- tfhe/src/integer/**
wasm:
- tfhe/src/**
- tfhe/js_on_wasm_tests/**
- tfhe/web_wasm_parallel_tests/**
- '!tfhe/src/c_api/**'
- '!tfhe/src/boolean/**'
high_level_api:
- tfhe/src/**
- '!tfhe/src/c_api/**'
- '!tfhe/src/boolean/**'
- '!tfhe/src/c_api/**'
- '!tfhe/src/js_on_wasm_api/**'
user_docs:
- tfhe/src/**
- '!tfhe/src/c_api/**'
- 'tfhe/docs/**.md'
- README.md
- name: Aggregate file changes
id: aggregated-changes
if: ( steps.changed-files.outputs.dependencies_any_changed == 'true' ||
steps.changed-files.outputs.csprng_any_changed == 'true' ||
steps.changed-files.outputs.zk_pok_any_changed == 'true' ||
steps.changed-files.outputs.core_crypto_any_changed == 'true' ||
steps.changed-files.outputs.boolean_any_changed == 'true' ||
steps.changed-files.outputs.shortint_any_changed == 'true' ||
steps.changed-files.outputs.integer_any_changed == 'true' ||
steps.changed-files.outputs.wasm_any_changed == 'true' ||
steps.changed-files.outputs.high_level_api_any_changed == 'true' ||
steps.changed-files.outputs.user_docs_any_changed == 'true')
run: |
echo "any_changed=true" >> "$GITHUB_OUTPUT"
setup-instance:
name: Setup instance (fast-tests)
if: github.event_name != 'pull_request' ||
needs.should-run.outputs.any_file_changed == 'true'
needs: should-run
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -37,14 +135,16 @@ jobs:
fast-tests:
name: Fast CPU tests
needs: setup-instance
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
needs: [ should-run, setup-instance ]
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: true
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -53,55 +153,58 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Run concrete-csprng tests
if: needs.should-run.outputs.csprng_test == 'true'
run: |
make test_concrete_csprng
- name: Run tfhe-zk-pok tests
if: needs.should-run.outputs.zk_pok_test == 'true'
run: |
make test_zk_pok
- name: Run core tests
if: needs.should-run.outputs.core_crypto_test == 'true'
run: |
AVX512_SUPPORT=ON make test_core_crypto
- name: Run boolean tests
if: needs.should-run.outputs.boolean_test == 'true'
run: |
make test_boolean
- name: Run user docs tests
if: needs.should-run.outputs.user_docs_test == 'true'
run: |
make test_user_doc
- name: Run js on wasm API tests
if: needs.should-run.outputs.wasm_test == 'true'
run: |
make test_nodejs_wasm_api_in_docker
- name: Gen Keys if required
if: needs.should-run.outputs.shortint_test == 'true' ||
needs.should-run.outputs.integer_test == 'true'
run: |
make gen_key_cache
- name: Run shortint tests
if: needs.should-run.outputs.shortint_test == 'true'
run: |
BIG_TESTS_INSTANCE=TRUE FAST_TESTS=TRUE make test_shortint_ci
- name: Run integer tests
if: needs.should-run.outputs.integer_test == 'true'
run: |
BIG_TESTS_INSTANCE=TRUE FAST_TESTS=TRUE make test_integer_ci
- name: Run shortint multi-bit tests
run: |
BIG_TESTS_INSTANCE=TRUE FAST_TESTS=TRUE make test_shortint_multi_bit_ci
- name: Run integer multi-bit tests
run: |
BIG_TESTS_INSTANCE=TRUE FAST_TESTS=TRUE make test_integer_multi_bit_ci
- name: Run high-level API tests
if: needs.should-run.outputs.high_level_api_test == 'true'
run: |
make test_high_level_api
@@ -125,7 +228,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
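
Taken together, the hunks above implement a change-based gating pattern: a `should-run` job turns `tj-actions/changed-files` results into job outputs, and each downstream job or step is guarded by an `if:` on those outputs so runner time is only spent when relevant files changed. A condensed sketch of the pattern, with illustrative job and filter names (the action pins are the ones used in this diff):

jobs:
  should-run:
    runs-on: ubuntu-latest
    outputs:
      docs_test: ${{ steps.changed-files.outputs.docs_any_changed }}
    steps:
      - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
        with:
          fetch-depth: 0
      - name: Check for file changes
        id: changed-files
        uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
        with:
          files_yaml: |
            docs:
              - 'docs/**'
  docs-tests:
    needs: should-run
    # Only spend runner time when relevant files changed.
    if: needs.should-run.outputs.docs_test == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "docs changed; run the docs tests here"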

View File

@@ -10,24 +10,37 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
# We clear the cache to reduce memory pressure because of the numerous processes of cargo
# nextest
TFHE_RS_CLEAR_IN_MEMORY_KEY_CACHE: "1"
NO_BIG_PARAMS: FALSE
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types: [ labeled ]
push:
branches:
- main
schedule:
# Nightly tests @ 3AM after each work day
- cron: "0 3 * * MON-FRI"
jobs:
setup-instance:
name: Setup instance (unsigned-integer-tests)
if: ${{ github.event_name == 'workflow_dispatch' || contains(github.event.label.name, 'approved') }}
if: (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'pull_request' && contains(github.event.label.name, 'approved')) ||
github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -40,12 +53,12 @@ jobs:
name: Unsigned integer tests
needs: setup-instance
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: true
group: ${{ github.workflow }}_${{ github.ref }}${{ github.ref == 'refs/heads/main' && github.sha || '' }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -54,10 +67,15 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Should skip big parameters set
if: github.event_name == 'pull_request'
run: |
echo "NO_BIG_PARAMS=TRUE" >> "${GITHUB_ENV}"
- name: Gen Keys if required
run: |
make GEN_KEY_CACHE_MULTI_BIT_ONLY=TRUE gen_key_cache
@@ -72,7 +90,7 @@ jobs:
- name: Run unsigned integer tests
run: |
AVX512_SUPPORT=ON BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_ci
AVX512_SUPPORT=ON NO_BIG_PARAMS=${{ env.NO_BIG_PARAMS }} BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_ci
- name: Slack Notification
if: ${{ always() }}
@@ -90,7 +108,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
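
The new concurrency settings above use GitHub's `&& ... || ...` expression idiom as a ternary: on main, `github.sha` is appended so every push gets its own group and nothing is cancelled, while on other refs runs share a per-branch group and superseded runs are cancelled. In isolation:

concurrency:
  # On main the sha makes each run's group unique; elsewhere runs on the
  # same ref share a group.
  group: ${{ github.workflow }}_${{ github.ref }}${{ github.ref == 'refs/heads/main' && github.sha || '' }}
  # Cancel superseded runs only outside main.
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}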

View File

@@ -10,24 +10,37 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
# We clear the cache to reduce memory pressure because of the numerous processes of cargo
# nextest
TFHE_RS_CLEAR_IN_MEMORY_KEY_CACHE: "1"
NO_BIG_PARAMS: FALSE
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types: [ labeled ]
push:
branches:
- main
schedule:
# Nightly tests @ 3AM after each work day
- cron: "0 3 * * MON-FRI"
jobs:
setup-instance:
name: Setup instance (signed-integer-tests)
if: ${{ github.event_name == 'workflow_dispatch' || contains(github.event.label.name, 'approved') }}
if: (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'pull_request' && contains(github.event.label.name, 'approved')) ||
github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -40,12 +53,12 @@ jobs:
name: Signed integer tests
needs: setup-instance
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: true
group: ${{ github.workflow }}_${{ github.ref }}${{ github.ref == 'refs/heads/main' && github.sha || '' }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -54,10 +67,15 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Should skip big parameters set
if: github.event_name == 'pull_request'
run: |
echo "NO_BIG_PARAMS=TRUE" >> "${GITHUB_ENV}"
- name: Gen Keys if required
run: |
make GEN_KEY_CACHE_MULTI_BIT_ONLY=TRUE gen_key_cache
@@ -76,7 +94,7 @@ jobs:
- name: Run signed integer tests
run: |
AVX512_SUPPORT=ON BIG_TESTS_INSTANCE=TRUE make test_signed_integer_ci
AVX512_SUPPORT=ON NO_BIG_PARAMS=${{ env.NO_BIG_PARAMS }} BIG_TESTS_INSTANCE=TRUE make test_signed_integer_ci
- name: Slack Notification
if: ${{ always() }}
@@ -94,7 +112,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
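
The same diff also introduces the repository guard that recurs throughout this changeset: scheduled and push triggers only fire in the upstream repo, while PR labels and manual dispatch keep working elsewhere. The condition in isolation:

# Keep scheduled/push runs from firing on forks.
if: (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') ||
    (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
    (github.event_name == 'pull_request' && contains(github.event.label.name, 'approved')) ||
    github.event_name == 'workflow_dispatch'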

View File

@@ -24,6 +24,8 @@ on:
jobs:
should-run:
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
permissions:
pull-requests: write
outputs:
@@ -55,13 +57,13 @@ jobs:
any_file_changed: ${{ env.IS_PULL_REQUEST == 'false' || steps.aggregated-changes.outputs.any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@03334d095e2739fa9ac4034ec16f66d5d01e9eba
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
@@ -84,6 +86,8 @@ jobs:
high_level_api:
- tfhe/src/**
- '!tfhe/src/c_api/**'
- '!tfhe/src/boolean/**'
- '!tfhe/src/js_on_wasm_api/**'
c_api:
- tfhe/src/**
examples:
@@ -119,7 +123,7 @@ jobs:
setup-instance:
name: Setup instance (cpu-tests)
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.should-run.outputs.any_file_changed == 'true')
(github.event.action == 'labeled' && github.event.label.name == 'approved' && needs.should-run.outputs.any_file_changed == 'true')
needs: should-run
runs-on: ubuntu-latest
outputs:
@@ -127,7 +131,7 @@ jobs:
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -147,7 +151,7 @@ jobs:
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -156,7 +160,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
@@ -233,7 +237,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}

View File

@@ -27,7 +27,7 @@ jobs:
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -45,7 +45,7 @@ jobs:
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -54,7 +54,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
@@ -72,7 +72,7 @@ jobs:
- name: Run parallel wasm tests
run: |
make ci_test_web_js_api_parallel
make test_web_js_api_parallel_ci
- name: Slack Notification
if: ${{ always() }}
@@ -90,7 +90,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}

View File

@@ -3,30 +3,9 @@ name: Boolean benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must include this very input, otherwise the workflow won't be called.
# See start_full_benchmarks.yml as an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
env:
CARGO_TERM_COLOR: always
@@ -34,36 +13,60 @@ env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
jobs:
run-boolean-benchmarks:
name: Execute boolean benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
setup-instance:
name: Setup instance (boolean-benchmarks)
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
boolean-benchmarks:
name: Execute boolean benchmarks in EC2
needs: setup-instance
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
@@ -73,14 +76,12 @@ jobs:
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--hardware "hpc7a.96xlarge" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
@@ -97,13 +98,13 @@ jobs:
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_boolean
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -129,8 +130,28 @@ jobs:
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Boolean benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
teardown-instance:
name: Teardown instance (boolean-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, boolean-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (boolean-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
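
This migration away from Slab-provided inputs follows a shape repeated across the workflows in this diff: a setup job starts an on-demand runner and exposes its label, the main job runs on that label, and a teardown job always stops the instance. A condensed sketch (job names and the make target are illustrative; the action pin and inputs are the ones used above):

jobs:
  setup-instance:
    runs-on: ubuntu-latest
    outputs:
      runner-name: ${{ steps.start-instance.outputs.label }}
    steps:
      - name: Start instance
        id: start-instance
        uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
        with:
          mode: start
          github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
          slab-url: ${{ secrets.SLAB_BASE_URL }}
          job-secret: ${{ secrets.JOB_SECRET }}
          backend: aws
          profile: bench
  benchmarks:
    needs: setup-instance
    runs-on: ${{ needs.setup-instance.outputs.runner-name }}
    steps:
      - run: make bench   # illustrative target
  teardown-instance:
    # Always stop the instance, even on failure, unless setup was skipped.
    if: ${{ always() && needs.setup-instance.result != 'skipped' }}
    needs: [ setup-instance, benchmarks ]
    runs-on: ubuntu-latest
    steps:
      - name: Stop instance
        uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
        with:
          mode: stop
          github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
          slab-url: ${{ secrets.SLAB_BASE_URL }}
          job-secret: ${{ secrets.JOB_SECRET }}
          label: ${{ needs.setup-instance.outputs.runner-name }}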

View File

@@ -19,11 +19,11 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest, macos-latest-large, windows-latest]
os: [large_ubuntu_16, macos-latest-large, large_windows_16_latest]
fail-fast: false
steps:
- uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Install and run newline linter checks
if: matrix.os == 'ubuntu-latest'

View File

@@ -10,7 +10,7 @@ jobs:
- name: Check first line
uses: gsactions/commit-message-checker@16fa2d5de096ae0d35626443bcd24f1e756cafee
with:
pattern: '^((feat|fix|chore|refactor|style|test|docs|doc)(\(\w+\))?\:) .+$'
pattern: '^((feat|fix|chore|refactor|style|test|docs|doc)(\([\w\-_]+\))?\!?\:) .+$'
flags: "gs"
error: 'Your first line has to contain a commit type and scope like "feat(my_feature): msg".'
excludeDescription: "true" # optional: this excludes the description body of a pull request
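
The widened pattern above accepts hyphens and underscores in the scope and an optional `!` marker for breaking changes. Illustrative first lines checked against the old versus new pattern:

# feat(my_feature): msg        -> accepted by both patterns
# fix(high-level-api): msg     -> new pattern only (hyphen in scope)
# chore(ci)!: breaking change  -> new pattern only (optional "!")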

View File

@@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Get actionlint
run: |

View File

@@ -6,70 +6,58 @@ env:
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
# All the inputs are provided by Slab
inputs:
instance_id:
description: "AWS instance ID"
type: string
instance_image_id:
description: "AWS instance AMI ID"
type: string
instance_type:
description: "AWS instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
type: string
# The code coverage workflow is only run via the workflow_dispatch event since its execution duration has not stabilized yet.
jobs:
code-coverage:
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}_${{ inputs.instance_image_id }}_${{ inputs.instance_type }}
cancel-in-progress: true
runs-on: ${{ inputs.runner_name }}
timeout-minutes: 11520 # 8 days
setup-instance:
name: Setup instance (code-coverage)
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
# Step used for logging purposes.
- name: Instance configuration used
run: |
echo "ID: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
echo "Fork repo: ${{ inputs.fork_repo }}"
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: cpu-small
code-coverage:
name: Code coverage tests
needs: setup-instance
concurrency:
group: ${{ github.workflow }}_${{ github.event_name }}_${{ github.ref }}
cancel-in-progress: true
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
timeout-minutes: 5760 # 4 days
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@03334d095e2739fa9ac4034ec16f66d5d01e9eba
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
files_yaml: |
tfhe:
@@ -99,7 +87,7 @@ jobs:
make test_shortint_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@125fc84a9a348dbcf27191600683ec096ec9021c
uses: codecov/codecov-action@e28ff129e5465c2c0dcc6f003fc735cb6ae0c673
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}
@@ -113,7 +101,7 @@ jobs:
make test_integer_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@125fc84a9a348dbcf27191600683ec096ec9021c
uses: codecov/codecov-action@e28ff129e5465c2c0dcc6f003fc735cb6ae0c673
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}
@@ -127,8 +115,28 @@ jobs:
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Code coverage finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
teardown-instance:
name: Teardown instance (code-coverage)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, code-coverage ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (code-coverage) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -3,30 +3,6 @@ name: Core crypto benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must include this very input, otherwise the workflow won't be called.
# See start_full_benchmarks.yml as an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
@@ -34,67 +10,89 @@ env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
jobs:
run-core-crypto-benchmarks:
name: Execute core crypto benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
setup-instance:
name: Setup instance (core-crypto-benchmarks)
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
core-crypto-benchmarks:
name: Execute core crypto benchmarks in EC2
needs: setup-instance
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Run benchmarks with AVX512
run: |
make bench_pbs
make bench_pbs128
make bench_ks
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--hardware "hpc7a.96xlarge" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--name-suffix avx512 \
--walk-subdirs \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_core_crypto
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -120,8 +118,28 @@ jobs:
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "PBS benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
teardown-instance:
name: Teardown instance (core-crypto-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, core-crypto-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (core-crypto-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -20,13 +20,14 @@ jobs:
setup-instance:
name: Setup instance (cuda-core-crypto-benchmarks)
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -49,22 +50,13 @@ jobs:
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.1
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
@@ -73,7 +65,7 @@ jobs:
sudo make install
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -91,7 +83,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
@@ -136,13 +128,13 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_core_crypto
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -165,7 +157,7 @@ jobs:
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-core-crypto-benchmarks ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
runs-on: ubuntu-latest
if: ${{ !success() && !cancelled() }}
continue-on-error: true
steps:
@@ -183,7 +175,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}

View File

@@ -27,7 +27,7 @@ jobs:
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -45,7 +45,7 @@ jobs:
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -54,7 +54,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
@@ -78,7 +78,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}

.github/workflows/data_pr_close.yml
View File

@@ -0,0 +1,123 @@
name: Close or Merge corresponding PR on the data repo
# When a PR with the data_PR tag is closed or merged, this will close the corresponding PR in the data repo.
env:
TARGET_REPO_API_URL: ${{ github.api_url }}/repos/zama-ai/tfhe-backward-compat-data
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
PR_BRANCH: ${{ github.head_ref || github.ref_name }}
CLOSE_TYPE: ${{ github.event.pull_request.merged && 'merge' || 'close' }}
# only trigger on pull request closed events
on:
pull_request:
types: [ closed ]
# The same pattern is used for jobs that use the GitHub API:
# - save the result of the API call in the env var "GH_API_RES". Since the var is multiline,
# we use this trick: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#example-of-a-multiline-string
# - "set +e" makes sure we reach the last "echo EOF" even in case of error
# - "set -o pipefail" makes a one-line piped command return the error of the first failure
# - 'RES="$?"' and 'exit $RES' are used to return the error code if a command failed. Without them,
# with "set +e" the script would always return 0 because of the "echo EOF".
jobs:
auto_close_job:
if: ${{ contains(github.event.pull_request.labels.*.name, 'data_PR') }}
runs-on: ubuntu-latest
steps:
- name: Find corresponding Pull Request in the data repo
run: |
{
set +e
set -o pipefail
echo 'TARGET_REPO_PR<<EOF'
curl --fail-with-body --no-progress-meter -L -X GET \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
${{ env.TARGET_REPO_API_URL }}/pulls\?head=${{ github.repository_owner }}:${{ env.PR_BRANCH }} | jq -e '.[0]' | sed 's/null/{ "message": "corresponding PR not found" }/'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
- name: Comment on the PR to indicate the reason of the close
run: |
{
set +e
set -o pipefail
echo 'GH_API_RES<<EOF'
curl --fail-with-body --no-progress-meter -L -X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
${{ fromJson(env.TARGET_REPO_PR).comments_url }} \
-d '{ "body": "PR ${{ env.CLOSE_TYPE }}d because the corresponding PR in main repo was ${{ env.CLOSE_TYPE }}d: ${{ github.repository }}#${{ github.event.number }}" }'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
- name: Merge the Pull Request in the data repo
if: ${{ github.event.pull_request.merged }}
run: |
{
set +e
set -o pipefail
echo 'GH_API_RES<<EOF'
curl --fail-with-body --no-progress-meter -L -X PUT \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
${{ fromJson(env.TARGET_REPO_PR).url }}/merge \
-d '{ "merge_method": "rebase" }'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
- name: Close the Pull Request in the data repo
if: ${{ !github.event.pull_request.merged }}
run: |
{
set +e
set -o pipefail
echo 'GH_API_RES<<EOF'
curl --fail-with-body --no-progress-meter -L -X PATCH \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
${{ fromJson(env.TARGET_REPO_PR).url }} \
-d '{ "state": "closed" }'
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
- name: Delete the associated branch in the data repo
run: |
{
set +e
set -o pipefail
echo 'GH_API_RES<<EOF'
curl --fail-with-body --no-progress-meter -L -X DELETE \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.FHE_ACTIONS_TOKEN }}" \
-H "X-GitHub-Api-Version: 2022-11-28" \
${{ env.TARGET_REPO_API_URL }}/git/refs/heads/${{ env.PR_BRANCH }}
RES="$?"
echo EOF
} >> "${GITHUB_ENV}"
exit $RES
- name: Slack Notification
if: ${{ always() && job.status == 'failure' }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Failed to auto-${{ env.CLOSE_TYPE }} PR on data repo: ${{ fromJson(env.GH_API_RES || env.TARGET_REPO_PR).message }}"
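
The error-handling pattern described in the header comments of this workflow can be distilled into a standalone step; the API URL and jq filter below are placeholders, not taken from the workflow:

- name: Call an API and capture multiline JSON (sketch)
  run: |
    {
      set +e            # keep going so the closing EOF is always written
      set -o pipefail   # a failed curl propagates through the pipe to jq
      echo 'GH_API_RES<<EOF'
      curl --fail-with-body --no-progress-meter -L https://api.example.com/resource | jq -e '.'
      RES="$?"          # remember the real exit code before the echo below
      echo EOF
    } >> "${GITHUB_ENV}"
    exit $RES           # fail the step if the API call failed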

View File

@@ -1,5 +1,5 @@
# Run all benchmarks on an RTX 4090 machine and return parsed results to Slab CI bot.
name: TFHE Cuda Backend - 4090 full benchmarks
# Run benchmarks on an RTX 4090 machine and return parsed results to Slab CI bot.
name: TFHE Cuda Backend - 4090 benchmarks
env:
CARGO_TERM_COLOR: always
@@ -11,6 +11,7 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -23,23 +24,22 @@ on:
jobs:
cuda-integer-benchmarks:
name: Cuda integer benchmarks for all operations flavor (RTX 4090)
if: ${{ github.event_name == 'workflow_dispatch' || github.event_name == 'schedule' || contains(github.event.label.name, '4090_bench') }}
name: Cuda integer benchmarks (RTX 4090)
if: ${{ github.event_name == 'workflow_dispatch' ||
github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs' ||
contains(github.event.label.name, '4090_bench') }}
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}_cuda_integer_bench
cancel-in-progress: true
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ["self-hosted", "4090-desktop"]
timeout-minutes: 1440 # 24 hours
strategy:
fail-fast: false
max-parallel: 1
matrix:
command: [integer, integer_multi_bit]
op_flavor: [default, unchecked]
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -50,14 +50,15 @@ jobs:
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
echo "FAST_BENCH=TRUE" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -65,7 +66,7 @@ jobs:
- name: Run integer benchmarks
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_${{ matrix.command }}_gpu
make BENCH_OP_FLAVOR=default bench_integer_multi_bit_gpu
- name: Parse results
run: |
@@ -81,9 +82,9 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
name: ${{ github.sha }}_integer_multi_bit_gpu_default
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
@@ -114,13 +115,13 @@ jobs:
needs: cuda-integer-benchmarks
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}_cuda_core_crypto_bench
cancel-in-progress: true
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ["self-hosted", "4090-desktop"]
timeout-minutes: 1440 # 24 hours
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -133,18 +134,18 @@ jobs:
} >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Run integer benchmarks
- name: Run core crypto benchmarks
run: |
make bench_pbs_gpu
make bench_ks_gpu
@@ -163,7 +164,7 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_core_crypto
path: ${{ env.RESULTS_FILENAME }}

View File

@@ -17,11 +17,16 @@ on:
workflow_dispatch:
pull_request:
types: [ labeled ]
schedule:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
jobs:
cuda-tests-linux:
name: CUDA tests (RTX 4090)
if: ${{ github.event_name == 'workflow_dispatch' || contains(github.event.label.name, '4090_test') }}
if: github.event_name == 'workflow_dispatch' ||
contains(github.event.label.name, '4090_test') ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: true
@@ -29,12 +34,12 @@ jobs:
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable

View File

@@ -0,0 +1,199 @@
# Compile and test tfhe-cuda-backend on an H100 VM on hyperstack
name: TFHE Cuda Backend - Fast tests on H100
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types: [ labeled ]
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/gpu_fast_h100_tests.yml'
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-h100-tests)
needs: should-run
if: github.event_name != 'pull_request' ||
(github.event.action != 'labeled' && needs.should-run.outputs.gpu_test == 'true') ||
(github.event.action == 'labeled' && github.event.label.name == 'approved' && needs.should-run.outputs.gpu_test == 'true')
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: single-h100
cuda-tests-linux:
name: CUDA H100 tests
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run core crypto and internal CUDA backend tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_core_crypto_gpu
BIG_TESTS_INSTANCE=TRUE make test_cuda_backend
- name: Run user docs tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_user_doc_gpu
- name: Test C API
run: |
BIG_TESTS_INSTANCE=TRUE make test_c_api_gpu
- name: Run High Level API Tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_high_level_api_gpu
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Fast H100 tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-h100-tests)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-h100-tests) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"


@@ -1,5 +1,5 @@
# Compile and test tfhe-cuda-backend on an AWS instance
name: TFHE Cuda Backend - Full tests
name: TFHE Cuda Backend - Fast tests
env:
CARGO_TERM_COLOR: always
@@ -11,6 +11,7 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -18,94 +19,64 @@ on:
pull_request:
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- '.github/workflows/gpu_fast_tests.yml'
- Makefile
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-tests)
needs: should-run
if: github.event_name != 'pull_request' ||
needs.should-run.outputs.gpu_test == 'true'
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
backend: hyperstack
profile: gpu-test
cuda-pcc:
name: CUDA post-commit checks
needs: setup-instance
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
persist-credentials: 'false'
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Run fmt checks
run: |
make check_fmt_gpu
- name: Run clippy checks
run: |
make pcc_gpu
- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS post-commit checks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
cuda-tests-linux:
name: CUDA tests
needs: [ setup-instance, cuda-pcc ]
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
@@ -117,13 +88,25 @@ jobs:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
@@ -132,7 +115,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
@@ -155,9 +138,14 @@ jobs:
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Run core crypto, integer and internal CUDA backend tests
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run core crypto and internal CUDA backend tests
run: |
make test_gpu
make test_core_crypto_gpu
make test_cuda_backend
- name: Run user docs tests
run: |
@@ -171,23 +159,28 @@ jobs:
run: |
make test_high_level_api_gpu
- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS tests finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Base GPU tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-tests)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-pcc, cuda-tests-linux ]
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
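
The diff above also moves Slack reporting out of the test job into a dedicated job. A minimal sketch of that pattern (job names are illustrative, the action SHA is the one pinned above); the key point is that the notifier reads needs.<job>.result, since job.status would only describe the notification job itself.

name: Example - dedicated notification job
on: [ push ]
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - run: echo "pretend test suite"
  slack-notify:
    needs: tests
    runs-on: ubuntu-latest
    # Fire on success or failure, but stay silent if the tests never ran.
    if: ${{ always() && needs.tests.result != 'skipped' }}
    continue-on-error: true
    steps:
      - name: Send message
        uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_COLOR: ${{ needs.tests.result }}
          SLACK_MESSAGE: "Tests finished with status: ${{ needs.tests.result }}"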


@@ -0,0 +1,199 @@
# Compile and test tfhe-cuda-backend on a multi-GPU VM on hyperstack
name: TFHE Cuda Backend - Full tests multi-GPU
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types: [ labeled ]
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/**_multi_gpu_tests.yml'
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-tests-multi-gpu)
needs: should-run
if: github.event_name != 'pull_request' ||
(github.event.action != 'labeled' && needs.should-run.outputs.gpu_test == 'true') ||
(github.event.action == 'labeled' && github.event.label.name == 'approved' && needs.should-run.outputs.gpu_test == 'true')
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: multi-gpu-test
cuda-tests-linux:
name: CUDA multi-GPU tests
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
# No need to test core_crypto and the classic PBS in integer since they are already tested on a single GPU.
- name: Run multi-bit CUDA integer tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_integer_multi_bit_gpu_ci
- name: Run user docs tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_user_doc_gpu
- name: Test C API
run: |
BIG_TESTS_INSTANCE=TRUE make test_c_api_gpu
- name: Run High Level API Tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_high_level_api_gpu
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Multi-GPU tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-tests-multi-gpu)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-tests-multi-gpu) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

.github/workflows/gpu_pcc.yml

@@ -0,0 +1,126 @@
# Perform tfhe-cuda-backend post-commit checks on an AWS instance
name: TFHE Cuda Backend - Post-commit Checks
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
on:
pull_request:
jobs:
setup-instance:
name: Setup instance (cuda-pcc)
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: gpu-build
cuda-pcc:
name: CUDA post-commit checks
needs: setup-instance
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: true
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Run fmt checks
run: |
make check_fmt_gpu
- name: Run clippy checks
run: |
make pcc_gpu
- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "CUDA AWS post-commit checks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-pcc)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-pcc ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-pcc) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"


@@ -1,5 +1,5 @@
# Compile and test tfhe-cuda-backend on an H100 VM on hyperstack
name: TFHE Cuda Backend - Full tests on H100
# Signed integer GPU tests on an H100 VM on hyperstack
name: TFHE Cuda Backend - Signed integer tests on H100
env:
CARGO_TERM_COLOR: always
@@ -11,22 +11,61 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types: [ labeled ]
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/gpu_signed_integer_h100_tests.yml'
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-h100-tests)
needs: should-run
if: github.event_name != 'pull_request' ||
(github.event.action != 'labeled' && needs.should-run.outputs.gpu_test == 'true') ||
(github.event.action == 'labeled' && github.event.label.name == 'approved' && needs.should-run.outputs.gpu_test == 'true')
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -36,8 +75,10 @@ jobs:
profile: single-h100
cuda-tests-linux:
name: CUDA H100 tests
needs: [ setup-instance ]
name: CUDA H100 signed integer tests
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
@@ -52,22 +93,13 @@ jobs:
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.1
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
@@ -76,14 +108,14 @@ jobs:
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
@@ -106,27 +138,23 @@ jobs:
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Run core crypto, integer and internal CUDA backend tests
run: |
make test_gpu
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run user docs tests
- name: Run signed integer tests
run: |
make test_user_doc_gpu
BIG_TESTS_INSTANCE=TRUE make test_signed_integer_gpu_ci
- name: Test C API
- name: Run signed integer multi-bit tests
run: |
make test_c_api_gpu
- name: Run High Level API Tests
run: |
make test_high_level_api_gpu
BIG_TESTS_INSTANCE=TRUE make test_signed_integer_multi_bit_gpu_ci
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
if: ${{ !success() && !cancelled() }}
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
@@ -143,7 +171,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
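
The repeated "Export gcc and g++ variables" steps rely on a small shell idiom: brace-grouping several echoes so a single redirection appends them all to GITHUB_ENV. A minimal sketch (compiler paths illustrative):

name: Example - grouped GITHUB_ENV exports
on: [ workflow_dispatch ]
jobs:
  exports:
    runs-on: ubuntu-latest
    steps:
      - name: Export gcc and g++ variables
        run: |
          # One brace group, one append: every line lands in the job-wide
          # environment file instead of needing a redirection per echo.
          {
            echo "CC=/usr/bin/gcc-11";
            echo "CXX=/usr/bin/g++-11";
            echo "CUDAHOSTCXX=/usr/bin/g++-11";
          } >> "${GITHUB_ENV}"
      - name: Use them
        run: echo "CC=${CC} CXX=${CXX}"  # visible to all later steps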


@@ -0,0 +1,202 @@
# Compile and test tfhe-cuda-backend signed integer operations on a hyperstack VM
name: TFHE Cuda Backend - Signed integer tests
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_TESTS: TRUE
NIGHTLY_TESTS: FALSE
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types:
- opened
- synchronize
- labeled
schedule:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- '.github/workflows/gpu_signed_integer_tests.yml'
- Makefile
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-signed-integer-tests)
runs-on: ubuntu-latest
needs: should-run
if: (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
github.event_name == 'workflow_dispatch' ||
(github.event.action != 'labeled' && needs.should-run.outputs.gpu_test == 'true')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: gpu-test
cuda-signed-integer-tests:
name: CUDA signed integer tests
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Should run nightly tests
if: github.event_name == 'schedule'
run: |
{
echo "FAST_TESTS=FALSE";
echo "NIGHTLY_TESTS=TRUE";
} >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run signed integer multi-bit tests
run: |
make test_signed_integer_multi_bit_gpu_ci
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-signed-integer-tests ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-signed-integer-tests.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-signed-integer-tests.result }}
SLACK_MESSAGE: "Base GPU tests finished with status: ${{ needs.cuda-signed-integer-tests.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-tests)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-signed-integer-tests ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-signed-integer-tests) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"


@@ -0,0 +1,188 @@
# Test unsigned integers on an H100 VM on hyperstack
name: TFHE Cuda Backend - Unsigned integer tests on H100
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types: [ labeled ]
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- Makefile
- '.github/workflows/gpu_unsigned_integer_tests.yml'
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-h100-tests)
needs: should-run
if: github.event_name != 'pull_request' ||
(github.event.action != 'labeled' && needs.should-run.outputs.gpu_test == 'true') ||
(github.event.action == 'labeled' && github.event.label.name == 'approved' && needs.should-run.outputs.gpu_test == 'true')
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: single-h100
cuda-tests-linux:
name: CUDA H100 unsigned integer tests
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run unsigned integer tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_gpu_ci
- name: Run unsigned integer multi-bit tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_multi_bit_gpu_ci
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-tests-linux.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-tests-linux.result }}
SLACK_MESSAGE: "Unsigned integer GPU H100 tests finished with status: ${{ needs.cuda-tests-linux.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-h100-tests)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-tests-linux ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-h100-tests) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"


@@ -0,0 +1,199 @@
# Compile and test tfhe-cuda-backend unsigned integer operations on a hyperstack VM
name: TFHE Cuda Backend - Unsigned integer tests
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_TESTS: TRUE
NIGHTLY_TESTS: FALSE
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
pull_request:
types:
- opened
- synchronize
- labeled
schedule:
# Nightly tests @ 1AM after each work day
- cron: "0 1 * * MON-FRI"
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: write
outputs:
gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
gpu:
- tfhe/Cargo.toml
- tfhe/build.rs
- backends/tfhe-cuda-backend/**
- tfhe/src/core_crypto/gpu/**
- tfhe/src/integer/gpu/**
- tfhe/shortint/parameters/**
- tfhe/src/high_level_api/**
- tfhe/src/c_api/**
- 'tfhe/docs/**.md'
- '.github/workflows/gpu_unsigned_integer_tests.yml'
- Makefile
- scripts/**
- ci/**
setup-instance:
name: Setup instance (cuda-unsigned-integer-tests)
needs: should-run
if: (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
github.event_name == 'workflow_dispatch' ||
(github.event.action != 'labeled' && needs.should-run.outputs.gpu_test == 'true')
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: gpu-test
cuda-unsigned-integer-tests:
name: CUDA unsigned integer tests
needs: [ should-run, setup-instance ]
if: github.event_name != 'pull_request' ||
(github.event_name == 'pull_request' && needs.setup-instance.result != 'skipped')
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
strategy:
fail-fast: false
# Explicit include-based build matrix of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "HOME=/home/ubuntu";
} >> "${GITHUB_ENV}"
- name: Should run nightly tests
if: github.event_name == 'schedule'
run: |
{
echo "FAST_TESTS=FALSE";
echo "NIGHTLY_TESTS=TRUE";
} >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run unsigned integer multi-bit tests
run: |
make test_unsigned_integer_multi_bit_gpu_ci
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-unsigned-integer-tests ]
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-unsigned-integer-tests.result != 'skipped' }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-unsigned-integer-tests.result }}
SLACK_MESSAGE: "Unsigned integer GPU tests finished with status: ${{ needs.cuda-unsigned-integer-tests.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-tests)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-unsigned-integer-tests ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-unsigned-integer-tests) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"


@@ -1,130 +0,0 @@
# Run integer benchmarks on an AWS instance and return parsed results to Slab CI bot.
name: Integer benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute integer benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Run benchmarks with AVX512
run: |
make FAST_BENCH=TRUE bench_integer
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
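
The Slab upload at the end of this benchmark job (removed here, but retained in the renamed workflow below) follows a webhook-style protocol: the payload is signed with a shared secret and verified server-side via the X-Hub-Signature-256 header. The same step, annotated; it assumes the workflow-level RESULTS_FILENAME env and the slab repository checkout shown above:

      - name: Send data to Slab
        shell: bash
        run: |
          # Sign the JSON payload with the shared job secret; Slab recomputes
          # the HMAC and rejects the request if the digests differ.
          SIGNATURE="$(slab/scripts/hmac_calculator.sh "${RESULTS_FILENAME}" '${{ secrets.JOB_SECRET }}')"
          curl -v -k \
            -H "Content-Type: application/json" \
            -H "X-Slab-Repository: ${{ github.repository }}" \
            -H "X-Slab-Command: store_data_v2" \
            -H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
            -d @"${RESULTS_FILENAME}" \
            "${{ secrets.SLAB_URL }}"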


@@ -1,28 +1,20 @@
# Run all integer benchmarks on an AWS instance and return parsed results to Slab CI bot.
name: Integer full benchmarks
name: Integer benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
all_precisions:
description: "Run all precisions"
type: boolean
default: false
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
# Quarterly benchmarks will be triggered right before the end of each quarter, on the 25th of the month at 4 a.m.
# These benchmarks take far longer to execute, hence they are run only four times a year.
- cron: '0 4 25 MAR,JUN,SEP,DEC *'
env:
CARGO_TERM_COLOR: always
@@ -30,21 +22,29 @@ env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
jobs:
prepare-matrix:
name: Prepare operations matrix
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
op_flavor: ${{ steps.set_op_flavor.outputs.op_flavor }}
steps:
- name: Weekly benchmarks
if: ${{ github.event.inputs.user_inputs == 'weekly_benchmarks' }}
if: github.event_name == 'workflow_dispatch' ||
github.event.schedule == '0 1 * * 6'
run: |
echo "OP_FLAVOR=[\"default\"]" >> "${GITHUB_ENV}"
- name: Quarterly benchmarks
if: ${{ github.event.inputs.user_inputs == 'quarterly_benchmarks' }}
if: github.event.schedule == '0 4 25 MAR,JUN,SEP,DEC *'
run: |
echo "OP_FLAVOR=[\"default\", \"smart\", \"unchecked\", \"misc\"]" >> "${GITHUB_ENV}"
@@ -53,11 +53,31 @@ jobs:
run: |
echo "op_flavor=${{ toJSON(env.OP_FLAVOR) }}" >> "${GITHUB_OUTPUT}"
setup-instance:
name: Setup instance (integer-benchmarks)
needs: prepare-matrix
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
integer-benchmarks:
name: Execute integer benchmarks for all operations flavor
needs: prepare-matrix
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
needs: [ prepare-matrix, setup-instance ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
@@ -66,15 +86,8 @@ jobs:
command: [ integer, integer_multi_bit]
op_flavor: ${{ fromJson(needs.prepare-matrix.outputs.op_flavor) }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -92,17 +105,22 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Should run benchmarks with all precisions
if: inputs.all_precisions
run: |
echo "FAST_BENCH=FALSE" >> "${GITHUB_ENV}"
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_${{ matrix.command }}
@@ -111,7 +129,7 @@ jobs:
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--hardware "hpc7a.96xlarge" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
@@ -121,7 +139,7 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
@@ -140,19 +158,34 @@ jobs:
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notification:
name: Slack Notification
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ failure() }}
needs: integer-benchmarks
steps:
- name: Notify
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer full benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
teardown-instance:
name: Teardown instance (integer-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, integer-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (integer-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"


@@ -23,13 +23,14 @@ jobs:
setup-instance:
name: Setup instance (cuda-integer-benchmarks)
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
if: github.event_name == 'workflow_dispatch' ||
(github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -52,22 +53,13 @@ jobs:
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.1
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
@@ -76,7 +68,7 @@ jobs:
sudo make install
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -94,7 +86,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
@@ -118,6 +110,10 @@ jobs:
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
} >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run benchmarks with AVX512
run: |
make FAST_BENCH=TRUE BENCH_OP_FLAVOR=default bench_integer_gpu
@@ -128,7 +124,7 @@ jobs:
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
@@ -148,13 +144,13 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -177,8 +173,8 @@ jobs:
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-integer-benchmarks ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
if: ${{ !success() && !cancelled() }}
runs-on: ubuntu-latest
if: ${{ always() && needs.cuda-integer-benchmarks.result != 'skipped' && failure() }}
continue-on-error: true
steps:
- name: Send message
@@ -195,7 +191,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
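
A recurring mechanical change across these diffs is bumping action refs: every third-party action is pinned to a full commit SHA rather than a mutable tag, so dependency updates land as SHA swaps like the checkout and upload-artifact changes above. A minimal sketch of the pattern, reusing SHAs that appear in these diffs (the artifact name and path are illustrative):

name: Example - SHA-pinned actions
on: [ push ]
jobs:
  pinned:
    runs-on: ubuntu-latest
    steps:
      # A tag like v4 can be moved or deleted; a full commit SHA cannot.
      - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
      - run: echo '{}' > results.json
      - uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
        with:
          name: results
          path: results.json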


@@ -22,13 +22,14 @@ jobs:
setup-instance:
name: Setup instance (cuda-integer-full-benchmarks)
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -56,22 +57,13 @@ jobs:
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.1
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on hyperstack since a bootable volume is not re-usable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
@@ -80,7 +72,7 @@ jobs:
sudo make install
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -98,7 +90,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
@@ -123,12 +115,16 @@ jobs:
} >> "${GITHUB_ENV}"
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_${{ matrix.command }}_gpu
@@ -148,7 +144,7 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
@@ -170,7 +166,7 @@ jobs:
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-integer-full-benchmarks ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
runs-on: ubuntu-latest
if: ${{ !success() && !cancelled() }}
continue-on-error: true
steps:
@@ -188,7 +184,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
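
The new setup-instance guard in this file replaces the push/workflow_dispatch whitelist with a fork guard for scheduled runs. The second clause is logically redundant (the expression is equivalent to `github.event_name != 'schedule' || github.repository == 'zama-ai/tfhe-rs'`); the effect is that cron triggers only start paid instances on the upstream repository, while manual dispatches always pass. Condensed:

    setup-instance:
      runs-on: ubuntu-latest
      # Cron runs are restricted to the upstream repo so forks do not start
      # paid benchmark instances; workflow_dispatch always passes.
      if: github.event_name != 'schedule' ||
        (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')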

View File

@@ -1,130 +0,0 @@
# Run integer benchmarks with multi-bit cryptographic parameters on an AWS instance and return parsed results to the Slab CI bot.
name: Integer Multi-bit benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute integer multi-bit benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Run multi-bit benchmarks with AVX512
run: |
make FAST_BENCH=TRUE bench_integer_multi_bit
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
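
The "Send data to Slab" step above signs the results file before POSTing it. The signing script lives in the Slab checkout (fetched with a token), so its exact behavior is an assumption here; assuming it computes a standard SHA-256 HMAC of the payload keyed by the job secret, a local equivalent would be:

    # Hypothetical stand-in for slab/scripts/hmac_calculator.sh (assumed to be
    # a plain SHA-256 HMAC of the file, keyed by the job secret).
    SIGNATURE="$(openssl dgst -sha256 -hmac "${JOB_SECRET}" "${RESULTS_FILENAME}" | awk '{print $NF}')"
    curl -v -k \
      -H "Content-Type: application/json" \
      -H "X-Slab-Repository: ${GITHUB_REPOSITORY}" \
      -H "X-Slab-Command: store_data_v2" \
      -H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
      -d @"${RESULTS_FILENAME}" \
      "${SLAB_URL}"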

View File

@@ -3,6 +3,16 @@ name: Integer GPU Multi-bit benchmarks
on:
workflow_dispatch:
inputs:
all_precisions:
description: "Run all precisions"
type: boolean
default: false
fast_default:
description: "Run only deduplicated default operations without scalar variants"
type: boolean
default: false
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
@@ -18,18 +28,21 @@ env:
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
BENCH_OP_FLAVOR: default
jobs:
setup-instance:
name: Setup instance (cuda-integer-multi-bit-benchmarks)
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -53,22 +66,13 @@ jobs:
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.1
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on Hyperstack since a bootable volume is not reusable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
@@ -77,7 +81,7 @@ jobs:
sudo make install
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -95,7 +99,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
@@ -119,9 +123,23 @@ jobs:
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
} >> "${GITHUB_ENV}"
- name: Should run benchmarks with all precisions
if: inputs.all_precisions
run: |
echo "FAST_BENCH=FALSE" >> "${GITHUB_ENV}"
- name: Should run fast subset benchmarks
if: inputs.fast_default
run: |
echo "BENCH_OP_FLAVOR=fast_default" >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run multi-bit benchmarks with AVX512
run: |
make FAST_BENCH=TRUE BENCH_OP_FLAVOR=default bench_integer_multi_bit_gpu
make bench_unsigned_integer_multi_bit_gpu
- name: Parse benchmarks to csv
run: |
@@ -129,7 +147,7 @@ jobs:
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
@@ -149,13 +167,13 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -179,7 +197,7 @@ jobs:
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-integer-multi-bit-benchmarks ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
runs-on: ubuntu-latest
if: ${{ !success() && !cancelled() }}
continue-on-error: true
steps:
@@ -197,7 +215,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
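
In the GPU multi-bit workflow above, FAST_BENCH and BENCH_OP_FLAVOR move from explicit make arguments into workflow-level env defaults that the two "Should run ..." steps override per dispatch input; make is assumed to pick the variables up from the job environment. The pattern in isolation:

    env:
      FAST_BENCH: TRUE          # defaults used by scheduled runs
      BENCH_OP_FLAVOR: default
    steps:
      - name: Should run benchmarks with all precisions
        if: inputs.all_precisions
        run: echo "FAST_BENCH=FALSE" >> "${GITHUB_ENV}"
      - name: Should run fast subset benchmarks
        if: inputs.fast_default
        run: echo "BENCH_OP_FLAVOR=fast_default" >> "${GITHUB_ENV}"
      - name: Run multi-bit benchmarks with AVX512
        # Values written to GITHUB_ENV are visible to all subsequent steps.
        run: make bench_unsigned_integer_multi_bit_gpu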

View File

@@ -0,0 +1,221 @@
# Run 64-bit multi-bit integer benchmarks on an instance with CUDA and return parsed results to the Slab CI bot.
name: Integer multi GPU Multi-bit benchmarks
on:
workflow_dispatch:
inputs:
all_precisions:
description: "Run all precisions"
type: boolean
default: false
fast_default:
description: "Run only deduplicated default operations without scalar variants"
type: boolean
default: false
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
BENCH_OP_FLAVOR: default
jobs:
setup-instance:
name: Setup instance (cuda-integer-multi-bit-multi-gpu-benchmarks)
runs-on: ubuntu-latest
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
github.event_name == 'workflow_dispatch' }}
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: multi-h100
cuda-integer-multi-bit-multi-gpu-benchmarks:
name: Execute multi GPU integer multi-bit benchmarks
needs: setup-instance
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
timeout-minutes: 1440 # 24 hours
continue-on-error: true
strategy:
fail-fast: false
max-parallel: 1
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on Hyperstack since a bootable volume is not reusable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
{
echo "CUDA_PATH=$CUDA_PATH";
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH";
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc";
} >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
} >> "${GITHUB_ENV}"
- name: Checkout Slab repo
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Should run benchmarks with all precisions
if: inputs.all_precisions
run: |
echo "FAST_BENCH=FALSE" >> "${GITHUB_ENV}"
- name: Should run fast subset benchmarks
if: inputs.fast_default
run: |
echo "BENCH_OP_FLAVOR=fast_default" >> "${GITHUB_ENV}"
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run multi-bit benchmarks with AVX512
run: |
make bench_unsigned_integer_multi_bit_gpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "n3-H100x8" \
--backend gpu \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-integer-multi-bit-multi-gpu-benchmarks ]
runs-on: ubuntu-latest
if: ${{ !success() && !cancelled() }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-integer-multi-bit-multi-gpu-benchmarks.result }}
SLACK_MESSAGE: "Integer multi GPU multi-bit benchmarks finished with status: ${{ needs.cuda-integer-multi-bit-multi-gpu-benchmarks.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-integer-multi-bit-multi-gpu-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-integer-multi-bit-multi-gpu-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-integer-multi-bit-multi-gpu-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -0,0 +1,201 @@
# Run all integer benchmarks on an instance with CUDA and return parsed results to the Slab CI bot.
name: Integer multi GPU full benchmarks
on:
workflow_dispatch:
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
jobs:
setup-instance:
name: Setup instance (cuda-integer-full-multi-gpu-benchmarks)
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: hyperstack
profile: multi-h100
cuda-integer-full-multi-gpu-benchmarks:
name: Execute multi GPU integer benchmarks for all operation flavors
needs: setup-instance
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
timeout-minutes: 1440 # 24 hours
continue-on-error: true
strategy:
fail-fast: false
max-parallel: 1
matrix:
command: [integer, integer_multi_bit]
op_flavor: [default, unchecked]
# Explicit include-based build matrix of known valid options
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 11
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
CMAKE_VERSION: 3.29.6
steps:
# Mandatory on Hyperstack since a bootable volume is not reusable yet.
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y checkinstall zlib1g-dev libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v${{ env.CMAKE_VERSION }}/cmake-${{ env.CMAKE_VERSION }}.tar.gz
tar -zxvf cmake-${{ env.CMAKE_VERSION }}.tar.gz
cd cmake-${{ env.CMAKE_VERSION }}
./bootstrap
make -j"$(nproc)"
sudo make install
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
{
echo "CUDA_PATH=$CUDA_PATH";
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH";
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc";
} >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
{
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}";
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}";
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}";
} >> "${GITHUB_ENV}"
- name: Checkout Slab repo
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Check device is detected
if: ${{ !cancelled() }}
run: nvidia-smi
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_${{ matrix.command }}_gpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "n3-H100x8" \
--backend gpu \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notify:
name: Slack Notification
needs: [ setup-instance, cuda-integer-full-multi-gpu-benchmarks ]
runs-on: ubuntu-latest
if: ${{ !success() && !cancelled() }}
continue-on-error: true
steps:
- name: Send message
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ needs.cuda-integer-full-multi-gpu-benchmarks.result }}
SLACK_MESSAGE: "Integer GPU full benchmarks finished with status: ${{ needs.cuda-integer-full-multi-gpu-benchmarks.result }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (cuda-integer-full-multi-gpu-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, cuda-integer-full-multi-gpu-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (cuda-integer-full-multi-gpu-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -18,6 +18,9 @@ env:
RUST_MIN_STACK: "8388608"
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
FAST_TESTS: "TRUE"
# We clear the cache to reduce memory pressure caused by the many processes
# spawned by cargo nextest
TFHE_RS_CLEAR_IN_MEMORY_KEY_CACHE: "1"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
@@ -31,12 +34,12 @@ jobs:
timeout-minutes: 720
steps:
- uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
persist-credentials: 'false'
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable

View File

@@ -20,20 +20,72 @@ on:
description: "Push node js package"
type: boolean
default: true
npm_latest_tag:
description: "Set NPM tag as latest"
type: boolean
default: false
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
NPM_TAG: ""
jobs:
publish_release:
name: Publish Release
package:
runs-on: ubuntu-latest
outputs:
hash: ${{ steps.hash.outputs.hash }}
steps:
- name: Checkout
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Prepare package
run: |
cargo package -p tfhe
- uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a # v4.3.6
with:
name: crate
path: target/package/*.crate
- name: Generate hash
id: hash
run: cd target/package && echo "hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
provenance:
if: ${{ !inputs.dry_run }}
needs: [package]
uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.0.0
permissions:
# Needed to detect the GitHub Actions environment
actions: read
# Needed to create the provenance via GitHub OIDC
id-token: write
# Needed to upload assets/artifacts
contents: write
with:
# SHA-256 hashes of the crate package.
base64-subjects: ${{ needs.package.outputs.hash }}
publish_release:
name: Publish Release
needs: [package] # for comparing hashes
runs-on: ubuntu-latest
permissions:
contents: read
id-token: write
steps:
- name: Checkout
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Create NPM version tag
if: ${{ inputs.npm_latest_tag }}
run: |
echo "NPM_TAG=latest" >> "${GITHUB_ENV}"
- name: Download artifact
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
with:
name: crate
path: target/package
- name: Publish crates.io package
if: ${{ inputs.push_to_crates }}
env:
@@ -42,10 +94,26 @@ jobs:
run: |
cargo publish -p tfhe --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Generate hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
- name: Slack notification (hash comparison)
if: ${{ needs.package.outputs.hash != steps.published_hash.outputs.pub_hash }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: failure
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "SLSA tfhe crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
- name: Build web package
if: ${{ inputs.push_web_package }}
run: |
make build_web_js_api
make build_web_js_api_parallel
- name: Publish web package
if: ${{ inputs.push_web_package }}
@@ -54,6 +122,8 @@ jobs:
token: ${{ secrets.NPM_TOKEN }}
package: tfhe/pkg/package.json
dry-run: ${{ inputs.dry_run }}
tag: ${{ env.NPM_TAG }}
provenance: true
- name: Build Node package
if: ${{ inputs.push_node_package }}
@@ -70,6 +140,8 @@ jobs:
token: ${{ secrets.NPM_TOKEN }}
package: tfhe/pkg/package.json
dry-run: ${{ inputs.dry_run }}
tag: ${{ env.NPM_TAG }}
provenance: true
- name: Slack Notification
if: ${{ failure() }}
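
The release pipeline above ties publishing to SLSA provenance: the package job hashes the freshly built .crate, the slsa-github-generator workflow attests that hash, and after cargo publish the publish job recomputes the hash from the downloaded artifact and alerts on any mismatch. Both sides use the same one-liner:

    # Hash format shared by the package and publish jobs:
    # sha256sum of every .crate file, base64-encoded onto a single line.
    cd target/package
    sha256sum ./*.crate | base64 -w0
    # The publish job compares this value against needs.package.outputs.hash;
    # any difference triggers the "hash comparison" Slack alert.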

View File

@@ -18,7 +18,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0

View File

@@ -29,7 +29,7 @@ jobs:
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
@@ -54,7 +54,7 @@ jobs:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
- name: Checkout
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
@@ -63,7 +63,7 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: stable
@@ -112,7 +112,7 @@ jobs:
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@1dced74825027fe3d481392163ed8fc56813fb5d
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}

View File

@@ -18,7 +18,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0

View File

@@ -14,17 +14,17 @@ on:
jobs:
params-curves-security-check:
runs-on: ubuntu-latest
runs-on: large_ubuntu_16
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
- name: Checkout lattice-estimator
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: malb/lattice-estimator
path: lattice_estimator
ref: '53508253629d3b5d31a2ad110e85dc69391ccb95'
ref: 'e80ec6bbbba212428b0e92d0467c18629cf9ed67'
- name: Install Sage
run: |

View File

@@ -1,128 +0,0 @@
# Run shortint benchmarks on an AWS instance and return parsed results to the Slab CI bot.
name: Shortint benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-shortint-benchmarks:
name: Execute shortint benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Run benchmarks with AVX512
run: |
make bench_shortint
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Measure key sizes
run: |
make measure_shortint_key_sizes
- name: Parse key sizes results
run: |
python3 ./ci/benchmark_parser.py tfhe/shortint_key_sizes.csv ${{ env.RESULTS_FILENAME }} \
--key-sizes \
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_shortint
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Shortint benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

View File

@@ -0,0 +1,193 @@
# Run all shortint benchmarks on an AWS instance and return parsed results to the Slab CI bot.
name: Shortint full benchmarks
on:
workflow_dispatch:
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
# Quarterly benchmarks will be triggered right before the end of the quarter, on the 25th of the month at 4 a.m.
# These benchmarks take far longer to execute, hence they are run only four times a year.
- cron: '0 4 25 MAR,JUN,SEP,DEC *'
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
jobs:
prepare-matrix:
name: Prepare operations matrix
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
op_flavor: ${{ steps.set_op_flavor.outputs.op_flavor }}
steps:
- name: Weekly benchmarks
if: github.event_name == 'workflow_dispatch' ||
github.event.schedule == '0 1 * * 6'
run: |
echo "OP_FLAVOR=[\"default\"]" >> "${GITHUB_ENV}"
- name: Quarterly benchmarks
if: github.event.schedule == '0 4 25 MAR,JUN,SEP,DEC *'
run: |
echo "OP_FLAVOR=[\"default\", \"smart\", \"unchecked\"]" >> "${GITHUB_ENV}"
- name: Set operation flavor output
id: set_op_flavor
run: |
echo "op_flavor=${{ toJSON(env.OP_FLAVOR) }}" >> "${GITHUB_OUTPUT}"
setup-instance:
name: Setup instance (shortint-benchmarks)
needs: prepare-matrix
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
shortint-benchmarks:
name: Execute shortint benchmarks for all operation flavors
needs: [ prepare-matrix, setup-instance ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
strategy:
max-parallel: 1
matrix:
op_flavor: ${{ fromJson(needs.prepare-matrix.outputs.op_flavor) }}
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_shortint
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
# This small benchmark needs to be executed only once.
- name: Measure key sizes
if: matrix.op_flavor == 'default'
run: |
make measure_shortint_key_sizes
- name: Parse key sizes results
if: matrix.op_flavor == 'default'
run: |
python3 ./ci/benchmark_parser.py tfhe/shortint_key_sizes.csv ${{ env.RESULTS_FILENAME }} \
--key-sizes \
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_shortint_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Shortint full benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (shortint-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, shortint-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (shortint-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -1,152 +0,0 @@
# Run all shortint benchmarks on an AWS instance and return parsed results to the Slab CI bot.
name: Shortint full benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory since a calling workflow
# could use it. If a triggering command includes a user_inputs field, then the triggered
# workflow must declare this very input, otherwise the workflow won't be called.
# See start_full_benchmarks.yml as an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
shortint-benchmarks:
name: Execute shortint benchmarks for all operation flavors
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
strategy:
max-parallel: 1
matrix:
op_flavor: [ default, smart, unchecked ]
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_shortint
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
# This small benchmark needs to be executed only once.
- name: Measure key sizes
if: matrix.op_flavor == 'default'
run: |
make measure_shortint_key_sizes
- name: Parse key sizes results
if: matrix.op_flavor == 'default'
run: |
python3 ./ci/benchmark_parser.py tfhe/shortint_key_sizes.csv ${{ env.RESULTS_FILENAME }} \
--key-sizes \
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_shortint_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notification:
name: Slack Notification
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ failure() }}
needs: shortint-benchmarks
steps:
- name: Notify
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Shortint full benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

View File

@@ -1,130 +0,0 @@
# Run signed integer benchmarks on an AWS instance and return parsed results to the Slab CI bot.
name: Signed Integer benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute signed integer benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Run benchmarks with AVX512
run: |
make FAST_BENCH=TRUE bench_signed_integer
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Signed integer benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

View File

@@ -0,0 +1,191 @@
# Run all signed integer benchmarks on an AWS instance and return parsed results to the Slab CI bot.
name: Signed Integer full benchmarks
on:
workflow_dispatch:
inputs:
all_precisions:
description: "Run all precisions"
type: boolean
default: false
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
# Quarterly benchmarks will be triggered right before the end of the quarter, on the 25th of the month at 4 a.m.
# These benchmarks take far longer to execute, hence they are run only four times a year.
- cron: '0 4 25 MAR,JUN,SEP,DEC *'
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
FAST_BENCH: TRUE
jobs:
prepare-matrix:
name: Prepare operations matrix
runs-on: ubuntu-latest
if: github.event_name != 'schedule' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs')
outputs:
op_flavor: ${{ steps.set_op_flavor.outputs.op_flavor }}
steps:
- name: Weekly benchmarks
if: github.event_name == 'workflow_dispatch' ||
github.event.schedule == '0 1 * * 6'
run: |
echo "OP_FLAVOR=[\"default\"]" >> "${GITHUB_ENV}"
- name: Quarterly benchmarks
if: github.event.schedule == '0 4 25 MAR,JUN,SEP,DEC *'
run: |
echo "OP_FLAVOR=[\"default\", \"unchecked\"]" >> "${GITHUB_ENV}"
- name: Set operation flavor output
id: set_op_flavor
run: |
echo "op_flavor=${{ toJSON(env.OP_FLAVOR) }}" >> "${GITHUB_OUTPUT}"
setup-instance:
name: Setup instance (signed-integer-benchmarks)
needs: prepare-matrix
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
signed-integer-benchmarks:
name: Execute signed integer benchmarks for all operation flavors
needs: [ prepare-matrix, setup-instance ]
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
max-parallel: 1
matrix:
command: [ integer, integer_multi_bit ]
op_flavor: [ default, unchecked ]
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Should run benchmarks with all precisions
if: inputs.all_precisions
run: |
echo "FAST_BENCH=FALSE" >> "${GITHUB_ENV}"
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_signed_${{ matrix.command }}
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Signed integer full benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (signed-integer-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, signed-integer-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (signed-integer-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

View File

@@ -1,136 +0,0 @@
# Run all signed integer benchmarks on an AWS instance and return parsed results to the Slab CI bot.
name: Signed Integer full benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
integer-benchmarks:
name: Execute signed integer benchmarks for all operation flavors
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
max-parallel: 1
matrix:
command: [ integer, integer_multi_bit ]
op_flavor: [ default, unchecked ]
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Run benchmarks with AVX512
run: |
make BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_signed_${{ matrix.command }}
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notification:
name: Slack Notification
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ failure() }}
needs: integer-benchmarks
steps:
- name: Notify
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Signed integer full benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -1,130 +0,0 @@
# Run signed integer benchmarks with multi-bit cryptographic parameters on an AWS instance and return parsed results to Slab CI bot.
name: Signed Integer Multi-bit benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute signed integer multi-bit benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
with:
toolchain: nightly
- name: Run multi-bit benchmarks with AVX512
run: |
make FAST_BENCH=TRUE bench_signed_integer_multi_bit
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Signed integer benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}


@@ -1,130 +0,0 @@
# Start all benchmark jobs on Slab CI bot.
name: Start all benchmarks
on:
push:
branches:
- "main"
workflow_dispatch:
inputs:
# The input name must be the name of the slab command to launch
boolean_bench:
description: "Run Boolean benches"
type: boolean
default: true
shortint_bench:
description: "Run shortint benches"
type: boolean
default: true
integer_bench:
description: "Run integer benches"
type: boolean
default: true
signed_integer_bench:
description: "Run signed integer benches"
type: boolean
default: true
integer_multi_bit_bench:
description: "Run integer multi bit benches"
type: boolean
default: true
signed_integer_multi_bit_bench:
description: "Run signed integer multi bit benches"
type: boolean
default: true
core_crypto_bench:
description: "Run core crypto benches"
type: boolean
default: true
wasm_client_bench:
description: "Run WASM client benches"
type: boolean
default: true
jobs:
start-benchmarks:
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
strategy:
matrix:
command: [ boolean_bench, shortint_bench,
integer_bench, integer_multi_bit_bench,
signed_integer_bench, signed_integer_multi_bit_bench,
core_crypto_bench, wasm_client_bench ]
runs-on: ubuntu-latest
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@03334d095e2739fa9ac4034ec16f66d5d01e9eba
with:
files_yaml: |
common_benches:
- toolchain.txt
- Makefile
- ci/slab.toml
- tfhe/Cargo.toml
- tfhe/src/core_crypto/**
- .github/workflows/start_benchmarks.yml
boolean_bench:
- tfhe/src/boolean/**
- tfhe/benches/boolean/**
- .github/workflows/boolean_benchmark.yml
shortint_bench:
- tfhe/src/shortint/**
- tfhe/benches/shortint/**
- .github/workflows/shortint_benchmark.yml
integer_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/bench.rs
- .github/workflows/integer_benchmark.yml
integer_multi_bit_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/bench.rs
- .github/workflows/integer_multi_bit_benchmark.yml
signed_integer_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/signed_bench.rs
- .github/workflows/signed_integer_benchmark.yml
signed_integer_multi_bit_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/signed_bench.rs
- .github/workflows/signed_integer_multi_bit_benchmark.yml
core_crypto_bench:
- tfhe/src/core_crypto/**
- tfhe/benches/core_crypto/**
- .github/workflows/core_crypto_benchmark.yml
wasm_client_bench:
- tfhe/web_wasm_parallel_tests/**
- .github/workflows/wasm_client_benchmark.yml
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Start AWS job in Slab
# If manually triggered, check that the current bench has been requested.
# Otherwise, on push, check that files relevant to benchmarks have changed (see the Python sketch after this step).
if: (github.event_name == 'workflow_dispatch' && github.event.inputs[matrix.command] == 'true') || (github.event_name == 'push' && (steps.changed-files.outputs.common_benches_any_changed == 'true' || steps.changed-files.outputs[format('{0}_any_changed', matrix.command)] == 'true'))
shell: bash
run: |
echo -n '{"command": "${{ matrix.command }}", "git_ref": "${{ github.ref }}", "sha": "${{ github.sha }}"}' > command.json
SIGNATURE="$(slab/scripts/hmac_calculator.sh command.json '${{ secrets.JOB_SECRET }}')"
curl -v -k \
--fail-with-body \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: start_aws" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @command.json \
${{ secrets.SLAB_URL }}
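Written out in plain Python, the gating condition guarding this step behaves roughly like the sketch below; the dictionaries stand in for the GitHub expression contexts (`inputs`, `steps.changed-files.outputs`) and are not a real API:

```python
# Sketch of the "Start AWS job in Slab" gating logic; the dict arguments
# are stand-ins for GitHub Actions expression contexts.
def should_start(event_name: str, inputs: dict, changed: dict, command: str) -> bool:
    if event_name == "workflow_dispatch":
        # Manual run: only start the benches the user ticked.
        return inputs.get(command) == "true"
    if event_name == "push":
        # Push to main: start when common files or bench-specific files changed.
        return (
            changed.get("common_benches_any_changed") == "true"
            or changed.get(f"{command}_any_changed") == "true"
        )
    return False

assert should_start("push", {}, {"integer_bench_any_changed": "true"}, "integer_bench")
assert not should_start("workflow_dispatch", {"boolean_bench": "false"}, {}, "boolean_bench")
```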


@@ -1,66 +0,0 @@
# Start all benchmark jobs, including full shortint and integer, on Slab CI bot.
name: Start full suite benchmarks
on:
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
# Quarterly benchmarks are triggered right before the end of each quarter, on the 25th of the month at 4 a.m.
# These benchmarks take far longer to execute, hence they only run four times a year.
- cron: '0 4 25 MAR,JUN,SEP,DEC *'
workflow_dispatch:
inputs:
benchmark_type:
description: 'Benchmark type'
required: true
default: 'weekly'
type: choice
options:
- weekly
- quarterly
jobs:
start-benchmarks:
if: ${{ (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
strategy:
matrix:
command: [ boolean_bench, shortint_full_bench,
integer_full_bench, signed_integer_full_bench,
core_crypto_bench, wasm_client_bench ]
runs-on: ubuntu-latest
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
fetch-depth: 0
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Set benchmarks type as weekly
if: (github.event_name == 'workflow_dispatch' && inputs.benchmark_type == 'weekly') || github.event.schedule == '0 1 * * 6'
run: |
echo "BENCH_TYPE=weekly_benchmarks" >> "${GITHUB_ENV}"
- name: Set benchmarks type as quarterly
if: (github.event_name == 'workflow_dispatch' && inputs.benchmark_type == 'quarterly') || github.event.schedule == '0 4 25 MAR,JUN,SEP,DEC *'
run: |
echo "BENCH_TYPE=quarterly_benchmarks" >> "${GITHUB_ENV}"
- name: Start AWS job in Slab
shell: bash
run: |
echo -n '{"command": "${{ matrix.command }}", "git_ref": "${{ github.ref }}", "sha": "${{ github.sha }}", "user_inputs": "${{ env.BENCH_TYPE }}"}' > command.json
SIGNATURE="$(slab/scripts/hmac_calculator.sh command.json '${{ secrets.JOB_SECRET }}')"
curl -v -k \
--fail-with-body \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: start_aws" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @command.json \
${{ secrets.SLAB_URL }}


@@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: git-sync


@@ -1,32 +1,14 @@
# Run WASM client benchmarks on an AWS instance and return parsed results to Slab CI bot.
# Run WASM client benchmarks on an instance and return parsed results to Slab CI bot.
name: WASM client benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory, since a calling workflow
# could use it. If a triggering command includes a user_inputs field, then the triggered
# workflow must declare this same input, otherwise the workflow won't be called.
# See start_full_benchmarks.yml as an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
push:
branches:
- main
schedule:
# Weekly benchmarks will be triggered each Saturday at 1 a.m.
- cron: '0 1 * * 6'
env:
CARGO_TERM_COLOR: always
@@ -34,56 +16,106 @@ env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
jobs:
run-wasm-client-benchmarks:
name: Execute WASM client benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
should-run:
runs-on: ubuntu-latest
if: github.event_name == 'workflow_dispatch' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs')
permissions:
pull-requests: write
outputs:
wasm_bench: ${{ steps.changed-files.outputs.wasm_bench_any_changed }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
wasm_bench:
- tfhe/Cargo.toml
- concrete-csprng/**
- tfhe-zk-pok/**
- tfhe/src/**
- '!tfhe/src/c_api/**'
- tfhe/web_wasm_parallel_tests/**
- .github/workflows/wasm_client_benchmark.yml
setup-instance:
name: Setup instance (wasm-client-benchmarks)
if: github.event_name == 'workflow_dispatch' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs' && needs.should-run.outputs.wasm_bench)
needs: should-run
runs-on: ubuntu-latest
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: cpu-small
wasm-client-benchmarks:
name: Execute WASM client benchmarks
needs: setup-instance
if: needs.setup-instance.result != 'skipped'
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@d8352f6b1d2e870bc5716e7a6d9b65c4cc244a1a
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Run benchmarks
run: |
make install_node
make ci_bench_web_js_api_parallel
make bench_web_js_api_parallel_ci
- name: Parse results
run: |
make parse_wasm_benchmarks
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py tfhe/wasm_pk_gen.csv ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--hardware "m6i.4xlarge" \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--key-gen
@@ -98,13 +130,13 @@ jobs:
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_wasm
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
@@ -130,8 +162,28 @@ jobs:
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "WASM benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
teardown-instance:
name: Teardown instance (wasm-client-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, wasm-client-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (wasm-client-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

.github/workflows/zk_pke_benchmark.yml (new file)

@@ -0,0 +1,197 @@
# Run PKE Zero-Knowledge benchmarks on an instance and return parsed results to Slab CI bot.
name: PKE ZK benchmarks
on:
workflow_dispatch:
push:
branches:
- main
schedule:
# Weekly benchmarks will be triggered each Saturday at 3 a.m.
- cron: '0 3 * * 6'
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
jobs:
should-run:
runs-on: ubuntu-latest
if: github.event_name == 'workflow_dispatch' ||
((github.event_name == 'push' || github.event_name == 'schedule') && github.repository == 'zama-ai/tfhe-rs')
outputs:
zk_pok_changed: ${{ steps.changed-files.outputs.zk_pok_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@c65cd883420fd2eb864698a825fc4162dd94482c
with:
since_last_remote_commit: true
files_yaml: |
zk_pok:
- tfhe/Cargo.toml
- concrete-csprng/**
- tfhe-zk-pok/**
- tfhe/src/core_crypto/**
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/src/zk.rs
- tfhe/benches/integer/zk_pke.rs
- .github/workflows/zk_pke_benchmark.yml
setup-instance:
name: Setup instance (pke-zk-benchmarks)
runs-on: ubuntu-latest
needs: should-run
if: github.event_name == 'workflow_dispatch' ||
(github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') ||
(github.event_name == 'push' &&
github.repository == 'zama-ai/tfhe-rs' &&
needs.should-run.outputs.zk_pok_changed == 'true')
outputs:
runner-name: ${{ steps.start-instance.outputs.label }}
steps:
- name: Start instance
id: start-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: start
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
backend: aws
profile: bench
pke-zk-benchmarks:
name: Execute PKE ZK benchmarks
if: needs.setup-instance.result != 'skipped'
needs: setup-instance
concurrency:
group: ${{ github.workflow }}_${{github.event_name}}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
runs-on: ${{ needs.setup-instance.outputs.runner-name }}
steps:
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
fetch-depth: 0
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step require root user to have a HOME directory which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@7b1c307e0dcbda6122208f10795a713336a9b35a
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Run benchmarks with AVX512
run: |
make bench_integer_zk
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware "hpc7a.96xlarge" \
--backend cpu \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Parse CRS sizes results
run: |
python3 ./ci/benchmark_parser.py tfhe/pke_zk_crs_sizes.csv ${{ env.RESULTS_FILENAME }} \
--key-sizes \
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@834a144ee995460fba8ed112a2fc961b36a5ec5a
with:
name: ${{ github.sha }}_integer_zk
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.FHE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ !success() && !cancelled() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "PKE ZK benchmarks finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
teardown-instance:
name: Teardown instance (pke-zk-benchmarks)
if: ${{ always() && needs.setup-instance.result != 'skipped' }}
needs: [ setup-instance, pke-zk-benchmarks ]
runs-on: ubuntu-latest
steps:
- name: Stop instance
id: stop-instance
uses: zama-ai/slab-github-runner@447a2d0fd2d1a9d647aa0d0723a6e9255372f261
with:
mode: stop
github-token: ${{ secrets.SLAB_ACTION_TOKEN }}
slab-url: ${{ secrets.SLAB_BASE_URL }}
job-secret: ${{ secrets.JOB_SECRET }}
label: ${{ needs.setup-instance.outputs.runner-name }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@4e5fb42d249be6a45a298f3c9543b111b02f7907
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "Instance teardown (pke-zk-benchmarks) finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"

.gitignore

@@ -7,6 +7,7 @@ target/
# In case of symlinked keys
/keys
**/*.rmeta
**/Cargo.lock
**/*.bin
@@ -22,3 +23,9 @@ dieharder_run.log
# Cuda local build
backends/tfhe-cuda-backend/cuda/cmake-build-debug/
# WASM tests
tfhe/web_wasm_parallel_tests/server.PID
# Dir used for backward compatibility test data
tfhe/tfhe-backward-compat-data/


@@ -7,6 +7,14 @@ members = [
"apps/trivium",
"concrete-csprng",
"backends/tfhe-cuda-backend",
"utils/tfhe-versionable",
"utils/tfhe-versionable-derive",
]
exclude = [
"tfhe/backward_compatibility_tests",
"utils/cargo-tfhe-lints-inner",
"utils/cargo-tfhe-lints"
]
[profile.bench]

Makefile

@@ -16,19 +16,15 @@ GEN_KEY_CACHE_COVERAGE_ONLY?=FALSE
PARSE_INTEGER_BENCH_CSV_FILE?=tfhe_rs_integer_benches.csv
FAST_TESTS?=FALSE
FAST_BENCH?=FALSE
NIGHTLY_TESTS?=FALSE
BENCH_OP_FLAVOR?=DEFAULT
NODE_VERSION=20
NODE_VERSION=22.4
FORWARD_COMPAT?=OFF
# sed: -n, do not print the input stream; -e introduces a script/expression
# 1,/version/ means from the first line up to the line matching "version" at the start of a line
# p prints, so we keep only the start of Cargo.toml until we hit the first version
# entry, which should be the version of tfhe
TFHE_CURRENT_VERSION:=\
$(shell sed -n -e '1,/^version/p' tfhe/Cargo.toml | \
grep '^version[[:space:]]*=' | cut -d '=' -f 2 | xargs)
# Cargo has a hard time distinguishing between our package from the workspace and a package that
# could be a dependency, so we build an unambiguous spec here
TFHE_SPEC:=tfhe@$(TFHE_CURRENT_VERSION)
BACKWARD_COMPAT_DATA_URL=https://github.com/zama-ai/tfhe-backward-compat-data.git
BACKWARD_COMPAT_DATA_BRANCH?=v0.1
BACKWARD_COMPAT_DATA_PROJECT=tfhe-backward-compat-data
BACKWARD_COMPAT_DATA_DIR=$(BACKWARD_COMPAT_DATA_PROJECT)
TFHE_SPEC:=tfhe
# This is done to avoid forgetting it; we still specify RUSTFLAGS in the commands so that
# a command can be copy-pasted into the terminal and changed if required without forgetting the flags
export RUSTFLAGS?=-C target-cpu=native
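The removed sed pipeline above (and its comment) describe a small extraction algorithm: keep Cargo.toml up to and including the first line starting with `version`, then pull the quoted value. A Python rendering of the same logic, purely illustrative:

```python
# Python rendering of the Makefile's sed/grep/cut version extraction.
def tfhe_version(cargo_toml: str) -> str:
    kept = []
    for line in cargo_toml.splitlines():
        kept.append(line)
        if line.startswith("version"):
            break  # stop at the first version entry, i.e. tfhe's own
    for line in kept:
        if line.startswith("version"):
            return line.split("=", 1)[1].strip().strip('"')
    raise ValueError("no version entry found")

assert tfhe_version('[package]\nname = "tfhe"\nversion = "0.7.0"') == "0.7.0"
```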
@@ -115,7 +111,7 @@ install_cargo_nextest: install_rs_build_toolchain
.PHONY: install_wasm_pack # Install wasm-pack to build JS packages
install_wasm_pack: install_rs_build_toolchain
@wasm-pack --version > /dev/null 2>&1 || \
cargo $(CARGO_RS_BUILD_TOOLCHAIN) install wasm-pack || \
cargo $(CARGO_RS_BUILD_TOOLCHAIN) install --locked wasm-pack@0.13.0 || \
( echo "Unable to install cargo wasm-pack, unknown error." && exit 1 )
.PHONY: install_node # Install the latest version of NodeJS via nvm
@@ -145,6 +141,11 @@ install_tarpaulin: install_rs_build_toolchain
cargo $(CARGO_RS_BUILD_TOOLCHAIN) install cargo-tarpaulin --locked || \
( echo "Unable to install cargo tarpaulin, unknown error." && exit 1 )
.PHONY: install_tfhe_lints # Install custom tfhe-rs lints
install_tfhe_lints:
(cd utils/cargo-tfhe-lints-inner && cargo install --path .) && \
cd utils/cargo-tfhe-lints && cargo install --path .
.PHONY: check_linelint_installed # Check if linelint newline linter is installed
check_linelint_installed:
@printf "\n" | linelint - > /dev/null 2>&1 || \
@@ -264,6 +265,17 @@ clippy: install_rs_check_toolchain
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_rustdoc # Run clippy lints on doctests enabling the boolean, shortint, integer and zk-pok
clippy_rustdoc: install_rs_check_toolchain
if [[ "$(OS)" != "Linux" && "$(OS)" != "Darwin" ]]; then \
echo "WARNING: skipped clippy_rustdoc, unsupported OS $(OS)"; \
exit 0; \
fi && \
CLIPPYFLAGS="-D warnings" RUSTDOCFLAGS="--no-run --nocapture --test-builder ./scripts/clippy_driver.sh -Z unstable-options" \
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" test --doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,zk-pok,pbs-stats \
-p $(TFHE_SPEC)
.PHONY: clippy_c_api # Run clippy lints enabling the boolean, shortint and the C API
clippy_c_api: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
@@ -273,7 +285,7 @@ clippy_c_api: install_rs_check_toolchain
.PHONY: clippy_js_wasm_api # Run clippy lints enabling the boolean, shortint, integer and the js wasm API
clippy_js_wasm_api: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api \
--features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,high-level-client-js-wasm-api \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_tasks # Run clippy lints on helper tasks crate.
@@ -289,7 +301,7 @@ clippy_trivium: install_rs_check_toolchain
.PHONY: clippy_all_targets # Run clippy lints on all targets (benches, examples, etc.)
clippy_all_targets: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,zk-pok-experimental \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,zk-pok \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_concrete_csprng # Run clippy lints on concrete-csprng
@@ -304,18 +316,23 @@ clippy_zk_pok: install_rs_check_toolchain
-p tfhe-zk-pok -- --no-deps -D warnings
.PHONY: clippy_all # Run all clippy targets
clippy_all: clippy clippy_boolean clippy_shortint clippy_integer clippy_all_targets clippy_c_api \
clippy_js_wasm_api clippy_tasks clippy_core clippy_concrete_csprng clippy_zk_pok clippy_trivium
clippy_all: clippy_rustdoc clippy clippy_boolean clippy_shortint clippy_integer clippy_all_targets \
clippy_c_api clippy_js_wasm_api clippy_tasks clippy_core clippy_concrete_csprng clippy_zk_pok clippy_trivium
.PHONY: clippy_fast # Run main clippy targets
clippy_fast: clippy clippy_all_targets clippy_c_api clippy_js_wasm_api clippy_tasks clippy_core \
clippy_concrete_csprng
clippy_fast: clippy_rustdoc clippy clippy_all_targets clippy_c_api clippy_js_wasm_api clippy_tasks \
clippy_core clippy_concrete_csprng
.PHONY: clippy_cuda_backend # Run clippy lints on the tfhe-cuda-backend
clippy_cuda_backend: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
-p tfhe-cuda-backend -- --no-deps -D warnings
.PHONY: tfhe_lints # Run custom tfhe-rs lints
tfhe_lints: install_tfhe_lints
cd tfhe && RUSTFLAGS="$(RUSTFLAGS)" cargo tfhe-lints \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer -- -D warnings
.PHONY: build_core # Build core_crypto without experimental features
build_core: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
@@ -368,21 +385,21 @@ symlink_c_libs_without_fingerprint:
.PHONY: build_c_api # Build the C API for boolean, shortint and integer
build_c_api: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,zk-pok-experimental,$(FORWARD_COMPAT_FEATURE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,zk-pok,$(FORWARD_COMPAT_FEATURE) \
-p $(TFHE_SPEC)
@"$(MAKE)" symlink_c_libs_without_fingerprint
.PHONY: build_c_api_gpu # Build the C API for boolean, shortint and integer
build_c_api_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,zk-pok-experimental,gpu \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,zk-pok,gpu \
-p $(TFHE_SPEC)
@"$(MAKE)" symlink_c_libs_without_fingerprint
.PHONY: build_c_api_experimental_deterministic_fft # Build the C API for boolean, shortint and integer with experimental deterministic FFT
build_c_api_experimental_deterministic_fft: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,zk-pok-experimental,experimental-force_fft_algo_dif4,$(FORWARD_COMPAT_FEATURE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,zk-pok,experimental-force_fft_algo_dif4,$(FORWARD_COMPAT_FEATURE) \
-p $(TFHE_SPEC)
@"$(MAKE)" symlink_c_libs_without_fingerprint
@@ -391,7 +408,7 @@ build_web_js_api: install_rs_build_toolchain install_wasm_pack
cd tfhe && \
RUSTFLAGS="$(WASM_RUSTFLAGS)" rustup run "$(RS_BUILD_TOOLCHAIN)" \
wasm-pack build --release --target=web \
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,zk-pok-experimental
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,zk-pok
.PHONY: build_web_js_api_parallel # Build the js API targeting the web browser with parallelism support
build_web_js_api_parallel: install_rs_check_toolchain install_wasm_pack
@@ -399,15 +416,16 @@ build_web_js_api_parallel: install_rs_check_toolchain install_wasm_pack
rustup component add rust-src --toolchain $(RS_CHECK_TOOLCHAIN) && \
RUSTFLAGS="$(WASM_RUSTFLAGS) -C target-feature=+atomics,+bulk-memory,+mutable-globals" rustup run $(RS_CHECK_TOOLCHAIN) \
wasm-pack build --release --target=web \
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,parallel-wasm-api,zk-pok-experimental \
-Z build-std=panic_abort,std
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,parallel-wasm-api,zk-pok \
-Z build-std=panic_abort,std && \
find pkg/snippets -type f -iname workerHelpers.worker.js -exec sed -i "s|from '..\/..\/..\/';|from '..\/..\/..\/tfhe.js';|" {} \;
.PHONY: build_node_js_api # Build the js API targeting nodejs
build_node_js_api: install_rs_build_toolchain install_wasm_pack
cd tfhe && \
RUSTFLAGS="$(WASM_RUSTFLAGS)" rustup run "$(RS_BUILD_TOOLCHAIN)" \
wasm-pack build --release --target=nodejs \
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,zk-pok-experimental
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api,zk-pok
.PHONY: build_concrete_csprng # Build concrete_csprng
build_concrete_csprng: install_rs_build_toolchain
@@ -417,10 +435,10 @@ build_concrete_csprng: install_rs_build_toolchain
.PHONY: test_core_crypto # Run the tests of the core_crypto module including experimental ones
test_core_crypto: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental,zk-pok-experimental -p $(TFHE_SPEC) -- core_crypto::
--features=$(TARGET_ARCH_FEATURE),experimental,zk-pok -p $(TFHE_SPEC) -- core_crypto::
@if [[ "$(AVX512_SUPPORT)" == "ON" ]]; then \
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental,zk-pok-experimental,$(AVX512_FEATURE) -p $(TFHE_SPEC) -- core_crypto::; \
--features=$(TARGET_ARCH_FEATURE),experimental,zk-pok,$(AVX512_FEATURE) -p $(TFHE_SPEC) -- core_crypto::; \
fi
.PHONY: test_core_crypto_cov # Run the tests of the core_crypto module with code coverage
@@ -443,8 +461,8 @@ test_cuda_backend:
mkdir -p "$(TFHECUDA_BUILD)" && \
cd "$(TFHECUDA_BUILD)" && \
cmake .. -DCMAKE_BUILD_TYPE=Release -DTFHE_CUDA_BACKEND_BUILD_TESTS=ON && \
make -j "$(CPU_COUNT)" && \
make test
"$(MAKE)" -j "$(CPU_COUNT)" && \
"$(MAKE)" test
.PHONY: test_gpu # Run the tests of the core_crypto module including experimental on the gpu backend
test_gpu: test_core_crypto_gpu test_integer_gpu test_cuda_backend
@@ -459,10 +477,64 @@ test_core_crypto_gpu: install_rs_build_toolchain
.PHONY: test_integer_gpu # Run the tests of the integer module including experimental on the gpu backend
test_integer_gpu: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key::
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key:: --test-threads=6
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --doc --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key::
.PHONY: test_integer_gpu_ci # Run the tests for integer ci on gpu backend
test_integer_gpu_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --backend "gpu" \
--tfhe-package "$(TFHE_SPEC)"
.PHONY: test_unsigned_integer_gpu_ci # Run the tests for unsigned integer ci on gpu backend
test_unsigned_integer_gpu_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --backend "gpu" \
--unsigned-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_signed_integer_gpu_ci # Run the tests for signed integer ci on gpu backend
test_signed_integer_gpu_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --backend "gpu" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_integer_multi_bit_gpu_ci # Run the tests for integer ci on gpu backend running only multibit tests
test_integer_multi_bit_gpu_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --backend "gpu" \
--tfhe-package "$(TFHE_SPEC)"
.PHONY: test_unsigned_integer_multi_bit_gpu_ci # Run the tests for unsigned integer ci on gpu backend running only multibit tests
test_unsigned_integer_multi_bit_gpu_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --backend "gpu" \
--unsigned-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_signed_integer_multi_bit_gpu_ci # Run the tests for signed integer ci on gpu backend running only multibit tests
test_signed_integer_multi_bit_gpu_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --backend "gpu" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_boolean # Run the tests of the boolean module
test_boolean: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
@@ -525,6 +597,7 @@ test_shortint_cov: install_rs_check_toolchain install_tarpaulin
test_integer_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --avx512-support "$(AVX512_SUPPORT)" \
--tfhe-package "$(TFHE_SPEC)"
@@ -533,6 +606,7 @@ test_integer_ci: install_rs_check_toolchain install_cargo_nextest
test_unsigned_integer_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --avx512-support "$(AVX512_SUPPORT)" \
--unsigned-only --tfhe-package "$(TFHE_SPEC)"
@@ -541,6 +615,7 @@ test_unsigned_integer_ci: install_rs_check_toolchain install_cargo_nextest
test_signed_integer_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --avx512-support "$(AVX512_SUPPORT)" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
@@ -549,22 +624,25 @@ test_signed_integer_ci: install_rs_check_toolchain install_cargo_nextest
test_integer_multi_bit_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --avx512-support "$(AVX512_SUPPORT)" \
--tfhe-package "$(TFHE_SPEC)"
.PHONY: test_unsigned_integer_multi_bit_ci # Run the tests for nsigned integer ci running only multibit tests
.PHONY: test_unsigned_integer_multi_bit_ci # Run the tests for unsigned integer ci running only multibit tests
test_unsigned_integer_multi_bit_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --avx512-support "$(AVX512_SUPPORT)" \
--unsigned-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_signed_integer_multi_bit_ci # Run the tests for nsigned integer ci running only multibit tests
.PHONY: test_signed_integer_multi_bit_ci # Run the tests for signed integer ci running only multibit tests
test_signed_integer_multi_bit_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
NIGHTLY_TESTS="$(NIGHTLY_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --avx512-support "$(AVX512_SUPPORT)" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
@@ -591,7 +669,7 @@ test_integer_cov: install_rs_check_toolchain install_tarpaulin
.PHONY: test_high_level_api # Run all the tests for high_level_api
test_high_level_api: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,zk-pok-experimental -p $(TFHE_SPEC) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,zk-pok -p $(TFHE_SPEC) \
-- high_level_api::
test_high_level_api_gpu: install_rs_build_toolchain install_cargo_nextest
@@ -602,14 +680,14 @@ test_high_level_api_gpu: install_rs_build_toolchain install_cargo_nextest
.PHONY: test_user_doc # Run tests from the .md documentation
test_user_doc: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) --doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,pbs-stats,zk-pok-experimental \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,pbs-stats,zk-pok \
-p $(TFHE_SPEC) \
-- test_user_docs::
.PHONY: test_user_doc_gpu # Run tests for GPU from the .md documentation
test_user_doc_gpu: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) --doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,gpu,zk-pok-experimental -p $(TFHE_SPEC) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,gpu,zk-pok -p $(TFHE_SPEC) \
-- test_user_docs::
.PHONY: test_fhe_strings # Run tests for fhe_strings example
@@ -648,18 +726,38 @@ test_concrete_csprng: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE) -p concrete-csprng
.PHONY: test_zk_pok # Run tfhe-zk-pok-experimental tests
.PHONY: test_zk_pok # Run tfhe-zk-pok tests
test_zk_pok: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
-p tfhe-zk-pok
.PHONY: test_versionable # Run tests for tfhe-versionable subcrate
test_versionable: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
-p tfhe-versionable
# The backward compat data repo holds historical binary data, as well as the Rust code to generate and load it.
# Here we use the "patch" functionality of Cargo to make sure the repo used for the data is the same as the one used for the code.
.PHONY: test_backward_compatibility_ci
test_backward_compatibility_ci: install_rs_build_toolchain
TFHE_BACKWARD_COMPAT_DATA_DIR="$(BACKWARD_COMPAT_DATA_DIR)" RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--config "patch.'$(BACKWARD_COMPAT_DATA_URL)'.$(BACKWARD_COMPAT_DATA_PROJECT).path=\"tfhe/$(BACKWARD_COMPAT_DATA_DIR)\"" \
--features=$(TARGET_ARCH_FEATURE),shortint,integer -p $(TFHE_SPEC) test_backward_compatibility -- --nocapture
.PHONY: test_backward_compatibility # Same as test_backward_compatibility_ci but tries to clone the data repo first if needed
test_backward_compatibility: tfhe/$(BACKWARD_COMPAT_DATA_DIR) test_backward_compatibility_ci
.PHONY: backward_compat_branch # Prints the required backward compatibility branch
backward_compat_branch:
@echo "$(BACKWARD_COMPAT_DATA_BRANCH)"
.PHONY: doc # Build rust doc
doc: install_rs_check_toolchain
@# Even though we are not on docs.rs, this allows us to "just" build the doc
DOCS_RS=1 \
RUSTDOCFLAGS="--html-in-header katex-header.html" \
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,gpu,internal-keycache,experimental --no-deps -p $(TFHE_SPEC)
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,gpu,internal-keycache,experimental,zk-pok --no-deps -p $(TFHE_SPEC)
.PHONY: docs # Build rust doc alias for doc
docs: doc
@@ -670,7 +768,7 @@ lint_doc: install_rs_check_toolchain
DOCS_RS=1 \
RUSTDOCFLAGS="--html-in-header katex-header.html -Dwarnings" \
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,gpu,internal-keycache,experimental -p $(TFHE_SPEC) --no-deps
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,gpu,internal-keycache,experimental,zk-pok -p $(TFHE_SPEC) --no-deps
.PHONY: lint_docs # Build rust doc with linting enabled alias for lint_doc
lint_docs: lint_doc
@@ -715,7 +813,7 @@ check_compile_tests_benches_gpu: install_rs_build_toolchain
mkdir -p "$(TFHECUDA_BUILD)" && \
cd "$(TFHECUDA_BUILD)" && \
cmake .. -DCMAKE_BUILD_TYPE=Debug -DTFHE_CUDA_BACKEND_BUILD_TESTS=ON -DTFHE_CUDA_BACKEND_BUILD_BENCHMARKS=ON && \
make -j "$(CPU_COUNT)"
"$(MAKE)" -j "$(CPU_COUNT)"
.PHONY: build_nodejs_test_docker # Build a docker image with tools to run nodejs tests for wasm API
build_nodejs_test_docker:
@@ -735,14 +833,14 @@ test_nodejs_wasm_api_in_docker: build_nodejs_test_docker
.PHONY: test_nodejs_wasm_api # Run tests for the nodejs on wasm API
test_nodejs_wasm_api: build_node_js_api
cd tfhe && node --test js_on_wasm_tests
cd tfhe/js_on_wasm_tests && npm run test
.PHONY: test_web_js_api_parallel # Run tests for the web wasm api
test_web_js_api_parallel: build_web_js_api_parallel
$(MAKE) -C tfhe/web_wasm_parallel_tests test
.PHONY: ci_test_web_js_api_parallel # Run tests for the web wasm api
ci_test_web_js_api_parallel: build_web_js_api_parallel
.PHONY: test_web_js_api_parallel_ci # Run tests for the web wasm api
test_web_js_api_parallel_ci: build_web_js_api_parallel
source ~/.nvm/nvm.sh && \
nvm install $(NODE_VERSION) && \
nvm use $(NODE_VERSION) && \
@@ -809,6 +907,22 @@ bench_integer_multi_bit_gpu: install_rs_check_toolchain
--bench integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC) --
.PHONY: bench_unsigned_integer_multi_bit_gpu # Run benchmarks for unsigned integer on GPU backend using multi-bit parameters
bench_unsigned_integer_multi_bit_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=MULTI_BIT \
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC) -- ::unsigned
.PHONY: bench_integer_zk # Run benchmarks for integer encryption with ZK proofs
bench_integer_zk: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench zk-pke-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,zk-pok,nightly-avx512 \
-p $(TFHE_SPEC) --
.PHONY: bench_shortint # Run benchmarks for shortint
bench_shortint: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) \
@@ -816,16 +930,12 @@ bench_shortint: install_rs_check_toolchain
--bench shortint-bench \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_oprf # Run benchmarks for shortint
bench_oprf: install_rs_check_toolchain
.PHONY: bench_shortint_oprf # Run benchmarks for shortint
bench_shortint_oprf: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench oprf-shortint-bench \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench oprf-integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_shortint_multi_bit # Run benchmarks for shortint using multi-bit parameters
bench_shortint_multi_bit: install_rs_check_toolchain
@@ -847,9 +957,15 @@ bench_pbs: install_rs_check_toolchain
--bench pbs-bench \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_pbs128 # Run benchmarks for PBS using FFT 128 bits
bench_pbs128: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs128-bench \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
.PHONY: bench_pbs_gpu # Run benchmarks for PBS on GPU backend
bench_pbs_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_FAST_BENCH=$(FAST_BENCH) cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs-bench \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,gpu,internal-keycache,nightly-avx512 -p $(TFHE_SPEC)
@@ -869,10 +985,10 @@ bench_ks_gpu: install_rs_check_toolchain
bench_web_js_api_parallel: build_web_js_api_parallel
$(MAKE) -C tfhe/web_wasm_parallel_tests bench
.PHONY: ci_bench_web_js_api_parallel # Run benchmarks for the web wasm api
ci_bench_web_js_api_parallel: build_web_js_api_parallel
.PHONY: bench_web_js_api_parallel_ci # Run benchmarks for the web wasm api
bench_web_js_api_parallel_ci: build_web_js_api_parallel
source ~/.nvm/nvm.sh && \
nvm use node && \
nvm use $(NODE_VERSION) && \
$(MAKE) -C tfhe/web_wasm_parallel_tests bench-ci
#
@@ -929,6 +1045,12 @@ write_params_to_file: install_rs_check_toolchain
--example write_params_to_file \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,internal-keycache
.PHONY: clone_backward_compat_data # Clone the data repo needed for backward compatibility tests
clone_backward_compat_data:
./scripts/clone_backward_compat_data.sh $(BACKWARD_COMPAT_DATA_URL) $(BACKWARD_COMPAT_DATA_BRANCH) tfhe/$(BACKWARD_COMPAT_DATA_DIR)
tfhe/$(BACKWARD_COMPAT_DATA_DIR): clone_backward_compat_data
#
# Real use case examples
#
@@ -955,7 +1077,7 @@ sha256_bool: install_rs_check_toolchain
.PHONY: pcc # pcc stands for pre commit checks (except GPU)
pcc: no_tfhe_typo no_dbg_log check_fmt lint_doc check_md_docs_are_tested check_intra_md_links \
clippy_all check_compile_tests
clippy_all tfhe_lints check_compile_tests
.PHONY: pcc_gpu # pcc stands for pre commit checks for GPU compilation
pcc_gpu: clippy_gpu clippy_cuda_backend check_compile_tests_benches_gpu


@@ -50,7 +50,7 @@ production-ready library for all the advanced features of TFHE.
<br></br>
## Table of Contents
- **[Getting Started](#getting-started)**
- **[Getting started](#getting-started)**
- [Cargo.toml configuration](#cargotoml-configuration)
- [A simple example](#a-simple-example)
- **[Resources](#resources)**
@@ -65,7 +65,7 @@ production-ready library for all the advanced features of TFHE.
- **[Support](#support)**
<br></br>
## Getting Started
## Getting started
### Cargo.toml configuration
To use the latest version of `TFHE-rs` in your project, you first need to add it as a dependency in your `Cargo.toml`:
@@ -198,7 +198,7 @@ Full, comprehensive documentation is available here: [https://docs.zama.ai/tfhe-
### Disclaimers
#### Security Estimation
#### Security estimation
Security estimations are done using the
[Lattice Estimator](https://github.com/malb/lattice-estimator)
@@ -206,13 +206,13 @@ with `red_cost_model = reduction.RC.BDGL16`.
When a new update is published in the Lattice Estimator, we update parameters accordingly.
### Security Model
### Security model
The default parameters for the TFHE-rs library are chosen considering the IND-CPA security model, and are selected with a bootstrapping failure probability fixed at p_error = $2^{-40}$. In particular, it is assumed that the results of decrypted computations are not shared by the secret key owner with any third parties, as such an action can lead to leakage of the secret encryption key. If you are designing an application where decryptions must be shared, you will need to craft custom encryption parameters which are chosen in consideration of the IND-CPA^D security model [1].
The default parameters for the TFHE-rs library are chosen considering the IND-CPA security model, and are selected with a bootstrapping failure probability fixed at p_error = $2^{-64}$. In particular, it is assumed that the results of decrypted computations are not shared by the secret key owner with any third parties, as such an action can lead to leakage of the secret encryption key. If you are designing an application where decryptions must be shared, you will need to craft custom encryption parameters which are chosen in consideration of the IND-CPA^D security model [1].
[1] Li, Baiyu, et al. "Securing approximate homomorphic encryption using differential privacy." Annual International Cryptology Conference. Cham: Springer Nature Switzerland, 2022. https://eprint.iacr.org/2022/816.pdf
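By a union bound, an application performing, for example, $2^{20}$ bootstrapping operations with these parameters fails with probability at most $2^{20} \cdot 2^{-64} = 2^{-44}$; this is how the per-bootstrap p_error translates into a whole-computation guarantee.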
#### Side-Channel Attacks
#### Side-channel attacks
Mitigation for side-channel attacks has not yet been implemented in TFHE-rs,
and will be released in upcoming versions.
@@ -241,7 +241,23 @@ Becoming an approved contributor involves signing our Contributor License Agreem
<br></br>
### License
This software is distributed under the **BSD-3-Clause-Clear** license. If you have any questions, please contact us at hello@zama.ai.
This software is distributed under the **BSD-3-Clause-Clear** license. Read [this](LICENSE) for more details.
#### FAQ
**Is Zama's technology free to use?**
>Zama's libraries are free to use under the BSD 3-Clause Clear license only for development, research, prototyping, and experimentation purposes. However, for any commercial use of Zama's open source code, companies must purchase Zama's commercial patent license.
>
>Everything we do is open source, and we are very transparent about what that means for our users; you can read more about how we monetize our open source products at Zama in [this blogpost](https://www.zama.ai/post/open-source).
**What do I need to do if I want to use Zama's technology for commercial purposes?**
>To use Zama's technology commercially, you need to be granted Zama's patent license. Please contact us at hello@zama.ai for more information.
**Do you file IP on your technology?**
>Yes, all Zama's technologies are patented.
**Can you customize a solution for my specific use case?**
>We are open to collaborating and advancing the FHE space with our partners. If you have specific needs, please email us at hello@zama.ai.
<p align="right">
<a href="#about" > ↑ Back to top </a>
</p>

View File

@@ -4,9 +4,8 @@ use tfhe::{generate_keys, ConfigBuilder, FheUint64, FheUint8};
use tfhe_trivium::{KreyviumStreamByte, TransCiphering};
pub fn kreyvium_byte_gen(c: &mut Criterion) {
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -33,9 +32,8 @@ pub fn kreyvium_byte_gen(c: &mut Criterion) {
}
pub fn kreyvium_byte_trans(c: &mut Criterion) {
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -63,9 +61,8 @@ pub fn kreyvium_byte_trans(c: &mut Criterion) {
}
pub fn kreyvium_byte_warmup(c: &mut Criterion) {
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();

View File

@@ -13,7 +13,7 @@ pub fn kreyvium_shortint_warmup(c: &mut Criterion) {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);
@@ -63,7 +63,7 @@ pub fn kreyvium_shortint_gen(c: &mut Criterion) {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);
@@ -108,7 +108,7 @@ pub fn kreyvium_shortint_trans(c: &mut Criterion) {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);

View File

@@ -13,7 +13,7 @@ pub fn trivium_shortint_warmup(c: &mut Criterion) {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);
@@ -63,7 +63,7 @@ pub fn trivium_shortint_gen(c: &mut Criterion) {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);
@@ -108,7 +108,7 @@ pub fn trivium_shortint_trans(c: &mut Criterion) {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);

View File

@@ -119,7 +119,7 @@ impl KreyviumStreamByte<FheUint8> {
}
// Key and iv are stored in reverse in their shift registers
let mut key = key_bytes.map(|b| b.map(|x| (x as u8).reverse_bits() as u64));
let mut key = key_bytes.map(|b| b.reverse_bits());
let mut iv = iv_bytes.map(|x| FheUint8::encrypt_trivial(x.reverse_bits()));
key.reverse();
iv.reverse();

View File

@@ -224,7 +224,7 @@ fn kreyvium_test_shortint_long() {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);
@@ -299,9 +299,8 @@ fn kreyvium_test_clear_byte() {
#[test]
fn kreyvium_test_byte_long() {
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -338,9 +337,8 @@ fn kreyvium_test_byte_long() {
#[test]
fn kreyvium_test_fhe_byte_transciphering_long() {
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();

View File

@@ -360,7 +360,7 @@ fn trivium_test_shortint_long() {
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = KeySwitchingKey::new(
(&client_key, &server_key),
(&client_key, Some(&server_key)),
(&underlying_ck, &underlying_sk),
PARAM_KEYSWITCH_1_1_KS_PBS_TO_2_2_KS_PBS,
);

View File

@@ -1,6 +1,6 @@
[package]
name = "tfhe-cuda-backend"
version = "0.2.0"
version = "0.4.0-alpha.0"
edition = "2021"
authors = ["Zama team"]
license = "BSD-3-Clause-Clear"
@@ -14,6 +14,3 @@ keywords = ["fully", "homomorphic", "encryption", "fhe", "cryptography"]
[build-dependencies]
cmake = { version = "0.1" }
pkg-config = { version = "0.3" }
[dependencies]
thiserror = "1.0"

View File

@@ -8,7 +8,24 @@ fn main() {
}
}
// This is a workaround for a build issue with the current nightly toolchain
// (2024-06-27, present since toolchain 2024-05-05).
// Essentially, if cbindgen is running, a wrong argument ends up forwarded to the cuda
// backend "make" command during macro expansions for the TFHE-rs C API, crashing make
// for make < 4.4 and thus crashing the build.
// As a bonus, this greatly speeds up the C API build: since the CUDA backend has no
// macro expansions, returning early skips the second compilation of TFHE-rs that
// cbindgen performs for macro inspection.
if std::env::var("_CBINDGEN_IS_RUNNING").is_ok() {
return;
}
println!("Build tfhe-cuda-backend");
println!("cargo::rerun-if-changed=cuda/include");
println!("cargo::rerun-if-changed=cuda/src");
println!("cargo::rerun-if-changed=cuda/tests_and_benchmarks");
println!("cargo::rerun-if-changed=cuda/CMakeLists.txt");
println!("cargo::rerun-if-changed=src");
if env::consts::OS == "linux" {
let output = Command::new("./get_os_name.sh").output().unwrap();
let distribution = String::from_utf8(output.stdout).unwrap();

View File

@@ -1,6 +1,7 @@
#ifndef CUDA_CIPHERTEXT_H
#define CUDA_CIPHERTEXT_H
#include "device.h"
#include <cstdint>
extern "C" {
@@ -14,5 +15,11 @@ void cuda_convert_lwe_ciphertext_vector_to_cpu_64(void *stream,
void *dest, void *src,
uint32_t number_of_cts,
uint32_t lwe_dimension);
void cuda_glwe_sample_extract_64(void *stream, uint32_t gpu_index,
void *lwe_array_out, void *glwe_array_in,
uint32_t *nth_array, uint32_t num_glwes,
uint32_t glwe_dimension,
uint32_t polynomial_size);
};
#endif

View File

@@ -6,6 +6,7 @@
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>
#include <vector>
#define synchronize_threads_in_block() __syncthreads()
extern "C" {
@@ -63,14 +64,8 @@ void cuda_drop(void *ptr, uint32_t gpu_index);
void cuda_drop_async(void *ptr, cudaStream_t stream, uint32_t gpu_index);
int cuda_get_max_shared_memory(uint32_t gpu_index);
void cuda_stream_add_callback(cudaStream_t stream, uint32_t gpu_index,
cudaStreamCallback_t callback, void *user_data);
}
void host_free_on_stream_callback(cudaStream_t stream, cudaError_t status,
void *host_pointer);
template <typename Torus>
void cuda_set_value_async(cudaStream_t stream, uint32_t gpu_index,
Torus *d_array, Torus value, Torus n);

View File

@@ -1,10 +0,0 @@
#ifndef HELPER_H
#define HELPER_H
extern "C" {
int cuda_setup_multi_gpu();
}
void multi_gpu_checks(uint32_t gpu_count);
#endif

View File

@@ -0,0 +1,34 @@
#ifndef HELPER_MULTI_GPU_H
#define HELPER_MULTI_GPU_H
#include <mutex>
#include <variant>
#include <vector>
extern std::mutex m;
extern bool p2p_enabled;
extern "C" {
int cuda_setup_multi_gpu();
}
// Define a variant type that can be either a vector or a single pointer
template <typename Torus>
using LweArrayVariant = std::variant<std::vector<Torus *>, Torus *>;
// Macro to define the visitor logic using std::holds_alternative for vectors
#define GET_VARIANT_ELEMENT(variant, index) \
[&] { \
if (std::holds_alternative<std::vector<Torus *>>(variant)) { \
return std::get<std::vector<Torus *>>(variant)[index]; \
} else { \
return std::get<Torus *>(variant); \
} \
}()
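A minimal, self-contained sketch (illustration only, not repository code) of what this variant dispatch achieves: a vector carries one pointer per GPU, while a single pointer is shared by all GPUs. The function below mirrors the macro's logic in plain C++.

#include <cstdint>
#include <variant>
#include <vector>

template <typename Torus>
using LweArrayVariantSketch = std::variant<std::vector<Torus *>, Torus *>;

// Same dispatch as GET_VARIANT_ELEMENT, written as a function for readability.
template <typename Torus>
Torus *element_for_gpu(const LweArrayVariantSketch<Torus> &v, size_t gpu_index) {
  if (std::holds_alternative<std::vector<Torus *>>(v))
    return std::get<std::vector<Torus *>>(v)[gpu_index]; // per-GPU pointer
  return std::get<Torus *>(v); // one pointer shared by every GPU
}

int main() {
  uint64_t a = 0, b = 0;
  LweArrayVariantSketch<uint64_t> per_gpu = std::vector<uint64_t *>{&a, &b};
  LweArrayVariantSketch<uint64_t> shared = &b;
  return (element_for_gpu(per_gpu, 1) == &b && element_for_gpu(shared, 0) == &b)
             ? 0
             : 1;
}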
int get_active_gpu_count(int num_inputs, int gpu_count);
int get_num_inputs_on_gpu(int total_num_inputs, int gpu_index, int gpu_count);
int get_gpu_offset(int total_num_inputs, int gpu_index, int gpu_count);
#endif
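The three helpers above are only declared here; their definitions live in the backend sources. As an illustration (an assumption, not the actual implementation), a balanced split across GPUs could look like the sketch below, where the first total % count GPUs each take one extra input.

#include <algorithm>

// Hypothetical balanced-split implementations, for illustration only.
int get_num_inputs_on_gpu_sketch(int total_num_inputs, int gpu_index,
                                 int gpu_count) {
  int base = total_num_inputs / gpu_count;
  int rem = total_num_inputs % gpu_count;
  return base + (gpu_index < rem ? 1 : 0); // first `rem` GPUs take one extra
}

int get_gpu_offset_sketch(int total_num_inputs, int gpu_index, int gpu_count) {
  int base = total_num_inputs / gpu_count;
  int rem = total_num_inputs % gpu_count;
  // Sum of the input counts of all GPUs with a lower index.
  return gpu_index * base + std::min(gpu_index, rem);
}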

File diff suppressed because it is too large

View File

@@ -26,14 +26,12 @@ void cuda_convert_lwe_programmable_bootstrap_key_64(
void scratch_cuda_programmable_bootstrap_amortized_32(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
void scratch_cuda_programmable_bootstrap_amortized_64(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
void cuda_programmable_bootstrap_amortized_lwe_ciphertext_vector_32(
void *stream, uint32_t gpu_index, void *lwe_array_out,
@@ -41,8 +39,7 @@ void cuda_programmable_bootstrap_amortized_lwe_ciphertext_vector_32(
void *lwe_array_in, void *lwe_input_indexes, void *bootstrapping_key,
int8_t *pbs_buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_samples, uint32_t num_luts, uint32_t lwe_idx,
uint32_t max_shared_memory);
uint32_t num_samples);
void cuda_programmable_bootstrap_amortized_lwe_ciphertext_vector_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
@@ -50,8 +47,7 @@ void cuda_programmable_bootstrap_amortized_lwe_ciphertext_vector_64(
void *lwe_array_in, void *lwe_input_indexes, void *bootstrapping_key,
int8_t *pbs_buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_samples, uint32_t num_luts, uint32_t lwe_idx,
uint32_t max_shared_memory);
uint32_t num_samples);
void cleanup_cuda_programmable_bootstrap_amortized(void *stream,
uint32_t gpu_index,
@@ -60,14 +56,12 @@ void cleanup_cuda_programmable_bootstrap_amortized(void *stream,
void scratch_cuda_programmable_bootstrap_32(
void *stream, uint32_t gpu_index, int8_t **buffer, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
void scratch_cuda_programmable_bootstrap_64(
void *stream, uint32_t gpu_index, int8_t **buffer, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
void cuda_programmable_bootstrap_lwe_ciphertext_vector_32(
void *stream, uint32_t gpu_index, void *lwe_array_out,
@@ -75,8 +69,7 @@ void cuda_programmable_bootstrap_lwe_ciphertext_vector_32(
void *lwe_array_in, void *lwe_input_indexes, void *bootstrapping_key,
int8_t *buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_samples, uint32_t num_luts, uint32_t lwe_idx,
uint32_t max_shared_memory);
uint32_t num_samples);
void cuda_programmable_bootstrap_lwe_ciphertext_vector_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
@@ -84,44 +77,41 @@ void cuda_programmable_bootstrap_lwe_ciphertext_vector_64(
void *lwe_array_in, void *lwe_input_indexes, void *bootstrapping_key,
int8_t *buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t base_log, uint32_t level_count,
uint32_t num_samples, uint32_t num_luts, uint32_t lwe_idx,
uint32_t max_shared_memory);
uint32_t num_samples);
void cleanup_cuda_programmable_bootstrap(void *stream, uint32_t gpu_index,
int8_t **pbs_buffer);
uint64_t get_buffer_size_programmable_bootstrap_amortized_64(
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory);
uint32_t input_lwe_ciphertext_count);
uint64_t get_buffer_size_programmable_bootstrap_64(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory);
uint32_t input_lwe_ciphertext_count);
}
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_programmable_bootstrap_step_one(
uint64_t get_buffer_size_full_sm_programmable_bootstrap_step_one(
uint32_t polynomial_size) {
return sizeof(Torus) * polynomial_size + // accumulator_rotated
sizeof(double2) * polynomial_size / 2; // accumulator fft
}
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_programmable_bootstrap_step_two(
uint64_t get_buffer_size_full_sm_programmable_bootstrap_step_two(
uint32_t polynomial_size) {
return sizeof(Torus) * polynomial_size + // accumulator
sizeof(double2) * polynomial_size / 2; // accumulator fft
}
template <typename Torus>
__host__ __device__ uint64_t
uint64_t
get_buffer_size_partial_sm_programmable_bootstrap(uint32_t polynomial_size) {
return sizeof(double2) * polynomial_size / 2; // accumulator fft
}
template <typename Torus>
__host__ __device__ uint64_t
uint64_t
get_buffer_size_full_sm_programmable_bootstrap_tbc(uint32_t polynomial_size) {
return sizeof(Torus) * polynomial_size + // accumulator_rotated
sizeof(Torus) * polynomial_size + // accumulator
@@ -129,21 +119,19 @@ get_buffer_size_full_sm_programmable_bootstrap_tbc(uint32_t polynomial_size) {
}
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_partial_sm_programmable_bootstrap_tbc(
uint64_t get_buffer_size_partial_sm_programmable_bootstrap_tbc(
uint32_t polynomial_size) {
return sizeof(double2) * polynomial_size / 2; // accumulator fft mask & body
}
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_sm_dsm_plus_tbc_classic_programmable_bootstrap(
uint64_t get_buffer_size_sm_dsm_plus_tbc_classic_programmable_bootstrap(
uint32_t polynomial_size) {
return sizeof(double2) * polynomial_size / 2; // tbc
}
template <typename Torus>
__host__ __device__ uint64_t
uint64_t
get_buffer_size_full_sm_programmable_bootstrap_cg(uint32_t polynomial_size) {
return sizeof(Torus) * polynomial_size + // accumulator_rotated
sizeof(Torus) * polynomial_size + // accumulator
@@ -151,15 +139,14 @@ get_buffer_size_full_sm_programmable_bootstrap_cg(uint32_t polynomial_size) {
}
template <typename Torus>
__host__ __device__ uint64_t
uint64_t
get_buffer_size_partial_sm_programmable_bootstrap_cg(uint32_t polynomial_size) {
return sizeof(double2) * polynomial_size / 2; // accumulator fft mask & body
}
template <typename Torus>
__host__ bool
supports_distributed_shared_memory_on_classic_programmable_bootstrap(
uint32_t polynomial_size, uint32_t max_shared_memory);
bool supports_distributed_shared_memory_on_classic_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus, PBS_TYPE pbs_type> struct pbs_buffer;
@@ -178,7 +165,7 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::CLASSICAL> {
this->pbs_variant = pbs_variant;
auto max_shared_memory = cuda_get_max_shared_memory(gpu_index);
auto max_shared_memory = cuda_get_max_shared_memory(0);
if (allocate_gpu_memory) {
switch (pbs_variant) {
@@ -255,7 +242,7 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::CLASSICAL> {
bool supports_dsm =
supports_distributed_shared_memory_on_classic_programmable_bootstrap<
Torus>(polynomial_size, max_shared_memory);
Torus>(polynomial_size);
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_tbc<Torus>(
@@ -314,10 +301,10 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::CLASSICAL> {
};
template <typename Torus>
__host__ __device__ uint64_t get_buffer_size_programmable_bootstrap_cg(
uint64_t get_buffer_size_programmable_bootstrap_cg(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory) {
uint32_t input_lwe_ciphertext_count) {
int max_shared_memory = cuda_get_max_shared_memory(0);
uint64_t full_sm =
get_buffer_size_full_sm_programmable_bootstrap_cg<Torus>(polynomial_size);
uint64_t partial_sm =
@@ -343,8 +330,7 @@ template <typename Torus>
bool has_support_to_cuda_programmable_bootstrap_cg(uint32_t glwe_dimension,
uint32_t polynomial_size,
uint32_t level_count,
uint32_t num_samples,
uint32_t max_shared_memory);
uint32_t num_samples);
template <typename Torus>
void cuda_programmable_bootstrap_cg_lwe_ciphertext_vector(
@@ -353,8 +339,7 @@ void cuda_programmable_bootstrap_cg_lwe_ciphertext_vector(
Torus *lwe_array_in, Torus *lwe_input_indexes, double2 *bootstrapping_key,
pbs_buffer<Torus, CLASSICAL> *buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t base_log,
uint32_t level_count, uint32_t num_samples, uint32_t num_luts,
uint32_t lwe_idx, uint32_t max_shared_memory);
uint32_t level_count, uint32_t num_samples);
template <typename Torus>
void cuda_programmable_bootstrap_lwe_ciphertext_vector(
@@ -363,8 +348,7 @@ void cuda_programmable_bootstrap_lwe_ciphertext_vector(
Torus *lwe_array_in, Torus *lwe_input_indexes, double2 *bootstrapping_key,
pbs_buffer<Torus, CLASSICAL> *buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t base_log,
uint32_t level_count, uint32_t num_samples, uint32_t num_luts,
uint32_t lwe_idx, uint32_t max_shared_memory);
uint32_t level_count, uint32_t num_samples);
#if (CUDA_ARCH >= 900)
template <typename Torus>
@@ -374,43 +358,44 @@ void cuda_programmable_bootstrap_tbc_lwe_ciphertext_vector(
Torus *lwe_array_in, Torus *lwe_input_indexes, double2 *bootstrapping_key,
pbs_buffer<Torus, CLASSICAL> *buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t base_log,
uint32_t level_count, uint32_t num_samples, uint32_t num_luts,
uint32_t lwe_idx, uint32_t max_shared_memory);
uint32_t level_count, uint32_t num_samples);
template <typename Torus, typename STorus>
template <typename Torus>
void scratch_cuda_programmable_bootstrap_tbc(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, CLASSICAL> **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
#endif
template <typename Torus, typename STorus>
template <typename Torus>
void scratch_cuda_programmable_bootstrap_cg(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, CLASSICAL> **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
template <typename Torus, typename STorus>
template <typename Torus>
void scratch_cuda_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, CLASSICAL> **buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
template <typename Torus>
bool has_support_to_cuda_programmable_bootstrap_tbc(uint32_t num_samples,
uint32_t glwe_dimension,
uint32_t polynomial_size,
uint32_t level_count,
uint32_t max_shared_memory);
uint32_t level_count);
#ifdef __CUDACC__
__device__ inline int get_start_ith_ggsw(int i, uint32_t polynomial_size,
int glwe_dimension,
uint32_t level_count);
template <typename T>
__device__ const T *get_ith_mask_kth_block(const T *ptr, int i, int k,
int level, uint32_t polynomial_size,
int glwe_dimension,
uint32_t level_count);
template <typename T>
__device__ T *get_ith_mask_kth_block(T *ptr, int i, int k, int level,
uint32_t polynomial_size,
@@ -422,8 +407,8 @@ __device__ T *get_ith_body_kth_block(T *ptr, int i, int k, int level,
int glwe_dimension, uint32_t level_count);
template <typename T>
__device__ T *get_multi_bit_ith_lwe_gth_group_kth_block(
T *ptr, int g, int i, int k, int level, uint32_t grouping_factor,
__device__ const T *get_multi_bit_ith_lwe_gth_group_kth_block(
const T *ptr, int g, int i, int k, int level, uint32_t grouping_factor,
uint32_t polynomial_size, uint32_t glwe_dimension, uint32_t level_count);
#endif

View File

@@ -8,7 +8,7 @@ extern "C" {
bool has_support_to_cuda_programmable_bootstrap_cg_multi_bit(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t num_samples, uint32_t max_shared_memory);
uint32_t num_samples);
void cuda_convert_lwe_multi_bit_programmable_bootstrap_key_64(
void *stream, uint32_t gpu_index, void *dest, void *src,
@@ -19,8 +19,7 @@ void scratch_cuda_multi_bit_programmable_bootstrap_64(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t grouping_factor,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory, uint32_t chunk_size = 0);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
void cuda_multi_bit_programmable_bootstrap_lwe_ciphertext_vector_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
@@ -28,24 +27,7 @@ void cuda_multi_bit_programmable_bootstrap_lwe_ciphertext_vector_64(
void *lwe_array_in, void *lwe_input_indexes, void *bootstrapping_key,
int8_t *buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t grouping_factor, uint32_t base_log,
uint32_t level_count, uint32_t num_samples, uint32_t num_luts,
uint32_t lwe_idx, uint32_t max_shared_memory, uint32_t lwe_chunk_size = 0);
void scratch_cuda_generic_multi_bit_programmable_bootstrap_64(
void *stream, uint32_t gpu_index, int8_t **pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t grouping_factor,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory, uint32_t lwe_chunk_size = 0);
void cuda_generic_multi_bit_programmable_bootstrap_lwe_ciphertext_vector_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void *lwe_output_indexes, void *lut_vector, void *lut_vector_indexes,
void *lwe_array_in, void *lwe_input_indexes, void *bootstrapping_key,
int8_t *pbs_buffer, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t grouping_factor, uint32_t base_log,
uint32_t level_count, uint32_t num_samples, uint32_t num_luts,
uint32_t lwe_idx, uint32_t max_shared_memory, uint32_t lwe_chunk_size = 0);
uint32_t level_count, uint32_t num_samples);
void cleanup_cuda_multi_bit_programmable_bootstrap(void *stream,
uint32_t gpu_index,
@@ -53,23 +35,21 @@ void cleanup_cuda_multi_bit_programmable_bootstrap(void *stream,
}
template <typename Torus>
__host__ bool
supports_distributed_shared_memory_on_multibit_programmable_bootstrap(
uint32_t polynomial_size, uint32_t max_shared_memory);
bool supports_distributed_shared_memory_on_multibit_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus>
bool has_support_to_cuda_programmable_bootstrap_tbc_multi_bit(
uint32_t num_samples, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t max_shared_memory);
uint32_t level_count);
#if CUDA_ARCH >= 900
template <typename Torus, typename STorus>
template <typename Torus>
void scratch_cuda_tbc_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t grouping_factor,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory, uint32_t lwe_chunk_size);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
template <typename Torus>
void cuda_tbc_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
@@ -78,25 +58,14 @@ void cuda_tbc_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
Torus *lwe_array_in, Torus *lwe_input_indexes, Torus *bootstrapping_key,
pbs_buffer<Torus, MULTI_BIT> *pbs_buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t grouping_factor,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory,
uint32_t lwe_chunk_size);
uint32_t base_log, uint32_t level_count, uint32_t num_samples);
#endif
template <typename Torus, typename STorus>
void scratch_cuda_cg_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t grouping_factor,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory, uint32_t lwe_chunk_size = 0);
template <typename Torus, typename STorus>
template <typename Torus>
void scratch_cuda_cg_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **pbs_buffer,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory, uint32_t lwe_chunk_size = 0);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
template <typename Torus>
void cuda_cg_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
@@ -105,17 +74,14 @@ void cuda_cg_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
Torus *lwe_array_in, Torus *lwe_input_indexes, Torus *bootstrapping_key,
pbs_buffer<Torus, MULTI_BIT> *pbs_buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t grouping_factor,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory,
uint32_t lwe_chunk_size = 0);
uint32_t base_log, uint32_t level_count, uint32_t num_samples);
template <typename Torus, typename STorus>
template <typename Torus>
void scratch_cuda_multi_bit_programmable_bootstrap(
void *stream, uint32_t gpu_index, pbs_buffer<Torus, MULTI_BIT> **pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t grouping_factor,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory, uint32_t lwe_chunk_size = 0);
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory);
template <typename Torus>
void cuda_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
@@ -124,45 +90,34 @@ void cuda_multi_bit_programmable_bootstrap_lwe_ciphertext_vector(
Torus *lwe_array_in, Torus *lwe_input_indexes, Torus *bootstrapping_key,
pbs_buffer<Torus, MULTI_BIT> *pbs_buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t grouping_factor,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory,
uint32_t lwe_chunk_size = 0);
uint32_t base_log, uint32_t level_count, uint32_t num_samples);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_multibit_programmable_bootstrap_keybundle(
uint64_t get_buffer_size_full_sm_multibit_programmable_bootstrap_keybundle(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_multibit_programmable_bootstrap_step_one(
uint64_t get_buffer_size_full_sm_multibit_programmable_bootstrap_step_one(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_multibit_programmable_bootstrap_step_two(
uint64_t get_buffer_size_full_sm_multibit_programmable_bootstrap_step_two(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_partial_sm_multibit_programmable_bootstrap_step_one(
uint64_t get_buffer_size_partial_sm_multibit_programmable_bootstrap_step_one(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_cg_multibit_programmable_bootstrap(
uint64_t get_buffer_size_full_sm_cg_multibit_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_partial_sm_cg_multibit_programmable_bootstrap(
uint64_t get_buffer_size_partial_sm_cg_multibit_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_sm_dsm_plus_tbc_multibit_programmable_bootstrap(
uint64_t get_buffer_size_sm_dsm_plus_tbc_multibit_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_partial_sm_tbc_multibit_programmable_bootstrap(
uint64_t get_buffer_size_partial_sm_tbc_multibit_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus>
__host__ __device__ uint64_t
get_buffer_size_full_sm_tbc_multibit_programmable_bootstrap(
uint64_t get_buffer_size_full_sm_tbc_multibit_programmable_bootstrap(
uint32_t polynomial_size);
template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::MULTI_BIT> {
@@ -332,8 +287,7 @@ template <typename Torus> struct pbs_buffer<Torus, PBS_TYPE::MULTI_BIT> {
};
template <typename Torus, class params>
__host__ uint32_t get_lwe_chunk_size(uint32_t gpu_index, uint32_t max_num_pbs,
uint32_t polynomial_size,
uint32_t max_shared_memory);
uint32_t get_lwe_chunk_size(uint32_t gpu_index, uint32_t max_num_pbs,
uint32_t polynomial_size);
#endif // CUDA_MULTI_BIT_H

View File

@@ -11,7 +11,7 @@ set(SOURCES
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/linear_algebra.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/shifts.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/vertical_packing.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/helper.h)
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/helper_multi_gpu.h)
file(GLOB_RECURSE SOURCES "*.cu")
add_library(tfhe_cuda_backend STATIC ${SOURCES})
set_target_properties(tfhe_cuda_backend PROPERTIES CUDA_SEPARABLE_COMPILATION ON CUDA_RESOLVE_DEVICE_SYMBOLS ON)

View File

@@ -1,4 +1,5 @@
#include "ciphertext.cuh"
#include "polynomial/parameters.cuh"
void cuda_convert_lwe_ciphertext_vector_to_gpu_64(void *stream,
uint32_t gpu_index,
@@ -19,3 +20,58 @@ void cuda_convert_lwe_ciphertext_vector_to_cpu_64(void *stream,
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)dest,
(uint64_t *)src, number_of_cts, lwe_dimension);
}
void cuda_glwe_sample_extract_64(void *stream, uint32_t gpu_index,
void *lwe_array_out, void *glwe_array_in,
uint32_t *nth_array, uint32_t num_glwes,
uint32_t glwe_dimension,
uint32_t polynomial_size) {
switch (polynomial_size) {
case 256:
host_sample_extract<uint64_t, AmortizedDegree<256>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
case 512:
host_sample_extract<uint64_t, AmortizedDegree<512>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
case 1024:
host_sample_extract<uint64_t, AmortizedDegree<1024>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
case 2048:
host_sample_extract<uint64_t, AmortizedDegree<2048>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
case 4096:
host_sample_extract<uint64_t, AmortizedDegree<4096>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
case 8192:
host_sample_extract<uint64_t, AmortizedDegree<8192>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
case 16384:
host_sample_extract<uint64_t, AmortizedDegree<16384>>(
static_cast<cudaStream_t>(stream), gpu_index, (uint64_t *)lwe_array_out,
(uint64_t *)glwe_array_in, (uint32_t *)nth_array, num_glwes,
glwe_dimension);
break;
default:
PANIC("Cuda error: unsupported polynomial size. Supported "
"N's are powers of two in the interval [256..16384].")
}
}

View File

@@ -3,6 +3,7 @@
#include "ciphertext.h"
#include "device.h"
#include "polynomial/functions.cuh"
#include <cstdint>
template <typename T>
@@ -25,4 +26,39 @@ void cuda_convert_lwe_ciphertext_vector_to_cpu(cudaStream_t stream,
cuda_memcpy_async_to_cpu(dest, src, size, stream, gpu_index);
}
template <typename Torus, class params>
__global__ void sample_extract(Torus *lwe_array_out, Torus *glwe_array_in,
uint32_t *nth_array, uint32_t glwe_dimension) {
const int input_id = blockIdx.x;
const int glwe_input_size = (glwe_dimension + 1) * params::degree;
const int lwe_output_size = glwe_dimension * params::degree + 1;
auto lwe_out = lwe_array_out + input_id * lwe_output_size;
// We assume consecutive outputs come from the same GLWE, each GLWE serving
// the next polynomial_size (params::degree) outputs
uint32_t nth_per_glwe = params::degree;
auto glwe_in = glwe_array_in + (input_id / nth_per_glwe) * glwe_input_size;
auto nth = nth_array[input_id];
sample_extract_mask<Torus, params>(lwe_out, glwe_in, glwe_dimension, nth);
sample_extract_body<Torus, params>(lwe_out, glwe_in, glwe_dimension, nth);
}
template <typename Torus, class params>
__host__ void host_sample_extract(cudaStream_t stream, uint32_t gpu_index,
Torus *lwe_array_out, Torus *glwe_array_in,
uint32_t *nth_array, uint32_t num_glwes,
uint32_t glwe_dimension) {
cudaSetDevice(gpu_index);
dim3 grid(num_glwes);
dim3 thds(params::degree / params::opt);
sample_extract<Torus, params><<<grid, thds, 0, stream>>>(
lwe_array_out, glwe_array_in, nth_array, glwe_dimension);
check_cuda_error(cudaGetLastError());
}
#endif
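A quick sanity check of the index arithmetic used above (illustration only, not repository code): with glwe_dimension k = 1 and degree N = 2048, each input GLWE occupies (k + 1)·N words, each extracted LWE occupies k·N + 1 words, and output input_id reads from GLWE number input_id / N.

#include <cassert>
#include <cstdint>

int main() {
  const uint32_t glwe_dimension = 1, degree = 2048;
  const uint32_t glwe_input_size = (glwe_dimension + 1) * degree; // mask + body
  const uint32_t lwe_output_size = glwe_dimension * degree + 1; // mask + body coeff
  assert(glwe_input_size == 4096);
  assert(lwe_output_size == 2049);
  // Consecutive outputs are served by the same GLWE, `degree` outputs each:
  assert(5000 / degree == 2); // output 5000 extracts from the third GLWE
  return 0;
}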

View File

@@ -3,8 +3,11 @@
#include "device.h"
#include "gadget.cuh"
#include "helper_multi_gpu.h"
#include "polynomial/functions.cuh"
#include "polynomial/polynomial_math.cuh"
#include "torus.cuh"
#include "utils/kernel_dimensions.cuh"
#include <thread>
#include <vector>
@@ -31,43 +34,69 @@ __device__ Torus *get_ith_block(Torus *ksk, int i, int level,
* scaling factor) under key s2 instead of s1, with an increased noise
*
*/
// Each thread in x is used to calculate one output.
// Threads in y are used to parallelize the lwe_dimension_in loop.
// Shared memory is used to store intermediate results of the reduction.
template <typename Torus>
__global__ void
keyswitch(Torus *lwe_array_out, Torus *lwe_output_indexes, Torus *lwe_array_in,
Torus *lwe_input_indexes, Torus *ksk, uint32_t lwe_dimension_in,
keyswitch(Torus *lwe_array_out, const Torus *__restrict__ lwe_output_indexes,
const Torus *__restrict__ lwe_array_in,
const Torus *__restrict__ lwe_input_indexes,
const Torus *__restrict__ ksk, uint32_t lwe_dimension_in,
uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count) {
int tid = threadIdx.x;
const int tid = threadIdx.x + blockIdx.x * blockDim.x;
const int shmem_index = threadIdx.x + threadIdx.y * blockDim.x;
extern __shared__ int8_t sharedmem[];
Torus *lwe_acc_out = (Torus *)sharedmem;
auto block_lwe_array_out = get_chunk(
lwe_array_out, lwe_output_indexes[blockIdx.y], lwe_dimension_out + 1);
if (tid <= lwe_dimension_out) {
Torus *local_lwe_array_out = (Torus *)sharedmem;
Torus local_lwe_out = 0;
auto block_lwe_array_in = get_chunk(
lwe_array_in, lwe_input_indexes[blockIdx.x], lwe_dimension_in + 1);
auto block_lwe_array_out = get_chunk(
lwe_array_out, lwe_output_indexes[blockIdx.x], lwe_dimension_out + 1);
local_lwe_array_out[tid] = 0;
lwe_array_in, lwe_input_indexes[blockIdx.y], lwe_dimension_in + 1);
if (tid == lwe_dimension_out) {
local_lwe_array_out[lwe_dimension_out] =
block_lwe_array_in[lwe_dimension_in];
if (tid == lwe_dimension_out && threadIdx.y == 0) {
local_lwe_out = block_lwe_array_in[lwe_dimension_in];
}
const Torus mask_mod_b = (1ll << base_log) - 1ll;
for (int i = 0; i < lwe_dimension_in; i++) {
const int pack_size = (lwe_dimension_in + blockDim.y - 1) / blockDim.y;
const int start_i = pack_size * threadIdx.y;
const int end_i = SEL(lwe_dimension_in, pack_size * (threadIdx.y + 1),
pack_size * (threadIdx.y + 1) <= lwe_dimension_in);
// This loop distribution seems to improve global memory read performance
for (int i = start_i; i < end_i; i++) {
Torus a_i = round_to_closest_multiple(block_lwe_array_in[i], base_log,
level_count);
Torus state = a_i >> (sizeof(Torus) * 8 - base_log * level_count);
Torus mask_mod_b = (1ll << base_log) - 1ll;
for (int j = 0; j < level_count; j++) {
auto ksk_block =
get_ith_block(ksk, i, j, lwe_dimension_out, level_count);
Torus decomposed = decompose_one<Torus>(state, mask_mod_b, base_log);
local_lwe_array_out[tid] -= (Torus)ksk_block[tid] * decomposed;
local_lwe_out -= (Torus)ksk_block[tid] * decomposed;
}
}
block_lwe_array_out[tid] = local_lwe_array_out[tid];
lwe_acc_out[shmem_index] = local_lwe_out;
}
if (tid <= lwe_dimension_out) {
for (int offset = blockDim.y / 2; offset > 0 && threadIdx.y < offset;
offset /= 2) {
__syncthreads();
lwe_acc_out[shmem_index] +=
lwe_acc_out[shmem_index + offset * blockDim.x];
}
if (threadIdx.y == 0)
block_lwe_array_out[tid] = lwe_acc_out[shmem_index];
}
}
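The y-dimension accumulation above is the classic shared-memory tree reduction. The stand-alone sketch below (illustration only, assuming blockDim.y is a power of two, not repository code) shows the same pattern in isolation: each row accumulates a partial sum for its column, then halving offsets fold the rows together.

#include <cstdio>
#include <cuda_runtime.h>

// One block: blockDim.x columns, blockDim.y rows. Each column produces one
// output; the rows split the n input elements of that column.
__global__ void column_sums(const int *in, int *out, int n) {
  extern __shared__ int acc[];
  const int shmem_index = threadIdx.x + threadIdx.y * blockDim.x;
  int partial = 0;
  for (int i = threadIdx.y; i < n; i += blockDim.y)
    partial += in[i * blockDim.x + threadIdx.x];
  acc[shmem_index] = partial;
  // Tree reduction over the y dimension (blockDim.y assumed a power of two).
  for (int offset = blockDim.y / 2; offset > 0; offset /= 2) {
    __syncthreads();
    if (threadIdx.y < offset)
      acc[shmem_index] += acc[shmem_index + offset * blockDim.x];
  }
  if (threadIdx.y == 0)
    out[threadIdx.x] = acc[threadIdx.x];
}

int main() {
  const int cols = 32, rows = 8, n = 64;
  int *in, *out;
  cudaMallocManaged(&in, n * cols * sizeof(int));
  cudaMallocManaged(&out, cols * sizeof(int));
  for (int i = 0; i < n * cols; i++)
    in[i] = 1;
  column_sums<<<1, dim3(cols, rows), cols * rows * sizeof(int)>>>(in, out, n);
  cudaDeviceSynchronize();
  printf("out[0] = %d (expected %d)\n", out[0], n);
  cudaFree(in);
  cudaFree(out);
  return 0;
}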
/// Assumes lwe_array_in is already on the GPU
template <typename Torus>
__host__ void cuda_keyswitch_lwe_ciphertext_vector(
cudaStream_t stream, uint32_t gpu_index, Torus *lwe_array_out,
@@ -76,15 +105,16 @@ __host__ void cuda_keyswitch_lwe_ciphertext_vector(
uint32_t base_log, uint32_t level_count, uint32_t num_samples) {
cudaSetDevice(gpu_index);
constexpr int ideal_threads = 1024;
if (lwe_dimension_out + 1 > ideal_threads)
PANIC("Cuda error (keyswitch): lwe dimension size out should be greater "
"or equal to the number of threads per block")
int lwe_size = lwe_dimension_out + 1;
int shared_mem = sizeof(Torus) * lwe_size;
dim3 grid(num_samples, 1, 1);
dim3 threads(ideal_threads, 1, 1);
constexpr int num_threads_y = 32;
int num_blocks, num_threads_x;
getNumBlocksAndThreads2D(lwe_dimension_out + 1, 512, num_threads_y,
num_blocks, num_threads_x);
int shared_mem = sizeof(Torus) * num_threads_y * num_threads_x;
dim3 grid(num_blocks, num_samples, 1);
dim3 threads(num_threads_x, num_threads_y, 1);
keyswitch<Torus><<<grid, threads, shared_mem, stream>>>(
lwe_array_out, lwe_output_indexes, lwe_array_in, lwe_input_indexes, ksk,
@@ -92,4 +122,36 @@ __host__ void cuda_keyswitch_lwe_ciphertext_vector(
check_cuda_error(cudaGetLastError());
}
template <typename Torus>
void execute_keyswitch_async(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count,
const LweArrayVariant<Torus> &lwe_array_out,
const LweArrayVariant<Torus> &lwe_output_indexes,
const LweArrayVariant<Torus> &lwe_array_in,
const LweArrayVariant<Torus> &lwe_input_indexes,
Torus **ksks, uint32_t lwe_dimension_in,
uint32_t lwe_dimension_out, uint32_t base_log,
uint32_t level_count, uint32_t num_samples) {
/// If the number of radix blocks is lower than the number of GPUs, not all
/// GPUs will be active, and each active GPU will process a single input
for (uint i = 0; i < gpu_count; i++) {
int num_samples_on_gpu = get_num_inputs_on_gpu(num_samples, i, gpu_count);
Torus *current_lwe_array_out = GET_VARIANT_ELEMENT(lwe_array_out, i);
Torus *current_lwe_output_indexes =
GET_VARIANT_ELEMENT(lwe_output_indexes, i);
Torus *current_lwe_array_in = GET_VARIANT_ELEMENT(lwe_array_in, i);
Torus *current_lwe_input_indexes =
GET_VARIANT_ELEMENT(lwe_input_indexes, i);
// Compute Keyswitch
cuda_keyswitch_lwe_ciphertext_vector<Torus>(
streams[i], gpu_indexes[i], current_lwe_array_out,
current_lwe_output_indexes, current_lwe_array_in,
current_lwe_input_indexes, ksks[i], lwe_dimension_in, lwe_dimension_out,
base_log, level_count, num_samples_on_gpu);
}
}
#endif

View File

@@ -39,36 +39,19 @@ __device__ inline T round_to_closest_multiple(T x, uint32_t base_log,
}
template <typename T>
__device__ __forceinline__ void rescale_torus_element(T element, T &output,
uint32_t log_shift) {
output =
round((double)element / (double(std::numeric_limits<T>::max()) + 1.0) *
(double)log_shift);
__device__ __forceinline__ void modulus_switch(T input, T &output,
uint32_t log_modulus) {
constexpr uint32_t BITS = sizeof(T) * 8;
output = input + (((T)1) << (BITS - log_modulus - 1));
output >>= (BITS - log_modulus);
}
template <typename T>
__device__ __forceinline__ T rescale_torus_element(T element,
uint32_t log_shift) {
return round((double)element / (double(std::numeric_limits<T>::max()) + 1.0) *
(double)log_shift);
__device__ __forceinline__ T modulus_switch(T input, uint32_t log_modulus) {
T output;
modulus_switch(input, output, log_modulus);
return output;
}
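To see what the new modulus_switch computes, here is a host-side replica with a worked check (illustration only, not repository code): adding half a step before the shift rounds to the nearest multiple of 2^(BITS - log_modulus) instead of truncating; values past the last midpoint wrap to 0, matching torus arithmetic.

#include <cassert>
#include <cstdint>

// Host-side replica of the device modulus_switch above.
template <typename T> T modulus_switch_host(T input, uint32_t log_modulus) {
  constexpr uint32_t BITS = sizeof(T) * 8;
  T output = input + (((T)1) << (BITS - log_modulus - 1)); // add half a step
  return output >> (BITS - log_modulus); // keep the log_modulus top bits
}

int main() {
  // log_modulus = 2 on a 64-bit torus: the grid step is 2^62.
  const uint64_t step = uint64_t(1) << 62;
  assert(modulus_switch_host<uint64_t>(step - 1, 2) == 1);     // rounds up
  assert(modulus_switch_host<uint64_t>(step / 2 - 1, 2) == 0); // rounds down
  assert(modulus_switch_host<uint64_t>(3 * step, 2) == 3);     // exact multiple
  return 0;
}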
template <>
__device__ __forceinline__ void
rescale_torus_element<uint32_t>(uint32_t element, uint32_t &output,
uint32_t log_shift) {
output =
round(__uint2double_rn(element) /
(__uint2double_rn(std::numeric_limits<uint32_t>::max()) + 1.0) *
__uint2double_rn(log_shift));
}
template <>
__device__ __forceinline__ void
rescale_torus_element<uint64_t>(uint64_t element, uint64_t &output,
uint32_t log_shift) {
output = round(__ull2double_rn(element) /
(__ull2double_rn(std::numeric_limits<uint64_t>::max()) + 1.0) *
__uint2double_rn(log_shift));
}
#endif // CNCRT_TORUS_H

View File

@@ -6,7 +6,7 @@
cudaStream_t cuda_create_stream(uint32_t gpu_index) {
check_cuda_error(cudaSetDevice(gpu_index));
cudaStream_t stream;
check_cuda_error(cudaStreamCreate(&stream));
check_cuda_error(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
return stream;
}
@@ -47,9 +47,7 @@ void *cuda_malloc_async(uint64_t size, cudaStream_t stream,
&support_async_alloc, cudaDevAttrMemoryPoolsSupported, gpu_index));
if (support_async_alloc) {
cuda_synchronize_stream(stream, gpu_index);
check_cuda_error(cudaMallocAsync((void **)&ptr, size, stream));
cuda_synchronize_stream(stream, gpu_index);
} else {
check_cuda_error(cudaMalloc((void **)&ptr, size));
}
@@ -121,21 +119,22 @@ void cuda_memcpy_async_gpu_to_gpu(void *dest, void *src, uint64_t size,
return;
cudaPointerAttributes attr_dest;
check_cuda_error(cudaPointerGetAttributes(&attr_dest, dest));
if (attr_dest.device != gpu_index && attr_dest.type != cudaMemoryTypeDevice) {
if (attr_dest.type != cudaMemoryTypeDevice) {
PANIC("Cuda error: invalid dest device pointer in copy from GPU to GPU.")
}
cudaPointerAttributes attr_src;
check_cuda_error(cudaPointerGetAttributes(&attr_src, src));
if (attr_src.device != gpu_index && attr_src.type != cudaMemoryTypeDevice) {
if (attr_src.type != cudaMemoryTypeDevice) {
PANIC("Cuda error: invalid src device pointer in copy from GPU to GPU.")
}
if (attr_src.device != attr_dest.device) {
PANIC("Cuda error: different devices specified in copy from GPU to GPU.")
}
check_cuda_error(cudaSetDevice(gpu_index));
check_cuda_error(
cudaMemcpyAsync(dest, src, size, cudaMemcpyDeviceToDevice, stream));
if (attr_src.device == attr_dest.device) {
check_cuda_error(
cudaMemcpyAsync(dest, src, size, cudaMemcpyDeviceToDevice, stream));
} else {
check_cuda_error(cudaMemcpyPeerAsync(dest, attr_dest.device, src,
attr_src.device, size, stream));
}
}
/// Synchronizes device
@@ -167,19 +166,21 @@ __global__ void cuda_set_value_kernel(Torus *array, Torus value, Torus n) {
template <typename Torus>
void cuda_set_value_async(cudaStream_t stream, uint32_t gpu_index,
Torus *d_array, Torus value, Torus n) {
cudaPointerAttributes attr;
check_cuda_error(cudaPointerGetAttributes(&attr, d_array));
if (attr.type != cudaMemoryTypeDevice) {
PANIC("Cuda error: invalid dest device pointer in cuda set value.")
}
check_cuda_error(cudaSetDevice(gpu_index));
int block_size = 256;
int num_blocks = (n + block_size - 1) / block_size;
if (n > 0) {
cudaPointerAttributes attr;
check_cuda_error(cudaPointerGetAttributes(&attr, d_array));
if (attr.type != cudaMemoryTypeDevice) {
PANIC("Cuda error: invalid dest device pointer in cuda set value.")
}
check_cuda_error(cudaSetDevice(gpu_index));
int block_size = 256;
int num_blocks = (n + block_size - 1) / block_size;
// Launch the kernel
cuda_set_value_kernel<<<num_blocks, block_size, 0, stream>>>(d_array, value,
n);
check_cuda_error(cudaGetLastError());
// Launch the kernel
cuda_set_value_kernel<<<num_blocks, block_size, 0, stream>>>(d_array, value,
n);
check_cuda_error(cudaGetLastError());
}
}
/// Explicitly instantiate cuda_set_value_async for 32 and 64 bits
@@ -242,22 +243,18 @@ void cuda_drop_async(void *ptr, cudaStream_t stream, uint32_t gpu_index) {
/// Get the maximum size for the shared memory
int cuda_get_max_shared_memory(uint32_t gpu_index) {
check_cuda_error(cudaSetDevice(gpu_index));
int max_shared_memory = 0;
cudaDeviceGetAttribute(&max_shared_memory, cudaDevAttrMaxSharedMemoryPerBlock,
gpu_index);
check_cuda_error(cudaGetLastError());
#if CUDA_ARCH == 900
max_shared_memory = 226000;
#elif CUDA_ARCH == 890
max_shared_memory = 127000;
#elif CUDA_ARCH == 800
max_shared_memory = 163000;
#elif CUDA_ARCH == 700
max_shared_memory = 95000;
#endif
return max_shared_memory;
}
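Related background for these per-architecture caps (an aside with a hypothetical kernel, not repository code): requesting more than 48 KiB of dynamic shared memory only works after an explicit per-kernel opt-in, so knowing the architecture's real limit matters when sizing buffers.

#include <cuda_runtime.h>

__global__ void big_smem_kernel() {
  extern __shared__ double2 buffer[];
  (void)buffer;
}

int main() {
  // Opt the kernel into a 100 KiB dynamic shared-memory carveout (on devices
  // whose limit allows it); without this call, launches requesting more than
  // 48 KiB fail regardless of the device limit.
  cudaFuncSetAttribute(big_smem_kernel,
                       cudaFuncAttributeMaxDynamicSharedMemorySize, 100 * 1024);
  big_smem_kernel<<<1, 32, 100 * 1024>>>();
  return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}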
void cuda_stream_add_callback(cudaStream_t stream, uint32_t gpu_index,
cudaStreamCallback_t callback, void *user_data) {
check_cuda_error(cudaSetDevice(gpu_index));
check_cuda_error(cudaStreamAddCallback(stream, callback, user_data, 0));
}
void host_free_on_stream_callback(cudaStream_t stream, cudaError_t status,
void *host_pointer) {
free(host_pointer);
}

View File

@@ -294,9 +294,6 @@ template <class params> __device__ void NSMFFT_direct(double2 *A) {
__syncthreads();
}
// compressed size = 8192 corresponds to an actual polynomial size of 16384.
// from this size, twiddles can't fit in constant memory,
// so from here, butterfly operations access device memory.
if constexpr (params::degree >= 8192) {
// level 13
tid = threadIdx.x;
@@ -307,7 +304,7 @@ template <class params> __device__ void NSMFFT_direct(double2 *A) {
(tid & (params::degree / 8192 - 1));
i2 = i1 + params::degree / 8192;
w = negtwiddles13[twid_id];
w = negtwiddles[twid_id + 4096];
u = A[i1];
v = A[i2] * w;
@@ -351,10 +348,6 @@ template <class params> __device__ void NSMFFT_inverse(double2 *A) {
// mapping in backward fft is reversed
// butterfly operation is started from last level
// compressed size = 8192 corresponds to an actual polynomial size of 16384.
// twiddles for this size can't fit in constant memory, so
// butterfly operations for this level access device memory to fetch
// twiddles
if constexpr (params::degree >= 8192) {
// level 13
tid = threadIdx.x;
@@ -365,7 +358,7 @@ template <class params> __device__ void NSMFFT_inverse(double2 *A) {
(tid & (params::degree / 8192 - 1));
i2 = i1 + params::degree / 8192;
w = negtwiddles13[twid_id];
w = negtwiddles[twid_id + 4096];
u = A[i1] - A[i2];
A[i1] += A[i2];

View File

@@ -1,6 +1,6 @@
#include "cuComplex.h"
__constant__ double2 negtwiddles[4096] = {
__device__ double2 negtwiddles[8192] = {
{0, 0},
{0.707106781186547461715008466854, 0.707106781186547572737310929369},
{0.92387953251128673848313610506, 0.382683432365089781779232680492},
@@ -4096,9 +4096,7 @@ __constant__ double2 negtwiddles[4096] = {
{0.70791982920081630847874976098, 0.706292797233758484765075991163},
{-0.706292797233758484765075991163, 0.70791982920081630847874976098},
{0.00115048533711384847431913325266, 0.99999933819152553304832053982},
{-0.99999933819152553304832053982, 0.00115048533711384847431913325266}};
__device__ double2 negtwiddles13[4096] = {
{-0.99999933819152553304832053982, 0.00115048533711384847431913325266},
{0.999999981616429334252416083473, 0.000191747597310703291528452552051},
{-0.000191747597310703291528452552051, 0.999999981616429334252416083473},
{0.706971182161065359039753275283, 0.707242354213734603085583785287},

View File

@@ -2,12 +2,7 @@
#define GPU_BOOTSTRAP_TWIDDLES_CUH
/*
* 'negtwiddles' are stored in constant memory for faster access times
* because of its limited size, only twiddles for polynomial sizes up to 2^12
* can be stored there, twiddles for 2^13 are stored in device memory
* 'negtwiddles13'
* 'negtwiddles' are stored in device memory to profit from caching
*/
extern __constant__ double2 negtwiddles[4096];
extern __device__ double2 negtwiddles13[4096];
extern __device__ double2 negtwiddles[8192];
#endif

View File

@@ -0,0 +1,49 @@
#include "integer/addition.cuh"
void scratch_cuda_signed_overflowing_add_or_sub_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
uint32_t grouping_factor, uint32_t num_blocks, int8_t signed_operation,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
bool allocate_gpu_memory) {
SIGNED_OPERATION op = (signed_operation == 1) ? SIGNED_OPERATION::ADDITION
: SIGNED_OPERATION::SUBTRACTION;
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
ks_base_log, pbs_level, pbs_base_log, grouping_factor,
message_modulus, carry_modulus);
scratch_cuda_integer_signed_overflowing_add_or_sub_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_signed_overflowing_add_or_sub_memory<uint64_t> **)mem_ptr,
num_blocks, op, params, allocate_gpu_memory);
}
void cuda_signed_overflowing_add_or_sub_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, void *lhs,
void *rhs, void *overflowed, int8_t signed_operation, int8_t *mem_ptr,
void **bsks, void **ksks, uint32_t num_blocks) {
auto mem = (int_signed_overflowing_add_or_sub_memory<uint64_t> *)mem_ptr;
SIGNED_OPERATION op = (signed_operation == 1) ? SIGNED_OPERATION::ADDITION
: SIGNED_OPERATION::SUBTRACTION;
host_integer_signed_overflowing_add_or_sub_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lhs), static_cast<uint64_t *>(rhs),
static_cast<uint64_t *>(overflowed), op, bsks, (uint64_t **)(ksks), mem,
num_blocks);
}
void cleanup_signed_overflowing_add_or_sub(void **streams,
uint32_t *gpu_indexes,
uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_signed_overflowing_add_or_sub_memory<uint64_t> *mem_ptr =
(int_signed_overflowing_add_or_sub_memory<uint64_t> *)(*mem_ptr_void);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}

View File

@@ -0,0 +1,137 @@
#ifndef TFHE_RS_ADDITION_CUH
#define TFHE_RS_ADDITION_CUH
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "integer.h"
#include "integer/comparison.cuh"
#include "integer/integer.cuh"
#include "integer/negation.cuh"
#include "integer/scalar_shifts.cuh"
#include "linear_algebra.h"
#include "programmable_bootstrap.h"
#include "utils/helper.cuh"
#include "utils/kernel_dimensions.cuh"
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
template <typename Torus>
void host_resolve_signed_overflow(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *result, Torus *last_block_inner_propagation,
Torus *last_block_input_carry, Torus *last_block_output_carry,
int_resolve_signed_overflow_memory<Torus> *mem, void **bsks, Torus **ksks) {
auto x = mem->x;
Torus *d_clears =
(Torus *)cuda_malloc_async(sizeof(Torus), streams[0], gpu_indexes[0]);
cuda_set_value_async<Torus>(streams[0], gpu_indexes[0], d_clears, 2, 1);
// replace with host function call
cuda_mult_lwe_ciphertext_vector_cleartext_vector_64(
streams[0], gpu_indexes[0], x, last_block_output_carry, d_clears,
mem->params.big_lwe_dimension, 1);
host_addition(streams[0], gpu_indexes[0], last_block_inner_propagation,
last_block_inner_propagation, x, mem->params.big_lwe_dimension,
1);
host_addition(streams[0], gpu_indexes[0], last_block_inner_propagation,
last_block_inner_propagation, last_block_input_carry,
mem->params.big_lwe_dimension, 1);
host_apply_univariate_lut_kb<Torus>(streams, gpu_indexes, gpu_count, result,
last_block_inner_propagation,
mem->resolve_overflow_lut, ksks, bsks, 1);
cuda_drop_async(d_clears, streams[0], gpu_indexes[0]);
}
template <typename Torus>
__host__ void scratch_cuda_integer_signed_overflowing_add_or_sub_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_signed_overflowing_add_or_sub_memory<Torus> **mem_ptr,
uint32_t num_blocks, SIGNED_OPERATION op, int_radix_params params,
bool allocate_gpu_memory) {
*mem_ptr = new int_signed_overflowing_add_or_sub_memory<Torus>(
streams, gpu_indexes, gpu_count, params, num_blocks, op,
allocate_gpu_memory);
}
/*
* Addition - signed_operation = 1
* Subtraction - signed_operation = -1
*/
template <typename Torus>
__host__ void host_integer_signed_overflowing_add_or_sub_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lhs, Torus *rhs, Torus *overflowed, SIGNED_OPERATION op, void **bsks,
uint64_t **ksks,
int_signed_overflowing_add_or_sub_memory<uint64_t> *mem_ptr,
uint32_t num_blocks) {
auto radix_params = mem_ptr->params;
uint32_t big_lwe_dimension = radix_params.big_lwe_dimension;
uint32_t big_lwe_size = big_lwe_dimension + 1;
uint32_t big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
assert(radix_params.message_modulus >= 4 && radix_params.carry_modulus >= 4);
auto result = mem_ptr->result;
auto neg_rhs = mem_ptr->neg_rhs;
auto input_carries = mem_ptr->input_carries;
auto output_carry = mem_ptr->output_carry;
auto last_block_inner_propagation = mem_ptr->last_block_inner_propagation;
cuda_memcpy_async_gpu_to_gpu(result, lhs, num_blocks * big_lwe_size_bytes,
streams[0], gpu_indexes[0]);
// phase 1
if (op == SIGNED_OPERATION::ADDITION) {
host_addition(streams[0], gpu_indexes[0], result, lhs, rhs,
big_lwe_dimension, num_blocks);
} else {
host_integer_radix_negation(
streams, gpu_indexes, gpu_count, neg_rhs, rhs, big_lwe_dimension,
num_blocks, radix_params.message_modulus, radix_params.carry_modulus);
host_addition(streams[0], gpu_indexes[0], result, lhs, neg_rhs,
big_lwe_dimension, num_blocks);
}
// phase 2
for (uint j = 0; j < gpu_count; j++) {
cuda_synchronize_stream(streams[j], gpu_indexes[j]);
}
host_propagate_single_carry(mem_ptr->sub_streams_1, gpu_indexes, gpu_count,
result, output_carry, input_carries,
mem_ptr->scp_mem, bsks, ksks, num_blocks);
host_generate_last_block_inner_propagation(
mem_ptr->sub_streams_2, gpu_indexes, gpu_count,
last_block_inner_propagation, &lhs[(num_blocks - 1) * big_lwe_size],
&rhs[(num_blocks - 1) * big_lwe_size], mem_ptr->las_block_prop_mem, bsks,
ksks);
for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
cuda_synchronize_stream(mem_ptr->sub_streams_1[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_2[j], gpu_indexes[j]);
}
// phase 3
auto input_carry = &input_carries[(num_blocks - 1) * big_lwe_size];
host_resolve_signed_overflow(
streams, gpu_indexes, gpu_count, overflowed, last_block_inner_propagation,
input_carry, output_carry, mem_ptr->resolve_overflow_mem, bsks, ksks);
cuda_memcpy_async_gpu_to_gpu(lhs, result, num_blocks * big_lwe_size_bytes,
streams[0], gpu_indexes[0]);
}
#endif // TFHE_RS_ADDITION_CUH
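The phase 2 / phase 3 structure above is a plain fork/join on dedicated CUDA streams, the same pattern that replaces the OpenMP sections in the cmux and div_rem changes below. A minimal standalone sketch, with hypothetical kernels kernel_a and kernel_b standing in for the two independent branches:

#include <cuda_runtime.h>

__global__ void kernel_a(int *x) { *x += 1; } // stand-in for branch 1
__global__ void kernel_b(int *y) { *y += 2; } // stand-in for branch 2

// Fork two independent branches onto sub-streams, then join before the
// dependent phase resumes on the main stream.
void fork_join_sketch(cudaStream_t main_stream, cudaStream_t sub_1,
                      cudaStream_t sub_2, int *a, int *b) {
  // Work queued on the main stream must complete before the branches read
  // shared buffers (mirrors the cuda_synchronize_stream loops above).
  cudaStreamSynchronize(main_stream);
  kernel_a<<<1, 1, 0, sub_1>>>(a);
  kernel_b<<<1, 1, 0, sub_2>>>(b);
  // Join: both branches must finish before their results are combined.
  cudaStreamSynchronize(sub_1);
  cudaStreamSynchronize(sub_2);
}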


@@ -1,13 +1,13 @@
#include "integer/bitwise_ops.cuh"
void scratch_cuda_integer_radix_bitop_kb_64(
void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t lwe_ciphertext_count, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, BITOP_TYPE op_type,
bool allocate_gpu_memory) {
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
uint32_t grouping_factor, uint32_t lwe_ciphertext_count,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
BITOP_TYPE op_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
@@ -15,7 +15,7 @@ void scratch_cuda_integer_radix_bitop_kb_64(
message_modulus, carry_modulus);
scratch_cuda_integer_radix_bitop_kb<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_bitop_buffer<uint64_t> **)mem_ptr, lwe_ciphertext_count, params,
op_type, allocate_gpu_memory);
}
@@ -23,34 +23,21 @@ void scratch_cuda_integer_radix_bitop_kb_64(
void cuda_bitop_integer_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *lwe_array_out, void *lwe_array_1, void *lwe_array_2, int8_t *mem_ptr,
void *bsk, void *ksk, uint32_t lwe_ciphertext_count) {
void **bsks, void **ksks, uint32_t lwe_ciphertext_count) {
host_integer_radix_bitop_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2),
(int_bitop_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
(int_bitop_buffer<uint64_t> *)mem_ptr, bsks, (uint64_t **)(ksks),
lwe_ciphertext_count);
}
void cuda_bitnot_integer_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *lwe_array_out, void *lwe_array_in, int8_t *mem_ptr, void *bsk,
void *ksk, uint32_t lwe_ciphertext_count) {
host_integer_radix_bitnot_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_in),
(int_bitop_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
lwe_ciphertext_count);
}
void cleanup_cuda_integer_bitop(void *stream, uint32_t gpu_index,
int8_t **mem_ptr_void) {
void cleanup_cuda_integer_bitop(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void) {
int_bitop_buffer<uint64_t> *mem_ptr =
(int_bitop_buffer<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(static_cast<cudaStream_t>(stream), gpu_index);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
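Every entry point in this revision now takes per-GPU arrays in place of a single stream and key. A hedged caller-side sketch; the declaration is repeated from the file above, and the wrapper name is illustrative:

#include <cstdint>

void cuda_bitop_integer_radix_ciphertext_kb_64(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    void *lwe_array_out, void *lwe_array_1, void *lwe_array_2, int8_t *mem_ptr,
    void **bsks, void **ksks, uint32_t lwe_ciphertext_count);

// Hypothetical caller: index i of every array (stream, GPU index,
// bootstrapping key, keyswitching key) is assumed to refer to the same GPU.
void run_bitop(void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
               void *out, void *in_1, void *in_2, int8_t *mem, void **bsks,
               void **ksks, uint32_t num_blocks) {
  cuda_bitop_integer_radix_ciphertext_kb_64(streams, gpu_indexes, gpu_count,
                                            out, in_1, in_2, mem, bsks, ksks,
                                            num_blocks);
}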


@@ -16,38 +16,25 @@ __host__ void
host_integer_radix_bitop_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
Torus *lwe_array_1, Torus *lwe_array_2,
int_bitop_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
int_bitop_buffer<Torus> *mem_ptr, void **bsks,
Torus **ksks, uint32_t num_radix_blocks) {
auto lut = mem_ptr->lut;
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_1, lwe_array_2,
bsk, ksk, num_radix_blocks, lut, lut->params.message_modulus);
}
template <typename Torus>
__host__ void host_integer_radix_bitnot_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, int_bitop_buffer<Torus> *mem_ptr,
void *bsk, Torus *ksk, uint32_t num_radix_blocks) {
auto lut = mem_ptr->lut;
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_in, bsk, ksk,
num_radix_blocks, lut);
bsks, ksks, num_radix_blocks, lut, lut->params.message_modulus);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_bitop_kb(
cudaStream_t stream, uint32_t gpu_index, int_bitop_buffer<Torus> **mem_ptr,
uint32_t num_radix_blocks, int_radix_params params, BITOP_TYPE op,
bool allocate_gpu_memory) {
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_bitop_buffer<Torus> **mem_ptr, uint32_t num_radix_blocks,
int_radix_params params, BITOP_TYPE op, bool allocate_gpu_memory) {
cudaSetDevice(gpu_index);
*mem_ptr = new int_bitop_buffer<Torus>(stream, gpu_index, op, params,
num_radix_blocks, allocate_gpu_memory);
*mem_ptr =
new int_bitop_buffer<Torus>(streams, gpu_indexes, gpu_count, op, params,
num_radix_blocks, allocate_gpu_memory);
}
#endif
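The bivariate apply above packs its two operands into one block, shifting the first by message_modulus, so the bitwise operation reduces to a single univariate lookup. A plaintext model of the packing (AND chosen as an arbitrary example):

#include <cstdint>

// Plaintext model of a bivariate LUT application: inputs are packed as
// lhs * message_modulus + rhs, and the table unpacks the pair to compute
// the bit operation.
inline uint64_t bivariate_bitand_model(uint64_t lhs, uint64_t rhs,
                                       uint64_t message_modulus) {
  uint64_t packed = lhs * message_modulus + rhs; // shift-and-add packing
  uint64_t x = packed / message_modulus;         // what the table unpacks
  uint64_t y = packed % message_modulus;
  return x & y;                                  // the table's output
}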


@@ -1,12 +1,13 @@
#include "integer/cmux.cuh"
void scratch_cuda_integer_radix_cmux_kb_64(
void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t lwe_ciphertext_count, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, bool allocate_gpu_memory) {
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
uint32_t grouping_factor, uint32_t lwe_ciphertext_count,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
@@ -17,7 +18,7 @@ void scratch_cuda_integer_radix_cmux_kb_64(
[](uint64_t x) -> uint64_t { return x == 1; };
scratch_cuda_integer_radix_cmux_kb(
static_cast<cudaStream_t>(stream), gpu_index,
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_cmux_buffer<uint64_t> **)mem_ptr, predicate_lut_f,
lwe_ciphertext_count, params, allocate_gpu_memory);
}
@@ -25,7 +26,7 @@ void scratch_cuda_integer_radix_cmux_kb_64(
void cuda_cmux_integer_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *lwe_array_out, void *lwe_condition, void *lwe_array_true,
void *lwe_array_false, int8_t *mem_ptr, void *bsk, void *ksk,
void *lwe_array_false, int8_t *mem_ptr, void **bsks, void **ksks,
uint32_t lwe_ciphertext_count) {
host_integer_radix_cmux_kb<uint64_t>(
@@ -34,15 +35,16 @@ void cuda_cmux_integer_radix_ciphertext_kb_64(
static_cast<uint64_t *>(lwe_condition),
static_cast<uint64_t *>(lwe_array_true),
static_cast<uint64_t *>(lwe_array_false),
(int_cmux_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
(int_cmux_buffer<uint64_t> *)mem_ptr, bsks, (uint64_t **)(ksks),
lwe_ciphertext_count);
}
void cleanup_cuda_integer_radix_cmux(void *stream, uint32_t gpu_index,
void cleanup_cuda_integer_radix_cmux(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_cmux_buffer<uint64_t> *mem_ptr =
(int_cmux_buffer<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(static_cast<cudaStream_t>(stream), gpu_index);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}


@@ -2,15 +2,14 @@
#define CUDA_INTEGER_CMUX_CUH
#include "integer.cuh"
#include <omp.h>
template <typename Torus>
__host__ void zero_out_if(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
Torus *lwe_array_input, Torus *lwe_condition,
int_zero_out_if_buffer<Torus> *mem_ptr,
int_radix_lut<Torus> *predicate, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
int_radix_lut<Torus> *predicate, void **bsks,
Torus **ksks, uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
@@ -36,46 +35,41 @@ __host__ void zero_out_if(cudaStream_t *streams, uint32_t *gpu_indexes,
}
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out, tmp_lwe_array_input, bsk,
ksk, num_radix_blocks, predicate);
streams, gpu_indexes, gpu_count, lwe_array_out, tmp_lwe_array_input, bsks,
ksks, num_radix_blocks, predicate);
}
template <typename Torus>
__host__ void host_integer_radix_cmux_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_condition, Torus *lwe_array_true,
Torus *lwe_array_false, int_cmux_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
Torus *lwe_array_false, int_cmux_buffer<Torus> *mem_ptr, void **bsks,
Torus **ksks, uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
// Since our CPU threads will be working on different streams, we must
// ensure the work in the main stream is completed
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
auto true_stream = mem_ptr->zero_if_true_buffer->local_stream;
auto false_stream = mem_ptr->zero_if_false_buffer->local_stream;
#pragma omp parallel sections
{
// Both sections may be executed in parallel
#pragma omp section
{
auto mem_true = mem_ptr->zero_if_true_buffer;
zero_out_if(&true_stream, gpu_indexes, gpu_count, mem_ptr->tmp_true_ct,
lwe_array_true, lwe_condition, mem_true,
mem_ptr->inverted_predicate_lut, bsk, ksk, num_radix_blocks);
}
#pragma omp section
{
auto mem_false = mem_ptr->zero_if_false_buffer;
zero_out_if(&false_stream, gpu_indexes, gpu_count, mem_ptr->tmp_false_ct,
lwe_array_false, lwe_condition, mem_false,
mem_ptr->predicate_lut, bsk, ksk, num_radix_blocks);
}
auto true_streams = mem_ptr->zero_if_true_buffer->true_streams;
auto false_streams = mem_ptr->zero_if_false_buffer->false_streams;
for (uint j = 0; j < gpu_count; j++) {
cuda_synchronize_stream(streams[j], gpu_indexes[j]);
}
auto mem_true = mem_ptr->zero_if_true_buffer;
zero_out_if(true_streams, gpu_indexes, gpu_count, mem_ptr->tmp_true_ct,
lwe_array_true, lwe_condition, mem_true,
mem_ptr->inverted_predicate_lut, bsks, ksks, num_radix_blocks);
auto mem_false = mem_ptr->zero_if_false_buffer;
zero_out_if(false_streams, gpu_indexes, gpu_count, mem_ptr->tmp_false_ct,
lwe_array_false, lwe_condition, mem_false, mem_ptr->predicate_lut,
bsks, ksks, num_radix_blocks);
for (uint j = 0; j < mem_ptr->zero_if_true_buffer->active_gpu_count; j++) {
cuda_synchronize_stream(true_streams[j], gpu_indexes[j]);
}
for (uint j = 0; j < mem_ptr->zero_if_false_buffer->active_gpu_count; j++) {
cuda_synchronize_stream(false_streams[j], gpu_indexes[j]);
}
cuda_synchronize_stream(true_stream, gpu_indexes[0]);
cuda_synchronize_stream(false_stream, gpu_indexes[0]);
// If the condition was true, true_ct will have kept its value and false_ct
// will be 0. If the condition was false, true_ct will be 0 and false_ct
// will have kept its value.
@@ -86,19 +80,19 @@ __host__ void host_integer_radix_cmux_kb(
num_radix_blocks);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out, added_cts, bsk, ksk,
streams, gpu_indexes, gpu_count, lwe_array_out, added_cts, bsks, ksks,
num_radix_blocks, mem_ptr->message_extract_lut);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_cmux_kb(
cudaStream_t stream, uint32_t gpu_index, int_cmux_buffer<Torus> **mem_ptr,
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_cmux_buffer<Torus> **mem_ptr,
std::function<Torus(Torus)> predicate_lut_f, uint32_t num_radix_blocks,
int_radix_params params, bool allocate_gpu_memory) {
cudaSetDevice(gpu_index);
*mem_ptr =
new int_cmux_buffer<Torus>(stream, gpu_index, predicate_lut_f, params,
num_radix_blocks, allocate_gpu_memory);
*mem_ptr = new int_cmux_buffer<Torus>(streams, gpu_indexes, gpu_count,
predicate_lut_f, params,
num_radix_blocks, allocate_gpu_memory);
}
#endif
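A plaintext model of the selection trick the comments above describe, assuming the condition block encrypts exactly 0 or 1:

#include <cstdint>

// Plaintext model of host_integer_radix_cmux_kb: zero out the branch that
// was not taken, add the two results, then extract the message. With
// condition in {0, 1}, exactly one operand survives the zeroing.
inline uint64_t cmux_model(uint64_t condition, uint64_t ct_true,
                           uint64_t ct_false) {
  uint64_t kept_true = condition ? ct_true : 0;   // zero_out_if, inverted
  uint64_t kept_false = condition ? 0 : ct_false; // zero_out_if, direct
  return kept_true + kept_false; // message_extract_lut then cleans the noise
}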


@@ -1,13 +1,13 @@
#include "integer/comparison.cuh"
void scratch_cuda_integer_radix_comparison_kb_64(
void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t num_radix_blocks, uint32_t message_modulus, uint32_t carry_modulus,
PBS_TYPE pbs_type, COMPARISON_TYPE op_type, bool is_signed,
bool allocate_gpu_memory) {
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
uint32_t grouping_factor, uint32_t num_radix_blocks,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
COMPARISON_TYPE op_type, bool is_signed, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
@@ -18,7 +18,7 @@ void scratch_cuda_integer_radix_comparison_kb_64(
case EQ:
case NE:
scratch_cuda_integer_radix_comparison_check_kb<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_comparison_buffer<uint64_t> **)mem_ptr, num_radix_blocks, params,
op_type, false, allocate_gpu_memory);
break;
@@ -29,7 +29,7 @@ void scratch_cuda_integer_radix_comparison_kb_64(
case MAX:
case MIN:
scratch_cuda_integer_radix_comparison_check_kb<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_comparison_buffer<uint64_t> **)mem_ptr, num_radix_blocks, params,
op_type, is_signed, allocate_gpu_memory);
break;
@@ -39,7 +39,7 @@ void scratch_cuda_integer_radix_comparison_kb_64(
void cuda_comparison_integer_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *lwe_array_out, void *lwe_array_1, void *lwe_array_2, int8_t *mem_ptr,
void *bsk, void *ksk, uint32_t num_radix_blocks) {
void **bsks, void **ksks, uint32_t num_radix_blocks) {
int_comparison_buffer<uint64_t> *buffer =
(int_comparison_buffer<uint64_t> *)mem_ptr;
@@ -50,8 +50,8 @@ void cuda_comparison_integer_radix_ciphertext_kb_64(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2), buffer, bsk,
static_cast<uint64_t *>(ksk), num_radix_blocks);
static_cast<uint64_t *>(lwe_array_2), buffer, bsks, (uint64_t **)(ksks),
num_radix_blocks);
break;
case GT:
case GE:
@@ -62,7 +62,7 @@ void cuda_comparison_integer_radix_ciphertext_kb_64(
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2), buffer,
buffer->diff_buffer->operator_f, bsk, static_cast<uint64_t *>(ksk),
buffer->diff_buffer->operator_f, bsks, (uint64_t **)(ksks),
num_radix_blocks);
break;
case MAX:
@@ -71,18 +71,19 @@ void cuda_comparison_integer_radix_ciphertext_kb_64(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2), buffer, bsk,
static_cast<uint64_t *>(ksk), num_radix_blocks);
static_cast<uint64_t *>(lwe_array_2), buffer, bsks, (uint64_t **)(ksks),
num_radix_blocks);
break;
default:
PANIC("Cuda error: integer operation not supported")
}
}
void cleanup_cuda_integer_comparison(void *stream, uint32_t gpu_index,
void cleanup_cuda_integer_comparison(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_comparison_buffer<uint64_t> *mem_ptr =
(int_comparison_buffer<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(static_cast<cudaStream_t>(stream), gpu_index);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}


@@ -56,12 +56,11 @@ __host__ void accumulate_all_blocks(cudaStream_t stream, uint32_t gpu_index,
*
*/
template <typename Torus>
__host__ void
are_all_comparisons_block_true(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
Torus *lwe_array_in,
int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
__host__ void are_all_comparisons_block_true(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in,
int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
@@ -94,6 +93,8 @@ are_all_comparisons_block_true(cudaStream_t *streams, uint32_t *gpu_indexes,
// as in the worst case we will be adding `max_value` ones
auto input_blocks = tmp_out;
auto accumulator = are_all_block_true_buffer->tmp_block_accumulated;
auto is_equal_to_num_blocks_map =
&are_all_block_true_buffer->is_equal_to_lut_map;
for (int i = 0; i < num_chunks; i++) {
accumulate_all_blocks(streams[0], gpu_indexes[0], accumulator,
input_blocks, big_lwe_dimension, chunk_length);
@@ -103,8 +104,6 @@ are_all_comparisons_block_true(cudaStream_t *streams, uint32_t *gpu_indexes,
input_blocks += (big_lwe_dimension + 1) * chunk_length;
}
accumulator = are_all_block_true_buffer->tmp_block_accumulated;
auto is_equal_to_num_blocks_map =
&are_all_block_true_buffer->is_equal_to_lut_map;
// Selects a LUT
int_radix_lut<Torus> *lut;
@@ -119,7 +118,7 @@ are_all_comparisons_block_true(cudaStream_t *streams, uint32_t *gpu_indexes,
} else {
// LUT needs to be computed
auto new_lut =
new int_radix_lut<Torus>(streams[0], gpu_indexes[0], params,
new int_radix_lut<Torus>(streams, gpu_indexes, gpu_count, params,
max_value, num_radix_blocks, true);
auto is_equal_to_num_blocks_lut_f = [max_value,
@@ -127,10 +126,12 @@ are_all_comparisons_block_true(cudaStream_t *streams, uint32_t *gpu_indexes,
return (x & max_value) == chunk_length;
};
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0], new_lut->lut, glwe_dimension,
polynomial_size, message_modulus, carry_modulus,
streams[0], gpu_indexes[0], new_lut->get_lut(gpu_indexes[0], 0),
glwe_dimension, polynomial_size, message_modulus, carry_modulus,
is_equal_to_num_blocks_lut_f);
new_lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
(*is_equal_to_num_blocks_map)[chunk_length] = new_lut;
lut = new_lut;
}
@@ -140,12 +141,12 @@ are_all_comparisons_block_true(cudaStream_t *streams, uint32_t *gpu_indexes,
if (remaining_blocks == 1) {
// In the last iteration we copy the output to the final address
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out, accumulator, bsk, ksk,
1, lut);
streams, gpu_indexes, gpu_count, lwe_array_out, accumulator, bsks,
ksks, 1, lut);
return;
} else {
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, tmp_out, accumulator, bsk, ksk,
streams, gpu_indexes, gpu_count, tmp_out, accumulator, bsks, ksks,
num_chunks, lut);
}
}
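The is_equal_to_lut_map consultation above is plain memoization keyed by chunk_length. A minimal sketch of the pattern, with Lut standing in for int_radix_lut<Torus>:

#include <cstdint>
#include <map>

// Stand-in for int_radix_lut<Torus>; only the cache key matters here.
struct Lut { uint32_t chunk_length; };

// Reuse one LUT per distinct chunk length instead of regenerating it on
// every call. The real code also generates the accumulator on one GPU and
// broadcasts it to the others before caching.
Lut *get_or_build(std::map<uint32_t, Lut *> &cache, uint32_t chunk_length) {
  auto it = cache.find(chunk_length);
  if (it != cache.end())
    return it->second;              // cache hit: no device work needed
  Lut *lut = new Lut{chunk_length}; // cache miss: build (and broadcast)
  cache[chunk_length] = lut;
  return lut;
}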
@@ -161,7 +162,7 @@ template <typename Torus>
__host__ void is_at_least_one_comparisons_block_true(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in,
int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
@@ -207,13 +208,13 @@ __host__ void is_at_least_one_comparisons_block_true(
if (remaining_blocks == 1) {
// In the last iteration we copy the output to the final address
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out, accumulator, bsk, ksk,
1, lut);
streams, gpu_indexes, gpu_count, lwe_array_out, accumulator, bsks,
ksks, 1, lut);
return;
} else {
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, mem_ptr->tmp_lwe_array_out,
accumulator, bsk, ksk, num_chunks, lut);
accumulator, bsks, ksks, num_chunks, lut);
}
}
}
@@ -241,10 +242,9 @@ template <typename Torus>
__host__ void host_compare_with_zero_equality(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in,
int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
int32_t num_radix_blocks, int_radix_lut<Torus> *zero_comparison) {
cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
@@ -293,27 +293,26 @@ __host__ void host_compare_with_zero_equality(
}
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, sum, sum, bsk, ksk, num_sum_blocks,
streams, gpu_indexes, gpu_count, sum, sum, bsks, ksks, num_sum_blocks,
zero_comparison);
are_all_comparisons_block_true(streams, gpu_indexes, gpu_count, lwe_array_out,
sum, mem_ptr, bsk, ksk, num_sum_blocks);
sum, mem_ptr, bsks, ksks, num_sum_blocks);
}
template <typename Torus>
__host__ void host_integer_radix_equality_check_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_1, Torus *lwe_array_2,
int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto eq_buffer = mem_ptr->eq_buffer;
// Applies the LUT for the comparison operation
auto comparisons = mem_ptr->tmp_block_comparisons;
integer_radix_apply_bivariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, comparisons, lwe_array_1, lwe_array_2,
bsk, ksk, num_radix_blocks, eq_buffer->operator_lut,
bsks, ksks, num_radix_blocks, eq_buffer->operator_lut,
eq_buffer->operator_lut->params.message_modulus);
// This takes a Vec of blocks, where each block is either 0 or 1.
@@ -321,7 +320,7 @@ __host__ void host_integer_radix_equality_check_kb(
// It returns a block encrypting 1 if all input blocks are 1
// otherwise the block encrypts 0
are_all_comparisons_block_true(streams, gpu_indexes, gpu_count, lwe_array_out,
comparisons, mem_ptr, bsk, ksk,
comparisons, mem_ptr, bsks, ksks,
num_radix_blocks);
}
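A plaintext model of the equality pipeline above: one block-wise comparison, then the "are all blocks true" reduction. The real reduction is chunked so the accumulated sum stays within the carry budget:

#include <cstdint>
#include <vector>

// Plaintext model of host_integer_radix_equality_check_kb; assumes lhs and
// rhs hold the same number of radix blocks.
inline bool equality_model(const std::vector<uint64_t> &lhs,
                           const std::vector<uint64_t> &rhs) {
  for (size_t i = 0; i < lhs.size(); ++i)
    if (lhs[i] != rhs[i]) // operator_lut marks each block pair with 0 or 1
      return false;       // are_all_comparisons_block_true folds the marks
  return true;
}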
@@ -330,10 +329,9 @@ __host__ void
compare_radix_blocks_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
Torus *lwe_array_left, Torus *lwe_array_right,
int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
int_comparison_buffer<Torus> *mem_ptr, void **bsks,
Torus **ksks, uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
@@ -360,7 +358,7 @@ compare_radix_blocks_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
// Apply LUT to compare to 0
auto is_non_zero_lut = mem_ptr->eq_buffer->is_non_zero_lut;
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_out, bsk, ksk,
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_out, bsks, ksks,
num_radix_blocks, is_non_zero_lut);
// Add one
@@ -380,10 +378,9 @@ tree_sign_reduction(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
Torus *lwe_block_comparisons,
int_tree_sign_reduction_buffer<Torus> *tree_buffer,
std::function<Torus(Torus)> sign_handler_f, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
std::function<Torus(Torus)> sign_handler_f, void **bsks,
Torus **ksks, uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto params = tree_buffer->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto glwe_dimension = params.glwe_dimension;
@@ -413,7 +410,7 @@ tree_sign_reduction(cudaStream_t *streams, uint32_t *gpu_indexes,
partial_block_count, 4);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, x, y, bsk, ksk,
streams, gpu_indexes, gpu_count, x, y, bsks, ksks,
partial_block_count >> 1, inner_tree_leaf);
if ((partial_block_count % 2) != 0) {
@@ -451,13 +448,15 @@ tree_sign_reduction(cudaStream_t *streams, uint32_t *gpu_indexes,
y = x;
f = sign_handler_f;
}
generate_device_accumulator<Torus>(streams[0], gpu_indexes[0], last_lut->lut,
glwe_dimension, polynomial_size,
message_modulus, carry_modulus, f);
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0], last_lut->get_lut(gpu_indexes[0], 0),
glwe_dimension, polynomial_size, message_modulus, carry_modulus, f);
last_lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
// Last leaf
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, y, bsk, ksk, 1, last_lut);
integer_radix_apply_univariate_lookup_table_kb(streams, gpu_indexes,
gpu_count, lwe_array_out, y,
bsks, ksks, 1, last_lut);
}
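tree_sign_reduction halves the number of sign blocks per level: pairs are packed with a factor of 4 (each sign fits in 2 bits) and a LUT maps each packed pair back to one sign. A plaintext sketch of the halving loop; the pair ordering is an assumption here, and reduce_pair stands in for the inner_tree_leaf table:

#include <cstdint>
#include <vector>

// Plaintext model of the tree reduction: pack two 2-bit signs as hi*4 + lo,
// reduce the pair to one sign, repeat until a single block remains.
inline uint64_t tree_reduce_model(std::vector<uint64_t> signs,
                                  uint64_t (*reduce_pair)(uint64_t)) {
  while (signs.size() > 1) {
    size_t half = signs.size() / 2;
    for (size_t i = 0; i < half; ++i)
      signs[i] = reduce_pair(signs[2 * i] * 4 + signs[2 * i + 1]);
    if (signs.size() % 2 != 0) { // an odd leftover block moves up unchanged
      signs[half] = signs.back();
      ++half;
    }
    signs.resize(half);
  }
  return signs[0];
}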
template <typename Torus>
@@ -465,10 +464,9 @@ __host__ void host_integer_radix_difference_check_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_left, Torus *lwe_array_right,
int_comparison_buffer<Torus> *mem_ptr,
std::function<Torus(Torus)> reduction_lut_f, void *bsk, Torus *ksk,
std::function<Torus(Torus)> reduction_lut_f, void **bsks, Torus **ksks,
uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto diff_buffer = mem_ptr->diff_buffer;
auto params = mem_ptr->params;
@@ -500,10 +498,10 @@ __host__ void host_integer_radix_difference_check_kb(
// Clean noise
auto identity_lut = mem_ptr->identity_lut;
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, packed_left, packed_left, bsk, ksk,
streams, gpu_indexes, gpu_count, packed_left, packed_left, bsks, ksks,
packed_num_radix_blocks, identity_lut);
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, packed_right, packed_right, bsk, ksk,
streams, gpu_indexes, gpu_count, packed_right, packed_right, bsks, ksks,
packed_num_radix_blocks, identity_lut);
lhs = packed_left;
@@ -520,14 +518,15 @@ __host__ void host_integer_radix_difference_check_kb(
// Compare packed blocks, or simply the total number of radix blocks in the
// inputs
compare_radix_blocks_kb(streams, gpu_indexes, gpu_count, comparisons, lhs,
rhs, mem_ptr, bsk, ksk, packed_num_radix_blocks);
rhs, mem_ptr, bsks, ksks, packed_num_radix_blocks);
num_comparisons = packed_num_radix_blocks;
} else {
// Packing is possible
if (carry_modulus >= message_modulus) {
// Compare (num_radix_blocks - 2) / 2 packed blocks
compare_radix_blocks_kb(streams, gpu_indexes, gpu_count, comparisons, lhs,
rhs, mem_ptr, bsk, ksk, packed_num_radix_blocks);
rhs, mem_ptr, bsks, ksks,
packed_num_radix_blocks);
// Compare the last block before the sign block separately
auto identity_lut = mem_ptr->identity_lut;
@@ -538,37 +537,37 @@ __host__ void host_integer_radix_difference_check_kb(
packed_num_radix_blocks * big_lwe_size;
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, last_left_block_before_sign_block,
lwe_array_left + (num_radix_blocks - 2) * big_lwe_size, bsk, ksk, 1,
lwe_array_left + (num_radix_blocks - 2) * big_lwe_size, bsks, ksks, 1,
identity_lut);
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, last_right_block_before_sign_block,
lwe_array_right + (num_radix_blocks - 2) * big_lwe_size, bsk, ksk, 1,
identity_lut);
lwe_array_right + (num_radix_blocks - 2) * big_lwe_size, bsks, ksks,
1, identity_lut);
compare_radix_blocks_kb(
streams, gpu_indexes, gpu_count,
comparisons + packed_num_radix_blocks * big_lwe_size,
last_left_block_before_sign_block, last_right_block_before_sign_block,
mem_ptr, bsk, ksk, 1);
mem_ptr, bsks, ksks, 1);
// Compare the sign block separately
integer_radix_apply_bivariate_lookup_table_kb(
streams, gpu_indexes, gpu_count,
comparisons + (packed_num_radix_blocks + 1) * big_lwe_size,
lwe_array_left + (num_radix_blocks - 1) * big_lwe_size,
lwe_array_right + (num_radix_blocks - 1) * big_lwe_size, bsk, ksk, 1,
mem_ptr->signed_lut, mem_ptr->signed_lut->params.message_modulus);
lwe_array_right + (num_radix_blocks - 1) * big_lwe_size, bsks, ksks,
1, mem_ptr->signed_lut, mem_ptr->signed_lut->params.message_modulus);
num_comparisons = packed_num_radix_blocks + 2;
} else {
compare_radix_blocks_kb(streams, gpu_indexes, gpu_count, comparisons,
lwe_array_left, lwe_array_right, mem_ptr, bsk,
ksk, num_radix_blocks - 1);
lwe_array_left, lwe_array_right, mem_ptr, bsks,
ksks, num_radix_blocks - 1);
// Compare the sign block separately
integer_radix_apply_bivariate_lookup_table_kb(
streams, gpu_indexes, gpu_count,
comparisons + (num_radix_blocks - 1) * big_lwe_size,
lwe_array_left + (num_radix_blocks - 1) * big_lwe_size,
lwe_array_right + (num_radix_blocks - 1) * big_lwe_size, bsk, ksk, 1,
mem_ptr->signed_lut, mem_ptr->signed_lut->params.message_modulus);
lwe_array_right + (num_radix_blocks - 1) * big_lwe_size, bsks, ksks,
1, mem_ptr->signed_lut, mem_ptr->signed_lut->params.message_modulus);
num_comparisons = num_radix_blocks;
}
}
@@ -578,20 +577,19 @@ __host__ void host_integer_radix_difference_check_kb(
// final sign
tree_sign_reduction(streams, gpu_indexes, gpu_count, lwe_array_out,
comparisons, mem_ptr->diff_buffer->tree_buffer,
reduction_lut_f, bsk, ksk, num_comparisons);
reduction_lut_f, bsks, ksks, num_comparisons);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_comparison_check_kb(
cudaStream_t stream, uint32_t gpu_index,
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_comparison_buffer<Torus> **mem_ptr, uint32_t num_radix_blocks,
int_radix_params params, COMPARISON_TYPE op, bool is_signed,
bool allocate_gpu_memory) {
cudaSetDevice(gpu_index);
*mem_ptr = new int_comparison_buffer<Torus>(stream, gpu_index, op, params,
num_radix_blocks, is_signed,
allocate_gpu_memory);
*mem_ptr = new int_comparison_buffer<Torus>(streams, gpu_indexes, gpu_count,
op, params, num_radix_blocks,
is_signed, allocate_gpu_memory);
}
template <typename Torus>
@@ -599,20 +597,19 @@ __host__ void
host_integer_radix_maxmin_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
Torus *lwe_array_left, Torus *lwe_array_right,
int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t total_num_radix_blocks) {
int_comparison_buffer<Torus> *mem_ptr, void **bsks,
Torus **ksks, uint32_t total_num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
// Compute the sign
host_integer_radix_difference_check_kb(
streams, gpu_indexes, gpu_count, mem_ptr->tmp_lwe_array_out,
lwe_array_left, lwe_array_right, mem_ptr, mem_ptr->identity_lut_f, bsk,
ksk, total_num_radix_blocks);
lwe_array_left, lwe_array_right, mem_ptr, mem_ptr->identity_lut_f, bsks,
ksks, total_num_radix_blocks);
// Selector
host_integer_radix_cmux_kb(streams, gpu_indexes, gpu_count, lwe_array_out,
mem_ptr->tmp_lwe_array_out, lwe_array_left,
lwe_array_right, mem_ptr->cmux_buffer, bsk, ksk,
lwe_array_right, mem_ptr->cmux_buffer, bsks, ksks,
total_num_radix_blocks);
}
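max/min above is a two-step composition: a difference check producing an encrypted selector, then a cmux between the operands. In plaintext, for the MAX case:

#include <cstdint>

// Plaintext model of host_integer_radix_maxmin_kb for MAX: compute the
// comparison once, then select between the operands with a cmux.
inline uint64_t max_model(uint64_t lhs, uint64_t rhs) {
  uint64_t selector = (lhs >= rhs) ? 1 : 0; // difference check, reduced sign
  return selector ? lhs : rhs;              // cmux driven by the selector
}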


@@ -1,12 +1,12 @@
#include "integer/div_rem.cuh"
void scratch_cuda_integer_div_rem_radix_ciphertext_kb_64(
void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t num_blocks, uint32_t message_modulus, uint32_t carry_modulus,
PBS_TYPE pbs_type, bool allocate_gpu_memory) {
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
uint32_t grouping_factor, uint32_t num_blocks, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
@@ -14,72 +14,29 @@ void scratch_cuda_integer_div_rem_radix_ciphertext_kb_64(
message_modulus, carry_modulus);
scratch_cuda_integer_div_rem_kb<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_div_rem_memory<uint64_t> **)mem_ptr, num_blocks, params,
allocate_gpu_memory);
}
void cuda_integer_div_rem_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, void *quotient,
void *remainder, void *numerator, void *divisor, int8_t *mem_ptr, void *bsk,
void *ksk, uint32_t num_blocks) {
void *remainder, void *numerator, void *divisor, int8_t *mem_ptr,
void **bsks, void **ksks, uint32_t num_blocks) {
auto mem = (int_div_rem_memory<uint64_t> *)mem_ptr;
switch (mem->params.polynomial_size) {
case 512:
host_integer_div_rem_kb<uint64_t, Degree<512>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsk, static_cast<uint64_t *>(ksk), mem, num_blocks);
break;
case 1024:
host_integer_div_rem_kb<uint64_t, Degree<1024>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsk, static_cast<uint64_t *>(ksk), mem, num_blocks);
break;
case 2048:
host_integer_div_rem_kb<uint64_t, Degree<2048>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsk, static_cast<uint64_t *>(ksk), mem, num_blocks);
break;
case 4096:
host_integer_div_rem_kb<uint64_t, Degree<4096>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsk, static_cast<uint64_t *>(ksk), mem, num_blocks);
break;
case 8192:
host_integer_div_rem_kb<uint64_t, Degree<8192>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsk, static_cast<uint64_t *>(ksk), mem, num_blocks);
break;
case 16384:
host_integer_div_rem_kb<uint64_t, Degree<16384>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsk, static_cast<uint64_t *>(ksk), mem, num_blocks);
break;
default:
PANIC("Cuda error (integer div_rem): unsupported polynomial size. "
"Only N = 512, 1024, 2048, 4096, 8192, 16384 is supported")
}
host_integer_div_rem_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(quotient), static_cast<uint64_t *>(remainder),
static_cast<uint64_t *>(numerator), static_cast<uint64_t *>(divisor),
bsks, (uint64_t **)(ksks), mem, num_blocks);
}
void cleanup_cuda_integer_div_rem(void *stream, uint32_t gpu_index,
int8_t **mem_ptr_void) {
void cleanup_cuda_integer_div_rem(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void) {
int_div_rem_memory<uint64_t> *mem_ptr =
(int_div_rem_memory<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(static_cast<cudaStream_t>(stream), gpu_index);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}


@@ -14,7 +14,6 @@
#include "utils/kernel_dimensions.cuh"
#include <fstream>
#include <iostream>
#include <omp.h>
#include <sstream>
#include <string>
#include <vector>
@@ -31,17 +30,13 @@ template <typename Torus> struct lwe_ciphertext_list {
int_radix_params params;
size_t big_lwe_size;
size_t radix_size;
size_t big_lwe_size_bytes;
size_t radix_size_bytes;
size_t big_lwe_dimension;
lwe_ciphertext_list(Torus *src, int_radix_params params, size_t max_blocks)
: data(src), params(params), max_blocks(max_blocks) {
big_lwe_size = params.big_lwe_dimension + 1;
big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
radix_size = max_blocks * big_lwe_size;
radix_size_bytes = radix_size * sizeof(Torus);
big_lwe_dimension = params.big_lwe_dimension;
len = max_blocks;
}
@@ -164,22 +159,21 @@ template <typename Torus> struct lwe_ciphertext_list {
};
template <typename Torus>
__host__ void
scratch_cuda_integer_div_rem_kb(cudaStream_t stream, uint32_t gpu_index,
int_div_rem_memory<Torus> **mem_ptr,
uint32_t num_blocks, int_radix_params params,
bool allocate_gpu_memory) {
__host__ void scratch_cuda_integer_div_rem_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_div_rem_memory<Torus> **mem_ptr, uint32_t num_blocks,
int_radix_params params, bool allocate_gpu_memory) {
*mem_ptr = new int_div_rem_memory<Torus>(stream, gpu_index, params,
num_blocks, allocate_gpu_memory);
*mem_ptr = new int_div_rem_memory<Torus>(
streams, gpu_indexes, gpu_count, params, num_blocks, allocate_gpu_memory);
}
template <typename Torus, class params>
template <typename Torus>
__host__ void
host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *quotient, Torus *remainder,
Torus *numerator, Torus *divisor, void *bsk,
uint64_t *ksk, int_div_rem_memory<uint64_t> *mem_ptr,
Torus *numerator, Torus *divisor, void **bsks,
uint64_t **ksks, int_div_rem_memory<uint64_t> *mem_ptr,
uint32_t num_blocks) {
auto radix_params = mem_ptr->params;
@@ -290,7 +284,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, interesting_divisor.last_block(),
interesting_divisor.last_block(), bsk, ksk, 1,
interesting_divisor.last_block(), bsks, ksks, 1,
mem_ptr->masking_luts_1[shifted_mask]);
}; // trim_last_interesting_divisor_bits
@@ -318,7 +312,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, divisor_ms_blocks.first_block(),
divisor_ms_blocks.first_block(), bsk, ksk, 1,
divisor_ms_blocks.first_block(), bsks, ksks, 1,
mem_ptr->masking_luts_2[shifted_mask]);
}; // trim_first_divisor_ms_bits
@@ -342,7 +336,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
host_integer_radix_logical_scalar_shift_kb_inplace(
streams, gpu_indexes, gpu_count, interesting_remainder1.data, 1,
mem_ptr->shift_mem_1, bsk, ksk, interesting_remainder1.len);
mem_ptr->shift_mem_1, bsks, ksks, interesting_remainder1.len);
tmp_radix.clone_from(interesting_remainder1, 0,
interesting_remainder1.len - 1, streams[0],
@@ -371,41 +365,30 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
[&](cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count) {
host_integer_radix_logical_scalar_shift_kb_inplace(
streams, gpu_indexes, gpu_count, interesting_remainder2.data, 1,
mem_ptr->shift_mem_2, bsk, ksk, interesting_remainder2.len);
mem_ptr->shift_mem_2, bsks, ksks, interesting_remainder2.len);
}; // left_shift_interesting_remainder2
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
#pragma omp parallel sections
{
#pragma omp section
{
// interesting_divisor
trim_last_interesting_divisor_bits(&mem_ptr->sub_stream_1,
&gpu_indexes[0], 1);
}
#pragma omp section
{
// divisor_ms_blocks
trim_first_divisor_ms_bits(&mem_ptr->sub_stream_2, &gpu_indexes[0], 1);
}
#pragma omp section
{
// interesting_remainder1
// numerator_block_stack
left_shift_interesting_remainder1(&mem_ptr->sub_stream_3,
&gpu_indexes[0], 1);
}
#pragma omp section
{
// interesting_remainder2
left_shift_interesting_remainder2(&mem_ptr->sub_stream_4,
&gpu_indexes[0], 1);
}
for (uint j = 0; j < gpu_count; j++) {
cuda_synchronize_stream(streams[j], gpu_indexes[j]);
}
// interesting_divisor
trim_last_interesting_divisor_bits(mem_ptr->sub_streams_1, gpu_indexes,
gpu_count);
// divisor_ms_blocks
trim_first_divisor_ms_bits(mem_ptr->sub_streams_2, gpu_indexes, gpu_count);
// interesting_remainder1
// numerator_block_stack
left_shift_interesting_remainder1(mem_ptr->sub_streams_3, gpu_indexes,
gpu_count);
// interesting_remainder2
left_shift_interesting_remainder2(mem_ptr->sub_streams_4, gpu_indexes,
gpu_count);
for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
cuda_synchronize_stream(mem_ptr->sub_streams_1[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_2[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_3[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_4[j], gpu_indexes[j]);
}
cuda_synchronize_stream(mem_ptr->sub_stream_1, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_2, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_3, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_4, gpu_indexes[0]);
// if interesting_remainder1 != 0 -> interesting_remainder2 == 0
// if interesting_remainder1 == 0 -> interesting_remainder2 != 0
@@ -435,10 +418,10 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
// `subtraction_overflowed` - single ciphertext
auto do_overflowing_sub = [&](cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count) {
host_integer_overflowing_sub_kb<Torus, params>(
host_integer_overflowing_sub_kb<Torus>(
streams, gpu_indexes, gpu_count, new_remainder.data,
subtraction_overflowed.data, merged_interesting_remainder.data,
interesting_divisor.data, bsk, ksk, mem_ptr->overflow_sub_mem,
interesting_divisor.data, bsks, ksks, mem_ptr->overflow_sub_mem,
merged_interesting_remainder.len);
};
@@ -458,7 +441,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
// So we can skip some stuff
host_compare_with_zero_equality(
streams, gpu_indexes, gpu_count, tmp_1.data, trivial_blocks.data,
mem_ptr->comparison_buffer, bsk, ksk, trivial_blocks.len,
mem_ptr->comparison_buffer, bsks, ksks, trivial_blocks.len,
mem_ptr->comparison_buffer->eq_buffer->is_non_zero_lut);
tmp_1.len =
@@ -467,7 +450,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
is_at_least_one_comparisons_block_true(
streams, gpu_indexes, gpu_count,
at_least_one_upper_block_is_non_zero.data, tmp_1.data,
mem_ptr->comparison_buffer, bsk, ksk, tmp_1.len);
mem_ptr->comparison_buffer, bsks, ksks, tmp_1.len);
}
};
@@ -480,36 +463,28 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
integer_radix_apply_univariate_lookup_table_kb(
streams, gpu_indexes, gpu_count,
cleaned_merged_interesting_remainder.data,
cleaned_merged_interesting_remainder.data, bsk, ksk,
cleaned_merged_interesting_remainder.data, bsks, ksks,
cleaned_merged_interesting_remainder.len,
mem_ptr->message_extract_lut_1);
};
// phase 2
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
#pragma omp parallel sections
{
#pragma omp section
{
// new_remainder
// subtraction_overflowed
do_overflowing_sub(&mem_ptr->sub_stream_1, &gpu_indexes[0], 1);
}
#pragma omp section
{
// at_least_one_upper_block_is_non_zero
check_divisor_upper_blocks(&mem_ptr->sub_stream_2, &gpu_indexes[0], 1);
}
#pragma omp section
{
// cleaned_merged_interesting_remainder
create_clean_version_of_merged_remainder(&mem_ptr->sub_stream_3,
&gpu_indexes[0], 1);
}
for (uint j = 0; j < gpu_count; j++) {
cuda_synchronize_stream(streams[j], gpu_indexes[j]);
}
// new_remainder
// subtraction_overflowed
do_overflowing_sub(mem_ptr->sub_streams_1, gpu_indexes, gpu_count);
// at_least_one_upper_block_is_non_zero
check_divisor_upper_blocks(mem_ptr->sub_streams_2, gpu_indexes, gpu_count);
// cleaned_merged_interesting_remainder
create_clean_version_of_merged_remainder(mem_ptr->sub_streams_3,
gpu_indexes, gpu_count);
for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
cuda_synchronize_stream(mem_ptr->sub_streams_1[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_2[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_3[j], gpu_indexes[j]);
}
cuda_synchronize_stream(mem_ptr->sub_stream_1, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_2, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_3, gpu_indexes[0]);
host_addition(streams[0], gpu_indexes[0], overflow_sum.data,
subtraction_overflowed.data,
@@ -528,7 +503,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
streams, gpu_indexes, gpu_count,
cleaned_merged_interesting_remainder.data,
cleaned_merged_interesting_remainder.data,
overflow_sum_radix.data, bsk, ksk,
overflow_sum_radix.data, bsks, ksks,
cleaned_merged_interesting_remainder.len,
mem_ptr->zero_out_if_overflow_did_not_happen[factor_lut_id],
factor);
@@ -538,7 +513,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
[&](cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count) {
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, new_remainder.data,
new_remainder.data, overflow_sum_radix.data, bsk, ksk,
new_remainder.data, overflow_sum_radix.data, bsks, ksks,
new_remainder.len,
mem_ptr->zero_out_if_overflow_happened[factor_lut_id], factor);
};
@@ -548,7 +523,7 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, did_not_overflow.data,
subtraction_overflowed.data,
at_least_one_upper_block_is_non_zero.data, bsk, ksk, 1,
at_least_one_upper_block_is_non_zero.data, bsks, ksks, 1,
mem_ptr->merge_overflow_flags_luts[pos_in_block],
mem_ptr->merge_overflow_flags_luts[pos_in_block]
->params.message_modulus);
@@ -559,30 +534,22 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
did_not_overflow.data, radix_params.big_lwe_dimension, 1);
};
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
#pragma omp parallel sections
{
#pragma omp section
{
// cleaned_merged_interesting_remainder
conditionally_zero_out_merged_interesting_remainder(
&mem_ptr->sub_stream_1, &gpu_indexes[0], 1);
}
#pragma omp section
{
// new_remainder
conditionally_zero_out_merged_new_remainder(&mem_ptr->sub_stream_2,
&gpu_indexes[0], 1);
}
#pragma omp section
{
// quotient
set_quotient_bit(&mem_ptr->sub_stream_3, &gpu_indexes[0], 1);
}
for (uint j = 0; j < gpu_count; j++) {
cuda_synchronize_stream(streams[j], gpu_indexes[j]);
}
// cleaned_merged_interesting_remainder
conditionally_zero_out_merged_interesting_remainder(mem_ptr->sub_streams_1,
gpu_indexes, gpu_count);
// new_remainder
conditionally_zero_out_merged_new_remainder(mem_ptr->sub_streams_2,
gpu_indexes, gpu_count);
// quotient
set_quotient_bit(mem_ptr->sub_streams_3, gpu_indexes, gpu_count);
for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
cuda_synchronize_stream(mem_ptr->sub_streams_1[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_2[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_3[j], gpu_indexes[j]);
}
cuda_synchronize_stream(mem_ptr->sub_stream_1, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_2, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_3, gpu_indexes[0]);
assert(first_trivial_block - 1 == cleaned_merged_interesting_remainder.len);
assert(first_trivial_block - 1 == new_remainder.len);
@@ -601,24 +568,19 @@ host_integer_div_rem_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
remainder2.data, radix_params.big_lwe_dimension,
remainder1.len);
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
#pragma omp parallel sections
{
#pragma omp section
{
integer_radix_apply_univariate_lookup_table_kb(
&mem_ptr->sub_stream_1, &gpu_indexes[0], 1, remainder, remainder, bsk,
ksk, num_blocks, mem_ptr->message_extract_lut_1);
}
#pragma omp section
{
integer_radix_apply_univariate_lookup_table_kb(
&mem_ptr->sub_stream_2, &gpu_indexes[0], 1, quotient, quotient, bsk,
ksk, num_blocks, mem_ptr->message_extract_lut_2);
}
for (uint j = 0; j < gpu_count; j++) {
cuda_synchronize_stream(streams[j], gpu_indexes[j]);
}
integer_radix_apply_univariate_lookup_table_kb(
mem_ptr->sub_streams_1, gpu_indexes, gpu_count, remainder, remainder,
bsks, ksks, num_blocks, mem_ptr->message_extract_lut_1);
integer_radix_apply_univariate_lookup_table_kb(
mem_ptr->sub_streams_2, gpu_indexes, gpu_count, quotient, quotient, bsks,
ksks, num_blocks, mem_ptr->message_extract_lut_2);
for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
cuda_synchronize_stream(mem_ptr->sub_streams_1[j], gpu_indexes[j]);
cuda_synchronize_stream(mem_ptr->sub_streams_2[j], gpu_indexes[j]);
}
cuda_synchronize_stream(mem_ptr->sub_stream_1, gpu_indexes[0]);
cuda_synchronize_stream(mem_ptr->sub_stream_2, gpu_indexes[0]);
}
#endif // TFHE_RS_DIV_REM_CUH
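A plaintext model of the shift-and-subtract loop the function above implements homomorphically, one numerator bit per iteration. This is only a sketch: the encrypted version must always perform the subtraction, read the overflow flag, and zero out the branch that did not happen, instead of taking an early exit:

#include <cstdint>

// Plaintext restoring division; assumes divisor != 0. Each successful trial
// subtraction sets one quotient bit (compare set_quotient_bit above).
inline void div_rem_model(uint64_t numerator, uint64_t divisor,
                          uint64_t &quotient, uint64_t &remainder) {
  quotient = 0;
  remainder = 0;
  for (int bit = 63; bit >= 0; --bit) {
    remainder = (remainder << 1) | ((numerator >> bit) & 1); // shift bit in
    if (remainder >= divisor) { // i.e. the trial subtraction did not overflow
      remainder -= divisor;     // keep new_remainder
      quotient |= uint64_t{1} << bit;
    }
  }
}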


@@ -1,127 +1,52 @@
#include "integer/integer.cuh"
#include <linear_algebra.h>
void cuda_full_propagation_64_inplace(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *input_blocks, int8_t *mem_ptr, void *ksk, void *bsk,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t ks_base_log, uint32_t ks_level, uint32_t pbs_base_log,
uint32_t pbs_level, uint32_t grouping_factor, uint32_t num_blocks) {
void cuda_full_propagation_64_inplace(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count, void *input_blocks,
int8_t *mem_ptr, void **ksks, void **bsks,
uint32_t num_blocks) {
switch (polynomial_size) {
case 256:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<256>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 512:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<512>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 1024:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<1024>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 2048:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<2048>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 4096:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<4096>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 8192:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<8192>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 16384:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<16384>>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
default:
PANIC("Cuda error (full propagation inplace): unsupported polynomial size. "
"Supported N's are powers of two"
" in the interval [256..16384].")
}
int_fullprop_buffer<uint64_t> *buffer =
(int_fullprop_buffer<uint64_t> *)mem_ptr;
host_full_propagate_inplace<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(input_blocks), buffer, (uint64_t **)(ksks), bsks,
num_blocks);
}
void scratch_cuda_full_propagation_64(
void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t grouping_factor, uint32_t input_lwe_ciphertext_count,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
bool allocate_gpu_memory) {
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t ks_level, uint32_t ks_base_log, uint32_t pbs_level,
uint32_t pbs_base_log, uint32_t grouping_factor, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
glwe_dimension * polynomial_size, lwe_dimension,
ks_level, ks_base_log, pbs_level, pbs_base_log,
grouping_factor, message_modulus, carry_modulus);
scratch_cuda_full_propagation<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
(int_fullprop_buffer<uint64_t> **)mem_ptr, lwe_dimension, glwe_dimension,
polynomial_size, level_count, grouping_factor, input_lwe_ciphertext_count,
message_modulus, carry_modulus, pbs_type, allocate_gpu_memory);
(cudaStream_t *)streams, gpu_indexes, gpu_count,
(int_fullprop_buffer<uint64_t> **)mem_ptr, params, allocate_gpu_memory);
}
void cleanup_cuda_full_propagation(void *stream, uint32_t gpu_index,
int8_t **mem_ptr_void) {
void cleanup_cuda_full_propagation(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count, int8_t **mem_ptr_void) {
int_fullprop_buffer<uint64_t> *mem_ptr =
(int_fullprop_buffer<uint64_t> *)(*mem_ptr_void);
auto s = static_cast<cudaStream_t>(stream);
cuda_drop_async(mem_ptr->lut_buffer, s, gpu_index);
cuda_drop_async(mem_ptr->lut_indexes, s, gpu_index);
cuda_drop_async(mem_ptr->lwe_indexes, s, gpu_index);
cuda_drop_async(mem_ptr->tmp_small_lwe_vector, s, gpu_index);
cuda_drop_async(mem_ptr->tmp_big_lwe_vector, s, gpu_index);
switch (mem_ptr->pbs_type) {
case CLASSICAL: {
auto x = (pbs_buffer<uint64_t, CLASSICAL> *)(mem_ptr->pbs_buffer);
x->release(s, gpu_index);
} break;
case MULTI_BIT: {
auto x = (pbs_buffer<uint64_t, MULTI_BIT> *)(mem_ptr->pbs_buffer);
x->release(s, gpu_index);
} break;
default:
PANIC("Cuda error (PBS): unsupported implementation variant.")
}
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
void scratch_cuda_propagate_single_carry_kb_64_inplace(
void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t num_blocks, uint32_t message_modulus, uint32_t carry_modulus,
PBS_TYPE pbs_type, bool allocate_gpu_memory) {
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
uint32_t grouping_factor, uint32_t num_blocks, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
@@ -129,35 +54,49 @@ void scratch_cuda_propagate_single_carry_kb_64_inplace(
message_modulus, carry_modulus);
scratch_cuda_propagate_single_carry_kb_inplace(
static_cast<cudaStream_t>(stream), gpu_index,
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_sc_prop_memory<uint64_t> **)mem_ptr, num_blocks, params,
allocate_gpu_memory);
}
void cuda_propagate_single_carry_kb_64_inplace(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, void *lwe_array,
    void *carry_out, int8_t *mem_ptr, void **bsks, void **ksks,
    uint32_t num_blocks) {

  host_propagate_single_carry<uint64_t>(
      (cudaStream_t *)(streams), gpu_indexes, gpu_count,
      static_cast<uint64_t *>(lwe_array), static_cast<uint64_t *>(carry_out),
      nullptr, (int_sc_prop_memory<uint64_t> *)mem_ptr, bsks,
      (uint64_t **)(ksks), num_blocks);
}
void cuda_propagate_single_carry_get_input_carries_kb_64_inplace(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, void *lwe_array,
void *carry_out, void *input_carries, int8_t *mem_ptr, void **bsks,
void **ksks, uint32_t num_blocks) {
host_propagate_single_carry<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array), static_cast<uint64_t *>(carry_out),
static_cast<uint64_t *>(input_carries),
(int_sc_prop_memory<uint64_t> *)mem_ptr, bsks, (uint64_t **)(ksks),
num_blocks);
}
void cleanup_cuda_propagate_single_carry(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_sc_prop_memory<uint64_t> *mem_ptr =
(int_sc_prop_memory<uint64_t> *)(*mem_ptr_void);
  mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
void scratch_cuda_apply_univariate_lut_kb_64(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
    void *input_lut, uint32_t lwe_dimension, uint32_t glwe_dimension,
    uint32_t polynomial_size, uint32_t ks_level, uint32_t ks_base_log,
    uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
    uint32_t num_radix_blocks, uint32_t message_modulus, uint32_t carry_modulus,
    PBS_TYPE pbs_type, bool allocate_gpu_memory) {

  int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
                          glwe_dimension * polynomial_size, lwe_dimension,
                          ks_level, ks_base_log, pbs_level, pbs_base_log,
                          grouping_factor, message_modulus, carry_modulus);

  scratch_cuda_apply_univariate_lut_kb<uint64_t>(
      (cudaStream_t *)(streams), gpu_indexes, gpu_count,
      (int_radix_lut<uint64_t> **)mem_ptr, static_cast<uint64_t *>(input_lut),
      num_radix_blocks, params, allocate_gpu_memory);
}
void cuda_apply_univariate_lut_kb_64(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count, void *output_radix_lwe,
void *input_radix_lwe, int8_t *mem_ptr,
                                     void **ksks, void **bsks,
uint32_t num_blocks) {
host_apply_univariate_lut_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(output_radix_lwe),
static_cast<uint64_t *>(input_radix_lwe),
      (int_radix_lut<uint64_t> *)mem_ptr, (uint64_t **)(ksks), bsks,
num_blocks);
}
void cleanup_cuda_apply_univariate_lut_kb_64(void **streams,
uint32_t *gpu_indexes,
uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_radix_lut<uint64_t> *mem_ptr = (int_radix_lut<uint64_t> *)(*mem_ptr_void);
  mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
void scratch_cuda_apply_bivariate_lut_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
void *input_lut, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t num_radix_blocks, uint32_t message_modulus, uint32_t carry_modulus,
PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
glwe_dimension * polynomial_size, lwe_dimension,
ks_level, ks_base_log, pbs_level, pbs_base_log,
grouping_factor, message_modulus, carry_modulus);
scratch_cuda_apply_bivariate_lut_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_radix_lut<uint64_t> **)mem_ptr, static_cast<uint64_t *>(input_lut),
num_radix_blocks, params, allocate_gpu_memory);
}
void cuda_apply_bivariate_lut_kb_64(void **streams, uint32_t *gpu_indexes,
uint32_t gpu_count, void *output_radix_lwe,
void *input_radix_lwe_1,
void *input_radix_lwe_2, int8_t *mem_ptr,
void **ksks, void **bsks,
uint32_t num_blocks, uint32_t shift) {
host_apply_bivariate_lut_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(output_radix_lwe),
static_cast<uint64_t *>(input_radix_lwe_1),
static_cast<uint64_t *>(input_radix_lwe_2),
(int_radix_lut<uint64_t> *)mem_ptr, (uint64_t **)(ksks), bsks, num_blocks,
shift);
}
void cleanup_cuda_apply_bivariate_lut_kb_64(void **streams,
uint32_t *gpu_indexes,
uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_radix_lut<uint64_t> *mem_ptr = (int_radix_lut<uint64_t> *)(*mem_ptr_void);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
void scratch_cuda_integer_compute_prefix_sum_hillis_steele_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
void *input_lut, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t num_radix_blocks, uint32_t message_modulus, uint32_t carry_modulus,
PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
glwe_dimension * polynomial_size, lwe_dimension,
ks_level, ks_base_log, pbs_level, pbs_base_log,
grouping_factor, message_modulus, carry_modulus);
scratch_cuda_apply_bivariate_lut_kb<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_radix_lut<uint64_t> **)mem_ptr, static_cast<uint64_t *>(input_lut),
num_radix_blocks, params, allocate_gpu_memory);
}
void cuda_integer_compute_prefix_sum_hillis_steele_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *output_radix_lwe, void *input_radix_lwe, int8_t *mem_ptr, void **ksks,
void **bsks, uint32_t num_blocks, uint32_t shift) {
int_radix_params params = ((int_radix_lut<uint64_t> *)mem_ptr)->params;
host_compute_prefix_sum_hillis_steele<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(output_radix_lwe),
static_cast<uint64_t *>(input_radix_lwe), params,
(int_radix_lut<uint64_t> *)mem_ptr, bsks, (uint64_t **)(ksks),
num_blocks);
}
void cleanup_cuda_integer_compute_prefix_sum_hillis_steele_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_radix_lut<uint64_t> *mem_ptr = (int_radix_lut<uint64_t> *)(*mem_ptr_void);
mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
void cuda_integer_reverse_blocks_64_inplace(void **streams,
uint32_t *gpu_indexes,
uint32_t gpu_count, void *lwe_array,
uint32_t num_blocks,
uint32_t lwe_size) {
host_radix_blocks_reverse_inplace<uint64_t>(
(cudaStream_t *)(streams), gpu_indexes,
static_cast<uint64_t *>(lwe_array), num_blocks, lwe_size);
}
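// Illustrative call sequence for the scratch/compute/cleanup triples above
// (a sketch, not part of this change; the stream, key, and size variables
// are placeholders). Every entry point now takes arrays of streams and GPU
// indexes instead of a single stream/GPU pair:
//
//   int8_t *mem = nullptr;
//   scratch_cuda_apply_univariate_lut_kb_64(
//       streams, gpu_indexes, gpu_count, &mem, h_lut, lwe_dimension,
//       glwe_dimension, polynomial_size, ks_level, ks_base_log, pbs_level,
//       pbs_base_log, grouping_factor, num_blocks, message_modulus,
//       carry_modulus, MULTI_BIT, true);
//   cuda_apply_univariate_lut_kb_64(streams, gpu_indexes, gpu_count, d_out,
//                                   d_in, mem, ksks, bsks, num_blocks);
//   cleanup_cuda_apply_univariate_lut_kb_64(streams, gpu_indexes, gpu_count,
//                                           &mem);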

@@ -3,6 +3,7 @@
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "helper_multi_gpu.h"
#include "integer.h"
#include "integer/scalar_addition.cuh"
#include "linear_algebra.h"
@@ -10,6 +11,7 @@
#include "polynomial/functions.cuh"
#include "programmable_bootstrap.h"
#include "utils/helper.cuh"
#include "utils/helper_multi_gpu.cuh"
#include "utils/kernel_dimensions.cuh"
#include <functional>
@@ -20,18 +22,19 @@ template <typename Torus>
__global__ void radix_blocks_rotate_right(Torus *dst, Torus *src,
                                          uint32_t value, uint32_t blocks_count,
                                          uint32_t lwe_size) {
  size_t tid = threadIdx.x;

  if (tid < lwe_size) {
    value %= blocks_count;
    size_t src_block_id = blockIdx.x;
    size_t dst_block_id = (src_block_id + value) % blocks_count;
    size_t stride = blockDim.x;

    auto cur_src_block = &src[src_block_id * lwe_size];
    auto cur_dst_block = &dst[dst_block_id * lwe_size];

    for (size_t i = tid; i < lwe_size; i += stride) {
      cur_dst_block[i] = cur_src_block[i];
    }
  }
}
@@ -42,25 +45,28 @@ template <typename Torus>
__global__ void radix_blocks_rotate_left(Torus *dst, Torus *src, uint32_t value,
                                         uint32_t blocks_count,
                                         uint32_t lwe_size) {
  size_t tid = threadIdx.x;

  if (tid < lwe_size) {
    value %= blocks_count;
    size_t src_block_id = blockIdx.x;

    size_t dst_block_id = (src_block_id >= value)
                              ? src_block_id - value
                              : src_block_id - value + blocks_count;
    size_t stride = blockDim.x;

    auto cur_src_block = &src[src_block_id * lwe_size];
    auto cur_dst_block = &dst[dst_block_id * lwe_size];

    for (size_t i = tid; i < lwe_size; i += stride) {
      cur_dst_block[i] = cur_src_block[i];
    }
  }
}
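// Rotation semantics on a plain host array (a minimal analogue of the two
// kernels above, for illustration only): rotating right by `value` sends
// block i to slot (i + value) % blocks_count, so [b0, b1, b2, b3] rotated
// right by 1 becomes [b3, b0, b1, b2]; the left rotation is its inverse.
template <typename T>
void rotate_blocks_right_host(T *dst, const T *src, uint32_t value,
                              uint32_t blocks_count, uint32_t lwe_size) {
  value %= blocks_count;
  for (uint32_t b = 0; b < blocks_count; b++) {
    uint32_t dst_b = (b + value) % blocks_count;
    for (uint32_t i = 0; i < lwe_size; i++)
      dst[dst_b * lwe_size + i] = src[b * lwe_size + i];
  }
}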
// rotate radix ciphertext right by `value` blocks
// the computation is not in place, so `dst` and `src` must not alias
// one cuda block processes a single lwe ciphertext
template <typename Torus>
__host__ void
host_radix_blocks_rotate_right(cudaStream_t *streams, uint32_t *gpu_indexes,
@@ -72,7 +78,7 @@ host_radix_blocks_rotate_right(cudaStream_t *streams, uint32_t *gpu_indexes,
"pointers should be different");
}
cudaSetDevice(gpu_indexes[0]);
  radix_blocks_rotate_right<<<blocks_count, 1024, 0, streams[0]>>>(
dst, src, value, blocks_count, lwe_size);
}
@@ -89,10 +95,39 @@ host_radix_blocks_rotate_left(cudaStream_t *streams, uint32_t *gpu_indexes,
"pointers should be different");
}
cudaSetDevice(gpu_indexes[0]);
  radix_blocks_rotate_left<<<blocks_count, 1024, 0, streams[0]>>>(
dst, src, value, blocks_count, lwe_size);
}
// reverse the blocks in a list
// each cuda block swaps one pair of blocks
template <typename Torus>
__global__ void radix_blocks_reverse_lwe_inplace(Torus *src,
uint32_t blocks_count,
uint32_t lwe_size) {
size_t idx = blockIdx.x;
size_t rev_idx = blocks_count - 1 - idx;
for (int j = threadIdx.x; j < lwe_size; j += blockDim.x) {
Torus back_element = src[rev_idx * lwe_size + j];
Torus front_element = src[idx * lwe_size + j];
src[idx * lwe_size + j] = back_element;
src[rev_idx * lwe_size + j] = front_element;
}
}
template <typename Torus>
__host__ void
host_radix_blocks_reverse_inplace(cudaStream_t *streams, uint32_t *gpu_indexes,
Torus *src, uint32_t blocks_count,
uint32_t lwe_size) {
cudaSetDevice(gpu_indexes[0]);
int num_blocks = blocks_count / 2, num_threads = 1024;
radix_blocks_reverse_lwe_inplace<<<num_blocks, num_threads, 0, streams[0]>>>(
src, blocks_count, lwe_size);
}
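// Host-side analogue of the reversal kernel above (illustration only): cuda
// block k swaps blocks k and blocks_count - 1 - k, which is why
// blocks_count / 2 cuda blocks suffice; with an odd blocks_count the middle
// block maps to itself and is left untouched.
template <typename T>
void reverse_blocks_host(T *src, uint32_t blocks_count, uint32_t lwe_size) {
  for (uint32_t k = 0; k < blocks_count / 2; k++) {
    uint32_t r = blocks_count - 1 - k;
    for (uint32_t i = 0; i < lwe_size; i++) {
      T tmp = src[k * lwe_size + i];
      src[k * lwe_size + i] = src[r * lwe_size + i];
      src[r * lwe_size + i] = tmp;
    }
  }
}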
// polynomial_size threads
template <typename Torus>
__global__ void
@@ -138,7 +173,7 @@ __host__ void pack_bivariate_blocks(cudaStream_t *streams,
template <typename Torus>
__host__ void integer_radix_apply_univariate_lookup_table_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    Torus *lwe_array_out, Torus *lwe_array_in, void **bsks, Torus **ksks,
uint32_t num_radix_blocks, int_radix_lut<Torus> *lut) {
// apply_lookup_table
auto params = lut->params;
@@ -153,30 +188,77 @@ __host__ void integer_radix_apply_univariate_lookup_table_kb(
auto polynomial_size = params.polynomial_size;
auto grouping_factor = params.grouping_factor;
/// For multi GPU execution we create vectors of pointers for inputs and
/// outputs
std::vector<Torus *> lwe_array_in_vec = lut->lwe_array_in_vec;
std::vector<Torus *> lwe_after_ks_vec = lut->lwe_after_ks_vec;
std::vector<Torus *> lwe_after_pbs_vec = lut->lwe_after_pbs_vec;
std::vector<Torus *> lwe_trivial_indexes_vec = lut->lwe_trivial_indexes_vec;
auto active_gpu_count = get_active_gpu_count(num_radix_blocks, gpu_count);
if (active_gpu_count == 1) {
execute_keyswitch_async<Torus>(streams, gpu_indexes, 1, lwe_after_ks_vec[0],
lwe_trivial_indexes_vec[0], lwe_array_in,
lut->lwe_indexes_in, ksks, big_lwe_dimension,
small_lwe_dimension, ks_base_log, ks_level,
num_radix_blocks);
/// Apply PBS to apply a LUT, reduce the noise and go from a small LWE
/// dimension to a big LWE dimension
execute_pbs_async<Torus>(
streams, gpu_indexes, 1, lwe_array_out, lut->lwe_indexes_out,
lut->lut_vec, lut->lut_indexes_vec, lwe_after_ks_vec[0],
lwe_trivial_indexes_vec[0], bsks, lut->buffer, glwe_dimension,
small_lwe_dimension, polynomial_size, pbs_base_log, pbs_level,
grouping_factor, num_radix_blocks, pbs_type);
} else {
/// Make sure all data that should be on GPU 0 is indeed there
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
    /// With multiple GPUs we scatter the input to per-GPU buffers; when we
    /// gather the data back to GPU 0 we restore the original indexing
multi_gpu_scatter_lwe_async<Torus>(
streams, gpu_indexes, active_gpu_count, lwe_array_in_vec, lwe_array_in,
lut->h_lwe_indexes_in, lut->using_trivial_lwe_indexes, num_radix_blocks,
big_lwe_dimension + 1);
/// Apply KS to go from a big LWE dimension to a small LWE dimension
execute_keyswitch_async<Torus>(streams, gpu_indexes, active_gpu_count,
lwe_after_ks_vec, lwe_trivial_indexes_vec,
lwe_array_in_vec, lwe_trivial_indexes_vec,
ksks, big_lwe_dimension, small_lwe_dimension,
ks_base_log, ks_level, num_radix_blocks);
/// Apply PBS to apply a LUT, reduce the noise and go from a small LWE
/// dimension to a big LWE dimension
execute_pbs_async<Torus>(
streams, gpu_indexes, active_gpu_count, lwe_after_pbs_vec,
lwe_trivial_indexes_vec, lut->lut_vec, lut->lut_indexes_vec,
lwe_after_ks_vec, lwe_trivial_indexes_vec, bsks, lut->buffer,
glwe_dimension, small_lwe_dimension, polynomial_size, pbs_base_log,
pbs_level, grouping_factor, num_radix_blocks, pbs_type);
/// Copy data back to GPU 0 and release vecs
multi_gpu_gather_lwe_async<Torus>(streams, gpu_indexes, active_gpu_count,
lwe_array_out, lwe_after_pbs_vec,
lut->h_lwe_indexes_out,
lut->using_trivial_lwe_indexes,
num_radix_blocks, big_lwe_dimension + 1);
/// Synchronize all GPUs
for (uint i = 0; i < active_gpu_count; i++) {
cuda_synchronize_stream(streams[i], gpu_indexes[i]);
}
}
}
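// Shape of the multi-GPU branch above, in pseudo-steps (a sketch; the even
// block split shown here is an assumption, and the real scatter/gather also
// honors non-trivial lwe_indexes mappings):
//   1. scatter: GPU g receives roughly num_radix_blocks / active_gpu_count
//      input blocks in its own buffer (lwe_array_in_vec[g])
//   2. each GPU runs keyswitch then PBS on its share, with trivial indexes
//   3. gather: results are copied back to GPU 0 in the original block order
//   4. all streams are synchronized before the caller may touch the output
inline uint32_t blocks_for_gpu(uint32_t num_blocks, uint32_t g,
                               uint32_t active_gpus) {
  uint32_t base = num_blocks / active_gpus;
  return base + (g < num_blocks % active_gpus ? 1 : 0);
}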
template <typename Torus>
__host__ void integer_radix_apply_bivariate_lookup_table_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    Torus *lwe_array_out, Torus *lwe_array_1, Torus *lwe_array_2, void **bsks,
    Torus **ksks, uint32_t num_radix_blocks, int_radix_lut<Torus> *lut,
uint32_t shift) {
cudaSetDevice(gpu_indexes[0]);
// apply_lookup_table_bivariate
auto params = lut->params;
auto pbs_type = params.pbs_type;
auto big_lwe_dimension = params.big_lwe_dimension;
@@ -188,7 +270,6 @@ __host__ void integer_radix_apply_bivariate_lookup_table_kb(
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
auto grouping_factor = params.grouping_factor;
auto message_modulus = params.message_modulus;
// Left message is shifted
auto lwe_array_pbs_in = lut->tmp_lwe_before_ks;
@@ -198,20 +279,64 @@ __host__ void integer_radix_apply_bivariate_lookup_table_kb(
num_radix_blocks);
check_cuda_error(cudaGetLastError());
// Apply LUT
/// For multi GPU execution we create vectors of pointers for inputs and
/// outputs
std::vector<Torus *> lwe_array_in_vec = lut->lwe_array_in_vec;
std::vector<Torus *> lwe_after_ks_vec = lut->lwe_after_ks_vec;
std::vector<Torus *> lwe_after_pbs_vec = lut->lwe_after_pbs_vec;
std::vector<Torus *> lwe_trivial_indexes_vec = lut->lwe_trivial_indexes_vec;
auto active_gpu_count = get_active_gpu_count(num_radix_blocks, gpu_count);
if (active_gpu_count == 1) {
execute_keyswitch_async<Torus>(streams, gpu_indexes, 1, lwe_after_ks_vec[0],
lwe_trivial_indexes_vec[0], lwe_array_pbs_in,
lut->lwe_indexes_in, ksks, big_lwe_dimension,
small_lwe_dimension, ks_base_log, ks_level,
num_radix_blocks);
/// Apply PBS to apply a LUT, reduce the noise and go from a small LWE
/// dimension to a big LWE dimension
execute_pbs_async<Torus>(
streams, gpu_indexes, 1, lwe_array_out, lut->lwe_indexes_out,
lut->lut_vec, lut->lut_indexes_vec, lwe_after_ks_vec[0],
lwe_trivial_indexes_vec[0], bsks, lut->buffer, glwe_dimension,
small_lwe_dimension, polynomial_size, pbs_base_log, pbs_level,
grouping_factor, num_radix_blocks, pbs_type);
} else {
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
multi_gpu_scatter_lwe_async<Torus>(
streams, gpu_indexes, active_gpu_count, lwe_array_in_vec,
lwe_array_pbs_in, lut->h_lwe_indexes_in, lut->using_trivial_lwe_indexes,
num_radix_blocks, big_lwe_dimension + 1);
/// Apply KS to go from a big LWE dimension to a small LWE dimension
execute_keyswitch_async<Torus>(streams, gpu_indexes, active_gpu_count,
lwe_after_ks_vec, lwe_trivial_indexes_vec,
lwe_array_in_vec, lwe_trivial_indexes_vec,
ksks, big_lwe_dimension, small_lwe_dimension,
ks_base_log, ks_level, num_radix_blocks);
/// Apply PBS to apply a LUT, reduce the noise and go from a small LWE
/// dimension to a big LWE dimension
execute_pbs_async<Torus>(
streams, gpu_indexes, active_gpu_count, lwe_after_pbs_vec,
lwe_trivial_indexes_vec, lut->lut_vec, lut->lut_indexes_vec,
lwe_after_ks_vec, lwe_trivial_indexes_vec, bsks, lut->buffer,
glwe_dimension, small_lwe_dimension, polynomial_size, pbs_base_log,
pbs_level, grouping_factor, num_radix_blocks, pbs_type);
/// Copy data back to GPU 0 and release vecs
multi_gpu_gather_lwe_async<Torus>(streams, gpu_indexes, active_gpu_count,
lwe_array_out, lwe_after_pbs_vec,
lut->h_lwe_indexes_out,
lut->using_trivial_lwe_indexes,
num_radix_blocks, big_lwe_dimension + 1);
/// Synchronize all GPUs
for (uint i = 0; i < active_gpu_count; i++) {
cuda_synchronize_stream(streams[i], gpu_indexes[i]);
}
}
}
// Rotates the slice in-place such that the first mid elements of the slice move
@@ -308,7 +433,6 @@ void generate_device_accumulator_bivariate(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t message_modulus,
uint32_t carry_modulus, std::function<Torus(Torus, Torus)> f) {
cudaSetDevice(gpu_index);
// host lut
Torus *h_lut =
(Torus *)malloc((glwe_dimension + 1) * polynomial_size * sizeof(Torus));
@@ -317,15 +441,15 @@ void generate_device_accumulator_bivariate(
generate_lookup_table_bivariate<Torus>(h_lut, glwe_dimension, polynomial_size,
message_modulus, carry_modulus, f);
  // copy host lut and lut_indexes_vec to device
  cuda_memcpy_async_to_gpu(acc_bivariate, h_lut,
                           (glwe_dimension + 1) * polynomial_size *
                               sizeof(Torus),
                           stream, gpu_index);
  cuda_synchronize_stream(stream, gpu_index);
  free(h_lut);
}
/*
@@ -341,7 +465,6 @@ void generate_device_accumulator_bivariate_with_factor(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t message_modulus,
uint32_t carry_modulus, std::function<Torus(Torus, Torus)> f, int factor) {
cudaSetDevice(gpu_index);
// host lut
Torus *h_lut =
(Torus *)malloc((glwe_dimension + 1) * polynomial_size * sizeof(Torus));
@@ -351,15 +474,15 @@ void generate_device_accumulator_bivariate_with_factor(
h_lut, glwe_dimension, polynomial_size, message_modulus, carry_modulus, f,
factor);
  // copy host lut and lut_indexes_vec to device
  cuda_memcpy_async_to_gpu(acc_bivariate, h_lut,
                           (glwe_dimension + 1) * polynomial_size *
                               sizeof(Torus),
                           stream, gpu_index);
  cuda_synchronize_stream(stream, gpu_index);
  free(h_lut);
}
/*
@@ -377,7 +500,6 @@ void generate_device_accumulator(cudaStream_t stream, uint32_t gpu_index,
uint32_t carry_modulus,
std::function<Torus(Torus)> f) {
cudaSetDevice(gpu_index);
// host lut
Torus *h_lut =
(Torus *)malloc((glwe_dimension + 1) * polynomial_size * sizeof(Torus));
@@ -386,32 +508,70 @@ void generate_device_accumulator(cudaStream_t stream, uint32_t gpu_index,
generate_lookup_table<Torus>(h_lut, glwe_dimension, polynomial_size,
message_modulus, carry_modulus, f);
  // copy host lut and lut_indexes_vec to device
  cuda_memcpy_async_to_gpu(
      acc, h_lut, (glwe_dimension + 1) * polynomial_size * sizeof(Torus),
      stream, gpu_index);
  cuda_synchronize_stream(stream, gpu_index);
  free(h_lut);
}
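// Usage sketch for generate_device_accumulator (stream, gpu_index and the
// destination pointers are placeholders): the full propagation path uses
// exactly this pair of LUTs, one extracting the message part of a block and
// one extracting its carry part.
//
//   auto lut_f_message = [message_modulus](uint64_t x) -> uint64_t {
//     return x % message_modulus;
//   };
//   auto lut_f_carry = [message_modulus](uint64_t x) -> uint64_t {
//     return x / message_modulus;
//   };
//   generate_device_accumulator<uint64_t>(stream, gpu_index, lut_message,
//                                         glwe_dimension, polynomial_size,
//                                         message_modulus, carry_modulus,
//                                         lut_f_message);
//   generate_device_accumulator<uint64_t>(stream, gpu_index, lut_carry,
//                                         glwe_dimension, polynomial_size,
//                                         message_modulus, carry_modulus,
//                                         lut_f_carry);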
template <typename Torus>
void scratch_cuda_propagate_single_carry_kb_inplace(
    cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    int_sc_prop_memory<Torus> **mem_ptr, uint32_t num_radix_blocks,
    int_radix_params params, bool allocate_gpu_memory) {

  *mem_ptr =
      new int_sc_prop_memory<Torus>(streams, gpu_indexes, gpu_count, params,
                                    num_radix_blocks, allocate_gpu_memory);
}
template <typename Torus>
void host_compute_prefix_sum_hillis_steele(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *step_output, Torus *generates_or_propagates, int_radix_params params,
int_radix_lut<Torus> *luts, void **bsks, Torus **ksks,
uint32_t num_blocks) {
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
auto big_lwe_size = glwe_dimension * polynomial_size + 1;
auto big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
int num_steps = ceil(log2((double)num_blocks));
int space = 1;
cuda_memcpy_async_gpu_to_gpu(step_output, generates_or_propagates,
big_lwe_size_bytes * num_blocks, streams[0],
gpu_indexes[0]);
for (int step = 0; step < num_steps; step++) {
if (space > num_blocks - 1)
PANIC("Cuda error: step output is going out of bounds in Hillis Steele "
"propagation")
auto cur_blocks = &step_output[space * big_lwe_size];
auto prev_blocks = generates_or_propagates;
int cur_total_blocks = num_blocks - space;
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, cur_blocks, cur_blocks, prev_blocks,
bsks, ksks, cur_total_blocks, luts, luts->params.message_modulus);
cuda_memcpy_async_gpu_to_gpu(
&generates_or_propagates[space * big_lwe_size], cur_blocks,
big_lwe_size_bytes * cur_total_blocks, streams[0], gpu_indexes[0]);
space *= 2;
}
}
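// Cleartext model of the Hillis & Steele scan above (an analogue only: in
// the encrypted version `op` is a bivariate LUT and each step costs one
// keyswitch + PBS over num_blocks - space blocks). After ceil(log2(n))
// doubling steps, element i holds op(x_0, ..., x_i) for any associative op.
template <typename T, typename Op>
void hillis_steele_scan_host(T *data, int n, Op op) {
  for (int space = 1; space < n; space *= 2) {
    // Walk downwards so data[i - space] still holds the previous step's
    // value when it is read.
    for (int i = n - 1; i >= space; i--)
      data[i] = op(data[i], data[i - space]);
  }
}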
template <typename Torus>
void host_propagate_single_carry(cudaStream_t *streams, uint32_t *gpu_indexes,
                                 uint32_t gpu_count, Torus *lwe_array,
                                 Torus *carry_out, Torus *input_carries,
                                 int_sc_prop_memory<Torus> *mem, void **bsks,
                                 Torus **ksks, uint32_t num_blocks) {
auto params = mem->params;
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
@@ -426,58 +586,58 @@ void host_propagate_single_carry(cudaStream_t *streams, uint32_t *gpu_indexes,
auto message_acc = mem->message_acc;
integer_radix_apply_univariate_lookup_table_kb<Torus>(
      streams, gpu_indexes, gpu_count, generates_or_propagates, lwe_array, bsks,
      ksks, num_blocks, luts_array);
// compute prefix sum with hillis&steele
host_compute_prefix_sum_hillis_steele(
streams, gpu_indexes, gpu_count, step_output, generates_or_propagates,
params, luts_carry_propagation_sum, bsks, ksks, num_blocks);
cudaSetDevice(gpu_indexes[0]);
host_radix_blocks_rotate_right(streams, gpu_indexes, gpu_count, step_output,
generates_or_propagates, 1, num_blocks,
big_lwe_size);
if (carry_out != nullptr) {
cuda_memcpy_async_gpu_to_gpu(carry_out, step_output, big_lwe_size_bytes,
streams[0], gpu_indexes[0]);
}
cuda_memset_async(step_output, 0, big_lwe_size_bytes, streams[0],
gpu_indexes[0]);
if (input_carries != nullptr) {
cuda_memcpy_async_gpu_to_gpu(input_carries, step_output,
big_lwe_size_bytes * num_blocks, streams[0],
gpu_indexes[0]);
}
host_addition(streams[0], gpu_indexes[0], lwe_array, lwe_array, step_output,
glwe_dimension * polynomial_size, num_blocks);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
      streams, gpu_indexes, gpu_count, lwe_array, lwe_array, bsks, ksks,
num_blocks, message_acc);
}
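// Cleartext model of the carry resolution above (an analogue, not library
// code): each block is first classified as GENERATE (it produces a carry),
// PROPAGATE (it forwards an incoming carry) or NEITHER, and the Hillis &
// Steele scan combines these states so position i learns whether a carry
// reaches it; a final per-block LUT then reduces each block to its message.
enum carry_state { CARRY_NONE = 0, CARRY_GENERATE = 1, CARRY_PROPAGATE = 2 };

inline carry_state combine_carry_states(carry_state cur, carry_state prev) {
  // A propagating block resolves to whatever its predecessor decided.
  return cur == CARRY_PROPAGATE ? prev : cur;
}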
template <typename Torus>
void host_generate_last_block_inner_propagation(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *last_block_inner_propagation, Torus *lhs, Torus *rhs,
int_last_block_inner_propagate_memory<Torus> *mem, void **bsks,
Torus **ksks) {
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, last_block_inner_propagation, lhs, rhs,
bsks, ksks, 1, mem->last_block_inner_propagation_lut,
mem->params.message_modulus);
}
template <typename Torus>
void host_propagate_single_sub_borrow(cudaStream_t *streams,
uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *overflowed, Torus *lwe_array,
                                      int_overflowing_sub_memory<Torus> *mem,
                                      void **bsks, Torus **ksks,
uint32_t num_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto params = mem->params;
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
@@ -492,31 +652,13 @@ void host_propagate_single_sub_borrow(cudaStream_t *streams,
auto message_acc = mem->message_acc;
integer_radix_apply_univariate_lookup_table_kb<Torus>(
      streams, gpu_indexes, gpu_count, generates_or_propagates, lwe_array, bsks,
      ksks, num_blocks, luts_array);
// compute prefix sum with hillis&steele
host_compute_prefix_sum_hillis_steele<Torus>(
streams, gpu_indexes, gpu_count, step_output, generates_or_propagates,
params, luts_carry_propagation_sum, bsks, ksks, num_blocks);
cuda_memcpy_async_gpu_to_gpu(
overflowed, &generates_or_propagates[big_lwe_size * (num_blocks - 1)],
@@ -532,54 +674,52 @@ void host_propagate_single_sub_borrow(cudaStream_t *streams,
step_output, glwe_dimension * polynomial_size, num_blocks);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
      streams, gpu_indexes, gpu_count, lwe_array, lwe_array, bsks, ksks,
num_blocks, message_acc);
}
/*
 * input_blocks: input radix ciphertext, the propagation happens in place
 * acc_message_carry: list of two luts, [(message_acc), (carry_acc)]
 * lut_indexes_message_carry: lut_indexes_vec for message and carry, should
 *   always be {0, 1}
 * small_lwe_vector: output of the keyswitch, should have size
 *   2 * (lwe_dimension + 1) * sizeof(Torus)
 * big_lwe_vector: output of the pbs, should have size
 *   2 * (glwe_dimension * polynomial_size + 1) * sizeof(Torus)
 */
template <typename Torus>
void host_full_propagate_inplace(cudaStream_t *streams, uint32_t *gpu_indexes,
                                 uint32_t gpu_count, Torus *input_blocks,
                                 int_fullprop_buffer<Torus> *mem_ptr,
                                 Torus **ksks, void **bsks,
                                 uint32_t num_blocks) {
  auto params = mem_ptr->lut->params;

  cudaSetDevice(gpu_indexes[0]);
  int big_lwe_size = (params.glwe_dimension * params.polynomial_size + 1);
  int small_lwe_size = (params.small_lwe_dimension + 1);
for (int i = 0; i < num_blocks; i++) {
auto cur_input_block = &input_blocks[i * big_lwe_size];
/// Since the keyswitch is done on one input only, use only 1 GPU
execute_keyswitch_async<Torus>(
streams, gpu_indexes, 1, mem_ptr->tmp_small_lwe_vector,
mem_ptr->lut->lwe_trivial_indexes, cur_input_block,
mem_ptr->lut->lwe_trivial_indexes, ksks, params.big_lwe_dimension,
params.small_lwe_dimension, params.ks_base_log, params.ks_level, 1);
cuda_memcpy_async_gpu_to_gpu(&mem_ptr->tmp_small_lwe_vector[small_lwe_size],
mem_ptr->tmp_small_lwe_vector,
small_lwe_size * sizeof(Torus), streams[0],
gpu_indexes[0]);
execute_pbs_async<Torus>(
streams, gpu_indexes, 1, mem_ptr->tmp_big_lwe_vector,
mem_ptr->lut->lwe_trivial_indexes, mem_ptr->lut->lut_vec,
mem_ptr->lut->lut_indexes_vec, mem_ptr->tmp_small_lwe_vector,
mem_ptr->lut->lwe_trivial_indexes, bsks, mem_ptr->lut->buffer,
params.glwe_dimension, params.small_lwe_dimension,
params.polynomial_size, params.pbs_base_log, params.pbs_level,
params.grouping_factor, 2, params.pbs_type);
cuda_memcpy_async_gpu_to_gpu(cur_input_block, mem_ptr->tmp_big_lwe_vector,
big_lwe_size * sizeof(Torus), streams[0],
@@ -590,108 +730,20 @@ void host_full_propagate_inplace(cudaStream_t *streams, uint32_t *gpu_indexes,
host_addition(streams[0], gpu_indexes[0], next_input_block,
next_input_block,
&mem_ptr->tmp_big_lwe_vector[big_lwe_size],
                    params.big_lwe_dimension, 1);
}
}
}
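// Cleartext model of the loop above (illustration only): full propagation
// splits each block into message and carry and folds the carry into the
// next block, which is what the two LUTs evaluated by the PBS compute
// homomorphically.
inline void full_propagate_host(uint64_t *blocks, int num_blocks,
                                uint64_t message_modulus) {
  for (int i = 0; i < num_blocks; i++) {
    uint64_t carry = blocks[i] / message_modulus;
    blocks[i] %= message_modulus;
    if (i < num_blocks - 1)
      blocks[i + 1] += carry;
  }
}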
template <typename Torus>
void scratch_cuda_full_propagation(cudaStream_t *streams,
                                   uint32_t *gpu_indexes, uint32_t gpu_count,
                                   int_fullprop_buffer<Torus> **mem_ptr,
                                   int_radix_params params,
                                   bool allocate_gpu_memory) {

  *mem_ptr = new int_fullprop_buffer<Torus>(streams, gpu_indexes, gpu_count,
                                            params, allocate_gpu_memory);
}
// (lwe_dimension+1) threads
@@ -736,11 +788,12 @@ __host__ void pack_blocks(cudaStream_t stream, uint32_t gpu_index,
Torus *lwe_array_out, Torus *lwe_array_in,
uint32_t lwe_dimension, uint32_t num_radix_blocks,
uint32_t factor) {
if (num_radix_blocks == 0)
return;
cudaSetDevice(gpu_index);
int num_blocks = 0, num_threads = 0;
int num_entries = (lwe_dimension + 1);
  getNumBlocksAndThreads(num_entries, 1024, num_blocks, num_threads);
device_pack_blocks<<<num_blocks, num_threads, 0, stream>>>(
lwe_array_out, lwe_array_in, lwe_dimension, num_radix_blocks, factor);
}
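// Packing model (an analogue of the kernel above, not library code): two
// blocks holding values smaller than `factor` are combined linearly so a
// single LUT evaluated on the packed block can read both inputs at once;
// which operand lands in the high part follows the device kernel's
// convention.
inline uint64_t pack_pair_host(uint64_t high, uint64_t low, uint64_t factor) {
  return factor * high + low; // recover: high = p / factor, low = p % factor
}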
@@ -800,22 +853,22 @@ create_trivial_radix(cudaStream_t stream, uint32_t gpu_index,
template <typename Torus>
__host__ void extract_n_bits(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array_out,
                             Torus *lwe_array_in, void **bsks, Torus **ksks,
uint32_t num_radix_blocks, uint32_t bits_per_block,
int_bit_extract_luts_buffer<Torus> *bit_extract) {
integer_radix_apply_univariate_lookup_table_kb(
      streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_in, bsks, ksks,
num_radix_blocks * bits_per_block, bit_extract->lut);
}
template <typename Torus>
__host__ void
reduce_signs(cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
             Torus *signs_array_out, Torus *signs_array_in,
             int_comparison_buffer<Torus> *mem_ptr,
             std::function<Torus(Torus)> sign_handler_f, void **bsks,
             Torus **ksks, uint32_t num_sign_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto diff_buffer = mem_ptr->diff_buffer;
@@ -845,14 +898,16 @@ __host__ void reduce_signs(cudaStream_t *streams, uint32_t *gpu_indexes,
if (num_sign_blocks > 2) {
auto lut = diff_buffer->reduce_signs_lut;
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0], lut->get_lut(gpu_indexes[0], 0),
glwe_dimension, polynomial_size, message_modulus, carry_modulus,
reduce_two_orderings_function);
lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
while (num_sign_blocks > 2) {
pack_blocks(streams[0], gpu_indexes[0], signs_b, signs_a,
big_lwe_dimension, num_sign_blocks, 4);
integer_radix_apply_univariate_lookup_table_kb(
          streams, gpu_indexes, gpu_count, signs_a, signs_b, bsks, ksks,
num_sign_blocks / 2, lut);
auto last_block_signs_b =
@@ -877,14 +932,16 @@ __host__ void reduce_signs(cudaStream_t *streams, uint32_t *gpu_indexes,
auto lut = diff_buffer->reduce_signs_lut;
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0], lut->get_lut(gpu_indexes[0], 0),
glwe_dimension, polynomial_size, message_modulus, carry_modulus,
final_lut_f);
lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
pack_blocks(streams[0], gpu_indexes[0], signs_b, signs_a, big_lwe_dimension,
2, 4);
integer_radix_apply_univariate_lookup_table_kb(streams, gpu_indexes,
gpu_count, signs_array_out,
                                                   signs_b, bsks, ksks, 1, lut);
} else {
@@ -895,40 +952,74 @@ __host__ void reduce_signs(cudaStream_t *streams, uint32_t *gpu_indexes,
auto lut = mem_ptr->diff_buffer->reduce_signs_lut;
generate_device_accumulator<Torus>(
streams[0], gpu_indexes[0], lut->get_lut(gpu_indexes[0], 0),
glwe_dimension, polynomial_size, message_modulus, carry_modulus,
final_lut_f);
lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
integer_radix_apply_univariate_lookup_table_kb(streams, gpu_indexes,
gpu_count, signs_array_out,
                                                   signs_a, bsks, ksks, 1, lut);
}
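// Shape of the reduction above, in the clear (a sketch): while more than two
// ordering blocks remain, adjacent blocks are packed with factor 4 (an
// ordering fits on 2 bits) and one LUT call folds each packed pair into a
// single block, roughly halving the count per round; the tail then maps the
// final result through sign_handler_f.
inline int reduce_signs_rounds(int num_sign_blocks) {
  int rounds = 0;
  while (num_sign_blocks > 2) {
    num_sign_blocks -= num_sign_blocks / 2; // each round folds pairs
    rounds++;
  }
  return rounds;
}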
}
template <typename Torus>
void scratch_cuda_apply_univariate_lut_kb(
    cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    int_radix_lut<Torus> **mem_ptr, Torus *input_lut, uint32_t num_radix_blocks,
    int_radix_params params, bool allocate_gpu_memory) {

  *mem_ptr = new int_radix_lut<Torus>(streams, gpu_indexes, gpu_count, params,
                                      1, num_radix_blocks, allocate_gpu_memory);
  // It is safe to do this copy on GPU 0, because all LUTs always reside on
  // GPU 0
  cuda_memcpy_async_to_gpu((*mem_ptr)->get_lut(gpu_indexes[0], 0), input_lut,
                           (params.glwe_dimension + 1) *
                               params.polynomial_size * sizeof(Torus),
                           streams[0], gpu_indexes[0]);
  (*mem_ptr)->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
}
template <typename Torus>
void host_apply_univariate_lut_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *radix_lwe_out,
Torus *radix_lwe_in,
                                  int_radix_lut<Torus> *mem, Torus **ksks,
                                  void **bsks, uint32_t num_blocks) {
cudaSetDevice(gpu_indexes[0]);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
      streams, gpu_indexes, gpu_count, radix_lwe_out, radix_lwe_in, bsks, ksks,
num_blocks, mem);
}
template <typename Torus>
void scratch_cuda_apply_bivariate_lut_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_radix_lut<Torus> **mem_ptr, Torus *input_lut, uint32_t num_radix_blocks,
int_radix_params params, bool allocate_gpu_memory) {
*mem_ptr = new int_radix_lut<Torus>(streams, gpu_indexes, gpu_count, params,
1, num_radix_blocks, allocate_gpu_memory);
// It is safe to do this copy on GPU 0, because all LUTs always reside on GPU
// 0
cuda_memcpy_async_to_gpu((*mem_ptr)->get_lut(gpu_indexes[0], 0), input_lut,
(params.glwe_dimension + 1) *
params.polynomial_size * sizeof(Torus),
streams[0], gpu_indexes[0]);
(*mem_ptr)->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
}
template <typename Torus>
void host_apply_bivariate_lut_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
uint32_t gpu_count, Torus *radix_lwe_out,
Torus *radix_lwe_in_1, Torus *radix_lwe_in_2,
int_radix_lut<Torus> *mem, Torus **ksks,
void **bsks, uint32_t num_blocks,
uint32_t shift) {
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, radix_lwe_out, radix_lwe_in_1,
radix_lwe_in_2, bsks, ksks, num_blocks, mem, shift);
}
#endif // TFHE_RS_INTERNAL_INTEGER_CUH

@@ -66,12 +66,12 @@ void generate_ids_update_degrees(int *terms_degree, size_t *h_lwe_idx_in,
* the integer radix multiplication in keyswitch->bootstrap order.
*/
void scratch_cuda_integer_mult_radix_ciphertext_kb_64(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
uint32_t message_modulus, uint32_t carry_modulus, uint32_t glwe_dimension,
uint32_t lwe_dimension, uint32_t polynomial_size, uint32_t pbs_base_log,
uint32_t pbs_level, uint32_t ks_base_log, uint32_t ks_level,
uint32_t grouping_factor, uint32_t num_radix_blocks, PBS_TYPE pbs_type,
    bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
polynomial_size * glwe_dimension, lwe_dimension,
@@ -87,7 +87,7 @@ void scratch_cuda_integer_mult_radix_ciphertext_kb_64(
case 8192:
case 16384:
scratch_cuda_integer_mult_radix_ciphertext_kb<uint64_t>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_mul_memory<uint64_t> **)mem_ptr, num_radix_blocks, params,
allocate_gpu_memory);
break;
@@ -123,76 +123,69 @@ void scratch_cuda_integer_mult_radix_ciphertext_kb_64(
* - 'num_blocks' is the number of big lwe ciphertext blocks inside radix
* ciphertext
* - 'pbs_type' selects which PBS implementation should be used
*/
void cuda_integer_mult_radix_ciphertext_kb_64(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    void *radix_lwe_out, void *radix_lwe_left, void *radix_lwe_right,
    void **bsks, void **ksks, int8_t *mem_ptr, uint32_t polynomial_size,
    uint32_t num_blocks) {
  switch (polynomial_size) {
  case 256:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<256>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
  case 512:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<512>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
  case 1024:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<1024>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
  case 2048:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<2048>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
  case 4096:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<4096>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
  case 8192:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<8192>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
  case 16384:
    host_integer_mult_radix_kb<uint64_t, AmortizedDegree<16384>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_left),
        static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks),
        (int_mul_memory<uint64_t> *)mem_ptr, num_blocks);
    break;
default:
PANIC("Cuda error (integer multiplication): unsupported polynomial size. "
@@ -200,37 +193,38 @@ void cuda_integer_mult_radix_ciphertext_kb_64(
}
}
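// The switch above only turns the runtime polynomial_size into a
// compile-time degree so the kernels inside host_integer_mult_radix_kb can
// specialize on it. A generic dispatcher could factor out the repetition
// (a sketch, assuming every callee is parameterized the same way):
//
//   template <template <class, class> class Fn, typename... Args>
//   void dispatch_on_degree(uint32_t polynomial_size, Args &&...args) {
//     switch (polynomial_size) {
//     case 256:
//       Fn<uint64_t, AmortizedDegree<256>>::run(std::forward<Args>(args)...);
//       break;
//     case 512:
//       Fn<uint64_t, AmortizedDegree<512>>::run(std::forward<Args>(args)...);
//       break;
//     // ... and so on up to 16384, with the same default PANIC
//     }
//   }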
void cleanup_cuda_integer_mult(void **streams, uint32_t *gpu_indexes,
                               uint32_t gpu_count, int8_t **mem_ptr_void) {

  int_mul_memory<uint64_t> *mem_ptr =
      (int_mul_memory<uint64_t> *)(*mem_ptr_void);

  mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}
void scratch_cuda_integer_radix_partial_sum_ciphertexts_vec_kb_64(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
    uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t lwe_dimension,
    uint32_t ks_level, uint32_t ks_base_log, uint32_t pbs_level,
    uint32_t pbs_base_log, uint32_t grouping_factor,
    uint32_t num_blocks_in_radix, uint32_t max_num_radix_in_vec,
    uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
    bool allocate_gpu_memory) {

  int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
                          glwe_dimension * polynomial_size, lwe_dimension,
                          ks_level, ks_base_log, pbs_level, pbs_base_log,
                          grouping_factor, message_modulus, carry_modulus);

  scratch_cuda_integer_partial_sum_ciphertexts_vec_kb<uint64_t>(
      (cudaStream_t *)(streams), gpu_indexes, gpu_count,
      (int_sum_ciphertexts_vec_memory<uint64_t> **)mem_ptr, num_blocks_in_radix,
      max_num_radix_in_vec, params, allocate_gpu_memory);
}
void cuda_integer_radix_partial_sum_ciphertexts_vec_kb_64(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    void *radix_lwe_out, void *radix_lwe_vec, uint32_t num_radix_in_vec,
    int8_t *mem_ptr, void **bsks, void **ksks, uint32_t num_blocks_in_radix) {
auto mem = (int_sum_ciphertexts_vec_memory<uint64_t> *)mem_ptr;
@@ -243,52 +237,51 @@ void cuda_integer_radix_sum_ciphertexts_vec_kb_64(
  switch (mem->params.polynomial_size) {
  case 512:
    host_integer_partial_sum_ciphertexts_vec_kb<uint64_t, AmortizedDegree<512>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_vec), terms_degree, bsks,
        (uint64_t **)(ksks), mem, num_blocks_in_radix, num_radix_in_vec);
    break;
  case 1024:
    host_integer_partial_sum_ciphertexts_vec_kb<uint64_t,
                                                AmortizedDegree<1024>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_vec), terms_degree, bsks,
        (uint64_t **)(ksks), mem, num_blocks_in_radix, num_radix_in_vec);
    break;
  case 2048:
    host_integer_partial_sum_ciphertexts_vec_kb<uint64_t,
                                                AmortizedDegree<2048>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_vec), terms_degree, bsks,
        (uint64_t **)(ksks), mem, num_blocks_in_radix, num_radix_in_vec);
    break;
  case 4096:
    host_integer_partial_sum_ciphertexts_vec_kb<uint64_t,
                                                AmortizedDegree<4096>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_vec), terms_degree, bsks,
        (uint64_t **)(ksks), mem, num_blocks_in_radix, num_radix_in_vec);
    break;
  case 8192:
    host_integer_partial_sum_ciphertexts_vec_kb<uint64_t,
                                                AmortizedDegree<8192>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_vec), terms_degree, bsks,
        (uint64_t **)(ksks), mem, num_blocks_in_radix, num_radix_in_vec);
    break;
  case 16384:
    host_integer_partial_sum_ciphertexts_vec_kb<uint64_t,
                                                AmortizedDegree<16384>>(
        (cudaStream_t *)(streams), gpu_indexes, gpu_count,
        static_cast<uint64_t *>(radix_lwe_out),
        static_cast<uint64_t *>(radix_lwe_vec), terms_degree, bsks,
        (uint64_t **)(ksks), mem, num_blocks_in_radix, num_radix_in_vec);
    break;
default:
PANIC("Cuda error (integer multiplication): unsupported polynomial size. "
@@ -298,11 +291,11 @@ void cuda_integer_radix_sum_ciphertexts_vec_kb_64(
free(terms_degree);
}
void cleanup_cuda_integer_radix_partial_sum_ciphertexts_vec(
    void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
    int8_t **mem_ptr_void) {

  int_sum_ciphertexts_vec_memory<uint64_t> *mem_ptr =
      (int_sum_ciphertexts_vec_memory<uint64_t> *)(*mem_ptr_void);

  mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}

@@ -8,11 +8,13 @@
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "helper_multi_gpu.h"
#include "integer.h"
#include "integer/integer.cuh"
#include "linear_algebra.h"
#include "programmable_bootstrap.h"
#include "utils/helper.cuh"
#include "utils/helper_multi_gpu.cuh"
#include "utils/kernel_dimensions.cuh"
#include <fstream>
#include <iostream>
@@ -91,15 +93,11 @@ all_shifted_lhs_rhs(Torus *radix_lwe_left, Torus *lsb_ciphertext,
}
}
- template <typename Torus, sharedMemDegree SMD>
+ template <typename Torus>
__global__ void tree_add_chunks(Torus *result_blocks, Torus *input_blocks,
uint32_t chunk_size, uint32_t block_size,
uint32_t num_blocks) {
- extern __shared__ int8_t sharedmem[];
- Torus *result = (Torus *)sharedmem;
size_t stride = blockDim.x;
size_t chunk_id = blockIdx.x;
size_t chunk_elem_size = chunk_size * num_blocks * block_size;
@@ -107,10 +105,7 @@ __global__ void tree_add_chunks(Torus *result_blocks, Torus *input_blocks,
auto src_chunk = &input_blocks[chunk_id * chunk_elem_size];
auto dst_radix = &result_blocks[chunk_id * radix_elem_size];
size_t block_stride = blockIdx.y * block_size;
- auto dst_block = &dst_radix[block_stride];
- if constexpr (SMD == NOSM)
- result = dst_block;
+ auto result = &dst_radix[block_stride];
// init shared mem with first radix of chunk
size_t tid = threadIdx.x;
@@ -125,18 +120,12 @@ __global__ void tree_add_chunks(Torus *result_blocks, Torus *input_blocks,
result[i] += cur_src_radix[block_stride + i];
}
}
- // put result from shared mem to global mem
- if constexpr (SMD == FULLSM)
- for (int i = tid; i < block_size; i += stride)
- dst_block[i] = result[i];
}
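With the shared-memory variants gone, tree_add_chunks accumulates directly in global memory: each (chunk, block) pair sums chunk_size radix blocks element-wise. A clear-data model of that reduction, with plain uint64_t vectors standing in for LWE ciphertext blocks (illustration only, not the kernel):

#include <cstdint>
#include <vector>

// Sums all radixes of one chunk into a single radix, element-wise,
// as one (blockIdx.x, blockIdx.y) pair of tree_add_chunks does on GPU.
std::vector<uint64_t>
sum_chunk(const std::vector<std::vector<uint64_t>> &chunk, size_t block_size) {
  std::vector<uint64_t> result(block_size, 0);
  for (const auto &radix : chunk)            // one member of the chunk
    for (size_t i = 0; i < block_size; i++)
      result[i] += radix[i];                 // element-wise (LWE) addition
  return result;
}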
template <typename Torus, class params>
__global__ void fill_radix_from_lsb_msb(Torus *result_blocks, Torus *lsb_blocks,
Torus *msb_blocks,
uint32_t glwe_dimension,
- uint32_t lsb_count, uint32_t msb_count,
uint32_t num_blocks) {
size_t big_lwe_dimension = glwe_dimension * params::degree + 1;
size_t big_lwe_id = blockIdx.x;
@@ -180,41 +169,25 @@ __global__ void fill_radix_from_lsb_msb(Torus *result_blocks, Torus *lsb_blocks,
}
}
template <typename Torus>
- __host__ void scratch_cuda_integer_sum_ciphertexts_vec_kb(
- cudaStream_t stream, uint32_t gpu_index,
+ __host__ void scratch_cuda_integer_partial_sum_ciphertexts_vec_kb(
+ cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_sum_ciphertexts_vec_memory<Torus> **mem_ptr,
uint32_t num_blocks_in_radix, uint32_t max_num_radix_in_vec,
int_radix_params params, bool allocate_gpu_memory) {
- cudaSetDevice(gpu_index);
- size_t sm_size = (params.big_lwe_dimension + 1) * sizeof(Torus);
- if (sm_size < cuda_get_max_shared_memory(gpu_index)) {
- check_cuda_error(cudaFuncSetAttribute(
- tree_add_chunks<Torus, FULLSM>,
- cudaFuncAttributeMaxDynamicSharedMemorySize, sm_size));
- cudaFuncSetCacheConfig(tree_add_chunks<Torus, FULLSM>,
- cudaFuncCachePreferShared);
- check_cuda_error(cudaGetLastError());
- } else {
- check_cuda_error(
- cudaFuncSetAttribute(tree_add_chunks<Torus, NOSM>,
- cudaFuncAttributeMaxDynamicSharedMemorySize, 0));
- cudaFuncSetCacheConfig(tree_add_chunks<Torus, NOSM>, cudaFuncCachePreferL1);
- check_cuda_error(cudaGetLastError());
- }
*mem_ptr = new int_sum_ciphertexts_vec_memory<Torus>(
- stream, gpu_index, params, num_blocks_in_radix, max_num_radix_in_vec,
- allocate_gpu_memory);
+ streams, gpu_indexes, gpu_count, params, num_blocks_in_radix,
+ max_num_radix_in_vec, allocate_gpu_memory);
}
template <typename Torus, class params>
- __host__ void host_integer_sum_ciphertexts_vec_kb(
+ __host__ void host_integer_partial_sum_ciphertexts_vec_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
- Torus *radix_lwe_out, Torus *terms, int *terms_degree, void *bsk,
- uint64_t *ksk, int_sum_ciphertexts_vec_memory<uint64_t> *mem_ptr,
- uint32_t num_blocks_in_radix, uint32_t num_radix_in_vec) {
+ Torus *radix_lwe_out, Torus *terms, int *terms_degree, void **bsks,
+ uint64_t **ksks, int_sum_ciphertexts_vec_memory<uint64_t> *mem_ptr,
+ uint32_t num_blocks_in_radix, uint32_t num_radix_in_vec,
+ int_radix_lut<Torus> *reused_lut = nullptr) {
cudaSetDevice(gpu_indexes[0]);
auto new_blocks = mem_ptr->new_blocks;
auto old_blocks = mem_ptr->old_blocks;
auto small_lwe_vector = mem_ptr->small_lwe_vector;
@@ -225,11 +198,12 @@ __host__ void host_integer_sum_ciphertexts_vec_kb(
auto message_modulus = mem_ptr->params.message_modulus;
auto carry_modulus = mem_ptr->params.carry_modulus;
auto num_blocks = num_blocks_in_radix;
- auto big_lwe_size = mem_ptr->params.big_lwe_dimension + 1;
+ auto big_lwe_dimension = mem_ptr->params.big_lwe_dimension;
+ auto big_lwe_size = big_lwe_dimension + 1;
auto glwe_dimension = mem_ptr->params.glwe_dimension;
auto polynomial_size = mem_ptr->params.polynomial_size;
- auto lwe_dimension = mem_ptr->params.small_lwe_dimension;
- auto big_lwe_dimension = mem_ptr->params.big_lwe_dimension;
+ auto small_lwe_dimension = mem_ptr->params.small_lwe_dimension;
+ auto small_lwe_size = small_lwe_dimension + 1;
if (old_blocks != terms) {
cuda_memcpy_async_gpu_to_gpu(old_blocks, terms,
@@ -248,7 +222,48 @@ __host__ void host_integer_sum_ciphertexts_vec_kb(
int32_t h_smart_copy_in[r * num_blocks];
int32_t h_smart_copy_out[r * num_blocks];
- auto max_shared_memory = cuda_get_max_shared_memory(gpu_indexes[0]);
+ /// Here it is important to query the default max shared memory on device 0
+ /// instead of cuda_get_max_shared_memory,
+ /// to avoid bugs with tree_add_chunks trying to use too much shared memory
+ int max_shared_memory = 0;
+ check_cuda_error(cudaDeviceGetAttribute(
+ &max_shared_memory, cudaDevAttrMaxSharedMemoryPerBlock, 0));
+ // create lut object for message and carry
+ // we allocate luts_message_carry in the host function (instead of scratch)
+ // to reduce average memory consumption
+ int_radix_lut<Torus> *luts_message_carry;
+ size_t ch_amount = r / chunk_size;
+ if (!ch_amount)
+ ch_amount++;
+ if (reused_lut == nullptr) {
+ luts_message_carry = new int_radix_lut<Torus>(
+ streams, gpu_indexes, gpu_count, mem_ptr->params, 2,
+ 2 * ch_amount * num_blocks, true);
+ } else {
+ luts_message_carry = new int_radix_lut<Torus>(
+ streams, gpu_indexes, gpu_count, mem_ptr->params, 2,
+ 2 * ch_amount * num_blocks, reused_lut);
+ }
+ auto message_acc = luts_message_carry->get_lut(gpu_indexes[0], 0);
+ auto carry_acc = luts_message_carry->get_lut(gpu_indexes[0], 1);
+ // define functions for each accumulator
+ auto lut_f_message = [message_modulus](Torus x) -> Torus {
+ return x % message_modulus;
+ };
+ auto lut_f_carry = [message_modulus](Torus x) -> Torus {
+ return x / message_modulus;
+ };
+ // generate accumulators
+ generate_device_accumulator<Torus>(
+ streams[0], gpu_indexes[0], message_acc, glwe_dimension, polynomial_size,
+ message_modulus, carry_modulus, lut_f_message);
+ generate_device_accumulator<Torus>(
+ streams[0], gpu_indexes[0], carry_acc, glwe_dimension, polynomial_size,
+ message_modulus, carry_modulus, lut_f_carry);
+ luts_message_carry->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
while (r > 2) {
size_t cur_total_blocks = r * num_blocks;
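The two accumulators encode the usual radix split: after several blocks have been summed, each block holds message + message_modulus * carry, and the message/carry LUTs recover the two halves. A plain-integer check of the split (clear values, not ciphertexts):

#include <cassert>
#include <cstdint>

int main() {
  const uint64_t message_modulus = 4; // 2-bit messages
  uint64_t x = 11;                    // a block after several additions
  uint64_t message = x % message_modulus; // what lut_f_message extracts
  uint64_t carry = x / message_modulus;   // what lut_f_carry extracts
  assert(message == 3 && carry == 2);     // 11 = 3 + 4 * 2
  return 0;
}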
@@ -258,12 +273,9 @@ __host__ void host_integer_sum_ciphertexts_vec_kb(
dim3 add_grid(ch_amount, num_blocks, 1);
- size_t sm_size = big_lwe_size * sizeof(Torus);
- if (sm_size < max_shared_memory)
- tree_add_chunks<Torus, FULLSM><<<add_grid, 512, sm_size, streams[0]>>>(
- new_blocks, old_blocks, min(r, chunk_size), big_lwe_size, num_blocks);
- else
- tree_add_chunks<Torus, NOSM><<<add_grid, 512, 0, streams[0]>>>(
- new_blocks, old_blocks, min(r, chunk_size), big_lwe_size, num_blocks);
+ cudaSetDevice(gpu_indexes[0]);
+ tree_add_chunks<Torus><<<add_grid, 512, 0, streams[0]>>>(
+ new_blocks, old_blocks, min(r, chunk_size), big_lwe_size, num_blocks);
check_cuda_error(cudaGetLastError());
@@ -276,47 +288,22 @@ __host__ void host_integer_sum_ciphertexts_vec_kb(
terms_degree, h_lwe_idx_in, h_lwe_idx_out, h_smart_copy_in,
h_smart_copy_out, ch_amount, r, num_blocks, chunk_size, message_max,
total_count, message_count, carry_count, sm_copy_count);
- // create lut object for message and carry
- // we allocate luts_message_carry in the host function (instead of scratch)
- // to reduce average memory consumption
- auto luts_message_carry = new int_radix_lut<Torus>(
- streams[0], gpu_indexes[0], mem_ptr->params, 2, total_count, true);
- auto message_acc = luts_message_carry->get_lut(0);
- auto carry_acc = luts_message_carry->get_lut(1);
- // define functions for each accumulator
- auto lut_f_message = [message_modulus](Torus x) -> Torus {
- return x % message_modulus;
- };
- auto lut_f_carry = [message_modulus](Torus x) -> Torus {
- return x / message_modulus;
- };
- // generate accumulators
- generate_device_accumulator<Torus>(
- streams[0], gpu_indexes[0], message_acc, glwe_dimension,
- polynomial_size, message_modulus, carry_modulus, lut_f_message);
- generate_device_accumulator<Torus>(
- streams[0], gpu_indexes[0], carry_acc, glwe_dimension, polynomial_size,
- message_modulus, carry_modulus, lut_f_carry);
- cuda_synchronize_stream(streams[0], gpu_indexes[0]);
auto lwe_indexes_in = luts_message_carry->lwe_indexes_in;
auto lwe_indexes_out = luts_message_carry->lwe_indexes_out;
+ luts_message_carry->set_lwe_indexes(streams[0], gpu_indexes[0],
+ h_lwe_idx_in, h_lwe_idx_out);
- size_t copy_size = total_count * sizeof(Torus);
- cuda_memcpy_async_to_gpu(lwe_indexes_in, h_lwe_idx_in, copy_size,
- streams[0], gpu_indexes[0]);
- cuda_memcpy_async_to_gpu(lwe_indexes_out, h_lwe_idx_out, copy_size,
- streams[0], gpu_indexes[0]);
- copy_size = sm_copy_count * sizeof(int32_t);
+ size_t copy_size = sm_copy_count * sizeof(int32_t);
cuda_memcpy_async_to_gpu(d_smart_copy_in, h_smart_copy_in, copy_size,
streams[0], gpu_indexes[0]);
cuda_memcpy_async_to_gpu(d_smart_copy_out, h_smart_copy_out, copy_size,
streams[0], gpu_indexes[0]);
- smart_copy<<<sm_copy_count, 256, 0, streams[0]>>>(
+ // inside d_smart_copy_in there are only -1 values
+ // it's fine to call smart_copy with same pointer
+ // as source and destination
+ smart_copy<<<sm_copy_count, 1024, 0, streams[0]>>>(
new_blocks, new_blocks, d_smart_copy_out, d_smart_copy_in,
big_lwe_size);
check_cuda_error(cudaGetLastError());
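smart_copy compacts the partial sums using precomputed index arrays. A host-side model of the indexed block move (assuming, as the comment above suggests, that -1 encodes "leave this slot alone"; names are illustrative):

#include <cstdint>
#include <vector>

// Moves whole blocks: output slot out_idx[j] receives the block named by
// in_idx[j]; a negative source index skips the move, so source and
// destination buffers may alias safely in that case.
void smart_copy_model(std::vector<uint64_t> &dst,
                      const std::vector<uint64_t> &src,
                      const std::vector<int32_t> &out_idx,
                      const std::vector<int32_t> &in_idx, size_t block_size) {
  for (size_t j = 0; j < out_idx.size(); j++) {
    if (in_idx[j] < 0)
      continue; // -1 means "no move"
    for (size_t i = 0; i < block_size; i++)
      dst[out_idx[j] * block_size + i] = src[in_idx[j] * block_size + i];
  }
}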
@@ -324,24 +311,102 @@ __host__ void host_integer_sum_ciphertexts_vec_kb(
if (carry_count > 0)
cuda_set_value_async<Torus>(
streams[0], gpu_indexes[0],
- luts_message_carry->get_lut_indexes(message_count), 1, carry_count);
+ luts_message_carry->get_lut_indexes(gpu_indexes[0], message_count), 1,
+ carry_count);
- cuda_keyswitch_lwe_ciphertext_vector(
- streams[0], gpu_indexes[0], small_lwe_vector, lwe_indexes_in,
- new_blocks, lwe_indexes_in, ksk, polynomial_size * glwe_dimension,
- lwe_dimension, mem_ptr->params.ks_base_log, mem_ptr->params.ks_level,
- message_count);
+ luts_message_carry->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
- execute_pbs<Torus>(streams, gpu_indexes, gpu_count, new_blocks,
- lwe_indexes_out, luts_message_carry->lut,
- luts_message_carry->lut_indexes, small_lwe_vector,
- lwe_indexes_in, bsk, luts_message_carry->buffer,
- glwe_dimension, lwe_dimension, polynomial_size,
- mem_ptr->params.pbs_base_log, mem_ptr->params.pbs_level,
- mem_ptr->params.grouping_factor, total_count, 2, 0,
- max_shared_memory, mem_ptr->params.pbs_type);
+ /// For multi GPU execution we create vectors of pointers for inputs and
+ /// outputs
+ std::vector<Torus *> new_blocks_vec = luts_message_carry->lwe_array_in_vec;
+ std::vector<Torus *> small_lwe_vector_vec =
+ luts_message_carry->lwe_after_ks_vec;
+ std::vector<Torus *> lwe_after_pbs_vec =
+ luts_message_carry->lwe_after_pbs_vec;
+ std::vector<Torus *> lwe_trivial_indexes_vec =
+ luts_message_carry->lwe_trivial_indexes_vec;
- luts_message_carry->release(streams[0], gpu_indexes[0]);
auto active_gpu_count = get_active_gpu_count(total_count, gpu_count);
if (active_gpu_count == 1) {
/// Apply KS to go from a big LWE dimension to a small LWE dimension
/// After this keyswitch execution, we need to synchronize the streams
/// because the keyswitch and PBS do not operate on the same number of
/// inputs
execute_keyswitch_async<Torus>(
streams, gpu_indexes, 1, small_lwe_vector, lwe_indexes_in, new_blocks,
lwe_indexes_in, ksks, polynomial_size * glwe_dimension,
small_lwe_dimension, mem_ptr->params.ks_base_log,
mem_ptr->params.ks_level, message_count);
/// Apply PBS to apply a LUT, reduce the noise and go from a small LWE
/// dimension to a big LWE dimension
execute_pbs_async<Torus>(
streams, gpu_indexes, 1, new_blocks, lwe_indexes_out,
luts_message_carry->lut_vec, luts_message_carry->lut_indexes_vec,
small_lwe_vector, lwe_indexes_in, bsks, luts_message_carry->buffer,
glwe_dimension, small_lwe_dimension, polynomial_size,
mem_ptr->params.pbs_base_log, mem_ptr->params.pbs_level,
mem_ptr->params.grouping_factor, total_count,
mem_ptr->params.pbs_type);
} else {
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
multi_gpu_scatter_lwe_async<Torus>(
streams, gpu_indexes, active_gpu_count, new_blocks_vec, new_blocks,
luts_message_carry->h_lwe_indexes_in,
luts_message_carry->using_trivial_lwe_indexes, message_count,
big_lwe_size);
/// Apply KS to go from a big LWE dimension to a small LWE dimension
/// After this keyswitch execution, we need to synchronize the streams
/// because the keyswitch and PBS do not operate on the same number of
/// inputs
execute_keyswitch_async<Torus>(
streams, gpu_indexes, active_gpu_count, small_lwe_vector_vec,
lwe_trivial_indexes_vec, new_blocks_vec, lwe_trivial_indexes_vec,
ksks, big_lwe_dimension, small_lwe_dimension,
mem_ptr->params.ks_base_log, mem_ptr->params.ks_level, total_count);
/// Copy data back to GPU 0, rebuild the lwe array, and scatter again on a
/// different configuration
multi_gpu_gather_lwe_async<Torus>(
streams, gpu_indexes, gpu_count, small_lwe_vector,
small_lwe_vector_vec, luts_message_carry->h_lwe_indexes_in,
luts_message_carry->using_trivial_lwe_indexes, message_count,
small_lwe_size);
/// Synchronize all GPUs
for (uint i = 0; i < active_gpu_count; i++) {
cuda_synchronize_stream(streams[i], gpu_indexes[i]);
}
multi_gpu_scatter_lwe_async<Torus>(
streams, gpu_indexes, gpu_count, small_lwe_vector_vec,
small_lwe_vector, luts_message_carry->h_lwe_indexes_in,
luts_message_carry->using_trivial_lwe_indexes, total_count,
small_lwe_size);
/// Apply PBS to apply a LUT, reduce the noise and go from a small LWE
/// dimension to a big LWE dimension
execute_pbs_async<Torus>(
streams, gpu_indexes, active_gpu_count, lwe_after_pbs_vec,
lwe_trivial_indexes_vec, luts_message_carry->lut_vec,
luts_message_carry->lut_indexes_vec, small_lwe_vector_vec,
lwe_trivial_indexes_vec, bsks, luts_message_carry->buffer,
glwe_dimension, small_lwe_dimension, polynomial_size,
mem_ptr->params.pbs_base_log, mem_ptr->params.pbs_level,
mem_ptr->params.grouping_factor, total_count,
mem_ptr->params.pbs_type);
multi_gpu_gather_lwe_async<Torus>(
streams, gpu_indexes, active_gpu_count, new_blocks, lwe_after_pbs_vec,
luts_message_carry->h_lwe_indexes_out,
luts_message_carry->using_trivial_lwe_indexes, total_count,
big_lwe_size);
/// Synchronize all GPUs
for (uint i = 0; i < active_gpu_count; i++) {
cuda_synchronize_stream(streams[i], gpu_indexes[i]);
}
}
int rem_blocks = (r > chunk_size) ? r % chunk_size * num_blocks : 0;
int new_blocks_created = 2 * ch_amount * num_blocks;
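The multi-GPU branch splits total_count inputs over the active GPUs, keyswitches and bootstraps each share locally, and gathers the results back on GPU 0. A sketch of one plausible per-GPU split; the actual policy of get_active_gpu_count and the scatter helpers is internal to the library, so per_gpu_share below is an assumption for illustration:

#include <algorithm>
#include <cstdint>
#include <vector>

// Returns how many inputs each active GPU would process, assuming an
// even split with the remainder spread over the first GPUs.
std::vector<uint32_t> per_gpu_share(uint32_t total_count, uint32_t gpu_count) {
  uint32_t active = std::min(gpu_count, total_count); // active_gpu_count
  std::vector<uint32_t> share(active, total_count / active);
  for (uint32_t i = 0; i < total_count % active; i++)
    share[i]++; // first GPUs take one extra input each
  return share;
}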
@@ -354,24 +419,21 @@ __host__ void host_integer_sum_ciphertexts_vec_kb(
std::swap(new_blocks, old_blocks);
r = (new_blocks_created + rem_blocks) / num_blocks;
}
+ luts_message_carry->release(streams, gpu_indexes, gpu_count);
+ delete (luts_message_carry);
host_addition(streams[0], gpu_indexes[0], radix_lwe_out, old_blocks,
&old_blocks[num_blocks * big_lwe_size], big_lwe_dimension,
num_blocks);
- host_propagate_single_carry<Torus>(streams, gpu_indexes, gpu_count,
- radix_lwe_out, mem_ptr->scp_mem, bsk, ksk,
- num_blocks);
}
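After the loop only two radixes remain; they are added block-wise, and one carry-propagation pass (now done by the caller) brings every block back under message_modulus. The clear-integer analogue of what host_propagate_single_carry achieves homomorphically:

#include <cstdint>
#include <vector>

// Sequentially resolves carries, least-significant block first, so each
// block ends up holding only its message part.
void propagate_carries(std::vector<uint64_t> &blocks,
                       uint64_t message_modulus) {
  uint64_t carry = 0;
  for (auto &b : blocks) {
    b += carry;
    carry = b / message_modulus; // carry handed to the next block
    b %= message_modulus;        // block reduced to its message part
  }
}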
- template <typename Torus, typename STorus, class params>
+ template <typename Torus, class params>
__host__ void host_integer_mult_radix_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
uint64_t *radix_lwe_out, uint64_t *radix_lwe_left,
- uint64_t *radix_lwe_right, void *bsk, uint64_t *ksk,
+ uint64_t *radix_lwe_right, void **bsks, uint64_t **ksks,
int_mul_memory<Torus> *mem_ptr, uint32_t num_blocks) {
cudaSetDevice(gpu_indexes[0]);
auto glwe_dimension = mem_ptr->params.glwe_dimension;
auto polynomial_size = mem_ptr->params.polynomial_size;
auto lwe_dimension = mem_ptr->params.small_lwe_dimension;
@@ -438,6 +500,7 @@ __host__ void host_integer_mult_radix_kb(
dim3 grid(lsb_vector_block_count, 1, 1);
dim3 thds(params::degree / params::opt, 1, 1);
+ cudaSetDevice(gpu_indexes[0]);
all_shifted_lhs_rhs<Torus, params><<<grid, thds, 0, streams[0]>>>(
radix_lwe_left, vector_result_lsb, vector_result_msb, radix_lwe_right,
vector_lsb_rhs, vector_msb_rhs, num_blocks);
@@ -445,18 +508,18 @@ __host__ void host_integer_mult_radix_kb(
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, block_mul_res, block_mul_res,
- vector_result_sb, bsk, ksk, total_block_count, luts_array,
+ vector_result_sb, bsks, ksks, total_block_count, luts_array,
luts_array->params.message_modulus);
vector_result_lsb = &block_mul_res[0];
vector_result_msb = &block_mul_res[lsb_vector_block_count *
(polynomial_size * glwe_dimension + 1)];
+ cudaSetDevice(gpu_indexes[0]);
fill_radix_from_lsb_msb<Torus, params>
<<<num_blocks * num_blocks, params::degree / params::opt, 0,
streams[0]>>>(vector_result_sb, vector_result_lsb, vector_result_msb,
- glwe_dimension, lsb_vector_block_count,
- msb_vector_block_count, num_blocks);
+ glwe_dimension, num_blocks);
check_cuda_error(cudaGetLastError());
int terms_degree[2 * num_blocks * num_blocks];
@@ -472,35 +535,23 @@ __host__ void host_integer_mult_radix_kb(
terms_degree_msb[i] = (b_id > r_id) ? message_modulus - 2 : 0;
}
- host_integer_sum_ciphertexts_vec_kb<Torus, params>(
+ host_integer_partial_sum_ciphertexts_vec_kb<Torus, params>(
streams, gpu_indexes, gpu_count, radix_lwe_out, vector_result_sb,
- terms_degree, bsk, ksk, mem_ptr->sum_ciphertexts_mem, num_blocks,
- 2 * num_blocks);
+ terms_degree, bsks, ksks, mem_ptr->sum_ciphertexts_mem, num_blocks,
+ 2 * num_blocks, mem_ptr->luts_array);
+ auto scp_mem_ptr = mem_ptr->sum_ciphertexts_mem->scp_mem;
+ host_propagate_single_carry<Torus>(streams, gpu_indexes, gpu_count,
+ radix_lwe_out, nullptr, nullptr,
+ scp_mem_ptr, bsks, ksks, num_blocks);
}
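The multiplication follows the schoolbook pattern: every pairwise block product is split into an lsb part kept in place and an msb part shifted one block up, and the resulting partial products are fed to the partial-sum routine above. A clear-integer model (plain values, illustrative only):

#include <cstdint>
#include <vector>

// Accumulates all lsb/msb partial products like columns of schoolbook
// multiplication; the result still needs carry propagation, as above.
std::vector<uint64_t> schoolbook(const std::vector<uint64_t> &lhs,
                                 const std::vector<uint64_t> &rhs,
                                 uint64_t message_modulus) {
  size_t n = lhs.size();
  std::vector<uint64_t> acc(2 * n, 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++) {
      uint64_t p = lhs[i] * rhs[j];
      acc[i + j] += p % message_modulus;     // lsb part, same column
      acc[i + j + 1] += p / message_modulus; // msb part, shifted up
    }
  return acc;
}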
template <typename Torus>
__host__ void scratch_cuda_integer_mult_radix_ciphertext_kb(
- cudaStream_t stream, uint32_t gpu_index, int_mul_memory<Torus> **mem_ptr,
- uint32_t num_radix_blocks, int_radix_params params,
- bool allocate_gpu_memory) {
- cudaSetDevice(gpu_index);
- size_t sm_size = (params.big_lwe_dimension + 1) * sizeof(Torus);
- if (sm_size < cuda_get_max_shared_memory(gpu_index)) {
- check_cuda_error(cudaFuncSetAttribute(
- tree_add_chunks<Torus, FULLSM>,
- cudaFuncAttributeMaxDynamicSharedMemorySize, sm_size));
- cudaFuncSetCacheConfig(tree_add_chunks<Torus, FULLSM>,
- cudaFuncCachePreferShared);
- check_cuda_error(cudaGetLastError());
- } else {
- check_cuda_error(
- cudaFuncSetAttribute(tree_add_chunks<Torus, NOSM>,
- cudaFuncAttributeMaxDynamicSharedMemorySize, 0));
- cudaFuncSetCacheConfig(tree_add_chunks<Torus, NOSM>, cudaFuncCachePreferL1);
- check_cuda_error(cudaGetLastError());
- }
- *mem_ptr = new int_mul_memory<Torus>(stream, gpu_index, params,
+ cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
+ int_mul_memory<Torus> **mem_ptr, uint32_t num_radix_blocks,
+ int_radix_params params, bool allocate_gpu_memory) {
+ *mem_ptr = new int_mul_memory<Torus>(streams, gpu_indexes, gpu_count, params,
num_radix_blocks, allocate_gpu_memory);
}

View File

@@ -12,12 +12,12 @@ void cuda_negate_integer_radix_ciphertext_64_inplace(
}
void scratch_cuda_integer_radix_overflowing_sub_kb_64(
- void *stream, uint32_t gpu_index, int8_t **mem_ptr, uint32_t glwe_dimension,
- uint32_t polynomial_size, uint32_t big_lwe_dimension,
- uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
- uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
- uint32_t num_blocks, uint32_t message_modulus, uint32_t carry_modulus,
- PBS_TYPE pbs_type, bool allocate_gpu_memory) {
+ void **streams, uint32_t *gpu_indexes, uint32_t gpu_count, int8_t **mem_ptr,
+ uint32_t glwe_dimension, uint32_t polynomial_size,
+ uint32_t big_lwe_dimension, uint32_t small_lwe_dimension, uint32_t ks_level,
+ uint32_t ks_base_log, uint32_t pbs_level, uint32_t pbs_base_log,
+ uint32_t grouping_factor, uint32_t num_blocks, uint32_t message_modulus,
+ uint32_t carry_modulus, PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
@@ -25,7 +25,7 @@ void scratch_cuda_integer_radix_overflowing_sub_kb_64(
message_modulus, carry_modulus);
scratch_cuda_integer_overflowing_sub_kb<uint64_t>(
- static_cast<cudaStream_t>(stream), gpu_index,
+ (cudaStream_t *)(streams), gpu_indexes, gpu_count,
(int_overflowing_sub_memory<uint64_t> **)mem_ptr, num_blocks, params,
allocate_gpu_memory);
}
@@ -33,77 +33,26 @@ void scratch_cuda_integer_radix_overflowing_sub_kb_64(
void cuda_integer_radix_overflowing_sub_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *radix_lwe_out, void *radix_lwe_overflowed, void *radix_lwe_left,
- void *radix_lwe_right, int8_t *mem_ptr, void *bsk, void *ksk,
+ void *radix_lwe_right, int8_t *mem_ptr, void **bsks, void **ksks,
uint32_t num_blocks) {
auto mem = (int_overflowing_sub_memory<uint64_t> *)mem_ptr;
- switch (mem->params.polynomial_size) {
- case 512:
- host_integer_overflowing_sub_kb<uint64_t, AmortizedDegree<512>>(
- (cudaStream_t *)(streams), gpu_indexes, gpu_count,
- static_cast<uint64_t *>(radix_lwe_out),
- static_cast<uint64_t *>(radix_lwe_overflowed),
- static_cast<uint64_t *>(radix_lwe_left),
- static_cast<uint64_t *>(radix_lwe_right), bsk,
- static_cast<uint64_t *>(ksk), mem, num_blocks);
- break;
- case 1024:
- host_integer_overflowing_sub_kb<uint64_t, AmortizedDegree<1024>>(
- (cudaStream_t *)(streams), gpu_indexes, gpu_count,
- static_cast<uint64_t *>(radix_lwe_out),
- static_cast<uint64_t *>(radix_lwe_overflowed),
- static_cast<uint64_t *>(radix_lwe_left),
- static_cast<uint64_t *>(radix_lwe_right), bsk,
- static_cast<uint64_t *>(ksk), mem, num_blocks);
- break;
- case 2048:
- host_integer_overflowing_sub_kb<uint64_t, AmortizedDegree<2048>>(
- (cudaStream_t *)(streams), gpu_indexes, gpu_count,
- static_cast<uint64_t *>(radix_lwe_out),
- static_cast<uint64_t *>(radix_lwe_overflowed),
- static_cast<uint64_t *>(radix_lwe_left),
- static_cast<uint64_t *>(radix_lwe_right), bsk,
- static_cast<uint64_t *>(ksk), mem, num_blocks);
- break;
- case 4096:
- host_integer_overflowing_sub_kb<uint64_t, AmortizedDegree<4096>>(
- (cudaStream_t *)(streams), gpu_indexes, gpu_count,
- static_cast<uint64_t *>(radix_lwe_out),
- static_cast<uint64_t *>(radix_lwe_overflowed),
- static_cast<uint64_t *>(radix_lwe_left),
- static_cast<uint64_t *>(radix_lwe_right), bsk,
- static_cast<uint64_t *>(ksk), mem, num_blocks);
- break;
- case 8192:
- host_integer_overflowing_sub_kb<uint64_t, AmortizedDegree<8192>>(
- (cudaStream_t *)(streams), gpu_indexes, gpu_count,
- static_cast<uint64_t *>(radix_lwe_out),
- static_cast<uint64_t *>(radix_lwe_overflowed),
- static_cast<uint64_t *>(radix_lwe_left),
- static_cast<uint64_t *>(radix_lwe_right), bsk,
- static_cast<uint64_t *>(ksk), mem, num_blocks);
- break;
- case 16384:
- host_integer_overflowing_sub_kb<uint64_t, AmortizedDegree<16384>>(
- (cudaStream_t *)(streams), gpu_indexes, gpu_count,
- static_cast<uint64_t *>(radix_lwe_out),
- static_cast<uint64_t *>(radix_lwe_overflowed),
- static_cast<uint64_t *>(radix_lwe_left),
- static_cast<uint64_t *>(radix_lwe_right), bsk,
- static_cast<uint64_t *>(ksk), mem, num_blocks);
- break;
- default:
- PANIC("Cuda error (integer overflowing sub): unsupported polynomial size. "
- "Only N = 512, 1024, 2048, 4096, 8192, 16384 is supported")
- }
+ host_integer_overflowing_sub_kb<uint64_t>(
+ (cudaStream_t *)(streams), gpu_indexes, gpu_count,
+ static_cast<uint64_t *>(radix_lwe_out),
+ static_cast<uint64_t *>(radix_lwe_overflowed),
+ static_cast<uint64_t *>(radix_lwe_left),
+ static_cast<uint64_t *>(radix_lwe_right), bsks, (uint64_t **)(ksks), mem,
+ num_blocks);
}
- void cleanup_cuda_integer_radix_overflowing_sub(void *stream,
- uint32_t gpu_index,
+ void cleanup_cuda_integer_radix_overflowing_sub(void **streams,
+ uint32_t *gpu_indexes,
+ uint32_t gpu_count,
int8_t **mem_ptr_void) {
int_overflowing_sub_memory<uint64_t> *mem_ptr =
(int_overflowing_sub_memory<uint64_t> *)(*mem_ptr_void);
- mem_ptr->release(static_cast<cudaStream_t>(stream), gpu_index);
+ mem_ptr->release((cudaStream_t *)(streams), gpu_indexes, gpu_count);
}

View File

@@ -90,20 +90,19 @@ host_integer_radix_negation(cudaStream_t *streams, uint32_t *gpu_indexes,
template <typename Torus>
__host__ void scratch_cuda_integer_overflowing_sub_kb(
- cudaStream_t stream, uint32_t gpu_index,
+ cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
int_overflowing_sub_memory<Torus> **mem_ptr, uint32_t num_blocks,
int_radix_params params, bool allocate_gpu_memory) {
- cudaSetDevice(gpu_index);
*mem_ptr = new int_overflowing_sub_memory<Torus>(
- stream, gpu_index, params, num_blocks, allocate_gpu_memory);
+ streams, gpu_indexes, gpu_count, params, num_blocks, allocate_gpu_memory);
}
- template <typename Torus, class params>
+ template <typename Torus>
__host__ void host_integer_overflowing_sub_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *radix_lwe_out, Torus *radix_lwe_overflowed, Torus *radix_lwe_left,
- Torus *radix_lwe_right, void *bsk, uint64_t *ksk,
+ Torus *radix_lwe_right, void **bsks, uint64_t **ksks,
int_overflowing_sub_memory<uint64_t> *mem_ptr, uint32_t num_blocks) {
auto radix_params = mem_ptr->params;
@@ -114,9 +113,9 @@ __host__ void host_integer_overflowing_sub_kb(
radix_params.message_modulus, radix_params.carry_modulus,
radix_params.message_modulus - 1);
- host_propagate_single_sub_borrow<Torus>(
- streams, gpu_indexes, gpu_count, radix_lwe_overflowed, radix_lwe_out,
- mem_ptr->borrow_prop_mem, bsk, ksk, num_blocks);
+ host_propagate_single_sub_borrow<Torus>(streams, gpu_indexes, gpu_count,
+ radix_lwe_overflowed, radix_lwe_out,
+ mem_ptr, bsks, ksks, num_blocks);
}
#endif
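The overflowing subtraction computes block-wise differences and then resolves borrows; the borrow leaving the last block is the overflow flag. A clear-integer model of the borrow pass (plain values standing in for the homomorphic computation):

#include <cstdint>
#include <vector>

// Subtracts rhs from lhs in place, block by block, tracking the borrow;
// returns true when the final borrow signals an overflow.
bool sub_with_borrow(std::vector<int64_t> &blocks_lhs,
                     const std::vector<int64_t> &blocks_rhs,
                     int64_t message_modulus) {
  int64_t borrow = 0;
  for (size_t i = 0; i < blocks_lhs.size(); i++) {
    int64_t d = blocks_lhs[i] - blocks_rhs[i] - borrow;
    borrow = d < 0 ? 1 : 0; // borrow from the next block if negative
    blocks_lhs[i] = ((d % message_modulus) + message_modulus) % message_modulus;
  }
  return borrow != 0;
}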

View File

@@ -3,7 +3,7 @@
void cuda_scalar_bitop_integer_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *lwe_array_out, void *lwe_array_input, void *clear_blocks,
- uint32_t num_clear_blocks, int8_t *mem_ptr, void *bsk, void *ksk,
+ uint32_t num_clear_blocks, int8_t *mem_ptr, void **bsks, void **ksks,
uint32_t lwe_ciphertext_count, BITOP_TYPE op) {
host_integer_radix_scalar_bitop_kb<uint64_t>(
@@ -11,6 +11,6 @@ void cuda_scalar_bitop_integer_radix_ciphertext_kb_64(
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_input),
static_cast<uint64_t *>(clear_blocks), num_clear_blocks,
- (int_bitop_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
+ (int_bitop_buffer<uint64_t> *)mem_ptr, bsks, (uint64_t **)(ksks),
lwe_ciphertext_count, op);
}

View File

@@ -8,10 +8,9 @@ template <typename Torus>
__host__ void host_integer_radix_scalar_bitop_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_input, Torus *clear_blocks,
- uint32_t num_clear_blocks, int_bitop_buffer<Torus> *mem_ptr, void *bsk,
- Torus *ksk, uint32_t num_radix_blocks, BITOP_TYPE op) {
+ uint32_t num_clear_blocks, int_bitop_buffer<Torus> *mem_ptr, void **bsks,
+ Torus **ksks, uint32_t num_radix_blocks, BITOP_TYPE op) {
- cudaSetDevice(gpu_indexes[0]);
auto lut = mem_ptr->lut;
auto params = lut->params;
auto big_lwe_dimension = params.big_lwe_dimension;
@@ -31,13 +30,14 @@ __host__ void host_integer_radix_scalar_bitop_kb(
} else {
// We have all possible LUTs pre-computed and we use the decomposed scalar
// as index to recover the right one
- cuda_memcpy_async_gpu_to_gpu(lut->lut_indexes, clear_blocks,
- num_clear_blocks * sizeof(Torus), streams[0],
- gpu_indexes[0]);
+ cuda_memcpy_async_gpu_to_gpu(lut->get_lut_indexes(gpu_indexes[0], 0),
+ clear_blocks, num_clear_blocks * sizeof(Torus),
+ streams[0], gpu_indexes[0]);
+ lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
- streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_input, bsk,
- ksk, num_clear_blocks, lut);
+ streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_input, bsks,
+ ksks, num_clear_blocks, lut);
if (op == SCALAR_BITAND && num_clear_blocks < num_radix_blocks) {
auto lwe_array_out_block = lwe_array_out + num_clear_blocks * lwe_size;
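For scalar bitops the decomposed clear scalar selects one precomputed LUT per block, and for SCALAR_BITAND the blocks beyond num_clear_blocks are zeroed, since x & 0 == 0. A plain-integer model of that path (illustrative names, clear values):

#include <cstdint>
#include <vector>

// The LUT indexed by clear_block encodes this map for BITAND.
uint64_t apply_bitand_lut(uint64_t x, uint64_t clear_block) {
  return x & clear_block;
}

// Blocks past the clear scalar's length stay 0, matching the tail
// handling after the lookup in the code above.
std::vector<uint64_t> scalar_bitand(const std::vector<uint64_t> &blocks,
                                    const std::vector<uint64_t> &clear_blocks) {
  std::vector<uint64_t> out(blocks.size(), 0);
  for (size_t i = 0; i < clear_blocks.size() && i < blocks.size(); i++)
    out[i] = apply_bitand_lut(blocks[i], clear_blocks[i]);
  return out;
}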

View File

@@ -3,7 +3,7 @@
void cuda_scalar_comparison_integer_radix_ciphertext_kb_64(
void **streams, uint32_t *gpu_indexes, uint32_t gpu_count,
void *lwe_array_out, void *lwe_array_in, void *scalar_blocks,
- int8_t *mem_ptr, void *bsk, void *ksk, uint32_t lwe_ciphertext_count,
+ int8_t *mem_ptr, void **bsks, void **ksks, uint32_t lwe_ciphertext_count,
uint32_t num_scalar_blocks) {
int_comparison_buffer<uint64_t> *buffer =
@@ -15,8 +15,8 @@ void cuda_scalar_comparison_integer_radix_ciphertext_kb_64(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_in),
- static_cast<uint64_t *>(scalar_blocks), buffer, bsk,
- static_cast<uint64_t *>(ksk), lwe_ciphertext_count, num_scalar_blocks);
+ static_cast<uint64_t *>(scalar_blocks), buffer, bsks,
+ (uint64_t **)(ksks), lwe_ciphertext_count, num_scalar_blocks);
break;
case GT:
case GE:
@@ -27,7 +27,7 @@ void cuda_scalar_comparison_integer_radix_ciphertext_kb_64(
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_in),
static_cast<uint64_t *>(scalar_blocks), buffer,
- buffer->diff_buffer->operator_f, bsk, static_cast<uint64_t *>(ksk),
+ buffer->diff_buffer->operator_f, bsks, (uint64_t **)(ksks),
lwe_ciphertext_count, num_scalar_blocks);
break;
case MAX:
@@ -36,8 +36,8 @@ void cuda_scalar_comparison_integer_radix_ciphertext_kb_64(
(cudaStream_t *)(streams), gpu_indexes, gpu_count,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_in),
- static_cast<uint64_t *>(scalar_blocks), buffer, bsk,
- static_cast<uint64_t *>(ksk), lwe_ciphertext_count, num_scalar_blocks);
+ static_cast<uint64_t *>(scalar_blocks), buffer, bsks,
+ (uint64_t **)(ksks), lwe_ciphertext_count, num_scalar_blocks);
break;
default:
PANIC("Cuda error: integer operation not supported")

View File

@@ -2,17 +2,15 @@
#define CUDA_INTEGER_SCALAR_COMPARISON_OPS_CUH
#include "integer/comparison.cuh"
- #include <omp.h>
template <typename Torus>
__host__ void integer_radix_unsigned_scalar_difference_check_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
int_comparison_buffer<Torus> *mem_ptr,
- std::function<Torus(Torus)> sign_handler_f, void *bsk, Torus *ksk,
+ std::function<Torus(Torus)> sign_handler_f, void **bsks, Torus **ksks,
uint32_t total_num_radix_blocks, uint32_t total_num_scalar_blocks) {
- cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto glwe_dimension = params.glwe_dimension;
@@ -49,7 +47,7 @@ __host__ void integer_radix_unsigned_scalar_difference_check_kb(
// means scalar is zero
host_compare_with_zero_equality(streams, gpu_indexes, gpu_count,
mem_ptr->tmp_lwe_array_out, lwe_array_in,
- mem_ptr, bsk, ksk, total_num_radix_blocks,
+ mem_ptr, bsks, ksks, total_num_radix_blocks,
mem_ptr->is_zero_lut);
auto scalar_last_leaf_lut_f = [sign_handler_f](Torus x) -> Torus {
@@ -60,12 +58,14 @@ __host__ void integer_radix_unsigned_scalar_difference_check_kb(
auto lut = mem_ptr->diff_buffer->tree_buffer->tree_last_leaf_scalar_lut;
generate_device_accumulator<Torus>(
- streams[0], gpu_indexes[0], lut->lut, glwe_dimension, polynomial_size,
- message_modulus, carry_modulus, scalar_last_leaf_lut_f);
+ streams[0], gpu_indexes[0], lut->get_lut(gpu_indexes[0], 0),
+ glwe_dimension, polynomial_size, message_modulus, carry_modulus,
+ scalar_last_leaf_lut_f);
+ lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
streams, gpu_indexes, gpu_count, lwe_array_out,
- mem_ptr->tmp_lwe_array_out, bsk, ksk, 1, lut);
+ mem_ptr->tmp_lwe_array_out, bsks, ksks, 1, lut);
} else if (total_num_scalar_blocks < total_num_radix_blocks) {
// We have to handle both part of the work described above
@@ -79,58 +79,53 @@ __host__ void integer_radix_unsigned_scalar_difference_check_kb(
auto lwe_array_lsb_out = mem_ptr->tmp_lwe_array_out;
auto lwe_array_msb_out = lwe_array_lsb_out + big_lwe_size;
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
- auto lsb_stream = mem_ptr->lsb_stream;
- auto msb_stream = mem_ptr->msb_stream;
+ auto lsb_streams = mem_ptr->lsb_streams;
+ auto msb_streams = mem_ptr->msb_streams;
- #pragma omp parallel sections
- {
- // Both sections may be executed in parallel
- #pragma omp section
- {
- //////////////
- // lsb
- Torus *lhs = diff_buffer->tmp_packed_left;
- Torus *rhs = diff_buffer->tmp_packed_right;
- pack_blocks(lsb_stream, gpu_indexes[0], lhs, lwe_array_in,
- big_lwe_dimension, num_lsb_radix_blocks, message_modulus);
- pack_blocks(lsb_stream, gpu_indexes[0], rhs, scalar_blocks, 0,
- total_num_scalar_blocks, message_modulus);
- // From this point we have half number of blocks
- num_lsb_radix_blocks /= 2;
- num_lsb_radix_blocks += (total_num_scalar_blocks % 2);
- // comparisons will be assigned
- // - 0 if lhs < rhs
- // - 1 if lhs == rhs
- // - 2 if lhs > rhs
- auto comparisons = mem_ptr->tmp_block_comparisons;
- scalar_compare_radix_blocks_kb(&lsb_stream, &gpu_indexes[0], 1,
- comparisons, lhs, rhs, mem_ptr, bsk, ksk,
- num_lsb_radix_blocks);
- // Reduces a vec containing radix blocks that encrypts a sign
- // (inferior, equal, superior) to one single radix block containing the
- // final sign
- tree_sign_reduction(&lsb_stream, &gpu_indexes[0], 1, lwe_array_lsb_out,
- comparisons, mem_ptr->diff_buffer->tree_buffer,
- mem_ptr->identity_lut_f, bsk, ksk,
- num_lsb_radix_blocks);
- }
- #pragma omp section
- {
- //////////////
- // msb
- host_compare_with_zero_equality(
- &msb_stream, &gpu_indexes[0], 1, lwe_array_msb_out, msb, mem_ptr,
- bsk, ksk, num_msb_radix_blocks, mem_ptr->is_zero_lut);
- }
+ for (uint j = 0; j < gpu_count; j++) {
+ cuda_synchronize_stream(streams[j], gpu_indexes[j]);
+ }
+ //////////////
+ // lsb
+ Torus *lhs = diff_buffer->tmp_packed_left;
+ Torus *rhs = diff_buffer->tmp_packed_right;
+ pack_blocks(lsb_streams[0], gpu_indexes[0], lhs, lwe_array_in,
+ big_lwe_dimension, num_lsb_radix_blocks, message_modulus);
+ pack_blocks(lsb_streams[0], gpu_indexes[0], rhs, scalar_blocks, 0,
+ total_num_scalar_blocks, message_modulus);
+ // From this point we have half number of blocks
+ num_lsb_radix_blocks /= 2;
+ num_lsb_radix_blocks += (total_num_scalar_blocks % 2);
+ // comparisons will be assigned
+ // - 0 if lhs < rhs
+ // - 1 if lhs == rhs
+ // - 2 if lhs > rhs
+ auto comparisons = mem_ptr->tmp_block_comparisons;
+ scalar_compare_radix_blocks_kb(lsb_streams, gpu_indexes, gpu_count,
+ comparisons, lhs, rhs, mem_ptr, bsks, ksks,
+ num_lsb_radix_blocks);
+ // Reduces a vec containing radix blocks that encrypts a sign
+ // (inferior, equal, superior) to one single radix block containing the
+ // final sign
+ tree_sign_reduction(lsb_streams, gpu_indexes, gpu_count, lwe_array_lsb_out,
+ comparisons, mem_ptr->diff_buffer->tree_buffer,
+ mem_ptr->identity_lut_f, bsks, ksks,
+ num_lsb_radix_blocks);
+ //////////////
+ // msb
+ host_compare_with_zero_equality(msb_streams, gpu_indexes, gpu_count,
+ lwe_array_msb_out, msb, mem_ptr, bsks, ksks,
+ num_msb_radix_blocks, mem_ptr->is_zero_lut);
+ for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
+ cuda_synchronize_stream(lsb_streams[j], gpu_indexes[j]);
+ cuda_synchronize_stream(msb_streams[j], gpu_indexes[j]);
+ }
- cuda_synchronize_stream(lsb_stream, gpu_indexes[0]);
- cuda_synchronize_stream(msb_stream, gpu_indexes[0]);
//////////////
// Reduce the two blocks into one final
@@ -145,12 +140,14 @@ __host__ void integer_radix_unsigned_scalar_difference_check_kb(
auto lut = diff_buffer->tree_buffer->tree_last_leaf_scalar_lut;
generate_device_accumulator_bivariate<Torus>(
- streams[0], gpu_indexes[0], lut->lut, glwe_dimension, polynomial_size,
- message_modulus, carry_modulus, scalar_bivariate_last_leaf_lut_f);
+ streams[0], gpu_indexes[0], lut->get_lut(gpu_indexes[0], 0),
+ glwe_dimension, polynomial_size, message_modulus, carry_modulus,
+ scalar_bivariate_last_leaf_lut_f);
+ lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
integer_radix_apply_bivariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_lsb_out,
- lwe_array_msb_out, bsk, ksk, 1, lut, lut->params.message_modulus);
+ lwe_array_msb_out, bsks, ksks, 1, lut, lut->params.message_modulus);
} else {
// We only have to do the regular comparison
@@ -177,7 +174,7 @@ __host__ void integer_radix_unsigned_scalar_difference_check_kb(
// - 2 if lhs > rhs
auto comparisons = mem_ptr->tmp_lwe_array_out;
scalar_compare_radix_blocks_kb(streams, gpu_indexes, gpu_count, comparisons,
- lhs, rhs, mem_ptr, bsk, ksk,
+ lhs, rhs, mem_ptr, bsks, ksks,
num_lsb_radix_blocks);
// Reduces a vec containing radix blocks that encrypts a sign
@@ -185,7 +182,7 @@ __host__ void integer_radix_unsigned_scalar_difference_check_kb(
// final sign
tree_sign_reduction(streams, gpu_indexes, gpu_count, lwe_array_out,
comparisons, mem_ptr->diff_buffer->tree_buffer,
- sign_handler_f, bsk, ksk, num_lsb_radix_blocks);
+ sign_handler_f, bsks, ksks, num_lsb_radix_blocks);
}
}
@@ -194,10 +191,9 @@ __host__ void integer_radix_signed_scalar_difference_check_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
int_comparison_buffer<Torus> *mem_ptr,
- std::function<Torus(Torus)> sign_handler_f, void *bsk, Torus *ksk,
+ std::function<Torus(Torus)> sign_handler_f, void **bsks, Torus **ksks,
uint32_t total_num_radix_blocks, uint32_t total_num_scalar_blocks) {
- cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto glwe_dimension = params.glwe_dimension;
@@ -235,7 +231,7 @@ __host__ void integer_radix_signed_scalar_difference_check_kb(
Torus *are_all_msb_zeros = mem_ptr->tmp_lwe_array_out;
host_compare_with_zero_equality(
streams, gpu_indexes, gpu_count, are_all_msb_zeros, lwe_array_in,
- mem_ptr, bsk, ksk, total_num_radix_blocks, mem_ptr->is_zero_lut);
+ mem_ptr, bsks, ksks, total_num_radix_blocks, mem_ptr->is_zero_lut);
Torus *sign_block =
lwe_array_in + (total_num_radix_blocks - 1) * big_lwe_size;
@@ -276,12 +272,14 @@ __host__ void integer_radix_signed_scalar_difference_check_kb(
auto lut = mem_ptr->diff_buffer->tree_buffer->tree_last_leaf_scalar_lut;
generate_device_accumulator_bivariate<Torus>(
- streams[0], gpu_indexes[0], lut->lut, glwe_dimension, polynomial_size,
- message_modulus, carry_modulus, scalar_bivariate_last_leaf_lut_f);
+ streams[0], gpu_indexes[0], lut->get_lut(gpu_indexes[0], 0),
+ glwe_dimension, polynomial_size, message_modulus, carry_modulus,
+ scalar_bivariate_last_leaf_lut_f);
+ lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
integer_radix_apply_bivariate_lookup_table_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, are_all_msb_zeros,
- sign_block, bsk, ksk, 1, lut, lut->params.message_modulus);
+ sign_block, bsks, ksks, 1, lut, lut->params.message_modulus);
} else if (total_num_scalar_blocks < total_num_radix_blocks) {
// We have to handle both part of the work described above
@@ -295,101 +293,97 @@ __host__ void integer_radix_signed_scalar_difference_check_kb(
auto lwe_array_lsb_out = mem_ptr->tmp_lwe_array_out;
auto lwe_array_msb_out = lwe_array_lsb_out + big_lwe_size;
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
- auto lsb_stream = mem_ptr->lsb_stream;
- auto msb_stream = mem_ptr->msb_stream;
- #pragma omp parallel sections
- {
- // Both sections may be executed in parallel
- #pragma omp section
- {
- //////////////
- // lsb
- Torus *lhs = diff_buffer->tmp_packed_left;
- Torus *rhs = diff_buffer->tmp_packed_right;
- pack_blocks(lsb_stream, gpu_indexes[0], lhs, lwe_array_in,
- big_lwe_dimension, num_lsb_radix_blocks, message_modulus);
- pack_blocks(lsb_stream, gpu_indexes[0], rhs, scalar_blocks, 0,
- total_num_scalar_blocks, message_modulus);
- // From this point we have half number of blocks
- num_lsb_radix_blocks /= 2;
- num_lsb_radix_blocks += (total_num_scalar_blocks % 2);
- // comparisons will be assigned
- // - 0 if lhs < rhs
- // - 1 if lhs == rhs
- // - 2 if lhs > rhs
- auto comparisons = mem_ptr->tmp_block_comparisons;
- scalar_compare_radix_blocks_kb(&lsb_stream, &gpu_indexes[0], 1,
- comparisons, lhs, rhs, mem_ptr, bsk, ksk,
- num_lsb_radix_blocks);
- // Reduces a vec containing radix blocks that encrypts a sign
- // (inferior, equal, superior) to one single radix block containing the
- // final sign
- tree_sign_reduction(&lsb_stream, &gpu_indexes[0], 1, lwe_array_lsb_out,
- comparisons, mem_ptr->diff_buffer->tree_buffer,
- mem_ptr->identity_lut_f, bsk, ksk,
- num_lsb_radix_blocks);
- }
- #pragma omp section
- {
- //////////////
- // msb
- // We remove the last block (which is the sign)
- Torus *are_all_msb_zeros = lwe_array_msb_out;
- host_compare_with_zero_equality(
- &msb_stream, &gpu_indexes[0], 1, are_all_msb_zeros, msb, mem_ptr,
- bsk, ksk, num_msb_radix_blocks, mem_ptr->is_zero_lut);
- auto sign_bit_pos = (int)log2(message_modulus) - 1;
- auto lut_f = [mem_ptr, sign_bit_pos](Torus sign_block,
- Torus msb_are_zeros) {
- bool sign_bit_is_set = (sign_block >> sign_bit_pos) == 1;
- CMP_ORDERING sign_block_ordering;
- if (sign_bit_is_set) {
- sign_block_ordering = CMP_ORDERING::IS_INFERIOR;
- } else if (sign_block != 0) {
- sign_block_ordering = CMP_ORDERING::IS_SUPERIOR;
- } else {
- sign_block_ordering = CMP_ORDERING::IS_EQUAL;
- }
- CMP_ORDERING msb_ordering;
- if (msb_are_zeros == 1)
- msb_ordering = CMP_ORDERING::IS_EQUAL;
- else
- msb_ordering = CMP_ORDERING::IS_SUPERIOR;
- return mem_ptr->diff_buffer->tree_buffer->block_selector_f(
- sign_block_ordering, msb_ordering);
- };
- auto signed_msb_lut = mem_ptr->signed_msb_lut;
- generate_device_accumulator_bivariate<Torus>(
- msb_stream, gpu_indexes[0], signed_msb_lut->lut,
- params.glwe_dimension, params.polynomial_size,
- params.message_modulus, params.carry_modulus, lut_f);
- Torus *sign_block = msb + (num_msb_radix_blocks - 1) * big_lwe_size;
- integer_radix_apply_bivariate_lookup_table_kb(
- &msb_stream, &gpu_indexes[0], 1, lwe_array_msb_out, sign_block,
- are_all_msb_zeros, bsk, ksk, 1, signed_msb_lut,
- signed_msb_lut->params.message_modulus);
- }
+ auto lsb_streams = mem_ptr->lsb_streams;
+ auto msb_streams = mem_ptr->msb_streams;
+ for (uint j = 0; j < gpu_count; j++) {
+ cuda_synchronize_stream(streams[j], gpu_indexes[j]);
+ }
+ //////////////
+ // lsb
+ Torus *lhs = diff_buffer->tmp_packed_left;
+ Torus *rhs = diff_buffer->tmp_packed_right;
+ pack_blocks(lsb_streams[0], gpu_indexes[0], lhs, lwe_array_in,
+ big_lwe_dimension, num_lsb_radix_blocks, message_modulus);
+ pack_blocks(lsb_streams[0], gpu_indexes[0], rhs, scalar_blocks, 0,
+ total_num_scalar_blocks, message_modulus);
+ // From this point we have half number of blocks
+ num_lsb_radix_blocks /= 2;
+ num_lsb_radix_blocks += (total_num_scalar_blocks % 2);
+ // comparisons will be assigned
+ // - 0 if lhs < rhs
+ // - 1 if lhs == rhs
+ // - 2 if lhs > rhs
+ auto comparisons = mem_ptr->tmp_block_comparisons;
+ scalar_compare_radix_blocks_kb(lsb_streams, gpu_indexes, gpu_count,
+ comparisons, lhs, rhs, mem_ptr, bsks, ksks,
+ num_lsb_radix_blocks);
+ // Reduces a vec containing radix blocks that encrypts a sign
+ // (inferior, equal, superior) to one single radix block containing the
+ // final sign
+ tree_sign_reduction(lsb_streams, gpu_indexes, gpu_count, lwe_array_lsb_out,
+ comparisons, mem_ptr->diff_buffer->tree_buffer,
+ mem_ptr->identity_lut_f, bsks, ksks,
+ num_lsb_radix_blocks);
+ //////////////
+ // msb
+ // We remove the last block (which is the sign)
+ Torus *are_all_msb_zeros = lwe_array_msb_out;
+ host_compare_with_zero_equality(msb_streams, gpu_indexes, gpu_count,
+ are_all_msb_zeros, msb, mem_ptr, bsks, ksks,
+ num_msb_radix_blocks, mem_ptr->is_zero_lut);
+ auto sign_bit_pos = (int)log2(message_modulus) - 1;
+ auto lut_f = [mem_ptr, sign_bit_pos](Torus sign_block,
+ Torus msb_are_zeros) {
+ bool sign_bit_is_set = (sign_block >> sign_bit_pos) == 1;
+ CMP_ORDERING sign_block_ordering;
+ if (sign_bit_is_set) {
+ sign_block_ordering = CMP_ORDERING::IS_INFERIOR;
+ } else if (sign_block != 0) {
+ sign_block_ordering = CMP_ORDERING::IS_SUPERIOR;
+ } else {
+ sign_block_ordering = CMP_ORDERING::IS_EQUAL;
+ }
+ CMP_ORDERING msb_ordering;
+ if (msb_are_zeros == 1)
+ msb_ordering = CMP_ORDERING::IS_EQUAL;
+ else
+ msb_ordering = CMP_ORDERING::IS_SUPERIOR;
+ return mem_ptr->diff_buffer->tree_buffer->block_selector_f(
+ sign_block_ordering, msb_ordering);
+ };
+ auto signed_msb_lut = mem_ptr->signed_msb_lut;
+ generate_device_accumulator_bivariate<Torus>(
+ msb_streams[0], gpu_indexes[0],
+ signed_msb_lut->get_lut(gpu_indexes[0], 0), params.glwe_dimension,
+ params.polynomial_size, params.message_modulus, params.carry_modulus,
+ lut_f);
+ signed_msb_lut->broadcast_lut(streams, gpu_indexes, gpu_indexes[0]);
+ Torus *sign_block = msb + (num_msb_radix_blocks - 1) * big_lwe_size;
+ integer_radix_apply_bivariate_lookup_table_kb(
+ msb_streams, gpu_indexes, gpu_count, lwe_array_msb_out, sign_block,
+ are_all_msb_zeros, bsks, ksks, 1, signed_msb_lut,
+ signed_msb_lut->params.message_modulus);
+ for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
+ cuda_synchronize_stream(lsb_streams[j], gpu_indexes[j]);
+ cuda_synchronize_stream(msb_streams[j], gpu_indexes[j]);
+ }
- cuda_synchronize_stream(lsb_stream, gpu_indexes[0]);
- cuda_synchronize_stream(msb_stream, gpu_indexes[0]);
//////////////
// Reduce the two blocks into one final
reduce_signs(streams, gpu_indexes, gpu_count, lwe_array_out,
- lwe_array_lsb_out, mem_ptr, sign_handler_f, bsk, ksk, 2);
+ lwe_array_lsb_out, mem_ptr, sign_handler_f, bsks, ksks, 2);
} else {
// We only have to do the regular comparison
@@ -397,64 +391,56 @@ __host__ void integer_radix_signed_scalar_difference_check_kb(
// total_num_radix_blocks == total_num_scalar_blocks
uint32_t num_lsb_radix_blocks = total_num_radix_blocks;
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
- auto lsb_stream = mem_ptr->lsb_stream;
- auto msb_stream = mem_ptr->msb_stream;
+ for (uint j = 0; j < gpu_count; j++) {
+ cuda_synchronize_stream(streams[j], gpu_indexes[j]);
+ }
+ auto lsb_streams = mem_ptr->lsb_streams;
+ auto msb_streams = mem_ptr->msb_streams;
auto lwe_array_ct_out = mem_ptr->tmp_lwe_array_out;
auto lwe_array_sign_out =
lwe_array_ct_out + (num_lsb_radix_blocks / 2) * big_lwe_size;
- #pragma omp parallel sections
- {
- // Both sections may be executed in parallel
- #pragma omp section
- {
- Torus *lhs = diff_buffer->tmp_packed_left;
- Torus *rhs = diff_buffer->tmp_packed_right;
+ Torus *lhs = diff_buffer->tmp_packed_left;
+ Torus *rhs = diff_buffer->tmp_packed_right;
- pack_blocks(lsb_stream, gpu_indexes[0], lhs, lwe_array_in,
- big_lwe_dimension, num_lsb_radix_blocks - 1,
- message_modulus);
- pack_blocks(lsb_stream, gpu_indexes[0], rhs, scalar_blocks, 0,
- num_lsb_radix_blocks - 1, message_modulus);
+ pack_blocks(lsb_streams[0], gpu_indexes[0], lhs, lwe_array_in,
+ big_lwe_dimension, num_lsb_radix_blocks - 1, message_modulus);
+ pack_blocks(lsb_streams[0], gpu_indexes[0], rhs, scalar_blocks, 0,
+ num_lsb_radix_blocks - 1, message_modulus);
- // From this point we have half number of blocks
- num_lsb_radix_blocks /= 2;
+ // From this point we have half number of blocks
+ num_lsb_radix_blocks /= 2;
- // comparisons will be assigned
- // - 0 if lhs < rhs
- // - 1 if lhs == rhs
- // - 2 if lhs > rhs
- scalar_compare_radix_blocks_kb(&lsb_stream, &gpu_indexes[0], 1,
- lwe_array_ct_out, lhs, rhs, mem_ptr, bsk,
- ksk, num_lsb_radix_blocks);
- }
- #pragma omp section
- {
- Torus *encrypted_sign_block =
- lwe_array_in + (total_num_radix_blocks - 1) * big_lwe_size;
- Torus *scalar_sign_block =
- scalar_blocks + (total_num_scalar_blocks - 1);
+ // comparisons will be assigned
+ // - 0 if lhs < rhs
+ // - 1 if lhs == rhs
+ // - 2 if lhs > rhs
+ scalar_compare_radix_blocks_kb(lsb_streams, gpu_indexes, gpu_count,
+ lwe_array_ct_out, lhs, rhs, mem_ptr, bsks,
+ ksks, num_lsb_radix_blocks);
+ Torus *encrypted_sign_block =
+ lwe_array_in + (total_num_radix_blocks - 1) * big_lwe_size;
+ Torus *scalar_sign_block = scalar_blocks + (total_num_scalar_blocks - 1);
- auto trivial_sign_block = mem_ptr->tmp_trivial_sign_block;
- create_trivial_radix(msb_stream, gpu_indexes[0], trivial_sign_block,
- scalar_sign_block, big_lwe_dimension, 1, 1,
- message_modulus, carry_modulus);
+ auto trivial_sign_block = mem_ptr->tmp_trivial_sign_block;
+ create_trivial_radix(msb_streams[0], gpu_indexes[0], trivial_sign_block,
+ scalar_sign_block, big_lwe_dimension, 1, 1,
+ message_modulus, carry_modulus);
- integer_radix_apply_bivariate_lookup_table_kb(
- &msb_stream, &gpu_indexes[0], 1, lwe_array_sign_out,
- encrypted_sign_block, trivial_sign_block, bsk, ksk, 1,
- mem_ptr->signed_lut, mem_ptr->signed_lut->params.message_modulus);
- }
+ integer_radix_apply_bivariate_lookup_table_kb(
+ msb_streams, gpu_indexes, gpu_count, lwe_array_sign_out,
+ encrypted_sign_block, trivial_sign_block, bsks, ksks, 1,
+ mem_ptr->signed_lut, mem_ptr->signed_lut->params.message_modulus);
+ for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
+ cuda_synchronize_stream(lsb_streams[j], gpu_indexes[j]);
+ cuda_synchronize_stream(msb_streams[j], gpu_indexes[j]);
+ }
- cuda_synchronize_stream(lsb_stream, gpu_indexes[0]);
- cuda_synchronize_stream(msb_stream, gpu_indexes[0]);
// Reduces a vec containing radix blocks that encrypts a sign
// (inferior, equal, superior) to one single radix block containing the
// final sign
reduce_signs(streams, gpu_indexes, gpu_count, lwe_array_out,
- lwe_array_ct_out, mem_ptr, sign_handler_f, bsk, ksk,
+ lwe_array_ct_out, mem_ptr, sign_handler_f, bsks, ksks,
num_lsb_radix_blocks + 1);
}
}
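The lut_f lambda above turns the signed top block into an ordering: the sign bit dominates; otherwise any remaining set bit means "superior"; otherwise the lower blocks decide. Extracted as plain C++ (enum values follow the 0/1/2 encoding used in the comments):

#include <cstdint>

enum Ordering { IS_INFERIOR = 0, IS_EQUAL = 1, IS_SUPERIOR = 2 };

// Ordering contribution of the sign-carrying top block against a
// non-negative scalar, mirroring the lut_f lambda in the msb path.
Ordering sign_block_ordering(uint64_t sign_block, int sign_bit_pos) {
  bool sign_bit_is_set = (sign_block >> sign_bit_pos) == 1;
  if (sign_bit_is_set)
    return IS_INFERIOR; // negative value: below the scalar's msb
  if (sign_block != 0)
    return IS_SUPERIOR; // positive data bits above the scalar
  return IS_EQUAL;      // all zero: decided by the lower blocks
}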
@@ -463,7 +449,7 @@ template <typename Torus>
__host__ void integer_radix_signed_scalar_maxmin_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
- int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
+ int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t total_num_radix_blocks, uint32_t total_num_scalar_blocks) {
cudaSetDevice(gpu_indexes[0]);
@@ -475,7 +461,7 @@ __host__ void integer_radix_signed_scalar_maxmin_kb(
auto sign = mem_ptr->tmp_lwe_array_out;
integer_radix_signed_scalar_difference_check_kb(
streams, gpu_indexes, gpu_count, sign, lwe_array_in, scalar_blocks,
- mem_ptr, mem_ptr->identity_lut_f, bsk, ksk, total_num_radix_blocks,
+ mem_ptr, mem_ptr->identity_lut_f, bsks, ksks, total_num_radix_blocks,
total_num_scalar_blocks);
// There is no optimized CMUX for scalars, so we convert to a trivial
@@ -490,9 +476,10 @@ __host__ void integer_radix_signed_scalar_maxmin_kb(
// Selector
// CMUX for Max or Min
- host_integer_radix_cmux_kb(
- streams, gpu_indexes, gpu_count, lwe_array_out, sign, lwe_array_left,
- lwe_array_right, mem_ptr->cmux_buffer, bsk, ksk, total_num_radix_blocks);
+ host_integer_radix_cmux_kb(streams, gpu_indexes, gpu_count, lwe_array_out,
+ sign, lwe_array_left, lwe_array_right,
+ mem_ptr->cmux_buffer, bsks, ksks,
+ total_num_radix_blocks);
}
template <typename Torus>
@@ -500,19 +487,19 @@ __host__ void host_integer_radix_scalar_difference_check_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
int_comparison_buffer<Torus> *mem_ptr,
- std::function<Torus(Torus)> sign_handler_f, void *bsk, Torus *ksk,
+ std::function<Torus(Torus)> sign_handler_f, void **bsks, Torus **ksks,
uint32_t total_num_radix_blocks, uint32_t total_num_scalar_blocks) {
if (mem_ptr->is_signed) {
// is signed and scalar is positive
integer_radix_signed_scalar_difference_check_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_in,
- scalar_blocks, mem_ptr, sign_handler_f, bsk, ksk,
+ scalar_blocks, mem_ptr, sign_handler_f, bsks, ksks,
total_num_radix_blocks, total_num_scalar_blocks);
} else {
integer_radix_unsigned_scalar_difference_check_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_in,
- scalar_blocks, mem_ptr, sign_handler_f, bsk, ksk,
+ scalar_blocks, mem_ptr, sign_handler_f, bsks, ksks,
total_num_radix_blocks, total_num_scalar_blocks);
}
}
@@ -521,32 +508,32 @@ template <typename Torus>
__host__ void host_integer_radix_signed_scalar_maxmin_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
- int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
+ int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t total_num_radix_blocks, uint32_t total_num_scalar_blocks) {
if (mem_ptr->is_signed) {
// is signed and scalar is positive
integer_radix_signed_scalar_maxmin_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_in,
- scalar_blocks, mem_ptr, bsk, ksk, total_num_radix_blocks,
+ scalar_blocks, mem_ptr, bsks, ksks, total_num_radix_blocks,
total_num_scalar_blocks);
} else {
integer_radix_unsigned_scalar_maxmin_kb(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_in,
- scalar_blocks, mem_ptr, bsk, ksk, total_num_radix_blocks,
+ scalar_blocks, mem_ptr, bsks, ksks, total_num_radix_blocks,
total_num_scalar_blocks);
}
}
template <typename Torus>
- __host__ void
- scalar_compare_radix_blocks_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
- uint32_t gpu_count, Torus *lwe_array_out,
- Torus *lwe_array_in, Torus *scalar_blocks,
- int_comparison_buffer<Torus> *mem_ptr, void *bsk,
- Torus *ksk, uint32_t num_radix_blocks) {
+ __host__ void scalar_compare_radix_blocks_kb(
+ cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
+ Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
+ int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
+ uint32_t num_radix_blocks) {
cudaSetDevice(gpu_indexes[0]);
if (num_radix_blocks == 0)
return;
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
@@ -579,8 +566,8 @@ scalar_compare_radix_blocks_kb(cudaStream_t *streams, uint32_t *gpu_indexes,
// Apply LUT to compare to 0
auto sign_lut = mem_ptr->eq_buffer->is_non_zero_lut;
integer_radix_apply_univariate_lookup_table_kb(
- streams, gpu_indexes, gpu_count, lwe_array_out, subtracted_blocks, bsk,
- ksk, num_radix_blocks, sign_lut);
+ streams, gpu_indexes, gpu_count, lwe_array_out, subtracted_blocks, bsks,
+ ksks, num_radix_blocks, sign_lut);
// Add one
// Here Lhs can have the following values: (-1) % (message modulus * carry
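Throughout these functions a block comparison is reduced to the three-valued encoding 0/1/2 that tree_sign_reduction later folds into a single sign; homomorphically this is realized by the subtract-then-LUT-then-add-one sequence above. The clear-data meaning of the encoding:

#include <cstdint>

// Per-block ordering: 0 (inferior), 1 (equal), 2 (superior), matching the
// "comparisons will be assigned" comments in the callers.
uint64_t block_ordering(uint64_t lhs, uint64_t rhs) {
  if (lhs < rhs)
    return 0; // IS_INFERIOR
  if (lhs == rhs)
    return 1; // IS_EQUAL
  return 2;   // IS_SUPERIOR
}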
@@ -594,10 +581,9 @@ template <typename Torus>
__host__ void host_integer_radix_scalar_maxmin_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
- int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
+ int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t total_num_radix_blocks, uint32_t total_num_scalar_blocks) {
- cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
// Calculates the difference sign between the ciphertext and the scalar
@@ -607,7 +593,7 @@ __host__ void host_integer_radix_scalar_maxmin_kb(
auto sign = mem_ptr->tmp_lwe_array_out;
host_integer_radix_scalar_difference_check_kb(
streams, gpu_indexes, gpu_count, sign, lwe_array_in, scalar_blocks,
- mem_ptr, mem_ptr->identity_lut_f, bsk, ksk, total_num_radix_blocks,
+ mem_ptr, mem_ptr->identity_lut_f, bsks, ksks, total_num_radix_blocks,
total_num_scalar_blocks);
// There is no optimized CMUX for scalars, so we convert to a trivial
@@ -624,7 +610,7 @@ __host__ void host_integer_radix_scalar_maxmin_kb(
// CMUX for Max or Min
host_integer_radix_cmux_kb(streams, gpu_indexes, gpu_count, lwe_array_out,
mem_ptr->tmp_lwe_array_out, lwe_array_left,
- lwe_array_right, mem_ptr->cmux_buffer, bsk, ksk,
+ lwe_array_right, mem_ptr->cmux_buffer, bsks, ksks,
total_num_radix_blocks);
}
@@ -632,10 +618,9 @@ template <typename Torus>
__host__ void host_integer_radix_scalar_equality_check_kb(
cudaStream_t *streams, uint32_t *gpu_indexes, uint32_t gpu_count,
Torus *lwe_array_out, Torus *lwe_array_in, Torus *scalar_blocks,
- int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
+ int_comparison_buffer<Torus> *mem_ptr, void **bsks, Torus **ksks,
uint32_t num_radix_blocks, uint32_t num_scalar_blocks) {
- cudaSetDevice(gpu_indexes[0]);
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
@@ -662,74 +647,69 @@ __host__ void host_integer_radix_scalar_equality_check_kb(
auto lwe_array_msb_out =
lwe_array_lsb_out + big_lwe_size * num_halved_lsb_radix_blocks;
cuda_synchronize_stream(streams[0], gpu_indexes[0]);
- auto lsb_stream = mem_ptr->lsb_stream;
- auto msb_stream = mem_ptr->msb_stream;
- #pragma omp parallel sections
- {
- // Both sections may be executed in parallel
- #pragma omp section
- {
- if (num_halved_scalar_blocks > 0) {
- auto packed_blocks = mem_ptr->tmp_packed_input;
- auto packed_scalar =
- packed_blocks + big_lwe_size * num_halved_lsb_radix_blocks;
- pack_blocks(lsb_stream, gpu_indexes[0], packed_blocks, lsb,
- big_lwe_dimension, num_lsb_radix_blocks, message_modulus);
- pack_blocks(lsb_stream, gpu_indexes[0], packed_scalar, scalar_blocks, 0,
- num_scalar_blocks, message_modulus);
- cuda_memcpy_async_gpu_to_gpu(scalar_comparison_luts->lut_indexes,
- packed_scalar,
- num_halved_scalar_blocks * sizeof(Torus),
- lsb_stream, gpu_indexes[0]);
- integer_radix_apply_univariate_lookup_table_kb(
- &lsb_stream, &gpu_indexes[0], 1, lwe_array_lsb_out, packed_blocks,
- bsk, ksk, num_halved_lsb_radix_blocks, scalar_comparison_luts);
- }
- }
- #pragma omp section
- {
- //////////////
- // msb
- if (num_msb_radix_blocks > 0) {
- int_radix_lut<Torus> *msb_lut;
- switch (mem_ptr->op) {
- case COMPARISON_TYPE::EQ:
- msb_lut = mem_ptr->is_zero_lut;
- break;
- case COMPARISON_TYPE::NE:
- msb_lut = mem_ptr->eq_buffer->is_non_zero_lut;
- break;
- default:
- PANIC("Cuda error: integer operation not supported")
- }
- host_compare_with_zero_equality(&msb_stream, &gpu_indexes[0], 1,
- lwe_array_msb_out, msb, mem_ptr, bsk,
- ksk, num_msb_radix_blocks, msb_lut);
- }
- }
+ for (uint j = 0; j < gpu_count; j++) {
+ cuda_synchronize_stream(streams[j], gpu_indexes[j]);
+ }
- cuda_synchronize_stream(lsb_stream, gpu_indexes[0]);
- cuda_synchronize_stream(msb_stream, gpu_indexes[0]);
+ auto lsb_streams = mem_ptr->lsb_streams;
+ auto msb_streams = mem_ptr->msb_streams;
+ if (num_halved_scalar_blocks > 0) {
+ auto packed_blocks = mem_ptr->tmp_packed_input;
+ auto packed_scalar =
+ packed_blocks + big_lwe_size * num_halved_lsb_radix_blocks;
+ pack_blocks(lsb_streams[0], gpu_indexes[0], packed_blocks, lsb,
+ big_lwe_dimension, num_lsb_radix_blocks, message_modulus);
+ pack_blocks(lsb_streams[0], gpu_indexes[0], packed_scalar, scalar_blocks, 0,
+ num_scalar_blocks, message_modulus);
+ cuda_memcpy_async_gpu_to_gpu(
+ scalar_comparison_luts->get_lut_indexes(gpu_indexes[0], 0),
+ packed_scalar, num_halved_scalar_blocks * sizeof(Torus), lsb_streams[0],
+ gpu_indexes[0]);
+ scalar_comparison_luts->broadcast_lut(lsb_streams, gpu_indexes, 0);
+ integer_radix_apply_univariate_lookup_table_kb(
+ lsb_streams, gpu_indexes, gpu_count, lwe_array_lsb_out, packed_blocks,
+ bsks, ksks, num_halved_lsb_radix_blocks, scalar_comparison_luts);
+ }
+ //////////////
+ // msb
+ if (num_msb_radix_blocks > 0) {
+ int_radix_lut<Torus> *msb_lut;
+ switch (mem_ptr->op) {
+ case COMPARISON_TYPE::EQ:
+ msb_lut = mem_ptr->is_zero_lut;
+ break;
+ case COMPARISON_TYPE::NE:
+ msb_lut = mem_ptr->eq_buffer->is_non_zero_lut;
+ break;
+ default:
+ PANIC("Cuda error: integer operation not supported")
+ }
+ host_compare_with_zero_equality(msb_streams, gpu_indexes, gpu_count,
+ lwe_array_msb_out, msb, mem_ptr, bsks, ksks,
+ num_msb_radix_blocks, msb_lut);
+ }
+ for (uint j = 0; j < mem_ptr->active_gpu_count; j++) {
+ cuda_synchronize_stream(lsb_streams[j], gpu_indexes[j]);
+ cuda_synchronize_stream(msb_streams[j], gpu_indexes[j]);
+ }
switch (mem_ptr->op) {
case COMPARISON_TYPE::EQ:
are_all_comparisons_block_true(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_lsb_out,
- mem_ptr, bsk, ksk,
+ mem_ptr, bsks, ksks,
num_halved_scalar_blocks + (num_msb_radix_blocks > 0));
break;
case COMPARISON_TYPE::NE:
is_at_least_one_comparisons_block_true(
streams, gpu_indexes, gpu_count, lwe_array_out, lwe_array_lsb_out,
- mem_ptr, bsk, ksk,
+ mem_ptr, bsks, ksks,
num_halved_scalar_blocks + (num_msb_radix_blocks > 0));
break;
default:
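The final reduction differs per operator: EQ requires every per-block equality to hold (are_all_comparisons_block_true), NE requires at least one mismatch (is_at_least_one_comparisons_block_true). In clear form:

#include <cstdint>
#include <vector>

// block_eq[i] is 1 when block pair i matched; EQ is their conjunction.
bool radix_eq(const std::vector<uint64_t> &block_eq) {
  for (uint64_t b : block_eq)
    if (b == 0)
      return false; // one mismatching block falsifies EQ
  return true;      // all blocks matched
}

// NE is simply the negation: at least one block differed.
bool radix_ne(const std::vector<uint64_t> &block_eq) { return !radix_eq(block_eq); }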

Some files were not shown because too many files have changed in this diff