Compare commits


448 Commits

Author SHA1 Message Date
aquint-zama
2bbcf6e5b3 doc: fix broken links 2024-06-21 17:51:40 +02:00
tmontaigu
0d7a88e640 chore(tfhe): bump to 0.5.5 2024-05-23 17:01:01 +02:00
tmontaigu
77656cd055 fix(tfhe): fix build deterministic 2024-05-23 17:01:01 +02:00
tmontaigu
8ce964cb18 feat(c_api): quick 'n' dirty C API for some array fn 2024-05-22 15:12:24 +02:00
tmontaigu
4ea368d395 feat(integer): add contains_sub_slice
This method returns a BooleanBlock that tells whether a
slice of radix ciphertexts contains a given sub-slice.
2024-05-22 15:12:24 +02:00
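A clear-value sketch of what contains_sub_slice computes, assuming "contains" means the sub-slice appears as a contiguous run (as with Rust's `windows`). The encrypted version returns a BooleanBlock rather than a `bool`; the helper name below is hypothetical and is not tfhe-rs API.

```rust
/// Clear-value analogue of contains_sub_slice (hypothetical helper):
/// true when `needle` appears as a contiguous run inside `haystack`.
fn contains_sub_slice_clear<T: PartialEq>(haystack: &[T], needle: &[T]) -> bool {
    if needle.is_empty() {
        // Convention for this sketch: the empty slice is always contained.
        // (`windows(0)` would panic, so this case is handled first.)
        return true;
    }
    // Compare every window of `needle.len()` elements; under FHE, each window
    // comparison would be an encrypted equality and the results would be OR-ed.
    // If `needle` is longer than `haystack`, there are no windows: false.
    haystack.windows(needle.len()).any(|window| window == needle)
}
```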
tmontaigu
59b029e038 feat(integer): add an eq_slice function
This adds eq_slices functions.

This function compares two slices of radix ciphertexts
and returns true if all pairs of elements are equal.
2024-05-22 15:12:24 +02:00
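A clear-value sketch of the comparison described above. The helper name is hypothetical, and treating slices of different lengths as unequal is an assumption of this sketch, not something the commit states.

```rust
/// Clear-value analogue of the eq_slices comparison (hypothetical helper):
/// true only if both slices have the same length and every pair of elements
/// at the same index is equal.
fn eq_slices_clear<T: PartialEq>(lhs: &[T], rhs: &[T]) -> bool {
    // Assumption of this sketch: slices of different lengths are never equal.
    if lhs.len() != rhs.len() {
        return false;
    }
    // Pairwise equalities reduced with AND; under FHE each `==` would be an
    // encrypted comparison and the AND a boolean operation on the results.
    lhs.iter().zip(rhs).all(|(a, b)| a == b)
}
```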
Arthur Meyre
1403663033 chore(tfhe): add forward compatibility with 0.6 2024-04-22 09:28:59 +02:00
Arthur Meyre
0a307497cd chore(ci): to avoid stack overflow crashes increase thread stack size
- The default Linux thread stack size seems to be 8 MB, but Rust limits spawned
threads to 2 MB by default; raise that limit to avoid tests failing because of
stack overflows
2024-04-22 09:28:59 +02:00
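A minimal sketch of one standard way to get a bigger stack in Rust: `std::thread::Builder::stack_size` for threads you spawn yourself. Alternatively, the `RUST_MIN_STACK` environment variable (for example `RUST_MIN_STACK=8388608 cargo test`) changes the default for spawned threads, including the test harness threads. Which mechanism the CI change actually uses is not stated here; `run_with_stack` is a hypothetical helper.

```rust
use std::thread;

/// Hypothetical helper: runs `f` on a new thread with an explicit stack size
/// instead of the default used for spawned threads.
fn run_with_stack<F, T>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```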
Arthur Meyre
0ce0567cef chore(tfhe): bump version to 0.5.3 2024-02-28 14:59:18 +01:00
Arthur Meyre
e9c19b419d fix(shortint): use proper noise value during compact list encryption 2024-02-28 14:59:18 +01:00
tmontaigu
5b653864b7 chore(tfhe): bump version to 0.5.2 2024-02-23 10:21:47 +01:00
Arthur Meyre
a1d189b415 chore(ci): update macOS runner for cargo builds 2024-02-23 10:21:47 +01:00
sarah el kazdadi
c59434f183 chore(ci): update toolchain, fix clippy warnings 2024-02-23 10:21:47 +01:00
David Testé
83239e6afa chore(bench): implement integer casting benchmarks 2024-02-23 10:21:47 +01:00
sarah el kazdadi
ef8cb0273f fix(tfhe): update pulp and bytemuck to fix nightly breakage 2024-02-23 10:21:47 +01:00
tmontaigu
9b353bac2d fix(integer): correct degree in small comparisons 2024-02-23 10:21:47 +01:00
tmontaigu
46d65f1f87 fix(capi): add missing function on FheBool
- safe ser/de
- classical ser/de
- comparisons
- scalar binary fn/comparisons
- compact & compressed fhe bool encryption
2024-02-23 10:21:47 +01:00
tmontaigu
a63a2cb725 chore(hlapi): add tests for fhe_bool 2024-02-23 10:21:47 +01:00
tmontaigu
c45af05ec6 fix(integer): make encrypt_bool specify the degree
encrypt_one_block does not leak information
about the message.
BooleanBlocks are meant for when we want to
be explicit that the value is a boolean
and are OK with this being public.

Thus encrypt_bool needs to correctly set the degree to 1
so that other operations can properly take advantage of it.
2024-02-23 10:21:47 +01:00
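To see why a degree of 1 matters, here is a toy calculation. It assumes the degree is an upper bound on the plaintext value, that degrees add up under addition, and that the maximum allowed degree is message_modulus * carry_modulus - 1 (15 for a 2-bit message with a 2-bit carry). The helper is hypothetical.

```rust
/// Hypothetical helper: how many blocks of degree `block_degree` can be added
/// together before the sum's degree exceeds the maximum allowed degree,
/// assuming degrees add up and max degree = message_modulus * carry_modulus - 1.
fn max_summable_blocks(block_degree: u64, message_modulus: u64, carry_modulus: u64) -> u64 {
    assert!(block_degree > 0, "a degree-0 block never increases the sum");
    let max_degree = message_modulus * carry_modulus - 1;
    max_degree / block_degree
}
```

Under these assumptions, with 2-bit message and carry, booleans tagged with degree 1 can be summed 15 at a time, versus 5 if their degree were left at 3.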
tmontaigu
584eaeb4ed fix(shortint): fix bitwise ops degree
We used `after_bitand/or/xor` on the ct_left
**after** the lut had changed its degree.
So the `after_bit*` functions computed the
resulting degree using a wrong degree for the
left ct.
2024-02-23 10:21:47 +01:00
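A minimal sketch of the ordering problem on plain degrees (upper bounds on the plaintext). The bounds used here (min for AND, all-ones up to the highest bit for OR/XOR) are simple safe choices for illustration, not necessarily the formulas tfhe-rs uses, and all function names are hypothetical.

```rust
/// Smallest all-ones value >= x (e.g. 5 -> 7): a safe upper bound for the
/// result of OR/XOR when both inputs are <= x.
fn all_ones_bound(x: u64) -> u64 {
    if x == 0 {
        0
    } else {
        u64::MAX >> x.leading_zeros()
    }
}

/// a & b can never exceed either input, so the smaller degree bounds it.
fn degree_after_bitand(lhs: u64, rhs: u64) -> u64 {
    lhs.min(rhs)
}

/// a | b and a ^ b never set a bit above the highest bit of max(a, b).
fn degree_after_bitor_or_xor(lhs: u64, rhs: u64) -> u64 {
    all_ones_bound(lhs.max(rhs))
}

/// Buggy order: applying the lookup table overwrites the left degree first,
/// and the AND bound is then computed from that new value.
fn bitand_degree_buggy(left: &mut u64, right: u64, lut_output_degree: u64) {
    *left = lut_output_degree;
    *left = degree_after_bitand(*left, right);
}

/// Fixed order: compute the bound from the original input degrees, then store it.
fn bitand_degree_fixed(left: &mut u64, right: u64) {
    *left = degree_after_bitand(*left, right);
}
```

In this toy run the buggy order over-estimates the degree (wasting carry space); with a smaller LUT output degree it would under-estimate it instead.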
tmontaigu
8d94ed2512 fix(hlapi): bind missing cuda bitnot 2024-02-23 10:21:47 +01:00
tmontaigu
b8d9dbe85b refactor(hlapi): split long files of hlapi
This splits the long base.rs files into multiple ones,
to make it easier to navigate.

There are no code changes apart from moving code around.
2024-02-23 10:21:47 +01:00
tmontaigu
ad25340c33 feat(capi): add Cuda support
- This adds GPU support in the C API
- Also make ctest (cmake test launcher) print
  test output when it fails
2024-02-23 10:21:47 +01:00
Arthur Meyre
ad1ae0c8c2 chore(ci): update scripts and Makefile for future forward compatibility 2024-01-31 18:22:15 +01:00
Arthur Meyre
ee40906b8b chore(ci): convert some make targets to be semver trick compatible 2024-01-31 18:22:15 +01:00
Arthur Meyre
bf6b4cc541 chore(tfhe): bump version to 0.5.1 2024-01-30 10:51:39 +01:00
Arthur Meyre
24404567a4 chore(tfhe): bump tfhe-cuda-backend version to 0.1.3 2024-01-30 10:51:39 +01:00
tmontaigu
052dd4a60e feat(integer): fuse two PBS in comparisons
In comparisons, we were reducing a vec of orderings
(inferior, equal, superior) into one final ordering,
and then we would do one final PBS to transform that
into a boolean value (0 or 1) depending on what was wanted
(<=, <, >, >=).

This fuses the last PBS (ordering -> boolean value) with
the last round of reduction, when there are only two blocks left
to be reduced.

This saves one PBS, meaning that ciphertext/ciphertext
comparisons get back the performance lost to
the fix in f4c220c1, and comparisons between a clear value and
a ciphertext get faster.
2024-01-30 10:51:39 +01:00
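A clear-value sketch of the reduction described above. It assumes per-block orderings encoded as 0 (inferior), 1 (equal), 2 (superior), little-endian blocks, and a pairwise tree reduction; these encodings and helper names are illustrative, not tfhe-rs internals. Each round stands for one layer of lookup tables (the initial per-block comparison is not counted), so fusing the last combine with the ordering-to-boolean mapping saves one round.

```rust
// Hypothetical encoding of a per-block ordering.
const INFERIOR: u8 = 0;
const EQUAL: u8 = 1;
const SUPERIOR: u8 = 2;

#[derive(Clone, Copy)]
enum Predicate {
    Lt,
    Le,
    Gt,
    Ge,
}

fn block_ordering(a: u8, b: u8) -> u8 {
    if a < b {
        INFERIOR
    } else if a == b {
        EQUAL
    } else {
        SUPERIOR
    }
}

/// The more significant ordering wins unless it is EQUAL.
fn combine(msb: u8, lsb: u8) -> u8 {
    if msb == EQUAL { lsb } else { msb }
}

fn ordering_to_bool(ordering: u8, pred: Predicate) -> bool {
    match pred {
        Predicate::Lt => ordering == INFERIOR,
        Predicate::Le => ordering != SUPERIOR,
        Predicate::Gt => ordering == SUPERIOR,
        Predicate::Ge => ordering != INFERIOR,
    }
}

/// Compares little-endian block decompositions; returns the result and the
/// number of lookup-table rounds used after the per-block comparisons.
fn compare(lhs: &[u8], rhs: &[u8], pred: Predicate, fuse_last: bool) -> (bool, usize) {
    assert!(!lhs.is_empty() && lhs.len() == rhs.len());
    let mut orderings: Vec<u8> = lhs.iter().zip(rhs).map(|(&a, &b)| block_ordering(a, b)).collect();
    let mut rounds = 0;
    // Pairwise tree reduction; an unpaired most significant block is carried over.
    while orderings.len() > 2 {
        orderings = orderings
            .chunks(2)
            .map(|pair| if pair.len() == 2 { combine(pair[1], pair[0]) } else { pair[0] })
            .collect();
        rounds += 1;
    }
    let last = if orderings.len() == 2 { combine(orderings[1], orderings[0]) } else { orderings[0] };
    rounds += match (orderings.len(), fuse_last) {
        (2, true) => 1,  // one lookup: last combine + ordering -> bool
        (2, false) => 2, // last combine, then a separate ordering -> bool lookup
        _ => 1,          // single block: only the ordering -> bool lookup
    };
    (ordering_to_bool(last, pred), rounds)
}
```

For example, with 2-bit blocks, `[3, 0, 2, 1]` is 99 and `[0, 1, 2, 1]` is 100: the `<` comparison takes 2 rounds when fused and 3 when not.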
tmontaigu
f8d829d076 fix(integer): add noise cleaning pbs in comparisons
In comparisons we were packing blocks to then do a subtraction
between them. However, this goes above the noise limit
that would guarantee the advertised error probability.

To fix that, we add a PBS to clean the noise. This PBS only needs
to be added in the ciphertext/ciphertext comparisons, making them slower
by 1 PBS.
2024-01-30 10:51:39 +01:00
dependabot[bot]
d9761ca17e chore(deps): bump codecov/codecov-action from 3.1.4 to 3.1.5
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3.1.4 to 3.1.5.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](eaaf4bedf3...4fe8c5f003)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 10:51:39 +01:00
dependabot[bot]
8d2e15347b chore(deps): bump tj-actions/changed-files from 42.0.0 to 42.0.2
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 42.0.0 to 42.0.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](ae82ed4ae0...90a06d6ba9)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 10:51:39 +01:00
dependabot[bot]
a368257bc7 chore(deps): bump actions/upload-artifact from 4.1.0 to 4.3.0
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.1.0 to 4.3.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4.1.0...26f96dfa697d77e81fd5907df203aa23a56210a8)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 10:51:39 +01:00
David Testé
76d23d0c91 chore(bench): add ciphertexts sum to integer benchmarks 2024-01-30 10:51:39 +01:00
David Testé
ddc5002232 chore(bench): add pbs benchmarks on gpu 2024-01-30 10:51:39 +01:00
tmontaigu
c08c479616 docs(hlapi): document trivial encryption to debug 2024-01-30 10:51:39 +01:00
tmontaigu
f26afc16de docs(hlapi): document how to use rayon 2024-01-30 10:51:39 +01:00
yuxizama
13f533f6fb chore(docs): update readme links and badges 2024-01-30 10:51:39 +01:00
yuxizama
d9541e472b chore(docs): update README.md
Change support banner
2024-01-30 10:51:39 +01:00
Agnes Leroy
3453e45258 fix(gpu): make all async functions unsafe, fix cuda_drop binding, add missing sync 2024-01-30 10:51:39 +01:00
David Testé
55de96f046 chore(ci): add gpu tests from user documentation 2024-01-30 10:51:39 +01:00
Agnes Leroy
9747c06f6e chore(gpu): fix formatting command 2024-01-30 10:51:39 +01:00
Agnes Leroy
00f72d2c13 chore(gpu): fix compilation when no nvidia gpu is available 2024-01-30 10:51:39 +01:00
tmontaigu
01f5cb9056 fix(integer): is_scalar_out_of_bounds handles bigger ct
Fix a bug in is_scalar_out_of_bounds where, if the scalar was
negative and the ciphertext was a signed one with more blocks than
the decomposed scalar, we would do an out-of-bounds access
(i.e. a panic).

This also fixes signed_overflowing_mul on 256 bits,
where the bug first appeared.
2024-01-30 10:51:39 +01:00
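The commit does not spell out the fix, so the following is only a generic illustration of the failure mode: when a ciphertext has more blocks than the decomposed scalar, reading scalar block `i` by index panics, while padding with sign-extension blocks does not. Both helpers and the 2-bit block size in the usage are assumptions of this sketch.

```rust
/// Hypothetical decomposition of a signed scalar into `bits_per_block`-bit
/// blocks (two's complement, least significant first). It stops as soon as the
/// remaining value is pure sign extension, so small scalars give few blocks.
fn decompose_signed(mut value: i64, bits_per_block: u32) -> Vec<u64> {
    assert!(bits_per_block > 0 && bits_per_block < 64);
    let mask = (1u64 << bits_per_block) - 1;
    let mut blocks = Vec::new();
    loop {
        let block = (value as u64) & mask;
        blocks.push(block);
        value >>= bits_per_block; // arithmetic shift: keeps the sign
        let sign_bit_set = (block >> (bits_per_block - 1)) & 1 == 1;
        if (value == 0 && !sign_bit_set) || (value == -1 && sign_bit_set) {
            return blocks;
        }
    }
}

/// Reads scalar block `i` for a ciphertext that may have more blocks than the
/// decomposition: past the end, returns a sign-extension block instead of
/// indexing out of bounds (which would panic).
fn block_or_sign_extension(blocks: &[u64], i: usize, scalar_is_negative: bool, bits_per_block: u32) -> u64 {
    match blocks.get(i) {
        Some(&block) => block,
        None if scalar_is_negative => (1u64 << bits_per_block) - 1,
        None => 0,
    }
}
```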
David Testé
d66e313fa4 chore(ci): fix inputs for gpu full benchmark workflow 2024-01-30 10:51:39 +01:00
Arthur Meyre
c9d530e642 fix(core): ignore value in the body when doing LWE encryption 2024-01-30 10:51:39 +01:00
Agnes Leroy
6c2096fe52 chore(gpu): rename "test vector" -> "luts" and "tvi" -> "lut_indexes" 2024-01-30 10:51:39 +01:00
Agnes Leroy
1e94134dda chore(gpu): move around code in integer.h for better readability 2024-01-30 10:51:39 +01:00
tmontaigu
c76a60111c fix(integer): fix cast in scalar_shift/rotate
In scalar_shift/rotate, we get the number of bits to shift/rotate
as a generic type that can be cast to u64.

We compute the total number of bits the ciphertext has, cast that number
to the same type as the scalar, and do "shift % num_bits".

However, if the number of bits computed exceeds the max value the scalar
type can hold, we could end up computing a remainder by 0.

e.g. a 256-bit ciphertext and scalar type u8 => 256u64 cast to u8 results
in 0.

Fix that by casting the scalar value to u64.
2024-01-30 10:51:39 +01:00
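The arithmetic behind the bug, reproduced on clear values. The helper names are hypothetical, but the casts follow Rust's `as` semantics (truncation to the low 8 bits), and an integer remainder by zero panics in Rust.

```rust
/// Buggy pattern: the ciphertext's bit count is cast to the scalar's type.
/// For a 256-bit ciphertext and a u8 scalar, 256 as u8 == 0 and `%` panics.
fn effective_shift_buggy(shift: u8, num_bits: u64) -> u8 {
    let num_bits_as_scalar = num_bits as u8; // truncates to the low 8 bits
    shift % num_bits_as_scalar // panics when the divisor is 0
}

/// Fixed pattern: widen the scalar to u64 so neither value is truncated.
fn effective_shift_fixed(shift: u8, num_bits: u64) -> u64 {
    u64::from(shift) % num_bits
}
```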
tmontaigu
18ff400df2 chore(hlapi): remove leftover file
This file was not correctly removed during the refactor
2024-01-30 10:51:39 +01:00
David Testé
3d31d09be5 chore(ci): change rust-toolchain action
GitHub third-party action actions-rs/toolchain is no longer
maintained. We switch to dtolnay/rust-toolchain.
2024-01-30 10:51:39 +01:00
David Testé
76322606f2 chore(ci): set RUST_BACKTRACE var to full to ease debugging on failure 2024-01-30 10:51:39 +01:00
dependabot[bot]
bf58a9f0c6 chore(deps): bump actions/upload-artifact from 3.1.2 to 4.2.0
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.2 to 4.2.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v3.1.2...694cdabd8bdb0f10b2cea11669e1bf5453eed0a6)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 10:51:39 +01:00
dependabot[bot]
64461c82b4 chore(deps): bump tj-actions/changed-files from 41.1.1 to 42.0.0
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 41.1.1 to 42.0.0.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](62f4729b5d...ae82ed4ae0)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 10:51:39 +01:00
dependabot[bot]
339c84fbd9 chore(deps): bump actions/checkout from 3.5.3 to 4.1.1
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.5.3 to 4.1.1.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3.5.3...b4ffde65f46336ab88eb53be808477a3936bae11)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 10:51:39 +01:00
Arthur Meyre
bc682a5ffb docs(bench): add scalar benchmarks for integer 2024-01-29 16:42:32 +01:00
Arthur Meyre
2920daf2d9 chore(docs): fix link to 0.4 semver doc 2024-01-23 10:50:25 +01:00
Arthur Meyre
e7352eee8b chore(tfhe): add version for tfhe-cuda-backend dependency 2024-01-22 17:24:51 +01:00
J-B Orfila
33a7e9f3e4 doc(gpu): add how to page about running on GPU 2024-01-22 17:18:02 +01:00
David Testé
96da25ce90 chore(ci): separate clippy and tests in different job steps 2024-01-22 16:04:07 +01:00
Agnes Leroy
548f2e5d05 chore(gpu): fix gpu package for publication 2024-01-22 15:29:26 +01:00
Arthur Meyre
f313b58c8e doc(tfhe): add doc page about data migration which will redirect to 0.4 doc 2024-01-22 13:43:44 +01:00
J-B Orfila
fd038346b7 doc: overflow detection 2024-01-22 13:42:55 +01:00
tmontaigu
f0fcfd517b feat(hlapi): update wasm and c_api 2024-01-22 10:06:49 +01:00
Arthur Meyre
0d2448e9e9 feat(hl_api): add raw parts API for Ciphertext types 2024-01-19 13:05:26 +01:00
Arthur Meyre
68ef237ae6 feat(hl_api): add raw parts API for ServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
fb39864f05 feat(hl_api): add raw parts API for PublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
69a5562aba feat(hl_api): add raw parts API for CompressedServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
d49ffdd26f feat(hl_api): add raw parts API for CompressedPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
c87c362d42 feat(hl_api): add raw parts API for CompressedCompactPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
905ef4ea78 feat(hl_api): add raw parts API for CompactPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
957dd47295 feat(boolean): add raw parts API for PublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
0a550ac803 feat(boolean): add raw parts API for CompressedPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
8c62155429 feat(boolean): add raw parts API for CompressedServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
3e23631bdc feat(boolean): add raw parts API for ServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
02b2fcf78d feat(boolean): add raw parts API for KeySwitchingKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
d9d222c1b5 feat(integer): add raw parts API for CompressedCompactPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
f2011cd30d feat(integer): add raw parts API for CompactPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
1b3d41ec44 feat(integer): add raw parts API for CompressedPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
65816a175a feat(integer): add raw parts API for PublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
2d1cf95900 feat(integer): add raw parts API for KeySwitchingKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
247072a81a feat(integer): add raw parts API for CompressedServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
c65a58c14f feat(integer): add raw parts API for ServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
3a4859553e refactor(integer): add raw parts API for Wopbskey instead of from_shortint 2024-01-19 13:05:26 +01:00
Arthur Meyre
9c3a159ca1 feat(integer): add raw parts API for CompactCiphertextList 2024-01-19 13:05:26 +01:00
Arthur Meyre
e76ddd5a49 feat(shortint): add raw parts API for WopbsKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
fa35b6ef8f feat(boolean): add raw parts API for CompressedCiphertext 2024-01-19 13:05:26 +01:00
Arthur Meyre
a6e835b3f1 feat(shortint): add raw parts API for CompactCiphertext and related list 2024-01-19 13:05:26 +01:00
Arthur Meyre
c586d64fab feat(shortint): add raw parts API for CompressedServerKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
6108f180bf feat(shortint): add raw parts API to PublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
8f0e4f6c99 feat(shortint): add raw parts API for CompressedPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
473b7e0c6a feat(shortint): add raw parts API to CompressedCompactPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
e9c92bc9a3 feat(shortint): raw parts API for CompactPublicKey 2024-01-19 13:05:26 +01:00
Arthur Meyre
e4c0c4c15f feat(shortint): add function to recompute thread count 2024-01-19 13:05:26 +01:00
Arthur Meyre
ea7c579efc feat(shortint): add raw parts API for ServerKey and KS Key
- add a method to easily get the expected LweDimension of a shortint
Ciphertext for a given ServerKey
2024-01-19 13:05:26 +01:00
Arthur Meyre
c9d19fca19 feat(tfhe): add raw parts API for HL API forward compatibility
- allows converting an HL API ClientKey from 0.4 to 0.5
- add raw parts API for shortint ClientKey
- add raw parts API for integer ClientKey
- add raw parts API for HL API IntegersClientKey
- add raw parts API for HL API ClientKey
2024-01-19 13:05:26 +01:00
Arthur Meyre
f3e6074480 feat(core): backward compatibility types for fft re-exported 2024-01-19 13:05:26 +01:00
tmontaigu
17c110f536 feat(hlapi): add gpu support 2024-01-19 09:27:04 +01:00
Arthur Meyre
45e27d8836 chore(core): remove Serialize and Deserialize on GlweBody
- users are not expected to use those structs for serialization
2024-01-18 17:42:07 +01:00
tmontaigu
ee10508c99 refactor(hlapi): prepare hlapi for different backends/representations
- Split GenericInteger into FheUint and FheInt
- Prepare FheUint and FheBool to support different backends
- Allow the use of 1_1 parameters

BREAKING CHANGE: unset_server_key no longer returns the server key
2024-01-18 16:58:11 +01:00
Beka Barbakadze
9213436b93 fix(gpu): fix mem reuse in multiplication, remove extra variable from lut constructor 2024-01-18 16:06:07 +01:00
Beka Barbakadze
aa2a8e31fe feat(gpu): return different chunk sizes for different number of ciphertexts 2024-01-18 16:06:07 +01:00
Beka Barbakadze
d49b8235bf feat(gpu): implement memory reuse for lut objects 2024-01-18 16:06:07 +01:00
David Testé
cec4a5b60b chore(bench): fix multi-bit parameters selection for multi-bit
Operation flavors for GPU no longer end with "_gpu", so we
can't rely on these strings; instead we use the configuration
feature to select which multi-bit parameter set we use.
The bit size limitation for CPU in multi-bit is also put back in
place.
2024-01-18 15:43:53 +01:00
Mayeul@Zama
15147b4359 feat(integer): add oprf bench 2024-01-18 11:03:37 +01:00
Mayeul@Zama
f2ee360a47 feat(integer): add oblivious pseudo-random function 2024-01-18 11:03:37 +01:00
Mayeul@Zama
6631aae069 feat(shortint): add oblivious pseudo-random function 2024-01-18 11:03:37 +01:00
Mayeul@Zama
fa9cd866e4 feat(shortint): add no encoding functions 2024-01-18 11:03:37 +01:00
Pedro Alves
c632ac1b9a feat(gpu): add tfhe-cuda-backend to the repository 2024-01-18 10:14:36 +01:00
Arthur Meyre
f0e6b4c395 chore(ci): add newline to workflow 2024-01-17 18:38:45 +01:00
David Testé
2cd51ed36d chore(ci): add placeholder workflow for cuda backend 2024-01-17 17:59:44 +01:00
tmontaigu
dc04a5138e feat(integer): add signed_overflowing_mul 2024-01-17 10:54:58 +01:00
tmontaigu
eda338aa29 fix(integer): fix decrypting negative value of N >= 32 BITS
Bug was probably introduced in 48405959a4
2024-01-17 10:54:58 +01:00
Arthur Meyre
df6fa86481 refactor(c_api): use DynamicBuffer as the buffer type between C and Rust
- allows sharing data safely between C and Rust and makes sure it gets
destroyed with the right destructor
- apply formatting to C tests
- add a script to symlink the dependency lib to a fixed name
- the Rust build system adds a fingerprint to the lib name, which is not
practical when we are linking our C test executable
- link the most recent artifacts corresponding to the dependency we have
2024-01-17 10:45:02 +01:00
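A minimal sketch of the idea of a buffer that crosses the FFI boundary together with its own destructor, so whichever side frees it uses the allocator that created it. The struct layout, field names and functions below are assumptions for illustration and may not match tfhe-rs's actual DynamicBuffer.

```rust
use std::os::raw::c_int;

/// Hypothetical C-compatible buffer carrying its own destructor.
/// No Drop impl: ownership may be handed to C, which then calls the destructor.
#[repr(C)]
pub struct DynamicBuffer {
    pub pointer: *mut u8,
    pub length: usize,
    pub destructor: Option<unsafe extern "C" fn(*mut u8, usize) -> c_int>,
}

/// Destructor for buffers allocated by Rust as a boxed slice.
unsafe extern "C" fn free_rust_boxed_slice(pointer: *mut u8, length: usize) -> c_int {
    if pointer.is_null() {
        return 1;
    }
    // Rebuild the Box<[u8]> leaked in `From<Vec<u8>>` and drop it.
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(pointer, length))) };
    0
}

impl From<Vec<u8>> for DynamicBuffer {
    fn from(bytes: Vec<u8>) -> Self {
        // Boxing drops spare capacity, so (pointer, length) describes the whole allocation.
        let boxed: Box<[u8]> = bytes.into_boxed_slice();
        let length = boxed.len();
        let pointer = Box::into_raw(boxed) as *mut u8;
        DynamicBuffer { pointer, length, destructor: Some(free_rust_boxed_slice) }
    }
}

impl DynamicBuffer {
    /// Frees the buffer with the destructor it was created with; returns 0 on
    /// success. Fields are reset, so calling it twice is harmless.
    pub fn destroy(&mut self) -> c_int {
        let result = match self.destructor.take() {
            Some(destructor) => unsafe { destructor(self.pointer, self.length) },
            None => 0,
        };
        self.pointer = std::ptr::null_mut();
        self.length = 0;
        result
    }
}
```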
David Testé
6742e150b0 chore(ci): add overflowing unsigned multiplication to benchmarks 2024-01-16 17:52:35 +01:00
tmontaigu
93fac32755 refactor(shortint): make scalar_eq/scalar_not_eq take & not &mut
BREAKING_CHANGE: change mutability of scalar_equal/scalar_not_equal
argument
2024-01-16 16:21:10 +01:00
Arthur Meyre
9ac57e75c9 feat(boolean): add raw parts methods to the ClientKey
- into_raw_parts allows deconstructing a ClientKey
- new_from_raw_parts allows constructing a ClientKey
2024-01-16 13:08:27 +01:00
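A minimal sketch of the raw-parts pattern on a hypothetical key type: `into_raw_parts` moves the fields out and `new_from_raw_parts` rebuilds the value after checking its invariants. The type, fields and checks are illustrative, not the boolean ClientKey's actual definition.

```rust
/// Hypothetical parameters for the toy key below.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ToyParameters {
    pub lwe_dimension: usize,
}

/// Hypothetical key: fields stay private, so the raw-parts functions are the
/// only way to take it apart or rebuild it.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyClientKey {
    lwe_secret_key: Vec<u64>,
    parameters: ToyParameters,
}

impl ToyClientKey {
    /// Deconstructs the key into its fields, consuming it.
    pub fn into_raw_parts(self) -> (Vec<u64>, ToyParameters) {
        let Self { lwe_secret_key, parameters } = self;
        (lwe_secret_key, parameters)
    }

    /// Rebuilds a key from its fields, panicking on inconsistent parts
    /// rather than producing an invalid key.
    pub fn new_from_raw_parts(lwe_secret_key: Vec<u64>, parameters: ToyParameters) -> Self {
        assert_eq!(
            lwe_secret_key.len(),
            parameters.lwe_dimension,
            "secret key length does not match the LWE dimension"
        );
        Self { lwe_secret_key, parameters }
    }
}
```

This kind of round trip is what makes version-to-version conversion possible in principle: one release takes its key apart into plain parts and another rebuilds its own type from them.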
David Testé
a7abee0491 chore(ci): gather common benchmarks groups
Instead of having multiple BENCH_OP_FLAVOR to run for full
benchmarks, criterion groups are now gathered by operation kind
(default, smart, unchecked).
2024-01-15 17:09:00 +01:00
tmontaigu
0228a58cfc feat(integer): add cast_to_signed/unsigned
These functions implement the semantics of uX/iX as iX and uX/iX as uX
2024-01-15 15:14:54 +01:00
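A clear-value sketch of what such a cast means at the block level, assuming radix blocks of 2 bits stored least significant first. Widening a signed source fills new blocks with the sign, widening an unsigned source fills them with zeros, and narrowing drops the high blocks, matching Rust's `as`; the target signedness only changes how the resulting bits are read. The helper is hypothetical.

```rust
/// Hypothetical clear-value model of a radix cast. `blocks` holds
/// `bits_per_block`-bit blocks, least significant first.
fn cast_blocks(blocks: &[u64], target_num_blocks: usize, source_is_signed: bool, bits_per_block: u32) -> Vec<u64> {
    let block_max = (1u64 << bits_per_block) - 1;
    // Only a signed source whose top bit is set is negative.
    let negative = source_is_signed
        && blocks.last().map_or(false, |&top| (top >> (bits_per_block - 1)) & 1 == 1);
    // New high blocks: sign extension (all ones) if negative, zeros otherwise.
    let fill = if negative { block_max } else { 0 };
    // Narrowing keeps only the low blocks (truncation); widening pads with `fill`.
    let mut out: Vec<u64> = blocks.iter().copied().take(target_num_blocks).collect();
    out.resize(target_num_blocks, fill);
    out
}
```

Here `[1, 3, 3, 3]` is the 8-bit pattern of -3i8 (or 253u8), so widening it to 8 blocks mirrors `-3i8 as i16` or `253u8 as u16` depending on the source signedness.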
tmontaigu
f98c680e95 feat(integer): add unsigned overflowing mul 2024-01-15 12:24:04 +01:00
dependabot[bot]
dcc3d267e4 chore(deps): bump actions/upload-artifact from 4.0.0 to 4.1.0
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.0.0 to 4.1.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](c7d193f32e...1eb3cb2b3e)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-15 10:15:31 +01:00
dependabot[bot]
2067092e0a chore(deps): bump tj-actions/changed-files from 41.0.1 to 41.1.1
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 41.0.1 to 41.1.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](716b1e1304...62f4729b5d)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-15 10:13:41 +01:00
Arthur Meyre
7c50216f7a chore(core_crypto): remove Serde on some decomposition structs
- those structures are not expected to be serialized by a user
2024-01-10 18:25:18 +01:00
David Testé
77c0532793 chore(ci): add overflowing add and sub to integer benchmarks 2024-01-10 15:26:28 +01:00
Arthur Meyre
00ddfdec8b refactor(boolean): call as_view on LWE secret keys where appropriate
- previous code had to manually create a view from a container, this is
less convoluted and more user friendly
2024-01-08 17:17:00 +01:00
Arthur Meyre
ef9ec13999 refactor(shortint): remove the large_lwe_secret_key from the ClientKey
- accessors are provided to access the large and small LWE secret keys and
make sure they have compatible data types when used in functions behaving
differently depending on the encryption key choice
2024-01-08 17:17:00 +01:00
Arthur Meyre
b53c8aac3f feat(core): add view types and "as_view" functions for (G)LWE secret keys 2024-01-08 17:17:00 +01:00
Arthur Meyre
f052c1f8ba chore(doc): fix docstring for FheIntX types 2024-01-05 17:30:15 +01:00
Mayeul@Zama
fef6d18605 doc(all): add doc for safe_(de)serialization 2024-01-05 10:24:36 +01:00
Mayeul@Zama
e5505ab686 chore(all): update SERIALIZATION_VERSION for 0.5 release 2024-01-05 10:24:27 +01:00
Arthur Meyre
ab2c5f09a8 chore(tfhe): update 2023 to 2024 in license files and other places 2024-01-04 16:26:21 +01:00
Mayeul@Zama
40ae841a15 refactor(all): make paste dependency non optional 2024-01-04 10:33:36 +01:00
Mayeul@Zama
0a317c5f0e refactor(all): remove safe-deserialization feature
bincode dependency is not optional anymore
2024-01-04 10:33:36 +01:00
Arthur Meyre
415a8a2de5 chore(c_api): add the c_api code from the docs as a test 2024-01-03 13:26:05 +01:00
Arthur Meyre
935da25360 doc(c_api): Add an output for the users compiling the C API example 2024-01-03 13:26:05 +01:00
tmontaigu
dbeff4e4b4 docs(capi): fix C API example 2024-01-03 13:26:05 +01:00
Arthur Meyre
1e50d0cdd2 chore(doc): fix latex equation typo preventing formatting on GitHub
refs https://github.com/zama-ai/tfhe-rs/issues/748
2024-01-02 17:05:12 +01:00
dependabot[bot]
7c5551bf45 chore(deps): bump actions/upload-artifact from 3.1.3 to 4.0.0
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.3 to 4.0.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](a8a3f3ad30...c7d193f32e)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-02 13:17:58 +01:00
dependabot[bot]
95c36d54cb chore(deps): bump tj-actions/changed-files from 40.2.1 to 41.0.1
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 40.2.1 to 41.0.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](1c938490c8...716b1e1304)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-02 13:17:45 +01:00
Benoit Chevallier-Mames
384e15ca5a style(docs): fixing a few typos in the README 2023-12-21 11:51:52 +01:00
tmontaigu
526a53f3d4 feat(integer): add signed_overflowing_scalar_add/sub 2023-12-15 10:57:21 +01:00
Mayeul@Zama
7d17b71740 fix(shortint): fix smart_add/sub 2023-12-13 09:55:20 +01:00
Mayeul@Zama
8ecb85e4dd refactor(shortint): is_possible ops take CtNoiseDegree 2023-12-13 09:55:20 +01:00
Mayeul@Zama
cf30db7a30 fix(shortint): rename smart_apply_lookup_table_bivariate 2023-12-13 09:55:20 +01:00
Mayeul@Zama
2599f7d5ea fix(shortint): fix ciphertexts_can_be_packed_without_exceeding_space_or_noise 2023-12-13 09:55:20 +01:00
Mayeul@Zama
ae88bb3264 fix(shortint): fix smart_evaluate_bivariate_function 2023-12-13 09:55:20 +01:00
Mayeul@Zama
6fb898db66 refactor(shortint): move bivariate_pbs to its own module 2023-12-13 09:55:20 +01:00
Mayeul@Zama
e750d2cd92 refactor(shortint): use smart_evaluate_bivariate_function and evaluate_univariate_function 2023-12-13 09:55:20 +01:00
Arthur Meyre
d8586080da chore(integer): simplify an API to create a ServerKey from a shortint one
- the API required passing the client_key to get access to the message and
carry modulus; as the shortint ServerKey already has that information, drop
the requirement to pass the ClientKey

BREAKING CHANGE:
new_radix_server_key_from_shortint, new_crt_server_key_from_shortint APIs
have changed and no longer require a ClientKey, the max degree methods now
only take a MessageModulus and CarryModulus instead of the full parameter
set
2023-12-13 09:41:59 +01:00
tmontaigu
4cc2e85556 feat(integer): add unsigned_overflowing_scalar_sub 2023-12-12 16:08:06 +01:00
tmontaigu
303a65c88d feat(integer): add unsigned_overflowing_scalar_add 2023-12-12 16:08:06 +01:00
Arthur Meyre
18c01e74d6 chore(core_crypto): remove legacy new types 2023-12-12 11:06:06 +01:00
dependabot[bot]
a8b6c72910 chore(deps): bump tj-actions/changed-files from 40.2.0 to 40.2.1
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 40.2.0 to 40.2.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](da093c1609...1c938490c8)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-12-11 10:21:53 +01:00
Arthur Meyre
c2b21ed709 chore(integer): remove unused RadixDecomposition struct and code 2023-12-08 10:38:21 +01:00
tmontaigu
ad41fdf5a5 feat(integer): expose default and smart ciphertext sum
This exposes the default and smart variants of the ciphertext sum

This also removes the par_seq_op functions as they are less optimal
2023-12-06 10:50:45 +01:00
Arthur Meyre
eeae19f35f chore(core_crypto): disable seeded entities for non power of 2 moduli
- as random uniform generation uses rejection sampling for non-native moduli,
the seeded decompression currently does not work: it allocates just
enough bytes for a native integer and not for the various retries which may
be needed
- follow-up issue: https://github.com/zama-ai/tfhe-rs-internal/issues/358
2023-12-05 15:25:30 +01:00
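A sketch of why the number of random bytes is not fixed for non-power-of-2 moduli, using rejection sampling over 64-bit draws. SplitMix64 is a stand-in, non-cryptographic generator chosen only so the example runs without dependencies, and the modulus 2^63 + 1 in the usage is picked to make rejections frequent; realistic moduli reject far less often, but still not never.

```rust
/// Stand-in generator (SplitMix64); NOT cryptographically secure, used only
/// so this sketch has no dependencies.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Uniform value in [0, modulus) by rejection sampling on 64-bit draws.
/// Returns the value and how many draws it consumed: always 1 for a
/// power-of-two modulus, but unbounded in general.
fn uniform_mod(rng: &mut SplitMix64, modulus: u64) -> (u64, u32) {
    assert!(modulus > 0);
    // 2^64 mod modulus: the number of top values that would bias `x % modulus`.
    let excess = (u64::MAX % modulus + 1) % modulus;
    let mut draws = 0;
    loop {
        let x = rng.next_u64();
        draws += 1;
        // Accept only the first 2^64 - excess values (a multiple of `modulus`).
        if x <= u64::MAX - excess {
            return (x % modulus, draws);
        }
    }
}
```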
Mayeul@Zama
798572e58c style(shortint): rename MaxNoiseLevel::valid to validate 2023-12-04 18:16:46 +01:00
Mayeul@Zama
36d375943c refactor(shortint): encapsulate Degree and MaxDegree 2023-12-04 18:16:46 +01:00
Mayeul@Zama
a1488b10d5 style(shortint): move MaxDegree 2023-12-04 18:16:46 +01:00
Mayeul@Zama
b153641280 style(shortint): scalar_sub use scalar_add 2023-12-04 18:16:46 +01:00
tmontaigu
48405959a4 feat(integer): add decrypt_trivial
This adds a decrypt_trivial method to all ciphertext types of
shortint and integer.

This function tries to "decrypt" the ciphertext if it is a
trivial one; otherwise it returns an error.

This is meant to be a debugging 'tool':

To debug a function / circuit, users can call the function
on trivial ciphertexts instead of real ciphertexts. That way,
computations are faster _and_ users can see intermediate
values via decrypt_trivial, to do some print-debugging or use a
debugger.
2023-12-04 15:50:53 +01:00
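A toy model of the debugging workflow described above. `ToyCiphertext`, `NotTrivialError` and `add_trivial` are hypothetical; the real decrypt_trivial lives on the shortint and integer ciphertext types, and its exact signature is not reproduced here.

```rust
/// Hypothetical toy ciphertext: a trivial one carries its plaintext in the
/// clear (no secret mask, no noise); a real one is opaque without the key.
#[derive(Clone, Debug, PartialEq)]
enum ToyCiphertext {
    Trivial(u64),
    Encrypted(Vec<u64>), // stand-in for mask and body
}

#[derive(Debug, PartialEq)]
struct NotTrivialError;

impl ToyCiphertext {
    /// Debugging helper: succeeds only on trivial ciphertexts.
    fn decrypt_trivial(&self) -> Result<u64, NotTrivialError> {
        match self {
            ToyCiphertext::Trivial(m) => Ok(*m),
            ToyCiphertext::Encrypted(_) => Err(NotTrivialError),
        }
    }
}

/// Hypothetical circuit step that only handles trivial inputs (a real server
/// key would also handle encrypted ones); its intermediate result can be
/// inspected with decrypt_trivial.
fn add_trivial(a: &ToyCiphertext, b: &ToyCiphertext, modulus: u64) -> Option<ToyCiphertext> {
    let sum = a.decrypt_trivial().ok()? + b.decrypt_trivial().ok()?;
    Some(ToyCiphertext::Trivial(sum % modulus))
}
```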
Arthur Meyre
d39e73be91 chore(core): freshen up native decomposition tests
- the classic native decomposer results are exact and there aren't that
many cases to test, so tests are changed to be exhaustive
- the non-native decomposer still has some work-in-progress parts so it won't be
made exhaustive right away; additionally, the next PR for the prime Q effort
will make some changes to it, which is why the current code has not been
touched for the non-native decomposer
2023-12-04 14:56:17 +01:00
dependabot[bot]
71447d845f chore(deps): bump tj-actions/changed-files from 40.1.1 to 40.2.0
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 40.1.1 to 40.2.0.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](25ef3926d1...da093c1609)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-12-04 09:22:17 +01:00
Arthur Meyre
837df59b44 fix(shortint): programmable_bootstrapping_native_crt could alter its input
- the alteration is a trick to be able to perform the wopbs in the native
crt mode, but it modifies the input ciphertext and leaves it modified,
meaning that the input no longer represents the same data/encryption
- applying the function several times would likely lead to incorrect results
starting from the second call, with increasing error/divergence from the
original encrypted data
- clone locally instead; the cost should be negligible given the wopbs runtime

BREAKING CHANGE:
programmable_bootstrapping_native_crt signature has changed
2023-12-01 14:46:09 +01:00
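The same fix in generic form, on a hypothetical ciphertext type: the "before" version mutates the caller's value and drifts on repeated calls, while the "after" version takes a shared reference and works on a local clone. The `+1` transform is a placeholder for the actual wopbs trick, which is not reproduced.

```rust
/// Hypothetical ciphertext stand-in.
#[derive(Clone, Debug, PartialEq)]
struct Ct {
    body: Vec<u64>,
}

/// Before: the temporary re-encoding is applied to the caller's ciphertext
/// and left in place, so a second call sees different input.
fn transform_in_place(ct: &mut Ct) -> u64 {
    for x in ct.body.iter_mut() {
        *x = x.wrapping_add(1); // placeholder for the re-encoding trick
    }
    ct.body.iter().sum()
}

/// After: take `&Ct`, clone once locally, and apply the trick to the copy only.
fn transform_on_copy(ct: &Ct) -> u64 {
    let mut local = ct.clone();
    transform_in_place(&mut local)
}
```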
Arthur Meyre
303cac2092 chore(ci): re-enable release profile for doctest
- following the merge of LLVM 17.0.4 into Rust stable, the bug uncovered by LTO on
aarch64 has been fixed (https://github.com/rust-lang/rust/issues/116941), so
we remove the hard-coded override
2023-12-01 12:34:26 +01:00
Arthur Meyre
bb1a969c34 chore(ci): update nightly toolchain to have fixed LLVM as well
- fix lints linked to latest nightly
2023-12-01 12:34:26 +01:00
tmontaigu
9dac9242be feat(integer): add boolean_ops to work on BooleanBlock
This adds boolean_bitand/or/xor/not to work on BooleanBlock
This is meant to improve the API of BooleanBlock.
2023-12-01 11:13:31 +01:00
tmontaigu
e97ac815eb feat(shortint): add trivial pbs
When we detect that the ciphertext to be bootstrapped
is a trivial ciphertext, we can apply the lookup table
in clear thus saving computation time.

This gives shortint, integer and hlapi
much faster computations when all inputs are trivial,
making it quicker to check and debug a circuit.
2023-11-30 22:01:28 +01:00
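A toy model of the trivial-PBS shortcut. The enum, the counter and `make_lut` are hypothetical, and the encrypted branch is a placeholder that performs no computation; it only counts how many real bootstraps would have run.

```rust
/// Hypothetical toy ciphertext: trivial (plaintext in the clear) or a handle
/// to a real encryption.
#[derive(Clone, Debug, PartialEq)]
enum ToyCiphertext {
    Trivial(u64),
    Encrypted(u64), // opaque handle, not a value
}

/// Builds a lookup table for `f` over the message space [0, message_modulus).
fn make_lut(f: impl Fn(u64) -> u64, message_modulus: u64) -> Vec<u64> {
    (0..message_modulus).map(|m| f(m) % message_modulus).collect()
}

/// Trivial inputs take the fast path: the table is read in the clear and the
/// result stays trivial. Other inputs would need a real bootstrap, which this
/// placeholder does not perform; it only counts it.
fn apply_lookup_table(ct: &ToyCiphertext, lut: &[u64], pbs_count: &mut usize) -> ToyCiphertext {
    match ct {
        ToyCiphertext::Trivial(m) => ToyCiphertext::Trivial(lut[*m as usize]),
        ToyCiphertext::Encrypted(handle) => {
            *pbs_count += 1;
            ToyCiphertext::Encrypted(*handle)
        }
    }
}
```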
Arthur Meyre
62feb59722 chore(ci): fix doctest by using parameters with enough precision 2023-11-30 14:57:07 +01:00
Mayeul@Zama
ac8916a30f refactor(shortint): define woppbs on server key instead of engine 2023-11-30 14:54:33 +01:00
Mayeul@Zama
069ea98ad6 style(shortint): fix remaining unused_self 2023-11-30 14:54:33 +01:00
Mayeul@Zama
e587d1835e refactor(shortint): define operations on server key instead of engine 2023-11-30 14:54:33 +01:00
Mayeul@Zama
16121e7487 style(core): fix some unused self 2023-11-30 14:54:33 +01:00
Mayeul@Zama
3ab566de7b style(boolean): fix some unused self 2023-11-30 14:54:33 +01:00
David Testé
2309b07703 test(integer): add smart tests for radix_parallel min/max
This also refactors the code to use a macro to parametrize the tests,
making the code smaller.
2023-11-30 14:44:17 +01:00
David Testé
8755094c38 test(integer): fix tests for unsigned scalar min/max operations 2023-11-30 14:44:17 +01:00
David Testé
4da10e9dd5 chore(ci): add aws ec2 fallback profile for cpu tests
This is done to mitigate resource shortages in our base AWS region
(eu-west-3) due to the high number of instances that are launched
in parallel in our Pull Requests.
2023-11-30 13:02:42 +01:00
Arthur Meyre
cdda260063 chore(ci): .gitignore was ignoring all files/directories named keys
- this was hiding some source files in vscode search and could likely have
been very annoying when committing stuff
2023-11-29 18:14:33 +01:00
Arthur Meyre
be413fff50 chore(shortint): remove the doctests added as tests, as it is confirmed they fail
- the doctests also failed when run as regular tests, so the issue is not linked to doctests
- still unclear what is causing the issue, the results are sometimes way
off
- the concentration of failed tests may indicate a miscompile, as those
tests never fail on the M1 CI; some alignment issue seems to cause failures from time
to time
2023-11-29 17:43:25 +01:00
Arthur Meyre
3ed960d255 chore(ci): fix a command naming issue for the CI 2023-11-29 16:07:31 +01:00
Arthur Meyre
bdadd39a34 chore(ci): add some missing spec indicators for cargo commands 2023-11-29 15:29:06 +01:00
Arthur Meyre
f03f2f9c6d chore(ci): fix clippy trivium target which also triggered for tfhe 2023-11-29 15:29:06 +01:00
Arthur Meyre
f03ec9bbed chore(tfhe): re-allow semicolon_if_nothing_returned
- create a section for lints that have been considered to be disallowed but
are kept allowed, either because they bring too little value or because
they don't help with readability in certain circumstances
2023-11-29 15:28:43 +01:00
Arthur Meyre
5137751dd2 chore(ci): fix pedantic lint with missing trailing ; on () return 2023-11-29 15:28:43 +01:00
tmontaigu
edc3449dbf feat(integer): add signed overflowing_add 2023-11-29 13:02:58 +01:00
Arthur Meyre
6068c509de feat(tfhe): add LWE encryption and linalg with non power of 2 moduli 2023-11-29 09:54:55 +01:00
David Testé
30a4348e3a test(integer): use keycache for comparisons and wopbs 2023-11-28 14:11:53 +01:00
Mayeul@Zama
3013e02d90 fix(core): fix typo in comment 2023-11-28 13:10:23 +01:00
Mayeul@Zama
c029917c5c style(all): rename NB_TEST NB_TESTS 2023-11-28 13:10:23 +01:00
Mayeul@Zama
d23d04021b style(core): fix clippy::inconsistent_struct_constructor 2023-11-28 13:10:23 +01:00
Mayeul@Zama
5a5e9e0ac1 style(core): fix clippy::ptr_as_ptr 2023-11-28 13:10:23 +01:00
Mayeul@Zama
cc0a3bad8d style(core): fix clippy::iter_without_into_iter 2023-11-28 13:10:23 +01:00
Mayeul@Zama
1d12f60849 style(core): fix clippy::default_trait_access 2023-11-28 13:10:23 +01:00
Mayeul@Zama
a0db39c86e style(core): fix clippy::redundant_closure_for_method_calls 2023-11-28 13:10:23 +01:00
Mayeul@Zama
ef4558ac13 style(core): fix clippy::trivially_copy_pass_by_ref 2023-11-28 13:10:23 +01:00
Mayeul@Zama
bfb22b4531 style(core): fix clippy::needless_pass_by_value 2023-11-28 13:10:23 +01:00
Mayeul@Zama
88025010e1 style(core): fix clippy::unnecessary_wraps 2023-11-28 13:10:23 +01:00
Mayeul@Zama
b1f4f3b330 style(core): fix clippy::semicolon_if_nothing_returned 2023-11-28 13:10:23 +01:00
Mayeul@Zama
7575a426ab style(core): fix clippy::used_underscore_binding 2023-11-28 13:10:23 +01:00
Mayeul@Zama
1ac57218b1 style(core): fix clippy::manual_let_else 2023-11-28 13:10:23 +01:00
Mayeul@Zama
b7c3f16e24 style(core): fix clippy::implicit_clone 2023-11-28 13:10:23 +01:00
Mayeul@Zama
bf4f9198fb style(core): fix clippy::if_not_else 2023-11-28 13:10:23 +01:00
Mayeul@Zama
e618e1d05d style(core): fix clippy::uninlined_format_args 2023-11-28 13:10:23 +01:00
Mayeul@Zama
000428d688 style(core): replace assert by unwrap 2023-11-28 13:10:23 +01:00
Arthur Meyre
937c90666b chore(core): change the test LUT to apply the identity function
- the identity function more easily detects errors in the PBS as each mega
case contains a different value compared to its neighbours
2023-11-28 11:27:45 +01:00
sarah el kazdadi
b6a6f1b098 feat(core): specialize keyswitch implementation for small scalar values 2023-11-27 09:57:37 +01:00
David Testé
c2d7f1748c chore(ci): add core_crypto layer to code coverage 2023-11-22 10:21:17 +01:00
Mayeul@Zama
e8cd55dee6 feat(shortint): add degree information in CheckError::CarryFull 2023-11-21 19:52:26 +01:00
Mayeul@Zama
95aea9dbe8 feat(shortint): add noise checks 2023-11-21 19:52:26 +01:00
Mayeul@Zama
89f701d307 refactor(shortint): refactor CheckError 2023-11-21 19:52:26 +01:00
Mayeul@Zama
224146686f feat(shortint): add max_noise_level 2023-11-21 19:52:26 +01:00
Mayeul@Zama
b6b5f92220 fix(integer): update noise_level manually in direct calls to core_crypto 2023-11-21 19:52:26 +01:00
David Testé
0fec9e252b chore(ci): change benchmark aws ec2 machine type
This instance type hpc7a.96xlarge yields better performance for
nearly the same hourly cost.
2023-11-21 14:34:08 +01:00
Arthur Meyre
53c9b82824 chore(shortint): fix typo for NoiseLevel variant UNKNOWN 2023-11-20 18:54:14 +01:00
tmontaigu
f670a950d6 fix(integer): fix inner index computation in sum
In the function that sums a vec of ciphertexts,
we track trivial zeros to avoid un-needed PBSes.

One of these trackers is `last_block_where_addition_happened`,
however it was not properly computed.
It was initialized to `num_blocks - 1` and then updated with
a series of `max(current, new)` where `0 <= new <= num_blocks - 1`,
which means `last_block_where_addition_happened` was always `num_blocks - 1`.

The correct initial value is 0.
2023-11-20 16:40:21 +01:00
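A small sketch of why the initial value matters in f670a950d6; the function names and the `touched_blocks` list (indices of blocks where an addition happened) are hypothetical illustrations, not the tfhe-rs code. Folding `max` from `num_blocks - 1` can never move the tracker, while folding from 0 yields the highest block actually touched.

```rust
/// Buggy tracker: starts at the largest possible index, so `max` never moves it.
fn last_block_buggy(num_blocks: usize, touched_blocks: &[usize]) -> usize {
    touched_blocks
        .iter()
        .fold(num_blocks - 1, |current, &new| current.max(new))
}

/// Fixed tracker: starts at 0, so it ends at the highest block actually touched.
fn last_block_fixed(touched_blocks: &[usize]) -> usize {
    touched_blocks.iter().fold(0, |current, &new| current.max(new))
}

fn main() {
    // Additions only touched blocks 0, 2 and 3 of an 8-block radix.
    let touched = [0, 2, 3];
    println!("buggy: {}", last_block_buggy(8, &touched)); // 7
    println!("fixed: {}", last_block_fixed(&touched)); // 3
}
```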
tmontaigu
a44970a9a3 feat(integer): avoid un-necessary computations in mul params 1_X
When the parameters have 1 bit of message (message modulus == 2)
then the multiplication of 2 blocks does not create a result that can
go into the carry space.

We use that fact to avoid doing unnecessary computations
when multiplying integers encrypted under parameters with 1 bit.
2023-11-20 15:21:59 +01:00
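The observation in a44970a9a3 can be checked in clear: with message modulus 2 the largest block product is 1 * 1 = 1, which still fits in the message space, whereas with modulus 4 the product 3 * 3 = 9 spills into the carry. The helper below is a hypothetical illustration, not a tfhe-rs function.

```rust
/// True if the product of two message values can exceed the message space,
/// i.e. spill into the carry space of a block.
fn block_product_uses_carry(message_modulus: u64) -> bool {
    let max_message = message_modulus - 1;
    max_message * max_message >= message_modulus
}

fn main() {
    // With 1 bit of message, every product 0*0, 0*1, 1*0, 1*1 stays in {0, 1}.
    for a in 0..2u64 {
        for b in 0..2u64 {
            assert!(a * b < 2);
        }
    }
    println!("modulus 2 uses carry: {}", block_product_uses_carry(2)); // false
    println!("modulus 4 uses carry: {}", block_product_uses_carry(4)); // true (3*3 = 9)
}
```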
Arthur Meyre
55775b8e02 fix(shortint): fix overflow behavior of NoiseLevel
- we will need to use a MAX/UNKNOWN level for forward compatibility with
old serialized ciphertexts, this patch ensures the add/mul behavior
saturates properly to usize::MAX to force a refresh in operations which
do it automatically
2023-11-17 18:34:02 +01:00
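A sketch of the saturating behavior described in 55775b8e02. Only the `NoiseLevel` name and the UNKNOWN variant come from the log; the tuple-struct layout, the `NOMINAL` constant and `needs_refresh` are assumptions for illustration. Saturating arithmetic keeps an UNKNOWN (`usize::MAX`) level at its maximum, so any automatic refresh check still fires.

```rust
use std::ops::{Add, Mul};

/// Simplified model of a ciphertext noise level; UNKNOWN is the saturated value.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct NoiseLevel(usize);

impl NoiseLevel {
    const NOMINAL: Self = Self(1);
    const UNKNOWN: Self = Self(usize::MAX);
}

impl Add for NoiseLevel {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // Saturate instead of wrapping, so UNKNOWN stays UNKNOWN.
        Self(self.0.saturating_add(rhs.0))
    }
}

impl Mul<usize> for NoiseLevel {
    type Output = Self;
    fn mul(self, rhs: usize) -> Self {
        Self(self.0.saturating_mul(rhs))
    }
}

/// Operations that refresh automatically would bootstrap when this is true.
fn needs_refresh(level: NoiseLevel, max_noise_level: NoiseLevel) -> bool {
    level > max_noise_level
}

fn main() {
    // e.g. an old serialized ciphertext with no recorded noise level
    let old = NoiseLevel::UNKNOWN;
    let sum = old + NoiseLevel::NOMINAL;
    println!("{:?} refresh: {}", sum, needs_refresh(sum, NoiseLevel(5)));
}
```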
Arthur Meyre
523d561de6 chore(ci): add _ci_run_filter to standalone tests in shortint
- those tests were likely ignored, this is no longer the case
2023-11-17 18:34:02 +01:00
tmontaigu
61a50d0bcc chore(integer): make overflowing_add/sub return BooleanBlock 2023-11-17 16:22:20 +01:00
Arthur Meyre
ee57f5658b chore(ci): refactor integer script and skip div and rem preferring div_rem 2023-11-17 15:00:50 +01:00
tmontaigu
9362965f50 feat(integer): add accessors to inner shortint sks
Users can access blocks from an integer but they don't have
the ability to use the inner shortint server key to process
individual blocks.

This adds an AsRef impl on integer ServerKey to allow that.

This also adds shortcuts to the integer ServerKey to get
the MessageModulus/CarryModulus (these are shortcuts
because users could already do `integer_key.as_ref().message_modulus`).
2023-11-16 16:25:27 +01:00
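The pattern from 9362965f50 sketched with simplified stand-in types: `ShortintServerKey`, `IntegerServerKey` and their fields are hypothetical, and only the idea of an `AsRef` impl plus `message_modulus`/`carry_modulus` shortcuts comes from the log. Both access paths return the same value.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct MessageModulus(usize);

#[derive(Debug, Clone, Copy, PartialEq)]
struct CarryModulus(usize);

/// Simplified stand-in for the shortint server key.
struct ShortintServerKey {
    message_modulus: MessageModulus,
    carry_modulus: CarryModulus,
}

/// Simplified stand-in for the integer server key, which wraps a shortint key.
struct IntegerServerKey {
    key: ShortintServerKey,
}

/// Gives access to the inner shortint key, e.g. to process individual blocks.
impl AsRef<ShortintServerKey> for IntegerServerKey {
    fn as_ref(&self) -> &ShortintServerKey {
        &self.key
    }
}

impl IntegerServerKey {
    /// Shortcut equivalent to going through `as_ref()`.
    fn message_modulus(&self) -> MessageModulus {
        self.key.message_modulus
    }
    /// Shortcut equivalent to going through `as_ref()`.
    fn carry_modulus(&self) -> CarryModulus {
        self.key.carry_modulus
    }
}

fn main() {
    let integer_key = IntegerServerKey {
        key: ShortintServerKey {
            message_modulus: MessageModulus(4),
            carry_modulus: CarryModulus(4),
        },
    };
    let shortint_key: &ShortintServerKey = integer_key.as_ref();
    assert_eq!(shortint_key.message_modulus, integer_key.message_modulus());
    println!("{:?} {:?}", integer_key.message_modulus(), integer_key.carry_modulus());
}
```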
Arthur Meyre
00fb60451d chore(ci): group signed and unsigned integer for better runtime homogeneity 2023-11-16 14:18:30 +01:00
Arthur Meyre
18b9fd4464 chore(ci): re-enable mistakenly disabled AVX512 for integers 2023-11-16 14:18:30 +01:00
Arthur Meyre
eace0bfb85 chore(ci): spread tests between two CI machines/workflow for faster runtime 2023-11-16 14:18:30 +01:00
Arthur Meyre
af1be5ebca chore(core): fix noise generation which could overflow the custom modulus
- updated some function name (for modulus checking) to be clearer on what
they do and when to use them
2023-11-16 08:58:40 +01:00
tmontaigu
916bd8a09f feat(hlapi): move if_then_else/cmux to FheBool
- This makes FheBool use integer::BooleanBlock internally.
- It makes comparisons (eq, ne, le, etc) return a FheBool instead of
  FheUint/FheInt.
- It also moves the if_then_else and cmux methods to FheBool.
- Adds casting from FheBool to FheUint/FheInt (but not from
  FheUint/FheInt to FheBool as we expect users to do `a.ne(0)`
  as it matches Rust)

BREAKING CHANGE:
    - Comparisons now return FheBool
    - if_then_else/cmux are now methods of FheBool.
2023-11-15 23:22:30 +01:00
tmontaigu
20cb0642ce refactor(hlapi): implement CastFrom for GenericInteger
And add the trait to the prelude so that users can use
it.
2023-11-15 23:22:30 +01:00
Arthur Meyre
151f9f6d82 chore(ci): fix build on main following several big merges 2023-11-15 13:29:08 +01:00
Arthur Meyre
8db8cb49e4 chore(shortint): add some flaky/failing doctests as actual tests
- check that those are actually failing or that they are a doctest bug
- add _ci_run_filter so that we can easily make sure tests run in CI even
if they don't have the "parameter format"
2023-11-15 11:10:44 +01:00
Arthur Meyre
b4583976a2 chore(tfhe): fix .gitignore for key cache
- this was not properly ignoring the keycache if a file had a specific
extension
2023-11-15 11:10:30 +01:00
Arthur Meyre
b450375da1 chore(integer): restore assert after using 3_3 params for CRT doctests
- fix max degree for CRT keys which don't need to propagate carries

BREAKING CHANGE:
pub API removed from pub interface
2023-11-15 11:10:30 +01:00
tmontaigu
f02f1fb297 feat(integer): add unsigned_overflowing_add 2023-11-14 18:57:09 +01:00
Mayeul@Zama
17642fa703 refactor(shortint): remove unused EngineResult 2023-11-14 16:30:09 +01:00
Mayeul@Zama
23fa9b24bd refactor(shortint): separate lut generation from ShortintEngine 2023-11-14 16:30:09 +01:00
tmontaigu
0453b9bd60 fix(integer): fix signed_overflowing_sub using trivial 0 2023-11-13 15:43:33 +01:00
Arthur Meyre
9b2cf67911 chore(tfhe): fix required features for the generate_test_keys util 2023-11-13 10:05:17 +01:00
dependabot[bot]
36a7656048 chore(deps): bump tj-actions/changed-files from 40.1.0 to 40.1.1
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 40.1.0 to 40.1.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](18c8a4eceb...25ef3926d1)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-13 09:58:27 +01:00
Arthur Meyre
61c8eadd58 chore(ci): update Makefile for semver trick compatibility
- adding the tfhe package as a dependency is currently causing issues with
Cargo because of unified feature resolution it seems, it needs an
additional version specifier to disambiguate which package we are referring
to, an issue exists on their end but I don't think a fix is to be expected
soon https://github.com/rust-lang/cargo/issues/12891
- committing this to main and then backporting the relevant pieces to 0.4.x
2023-11-10 15:35:38 +01:00
Arthur Meyre
fdd4d9d1cc chore(c_api): add more comments in the build.rs file and cbindgen.toml 2023-11-10 15:35:38 +01:00
Arthur Meyre
62700ab853 chore(tfhe): clarify dependency vs feature selection 2023-11-10 15:35:38 +01:00
Arthur Meyre
27445645e7 chore(c_api): have a way to skip cbindgen in a semver trick setting 2023-11-10 15:35:38 +01:00
tmontaigu
ea0cd26c0b chore(tfhe): fix builds on main 2023-11-10 15:15:31 +01:00
David Testé
ff48582679 test(core_crypto): silence dead code warnings on test utils 2023-11-10 09:35:16 +01:00
tmontaigu
a77c87ff12 refactor(hlapi): make GenericInteger generic over the Id 2023-11-09 20:33:53 +01:00
tmontaigu
6d143f1edc refactor(hlapi): remove unused FromParameters trait 2023-11-09 20:33:53 +01:00
Arthur Meyre
216e6b443a chore(tfhe): fix pedantic lints 2023-11-09 17:12:00 +01:00
Arthur Meyre
1400ae946c test(tfhe): add uniform random test
- use the DKW test, which is used e.g. in
https://github.com/wch/r-source/blob/trunk/tests/p-r-random-tests.R

See the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality on Wikipedia
2023-11-09 17:12:00 +01:00
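A self-contained sketch of a DKW uniformity check, not the test added in 1400ae946c: the DKW inequality bounds P(sup |F_n - F| > eps) by 2 exp(-2 n eps^2), so for a significance level alpha the threshold is eps = sqrt(ln(2 / alpha) / (2 n)). The SplitMix64 generator and the function names are assumptions chosen so the example needs no external crates.

```rust
/// Small self-contained PRNG (SplitMix64) used only to produce test samples.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// DKW test: accept "samples are uniform on {0, ..., modulus - 1}" unless
/// sup |F_n - F| exceeds eps, where 2 * exp(-2 * n * eps^2) = alpha.
/// Panics if a sample is >= modulus.
fn dkw_uniform_test(samples: &[u64], modulus: u64, alpha: f64) -> bool {
    let n = samples.len() as f64;
    let mut counts = vec![0u64; modulus as usize];
    for &s in samples {
        counts[s as usize] += 1;
    }
    let eps = ((2.0 / alpha).ln() / (2.0 * n)).sqrt();
    // Both CDFs are step functions jumping at integers, so checking each k suffices.
    let mut cumulative = 0u64;
    let mut max_deviation = 0.0f64;
    for (k, &count) in counts.iter().enumerate() {
        cumulative += count;
        let empirical = cumulative as f64 / n;
        let expected = (k as f64 + 1.0) / modulus as f64;
        max_deviation = max_deviation.max((empirical - expected).abs());
    }
    max_deviation <= eps
}

fn main() {
    let mut rng = SplitMix64(42);
    // Top 4 bits of each output: samples in {0, ..., 15}.
    let samples: Vec<u64> = (0..100_000).map(|_| rng.next_u64() >> 60).collect();
    println!("uniform: {}", dkw_uniform_test(&samples, 16, 1e-9));
}
```

A very small alpha keeps false rejections of a genuinely uniform source negligible, while a grossly biased source (for example only half of the range) is still rejected.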
Arthur Meyre
c332902a05 feat(core): add support for non power of 2 moduli for random generation
- add convenience function to get truncated f64 value of an integer modulus
- update trait bounds for random generation for clearer diagnostics
2023-11-09 17:12:00 +01:00
Arthur Meyre
cf7a7f132d chore(doc): update a slightly wrong docstring 2023-11-09 14:38:43 +01:00
tmontaigu
6e0a3b9ad7 feat(integer): add BooleanBlock wrapper type
The BooleanBlock wrapper type is meant to convey the fact that
the ciphertext encrypts a 0 or 1.

Since it is meant to be a simple wrapper, the goal is for it to be flexible
and not add more burden than usefulness.

Hopefully this implementation somehow achieves that.

Breaking Changes:
 - This changes the return type of comparisons from a T to
   a BooleanBlock, requiring existing code to explicitly convert
   using `.into_radix`.
 - This makes the cmux/if_then_else functions take a BooleanValue
   as the input type, requiring existing code to wrap their condition
   ciphertext in a new BooleanValue.
2023-11-08 19:40:21 +01:00
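A clear-text sketch of the newtype idea behind 6e0a3b9ad7: wrapping a single block in a dedicated type documents that it holds 0 or 1, and converting back to a full radix is explicit. `Block`, `new_unchecked`, this `into_radix(num_blocks)` signature and `eq` are hypothetical simplifications; the real tfhe-rs signatures may differ.

```rust
/// Hypothetical stand-in for one radix block (a clear value here, not a ciphertext).
#[derive(Debug, Clone, PartialEq)]
struct Block(u64);

/// Newtype conveying that the wrapped block holds 0 or 1.
#[derive(Debug, Clone, PartialEq)]
struct BooleanBlock(Block);

impl BooleanBlock {
    /// The caller guarantees the block holds 0 or 1 (checked in debug builds only).
    fn new_unchecked(block: Block) -> Self {
        debug_assert!(block.0 <= 1);
        Self(block)
    }

    /// Converts back to a radix of `num_blocks` blocks: the boolean in the
    /// least significant block, zeros elsewhere.
    fn into_radix(self, num_blocks: usize) -> Vec<Block> {
        let mut blocks = vec![Block(0); num_blocks];
        if let Some(first) = blocks.first_mut() {
            *first = self.0;
        }
        blocks
    }
}

/// A comparison returning a BooleanBlock instead of a full radix.
fn eq(lhs: &[Block], rhs: &[Block]) -> BooleanBlock {
    BooleanBlock::new_unchecked(Block(u64::from(lhs == rhs)))
}

fn main() {
    let a = vec![Block(1), Block(2)];
    let b = vec![Block(1), Block(2)];
    let is_eq = eq(&a, &b);
    println!("{:?}", is_eq.into_radix(2)); // [Block(1), Block(0)]
}
```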
Arthur Meyre
1f825dde08 chore(tfhe): bump version to 0.5.0 2023-11-08 15:55:22 +01:00
tmontaigu
f9222de47c feat(integer): add signed_overflowing_sub 2023-11-08 15:11:05 +01:00
Mayeul@Zama
5732e8dd7a test(hlapi): test base and compressed integer conformance 2023-11-08 09:25:55 +01:00
Mayeul@Zama
9db35c5474 chore(clippy): remove useless #[allow(warning)] 2023-11-07 16:47:04 +01:00
Mayeul@Zama
b69f73e8e6 chore(clippy): fix use_self warnings 2023-11-07 16:47:04 +01:00
Mayeul@Zama
90bdf75147 chore(clippy): enable nursery lints 2023-11-07 16:47:04 +01:00
Mayeul@Zama
233ea17adf chore(clippy): enable pedantic lints 2023-11-07 16:47:04 +01:00
David Testé
df6ee79841 chore(ci): test examples and apps in the ci 2023-11-07 10:58:03 +01:00
Mayeul@Zama
6497fb9a15 feat(shortint): update noise level in operations 2023-11-06 11:33:24 +01:00
Mayeul@Zama
d8894e3b69 feat(shortint): add noise level to ciphertexts 2023-11-06 11:33:24 +01:00
dependabot[bot]
42636bab13 chore(deps): bump tj-actions/changed-files from 40.0.0 to 40.1.0
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 40.0.0 to 40.1.0.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](af292f1e84...18c8a4eceb)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-06 09:42:04 +01:00
tmontaigu
ec27d3dc6f refactor(hlapi): remove wrapping of booleans
This commit removes the wrapping of the `tfhe::boolean`
that was done in the HLAPI, effectively making the HLAPI
only wrapping `tfhe::integer`.

FheBool is now reused to be a single shortint block
compatible with the other types FheUint8, 16, etc. (previously they were not).

In the future, `tfhe::boolean` could be re-wrapped in hlapi, but
this time, to be used as a base for all integers and not just
FheBool.

BREAKING CHANGE:
- hlapi no longer wraps tfhe::boolean API.
- tfhe::ConfigBuilder::enable_bool/disable_bool/all_disabled/all_enabled
  removed. Now default configuration should be done using
  `tfhe::ConfigBuilder::default()`.
- `tfhe::ConfigBuilder::use_default_small_integer` removed,
  use `tfhe::ConfigBuilder::default_with_small_encryption()`
- Uninitialized{ClientKey, PublicKey, CompressedPublicKey} error types
  removed as these errors are no longer possible
2023-11-04 00:18:16 +01:00
Mayeul@Zama
5272c95de4 fix(shortint): fix modulus on LUT output in test 2023-11-03 09:45:22 +01:00
Mayeul@Zama
27d7ace3ef feat(shortint): fix keyswitching wrapping behavior 2023-11-03 09:45:22 +01:00
Mayeul@Zama
d80ab231a8 fix(shortint): add LUT generation without carry 2023-11-03 09:45:22 +01:00
tmontaigu
fe3fa531f9 refactor(hlapi): Remove shortint support from HLAPI
This removes the wrapping of shortints from the HLAPI,
the reasons are:

Contrary to integers, for which we get different bit sizes
by combining different numbers of blocks from the _same_ key,
shortints had different bit sizes but also different keys,
which led to:

- Not being able to cast between 2 different shortint type
  and between 1 shortint and 1 integer. Technically these casts
  are possible, but requires a keyswitch (and likely a PBS).
  But the keyswitch requires parameters, which may not always exist.

- Due to each shortint having different keys, the internal code to
  manage that made heavy use of macros to avoid having thousands of
  repeated lines. However, this made the code harder to follow / modify
  especially for people that were not familiar with that.

- In practice, to really benefit from shortints, proper management of
  carry space is needed; however, the HLAPI completely hides that,
  resulting in less optimal performance. In short, shortints
  are better used as a low-level construct.

- Building a FheUint4 with two blocks of message_2_carry_2
  is likely to be faster than one with message_4_carry_4 for most use
  cases.

So removing the wrapping of shortints will simplify the code, and
allow for more simplification later.
Also, it will allow us to expose Fhe{Ui/I}nt{2, 4, 6} types
which are compatible (cast_from/into) with Fhe{Ui/I}nt{8, 16, 32, etc}.

BREAKING CHANGE:
    - FheUint{2,3,4} removed from HLAPI
    - All HLAPI functions tied to shortints are removed
2023-10-31 09:32:05 +01:00
tmontaigu
5c1573c266 fix(integer): fix worst case noise growth in encrypted shifts
In encrypted shifts we pack 3 bits from 3 different blocks into the same
block by doing `b0 * 4 + 2 * b1 + b2`, and then do a PBS to simulate a
hardware mux gate.

If the inputs of the shift are distinct (i.e. in `lhs << rhs`, `lhs != rhs`,
so we don't do `lhs << lhs`), this is fine regarding the norm2 noise.

However, if we do things like `a << a` or `a >> a`, which is probably a
very rare thing but not impossible, the norm2 noise would go above the
limit that guarantees our error probability.

To fix that, we extract the bits that tell the shift amount, so that they
are already properly aligned to their mux input position.
The packing becomes `b0 + 2 * b1 + b2` and so
the noise growth is fine even in the worst case of doing `a << a`.
2023-10-30 15:02:02 +01:00
dependabot[bot]
7772e8112d chore(deps): bump tj-actions/changed-files from 39.2.3 to 40.0.0
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.2.3 to 40.0.0.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](95690f9ece...af292f1e84)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-30 12:07:44 +01:00
dependabot[bot]
5e92cb1475 chore(deps): bump JS-DevTools/npm-publish from 3.0.0 to 3.0.1
Bumps [JS-DevTools/npm-publish](https://github.com/js-devtools/npm-publish) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/js-devtools/npm-publish/releases)
- [Changelog](https://github.com/JS-DevTools/npm-publish/blob/main/CHANGELOG.md)
- [Commits](6fd3bc8dad...4b07b26a2f)

---
updated-dependencies:
- dependency-name: JS-DevTools/npm-publish
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-30 12:07:24 +01:00
dependabot[bot]
f51e19b071 chore(deps): bump actions/checkout from 4.1.0 to 4.1.1
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4.1.0...b4ffde65f46336ab88eb53be808477a3936bae11)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-30 12:06:53 +01:00
tmontaigu
aeb00ae584 chore(integer): use Arc<ServerKey> for executor
The goal is to avoid holding the key twice in memory
when both the executor and the test case need the key
2023-10-27 18:01:55 +02:00
Arthur Meyre
ce5e9c1bdb chore(integer): more CRT tests and related fixes
- add remaining tests
- fix unchecked scalar mul for small carries
2023-10-27 11:30:00 +02:00
Arthur Meyre
4d4e124e94 chore(integer): add crt 32 bits tests with 5_1 params
- remove buggy unchecked_scalar_add_assign and replace by the proper
implementation which had a different name

BREAKING CHANGE:
removed an API entry point which was not required
2023-10-27 11:30:00 +02:00
Arthur Meyre
ca6d37e06f feat(integer): better handle trivial 0 blocks from LHS
- currently the filter only applied to the RHS but LHS can also benefit
from the filter
2023-10-27 10:31:24 +02:00
Mayeul@Zama
e3143315f3 fix(integer): disable broken assert in smart_crt_sub_assign 2023-10-27 09:43:51 +02:00
Mayeul@Zama
f8636fe814 feat(integer): add asserts in smart ops 2023-10-27 09:43:51 +02:00
tmontaigu
7e72400321 chore(doc): replace some ^ which could be interpreted as xor not pow 2023-10-26 23:42:58 +02:00
tmontaigu
728b409256 chore(integer): move comparator test out of it
Move the comparison tests (eq, ne, ge, gt, etc.)
that were in the comparator module out of the comparator module.

This is so that later commits can create test cases out
of these tests so that, like other unsigned tests, they can be
used to test other implementations of ServerKey
2023-10-25 10:31:55 +02:00
Arthur Meyre
d91404e567 chore(integer): remove empty where clause 2023-10-25 09:41:37 +02:00
David Testé
e11c3d7b7c chore(ci): add signed integer benchmarks to the CI 2023-10-25 09:14:00 +02:00
David Testé
6f8eeb043c chore(bench): add default ops for signed integers benchmarks 2023-10-25 09:14:00 +02:00
Arthur Meyre
00d55182b4 chore(ci): update examples to have a tmp dir to avoid rights issues in /tmp
- on machines where multiple users can log in, some files used for
serialization doctests would cause rights access issues and crash doctests
2023-10-23 15:03:18 +02:00
dependabot[bot]
6f6ce106c3 chore(deps): bump tj-actions/changed-files from 39.2.2 to 39.2.3
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.2.2 to 39.2.3.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](408093d9ff...95690f9ece)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-23 10:28:00 +02:00
dependabot[bot]
68fcbb5280 chore(deps): bump JS-DevTools/npm-publish from 2.2.2 to 3.0.0
Bumps [JS-DevTools/npm-publish](https://github.com/js-devtools/npm-publish) from 2.2.2 to 3.0.0.
- [Release notes](https://github.com/js-devtools/npm-publish/releases)
- [Changelog](https://github.com/JS-DevTools/npm-publish/blob/main/CHANGELOG.md)
- [Commits](fe72237be0...6fd3bc8dad)

---
updated-dependencies:
- dependency-name: JS-DevTools/npm-publish
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-23 10:27:35 +02:00
dependabot[bot]
3f46389cc8 chore(deps): bump actions/checkout from 4.1.0 to 4.1.1
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](8ade135a41...b4ffde65f4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-23 10:27:10 +02:00
Arthur Meyre
9e8dd01cb9 chore(ci): enable integer multi bit tests on M1 2023-10-20 17:55:38 +02:00
tmontaigu
0085ceb97b chore(ci): set node version 2023-10-20 10:24:50 +02:00
tmontaigu
be9a4d2d9c chore(wasm): update dependencies of wasm tests 2023-10-20 10:24:50 +02:00
Arthur Meyre
87421e8307 chore(ci): update M1 workflow to not explode the 6h GitHub limit
- run doc tests for CI with LTO off following M1 investigation
- LTO fat may be a cause of the wopbs flaky tests, disabling to check
2023-10-19 14:18:05 +02:00
Arthur Meyre
0c3919628f refactor(core): use avx512 intrinsics when available for data conversions
- we use inline assembly for now as Rust does not provide those in the std
or core arch crates at the moment
- add tests for avx512 conversion
2023-10-19 13:21:19 +02:00
Arthur Meyre
f1c21888a7 chore(doc): encourage users to use dedicated keys to Radix or CRT 2023-10-19 09:52:22 +02:00
tmontaigu
2624beb7fa fix(integer): fix unsigned_overflowing_sub on trivials
unsigned_overflowing_sub does an independent subtraction
on each block, with a correcting term being added to avoid
trashing the padding bit (lhs - rhs + correction).

The correction depended on rhs's degree,
e.g. if rhs's degree was in range 1..(msg_mod-1), then correction = msg_mod.

However, if rhs's degree was zero (i.e. rhs is a trivial 0), the correction
was also 0, whereas the borrow propagation relies on that correction
always being added.
2023-10-18 19:26:01 +02:00
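A clear-text model of the scheme described in 2624beb7fa, assuming little-endian blocks in base `msg_mod` and a correction of exactly `msg_mod` (the name `overflowing_sub_blocks` is hypothetical, and there is no padding bit in this clear model). Because the correction is always added, each corrected block lies in [1, 2 * msg_mod - 1], and "value below msg_mod after subtracting the incoming borrow" is a reliable borrow signal. Skipping the correction when rhs is 0 would make, for example, lhs block 2 minus 0 look like a borrow.

```rust
/// Clear-text model of block-wise unsigned subtraction with a correction term.
/// Blocks are little-endian digits in base `msg_mod`; returns (digits, overflowed).
fn overflowing_sub_blocks(lhs: &[u64], rhs: &[u64], msg_mod: u64) -> (Vec<u64>, bool) {
    assert_eq!(lhs.len(), rhs.len());
    let mut result = Vec::with_capacity(lhs.len());
    let mut borrow = 0u64;
    for (&l, &r) in lhs.iter().zip(rhs) {
        // Independent per-block step: the correction msg_mod is ALWAYS added,
        // even when r == 0, so the value lies in [1, 2 * msg_mod - 1].
        let corrected = l + msg_mod - r;
        // Borrow propagation relies on that correction: after subtracting the
        // incoming borrow, a value below msg_mod means l - r - borrow < 0.
        let with_borrow = corrected - borrow;
        result.push(with_borrow % msg_mod);
        borrow = u64::from(with_borrow < msg_mod);
    }
    (result, borrow == 1)
}

fn main() {
    // 5 - 7 on 4 blocks of 2 bits (msg_mod = 4): wraps to 254 with overflow.
    let (digits, overflow) = overflowing_sub_blocks(&[1, 1, 0, 0], &[3, 1, 0, 0], 4);
    println!("{:?} {}", digits, overflow); // [2, 3, 3, 3] true
}
```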
tmontaigu
e44c38a102 chore(ci): tell nvm to use node version 20 in wasm parallel tests 2023-10-18 19:04:27 +02:00
Arthur Meyre
4535230874 refactor(core): rename pbs_modulus_switch to fast_pbs_modulus_switch
- update docstring to reflect the change that has been done

BREAKING CHANGE:
pbs_modulus_switch is currently part of the public API and the rename is
therefore a breaking change
2023-10-17 16:53:19 +02:00
Arthur Meyre
a7b2d9b228 chore(ci): update check toolchain to latest nightly
- no new lints
2023-10-17 16:13:26 +02:00
Arthur Meyre
ab923a3ebc fix(crt): fix mul for non symmetrical parameters
- add non reg test for 32 bits mul with 5_1 parameters
2023-10-17 14:22:00 +02:00
Arthur Meyre
a0e85fb355 feat(core): add more custom moduli primitives to UnsignedInteger
As always for now the objective is to have functional custom modulus
implementations, not efficient ones

- add multiplication
- add leading_zeros
- add neg
2023-10-17 13:31:35 +02:00
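In the same "functional, not efficient" spirit as a0e85fb355, modular multiplication and negation for a non-power-of-2 modulus `q` can be sketched on `u64` by widening to `u128`. These free functions are hypothetical illustrations, not the `UnsignedInteger` methods the commit adds, and they assume inputs already reduced below `q`.

```rust
/// Multiplication modulo a custom (possibly non-power-of-2) modulus `q`.
/// Widening to u128 keeps it simple and correct rather than fast.
fn wrapping_mul_custom_mod(lhs: u64, rhs: u64, q: u64) -> u64 {
    ((lhs as u128 * rhs as u128) % q as u128) as u64
}

/// Negation modulo `q`: 0 maps to 0, any other x maps to q - x.
fn wrapping_neg_custom_mod(value: u64, q: u64) -> u64 {
    if value == 0 { 0 } else { q - value }
}

fn main() {
    // Example non-power-of-2 modulus: the prime 2^64 - 2^32 + 1.
    let q: u64 = 0xFFFF_FFFF_0000_0001;
    let minus_one = wrapping_neg_custom_mod(1, q);
    // (-1) * (-1) = 1 mod q
    println!("{}", wrapping_mul_custom_mod(minus_one, minus_one, q)); // 1
}
```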
Arthur Meyre
ecee305340 chore(core): change prelude algorithms imports 2023-10-17 13:31:35 +02:00
Mayeul@Zama
f08ea8cf85 fix(integer): fix max_degree formula 2023-10-17 11:35:08 +02:00
Mayeul@Zama
096e320b97 fix(crt): use 3_3 parameters for crt tests 2023-10-17 11:35:08 +02:00
Mayeul@Zama
95aac64c1c style(crt): compute modulus from base in tests 2023-10-17 11:35:08 +02:00
Mayeul@Zama
76aaa56691 fix(integer): fix small mul test 2023-10-17 11:35:08 +02:00
Mayeul@Zama
a40489bdd2 style(shortint): do not use assign ops on a cloned input 2023-10-17 11:35:08 +02:00
Mayeul@Zama
4bf617eb10 feat(shortint): cleanup input if necessary in ops 2023-10-17 11:35:08 +02:00
Mayeul@Zama
070073d229 feat(shortint): cleanup input if necessary in apply_lookup_table_bivariate 2023-10-17 11:35:08 +02:00
Arthur Meyre
6c1ca8e32b chore(core): use modular_distance instead of abs_diff in fft tests
- we are doing backwards conversions to the torus, so values could wrap
around near 0 or u64::MAX, take the modular distance which represents the
distance on the torus
2023-10-17 10:29:24 +02:00
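The distinction made in 6c1ca8e32b can be shown in a few lines: on the discretized torus Z / 2^64 Z, 1 and `u64::MAX` are neighbours, yet `abs_diff` reports them as almost maximally far apart. This `modular_distance` is a minimal sketch, assumed to match the intent of the helper named in the commit rather than its exact code.

```rust
/// Distance between two u64 values seen as points on the discretized torus
/// Z / 2^64 Z: the shorter of the two ways around the circle.
fn modular_distance(a: u64, b: u64) -> u64 {
    let forward = a.wrapping_sub(b);
    let backward = b.wrapping_sub(a);
    forward.min(backward)
}

fn main() {
    // 1 and u64::MAX differ by 2 modulo 2^64...
    println!("{}", modular_distance(1, u64::MAX)); // 2
    // ...whereas abs_diff treats them as almost maximally far apart.
    println!("{}", 1u64.abs_diff(u64::MAX)); // 18446744073709551614
}
```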
Arthur Meyre
6523610ca4 refactor(core): refactor conversion code from f64 to i64
- observed that the subnormal case is already handled by the shift logic so
the special handling was not required
- add test for avx512 conversion
2023-10-17 10:29:24 +02:00
Arthur Meyre
41c20e22f5 chore(ci): enable AVX512 for integer and multi bit integer tests 2023-10-17 10:28:14 +02:00
J-B Orfila
4a00d25cb1 doc: updating doc for v0.4 2023-10-16 17:56:17 +02:00
tmontaigu
8c9ee64612 fix(integer): better estimate which algorithm to choose 2023-10-16 16:19:00 +02:00
tmontaigu
bfdfbfac0f chore(integer): add tests for default signed rotations/shifts 2023-10-16 16:16:07 +02:00
tmontaigu
dbe7bdcd5c feat(integer): map cmux to if_then_else 2023-10-16 16:15:49 +02:00
tmontaigu
6d77ff18ad chore(integer): add full_propagate test 2023-10-16 14:11:44 +02:00
tmontaigu
7d4d0e0b16 fix(integer): fix is_scalar_add_possible 2023-10-16 14:11:44 +02:00
Mayeul@Zama
b27762232c feat(wasm): add integers safe deserialization 2023-10-16 10:19:09 +02:00
Mayeul@Zama
f597d0f06f feat(c_api): add base and compress integers safe deserialization 2023-10-16 10:19:09 +02:00
dependabot[bot]
ee188448f3 chore(deps): bump tj-actions/changed-files from 39.2.1 to 39.2.2
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.2.1 to 39.2.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](db153baf73...408093d9ff)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-16 09:51:26 +02:00
Mayeul@Zama
ee49f048c7 style(integer): rename num_blocks_per_integer 2023-10-13 14:18:44 +02:00
Mayeul@Zama
a9b09ecc45 feat(c_api): add compact integer safe deserialization 2023-10-13 14:18:44 +02:00
Mayeul@Zama
efc243edc9 feat(global): refactor ciphertext conformance 2023-10-13 14:18:44 +02:00
tmontaigu
bc34411d3f feat(integer): speed-up division by using overflowing_sub
Using overflowing_sub allows removing the comparison used
in the algorithm, giving a significant performance boost.

before:
          8         16        32        40        64        128       256
hpc7a:    981 ms    2.53 s    6.41 s    9.04 s    16.1 s    39.3 s    1.55 min
m6i:      1.10 s    2.97 s    7.17 s    10.5 s    19.7 s    50.2 s    2.11 min

after:
          8         16        32        40        64        128       256
hpc7a:    604 ms    1.6 s     3.8 s     5.14 s    9.4 s     22.4 s    54.613 s
m6i:      659 ms    1.77 s    4.4 s     5.9 s     11.5 s    29.8 s    87.95 s
2023-10-12 14:35:36 +02:00
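A clear-text sketch of the general technique behind bc34411d3f, not the tfhe-rs implementation: in restoring long division, a single `overflowing_sub` yields both the candidate remainder and the borrow flag, so no separate "remainder >= divisor" comparison is needed. The function name is hypothetical; in the encrypted setting the `if` would become an encrypted select driven by the borrow.

```rust
/// Clear-text sketch of restoring binary long division in which the
/// "remainder >= divisor" comparison is replaced by the borrow flag
/// returned by `overflowing_sub`.
fn div_rem_with_overflowing_sub(numerator: u32, divisor: u32) -> (u32, u32) {
    assert!(divisor != 0, "division by zero");
    let mut quotient = 0u32;
    // u64 so that shifting the remainder left by one bit cannot overflow.
    let mut remainder = 0u64;
    for i in (0..32u32).rev() {
        // Bring down the next numerator bit.
        remainder = (remainder << 1) | u64::from((numerator >> i) & 1);
        // One subtraction gives both the candidate remainder and the
        // comparison result (no borrow <=> remainder >= divisor).
        let (difference, borrowed) = remainder.overflowing_sub(u64::from(divisor));
        if !borrowed {
            remainder = difference;
            quotient |= 1u32 << i;
        }
    }
    (quotient, remainder as u32)
}

fn main() {
    println!("{:?}", div_rem_with_overflowing_sub(100, 7)); // (14, 2)
}
```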
J-B Orfila
c7923ff3ed refactor(shortint): update compact parameters 2023-10-12 11:56:50 +02:00
Arthur Meyre
7534b68e5c test(core): use polynomial tests from NTT PR
- initial work done in https://github.com/zama-ai/tfhe-rs/pull/394
- useful reworks of the tests have been waiting in that PR, this is to
have those tests while NTT usage gets validated

co-authored-by: sarah-ek <sarah.elkazdadi@zama.ai>
2023-10-12 10:40:15 +02:00
tmontaigu
655f7e6214 chore(hlapi): improve scalar type conversion 2023-10-10 17:18:32 +02:00
tmontaigu
b8556ddbd4 feat(hlapi): add C API support for FheInt 2023-10-10 17:18:32 +02:00
tmontaigu
cab7439064 fix(integer): handle trivial ct in if_then_else
if_then_else uses two calls to zero_out_if.

In zero_out_if, if the condition block given has a degree of 0
then it would return 0, without calling the predicate function.

This is not correct, as it is the predicate function that
determines whether the output should be 0 or the original ciphertext.

Which meant that if if_then_else received a condition with a
degree of 0, it would always return 0.
2023-10-10 17:18:12 +02:00
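A clear-text model of the fix in cab7439064, assuming a 0/1 condition and assuming `zero_out_if` zeroes its value when the predicate holds (inferred from the name and the log, not taken from the tfhe-rs source). Because the predicate is always evaluated, a trivial 0 condition correctly selects the else branch instead of producing 0.

```rust
/// Clear-text model of zero_out_if: returns 0 when `predicate(condition)`
/// holds, otherwise the original value. The predicate is always evaluated,
/// including when the condition is a trivial 0 (degree 0).
fn zero_out_if(value: u64, condition: u64, predicate: impl Fn(u64) -> bool) -> u64 {
    if predicate(condition) { 0 } else { value }
}

/// if_then_else as the sum of two zero_out_if calls: exactly one side survives.
fn if_then_else(condition: u64, then_value: u64, else_value: u64) -> u64 {
    let kept_then = zero_out_if(then_value, condition, |c| c == 0);
    let kept_else = zero_out_if(else_value, condition, |c| c != 0);
    kept_then + kept_else
}

fn main() {
    // A trivial 0 condition must select the else branch, not return 0.
    println!("{}", if_then_else(0, 10, 20)); // 20
    println!("{}", if_then_else(1, 10, 20)); // 10
}
```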
tmontaigu
f8a8780651 fix(integer): remove if_then_else assert
unchecked_if_then_else had an assert that required
that the condition value looked like it encrypts a boolean.
This check was made using the degree.

However, the only cases where a value looks like it encrypts a boolean
value is when they are the result of a comparison (lt, le, eq, etc).

But there are other cases where the value holds a boolean value but,
due to how degree works, it's not possible to know it, thus limiting the
use of if_then_else.

So we remove that assert, and rely on the developer knowing
their condition is 0 or 1.
2023-10-09 18:35:26 +02:00
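The arithmetic selection formula `cond * then + (1 - cond) * else` shows why the condition must be 0 or 1. This is a plaintext sketch over integers modulo `modulus`, using a hypothetical function `cmux`. It is not the tfhe-rs if_then_else. It assumes all inputs are below `modulus` and that `modulus` is small enough for the products to fit in a `u64`.

```rust
/// Arithmetic selection modulo `modulus`:
/// result = cond * then_v + (1 - cond) * else_v  (mod modulus).
/// Nothing checks that `cond` is 0 or 1, mirroring the removed assert:
/// any other value silently yields a wrong result.
fn cmux(cond: u64, then_v: u64, else_v: u64, modulus: u64) -> u64 {
    // (1 - cond) mod modulus, computed without underflow.
    let not_cond = (1 + modulus - cond % modulus) % modulus;
    (cond * then_v + not_cond * else_v) % modulus
}
```

With `cond = 2` and modulus 256, the result is 0, which matches neither branch.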
tmontaigu
bb3c8e7d5d feat(integer): add unsigned_overflowing_sub 2023-10-09 15:39:41 +02:00
Arthur Meyre
69536960c3 chore: fix typos 2023-10-09 14:49:13 +02:00
dependabot[bot]
52a7c52a49 chore(deps): bump tj-actions/changed-files from 39.2.0 to 39.2.1
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.2.0 to 39.2.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](8238a41032...db153baf73)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-09 10:21:04 +02:00
tmontaigu
751c407ba5 feat(wasm): add FheInt support 2023-10-05 15:52:45 +02:00
Mayeul@Zama
492d348138 test(serialization): run tests in CI 2023-10-05 09:15:58 +02:00
Mayeul@Zama
e7df7eb5ef test(serialization): add serialization test 2023-10-05 09:15:58 +02:00
Mayeul@Zama
380ee52986 test(hlapi): test compact integer conformance 2023-10-05 09:15:58 +02:00
Mayeul@Zama
439a28f68b feat(global): impl ParameterSetConformant for ciphertexts 2023-10-05 09:15:58 +02:00
Mayeul@Zama
2eb1e37ca7 feat(global): add safe deserialization 2023-10-05 09:15:58 +02:00
Mayeul@Zama
eb1b136c45 feat(core): add to_equivalent_lwe_dimension 2023-10-05 09:15:58 +02:00
Mayeul@Zama
1376bcba7c chore(test): add type hint for rust-analyzer 2023-10-05 09:15:58 +02:00
tmontaigu
b5b4e54b9b feat(hlapi): add FheInt{8,16,32,64,128,256} 2023-10-04 20:41:19 +02:00
Arthur Meyre
23c2bd790a chore(test): fix incorrect memory buffer size in wopbs core_crypto tests 2023-10-04 14:17:33 +02:00
tmontaigu
251ee9aa0e chore(hlapi): add InnerCiphertext type to integer wrapper
Make the GenericInteger struct have a generic `InnerCiphertext`
instead of always RadixCiphertext.

This is to prepare the addition of signed types which will use a
SignedRadixCiphertext.
2023-10-03 16:26:09 +02:00
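Here is a minimal sketch of making the wrapper generic over its inner ciphertext type. All types here (`RadixCiphertext`, `SignedRadixCiphertext`, `Bits8`, `GenericInteger`, `MockFheUint8`, `MockFheInt8`) are hypothetical stand-ins that only mirror the names in the commit message. They hold plain block values rather than real ciphertexts.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-ins for unsigned and signed radix ciphertexts.
#[derive(Clone, Debug, PartialEq)]
struct RadixCiphertext {
    blocks: Vec<u64>,
}

#[derive(Clone, Debug, PartialEq)]
struct SignedRadixCiphertext {
    blocks: Vec<u64>,
}

/// Hypothetical marker type identifying the bit width.
struct Bits8;

/// Wrapper generic over the inner ciphertext instead of always
/// holding a RadixCiphertext.
struct GenericInteger<Id, InnerCiphertext> {
    ciphertext: InnerCiphertext,
    _id: PhantomData<Id>,
}

impl<Id, InnerCiphertext> GenericInteger<Id, InnerCiphertext> {
    fn new(ciphertext: InnerCiphertext) -> Self {
        GenericInteger { ciphertext, _id: PhantomData }
    }

    fn inner(&self) -> &InnerCiphertext {
        &self.ciphertext
    }
}

/// Unsigned and signed types share the wrapper and differ only in the
/// inner ciphertext type.
type MockFheUint8 = GenericInteger<Bits8, RadixCiphertext>;
type MockFheInt8 = GenericInteger<Bits8, SignedRadixCiphertext>;
```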
Arthur Meyre
fad066a996 refactor(core): remove a copy in the external product
- add an fft backward primitive that can use the input fourier buffer as
output as well
- gains 0.6 ms on 2_2 m6i.metal
2023-10-03 13:10:01 +02:00
tmontaigu
6ef1f22b33 feat(hlapi): tie scalar ops with corresponding clear type
Operations that used a scalar as right operand were generically
implemented, meaning a user could, for example, add a u32 to a FheUint8.

Rust only allows operations between matching types, so we do the same
thing.

BREAKING CHANGE: This is a breaking change on the Rust API, but
for the better I believe. On the C API it is not a breaking change,
as we already made that association since it was simpler to implement.
2023-10-02 23:17:30 +02:00
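A mock type that stores its clear value can illustrate the typing rule. `MockFheUint8` is hypothetical and is not the tfhe-rs `FheUint8`. The only point is that implementing `Add<u8>`, and no other `Add<T>`, makes the compiler reject mismatched scalar types, just as it does for native integers.

```rust
use std::ops::Add;

/// Hypothetical stand-in for an encrypted 8-bit integer; it holds the clear
/// value so the example runs without any FHE library.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MockFheUint8(u8);

/// Scalar addition exists only for the matching clear type `u8`.
impl Add<u8> for MockFheUint8 {
    type Output = MockFheUint8;

    fn add(self, rhs: u8) -> MockFheUint8 {
        // Wrapping arithmetic modulo 2^8, like a native u8 in release mode.
        MockFheUint8(self.0.wrapping_add(rhs))
    }
}

// `MockFheUint8(1) + 3u32` does not compile: there is no `Add<u32>` impl.
```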
tmontaigu
8cc8dba1ab feat(integer): add encryption of signed radix via compressed pk 2023-10-02 16:02:36 +02:00
tmontaigu
082328c91a feat(integer): add default signed_scalar div/rem/div_rem 2023-10-02 16:02:18 +02:00
tmontaigu
fdb6faa0a8 fix(integer): clean output quotient of division
The quotient was slowly computed by
getting a result bit, shifting it to its position then adding it
to a quotient block, i.e. quotient += bit << pos;

This meant that the output quotient was noisy, too noisy for
parameters like param_message_4_carry_4, and so the signed division
would then negate and cmux this quotient and due to the high noise,
some computations would fail, on param_message_4_carry_4.

To fix this we clean the quotient's noise before returning it.
2023-10-02 08:48:45 +02:00
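The sketch below is a plaintext analogue of this accumulation pattern. It performs restoring binary long division and assembles the quotient with `quotient += bit << pos`. It uses clear `u64` values and a hypothetical name, `div_rem_bitwise`, and it is not the tfhe-rs division. In the encrypted setting every such addition adds noise to the quotient block, which is why the commit cleans the quotient before returning it.

```rust
/// Restoring binary long division on clear `num_bits`-bit values.
/// The quotient is assembled one bit at a time with `quotient += bit << pos`.
fn div_rem_bitwise(numerator: u64, divisor: u64, num_bits: u32) -> (u64, u64) {
    assert!(divisor != 0, "division by zero");
    assert!((1..64).contains(&num_bits), "num_bits must be in 1..=63");
    assert!(numerator >> num_bits == 0, "numerator does not fit in num_bits");
    let mut quotient = 0u64;
    let mut remainder = 0u64;
    // Walk the numerator from its most significant bit down.
    for pos in (0..num_bits).rev() {
        // Bring down the next numerator bit.
        remainder = (remainder << 1) | ((numerator >> pos) & 1);
        // The quotient bit is 1 when the divisor fits in the running remainder.
        let bit = if remainder >= divisor {
            remainder -= divisor;
            1
        } else {
            0
        };
        // Accumulate; on ciphertexts each of these additions adds noise.
        quotient += bit << pos;
    }
    (quotient, remainder)
}
```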
Arthur Meyre
856440386f chore(csprng): the aarch64 intrinsics were stabilized in Rust 1.72
- update the version accordingly
2023-09-29 18:33:39 +02:00
tmontaigu
2e8189514c feat(integer): make compact ciphertext compatible with signed 2023-09-28 20:41:38 +02:00
tmontaigu
29b2454cce feat(integer): add sign extend fn for SignedRadixCiphertext 2023-09-28 17:48:41 +02:00
tmontaigu
9ed2589c7a chore(integer): impl RecomposableSignedInteger for StaticSignedBigInt 2023-09-28 14:01:14 +02:00
tmontaigu
36b71529e6 chore(integer): make tests work with different ServerKey
This is a first step, a second step would be
to plug the non parallel radix tests so that
they are testing the same things.
2023-09-28 10:50:18 +02:00
Arthur Meyre
b738946d72 chore(core): add utils to test noise distribution for power of 2 q 2023-09-28 09:49:30 +02:00
David Testé
62f1425257 chore(bench): add missing unsigned integer operations 2023-09-28 08:47:39 +02:00
David Testé
44e491b93f style(integer): rename absolute_value functions to abs
Also add _parallelized suffix since the implementation is located in
radix_parallel directory.
2023-09-28 08:47:39 +02:00
tmontaigu
a470b26672 fix(integer): StaticSignedBigInt right shift 2023-09-27 18:37:25 +02:00
tmontaigu
015409424c chore(hlapi): remove unused keychain_member from macro 2023-09-27 14:33:24 +02:00
tmontaigu
37be751188 fix(integer): is_neg/sub/add possible
The way we did the is_neg/add/sub_possible checks at the integer level was
incorrect in two ways.

1) We simply called the is_neg/add/sub_possible from
   the shortint impl on each block as if they were independent.
   However, that is not the case, and so the check did not reflect
   the actual computation.

2) We checked that we did not go beyond max degree on each block.
   However, a more correct approach is to check that adding
   the potential carry from the preceding block would not exceed the
   current block's max capacity.
2023-09-26 16:02:15 +02:00
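The difference between the two checks can be shown with degrees alone. The model below is a simplification, not the tfhe-rs code. Blocks are little-endian, and each degree is an upper bound on a block's value. `max_degree` is the largest value a block can hold in its message and carry space. The carry a block can push into the next one is assumed to be bounded by its degree divided by `message_modulus`. The function names are hypothetical.

```rust
/// Old-style check: each block pair is considered in isolation.
fn is_add_possible_blockwise(lhs: &[u64], rhs: &[u64], max_degree: u64) -> bool {
    lhs.iter().zip(rhs).all(|(a, b)| a + b <= max_degree)
}

/// Carry-aware check: each block must also absorb the largest carry the
/// preceding block could push into it during propagation.
fn is_add_possible_with_carry(
    lhs: &[u64],
    rhs: &[u64],
    message_modulus: u64,
    max_degree: u64,
) -> bool {
    let mut carry_in = 0u64;
    for (a, b) in lhs.iter().zip(rhs) {
        let degree = a + b + carry_in;
        if degree > max_degree {
            return false;
        }
        // Upper bound on what this block can carry into the next one.
        carry_in = degree / message_modulus;
    }
    true
}
```

Take a message modulus of 4, a max degree of 15, and degrees `[8, 7]` plus `[7, 8]`. The block-wise check accepts, since each sum is 15. The carry-aware check rejects, because block 1 could receive a carry of up to 3 from block 0.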
sarah el kazdadi
2580a834af feat(core): optimize monic polynomial operations in pbs 2023-09-26 15:02:33 +02:00
David Testé
a029bd878e chore(ci): fix file exclusion for coverage reports 2023-09-26 08:58:36 +02:00
David Testé
400e7930b6 chore(ci): fix options typos for new tarpaulin version 2023-09-26 08:58:36 +02:00
David Testé
40d07c6bc3 chore(ci): speed-up boolean coverage
This is done by reducing the number of parameter sets run in tests.
Using the keycache for the key switching key and public key tests also
helps to reduce total run duration.
2023-09-26 08:58:36 +02:00
Mayeul@Zama
9dd2d39f1c style(global): fix typos 2023-09-25 17:27:29 +02:00
dependabot[bot]
4045a3bc2f chore(deps): bump tj-actions/changed-files from 39.0.2 to 39.2.0
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.0.2 to 39.2.0.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](6ee9cdc581...8238a41032)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-25 10:30:28 +02:00
dependabot[bot]
b4ffeccd46 chore(deps): bump actions/checkout from 4.0.0 to 4.1.0
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.0.0 to 4.1.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](3df4ab11eb...8ade135a41)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-25 10:30:22 +02:00
tmontaigu
7fe3ad3b6e feat(integer): improve scalar_mul
This changes the algorithm for scalar_mul.
The new algorithm removes a lot of work.

For small precisions (16, 32, 64) the gains are in the 5%-10% range;
for higher precisions the gains are 25%-50%.

This also changes the mul to use the functions that sum many
clean ciphertexts in parallel. For mul, there is only a 5%-10%
improvement for 128-bit and 256-bit mul.
2023-09-22 15:45:07 +02:00
tmontaigu
7fdd4f9532 chore(integer): add default signed bitand/or/xor tests 2023-09-22 14:50:27 +02:00
Arthur Meyre
81eef39ddb feat(core): add parallel variant of extract_lwe_sample_from_glwe
- allows quickly extracting all coefficients packed in a GLWE ciphertext
2023-09-22 10:55:02 +02:00
tmontaigu
b6459e3cda fix(integer): fix signed div_rem test for 0/0 2023-09-21 21:38:16 +02:00
Arthur Meyre
f2ef78c348 refactor(core): simplify closest_representable and pbs_modulus_switch
- both code were selecting the bit below the last representable bit,
extracted it and then added it to the bit above, the same effect can be
achieved by adding a 1 at the bit below the last representable bit
- update closest_representable to use an approach more like
pbs_modulus_switch, yielding assembly with 42% fewer instructions (12 -> 7)
2023-09-21 15:54:53 +02:00
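Both rounding formulations can be compared on plain `u64` torus values. This sketch uses hypothetical names (`round_add_half`, `round_extract_bit`) and is not the tfhe-rs code. `kept_bits` is the number of most significant bits that remain representable, assumed to be between 1 and 63.

```rust
/// Round `x` to the nearest value whose only nonzero bits are its
/// `kept_bits` most significant bits, wrapping modulo 2^64.
fn round_add_half(x: u64, kept_bits: u32) -> u64 {
    assert!((1..64).contains(&kept_bits));
    let shift = 64 - kept_bits; // number of discarded low bits
    let half = 1u64 << (shift - 1); // the bit just below the last kept bit
    let mask = !((1u64 << shift) - 1); // keeps only the top `kept_bits` bits
    // A carry out of the discarded bits performs the rounding.
    x.wrapping_add(half) & mask
}

/// Older formulation: extract the bit below the last representable bit
/// and add it at the position of the last representable bit.
fn round_extract_bit(x: u64, kept_bits: u32) -> u64 {
    assert!((1..64).contains(&kept_bits));
    let shift = 64 - kept_bits;
    let mask = !((1u64 << shift) - 1);
    let rounding_bit = (x >> (shift - 1)) & 1;
    (x & mask).wrapping_add(rounding_bit << shift)
}
```

Adding half of the discarded range before masking lets the carry do the rounding, so one addition and one mask are enough.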
Mayeul@Zama
aef8f31621 chore(deps): update cargo dependencies 2023-09-21 15:11:13 +02:00
sarah el kazdadi
df78d178da fix(integer): replace unnecessary unsafe code in integer shift/add 2023-09-21 11:02:41 +02:00
Arthur Meyre
9297a886a4 chore(docs): fix docstring about encryption key choice 2023-09-20 16:02:55 +02:00
tmontaigu
28b4f91a32 fix(integer): only propagate if necessary after trimming
By unconditionally propagating carries after trimming
we would sometimes do work for nothing, and as propagating
carries is not cheap at all it would degrade performance.

So only propagate when necessary.
2023-09-20 15:57:33 +02:00
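Here is a plaintext model of this change. Trimming drops blocks, and the expensive propagation runs only when some remaining block's degree shows it may hold carries. `Radix`, `has_pending_carries`, `propagate` and `trim_msb_blocks` are hypothetical names. The value returned by `propagate`, the number of blocks processed, stands in for the bootstrapping work that is saved.

```rust
/// Plaintext model of a radix integer: little-endian blocks, each holding
/// message plus carry, with `degrees` tracking an upper bound per block.
struct Radix {
    values: Vec<u64>,
    degrees: Vec<u64>,
    message_modulus: u64,
}

impl Radix {
    /// A block may hold carries once its degree reaches the message modulus.
    fn has_pending_carries(&self) -> bool {
        self.degrees.iter().any(|&d| d >= self.message_modulus)
    }

    /// Sequential carry propagation; returns how many blocks were processed.
    fn propagate(&mut self) -> usize {
        let m = self.message_modulus;
        let mut carry = 0u64;
        for (v, d) in self.values.iter_mut().zip(self.degrees.iter_mut()) {
            let total = *v + carry;
            *v = total % m;
            carry = total / m;
            *d = m - 1;
        }
        // The carry out of the last block is dropped (modular arithmetic).
        self.values.len()
    }

    /// Drop the `n` most significant blocks, propagating only if needed.
    fn trim_msb_blocks(&mut self, n: usize) -> usize {
        let keep = self.values.len().saturating_sub(n);
        self.values.truncate(keep);
        self.degrees.truncate(keep);
        if self.has_pending_carries() {
            self.propagate()
        } else {
            0
        }
    }
}
```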
David Testé
04fb46e41b chore(ci): print security level in parameters check
The devo profile is used to speed up the compilation phase.
2023-09-20 15:33:39 +02:00
David Testé
53da809f37 chore(ci): reduce max dimension threshold in lattice estimator 2023-09-20 09:39:50 +02:00
David Testé
723910c669 chore(ci): fix end-of-file newlines 2023-09-20 09:39:50 +02:00
David Testé
8ecf8879fb chore(ci): add end-of-file newline checks recipe 2023-09-20 09:39:50 +02:00
tmontaigu
2427f744f8 feat(integer): add unchecked implementation of signed ciphertext 2023-09-20 08:50:15 +02:00
Arthur Meyre
422e1f23d5 feat(core): add GLWE linear algebra primitives
- add appropriate tests and doctest
2023-09-19 11:41:16 +02:00
sarah el kazdadi
30a5ade17f fix(csprng): enable target_feature attributes for functions using simd intrinsics 2023-09-19 09:19:47 +02:00
tmontaigu
6cdd41c22f fix(integer): fix is_neg_possible
shortint's is_neg_possible did not check the degree on the correct value.
It checked the degree on the value that should be added to the next block,
not on the value that actually becomes the degree.

integer's is_neg_possible had the same problem, so we also fix it
and also check that the next block can 'receive' the value that should be added to it.

Our tests did not catch that as they were not testing the non-empty carry case.
2023-09-18 17:19:48 +02:00
Mayeul@Zama
f369bec394 feat(core): add par_convert_standard_lwe_multi_bit_bootstrap_key_to_fourier 2023-09-18 14:35:06 +02:00
Mayeul@Zama
df4e9c69c7 feat(core): add par_convert_bootstrap_key_fourier 2023-09-18 14:35:06 +02:00
Mayeul@Zama
0e3d129906 feat(core): add par_fill_with_forward_fourier 2023-09-18 14:35:06 +02:00
Mayeul@Zama
682e455c94 feat(core): add par_convert_polynomials_list_to_fourier 2023-09-18 14:35:06 +02:00
Mayeul@Zama
b553a68fa9 chore(docs): simplify improved formula in dark market 2023-09-18 09:58:13 +02:00
Mayeul@Zama
be95eadf79 chore(docs): remove fallible directory change 2023-09-18 09:58:13 +02:00
Mayeul@Zama
0213a11a0c chore(docs): refactor dark market 2023-09-18 09:58:13 +02:00
Mayeul@Zama
413fde3b3b chore(docs): doc fixes and improvements 2023-09-18 09:58:13 +02:00
sarah el kazdadi
40f8ac9adf feat(core): replace unsafe simd intrinsics 2023-09-18 09:30:17 +02:00
dependabot[bot]
2ab25c1084 chore(deps): bump rtCamp/action-slack-notify from 2.2.0 to 2.2.1
Bumps [rtCamp/action-slack-notify](https://github.com/rtcamp/action-slack-notify) from 2.2.0 to 2.2.1.
- [Release notes](https://github.com/rtcamp/action-slack-notify/releases)
- [Commits](https://github.com/rtcamp/action-slack-notify/compare/v2.2.0...b24d75fe0e728a4bf9fc42ee217caa686d141ee8)

---
updated-dependencies:
- dependency-name: rtCamp/action-slack-notify
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-18 09:25:56 +02:00
dependabot[bot]
86c62b70e5 chore(deps): bump JS-DevTools/npm-publish from 2.2.1 to 2.2.2
Bumps [JS-DevTools/npm-publish](https://github.com/js-devtools/npm-publish) from 2.2.1 to 2.2.2.
- [Release notes](https://github.com/js-devtools/npm-publish/releases)
- [Changelog](https://github.com/JS-DevTools/npm-publish/blob/main/CHANGELOG.md)
- [Commits](5a85faf05d...fe72237be0)

---
updated-dependencies:
- dependency-name: JS-DevTools/npm-publish
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-18 09:25:49 +02:00
dependabot[bot]
18d790fc26 chore(deps): bump actions-ecosystem/action-add-labels
Bumps [actions-ecosystem/action-add-labels](https://github.com/actions-ecosystem/action-add-labels) from 1.1.0 to 1.1.3.
- [Release notes](https://github.com/actions-ecosystem/action-add-labels/releases)
- [Commits](bd52874380...18f1af5e35)

---
updated-dependencies:
- dependency-name: actions-ecosystem/action-add-labels
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-18 09:25:38 +02:00
dependabot[bot]
e9e3dae786 chore(deps): bump actions/checkout from 3.5.3 to 4.0.0
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.5.3 to 4.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3.5.3...3df4ab11eba7bda6032a0b82a6bb43b11571feac)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-18 09:25:33 +02:00
dependabot[bot]
9b1dccbcb4 chore(deps): bump tj-actions/changed-files from 39.0.1 to 39.0.2
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.0.1 to 39.0.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](246636f5fa...6ee9cdc581)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-18 09:25:28 +02:00
David Testé
cef011dd91 chore(ci): add compact keys to parameters security checks 2023-09-18 09:12:51 +02:00
Arthur Meyre
19f7d5af5c chore(test): remove some unused imports 2023-09-15 18:27:57 +02:00
Arthur Meyre
95ca5a80dc feat(tfhe): plug parallel decompression in higher level APIs
- boolean, shortint and integer have been updated to benefit from parallel
decompression
2023-09-15 18:27:57 +02:00
Arthur Meyre
b5fded34d1 feat(core): add multi bit BSK parallel decompression
- added decompression equivalence test
2023-09-15 18:27:57 +02:00
Arthur Meyre
0c3b09c83d chore(core): update multi bit BSK decompression to match encryption
- test passes
2023-09-15 18:27:57 +02:00
Arthur Meyre
85a19d30a9 feat(core): add KSK parallel decompression
- update to check decompression equivalence
2023-09-15 18:27:57 +02:00
Arthur Meyre
f58132c391 feat(core): add GGSW par decompression, add LWE BSK par decompression
- add par decompression equivalency test
2023-09-15 18:27:57 +02:00
Arthur Meyre
099bff84aa refactor(core): use forking to decompress GGSW ciphertext list
- existing BSK equivalency test passes which means the change is compatible
2023-09-15 18:27:57 +02:00
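Forking for parallel decompression relies on each ciphertext's mask being regenerable from the seed on its own. The sketch below assumes a counter-based generator, in the spirit of the AES-CTR generators in concrete-csprng. It substitutes a toy, non-cryptographic SplitMix64-style mixer. `CounterGen`, `fork`, `decompress_serial` and `decompress_parallel` are hypothetical names. Each child owns a disjoint counter range, so the parallel output matches the serial one.

```rust
use std::thread;

/// Toy counter-based generator: output number `counter` depends only on
/// (seed, counter). The mixer is NOT cryptographically secure.
#[derive(Clone, Copy, Debug)]
struct CounterGen {
    seed: u64,
    counter: u64,
    end: u64,
}

fn mix(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

impl CounterGen {
    fn new(seed: u64, len: u64) -> Self {
        CounterGen { seed, counter: 0, end: len }
    }

    fn next_u64(&mut self) -> u64 {
        assert!(self.counter < self.end, "generator range exhausted");
        let out = mix(self.seed ^ mix(self.counter));
        self.counter += 1;
        out
    }

    /// Split off `n` children, each owning a disjoint block of `per_child`
    /// outputs, and advance `self` past all of them.
    fn fork(&mut self, n: u64, per_child: u64) -> Vec<CounterGen> {
        let start = self.counter;
        assert!(start + n * per_child <= self.end, "not enough outputs left");
        self.counter += n * per_child;
        (0..n)
            .map(|i| CounterGen {
                seed: self.seed,
                counter: start + i * per_child,
                end: start + (i + 1) * per_child,
            })
            .collect()
    }
}

/// Serial regeneration of `count` masks of `mask_len` words each.
fn decompress_serial(seed: u64, count: usize, mask_len: usize) -> Vec<Vec<u64>> {
    let mut rng = CounterGen::new(seed, (count * mask_len) as u64);
    (0..count)
        .map(|_| (0..mask_len).map(|_| rng.next_u64()).collect::<Vec<u64>>())
        .collect()
}

/// Parallel regeneration: fork one child per mask, one thread per child.
fn decompress_parallel(seed: u64, count: usize, mask_len: usize) -> Vec<Vec<u64>> {
    let mut rng = CounterGen::new(seed, (count * mask_len) as u64);
    let children = rng.fork(count as u64, mask_len as u64);
    thread::scope(|s| {
        let handles: Vec<_> = children
            .into_iter()
            .map(|mut child| {
                s.spawn(move || (0..mask_len).map(|_| child.next_u64()).collect::<Vec<u64>>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```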
Arthur Meyre
42ad474a46 feat(core): add parallel decompression to GGSW ciphertext
- added equivalence test for parallel decompression
2023-09-15 18:27:57 +02:00
Arthur Meyre
9f6827b803 chore(core): update code to use the newly introduced MaskRandomGenerator 2023-09-15 18:27:57 +02:00
Arthur Meyre
d23c0df449 refactor(core): update decompression code for LweCiphertextList
- update related algorithms
2023-09-15 18:27:57 +02:00
Arthur Meyre
229bfeebe4 chore(core): remove unsafe new_unchecked on CiphertextModulus
- the function has been renamed to new and is now generally available
2023-09-15 18:27:16 +02:00
David Testé
48aab9d494 chore(ci): add boolean layer to code coverage 2023-09-15 11:00:23 +02:00
David Testé
e4769a8212 chore(ci): do not trigger code coverage on pr sync
Automatic code coverage will be enabled again once all the layers of the
library have coverage implemented.
2023-09-15 08:33:10 +02:00
David Testé
79bdaaba20 chore(ci): disable codecov patch status
This is done to avoid noisy reports in GitHub since coverage in all the
library layers haven't been implemented yet.
2023-09-15 08:33:10 +02:00
Arthur Meyre
02a14fff7c feat(core): add parallel LWE packing keyswitch
- update test to check equivalence of parallel and serial algorithm
2023-09-14 14:45:35 +02:00
Arthur Meyre
72cce4c5b2 chore(core): move thread_count computation before buffer allocations
- for parallel LWE KS and LWE PFPKS
- remove useless type annotation as well
2023-09-14 14:45:35 +02:00
David Testé
a317c4b9dd chore(ci): run code coverage workflow on aws ec2 instance 2023-09-14 13:32:09 +02:00
David Testé
2e2bd5ba29 chore(ci): use aws ami with missing packages installed
libssl-dev and pkg-config packages were missing, which prevented installing
cargo tarpaulin.
2023-09-14 13:32:09 +02:00
David Testé
827d8d8708 chore(ci): run coverage only if source files have changed 2023-09-14 13:32:09 +02:00
David Testé
bf434be347 chore(ci): exclude unwanted files from coverage 2023-09-14 13:32:09 +02:00
Arthur Meyre
ed83fbb460 chore(tfhe): remove unused deps, drop once_cell, enable paste when needed
- bump to 1.72 for std lib OnceLock and stabilized ARM intrinsics
2023-09-14 10:52:34 +02:00
David Testé
0aad2e669b chore(ci): notify slack about coverage only in case of failure 2023-09-13 15:25:32 +02:00
David Testé
cd68a3bd1c chore(ci): execute clippy all-targets on benchmark code
The use of internal-keycache feature is mandatory to ensure clippy
is building against benchmark code.
2023-09-13 11:49:33 +02:00
Arthur Meyre
b77286bcbc chore(bench): fix pbs bench code which had issues with name type
- clippy will be hardened in a subsequent commit
2023-09-13 11:49:33 +02:00
Mayeul@Zama
609f83bbff style(shortint): smart always take mut references 2023-09-13 09:44:46 +02:00
David Testé
2a8ebb81d8 chore(ci): fix trivium test recipes 2023-09-13 09:09:39 +02:00
David Testé
1a2a17a6ab chore(ci): avoid running full suite several times on approval
If two or more reviewers approve a Pull Request successively with
no new commits in between, the full test suite would have been run
for each approval. With this commit, the full test suite is only
run again upon approval if a push has occurred.
2023-09-13 09:09:20 +02:00
David Testé
0080caf95d chore(ci): add code coverage workflow
Coverage is performed and then data are sent to Codecov to handle
reports.
2023-09-13 09:09:03 +02:00
David Testé
c26238533b chore(ci): add key cache keys generation for coverage 2023-09-13 09:09:03 +02:00
David Testé
b29936d844 test(shortint): add key switching key handling in keycache
This is mainly done to speed up coverage tests by avoiding generating
keys in each test.
2023-09-13 09:09:03 +02:00
David Testé
25914cc727 test(shortint): make tests suite lighter for coverage target 2023-09-13 09:09:03 +02:00
David Testé
ca229e369b chore(ci): add make recipe for shortint test coverage
Tarpaulin is the cargo subcommand used to measure test coverage.
2023-09-13 09:09:03 +02:00
Asher
4a99e54c0d chore(docs): typo in readme 2023-09-12 18:34:06 +02:00
tmontaigu
2383591351 chore(hlapi): move integrations tests into src dir 2023-09-12 10:32:42 +02:00
tmontaigu
dc464f398d chore(trivium): fix module inception 2023-09-12 10:32:26 +02:00
Mayeul@Zama
ce70b5758a refactor(shortint): replace apply_msg_identity_lut_assign by message_extract 2023-09-11 17:38:15 +02:00
Mayeul@Zama
1c76a08373 refactor(shortint): replace clear_carry by message_extract 2023-09-11 17:38:15 +02:00
Mayeul@Zama
9b19bd1e8b refactor(shortint): remove comp_assign operations 2023-09-11 17:38:15 +02:00
Arthur Meyre
a3dde21240 refactor(core): add NoiseRandomGenerator
- update EncryptionRandomGenerator to make use of that generator
2023-09-11 17:29:47 +02:00
Arthur Meyre
005e1afe2f refactor(core): add MaskRandomGenerator
- update EncryptionRandomGenerator to make use of that generator
- remove fork_n and par_fork_n on EncryptionRandomGenerator which were lazy
shortcuts to get multi bit PBS working faster
- MaskRandomGenerator will be usable for parallel decompression, which will
be done in a subsequent commit
2023-09-11 17:29:47 +02:00
Arthur Meyre
17c404b77d chore(core): re-organize encryption.rs to have smaller files
- preparatory work for refactor around mask and noise generators to allow
parallel decompression
2023-09-11 17:29:47 +02:00
Arthur Meyre
1403971d15 feat(core): add parallel LWE PFPKS and LWE KS
- add utility algorithms where applicable
2023-09-11 13:25:43 +02:00
Arthur Meyre
94ad69bfa3 chore(docs): update toolchain requirements 2023-09-11 13:05:33 +02:00
Arthur Meyre
bc129ba0ed chore(csprng): add dieharder test suite 2023-09-11 13:05:33 +02:00
Arthur Meyre
462834a12e chore(tfhe): use concrete-csprng 0.4.0 allows to use stable for M1/M2 macs 2023-09-11 13:05:33 +02:00
Arthur Meyre
ebeee1d6f8 chore(ci): add concrete-csprng to the CI 2023-09-11 13:05:33 +02:00
Arthur Meyre
d0e1a582e1 chore(csprng): add code base taken from concrete-core repo 2023-09-11 13:05:33 +02:00
Arthur Meyre
546cb369a8 chore(c_api): mark some functions manipulating pointers as unsafe
- restrict visibility to the c_api

BREAKING CHANGE:
the change in visibility specifier is a breaking change, though those functions
were not meant to be used by external users
2023-09-11 12:49:57 +02:00
dependabot[bot]
445af7ab97 chore(deps): bump tj-actions/changed-files from 38.2.1 to 39.0.1
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 38.2.1 to 39.0.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](2f7246cb26...246636f5fa)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-11 10:05:38 +02:00
dependabot[bot]
23f8c69bae chore(deps): bump actions/upload-artifact from 3.1.2 to 3.1.3
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](0b7f8abb15...a8a3f3ad30)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-11 10:05:29 +02:00
dependabot[bot]
b8df207b68 chore(deps): bump actions/checkout from 3.6.0 to 4.0.0
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.6.0 to 4.0.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](f43a0e5ff2...3df4ab11eb)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-11 10:05:17 +02:00
sarah el kazdadi
03688aee4c feat(fft): use monomial fft for multibit pbs 2023-09-08 18:27:43 +02:00
David Testé
5a3652f398 chore(ci): add clippy target for trivium application 2023-09-08 16:49:27 +02:00
811 changed files with 117542 additions and 30611 deletions


@@ -5,6 +5,8 @@ env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -24,13 +26,13 @@ on:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
description: "Slab request ID"
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
description: "Name of forked repo as user/repo"
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
description: "Git SHA to checkout from fork"
type: string
jobs:
@@ -51,7 +53,7 @@ jobs:
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
@@ -61,10 +63,13 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
default: true
- name: Run concrete-csprng tests
run: |
make test_concrete_csprng
- name: Run core tests
run: |
@@ -106,6 +111,14 @@ jobs:
run: |
make test_high_level_api
- name: Run safe deserialization tests
run: |
make test_safe_deserialization
- name: Run forward compatibility tests
run: |
make test_forward_compatibility
- name: Slack Notification
if: ${{ always() }}
continue-on-error: true
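`RUST_MIN_STACK: "8388608"` sets the default stack size, in bytes, for threads spawned through Rust's `std::thread`. The value is 8 MiB, and the standard library documents its current default for spawned threads as 2 MiB. The main thread's stack is not controlled by this variable. The same effect can be obtained per thread with `std::thread::Builder::stack_size`, which takes precedence over the environment variable. This is a minimal sketch; `run_with_big_stack` and `depth_sum` are illustrative names.

```rust
use std::thread;

/// 8 MiB, the same value as RUST_MIN_STACK in the workflow.
const STACK_SIZE: usize = 8 * 1024 * 1024;

/// Run `f` on a freshly spawned thread with an explicit 8 MiB stack.
fn run_with_big_stack<T, F>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

/// Deliberately recursive function used to exercise the stack.
fn depth_sum(n: u64) -> u64 {
    if n == 0 { 0 } else { n + depth_sum(n - 1) }
}
```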

.github/workflows/aws_tfhe_gpu_tests.yml

@@ -0,0 +1,113 @@
# Compile and test Concrete-cuda on an AWS instance
name: Concrete Cuda - Full tests
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
# All the inputs are provided by Slab
inputs:
instance_id:
description: "AWS instance ID"
type: string
instance_image_id:
description: "AWS instance AMI ID"
type: string
instance_type:
description: "AWS instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
type: string
jobs:
run-cuda-tests-linux:
concurrency:
group: tfhe_cuda_backend_test-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
name: Test code in EC2
runs-on: ${{ inputs.runner_name }}
strategy:
fail-fast: false
# explicit include-based build matrix, of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
# Step used for log purpose.
- name: Instance configuration used
run: |
echo "ID: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
echo "Fork repo: ${{ inputs.fork_repo }}"
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Run clippy checks
run: |
make clippy_gpu
- name: Run all tests
run: |
make test_gpu
- name: Run user docs tests
run: |
make test_user_doc_gpu
- name: Test C API
run: |
make test_c_api_gpu


@@ -1,9 +1,11 @@
name: AWS Integer Tests on CPU
name: AWS Unsigned Integer Tests on CPU
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -23,13 +25,13 @@ on:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
description: "Slab request ID"
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
description: "Name of forked repo as user/repo"
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
description: "Git SHA to checkout from fork"
type: string
jobs:
@@ -50,7 +52,7 @@ jobs:
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
@@ -60,18 +62,25 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
default: true
- name: Gen Keys if required
run: |
make GEN_KEY_CACHE_MULTI_BIT_ONLY=TRUE gen_key_cache
- name: Run unsigned integer multi-bit tests
run: |
AVX512_SUPPORT=ON make test_unsigned_integer_multi_bit_ci
- name: Gen Keys if required
run: |
make gen_key_cache
- name: Run integer tests
- name: Run unsigned integer tests
run: |
BIG_TESTS_INSTANCE=TRUE make test_integer_ci
AVX512_SUPPORT=ON BIG_TESTS_INSTANCE=TRUE make test_unsigned_integer_ci
- name: Slack Notification
if: ${{ always() }}


@@ -1,9 +1,11 @@
name: AWS Multi Bit Tests on CPU
name: AWS Signed Integer Tests on CPU
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -23,13 +25,13 @@ on:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
description: "Slab request ID"
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
description: "Name of forked repo as user/repo"
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
description: "Git SHA to checkout from fork"
type: string
jobs:
@@ -50,7 +52,7 @@ jobs:
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
@@ -60,10 +62,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
default: true
- name: Gen Keys if required
run: |
@@ -73,9 +74,17 @@ jobs:
run: |
make test_shortint_multi_bit_ci
- name: Run integer multi-bit tests
- name: Run signed integer multi-bit tests
run: |
make test_integer_multi_bit_ci
AVX512_SUPPORT=ON make test_signed_integer_multi_bit_ci
- name: Gen Keys if required
run: |
make gen_key_cache
- name: Run signed integer tests
run: |
AVX512_SUPPORT=ON BIG_TESTS_INSTANCE=TRUE make test_signed_integer_ci
- name: Slack Notification
if: ${{ always() }}

@@ -4,6 +4,8 @@ env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -23,13 +25,13 @@ on:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
description: "Slab request ID"
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
description: "Name of forked repo as user/repo"
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
description: "Git SHA to checkout from fork"
type: string
jobs:
@@ -50,7 +52,7 @@ jobs:
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
@@ -60,10 +62,13 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
default: true
- name: Run concrete-csprng tests
run: |
make test_concrete_csprng
- name: Run core tests
run: |
@@ -77,6 +82,10 @@ jobs:
run: |
make test_c_api
- name: Run C API tests with forward_compatibility
run: |
FORWARD_COMPAT=ON make test_c_api
- name: Run user docs tests
run: |
make test_user_doc
@@ -96,6 +105,12 @@ jobs:
- name: Run example tests
run: |
make test_examples
make dark_market
- name: Run apps tests
run: |
make test_trivium
make test_kreyvium
- name: Slack Notification
if: ${{ always() }}

@@ -4,6 +4,8 @@ env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
@@ -50,7 +52,7 @@ jobs:
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
@@ -60,10 +62,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
default: true
- name: Run js on wasm API tests
run: |

@@ -19,11 +19,21 @@ on:
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory, since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must declare this input as well; otherwise the workflow won't be called.
# See start_full_benchmarks.yml as example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-boolean-benchmarks:
@@ -43,7 +53,7 @@ jobs:
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -53,10 +63,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Run benchmarks with AVX512
run: |
@@ -88,13 +97,13 @@ jobs:
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_boolean
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab

@@ -6,6 +6,8 @@ on:
env:
CARGO_TERM_COLOR: always
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
@@ -17,16 +19,30 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
os: [ubuntu-latest, macos-latest-large, windows-latest]
fail-fast: false
steps:
- uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
- name: Install and run newline linter checks
if: matrix.os == 'ubuntu-latest'
run: |
wget https://github.com/fernandrone/linelint/releases/download/0.0.6/linelint-linux-amd64
echo "16b70fb7b471d6f95cbdc0b4e5dc2b0ac9e84ba9ecdc488f7bdf13df823aca4b linelint-linux-amd64" > checksum
sha256sum -c checksum || exit 1
chmod +x linelint-linux-amd64
mv linelint-linux-amd64 /usr/local/bin/linelint
make check_newline
- name: Run pcc checks
run: |
make pcc
- name: Build concrete-csprng
run: |
make build_concrete_csprng
- name: Build Release core
run: |
make build_core AVX512_SUPPORT=ON

.github/workflows/code_coverage.yml

@@ -0,0 +1,120 @@
name: Code Coverage
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
# All the inputs are provided by Slab
inputs:
instance_id:
description: "AWS instance ID"
type: string
instance_image_id:
description: "AWS instance AMI ID"
type: string
instance_type:
description: "AWS instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
type: string
jobs:
code-coverage:
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}_${{ inputs.instance_image_id }}_${{ inputs.instance_type }}
cancel-in-progress: true
runs-on: ${{ inputs.runner_name }}
timeout-minutes: 1080
steps:
# Step used for log purpose.
- name: Instance configuration used
run: |
echo "ID: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
echo "Fork repo: ${{ inputs.fork_repo }}"
echo "Fork git sha: ${{ inputs.fork_git_sha }}"
- name: Checkout tfhe-rs
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@90a06d6ba9543371ab4df8eeca0be07ca6054959
with:
files_yaml: |
tfhe:
- tfhe/src/**
concrete_csprng:
- concrete-csprng/src/**
- name: Generate Keys
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
run: |
make GEN_KEY_CACHE_COVERAGE_ONLY=TRUE gen_key_cache
make gen_key_cache_core_crypto
- name: Run coverage for core_crypto
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
run: |
make test_core_crypto_cov AVX512_SUPPORT=ON
- name: Run coverage for boolean
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
run: |
make test_boolean_cov
- name: Run coverage for shortint
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
run: |
make test_shortint_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@4fe8c5f003fae66aa5ebb77cfd3e7bfbbda0b6b0
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}
directory: ./coverage/
fail_ci_if_error: true
files: shortint/cobertura.xml,boolean/cobertura.xml,core_crypto/cobertura.xml,core_crypto_avx512/cobertura.xml
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Code coverage finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -0,0 +1,74 @@
name: CSPRNG randomness testing Workflow
env:
CARGO_TERM_COLOR: always
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
on:
# Allows you to run this workflow manually from the Actions tab as an alternative.
workflow_dispatch:
# All the inputs are provided by Slab
inputs:
instance_id:
description: "AWS instance ID"
type: string
instance_image_id:
description: "AWS instance AMI ID"
type: string
instance_type:
description: "AWS instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: 'Slab request ID'
type: string
fork_repo:
description: 'Name of forked repo as user/repo'
type: string
fork_git_sha:
description: 'Git SHA to checkout from fork'
type: string
jobs:
csprng-randomness-teting:
name: CSPRNG randomness testing
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}_${{ inputs.instance_image_id }}_${{ inputs.instance_type }}
cancel-in-progress: true
runs-on: ${{ inputs.runner_name }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: ${{ inputs.fork_repo }}
ref: ${{ inputs.fork_git_sha }}
- name: Set up home
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install latest stable
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
- name: Dieharder randomness test suite
run: |
make dieharder_csprng
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "concrete-csprng randomness check finished with status: ${{ job.status }}. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -25,6 +25,8 @@ env:
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
@@ -44,7 +46,7 @@ jobs:
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -54,10 +56,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Run benchmarks with AVX512
run: |
@@ -69,7 +70,7 @@ jobs:
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
@@ -90,13 +91,13 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab

@@ -19,23 +19,52 @@ on:
request_id:
description: "Slab request ID"
type: string
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
prepare-matrix:
name: Prepare operations matrix
runs-on: ubuntu-latest
outputs:
op_flavor: ${{ steps.set_op_flavor.outputs.op_flavor }}
steps:
- name: Weekly benchmarks
if: ${{ github.event.inputs.user_inputs == 'weekly_benchmarks' }}
run: |
echo "OP_FLAVOR=[\"default\"]" >> ${GITHUB_ENV}
- name: Quarterly benchmarks
if: ${{ github.event.inputs.user_inputs == 'quarterly_benchmarks' }}
run: |
echo "OP_FLAVOR=[\"default\", \"smart\", \"unchecked\", \"misc\"]" >> ${GITHUB_ENV}
- name: Set operation flavor output
id: set_op_flavor
run: |
echo "op_flavor=${{ toJSON(env.OP_FLAVOR) }}" >> ${GITHUB_OUTPUT}
integer-benchmarks:
name: Execute integer benchmarks for all operations flavor
needs: prepare-matrix
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
max-parallel: 1
matrix:
command: [ integer, integer_multi_bit]
op_flavor: [ default, default_comp, default_scalar, default_scalar_comp, smart, smart_comp, smart_scalar, smart_parallelized, smart_parallelized_comp, smart_scalar_parallelized, unchecked, unchecked_comp, unchecked_scalar, unchecked_scalar_comp, misc ]
op_flavor: ${{ fromJson(needs.prepare-matrix.outputs.op_flavor) }}
steps:
- name: Instance configuration used
run: |
@@ -45,7 +74,7 @@ jobs:
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -61,13 +90,12 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
@@ -91,7 +119,7 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}

@@ -0,0 +1,158 @@
# Run integer benchmarks on an AWS instance with CUDA and return parsed results to Slab CI bot.
name: Integer GPU benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute integer benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
strategy:
fail-fast: false
# explicit include-based build matrix, of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Set up home
# The "Install rust" step requires the root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Run benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON FAST_BENCH=TRUE BENCH_OP_FLAVOR=default bench_integer_gpu
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer GPU benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -0,0 +1,163 @@
# Run all integer benchmarks on an AWS instance with CUDA and return parsed results to Slab CI bot.
name: Integer GPU full benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory, since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must declare this input as well; otherwise the workflow won't be called.
# See start_full_benchmarks.yml as example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
integer-benchmarks:
name: Execute integer benchmarks for all operations flavor
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
continue-on-error: true
strategy:
fail-fast: false
max-parallel: 1
matrix:
command: [ integer, integer_multi_bit]
op_flavor: [ default, unchecked ]
# explicit include-based build matrix, of known valid options
include:
- os: ubuntu-22.04
cuda: "12.2"
gcc: 9
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Get benchmark details
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})" >> "${GITHUB_ENV}"
echo "COMMIT_HASH=$(git describe --tags --dirty)" >> "${GITHUB_ENV}"
- name: Set up home
# The "Install rust" step requires the root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Run benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_${{ matrix.command }}_gpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--backend gpu \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notification:
name: Slack Notification
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ failure() }}
needs: integer-benchmarks
steps:
- name: Notify
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer GPU full benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -25,6 +25,8 @@ env:
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
@@ -44,7 +46,7 @@ jobs:
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -54,10 +56,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Run multi-bit benchmarks with AVX512
run: |
@@ -69,7 +70,7 @@ jobs:
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
@@ -90,13 +91,13 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab

@@ -0,0 +1,159 @@
# Run integer benchmarks with multi-bit cryptographic parameters on an AWS instance and return parsed results to Slab CI bot.
name: Integer Multi-bit benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute integer multi-bit benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
strategy:
fail-fast: false
# explicit include-based build matrix, of known valid options
matrix:
include:
- os: ubuntu-22.04
cuda: "11.8"
cuda_arch: "70"
gcc: 9
env:
CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Set up home
# The "Install rust" step requires the root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Run multi-bit benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON FAST_BENCH=TRUE BENCH_OP_FLAVOR=default bench_integer_multi_bit_gpu
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer GPU benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
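The "Send data to Slab" step signs the results file with `slab/scripts/hmac_calculator.sh` and sends the signature in an `X-Hub-Signature-256: sha256=...` header. That script is not part of this diff. The following is a minimal sketch that assumes it prints the hex HMAC-SHA256 of the file, keyed by the job secret. The function name `hmac_sha256_file`, the payload and the secret below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the signing performed before `curl` in "Send data to Slab".
# Assumption: hmac_calculator.sh outputs the hex HMAC-SHA256 of a file keyed by
# the job secret; its real implementation is not shown in this diff.
set -euo pipefail

hmac_sha256_file() {
  local file="$1" secret="$2"
  # openssl prints "<label>(<file>)= <hex digest>"; keep only the hex digest.
  openssl dgst -sha256 -hmac "${secret}" "${file}" | awk '{print $NF}'
}

results_file="$(mktemp)"
echo '{"example": true}' > "${results_file}"   # hypothetical results payload
SIGNATURE="$(hmac_sha256_file "${results_file}" "not-a-real-secret")"
echo "X-Hub-Signature-256: sha256=${SIGNATURE}"
```

The receiving side can recompute the same digest with the shared secret and reject payloads whose signature does not match.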

@@ -14,8 +14,9 @@ on:
env:
CARGO_TERM_COLOR: always
RUSTFLAGS: "-C target-cpu=native"
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
CARGO_PROFILE: release_lto_off
FAST_TESTS: "TRUE"
concurrency:
@@ -28,18 +29,21 @@ jobs:
runs-on: ["self-hosted", "m1mac"]
steps:
- uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
- name: Install latest stable
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: stable
default: true
- name: Run pcc checks
run: |
make pcc
- name: Build concrete-csprng
run: |
make build_concrete_csprng
- name: Build Release core
run: |
make build_core
@@ -64,6 +68,10 @@ jobs:
run: |
make build_c_api
- name: Run concrete-csprng tests
run: |
make test_concrete_csprng
- name: Run core tests
run: |
make test_core_crypto
@@ -103,10 +111,9 @@ jobs:
run: |
make test_shortint_multi_bit_ci
# # These multi bit integer tests are too slow on M1 with low core count and low RAM
# - name: Run integer multi bit tests
# run: |
# make test_integer_multi_bit_ci
- name: Run integer multi bit tests
run: |
make test_integer_multi_bit_ci
remove_label:
name: Remove m1_test label
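Several workflows in this diff add `RUST_MIN_STACK: "8388608"`. The value is 8 MiB, which matches the usual Linux default thread stack size. Rust's default for threads spawned through `std::thread` is 2 MiB. The variable only affects such spawned threads (for example, the test harness threads), not the main thread. The arithmetic below is a quick sanity check of that value:

```shell
#!/usr/bin/env bash
set -euo pipefail
# RUST_MIN_STACK is read by Rust's std when spawning threads without an explicit
# stack size; it does not change the main thread's stack.
mib=$((1024 * 1024))
rust_default_thread_stack=$((2 * mib))   # Rust's default for spawned threads
RUST_MIN_STACK=$((8 * mib))
echo "RUST_MIN_STACK=${RUST_MIN_STACK}"  # prints RUST_MIN_STACK=8388608
echo "factor over the default: $((RUST_MIN_STACK / rust_default_thread_stack))"
```

Locally, the same setting can be reproduced with `RUST_MIN_STACK=8388608 cargo test`.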

@@ -30,7 +30,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -49,7 +49,7 @@ jobs:
- name: Publish web package
if: ${{ inputs.push_web_package }}
uses: JS-DevTools/npm-publish@5a85faf05d2ade2d5b6682bfe5359915d5159c6c
uses: JS-DevTools/npm-publish@4b07b26a2f6e0a51846e1870223e545bae91c552
with:
token: ${{ secrets.NPM_TOKEN }}
package: tfhe/pkg/package.json
@@ -65,7 +65,7 @@ jobs:
- name: Publish Node package
if: ${{ inputs.push_node_package }}
uses: JS-DevTools/npm-publish@5a85faf05d2ade2d5b6682bfe5359915d5159c6c
uses: JS-DevTools/npm-publish@4b07b26a2f6e0a51846e1870223e545bae91c552
with:
token: ${{ secrets.NPM_TOKEN }}
package: tfhe/pkg/package.json
@@ -79,6 +79,6 @@ jobs:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Integer benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_MESSAGE: "tfhe release failed: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -0,0 +1,42 @@
# Publish new release of concrete-csprng on crates.io.
name: Publish concrete-csprng release
on:
workflow_dispatch:
inputs:
dry_run:
description: "Dry-run"
type: boolean
default: true
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
jobs:
publish_release:
name: Publish concrete-csprng Release
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Publish crates.io package
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
cargo publish -p concrete-csprng --token ${{ env.CRATES_TOKEN }} ${{ env.DRY_RUN }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "concrete-csprng release failed: (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
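In the concrete-csprng release workflow, `${{ inputs.dry_run && '--dry-run' || '' }}` is the usual GitHub Actions stand-in for a ternary operator. It yields `--dry-run` when `dry_run` is true and an empty string otherwise. The idiom only works because `'--dry-run'` is itself truthy. Below is a shell sketch of the same selection. The helper name `dry_run_flag` is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Mirrors the DRY_RUN env expression used by the crates.io publish step.
dry_run_flag() {
  if [ "$1" = "true" ]; then
    echo "--dry-run"
  else
    echo ""
  fi
}

# Unquoted expansion drops the argument entirely when the flag is empty.
echo cargo publish -p concrete-csprng $(dry_run_flag true)
```

Defaulting `dry_run` to `true` means a manual trigger only publishes after someone deliberately unchecks the box.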

@@ -17,10 +17,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
- name: Checkout lattice-estimator
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: malb/lattice-estimator
path: lattice_estimator
@@ -32,7 +32,7 @@ jobs:
- name: Collect parameters
run: |
make write_params_to_file
CARGO_PROFILE=devo make write_params_to_file
- name: Perform security check
run: |

@@ -19,11 +19,21 @@ on:
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory, since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must include this very input; otherwise the workflow won't be called.
# See start_full_benchmarks.yml for an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-pbs-benchmarks:
@@ -43,7 +53,7 @@ jobs:
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -53,10 +63,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Run benchmarks with AVX512
run: |
@@ -78,13 +87,13 @@ jobs:
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_pbs
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab

.github/workflows/pbs_gpu_benchmark.yml

@@ -0,0 +1,142 @@
# Run PBS benchmarks on an AWS instance with CUDA and return parsed results to Slab CI bot.
name: PBS GPU benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory, since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must include this very input; otherwise the workflow won't be called.
# See start_full_benchmarks.yml for an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
jobs:
run-pbs-benchmarks:
name: Execute PBS benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step requires root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Export CUDA variables
if: ${{ !cancelled() }}
run: |
echo "CUDA_PATH=$CUDA_PATH" >> "${GITHUB_ENV}"
echo "$CUDA_PATH/bin" >> "${GITHUB_PATH}"
echo "LD_LIBRARY_PATH=$CUDA_PATH/lib:$LD_LIBRARY_PATH" >> "${GITHUB_ENV}"
echo "CUDACXX=/usr/local/cuda-${{ matrix.cuda }}/bin/nvcc" >> "${GITHUB_ENV}"
# Specify the correct host compilers
- name: Export gcc and g++ variables
if: ${{ !cancelled() }}
run: |
echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}" >> "${GITHUB_ENV}"
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Run benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON bench_pbs_gpu
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--backend gpu \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--name-suffix avx512 \
--walk-subdirs \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_pbs
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "PBS GPU benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
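The "Export CUDA variables" and "Export gcc and g++ variables" steps of the PBS GPU workflow read `${{ matrix.cuda }}` and `${{ matrix.gcc }}`. However, the job as shown defines no `strategy.matrix`. Those expressions would therefore likely expand to empty strings and produce paths such as `/usr/local/cuda-/bin/nvcc`. Below is a sketch of a fail-fast guard that could run before such exports. The helper names are hypothetical, and the version `12.2` used in the example is a placeholder rather than the version installed on the benchmark AMI.

```shell
#!/usr/bin/env bash
set -euo pipefail
# In the workflow the version would come from ${{ matrix.cuda }}; here it is a
# plain function argument for illustration.
require_version() {
  local name="$1" value="$2"
  if [ -z "${value}" ]; then
    echo "error: ${name} version is empty; is a strategy.matrix defined for this job?" >&2
    return 1
  fi
}

cudacxx_path() {
  require_version cuda "$1" && echo "/usr/local/cuda-$1/bin/nvcc"
}

cudacxx_path 12.2   # prints /usr/local/cuda-12.2/bin/nvcc
```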

@@ -24,6 +24,8 @@ env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-shortint-benchmarks:
@@ -43,7 +45,7 @@ jobs:
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -53,10 +55,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Run benchmarks with AVX512
run: |
@@ -88,13 +89,13 @@ jobs:
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_shortint
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab

@@ -19,11 +19,21 @@ on:
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but is still mandatory, since a calling workflow could
# use it. If a triggering command includes a user_inputs field, then the triggered workflow
# must include this very input; otherwise the workflow won't be called.
# See start_full_benchmarks.yml for an example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
shortint-benchmarks:
@@ -43,7 +53,7 @@ jobs:
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -59,13 +69,12 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
@@ -104,7 +113,7 @@ jobs:
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_shortint_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}

@@ -0,0 +1,130 @@
# Run signed integer benchmarks on an AWS instance and return parsed results to Slab CI bot.
name: Signed Integer benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute signed integer benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step requires root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Run benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON FAST_BENCH=TRUE bench_signed_integer
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Signed integer benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -0,0 +1,134 @@
# Run all signed integer benchmarks on an AWS instance and return parsed results to Slab CI bot.
name: Signed Integer full benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
integer-benchmarks:
name: Execute signed integer benchmarks for all operation flavors
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
continue-on-error: true
timeout-minutes: 1440 # 24 hours
strategy:
max-parallel: 1
matrix:
command: [ integer, integer_multi_bit ]
op_flavor: [ default, unchecked ]
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Get benchmark details
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})" >> "${GITHUB_ENV}"
echo "COMMIT_HASH=$(git describe --tags --dirty)" >> "${GITHUB_ENV}"
- name: Set up home
# "Install rust" step requires root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Run benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON BENCH_OP_FLAVOR=${{ matrix.op_flavor }} bench_signed_${{ matrix.command }}
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${{ env.COMMIT_HASH }}" \
--branch ${{ github.ref_name }} \
--commit-date "${{ env.COMMIT_DATE }}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
slack-notification:
name: Slack Notification
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ failure() }}
needs: integer-benchmarks
steps:
- name: Notify
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Signed integer full benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
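In the signed integer full benchmarks workflow, the `command` × `op_flavor` matrix expands to four jobs. Because of `max-parallel: 1`, GitHub runs them one at a time on the same runner. The sketch below only prints the `make` invocation each job would run. It mirrors the "Run benchmarks with AVX512" step and does not execute anything:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Mirrors strategy.matrix of the integer-benchmarks job: command x op_flavor.
list_bench_commands() {
  local commands=(integer integer_multi_bit)
  local op_flavors=(default unchecked)
  local command op_flavor
  for command in "${commands[@]}"; do
    for op_flavor in "${op_flavors[@]}"; do
      echo "make AVX512_SUPPORT=ON BENCH_OP_FLAVOR=${op_flavor} bench_signed_${command}"
    done
  done
}

list_bench_commands
```

GitHub does not guarantee the order in which matrix jobs start. The loop order above is only for readability.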

@@ -0,0 +1,130 @@
# Run signed integer benchmarks with multi-bit cryptographic parameters on an AWS instance and return parsed results to Slab CI bot.
name: Signed Integer Multi-bit benchmarks
on:
workflow_dispatch:
inputs:
instance_id:
description: "Instance ID"
type: string
instance_image_id:
description: "Instance AMI ID"
type: string
instance_type:
description: "Instance product type"
type: string
runner_name:
description: "Action runner name"
type: string
request_id:
description: "Slab request ID"
type: string
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
PARSE_INTEGER_BENCH_CSV_FILE: tfhe_rs_integer_benches_${{ github.sha }}.csv
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-integer-benchmarks:
name: Execute signed integer multi-bit benchmarks in EC2
runs-on: ${{ github.event.inputs.runner_name }}
if: ${{ !cancelled() }}
steps:
- name: Instance configuration used
run: |
echo "IDs: ${{ inputs.instance_id }}"
echo "AMI: ${{ inputs.instance_image_id }}"
echo "Type: ${{ inputs.instance_type }}"
echo "Request ID: ${{ inputs.request_id }}"
- name: Get benchmark date
run: |
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Set up home
# "Install rust" step requires root user to have a HOME directory, which is not set.
run: |
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
- name: Run multi-bit benchmarks with AVX512
run: |
make AVX512_SUPPORT=ON FAST_BENCH=TRUE bench_signed_integer_multi_bit
- name: Parse benchmarks to csv
run: |
make PARSE_INTEGER_BENCH_CSV_FILE=${{ env.PARSE_INTEGER_BENCH_CSV_FILE }} \
parse_integer_benches
- name: Upload csv results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_csv_integer
path: ${{ env.PARSE_INTEGER_BENCH_CSV_FILE }}
- name: Parse results
run: |
COMMIT_DATE="$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})"
COMMIT_HASH="$(git describe --tags --dirty)"
python3 ./ci/benchmark_parser.py target/criterion ${{ env.RESULTS_FILENAME }} \
--database tfhe_rs \
--hardware ${{ inputs.instance_type }} \
--project-version "${COMMIT_HASH}" \
--branch ${{ github.ref_name }} \
--commit-date "${COMMIT_DATE}" \
--bench-date "${{ env.BENCH_DATE }}" \
--walk-subdirs \
--name-suffix avx512 \
--throughput
- name: Upload parsed results artifact
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_integer
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Send data to Slab
shell: bash
run: |
echo "Computing HMac on results file"
SIGNATURE="$(slab/scripts/hmac_calculator.sh ${{ env.RESULTS_FILENAME }} '${{ secrets.JOB_SECRET }}')"
echo "Sending results to Slab..."
curl -v -k \
-H "Content-Type: application/json" \
-H "X-Slab-Repository: ${{ github.repository }}" \
-H "X-Slab-Command: store_data_v2" \
-H "X-Hub-Signature-256: sha256=${SIGNATURE}" \
-d @${{ env.RESULTS_FILENAME }} \
${{ secrets.SLAB_URL }}
- name: Slack Notification
if: ${{ failure() }}
continue-on-error: true
uses: rtCamp/action-slack-notify@b24d75fe0e728a4bf9fc42ee217caa686d141ee8
env:
SLACK_COLOR: ${{ job.status }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_MESSAGE: "Signed integer multi-bit benchmarks failed. (${{ env.ACTION_RUN_URL }})"
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}

@@ -20,14 +20,26 @@ on:
description: "Run integer benches"
type: boolean
default: true
signed_integer_bench:
description: "Run signed integer benches"
type: boolean
default: true
integer_multi_bit_bench:
description: "Run integer multi bit benches"
type: boolean
default: true
signed_integer_multi_bit_bench:
description: "Run signed integer multi bit benches"
type: boolean
default: true
pbs_bench:
description: "Run PBS benches"
type: boolean
default: true
pbs_gpu_bench:
description: "Run PBS benches on GPU"
type: boolean
default: true
wasm_client_bench:
description: "Run WASM client benches"
type: boolean
@@ -38,17 +50,21 @@ jobs:
if: ${{ (github.event_name == 'push' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
strategy:
matrix:
command: [boolean_bench, shortint_bench, integer_bench, integer_multi_bit_bench, pbs_bench, wasm_client_bench]
command: [ boolean_bench, shortint_bench,
integer_bench, integer_multi_bit_bench,
signed_integer_bench, signed_integer_multi_bit_bench,
integer_gpu_bench, integer_multi_bit_gpu_bench,
pbs_bench, pbs_gpu_bench, wasm_client_bench ]
runs-on: ubuntu-latest
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@2f7246cb26e8bb6709b6cbfc1fec7febfe82e96a
uses: tj-actions/changed-files@90a06d6ba9543371ab4df8eeca0be07ca6054959
with:
files_yaml: |
common_benches:
@@ -69,13 +85,23 @@ jobs:
integer_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/**
- tfhe/benches/integer/bench.rs
- .github/workflows/integer_benchmark.yml
integer_multi_bit_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/**
- .github/workflows/integer_benchmark.yml
- tfhe/benches/integer/bench.rs
- .github/workflows/integer_multi_bit_benchmark.yml
signed_integer_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/signed_bench.rs
- .github/workflows/signed_integer_benchmark.yml
signed_integer_multi_bit_bench:
- tfhe/src/shortint/**
- tfhe/src/integer/**
- tfhe/benches/integer/signed_bench.rs
- .github/workflows/signed_integer_multi_bit_benchmark.yml
pbs_bench:
- tfhe/src/core_crypto/**
- tfhe/benches/core_crypto/**
@@ -85,7 +111,7 @@ jobs:
- .github/workflows/wasm_client_benchmark.yml
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab

@@ -3,34 +3,58 @@ name: Start full suite benchmarks
on:
schedule:
# Job will be triggered each Saturday at 1a.m.
# Weekly benchmarks will be triggered each Saturday at 1a.m.
- cron: '0 1 * * 6'
# Quarterly benchmarks will be triggered right before the end of each quarter, on the 25th of the quarter's last month at 4a.m.
# These benchmarks take far longer to execute, hence the reason to run them only four times a year.
- cron: '0 4 25 MAR,JUN,SEP,DEC *'
workflow_dispatch:
inputs:
benchmark_type:
description: 'Benchmark type'
required: true
default: 'weekly'
type: choice
options:
- weekly
- quarterly
jobs:
start-benchmarks:
if: ${{ (github.event_name == 'schedule' && github.repository == 'zama-ai/tfhe-rs') || github.event_name == 'workflow_dispatch' }}
strategy:
matrix:
command: [ boolean_bench, shortint_full_bench, integer_full_bench, pbs_bench, wasm_client_bench ]
command: [ boolean_bench, shortint_full_bench,
integer_full_bench, signed_integer_full_bench, integer_gpu_full_bench,
pbs_bench, pbs_gpu_bench, wasm_client_bench ]
runs-on: ubuntu-latest
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
token: ${{ secrets.CONCRETE_ACTIONS_TOKEN }}
- name: Set benchmarks type as weekly
if: (github.event_name == 'workflow_dispatch' && inputs.benchmark_type == 'weekly') || github.event.schedule == '0 1 * * 6'
run: |
echo "BENCH_TYPE=weekly_benchmarks" >> "${GITHUB_ENV}"
- name: Set benchmarks type as quarterly
if: (github.event_name == 'workflow_dispatch' && inputs.benchmark_type == 'quarterly') || github.event.schedule == '0 4 25 MAR,JUN,SEP,DEC *'
run: |
echo "BENCH_TYPE=quarterly_benchmarks" >> "${GITHUB_ENV}"
- name: Start AWS job in Slab
shell: bash
run: |
echo -n '{"command": "${{ matrix.command }}", "git_ref": "${{ github.ref }}", "sha": "${{ github.sha }}"}' > command.json
echo -n '{"command": "${{ matrix.command }}", "git_ref": "${{ github.ref }}", "sha": "${{ github.sha }}", "user_inputs": "${{ env.BENCH_TYPE }}"}' > command.json
SIGNATURE="$(slab/scripts/hmac_calculator.sh command.json '${{ secrets.JOB_SECRET }}')"
curl -v -k \
--fail-with-body \

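In the "Start full suite benchmarks" workflow, the two "Set benchmarks type" steps choose `BENCH_TYPE` from either the `workflow_dispatch` input or the cron string that fired (`github.event.schedule`). The value is then forwarded to Slab as `user_inputs`. This is why every triggered benchmark workflow must declare a `user_inputs` input. The shell sketch below mirrors that selection and the `command.json` payload. The function names and the example values are hypothetical. The workflow uses two independent `if:` conditions, but since they are mutually exclusive an `if/elif` is equivalent.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Arguments stand in for github.event_name, inputs.benchmark_type and
# github.event.schedule respectively.
bench_type() {
  local event_name="$1" dispatch_input="$2" schedule="$3"
  if { [ "${event_name}" = "workflow_dispatch" ] && [ "${dispatch_input}" = "weekly" ]; } \
     || [ "${schedule}" = "0 1 * * 6" ]; then
    echo "weekly_benchmarks"
  elif { [ "${event_name}" = "workflow_dispatch" ] && [ "${dispatch_input}" = "quarterly" ]; } \
     || [ "${schedule}" = "0 4 25 MAR,JUN,SEP,DEC *" ]; then
    echo "quarterly_benchmarks"
  fi
}

# Builds the same JSON shape as the "Start AWS job in Slab" step writes to command.json.
command_json() {
  local command="$1" git_ref="$2" sha="$3" bench_type="$4"
  printf '{"command": "%s", "git_ref": "%s", "sha": "%s", "user_inputs": "%s"}' \
    "${command}" "${git_ref}" "${sha}" "${bench_type}"
}

command_json pbs_bench refs/heads/main abc123 "$(bench_type schedule '' '0 1 * * 6')"
```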
@@ -13,11 +13,11 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
- name: Save repo
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: repo-archive
path: '.'


@@ -12,6 +12,16 @@ jobs:
permissions:
pull-requests: write
steps:
- name: Get current labels
uses: snnaplab/get-labels-action@f426df40304808ace3b5282d4f036515f7609576
- name: Remove approved label
if: ${{ github.event_name == 'pull_request' && contains(fromJSON(env.LABELS), 'approved') }}
uses: actions-ecosystem/action-remove-labels@2ce5d41b4b6aa8503e285553f75ed56e0a40bae0
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
labels: approved
- name: Launch fast tests
if: ${{ github.event_name == 'pull_request' }}
uses: mshick/add-pr-comment@a65df5f64fc741e91c59b8359a4bc56e57aaf5b1
@@ -19,9 +29,19 @@ jobs:
allow-repeats: true
message: |
@slab-ci cpu_fast_test
@slab-ci gpu_test
- name: Add approved label
uses: actions-ecosystem/action-add-labels@18f1af5e3544586314bbe15c0273249c770b2daf
if: ${{ github.event_name == 'pull_request_review' && github.event.review.state == 'approved' && !contains(fromJSON(env.LABELS), 'approved') }}
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
labels: approved
# PR label 'approved' presence is checked to avoid running the full test suite several times
# in case of multiple approvals without new commits in between.
- name: Launch full tests suite
if: ${{ github.event_name == 'pull_request_review' && github.event.review.state == 'approved' }}
if: ${{ github.event_name == 'pull_request_review' && github.event.review.state == 'approved' && !contains(fromJSON(env.LABELS), 'approved') }}
uses: mshick/add-pr-comment@a65df5f64fc741e91c59b8359a4bc56e57aaf5b1
with:
allow-repeats: true
@@ -29,6 +49,7 @@ jobs:
Pull Request has been approved :tada:
Launching full test suite...
@slab-ci cpu_test
@slab-ci cpu_integer_test
@slab-ci cpu_multi_bit_test
@slab-ci cpu_unsigned_integer_test
@slab-ci cpu_signed_integer_test
@slab-ci cpu_wasm_test
@slab-ci csprng_randomness_testing
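The reworked approval workflow uses an `approved` label to avoid launching the full test suite twice when a PR gets several approvals without new commits in between. The script below is a minimal, hypothetical model of the `if:` guard on "Launch full tests suite". It assumes `env.LABELS` (presumably populated by the `get-labels-action` step) holds the PR's current labels. As a simplification it models those labels as a space-separated string rather than the JSON array the workflow parses with `fromJSON`. The function name `should_launch_full_tests` is illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical model of:
#   github.event_name == 'pull_request_review'
#   && github.event.review.state == 'approved'
#   && !contains(fromJSON(env.LABELS), 'approved')
# $3 is a space-separated label list (simplification of the JSON array).
should_launch_full_tests() {
    local event_name="$1" review_state="$2" labels="$3"
    [[ "$event_name" == "pull_request_review" ]] || return 1
    [[ "$review_state" == "approved" ]] || return 1
    # Exact word match, so a label like "approved-by-bot" does not count.
    [[ " $labels " != *" approved "* ]]
}

# First approval: the label is not present yet, so the full suite is launched
# (and the "Add approved label" step then adds the label).
should_launch_full_tests pull_request_review approved "" && echo "first approval: launch"

# Second approval with no new commits: the label is still there, nothing is launched.
should_launch_full_tests pull_request_review approved "approved" || echo "second approval: skip"

# A new push fires a pull_request event, whose step removes the label,
# so the next approval launches the suite again.
should_launch_full_tests pull_request_review approved "" && echo "approval after new commits: launch"
```

The design choice is to treat label presence as state. Removing it on every `pull_request` event (new commits) re-arms the trigger, and adding it on approval disarms it.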


@@ -19,11 +19,21 @@ on:
request_id:
description: "Slab request ID"
type: string
# This input is not used in this workflow but still mandatory since a calling workflow could
# use it. If a triggering command include a user_inputs field, then the triggered workflow
# must include this very input, otherwise the workflow won't be called.
# See start_full_benchmarks.yml as example.
user_inputs:
description: "Type of benchmarks to run"
type: string
default: "weekly_benchmarks"
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
jobs:
run-wasm-client-benchmarks:
@@ -43,7 +53,7 @@ jobs:
echo "BENCH_DATE=$(date --iso-8601=seconds)" >> "${GITHUB_ENV}"
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
fetch-depth: 0
@@ -53,10 +63,9 @@ jobs:
echo "HOME=/home/ubuntu" >> "${GITHUB_ENV}"
- name: Install rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
uses: dtolnay/rust-toolchain@be73d7920c329f220ce78e0234b8f96b7ae60248
with:
toolchain: nightly
override: true
- name: Run benchmarks
run: |
@@ -89,13 +98,13 @@ jobs:
--append-results
- name: Upload parsed results artifact
uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce
uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8
with:
name: ${{ github.sha }}_wasm
path: ${{ env.RESULTS_FILENAME }}
- name: Checkout Slab repo
uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
with:
repository: zama-ai/slab
path: slab
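The `RUST_MIN_STACK: "8388608"` entry added to this workflow's `env` block is a size in bytes. The arithmetic below shows that it is exactly 8 MiB.

According to the Rust standard library's `std::thread` documentation, `RUST_MIN_STACK` sets the stack size of threads spawned through `std::thread` when no explicit size is requested. That default is currently 2 MiB on Tier-1 platforms, which matches the motivation in the "increase thread stack size" commit. The variable name `rust_min_stack` is just for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# RUST_MIN_STACK is read as a byte count; 8 MiB = 8 * 1024 * 1024 bytes.
rust_min_stack=$((8 * 1024 * 1024))
echo "RUST_MIN_STACK=${rust_min_stack}"
```

Running it prints `RUST_MIN_STACK=8388608`, the same value the workflow exports.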

10
.gitignore vendored

@@ -3,9 +3,9 @@ target/
.vscode/
# Path we use for internal-keycache during tests
./keys/
/keys/
# In case of symlinked keys
./keys
/keys
**/Cargo.lock
**/*.bin
@@ -13,3 +13,9 @@ target/
# Some of our bench outputs
/tfhe/benchmarks_parameters
**/*.csv
# dieharder run log
dieharder_run.log
# Coverage reports
/coverage/

14
.linelint.yml Normal file

@@ -0,0 +1,14 @@
ignore:
- .git
- target
- tfhe/benchmarks_parameters
- tfhe/web_wasm_parallel_tests/node_modules
- tfhe/web_wasm_parallel_tests/dist
- keys
- coverage
rules:
# checks if file ends in a newline character
end-of-file:
enable: true
single-new-line: true


@@ -1,6 +1,6 @@
[workspace]
resolver = "2"
members = ["tfhe", "tasks", "apps/trivium"]
members = ["tfhe", "tasks", "apps/trivium", "concrete-csprng"]
[profile.bench]
lto = "fat"


@@ -1,6 +1,6 @@
BSD 3-Clause Clear License
Copyright © 2023 ZAMA.
Copyright © 2024 ZAMA.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,

448
Makefile

@@ -3,19 +3,31 @@ OS:=$(shell uname)
RS_CHECK_TOOLCHAIN:=$(shell cat toolchain.txt | tr -d '\n')
CARGO_RS_CHECK_TOOLCHAIN:=+$(RS_CHECK_TOOLCHAIN)
TARGET_ARCH_FEATURE:=$(shell ./scripts/get_arch_feature.sh)
RS_BUILD_TOOLCHAIN:=$(shell \
( (echo $(TARGET_ARCH_FEATURE) | grep -q x86) && echo stable) || echo $(RS_CHECK_TOOLCHAIN))
RS_BUILD_TOOLCHAIN:=stable
CARGO_RS_BUILD_TOOLCHAIN:=+$(RS_BUILD_TOOLCHAIN)
CARGO_PROFILE?=release
MIN_RUST_VERSION:=$(shell grep rust-version tfhe/Cargo.toml | cut -d '=' -f 2 | xargs)
MIN_RUST_VERSION:=$(shell grep '^rust-version[[:space:]]*=' tfhe/Cargo.toml | cut -d '=' -f 2 | xargs)
AVX512_SUPPORT?=OFF
WASM_RUSTFLAGS:=
BIG_TESTS_INSTANCE?=FALSE
GEN_KEY_CACHE_MULTI_BIT_ONLY?=FALSE
GEN_KEY_CACHE_COVERAGE_ONLY?=FALSE
PARSE_INTEGER_BENCH_CSV_FILE?=tfhe_rs_integer_benches.csv
FAST_TESTS?=FALSE
FAST_BENCH?=FALSE
BENCH_OP_FLAVOR?=DEFAULT
NODE_VERSION=20
FORWARD_COMPAT?=OFF
# sed: -n, do not print input stream, -e means a script/expression
# 1,/version/ indicates from the first line, to the line matching version at the start of the line
# p indicates to print, so we keep only the start of the Cargo.toml until we hit the first version
# entry which should be the version of tfhe
TFHE_CURRENT_VERSION:=\
$(shell sed -n -e '1,/^version/p' tfhe/Cargo.toml | \
grep '^version[[:space:]]*=' | cut -d '=' -f 2 | xargs)
# Cargo has a hard time distinguishing between our package from the workspace and a package that
# could be a dependency, so we build an unambiguous spec here
TFHE_SPEC:=tfhe@$(TFHE_CURRENT_VERSION)
# This is done to avoid forgetting it, we still precise the RUSTFLAGS in the commands to be able to
# copy paste the command in the terminal and change them if required without forgetting the flags
export RUSTFLAGS?=-C target-cpu=native
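The script below makes the `TFHE_CURRENT_VERSION` and `MIN_RUST_VERSION` pipelines concrete by running the same `sed`/`grep`/`cut`/`xargs` commands against a small, made-up `Cargo.toml`. That file is not the real `tfhe/Cargo.toml`.

The `[dependencies.serde]` table is a hypothetical addition that shows why the `sed` range matters. Its `version` line also starts at column 0, but it comes after the package's own `version` entry. The `sed` range therefore cuts it off before `grep` runs. The closing `xargs` trims whitespace and strips the TOML quotes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical manifest, for illustration only.
tmp_dir="$(mktemp -d)"
cat > "${tmp_dir}/Cargo.toml" <<'EOF'
[package]
name = "tfhe"
version = "0.5.5"
edition = "2021"
rust-version = "1.72"

[dependencies.serde]
version = "1.0"
features = ["derive"]
EOF

# Same as TFHE_CURRENT_VERSION: print from line 1 up to the first line starting with
# "version", keep the "version =" line, take the value after '=', let xargs unquote it.
tfhe_version="$(sed -n -e '1,/^version/p' "${tmp_dir}/Cargo.toml" | \
    grep '^version[[:space:]]*=' | cut -d '=' -f 2 | xargs)"

# Same as the new MIN_RUST_VERSION: only match a line whose key is exactly rust-version.
min_rust_version="$(grep '^rust-version[[:space:]]*=' "${tmp_dir}/Cargo.toml" | \
    cut -d '=' -f 2 | xargs)"

echo "TFHE_SPEC=tfhe@${tfhe_version}"
echo "MIN_RUST_VERSION=${min_rust_version}"
rm -rf "${tmp_dir}"
```

With this manifest the script prints `TFHE_SPEC=tfhe@0.5.5` and `MIN_RUST_VERSION=1.72`. The resulting `tfhe@<version>` string is a Cargo package ID spec. Passing it as `-p $(TFHE_SPEC)` lets cargo pick the workspace `tfhe` crate unambiguously, as the Makefile comment explains.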
@@ -32,10 +44,42 @@ else
MULTI_BIT_ONLY=
endif
ifeq ($(GEN_KEY_CACHE_COVERAGE_ONLY),TRUE)
COVERAGE_ONLY=--coverage-only
else
COVERAGE_ONLY=
endif
ifeq ($(FORWARD_COMPAT),ON)
FORWARD_COMPAT_FEATURE=forward_compatibility
else
FORWARD_COMPAT_FEATURE=
endif
# Variables used only for regex_engine example
REGEX_STRING?=''
REGEX_PATTERN?=''
# tfhe-cuda-backend
TFHECUDA_SRC="backends/tfhe-cuda-backend/cuda"
TFHECUDA_BUILD=$(TFHECUDA_SRC)/build
# Exclude these files from coverage reports
define COVERAGE_EXCLUDED_FILES
--exclude-files apps/trivium/src/trivium/* \
--exclude-files apps/trivium/src/kreyvium/* \
--exclude-files apps/trivium/src/static_deque/* \
--exclude-files apps/trivium/src/trans_ciphering/* \
--exclude-files tasks/src/* \
--exclude-files tfhe/benches/boolean/* \
--exclude-files tfhe/benches/core_crypto/* \
--exclude-files tfhe/benches/shortint/* \
--exclude-files tfhe/benches/integer/* \
--exclude-files tfhe/benches/* \
--exclude-files tfhe/examples/regex_engine/* \
--exclude-files tfhe/examples/utilities/*
endef
.PHONY: rs_check_toolchain # Echo the rust toolchain used for checks
rs_check_toolchain:
@echo $(RS_CHECK_TOOLCHAIN)
@@ -77,136 +121,208 @@ install_wasm_pack: install_rs_build_toolchain
install_node:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | $(SHELL)
source ~/.bashrc
$(SHELL) -i -c 'nvm install node' || \
$(SHELL) -i -c 'nvm install $(NODE_VERSION)' || \
( echo "Unable to install node, unknown error." && exit 1 )
.PHONY: install_dieharder # Install dieharder for apt distributions or macOS
install_dieharder:
@dieharder -h > /dev/null 2>&1 || \
if [[ "$(OS)" == "Linux" ]]; then \
sudo apt update && sudo apt install -y dieharder; \
elif [[ "$(OS)" == "Darwin" ]]; then\
brew install dieharder; \
fi || ( echo "Unable to install dieharder, unknown error." && exit 1 )
.PHONY: install_tarpaulin # Install tarpaulin to perform code coverage
install_tarpaulin: install_rs_build_toolchain
@cargo tarpaulin --version > /dev/null 2>&1 || \
cargo $(CARGO_RS_BUILD_TOOLCHAIN) install cargo-tarpaulin --locked || \
( echo "Unable to install cargo tarpaulin, unknown error." && exit 1 )
.PHONY: check_linelint_installed # Check if linelint newline linter is installed
check_linelint_installed:
@printf "\n" | linelint - > /dev/null 2>&1 || \
( echo "Unable to locate linelint. Try installing it: https://github.com/fernandrone/linelint/releases" && exit 1 )
.PHONY: fmt # Format rust code
fmt: install_rs_check_toolchain
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" fmt
.PHONT: check_fmt # Check rust code format
.PHONY: fmt_gpu # Format rust and cuda code
fmt_gpu: install_rs_check_toolchain
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" fmt
cd "$(TFHECUDA_SRC)" && ./format_tfhe_cuda_backend.sh
.PHONY: check_fmt # Check rust code format
check_fmt: install_rs_check_toolchain
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" fmt --check
.PHONY: clippy_gpu # Run clippy lints on the gpu backend
clippy_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),integer,shortint,gpu \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: fix_newline # Fix newline at end of file issues to be UNIX compliant
fix_newline: check_linelint_installed
linelint -a .
.PHONY: check_newline # Check for newline at end of file to be UNIX compliant
check_newline: check_linelint_installed
linelint .
.PHONY: clippy_core # Run clippy lints on core_crypto with and without experimental features
clippy_core: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE) \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),experimental \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),nightly-avx512 \
-p $(TFHE_SPEC) -- --no-deps -D warnings
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),experimental,nightly-avx512 \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_boolean # Run clippy lints enabling the boolean features
clippy_boolean: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),boolean \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_shortint # Run clippy lints enabling the shortint features
clippy_shortint: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),shortint \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_integer # Run clippy lints enabling the integer features
clippy_integer: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),integer \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy # Run clippy lints enabling the boolean, shortint, integer
clippy: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_c_api # Run clippy lints enabling the boolean, shortint and the C API
clippy_c_api: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_js_wasm_api # Run clippy lints enabling the boolean, shortint, integer and the js wasm API
clippy_js_wasm_api: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api \
-p tfhe -- --no-deps -D warnings
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_tasks # Run clippy lints on helper tasks crate.
clippy_tasks:
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
-p tasks -- --no-deps -D warnings
.PHONY: clippy_trivium # Run clippy lints on Trivium app
clippy_trivium: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
-p tfhe-trivium -- --no-deps -D warnings
.PHONY: clippy_all_targets # Run clippy lints on all targets (benches, examples, etc.)
clippy_all_targets:
clippy_all_targets: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer \
-p tfhe -- --no-deps -D warnings
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_all_targets_forward_compatibility # Run clippy lints on all targets (benches, examples, etc.)
clippy_all_targets_forward_compatibility: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,forward_compatibility \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_concrete_csprng # Run clippy lints on concrete-csprng
clippy_concrete_csprng:
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=$(TARGET_ARCH_FEATURE) \
-p concrete-csprng -- --no-deps -D warnings
.PHONY: clippy_all # Run all clippy targets
clippy_all: clippy clippy_boolean clippy_shortint clippy_integer clippy_all_targets clippy_c_api \
clippy_js_wasm_api clippy_tasks clippy_core
clippy_js_wasm_api clippy_tasks clippy_core clippy_concrete_csprng clippy_trivium \
clippy_all_targets_forward_compatibility
.PHONY: clippy_fast # Run main clippy targets
clippy_fast: clippy clippy_all_targets clippy_c_api clippy_js_wasm_api clippy_tasks clippy_core
.PHONY: gen_key_cache # Run the script to generate keys and cache them for shortint tests
gen_key_cache: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) run --profile $(CARGO_PROFILE) \
--example generates_test_keys \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache -p tfhe -- \
$(MULTI_BIT_ONLY)
clippy_fast: clippy clippy_all_targets clippy_c_api clippy_js_wasm_api clippy_tasks clippy_core \
clippy_concrete_csprng
.PHONY: build_core # Build core_crypto without experimental features
build_core: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE) -p tfhe
--features=$(TARGET_ARCH_FEATURE) -p $(TFHE_SPEC)
@if [[ "$(AVX512_SUPPORT)" == "ON" ]]; then \
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),$(AVX512_FEATURE) -p tfhe; \
--features=$(TARGET_ARCH_FEATURE),$(AVX512_FEATURE) -p $(TFHE_SPEC); \
fi
.PHONY: build_core_experimental # Build core_crypto with experimental features
build_core_experimental: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental -p tfhe
--features=$(TARGET_ARCH_FEATURE),experimental -p $(TFHE_SPEC)
@if [[ "$(AVX512_SUPPORT)" == "ON" ]]; then \
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental,$(AVX512_FEATURE) -p tfhe; \
--features=$(TARGET_ARCH_FEATURE),experimental,$(AVX512_FEATURE) -p $(TFHE_SPEC); \
fi
.PHONY: build_boolean # Build with boolean enabled
build_boolean: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean -p tfhe --all-targets
--features=$(TARGET_ARCH_FEATURE),boolean -p $(TFHE_SPEC) --all-targets
.PHONY: build_shortint # Build with shortint enabled
build_shortint: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),shortint -p tfhe --all-targets
--features=$(TARGET_ARCH_FEATURE),shortint -p $(TFHE_SPEC) --all-targets
.PHONY: build_integer # Build with integer enabled
build_integer: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer -p tfhe --all-targets
--features=$(TARGET_ARCH_FEATURE),integer -p $(TFHE_SPEC) --all-targets
.PHONY: build_tfhe_full # Build with boolean, shortint and integer enabled
build_tfhe_full: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer -p tfhe --all-targets
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer -p $(TFHE_SPEC) --all-targets
.PHONY: symlink_c_libs_without_fingerprint # Link the .a and .so files without the changing hash part in target
symlink_c_libs_without_fingerprint:
@./scripts/symlink_c_libs_without_fingerprint.sh \
--cargo-profile "$(CARGO_PROFILE)" \
--lib-name tfhe-c-api-dynamic-buffer
.PHONY: build_c_api # Build the C API for boolean, shortint and integer
build_c_api: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api, \
-p tfhe
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,$(FORWARD_COMPAT_FEATURE) \
-p $(TFHE_SPEC)
@"$(MAKE)" symlink_c_libs_without_fingerprint
.PHONY: build_c_api_gpu # Build the C API for boolean, shortint and integer
build_c_api_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,gpu \
-p $(TFHE_SPEC)
@"$(MAKE)" symlink_c_libs_without_fingerprint
.PHONY: build_c_api_experimental_deterministic_fft # Build the C API for boolean, shortint and integer with experimental deterministic FFT
build_c_api_experimental_deterministic_fft: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,experimental-force_fft_algo_dif4 \
-p tfhe
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api,experimental-force_fft_algo_dif4,$(FORWARD_COMPAT_FEATURE) \
-p $(TFHE_SPEC)
@"$(MAKE)" symlink_c_libs_without_fingerprint
.PHONY: build_web_js_api # Build the js API targeting the web browser
build_web_js_api: install_rs_build_toolchain install_wasm_pack
@@ -231,82 +347,190 @@ build_node_js_api: install_rs_build_toolchain install_wasm_pack
wasm-pack build --release --target=nodejs \
-- --features=boolean-client-js-wasm-api,shortint-client-js-wasm-api,integer-client-js-wasm-api
.PHONY: build_concrete_csprng # Build concrete_csprng
build_concrete_csprng: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) build --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE) -p concrete-csprng --all-targets
.PHONY: test_core_crypto # Run the tests of the core_crypto module including experimental ones
test_core_crypto: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental -p tfhe -- core_crypto::
--features=$(TARGET_ARCH_FEATURE),experimental -p $(TFHE_SPEC) -- core_crypto::
@if [[ "$(AVX512_SUPPORT)" == "ON" ]]; then \
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental,$(AVX512_FEATURE) -p tfhe -- core_crypto::; \
--features=$(TARGET_ARCH_FEATURE),experimental,$(AVX512_FEATURE) -p $(TFHE_SPEC) -- core_crypto::; \
fi
.PHONY: test_core_crypto_cov # Run the tests of the core_crypto module with code coverage
test_core_crypto_cov: install_rs_build_toolchain install_rs_check_toolchain install_tarpaulin
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) tarpaulin --profile $(CARGO_PROFILE) \
--out xml --output-dir coverage/core_crypto --line --engine llvm --timeout 500 \
--implicit-test-threads $(COVERAGE_EXCLUDED_FILES) \
--features=$(TARGET_ARCH_FEATURE),experimental,internal-keycache,__coverage \
-p $(TFHE_SPEC) -- core_crypto::
@if [[ "$(AVX512_SUPPORT)" == "ON" ]]; then \
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) tarpaulin --profile $(CARGO_PROFILE) \
--out xml --output-dir coverage/core_crypto_avx512 --line --engine llvm --timeout 500 \
--implicit-test-threads $(COVERAGE_EXCLUDED_FILES) \
--features=$(TARGET_ARCH_FEATURE),experimental,internal-keycache,__coverage,$(AVX512_FEATURE) \
-p $(TFHE_SPEC) -- core_crypto::; \
fi
.PHONY: test_gpu # Run the tests of the core_crypto module including experimental on the gpu backend
test_gpu: test_core_crypto_gpu test_integer_gpu
.PHONY: test_core_crypto_gpu # Run the tests of the core_crypto module including experimental on the gpu backend
test_core_crypto_gpu: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- core_crypto::gpu::
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --doc --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- core_crypto::gpu::
.PHONY: test_integer_gpu # Run the tests of the integer module including experimental on the gpu backend
test_integer_gpu: install_rs_build_toolchain install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key::
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --doc --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key::
.PHONY: test_boolean # Run the tests of the boolean module
test_boolean: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean -p tfhe -- boolean::
--features=$(TARGET_ARCH_FEATURE),boolean -p $(TFHE_SPEC) -- boolean::
.PHONY: test_boolean_cov # Run the tests of the boolean module with code coverage
test_boolean_cov: install_rs_check_toolchain install_tarpaulin
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) tarpaulin --profile $(CARGO_PROFILE) \
--out xml --output-dir coverage/boolean --line --engine llvm --timeout 500 \
$(COVERAGE_EXCLUDED_FILES) \
--features=$(TARGET_ARCH_FEATURE),boolean,internal-keycache,__coverage \
-p $(TFHE_SPEC) -- boolean::
.PHONY: test_c_api_rs # Run the rust tests for the C API
test_c_api_rs: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean-c-api,shortint-c-api,high-level-c-api \
-p tfhe \
-p $(TFHE_SPEC) \
c_api
.PHONY: test_c_api_c # Run the C tests for the C API
test_c_api_c: build_c_api
./scripts/c_api_tests.sh
./scripts/c_api_tests.sh --forward-compat "$(FORWARD_COMPAT)"
.PHONY: test_c_api # Run all the tests for the C API
test_c_api: test_c_api_rs test_c_api_c
.PHONY: test_c_api_gpu # Run the C tests for the C API
test_c_api_gpu: build_c_api_gpu
./scripts/c_api_tests.sh --gpu
.PHONY: test_shortint_ci # Run the tests for shortint ci
test_shortint_ci: install_rs_build_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/shortint-tests.sh --rust-toolchain $(CARGO_RS_BUILD_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)"
--cargo-profile "$(CARGO_PROFILE)" --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_shortint_multi_bit_ci # Run the tests for shortint ci running only multibit tests
test_shortint_multi_bit_ci: install_rs_build_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/shortint-tests.sh --rust-toolchain $(CARGO_RS_BUILD_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_shortint # Run all the tests for shortint
test_shortint: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache -p tfhe -- shortint::
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache -p $(TFHE_SPEC) -- shortint::
.PHONY: test_shortint_cov # Run the tests of the shortint module with code coverage
test_shortint_cov: install_rs_check_toolchain install_tarpaulin
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) tarpaulin --profile $(CARGO_PROFILE) \
--out xml --output-dir coverage/shortint --line --engine llvm --timeout 500 \
$(COVERAGE_EXCLUDED_FILES) \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,__coverage \
-p $(TFHE_SPEC) -- shortint::
.PHONY: test_integer_ci # Run the tests for integer ci
test_integer_ci: install_rs_build_toolchain install_cargo_nextest
test_integer_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_BUILD_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)"
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --avx512-support "$(AVX512_SUPPORT)" \
--tfhe-package "$(TFHE_SPEC)"
.PHONY: test_unsigned_integer_ci # Run the tests for unsigned integer ci
test_unsigned_integer_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --avx512-support "$(AVX512_SUPPORT)" \
--unsigned-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_signed_integer_ci # Run the tests for signed integer ci
test_signed_integer_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --avx512-support "$(AVX512_SUPPORT)" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_integer_multi_bit_ci # Run the tests for integer ci running only multibit tests
test_integer_multi_bit_ci: install_rs_build_toolchain install_cargo_nextest
test_integer_multi_bit_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_BUILD_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --avx512-support "$(AVX512_SUPPORT)" \
--tfhe-package "$(TFHE_SPEC)"
.PHONY: test_unsigned_integer_multi_bit_ci # Run the tests for nsigned integer ci running only multibit tests
test_unsigned_integer_multi_bit_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --avx512-support "$(AVX512_SUPPORT)" \
--unsigned-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_signed_integer_multi_bit_ci # Run the tests for nsigned integer ci running only multibit tests
test_signed_integer_multi_bit_ci: install_rs_check_toolchain install_cargo_nextest
BIG_TESTS_INSTANCE="$(BIG_TESTS_INSTANCE)" \
FAST_TESTS="$(FAST_TESTS)" \
./scripts/integer-tests.sh --rust-toolchain $(CARGO_RS_CHECK_TOOLCHAIN) \
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --avx512-support "$(AVX512_SUPPORT)" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_safe_deserialization # Run the tests for safe deserialization
test_safe_deserialization: install_rs_build_toolchain install_cargo_nextest
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache -p $(TFHE_SPEC) -- safe_deserialization::
.PHONY: test_integer # Run all the tests for integer
test_integer: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache -p tfhe -- integer::
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache -p $(TFHE_SPEC) -- integer::
.PHONY: test_high_level_api # Run all the tests for high_level_api
test_high_level_api: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache -p tfhe \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache -p $(TFHE_SPEC) \
-- high_level_api::
.PHONY: test_forward_compatibility # Run forward compatibility tests
test_forward_compatibility: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --tests --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,forward_compatibility,internal-keycache -p $(TFHE_SPEC) \
-- forward_compatibility::
.PHONY: test_user_doc # Run tests from the .md documentation
test_user_doc: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) --doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache -p tfhe \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache -p $(TFHE_SPEC) \
-- test_user_docs::
.PHONY: test_user_doc_gpu # Run tests for GPU from the .md documentation
test_user_doc_gpu: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) --doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer,internal-keycache,gpu -p $(TFHE_SPEC) \
-- test_user_docs::
.PHONY: test_regex_engine # Run tests for regex_engine example
@@ -327,20 +551,23 @@ test_examples: test_sha256_bool test_regex_engine
.PHONY: test_trivium # Run tests for trivium
test_trivium: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
trivium --features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer \
-- --test-threads=1
-p tfhe-trivium -- --test-threads=1 trivium::
.PHONY: test_kreyvium # Run tests for kreyvium
test_kreyvium: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
kreyvium --features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer \
-- --test-threads=1
-p tfhe-trivium -- --test-threads=1 kreyvium::
.PHONY: test_concrete_csprng # Run concrete-csprng tests
test_concrete_csprng:
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE) -p concrete-csprng
.PHONY: doc # Build rust doc
doc: install_rs_check_toolchain
RUSTDOCFLAGS="--html-in-header katex-header.html" \
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer --no-deps
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer --no-deps -p $(TFHE_SPEC)
.PHONY: docs # Build rust doc alias for doc
docs: doc
@@ -349,7 +576,7 @@ docs: doc
lint_doc: install_rs_check_toolchain
RUSTDOCFLAGS="--html-in-header katex-header.html -Dwarnings" \
cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" doc \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer --no-deps
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,integer -p $(TFHE_SPEC) --no-deps
.PHONY: lint_docs # Build rust doc with linting enabled alias for lint_doc
lint_docs: lint_doc
@@ -367,17 +594,19 @@ format_doc_latex:
check_compile_tests:
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --no-run \
--features=$(TARGET_ARCH_FEATURE),experimental,boolean,shortint,integer,internal-keycache \
-p tfhe
-p $(TFHE_SPEC)
@if [[ "$(OS)" == "Linux" || "$(OS)" == "Darwin" ]]; then \
"$(MAKE)" build_c_api; \
./scripts/c_api_tests.sh --build-only; \
"$(MAKE)" build_c_api && \
./scripts/c_api_tests.sh --build-only --forward-compat "$(FORWARD_COMPAT)" && \
FORWARD_COMPAT=ON "$(MAKE)" build_c_api && \
./scripts/c_api_tests.sh --build-only --forward-compat "$(FORWARD_COMPAT)"; \
fi
.PHONY: build_nodejs_test_docker # Build a docker image with tools to run nodejs tests for wasm API
build_nodejs_test_docker:
DOCKER_BUILDKIT=1 docker build --build-arg RUST_TOOLCHAIN="$(RS_BUILD_TOOLCHAIN)" \
-f docker/Dockerfile.wasm_tests -t tfhe-wasm-tests .
-f docker/Dockerfile.wasm_tests --build-arg NODE_VERSION=$(NODE_VERSION) -t tfhe-wasm-tests .
.PHONY: test_nodejs_wasm_api_in_docker # Run tests for the nodejs on wasm API in a docker container
test_nodejs_wasm_api_in_docker: build_nodejs_test_docker
@@ -401,7 +630,8 @@ test_web_js_api_parallel: build_web_js_api_parallel
.PHONY: ci_test_web_js_api_parallel # Run tests for the web wasm api
ci_test_web_js_api_parallel: build_web_js_api_parallel
source ~/.nvm/nvm.sh && \
nvm use node && \
nvm install $(NODE_VERSION) && \
nvm use $(NODE_VERSION) && \
$(MAKE) -C tfhe/web_wasm_parallel_tests test-ci
.PHONY: no_tfhe_typo # Check we did not invert the h and f in tfhe
@@ -412,31 +642,78 @@ no_tfhe_typo:
no_dbg_log:
@./scripts/no_dbg_calls.sh
.PHONY: dieharder_csprng # Run the dieharder test suite on our CSPRNG implementation
dieharder_csprng: install_dieharder build_concrete_csprng
./scripts/dieharder_test.sh
#
# Benchmarks
#
.PHONY: bench_integer # Run benchmarks for integer
.PHONY: bench_integer # Run benchmarks for unsigned integer
bench_integer: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p tfhe --
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_integer_multi_bit # Run benchmarks for integer using multi-bit parameters
.PHONY: bench_signed_integer # Run benchmarks for signed integer
bench_signed_integer: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-signed-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_integer_gpu # Run benchmarks for integer on GPU backend
bench_integer_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,gpu,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_integer_multi_bit # Run benchmarks for unsigned integer using multi-bit parameters
bench_integer_multi_bit: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=MULTI_BIT \
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p tfhe --
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_signed_integer_multi_bit # Run benchmarks for signed integer using multi-bit parameters
bench_signed_integer_multi_bit: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=MULTI_BIT \
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-signed-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_integer_multi_bit_gpu # Run benchmarks for integer on GPU backend using multi-bit parameters
bench_integer_multi_bit_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=MULTI_BIT \
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,gpu,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_shortint # Run benchmarks for shortint
bench_shortint: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench shortint-bench \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,$(AVX512_FEATURE) -p tfhe
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC)
.PHONY: bench_oprf # Run OPRF benchmarks for shortint and integer
bench_oprf: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench oprf-shortint-bench \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC)
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench oprf-integer-bench \
--features=$(TARGET_ARCH_FEATURE),integer,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC)
.PHONY: bench_shortint_multi_bit # Run benchmarks for shortint using multi-bit parameters
bench_shortint_multi_bit: install_rs_check_toolchain
@@ -444,20 +721,26 @@ bench_shortint_multi_bit: install_rs_check_toolchain
__TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench shortint-bench \
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,$(AVX512_FEATURE) -p tfhe --
--features=$(TARGET_ARCH_FEATURE),shortint,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC) --
.PHONY: bench_boolean # Run benchmarks for boolean
bench_boolean: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench boolean-bench \
--features=$(TARGET_ARCH_FEATURE),boolean,internal-keycache,$(AVX512_FEATURE) -p tfhe
--features=$(TARGET_ARCH_FEATURE),boolean,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC)
.PHONY: bench_pbs # Run benchmarks for PBS
bench_pbs: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs-bench \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,internal-keycache,$(AVX512_FEATURE) -p tfhe
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC)
.PHONY: bench_pbs_gpu # Run benchmarks for PBS on GPU backend
bench_pbs_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench pbs-bench \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,gpu,internal-keycache,$(AVX512_FEATURE) -p $(TFHE_SPEC)
.PHONY: bench_web_js_api_parallel # Run benchmarks for the web wasm api
bench_web_js_api_parallel: build_web_js_api_parallel
@@ -472,6 +755,18 @@ ci_bench_web_js_api_parallel: build_web_js_api_parallel
#
# Utility tools
#
.PHONY: gen_key_cache # Run the script to generate keys and cache them for shortint tests
gen_key_cache: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) run --profile $(CARGO_PROFILE) \
--example generates_test_keys \
--features=$(TARGET_ARCH_FEATURE),boolean,shortint,internal-keycache -- \
$(MULTI_BIT_ONLY) $(COVERAGE_ONLY)
.PHONY: gen_key_cache_core_crypto # Run function to generate keys and cache them for core_crypto tests
gen_key_cache_core_crypto: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --tests --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),experimental,internal-keycache -p $(TFHE_SPEC) -- --nocapture \
core_crypto::keycache::generate_keys
.PHONY: measure_hlapi_compact_pk_ct_sizes # Measure sizes of public keys and ciphertext for high-level API
measure_hlapi_compact_pk_ct_sizes: install_rs_check_toolchain
@@ -534,14 +829,17 @@ sha256_bool: install_rs_check_toolchain
--example sha256_bool \
--features=$(TARGET_ARCH_FEATURE),boolean
.PHONY: pcc # pcc stands for pre commit checks
.PHONY: pcc # pcc stands for pre commit checks (except GPU)
pcc: no_tfhe_typo no_dbg_log check_fmt lint_doc clippy_all check_compile_tests
.PHONY: pcc_gpu # pcc stands for pre commit checks for GPU compilation
pcc_gpu: pcc clippy_gpu
.PHONY: fpcc # pcc stands for pre commit checks, the f stands for fast
fpcc: no_tfhe_typo no_dbg_log check_fmt lint_doc clippy_fast check_compile_tests
.PHONY: conformance # Automatically fix problems that can be fixed
conformance: fmt
conformance: fix_newline fmt
.PHONY: help # Generate list of targets with descriptions
help:

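Targets in this Makefile follow a `.PHONY: <target> # <description>` convention, and `help` is described as generating the list of targets with their descriptions. The recipe of `help` is not shown in this diff. The following is a hypothetical Rust sketch of the parsing such a recipe would need to do; `documented_targets` is our own name, and it assumes the description is whatever follows the first ` # ` separator.

```rust
/// Hypothetical helper: extract (target, description) pairs from Makefile
/// lines that follow the `.PHONY: <target> # <description>` convention.
fn documented_targets(makefile: &str) -> Vec<(String, String)> {
    let mut targets: Vec<(String, String)> = makefile
        .lines()
        .filter_map(|line| {
            // Only lines that declare a phony target are considered.
            let rest = line.strip_prefix(".PHONY: ")?;
            // Lines without a " # " description are skipped.
            let (name, desc) = rest.split_once(" # ")?;
            Some((name.trim().to_string(), desc.trim().to_string()))
        })
        .collect();
    // Sort by target name so the listing is stable.
    targets.sort();
    targets
}

fn main() {
    let sample = "\
.PHONY: pcc # pcc stands for pre commit checks (except GPU)
pcc: no_tfhe_typo no_dbg_log check_fmt lint_doc clippy_all check_compile_tests
.PHONY: docs # Build rust doc alias for doc
docs: doc
";
    for (name, desc) in documented_targets(sample) {
        println!("{:<30}{}", name, desc);
    }
}
```

Sorting is a presentation choice of this sketch; the real `help` output order is not visible here.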
@@ -4,13 +4,17 @@
</p>
<hr/>
<p align="center">
<a href="https://docs.zama.ai/tfhe-rs"> 📒 Read documentation</a> | <a href="https://zama.ai/community"> 💛 Community support</a>
<a href="https://docs.zama.ai/tfhe-rs"> 📒 Read documentation</a> | <a href="https://zama.ai/community"> 💛 Community support</a> | <a href="https://github.com/zama-ai/awesome-zama"> 📚 FHE resources</a>
</p>
<p align="center">
<!-- Version badge using shields.io -->
<a href="https://github.com/zama-ai/tfhe-rs/releases">
<img src="https://img.shields.io/github/v/release/zama-ai/tfhe-rs?style=flat-square">
</a>
<!-- License badge using shields.io -->
<a href="#license">
<img src="https://img.shields.io/badge/License-BSD--3--Clause--Clear-orange?style=flat-square">
</a>
<!-- Zama Bounty Program -->
<a href="https://github.com/zama-ai/bounty-program">
<img src="https://img.shields.io/badge/Contribute-Zama%20Bounty%20Program-yellow?style=flat-square">
@@ -47,7 +51,7 @@ tfhe = { version = "*", features = ["boolean", "shortint", "integer", "x86_64-un
```toml
tfhe = { version = "*", features = ["boolean", "shortint", "integer", "aarch64-unix"] }
```
Note: users with ARM devices must use `TFHE-rs` by compiling using the `nightly` toolchain.
Note: users with ARM devices must compile `TFHE-rs` using a stable toolchain with version >= 1.72.
+ For x86_64-based machines with the [`rdseed instruction`](https://en.wikipedia.org/wiki/RDRAND)
@@ -57,7 +61,7 @@ running Windows:
tfhe = { version = "*", features = ["boolean", "shortint", "integer", "x86_64"] }
```
Note: aarch64-based machines are not yet supported for Windows as it's currently missing an entropy source to be able to seed the [CSPRNGs](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator) used in TFHE-rs
Note: aarch64-based machines are not yet supported for Windows as it's currently missing an entropy source to be able to seed the [CSPRNGs](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator) used in TFHE-rs.
## A simple example
@@ -70,9 +74,7 @@ use tfhe::{generate_keys, set_server_key, ConfigBuilder, FheUint32, FheUint8};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Basic configuration to use homomorphic integers
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
// Key generation
let (client_key, server_keys) = generate_keys(config);
@@ -92,10 +94,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
// On the server side:
set_server_key(server_keys);
// Clear equivalent computations: 1344 * 8 = 10752
// Clear equivalent computations: 1344 * 5 = 6720
let encrypted_res_mul = &encrypted_a * &encrypted_b;
// Clear equivalent computations: 1344 >> 8 = 42
// Clear equivalent computations: 6720 >> 5 = 210
encrypted_a = &encrypted_res_mul >> &encrypted_b;
// Clear equivalent computations: let casted_a = a as u8;
@@ -120,7 +122,7 @@ To run this code, use the following command:
<p align="center"> <code> cargo run --release </code> </p>
Note that when running code that uses `tfhe-rs`, it is highly recommended
to run in release mode with cargo's `--release` flag to have the best performances possible,
to run in release mode with cargo's `--release` flag for the best possible performance.
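As a sanity check on the "clear equivalent" comments in the example, the same steps can be run on plain integers. This is a sketch assuming `a = 1344` and `b = 5`, as in the `1344 * 5` comment; `clear_pipeline` is a hypothetical name. Note that the shift is applied to the product `encrypted_res_mul`, not to `encrypted_a` itself.

```rust
/// Hypothetical cleartext mirror of the README example (assumes a = 1344, b = 5).
fn clear_pipeline(a: u32, b: u32) -> (u32, u32, u8) {
    // Product; `wrapping_mul` avoids a debug-mode overflow panic for large inputs.
    let res_mul = a.wrapping_mul(b);
    // The example shifts the product, not `a`.
    // `wrapping_shr` masks the shift amount to the bit width instead of panicking.
    let shifted = res_mul.wrapping_shr(b);
    // Truncating cast, as in `let casted_a = a as u8;`
    let casted = shifted as u8;
    (res_mul, shifted, casted)
}

fn main() {
    let (res_mul, shifted, casted) = clear_pipeline(1344, 5);
    println!("1344 * 5 = {res_mul}");
    println!("{res_mul} >> 5 = {shifted}");
    println!("as u8 = {casted}");
}
```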
## Contributing
@@ -140,9 +142,11 @@ libraries.
## Need support?
<a target="_blank" href="https://community.zama.ai">
<img src="https://user-images.githubusercontent.com/5758427/231115030-21195b55-2629-4c01-9809-be5059243999.png">
<img src="https://github.com/zama-ai/tfhe-rs/assets/157474013/33d856dc-f25d-454b-a010-af12bff2aa7d">
</a>
## Citing TFHE-rs
To cite TFHE-rs in academic papers, please use the following entry:

@@ -17,7 +17,7 @@ path = "../../tfhe"
features = [ "boolean", "shortint", "integer", "aarch64-unix" ]
[dev-dependencies]
criterion = { version = "0.4", features = [ "html_reports" ]}
criterion = { version = "0.5.1", features = [ "html_reports" ]}
[[bench]]
name = "trivium"

@@ -120,7 +120,7 @@ fn main() {
# FHE byte Trivium implementation
The same objects have also been implemented to stream bytes insead of booleans. They can be constructed and used in the same way via the functions `TriviumStreamByte::<u8>::new` and
The same objects have also been implemented to stream bytes instead of booleans. They can be constructed and used in the same way via the functions `TriviumStreamByte::<u8>::new` and
`TriviumStreamByte::<FheUint8>::new` with the same arguments as before. The `FheUint8` version is significantly slower than the `FheBool` version, because it does not run
with the same cryptographic parameters. Its interest lies in its trans-ciphering capabilities: `TriviumStreamByte<FheUint8>` implements the trait `TransCiphering`,
meaning it implements the function `trans_encrypt_64`. This function takes as input a `FheUint64` and outputs a `FheUint64`, the output being
@@ -146,7 +146,7 @@ use tfhe::prelude::*;
use tfhe_trivium::TriviumStreamShortint;
fn test_shortint() {
let config = ConfigBuilder::all_disabled().enable_default_integers().build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let (client_key, server_key): (ClientKey, ServerKey) = gen_keys(PARAM_MESSAGE_1_CARRY_1_KS_PBS);
let ksk = CastingKey::new((&client_key, &server_key), (&hl_client_key, &hl_server_key));

@@ -6,7 +6,7 @@ use tfhe_trivium::KreyviumStream;
use criterion::Criterion;
pub fn kreyvium_bool_gen(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled().enable_default_bool().build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -41,7 +41,7 @@ pub fn kreyvium_bool_gen(c: &mut Criterion) {
}
pub fn kreyvium_bool_warmup(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled().enable_default_bool().build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();

@@ -6,9 +6,8 @@ use tfhe_trivium::{KreyviumStreamByte, TransCiphering};
use criterion::Criterion;
pub fn kreyvium_byte_gen(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.enable_function_evaluation_integers()
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let (client_key, server_key) = generate_keys(config);
@@ -36,9 +35,8 @@ pub fn kreyvium_byte_gen(c: &mut Criterion) {
}
pub fn kreyvium_byte_trans(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.enable_function_evaluation_integers()
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let (client_key, server_key) = generate_keys(config);
@@ -67,9 +65,8 @@ pub fn kreyvium_byte_trans(c: &mut Criterion) {
}
pub fn kreyvium_byte_warmup(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.enable_function_evaluation_integers()
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let (client_key, server_key) = generate_keys(config);

@@ -8,9 +8,7 @@ use tfhe_trivium::{KreyviumStreamShortint, TransCiphering};
use criterion::Criterion;
pub fn kreyvium_shortint_warmup(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
@@ -60,9 +58,7 @@ pub fn kreyvium_shortint_warmup(c: &mut Criterion) {
}
pub fn kreyvium_shortint_gen(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
@@ -107,9 +103,7 @@ pub fn kreyvium_shortint_gen(c: &mut Criterion) {
}
pub fn kreyvium_shortint_trans(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();

@@ -6,7 +6,7 @@ use tfhe_trivium::TriviumStream;
use criterion::Criterion;
pub fn trivium_bool_gen(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled().enable_default_bool().build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -41,7 +41,7 @@ pub fn trivium_bool_gen(c: &mut Criterion) {
}
pub fn trivium_bool_warmup(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled().enable_default_bool().build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();

@@ -6,9 +6,7 @@ use tfhe_trivium::{TransCiphering, TriviumStreamByte};
use criterion::Criterion;
pub fn trivium_byte_gen(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -35,9 +33,7 @@ pub fn trivium_byte_gen(c: &mut Criterion) {
}
pub fn trivium_byte_trans(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -65,9 +61,7 @@ pub fn trivium_byte_trans(c: &mut Criterion) {
}
pub fn trivium_byte_warmup(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();

@@ -8,9 +8,7 @@ use tfhe_trivium::{TransCiphering, TriviumStreamShortint};
use criterion::Criterion;
pub fn trivium_shortint_warmup(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
@@ -60,9 +58,7 @@ pub fn trivium_shortint_warmup(c: &mut Criterion) {
}
pub fn trivium_shortint_gen(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
@@ -107,9 +103,7 @@ pub fn trivium_shortint_gen(c: &mut Criterion) {
}
pub fn trivium_shortint_trans(c: &mut Criterion) {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();

@@ -1,5 +1,5 @@
//! This module implements the Kreyvium stream cipher, using booleans or FheBool
//! for the representaion of the inner bits.
//! for the representation of the inner bits.
use crate::static_deque::StaticDeque;
@@ -35,7 +35,7 @@ pub struct KreyviumStream<T> {
}
impl KreyviumStream<bool> {
/// Contructor for `KreyviumStream<bool>`: arguments are the secret key and the input vector.
/// Constructor for `KreyviumStream<bool>`: arguments are the secret key and the input vector.
/// Outputs a KreyviumStream object already initialized (1152 steps have been run before
/// returning)
pub fn new(mut key: [bool; 128], mut iv: [bool; 128]) -> KreyviumStream<bool> {
@@ -118,7 +118,7 @@ where
T: KreyviumBoolInput<T> + std::marker::Send + std::marker::Sync,
for<'a> &'a T: KreyviumBoolInput<T>,
{
/// Internal generic contructor: arguments are already prepared registers, and an optional FHE
/// Internal generic constructor: arguments are already prepared registers, and an optional FHE
/// server key
fn new_from_registers(
a_register: [T; 93],

@@ -1,5 +1,5 @@
//! This module implements the Kreyvium stream cipher, using u8 or FheUint8
//! for the representaion of the inner bits.
//! for the representation of the inner bits.
use crate::static_deque::{StaticByteDeque, StaticByteDequeInput};
@@ -31,7 +31,7 @@ impl KreyviumByteInput<FheUint8> for &FheUint8 {}
/// representation of bits (u8 or FheUint8). To be able to compute FHE operations, it also owns
/// an Option for a ServerKey.
/// Since the original Kreyvium registers' sizes are not a multiple of 8, these registers (which
/// store byte-like objects) have a size that is the eigth of the closest multiple of 8 above the
/// store byte-like objects) have a size that is the eighth of the closest multiple of 8 above the
/// originals' sizes.
pub struct KreyviumStreamByte<T> {
a_byte: StaticByteDeque<12, T>,
@@ -43,7 +43,7 @@ pub struct KreyviumStreamByte<T> {
}
impl KreyviumStreamByte<u8> {
/// Contructor for `KreyviumStreamByte<u8>`: arguments are the secret key and the input vector.
/// Constructor for `KreyviumStreamByte<u8>`: arguments are the secret key and the input vector.
/// Outputs a KreyviumStream object already initialized (1152 steps have been run before
/// returning)
pub fn new(key_bytes: [u8; 16], iv_bytes: [u8; 16]) -> KreyviumStreamByte<u8> {
@@ -146,7 +146,7 @@ where
T: KreyviumByteInput<T> + Send,
for<'a> &'a T: KreyviumByteInput<T>,
{
/// Internal generic contructor: arguments are already prepared registers, and an optional FHE
/// Internal generic constructor: arguments are already prepared registers, and an optional FHE
/// server key
fn new_from_registers(
a_register: [T; 12],

@@ -19,7 +19,7 @@ pub struct KreyviumStreamShortint {
}
impl KreyviumStreamShortint {
/// Contructor for KreyviumStreamShortint: arguments are the secret key and the input vector,
/// Constructor for KreyviumStreamShortint: arguments are the secret key and the input vector,
/// and a ServerKey reference. Outputs a KreyviumStream object already initialized (1152
/// steps have been run before returning)
pub fn new(
@@ -149,7 +149,7 @@ impl KreyviumStreamShortint {
.unchecked_add_assign(&mut new_c, c5);
self.internal_server_key
.unchecked_add_assign(&mut new_c, &temp_b);
self.internal_server_key.clear_carry_assign(&mut new_c);
self.internal_server_key.message_extract_assign(&mut new_c);
new_c
},
|| {

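In the shortint streams, bits are combined with `unchecked_add_assign` and then reduced with `message_extract_assign` (which replaces `clear_carry_assign` in this diff). Our reading is that, with a 1-bit message and 1 bit of carry space such as `PARAM_MESSAGE_1_CARRY_1_KS_PBS`, adding a few bits without reduction and then keeping the value modulo the message modulus yields their XOR. The cleartext model below illustrates only that arithmetic: it does not touch ciphertexts, and `unchecked_add`, `message_extract` and `xor3` are hypothetical names.

```rust
/// Toy cleartext model of a 1-bit message with 1 bit of carry space
/// (assumption: mirrors the moduli of PARAM_MESSAGE_1_CARRY_1_KS_PBS).
const MESSAGE_MODULUS: u64 = 2;
const CARRY_MODULUS: u64 = 2;

/// Models an unchecked addition: no reduction, so the sum may spill into the carry bit.
fn unchecked_add(lhs: u64, rhs: u64) -> u64 {
    let sum = lhs + rhs;
    // The real call performs no such check; the model asserts the sum still fits.
    assert!(sum < MESSAGE_MODULUS * CARRY_MODULUS, "sum exceeds message + carry space");
    sum
}

/// Models message extraction: keep only the message part of the value.
fn message_extract(value: u64) -> u64 {
    value % MESSAGE_MODULUS
}

/// XOR of three bits computed as add, add, extract (e.g. new_c + c5 + temp_b).
fn xor3(a: u64, b: u64, c: u64) -> u64 {
    message_extract(unchecked_add(unchecked_add(a, b), c))
}

fn main() {
    for a in 0..2 {
        for b in 0..2 {
            for c in 0..2 {
                println!("{a} ^ {b} ^ {c} = {}", xor3(a, b, c));
            }
        }
    }
}
```

Three bits sum to at most 3, which still fits in the 4 values offered by one message bit plus one carry bit; that headroom is what makes deferring the reduction possible in this model.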
@@ -170,7 +170,7 @@ fn kreyvium_test_4() {
#[test]
fn kreyvium_test_fhe_long() {
let config = ConfigBuilder::all_disabled().enable_default_bool().build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB000000000000".to_string();
@@ -217,9 +217,7 @@ use tfhe::shortint::prelude::*;
#[test]
fn kreyvium_test_shortint_long() {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();
@@ -302,9 +300,8 @@ fn kreyvium_test_clear_byte() {
#[test]
fn kreyvium_test_byte_long() {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.enable_function_evaluation_integers()
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let (client_key, server_key) = generate_keys(config);
@@ -342,9 +339,8 @@ fn kreyvium_test_byte_long() {
#[test]
fn kreyvium_test_fhe_byte_transciphering_long() {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.enable_function_evaluation_integers()
let config = ConfigBuilder::default()
.enable_function_evaluation()
.build();
let (client_key, server_key) = generate_keys(config);

@@ -1,6 +1,6 @@
//! This module implements the StaticByteDeque struct: a deque of bytes. The idea
//! is that this is a wrapper around StaticDeque, but StaticByteDeque has an additional
//! functionnality: it can construct the "intermediate" bytes, made of parts of other bytes.
//! functionality: it can construct the "intermediate" bytes, made of parts of other bytes.
//! This is pretending to store bits, and allows accessing bits in chunks of 8 consecutive.
use crate::static_deque::StaticDeque;

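The module comment describes building "intermediate" bytes from parts of two stored bytes, so that bits can be read 8 at a time starting from any bit offset. Below is a minimal sketch on plain `u8`, assuming stream bit `k` is stored in bit `k % 8` (least significant first) of byte `k / 8`. The crate's own bit order, and its generic handling of `FheUint8`, may differ; `byte_at_bit_offset` is a hypothetical name.

```rust
/// Read 8 consecutive bits starting at `bit_offset`, where stream bit k is
/// stored in bit (k % 8) of byte k / 8 (least significant bit first; an assumption).
/// Panics if the 8-bit window runs past the end of `bytes`.
fn byte_at_bit_offset(bytes: &[u8], bit_offset: usize) -> u8 {
    let idx = bit_offset / 8;
    let shift = bit_offset % 8;
    if shift == 0 {
        // Aligned: the stored byte is already the requested window.
        return bytes[idx];
    }
    // Unaligned: the high (8 - shift) bits of bytes[idx] become the low part,
    // and the low `shift` bits of bytes[idx + 1] become the high part.
    (bytes[idx] >> shift) | (bytes[idx + 1] << (8 - shift))
}

fn main() {
    let bytes = [0xACu8, 0x03];
    for offset in 0..=8 {
        println!("offset {offset}: {:#010b}", byte_at_bit_offset(&bytes, offset));
    }
}
```

Only shifts and ORs are needed, which is why the same construction can in principle be written over an encrypted byte type as well as over `u8`.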
@@ -5,7 +5,7 @@
use core::ops::{Index, IndexMut};
/// StaticDeque: a struct implementing a deque whose size is known at compile time.
/// It has 2 members: the static array conatining the data (never empty), and a cursor
/// It has 2 members: the static array containing the data (never empty), and a cursor
/// equal to the index of the oldest element (and the next one to be overwritten).
#[derive(Clone)]
pub struct StaticDeque<const N: usize, T> {

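A minimal sketch of the structure described above: an always-full fixed-size array plus a cursor on the oldest element, which is the next one to be overwritten. `RingDeque` is a hypothetical name, and the convention that logical index 0 is the newest element is an assumption, not necessarily what `StaticDeque` does.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical fixed-capacity deque: the array is always full, and `cursor`
/// points at the oldest element, which is the next one to be overwritten.
#[derive(Clone, Debug)]
struct RingDeque<const N: usize, T> {
    arr: [T; N],
    cursor: usize,
}

impl<const N: usize, T> RingDeque<N, T> {
    /// Start from a full array; arr[0] is treated as the oldest element.
    fn new(arr: [T; N]) -> Self {
        assert!(N > 0, "a RingDeque is never empty");
        RingDeque { arr, cursor: 0 }
    }

    /// Overwrite the oldest element with `val`, advance the cursor,
    /// and return the evicted element.
    fn push(&mut self, val: T) -> T {
        let old = std::mem::replace(&mut self.arr[self.cursor], val);
        self.cursor = (self.cursor + 1) % N;
        old
    }

    /// Physical position of logical index `i` (0 = newest, N - 1 = oldest).
    fn pos(&self, i: usize) -> usize {
        assert!(i < N, "index out of range");
        (self.cursor + N - 1 - i) % N
    }
}

impl<const N: usize, T> Index<usize> for RingDeque<N, T> {
    type Output = T;
    fn index(&self, i: usize) -> &T {
        &self.arr[self.pos(i)]
    }
}

impl<const N: usize, T> IndexMut<usize> for RingDeque<N, T> {
    fn index_mut(&mut self, i: usize) -> &mut T {
        let p = self.pos(i);
        &mut self.arr[p]
    }
}

fn main() {
    let mut d = RingDeque::<3, u32>::new([1, 2, 3]);
    let evicted = d.push(4);
    println!("evicted {evicted}, newest first: {} {} {}", d[0], d[1], d[2]);
}
```

Because a push only writes one slot and moves the cursor, no elements are shifted, which keeps each stream-cipher step cheap regardless of register length.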
@@ -4,6 +4,7 @@
use crate::{KreyviumStreamByte, KreyviumStreamShortint, TriviumStreamByte, TriviumStreamShortint};
use tfhe::shortint::Ciphertext;
use tfhe::prelude::*;
use tfhe::{set_server_key, unset_server_key, FheUint64, FheUint8, ServerKey};
use rayon::prelude::*;

@@ -1,6 +1,5 @@
#[allow(clippy::module_inception)]
mod trivium;
pub use trivium::TriviumStream;
mod trivium_bool;
pub use trivium_bool::TriviumStream;
mod trivium_byte;
pub use trivium_byte::TriviumStreamByte;

@@ -232,7 +232,7 @@ fn trivium_test_clear_byte() {
#[test]
fn trivium_test_fhe_long() {
let config = ConfigBuilder::all_disabled().enable_default_bool().build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -277,9 +277,7 @@ fn trivium_test_fhe_long() {
#[test]
fn trivium_test_fhe_byte_long() {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -316,9 +314,7 @@ fn trivium_test_fhe_byte_long() {
#[test]
fn trivium_test_fhe_byte_transciphering_long() {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (client_key, server_key) = generate_keys(config);
let key_string = "0053A6F94C9FF24598EB".to_string();
@@ -357,9 +353,7 @@ use tfhe::shortint::prelude::*;
#[test]
fn trivium_test_shortint_long() {
let config = ConfigBuilder::all_disabled()
.enable_default_integers()
.build();
let config = ConfigBuilder::default().build();
let (hl_client_key, hl_server_key) = generate_keys(config);
let underlying_ck: tfhe::shortint::ClientKey = (*hl_client_key.as_ref()).clone().into();
let underlying_sk: tfhe::shortint::ServerKey = (*hl_server_key.as_ref()).clone().into();

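The long tests seed the ciphers from hex strings: 20 hex digits (80 bits) for Trivium and 32 hex digits (128 bits) for Kreyvium, matching the `[u8; 10]` and `[u8; 16]` arguments of the byte constructors. Below is a std-only sketch of that conversion. `hex_to_bytes` and `bytes_to_bits_msb_first` are hypothetical helpers, and the big-endian byte order and most-significant-bit-first bit order are assumptions, not necessarily what the tests use.

```rust
/// Parse exactly 2 * N hex digits into N bytes, first pair -> byte 0 (assumed order).
fn hex_to_bytes<const N: usize>(hex: &str) -> Result<[u8; N], String> {
    if !hex.is_ascii() || hex.len() != 2 * N {
        return Err(format!("expected {} ASCII hex digits", 2 * N));
    }
    let mut out = [0u8; N];
    for (i, byte) in out.iter_mut().enumerate() {
        let pair = &hex[2 * i..2 * i + 2];
        *byte = u8::from_str_radix(pair, 16).map_err(|e| format!("invalid hex {pair:?}: {e}"))?;
    }
    Ok(out)
}

/// Expand bytes into bits, most significant bit of each byte first (an assumed ordering).
fn bytes_to_bits_msb_first(bytes: &[u8]) -> Vec<bool> {
    bytes
        .iter()
        .flat_map(|&b| (0..8u32).rev().map(move |k| ((b >> k) & 1) == 1))
        .collect()
}

fn main() {
    let key: [u8; 10] = hex_to_bytes("0053A6F94C9FF24598EB").expect("valid 80-bit key");
    // A `[bool; 80]` is the shape taken by `TriviumStream::<bool>::new`.
    let bits: [bool; 80] = bytes_to_bits_msb_first(&key).try_into().expect("80 bits");
    println!("{key:02X?}");
    println!("first 16 bits: {:?}", &bits[..16]);
}
```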
@@ -1,5 +1,5 @@
//! This module implements the Trivium stream cipher, using booleans or FheBool
//! for the representaion of the inner bits.
//! for the representation of the inner bits.
use crate::static_deque::StaticDeque;
@@ -33,7 +33,7 @@ pub struct TriviumStream<T> {
}
impl TriviumStream<bool> {
/// Contructor for `TriviumStream<bool>`: arguments are the secret key and the input vector.
/// Constructor for `TriviumStream<bool>`: arguments are the secret key and the input vector.
/// Outputs a TriviumStream object already initialized (1152 steps have been run before
/// returning)
pub fn new(key: [bool; 80], iv: [bool; 80]) -> TriviumStream<bool> {
@@ -94,7 +94,7 @@ where
T: TriviumBoolInput<T> + std::marker::Send + std::marker::Sync,
for<'a> &'a T: TriviumBoolInput<T>,
{
/// Internal generic contructor: arguments are already prepared registers, and an optional FHE
/// Internal generic constructor: arguments are already prepared registers, and an optional FHE
/// server key
fn new_from_registers(
a_register: [T; 93],

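The doc comments say the constructors return only after 1152 steps have run; for Trivium this is 4 × 288 warm-up rounds over its 288-bit state. Below is a minimal cleartext sketch of that process, written from the published Trivium specification as we understand it. It is not the crate's implementation: `ClearTrivium` is a hypothetical name, the mapping of `key[0]` to the spec's K1 (and `iv[0]` to IV1) is an assumption, and no official test vectors are checked here.

```rust
/// Cleartext Trivium sketch. s[i] holds s_{i+1} in the spec's 1-based notation.
struct ClearTrivium {
    s: [bool; 288],
}

impl ClearTrivium {
    /// Warm-up rounds run before any output: 4 * 288 = 1152.
    const WARMUP_ROUNDS: usize = 4 * 288;

    fn new(key: [bool; 80], iv: [bool; 80]) -> Self {
        let mut s = [false; 288];
        s[..80].copy_from_slice(&key); // (s1..s80) <- K1..K80
        s[93..173].copy_from_slice(&iv); // (s94..s173) <- IV1..IV80
        s[285] = true; // (s286, s287, s288) <- (1, 1, 1)
        s[286] = true;
        s[287] = true;
        let mut cipher = ClearTrivium { s };
        for _ in 0..Self::WARMUP_ROUNDS {
            cipher.round(); // output discarded during warm-up
        }
        cipher
    }

    /// One round: compute the keystream bit z, then update the three registers.
    fn round(&mut self) -> bool {
        let s = &mut self.s;
        let mut t1 = s[65] ^ s[92]; // s66 + s93
        let mut t2 = s[161] ^ s[176]; // s162 + s177
        let mut t3 = s[242] ^ s[287]; // s243 + s288
        let z = t1 ^ t2 ^ t3;
        t1 ^= (s[90] & s[91]) ^ s[170]; // + s91*s92 + s171
        t2 ^= (s[174] & s[175]) ^ s[263]; // + s175*s176 + s264
        t3 ^= (s[285] & s[286]) ^ s[68]; // + s286*s287 + s69
        // Shift the whole state by one, then feed t3, t1, t2 into the register heads
        // (s1, s94, s178), which drops s93, s177 and s288 as the spec requires.
        s.copy_within(0..287, 1);
        s[0] = t3;
        s[93] = t1;
        s[177] = t2;
        z
    }

    fn next_bits(&mut self, n: usize) -> Vec<bool> {
        (0..n).map(|_| self.round()).collect()
    }
}

fn main() {
    let mut cipher = ClearTrivium::new([false; 80], [false; 80]);
    let bits: String = cipher
        .next_bits(32)
        .iter()
        .map(|&b| if b { '1' } else { '0' })
        .collect();
    println!("first 32 keystream bits: {bits}");
}
```

Shifting the whole 288-bit array at once is a simplification for readability; a register-per-deque layout, as with `StaticDeque`, avoids moving every element on each step.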
@@ -1,5 +1,5 @@
//! This module implements the Trivium stream cipher, using u8 or FheUint8
//! for the representaion of the inner bits.
//! for the representation of the inner bits.
use crate::static_deque::{StaticByteDeque, StaticByteDequeInput};
@@ -31,7 +31,7 @@ impl TriviumByteInput<FheUint8> for &FheUint8 {}
/// representation of bits (u8 or FheUint8). To be able to compute FHE operations, it also owns
/// an Option for a ServerKey.
/// Since the original Trivium registers' sizes are not a multiple of 8, these registers (which
/// store byte-like objects) have a size that is the eigth of the closest multiple of 8 above the
/// store byte-like objects) have a size that is the eighth of the closest multiple of 8 above the
/// originals' sizes.
pub struct TriviumStreamByte<T> {
a_byte: StaticByteDeque<12, T>,
@@ -41,7 +41,7 @@ pub struct TriviumStreamByte<T> {
}
impl TriviumStreamByte<u8> {
/// Contructor for `TriviumStreamByte<u8>`: arguments are the secret key and the input vector.
/// Constructor for `TriviumStreamByte<u8>`: arguments are the secret key and the input vector.
/// Outputs a TriviumStream object already initialized (1152 steps have been run before
/// returning)
pub fn new(key: [u8; 10], iv: [u8; 10]) -> TriviumStreamByte<u8> {
@@ -111,7 +111,7 @@ where
T: TriviumByteInput<T> + Send,
for<'a> &'a T: TriviumByteInput<T>,
{
/// Internal generic contructor: arguments are already prepared registers, and an optional FHE
/// Internal generic constructor: arguments are already prepared registers, and an optional FHE
/// server key
fn new_from_registers(
a_register: [T; 12],
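The doc comment's sizing rule ("the eighth of the closest multiple of 8 above the originals' sizes") is a ceiling division of each register's bit length by 8. The sketch below computes it. Only the 12-byte `a_byte` size appears in this diff; the 11 and 14 bytes for the other two registers are what the same rule gives for Trivium's 84- and 111-bit registers. The helper and constant names are hypothetical.

```cpp
#include <cstddef>

// Number of bytes needed to hold `bits` bits, i.e. ceil(bits / 8).
constexpr std::size_t bytes_for_bits(std::size_t bits) { return (bits + 7) / 8; }

// Trivium register lengths in bits, from the Trivium specification.
constexpr std::size_t kRegABits = 93;
constexpr std::size_t kRegBBits = 84;
constexpr std::size_t kRegCBits = 111;

// Matches the `StaticByteDeque<12, T>` used for `a_byte` above.
static_assert(bytes_for_bits(kRegABits) == 12, "a_byte holds 12 bytes");
```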

@@ -17,9 +17,9 @@ pub struct TriviumStreamShortint {
}
impl TriviumStreamShortint {
/// Contructor for TriviumStreamShortint: arguments are the secret key and the input vector, and
/// a ServerKey reference. Outputs a TriviumStream object already initialized (1152 steps
/// have been run before returning)
/// Constructor for TriviumStreamShortint: arguments are the secret key and the input vector,
/// and a ServerKey reference. Outputs a TriviumStream object already initialized (1152
/// steps have been run before returning)
pub fn new(
key: [Ciphertext; 80],
iv: [u64; 80],
@@ -113,7 +113,7 @@ impl TriviumStreamShortint {
.unchecked_add_assign(&mut new_a, a5);
self.internal_server_key
.unchecked_add_assign(&mut new_a, &temp_c);
self.internal_server_key.clear_carry_assign(&mut new_a);
self.internal_server_key.message_extract_assign(&mut new_a);
new_a
},
|| {
@@ -122,7 +122,7 @@ impl TriviumStreamShortint {
.unchecked_add_assign(&mut new_b, b5);
self.internal_server_key
.unchecked_add_assign(&mut new_b, &temp_a);
self.internal_server_key.clear_carry_assign(&mut new_b);
self.internal_server_key.message_extract_assign(&mut new_b);
new_b
},
)
@@ -135,7 +135,7 @@ impl TriviumStreamShortint {
.unchecked_add_assign(&mut new_c, c5);
self.internal_server_key
.unchecked_add_assign(&mut new_c, &temp_b);
self.internal_server_key.clear_carry_assign(&mut new_c);
self.internal_server_key.message_extract_assign(&mut new_c);
new_c
},
|| {
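In the `TriviumStreamShortint` hunk above, each new register bit is accumulated with `unchecked_add_assign` and then reduced with `message_extract_assign`. The plaintext model below shows why this yields an XOR of the summed bits. It assumes a parameter set with `message_modulus = 2` and `carry_modulus = 2`; the diff does not show which parameters the example uses. `BlockModel` and the free functions are hypothetical stand-ins that only mimic the arithmetic. They are not the `tfhe::shortint` API, and they model neither encryption, noise, nor the PBS that performs the extraction.

```cpp
#include <cstdint>
#include <stdexcept>

// Plaintext model of a shortint block: a value in [0, message_modulus * carry_modulus).
struct BlockModel {
    uint64_t value;
    uint64_t message_modulus;
    uint64_t carry_modulus;
};

// Mimics unchecked_add_assign: adds without reducing, so the sum spills into the carry bits.
void unchecked_add_assign(BlockModel& lhs, const BlockModel& rhs) {
    lhs.value += rhs.value;
    if (lhs.value >= lhs.message_modulus * lhs.carry_modulus)
        throw std::overflow_error("sum exceeds the message + carry space");
}

// Mimics message_extract_assign: keeps only the message part (value mod message_modulus).
void message_extract_assign(BlockModel& block) { block.value %= block.message_modulus; }
```

With 1-bit messages and a 1-bit carry, three bits sum to at most 3, which still fits in the 4-value space, and reducing modulo 2 leaves their XOR.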

@@ -0,0 +1,18 @@
[package]
name = "tfhe-cuda-backend"
version = "0.1.3"
edition = "2021"
authors = ["Zama team"]
license = "BSD-3-Clause-Clear"
description = "Cuda implementation of TFHE-rs primitives."
homepage = "https://www.zama.ai/"
documentation = "https://docs.zama.ai/tfhe-rs"
repository = "https://github.com/zama-ai/tfhe-rs"
readme = "README.md"
keywords = ["fully", "homomorphic", "encryption", "fhe", "cryptography"]
[build-dependencies]
cmake = { version = "0.1" }
[dependencies]
thiserror = "1.0"

@@ -0,0 +1,28 @@
BSD 3-Clause Clear License
Copyright © 2024 ZAMA.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
3. Neither the name of ZAMA nor the names of its contributors may be used to endorse
or promote products derived from this software without specific prior written permission.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE.
THIS SOFTWARE IS PROVIDED BY THE ZAMA AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
ZAMA OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@@ -0,0 +1,53 @@
# TFHE Cuda backend
## Introduction
The `tfhe-cuda-backend` holds the code for GPU acceleration of Zama's variant of TFHE.
It implements CUDA/C++ functions to perform homomorphic operations on LWE ciphertexts.
It provides functions to allocate memory on the GPU, to copy data back
and forth between the CPU and the GPU, to create and destroy Cuda streams, etc.:
- `cuda_create_stream`, `cuda_destroy_stream`
- `cuda_malloc`, `cuda_check_valid_malloc`
- `cuda_memcpy_async_to_cpu`, `cuda_memcpy_async_to_gpu`
- `cuda_get_number_of_gpus`
- `cuda_synchronize_device`
The cryptographic operations it provides are:
- an amortized implementation of the TFHE programmable bootstrap: `cuda_bootstrap_amortized_lwe_ciphertext_vector_32` and `cuda_bootstrap_amortized_lwe_ciphertext_vector_64`
- a low latency implementation of the TFHE programmable bootstrap: `cuda_bootstrap_low_latency_lwe_ciphertext_vector_32` and `cuda_bootstrap_low_latency_lwe_ciphertext_vector_64`
- the keyswitch: `cuda_keyswitch_lwe_ciphertext_vector_32` and `cuda_keyswitch_lwe_ciphertext_vector_64`
- the larger precision programmable bootstrap (wop PBS, which supports up to 16 bits of message while the classical PBS only supports up to 8 bits of message) and its sub-components: `cuda_wop_pbs_64`, `cuda_extract_bits_64`, `cuda_circuit_bootstrap_64`, `cuda_cmux_tree_64`, `cuda_blind_rotation_sample_extraction_64`
- acceleration for leveled operations: `cuda_negate_lwe_ciphertext_vector_64`, `cuda_add_lwe_ciphertext_vector_64`, `cuda_add_lwe_ciphertext_vector_plaintext_vector_64`, `cuda_mult_lwe_ciphertext_vector_cleartext_vector_64`.
## Dependencies
**Disclaimer**: Compilation on Windows/Mac is not supported yet. Only Nvidia GPUs are supported.
- nvidia driver - for example, if you're running Ubuntu 20.04 check this [page](https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-20-04-focal-fossa-linux) for installation
- [nvcc](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) >= 10.0
- [gcc](https://gcc.gnu.org/) >= 8.0 - check this [page](https://gist.github.com/ax3l/9489132) for more details about nvcc/gcc compatible versions
- [cmake](https://cmake.org/) >= 3.24
## Build
The Cuda project held in `tfhe-cuda-backend` can be compiled independently from the rest of TFHE-rs in the
following way:
```
git clone git@github.com:zama-ai/tfhe-rs
cd tfhe-rs/backends/tfhe-cuda-backend/cuda
mkdir build
cd build
cmake ..
make
```
The compute capability is detected automatically (with the first GPU information) and set accordingly.
If your machine does not have an available Nvidia GPU, compilation still works as long as the nvcc compiler is installed. The generated library then targets compute capability 7.0 (sm_70).
## Links
- [TFHE](https://eprint.iacr.org/2018/421.pdf)
## License
This software is distributed under the BSD-3-Clause-Clear license. If you have any questions,
please contact us at `hello@zama.ai`.

@@ -0,0 +1,28 @@
use std::env;
use std::process::Command;
fn main() {
println!("Build tfhe-cuda-backend");
if env::consts::OS == "linux" {
let output = Command::new("./get_os_name.sh").output().unwrap();
let distribution = String::from_utf8(output.stdout).unwrap();
if distribution != "Ubuntu\n" {
println!(
"cargo:warning=This Linux distribution is not officially supported. \
Only Ubuntu is supported by tfhe-cuda-backend at this time. Build may fail\n"
);
}
let dest = cmake::build("cuda");
println!("cargo:rustc-link-search=native={}", dest.display());
println!("cargo:rustc-link-lib=static=tfhe_cuda_backend");
println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
println!("cargo:rustc-link-lib=gomp");
println!("cargo:rustc-link-lib=cudart");
println!("cargo:rustc-link-search=native=/usr/lib/x86_64-linux-gnu/");
println!("cargo:rustc-link-lib=stdc++");
} else {
panic!(
"Error: platform not supported, tfhe-cuda-backend not built (only Linux is supported)"
);
}
}

@@ -0,0 +1,10 @@
# -----------------------------
# Options affecting formatting.
# -----------------------------
with section("format"):
# How wide to allow formatted cmake files
line_width = 120
# How many spaces to tab for indent
tab_size = 2

@@ -0,0 +1,90 @@
cmake_minimum_required(VERSION 3.24 FATAL_ERROR)
project(tfhe_cuda_backend LANGUAGES CXX)
# Minimum CUDA version required to build the backend.
set(MINIMUM_SUPPORTED_CUDA_VERSION 10.0)
include(CheckLanguage)
# See if CUDA is available
check_language(CUDA)
# If so, enable CUDA to check the version.
if(CMAKE_CUDA_COMPILER)
enable_language(CUDA)
endif()
# If CUDA is not available, or its version is below the minimum, stop with an error
if(NOT CMAKE_CUDA_COMPILER)
message(FATAL_ERROR "Cuda compiler not found.")
endif()
if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS ${MINIMUM_SUPPORTED_CUDA_VERSION})
message(FATAL_ERROR "CUDA ${MINIMUM_SUPPORTED_CUDA_VERSION} or greater is required for compilation.")
endif()
# Get CUDA compute capability
set(OUTPUTFILE ${CMAKE_CURRENT_SOURCE_DIR}/cuda_script) # No suffix required
set(CUDAFILE ${CMAKE_CURRENT_SOURCE_DIR}/check_cuda.cu)
execute_process(COMMAND nvcc -lcuda ${CUDAFILE} -o ${OUTPUTFILE})
execute_process(
COMMAND ${OUTPUTFILE}
RESULT_VARIABLE CUDA_RETURN_CODE
OUTPUT_VARIABLE ARCH)
file(REMOVE ${OUTPUTFILE})
if(${CUDA_RETURN_CODE} EQUAL 0)
set(CUDA_SUCCESS "TRUE")
else()
set(CUDA_SUCCESS "FALSE")
endif()
if(${CUDA_SUCCESS})
message(STATUS "CUDA Architecture: ${ARCH}")
message(STATUS "CUDA Version: ${CUDA_VERSION_STRING}")
message(STATUS "CUDA Path: ${CUDA_TOOLKIT_ROOT_DIR}")
message(STATUS "CUDA Libraries: ${CUDA_LIBRARIES}")
message(STATUS "CUDA Performance Primitives: ${CUDA_npp_LIBRARY}")
else()
message(WARNING ${ARCH})
endif()
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()
# Add OpenMP support
find_package(OpenMP REQUIRED)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler ${OpenMP_CXX_FLAGS}")
if(${CUDA_SUCCESS})
set(CMAKE_CUDA_ARCHITECTURES native)
else()
set(CMAKE_CUDA_ARCHITECTURES 70)
endif()
# Useful extra flags: -arch=sm_70 to pin the target architecture, --ptxas-options=-v to see register spills, -lineinfo for better debugging
set(CMAKE_CUDA_FLAGS
"${CMAKE_CUDA_FLAGS} -ccbin ${CMAKE_CXX_COMPILER} -O3 \
-std=c++17 --no-exceptions --expt-relaxed-constexpr -rdc=true \
--use_fast_math -Xcompiler -fPIC")
set(INCLUDE_DIR include)
add_subdirectory(src)
target_include_directories(tfhe_cuda_backend PRIVATE ${INCLUDE_DIR})
# This is required for rust cargo build
install(TARGETS tfhe_cuda_backend DESTINATION .)
install(TARGETS tfhe_cuda_backend DESTINATION lib)
# Add a lint target if cpplint is available.
find_file(CPPLINT NAMES cpplint cpplint.exe)
if(CPPLINT)
# Add a custom target to lint all child projects. Dependencies are specified in child projects.
add_custom_target(all_lint)
# Don't trigger this target on ALL_BUILD or Visual Studio 'Rebuild Solution'
set_target_properties(all_lint PROPERTIES EXCLUDE_FROM_ALL TRUE)
# set_target_properties(all_lint PROPERTIES EXCLUDE_FROM_DEFAULT_BUILD TRUE)
endif()
enable_testing()

@@ -0,0 +1,3 @@
set noparent
linelength=240
filter=-legal/copyright,-readability/todo,-runtime/references,-build/c++17

@@ -0,0 +1,22 @@
#include <stdio.h>
int main(int argc, char **argv) {
cudaDeviceProp dP;
float min_cc = 3.0;
int rc = cudaGetDeviceProperties(&dP, 0);
if (rc != cudaSuccess) {
cudaError_t error = cudaGetLastError();
printf("CUDA error: %s", cudaGetErrorString(error));
return rc; /* Failure */
}
if ((dP.major + (dP.minor / 10.0f)) < min_cc) {
printf("Min Compute Capability of %2.1f required: %d.%d found\n Not "
"Building CUDA Code",
min_cc, dP.major, dP.minor);
return 1; /* Failure */
} else {
printf("-arch=sm_%d%d", dP.major, dP.minor);
return 0; /* Success */
}
}
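`check_cuda.cu` compares the compute capability as a decimal number `major.minor` and prints the matching `-arch=sm_XY` flag. The minor part only counts if the division is done in floating point (`minor / 10.0`). With integer arithmetic, `minor / 10` is always 0 for existing devices. Below is a host-only sketch of that logic, separated from the CUDA runtime so it can be tested without a GPU. The function names are hypothetical. The sm_70 fallback used when no GPU is found is handled in CMake, not here.

```cpp
#include <string>

// Returns true if compute capability major.minor is at least min_cc.
bool meets_min_compute_capability(int major, int minor, double min_cc) {
    return major + minor / 10.0 >= min_cc;
}

// Builds the nvcc architecture flag printed by check_cuda.cu, e.g. "-arch=sm_86".
std::string arch_flag(int major, int minor) {
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}
```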

@@ -0,0 +1,6 @@
#!/bin/bash
find ./{include,src} -iregex '^.*\.\(cpp\|cu\|h\|cuh\)$' -print | xargs clang-format-15 -i -style='file'
cmake-format -i CMakeLists.txt -c .cmake-format-config.py
find ./{include,src} -type f -name "CMakeLists.txt" | xargs -I % sh -c 'cmake-format -i % -c .cmake-format-config.py'

@@ -0,0 +1,118 @@
#ifndef CUDA_BOOTSTRAP_H
#define CUDA_BOOTSTRAP_H
#include "device.h"
#include <cstdint>
enum PBS_TYPE { MULTI_BIT = 0, LOW_LAT = 1, AMORTIZED = 2 };
extern "C" {
void cuda_fourier_polynomial_mul(void *input1, void *input2, void *output,
cuda_stream_t *stream,
uint32_t polynomial_size,
uint32_t total_polynomials);
void cuda_convert_lwe_bootstrap_key_32(void *dest, void *src,
cuda_stream_t *stream,
uint32_t input_lwe_dim,
uint32_t glwe_dim, uint32_t level_count,
uint32_t polynomial_size);
void cuda_convert_lwe_bootstrap_key_64(void *dest, void *src,
cuda_stream_t *stream,
uint32_t input_lwe_dim,
uint32_t glwe_dim, uint32_t level_count,
uint32_t polynomial_size);
void scratch_cuda_bootstrap_amortized_32(
cuda_stream_t *stream, int8_t **pbs_buffer, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t input_lwe_ciphertext_count,
uint32_t max_shared_memory, bool allocate_gpu_memory);
void scratch_cuda_bootstrap_amortized_64(
cuda_stream_t *stream, int8_t **pbs_buffer, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t input_lwe_ciphertext_count,
uint32_t max_shared_memory, bool allocate_gpu_memory);
void cuda_bootstrap_amortized_lwe_ciphertext_vector_32(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lut_vector, void *lut_vector_indexes, void *lwe_array_in,
void *lwe_input_indexes, void *bootstrapping_key, int8_t *pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory);
void cuda_bootstrap_amortized_lwe_ciphertext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lut_vector, void *lut_vector_indexes, void *lwe_array_in,
void *lwe_input_indexes, void *bootstrapping_key, int8_t *pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory);
void cleanup_cuda_bootstrap_amortized(cuda_stream_t *stream,
int8_t **pbs_buffer);
void scratch_cuda_bootstrap_low_latency_32(
cuda_stream_t *stream, int8_t **pbs_buffer, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
void scratch_cuda_bootstrap_low_latency_64(
cuda_stream_t *stream, int8_t **pbs_buffer, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory,
bool allocate_gpu_memory);
void cuda_bootstrap_low_latency_lwe_ciphertext_vector_32(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lut_vector, void *lut_vector_indexes, void *lwe_array_in,
void *lwe_input_indexes, void *bootstrapping_key, int8_t *pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory);
void cuda_bootstrap_low_latency_lwe_ciphertext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lut_vector, void *lut_vector_indexes, void *lwe_array_in,
void *lwe_input_indexes, void *bootstrapping_key, int8_t *pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t base_log, uint32_t level_count, uint32_t num_samples,
uint32_t num_luts, uint32_t lwe_idx, uint32_t max_shared_memory);
void cleanup_cuda_bootstrap_low_latency(cuda_stream_t *stream,
int8_t **pbs_buffer);
uint64_t get_buffer_size_bootstrap_amortized_64(
uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory);
uint64_t get_buffer_size_bootstrap_low_latency_64(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, uint32_t max_shared_memory);
}
#ifdef __CUDACC__
__device__ inline int get_start_ith_ggsw(int i, uint32_t polynomial_size,
int glwe_dimension,
uint32_t level_count);
template <typename T>
__device__ T *get_ith_mask_kth_block(T *ptr, int i, int k, int level,
uint32_t polynomial_size,
int glwe_dimension, uint32_t level_count);
template <typename T>
__device__ T *get_ith_body_kth_block(T *ptr, int i, int k, int level,
uint32_t polynomial_size,
int glwe_dimension, uint32_t level_count);
template <typename T>
__device__ T *get_multi_bit_ith_lwe_gth_group_kth_block(
T *ptr, int g, int i, int k, int level, uint32_t grouping_factor,
uint32_t polynomial_size, uint32_t glwe_dimension, uint32_t level_count);
#endif
#endif // CUDA_BOOTSTRAP_H

@@ -0,0 +1,46 @@
#ifndef CUDA_MULTI_BIT_H
#define CUDA_MULTI_BIT_H
#include "device.h"
#include <cstdint>
extern "C" {
void cuda_convert_lwe_multi_bit_bootstrap_key_64(
void *dest, void *src, cuda_stream_t *stream, uint32_t input_lwe_dim,
uint32_t glwe_dim, uint32_t level_count, uint32_t polynomial_size,
uint32_t grouping_factor);
void cuda_multi_bit_pbs_lwe_ciphertext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lut_vector, void *lut_vector_indexes, void *lwe_array_in,
void *lwe_input_indexes, void *bootstrapping_key, int8_t *pbs_buffer,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t grouping_factor, uint32_t base_log, uint32_t level_count,
uint32_t num_samples, uint32_t num_luts, uint32_t lwe_idx,
uint32_t max_shared_memory, uint32_t chunk_size = 0);
void scratch_cuda_multi_bit_pbs_64(
cuda_stream_t *stream, int8_t **pbs_buffer, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t grouping_factor, uint32_t input_lwe_ciphertext_count,
uint32_t max_shared_memory, bool allocate_gpu_memory,
uint32_t chunk_size = 0);
void cleanup_cuda_multi_bit_pbs(cuda_stream_t *stream, int8_t **pbs_buffer);
}
#ifdef __CUDACC__
__host__ uint32_t get_lwe_chunk_size(uint32_t lwe_dimension,
uint32_t level_count,
uint32_t glwe_dimension,
uint32_t num_samples);
__host__ uint32_t get_average_lwe_chunk_size(uint32_t lwe_dimension,
uint32_t level_count,
uint32_t glwe_dimension,
uint32_t ct_count);
__host__ uint64_t get_max_buffer_size_multibit_bootstrap(
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t level_count, uint32_t max_input_lwe_ciphertext_count);
#endif
#endif // CUDA_MULTI_BIT_H

@@ -0,0 +1,18 @@
#ifndef CUDA_CIPHERTEXT_H
#define CUDA_CIPHERTEXT_H
#include "device.h"
#include <cstdint>
extern "C" {
void cuda_convert_lwe_ciphertext_vector_to_gpu_64(void *dest, void *src,
cuda_stream_t *stream,
uint32_t number_of_cts,
uint32_t lwe_dimension);
void cuda_convert_lwe_ciphertext_vector_to_cpu_64(void *dest, void *src,
cuda_stream_t *stream,
uint32_t number_of_cts,
uint32_t lwe_dimension);
}
#endif

@@ -0,0 +1,88 @@
#ifndef DEVICE_H
#define DEVICE_H
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>
#define synchronize_threads_in_block() __syncthreads()
extern "C" {
struct cuda_stream_t {
cudaStream_t stream;
uint32_t gpu_index;
cuda_stream_t(uint32_t gpu_index) {
this->gpu_index = gpu_index;
cudaStreamCreate(&stream);
}
void release() {
cudaSetDevice(gpu_index);
cudaStreamDestroy(stream);
}
void synchronize() { cudaStreamSynchronize(stream); }
};
cuda_stream_t *cuda_create_stream(uint32_t gpu_index);
int cuda_destroy_stream(cuda_stream_t *stream);
void *cuda_malloc(uint64_t size, uint32_t gpu_index);
void *cuda_malloc_async(uint64_t size, cuda_stream_t *stream);
int cuda_check_valid_malloc(uint64_t size, uint32_t gpu_index);
int cuda_check_support_cooperative_groups();
int cuda_memcpy_to_cpu(void *dest, const void *src, uint64_t size);
int cuda_memcpy_async_to_gpu(void *dest, void *src, uint64_t size,
cuda_stream_t *stream);
int cuda_memcpy_async_gpu_to_gpu(void *dest, void *src, uint64_t size,
cuda_stream_t *stream);
int cuda_memcpy_to_gpu(void *dest, void *src, uint64_t size);
int cuda_memcpy_async_to_cpu(void *dest, const void *src, uint64_t size,
cuda_stream_t *stream);
int cuda_memset_async(void *dest, uint64_t val, uint64_t size,
cuda_stream_t *stream);
int cuda_get_number_of_gpus();
int cuda_synchronize_device(uint32_t gpu_index);
int cuda_drop(void *ptr, uint32_t gpu_index);
int cuda_drop_async(void *ptr, cuda_stream_t *stream);
int cuda_get_max_shared_memory(uint32_t gpu_index);
int cuda_synchronize_stream(cuda_stream_t *stream);
#define check_cuda_error(ans) \
{ cuda_error((ans), __FILE__, __LINE__); }
inline void cuda_error(cudaError_t code, const char *file, int line,
bool abort = true) {
if (code != cudaSuccess) {
fprintf(stderr, "Cuda error: %s %s %d\n", cudaGetErrorString(code), file,
line);
if (abort)
exit(code);
}
}
}
template <typename Torus>
void cuda_set_value_async(cudaStream_t *stream, Torus *d_array, Torus value,
Torus n);
#endif

@@ -0,0 +1,100 @@
#include "cuComplex.h"
#include "thrust/complex.h"
#include <iostream>
#include <string>
#include <type_traits>
#define PRINT_VARS
#ifdef PRINT_VARS
#define PRINT_DEBUG_5(var, begin, end, step, cond) \
_print_debug(var, #var, begin, end, step, cond, "", false)
#define PRINT_DEBUG_6(var, begin, end, step, cond, text) \
_print_debug(var, #var, begin, end, step, cond, text, true)
#define CAT(A, B) A##B
#define PRINT_SELECT(NAME, NUM) CAT(NAME##_, NUM)
#define GET_COUNT(_1, _2, _3, _4, _5, _6, COUNT, ...) COUNT
#define VA_SIZE(...) GET_COUNT(__VA_ARGS__, 6, 5, 4, 3, 2, 1)
#define PRINT_DEBUG(...) \
PRINT_SELECT(PRINT_DEBUG, VA_SIZE(__VA_ARGS__))(__VA_ARGS__)
#else
#define PRINT_DEBUG(...)
#endif
template <typename T>
__device__ typename std::enable_if<std::is_unsigned<T>::value, void>::type
_print_debug(T *var, const char *var_name, int start, int end, int step,
bool cond, const char *text, bool has_text) {
__syncthreads();
if (cond) {
if (has_text)
printf("%s\n", text);
for (int i = start; i < end; i += step) {
printf("%s[%d]: %llu\n", var_name, i, (unsigned long long)var[i]);
}
}
__syncthreads();
}
template <typename T>
__device__ typename std::enable_if<std::is_signed<T>::value, void>::type
_print_debug(T *var, const char *var_name, int start, int end, int step,
bool cond, const char *text, bool has_text) {
__syncthreads();
if (cond) {
if (has_text)
printf("%s\n", text);
for (int i = start; i < end; i += step) {
printf("%s[%d]: %lld\n", var_name, i, (long long)var[i]);
}
}
__syncthreads();
}
template <typename T>
__device__ typename std::enable_if<std::is_floating_point<T>::value, void>::type
_print_debug(T *var, const char *var_name, int start, int end, int step,
bool cond, const char *text, bool has_text) {
__syncthreads();
if (cond) {
if (has_text)
printf("%s\n", text);
for (int i = start; i < end; i += step) {
printf("%s[%u]: %.15f\n", var_name, i, var[i]);
}
}
__syncthreads();
}
template <typename T>
__device__
typename std::enable_if<std::is_same<T, thrust::complex<double>>::value,
void>::type
_print_debug(T *var, const char *var_name, int start, int end, int step,
bool cond, const char *text, bool has_text) {
__syncthreads();
if (cond) {
if (has_text)
printf("%s\n", text);
for (int i = start; i < end; i += step) {
printf("%s[%u]: %.15f , %.15f\n", var_name, i, var[i].real(),
var[i].imag());
}
}
__syncthreads();
}
template <typename T>
__device__
typename std::enable_if<std::is_same<T, cuDoubleComplex>::value, void>::type
_print_debug(T *var, const char *var_name, int start, int end, int step,
bool cond, const char *text, bool has_text) {
__syncthreads();
if (cond) {
if (has_text)
printf("%s\n", text);
for (int i = start; i < end; i += step) {
printf("%s[%u]: %.15f , %.15f\n", var_name, i, var[i].x, var[i].y);
}
}
__syncthreads();
}
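`PRINT_DEBUG` dispatches on its number of arguments. `VA_SIZE` pads `__VA_ARGS__` with 6, 5, ..., 1 so that `GET_COUNT`'s seventh parameter is the argument count, and `PRINT_SELECT` pastes that count onto the macro name to pick `PRINT_DEBUG_5` or `PRINT_DEBUG_6`. The standalone sketch below reproduces the trick with hypothetical `LABEL_2`/`LABEL_3` macros; the 3-argument form adds a text prefix, the way the 6-argument form adds `text`. As in the original, the variadic tail is never empty for the arities used (2 and 3 here, 5 and 6 there), which keeps the macros valid before C++20.

```cpp
#include <string>

// Argument-counting helpers (same shape as in the debug header).
#define GET_COUNT(_1, _2, _3, _4, _5, _6, COUNT, ...) COUNT
#define VA_SIZE(...) GET_COUNT(__VA_ARGS__, 6, 5, 4, 3, 2, 1)
#define CAT(A, B) A##B
#define PICK(NAME, NUM) CAT(NAME##_, NUM)

// Hypothetical "overloads" selected by argument count.
#define LABEL_2(name, value) (std::string(name) + "=" + std::to_string(value))
#define LABEL_3(name, value, text) (std::string(text) + ": " + LABEL_2(name, value))
#define LABEL(...) PICK(LABEL, VA_SIZE(__VA_ARGS__))(__VA_ARGS__)
```

For example, `LABEL("x", 3)` expands through `LABEL_2` and yields `"x=3"`, while `LABEL("x", 3, "step")` expands through `LABEL_3` and yields `"step: x=3"`.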

File diff suppressed because it is too large.

@@ -0,0 +1,21 @@
#ifndef CNCRT_KS_H_
#define CNCRT_KS_H_
#include "device.h"
#include <cstdint>
extern "C" {
void cuda_keyswitch_lwe_ciphertext_vector_32(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lwe_array_in, void *lwe_input_indexes, void *ksk,
uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
uint32_t level_count, uint32_t num_samples);
void cuda_keyswitch_lwe_ciphertext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lwe_array_in, void *lwe_input_indexes, void *ksk,
uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
uint32_t level_count, uint32_t num_samples);
}
#endif // CNCRT_KS_H_

@@ -0,0 +1,50 @@
#ifndef CUDA_LINALG_H_
#define CUDA_LINALG_H_
#include "bootstrap.h"
#include <cstdint>
#include <device.h>
extern "C" {
void cuda_negate_lwe_ciphertext_vector_32(cuda_stream_t *stream,
void *lwe_array_out,
void *lwe_array_in,
uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_negate_lwe_ciphertext_vector_64(cuda_stream_t *stream,
void *lwe_array_out,
void *lwe_array_in,
uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_add_lwe_ciphertext_vector_32(cuda_stream_t *stream,
void *lwe_array_out,
void *lwe_array_in_1,
void *lwe_array_in_2,
uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_add_lwe_ciphertext_vector_64(cuda_stream_t *stream,
void *lwe_array_out,
void *lwe_array_in_1,
void *lwe_array_in_2,
uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_add_lwe_ciphertext_vector_plaintext_vector_32(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_in,
void *plaintext_array_in, uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_add_lwe_ciphertext_vector_plaintext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_in,
void *plaintext_array_in, uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_mult_lwe_ciphertext_vector_cleartext_vector_32(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_in,
void *cleartext_array_in, uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
void cuda_mult_lwe_ciphertext_vector_cleartext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_in,
void *cleartext_array_in, uint32_t input_lwe_dimension,
uint32_t input_lwe_ciphertext_count);
}
#endif // CUDA_LINALG_H_
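For checking GPU results on small inputs, here is a CPU reference sketch of what the four leveled operations declared in `linear_algebra.h` compute on 64-bit ciphertexts, with arithmetic wrapping modulo 2^64. It makes three assumptions: each ciphertext is stored as `lwe_dimension` mask coefficients followed by one body coefficient (consistent with the `(lwe_dimension + 1)` stride in `ciphertext.cuh`), one plaintext or cleartext is applied per ciphertext, and the body is the last coefficient. The function names are hypothetical, not the backend's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Torus = uint64_t;

// Negates every coefficient (mask and body).
void negate_lwe(std::vector<Torus>& out, const std::vector<Torus>& in) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = Torus(0) - in[i];
}

// Adds two ciphertext vectors coefficient by coefficient.
void add_lwe(std::vector<Torus>& out, const std::vector<Torus>& a, const std::vector<Torus>& b) {
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
}

// Adds plaintexts[ct] to the body of ciphertext ct only.
void add_plaintext_lwe(std::vector<Torus>& out, const std::vector<Torus>& in,
                       const std::vector<Torus>& plaintexts, uint32_t lwe_dimension) {
    out = in;
    const std::size_t stride = std::size_t(lwe_dimension) + 1;
    for (std::size_t ct = 0; ct < plaintexts.size(); ++ct)
        out[ct * stride + lwe_dimension] += plaintexts[ct];
}

// Multiplies every coefficient of ciphertext ct by cleartexts[ct].
void mult_cleartext_lwe(std::vector<Torus>& out, const std::vector<Torus>& in,
                        const std::vector<Torus>& cleartexts, uint32_t lwe_dimension) {
    out.resize(in.size());
    const std::size_t stride = std::size_t(lwe_dimension) + 1;
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * cleartexts[i / stride];
}

// Phase of ciphertext ct: body - <mask, secret>. Equals the plaintext when the noise is zero.
Torus phase(const std::vector<Torus>& cts, std::size_t ct, const std::vector<Torus>& secret) {
    const std::size_t n = secret.size();
    const Torus* c = cts.data() + ct * (n + 1);
    Torus dot = 0;
    for (std::size_t j = 0; j < n; ++j) dot += c[j] * secret[j];
    return c[n] - dot;
}
```

Because the phase is linear in the ciphertext, each operation maps to the same operation on the encrypted values, which is what makes these operations "leveled" (no bootstrap needed).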

@@ -0,0 +1,18 @@
set(SOURCES
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/bit_extraction.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/bitwise_ops.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/bootstrap.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/bootstrap_multibit.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/ciphertext.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/circuit_bootstrap.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/device.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/integer.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/keyswitch.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/linear_algebra.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/shifts.h
${CMAKE_SOURCE_DIR}/${INCLUDE_DIR}/vertical_packing.h)
file(GLOB_RECURSE SOURCES "*.cu")
add_library(tfhe_cuda_backend STATIC ${SOURCES})
set_target_properties(tfhe_cuda_backend PROPERTIES CUDA_SEPARABLE_COMPILATION ON CUDA_RESOLVE_DEVICE_SYMBOLS ON)
target_link_libraries(tfhe_cuda_backend PUBLIC cudart OpenMP::OpenMP_CXX)
target_include_directories(tfhe_cuda_backend PRIVATE .)

@@ -0,0 +1 @@
#include "ciphertext.cuh"

@@ -0,0 +1,44 @@
#ifndef CUDA_CIPHERTEXT_CUH
#define CUDA_CIPHERTEXT_CUH
#include "ciphertext.h"
#include "device.h"
#include <cstdint>
template <typename T>
void cuda_convert_lwe_ciphertext_vector_to_gpu(T *dest, T *src,
cuda_stream_t *stream,
uint32_t number_of_cts,
uint32_t lwe_dimension) {
cudaSetDevice(stream->gpu_index);
uint64_t size = (uint64_t)number_of_cts * (lwe_dimension + 1) * sizeof(T);
cuda_memcpy_async_to_gpu(dest, src, size, stream);
}
void cuda_convert_lwe_ciphertext_vector_to_gpu_64(void *dest, void *src,
cuda_stream_t *stream,
uint32_t number_of_cts,
uint32_t lwe_dimension) {
cuda_convert_lwe_ciphertext_vector_to_gpu<uint64_t>(
(uint64_t *)dest, (uint64_t *)src, stream, number_of_cts, lwe_dimension);
}
template <typename T>
void cuda_convert_lwe_ciphertext_vector_to_cpu(T *dest, T *src,
cuda_stream_t *stream,
uint32_t number_of_cts,
uint32_t lwe_dimension) {
cudaSetDevice(stream->gpu_index);
uint64_t size = (uint64_t)number_of_cts * (lwe_dimension + 1) * sizeof(T);
cuda_memcpy_async_to_cpu(dest, src, size, stream);
}
void cuda_convert_lwe_ciphertext_vector_to_cpu_64(void *dest, void *src,
cuda_stream_t *stream,
uint32_t number_of_cts,
uint32_t lwe_dimension) {
cuda_convert_lwe_ciphertext_vector_to_cpu<uint64_t>(
(uint64_t *)dest, (uint64_t *)src, stream, number_of_cts, lwe_dimension);
}
#endif
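The conversion helpers above copy `number_of_cts` ciphertexts of `lwe_dimension + 1` elements each (mask plus body). This small host sketch of that size computation widens to 64 bits before multiplying, so a large batch cannot overflow a 32-bit intermediate element count. `lwe_vector_size_bytes` is a hypothetical helper name.

```cpp
#include <cstdint>

// Bytes occupied by `number_of_cts` LWE ciphertexts of dimension `lwe_dimension`.
template <typename T>
uint64_t lwe_vector_size_bytes(uint32_t number_of_cts, uint32_t lwe_dimension) {
    return static_cast<uint64_t>(number_of_cts) *
           (static_cast<uint64_t>(lwe_dimension) + 1) * sizeof(T);
}
```

For instance, 2^20 ciphertexts of dimension 4095 hold 2^32 elements. That count would wrap to 0 in 32-bit arithmetic, but here it gives 2^35 bytes for `uint64_t` elements.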

@@ -0,0 +1,162 @@
#ifndef CNCRT_CRYPTO_CUH
#define CNCRT_CRYPTO_CUH
#include "device.h"
#include <cstdint>
/**
* GadgetMatrix implements the iterator design pattern to decompose a set of
* num_poly consecutive polynomials with degree params::degree. A total of
* level_count levels is expected and each call to decompose_and_compress_next()
* writes the next level to the result. It is also possible to advance an
* arbitrary amount of levels by using decompose_and_compress_level().
*
* This class always decomposes the entire set of num_poly polynomials.
* By default, it works on a single polynomial.
*/
#pragma once
template <typename T, class params> class GadgetMatrix {
private:
uint32_t level_count;
uint32_t base_log;
uint32_t mask;
uint32_t halfbg;
uint32_t num_poly;
T offset;
int current_level;
T mask_mod_b;
T *state;
public:
__device__ GadgetMatrix(uint32_t base_log, uint32_t level_count, T *state,
uint32_t num_poly = 1)
: base_log(base_log), level_count(level_count), num_poly(num_poly),
state(state) {
mask_mod_b = (1ll << base_log) - 1ll;
current_level = level_count;
int tid = threadIdx.x;
for (int i = 0; i < num_poly * params::opt; i++) {
state[tid] >>= (sizeof(T) * 8 - base_log * level_count);
tid += params::degree / params::opt;
}
synchronize_threads_in_block();
}
// Decomposes all polynomials at once
__device__ void decompose_and_compress_next(double2 *result) {
for (int j = 0; j < num_poly; j++) {
auto result_slice = result + j * params::degree / 2;
decompose_and_compress_next_polynomial(result_slice, j);
}
}
// Decomposes a single polynomial
__device__ void decompose_and_compress_next_polynomial(double2 *result,
int j) {
if (j == 0)
current_level -= 1;
int tid = threadIdx.x;
auto state_slice = state + j * params::degree;
for (int i = 0; i < params::opt / 2; i++) {
T res_re = state_slice[tid] & mask_mod_b;
T res_im = state_slice[tid + params::degree / 2] & mask_mod_b;
state_slice[tid] >>= base_log;
state_slice[tid + params::degree / 2] >>= base_log;
T carry_re = ((res_re - 1ll) | state_slice[tid]) & res_re;
T carry_im =
((res_im - 1ll) | state_slice[tid + params::degree / 2]) & res_im;
carry_re >>= (base_log - 1);
carry_im >>= (base_log - 1);
state_slice[tid] += carry_re;
state_slice[tid + params::degree / 2] += carry_im;
res_re -= carry_re << base_log;
res_im -= carry_im << base_log;
result[tid].x = (int32_t)res_re;
result[tid].y = (int32_t)res_im;
tid += params::degree / params::opt;
}
synchronize_threads_in_block();
}
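// Decomposes a single polynomial, writing this thread's i-th pair of values to
// result[i] (a per-thread buffer) instead of result[tid]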
__device__ void
decompose_and_compress_next_polynomial_elements(double2 *result, int j) {
if (j == 0)
current_level -= 1;
int tid = threadIdx.x;
auto state_slice = state + j * params::degree;
for (int i = 0; i < params::opt / 2; i++) {
T res_re = state_slice[tid] & mask_mod_b;
T res_im = state_slice[tid + params::degree / 2] & mask_mod_b;
state_slice[tid] >>= base_log;
state_slice[tid + params::degree / 2] >>= base_log;
T carry_re = ((res_re - 1ll) | state_slice[tid]) & res_re;
T carry_im =
((res_im - 1ll) | state_slice[tid + params::degree / 2]) & res_im;
carry_re >>= (base_log - 1);
carry_im >>= (base_log - 1);
state_slice[tid] += carry_re;
state_slice[tid + params::degree / 2] += carry_im;
res_re -= carry_re << base_log;
res_im -= carry_im << base_log;
result[i].x = (int32_t)res_re;
result[i].y = (int32_t)res_im;
tid += params::degree / params::opt;
}
synchronize_threads_in_block();
}
__device__ void decompose_and_compress_level(double2 *result, int level) {
for (int i = 0; i < level_count - level; i++)
decompose_and_compress_next(result);
}
};
template <typename T> class GadgetMatrixSingle {
private:
uint32_t level_count;
uint32_t base_log;
uint32_t mask;
uint32_t halfbg;
T offset;
public:
__device__ GadgetMatrixSingle(uint32_t base_log, uint32_t level_count)
: base_log(base_log), level_count(level_count) {
uint32_t bg = 1 << base_log;
this->halfbg = bg / 2;
this->mask = bg - 1;
T temp = 0;
for (int i = 0; i < this->level_count; i++) {
temp += 1ULL << (sizeof(T) * 8 - (i + 1) * this->base_log);
}
this->offset = temp * this->halfbg;
}
__device__ T decompose_one_level_single(T element, uint32_t level) {
T s = element + this->offset;
uint32_t decal = (sizeof(T) * 8 - (level + 1) * this->base_log);
T temp1 = (s >> decal) & this->mask;
return (T)(temp1 - this->halfbg);
}
};
template <typename Torus>
__device__ Torus decompose_one(Torus &state, Torus mask_mod_b, int base_log) {
Torus res = state & mask_mod_b;
state >>= base_log;
Torus carry = ((res - 1ll) | state) & res;
carry >>= base_log - 1;
state += carry;
res -= carry << base_log;
return res;
}
#endif // CNCRT_CRPYTO_H

@@ -0,0 +1,74 @@
#ifndef CNCRT_GGSW_CUH
#define CNCRT_GGSW_CUH
#include "device.h"
#include "fft/bnsmfft.cuh"
#include "polynomial/parameters.cuh"
template <typename T, typename ST, class params, sharedMemDegree SMD>
__global__ void device_batch_fft_ggsw_vector(double2 *dest, T *src,
int8_t *device_mem) {
extern __shared__ int8_t sharedmem[];
double2 *selected_memory;
if constexpr (SMD == FULLSM)
selected_memory = (double2 *)sharedmem;
else
// Each block uses N/2 double2 values, i.e. N * sizeof(double) bytes
selected_memory = (double2 *)&device_mem[(size_t)blockIdx.x * params::degree *
sizeof(double)];
// Compression
int offset = blockIdx.x * blockDim.x;
int tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
ST x = src[(tid) + params::opt * offset];
ST y = src[(tid + params::degree / 2) + params::opt * offset];
selected_memory[tid].x = x / (double)std::numeric_limits<T>::max();
selected_memory[tid].y = y / (double)std::numeric_limits<T>::max();
tid += params::degree / params::opt;
}
synchronize_threads_in_block();
// Switch to the FFT space
NSMFFT_direct<HalfDegree<params>>(selected_memory);
synchronize_threads_in_block();
// Write the output to global memory
tid = threadIdx.x;
#pragma unroll
for (int j = 0; j < params::opt / 2; j++) {
dest[tid + (params::opt >> 1) * offset] = selected_memory[tid];
tid += params::degree / params::opt;
}
}
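The compression loop above packs each real polynomial into N/2 complex values before `NSMFFT_direct` is applied. Slot j receives coefficient j in its real part and coefficient j + N/2 in its imaginary part. The following is a minimal host-side C++ sketch of that packing for a single polynomial. It assumes `T = uint64_t` and `ST = int64_t`, and `compress_polynomial_host` is a hypothetical name.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host helper (not part of the backend) reproducing the
// compression loop of device_batch_fft_ggsw_vector for one polynomial:
// complex slot j holds coefficient j (real part) and coefficient j + N/2
// (imaginary part), both read as signed integers and divided by max(T).
std::vector<std::complex<double>>
compress_polynomial_host(const std::vector<uint64_t> &poly) {
  const size_t half = poly.size() / 2;
  const double scale =
      static_cast<double>(std::numeric_limits<uint64_t>::max());
  std::vector<std::complex<double>> out(half);
  for (size_t j = 0; j < half; j++) {
    int64_t re = static_cast<int64_t>(poly[j]);        // ST x = src[tid]
    int64_t im = static_cast<int64_t>(poly[j + half]); // ST y = src[tid + N/2]
    out[j] = std::complex<double>(re / scale, im / scale);
  }
  return out;
}
```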
/**
* Applies the FFT transform on sequence of GGSW ciphertexts already in the
* global memory
*/
template <typename T, typename ST, class params>
void batch_fft_ggsw_vector(cuda_stream_t *stream, double2 *dest, T *src,
int8_t *d_mem, uint32_t r, uint32_t glwe_dim,
uint32_t polynomial_size, uint32_t level_count,
uint32_t gpu_index, uint32_t max_shared_memory) {
cudaSetDevice(stream->gpu_index);
int shared_memory_size = sizeof(double) * polynomial_size;
int gridSize = r * (glwe_dim + 1) * (glwe_dim + 1) * level_count;
int blockSize = polynomial_size / params::opt;
if (max_shared_memory < shared_memory_size) {
device_batch_fft_ggsw_vector<T, ST, params, NOSM>
<<<gridSize, blockSize, 0, stream->stream>>>(dest, src, d_mem);
} else {
device_batch_fft_ggsw_vector<T, ST, params, FULLSM>
<<<gridSize, blockSize, shared_memory_size, stream->stream>>>(dest, src,
d_mem);
}
check_cuda_error(cudaGetLastError());
}
#endif // CNCRT_GGSW_CUH
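The launch parameters chosen by `batch_fft_ggsw_vector` reduce to plain arithmetic. There is one block per GGSW polynomial, so r * (glwe_dim + 1)^2 * level_count blocks in total, and the full-shared-memory kernel is used only when N doubles fit in shared memory. The sketch below mirrors that logic on the host. The struct and function names are hypothetical, and `opt` is passed explicitly because in the CUDA code it is a compile-time member of `params`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mirror of the launch configuration picked by
// batch_fft_ggsw_vector.
struct FftGgswLaunchConfig {
  uint32_t grid_size;        // one block per GGSW polynomial
  uint32_t block_size;       // each thread handles opt coefficients
  uint32_t shared_mem_bytes; // N/2 double2 values = N doubles
  bool full_shared_memory;   // FULLSM if it fits, NOSM otherwise
};

FftGgswLaunchConfig plan_fft_ggsw_launch(uint32_t r, uint32_t glwe_dim,
                                         uint32_t polynomial_size,
                                         uint32_t level_count, uint32_t opt,
                                         uint32_t max_shared_memory) {
  FftGgswLaunchConfig cfg;
  cfg.grid_size = r * (glwe_dim + 1) * (glwe_dim + 1) * level_count;
  cfg.block_size = polynomial_size / opt;
  cfg.shared_mem_bytes =
      static_cast<uint32_t>(sizeof(double) * polynomial_size);
  // Same test as the original: NOSM when max_shared_memory < size
  cfg.full_shared_memory = max_shared_memory >= cfg.shared_mem_bytes;
  return cfg;
}
```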

@@ -0,0 +1,48 @@
#include "keyswitch.cuh"
#include "keyswitch.h"
#include <cstdint>
/* Perform keyswitch on a batch of 32-bit input LWE ciphertexts.
* See the equivalent 64-bit operation for more details.
*/
void cuda_keyswitch_lwe_ciphertext_vector_32(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lwe_array_in, void *lwe_input_indexes, void *ksk,
uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
uint32_t level_count, uint32_t num_samples) {
cuda_keyswitch_lwe_ciphertext_vector(
stream, static_cast<uint32_t *>(lwe_array_out),
static_cast<uint32_t *>(lwe_output_indexes),
static_cast<uint32_t *>(lwe_array_in),
static_cast<uint32_t *>(lwe_input_indexes), static_cast<uint32_t *>(ksk),
lwe_dimension_in, lwe_dimension_out, base_log, level_count, num_samples);
}
/* Perform keyswitch on a batch of 64-bit input LWE ciphertexts.
*
* - `stream` is a pointer to the CUDA stream (and associated GPU index) used
* for the kernel launch
* - lwe_array_out: output batch of num_samples keyswitched ciphertexts c =
* (a0,..an-1,b) where n is the output LWE dimension (lwe_dimension_out)
* - lwe_output_indexes: for each sample, the index of the output ciphertext
* in lwe_array_out
* - lwe_array_in: input batch of num_samples LWE ciphertexts, each containing
* lwe_dimension_in mask values + 1 body value
* - lwe_input_indexes: for each sample, the index of the input ciphertext in
* lwe_array_in
* - ksk: the keyswitch key to be used in the operation
* - base_log: the log of the base used in the decomposition (should be the one
* used to create the ksk)
* - level_count: the number of decomposition levels (should be the one used
* to create the ksk)
* - num_samples: the number of ciphertexts to keyswitch
*
* This function calls a wrapper to a device kernel that performs the keyswitch
* - num_samples blocks of threads are launched
*/
void cuda_keyswitch_lwe_ciphertext_vector_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_output_indexes,
void *lwe_array_in, void *lwe_input_indexes, void *ksk,
uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
uint32_t level_count, uint32_t num_samples) {
cuda_keyswitch_lwe_ciphertext_vector(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_output_indexes),
static_cast<uint64_t *>(lwe_array_in),
static_cast<uint64_t *>(lwe_input_indexes), static_cast<uint64_t *>(ksk),
lwe_dimension_in, lwe_dimension_out, base_log, level_count, num_samples);
}

@@ -0,0 +1,144 @@
#ifndef CNCRT_KS_CUH
#define CNCRT_KS_CUH
#include "device.h"
#include "gadget.cuh"
#include "polynomial/polynomial_math.cuh"
#include "torus.cuh"
#include <thread>
#include <vector>
template <typename Torus>
__device__ Torus *get_ith_block(Torus *ksk, int i, int level,
uint32_t lwe_dimension_out,
uint32_t level_count) {
int pos = i * level_count * (lwe_dimension_out + 1) +
level * (lwe_dimension_out + 1);
Torus *ptr = &ksk[pos];
return ptr;
}
/*
* keyswitch kernel
* Each block keyswitches one input LWE ciphertext, and each thread computes a
* subset of the output coefficients of the following equation:
* $$LWE_{s_2}(\Delta m + e) = (0, 0, \ldots, 0, b) - \sum_{i=0}^{k-1}
* \langle Dec(a_i), (LWE_{s_2}(s_{1,i} q/\beta), \ldots,
* LWE_{s_2}(s_{1,i} q/\beta^l)) \rangle$$
* where k is the input LWE dimension (lwe_dimension_in), Dec denotes the
* decomposition with base \beta and l levels, and the inner product is taken
* between the decomposition of a_i and the l LWE encryptions of
* s_{1,i} q/\beta^j, with j in [1, l]. We obtain an LWE encryption of
* \Delta m (with \Delta the scaling factor) under key s_2 instead of s_1, with
* increased noise.
*/
template <typename Torus>
__global__ void
keyswitch(Torus *lwe_array_out, Torus *lwe_output_indexes, Torus *lwe_array_in,
Torus *lwe_input_indexes, Torus *ksk, uint32_t lwe_dimension_in,
uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
int lwe_lower, int lwe_upper, int cutoff) {
int tid = threadIdx.x;
extern __shared__ int8_t sharedmem[];
Torus *local_lwe_array_out = (Torus *)sharedmem;
auto block_lwe_array_in = get_chunk(
lwe_array_in, lwe_input_indexes[blockIdx.x], lwe_dimension_in + 1);
auto block_lwe_array_out = get_chunk(
lwe_array_out, lwe_output_indexes[blockIdx.x], lwe_dimension_out + 1);
auto gadget = GadgetMatrixSingle<Torus>(base_log, level_count);
int lwe_part_per_thd;
if (tid < cutoff) {
lwe_part_per_thd = lwe_upper;
} else {
lwe_part_per_thd = lwe_lower;
}
__syncthreads();
for (int k = 0; k < lwe_part_per_thd; k++) {
int idx = tid + k * blockDim.x;
local_lwe_array_out[idx] = 0;
}
__syncthreads();
if (tid == 0) {
local_lwe_array_out[lwe_dimension_out] =
block_lwe_array_in[lwe_dimension_in];
}
for (int i = 0; i < lwe_dimension_in; i++) {
__syncthreads();
Torus a_i =
round_to_closest_multiple(block_lwe_array_in[i], base_log, level_count);
Torus state = a_i >> (sizeof(Torus) * 8 - base_log * level_count);
Torus mask_mod_b = (1ll << base_log) - 1ll;
for (int j = 0; j < level_count; j++) {
auto ksk_block = get_ith_block(ksk, i, j, lwe_dimension_out, level_count);
Torus decomposed = decompose_one<Torus>(state, mask_mod_b, base_log);
for (int k = 0; k < lwe_part_per_thd; k++) {
int idx = tid + k * blockDim.x;
local_lwe_array_out[idx] -= (Torus)ksk_block[idx] * decomposed;
}
}
}
for (int k = 0; k < lwe_part_per_thd; k++) {
int idx = tid + k * blockDim.x;
block_lwe_array_out[idx] = local_lwe_array_out[idx];
}
}
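The kernel above can be cross-checked against a sequential host reference. The C++ sketch below keyswitches one 64-bit LWE ciphertext using the same rounding and digit extraction as the kernel, and the same key layout as `get_ith_block`. Based on the kernel's loop order, it assumes that KSK block (i, j) encrypts s1_i scaled by the weight of the j-th extracted digit, with the least significant digit first. The noiseless encryption and KSK helpers are hypothetical. They exist only to make the check possible and are not secure.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Same digit extraction as decompose_one in the CUDA code.
uint64_t ks_decompose_one(uint64_t &state, uint64_t mask_mod_b, int base_log) {
  uint64_t res = state & mask_mod_b;
  state >>= base_log;
  uint64_t carry = (((res - 1ull) | state) & res) >> (base_log - 1);
  state += carry;
  return res - (carry << base_log);
}

// Hypothetical host reference of the keyswitch kernel for ONE ciphertext.
// Block (i, j) of ksk starts at (i * level_count + j) * (n_out + 1), as in
// get_ith_block, and is paired with the j-th digit extracted from a_i.
std::vector<uint64_t> keyswitch_host(const std::vector<uint64_t> &lwe_in,
                                     const std::vector<uint64_t> &ksk,
                                     uint32_t n_in, uint32_t n_out,
                                     uint32_t base_log, uint32_t level_count) {
  std::vector<uint64_t> out(n_out + 1, 0);
  out[n_out] = lwe_in[n_in]; // start from (0, ..., 0, b)
  const uint64_t mask_mod_b = (1ull << base_log) - 1ull;
  const uint32_t shift = 64 - base_log * level_count;
  for (uint32_t i = 0; i < n_in; i++) {
    // round_to_closest_multiple, then keep the top base_log * level_count bits
    uint64_t rounded =
        ((lwe_in[i] >> shift) + ((lwe_in[i] >> (shift - 1)) & 1ull)) << shift;
    uint64_t state = rounded >> shift;
    for (uint32_t j = 0; j < level_count; j++) {
      const uint64_t *block = &ksk[(i * level_count + j) * (n_out + 1)];
      uint64_t d = ks_decompose_one(state, mask_mod_b, (int)base_log);
      for (uint32_t k = 0; k <= n_out; k++)
        out[k] -= block[k] * d;
    }
  }
  return out;
}

// Noiseless LWE encryption of `plaintext` under the binary key `s` (test only).
std::vector<uint64_t> encrypt_noiseless(uint64_t plaintext,
                                        const std::vector<uint64_t> &s,
                                        std::mt19937_64 &rng) {
  std::vector<uint64_t> ct(s.size() + 1);
  ct[s.size()] = plaintext;
  for (size_t k = 0; k < s.size(); k++) {
    ct[k] = rng();
    ct[s.size()] += ct[k] * s[k];
  }
  return ct;
}

// Phase b - <a, s>, i.e. Delta * m plus noise.
uint64_t decrypt_phase(const std::vector<uint64_t> &ct,
                       const std::vector<uint64_t> &s) {
  uint64_t phase = ct[s.size()];
  for (size_t k = 0; k < s.size(); k++)
    phase -= ct[k] * s[k];
  return phase;
}

// Toy KSK: block (i, j) encrypts s1_i * 2^(shift + base_log * j) under s2.
std::vector<uint64_t> make_ksk(const std::vector<uint64_t> &s1,
                               const std::vector<uint64_t> &s2,
                               uint32_t base_log, uint32_t level_count,
                               std::mt19937_64 &rng) {
  const uint32_t shift = 64 - base_log * level_count;
  std::vector<uint64_t> ksk;
  for (uint64_t key_bit : s1)
    for (uint32_t j = 0; j < level_count; j++) {
      std::vector<uint64_t> ct =
          encrypt_noiseless(key_bit << (shift + base_log * j), s2, rng);
      ksk.insert(ksk.end(), ct.begin(), ct.end());
    }
  return ksk;
}
```

For example, with `base_log = 4`, `level_count = 5` and 16 input coefficients, each rounded a_i differs from a_i by at most 2^43. The error added to the phase is therefore at most 2^47, well below 2^59, so a message encoded as m * 2^60 still decrypts correctly under s2 after the keyswitch.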
/// Assumes lwe_array_in is already in GPU memory
template <typename Torus>
__host__ void cuda_keyswitch_lwe_ciphertext_vector(
cuda_stream_t *stream, Torus *lwe_array_out, Torus *lwe_output_indexes,
Torus *lwe_array_in, Torus *lwe_input_indexes, Torus *ksk,
uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
uint32_t level_count, uint32_t num_samples) {
cudaSetDevice(stream->gpu_index);
constexpr int ideal_threads = 128;
int lwe_dim = lwe_dimension_out + 1;
int lwe_lower, lwe_upper, cutoff;
if (lwe_dim % ideal_threads == 0) {
lwe_lower = lwe_dim / ideal_threads;
lwe_upper = lwe_dim / ideal_threads;
cutoff = 0;
} else {
int y =
ceil((double)lwe_dim / (double)ideal_threads) * ideal_threads - lwe_dim;
cutoff = ideal_threads - y;
lwe_lower = lwe_dim / ideal_threads;
lwe_upper = (int)ceil((double)lwe_dim / (double)ideal_threads);
}
int lwe_size_after = (lwe_dimension_out + 1) * num_samples;
int shared_mem = sizeof(Torus) * (lwe_dimension_out + 1);
cuda_memset_async(lwe_array_out, 0, sizeof(Torus) * lwe_size_after, stream);
check_cuda_error(cudaGetLastError());
dim3 grid(num_samples, 1, 1);
dim3 threads(ideal_threads, 1, 1);
// cudaFuncSetAttribute(keyswitch<Torus>,
// cudaFuncAttributeMaxDynamicSharedMemorySize,
// shared_mem);
keyswitch<<<grid, threads, shared_mem, stream->stream>>>(
lwe_array_out, lwe_output_indexes, lwe_array_in, lwe_input_indexes, ksk,
lwe_dimension_in, lwe_dimension_out, base_log, level_count, lwe_lower,
lwe_upper, cutoff);
check_cuda_error(cudaGetLastError());
}
#endif
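The `lwe_lower`/`lwe_upper`/`cutoff` computation in `cuda_keyswitch_lwe_ciphertext_vector` splits the lwe_dimension_out + 1 output coefficients as evenly as possible over `ideal_threads` threads. In the non-divisible case, cutoff equals lwe_dim % ideal_threads and lwe_upper equals lwe_lower + 1, so the same split can be computed with integer arithmetic only. The following C++ sketch uses hypothetical names. It also checks that the kernel's `idx = tid + k * blockDim.x` loop visits each output index exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical integer-only equivalent of the host-side split: threads with
// tid < cutoff handle lwe_upper coefficients, the others lwe_lower.
struct KsPartition {
  int lwe_lower;
  int lwe_upper;
  int cutoff;
};

KsPartition partition_lwe(int lwe_dim, int ideal_threads) {
  KsPartition p;
  p.lwe_lower = lwe_dim / ideal_threads;
  p.cutoff = lwe_dim % ideal_threads;
  p.lwe_upper = p.lwe_lower + (p.cutoff != 0 ? 1 : 0);
  return p;
}

// Counts how often each output coefficient is visited by the kernel's
// idx = tid + k * ideal_threads loop; every entry should be exactly 1.
std::vector<int> coverage_counts(int lwe_dim, int ideal_threads) {
  KsPartition p = partition_lwe(lwe_dim, ideal_threads);
  std::vector<int> seen(static_cast<size_t>(lwe_dim), 0);
  for (int tid = 0; tid < ideal_threads; tid++) {
    int parts = tid < p.cutoff ? p.lwe_upper : p.lwe_lower;
    for (int k = 0; k < parts; k++)
      seen[tid + k * ideal_threads]++;
  }
  return seen;
}
```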

@@ -0,0 +1,74 @@
#ifndef CNCRT_TORUS_CUH
#define CNCRT_TORUS_CUH
#include "types/int128.cuh"
#include <limits>
template <typename T>
__device__ inline void typecast_double_to_torus(double x, T &r) {
r = T(x);
}
template <>
__device__ inline void typecast_double_to_torus<uint32_t>(double x,
uint32_t &r) {
r = __double2uint_rn(x);
}
template <>
__device__ inline void typecast_double_to_torus<uint64_t>(double x,
uint64_t &r) {
// The ull intrinsic does not behave in the same way on all architectures, and
// on some platforms this causes the cmux tree test to fail.
// Hence the intrinsic is not used here.
uint128 wide = make_uint128_from_float(x);
r = wide.lo_;
}
template <typename T>
__device__ inline T round_to_closest_multiple(T x, uint32_t base_log,
uint32_t level_count) {
T shift = sizeof(T) * 8 - level_count * base_log;
T mask = 1ll << (shift - 1);
T b = (x & mask) >> (shift - 1);
T res = x >> shift;
res += b;
res <<= shift;
return res;
}
template <typename T>
__device__ __forceinline__ void rescale_torus_element(T element, T &output,
uint32_t log_shift) {
output =
round((double)element / (double(std::numeric_limits<T>::max()) + 1.0) *
(double)log_shift);
}
template <typename T>
__device__ __forceinline__ T rescale_torus_element(T element,
uint32_t log_shift) {
return round((double)element / (double(std::numeric_limits<T>::max()) + 1.0) *
(double)log_shift);
}
template <>
__device__ __forceinline__ void
rescale_torus_element<uint32_t>(uint32_t element, uint32_t &output,
uint32_t log_shift) {
output =
round(__uint2double_rn(element) /
(__uint2double_rn(std::numeric_limits<uint32_t>::max()) + 1.0) *
__uint2double_rn(log_shift));
}
template <>
__device__ __forceinline__ void
rescale_torus_element<uint64_t>(uint64_t element, uint64_t &output,
uint32_t log_shift) {
output = round(__ull2double_rn(element) /
(__ull2double_rn(std::numeric_limits<uint64_t>::max()) + 1.0) *
__uint2double_rn(log_shift));
}
#endif // CNCRT_TORUS_CUH
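The generic `rescale_torus_element` maps a torus element x to round(x / 2^64 * log_shift). Despite its name, `log_shift` is used as a plain multiplier rather than as an exponent. A typical use would be switching to the modulus 2N, but that is an assumption and is not stated in this file. The following is a host C++ sketch of the `uint64_t` case. `rescale_torus_element_host` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Host port of the generic rescale_torus_element for uint64_t: scales
// element / 2^64 by log_shift and rounds to nearest (half away from zero).
uint64_t rescale_torus_element_host(uint64_t element, uint32_t log_shift) {
  double scaled =
      static_cast<double>(element) /
      (static_cast<double>(std::numeric_limits<uint64_t>::max()) + 1.0) *
      static_cast<double>(log_shift);
  return static_cast<uint64_t>(std::round(scaled));
}
```

Inputs close to 2^64 round up to `log_shift` itself. For example, `0xffffffffffffffff` maps to 2048 when `log_shift` is 2048. A caller would presumably reduce the result modulo `log_shift`.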

@@ -0,0 +1,350 @@
#include "device.h"
#include <cstdint>
#include <cuda_runtime.h>
/// Unsafe function to create a CUDA stream; the caller must first check that
/// the GPU exists
cuda_stream_t *cuda_create_stream(uint32_t gpu_index) {
cudaSetDevice(gpu_index);
cuda_stream_t *stream = new cuda_stream_t(gpu_index);
return stream;
}
/// Unsafe function to destroy a CUDA stream; the caller must first check that
/// the GPU exists
int cuda_destroy_stream(cuda_stream_t *stream) {
stream->release();
return 0;
}
/// Unsafe function that will try to allocate even if gpu_index is invalid
/// or if there's not enough memory. A safe wrapper around it must call
/// cuda_check_valid_malloc() first
void *cuda_malloc(uint64_t size, uint32_t gpu_index) {
cudaSetDevice(gpu_index);
void *ptr;
cudaMalloc((void **)&ptr, size);
check_cuda_error(cudaGetLastError());
return ptr;
}
/// Allocates a size-byte array in device memory. Tries to do it
/// asynchronously if the device supports memory pools.
void *cuda_malloc_async(uint64_t size, cuda_stream_t *stream) {
cudaSetDevice(stream->gpu_index);
void *ptr;
#ifndef CUDART_VERSION
#error CUDART_VERSION Undefined!
#elif (CUDART_VERSION >= 11020)
int support_async_alloc;
check_cuda_error(cudaDeviceGetAttribute(&support_async_alloc,
cudaDevAttrMemoryPoolsSupported,
stream->gpu_index));
if (support_async_alloc) {
check_cuda_error(cudaMallocAsync((void **)&ptr, size, stream->stream));
} else {
check_cuda_error(cudaMalloc((void **)&ptr, size));
}
#else
check_cuda_error(cudaMalloc((void **)&ptr, size));
#endif
return ptr;
}
/// Checks that allocation is valid
/// 0: valid
/// -1: invalid, not enough memory in device
/// -2: invalid, gpu index doesn't exist
int cuda_check_valid_malloc(uint64_t size, uint32_t gpu_index) {
if (gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaSetDevice(gpu_index);
size_t total_mem, free_mem;
cudaMemGetInfo(&free_mem, &total_mem);
if (size > free_mem) {
// error code: not enough memory
return -1;
}
return 0;
}
/// Returns (queried on GPU 0)
/// -> 0 if Cooperative Groups is not supported.
/// -> 1 otherwise
int cuda_check_support_cooperative_groups() {
int cooperative_groups_supported = 0;
cudaDeviceGetAttribute(&cooperative_groups_supported,
cudaDevAttrCooperativeLaunch, 0);
return cooperative_groups_supported > 0;
}
/// Tries to copy memory to the GPU asynchronously
/// 0: success
/// -1: error, invalid device pointer
/// -2: error, gpu index doesn't exist
/// -3: error, zero copy size
int cuda_memcpy_async_to_gpu(void *dest, void *src, uint64_t size,
cuda_stream_t *stream) {
if (size == 0) {
// error code: zero copy size
return -3;
}
if (stream->gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaPointerAttributes attr;
cudaPointerGetAttributes(&attr, dest);
if (attr.device != stream->gpu_index || attr.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
cudaSetDevice(stream->gpu_index);
check_cuda_error(
cudaMemcpyAsync(dest, src, size, cudaMemcpyHostToDevice, stream->stream));
return 0;
}
/// Tries to copy memory to the GPU synchronously
/// 0: success
/// -1: error, invalid device pointer
/// -3: error, zero copy size
int cuda_memcpy_to_gpu(void *dest, void *src, uint64_t size) {
if (size == 0) {
// error code: zero copy size
return -3;
}
cudaPointerAttributes attr;
cudaPointerGetAttributes(&attr, dest);
if (attr.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
check_cuda_error(cudaMemcpy(dest, src, size, cudaMemcpyHostToDevice));
return 0;
}
/// Tries to copy memory to the CPU synchronously
/// 0: success
/// -1: error, invalid device pointer
/// -3: error, zero copy size
int cuda_memcpy_to_cpu(void *dest, void *src, uint64_t size) {
if (size == 0) {
// error code: zero copy size
return -3;
}
cudaPointerAttributes attr;
cudaPointerGetAttributes(&attr, src);
if (attr.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
check_cuda_error(cudaMemcpy(dest, src, size, cudaMemcpyDeviceToHost));
return 0;
}
/// Tries to copy memory within a GPU asynchronously
/// 0: success
/// -1: error, invalid device pointer
/// -2: error, gpu index doesn't exist
/// -3: error, zero copy size
int cuda_memcpy_async_gpu_to_gpu(void *dest, void *src, uint64_t size,
cuda_stream_t *stream) {
if (size == 0) {
// error code: zero copy size
return -3;
}
if (stream->gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaPointerAttributes attr_dest;
cudaPointerGetAttributes(&attr_dest, dest);
if (attr_dest.device != stream->gpu_index ||
attr_dest.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
cudaPointerAttributes attr_src;
cudaPointerGetAttributes(&attr_src, src);
if (attr_src.device != stream->gpu_index ||
attr_src.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
if (attr_src.device != attr_dest.device) {
// error code: different devices
return -1;
}
cudaSetDevice(stream->gpu_index);
check_cuda_error(cudaMemcpyAsync(dest, src, size, cudaMemcpyDeviceToDevice,
stream->stream));
return 0;
}
/// Synchronizes device
/// 0: success
/// -2: error, gpu index doesn't exist
int cuda_synchronize_device(uint32_t gpu_index) {
if (gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaSetDevice(gpu_index);
cudaDeviceSynchronize();
return 0;
}
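/// Tries to set size bytes of device memory to val asynchronously
/// (cudaMemsetAsync only uses the lowest byte of val)
/// 0: success
/// -1: error, invalid device pointer
/// -2: error, gpu index doesn't exist
/// -3: error, zero size
int cuda_memset_async(void *dest, uint64_t val, uint64_t size,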
cuda_stream_t *stream) {
if (size == 0) {
// error code: zero copy size
return -3;
}
if (stream->gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaPointerAttributes attr;
cudaPointerGetAttributes(&attr, dest);
if (attr.device != stream->gpu_index || attr.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
cudaSetDevice(stream->gpu_index);
check_cuda_error(cudaMemsetAsync(dest, val, size, stream->stream));
return 0;
}
template <typename Torus>
__global__ void cuda_set_value_kernel(Torus *array, Torus value, Torus n) {
int index = threadIdx.x + blockIdx.x * blockDim.x;
if (index < n)
array[index] = value;
}
template <typename Torus>
void cuda_set_value_async(cudaStream_t *stream, Torus *d_array, Torus value,
Torus n) {
int block_size = 256;
int num_blocks = (n + block_size - 1) / block_size;
// Launch the kernel
cuda_set_value_kernel<<<num_blocks, block_size, 0, *stream>>>(d_array, value,
n);
check_cuda_error(cudaGetLastError());
}
/// Explicitly instantiate cuda_set_value_async for 32 and 64 bits
template void cuda_set_value_async(cudaStream_t *stream, uint64_t *d_array,
uint64_t value, uint64_t n);
template void cuda_set_value_async(cudaStream_t *stream, uint32_t *d_array,
uint32_t value, uint32_t n);
/// Tries to copy memory from the GPU to the CPU asynchronously
/// 0: success
/// -1: error, invalid device pointer
/// -2: error, gpu index doesn't exist
/// -3: error, zero copy size
int cuda_memcpy_async_to_cpu(void *dest, const void *src, uint64_t size,
cuda_stream_t *stream) {
if (size == 0) {
// error code: zero copy size
return -3;
}
if (stream->gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaPointerAttributes attr;
cudaPointerGetAttributes(&attr, src);
if (attr.device != stream->gpu_index || attr.type != cudaMemoryTypeDevice) {
// error code: invalid device pointer
return -1;
}
cudaSetDevice(stream->gpu_index);
check_cuda_error(
cudaMemcpyAsync(dest, src, size, cudaMemcpyDeviceToHost, stream->stream));
return 0;
}
/// Returns the number of available GPUs
int cuda_get_number_of_gpus() {
int num_gpus;
cudaGetDeviceCount(&num_gpus);
return num_gpus;
}
/// Frees a CUDA device array
int cuda_drop(void *ptr, uint32_t gpu_index) {
if (gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaSetDevice(gpu_index);
check_cuda_error(cudaFree(ptr));
return 0;
}
/// Frees a CUDA device array. Tries to do it asynchronously
int cuda_drop_async(void *ptr, cuda_stream_t *stream) {
cudaSetDevice(stream->gpu_index);
#ifndef CUDART_VERSION
#error CUDART_VERSION Undefined!
#elif (CUDART_VERSION >= 11020)
int support_async_alloc;
check_cuda_error(cudaDeviceGetAttribute(&support_async_alloc,
cudaDevAttrMemoryPoolsSupported,
stream->gpu_index));
if (support_async_alloc) {
check_cuda_error(cudaFreeAsync(ptr, stream->stream));
} else {
check_cuda_error(cudaFree(ptr));
}
#else
check_cuda_error(cudaFree(ptr));
#endif
return 0;
}
/// Returns the maximum shared memory size (in bytes) for the given GPU,
/// or -2 if the gpu index doesn't exist
int cuda_get_max_shared_memory(uint32_t gpu_index) {
if (gpu_index >= cuda_get_number_of_gpus()) {
// error code: invalid gpu_index
return -2;
}
cudaSetDevice(gpu_index);
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, gpu_index);
int max_shared_memory = 0;
if (prop.major >= 6) {
max_shared_memory = prop.sharedMemPerMultiprocessor;
} else {
max_shared_memory = prop.sharedMemPerBlock;
}
return max_shared_memory;
}
int cuda_synchronize_stream(cuda_stream_t *stream) {
stream->synchronize();
return 0;
}

@@ -0,0 +1,725 @@
#ifndef GPU_BOOTSTRAP_FFT_CUH
#define GPU_BOOTSTRAP_FFT_CUH
#include "polynomial/functions.cuh"
#include "polynomial/parameters.cuh"
#include "twiddles.cuh"
#include "types/complex/operations.cuh"
/*
* Direct negacyclic FFT:
* - before the FFT, the N real coefficients are stored into N/2 complex
* values, with coefficient j in the real part and coefficient j + N/2 in the
* imaginary part of element j. This is referred to as the half-size FFT
* - when calling NSMFFT_direct for the forward negacyclic FFT of the PBS,
* opt is divided by 2 because the butterfly pattern is always applied
* between pairs of coefficients
* - instead of twisting each coefficient A_j before the FFT by multiplying it
* by the roots of unity w^j (aka twiddles, w = exp(-i pi / N)), the FFT
* itself is modified: for each level k of the FFT, the twiddle
* w_{j,k} = exp(-i pi j / 2^k)
* is replaced with
* \zeta_{j,k} = exp(-i pi (2j - 1) / 2^k)
*/
template <class params> __device__ void NSMFFT_direct(double2 *A) {
/* We don't perform a bit reversal here, since the twiddles are already
* reversed.
* Each thread is always in charge of "opt/2" pairs of coefficients,
* which is why we always loop through N/2 with N/opt strides.
* The pragma unroll instruction tells the compiler to unroll the
* full loop, which should increase performance.
*/
size_t tid = threadIdx.x;
size_t twid_id;
size_t i1, i2;
double2 u, v, w;
// level 1
// level 1 doesn't need a general complex multiplication: there is only one
// twiddle and its real and imaginary parts are equal, so it can be applied
// with simpler operations
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
i1 = tid;
i2 = tid + params::degree / 2;
u = A[i1];
v = A[i2] * (double2){0.707106781186547461715008466854,
0.707106781186547461715008466854};
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// level 2
// from this level on there is more than one twiddle and none of them has
// equal real and imaginary parts, so a complete complex multiplication is
// needed. At each level, params::degree / 2^level is the number of
// coefficients inside each chunk
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 4);
i1 = 2 * (params::degree / 4) * twid_id + (tid & (params::degree / 4 - 1));
i2 = i1 + params::degree / 4;
w = negtwiddles[twid_id + 2];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// level 3
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 8);
i1 = 2 * (params::degree / 8) * twid_id + (tid & (params::degree / 8 - 1));
i2 = i1 + params::degree / 8;
w = negtwiddles[twid_id + 4];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// level 4
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 16);
i1 =
2 * (params::degree / 16) * twid_id + (tid & (params::degree / 16 - 1));
i2 = i1 + params::degree / 16;
w = negtwiddles[twid_id + 8];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// level 5
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 32);
i1 =
2 * (params::degree / 32) * twid_id + (tid & (params::degree / 32 - 1));
i2 = i1 + params::degree / 32;
w = negtwiddles[twid_id + 16];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// level 6
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 64);
i1 =
2 * (params::degree / 64) * twid_id + (tid & (params::degree / 64 - 1));
i2 = i1 + params::degree / 64;
w = negtwiddles[twid_id + 32];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// level 7
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 128);
i1 = 2 * (params::degree / 128) * twid_id +
(tid & (params::degree / 128 - 1));
i2 = i1 + params::degree / 128;
w = negtwiddles[twid_id + 64];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
// From level 8 on, we need to check params::degree: the minimum supported
// polynomial size is 256, i.e. a minimum compressed (half) size of 128, so
// the first 7 butterfly levels are always needed. Since the butterfly levels
// are hard-coded, each further level is only enabled when the polynomial size
// requires it.
if constexpr (params::degree >= 256) {
// level 8
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 256);
i1 = 2 * (params::degree / 256) * twid_id +
(tid & (params::degree / 256 - 1));
i2 = i1 + params::degree / 256;
w = negtwiddles[twid_id + 128];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 512) {
// level 9
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 512);
i1 = 2 * (params::degree / 512) * twid_id +
(tid & (params::degree / 512 - 1));
i2 = i1 + params::degree / 512;
w = negtwiddles[twid_id + 256];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 1024) {
// level 10
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 1024);
i1 = 2 * (params::degree / 1024) * twid_id +
(tid & (params::degree / 1024 - 1));
i2 = i1 + params::degree / 1024;
w = negtwiddles[twid_id + 512];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 2048) {
// level 11
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 2048);
i1 = 2 * (params::degree / 2048) * twid_id +
(tid & (params::degree / 2048 - 1));
i2 = i1 + params::degree / 2048;
w = negtwiddles[twid_id + 1024];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 4096) {
// level 12
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 4096);
i1 = 2 * (params::degree / 4096) * twid_id +
(tid & (params::degree / 4096 - 1));
i2 = i1 + params::degree / 4096;
w = negtwiddles[twid_id + 2048];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
}
// A compressed size of 8192 corresponds to an actual polynomial size of 16384.
// From this size on, the twiddles can't fit in constant memory, so this
// butterfly level reads them from device memory.
if constexpr (params::degree >= 8192) {
// level 13
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 8192);
i1 = 2 * (params::degree / 8192) * twid_id +
(tid & (params::degree / 8192 - 1));
i2 = i1 + params::degree / 8192;
w = negtwiddles13[twid_id];
u = A[i1];
v = A[i2] * w;
A[i1] += v;
A[i2] = u - v;
tid += params::degree / params::opt;
}
__syncthreads();
}
}
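The comment before `NSMFFT_direct` contrasts the kernel with the textbook approach of twisting each coefficient A_j by w^j, w = exp(-i pi / N), before an ordinary FFT. The naive O(N^2) C++ sketch below is a reference for that textbook formulation only. It is not the half-size, pre-twisted butterfly scheme implemented above. It evaluates a polynomial at the N roots of X^N + 1 through a twisted DFT, multiplies pointwise, and inverts the transform, which reproduces schoolbook multiplication in Z[X]/(X^N + 1). All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>
#include <vector>

using cd = std::complex<double>;

// Schoolbook product in Z[X]/(X^N + 1): X^N wraps around to -1.
std::vector<int64_t> negacyclic_mul_naive(const std::vector<int64_t> &a,
                                          const std::vector<int64_t> &b) {
  size_t n = a.size();
  std::vector<int64_t> c(n, 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++) {
      if (i + j < n)
        c[i + j] += a[i] * b[j];
      else
        c[i + j - n] -= a[i] * b[j];
    }
  return c;
}

// Twisted DFT: the twist w^j and the DFT kernel exp(-2 pi i jk / N) are
// combined into exp(-i pi j (2k + 1) / N), so output k is a(w^(2k+1)),
// i.e. a evaluated at a root of X^N + 1.
std::vector<cd> twisted_dft(const std::vector<int64_t> &a) {
  const size_t n = a.size();
  const double pi = std::acos(-1.0);
  std::vector<cd> out(n);
  for (size_t k = 0; k < n; k++) {
    cd acc(0.0, 0.0);
    for (size_t j = 0; j < n; j++)
      acc += static_cast<double>(a[j]) *
             std::polar(1.0, -pi * (double)(j * (2 * k + 1)) / n);
    out[k] = acc;
  }
  return out;
}

// Inverse: inverse DFT, division by N and untwisting by w^(-j), combined;
// results are rounded back to integers.
std::vector<int64_t> twisted_idft_round(const std::vector<cd> &c) {
  const size_t n = c.size();
  const double pi = std::acos(-1.0);
  std::vector<int64_t> out(n);
  for (size_t j = 0; j < n; j++) {
    cd acc(0.0, 0.0);
    for (size_t k = 0; k < n; k++)
      acc += c[k] * std::polar(1.0, pi * (double)(j * (2 * k + 1)) / n);
    out[j] = std::llround(acc.real() / n);
  }
  return out;
}

// Negacyclic product via pointwise multiplication in the twisted domain.
std::vector<int64_t> negacyclic_mul_fft(const std::vector<int64_t> &a,
                                        const std::vector<int64_t> &b) {
  std::vector<cd> fa = twisted_dft(a), fb = twisted_dft(b);
  for (size_t k = 0; k < fa.size(); k++)
    fa[k] *= fb[k];
  return twisted_idft_round(fa);
}
```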
/*
* Inverse negacyclic FFT
*/
template <class params> __device__ void NSMFFT_inverse(double2 *A) {
/* We don't perform a bit reversal here, since the twiddles are already
* reversed.
* Each thread is always in charge of "opt/2" pairs of coefficients,
* which is why we always loop through N/2 with N/opt strides.
* The pragma unroll instruction tells the compiler to unroll the
* full loop, which should increase performance.
*/
size_t tid = threadIdx.x;
size_t twid_id;
size_t i1, i2;
double2 u, w;
// divide input by compressed polynomial size
tid = threadIdx.x;
for (size_t i = 0; i < params::opt; ++i) {
A[tid] /= params::degree;
tid += params::degree / params::opt;
}
__syncthreads();
// None of the twiddles have equal real and imaginary parts, so a complete
// complex multiplication has to be done, and there is more than one twiddle
// per level. The mapping in the backward FFT is reversed: the butterfly
// operation starts from the last level.
// A compressed size of 8192 corresponds to an actual polynomial size of 16384.
// The twiddles for this size can't fit in constant memory, so the butterfly
// operation for this level accesses device memory to fetch them.
if constexpr (params::degree >= 8192) {
// level 13
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 8192);
i1 = 2 * (params::degree / 8192) * twid_id +
(tid & (params::degree / 8192 - 1));
i2 = i1 + params::degree / 8192;
w = negtwiddles13[twid_id];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 4096) {
// level 12
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 4096);
i1 = 2 * (params::degree / 4096) * twid_id +
(tid & (params::degree / 4096 - 1));
i2 = i1 + params::degree / 4096;
w = negtwiddles[twid_id + 2048];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 2048) {
// level 11
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 2048);
i1 = 2 * (params::degree / 2048) * twid_id +
(tid & (params::degree / 2048 - 1));
i2 = i1 + params::degree / 2048;
w = negtwiddles[twid_id + 1024];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 1024) {
// level 10
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 1024);
i1 = 2 * (params::degree / 1024) * twid_id +
(tid & (params::degree / 1024 - 1));
i2 = i1 + params::degree / 1024;
w = negtwiddles[twid_id + 512];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 512) {
// level 9
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 512);
i1 = 2 * (params::degree / 512) * twid_id +
(tid & (params::degree / 512 - 1));
i2 = i1 + params::degree / 512;
w = negtwiddles[twid_id + 256];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
if constexpr (params::degree >= 256) {
// level 8
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 256);
i1 = 2 * (params::degree / 256) * twid_id +
(tid & (params::degree / 256 - 1));
i2 = i1 + params::degree / 256;
w = negtwiddles[twid_id + 128];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
// Below level 8 we don't need to check params::degree: the minimum supported
// actual polynomial size is 256, i.e. a compressed (halved) size of 128, so
// the last 7 butterfly levels are always needed. Since the butterfly levels
// are hardcoded, there is no need to check whether the polynomial size is
// big enough to require a specific level.
// level 7
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 128);
i1 = 2 * (params::degree / 128) * twid_id +
(tid & (params::degree / 128 - 1));
i2 = i1 + params::degree / 128;
w = negtwiddles[twid_id + 64];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
// level 6
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 64);
i1 =
2 * (params::degree / 64) * twid_id + (tid & (params::degree / 64 - 1));
i2 = i1 + params::degree / 64;
w = negtwiddles[twid_id + 32];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
// level 5
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 32);
i1 =
2 * (params::degree / 32) * twid_id + (tid & (params::degree / 32 - 1));
i2 = i1 + params::degree / 32;
w = negtwiddles[twid_id + 16];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
// level 4
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 16);
i1 =
2 * (params::degree / 16) * twid_id + (tid & (params::degree / 16 - 1));
i2 = i1 + params::degree / 16;
w = negtwiddles[twid_id + 8];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
// level 3
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 8);
i1 = 2 * (params::degree / 8) * twid_id + (tid & (params::degree / 8 - 1));
i2 = i1 + params::degree / 8;
w = negtwiddles[twid_id + 4];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
// level 2
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 4);
i1 = 2 * (params::degree / 4) * twid_id + (tid & (params::degree / 4 - 1));
i2 = i1 + params::degree / 4;
w = negtwiddles[twid_id + 2];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
// level 1
tid = threadIdx.x;
#pragma unroll
for (size_t i = 0; i < params::opt / 2; ++i) {
twid_id = tid / (params::degree / 2);
i1 = 2 * (params::degree / 2) * twid_id + (tid & (params::degree / 2 - 1));
i2 = i1 + params::degree / 2;
w = negtwiddles[twid_id + 1];
u = A[i1] - A[i2];
A[i1] += A[i2];
A[i2] = u * conjugate(w);
tid += params::degree / params::opt;
}
__syncthreads();
}
/*
 * Global batch FFT
 * Performs the FFT at half size; unpacking the half-size FFT result yields
 * half size + 1 elements.
 * This function must be instantiated with the actual degree.
 * The input must already be compressed (params::degree / 2 complex values
 * per polynomial).
 */
template <class params, sharedMemDegree SMD>
__global__ void batch_NSMFFT(double2 *d_input, double2 *d_output,
double2 *buffer) {
extern __shared__ double2 sharedMemoryFFT[];
double2 *fft = (SMD == NOSM) ? &buffer[blockIdx.x * params::degree / 2]
: sharedMemoryFFT;
int tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
fft[tid] = d_input[blockIdx.x * (params::degree / 2) + tid];
tid = tid + params::degree / params::opt;
}
__syncthreads();
NSMFFT_direct<HalfDegree<params>>(fft);
__syncthreads();
tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
d_output[blockIdx.x * (params::degree / 2) + tid] = fft[tid];
tid = tid + params::degree / params::opt;
}
}
/*
* global batch polynomial multiplication
* only used for fft tests
 * d_input1 and d_output must not point to the same memory
 * d_input1 is overwritten with its forward FFT inside the function
*/
template <class params, sharedMemDegree SMD>
__global__ void batch_polynomial_mul(double2 *d_input1, double2 *d_input2,
double2 *d_output, double2 *buffer) {
extern __shared__ double2 sharedMemoryFFT[];
double2 *fft = (SMD == NOSM) ? &buffer[blockIdx.x * params::degree / 2]
: sharedMemoryFFT;
// Move the first polynomial into shared memory (if possible; otherwise it is
// moved into the device buffer)
int tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
fft[tid] = d_input1[blockIdx.x * (params::degree / 2) + tid];
tid = tid + params::degree / params::opt;
}
// Perform direct negacyclic fourier transform
__syncthreads();
NSMFFT_direct<HalfDegree<params>>(fft);
__syncthreads();
// Put the result of direct fft inside input1
tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
d_input1[blockIdx.x * (params::degree / 2) + tid] = fft[tid];
tid = tid + params::degree / params::opt;
}
__syncthreads();
// Move the second polynomial into shared memory (if possible; otherwise it
// is moved into the device buffer)
tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
fft[tid] = d_input2[blockIdx.x * (params::degree / 2) + tid];
tid = tid + params::degree / params::opt;
}
// Perform direct negacyclic fourier transform on the second polynomial
__syncthreads();
NSMFFT_direct<HalfDegree<params>>(fft);
__syncthreads();
// calculate pointwise multiplication inside fft buffer
tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
fft[tid] *= d_input1[blockIdx.x * (params::degree / 2) + tid];
tid = tid + params::degree / params::opt;
}
// Perform backward negacyclic fourier transform
__syncthreads();
NSMFFT_inverse<HalfDegree<params>>(fft);
__syncthreads();
// copy results in output buffer
tid = threadIdx.x;
#pragma unroll
for (int i = 0; i < params::opt / 2; i++) {
d_output[blockIdx.x * (params::degree / 2) + tid] = fft[tid];
tid = tid + params::degree / params::opt;
}
}
#endif // GPU_BOOTSTRAP_FFT_CUH
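Every butterfly level in `NSMFFT_inverse` uses the same index arithmetic. The host-side C++ sketch below is an illustration, not part of the library: `Butterfly`, `butterfly_pair` and `level_is_a_partition` are hypothetical names, `degree` stands for the compressed size `params::degree`, and `level` is the L of the `// level L` comments, so the stride is `degree / 2^L`. Since the kernel loop visits `degree / 2` logical indices `tid`, the check confirms that within one level every coefficient belongs to exactly one butterfly, so threads only need to synchronize between levels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices touched by one butterfly, mirroring i1, i2 and twid_id in the
// kernel. For levels 1..12 the kernel reads negtwiddles[twid_id + 2^(L-1)];
// level 13 reads negtwiddles13[twid_id].
struct Butterfly {
  std::size_t i1, i2, twid_id;
};

// Assumes degree is a power of two and 1 <= level <= log2(degree).
Butterfly butterfly_pair(std::size_t tid, std::size_t degree, unsigned level) {
  std::size_t stride = degree >> level; // params::degree / 2^L
  std::size_t twid_id = tid / stride;
  std::size_t i1 = 2 * stride * twid_id + (tid & (stride - 1));
  return {i1, i1 + stride, twid_id};
}

// True if the degree / 2 logical thread indices of one level visit every
// coefficient exactly once (no two butterflies share an element).
bool level_is_a_partition(std::size_t degree, unsigned level) {
  std::vector<int> hits(degree, 0);
  for (std::size_t tid = 0; tid < degree / 2; ++tid) {
    Butterfly b = butterfly_pair(tid, degree, level);
    if (b.i2 >= degree)
      return false;
    ++hits[b.i1];
    ++hits[b.i2];
  }
  for (int h : hits)
    if (h != 1)
      return false;
  return true;
}
```

For example, with `degree = 128` the last level (L = 7) has stride 1, so `tid = 5` pairs coefficients 10 and 11 with twiddle 5.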

File diff suppressed because it is too large

@@ -0,0 +1,13 @@
#ifndef GPU_BOOTSTRAP_TWIDDLES_CUH
#define GPU_BOOTSTRAP_TWIDDLES_CUH
/*
 * 'negtwiddles' are stored in constant memory for faster access times.
 * Because of its limited size, constant memory can only hold the twiddles
 * for compressed polynomial sizes up to 2^12; the twiddles for 2^13 are
 * stored in device memory in 'negtwiddles13'.
 */
extern __constant__ double2 negtwiddles[4096];
extern __device__ double2 negtwiddles13[4096];
#endif
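The offsets used in `NSMFFT_inverse` also explain the sizes declared in this header. Level L (1 <= L <= 12) reads `negtwiddles[twid_id + 2^(L-1)]` with `twid_id` in `[0, 2^(L-1))`, so the levels occupy disjoint, contiguous index ranges ending at 4095 (index 0 is not read by the levels shown), while level 13 needs 4096 further twiddles. The compile-time sketch below checks that arithmetic, assuming 8-byte doubles and the 64 KiB `__constant__` memory space of CUDA devices; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// First negtwiddles index read by level L, and how many entries it reads.
// Both equal 2^(L-1) given the kernel's twid_id + 2^(L-1) offset.
constexpr std::size_t level_first_index(unsigned level) {
  return std::size_t{1} << (level - 1);
}
constexpr std::size_t level_size(unsigned level) {
  return std::size_t{1} << (level - 1);
}

// CUDA's double2 holds two doubles; assumes 8-byte IEEE doubles.
static_assert(sizeof(double) == 8, "assumes 8-byte doubles");
constexpr std::size_t kDouble2Bytes = 2 * sizeof(double);
// Size of the __constant__ memory space on CUDA devices.
constexpr std::size_t kConstantMemoryBytes = 64 * 1024;

// negtwiddles[4096] alone fills the whole constant memory space...
static_assert(4096 * kDouble2Bytes == kConstantMemoryBytes, "64 KiB");
// ...and levels 1..12 end exactly at its last entry (index 4095).
static_assert(level_first_index(12) + level_size(12) - 1 == 4095, "fits");
```

A second 4096-entry table for level 13 therefore cannot share constant memory, which matches the choice of a `__device__` array for `negtwiddles13`.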

@@ -0,0 +1,51 @@
#include "integer/bitwise_ops.cuh"
void scratch_cuda_integer_radix_bitop_kb_64(
cuda_stream_t *stream, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t lwe_ciphertext_count, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, BITOP_TYPE op_type,
bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
ks_base_log, pbs_level, pbs_base_log, grouping_factor,
message_modulus, carry_modulus);
scratch_cuda_integer_radix_bitop_kb<uint64_t>(
stream, (int_bitop_buffer<uint64_t> **)mem_ptr, lwe_ciphertext_count,
params, op_type, allocate_gpu_memory);
}
void cuda_bitop_integer_radix_ciphertext_kb_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_1,
void *lwe_array_2, int8_t *mem_ptr, void *bsk, void *ksk,
uint32_t lwe_ciphertext_count) {
host_integer_radix_bitop_kb<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2),
(int_bitop_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
lwe_ciphertext_count);
}
void cuda_bitnot_integer_radix_ciphertext_kb_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_in,
int8_t *mem_ptr, void *bsk, void *ksk, uint32_t lwe_ciphertext_count) {
host_integer_radix_bitnot_kb<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_in),
(int_bitop_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
lwe_ciphertext_count);
}
void cleanup_cuda_integer_bitop(cuda_stream_t *stream, int8_t **mem_ptr_void) {
int_bitop_buffer<uint64_t> *mem_ptr =
(int_bitop_buffer<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(stream);
}

@@ -0,0 +1,51 @@
#ifndef CUDA_INTEGER_BITWISE_OPS_CUH
#define CUDA_INTEGER_BITWISE_OPS_CUH
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "integer.cuh"
#include "integer.h"
#include "pbs/bootstrap_low_latency.cuh"
#include "pbs/bootstrap_multibit.cuh"
#include "polynomial/functions.cuh"
#include "utils/kernel_dimensions.cuh"
#include <omp.h>
template <typename Torus>
__host__ void
host_integer_radix_bitop_kb(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_1, Torus *lwe_array_2,
int_bitop_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto lut = mem_ptr->lut;
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
stream, lwe_array_out, lwe_array_1, lwe_array_2, bsk, ksk,
num_radix_blocks, lut);
}
template <typename Torus>
__host__ void
host_integer_radix_bitnot_kb(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_in,
int_bitop_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto lut = mem_ptr->lut;
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, lwe_array_out, lwe_array_in, bsk, ksk, num_radix_blocks, lut);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_bitop_kb(
cuda_stream_t *stream, int_bitop_buffer<Torus> **mem_ptr,
uint32_t num_radix_blocks, int_radix_params params, BITOP_TYPE op,
bool allocate_gpu_memory) {
*mem_ptr = new int_bitop_buffer<Torus>(stream, op, params, num_radix_blocks,
allocate_gpu_memory);
}
#endif
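A cleartext model can make the block-wise structure of `host_integer_radix_bitop_kb` concrete. The sketch below is hypothetical: it works on plain digits instead of ciphertexts, the names `BitOp`, `bitop_lut` and `radix_bitop` are invented for illustration, it assumes `message_modulus` is a power of two, and it assumes the bivariate LUT input packs two blocks as `lhs * message_modulus + rhs`. Because AND, OR and XOR never produce carries, applying the LUT once per block is enough, matching the single `integer_radix_apply_bivariate_lookup_table_kb` call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class BitOp { And, Or, Xor };

// What the bivariate LUT computes on one packed block (assumed packing:
// lhs * message_modulus + rhs).
uint64_t bitop_lut(uint64_t packed, uint64_t message_modulus, BitOp op) {
  uint64_t lhs = packed / message_modulus;
  uint64_t rhs = packed % message_modulus;
  switch (op) {
  case BitOp::And:
    return lhs & rhs;
  case BitOp::Or:
    return lhs | rhs;
  case BitOp::Xor:
    return lhs ^ rhs;
  }
  return 0;
}

// Splits a and b into num_blocks base-message_modulus digits (least
// significant block first), applies the LUT per block and recombines.
uint64_t radix_bitop(uint64_t a, uint64_t b, uint64_t message_modulus,
                     std::size_t num_blocks, BitOp op) {
  uint64_t result = 0, weight = 1;
  for (std::size_t i = 0; i < num_blocks; ++i) {
    uint64_t packed =
        (a % message_modulus) * message_modulus + (b % message_modulus);
    result += bitop_lut(packed, message_modulus, op) * weight;
    a /= message_modulus;
    b /= message_modulus;
    weight *= message_modulus;
  }
  return result;
}
```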

@@ -0,0 +1,45 @@
#include "integer/cmux.cuh"
void scratch_cuda_integer_radix_cmux_kb_64(
cuda_stream_t *stream, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t lwe_ciphertext_count, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
ks_base_log, pbs_level, pbs_base_log, grouping_factor,
message_modulus, carry_modulus);
std::function<uint64_t(uint64_t)> predicate_lut_f =
[](uint64_t x) -> uint64_t { return x == 1; };
scratch_cuda_integer_radix_cmux_kb(
stream, (int_cmux_buffer<uint64_t> **)mem_ptr, predicate_lut_f,
lwe_ciphertext_count, params, allocate_gpu_memory);
}
void cuda_cmux_integer_radix_ciphertext_kb_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_condition,
void *lwe_array_true, void *lwe_array_false, int8_t *mem_ptr, void *bsk,
void *ksk, uint32_t lwe_ciphertext_count) {
host_integer_radix_cmux_kb<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_condition),
static_cast<uint64_t *>(lwe_array_true),
static_cast<uint64_t *>(lwe_array_false),
(int_cmux_buffer<uint64_t> *)mem_ptr, bsk, static_cast<uint64_t *>(ksk),
lwe_ciphertext_count);
}
void cleanup_cuda_integer_radix_cmux(cuda_stream_t *stream,
int8_t **mem_ptr_void) {
int_cmux_buffer<uint64_t> *mem_ptr =
(int_cmux_buffer<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(stream);
}

@@ -0,0 +1,100 @@
#ifndef CUDA_INTEGER_CMUX_CUH
#define CUDA_INTEGER_CMUX_CUH
#include "integer.cuh"
#include <omp.h>
template <typename Torus>
__host__ void zero_out_if(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_input, Torus *lwe_condition,
int_zero_out_if_buffer<Torus> *mem_ptr,
int_radix_lut<Torus> *predicate, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto params = mem_ptr->params;
int big_lwe_size = params.big_lwe_dimension + 1;
// Left message is shifted
int num_blocks = 0, num_threads = 0;
int num_entries = (params.big_lwe_dimension + 1);
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
// We can't use integer_radix_apply_bivariate_lookup_table_kb since the
// second operand is fixed
auto tmp_lwe_array_input = mem_ptr->tmp;
for (int i = 0; i < num_radix_blocks; i++) {
auto lwe_array_out_block = tmp_lwe_array_input + i * big_lwe_size;
auto lwe_array_input_block = lwe_array_input + i * big_lwe_size;
device_pack_bivariate_blocks<<<num_blocks, num_threads, 0,
stream->stream>>>(
lwe_array_out_block, lwe_array_input_block, lwe_condition,
predicate->lwe_indexes, params.big_lwe_dimension,
params.message_modulus, 1);
check_cuda_error(cudaGetLastError());
}
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, lwe_array_out, tmp_lwe_array_input, bsk, ksk, num_radix_blocks,
predicate);
}
template <typename Torus>
__host__ void
host_integer_radix_cmux_kb(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_condition, Torus *lwe_array_true,
Torus *lwe_array_false,
int_cmux_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto params = mem_ptr->params;
// Since our CPU threads will be working on different streams, we must ensure
// that the work in the main stream is completed first
stream->synchronize();
auto true_stream = mem_ptr->zero_if_true_buffer->local_stream;
auto false_stream = mem_ptr->zero_if_false_buffer->local_stream;
#pragma omp parallel sections
{
// Both sections may be executed in parallel
#pragma omp section
{
auto mem_true = mem_ptr->zero_if_true_buffer;
zero_out_if(true_stream, mem_ptr->tmp_true_ct, lwe_array_true,
lwe_condition, mem_true, mem_ptr->inverted_predicate_lut, bsk,
ksk, num_radix_blocks);
}
#pragma omp section
{
auto mem_false = mem_ptr->zero_if_false_buffer;
zero_out_if(false_stream, mem_ptr->tmp_false_ct, lwe_array_false,
lwe_condition, mem_false, mem_ptr->predicate_lut, bsk, ksk,
num_radix_blocks);
}
}
cuda_synchronize_stream(true_stream);
cuda_synchronize_stream(false_stream);
// If the condition was true, true_ct has kept its value and false_ct is 0.
// If the condition was false, true_ct is 0 and false_ct has kept its value.
auto added_cts = mem_ptr->tmp_true_ct;
host_addition(stream, added_cts, mem_ptr->tmp_true_ct, mem_ptr->tmp_false_ct,
params.big_lwe_dimension, num_radix_blocks);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, lwe_array_out, added_cts, bsk, ksk, num_radix_blocks,
mem_ptr->message_extract_lut);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_cmux_kb(
cuda_stream_t *stream, int_cmux_buffer<Torus> **mem_ptr,
std::function<Torus(Torus)> predicate_lut_f, uint32_t num_radix_blocks,
int_radix_params params, bool allocate_gpu_memory) {
*mem_ptr = new int_cmux_buffer<Torus>(stream, predicate_lut_f, params,
num_radix_blocks, allocate_gpu_memory);
}
#endif
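The cmux above can be modelled in cleartext as two `zero_out_if` passes followed by an addition. In this hypothetical sketch (plain digits instead of ciphertexts; `Block`, `zero_out_if` and `cmux` here are illustrative stand-ins, not the library functions), the false branch is zeroed when the predicate `x == 1` from `scratch_cuda_integer_radix_cmux_kb_64` holds and the true branch is zeroed when it does not. One summand is therefore always 0, and the final `% message_modulus` stands in for `message_extract_lut`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Block = uint64_t;

// Keeps `value` unless predicate(condition) holds, in which case returns 0.
Block zero_out_if(Block value, Block condition,
                  const std::function<bool(Block)> &predicate) {
  return predicate(condition) ? 0 : value;
}

// condition is assumed to be 0 or 1; both branches have the same length.
std::vector<Block> cmux(Block condition, const std::vector<Block> &if_true,
                        const std::vector<Block> &if_false,
                        Block message_modulus) {
  std::function<bool(Block)> predicate = [](Block x) { return x == 1; };
  std::function<bool(Block)> inverted = [&predicate](Block x) {
    return !predicate(x);
  };
  std::vector<Block> out(if_true.size());
  for (std::size_t i = 0; i < if_true.size(); ++i) {
    Block t = zero_out_if(if_true[i], condition, inverted);   // 0 unless cond
    Block f = zero_out_if(if_false[i], condition, predicate); // 0 if cond
    out[i] = (t + f) % message_modulus; // message extraction
  }
  return out;
}
```

On the GPU the two `zero_out_if` calls are independent, which is why they run in separate OpenMP sections on their own streams before being joined with `cuda_synchronize_stream`.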

@@ -0,0 +1,83 @@
#include "integer/comparison.cuh"
void scratch_cuda_integer_radix_comparison_kb_64(
cuda_stream_t *stream, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t lwe_ciphertext_count, uint32_t message_modulus,
uint32_t carry_modulus, PBS_TYPE pbs_type, COMPARISON_TYPE op_type,
bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
ks_base_log, pbs_level, pbs_base_log, grouping_factor,
message_modulus, carry_modulus);
switch (op_type) {
case EQ:
case NE:
scratch_cuda_integer_radix_equality_check_kb<uint64_t>(
stream, (int_comparison_buffer<uint64_t> **)mem_ptr,
lwe_ciphertext_count, params, op_type, allocate_gpu_memory);
break;
case GT:
case GE:
case LT:
case LE:
case MAX:
case MIN:
scratch_cuda_integer_radix_difference_check_kb<uint64_t>(
stream, (int_comparison_buffer<uint64_t> **)mem_ptr,
lwe_ciphertext_count, params, op_type, allocate_gpu_memory);
break;
}
}
void cuda_comparison_integer_radix_ciphertext_kb_64(
cuda_stream_t *stream, void *lwe_array_out, void *lwe_array_1,
void *lwe_array_2, int8_t *mem_ptr, void *bsk, void *ksk,
uint32_t lwe_ciphertext_count) {
int_comparison_buffer<uint64_t> *buffer =
(int_comparison_buffer<uint64_t> *)mem_ptr;
switch (buffer->op) {
case EQ:
case NE:
host_integer_radix_equality_check_kb<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2), buffer, bsk,
static_cast<uint64_t *>(ksk), lwe_ciphertext_count);
break;
case GT:
case GE:
case LT:
case LE:
host_integer_radix_difference_check_kb<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2), buffer,
buffer->diff_buffer->operator_f, bsk, static_cast<uint64_t *>(ksk),
lwe_ciphertext_count);
break;
case MAX:
case MIN:
host_integer_radix_maxmin_kb<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t *>(lwe_array_1),
static_cast<uint64_t *>(lwe_array_2), buffer, bsk,
static_cast<uint64_t *>(ksk), lwe_ciphertext_count);
break;
default:
printf("Not implemented\n");
}
}
void cleanup_cuda_integer_comparison(cuda_stream_t *stream,
int8_t **mem_ptr_void) {
int_comparison_buffer<uint64_t> *mem_ptr =
(int_comparison_buffer<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(stream);
}
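The difference check in the comparison code that follows can also be modelled in cleartext. Following its comments, each block pair is first mapped to a sign (0 inferior, 1 equal, 2 superior), then `tree_sign_reduction` folds adjacent signs pairwise. The sketch below is hypothetical: it uses plain digits, skips the optional block packing when `carry_modulus == message_modulus`, assumes `pack_blocks` places the more significant block in the high two bits (factor 4), and assumes `block_selector_f` returns the more significant sign unless it is "equal". Unlike the GPU code, which stops at two blocks and fuses the last reduction with `sign_handler_f` into one LUT, the sketch reduces to a single sign and leaves the handler to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kInferior = 0, kEqual = 1, kSuperior = 2;

// Per-block sign, as produced by compare_radix_blocks_kb.
uint64_t block_sign(uint64_t lhs, uint64_t rhs) {
  return lhs < rhs ? kInferior : (lhs == rhs ? kEqual : kSuperior);
}

// Assumed selector: the more significant sign wins unless it is "equal".
uint64_t select_sign(uint64_t msb, uint64_t lsb) {
  return msb == kEqual ? lsb : msb;
}

// signs are least significant first and must be non-empty. Pairs are
// packed as (signs[2i+1] << 2) | signs[2i]; an unpaired last block passes
// through unchanged.
uint64_t tree_sign_reduction(std::vector<uint64_t> signs) {
  while (signs.size() > 1) {
    std::vector<uint64_t> next;
    for (std::size_t i = 0; i + 1 < signs.size(); i += 2) {
      uint64_t packed = (signs[i + 1] << 2) | signs[i];
      next.push_back(select_sign((packed >> 2) & 3, packed & 3));
    }
    if (signs.size() % 2 != 0)
      next.push_back(signs.back());
    signs = next;
  }
  return signs[0];
}

// Compares two num_blocks-digit radix values (num_blocks >= 1).
uint64_t compare(uint64_t a, uint64_t b, uint64_t message_modulus,
                 std::size_t num_blocks) {
  std::vector<uint64_t> signs;
  for (std::size_t i = 0; i < num_blocks; ++i) {
    signs.push_back(block_sign(a % message_modulus, b % message_modulus));
    a /= message_modulus;
    b /= message_modulus;
  }
  return tree_sign_reduction(signs);
}
```

A `GT` sign handler then corresponds to testing `compare(a, b, mm, n) == kSuperior`.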

@@ -0,0 +1,468 @@
#ifndef CUDA_INTEGER_COMPARISON_OPS_CUH
#define CUDA_INTEGER_COMPARISON_OPS_CUH
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "integer.cuh"
#include "integer.h"
#include "integer/cmux.cuh"
#include "integer/negation.cuh"
#include "integer/scalar_addition.cuh"
#include "pbs/bootstrap_low_latency.cuh"
#include "pbs/bootstrap_multibit.cuh"
#include "types/complex/operations.cuh"
#include "utils/kernel_dimensions.cuh"
// lwe_dimension + 1 threads
// todo: This kernel MUST be refactored to a binary reduction
template <typename Torus>
__global__ void device_accumulate_all_blocks(Torus *output, Torus *input_block,
uint32_t lwe_dimension,
uint32_t num_blocks) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx < lwe_dimension + 1) {
auto block = &input_block[idx];
Torus sum = block[0];
for (int i = 1; i < num_blocks; i++) {
sum += block[i * (lwe_dimension + 1)];
}
output[idx] = sum;
}
}
template <typename Torus>
__host__ void accumulate_all_blocks(cuda_stream_t *stream, Torus *output,
Torus *input, uint32_t lwe_dimension,
uint32_t num_radix_blocks) {
int num_blocks = 0, num_threads = 0;
int num_entries = (lwe_dimension + 1);
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
// Add all blocks and store in sum
device_accumulate_all_blocks<<<num_blocks, num_threads, 0, stream->stream>>>(
output, input, lwe_dimension, num_radix_blocks);
check_cuda_error(cudaGetLastError());
}
template <typename Torus>
__host__ void
are_all_comparisons_block_true(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_in,
int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
auto message_modulus = params.message_modulus;
auto carry_modulus = params.carry_modulus;
auto are_all_block_true_buffer =
mem_ptr->eq_buffer->are_all_block_true_buffer;
uint32_t total_modulus = message_modulus * carry_modulus;
uint32_t max_value = total_modulus - 1;
cuda_memcpy_async_gpu_to_gpu(
lwe_array_out, lwe_array_in,
num_radix_blocks * (big_lwe_dimension + 1) * sizeof(Torus), stream);
int lut_num_blocks = 0;
uint32_t remaining_blocks = num_radix_blocks;
while (remaining_blocks > 1) {
// Split in max_value chunks
uint32_t chunk_length = std::min(max_value, remaining_blocks);
int num_chunks = remaining_blocks / chunk_length;
// Since all blocks encrypt either 0 or 1, we can sum max_value of them
// as in the worst case we will be adding `max_value` ones
auto input_blocks = lwe_array_out;
auto accumulator = are_all_block_true_buffer->tmp_block_accumulated;
for (int i = 0; i < num_chunks; i++) {
accumulate_all_blocks(stream, accumulator, input_blocks,
big_lwe_dimension, chunk_length);
accumulator += (big_lwe_dimension + 1);
remaining_blocks -= (chunk_length - 1);
input_blocks += (big_lwe_dimension + 1) * chunk_length;
}
accumulator = are_all_block_true_buffer->tmp_block_accumulated;
// Selects a LUT
int_radix_lut<Torus> *lut;
if (are_all_block_true_buffer->op == COMPARISON_TYPE::NE) {
// is_non_zero_lut_buffer LUT
lut = mem_ptr->eq_buffer->is_non_zero_lut;
} else if (chunk_length == max_value) {
// is_max_value LUT
lut = are_all_block_true_buffer->is_max_value_lut;
} else {
// is_equal_to_num_blocks LUT
lut = are_all_block_true_buffer->is_equal_to_num_blocks_lut;
if (chunk_length != lut_num_blocks) {
auto is_equal_to_num_blocks_lut_f = [max_value,
chunk_length](Torus x) -> Torus {
return (x & max_value) == chunk_length;
};
generate_device_accumulator<Torus>(
stream, lut->lut, glwe_dimension, polynomial_size, message_modulus,
carry_modulus, is_equal_to_num_blocks_lut_f);
// We don't have to generate this lut again
lut_num_blocks = chunk_length;
}
}
// Applies the LUT
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, lwe_array_out, accumulator, bsk, ksk, num_chunks, lut);
}
}
// This takes an input slice of blocks.
//
// Each block can encrypt any value as long as it is < message_modulus.
//
// It compares the blocks with 0, for either equality or difference.
//
// This returns a Vec of blocks, where each block encrypts 1 or 0
// depending on whether all blocks matched the comparison type with 0.
//
// E.g. for ZeroComparisonType::Equality, if all input blocks are zero,
// then all returned blocks will encrypt 1.
//
// The returned Vec will have fewer blocks than the number of input blocks.
// The returned blocks potentially need to be 'reduced' to one block,
// e.g. with are_all_comparisons_block_true.
//
// This function exists because it is sometimes faster to concatenate
// multiple vecs of 'boolean' shortint blocks before reducing them with
// are_all_comparisons_block_true.
template <typename Torus>
__host__ void host_compare_with_zero_equality(
cuda_stream_t *stream, Torus *lwe_array_out, Torus *lwe_array_in,
int_comparison_buffer<Torus> *mem_ptr, void *bsk, Torus *ksk,
int32_t num_radix_blocks) {
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
auto carry_modulus = params.carry_modulus;
// The idea is that we will sum chunks of blocks until carries are full
// then we compare the sum with 0.
//
// If all blocks were 0, the sum will be zero.
// If at least one block was not zero, the sum won't be zero.
uint32_t total_modulus = message_modulus * carry_modulus;
uint32_t message_max = message_modulus - 1;
uint32_t num_elements_to_fill_carry = (total_modulus - 1) / message_max;
size_t big_lwe_size = big_lwe_dimension + 1;
size_t big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
int num_sum_blocks = 0;
// Accumulator
auto sum = lwe_array_out;
if (num_radix_blocks == 1) {
// Just copy
cuda_memcpy_async_gpu_to_gpu(sum, lwe_array_in, big_lwe_size_bytes, stream);
num_sum_blocks = 1;
} else {
uint32_t remainder_blocks = num_radix_blocks;
auto sum_i = sum;
auto chunk = lwe_array_in;
while (remainder_blocks > 1) {
uint32_t chunk_size =
std::min(remainder_blocks, num_elements_to_fill_carry);
accumulate_all_blocks(stream, sum_i, chunk, big_lwe_dimension,
chunk_size);
num_sum_blocks++;
remainder_blocks -= (chunk_size - 1);
// Update operands
chunk += chunk_size * big_lwe_size;
sum_i += big_lwe_size;
}
}
auto is_equal_to_zero_lut = mem_ptr->diff_buffer->is_zero_lut;
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, sum, sum, bsk, ksk, num_sum_blocks, is_equal_to_zero_lut);
are_all_comparisons_block_true(stream, lwe_array_out, sum, mem_ptr, bsk, ksk,
num_sum_blocks);
// The result will be in the first block. Everything else is garbage.
cuda_memset_async(lwe_array_out + big_lwe_size, 0,
big_lwe_size_bytes * (num_radix_blocks - 1), stream);
}
template <typename Torus>
__host__ void host_integer_radix_equality_check_kb(
cuda_stream_t *stream, Torus *lwe_array_out, Torus *lwe_array_1,
Torus *lwe_array_2, int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto eq_buffer = mem_ptr->eq_buffer;
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
// Applies the LUT for the comparison operation
auto comparisons = mem_ptr->tmp_block_comparisons;
integer_radix_apply_bivariate_lookup_table_kb(
stream, comparisons, lwe_array_1, lwe_array_2, bsk, ksk, num_radix_blocks,
eq_buffer->operator_lut);
// This takes a Vec of blocks, where each block is either 0 or 1.
//
// It returns a block encrypting 1 if all input blocks are 1,
// otherwise the block encrypts 0
are_all_comparisons_block_true(stream, lwe_array_out, comparisons, mem_ptr,
bsk, ksk, num_radix_blocks);
// Zero all blocks but the first
size_t big_lwe_size = big_lwe_dimension + 1;
size_t big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
cuda_memset_async(lwe_array_out + big_lwe_size, 0,
big_lwe_size_bytes * (num_radix_blocks - 1), stream);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_equality_check_kb(
cuda_stream_t *stream, int_comparison_buffer<Torus> **mem_ptr,
uint32_t num_radix_blocks, int_radix_params params, COMPARISON_TYPE op,
bool allocate_gpu_memory) {
*mem_ptr = new int_comparison_buffer<Torus>(
stream, op, params, num_radix_blocks, allocate_gpu_memory);
}
template <typename Torus>
__host__ void
compare_radix_blocks_kb(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_left, Torus *lwe_array_right,
int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
auto carry_modulus = params.carry_modulus;
// When rhs > lhs, the subtraction will overflow and the padding bit will be
// set to 1, meaning that the output of the PBS will be negated (modulo the
// message space).
//
// Example:
// lhs: 1, rhs: 3, message modulus: 4, carry modulus: 4
// lhs - rhs = -2 % (4 * 4) = 14 = 1|1110 (padding_bit|b4b3b2b1)
// Since there was an overflow, the padding bit is 1 and not 0.
// When applying the LUT to an input value of 14 we would expect 1,
// but since the padding bit is 1, we get -1 modulo our message space,
// so (-1) % (4 * 4) = 15 = 1|1111. We then add one and get 0 = 0|0000.
// Subtract
// Here we need the true lwe sub, not the one that comes from shortint.
host_subtraction(stream, lwe_array_out, lwe_array_left, lwe_array_right,
big_lwe_dimension, num_radix_blocks);
// Apply LUT to compare to 0
auto is_non_zero_lut = mem_ptr->eq_buffer->is_non_zero_lut;
integer_radix_apply_univariate_lookup_table_kb(
stream, lwe_array_out, lwe_array_out, bsk, ksk, num_radix_blocks,
is_non_zero_lut);
// Add one
// Here lhs can take the following values: (-1) % (message modulus * carry
// modulus), 0, 1, so the output values after the addition will be 0, 1, 2
host_integer_radix_add_scalar_one_inplace(stream, lwe_array_out,
big_lwe_dimension, num_radix_blocks,
message_modulus, carry_modulus);
}
// Reduces a vec of shortint blocks that each encrypt a sign
// (inferior, equal, superior) to a single shortint block containing the
// final sign
template <typename Torus>
__host__ void
tree_sign_reduction(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_block_comparisons,
int_tree_sign_reduction_buffer<Torus> *tree_buffer,
std::function<Torus(Torus)> sign_handler_f, void *bsk,
Torus *ksk, uint32_t num_radix_blocks) {
auto params = tree_buffer->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
auto message_modulus = params.message_modulus;
auto carry_modulus = params.carry_modulus;
// Tree reduction
size_t big_lwe_size = big_lwe_dimension + 1;
size_t big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
auto x = tree_buffer->tmp_x;
auto y = tree_buffer->tmp_y;
if (x != lwe_block_comparisons)
cuda_memcpy_async_gpu_to_gpu(x, lwe_block_comparisons,
big_lwe_size_bytes * num_radix_blocks, stream);
uint32_t partial_block_count = num_radix_blocks;
auto inner_tree_leaf = tree_buffer->tree_inner_leaf_lut;
while (partial_block_count > 2) {
pack_blocks(stream, y, x, big_lwe_dimension, partial_block_count, 4);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, x, y, bsk, ksk, partial_block_count >> 1, inner_tree_leaf);
if ((partial_block_count % 2) != 0) {
partial_block_count >>= 1;
partial_block_count++;
auto last_y_block = y + (partial_block_count - 1) * big_lwe_size;
auto last_x_block = x + (partial_block_count - 1) * big_lwe_size;
cuda_memcpy_async_gpu_to_gpu(last_x_block, last_y_block,
big_lwe_size_bytes, stream);
} else {
partial_block_count >>= 1;
}
}
auto last_lut = tree_buffer->tree_last_leaf_lut;
auto block_selector_f = tree_buffer->block_selector_f;
std::function<Torus(Torus)> f;
if (partial_block_count == 2) {
pack_blocks(stream, y, x, big_lwe_dimension, partial_block_count, 4);
f = [block_selector_f, sign_handler_f](Torus x) -> Torus {
int msb = (x >> 2) & 3;
int lsb = x & 3;
int final_sign = block_selector_f(msb, lsb);
return sign_handler_f(final_sign);
};
} else {
// partial_block_count == 1
y = x;
f = sign_handler_f;
}
generate_device_accumulator<Torus>(stream, last_lut->lut, glwe_dimension,
polynomial_size, message_modulus,
carry_modulus, f);
// Last leaf
integer_radix_apply_univariate_lookup_table_kb(stream, lwe_array_out, y, bsk,
ksk, 1, last_lut);
}
template <typename Torus>
__host__ void host_integer_radix_difference_check_kb(
cuda_stream_t *stream, Torus *lwe_array_out, Torus *lwe_array_left,
Torus *lwe_array_right, int_comparison_buffer<Torus> *mem_ptr,
std::function<Torus(Torus)> reduction_lut_f, void *bsk, Torus *ksk,
uint32_t total_num_radix_blocks) {
auto diff_buffer = mem_ptr->diff_buffer;
auto params = mem_ptr->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
auto carry_modulus = params.carry_modulus;
uint32_t num_radix_blocks = total_num_radix_blocks;
auto lhs = lwe_array_left;
auto rhs = lwe_array_right;
if (carry_modulus == message_modulus) {
// Packing is possible
// Pack inputs
Torus *packed_left = diff_buffer->tmp_packed_left;
Torus *packed_right = diff_buffer->tmp_packed_right;
pack_blocks(stream, packed_left, lwe_array_left, big_lwe_dimension,
num_radix_blocks, message_modulus);
pack_blocks(stream, packed_right, lwe_array_right, big_lwe_dimension,
num_radix_blocks, message_modulus);
// From this point we have half number of blocks
num_radix_blocks /= 2;
// Clean noise
auto cleaning_lut = mem_ptr->cleaning_lut;
integer_radix_apply_univariate_lookup_table_kb(
stream, packed_left, packed_left, bsk, ksk, num_radix_blocks,
cleaning_lut);
integer_radix_apply_univariate_lookup_table_kb(
stream, packed_right, packed_right, bsk, ksk, num_radix_blocks,
cleaning_lut);
lhs = packed_left;
rhs = packed_right;
}
// comparisons will be assigned
// - 0 if lhs < rhs
// - 1 if lhs == rhs
// - 2 if lhs > rhs
auto comparisons = mem_ptr->tmp_block_comparisons;
compare_radix_blocks_kb(stream, comparisons, lhs, rhs, mem_ptr, bsk, ksk,
num_radix_blocks);
// Reduces a vector of radix blocks, each encrypting a sign
// (inferior, equal, superior), to a single radix block encrypting the
// final sign
tree_sign_reduction(stream, lwe_array_out, comparisons,
mem_ptr->diff_buffer->tree_buffer, reduction_lut_f, bsk,
ksk, num_radix_blocks);
// The result will be in the first block. Everything else is garbage.
size_t big_lwe_size = big_lwe_dimension + 1;
size_t big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
cuda_memset_async(lwe_array_out + big_lwe_size, 0,
(total_num_radix_blocks - 1) * big_lwe_size_bytes, stream);
}
template <typename Torus>
__host__ void scratch_cuda_integer_radix_difference_check_kb(
cuda_stream_t *stream, int_comparison_buffer<Torus> **mem_ptr,
uint32_t num_radix_blocks, int_radix_params params, COMPARISON_TYPE op,
bool allocate_gpu_memory) {
*mem_ptr = new int_comparison_buffer<Torus>(
stream, op, params, num_radix_blocks, allocate_gpu_memory);
}
template <typename Torus>
__host__ void
host_integer_radix_maxmin_kb(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_left, Torus *lwe_array_right,
int_comparison_buffer<Torus> *mem_ptr, void *bsk,
Torus *ksk, uint32_t total_num_radix_blocks) {
// Compute the sign
host_integer_radix_difference_check_kb(
stream, mem_ptr->tmp_lwe_array_out, lwe_array_left, lwe_array_right,
mem_ptr, mem_ptr->cleaning_lut_f, bsk, ksk, total_num_radix_blocks);
// Selector
host_integer_radix_cmux_kb(
stream, lwe_array_out, mem_ptr->tmp_lwe_array_out, lwe_array_left,
lwe_array_right, mem_ptr->cmux_buffer, bsk, ksk, total_num_radix_blocks);
}
#endif
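The comparison path above can be modeled in plaintext to see how the per-block signs collapse into a single sign. The sketch below is a host-only C++ model, not part of this patch. It assumes the sign encoding from the comment in `host_integer_radix_difference_check_kb` (0 = inferior, 1 = equal, 2 = superior), little-endian blocks, and an inner-leaf/selector rule of "the more significant block decides unless it reports equal". That rule is not visible in this hunk, so `select_sign`, `compare_blocks` and `tree_sign_reduction_plain` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Assumed sign encoding (from the comment in host_integer_radix_difference_check_kb).
constexpr uint64_t IS_INFERIOR = 0;
constexpr uint64_t IS_EQUAL = 1;
constexpr uint64_t IS_SUPERIOR = 2;

// Hypothetical stand-in for block_selector_f / the inner-leaf LUT:
// the more significant block decides unless it reports "equal".
uint64_t select_sign(uint64_t msb, uint64_t lsb) {
  return msb == IS_EQUAL ? lsb : msb;
}

// Per-block comparison, like compare_radix_blocks_kb but on plaintext blocks.
// Blocks are little-endian: index 0 is the least significant block.
std::vector<uint64_t> compare_blocks(const std::vector<uint64_t> &lhs,
                                     const std::vector<uint64_t> &rhs) {
  assert(lhs.size() == rhs.size());
  std::vector<uint64_t> signs(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); i++) {
    if (lhs[i] < rhs[i])
      signs[i] = IS_INFERIOR;
    else if (lhs[i] == rhs[i])
      signs[i] = IS_EQUAL;
    else
      signs[i] = IS_SUPERIOR;
  }
  return signs;
}

// Pairwise reduction in the spirit of tree_sign_reduction: pack each pair as
// lsb + 4 * msb (pack_blocks with factor 4), unpack and select, and carry an
// odd trailing block over unchanged. sign_handler plays the role of
// sign_handler_f on the final sign.
uint64_t tree_sign_reduction_plain(
    std::vector<uint64_t> signs,
    const std::function<uint64_t(uint64_t)> &sign_handler) {
  assert(!signs.empty());
  while (signs.size() > 1) {
    std::vector<uint64_t> next;
    for (std::size_t i = 0; i + 1 < signs.size(); i += 2) {
      uint64_t packed = signs[i] + 4 * signs[i + 1];
      next.push_back(select_sign((packed >> 2) & 3, packed & 3));
    }
    if (signs.size() % 2 != 0)
      next.push_back(signs.back());
    signs = std::move(next);
  }
  return sign_handler(signs[0]);
}
```

Packing with a factor of 4 works because each sign fits in 2 bits, so `(x >> 2) & 3` and `x & 3` recover the two halves, exactly as the last-leaf lambda in `tree_sign_reduction` does.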

@@ -0,0 +1,127 @@
#include "integer/integer.cuh"
#include <linear_algebra.h>
void cuda_full_propagation_64_inplace(
cuda_stream_t *stream, void *input_blocks, int8_t *mem_ptr, void *ksk,
void *bsk, uint32_t lwe_dimension, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t ks_base_log, uint32_t ks_level,
uint32_t pbs_base_log, uint32_t pbs_level, uint32_t grouping_factor,
uint32_t num_blocks) {
switch (polynomial_size) {
case 256:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<256>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 512:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<512>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 1024:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<1024>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 2048:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<2048>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 4096:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<4096>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 8192:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<8192>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
case 16384:
host_full_propagate_inplace<uint64_t, int64_t, AmortizedDegree<16384>>(
stream, static_cast<uint64_t *>(input_blocks),
(int_fullprop_buffer<uint64_t> *)mem_ptr, static_cast<uint64_t *>(ksk),
bsk, lwe_dimension, glwe_dimension, polynomial_size, ks_base_log,
ks_level, pbs_base_log, pbs_level, grouping_factor, num_blocks);
break;
default:
break;
}
}
void scratch_cuda_full_propagation_64(
cuda_stream_t *stream, int8_t **mem_ptr, uint32_t lwe_dimension,
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t grouping_factor, uint32_t input_lwe_ciphertext_count,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
bool allocate_gpu_memory) {
scratch_cuda_full_propagation<uint64_t>(
stream, (int_fullprop_buffer<uint64_t> **)mem_ptr, lwe_dimension,
glwe_dimension, polynomial_size, level_count, grouping_factor,
input_lwe_ciphertext_count, message_modulus, carry_modulus, pbs_type,
allocate_gpu_memory);
}
void cleanup_cuda_full_propagation(cuda_stream_t *stream,
int8_t **mem_ptr_void) {
int_fullprop_buffer<uint64_t> *mem_ptr =
(int_fullprop_buffer<uint64_t> *)(*mem_ptr_void);
cuda_drop_async(mem_ptr->lut_buffer, stream);
cuda_drop_async(mem_ptr->lut_indexes, stream);
cuda_drop_async(mem_ptr->lwe_indexes, stream);
cuda_drop_async(mem_ptr->pbs_buffer, stream);
cuda_drop_async(mem_ptr->tmp_small_lwe_vector, stream);
cuda_drop_async(mem_ptr->tmp_big_lwe_vector, stream);
}
void scratch_cuda_propagate_single_carry_low_latency_kb_64_inplace(
cuda_stream_t *stream, int8_t **mem_ptr, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t big_lwe_dimension,
uint32_t small_lwe_dimension, uint32_t ks_level, uint32_t ks_base_log,
uint32_t pbs_level, uint32_t pbs_base_log, uint32_t grouping_factor,
uint32_t num_blocks, uint32_t message_modulus, uint32_t carry_modulus,
PBS_TYPE pbs_type, bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
big_lwe_dimension, small_lwe_dimension, ks_level,
ks_base_log, pbs_level, pbs_base_log, grouping_factor,
message_modulus, carry_modulus);
scratch_cuda_propagate_single_carry_low_latency_kb_inplace(
stream, (int_sc_prop_memory<uint64_t> **)mem_ptr, num_blocks, params,
allocate_gpu_memory);
}
void cuda_propagate_single_carry_low_latency_kb_64_inplace(
cuda_stream_t *stream, void *lwe_array, int8_t *mem_ptr, void *bsk,
void *ksk, uint32_t num_blocks) {
host_propagate_single_carry_low_latency<uint64_t>(
stream, static_cast<uint64_t *>(lwe_array),
(int_sc_prop_memory<uint64_t> *)mem_ptr, bsk,
static_cast<uint64_t *>(ksk), num_blocks);
}
void cleanup_cuda_propagate_single_carry_low_latency(cuda_stream_t *stream,
int8_t **mem_ptr_void) {
int_sc_prop_memory<uint64_t> *mem_ptr =
(int_sc_prop_memory<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(stream);
}
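`host_full_propagate_inplace` walks the blocks sequentially: a keyswitch plus a two-LUT PBS splits each block into its message (`x % message_modulus`) and carry (`x / message_modulus`), and the carry is added to the next block. A plaintext sketch of that loop follows, assuming little-endian blocks and ignoring noise as well as the `message_modulus * carry_modulus` bound that the encrypted version relies on; `full_propagate_plain` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plaintext model of host_full_propagate_inplace.
// Each block holds message + message_modulus * carry; the message LUT keeps
// x % message_modulus and the carry LUT's x / message_modulus is added to the
// next block. Blocks are little-endian (index 0 is least significant).
void full_propagate_plain(std::vector<uint64_t> &blocks,
                          uint64_t message_modulus) {
  assert(message_modulus > 1);
  for (std::size_t i = 0; i < blocks.size(); i++) {
    uint64_t message = blocks[i] % message_modulus; // lut_f_message
    uint64_t carry = blocks[i] / message_modulus;   // lut_f_carry
    blocks[i] = message;
    if (i + 1 < blocks.size())
      blocks[i + 1] += carry; // host_addition on the next block
    // As in the CUDA loop, the carry out of the last block is discarded.
  }
}
```

Because each block depends on the carry of the previous one, this loop needs `num_blocks` sequential PBS rounds; the Hillis-Steele variant in `host_propagate_single_carry_low_latency` runs `ceil(log2(num_blocks))` scan steps instead.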

@@ -0,0 +1,677 @@
#ifndef CUDA_INTEGER_CUH
#define CUDA_INTEGER_CUH
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "integer.h"
#include "integer/scalar_addition.cuh"
#include "linear_algebra.h"
#include "linearalgebra/addition.cuh"
#include "pbs/bootstrap_low_latency.cuh"
#include "pbs/bootstrap_multibit.cuh"
#include "polynomial/functions.cuh"
#include "utils/kernel_dimensions.cuh"
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstring>
#include <functional>
template <typename Torus>
void execute_pbs(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_output_indexes, Torus *lut_vector,
Torus *lut_vector_indexes, Torus *lwe_array_in,
Torus *lwe_input_indexes, void *bootstrapping_key,
int8_t *pbs_buffer, uint32_t glwe_dimension,
uint32_t lwe_dimension, uint32_t polynomial_size,
uint32_t base_log, uint32_t level_count,
uint32_t grouping_factor, uint32_t input_lwe_ciphertext_count,
uint32_t num_luts, uint32_t lwe_idx,
uint32_t max_shared_memory, PBS_TYPE pbs_type) {
if (sizeof(Torus) == sizeof(uint32_t)) {
// 32 bits
switch (pbs_type) {
case MULTI_BIT:
printf("Error: 32-bit multibit PBS is not supported.\n");
break;
case LOW_LAT:
cuda_bootstrap_low_latency_lwe_ciphertext_vector_32(
stream, lwe_array_out, lwe_output_indexes, lut_vector,
lut_vector_indexes, lwe_array_in, lwe_input_indexes,
bootstrapping_key, pbs_buffer, lwe_dimension, glwe_dimension,
polynomial_size, base_log, level_count, input_lwe_ciphertext_count,
num_luts, lwe_idx, max_shared_memory);
break;
case AMORTIZED:
cuda_bootstrap_amortized_lwe_ciphertext_vector_32(
stream, lwe_array_out, lwe_output_indexes, lut_vector,
lut_vector_indexes, lwe_array_in, lwe_input_indexes,
bootstrapping_key, pbs_buffer, lwe_dimension, glwe_dimension,
polynomial_size, base_log, level_count, input_lwe_ciphertext_count,
num_luts, lwe_idx, max_shared_memory);
break;
default:
break;
}
} else {
// 64 bits
switch (pbs_type) {
case MULTI_BIT:
cuda_multi_bit_pbs_lwe_ciphertext_vector_64(
stream, lwe_array_out, lwe_output_indexes, lut_vector,
lut_vector_indexes, lwe_array_in, lwe_input_indexes,
bootstrapping_key, pbs_buffer, lwe_dimension, glwe_dimension,
polynomial_size, grouping_factor, base_log, level_count,
input_lwe_ciphertext_count, num_luts, lwe_idx,
max_shared_memory);
break;
case LOW_LAT:
cuda_bootstrap_low_latency_lwe_ciphertext_vector_64(
stream, lwe_array_out, lwe_output_indexes, lut_vector,
lut_vector_indexes, lwe_array_in, lwe_input_indexes,
bootstrapping_key, pbs_buffer, lwe_dimension, glwe_dimension,
polynomial_size, base_log, level_count, input_lwe_ciphertext_count,
num_luts, lwe_idx, max_shared_memory);
break;
case AMORTIZED:
cuda_bootstrap_amortized_lwe_ciphertext_vector_64(
stream, lwe_array_out, lwe_output_indexes, lut_vector,
lut_vector_indexes, lwe_array_in, lwe_input_indexes,
bootstrapping_key, pbs_buffer, lwe_dimension, glwe_dimension,
polynomial_size, base_log, level_count, input_lwe_ciphertext_count,
num_luts, lwe_idx, max_shared_memory);
break;
default:
break;
}
}
}
// Rotates the blocks of a radix ciphertext to the right by `value` positions.
// The grid is one-dimensional: blockIdx.x selects the radix block to move.
template <typename Torus>
__global__ void radix_blocks_rotate_right(Torus *dst, Torus *src,
uint32_t value, uint32_t blocks_count,
uint32_t lwe_size) {
value %= blocks_count;
size_t tid = threadIdx.x;
size_t src_block_id = blockIdx.x;
size_t dst_block_id = (src_block_id + value) % blocks_count;
size_t stride = blockDim.x;
auto cur_src_block = &src[src_block_id * lwe_size];
auto cur_dst_block = &dst[dst_block_id * lwe_size];
for (size_t i = tid; i < lwe_size; i += stride) {
cur_dst_block[i] = cur_src_block[i];
}
}
// Rotates the blocks of a radix ciphertext to the left by `value` positions.
// The grid is one-dimensional: blockIdx.x selects the radix block to move.
template <typename Torus>
__global__ void radix_blocks_rotate_left(Torus *dst, Torus *src, uint32_t value,
uint32_t blocks_count,
uint32_t lwe_size) {
value %= blocks_count;
size_t src_block_id = blockIdx.x;
size_t tid = threadIdx.x;
size_t dst_block_id = (src_block_id >= value)
? src_block_id - value
: src_block_id + blocks_count - value;
size_t stride = blockDim.x;
auto cur_src_block = &src[src_block_id * lwe_size];
auto cur_dst_block = &dst[dst_block_id * lwe_size];
for (size_t i = tid; i < lwe_size; i += stride) {
cur_dst_block[i] = cur_src_block[i];
}
}
// One thread per coefficient: num_blocks * (lwe_dimension + 1) threads in total
template <typename Torus>
__global__ void
device_pack_bivariate_blocks(Torus *lwe_array_out, Torus *lwe_array_1,
Torus *lwe_array_2, Torus *lwe_indexes,
uint32_t lwe_dimension, uint32_t message_modulus,
uint32_t num_blocks) {
int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid < num_blocks * (lwe_dimension + 1)) {
int block_id = tid / (lwe_dimension + 1);
int coeff_id = tid % (lwe_dimension + 1);
int pos = lwe_indexes[block_id] * (lwe_dimension + 1) + coeff_id;
lwe_array_out[pos] = lwe_array_1[pos] * message_modulus + lwe_array_2[pos];
}
}
template <typename Torus>
__host__ void pack_bivariate_blocks(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_1, Torus *lwe_array_2,
Torus *lwe_indexes, uint32_t lwe_dimension,
uint32_t message_modulus,
uint32_t num_radix_blocks) {
// Left message is shifted
int num_blocks = 0, num_threads = 0;
int num_entries = num_radix_blocks * (lwe_dimension + 1);
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
device_pack_bivariate_blocks<<<num_blocks, num_threads, 0, stream->stream>>>(
lwe_array_out, lwe_array_1, lwe_array_2, lwe_indexes, lwe_dimension,
message_modulus, num_radix_blocks);
check_cuda_error(cudaGetLastError());
}
template <typename Torus>
__host__ void integer_radix_apply_univariate_lookup_table_kb(
cuda_stream_t *stream, Torus *lwe_array_out, Torus *lwe_array_in, void *bsk,
Torus *ksk, uint32_t num_radix_blocks, int_radix_lut<Torus> *lut) {
// apply_lookup_table
auto params = lut->params;
auto pbs_type = params.pbs_type;
auto big_lwe_dimension = params.big_lwe_dimension;
auto small_lwe_dimension = params.small_lwe_dimension;
auto ks_level = params.ks_level;
auto ks_base_log = params.ks_base_log;
auto pbs_level = params.pbs_level;
auto pbs_base_log = params.pbs_base_log;
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
auto grouping_factor = params.grouping_factor;
// Compute Keyswitch-PBS
cuda_keyswitch_lwe_ciphertext_vector(
stream, lut->tmp_lwe_after_ks, lut->lwe_indexes, lwe_array_in,
lut->lwe_indexes, ksk, big_lwe_dimension, small_lwe_dimension,
ks_base_log, ks_level, num_radix_blocks);
execute_pbs(stream, lwe_array_out, lut->lwe_indexes, lut->lut,
lut->lut_indexes, lut->tmp_lwe_after_ks, lut->lwe_indexes, bsk,
lut->pbs_buffer, glwe_dimension, small_lwe_dimension,
polynomial_size, pbs_base_log, pbs_level, grouping_factor,
num_radix_blocks, 1, 0,
cuda_get_max_shared_memory(stream->gpu_index), pbs_type);
}
template <typename Torus>
__host__ void integer_radix_apply_bivariate_lookup_table_kb(
cuda_stream_t *stream, Torus *lwe_array_out, Torus *lwe_array_1,
Torus *lwe_array_2, void *bsk, Torus *ksk, uint32_t num_radix_blocks,
int_radix_lut<Torus> *lut) {
// apply_lookup_table_bivariate
auto params = lut->params;
auto big_lwe_dimension = params.big_lwe_dimension;
auto message_modulus = params.message_modulus;
// Left message is shifted
pack_bivariate_blocks(stream, lut->tmp_lwe_before_ks, lwe_array_1,
lwe_array_2, lut->lwe_indexes, big_lwe_dimension,
message_modulus, num_radix_blocks);
check_cuda_error(cudaGetLastError());
// Apply LUT
integer_radix_apply_univariate_lookup_table_kb(stream, lwe_array_out,
lut->tmp_lwe_before_ks, bsk,
ksk, num_radix_blocks, lut);
}
// Rotates the slice in-place such that the first mid elements of the slice move
// to the end while the last array_length - mid elements move to the front.
// After calling rotate_left, the element previously at index mid becomes the
// first element in the slice.
template <typename Torus>
void rotate_left(Torus *buffer, int mid, uint32_t array_length) {
mid = mid % array_length;
std::rotate(buffer, buffer + mid, buffer + array_length);
}
template <typename Torus>
void generate_lookup_table(Torus *acc, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t message_modulus,
uint32_t carry_modulus,
std::function<Torus(Torus)> f) {
uint32_t modulus_sup = message_modulus * carry_modulus;
uint32_t box_size = polynomial_size / modulus_sup;
Torus delta = ((uint64_t)1 << 63) / modulus_sup;
memset(acc, 0, glwe_dimension * polynomial_size * sizeof(Torus));
auto body = &acc[glwe_dimension * polynomial_size];
// Fill the body box by box: the box_size coefficients of box i all hold
// f(i) * delta
for (int i = 0; i < modulus_sup; i++) {
int index = i * box_size;
auto f_eval = f(i);
for (int j = index; j < index + box_size; j++) {
body[j] = f_eval * delta;
}
}
int half_box_size = box_size / 2;
// Negate the first half_box_size coefficients
for (int i = 0; i < half_box_size; i++) {
body[i] = -body[i];
}
rotate_left(body, half_box_size, polynomial_size);
}
template <typename Torus>
void generate_lookup_table_bivariate(Torus *acc, uint32_t glwe_dimension,
uint32_t polynomial_size,
uint32_t message_modulus,
uint32_t carry_modulus,
std::function<Torus(Torus, Torus)> f) {
Torus factor_u64 = message_modulus;
auto wrapped_f = [factor_u64, message_modulus, f](Torus input) -> Torus {
Torus lhs = (input / factor_u64) % message_modulus;
Torus rhs = (input % factor_u64) % message_modulus;
return f(lhs, rhs);
};
generate_lookup_table<Torus>(acc, glwe_dimension, polynomial_size,
message_modulus, carry_modulus, wrapped_f);
}
/*
* Generates a bivariate accumulator and copies it to device memory
* stream - cuda stream
* acc_bivariate - device pointer for the bivariate accumulator
* ...
* f - wrapping function with two Torus inputs
*/
template <typename Torus>
void generate_device_accumulator_bivariate(
cuda_stream_t *stream, Torus *acc_bivariate, uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t message_modulus, uint32_t carry_modulus,
std::function<Torus(Torus, Torus)> f) {
// host lut
Torus *h_lut =
(Torus *)malloc((glwe_dimension + 1) * polynomial_size * sizeof(Torus));
// fill bivariate accumulator
generate_lookup_table_bivariate<Torus>(h_lut, glwe_dimension, polynomial_size,
message_modulus, carry_modulus, f);
// copy host lut to device
cuda_memcpy_async_to_gpu(
acc_bivariate, h_lut,
(glwe_dimension + 1) * polynomial_size * sizeof(Torus), stream);
cuda_synchronize_stream(stream);
free(h_lut);
}
/*
* Generates a univariate accumulator and copies it to device memory
* stream - cuda stream
* acc - device pointer for the accumulator
* ...
* f - evaluating function with one Torus input
*/
template <typename Torus>
void generate_device_accumulator(cuda_stream_t *stream, Torus *acc,
uint32_t glwe_dimension,
uint32_t polynomial_size,
uint32_t message_modulus,
uint32_t carry_modulus,
std::function<Torus(Torus)> f) {
// host lut
Torus *h_lut =
(Torus *)malloc((glwe_dimension + 1) * polynomial_size * sizeof(Torus));
// fill accumulator
generate_lookup_table<Torus>(h_lut, glwe_dimension, polynomial_size,
message_modulus, carry_modulus, f);
// copy host lut to device
cuda_memcpy_async_to_gpu(
acc, h_lut, (glwe_dimension + 1) * polynomial_size * sizeof(Torus),
stream);
cuda_synchronize_stream(stream);
free(h_lut);
}
template <typename Torus>
void scratch_cuda_propagate_single_carry_low_latency_kb_inplace(
cuda_stream_t *stream, int_sc_prop_memory<Torus> **mem_ptr,
uint32_t num_radix_blocks, int_radix_params params,
bool allocate_gpu_memory) {
*mem_ptr = new int_sc_prop_memory<Torus>(stream, params, num_radix_blocks,
allocate_gpu_memory);
}
template <typename Torus>
void host_propagate_single_carry_low_latency(cuda_stream_t *stream,
Torus *lwe_array,
int_sc_prop_memory<Torus> *mem,
void *bsk, Torus *ksk,
uint32_t num_blocks) {
auto params = mem->params;
auto glwe_dimension = params.glwe_dimension;
auto polynomial_size = params.polynomial_size;
auto message_modulus = params.message_modulus;
auto big_lwe_size = glwe_dimension * polynomial_size + 1;
auto big_lwe_size_bytes = big_lwe_size * sizeof(Torus);
auto generates_or_propagates = mem->generates_or_propagates;
auto step_output = mem->step_output;
auto luts_array = mem->luts_array;
auto luts_carry_propagation_sum = mem->luts_carry_propagation_sum;
auto message_acc = mem->message_acc;
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, generates_or_propagates, lwe_array, bsk, ksk, num_blocks,
luts_array);
// Compute the prefix sum with a Hillis-Steele scan
int num_steps = ceil(log2((double)num_blocks));
int space = 1;
cuda_memcpy_async_gpu_to_gpu(step_output, generates_or_propagates,
big_lwe_size_bytes * num_blocks, stream);
for (int step = 0; step < num_steps; step++) {
auto cur_blocks = &step_output[space * big_lwe_size];
auto prev_blocks = generates_or_propagates;
int cur_total_blocks = num_blocks - space;
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
stream, cur_blocks, cur_blocks, prev_blocks, bsk, ksk, cur_total_blocks,
luts_carry_propagation_sum);
cuda_memcpy_async_gpu_to_gpu(&generates_or_propagates[space * big_lwe_size],
cur_blocks,
big_lwe_size_bytes * cur_total_blocks, stream);
space *= 2;
}
radix_blocks_rotate_right<<<num_blocks, 256, 0, stream->stream>>>(
step_output, generates_or_propagates, 1, num_blocks, big_lwe_size);
check_cuda_error(cudaGetLastError());
cuda_memset_async(step_output, 0, big_lwe_size_bytes, stream);
host_addition(stream, lwe_array, lwe_array, step_output,
glwe_dimension * polynomial_size, num_blocks);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, lwe_array, lwe_array, bsk, ksk, num_blocks, message_acc);
}
/*
* input_blocks: input radix ciphertext; propagation happens in place
* mem_ptr->lut_buffer: two LUTs stored back to back, [message_acc, carry_acc]
* mem_ptr->lut_indexes: LUT indexes for message and carry, always {0, 1}
* mem_ptr->tmp_small_lwe_vector: keyswitch output, of
* size = 2 * (lwe_dimension + 1) * sizeof(Torus)
* mem_ptr->tmp_big_lwe_vector: PBS output, of
* size = 2 * (glwe_dimension * polynomial_size + 1) * sizeof(Torus)
*/
template <typename Torus, typename STorus, class params>
void host_full_propagate_inplace(cuda_stream_t *stream, Torus *input_blocks,
int_fullprop_buffer<Torus> *mem_ptr,
Torus *ksk, void *bsk, uint32_t lwe_dimension,
uint32_t glwe_dimension,
uint32_t polynomial_size, uint32_t ks_base_log,
uint32_t ks_level, uint32_t pbs_base_log,
uint32_t pbs_level, uint32_t grouping_factor,
uint32_t num_blocks) {
int big_lwe_size = (glwe_dimension * polynomial_size + 1);
int small_lwe_size = (lwe_dimension + 1);
for (int i = 0; i < num_blocks; i++) {
auto cur_input_block = &input_blocks[i * big_lwe_size];
cuda_keyswitch_lwe_ciphertext_vector<Torus>(
stream, mem_ptr->tmp_small_lwe_vector, mem_ptr->lwe_indexes,
cur_input_block, mem_ptr->lwe_indexes, ksk,
polynomial_size * glwe_dimension, lwe_dimension, ks_base_log, ks_level,
1);
cuda_memcpy_async_gpu_to_gpu(&mem_ptr->tmp_small_lwe_vector[small_lwe_size],
mem_ptr->tmp_small_lwe_vector,
small_lwe_size * sizeof(Torus), stream);
execute_pbs<Torus>(
stream, mem_ptr->tmp_big_lwe_vector, mem_ptr->lwe_indexes,
mem_ptr->lut_buffer, mem_ptr->lut_indexes,
mem_ptr->tmp_small_lwe_vector, mem_ptr->lwe_indexes, bsk,
mem_ptr->pbs_buffer, glwe_dimension, lwe_dimension, polynomial_size,
pbs_base_log, pbs_level, grouping_factor, 2, 2, 0,
cuda_get_max_shared_memory(stream->gpu_index), mem_ptr->pbs_type);
cuda_memcpy_async_gpu_to_gpu(cur_input_block, mem_ptr->tmp_big_lwe_vector,
big_lwe_size * sizeof(Torus), stream);
if (i < num_blocks - 1) {
auto next_input_block = &input_blocks[(i + 1) * big_lwe_size];
host_addition(stream, next_input_block, next_input_block,
&mem_ptr->tmp_big_lwe_vector[big_lwe_size],
glwe_dimension * polynomial_size, 1);
}
}
}
template <typename Torus>
void scratch_cuda_full_propagation(
cuda_stream_t *stream, int_fullprop_buffer<Torus> **mem_ptr,
uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
uint32_t pbs_level, uint32_t grouping_factor, uint32_t num_radix_blocks,
uint32_t message_modulus, uint32_t carry_modulus, PBS_TYPE pbs_type,
bool allocate_gpu_memory) {
// PBS
int8_t *pbs_buffer = nullptr;
if (pbs_type == MULTI_BIT) {
uint32_t lwe_chunk_size = get_average_lwe_chunk_size(
lwe_dimension, pbs_level, glwe_dimension, num_radix_blocks);
// Only 64 bits is supported
scratch_cuda_multi_bit_pbs_64(stream, &pbs_buffer, lwe_dimension,
glwe_dimension, polynomial_size, pbs_level,
grouping_factor, num_radix_blocks,
cuda_get_max_shared_memory(stream->gpu_index),
allocate_gpu_memory, lwe_chunk_size);
} else {
// Classic
// We only use low latency for classic mode
if (sizeof(Torus) == sizeof(uint32_t))
scratch_cuda_bootstrap_low_latency_32(
stream, &pbs_buffer, glwe_dimension, polynomial_size, pbs_level,
num_radix_blocks, cuda_get_max_shared_memory(stream->gpu_index),
allocate_gpu_memory);
else
scratch_cuda_bootstrap_low_latency_64(
stream, &pbs_buffer, glwe_dimension, polynomial_size, pbs_level,
num_radix_blocks, cuda_get_max_shared_memory(stream->gpu_index),
allocate_gpu_memory);
}
// LUT
Torus *lut_buffer = nullptr;
if (allocate_gpu_memory) {
// The LUTs are trivial GLWE encryptions: the mask is zero and only the
// body carries the table. Two full GLWEs (message, carry) are stored back
// to back
size_t lut_buffer_size =
2 * (glwe_dimension + 1) * polynomial_size * sizeof(Torus);
lut_buffer = (Torus *)cuda_malloc_async(lut_buffer_size, stream);
// LUTs
auto lut_f_message = [message_modulus](Torus x) -> Torus {
return x % message_modulus;
};
auto lut_f_carry = [message_modulus](Torus x) -> Torus {
return x / message_modulus;
};
// Message LUT first, carry LUT second, matching lut_indexes = {0, 1}
Torus *lut_buffer_message = lut_buffer;
Torus *lut_buffer_carry =
lut_buffer + (glwe_dimension + 1) * polynomial_size;
generate_device_accumulator<Torus>(
stream, lut_buffer_message, glwe_dimension, polynomial_size,
message_modulus, carry_modulus, lut_f_message);
generate_device_accumulator<Torus>(stream, lut_buffer_carry, glwe_dimension,
polynomial_size, message_modulus,
carry_modulus, lut_f_carry);
}
Torus *lut_indexes = nullptr;
if (allocate_gpu_memory) {
lut_indexes = (Torus *)cuda_malloc_async(2 * sizeof(Torus), stream);
Torus h_lut_indexes[2] = {0, 1};
cuda_memcpy_async_to_gpu(lut_indexes, h_lut_indexes, 2 * sizeof(Torus),
stream);
}
Torus *lwe_indexes = nullptr;
if (allocate_gpu_memory) {
size_t lwe_indexes_size = num_radix_blocks * sizeof(Torus);
lwe_indexes = (Torus *)cuda_malloc_async(lwe_indexes_size, stream);
Torus *h_lwe_indexes = (Torus *)malloc(lwe_indexes_size);
for (int i = 0; i < num_radix_blocks; i++)
h_lwe_indexes[i] = i;
cuda_memcpy_async_to_gpu(lwe_indexes, h_lwe_indexes, lwe_indexes_size,
stream);
cuda_synchronize_stream(stream);
free(h_lwe_indexes);
}
// Temporary arrays
Torus *small_lwe_vector = nullptr;
Torus *big_lwe_vector = nullptr;
if (allocate_gpu_memory) {
size_t small_vector_size = 2 * (lwe_dimension + 1) * sizeof(Torus);
size_t big_vector_size =
2 * (glwe_dimension * polynomial_size + 1) * sizeof(Torus);
small_lwe_vector = (Torus *)cuda_malloc_async(small_vector_size, stream);
big_lwe_vector = (Torus *)cuda_malloc_async(big_vector_size, stream);
}
*mem_ptr = new int_fullprop_buffer<Torus>;
(*mem_ptr)->pbs_type = pbs_type;
(*mem_ptr)->pbs_buffer = pbs_buffer;
(*mem_ptr)->lut_buffer = lut_buffer;
(*mem_ptr)->lut_indexes = lut_indexes;
(*mem_ptr)->lwe_indexes = lwe_indexes;
(*mem_ptr)->tmp_small_lwe_vector = small_lwe_vector;
(*mem_ptr)->tmp_big_lwe_vector = big_lwe_vector;
}
// One thread per coefficient ((lwe_dimension + 1) threads in total); each
// thread loops over the num_radix_blocks / 2 pairs of blocks
template <typename Torus>
__global__ void device_pack_blocks(Torus *lwe_array_out, Torus *lwe_array_in,
uint32_t lwe_dimension,
uint32_t num_radix_blocks, uint32_t factor) {
int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid < (lwe_dimension + 1)) {
for (int bid = 0; bid < (num_radix_blocks / 2); bid++) {
Torus *lsb_block = lwe_array_in + (2 * bid) * (lwe_dimension + 1);
Torus *msb_block = lsb_block + (lwe_dimension + 1);
Torus *packed_block = lwe_array_out + bid * (lwe_dimension + 1);
packed_block[tid] = lsb_block[tid] + factor * msb_block[tid];
}
if (num_radix_blocks % 2 != 0) {
// We couldn't pack the last block, so we just copy it
Torus *lsb_block =
lwe_array_in + (num_radix_blocks - 1) * (lwe_dimension + 1);
Torus *last_block =
lwe_array_out + (num_radix_blocks / 2) * (lwe_dimension + 1);
last_block[tid] = lsb_block[tid];
}
}
}
// Packs each pair of consecutive blocks into one: the low block stays in the
// message part and the high block, multiplied by factor, moves into the carry
// part. An odd trailing block is copied unchanged.
//
// This requires the block parameters to have enough room for two ciphertexts,
// so the carry modulus must be at least the message modulus.
//
// Expects the carry buffer to be empty
template <typename Torus>
__host__ void pack_blocks(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *lwe_array_in, uint32_t lwe_dimension,
uint32_t num_radix_blocks, uint32_t factor) {
assert(lwe_array_out != lwe_array_in);
int num_blocks = 0, num_threads = 0;
int num_entries = (lwe_dimension + 1);
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
device_pack_blocks<<<num_blocks, num_threads, 0, stream->stream>>>(
lwe_array_out, lwe_array_in, lwe_dimension, num_radix_blocks, factor);
check_cuda_error(cudaGetLastError());
}
template <typename Torus>
__global__ void
device_create_trivial_radix(Torus *lwe_array, Torus *scalar_input,
int32_t num_blocks, uint32_t lwe_dimension,
uint64_t delta) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
if (tid < num_blocks) {
Torus scalar = scalar_input[tid];
Torus *body = lwe_array + tid * (lwe_dimension + 1) + lwe_dimension;
*body = scalar * delta;
}
}
template <typename Torus>
__host__ void
create_trivial_radix(cuda_stream_t *stream, Torus *lwe_array_out,
Torus *scalar_array, uint32_t lwe_dimension,
uint32_t num_radix_blocks, uint32_t num_scalar_blocks,
uint64_t message_modulus, uint64_t carry_modulus) {
size_t radix_size = (lwe_dimension + 1) * num_radix_blocks;
cuda_memset_async(lwe_array_out, 0, radix_size * sizeof(Torus), stream);
if (num_scalar_blocks == 0)
return;
// Create a 1-dimensional grid of threads
int num_blocks = 0, num_threads = 0;
int num_entries = num_scalar_blocks;
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
dim3 grid(num_blocks, 1, 1);
dim3 thds(num_threads, 1, 1);
// Value of the shift we multiply our messages by
// If message_modulus and carry_modulus are always powers of 2 we can simplify
// this
uint64_t delta = ((uint64_t)1 << 63) / (message_modulus * carry_modulus);
device_create_trivial_radix<<<grid, thds, 0, stream->stream>>>(
lwe_array_out, scalar_array, num_scalar_blocks, lwe_dimension, delta);
check_cuda_error(cudaGetLastError());
}
#endif // CUDA_INTEGER_CUH
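To see why `generate_lookup_table` negates the first half box and then rotates, the sketch below rebuilds the LUT body on the host and evaluates it the way a blind rotation would in plaintext: modulus-switch the phase to `[0, 2N)`, then read the constant coefficient of `X^{-index} * body(X)` modulo `X^N + 1`. It is a simplified model under stated assumptions (no noise growth, no mask, power-of-two `N`, one padding bit so `delta = 2^63 / (message_modulus * carry_modulus)`); `lut_body`, `mod_switch` and `evaluate_lut` are hypothetical helpers, not functions from this code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Host-only copy of the body that generate_lookup_table writes (the mask is
// all zeros, so it is omitted). Hypothetical helper for illustration.
std::vector<uint64_t> lut_body(uint32_t polynomial_size,
                               uint32_t message_modulus,
                               uint32_t carry_modulus,
                               const std::function<uint64_t(uint64_t)> &f) {
  uint32_t modulus_sup = message_modulus * carry_modulus;
  uint32_t box_size = polynomial_size / modulus_sup;
  uint64_t delta = (uint64_t{1} << 63) / modulus_sup; // one padding bit
  std::vector<uint64_t> body(polynomial_size, 0);
  for (uint32_t i = 0; i < modulus_sup; i++) {
    uint64_t value = f(i) * delta;
    for (uint32_t j = i * box_size; j < (i + 1) * box_size; j++)
      body[j] = value;
  }
  uint32_t half_box_size = box_size / 2;
  for (uint32_t j = 0; j < half_box_size; j++)
    body[j] = uint64_t{0} - body[j]; // wrapping negation
  std::rotate(body.begin(), body.begin() + half_box_size, body.end());
  return body;
}

// Rounds phase * 2N / 2^64 to the nearest integer (modulus switch to 2N).
// Assumes N is a power of two.
uint64_t mod_switch(uint64_t phase, uint64_t N) {
  assert(N > 0 && (N & (N - 1)) == 0);
  uint32_t log2_two_n = 0;
  while ((uint64_t{1} << log2_two_n) < 2 * N)
    log2_two_n++;
  uint32_t shift = 64 - log2_two_n;
  return ((phase >> (shift - 1)) + 1) >> 1;
}

// Plaintext stand-in for the blind rotation: the constant coefficient of
// X^{-index} * body(X) in Z_{2^64}[X] / (X^N + 1).
uint64_t evaluate_lut(const std::vector<uint64_t> &body, uint64_t phase) {
  uint64_t N = body.size();
  uint64_t index = mod_switch(phase, N) % (2 * N);
  return index < N ? body[index] : uint64_t{0} - body[index - N];
}
```

A phase that rounds just below 0 wraps to an index in `[N, 2N)`, where the negacyclic relation `X^N = -1` flips the sign; negating the first half box beforehand cancels that flip, so inputs near 0 still map to `f(0) * delta`.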

@@ -0,0 +1,107 @@
#include "integer/multiplication.cuh"
/*
* This scratch function allocates the necessary amount of data on the GPU for
* the integer radix multiplication in keyswitch->bootstrap order.
* Only polynomial_size == 2048 is currently handled; for other sizes
* *mem_ptr is left untouched.
*/
void scratch_cuda_integer_mult_radix_ciphertext_kb_64(
cuda_stream_t *stream, int8_t **mem_ptr, uint32_t message_modulus,
uint32_t carry_modulus, uint32_t glwe_dimension, uint32_t lwe_dimension,
uint32_t polynomial_size, uint32_t pbs_base_log, uint32_t pbs_level,
uint32_t ks_base_log, uint32_t ks_level, uint32_t grouping_factor,
uint32_t num_radix_blocks, PBS_TYPE pbs_type, uint32_t max_shared_memory,
bool allocate_gpu_memory) {
int_radix_params params(pbs_type, glwe_dimension, polynomial_size,
glwe_dimension * polynomial_size, lwe_dimension, ks_level,
ks_base_log,
pbs_level, pbs_base_log, grouping_factor,
message_modulus, carry_modulus);
switch (polynomial_size) {
case 2048:
scratch_cuda_integer_mult_radix_ciphertext_kb<uint64_t>(
stream, (int_mul_memory<uint64_t> **)mem_ptr, num_radix_blocks, params,
allocate_gpu_memory);
break;
default:
break;
}
}
/*
* Computes a multiplication between two 64 bit radix lwe ciphertexts
* encrypting integer values, using the keyswitch -> bootstrap pattern. The
* function works on a single pair of radix ciphertexts. Only
* polynomial_size == 2048 is currently handled; other sizes are a no-op.
* - 'stream' is the Cuda stream used for the kernel launches; it also carries
* the index of the GPU to use
* - 'radix_lwe_out' is 64 bit radix big lwe ciphertext, product of
* multiplication
* - 'radix_lwe_left' left radix big lwe ciphertext
* - 'radix_lwe_right' right radix big lwe ciphertext
* - 'bsk' bootstrapping key in fourier domain
* - 'ksk' keyswitching key
* - 'mem_ptr' scratch memory allocated by
* scratch_cuda_integer_mult_radix_ciphertext_kb_64
* - 'message_modulus' message_modulus
* - 'carry_modulus' carry_modulus
* - 'glwe_dimension' glwe_dimension
* - 'lwe_dimension' is the dimension of small lwe ciphertext
* - 'polynomial_size' polynomial size
* - 'pbs_base_log' base log used in the pbs
* - 'pbs_level' decomposition level count used in the pbs
* - 'ks_base_log' base log used in the keyswitch
* - 'ks_level' decomposition level count used in the keyswitch
* - 'grouping_factor' grouping factor used by the multi-bit pbs
* - 'num_blocks' is the number of big lwe ciphertext blocks inside radix
* ciphertext
* - 'pbs_type' selects which PBS implementation should be used
* - 'max_shared_memory' maximum shared memory per cuda block
*/
void cuda_integer_mult_radix_ciphertext_kb_64(
cuda_stream_t *stream, void *radix_lwe_out, void *radix_lwe_left,
void *radix_lwe_right, void *bsk, void *ksk, int8_t *mem_ptr,
uint32_t message_modulus, uint32_t carry_modulus, uint32_t glwe_dimension,
uint32_t lwe_dimension, uint32_t polynomial_size, uint32_t pbs_base_log,
uint32_t pbs_level, uint32_t ks_base_log, uint32_t ks_level,
uint32_t grouping_factor, uint32_t num_blocks, PBS_TYPE pbs_type,
uint32_t max_shared_memory) {
switch (polynomial_size) {
case 2048:
host_integer_mult_radix_kb<uint64_t, int64_t, AmortizedDegree<2048>>(
stream, static_cast<uint64_t *>(radix_lwe_out),
static_cast<uint64_t *>(radix_lwe_left),
static_cast<uint64_t *>(radix_lwe_right), bsk,
static_cast<uint64_t *>(ksk), (int_mul_memory<uint64_t> *)mem_ptr,
num_blocks);
break;
default:
// Only polynomial_size == 2048 is dispatched; other sizes are a no-op
break;
}
}
void cleanup_cuda_integer_mult(cuda_stream_t *stream, int8_t **mem_ptr_void) {
int_mul_memory<uint64_t> *mem_ptr =
(int_mul_memory<uint64_t> *)(*mem_ptr_void);
mem_ptr->release(stream);
}
void cuda_small_scalar_multiplication_integer_radix_ciphertext_64_inplace(
cuda_stream_t *stream, void *lwe_array, uint64_t scalar,
uint32_t lwe_dimension, uint32_t lwe_ciphertext_count) {
cuda_small_scalar_multiplication_integer_radix_ciphertext_64(
stream, lwe_array, lwe_array, scalar, lwe_dimension,
lwe_ciphertext_count);
}
void cuda_small_scalar_multiplication_integer_radix_ciphertext_64(
cuda_stream_t *stream, void *output_lwe_array, void *input_lwe_array,
uint64_t scalar, uint32_t lwe_dimension, uint32_t lwe_ciphertext_count) {
host_integer_small_scalar_mult_radix(
stream, static_cast<uint64_t *>(output_lwe_array),
static_cast<uint64_t *>(input_lwe_array), scalar, lwe_dimension,
lwe_ciphertext_count);
}


@@ -0,0 +1,639 @@
#ifndef CUDA_INTEGER_MULT_CUH
#define CUDA_INTEGER_MULT_CUH
#ifdef __CDT_PARSER__
#undef __CUDA_RUNTIME_H__
#include <cuda_runtime.h>
#endif
#include "bootstrap.h"
#include "bootstrap_multibit.h"
#include "crypto/keyswitch.cuh"
#include "device.h"
#include "integer.h"
#include "integer/integer.cuh"
#include "linear_algebra.h"
#include "pbs/bootstrap_amortized.cuh"
#include "pbs/bootstrap_low_latency.cuh"
#include "pbs/bootstrap_multibit.cuh"
#include "utils/helper.cuh"
#include "utils/kernel_dimensions.cuh"
#include <fstream>
#include <iostream>
#include <omp.h>
#include <sstream>
#include <string>
#include <vector>
template <typename Torus, class params>
__global__ void
all_shifted_lhs_rhs(Torus *radix_lwe_left, Torus *lsb_ciphertext,
Torus *msb_ciphertext, Torus *radix_lwe_right,
Torus *lsb_rhs, Torus *msb_rhs, int num_blocks) {
size_t block_id = blockIdx.x;
double D = sqrt((2 * num_blocks + 1) * (2 * num_blocks + 1) - 8 * block_id);
size_t radix_id = int((2 * num_blocks + 1 - D) / 2.);
size_t local_block_id =
block_id - (2 * num_blocks - radix_id + 1) / 2. * radix_id;
bool process_msb = (local_block_id < (num_blocks - radix_id - 1));
auto cur_lsb_block = &lsb_ciphertext[block_id * (params::degree + 1)];
auto cur_msb_block =
(process_msb)
? &msb_ciphertext[(block_id - radix_id) * (params::degree + 1)]
: nullptr;
auto cur_lsb_rhs_block = &lsb_rhs[block_id * (params::degree + 1)];
auto cur_msb_rhs_block =
(process_msb) ? &msb_rhs[(block_id - radix_id) * (params::degree + 1)]
: nullptr;
auto cur_ct_right = &radix_lwe_right[radix_id * (params::degree + 1)];
auto cur_src = &radix_lwe_left[local_block_id * (params::degree + 1)];
size_t tid = threadIdx.x;
for (int i = 0; i < params::opt; i++) {
Torus value = cur_src[tid];
if (process_msb) {
cur_lsb_block[tid] = cur_msb_block[tid] = value;
cur_lsb_rhs_block[tid] = cur_msb_rhs_block[tid] = cur_ct_right[tid];
} else {
cur_lsb_block[tid] = value;
cur_lsb_rhs_block[tid] = cur_ct_right[tid];
}
tid += params::degree / params::opt;
}
if (threadIdx.x == 0) {
Torus value = cur_src[params::degree];
if (process_msb) {
cur_lsb_block[params::degree] = cur_msb_block[params::degree] = value;
cur_lsb_rhs_block[params::degree] = cur_msb_rhs_block[params::degree] =
cur_ct_right[params::degree];
} else {
cur_lsb_block[params::degree] = value;
cur_lsb_rhs_block[params::degree] = cur_ct_right[params::degree];
}
}
}
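`all_shifted_lhs_rhs` launches one Cuda block per entry of the lsb vector. That vector stores shift `r` as `num_blocks - r` consecutive blocks, so the kernel recovers `(radix_id, local_block_id)` from the flat `blockIdx.x` by inverting the triangular prefix `r * (2n - r + 1) / 2` with the quadratic formula. The msb index is then `block_id - radix_id`, because the msb vector keeps exactly one block fewer per shift. The following host-side sketch performs the same inversion. `lsb_prefix` and `split_triangular_index` are hypothetical names, and the integer guards are a defensive addition that the kernel does not contain.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Number of lsb blocks stored before shift r when shift k keeps n - k blocks.
size_t lsb_prefix(size_t n, size_t r) { return r * (2 * n - r + 1) / 2; }

// Map a flat lsb index to (radix_id, local_block_id) using the same closed
// form as all_shifted_lhs_rhs.
std::pair<size_t, size_t> split_triangular_index(size_t n, size_t block_id) {
  double b = 2.0 * n + 1.0;
  double d = std::sqrt(b * b - 8.0 * block_id);
  size_t r = (size_t)((b - d) / 2.0);
  // Defensive guards against floating-point rounding at shift boundaries.
  while (r > 0 && lsb_prefix(n, r) > block_id)
    --r;
  while (lsb_prefix(n, r + 1) <= block_id)
    ++r;
  return {r, block_id - lsb_prefix(n, r)};
}
```

With `n = 4`, indices 0-3 belong to shift 0, 4-6 to shift 1, 7-8 to shift 2 and 9 to shift 3. At every shift boundary the discriminant is the perfect square `(2n + 1 - 2r)^2`, so the plain closed form is exact there.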
template <typename Torus>
void compress_device_array_with_map(cuda_stream_t *stream, Torus *src,
Torus *dst, int *S, int *F, int num_blocks,
uint32_t map_size, uint32_t unit_size,
int &total_copied, bool is_message) {
for (int i = 0; i < map_size; i++) {
int s_index = i * num_blocks + S[i];
int number_of_unit = F[i] - S[i] + is_message;
auto cur_dst = &dst[total_copied * unit_size];
auto cur_src = &src[s_index * unit_size];
size_t copy_size = unit_size * number_of_unit * sizeof(Torus);
cuda_memcpy_async_gpu_to_gpu(cur_dst, cur_src, copy_size, stream);
total_copied += number_of_unit;
}
}
template <typename Torus>
void extract_message_carry_to_full_radix(cuda_stream_t *stream, Torus *src,
Torus *dst, int *S, int *F,
uint32_t map_size, uint32_t unit_size,
int &total_copied,
int &total_radix_copied,
int num_blocks, bool is_message) {
size_t radix_size = unit_size * num_blocks;
for (int i = 0; i < map_size; i++) {
auto cur_dst_radix = &dst[total_radix_copied * radix_size];
int s_index = S[i];
int number_of_unit = F[i] - s_index + is_message;
if (!is_message) {
int zero_block_count = num_blocks - number_of_unit;
cuda_memset_async(cur_dst_radix, 0,
zero_block_count * unit_size * sizeof(Torus), stream);
s_index = zero_block_count;
}
auto cur_dst = &cur_dst_radix[s_index * unit_size];
auto cur_src = &src[total_copied * unit_size];
size_t copy_size = unit_size * number_of_unit * sizeof(Torus);
cuda_memcpy_async_gpu_to_gpu(cur_dst, cur_src, copy_size, stream);
total_copied += number_of_unit;
++total_radix_copied;
}
}
template <typename Torus, class params>
__global__ void tree_add_chunks(Torus *result_blocks, Torus *input_blocks,
uint32_t chunk_size, uint32_t num_blocks) {
extern __shared__ Torus result[];
size_t chunk_id = blockIdx.x;
size_t chunk_elem_size = chunk_size * num_blocks * (params::degree + 1);
size_t radix_elem_size = num_blocks * (params::degree + 1);
auto src_chunk = &input_blocks[chunk_id * chunk_elem_size];
auto dst_radix = &result_blocks[chunk_id * radix_elem_size];
size_t block_stride = blockIdx.y * (params::degree + 1);
auto dst_block = &dst_radix[block_stride];
// init shared mem with first radix of chunk
size_t tid = threadIdx.x;
for (int i = 0; i < params::opt; i++) {
result[tid] = src_chunk[block_stride + tid];
tid += params::degree / params::opt;
}
if (threadIdx.x == 0) {
result[params::degree] = src_chunk[block_stride + params::degree];
}
// accumulate rest of the radixes
for (int r_id = 1; r_id < chunk_size; r_id++) {
auto cur_src_radix = &src_chunk[r_id * radix_elem_size];
tid = threadIdx.x;
for (int i = 0; i < params::opt; i++) {
result[tid] += cur_src_radix[block_stride + tid];
tid += params::degree / params::opt;
}
if (threadIdx.x == 0) {
result[params::degree] += cur_src_radix[block_stride + params::degree];
}
}
// put result from shared mem to global mem
tid = threadIdx.x;
for (int i = 0; i < params::opt; i++) {
dst_block[tid] = result[tid];
tid += params::degree / params::opt;
}
if (threadIdx.x == 0) {
dst_block[params::degree] = result[params::degree];
}
}
template <typename Torus, class params>
__global__ void fill_radix_from_lsb_msb(Torus *result_blocks, Torus *lsb_blocks,
Torus *msb_blocks,
uint32_t glwe_dimension,
uint32_t lsb_count, uint32_t msb_count,
uint32_t num_blocks) {
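// Number of coefficients of a big LWE ciphertext (mask + body). Note that the
// copy loops below only cover params::degree + 1 coefficients, which matches
// this size only when glwe_dimension == 1.
size_t big_lwe_dimension = glwe_dimension * params::degree + 1;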
size_t big_lwe_id = blockIdx.x;
size_t radix_id = big_lwe_id / num_blocks;
size_t block_id = big_lwe_id % num_blocks;
size_t lsb_block_id = block_id - radix_id;
size_t msb_block_id = block_id - radix_id - 1;
bool process_lsb = (radix_id <= block_id);
bool process_msb = (radix_id + 1 <= block_id);
auto cur_res_lsb_ct = &result_blocks[big_lwe_id * big_lwe_dimension];
auto cur_res_msb_ct =
&result_blocks[num_blocks * num_blocks * big_lwe_dimension +
big_lwe_id * big_lwe_dimension];
Torus *cur_lsb_radix = &lsb_blocks[(2 * num_blocks - radix_id + 1) *
radix_id / 2 * (params::degree + 1)];
Torus *cur_msb_radix = (process_msb)
? &msb_blocks[(2 * num_blocks - radix_id - 1) *
radix_id / 2 * (params::degree + 1)]
: nullptr;
Torus *cur_lsb_ct = (process_lsb)
? &cur_lsb_radix[lsb_block_id * (params::degree + 1)]
: nullptr;
Torus *cur_msb_ct = (process_msb)
? &cur_msb_radix[msb_block_id * (params::degree + 1)]
: nullptr;
size_t tid = threadIdx.x;
for (int i = 0; i < params::opt; i++) {
cur_res_lsb_ct[tid] = (process_lsb) ? cur_lsb_ct[tid] : 0;
cur_res_msb_ct[tid] = (process_msb) ? cur_msb_ct[tid] : 0;
tid += params::degree / params::opt;
}
if (threadIdx.x == 0) {
cur_res_lsb_ct[params::degree] =
(process_lsb) ? cur_lsb_ct[params::degree] : 0;
cur_res_msb_ct[params::degree] =
(process_msb) ? cur_msb_ct[params::degree] : 0;
}
}
template <typename Torus, typename STorus, class params>
__host__ void host_integer_mult_radix_kb(
cuda_stream_t *stream, uint64_t *radix_lwe_out, uint64_t *radix_lwe_left,
uint64_t *radix_lwe_right, void *bsk, uint64_t *ksk,
int_mul_memory<Torus> *mem_ptr, uint32_t num_blocks) {
auto glwe_dimension = mem_ptr->params.glwe_dimension;
auto polynomial_size = mem_ptr->params.polynomial_size;
auto lwe_dimension = mem_ptr->params.small_lwe_dimension;
auto message_modulus = mem_ptr->params.message_modulus;
auto carry_modulus = mem_ptr->params.carry_modulus;
int big_lwe_dimension = glwe_dimension * polynomial_size;
int big_lwe_size = big_lwe_dimension + 1;
// 'vector_result_lsb' contains blocks from all possible right shifts of
// radix_lwe_left, only nonzero blocks are kept
int lsb_vector_block_count = num_blocks * (num_blocks + 1) / 2;
// 'vector_result_msb' contains blocks from all possible shifts of
// radix_lwe_left except the last blocks of each shift. Only nonzero blocks
// are kept
int msb_vector_block_count = num_blocks * (num_blocks - 1) / 2;
// total number of blocks msb and lsb
int total_block_count = lsb_vector_block_count + msb_vector_block_count;
// buffer to keep all lsb and msb shifts
// for lsb, all nonzero blocks of each right shift are kept:
// shift 0 keeps num_blocks blocks
// shift 1 keeps num_blocks - 1 blocks
// ...
// shift num_blocks - 1 keeps 1 block
// i.e. (num_blocks + 1) * num_blocks / 2 blocks
// for msb, the last block of each shift is not kept, so:
// shift 0 keeps num_blocks - 1 blocks
// shift 1 keeps num_blocks - 2 blocks
// ...
// shift num_blocks - 1 keeps 0 blocks
// i.e. (num_blocks - 1) * num_blocks / 2 blocks
// num_blocks^2 blocks in total, each of them a big LWE ciphertext with
// glwe_dimension * polynomial_size + 1 coefficients
auto vector_result_sb = mem_ptr->vector_result_sb;
// buffer to keep lsb_vector + msb_vector
// addition happens on full terms, so there are num_blocks terms and each
// term has num_blocks blocks, num_blocks^2 blocks in total;
// each block is a big LWE ciphertext with
// glwe_dimension * polynomial_size + 1 coefficients
auto block_mul_res = mem_ptr->block_mul_res;
// buffer to keep keyswitch result of num_blocks^2 ciphertext
// in total it has num_blocks^2 small lwe ciphertexts with
// lwe_dimension +1 coefficients
auto small_lwe_vector = mem_ptr->small_lwe_vector;
// buffer to keep pbs result for num_blocks^2 lwe_ciphertext
// in total it has num_blocks^2 big lwe ciphertexts with
// glwe_dimension * polynomial_size + 1 coefficients
auto lwe_pbs_out_array = mem_ptr->lwe_pbs_out_array;
// it contains two lut, first for lsb extraction,
// second for msb extraction, with total length =
// 2 * (glwe_dimension + 1) * polynomial_size
auto luts_array = mem_ptr->luts_array;
// accumulator to extract message
// with length (glwe_dimension + 1) * polynomial_size
auto luts_message = mem_ptr->luts_message;
// accumulator to extract carry
// with length (glwe_dimension + 1) * polynomial_size
auto luts_carry = mem_ptr->luts_carry;
// to be used as default indexing
auto lwe_indexes = luts_array->lwe_indexes;
auto vector_result_lsb = &vector_result_sb[0];
auto vector_result_msb =
&vector_result_sb[lsb_vector_block_count *
(polynomial_size * glwe_dimension + 1)];
auto vector_lsb_rhs = &block_mul_res[0];
auto vector_msb_rhs = &block_mul_res[lsb_vector_block_count *
(polynomial_size * glwe_dimension + 1)];
dim3 grid(lsb_vector_block_count, 1, 1);
dim3 thds(params::degree / params::opt, 1, 1);
all_shifted_lhs_rhs<Torus, params><<<grid, thds, 0, stream->stream>>>(
radix_lwe_left, vector_result_lsb, vector_result_msb, radix_lwe_right,
vector_lsb_rhs, vector_msb_rhs, num_blocks);
integer_radix_apply_bivariate_lookup_table_kb<Torus>(
stream, block_mul_res, block_mul_res, vector_result_sb, bsk, ksk,
total_block_count, luts_array);
vector_result_lsb = &block_mul_res[0];
vector_result_msb = &block_mul_res[lsb_vector_block_count *
(polynomial_size * glwe_dimension + 1)];
fill_radix_from_lsb_msb<Torus, params>
<<<num_blocks * num_blocks, params::degree / params::opt, 0,
stream->stream>>>(vector_result_sb, vector_result_lsb,
vector_result_msb, glwe_dimension,
lsb_vector_block_count, msb_vector_block_count,
num_blocks);
auto new_blocks = block_mul_res;
auto old_blocks = vector_result_sb;
// number of radix ciphertexts left to sum after the block multiplications
size_t r = 2 * num_blocks;
size_t total_modulus = message_modulus * carry_modulus;
size_t message_max = message_modulus - 1;
size_t chunk_size = (total_modulus - 1) / message_max;
size_t ch_amount = r / chunk_size;
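int terms_degree[r * num_blocks];
int f_b[ch_amount];
int l_b[ch_amount];
// Degrees of each block of the lsb radixes (first num_blocks^2 entries) and
// msb radixes (next num_blocks^2 entries); zero marks an empty block. The
// values 3 and 2 are the maximum lsb and msb of a product of two blocks when
// message_modulus == 4, and are only used below as nonzero markers.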
for (int i = 0; i < num_blocks * num_blocks; i++) {
size_t r_id = i / num_blocks;
size_t b_id = i % num_blocks;
terms_degree[i] = (b_id >= r_id) ? 3 : 0;
}
auto terms_degree_msb = &terms_degree[num_blocks * num_blocks];
for (int i = 0; i < num_blocks * num_blocks; i++) {
size_t r_id = i / num_blocks;
size_t b_id = i % num_blocks;
terms_degree_msb[i] = (b_id > r_id) ? 2 : 0;
}
auto max_shared_memory = cuda_get_max_shared_memory(stream->gpu_index);
while (r > chunk_size) {
int cur_total_blocks = r * num_blocks;
ch_amount = r / chunk_size;
dim3 add_grid(ch_amount, num_blocks, 1);
size_t sm_size = big_lwe_size * sizeof(Torus);
cuda_memset_async(new_blocks, 0,
ch_amount * num_blocks * big_lwe_size * sizeof(Torus),
stream);
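// tree_add_chunks assumes params::degree / params::opt threads per block
tree_add_chunks<Torus, params>
<<<add_grid, params::degree / params::opt, sm_size, stream->stream>>>(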
new_blocks, old_blocks, chunk_size, num_blocks);
for (int c_id = 0; c_id < ch_amount; c_id++) {
auto cur_chunk = &terms_degree[c_id * chunk_size * num_blocks];
int mx = 0;
int mn = num_blocks;
for (int r_id = 1; r_id < chunk_size; r_id++) {
auto cur_radix = &cur_chunk[r_id * num_blocks];
for (int i = 0; i < num_blocks; i++) {
if (cur_radix[i]) {
mn = min(mn, i);
mx = max(mx, i);
}
}
}
f_b[c_id] = mn;
l_b[c_id] = mx;
}
int total_copied = 0;
int message_count = 0;
int carry_count = 0;
compress_device_array_with_map<Torus>(stream, new_blocks, old_blocks, f_b,
l_b, num_blocks, ch_amount,
big_lwe_size, total_copied, true);
message_count = total_copied;
compress_device_array_with_map<Torus>(stream, new_blocks, old_blocks, f_b,
l_b, num_blocks, ch_amount,
big_lwe_size, total_copied, false);
carry_count = total_copied - message_count;
auto message_blocks_vector = old_blocks;
auto carry_blocks_vector =
&old_blocks[message_count * (glwe_dimension * polynomial_size + 1)];
cuda_keyswitch_lwe_ciphertext_vector(
stream, small_lwe_vector, lwe_indexes, old_blocks, lwe_indexes, ksk,
polynomial_size * glwe_dimension, lwe_dimension,
mem_ptr->params.ks_base_log, mem_ptr->params.ks_level, total_copied);
execute_pbs<Torus>(
stream, message_blocks_vector, lwe_indexes, luts_message->lut,
luts_message->lut_indexes, small_lwe_vector, lwe_indexes, bsk,
luts_message->pbs_buffer, glwe_dimension, lwe_dimension,
polynomial_size, mem_ptr->params.pbs_base_log,
mem_ptr->params.pbs_level, mem_ptr->params.grouping_factor,
message_count, 1, 0, max_shared_memory, mem_ptr->params.pbs_type);
execute_pbs<Torus>(stream, carry_blocks_vector, lwe_indexes,
luts_carry->lut, luts_carry->lut_indexes,
&small_lwe_vector[message_count * (lwe_dimension + 1)],
lwe_indexes, bsk, luts_carry->pbs_buffer,
glwe_dimension, lwe_dimension, polynomial_size,
mem_ptr->params.pbs_base_log, mem_ptr->params.pbs_level,
mem_ptr->params.grouping_factor, carry_count, 1, 0,
max_shared_memory, mem_ptr->params.pbs_type);
int rem_blocks = r % chunk_size * num_blocks;
int new_blocks_created = 2 * ch_amount * num_blocks;
int copy_size = rem_blocks * big_lwe_size * sizeof(Torus);
auto cur_dst = &new_blocks[new_blocks_created * big_lwe_size];
auto cur_src = &old_blocks[(cur_total_blocks - rem_blocks) * big_lwe_size];
cuda_memcpy_async_gpu_to_gpu(cur_dst, cur_src, copy_size, stream);
total_copied = 0;
int total_radix_copied = 0;
extract_message_carry_to_full_radix<Torus>(
stream, old_blocks, new_blocks, f_b, l_b, ch_amount, big_lwe_size,
total_copied, total_radix_copied, num_blocks, true);
extract_message_carry_to_full_radix<Torus>(
stream, old_blocks, new_blocks, f_b, l_b, ch_amount, big_lwe_size,
total_copied, total_radix_copied, num_blocks, false);
std::swap(new_blocks, old_blocks);
r = (new_blocks_created + rem_blocks) / num_blocks;
}
dim3 add_grid(1, num_blocks, 1);
size_t sm_size = big_lwe_size * sizeof(Torus);
cuda_memset_async(radix_lwe_out, 0, num_blocks * big_lwe_size * sizeof(Torus),
stream);
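// tree_add_chunks assumes params::degree / params::opt threads per block
tree_add_chunks<Torus, params>
<<<add_grid, params::degree / params::opt, sm_size, stream->stream>>>(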
radix_lwe_out, old_blocks, r, num_blocks);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, vector_result_sb, radix_lwe_out, bsk, ksk, num_blocks,
luts_message);
integer_radix_apply_univariate_lookup_table_kb<Torus>(
stream, &block_mul_res[big_lwe_size], radix_lwe_out, bsk, ksk, num_blocks,
luts_carry);
cuda_memset_async(block_mul_res, 0, big_lwe_size * sizeof(Torus), stream);
host_addition(stream, radix_lwe_out, vector_result_sb, block_mul_res,
big_lwe_size, num_blocks);
host_propagate_single_carry_low_latency<Torus>(
stream, radix_lwe_out, mem_ptr->scp_mem, bsk, ksk, num_blocks);
}
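The reduction loop in `host_integer_mult_radix_kb` sums the `2 * num_blocks` partial-product radixes in chunks. The expression `chunk_size = (message_modulus * carry_modulus - 1) / (message_modulus - 1)` gives the largest number of blocks of degree `message_modulus - 1` whose sum still fits in the message-plus-carry space. After each round, every chunk is split back into a message radix and a carry radix, so `r` becomes `2 * (r / chunk_size) + r % chunk_size`. The following is a rough host-side model of the number of rounds. It assumes that each chunk yields exactly two radixes and ignores the block trimming done with `f_b`/`l_b`. `mult_chunk_size` and `count_reduction_rounds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Same formula as in host_integer_mult_radix_kb.
size_t mult_chunk_size(size_t message_modulus, size_t carry_modulus) {
  size_t total_modulus = message_modulus * carry_modulus;
  size_t message_max = message_modulus - 1;
  return (total_modulus - 1) / message_max;
}

// Number of iterations of the `while (r > chunk_size)` loop under the
// simplifying assumptions stated above.
size_t count_reduction_rounds(size_t num_blocks, size_t chunk_size) {
  // r -> 2 * (r / chunk_size) + r % chunk_size only shrinks when chunk_size > 2.
  assert(chunk_size > 2);
  size_t r = 2 * num_blocks;
  size_t rounds = 0;
  while (r > chunk_size) {
    size_t ch_amount = r / chunk_size;
    size_t rem = r % chunk_size;
    r = 2 * ch_amount + rem;
    ++rounds;
  }
  return rounds;
}
```

With `message_modulus = carry_modulus = 4`, `chunk_size` is 5. An 8-block multiplication then goes 16 -> 7 -> 4 radixes, which is two rounds, before the final `tree_add_chunks` and carry propagation.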
template <typename Torus>
__host__ void scratch_cuda_integer_mult_radix_ciphertext_kb(
cuda_stream_t *stream, int_mul_memory<Torus> **mem_ptr,
uint32_t num_radix_blocks, int_radix_params params,
bool allocate_gpu_memory) {
*mem_ptr = new int_mul_memory<Torus>(stream, params, num_radix_blocks,
allocate_gpu_memory);
}
// Applies a lookup table to a vector of big LWE ciphertexts, spreading the
// work over the available GPUs. It has two modes:
// lsb_msb_mode == true - extracts lsb and msb
// lsb_msb_mode == false - extracts message and carry
template <typename Torus, typename STorus, class params>
void apply_lookup_table(Torus *input_ciphertexts, Torus *output_ciphertexts,
int_mul_memory<Torus> *mem_ptr, uint32_t glwe_dimension,
uint32_t lwe_dimension, uint32_t polynomial_size,
uint32_t pbs_base_log, uint32_t pbs_level,
uint32_t ks_base_log, uint32_t ks_level,
uint32_t grouping_factor,
uint32_t lsb_message_blocks_count,
uint32_t msb_carry_blocks_count,
uint32_t max_shared_memory, bool lsb_msb_mode) {
int total_blocks_count = lsb_message_blocks_count + msb_carry_blocks_count;
int gpu_n = mem_ptr->p2p_gpu_count;
if (total_blocks_count < gpu_n)
gpu_n = total_blocks_count;
// Number of blocks given to every gpu; the last one also takes the remainder
int base_gpu_blocks_count = total_blocks_count / gpu_n;
int big_lwe_size = glwe_dimension * polynomial_size + 1;
#pragma omp parallel for num_threads(gpu_n)
for (int i = 0; i < gpu_n; i++) {
cudaSetDevice(i);
auto this_stream = mem_ptr->streams[i];
// Index where input and output blocks start for current gpu
int big_lwe_start_index = i * base_gpu_blocks_count * big_lwe_size;
// Per-thread block count: the last gpu processes extra blocks when the total
// number of blocks is not divisible by gpu_n. Keeping this local avoids a
// data race on a variable shared between OpenMP threads.
int gpu_blocks_count = base_gpu_blocks_count;
if (i == gpu_n - 1) {
gpu_blocks_count += total_blocks_count % gpu_n;
}
int can_access_peer;
cudaDeviceCanAccessPeer(&can_access_peer, i, 0);
if (i == 0) {
check_cuda_error(
cudaMemcpyAsync(mem_ptr->pbs_output_multi_gpu[i],
&input_ciphertexts[big_lwe_start_index],
gpu_blocks_count * big_lwe_size * sizeof(Torus),
cudaMemcpyDeviceToDevice, *this_stream));
} else if (can_access_peer) {
check_cuda_error(cudaMemcpyPeerAsync(
mem_ptr->pbs_output_multi_gpu[i], i,
&input_ciphertexts[big_lwe_start_index], 0,
gpu_blocks_count * big_lwe_size * sizeof(Torus), *this_stream));
} else {
// Uses host memory as middle ground
cuda_memcpy_async_to_cpu(mem_ptr->device_to_device_buffer[i],
&input_ciphertexts[big_lwe_start_index],
gpu_blocks_count * big_lwe_size * sizeof(Torus),
this_stream, i);
cuda_memcpy_async_to_gpu(
mem_ptr->pbs_output_multi_gpu[i], mem_ptr->device_to_device_buffer[i],
gpu_blocks_count * big_lwe_size * sizeof(Torus), this_stream, i);
}
// when lsb and msb have to be extracted
// for first lsb_count blocks we need lsb_acc
// for last msb_count blocks we need msb_acc
// when message and carry have to be extracted
// for first message_count blocks we need message_acc
// for last carry_count blocks we need carry_acc
// big_lwe_start_index is an offset in Torus elements, so it is converted back
// to a block index before being compared with a block count
Torus *cur_lut_indexes;
if (lsb_msb_mode) {
cur_lut_indexes =
(big_lwe_start_index / big_lwe_size < lsb_message_blocks_count)
? mem_ptr->lut_indexes_lsb_multi_gpu[i]
: mem_ptr->lut_indexes_msb_multi_gpu[i];
} else {
cur_lut_indexes =
(big_lwe_start_index / big_lwe_size < lsb_message_blocks_count)
? mem_ptr->lut_indexes_message_multi_gpu[i]
: mem_ptr->lut_indexes_carry_multi_gpu[i];
}
// execute keyswitch on a current gpu with corresponding input and output
// blocks pbs_output_multi_gpu[i] is an input for keyswitch and
// pbs_input_multi_gpu[i] is an output for keyswitch
cuda_keyswitch_lwe_ciphertext_vector(
this_stream, i, mem_ptr->pbs_input_multi_gpu[i],
mem_ptr->pbs_output_multi_gpu[i], mem_ptr->ksk_multi_gpu[i],
polynomial_size * glwe_dimension, lwe_dimension, ks_base_log, ks_level,
gpu_blocks_count);
// execute pbs on a current gpu with corresponding input and output
cuda_multi_bit_pbs_lwe_ciphertext_vector_64(
this_stream, i, mem_ptr->pbs_output_multi_gpu[i],
mem_ptr->lut_multi_gpu[i], cur_lut_indexes,
mem_ptr->pbs_input_multi_gpu[i], mem_ptr->bsk_multi_gpu[i],
mem_ptr->pbs_buffer_multi_gpu[i], lwe_dimension, glwe_dimension,
polynomial_size, grouping_factor, pbs_base_log, pbs_level,
grouping_factor, gpu_blocks_count, 2, 0, max_shared_memory);
// lookup table is applied and now data from current gpu have to be copied
// back to gpu_0 in 'output_ciphertexts' buffer
if (i == 0) {
check_cuda_error(
cudaMemcpyAsync(&output_ciphertexts[big_lwe_start_index],
mem_ptr->pbs_output_multi_gpu[i],
gpu_blocks_count * big_lwe_size * sizeof(Torus),
cudaMemcpyDeviceToDevice, *this_stream));
} else if (can_access_peer) {
check_cuda_error(cudaMemcpyPeerAsync(
&output_ciphertexts[big_lwe_start_index], 0,
mem_ptr->pbs_output_multi_gpu[i], i,
gpu_blocks_count * big_lwe_size * sizeof(Torus), *this_stream));
} else {
// Uses host memory as middle ground
cuda_memcpy_async_to_cpu(
mem_ptr->device_to_device_buffer[i], mem_ptr->pbs_output_multi_gpu[i],
gpu_blocks_count * big_lwe_size * sizeof(Torus), this_stream, i);
cuda_memcpy_async_to_gpu(&output_ciphertexts[big_lwe_start_index],
mem_ptr->device_to_device_buffer[i],
gpu_blocks_count * big_lwe_size * sizeof(Torus),
this_stream, i);
}
}
}
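`apply_lookup_table` gives each GPU `total_blocks_count / gpu_n` blocks and assigns the remainder to the last GPU. The element offset used for the copies is the first block index multiplied by `big_lwe_size`. Because the loop body runs in parallel OpenMP threads, each thread should derive its own block count instead of updating a variable they all share. The following sketch shows that split. `gpu_block_range` is a hypothetical helper, and it assumes at least one block.

```cpp
#include <cassert>
#include <utility>

// Returns {first_block, block_count} for gpu i: equal shares, with the
// remainder on the last gpu, as in apply_lookup_table.
std::pair<int, int> gpu_block_range(int total_blocks_count, int gpu_n, int i) {
  assert(total_blocks_count > 0);
  if (total_blocks_count < gpu_n)
    gpu_n = total_blocks_count; // never use more gpus than blocks
  int base = total_blocks_count / gpu_n;
  int count = base; // local to the caller, so no shared state is modified
  if (i == gpu_n - 1)
    count += total_blocks_count % gpu_n;
  return {i * base, count};
}
```

For 10 blocks on 3 GPUs, GPUs 0 and 1 each get 3 blocks and GPU 2 gets 4 blocks starting at block 6.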
template <typename T>
__global__ void device_small_scalar_radix_multiplication(T *output_lwe_array,
T *input_lwe_array,
T scalar,
uint32_t lwe_dimension,
uint32_t num_blocks) {
int index = blockIdx.x * blockDim.x + threadIdx.x;
int lwe_size = lwe_dimension + 1;
if (index < num_blocks * lwe_size) {
// Here we take advantage of the wrapping behaviour of unsigned integers
// (arithmetic modulo 2^64 for uint64_t)
output_lwe_array[index] = input_lwe_array[index] * scalar;
}
}
template <typename T>
__host__ void host_integer_small_scalar_mult_radix(
cuda_stream_t *stream, T *output_lwe_array, T *input_lwe_array, T scalar,
uint32_t input_lwe_dimension, uint32_t input_lwe_ciphertext_count) {
cudaSetDevice(stream->gpu_index);
// lwe_size includes the presence of the body
// whereas lwe_dimension is the number of elements in the mask
int lwe_size = input_lwe_dimension + 1;
// Create a 1-dimensional grid of threads
int num_blocks = 0, num_threads = 0;
int num_entries = input_lwe_ciphertext_count * lwe_size;
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
dim3 grid(num_blocks, 1, 1);
dim3 thds(num_threads, 1, 1);
device_small_scalar_radix_multiplication<<<grid, thds, 0, stream->stream>>>(
output_lwe_array, input_lwe_array, scalar, input_lwe_dimension,
input_lwe_ciphertext_count);
check_cuda_error(cudaGetLastError());
}
#endif


@@ -0,0 +1,12 @@
#include "integer/negation.cuh"
void cuda_negate_integer_radix_ciphertext_64_inplace(
cuda_stream_t *stream, void *lwe_array, uint32_t lwe_dimension,
uint32_t lwe_ciphertext_count, uint32_t message_modulus,
uint32_t carry_modulus) {
host_integer_radix_negation(stream, static_cast<uint64_t *>(lwe_array),
static_cast<uint64_t *>(lwe_array), lwe_dimension,
lwe_ciphertext_count, message_modulus,
carry_modulus);
}


@@ -0,0 +1,79 @@
#ifndef CUDA_INTEGER_NEGATE_CUH
#define CUDA_INTEGER_NEGATE_CUH
#ifdef __CDT_PARSER__
#undef __CUDA_RUNTIME_H__
#include <cuda_runtime.h>
#endif
#include "device.h"
#include "integer.h"
#include "utils/kernel_dimensions.cuh"
template <typename Torus>
__global__ void
device_integer_radix_negation(Torus *output, Torus *input, int32_t num_blocks,
uint64_t lwe_dimension, uint64_t message_modulus,
uint64_t carry_modulus, uint64_t delta) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
// No __syncthreads() here: each thread only touches its own coefficients, and
// a barrier inside this divergent branch is not reached by every thread of
// the last Cuda block.
if (tid < lwe_dimension + 1) {
bool is_body = (tid == lwe_dimension);
// z = ceil(degree / message_modulus) * message_modulus; for a block of
// degree at most message_modulus this evaluates to z == message_modulus
uint64_t z = (2 * message_modulus - 1) / message_modulus;
z *= message_modulus;
// (0,Delta*z) - ct
output[tid] = (is_body ? z * delta - input[tid] : -input[tid]);
for (int radix_block_id = 1; radix_block_id < num_blocks;
radix_block_id++) {
tid += (lwe_dimension + 1);
// Subtract z/B (B = message_modulus) from the next ciphertext to
// compensate for the addition of z to the previous block
uint64_t zb = z / message_modulus;
uint64_t encoded_zb = zb * delta;
// (0,Delta*z) - ct
output[tid] =
(is_body ? z * delta - (input[tid] + encoded_zb) : -input[tid]);
}
}
}
template <typename Torus>
__host__ void host_integer_radix_negation(cuda_stream_t *stream, Torus *output,
Torus *input, uint32_t lwe_dimension,
uint32_t input_lwe_ciphertext_count,
uint64_t message_modulus,
uint64_t carry_modulus) {
cudaSetDevice(stream->gpu_index);
// lwe_size includes the presence of the body
// whereas lwe_dimension is the number of elements in the mask
int lwe_size = lwe_dimension + 1;
// Create a 1-dimensional grid of threads
int num_blocks = 0, num_threads = 0;
int num_entries = lwe_size;
getNumBlocksAndThreads(num_entries, 512, num_blocks, num_threads);
dim3 grid(num_blocks, 1, 1);
dim3 thds(num_threads, 1, 1);
uint64_t shared_mem = input_lwe_ciphertext_count * sizeof(uint32_t);
// Value of the shift we multiply our messages by
// If message_modulus and carry_modulus are always powers of 2 we can simplify
// this
uint64_t delta = ((uint64_t)1 << 63) / (message_modulus * carry_modulus);
device_integer_radix_negation<<<grid, thds, shared_mem, stream->stream>>>(
output, input, input_lwe_ciphertext_count, lwe_dimension, message_modulus,
carry_modulus, delta);
check_cuda_error(cudaGetLastError());
}
#endif


@@ -0,0 +1,12 @@
#include "integer/scalar_addition.cuh"
void cuda_scalar_addition_integer_radix_ciphertext_64_inplace(
cuda_stream_t *stream, void *lwe_array, void *scalar_input,
uint32_t lwe_dimension, uint32_t lwe_ciphertext_count,
uint32_t message_modulus, uint32_t carry_modulus) {
host_integer_radix_scalar_addition_inplace(
stream, static_cast<uint64_t *>(lwe_array),
static_cast<uint64_t *>(scalar_input), lwe_dimension,
lwe_ciphertext_count, message_modulus, carry_modulus);
}

Some files were not shown because too many files have changed in this diff.