Running against the last remote commit would induce undesired behavior,
especially on pull-request approval.
For example, a change in the integer layer could appear in the
pull-request commit list without being contained in the
last remote commit. Then, on approval, the aws_tfhe_integer_tests.yml
workflow would be skipped even though it should run when changes are
compared against the base commit.
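The difference between the two strategies can be sketched as below. This is a minimal illustration, not the workflow's actual logic: each PR commit is modelled as the list of paths it touches, and the `tfhe/src/integer/` prefix is a hypothetical stand-in for whatever filter triggers the integer tests.

```rust
/// Hypothetical path filter standing in for the workflow's "integer layer" trigger.
fn touches_integer(paths: &[&str]) -> bool {
    paths.iter().any(|p| p.starts_with("tfhe/src/integer/"))
}

/// Old behavior: only the last commit of the PR is inspected.
fn should_run_from_last_commit(pr_commits: &[Vec<&str>]) -> bool {
    pr_commits.last().map_or(false, |paths| touches_integer(paths))
}

/// New behavior: everything the PR changed relative to its base commit,
/// approximated as the union of paths touched by every PR commit
/// (over-approximation: a file changed and later reverted still counts here,
/// while a real diff against base would not show it).
fn should_run_from_base(pr_commits: &[Vec<&str>]) -> bool {
    pr_commits.iter().any(|paths| touches_integer(paths))
}

fn main() {
    // PR with two commits: the first touches the integer layer, the last does not.
    let pr_commits = vec![vec!["tfhe/src/integer/mod.rs"], vec!["README.md"]];
    println!("last commit only: {}", should_run_from_last_commit(&pr_commits));
    println!("against base:     {}", should_run_from_base(&pr_commits));
}
```

With only the last commit inspected, the integer tests are skipped; comparing against the base still sees the integer change.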
This commit does 2 things:
* It adds to GpuIndex the invariant that the index corresponds
to a valid GPU. To do so, the inner u32 is made private
and the new/try_new methods are now used to construct a GpuIndex;
these methods check that the index is valid.
* It makes GpuIndex #[repr(transparent)], allowing a *const GpuIndex
to be safely cast to a *const u32. This saves some copies that were
made to transform a Vec<GpuIndex> into a Vec<u32> just to get a *const u32.
BREAKING CHANGES: GpuIndex(some_value) is no longer valid;
GpuIndex::new(some_value) / GpuIndex::try_new(some_value) must be
used instead.
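Both changes can be sketched as follows, assuming a hypothetical `number_of_gpus()` that is hard-coded to 2 so the example runs without a GPU; the real crate queries the driver. The error type `InvalidGpuIndex`, the `get` accessor and the `as_u32_ptr` helper are illustrative names, not necessarily the crate's API. The private field enforces the invariant, and `#[repr(transparent)]` is what makes the pointer cast sound.

```rust
mod gpu {
    /// Hypothetical stand-in for a driver query of the device count.
    pub fn number_of_gpus() -> u32 {
        2
    }

    /// The inner u32 is private: outside this module a GpuIndex can only be
    /// built through `new`/`try_new`, so every value names an existing GPU.
    /// `#[repr(transparent)]` guarantees the same layout as a bare u32.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    #[repr(transparent)]
    pub struct GpuIndex(u32);

    /// Hypothetical error type carrying the rejected index.
    #[derive(Debug, PartialEq, Eq)]
    pub struct InvalidGpuIndex(pub u32);

    impl GpuIndex {
        pub fn try_new(index: u32) -> Result<Self, InvalidGpuIndex> {
            if index < number_of_gpus() {
                Ok(Self(index))
            } else {
                Err(InvalidGpuIndex(index))
            }
        }

        pub fn new(index: u32) -> Self {
            match Self::try_new(index) {
                Ok(gpu) => gpu,
                Err(InvalidGpuIndex(i)) => panic!("GPU index {i} is out of range"),
            }
        }

        pub fn get(self) -> u32 {
            self.0
        }
    }

    /// Reinterprets a slice of GpuIndex as a pointer to u32 without copying.
    /// Sound only because GpuIndex is `#[repr(transparent)]` over u32.
    pub fn as_u32_ptr(indexes: &[GpuIndex]) -> *const u32 {
        indexes.as_ptr().cast::<u32>()
    }
}

use gpu::{as_u32_ptr, GpuIndex};

fn main() {
    // `GpuIndex(0)` no longer compiles here: the tuple field is private.
    let gpus = vec![GpuIndex::new(0), GpuIndex::new(1)];
    let ptr = as_u32_ptr(&gpus);
    // SAFETY: `ptr` points to `gpus.len()` initialized u32s that outlive `raw`.
    let raw = unsafe { std::slice::from_raw_parts(ptr, gpus.len()) };
    println!("{raw:?}");
    println!("{:?}", GpuIndex::try_new(5));
}
```

The cast replaces a `gpus.iter().map(|g| g.get()).collect::<Vec<u32>>()` round trip with a zero-copy view.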
After a safe_serialize/safe_deserialize round trip, the CompressedCiphertextList
was on the Cpu. As the `get` method looked at the device of the data,
not the device of the server_key, to know where the computation
needs to happen, decompressing on the Gpu was impossible in this case;
only the Cpu was usable (as the data was always on the Cpu).
The fix is twofold:
* First, when deserializing, the data uses the current server key
(if any) as a hint on where it should be placed.
* Second, the `get` method now uses the server_key to know where computations
need to be done, which may incur a temporary copy/transfer on every
call to `get` if the data is not on the correct device.
An API to move the data between devices has also been added.
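A minimal sketch of the two fixes and the move API follows. Every type here (`Device`, `ServerKey`, `Storage`, `CompressedList`) and method name is a hypothetical stand-in for the real CompressedCiphertextList machinery; the Gpu storage is simulated with host memory, and "decompression" is reduced to reading a value and reporting which device did the work. The current server key is passed explicitly rather than read from any global state.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Device {
    Cpu,
    CudaGpu,
}

/// Hypothetical server key: only the device it computes on matters here.
struct ServerKey {
    device: Device,
}

/// The `Cuda` variant is simulated with host memory so the sketch runs anywhere.
#[derive(Clone, Debug)]
enum Storage {
    Cpu(Vec<u64>),
    Cuda(Vec<u64>),
}

/// Hypothetical stand-in for CompressedCiphertextList.
#[derive(Clone, Debug)]
struct CompressedList {
    storage: Storage,
}

impl CompressedList {
    fn device(&self) -> Device {
        match self.storage {
            Storage::Cpu(_) => Device::Cpu,
            Storage::Cuda(_) => Device::CudaGpu,
        }
    }

    fn values(&self) -> &[u64] {
        match &self.storage {
            Storage::Cpu(v) | Storage::Cuda(v) => v.as_slice(),
        }
    }

    /// Fix 1: the payload carries no device; the current server key,
    /// if any, is used as a hint for where to place the data.
    fn deserialize(payload: Vec<u64>, current_key: Option<&ServerKey>) -> Self {
        let storage = match current_key.map(|key| key.device) {
            Some(Device::CudaGpu) => Storage::Cuda(payload),
            Some(Device::Cpu) | None => Storage::Cpu(payload),
        };
        Self { storage }
    }

    /// Explicit API to move the data to another device.
    fn move_to_device(&mut self, device: Device) {
        if self.device() == device {
            return;
        }
        let data = match &mut self.storage {
            Storage::Cpu(v) | Storage::Cuda(v) => std::mem::take(v),
        };
        self.storage = match device {
            Device::Cpu => Storage::Cpu(data),
            Device::CudaGpu => Storage::Cuda(data),
        };
    }

    /// Fix 2: the server key, not the data, decides where computation runs.
    /// If the data lives elsewhere, a temporary copy is made on every call.
    fn get(&self, index: usize, key: &ServerKey) -> Option<(u64, Device)> {
        let temporary;
        let source = if self.device() == key.device {
            self
        } else {
            let mut copy = self.clone();
            copy.move_to_device(key.device);
            temporary = copy;
            &temporary
        };
        let value = *source.values().get(index)?;
        Some((value, source.device()))
    }
}

fn main() {
    let gpu_key = ServerKey { device: Device::CudaGpu };
    // Deserialized while a Gpu key is set: the data lands on the Gpu.
    let list = CompressedList::deserialize(vec![7, 8], Some(&gpu_key));
    println!("{:?}", list.device());
    // Data on the Cpu, key on the Gpu: `get` still computes on the Gpu.
    let cpu_list = CompressedList::deserialize(vec![7, 8], None);
    println!("{:?}", cpu_list.get(1, &gpu_key));
}
```

Note that `get` leaves `self` untouched, so callers who hit the copy path repeatedly can call `move_to_device` once to avoid paying for the transfer on every call.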
Note that this was not the case with the regular serialize/deserialize,
as it stores the device, which let deserialize restore the data
onto the same device (hence why the tests that use serialize/deserialize
did not fail). In hindsight, the ser/de impl should not save which
device the data originated from.
- multiple operations are issued in parallel, each running independently on
a different device,
- tests only run when more than 1 GPU is available,
- we only test ERC20-related operators: (overflow_) add/sub, cmp, and if_then_else.
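The shape of such a test can be sketched as below, under loud assumptions: `number_of_gpus` is a hypothetical stand-in hard-coded to 2, and `transfer` operates on plain integers where the real tests use encrypted values and GPU server keys. It only illustrates the structure: skip when fewer than 2 GPUs exist, then spawn one worker per GPU so the ERC20-style operations (cmp, add/sub, if_then_else; the overflowing variants are not modelled) run concurrently.

```rust
use std::thread;

/// Hypothetical device count; a real test would ask the CUDA driver.
fn number_of_gpus() -> u32 {
    2
}

/// Plain-integer stand-in for an ERC20 transfer that would run on `gpu`:
/// cmp decides whether the sender can pay, if_then_else selects the new
/// balances, and sub/add compute them.
fn transfer(gpu: u32, from: u64, to: u64, amount: u64) -> (u32, u64, u64) {
    let has_enough_funds = from >= amount; // cmp
    let new_from = if has_enough_funds { from - amount } else { from }; // sub + if_then_else
    let new_to = if has_enough_funds { to + amount } else { to }; // add + if_then_else
    (gpu, new_from, new_to)
}

/// Runs one transfer per GPU in parallel; returns None (test skipped)
/// when fewer than 2 GPUs are available.
fn run_multi_gpu_transfers() -> Option<Vec<(u32, u64, u64)>> {
    let gpu_count = number_of_gpus();
    if gpu_count < 2 {
        return None;
    }
    let results = thread::scope(|scope| {
        // Spawn every worker before joining any, so they really overlap.
        let handles: Vec<_> = (0..gpu_count)
            .map(|gpu| scope.spawn(move || transfer(gpu, 100, 5, 30 + 100 * u64::from(gpu))))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect::<Vec<_>>()
    });
    Some(results)
}

fn main() {
    println!("{:?}", run_multi_gpu_transfers());
}
```

The amounts are chosen so that GPU 0 takes the "enough funds" branch and GPU 1 the "insufficient funds" branch.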
- fix a bug in which the wrong GPU could be queried for the max shared memory
- If multiple streams are spread across multiple GPUs,
operations running on a stream on GPU i must query GPU i for its
max shared memory,
- also fix wrong indexing on the Rust side.
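The rule can be illustrated with a small sketch. The `Stream` type, the per-device limits and the "buggy" variant are all hypothetical: the numbers are made up to represent two heterogeneous GPUs, and querying GPU 0 is just one way such a bug can look, not necessarily what the original code did. A real implementation would ask the driver per device, e.g. via `cudaDeviceGetAttribute` with `cudaDevAttrMaxSharedMemoryPerBlockOptin`.

```rust
/// Hypothetical stream handle: each stream lives on exactly one GPU.
#[derive(Clone, Copy, Debug)]
struct Stream {
    gpu_index: u32,
}

/// Made-up per-device limits in bytes, standing in for a per-device
/// driver query such as
/// cudaDeviceGetAttribute(&value, cudaDevAttrMaxSharedMemoryPerBlockOptin, gpu_index).
const MAX_SHARED_MEMORY: [usize; 2] = [49_152, 101_376];

fn max_shared_memory(gpu_index: u32) -> usize {
    MAX_SHARED_MEMORY[gpu_index as usize]
}

/// Illustrative buggy pattern: ignores the stream and asks a fixed device.
fn budget_buggy(_stream: Stream) -> usize {
    max_shared_memory(0)
}

/// Fixed pattern: an operation on a stream of GPU i asks GPU i.
fn budget_fixed(stream: Stream) -> usize {
    max_shared_memory(stream.gpu_index)
}

fn main() {
    let streams = [Stream { gpu_index: 0 }, Stream { gpu_index: 1 }];
    for stream in streams {
        println!(
            "{:?}: buggy={} fixed={}",
            stream,
            budget_buggy(stream),
            budget_fixed(stream)
        );
    }
}
```

On homogeneous GPUs both variants return the same value, which is why this kind of bug only surfaces on mixed setups; on GPU 1 here, the buggy variant would size kernels with GPU 0's smaller limit.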