feat(gpu): make PBS and ks execution parallel over available GPUs

Only GPUs with peer access to GPU 0 can be used for this at the moment.
Peer to peer copy is used if different GPUs are passed to memcpy_gpu_to_gpu
A gpu offset is passed as new parameter to pbs and keyswitch to adjust the input/output index user per gpu.
bsk and ksk are copied to all GPUs.
The CI now tests & runs benchmarks on p3.8xlarge aws instances
This commit is contained in:
Agnes Leroy
2024-04-24 17:38:28 +02:00
committed by Agnès Leroy
parent 5f0ca54150
commit b8991229ec
82 changed files with 3406 additions and 2390 deletions

View File

@@ -459,7 +459,7 @@ test_core_crypto_gpu: install_rs_build_toolchain
.PHONY: test_integer_gpu # Run the tests of the integer module including experimental on the gpu backend
test_integer_gpu: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key::
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key:: --test-threads=6
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --doc --profile $(CARGO_PROFILE) \
--features=$(TARGET_ARCH_FEATURE),integer,gpu -p $(TFHE_SPEC) -- integer::gpu::server_key::