Commit Graph

474 Commits

Author SHA1 Message Date
pgardratzama
ca4159f123 fix(hpu): fix overflow flag of OVF_MUL & OVF_MULS, also update simulation HPU config 2025-10-07 10:14:43 +02:00
Arthur Meyre
e07f07c4c8 chore: bump tfhe-cuda-backend to 0.12.0 2025-10-06 13:26:54 +02:00
Arthur Meyre
81cc0c31b4 chore: constrain bytemuck < 1.24.0 as we don't have avx512 updated code 2025-10-06 13:24:16 +02:00
Enzo Di Maria
f0f3dd76eb feat(gpu): aes 128 2025-10-06 09:31:36 +02:00
Andrei Stoian
0604d237eb chore(gpu): multi-gpu debug target 2025-10-03 16:48:42 +02:00
Agnes Leroy
f9e876730a chore(gpu): remove support for drift noise reduction 2025-10-03 09:45:20 +02:00
pgardratzama
602c6faf8a chore(hpu): update hpu-backend dependencies, fix pcc 2025-10-02 13:20:36 +02:00
pgardratzama
563502a6a6 chore(hpu): update tfhe-hpu-backend version, readme and run-on-hpu doc 2025-10-02 13:20:36 +02:00
pgardratzama
5f30569452 fix(hpu): update AMC firmware in bitstream to lower polling period 2025-10-02 13:20:36 +02:00
pgardratzama
39b81a8ded feat(hpu): move to new bitstream at 400Mhz with GRAM_NB 3
- update SIMD_N and min_batch_size to 12 which seems to give better
  latency and ERC20 throughput
- support IOp on several lines in ami /proc file
- reduce amount of ERC_20_SIMD per batch in HLAPI bench
2025-10-02 13:20:36 +02:00
pgardratzama
da223b36b6 fix(hpu): reduce polling period of backend on iop ack file from 10 to 2us 2025-10-02 13:20:36 +02:00
JJ-hw
db16276715 chore(hpu): Remove all references to U55C, which is not supported anymore. 2025-10-02 13:20:36 +02:00
pgardratzama
a59742f518 fix(hpu): uuid comparison is now done in lower case from both value (metadata, ami read) 2025-10-02 13:20:36 +02:00
Arthur Meyre
9fdaa983e3 chore: fix october typos 2025-10-01 14:32:41 +02:00
Agnes Leroy
71b45c14da chore(gpu): refactor subset_first and subset 2025-09-30 12:21:39 +02:00
Beka Barbakadze
7549474aac feat(gpu): Implements optimized division algorithm for message_2_carry_2, when 4 or more gpus are used 2025-09-29 15:16:34 +02:00
Agnes Leroy
23d46ba2bc fix(gpu): fix oprf output degree 2025-09-29 08:33:25 +02:00
Agnes Leroy
daf0e79e4a fix(gpu): fix get oprf size on gpu 2025-09-29 08:33:25 +02:00
Arthur Meyre
c5ad73865c chore: prepare alpha.2
- bump tfhe-cuda-backend to 0.12.0-alpha.2
- bump tfhe to 1.4.0-alpha.2
2025-09-27 11:35:27 +02:00
Agnes Leroy
9aab79e23a chore(gpu): fix compilation warning 2025-09-26 17:04:17 +02:00
Agnes Leroy
f53c75636d chore(gpu): refactor oprf test, remove unused arg and fix multi-GPU for oprf 2025-09-26 13:19:34 +02:00
Arthur Meyre
ce63cabc05 chore: bump tfhe-cuda-backend to 0.12.0-alpha.1 2025-09-26 10:39:24 +02:00
JJ-hw
3680f796af feat(hpu): Now the mockup takes into account the field position from the regmap toml to generate its register read and write answers. 2025-09-25 14:00:07 +02:00
JJ-hw
3ded3fe7c9 fix(hpu): (From hcampos-zama) Missing a field. 2025-09-25 14:00:07 +02:00
Arthur Meyre
d60028c47c chore: bump tfhe-cuda-backend to 0.12.0-alpha.0 2025-09-24 15:57:30 +02:00
Agnes Leroy
4b0623da4a chore(gpu): remove unused variable 2025-09-22 16:36:34 +02:00
Guillermo Oyarzun
d415d47894 chore(gpu): remove unnecessary nvtx lib dependency 2025-09-22 16:34:57 +02:00
Nicolas Sarlin
4d02d3abb4 fix(hpu): clippy lint 2025-09-22 14:02:41 +02:00
Guillermo Oyarzun
022cb3b18a fix(gpu): avoid out of memory when benchmarking throughput 2025-09-19 14:44:12 +02:00
Agnes Leroy
fe6e81ff78 chore(gpu): post hackathon cleanup 2025-09-18 16:30:45 +02:00
Andrei Stoian
87c0d646a4 fix(gpu): coprocessor bench 2025-09-18 13:56:55 +02:00
Agnes Leroy
e5b39a6d4d fix(gpu): fix memory leak in multi-gpu calculations 2025-09-18 13:55:03 +02:00
Andrei Stoian
1dcc3c8c89 chore(gpu): structure to encapsulate streams 2025-09-18 09:43:17 +02:00
Pedro Alves
becd08db71 fix(gpu): fix an overflow that may happen when the user tries to allocate a huge amount of blocks 2025-09-16 16:17:32 -03:00
Pedro Alves
6b94872a00 fix(gpu): add an assert to be sure the carry part has correct size in expand 2025-09-15 12:57:11 -03:00
Pedro Alves
b2624d1a76 chore(gpu): refactor the indexing logic for the LWE expand 2025-09-11 13:10:18 -03:00
Pedro Alves
c78cc2d2e9 chore(gpu): add a benchmark for 128-bit multi-bit noise squashing
- Also, remove the lut indexes concept from the 128-bit multi-bit pbs. It's assumed not to exist by the entire backend (as it doesn't for classical PBS). So to keep it here would be a bit error prone.
2025-09-09 07:51:35 -03:00
Himess
6fde90ad9c chore(clap): Replace use of deprecated attributes
Replace deprecated #[clap(...)] attributes to #[arg]/#[command] and remove redundant use of value_parser
2025-09-09 09:35:59 +02:00
Agnes Leroy
5d70ae4232 fix(gpu): add missing broadcast lut 2025-09-09 08:47:53 +02:00
Guillermo Oyarzun
a3168eb1b5 feat(gpu): enable lut generation with preallocated buffers 2025-09-08 10:01:34 +02:00
pgardratzama
0a1651adf3 fix(hpu): update firmware in bitstream to allow SIMD operations 2025-09-05 10:42:36 +02:00
pgardratzama
2279d0deb8 chore(hpu): update hpu firmware (fix 2 bits operations issue) 2025-09-05 10:42:36 +02:00
Helder Campos
d3a867ecfe feat(hpu): High bandwidth HPU 2025-09-05 10:42:36 +02:00
Helder Campos
a83c92f28f feat(hpu): Soft Reset Support and fix some runtime registers 2025-09-05 10:42:36 +02:00
Helder Campos
3b48ef301e feat(hpu): Made two SIMD IOPs, ADD and ERC20. 2025-09-05 10:42:36 +02:00
Helder Campos
827a6e912c feat(hpu): Adding a massively parallel multiplier operation 2025-09-05 10:42:36 +02:00
Guillermo Oyarzun
eeccace7b3 fix(gpu): add missing syncs when releasing scalar ops and returning to old lut release 2025-09-05 09:53:00 +02:00
Guillermo Oyarzun
60d137de6e feat(gpu): use mempools to optimize mem reuse 2025-09-04 13:23:18 +02:00
Guillermo Oyarzun
c2e816a86c fix(gpu): change mininum number of elements in benches 2025-09-04 11:03:27 +02:00
Guillermo Oyarzun
baad6a6b49 feat(gpu): change broadcast lut to communicate the minimum possible 2025-09-03 15:20:58 +02:00