pgardratzama
ca4159f123
fix(hpu): fix overflow flag of OVF_MUL & OVF_MULS, also update simulation HPU config
2025-10-07 10:14:43 +02:00
Arthur Meyre
e07f07c4c8
chore: bump tfhe-cuda-backend to 0.12.0
2025-10-06 13:26:54 +02:00
Arthur Meyre
81cc0c31b4
chore: constrain bytemuck < 1.24.0 as our avx512 code has not been updated for it yet
2025-10-06 13:24:16 +02:00
Enzo Di Maria
f0f3dd76eb
feat(gpu): aes 128
2025-10-06 09:31:36 +02:00
Andrei Stoian
0604d237eb
chore(gpu): multi-gpu debug target
2025-10-03 16:48:42 +02:00
Agnes Leroy
f9e876730a
chore(gpu): remove support for drift noise reduction
2025-10-03 09:45:20 +02:00
pgardratzama
602c6faf8a
chore(hpu): update hpu-backend dependencies, fix pcc
2025-10-02 13:20:36 +02:00
pgardratzama
563502a6a6
chore(hpu): update tfhe-hpu-backend version, readme and run-on-hpu doc
2025-10-02 13:20:36 +02:00
pgardratzama
5f30569452
fix(hpu): update AMC firmware in bitstream to lower polling period
2025-10-02 13:20:36 +02:00
pgardratzama
39b81a8ded
feat(hpu): move to new bitstream at 400 MHz with GRAM_NB 3
- update SIMD_N and min_batch_size to 12 which seems to give better
latency and ERC20 throughput
- support IOp on several lines in ami /proc file
- reduce amount of ERC_20_SIMD per batch in HLAPI bench
2025-10-02 13:20:36 +02:00
pgardratzama
da223b36b6
fix(hpu): reduce polling period of backend on iop ack file from 10 µs to 2 µs
2025-10-02 13:20:36 +02:00
JJ-hw
db16276715
chore(hpu): Remove all references to U55C, which is not supported anymore.
2025-10-02 13:20:36 +02:00
pgardratzama
a59742f518
fix(hpu): uuid comparison is now done in lower case for both values (metadata, ami read)
2025-10-02 13:20:36 +02:00
Arthur Meyre
9fdaa983e3
chore: fix october typos
2025-10-01 14:32:41 +02:00
Agnes Leroy
71b45c14da
chore(gpu): refactor subset_first and subset
2025-09-30 12:21:39 +02:00
Beka Barbakadze
7549474aac
feat(gpu): implement an optimized division algorithm for message_2_carry_2 when 4 or more GPUs are used
2025-09-29 15:16:34 +02:00
Agnes Leroy
23d46ba2bc
fix(gpu): fix oprf output degree
2025-09-29 08:33:25 +02:00
Agnes Leroy
daf0e79e4a
fix(gpu): fix get oprf size on gpu
2025-09-29 08:33:25 +02:00
Arthur Meyre
c5ad73865c
chore: prepare alpha.2
- bump tfhe-cuda-backend to 0.12.0-alpha.2
- bump tfhe to 1.4.0-alpha.2
2025-09-27 11:35:27 +02:00
Agnes Leroy
9aab79e23a
chore(gpu): fix compilation warning
2025-09-26 17:04:17 +02:00
Agnes Leroy
f53c75636d
chore(gpu): refactor oprf test, remove unused arg and fix multi-GPU for oprf
2025-09-26 13:19:34 +02:00
Arthur Meyre
ce63cabc05
chore: bump tfhe-cuda-backend to 0.12.0-alpha.1
2025-09-26 10:39:24 +02:00
JJ-hw
3680f796af
feat(hpu): the mockup now uses the field positions from the regmap toml to generate its register read and write responses.
2025-09-25 14:00:07 +02:00
JJ-hw
3ded3fe7c9
fix(hpu): add a missing field (from hcampos-zama)
2025-09-25 14:00:07 +02:00
Arthur Meyre
d60028c47c
chore: bump tfhe-cuda-backend to 0.12.0-alpha.0
2025-09-24 15:57:30 +02:00
Agnes Leroy
4b0623da4a
chore(gpu): remove unused variable
2025-09-22 16:36:34 +02:00
Guillermo Oyarzun
d415d47894
chore(gpu): remove unnecessary nvtx lib dependency
2025-09-22 16:34:57 +02:00
Nicolas Sarlin
4d02d3abb4
fix(hpu): clippy lint
2025-09-22 14:02:41 +02:00
Guillermo Oyarzun
022cb3b18a
fix(gpu): avoid out of memory when benchmarking throughput
2025-09-19 14:44:12 +02:00
Agnes Leroy
fe6e81ff78
chore(gpu): post hackathon cleanup
2025-09-18 16:30:45 +02:00
Andrei Stoian
87c0d646a4
fix(gpu): coprocessor bench
2025-09-18 13:56:55 +02:00
Agnes Leroy
e5b39a6d4d
fix(gpu): fix memory leak in multi-gpu calculations
2025-09-18 13:55:03 +02:00
Andrei Stoian
1dcc3c8c89
chore(gpu): structure to encapsulate streams
2025-09-18 09:43:17 +02:00
Pedro Alves
becd08db71
fix(gpu): fix an overflow that may happen when the user tries to allocate a huge number of blocks
2025-09-16 16:17:32 -03:00
Pedro Alves
6b94872a00
fix(gpu): add an assert to be sure the carry part has correct size in expand
2025-09-15 12:57:11 -03:00
Pedro Alves
b2624d1a76
chore(gpu): refactor the indexing logic for the LWE expand
2025-09-11 13:10:18 -03:00
Pedro Alves
c78cc2d2e9
chore(gpu): add a benchmark for 128-bit multi-bit noise squashing
- Also remove the LUT indexes concept from the 128-bit multi-bit PBS. The rest of the backend assumes it does not exist (as is already the case for the classical PBS), so keeping it here would be error-prone.
2025-09-09 07:51:35 -03:00
Himess
6fde90ad9c
chore(clap): Replace use of deprecated attributes
Replace deprecated #[clap(...)] attributes with #[arg]/#[command] and remove redundant uses of value_parser
2025-09-09 09:35:59 +02:00
Agnes Leroy
5d70ae4232
fix(gpu): add missing broadcast lut
2025-09-09 08:47:53 +02:00
Guillermo Oyarzun
a3168eb1b5
feat(gpu): enable lut generation with preallocated buffers
2025-09-08 10:01:34 +02:00
pgardratzama
0a1651adf3
fix(hpu): update firmware in bitstream to allow SIMD operations
2025-09-05 10:42:36 +02:00
pgardratzama
2279d0deb8
chore(hpu): update hpu firmware (fix 2-bit operations issue)
2025-09-05 10:42:36 +02:00
Helder Campos
d3a867ecfe
feat(hpu): High bandwidth HPU
2025-09-05 10:42:36 +02:00
Helder Campos
a83c92f28f
feat(hpu): Soft Reset Support and fix some runtime registers
2025-09-05 10:42:36 +02:00
Helder Campos
3b48ef301e
feat(hpu): add two SIMD IOps, ADD and ERC20
2025-09-05 10:42:36 +02:00
Helder Campos
827a6e912c
feat(hpu): add a massively parallel multiplier operation
2025-09-05 10:42:36 +02:00
Guillermo Oyarzun
eeccace7b3
fix(gpu): add missing syncs when releasing scalar ops and returning to old lut release
2025-09-05 09:53:00 +02:00
Guillermo Oyarzun
60d137de6e
feat(gpu): use mempools to optimize mem reuse
2025-09-04 13:23:18 +02:00
Guillermo Oyarzun
c2e816a86c
fix(gpu): change minimum number of elements in benches
2025-09-04 11:03:27 +02:00
Guillermo Oyarzun
baad6a6b49
feat(gpu): change broadcast lut to communicate the minimum possible
2025-09-03 15:20:58 +02:00