tfhe-rs

mirror of https://github.com/zama-ai/tfhe-rs.git synced 2026-01-08 22:28:01 -05:00

Author	SHA1	Message	Date
Andrei Stoian	e43528db71	feat(gpu): support keyswitch 64/32 in PBS	2026-01-05 09:48:00 +01:00
Andrei Stoian	78d1ce18c1	feat(gpu): support keyswitch 64/32	2025-12-12 22:01:49 +01:00
Andrei Stoian	e2063c8ef4	chore(gpu): bench KS latency batches	2025-11-27 17:32:44 +01:00
Arthur Meyre	9fdaa983e3	chore: fix october typos	2025-10-01 14:32:41 +02:00
Andrei Stoian	1dcc3c8c89	chore(gpu): structure to encapsulate streams	2025-09-18 09:43:17 +02:00
Pedro Alves	57ea3e3e88	chore(gpu): refactor the entry points for PBS in the backend	2025-08-29 16:46:27 -03:00
Andrei Stoian	71f427de9e	chore(gpu): add assert macro	2025-08-27 10:32:43 +02:00
cryptoraph	d78266e141	fix(cuda): correct typo in keyswitch error message	2025-07-28 17:09:56 +02:00
Agnes Leroy	48dfeb21dc	chore(gpu): refactor size tracker to avoid future bugs	2025-07-04 14:37:02 +01:00
Guillermo Oyarzun	981083360e	feat(gpu): increase keyswitch occupancy	2025-07-01 09:54:14 +02:00
Agnes Leroy	8a2d93aaa8	fix(gpu): compression memory check bug, size computation was incorrect	2025-06-20 15:45:01 +02:00
Pedro Alves	53845b298a	fix(gpu): fix the packing keyswitch buffer not being allocated on large parameter sets	2025-06-11 08:58:09 +02:00
Agnes Leroy	9eaa77ddef	feat(gpu): make all scratch functions return the amount of memory consumed for temporary buffers	2025-04-30 10:48:03 +02:00
Guillermo Oyarzun	0f44ffdf30	fix(gpu): enable larger number of samples in the keyswitch	2025-02-17 19:34:26 -03:00
Pedro Alves	3c88574a52	chore(gpu): encapsulate cudaSetDevice	2025-01-31 09:08:30 +01:00
Andrei Stoian	298fd66631	feat(gpu): optimize packing keyswitch on gpu	2025-01-13 09:18:53 -03:00
Andrei Stoian	2c8f0ce7de	feat(gpu): optimize packing keyswitch in ML special case	2024-12-23 10:32:23 -03:00
Arthur Meyre	d28040342c	chore(gpu): use same balanced decomposition code as in the CPU code	2024-11-13 14:26:13 +01:00
Arthur Meyre	615ed3d5db	refactor(tfhe)!: update key level order for better performance; use natural order for decomposition levels in bsk (co-authored-by: Agnes Leroy <agnes.leroy@zama.ai>)	2024-11-05 17:23:57 +01:00
Arthur Meyre	5a54cf678f	chore(data)!: breaking data changes for future compatibility - invert the LweKeyswitchKey level order and propagate change - remove dependency on unsupported wopbs keys for the HL keys	2024-10-22 10:23:21 +02:00
Guillermo Oyarzun	d780276ae6	fix(gpu): add template parameter to packing keyswitch calls	2024-10-16 09:30:38 +02:00
Agnes Leroy	e698d18242	chore(gpu): automatically generate rust bindings for cuda functions, except device.cu	2024-10-14 17:07:57 +02:00
Pedro Alves	faf200218b	chore(gpu): add checks to ensure limits for compression	2024-09-19 15:57:16 -03:00
Pedro Alves	fe5641ef6d	feat(gpu): implement CUDA-based Radix Integer compression and public functional packing keyswitch	2024-08-16 15:44:34 -03:00
Agnes Leroy	d9eca01631	fix(gpu): dispatch/gather inputs and outputs to the ks and pbs on all GPUs	2024-07-23 08:48:48 +02:00
Guillermo Oyarzun	c1fcd95d72	refactor(gpu): add restrict keyword	2024-07-19 13:08:39 +02:00
Agnes Leroy	b8991229ec	feat(gpu): make PBS and ks execution parallel over available GPUs. Only GPUs with peer access to GPU 0 can be used for this at the moment. Peer to peer copy is used if different GPUs are passed to memcpy_gpu_to_gpu. A gpu offset is passed as new parameter to pbs and keyswitch to adjust the input/output index used per gpu. bsk and ksk are copied to all GPUs. The CI now tests & runs benchmarks on p3.8xlarge aws instances.	2024-06-10 15:05:42 +02:00
Guillermo Oyarzun	019efb7fef	chore(gpu): parallelize keyswitch further	2024-06-05 11:23:53 +02:00
Agnes Leroy	fedd1ca7b2	chore(gpu): change the number of threads used in the keyswitch	2024-05-24 12:03:53 +02:00
Agnes Leroy	4f29db404c	feat(gpu): prepare code base for multi-gpu support	2024-05-06 09:41:54 +02:00
Agnes Leroy	c1c56ab770	fix(gpu): fix memory bug in multi-bit PBS	2024-03-06 14:18:29 +01:00
Agnes Leroy	548f2e5d05	chore(gpu): fix gpu package for publication	2024-01-22 15:29:26 +01:00

32 Commits