Mirror of https://github.com/zama-ai/tfhe-rs.git (synced 2026-01-11 07:38:08 -05:00)

Compare commits: mz/simplif...main (69 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 7043246c17 | |
| | 51735fb8ed | |
| | 23a348c9ae | |
| | 61b616b784 | |
| | df48e176f3 | |
| | dd2345df6b | |
| | 933800ea6f | |
| | 3e4cee3a75 | |
| | 00ea9b8e07 | |
| | 23ce85f6a2 | |
| | 126a95e929 | |
| | 23fffb1443 | |
| | 6d58a54266 | |
| | 9b8d5f5a43 | |
| | 696f964ecf | |
| | a5323d1edf | |
| | 2d500d0de6 | |
| | b1657876fb | |
| | d2a570bdd6 | |
| | 122ef489fd | |
| | ed84387bba | |
| | 1f4ba33a50 | |
| | e645ee3397 | |
| | 569abd9a3b | |
| | 917bb5e1ef | |
| | 509aadcad2 | |
| | e20aea90df | |
| | e8ab448454 | |
| | 50f6773c82 | |
| | 1eb8270812 | |
| | 0fab6324b9 | |
| | bb1c215951 | |
| | 70a0021cbf | |
| | 36b6376cc4 | |
| | 62d0d16f6d | |
| | c86deec683 | |
| | 4d42425f4f | |
| | 92df46f8f2 | |
| | effb7ada6d | |
| | 49be544297 | |
| | 23600eb8e1 | |
| | 9708cc7fe9 | |
| | 4cdfccb659 | |
| | 031c3fe34f | |
| | ea99307cf5 | |
| | ca2a79f1fb | |
| | 0a59e86675 | |
| | 312ce494bf | |
| | 5f2e7e31f1 | |
| | cfa53682ae | |
| | 006d6cc300 | |
| | b950b551e6 | |
| | 95524966ca | |
| | d394af7f4d | |
| | 78d1ce18c1 | |
| | 14d49f0891 | |
| | e544dfc08e | |
| | 5891a4d78a | |
| | f17cd9bd37 | |
| | c083eb826d | |
| | 1479315725 | |
| | b5e5058759 | |
| | d98033c71d | |
| | c7b869c956 | |
| | 50b76817c9 | |
| | 238f7d51f6 | |
| | aa49d141c7 | |
| | be1de6ef2b | |
| | 11579bd3d0 | |
```diff
@@ -2,6 +2,8 @@
 ignore = [
     # Ignoring unmaintained 'paste' advisory as it is a widely used, low-risk build dependency.
     "RUSTSEC-2024-0436",
+    # Ignoring unmaintained 'bincode' crate. Getting rid of it would be too complex on the short term.
+    "RUSTSEC-2025-0141",
 ]

 [output]
```
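The hunk above adds a second unmaintained-crate advisory to the cargo-audit ignore list. A minimal sketch of a CI job that would exercise such a config follows; the job name, runner and install step are illustrative assumptions, only the advisory IDs come from the diff:

```yaml
# Minimal sketch, assuming cargo-audit is driven from CI.
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
      - run: cargo install cargo-audit
      # cargo-audit picks the ignore list up from the committed config, so
      # RUSTSEC-2024-0436 and RUSTSEC-2025-0141 no longer fail the run.
      - run: cargo audit
```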
.github/actions/gpu_setup/action.yml (vendored, 2 changes)

```diff
@@ -23,6 +23,8 @@ runs:
       echo "${CMAKE_SCRIPT_SHA} cmake-${CMAKE_VERSION}-linux-x86_64.sh" > checksum
       sha256sum -c checksum
       sudo bash cmake-"${CMAKE_VERSION}"-linux-x86_64.sh --skip-license --prefix=/usr/ --exclude-subdir
+      sudo apt-get clean
+      sudo rm -rf /var/lib/apt/lists/*
       sudo apt update
       sudo apt remove -y unattended-upgrades
       sudo apt install -y cmake-format libclang-dev
```

```diff
@@ -66,7 +66,7 @@ jobs:
       cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'true' # Needed to pull lfs data
           token: ${{ env.CHECKOUT_TOKEN }}
@@ -80,7 +80,7 @@ jobs:

       - name: Retrieve data from cache
         id: retrieve-data-cache
-        uses: actions/cache/restore@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/restore@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         with:
           path: |
             utils/tfhe-backward-compat-data/**/*.cbor
@@ -109,7 +109,7 @@ jobs:
       - name: Store data in cache
         if: steps.retrieve-data-cache.outputs.cache-hit != 'true'
         continue-on-error: true
-        uses: actions/cache/save@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/save@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         with:
           path: |
             utils/tfhe-backward-compat-data/**/*.cbor
```
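The cache/restore and cache/save bumps (v4.3.0 to v5.0.1) recur throughout this compare. A sketch of the split restore/save pattern with the cache-hit guard; the `key` expressions are illustrative assumptions, the paths and guard come from the diff:

```yaml
- name: Retrieve data from cache
  id: retrieve-data-cache
  uses: actions/cache/restore@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
  with:
    path: |
      utils/tfhe-backward-compat-data/**/*.cbor
    key: backward-compat-data-${{ hashFiles('utils/tfhe-backward-compat-data/**') }}
# ... regenerate the data here only when nothing was restored ...
- name: Store data in cache
  if: steps.retrieve-data-cache.outputs.cache-hit != 'true'
  continue-on-error: true
  uses: actions/cache/save@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
  with:
    path: |
      utils/tfhe-backward-compat-data/**/*.cbor
    key: ${{ steps.retrieve-data-cache.outputs.cache-primary-key }}
```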
.github/workflows/aws_tfhe_fast_tests.yml (vendored, 10 changes)

```diff
@@ -63,7 +63,7 @@ jobs:
       any_file_changed: ${{ env.IS_PULL_REQUEST == 'false' || steps.aggregated-changes.outputs.any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -71,7 +71,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             dependencies:
@@ -171,7 +171,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
@@ -219,7 +219,7 @@ jobs:

       - name: Node cache restoration
         id: node-cache
-        uses: actions/cache/restore@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/restore@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         with:
           path: |
             ~/.nvm
@@ -232,7 +232,7 @@ jobs:
           make install_node

       - name: Node cache save
-        uses: actions/cache/save@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/save@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         if: steps.node-cache.outputs.cache-hit != 'true'
         with:
           path: |
```
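The tj-actions/changed-files bump appears in most workflows below; each uses the same `files_yaml` gating pattern. A minimal sketch of that pattern, where the filter name and path are illustrative assumptions rather than the repository's real lists:

```yaml
- name: Check for file changes
  id: changed-files
  uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
  with:
    files_yaml: |
      integer:
        - tfhe/src/integer/**
# Downstream jobs are then gated on the derived output, e.g.:
#   if: steps.changed-files.outputs.integer_any_changed == 'true'
```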
.github/workflows/aws_tfhe_integer_tests.yml (vendored, 6 changes)

```diff
@@ -50,7 +50,7 @@ jobs:
         steps.changed-files.outputs.integer_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -58,7 +58,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             integer:
@@ -112,7 +112,7 @@ jobs:
     timeout-minutes: 480 # 8 hours
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: "false"
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/aws_tfhe_noise_checks.yml (vendored, 2 changes)

```diff
@@ -60,7 +60,7 @@ jobs:
     timeout-minutes: 1440
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -51,7 +51,7 @@ jobs:
         steps.changed-files.outputs.integer_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -59,7 +59,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             integer:
@@ -112,7 +112,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: "false"
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/aws_tfhe_tests.yml (vendored, 6 changes)

```diff
@@ -72,7 +72,7 @@ jobs:
       any_file_changed: ${{ env.IS_PULL_REQUEST == 'false' || steps.aggregated-changes.outputs.any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -80,7 +80,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             dependencies:
@@ -182,7 +182,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/aws_tfhe_wasm_tests.yml (vendored, 6 changes)

```diff
@@ -64,7 +64,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
@@ -80,7 +80,7 @@ jobs:

       - name: Node cache restoration
         id: node-cache
-        uses: actions/cache/restore@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/restore@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         with:
           path: |
             ~/.nvm
@@ -93,7 +93,7 @@ jobs:
           make install_node

       - name: Node cache save
-        uses: actions/cache/save@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/save@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         if: steps.node-cache.outputs.cache-hit != 'true'
         with:
           path: |
```
.github/workflows/benchmark_cpu_common.yml (vendored, 6 changes)

```diff
@@ -149,7 +149,7 @@ jobs:
         params_type: ${{ fromJSON(needs.prepare-matrix.outputs.params_type) }}
     steps:
       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -223,13 +223,13 @@ jobs:
           results_type: ${{ inputs.additional_results_type }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}_${{ matrix.bench_type }}_${{ matrix.params_type }}
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_ct_key_sizes.yml (vendored, 6 changes)

```diff
@@ -49,7 +49,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -99,13 +99,13 @@ jobs:
             --append-results

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_ct_key_sizes
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_documentation.yml (vendored, 25 changes)

```diff
@@ -8,8 +8,13 @@ on:
       description: "Run CPU benchmarks"
       type: boolean
       default: true
-    run-gpu-benchmarks:
-      description: "Run GPU benchmarks"
+    # GPU benchmarks are split because of resource scarcity.
+    run-gpu-integer-benchmarks:
+      description: "Run GPU integer benchmarks"
       type: boolean
       default: true
+    run-gpu-core-crypto-benchmarks:
+      description: "Run GPU core-crypto benchmarks"
+      type: boolean
+      default: true
     run-hpu-benchmarks:
@@ -52,7 +57,7 @@ jobs:
   run-benchmarks-gpu-integer:
     name: benchmark_documentation/run-benchmarks-gpu-integer
     uses: ./.github/workflows/benchmark_gpu_common.yml
-    if: inputs.run-gpu-benchmarks
+    if: inputs.run-gpu-integer-benchmarks
     with:
       profile: multi-h100-sxm5
       hardware_name: n3-H100-SXM5x8
@@ -113,7 +118,7 @@ jobs:
   run-benchmarks-gpu-core-crypto:
     name: benchmark_documentation/run-benchmarks-gpu-core-crypto
     uses: ./.github/workflows/benchmark_gpu_common.yml
-    if: inputs.run-gpu-benchmarks
+    if: inputs.run-gpu-core-crypto-benchmarks
     with:
       profile: multi-h100-sxm5
       hardware_name: n3-H100-SXM5x8
@@ -133,7 +138,7 @@ jobs:
   generate-svgs-with-benchmarks-run:
     name: benchmark-documentation/generate-svgs-with-benchmarks-run
     if: ${{ always() &&
-      (inputs.run-cpu-benchmarks || inputs.run-gpu-benchmarks ||inputs.run-hpu-benchmarks) &&
+      (inputs.run-cpu-benchmarks || inputs.run-gpu-integer-benchmarks || inputs.run-gpu-core-crypto-benchmarks ||inputs.run-hpu-benchmarks) &&
       inputs.generate-svgs }}
     needs: [
       run-benchmarks-cpu-integer, run-benchmarks-gpu-integer, run-benchmarks-hpu-integer,
@@ -143,7 +148,7 @@ jobs:
     with:
       time_span_days: 5
       generate-cpu-svgs: ${{ inputs.run-cpu-benchmarks }}
-      generate-gpu-svgs: ${{ inputs.run-gpu-benchmarks }}
+      generate-gpu-svgs: ${{ inputs.run-gpu-integer-benchmarks || inputs.run-gpu-core-crypto-benchmarks }}
       generate-hpu-svgs: ${{ inputs.run-hpu-benchmarks }}
     secrets:
       DATA_EXTRACTOR_DATABASE_USER: ${{ secrets.DATA_EXTRACTOR_DATABASE_USER }}
@@ -152,7 +157,7 @@ jobs:

   generate-svgs-without-benchmarks-run:
     name: benchmark-documentation/generate-svgs-without-benchmarks-run
-    if: ${{ !(inputs.run-cpu-benchmarks || inputs.run-gpu-benchmarks || inputs.run-hpu-benchmarks) &&
+    if: ${{ !(inputs.run-cpu-benchmarks || inputs.run-gpu-integer-benchmarks || inputs.run-gpu-core-crypto-benchmarks || inputs.run-hpu-benchmarks) &&
       inputs.generate-svgs }}
     uses: ./.github/workflows/generate_svgs.yml
     with:
@@ -175,12 +180,12 @@ jobs:
       PATH_TO_DOC_ASSETS: tfhe/docs/.gitbook/assets
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'

       - name: Download SVG tables
-        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
         with:
           path: svg_tables
           merge-multiple: 'true'
@@ -198,7 +203,7 @@ jobs:
           echo "date=$(date '+%g_%m_%d_%Hh%Mm%Ss')" >> "${GITHUB_OUTPUT}"

       - name: Create pull-request
-        uses: peter-evans/create-pull-request@84ae59a2cdc2258d6fa0732dd66352dddae2a412 # v7.0.9
+        uses: peter-evans/create-pull-request@98357b18bf14b5342f975ff684046ec3b2a07725 # v8.0.0
         with:
           sign-commits: true # Commit will be signed by github-actions bot
           add-paths: ${{ env.PATH_TO_DOC_ASSETS }}/*.svg
```
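This file shows the compare's one structural change: the single `run-gpu-benchmarks` dispatch input is split into integer and core-crypto variants, and every `if:` that referenced it is updated. A condensed sketch of the resulting pattern, with job bodies elided (everything here is taken from the hunks above):

```yaml
on:
  workflow_dispatch:
    inputs:
      # GPU benchmarks are split because of resource scarcity.
      run-gpu-integer-benchmarks:
        description: "Run GPU integer benchmarks"
        type: boolean
        default: true
      run-gpu-core-crypto-benchmarks:
        description: "Run GPU core-crypto benchmarks"
        type: boolean
        default: true
jobs:
  run-benchmarks-gpu-integer:
    if: inputs.run-gpu-integer-benchmarks
    uses: ./.github/workflows/benchmark_gpu_common.yml
  run-benchmarks-gpu-core-crypto:
    if: inputs.run-gpu-core-crypto-benchmarks
    uses: ./.github/workflows/benchmark_gpu_common.yml
```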
.github/workflows/benchmark_gpu_4090.yml (vendored, 12 changes)

```diff
@@ -40,7 +40,7 @@ jobs:
     timeout-minutes: 1440 # 24 hours
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -63,7 +63,7 @@ jobs:
           toolchain: nightly

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
@@ -89,7 +89,7 @@ jobs:
           REF_NAME: ${{ github.ref_name }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_integer_multi_bit_gpu_default
           path: ${{ env.RESULTS_FILENAME }}
@@ -123,7 +123,7 @@ jobs:

     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -146,7 +146,7 @@ jobs:
           toolchain: nightly

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
@@ -173,7 +173,7 @@ jobs:
           REF_NAME: ${{ github.ref_name }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_core_crypto
           path: ${{ env.RESULTS_FILENAME }}
```
.github/workflows/benchmark_gpu_common.yml (vendored, 8 changes)

```diff
@@ -175,7 +175,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -209,7 +209,7 @@ jobs:
       CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
     steps:
       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -281,13 +281,13 @@ jobs:
           BENCH_TYPE: ${{ matrix.bench_type }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_${{ matrix.command }}_${{ matrix.op_flavor }}_${{ inputs.profile }}_${{ matrix.bench_type }}_${{ matrix.params_type }}
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_gpu_coprocessor.yml (vendored, 16 changes)

```diff
@@ -130,7 +130,7 @@ jobs:
           git lfs install

       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
         with:
           path: tfhe-rs
           persist-credentials: false
@@ -141,7 +141,7 @@ jobs:
           ls

       - name: Checkout fhevm
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
         with:
           repository: zama-ai/fhevm
           persist-credentials: 'false'
@@ -192,10 +192,10 @@ jobs:
           cargo install sqlx-cli

       - name: Install foundry
-        uses: foundry-rs/foundry-toolchain@50d5a8956f2e319df19e6b57539d7e2acb9f8c1e
+        uses: foundry-rs/foundry-toolchain@8b0419c685ef46cb79ec93fbdc131174afceb730

       - name: Cache cargo
-        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
+        uses: actions/cache@9255dc7a253b0ccc959486e2bca901246202afeb # v5.0.1
         with:
           path: |
             ~/.cargo/registry
@@ -223,7 +223,7 @@ jobs:
         working-directory: fhevm/coprocessor/fhevm-engine/tfhe-worker

       - name: Use Node.js
-        uses: actions/setup-node@2028fbc5c25fe9cf00d9f06a71cc4710d4507903 # v6.0.0
+        uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0
         with:
           node-version: 20.x
@@ -262,7 +262,7 @@ jobs:
       - name: Upload profile artifact
         env:
           REPORT_NAME: ${{ steps.nsys_profile_name.outputs.profile }}
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ env.REPORT_NAME }}
           path: fhevm/coprocessor/fhevm-engine/tfhe-worker/${{ env.REPORT_NAME }}
@@ -293,13 +293,13 @@ jobs:
         working-directory: fhevm/

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${COMMIT_SHA}_${BENCHMARKS}_${{ needs.parse-inputs.outputs.profile }}
           path: fhevm/$${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_hpu_common.yml (vendored, 6 changes)

```diff
@@ -126,7 +126,7 @@ jobs:
           ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}

       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -185,13 +185,13 @@ jobs:
           BENCH_TYPE: ${{ matrix.bench_type }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_${{ matrix.bench_type }}_integer_benchmarks
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_perf_regression.yml (vendored, 12 changes)

```diff
@@ -50,7 +50,7 @@ jobs:
       pull-requests: write # Needed to write a comment in a pull-request
     steps:
       - name: Checkout tfhe-rs repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
@@ -164,7 +164,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
@@ -191,7 +191,7 @@ jobs:
         command: ${{ fromJson(needs.prepare-benchmarks.outputs.commands) }}
     steps:
       - name: Checkout tfhe-rs repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0 # Needed to get commit hash
           persist-credentials: 'false'
@@ -245,7 +245,7 @@ jobs:
           toolchain: nightly

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
@@ -280,7 +280,7 @@ jobs:
           BENCH_TYPE: ${{ env.__TFHE_RS_BENCH_TYPE }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_regression_${{ env.RESULTS_FILE_SHA }} # RESULT_FILE_SHA is needed to avoid collision between matrix.command runs
           path: ${{ env.RESULTS_FILENAME }}
@@ -305,7 +305,7 @@ jobs:
       REF_NAME: ${{ github.head_ref || github.ref_name }}
     steps:
       - name: Checkout tfhe-rs repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
```
.github/workflows/benchmark_tfhe_fft.yml (vendored, 6 changes)

```diff
@@ -55,7 +55,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -96,13 +96,13 @@ jobs:
           REF_NAME: ${{ github.ref_name }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_fft
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_tfhe_ntt.yml (vendored, 6 changes)

```diff
@@ -55,7 +55,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -96,13 +96,13 @@ jobs:
           REF_NAME: ${{ github.ref_name }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_ntt
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/benchmark_wasm_client.yml (vendored, 20 changes)

```diff
@@ -39,7 +39,7 @@ jobs:
       wasm_bench: ${{ steps.changed-files.outputs.wasm_bench_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -47,7 +47,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             wasm_bench:
@@ -91,7 +91,7 @@ jobs:
         browser: [ chrome, firefox ]
     steps:
       - name: Checkout tfhe-rs repo with tags
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -119,7 +119,7 @@ jobs:

       - name: Node cache restoration
         id: node-cache
-        uses: actions/cache/restore@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/restore@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         with:
           path: |
             ~/.nvm
@@ -132,7 +132,7 @@ jobs:
           make install_node

       - name: Node cache save
-        uses: actions/cache/save@0057852bfaa89a56745cba8c7296529d2fc39830 #v4.3.0
+        uses: actions/cache/save@9255dc7a253b0ccc959486e2bca901246202afeb #v5.0.1
         if: steps.node-cache.outputs.cache-hit != 'true'
         with:
           path: |
@@ -153,6 +153,12 @@ jobs:
         env:
           BROWSER: ${{ matrix.browser }}

+      - name: Run benchmarks (unsafe coop)
+        run: |
+          make bench_web_js_api_unsafe_coop_"${BROWSER}"_ci
+        env:
+          BROWSER: ${{ matrix.browser }}
+
       - name: Parse results
         run: |
           make parse_wasm_benchmarks
@@ -169,13 +175,13 @@ jobs:
           REF_NAME: ${{ github.ref_name }}

       - name: Upload parsed results artifact
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
           name: ${{ github.sha }}_wasm_${{ matrix.browser }}
           path: ${{ env.RESULTS_FILENAME }}

       - name: Checkout Slab repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: zama-ai/slab
           path: slab
```
.github/workflows/cargo_audit.yml (vendored, 2 changes)

```diff
@@ -26,7 +26,7 @@ jobs:
     name: cargo_audit/audit
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: 'false'
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/cargo_build.yml (vendored, 2 changes)

```diff
@@ -24,7 +24,7 @@ jobs:
     outputs:
       matrix_command: ${{ steps.set-pcc-commands-matrix.outputs.commands }}
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: "false"
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/cargo_build_common.yml (vendored, 2 changes)

```diff
@@ -140,7 +140,7 @@ jobs:
       result: ${{ steps.set_builds_result.outputs.result }}
     steps:
       - name: Checkout tfhe-rs repo
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/cargo_build_tfhe_fft.yml (vendored, 2 changes)

```diff
@@ -26,7 +26,7 @@ jobs:
       fail-fast: false

     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: 'false'
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/cargo_build_tfhe_ntt.yml (vendored, 2 changes)

```diff
@@ -24,7 +24,7 @@ jobs:
       os: [ubuntu-latest, macos-latest, windows-latest]
       fail-fast: false
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: 'false'
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/cargo_test_fft.yml (vendored, 8 changes)

```diff
@@ -29,7 +29,7 @@ jobs:
       fft_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.fft_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -37,7 +37,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             fft:
@@ -56,7 +56,7 @@ jobs:
         runner_type: [ ubuntu-latest, macos-latest, windows-latest ]
       fail-fast: false
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: 'false'
          token: ${{ env.CHECKOUT_TOKEN }}
@@ -92,7 +92,7 @@ jobs:
     if: needs.should-run.outputs.fft_test == 'true'
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: 'false'
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/cargo_test_ntt.yml (vendored, 6 changes)

```diff
@@ -31,7 +31,7 @@ jobs:
       ntt_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.ntt_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: "false"
@@ -39,7 +39,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             ntt:
@@ -87,7 +87,7 @@ jobs:
         os: ${{fromJson(needs.setup-instance.outputs.matrix_os)}}
       fail-fast: false
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: "false"
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/ci_lint.yml (vendored, 4 changes)

```diff
@@ -20,7 +20,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
@@ -50,7 +50,7 @@ jobs:
           version: ${{ steps.get_zizmor.outputs.version }}

       - name: Ensure SHA pinned actions
-        uses: zgosalvez/github-actions-ensure-sha-pinned-actions@9e9574ef04ea69da568d6249bd69539ccc704e74 # v4.0.0
+        uses: zgosalvez/github-actions-ensure-sha-pinned-actions@6124774845927d14c601359ab8138699fa5b70c3 # v4.0.1
         with:
           allowlist: |
             slsa-framework/slsa-github-generator
```
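Nearly every change in this compare is a SHA-pinned action bump, and ci_lint is the workflow that enforces that convention. A sketch of the convention and its enforcement step; the checkout line is representative, and the allowlist entry comes from the diff above:

```yaml
# Actions are referenced by full commit SHA, with the tag kept as a comment,
# so a moved tag cannot silently change the code that runs in CI.
- name: Checkout tfhe-rs
  uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
# The lint job fails if any workflow references an action by tag instead of SHA.
- name: Ensure SHA pinned actions
  uses: zgosalvez/github-actions-ensure-sha-pinned-actions@6124774845927d14c601359ab8138699fa5b70c3 # v4.0.1
  with:
    allowlist: |
      slsa-framework/slsa-github-generator
```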
.github/workflows/code_coverage.yml (vendored, 8 changes)

```diff
@@ -50,7 +50,7 @@ jobs:
     timeout-minutes: 5760 # 4 days
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
@@ -62,7 +62,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             tfhe:
@@ -92,7 +92,7 @@ jobs:
           make test_shortint_cov

       - name: Upload tfhe coverage to Codecov
-        uses: codecov/codecov-action@5a1091511ad55cbe89839c7260b706298ca349f7
+        uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de
         if: steps.changed-files.outputs.tfhe_any_changed == 'true'
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
@@ -106,7 +106,7 @@ jobs:
           make test_integer_cov

       - name: Upload tfhe coverage to Codecov
-        uses: codecov/codecov-action@5a1091511ad55cbe89839c7260b706298ca349f7
+        uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de
         if: steps.changed-files.outputs.tfhe_any_changed == 'true'
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
```

```diff
@@ -62,7 +62,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/generate_svg_common.yml (vendored, 20 changes)

```diff
@@ -43,7 +43,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
@@ -75,13 +75,22 @@ jobs:
           DATA_EXTRACTOR_DATABASE_HOST: ${{ secrets.DATA_EXTRACTOR_DATABASE_HOST }}
           DATA_EXTRACTOR_DATABASE_PASSWORD: ${{ secrets.DATA_EXTRACTOR_DATABASE_PASSWORD }}

+      - name: Upload tables
+        if: inputs.backend_comparison == false
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
+        with:
+          name: ${{ github.sha }}_${{ inputs.backend }}_${{ inputs.layer }}_${{ inputs.pbs_kind }}_${{ inputs.bench_type }}_tables
+          # This will upload all the file generated
+          path: ${{ inputs.output_filename }}*.svg
+          retention-days: 60
+
       - name: Produce backends comparison table from database
         if: inputs.backend_comparison == true
         run: |
           python3 -m pip install -r ci/data_extractor/requirements.txt
           python3 ci/data_extractor/src/data_extractor.py "${OUTPUT_FILENAME}" \
             --generate-svg \
-            --backend-comparison\
+            --backends-comparison \
             --time-span-days "${TIME_SPAN}"
         env:
           OUTPUT_FILENAME: ${{ inputs.output_filename }}
@@ -90,10 +99,11 @@ jobs:
           DATA_EXTRACTOR_DATABASE_HOST: ${{ secrets.DATA_EXTRACTOR_DATABASE_HOST }}
           DATA_EXTRACTOR_DATABASE_PASSWORD: ${{ secrets.DATA_EXTRACTOR_DATABASE_PASSWORD }}

-      - name: Upload tables
-        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4
+      - name: Upload comparison tables
+        if: inputs.backend_comparison == true
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f
         with:
-          name: ${{ github.sha }}_${{ inputs.backend }}_${{ inputs.layer }}_${{ inputs.pbs_kind }}_${{ inputs.bench_type }}_tables
+          name: ${{ github.sha }}_backends_comparison_tables
           # This will upload all the file generated
           path: ${{ inputs.output_filename }}*.svg
           retention-days: 60
```
.github/workflows/gpu_4090_tests.yml (vendored, 2 changes)

```diff
@@ -41,7 +41,7 @@ jobs:

     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -23,8 +23,8 @@ on:
   # Allows you to run this workflow manually from the Actions tab as an alternative.
   workflow_dispatch:
   schedule:
-    # every 3 months
-    - cron: "0 0 1 */3 *"
+    # every month
+    - cron: "0 0 1 * *"

 permissions:
   contents: read
@@ -50,7 +50,7 @@ jobs:
       slab-url: ${{ secrets.SLAB_BASE_URL }}
       job-secret: ${{ secrets.JOB_SECRET }}
       backend: hyperstack
-      profile: gpu-test
+      profile: single-h100

   # This instance will be spawned especially for pull-request from forked repository
   - name: Start GitHub instance
@@ -79,7 +79,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
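The schedule hunk above moves this workflow from a quarterly to a monthly cadence. For reference, the five cron fields are minute, hour, day-of-month, month, and day-of-week; both expressions come from the diff:

```yaml
on:
  schedule:
    # old: 00:00 UTC on the 1st of every third month
    - cron: "0 0 1 */3 *"
    # new: 00:00 UTC on the 1st of every month
    - cron: "0 0 1 * *"
```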
.github/workflows/gpu_fast_h100_tests.yml (vendored, 6 changes)

```diff
@@ -40,7 +40,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -48,7 +48,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -129,7 +129,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/gpu_fast_tests.yml (vendored, 6 changes)

```diff
@@ -39,7 +39,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -47,7 +47,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -114,7 +114,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/gpu_full_h100_tests.yml (vendored, 2 changes)

```diff
@@ -68,7 +68,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
```

```diff
@@ -40,7 +40,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -48,7 +48,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -116,7 +116,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -65,7 +65,7 @@ jobs:
     timeout-minutes: 4320 # 72 hours
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/gpu_memory_sanitizer.yml (vendored, 2 changes)

```diff
@@ -78,7 +78,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -78,7 +78,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/gpu_pcc.yml (vendored, 2 changes)

```diff
@@ -74,7 +74,7 @@ jobs:

     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -40,7 +40,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -48,7 +48,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -116,7 +116,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -40,7 +40,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -48,7 +48,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -129,7 +129,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -41,7 +41,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -49,7 +49,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -117,7 +117,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -40,7 +40,7 @@ jobs:
      gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -48,7 +48,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -116,7 +116,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -40,7 +40,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -48,7 +48,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -129,7 +129,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```

```diff
@@ -41,7 +41,7 @@ jobs:
       gpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.gpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -49,7 +49,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             gpu:
@@ -117,7 +117,7 @@ jobs:
       gcc: 11
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/hpu_hlapi_tests.yml (vendored, 6 changes)

```diff
@@ -32,7 +32,7 @@ jobs:
       hpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.hpu_any_changed }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -40,7 +40,7 @@ jobs:

       - name: Check for file changes
         id: changed-files
-        uses: tj-actions/changed-files@24d32ffd492484c1d75e0c0b894501ddb9d30d62 # v47.0.0
+        uses: tj-actions/changed-files@e0021407031f5be11a464abee9a0776171c79891 # v47.0.1
         with:
           files_yaml: |
             hpu:
@@ -83,7 +83,7 @@ jobs:
     needs: setup-instance
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: 'false'
          token: ${{ env.CHECKOUT_TOKEN }}
```
.github/workflows/integer_long_run_tests.yml (vendored, 2 changes)

```diff
@@ -53,7 +53,7 @@ jobs:
     timeout-minutes: 4320 # 72 hours
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
```
.github/workflows/m1_tests.yml (vendored, 2 changes)

```diff
@@ -41,7 +41,7 @@ jobs:
     timeout-minutes: 720

     steps:
-      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
        with:
          persist-credentials: "false"
          token: ${{ env.CHECKOUT_TOKEN }}
```
8	.github/workflows/make_release_common.yml	vendored
@@ -52,7 +52,7 @@ jobs:
       hash: ${{ steps.hash.outputs.hash }}
     steps:
       - name: Checkout
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -62,7 +62,7 @@ jobs:
           PACKAGE: ${{ inputs.package-name }}
         run: |
           cargo package -p "${PACKAGE}"
-      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+      - uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
         with:
           name: crate-${{ inputs.package-name }}
           path: target/package/*.crate
@@ -93,14 +93,14 @@ jobs:
       id-token: write # Needed for OIDC token exchange on crates.io
     steps:
       - name: Checkout
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
         with:
           fetch-depth: 0
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}

       - name: Download artifact
-        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
         with:
           name: crate-${{ inputs.package-name }}
           path: target/package
6	.github/workflows/make_release_cuda.yml	vendored
@@ -64,7 +64,7 @@ jobs:
       CUDA_PATH: /usr/local/cuda-${{ matrix.cuda }}
     steps:
       - name: Checkout
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
         with:
           fetch-depth: 0
           persist-credentials: "false"
@@ -104,7 +104,7 @@ jobs:
         run: |
           cargo package -p tfhe-cuda-backend

-      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+      - uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
         with:
           name: crate-tfhe-cuda-backend
           path: target/package/*.crate
@@ -174,7 +174,7 @@ jobs:
           GCC_VERSION: ${{ matrix.gcc }}

       - name: Download artifact
-        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
         with:
           name: crate-tfhe-cuda-backend
           path: target/package
8	.github/workflows/make_release_tfhe.yml	vendored
@@ -41,6 +41,7 @@ jobs:
   make-release:
+    name: make_release_tfhe/make-release
     uses: ./.github/workflows/make_release_common.yml
     if: ${{ inputs.push_to_crates }}
     with:
       package-name: "tfhe"
       dry-run: ${{ inputs.dry_run }}
@@ -59,6 +60,7 @@ jobs:
   make-release-js:
+    name: make_release_tfhe/make-release-js
     needs: make-release
     if: ${{ always() && needs.make-release.result != 'failure' }}
     runs-on: ubuntu-latest
     # For provenance of npmjs publish
     permissions:
@@ -66,7 +68,7 @@ jobs:
       id-token: write # also needed for OIDC token exchange on crates.io and npmjs.com
     steps:
       - name: Checkout
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
         with:
           fetch-depth: 0
           persist-credentials: 'false'
@@ -83,9 +85,9 @@ jobs:
           make build_web_js_api_parallel

       - name: Authenticate on NPM
-        uses: actions/setup-node@2028fbc5c25fe9cf00d9f06a71cc4710d4507903 # v6.0.0
+        uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0
         with:
-          node-version: '22'
+          node-version: '24'
           registry-url: 'https://registry.npmjs.org'

       - name: Publish web package
4	.github/workflows/parameters_check.yml	vendored
@@ -60,7 +60,7 @@ jobs:
     runs-on: ${{ needs.setup-instance.outputs.runner-name }}
     steps:
       - name: Checkout tfhe-rs
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           persist-credentials: 'false'
           token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
@@ -71,7 +71,7 @@ jobs:
           toolchain: stable

       - name: Checkout lattice-estimator
-        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8
         with:
           repository: malb/lattice-estimator
           path: lattice_estimator
2	.github/workflows/unverified_prs.yml	vendored
@@ -17,7 +17,7 @@ jobs:
       issues: read # Needed to fetch all issues
       pull-requests: write # Needed to write message and close the PR
     steps:
-      - uses: actions/stale@5f858e3efba33a5ca4407a664cc011ad407f2008 # v10.1.0
+      - uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # v10.1.1
         with:
          stale-pr-message: 'This PR is unverified and has been open for 2 days, it will now be closed. If you want to contribute please sign the CLA as indicated by the bot.'
          days-before-stale: 2
1	.gitignore	vendored
@@ -10,6 +10,7 @@ target/
 **/*.rmeta
 **/Cargo.lock
 **/*.bin
+**/.DS_Store

 # Some of our bench outputs
 /tfhe/benchmarks_parameters
@@ -11,7 +11,7 @@
 /tfhe/src/core_crypto/gpu @agnesLeroy
 /tfhe/src/core_crypto/hpu @zama-ai/hardware

-/tfhe/src/shortint/ @mayeul-zama
+/tfhe/src/shortint/ @mayeul-zama @nsarlin-zama

 /tfhe/src/integer/ @tmontaigu
 /tfhe/src/integer/gpu @agnesLeroy
@@ -19,8 +19,12 @@

 /tfhe/src/high_level_api/ @tmontaigu

 /tfhe-zk-pok/ @nsarlin-zama

+/tfhe-benchmark/ @soonum
+
 /utils/ @nsarlin-zama

 /Makefile @IceTDrinker @soonum

+/mockups/tfhe-hpu-mockup @zama-ai/hardware
+
@@ -27,7 +27,7 @@ rust-version = "1.91.1"
 [workspace.dependencies]
 aligned-vec = { version = "0.6", default-features = false }
 bytemuck = "1.24"
-dyn-stack = { version = "0.11", default-features = false }
+dyn-stack = { version = "0.13", default-features = false }
 itertools = "0.14"
 num-complex = "0.4"
 pulp = { version = "0.22", default-features = false }
@@ -36,6 +36,8 @@ rayon = "1.11"
 serde = { version = "1.0", default-features = false }
 wasm-bindgen = "0.2.101"
 getrandom = "0.2.8"
+# The project maintainers consider that this is the last version of the 1.3 branch, any newer version should not be trusted
+bincode = "=1.3.3"

 [profile.bench]
 lto = "fat"
49	Makefile
@@ -20,7 +20,7 @@ BENCH_TYPE?=latency
 BENCH_PARAM_TYPE?=classical
 BENCH_PARAMS_SET?=default
 BENCH_CUSTOM_COMMAND:=
-NODE_VERSION=22.6
+NODE_VERSION=24.12
 BACKWARD_COMPAT_DATA_DIR=utils/tfhe-backward-compat-data
 BACKWARD_COMPAT_DATA_GEN_VERSION:=$(TFHE_VERSION)
 TEST_VECTORS_DIR=apps/test-vectors
@@ -996,6 +996,15 @@ test_noise_check:
 		--features=boolean,shortint,integer -p tfhe -- noise_check \
 		--test-threads=1 --nocapture

+.PHONY: test_noise_check_gpu # Run dedicated noise and pfail check tests on gpu backend
+test_noise_check_gpu:
+	@# First run the sanity checks to make sure the atomic patterns are correct
+	RUSTFLAGS="$(RUSTFLAGS)" cargo test --profile $(CARGO_PROFILE) \
+		--features=boolean,shortint,integer,gpu -p tfhe -- gpu_sanity_check
+	RUSTFLAGS="$(RUSTFLAGS)" cargo test --profile $(CARGO_PROFILE) \
+		--features=boolean,shortint,integer,gpu -p tfhe -- gpu_noise_check \
+		--test-threads=1 --nocapture
+
 .PHONY: test_safe_serialization # Run the tests for safe serialization
 test_safe_serialization: install_cargo_nextest
 	RUSTFLAGS="$(RUSTFLAGS)" cargo test --profile $(CARGO_PROFILE) \
@@ -1291,13 +1300,14 @@ run_web_js_api_parallel: build_web_js_api_parallel setup_venv
 		--browser-path $(browser_path) \
 		--driver-path $(driver_path) \
 		--browser-kind $(browser_kind) \
-		--server-cmd "npm run server" \
+		--server-cmd $(server_cmd) \
 		--server-workdir "$(WEB_SERVER_DIR)" \
 		--id-pattern $(filter)

 test_web_js_api_parallel_chrome: browser_path = "$(WEB_RUNNER_DIR)/chrome/chrome-linux64/chrome"
 test_web_js_api_parallel_chrome: driver_path = "$(WEB_RUNNER_DIR)/chrome/chromedriver-linux64/chromedriver"
 test_web_js_api_parallel_chrome: browser_kind = chrome
+test_web_js_api_parallel_chrome: server_cmd = "npm run server:multithreaded"
 test_web_js_api_parallel_chrome: filter = Test

 .PHONY: test_web_js_api_parallel_chrome # Run tests for the web wasm api on Chrome
@@ -1313,6 +1323,7 @@ test_web_js_api_parallel_chrome_ci: setup_venv
 test_web_js_api_parallel_firefox: browser_path = "$(WEB_RUNNER_DIR)/firefox/firefox/firefox"
 test_web_js_api_parallel_firefox: driver_path = "$(WEB_RUNNER_DIR)/firefox/geckodriver"
 test_web_js_api_parallel_firefox: browser_kind = firefox
+test_web_js_api_parallel_firefox: server_cmd = "npm run server:multithreaded"
 test_web_js_api_parallel_firefox: filter = Test

 .PHONY: test_web_js_api_parallel_firefox # Run tests for the web wasm api on Firefox
@@ -1562,6 +1573,7 @@ bench_pbs128_gpu: install_rs_check_toolchain
 bench_web_js_api_parallel_chrome: browser_path = "$(WEB_RUNNER_DIR)/chrome/chrome-linux64/chrome"
 bench_web_js_api_parallel_chrome: driver_path = "$(WEB_RUNNER_DIR)/chrome/chromedriver-linux64/chromedriver"
 bench_web_js_api_parallel_chrome: browser_kind = chrome
+bench_web_js_api_parallel_chrome: server_cmd = "npm run server:multithreaded"
 bench_web_js_api_parallel_chrome: filter = Bench

 .PHONY: bench_web_js_api_parallel_chrome # Run benchmarks for the web wasm api
@@ -1577,6 +1589,7 @@ bench_web_js_api_parallel_chrome_ci: setup_venv
 bench_web_js_api_parallel_firefox: browser_path = "$(WEB_RUNNER_DIR)/firefox/firefox/firefox"
 bench_web_js_api_parallel_firefox: driver_path = "$(WEB_RUNNER_DIR)/firefox/geckodriver"
 bench_web_js_api_parallel_firefox: browser_kind = firefox
+bench_web_js_api_parallel_firefox: server_cmd = "npm run server:multithreaded"
 bench_web_js_api_parallel_firefox: filter = Bench

 .PHONY: bench_web_js_api_parallel_firefox # Run benchmarks for the web wasm api
@@ -1589,6 +1602,38 @@ bench_web_js_api_parallel_firefox_ci: setup_venv
 	nvm use $(NODE_VERSION) && \
 	$(MAKE) bench_web_js_api_parallel_firefox

+bench_web_js_api_unsafe_coop_chrome: browser_path = "$(WEB_RUNNER_DIR)/chrome/chrome-linux64/chrome"
+bench_web_js_api_unsafe_coop_chrome: driver_path = "$(WEB_RUNNER_DIR)/chrome/chromedriver-linux64/chromedriver"
+bench_web_js_api_unsafe_coop_chrome: browser_kind = chrome
+bench_web_js_api_unsafe_coop_chrome: server_cmd = "npm run server:unsafe-coop"
+bench_web_js_api_unsafe_coop_chrome: filter = ZeroKnowledgeBench # Only bench zk with unsafe coop
+
+.PHONY: bench_web_js_api_unsafe_coop_chrome # Run benchmarks for the web wasm api without cross-origin isolation
+bench_web_js_api_unsafe_coop_chrome: run_web_js_api_parallel
+
+.PHONY: bench_web_js_api_unsafe_coop_chrome_ci # Run benchmarks for the web wasm api without cross-origin isolation
+bench_web_js_api_unsafe_coop_chrome_ci: setup_venv
+	source ~/.nvm/nvm.sh && \
+	nvm install $(NODE_VERSION) && \
+	nvm use $(NODE_VERSION) && \
+	$(MAKE) bench_web_js_api_unsafe_coop_chrome
+
+bench_web_js_api_unsafe_coop_firefox: browser_path = "$(WEB_RUNNER_DIR)/firefox/firefox/firefox"
+bench_web_js_api_unsafe_coop_firefox: driver_path = "$(WEB_RUNNER_DIR)/firefox/geckodriver"
+bench_web_js_api_unsafe_coop_firefox: browser_kind = firefox
+bench_web_js_api_unsafe_coop_firefox: server_cmd = "npm run server:unsafe-coop"
+bench_web_js_api_unsafe_coop_firefox: filter = ZeroKnowledgeBench # Only bench zk with unsafe coop
+
+.PHONY: bench_web_js_api_unsafe_coop_firefox # Run benchmarks for the web wasm api without cross-origin isolation
+bench_web_js_api_unsafe_coop_firefox: run_web_js_api_parallel
+
+.PHONY: bench_web_js_api_unsafe_coop_firefox_ci # Run benchmarks for the web wasm api without cross-origin isolation
+bench_web_js_api_unsafe_coop_firefox_ci: setup_venv
+	source ~/.nvm/nvm.sh && \
+	nvm install $(NODE_VERSION) && \
+	nvm use $(NODE_VERSION) && \
+	$(MAKE) bench_web_js_api_unsafe_coop_firefox
+
 .PHONY: bench_hlapi # Run benchmarks for integer operations
 bench_hlapi: install_rs_check_toolchain
 	RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_BIT_SIZES_SET=$(BIT_SIZES_SET) \
@@ -1,24 +1,32 @@
+08f31a47c29cc4d72ad32c0b5411fa20b3deef5b84558dd2fb892d3cdf90528a  data/toy_params/glwe_after_id_br_karatsuba.cbor
 29b6e3e7d27700004b70dca24d225816500490e2d6ee49b9af05837fd421896b  data/valid_params_128/lwe_after_spec_pbs.cbor
 2c70d1d78cc3760733850a353ace2b9c4705e840141b75841739e90e51247e18  data/valid_params_128/small_lwe_secret_key.cbor
+2fb4bb45c259b8383da10fc8f9459c40a6972c49b1696eb107f0a75640724be5  data/toy_params/lwe_after_id_pbs_karatsuba.cbor
 36c9080b636475fcacca503ce041bbfeee800fd3e1890dee559ea18defff9fe8  data/toy_params/glwe_after_id_br.cbor
 377761beeb4216cf5aa2624a8b64b8259f5a75c32d28e850be8bced3a0cdd6f5  data/toy_params/ksk.cbor
 59dba26d457f96478eda130cab5301fce86f23c6a8807de42f2a1e78c4985ca7  data/valid_params_128/lwe_ks.cbor
+5d80dd93fefae4f4f89484dfcd65bbe99cc32e7e3b0a90c33dd0d77516c0a023  data/valid_params_128/glwe_after_id_br_karatsuba.cbor
 656f0009c7834c5bcb61621e222047516054b9bc5d0593d474ab8f1c086b67a6  data/valid_params_128/lwe_after_id_pbs.cbor
 699580ca92b9c2f9e1f57fb1e312c9e8cb29714f7acdef9d2ba05f798546751f  data/toy_params/lwe_sum.cbor
 6e54ab41056984595b077baff70236d934308cf5c0c33b4482fbfb129b3756c6  data/valid_params_128/glwe_after_id_br.cbor
 70f5e5728822de05b49071efb5ec28551b0f5cc87aa709a455d8e7f04b9c96ee  data/toy_params/lwe_after_id_pbs.cbor
+76a5c52cab7fec1dc167da676c6cd39479cda6b2bb9f4e0573cb7d99c2692faa  data/valid_params_128/lwe_after_id_pbs_karatsuba.cbor
 7cc6803f5fbc3d5a1bf597f2b979ce17eecd3d6baca12183dea21022a7b65c52  data/toy_params/bsk.cbor
 7f3c40a134623b44779a556212477fea26eaed22450f3b6faeb8721d63699972  data/valid_params_128/lwe_sum.cbor
 837b3bd3245d4d0534ed255fdef896fb4fa6998a258a14543dfdadd0bfc9b6dd  data/toy_params/lwe_prod.cbor
+9ece8ca9c1436258b94e8c5e629b8722f9b18fdd415dd5209b6167a9dde8491c  data/toy_params/glwe_after_spec_br_karatsuba.cbor
 aa44aea29efd6d9e4d35a21a625d9cba155672e3f7ed3eddee1e211e62ad146b  data/valid_params_128/lwe_ms.cbor
 b7a037b9eaa88d6385167579b93e26a0cb6976d9b8967416fd1173e113bda199  data/valid_params_128/large_lwe_secret_key.cbor
+b7b8e3586128887bd682120f3e3a43156139bce5e3fe0b03284f8753a864d647  data/toy_params/lwe_after_spec_pbs_karatsuba.cbor
 bd00a8ae7494e400de5753029552ee1647efe7e17409b863a26a13b081099b8c  data/toy_params/lwe_after_spec_pbs.cbor
 c6df98676de04fe54b5ffc2eb30a82ebb706c9d7d5a4e0ed509700fec88761f7  data/toy_params/lwe_ms.cbor
 c7d5a864d5616a7d8ad50bbf40416e41e6c9b60c546dc14d4aa8fc40a418baa7  data/toy_params/large_lwe_secret_key.cbor
 c806533b325b1009db38be2f9bef5f3b2fad6b77b4c71f2855ccc9d3b4162e98  data/valid_params_128/lwe_b.cbor
 c9eb75bd2993639348a679cf48c06e3c38d1a513f48e5b0ce0047cea8cff6bbc  data/toy_params/lwe_a.cbor
+d3391969acf26dc69de0927ba279139d8d79999944069addc8ff469ad6c5ae2d  data/valid_params_128/lwe_after_spec_pbs_karatsuba.cbor
 d6da5baef0e787f6be56e218d8354e26904652602db964844156fdff08350ce6  data/toy_params/lwe_ks.cbor
 e591ab9af1b6a0aede273f9a3abb65a4c387feb5fa06a6959e9314058ca0f7e5  data/valid_params_128/ksk.cbor
+e59b002df3a9b01ad321ec51cf076fa35131ab9dbef141d1c54b717d61426c92  data/valid_params_128/glwe_after_spec_br_karatsuba.cbor
 e628354c81508a2d888016e8282df363dd12f1e19190b6475d4eb9d7ab8ae007  data/valid_params_128/glwe_after_spec_br.cbor
 e69d2d2c064fc8c0460b39191ca65338146990349954f5ec5ebd01d93610e7eb  data/valid_params_128/lwe_a.cbor
 e76c24b2a0c9a842ad13dda35473c2514f9e7d20983b5ea0759c4521a91626d9  data/valid_params_128/lwe_prod.cbor
@@ -39,6 +39,9 @@ The following values are generated:
 | `glwe_after_spec_br` | The GLWE returned by the application of the spec blind rotation on the mod-switched ciphertexts. | `GlweCiphertext<Vec<u64>>` | rot spec LUT |
 | `lwe_after_spec_pbs` | The LWE returned by the application of the sample extract operation on the output of the spec blind rotation. | `LweCiphertext<Vec<u64>>` | `spec(A)` |
+
+Ciphertexts with the `_karatsuba` suffix are generated using the Karatsuba polynomial multiplication algorithm in the blind rotation, while the default ciphertexts are generated using FFT-based multiplication.
+The Karatsuba path uses exact integer arithmetic (no floating-point rounding), which makes it easier to reproduce bit-exact results.
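For readers unfamiliar with the algorithm, here is a minimal, self-contained Rust sketch of plain Karatsuba multiplication for integer polynomials with coefficients mod 2^64 (wrapping arithmetic, as on the discretized torus). This is only an illustration of the algorithm named above: the generator code further down uses the library's negacyclic `blind_rotate_karatsuba_assign`, not this toy function, and all names here are made up.

```rust
/// Toy Karatsuba product of two equal-length (power-of-two) integer
/// polynomials, coefficients taken mod 2^64 via wrapping arithmetic.
fn karatsuba(a: &[u64], b: &[u64]) -> Vec<u64> {
    let n = a.len();
    assert_eq!(n, b.len());
    assert!(n.is_power_of_two());
    if n == 1 {
        return vec![a[0].wrapping_mul(b[0])];
    }
    let h = n / 2;
    let (a0, a1) = a.split_at(h);
    let (b0, b1) = b.split_at(h);
    let z0 = karatsuba(a0, b0); // low * low
    let z2 = karatsuba(a1, b1); // high * high
    // (a0 + a1) * (b0 + b1), from which the middle term is recovered
    let asum: Vec<u64> = a0.iter().zip(a1).map(|(&x, &y)| x.wrapping_add(y)).collect();
    let bsum: Vec<u64> = b0.iter().zip(b1).map(|(&x, &y)| x.wrapping_add(y)).collect();
    let z1 = karatsuba(&asum, &bsum);
    let mut out = vec![0u64; 2 * n - 1];
    for i in 0..z0.len() {
        out[i] = out[i].wrapping_add(z0[i]);
        // middle coefficients: z1 - z0 - z2, shifted by h
        out[i + h] = out[i + h]
            .wrapping_add(z1[i])
            .wrapping_sub(z0[i])
            .wrapping_sub(z2[i]);
        out[i + n] = out[i + n].wrapping_add(z2[i]);
    }
    out
}

fn main() {
    let (a, b) = ([1u64, 2, 3, 4], [5u64, 6, 7, 8]);
    // Cross-check against schoolbook multiplication.
    let mut expected = vec![0u64; 7];
    for i in 0..4 {
        for j in 0..4 {
            expected[i + j] = expected[i + j].wrapping_add(a[i].wrapping_mul(b[j]));
        }
    }
    assert_eq!(karatsuba(&a, &b), expected);
}
```

Because every operation here is exact integer arithmetic, two runs on different platforms produce identical coefficients, which is what makes the `_karatsuba` vectors suitable for bit-exact comparison; an FFT-based product goes through `f64` and may round differently.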
 ### Encodings

 #### Non native encoding
 Warning: TFHE-rs uses a specific encoding for power-of-two ciphertext moduli that are smaller than the native integer width (u32, u64). This encoding puts the encoded value in the high bits of the native integer.
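A minimal sketch of what that convention means in practice (illustrative helper names, not the TFHE-rs API): with a power-of-two ciphertext modulus 2^k stored in a u64, the encoded value occupies the k most significant bits.

```rust
/// Sketch (not the TFHE-rs API): place a value modulo 2^k in the top k bits
/// of a native u64, as the non-native encoding described above does.
fn encode_non_native(value: u64, log2_modulus: u32) -> u64 {
    assert!(log2_modulus > 0 && log2_modulus < 64);
    (value % (1u64 << log2_modulus)) << (64 - log2_modulus)
}

fn decode_non_native(encoded: u64, log2_modulus: u32) -> u64 {
    encoded >> (64 - log2_modulus)
}

fn main() {
    let k = 32; // ciphertext modulus 2^32, stored in a u64
    let v = 0xDEAD_BEEF_u64;
    let enc = encode_non_native(v, k);
    assert_eq!(enc, v << 32); // the value sits in the high bits
    assert_eq!(decode_non_native(enc, k), v);
}
```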
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:08f31a47c29cc4d72ad32c0b5411fa20b3deef5b84558dd2fb892d3cdf90528a
+size 4679
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9ece8ca9c1436258b94e8c5e629b8722f9b18fdd415dd5209b6167a9dde8491c
+size 4679
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2fb4bb45c259b8383da10fc8f9459c40a6972c49b1696eb107f0a75640724be5
+size 2365
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b7b8e3586128887bd682120f3e3a43156139bce5e3fe0b03284f8753a864d647
+size 2365
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5d80dd93fefae4f4f89484dfcd65bbe99cc32e7e3b0a90c33dd0d77516c0a023
+size 36935
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e59b002df3a9b01ad321ec51cf076fa35131ab9dbef141d1c54b717d61426c92
+size 36935
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:76a5c52cab7fec1dc167da676c6cd39479cda6b2bb9f4e0573cb7d99c2692faa
+size 18493
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d3391969acf26dc69de0927ba279139d8d79999944069addc8ff469ad6c5ae2d
+size 18493
@@ -265,6 +265,7 @@ fn generate_test_vectors<P: AsRef<Path>>(

     let mut id_lut = encoding.encode_lut(glwe_dimension, polynomial_size, ID_LUT);
     assert_data_not_zero(&id_lut);
+    let mut id_lut_karatsuba = id_lut.clone();

     blind_rotate_assign(&modswitched, &mut id_lut, &fourier_bsk);
     assert_data_not_zero(&id_lut);
@@ -287,8 +288,32 @@ fn generate_test_vectors<P: AsRef<Path>>(
     assert_data_not_zero(&lwe_pbs_id);
     store_data(path, &lwe_pbs_id, "lwe_after_id_pbs");

+    blind_rotate_karatsuba_assign(&modswitched, &mut id_lut_karatsuba, &bsk);
+    store_data(path, &id_lut_karatsuba, "glwe_after_id_br_karatsuba");
+
+    let mut lwe_pbs_karatsuba_id = LweCiphertext::new(
+        0u64,
+        glwe_dimension
+            .to_equivalent_lwe_dimension(polynomial_size)
+            .to_lwe_size(),
+        encoding.ciphertext_modulus,
+    );
+
+    extract_lwe_sample_from_glwe_ciphertext(
+        &id_lut_karatsuba,
+        &mut lwe_pbs_karatsuba_id,
+        MonomialDegree(0),
+    );
+
+    let decrypted_pbs_id = decrypt_lwe_ciphertext(&large_lwe_secret_key, &lwe_pbs_karatsuba_id);
+    let res = encoding.decode(decrypted_pbs_id);
+
+    assert_eq!(res, MSG_A);
+    store_data(path, &lwe_pbs_karatsuba_id, "lwe_after_id_pbs_karatsuba");
+
     let mut spec_lut = encoding.encode_lut(glwe_dimension, polynomial_size, SPEC_LUT);
     assert_data_not_zero(&spec_lut);
+    let mut spec_lut_karatsuba = spec_lut.clone();

     blind_rotate_assign(&modswitched, &mut spec_lut, &fourier_bsk);
     assert_data_not_zero(&spec_lut);
@@ -310,6 +335,33 @@ fn generate_test_vectors<P: AsRef<Path>>(
     assert_eq!(res, SPEC_LUT(MSG_A));
     assert_data_not_zero(&lwe_pbs_spec);
     store_data(path, &lwe_pbs_spec, "lwe_after_spec_pbs");

+    blind_rotate_karatsuba_assign(&modswitched, &mut spec_lut_karatsuba, &bsk);
+    store_data(path, &spec_lut_karatsuba, "glwe_after_spec_br_karatsuba");
+
+    let mut lwe_pbs_karatsuba_spec = LweCiphertext::new(
+        0u64,
+        glwe_dimension
+            .to_equivalent_lwe_dimension(polynomial_size)
+            .to_lwe_size(),
+        encoding.ciphertext_modulus,
+    );
+
+    extract_lwe_sample_from_glwe_ciphertext(
+        &spec_lut_karatsuba,
+        &mut lwe_pbs_karatsuba_spec,
+        MonomialDegree(0),
+    );
+
+    let decrypted_pbs_spec = decrypt_lwe_ciphertext(&large_lwe_secret_key, &lwe_pbs_karatsuba_spec);
+    let res = encoding.decode(decrypted_pbs_spec);
+
+    assert_eq!(res, SPEC_LUT(MSG_A));
+    store_data(
+        path,
+        &lwe_pbs_karatsuba_spec,
+        "lwe_after_spec_pbs_karatsuba",
+    );
 }

 fn rm_dir_except_readme<P: AsRef<Path>>(dir: P) {
@@ -35,7 +35,8 @@ template <typename Torus> struct int_aes_lut_buffers {
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, and_lambda, allocate_gpu_memory);
-    auto active_streams_and_lut = streams.active_gpu_subset(
-        SBOX_MAX_AND_GATES * num_aes_inputs * sbox_parallelism);
+    auto active_streams_and_lut = streams.active_gpu_subset(
+        SBOX_MAX_AND_GATES * num_aes_inputs * sbox_parallelism,
+        params.pbs_type);
     this->and_lut->broadcast_lut(active_streams_and_lut);
     this->and_lut->setup_gemm_batch_ks_temp_buffers(size_tracker);

@@ -50,8 +51,8 @@ template <typename Torus> struct int_aes_lut_buffers {
         this->flush_lut->get_degree(0), this->flush_lut->get_max_degree(0),
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, flush_lambda, allocate_gpu_memory);
-    auto active_streams_flush_lut =
-        streams.active_gpu_subset(AES_STATE_BITS * num_aes_inputs);
+    auto active_streams_flush_lut = streams.active_gpu_subset(
+        AES_STATE_BITS * num_aes_inputs, params.pbs_type);
     this->flush_lut->broadcast_lut(active_streams_flush_lut);
     this->flush_lut->setup_gemm_batch_ks_temp_buffers(size_tracker);

@@ -65,7 +66,8 @@ template <typename Torus> struct int_aes_lut_buffers {
         this->carry_lut->get_degree(0), this->carry_lut->get_max_degree(0),
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, carry_lambda, allocate_gpu_memory);
-    auto active_streams_carry_lut = streams.active_gpu_subset(num_aes_inputs);
+    auto active_streams_carry_lut =
+        streams.active_gpu_subset(num_aes_inputs, params.pbs_type);
     this->carry_lut->broadcast_lut(active_streams_carry_lut);
     this->carry_lut->setup_gemm_batch_ks_temp_buffers(size_tracker);
   }
@@ -35,17 +35,9 @@ void cuda_centered_modulus_switch_64(void *stream, uint32_t gpu_index,
                                      uint32_t lwe_dimension,
                                      uint32_t log_modulus);

-void cuda_improve_noise_modulus_switch_64(
-    void *stream, uint32_t gpu_index, void *lwe_array_out,
-    void const *lwe_array_in, void const *lwe_array_indexes,
-    void const *encrypted_zeros, uint32_t lwe_size, uint32_t num_lwes,
-    uint32_t num_zeros, double input_variance, double r_sigma, double bound,
-    uint32_t log_modulus);
-
 void cuda_glwe_sample_extract_128(
     void *stream, uint32_t gpu_index, void *lwe_array_out,
     void const *glwe_array_in, uint32_t const *nth_array, uint32_t num_nths,
     uint32_t lwe_per_glwe, uint32_t glwe_dimension, uint32_t polynomial_size);
 }

 #endif
@@ -8,7 +8,8 @@

 extern std::mutex m;
 extern bool p2p_enabled;
-extern const int THRESHOLD_MULTI_GPU;
+extern const int THRESHOLD_MULTI_GPU_WITH_MULTI_BIT_PARAMS;
+extern const int THRESHOLD_MULTI_GPU_WITH_CLASSICAL_PARAMS;

 extern "C" {
 int32_t cuda_setup_multi_gpu(int device_0_id);
@@ -39,7 +40,8 @@ get_variant_element(const std::variant<std::vector<Torus>, Torus> &variant,
   }
 }

-uint32_t get_active_gpu_count(uint32_t num_inputs, uint32_t gpu_count);
+uint32_t get_active_gpu_count(uint32_t num_inputs, uint32_t gpu_count,
+                              PBS_TYPE pbs_type);

 int get_num_inputs_on_gpu(int total_num_inputs, int gpu_index, int gpu_count);

@@ -73,9 +75,10 @@ public:

   // Returns a subset of this set as an active subset. An active subset is one
   // that is temporarily used to perform some computation
-  CudaStreams active_gpu_subset(int num_radix_blocks) {
-    return CudaStreams(_streams, _gpu_indexes,
-                       get_active_gpu_count(num_radix_blocks, _gpu_count));
+  CudaStreams active_gpu_subset(int num_radix_blocks, PBS_TYPE pbs_type) {
+    return CudaStreams(
+        _streams, _gpu_indexes,
+        get_active_gpu_count(num_radix_blocks, _gpu_count, pbs_type));
   }

   // Returns a CudaStreams struct containing only the ith stream
@@ -20,7 +20,8 @@ template <typename Torus> struct boolean_bitop_buffer {
     gpu_memory_allocated = allocate_gpu_memory;
     this->op = op;
     this->params = params;
-    auto active_streams = streams.active_gpu_subset(lwe_ciphertext_count);
+    auto active_streams =
+        streams.active_gpu_subset(lwe_ciphertext_count, params.pbs_type);
     this->unchecked = is_unchecked;
     switch (op) {
     case BITAND:
@@ -119,7 +120,8 @@ template <typename Torus> struct int_bitop_buffer {
     gpu_memory_allocated = allocate_gpu_memory;
     this->op = op;
     this->params = params;
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     switch (op) {
     case BITAND:
     case BITOR:
@@ -216,7 +218,8 @@ template <typename Torus> struct boolean_bitnot_buffer {
         message_extract_lut->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         lut_f_message_extract, gpu_memory_allocated);
-    auto active_streams = streams.active_gpu_subset(lwe_ciphertext_count);
+    auto active_streams =
+        streams.active_gpu_subset(lwe_ciphertext_count, params.pbs_type);
     message_extract_lut->broadcast_lut(active_streams);
   }
 }
@@ -39,7 +39,8 @@ template <typename Torus> struct int_extend_radix_with_sign_msb_buffer {
         },
         allocate_gpu_memory);

-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     lut->broadcast_lut(active_streams);

     this->last_block = new CudaRadixCiphertextFFI;

@@ -14,7 +14,8 @@ template <typename Torus> struct int_zero_out_if_buffer {
                        uint64_t &size_tracker) {
     gpu_memory_allocated = allocate_gpu_memory;
     this->params = params;
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);

     tmp = new CudaRadixCiphertextFFI;
     create_zero_radix_ciphertext_async<Torus>(
@@ -114,9 +115,11 @@ template <typename Torus> struct int_cmux_buffer {
         predicate_lut->get_lut_indexes(0, 0), h_lut_indexes,
         2 * num_radix_blocks * sizeof(Torus), streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);
-    auto active_streams_pred = streams.active_gpu_subset(2 * num_radix_blocks);
+    auto active_streams_pred =
+        streams.active_gpu_subset(2 * num_radix_blocks, params.pbs_type);
     predicate_lut->broadcast_lut(active_streams_pred);
-    auto active_streams_msg = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams_msg =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     message_extract_lut->broadcast_lut(active_streams_msg);
   }
@@ -52,7 +52,8 @@ template <typename Torus> struct int_are_all_block_true_buffer {
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, is_max_value_f, gpu_memory_allocated);

-    auto active_streams = streams.active_gpu_subset(max_chunks);
+    auto active_streams =
+        streams.active_gpu_subset(max_chunks, params.pbs_type);
     is_max_value->broadcast_lut(active_streams);
   }
@@ -108,7 +109,8 @@ template <typename Torus> struct int_comparison_eq_buffer {
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, is_non_zero_lut_f, gpu_memory_allocated);

-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     is_non_zero_lut->broadcast_lut(active_streams);

     // Scalar may have up to num_radix_blocks blocks
@@ -238,7 +240,8 @@ template <typename Torus> struct int_tree_sign_reduction_buffer {
         tree_inner_leaf_lut->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         block_selector_f, gpu_memory_allocated);
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     tree_inner_leaf_lut->broadcast_lut(active_streams);
   }
@@ -390,7 +393,8 @@ template <typename Torus> struct int_comparison_buffer {
     this->op = op;
     this->is_signed = is_signed;

-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);

     identity_lut_f = [](Torus x) -> Torus { return x; };

@@ -523,7 +527,7 @@ template <typename Torus> struct int_comparison_buffer {
         signed_lut->get_degree(0), signed_lut->get_max_degree(0),
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, signed_lut_f, gpu_memory_allocated);
-      auto active_streams = streams.active_gpu_subset(1);
+      auto active_streams = streams.active_gpu_subset(1, params.pbs_type);
       signed_lut->broadcast_lut(active_streams);
     }
     preallocated_h_lut = (Torus *)malloc(
@@ -116,7 +116,8 @@ template <typename Torus> struct int_decompression {
         effective_compression_carry_modulus,
         encryption_params.message_modulus, encryption_params.carry_modulus,
         decompression_rescale_f, gpu_memory_allocated);
-    auto active_streams = streams.active_gpu_subset(num_blocks_to_decompress);
+    auto active_streams = streams.active_gpu_subset(
+        num_blocks_to_decompress, decompression_rescale_lut->params.pbs_type);
     decompression_rescale_lut->broadcast_lut(active_streams);
   }
 }

@@ -356,7 +356,8 @@ template <typename Torus> struct unsigned_int_div_rem_2_2_memory {
         luts[j]->get_degree(0), luts[j]->get_max_degree(0),
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, lut_f_message_extract, gpu_memory_allocated);
-      auto active_streams = streams.active_gpu_subset(num_blocks);
+      auto active_streams =
+          streams.active_gpu_subset(num_blocks, params.pbs_type);
       luts[j]->broadcast_lut(active_streams);
     }
   }
@@ -1012,7 +1013,7 @@ template <typename Torus> struct unsigned_int_div_rem_memory {
         masking_luts_1[i]->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         lut_f_masking, gpu_memory_allocated);
-      auto active_streams_1 = streams.active_gpu_subset(1);
+      auto active_streams_1 = streams.active_gpu_subset(1, params.pbs_type);
       masking_luts_1[i]->broadcast_lut(active_streams_1);

       generate_device_accumulator<Torus>(
@@ -1021,7 +1022,8 @@ template <typename Torus> struct unsigned_int_div_rem_memory {
         masking_luts_2[i]->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         lut_f_masking, gpu_memory_allocated);
-      auto active_streams_2 = streams.active_gpu_subset(num_blocks);
+      auto active_streams_2 =
+          streams.active_gpu_subset(num_blocks, params.pbs_type);
       masking_luts_2[i]->broadcast_lut(active_streams_2);
     }

@@ -1040,7 +1042,8 @@ template <typename Torus> struct unsigned_int_div_rem_memory {

     int_radix_lut<Torus> *luts[2] = {message_extract_lut_1,
                                      message_extract_lut_2};
-    auto active_streams = streams.active_gpu_subset(num_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
     for (int j = 0; j < 2; j++) {
       generate_device_accumulator<Torus>(
           streams.stream(0), streams.gpu_index(0), luts[j]->get_lut(0, 0),
@@ -1128,7 +1131,8 @@ template <typename Torus> struct unsigned_int_div_rem_memory {

     // merge_overflow_flags_luts
     merge_overflow_flags_luts = new int_radix_lut<Torus> *[num_bits_in_message];
-    auto active_gpu_count_for_bits = streams.active_gpu_subset(1);
+    auto active_gpu_count_for_bits =
+        streams.active_gpu_subset(1, params.pbs_type);
     for (int i = 0; i < num_bits_in_message; i++) {
       auto lut_f_bit = [i](Torus x, Torus y) -> Torus {
         return (x == 0 && y == 0) << i;
@@ -1152,7 +1156,8 @@ template <typename Torus> struct unsigned_int_div_rem_memory {
                                uint32_t num_blocks, bool allocate_gpu_memory,
                                uint64_t &size_tracker) {
     gpu_memory_allocated = allocate_gpu_memory;
-    auto active_streams = streams.active_gpu_subset(2 * num_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(2 * num_blocks, params.pbs_type);
     this->params = params;

     if (params.message_modulus == 4 && params.carry_modulus == 4 &&
@@ -1473,7 +1478,8 @@ template <typename Torus> struct int_div_rem_memory {
                      bool allocate_gpu_memory, uint64_t &size_tracker) {

     gpu_memory_allocated = allocate_gpu_memory;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
     this->params = params;
     this->is_signed = is_signed;

@@ -1559,7 +1565,7 @@ template <typename Torus> struct int_div_rem_memory {
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         f_compare_extracted_signed_bits, gpu_memory_allocated);
       auto active_gpu_count_cmp =
-          streams.active_gpu_subset(1); // only 1 block needed
+          streams.active_gpu_subset(1, params.pbs_type); // only 1 block needed
       compare_signed_bits_lut->broadcast_lut(active_gpu_count_cmp);
     }
   }
@@ -20,7 +20,8 @@ template <typename Torus> struct int_prepare_count_of_consecutive_bits_buffer {
     this->allocate_gpu_memory = allocate_gpu_memory;
     this->direction = direction;
     this->bit_value = bit_value;
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     this->univ_lut_mem =
         new int_radix_lut<Torus>(streams, params, 1, num_radix_blocks,
                                  allocate_gpu_memory, size_tracker);
@@ -246,7 +247,8 @@ template <typename Torus> struct int_ilog2_buffer {
         params.glwe_dimension, params.polynomial_size,
         params.message_modulus, params.carry_modulus,
         lut_message_lambda, allocate_gpu_memory);
-    auto active_streams = streams.active_gpu_subset(counter_num_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(counter_num_blocks, params.pbs_type);
     lut_message_not->broadcast_lut(active_streams);

     this->lut_carry_not =

@@ -883,6 +883,10 @@ void cuda_unchecked_first_index_in_clears_64(
     uint32_t num_unique, uint32_t num_blocks, uint32_t num_blocks_index,
     int8_t *mem, void *const *bsks, void *const *ksks);

+void cuda_small_scalar_multiplication_integer_64_inplace(
+    CudaStreamsFFI streams, CudaRadixCiphertextFFI *lwe_array, uint64_t scalar,
+    const uint32_t message_modulus, const uint32_t carry_modulus);
+
 void cleanup_cuda_unchecked_first_index_in_clears_64(CudaStreamsFFI streams,
                                                      int8_t **mem_ptr_void);
@@ -43,6 +43,8 @@ public:
                  "parameters");                                               \
     } else if ((msg_mod) == 0 && (carry_mod) == 0) {                          \
       break;                                                                  \
+    } else if ((msg_mod) == 4 && (carry_mod) == 32) {                         \
+      break;                                                                  \
     } else {                                                                  \
       PANIC("Invalid message modulus or carry modulus")                      \
     }                                                                         \
@@ -369,7 +371,8 @@ struct int_radix_lut_custom_input_output {
     this->num_input_blocks = num_input_blocks;
     this->gpu_memory_allocated = allocate_gpu_memory;

-    this->active_streams = streams.active_gpu_subset(num_radix_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
   }

   void setup_degrees() {
@@ -380,14 +383,18 @@ struct int_radix_lut_custom_input_output {

   void allocate_pbs_buffers(int_radix_params params, uint32_t num_radix_blocks,
                             bool allocate_gpu_memory, uint64_t &size_tracker) {
+    int threshold = (params.pbs_type == PBS_TYPE::MULTI_BIT)
+                        ? THRESHOLD_MULTI_GPU_WITH_MULTI_BIT_PARAMS
+                        : THRESHOLD_MULTI_GPU_WITH_CLASSICAL_PARAMS;

     for (uint i = 0; i < active_streams.count(); i++) {
       cuda_set_device(active_streams.gpu_index(i));
       int8_t *gpu_pbs_buffer;
-      auto num_blocks_on_gpu =
-          std::min((int)num_radix_blocks,
-                   std::max(THRESHOLD_MULTI_GPU,
-                            get_num_inputs_on_gpu(num_radix_blocks, i,
-                                                  active_streams.count())));
+      auto num_blocks_on_gpu = std::min(
+          (int)num_radix_blocks,
+          std::max(threshold, get_num_inputs_on_gpu(num_radix_blocks, i,
+                                                    active_streams.count())));

       uint64_t size = 0;
       execute_scratch_pbs<OutputTorus>(
@@ -422,18 +429,22 @@ struct int_radix_lut_custom_input_output {
     /// back to the original indexing
     multi_gpu_alloc_lwe_async(active_streams, lwe_array_in_vec,
                               num_radix_blocks, params.big_lwe_dimension + 1,
-                              size_tracker, allocate_gpu_memory);
+                              size_tracker, params.pbs_type,
+                              allocate_gpu_memory);
     multi_gpu_alloc_lwe_async(active_streams, lwe_after_ks_vec,
                               num_radix_blocks, params.small_lwe_dimension + 1,
-                              size_tracker, allocate_gpu_memory);
+                              size_tracker, params.pbs_type,
+                              allocate_gpu_memory);
     if (num_many_lut > 1) {
       multi_gpu_alloc_lwe_many_lut_output_async(
           active_streams, lwe_after_pbs_vec, num_radix_blocks, num_many_lut,
-          params.big_lwe_dimension + 1, size_tracker, allocate_gpu_memory);
+          params.big_lwe_dimension + 1, size_tracker, params.pbs_type,
+          allocate_gpu_memory);
     } else {
       multi_gpu_alloc_lwe_async(active_streams, lwe_after_pbs_vec,
                                 num_radix_blocks, params.big_lwe_dimension + 1,
-                                size_tracker, allocate_gpu_memory);
+                                size_tracker, params.pbs_type,
+                                allocate_gpu_memory);
     }
     multi_gpu_alloc_array_async(active_streams, lwe_trivial_indexes_vec,
                                 num_radix_blocks, size_tracker,
@@ -449,12 +460,14 @@ struct int_radix_lut_custom_input_output {
   }

   void setup_gemm_batch_ks_temp_buffers(uint64_t &size_tracker) {
+    int threshold = (params.pbs_type == PBS_TYPE::MULTI_BIT)
+                        ? THRESHOLD_MULTI_GPU_WITH_MULTI_BIT_PARAMS
+                        : THRESHOLD_MULTI_GPU_WITH_CLASSICAL_PARAMS;

-    auto inputs_on_gpu =
-        std::min((int)num_input_blocks,
-                 std::max(THRESHOLD_MULTI_GPU,
-                          get_num_inputs_on_gpu(num_input_blocks, 0,
-                                                active_streams.count())));
+    auto inputs_on_gpu = std::min(
+        (int)num_input_blocks,
+        std::max(threshold, get_num_inputs_on_gpu(num_input_blocks, 0,
+                                                  active_streams.count())));

     if (inputs_on_gpu >= get_threshold_ks_gemm()) {
       for (auto i = 0; i < active_streams.count(); ++i) {
@@ -796,16 +809,20 @@ struct int_radix_lut_custom_input_output {
   void allocate_lwe_vector_for_non_trivial_indexes(
       CudaStreams streams, uint64_t max_num_radix_blocks,
       uint64_t &size_tracker, bool allocate_gpu_memory) {
+    int threshold = (params.pbs_type == PBS_TYPE::MULTI_BIT)
+                        ? THRESHOLD_MULTI_GPU_WITH_MULTI_BIT_PARAMS
+                        : THRESHOLD_MULTI_GPU_WITH_CLASSICAL_PARAMS;

     // We need to create the auxiliary array only in GPU 0
     if (active_streams.count() > 1) {
       lwe_aligned_vec.resize(active_streams.count());
       for (uint i = 0; i < active_streams.count(); i++) {
         uint64_t size_tracker_on_array_i = 0;
-        auto inputs_on_gpu =
-            std::min((int)max_num_radix_blocks,
-                     std::max(THRESHOLD_MULTI_GPU,
-                              get_num_inputs_on_gpu(max_num_radix_blocks, i,
-                                                    active_streams.count())));
+        auto inputs_on_gpu = std::min(
+            (int)max_num_radix_blocks,
+            std::max(threshold, get_num_inputs_on_gpu(max_num_radix_blocks, i,
+                                                      active_streams.count())));
         InputTorus *d_array =
             (InputTorus *)cuda_malloc_with_size_tracking_async(
                 inputs_on_gpu * (params.big_lwe_dimension + 1) *
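The three hunks above replace the single THRESHOLD_MULTI_GPU constant with a threshold chosen per PBS type. A minimal Rust sketch of the resulting per-GPU block-count selection, with placeholder constants and a stand-in for get_num_inputs_on_gpu (the real values and split logic live in the CUDA backend):

```rust
// Placeholder thresholds; the real THRESHOLD_MULTI_GPU_WITH_* constants are
// defined in the CUDA backend and may differ.
const THRESHOLD_MULTI_GPU_WITH_MULTI_BIT_PARAMS: usize = 64;
const THRESHOLD_MULTI_GPU_WITH_CLASSICAL_PARAMS: usize = 128;

#[derive(Clone, Copy)]
enum PbsType {
    Classical,
    MultiBit,
}

/// Blocks a single GPU sizes its buffers for: an even share of the inputs,
/// clamped below by the PBS-type-dependent threshold and above by the total.
fn num_blocks_on_gpu(
    num_radix_blocks: usize,
    gpu_index: usize,
    gpu_count: usize,
    pbs_type: PbsType,
) -> usize {
    let threshold = match pbs_type {
        PbsType::MultiBit => THRESHOLD_MULTI_GPU_WITH_MULTI_BIT_PARAMS,
        PbsType::Classical => THRESHOLD_MULTI_GPU_WITH_CLASSICAL_PARAMS,
    };
    // Stand-in for get_num_inputs_on_gpu: even split, remainder on the
    // first GPUs (assumed semantics, for illustration only).
    let per_gpu =
        num_radix_blocks / gpu_count + usize::from(gpu_index < num_radix_blocks % gpu_count);
    num_radix_blocks.min(threshold.max(per_gpu))
}

fn main() {
    // 40 blocks on 4 GPUs: the even share (10) is under the threshold, so
    // buffers are still sized for all 40 blocks on one GPU.
    assert_eq!(num_blocks_on_gpu(40, 0, 4, PbsType::Classical), 40);
    // 1000 blocks on 4 GPUs: the even share (250) exceeds the threshold.
    assert_eq!(num_blocks_on_gpu(1000, 0, 4, PbsType::Classical), 250);
}
```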
@@ -996,8 +1013,8 @@ template <typename Torus> struct int_bit_extract_luts_buffer {
         num_radix_blocks * bits_per_block * sizeof(Torus), streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);

-    auto active_streams =
-        streams.active_gpu_subset(bits_per_block * num_radix_blocks);
+    auto active_streams = streams.active_gpu_subset(
+        bits_per_block * num_radix_blocks, params.pbs_type);
     lut->broadcast_lut(active_streams);

     /**
@@ -1264,7 +1281,8 @@ template <typename Torus> struct int_sum_ciphertexts_vec_memory {
         luts_message_carry->get_max_degree(1), params.glwe_dimension,
         params.polynomial_size, message_modulus, params.carry_modulus,
         lut_f_carry, gpu_memory_allocated);
-      auto active_gpu_count_mc = streams.active_gpu_subset(pbs_count);
+      auto active_gpu_count_mc =
+          streams.active_gpu_subset(pbs_count, params.pbs_type);
       luts_message_carry->broadcast_lut(active_gpu_count_mc);
     }
   }
@@ -1434,7 +1452,8 @@ template <typename Torus> struct int_seq_group_prop_memory {
     cuda_memcpy_with_size_tracking_async_to_gpu(
         seq_lut_indexes, h_seq_lut_indexes, num_seq_luts * sizeof(Torus),
         streams.stream(0), streams.gpu_index(0), allocate_gpu_memory);
-    auto active_streams = streams.active_gpu_subset(num_seq_luts);
+    auto active_streams =
+        streams.active_gpu_subset(num_seq_luts, params.pbs_type);
     lut_sequential_algorithm->broadcast_lut(active_streams);
     free(h_seq_lut_indexes);
   };
@@ -1488,7 +1507,8 @@ template <typename Torus> struct int_hs_group_prop_memory {
         lut_hillis_steele->get_max_degree(0), glwe_dimension, polynomial_size,
         message_modulus, carry_modulus, f_lut_hillis_steele,
         gpu_memory_allocated);
-    auto active_streams = streams.active_gpu_subset(num_groups);
+    auto active_streams =
+        streams.active_gpu_subset(num_groups, params.pbs_type);
     lut_hillis_steele->broadcast_lut(active_streams);
   };
   void release(CudaStreams streams) {
@@ -1665,7 +1685,8 @@ template <typename Torus> struct int_shifted_blocks_and_states_memory {
         lut_indexes, h_lut_indexes, lut_indexes_size, streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);
     // Do I need to do something else for the multi-gpu?
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     luts_array_first_step->broadcast_lut(active_streams);
   };
   void release(CudaStreams streams) {
@@ -1930,7 +1951,8 @@ template <typename Torus> struct int_prop_simu_group_carries_memory {
         scalar_array_cum_sum, h_scalar_array_cum_sum,
         num_radix_blocks * sizeof(Torus), streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     luts_array_second_step->broadcast_lut(active_streams);

     if (use_sequential_algorithm_to_resolve_group_carries) {
@@ -1955,7 +1977,8 @@ template <typename Torus> struct int_prop_simu_group_carries_memory {
     cuda_memcpy_with_size_tracking_async_gpu_to_gpu(
         lut_indexes, new_lut_indexes, new_num_blocks * sizeof(Torus),
         streams.stream(0), streams.gpu_index(0), gpu_memory_allocated);
-    auto new_active_streams = streams.active_gpu_subset(new_num_blocks);
+    auto new_active_streams = streams.active_gpu_subset(
+        new_num_blocks, luts_array_second_step->params.pbs_type);
     // We just need to update the lut indexes so we use false here
     luts_array_second_step->broadcast_lut(new_active_streams, false);
@@ -2122,7 +2145,7 @@ template <typename Torus> struct int_sc_prop_memory {
         polynomial_size, message_modulus, carry_modulus, f_overflow_fp,
         gpu_memory_allocated);

-    auto active_streams = streams.active_gpu_subset(1);
+    auto active_streams = streams.active_gpu_subset(1, params.pbs_type);
     lut_overflow_flag_prep->broadcast_lut(active_streams);
   }

@@ -2194,7 +2217,8 @@ template <typename Torus> struct int_sc_prop_memory {
         (num_radix_blocks + 1) * sizeof(Torus), streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);
     }
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks + 1);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks + 1, params.pbs_type);
     lut_message_extract->broadcast_lut(active_streams);
   };

@@ -2391,7 +2415,8 @@ template <typename Torus> struct int_shifted_blocks_and_borrow_states_memory {
         lut_indexes, h_lut_indexes, lut_indexes_size, streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);
     // Do I need to do something else for the multi-gpu?
-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     luts_array_first_step->broadcast_lut(active_streams);
   };

@@ -2402,7 +2427,8 @@ template <typename Torus> struct int_shifted_blocks_and_borrow_states_memory {
     cuda_memcpy_with_size_tracking_async_gpu_to_gpu(
         lut_indexes, new_lut_indexes, new_num_blocks * sizeof(Torus),
         streams.stream(0), streams.gpu_index(0), gpu_memory_allocated);
-    auto new_active_streams = streams.active_gpu_subset(new_num_blocks);
+    auto new_active_streams = streams.active_gpu_subset(
+        new_num_blocks, luts_array_first_step->params.pbs_type);
     // We just need to update the lut indexes so we use false here
     luts_array_first_step->broadcast_lut(new_active_streams, false);
   }
@@ -2497,7 +2523,8 @@ template <typename Torus> struct int_borrow_prop_memory {
         lut_message_extract->get_max_degree(0), glwe_dimension, polynomial_size,
         message_modulus, carry_modulus, f_message_extract,
         gpu_memory_allocated);
-    active_streams = streams.active_gpu_subset(num_radix_blocks);
+    active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);

     lut_message_extract->broadcast_lut(active_streams);

@@ -2518,7 +2545,8 @@ template <typename Torus> struct int_borrow_prop_memory {
       lut_borrow_flag->broadcast_lut(active_streams);
     }

-    active_streams = streams.active_gpu_subset(num_radix_blocks);
+    active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     internal_streams.create_internal_cuda_streams_on_same_gpus(active_streams,
                                                                2);
   };
@@ -45,7 +45,8 @@ template <typename Torus> struct int_mul_memory {
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         zero_out_predicate_lut_f, gpu_memory_allocated);

-    auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     zero_out_predicate_lut->broadcast_lut(active_streams);

     zero_out_mem = new int_zero_out_if_buffer<Torus>(
@@ -122,7 +123,8 @@ template <typename Torus> struct int_mul_memory {
         streams.stream(0), streams.gpu_index(0),
         luts_array->get_lut_indexes(0, lsb_vector_block_count), 1,
         msb_vector_block_count);
-    auto active_streams = streams.active_gpu_subset(total_block_count);
+    auto active_streams =
+        streams.active_gpu_subset(total_block_count, params.pbs_type);
     luts_array->broadcast_lut(active_streams);
     // create memory object for sum ciphertexts
     sum_ciphertexts_mem = new int_sum_ciphertexts_vec_memory<Torus>(

@@ -126,9 +126,11 @@ template <typename Torus> struct int_grouped_oprf_memory {
         luts->get_lut_indexes(0, 0), this->h_lut_indexes,
         num_blocks_to_process * sizeof(Torus), streams.stream(0),
         streams.gpu_index(0), allocate_gpu_memory);
-    auto active_streams = streams.active_gpu_subset(num_blocks_to_process);
+    auto active_streams =
+        streams.active_gpu_subset(num_blocks_to_process, params.pbs_type);
     luts->broadcast_lut(active_streams);

     cuda_synchronize_stream(streams.stream(0), streams.gpu_index(0));
     free(h_corrections);
   }
@@ -6,6 +6,8 @@ void release_radix_ciphertext_async(cudaStream_t const stream,
                                     CudaRadixCiphertextFFI *data,
                                     const bool gpu_memory_allocated);

+void release_cpu_radix_ciphertext_async(CudaRadixCiphertextFFI *data);
+
 void reset_radix_ciphertext_blocks(CudaRadixCiphertextFFI *data,
                                    uint32_t new_num_blocks);
@@ -91,7 +91,8 @@ template <typename Torus> struct int_logical_scalar_shift_buffer {
|
||||
cur_lut_bivariate->get_max_degree(0), params.glwe_dimension,
|
||||
params.polynomial_size, params.message_modulus, params.carry_modulus,
|
||||
shift_lut_f, gpu_memory_allocated);
|
||||
auto active_streams = streams.active_gpu_subset(num_radix_blocks);
|
||||
auto active_streams =
|
||||
streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
|
||||
cur_lut_bivariate->broadcast_lut(active_streams);
|
||||
|
||||
lut_buffers_bivariate.push_back(cur_lut_bivariate);
|
||||
@@ -177,7 +178,8 @@ template <typename Torus> struct int_logical_scalar_shift_buffer {
|
||||
cur_lut_bivariate->get_max_degree(0), params.glwe_dimension,
|
||||
params.polynomial_size, params.message_modulus, params.carry_modulus,
|
||||
shift_lut_f, gpu_memory_allocated);
|
||||
auto active_streams = streams.active_gpu_subset(num_radix_blocks);
|
||||
+    auto active_streams =
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     cur_lut_bivariate->broadcast_lut(active_streams);
 
     lut_buffers_bivariate.push_back(cur_lut_bivariate);
 
@@ -220,7 +222,7 @@ template <typename Torus> struct int_arithmetic_scalar_shift_buffer {
                          uint64_t &size_tracker) {
     gpu_memory_allocated = allocate_gpu_memory;
 
-    auto active_streams = streams.active_gpu_subset(1);
+    auto active_streams = streams.active_gpu_subset(1, params.pbs_type);
     // In the arithmetic shift, a PBS has to be applied to the last rotated
     // block twice: once to shift it, once to compute the padding block to be
     // copied onto all blocks to the left of the last rotated block
@@ -276,7 +278,8 @@ template <typename Torus> struct int_arithmetic_scalar_shift_buffer {
         shift_last_block_lut_univariate->get_max_degree(0),
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
        params.carry_modulus, last_block_lut_f, gpu_memory_allocated);
-    auto active_streams_shift_last = streams.active_gpu_subset(1);
+    auto active_streams_shift_last =
+        streams.active_gpu_subset(1, params.pbs_type);
     shift_last_block_lut_univariate->broadcast_lut(active_streams_shift_last);
 
     lut_buffers_univariate.push_back(shift_last_block_lut_univariate);
@@ -302,7 +305,7 @@ template <typename Torus> struct int_arithmetic_scalar_shift_buffer {
         padding_block_lut_univariate->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         padding_block_lut_f, gpu_memory_allocated);
-    // auto active_streams = streams.active_gpu_subset(1);
+    // auto active_streams = streams.active_gpu_subset(1, params.pbs_type);
     padding_block_lut_univariate->broadcast_lut(active_streams);
 
     lut_buffers_univariate.push_back(padding_block_lut_univariate);
@@ -344,7 +347,7 @@ template <typename Torus> struct int_arithmetic_scalar_shift_buffer {
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         blocks_lut_f, gpu_memory_allocated);
     auto active_streams_shift_blocks =
-        streams.active_gpu_subset(num_radix_blocks);
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     shift_blocks_lut_bivariate->broadcast_lut(active_streams_shift_blocks);
 
     lut_buffers_bivariate.push_back(shift_blocks_lut_bivariate);
 
@@ -119,8 +119,8 @@ template <typename Torus> struct int_shift_and_rotate_buffer {
         mux_lut->get_degree(0), mux_lut->get_max_degree(0),
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, mux_lut_f, gpu_memory_allocated);
-    auto active_gpu_count_mux =
-        streams.active_gpu_subset(bits_per_block * num_radix_blocks);
+    auto active_gpu_count_mux = streams.active_gpu_subset(
+        bits_per_block * num_radix_blocks, params.pbs_type);
     mux_lut->broadcast_lut(active_gpu_count_mux);
 
     auto cleaning_lut_f = [params](Torus x) -> Torus {
@@ -132,7 +132,7 @@ template <typename Torus> struct int_shift_and_rotate_buffer {
         params.glwe_dimension, params.polynomial_size, params.message_modulus,
         params.carry_modulus, cleaning_lut_f, gpu_memory_allocated);
     auto active_gpu_count_cleaning =
-        streams.active_gpu_subset(num_radix_blocks);
+        streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
     cleaning_lut->broadcast_lut(active_gpu_count_cleaning);
   }
 
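Reviewer note: every hunk above applies the same mechanical change — each call to CudaStreams::active_gpu_subset gains a second argument, params.pbs_type, so the GPU subset enlisted for a LUT broadcast can be chosen per PBS flavour rather than from the block count alone. A minimal sketch of what the extended helper plausibly looks like (hypothetical signature; the actual declaration is not part of this diff):

    // Hypothetical sketch only: the real declaration lives outside this diff.
    // The subset size may now depend on the PBS flavour, e.g. a multi-bit PBS
    // can saturate a single GPU with fewer radix blocks than a classical PBS.
    CudaStreams active_gpu_subset(uint32_t num_inputs, PBS_TYPE pbs_type);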
@@ -108,7 +108,8 @@ template <typename Torus> struct int_overflowing_sub_memory {
       glwe_dimension, polynomial_size, message_modulus, carry_modulus,
       f_message_acc, gpu_memory_allocated);
 
-  auto active_streams = streams.active_gpu_subset(num_radix_blocks);
+  auto active_streams =
+      streams.active_gpu_subset(num_radix_blocks, params.pbs_type);
   luts_array->broadcast_lut(active_streams);
   luts_borrow_propagation_sum->broadcast_lut(active_streams);
   message_acc->broadcast_lut(active_streams);
 
@@ -38,7 +38,8 @@ template <typename Torus> struct int_unchecked_all_eq_slices_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     uint32_t num_gpus = active_streams.count();
 
@@ -40,7 +40,8 @@ template <typename Torus> struct int_equality_selectors_buffer {
 
     this->num_streams = num_streams_to_use;
 
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
@@ -154,7 +155,8 @@ template <typename Torus> struct int_possible_results_buffer {
 
     this->num_streams = num_streams_to_use;
 
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
@@ -207,7 +209,8 @@ template <typename Torus> struct int_possible_results_buffer {
           params.message_modulus, params.carry_modulus, fns,
           allocate_gpu_memory);
 
-      current_lut->broadcast_lut(streams.active_gpu_subset(1));
+      current_lut->broadcast_lut(
+          streams.active_gpu_subset(1, params.pbs_type));
       stream_luts[lut_count++] = current_lut;
       lut_value_start += luts_in_this_call;
     }
@@ -282,7 +285,8 @@ template <typename Torus> struct int_aggregate_one_hot_buffer {
 
     this->num_streams = num_streams_to_use;
 
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams);
@@ -300,7 +304,8 @@ template <typename Torus> struct int_aggregate_one_hot_buffer {
           params.polynomial_size, params.message_modulus, params.carry_modulus,
           id_fn, allocate_gpu_memory);
 
-      lut->broadcast_lut(streams.active_gpu_subset(num_blocks));
+      lut->broadcast_lut(
+          streams.active_gpu_subset(num_blocks, params.pbs_type));
       this->stream_identity_luts[i] = lut;
     }
 
@@ -321,7 +326,7 @@ template <typename Torus> struct int_aggregate_one_hot_buffer {
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         msg_fn, allocate_gpu_memory);
     this->message_extract_lut->broadcast_lut(
-        streams.active_gpu_subset(num_blocks));
+        streams.active_gpu_subset(num_blocks, params.pbs_type));
 
     this->carry_extract_lut = new int_radix_lut<Torus>(
         streams, params, 1, num_blocks, allocate_gpu_memory, size_tracker);
@@ -333,7 +338,7 @@ template <typename Torus> struct int_aggregate_one_hot_buffer {
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         carry_fn, allocate_gpu_memory);
     this->carry_extract_lut->broadcast_lut(
-        streams.active_gpu_subset(num_blocks));
+        streams.active_gpu_subset(num_blocks, params.pbs_type));
 
     this->partial_aggregated_vectors =
         new CudaRadixCiphertextFFI *[num_streams];
@@ -628,7 +633,8 @@ template <typename Torus> struct int_unchecked_contains_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
        active_streams, num_streams_to_use);
@@ -703,7 +709,8 @@ template <typename Torus> struct int_unchecked_contains_clear_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
@@ -1094,7 +1101,8 @@ template <typename Torus> struct int_unchecked_first_index_of_clear_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
@@ -1184,7 +1192,8 @@ template <typename Torus> struct int_unchecked_first_index_of_clear_buffer {
         this->prefix_sum_lut->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         prefix_sum_fn, allocate_gpu_memory);
-    this->prefix_sum_lut->broadcast_lut(streams.active_gpu_subset(num_inputs));
+    this->prefix_sum_lut->broadcast_lut(
+        streams.active_gpu_subset(num_inputs, params.pbs_type));
 
     auto cleanup_fn = [ALREADY_SEEN, params](Torus x) -> Torus {
       Torus val = x % params.message_modulus;
@@ -1200,7 +1209,8 @@ template <typename Torus> struct int_unchecked_first_index_of_clear_buffer {
         this->cleanup_lut->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         cleanup_fn, allocate_gpu_memory);
-    this->cleanup_lut->broadcast_lut(streams.active_gpu_subset(num_inputs));
+    this->cleanup_lut->broadcast_lut(
+        streams.active_gpu_subset(num_inputs, params.pbs_type));
   }
 
   void release(CudaStreams streams) {
@@ -1292,7 +1302,8 @@ template <typename Torus> struct int_unchecked_first_index_of_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
@@ -1372,7 +1383,8 @@ template <typename Torus> struct int_unchecked_first_index_of_buffer {
         this->prefix_sum_lut->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         prefix_sum_fn, allocate_gpu_memory);
-    this->prefix_sum_lut->broadcast_lut(streams.active_gpu_subset(num_inputs));
+    this->prefix_sum_lut->broadcast_lut(
+        streams.active_gpu_subset(num_inputs, params.pbs_type));
 
     auto cleanup_fn = [ALREADY_SEEN, params](Torus x) -> Torus {
       Torus val = x % params.message_modulus;
@@ -1388,7 +1400,8 @@ template <typename Torus> struct int_unchecked_first_index_of_buffer {
         this->cleanup_lut->get_max_degree(0), params.glwe_dimension,
         params.polynomial_size, params.message_modulus, params.carry_modulus,
         cleanup_fn, allocate_gpu_memory);
-    this->cleanup_lut->broadcast_lut(streams.active_gpu_subset(num_inputs));
+    this->cleanup_lut->broadcast_lut(
+        streams.active_gpu_subset(num_inputs, params.pbs_type));
   }
 
   void release(CudaStreams streams) {
@@ -1462,7 +1475,8 @@ template <typename Torus> struct int_unchecked_index_of_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
@@ -1523,7 +1537,8 @@ template <typename Torus> struct int_unchecked_index_of_clear_buffer {
       num_streams_to_use = 1;
 
     this->num_streams = num_streams_to_use;
-    this->active_streams = streams.active_gpu_subset(num_blocks);
+    this->active_streams =
+        streams.active_gpu_subset(num_blocks, params.pbs_type);
 
     this->internal_cuda_streams.create_internal_cuda_streams_on_same_gpus(
         active_streams, num_streams_to_use);
 
@@ -5,21 +5,14 @@
 
 extern "C" {
 
-void cuda_keyswitch_lwe_ciphertext_vector_32(
+void cuda_keyswitch_lwe_ciphertext_vector_64_64(
     void *stream, uint32_t gpu_index, void *lwe_array_out,
     void const *lwe_output_indexes, void const *lwe_array_in,
     void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
     uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
     uint32_t num_samples);
 
-void cuda_keyswitch_gemm_lwe_ciphertext_vector_64(
-    void *stream, uint32_t gpu_index, void *lwe_array_out,
-    void const *lwe_output_indexes, void const *lwe_array_in,
-    void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
-    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
-    uint32_t num_samples, const void *ks_tmp_buffer, bool uses_trivial_indexes);
-
-void cuda_keyswitch_lwe_ciphertext_vector_64(
+void cuda_keyswitch_lwe_ciphertext_vector_64_32(
     void *stream, uint32_t gpu_index, void *lwe_array_out,
     void const *lwe_output_indexes, void const *lwe_array_in,
     void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
@@ -31,6 +24,20 @@ uint64_t scratch_packing_keyswitch_lwe_list_to_glwe_64(
     uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
     uint32_t num_lwes, bool allocate_gpu_memory);
 
+void cuda_keyswitch_gemm_lwe_ciphertext_vector_64_64(
+    void *stream, uint32_t gpu_index, void *lwe_array_out,
+    void const *lwe_output_indexes, void const *lwe_array_in,
+    void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
+    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
+    uint32_t num_samples, const void *ks_tmp_buffer, bool uses_trivial_indexes);
+
+void cuda_keyswitch_gemm_lwe_ciphertext_vector_64_32(
+    void *stream, uint32_t gpu_index, void *lwe_array_out,
+    void const *lwe_output_indexes, void const *lwe_array_in,
+    void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
+    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
+    uint32_t num_samples, const void *ks_tmp_buffer, bool uses_trivial_indexes);
+
 uint64_t scratch_cuda_keyswitch_gemm_64(void *stream, uint32_t gpu_index,
                                         void **ks_tmp_memory,
                                         uint32_t lwe_dimension_in,
@@ -65,6 +72,10 @@ void cleanup_packing_keyswitch_lwe_list_to_glwe(void *stream,
                                                 uint32_t gpu_index,
                                                 int8_t **fp_ks_buffer,
                                                 bool gpu_memory_allocated);
+
+void cuda_closest_representable_64(void *stream, uint32_t gpu_index,
+                                   void const *input, void *output,
+                                   uint32_t base_log, uint32_t level_count);
 }
 
 #endif // CNCRT_KS_H_
 
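Reviewer note: the header moves from a single-width suffix to an explicit <input>_<output> naming scheme — cuda_keyswitch_lwe_ciphertext_vector_64 becomes _64_64, the old _32 entry point disappears, and new _64_32 variants key-switch 64-bit LWEs under a 32-bit KSK into 32-bit outputs. A hedged usage sketch built only from the declarations above (all device buffers are assumed to be allocated and filled elsewhere; ks_tmp_buffer is assumed to come from the matching scratch function):

    // 64-bit inputs, 32-bit KSK and outputs; GEMM path with trivial indexes.
    cuda_keyswitch_gemm_lwe_ciphertext_vector_64_32(
        stream, gpu_index, d_lwe_out_32, d_out_indexes, d_lwe_in_64,
        d_in_indexes, d_ksk_32, lwe_dimension_in, lwe_dimension_out, base_log,
        level_count, num_samples, ks_tmp_buffer,
        /*uses_trivial_indexes=*/true);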
@@ -47,7 +47,7 @@ template <typename Torus> struct compact_lwe_list {
 
 template <typename Torus> struct flattened_compact_lwe_lists {
   Torus *d_ptr;
-  Torus **d_ptr_to_compact_list;
+  Torus **ptr_array_to_d_compact_list;
   const uint32_t *h_num_lwes_per_compact_list;
   uint32_t num_compact_lists;
   uint32_t lwe_dimension;
@@ -59,13 +59,13 @@ template <typename Torus> struct flattened_compact_lwe_lists {
                               uint32_t lwe_dimension)
       : d_ptr(d_ptr), h_num_lwes_per_compact_list(h_num_lwes_per_compact_list),
         num_compact_lists(num_compact_lists), lwe_dimension(lwe_dimension) {
-    d_ptr_to_compact_list =
-        static_cast<Torus **>(malloc(num_compact_lists * sizeof(Torus **)));
+    ptr_array_to_d_compact_list =
+        static_cast<Torus **>(malloc(num_compact_lists * sizeof(Torus *)));
     total_num_lwes = 0;
     auto curr_list = d_ptr;
     for (auto i = 0; i < num_compact_lists; ++i) {
       total_num_lwes += h_num_lwes_per_compact_list[i];
-      d_ptr_to_compact_list[i] = curr_list;
+      ptr_array_to_d_compact_list[i] = curr_list;
       curr_list += lwe_dimension + h_num_lwes_per_compact_list[i];
     }
   }
@@ -75,10 +75,12 @@ template <typename Torus> struct flattened_compact_lwe_lists {
       PANIC("index out of range in flattened_compact_lwe_lists::get");
     }
 
-    return compact_lwe_list(d_ptr_to_compact_list[compact_list_index],
+    return compact_lwe_list(ptr_array_to_d_compact_list[compact_list_index],
                             lwe_dimension,
                             h_num_lwes_per_compact_list[compact_list_index]);
   }
 
   void release() { free(ptr_array_to_d_compact_list); }
 };
 
 /*
@@ -121,7 +123,6 @@ template <typename Torus> struct zk_expand_mem {
       : computing_params(computing_params), casting_params(casting_params),
         num_compact_lists(num_compact_lists),
         casting_key_type(casting_key_type) {
-
     gpu_memory_allocated = allocate_gpu_memory;
 
     // We copy num_lwes_per_compact_list so we get protection against
@@ -289,7 +290,8 @@ template <typename Torus> struct zk_expand_mem {
         lut_indexes, h_lut_indexes, num_packed_msgs * num_lwes * sizeof(Torus),
         streams.stream(0), streams.gpu_index(0), allocate_gpu_memory);
 
-    auto active_streams = streams.active_gpu_subset(2 * num_lwes);
+    auto active_streams =
+        streams.active_gpu_subset(2 * num_lwes, params.pbs_type);
     message_and_carry_extract_luts->broadcast_lut(active_streams);
 
    message_and_carry_extract_luts->allocate_lwe_vector_for_non_trivial_indexes(
@@ -313,7 +315,6 @@ template <typename Torus> struct zk_expand_mem {
   }
 
   void release(CudaStreams streams) {
-
     message_and_carry_extract_luts->release(streams);
     delete message_and_carry_extract_luts;
 
@@ -105,11 +105,11 @@ aes_xor(CudaStreams streams, int_aes_encrypt_buffer<Torus> *mem,
  * result.
  *
  */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ __forceinline__ void
 aes_flush_inplace(CudaStreams streams, CudaRadixCiphertextFFI *data,
                   int_aes_encrypt_buffer<Torus> *mem, void *const *bsks,
-                  Torus *const *ksks) {
+                  KSTorus *const *ksks) {
 
   integer_radix_apply_univariate_lookup_table<Torus>(streams, data, data, bsks,
                                                      ksks, mem->luts->flush_lut,
@@ -121,10 +121,12 @@ aes_flush_inplace(CudaStreams streams, CudaRadixCiphertextFFI *data,
  * ciphertext, then flushes the result to ensure it's a valid bit.
  *
  */
-template <typename Torus>
-__host__ __forceinline__ void aes_scalar_add_one_flush_inplace(
-    CudaStreams streams, CudaRadixCiphertextFFI *data,
-    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks, Torus *const *ksks) {
+template <typename Torus, typename KSTorus>
+__host__ __forceinline__ void
+aes_scalar_add_one_flush_inplace(CudaStreams streams,
+                                 CudaRadixCiphertextFFI *data,
+                                 int_aes_encrypt_buffer<Torus> *mem,
+                                 void *const *bsks, KSTorus *const *ksks) {
 
   host_add_scalar_one_inplace<Torus>(streams, data, mem->params.message_modulus,
                                      mem->params.carry_modulus);
@@ -142,11 +144,11 @@ __host__ __forceinline__ void aes_scalar_add_one_flush_inplace(
  * ciphertext locations.
  *
  */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void
 batch_vec_flush_inplace(CudaStreams streams, CudaRadixCiphertextFFI **targets,
                         size_t count, int_aes_encrypt_buffer<Torus> *mem,
-                        void *const *bsks, Torus *const *ksks) {
+                        void *const *bsks, KSTorus *const *ksks) {
 
   uint32_t num_radix_blocks = targets[0]->num_radix_blocks;
 
@@ -185,13 +187,13 @@ batch_vec_flush_inplace(CudaStreams streams, CudaRadixCiphertextFFI **targets,
  * Batches multiple "and" operations into a single, large launch.
  *
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void batch_vec_and_inplace(CudaStreams streams,
                                     CudaRadixCiphertextFFI **outs,
                                     CudaRadixCiphertextFFI **lhs,
                                     CudaRadixCiphertextFFI **rhs, size_t count,
                                     int_aes_encrypt_buffer<Torus> *mem,
-                                    void *const *bsks, Torus *const *ksks) {
+                                    void *const *bsks, KSTorus *const *ksks) {
 
   uint32_t num_aes_inputs = outs[0]->num_radix_blocks;
 
@@ -274,13 +276,13 @@ __host__ void batch_vec_and_inplace(CudaStreams streams,
  * [ptr] -> [R2b0, R2b1, R2b2, R2b3, R2b4, R2b5, R2b6, R2b7]
  * ...
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void vectorized_sbox_n_bytes(CudaStreams streams,
                                       CudaRadixCiphertextFFI **sbox_io_bytes,
                                       uint32_t num_bytes_parallel,
                                       uint32_t num_aes_inputs,
                                       int_aes_encrypt_buffer<Torus> *mem,
-                                      void *const *bsks, Torus *const *ksks) {
+                                      void *const *bsks, KSTorus *const *ksks) {
 
   uint32_t num_sbox_blocks = num_bytes_parallel * num_aes_inputs;
 
@@ -702,12 +704,12 @@ __host__ void vectorized_mul_by_2(CudaStreams streams,
  * [ s'_3 ]   [ 03 01 01 02 ] [ s_3 ]
  *
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void vectorized_mix_columns(CudaStreams streams,
                                      CudaRadixCiphertextFFI *s_bits,
                                      uint32_t num_aes_inputs,
                                      int_aes_encrypt_buffer<Torus> *mem,
-                                     void *const *bsks, Torus *const *ksks) {
+                                     void *const *bsks, KSTorus *const *ksks) {
 
   constexpr uint32_t BITS_PER_BYTE = 8;
   constexpr uint32_t BYTES_PER_COLUMN = 4;
@@ -842,11 +844,12 @@ __host__ void vectorized_mix_columns(CudaStreams streams,
  * - AddRoundKey
  *
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void vectorized_aes_encrypt_inplace(
     CudaStreams streams, CudaRadixCiphertextFFI *all_states_bitsliced,
     CudaRadixCiphertextFFI const *round_keys, uint32_t num_aes_inputs,
-    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks, Torus *const *ksks) {
+    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks,
+    KSTorus *const *ksks) {
 
   constexpr uint32_t BITS_PER_BYTE = 8;
   constexpr uint32_t STATE_BYTES = 16;
@@ -987,11 +990,12 @@ __host__ void vectorized_aes_encrypt_inplace(
  * The "transposed_states" buffer is updated in-place with the sum bits $S_i$.
  *
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void vectorized_aes_full_adder_inplace(
     CudaStreams streams, CudaRadixCiphertextFFI *transposed_states,
     const Torus *counter_bits_le_all_blocks, uint32_t num_aes_inputs,
-    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks, Torus *const *ksks) {
+    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks,
+    KSTorus *const *ksks) {
 
   constexpr uint32_t NUM_BITS = 128;
 
@@ -1091,12 +1095,13 @@ __host__ void vectorized_aes_full_adder_inplace(
  * +---------------------------------+
  *
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_integer_aes_ctr_encrypt(
     CudaStreams streams, CudaRadixCiphertextFFI *output,
     CudaRadixCiphertextFFI const *iv, CudaRadixCiphertextFFI const *round_keys,
     const Torus *counter_bits_le_all_blocks, uint32_t num_aes_inputs,
-    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks, Torus *const *ksks) {
+    int_aes_encrypt_buffer<Torus> *mem, void *const *bsks,
+    KSTorus *const *ksks) {
 
   constexpr uint32_t NUM_BITS = 128;
 
@@ -1148,13 +1153,13 @@ uint64_t scratch_cuda_integer_key_expansion(
 * - If (i % 4 == 0): w_i = w_{i-4} + SubWord(RotWord(w_{i-1})) + Rcon[i/4]
 * - If (i % 4 != 0): w_i = w_{i-4} + w_{i-1}
 */
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_integer_key_expansion(CudaStreams streams,
                                          CudaRadixCiphertextFFI *expanded_keys,
                                          CudaRadixCiphertextFFI const *key,
                                          int_key_expansion_buffer<Torus> *mem,
                                          void *const *bsks,
-                                         Torus *const *ksks) {
+                                         KSTorus *const *ksks) {
 
   constexpr uint32_t BITS_PER_WORD = 32;
   constexpr uint32_t BITS_PER_BYTE = 8;
 
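Reviewer note: all the AES hunks apply one transformation — template <typename Torus> widens to template <typename Torus, typename KSTorus>, and the ksks parameter is retyped from Torus *const * to KSTorus *const *. The torus of the ciphertext data and the torus of the key-switching keys become independent type parameters, which is what allows a 64-bit computation to run under a 32-bit KSK. A minimal sketch of the pattern (some_aes_step is a hypothetical name, not a function from this diff):

    // Hypothetical illustration of the pattern applied throughout this file.
    // Instantiated as <uint64_t, uint32_t>, 64-bit radix ciphertexts are
    // key-switched under 32-bit key-switching keys.
    template <typename Torus, typename KSTorus>
    void some_aes_step(void *const *bsks, KSTorus *const *ksks);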
@@ -2,26 +2,9 @@
 #include "keyswitch/keyswitch.h"
 #include "packing_keyswitch.cuh"
 
-/* Perform keyswitch on a batch of 32 bits input LWE ciphertexts.
- * Head out to the equivalent operation on 64 bits for more details.
- */
-void cuda_keyswitch_lwe_ciphertext_vector_32(
-    void *stream, uint32_t gpu_index, void *lwe_array_out,
-    void *lwe_output_indexes, void *lwe_array_in, void *lwe_input_indexes,
-    void *ksk, uint32_t lwe_dimension_in, uint32_t lwe_dimension_out,
-    uint32_t base_log, uint32_t level_count, uint32_t num_samples,
-    void *ksk_tmp_buffer, bool uses_trivial_indices) {
-  host_gemm_keyswitch_lwe_ciphertext_vector<uint32_t>(
-      static_cast<cudaStream_t>(stream), gpu_index,
-      static_cast<uint32_t *>(lwe_array_out),
-      static_cast<uint32_t *>(lwe_output_indexes),
-      static_cast<uint32_t *>(lwe_array_in),
-      static_cast<uint32_t *>(lwe_input_indexes), static_cast<uint32_t *>(ksk),
-      lwe_dimension_in, lwe_dimension_out, base_log, level_count, num_samples,
-      static_cast<uint32_t *>(ksk_tmp_buffer), uses_trivial_indices);
-}
-
-/* Perform keyswitch on a batch of 64 bits input LWE ciphertexts.
+/* Perform keyswitch on a batch of 64 bits input LWE ciphertexts
+ * using a 64-b key-switching key. Uses the GEMM approach which
+ * achieves good throughput on large batches (128 LWEs on H100)
  *
  * - `v_stream` is a void pointer to the Cuda stream to be used in the kernel
  * launch
@@ -37,7 +20,7 @@ void cuda_keyswitch_lwe_ciphertext_vector_32(
  * This function calls a wrapper to a device kernel that performs the keyswitch
  * - num_samples blocks of threads are launched
  */
-void cuda_keyswitch_gemm_lwe_ciphertext_vector_64(
+void cuda_keyswitch_gemm_lwe_ciphertext_vector_64_64(
     void *stream, uint32_t gpu_index, void *lwe_array_out,
     void const *lwe_output_indexes, void const *lwe_array_in,
     void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
@@ -45,7 +28,7 @@ void cuda_keyswitch_gemm_lwe_ciphertext_vector_64(
     uint32_t num_samples, const void *ks_tmp_buffer,
     bool uses_trivial_indices) {
 
-  host_gemm_keyswitch_lwe_ciphertext_vector<uint64_t>(
+  host_gemm_keyswitch_lwe_ciphertext_vector<uint64_t, uint64_t>(
       static_cast<cudaStream_t>(stream), gpu_index,
       static_cast<uint64_t *>(lwe_array_out),
       static_cast<const uint64_t *>(lwe_output_indexes),
@@ -57,13 +40,37 @@ void cuda_keyswitch_gemm_lwe_ciphertext_vector_64(
       uses_trivial_indices);
 }
 
-void cuda_keyswitch_lwe_ciphertext_vector_64(
+/* Perform keyswitch on a batch of 64 bits input LWE ciphertexts
+ * using a 32-b key-switching key, producing 32-bit LWE outputs.
+ * Uses the GEMM approach which achieves good throughput on large batches
+ */
+void cuda_keyswitch_gemm_lwe_ciphertext_vector_64_32(
+    void *stream, uint32_t gpu_index, void *lwe_array_out,
+    void const *lwe_output_indexes, void const *lwe_array_in,
+    void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
+    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
+    uint32_t num_samples, const void *ks_tmp_buffer,
+    bool uses_trivial_indices) {
+
+  host_gemm_keyswitch_lwe_ciphertext_vector<uint64_t, uint32_t>(
+      static_cast<cudaStream_t>(stream), gpu_index,
+      static_cast<uint32_t *>(lwe_array_out),
+      static_cast<const uint64_t *>(lwe_output_indexes),
+      static_cast<const uint64_t *>(lwe_array_in),
+      static_cast<const uint64_t *>(lwe_input_indexes),
+      static_cast<const uint32_t *>(ksk), lwe_dimension_in, lwe_dimension_out,
+      base_log, level_count, num_samples,
+      static_cast<const ks_mem<uint64_t> *>(ks_tmp_buffer)->d_buffer,
+      uses_trivial_indices);
+}
+
+void cuda_keyswitch_lwe_ciphertext_vector_64_64(
     void *stream, uint32_t gpu_index, void *lwe_array_out,
     void const *lwe_output_indexes, void const *lwe_array_in,
     void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
     uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
     uint32_t num_samples) {
-  host_keyswitch_lwe_ciphertext_vector<uint64_t>(
+  host_keyswitch_lwe_ciphertext_vector<uint64_t, uint64_t>(
      static_cast<cudaStream_t>(stream), gpu_index,
      static_cast<uint64_t *>(lwe_array_out),
      static_cast<uint64_t const *>(lwe_output_indexes),
@@ -73,6 +80,22 @@ void cuda_keyswitch_lwe_ciphertext_vector_64(
       base_log, level_count, num_samples);
 }
 
+void cuda_keyswitch_lwe_ciphertext_vector_64_32(
+    void *stream, uint32_t gpu_index, void *lwe_array_out,
+    void const *lwe_output_indexes, void const *lwe_array_in,
+    void const *lwe_input_indexes, void const *ksk, uint32_t lwe_dimension_in,
+    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
+    uint32_t num_samples) {
+  host_keyswitch_lwe_ciphertext_vector<uint64_t, uint32_t>(
+      static_cast<cudaStream_t>(stream), gpu_index,
+      static_cast<uint32_t *>(lwe_array_out),
+      static_cast<const uint64_t *>(lwe_output_indexes),
+      static_cast<const uint64_t *>(lwe_array_in),
+      static_cast<const uint64_t *>(lwe_input_indexes),
+      static_cast<const uint32_t *>(ksk), lwe_dimension_in, lwe_dimension_out,
+      base_log, level_count, num_samples);
+}
+
 uint64_t scratch_packing_keyswitch_lwe_list_to_glwe_64(
     void *stream, uint32_t gpu_index, int8_t **fp_ks_buffer,
     uint32_t lwe_dimension, uint32_t glwe_dimension, uint32_t polynomial_size,
@@ -159,3 +182,12 @@ void cuda_packing_keyswitch_lwe_list_to_glwe_128(
       input_lwe_dimension, output_glwe_dimension, output_polynomial_size,
       base_log, level_count, num_lwes);
 }
+
+void cuda_closest_representable_64(void *stream, uint32_t gpu_index,
+                                   void const *input, void *output,
+                                   uint32_t base_log, uint32_t level_count) {
+  host_cuda_closest_representable(static_cast<cudaStream_t>(stream), gpu_index,
+                                  static_cast<const uint64_t *>(input),
+                                  static_cast<uint64_t *>(output), base_log,
+                                  level_count);
+}
 
@@ -12,7 +12,6 @@
 #include "utils/helper.cuh"
 #include "utils/kernel_dimensions.cuh"
-#include <thread>
 #include <unistd.h>
 #include <vector>
 
 const int BLOCK_SIZE_DECOMP = 8;
@@ -46,10 +45,42 @@ __device__ Torus *get_ith_block(Torus *ksk, int i, int level,
   return ptr;
 }
 
+template <typename T>
+__device__ T closest_repr(T input, uint32_t base_log, uint32_t level_count) {
+  T minus_2 = static_cast<T>(-2);
+  const T rep_bit_count = level_count * base_log;            // 32
+  const T non_rep_bit_count = sizeof(T) * 8 - rep_bit_count; // 32
+  auto shift = (non_rep_bit_count - 1);                      // 31
+  T res = input >> shift;
+  res++;
+  res &= minus_2;
+  res <<= shift;
+  return res;
+}
+
+template <typename T>
+__global__ void closest_representable(const T *input, T *output,
+                                      uint32_t base_log, uint32_t level_count) {
+  output[0] = closest_repr(input[0], base_log, level_count);
+}
+
+template <typename T>
+__host__ void
+host_cuda_closest_representable(cudaStream_t stream, uint32_t gpu_index,
+                                const T *input, T *output, uint32_t base_log,
+                                uint32_t level_count) {
+  dim3 grid(1, 1, 1);
+  dim3 threads(1, 1, 1);
+
+  cuda_set_device(gpu_index);
+  closest_representable<<<grid, threads, 0, stream>>>(input, output, base_log,
+                                                      level_count);
+}
+
 // Initialize decomposition by performing rounding
 // and decomposing one level of an array of Torus LWEs. Only
 // decomposes the mask elements of the incoming LWEs.
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __global__ void decompose_vectorize_init(Torus const *lwe_in, Torus *lwe_out,
                                          uint32_t lwe_dimension,
                                          uint32_t num_lwe, uint32_t base_log,
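Reviewer note: closest_repr rounds input to the nearest multiple of 2^(non_rep_bit_count), i.e. to the nearest value representable on the top base_log * level_count bits (ties round up): shift right by non_rep_bit_count - 1 to keep one rounding bit, add one, clear the low bit, shift back. A host-side mirror with a worked example (illustration only, not part of the diff):

    #include <cstdint>

    // Same bit trick as the device closest_repr, specialized to uint64_t.
    static uint64_t closest_repr_host(uint64_t input, uint32_t base_log,
                                      uint32_t level_count) {
      const uint32_t rep_bit_count = base_log * level_count;
      const uint32_t shift = 64 - rep_bit_count - 1;
      uint64_t res = (input >> shift) + 1; // keep one rounding bit, round up
      res &= ~uint64_t(1);                 // drop the rounding bit
      return res << shift;                 // back onto the representable grid
    }

    // With base_log = 16, level_count = 2 (rep_bit_count = 32, shift = 31),
    // values are rounded to the nearest multiple of 2^32:
    //   closest_repr_host(0x0000000080000000, 16, 2) == 0x0000000100000000
    //   closest_repr_host(0x000000007FFFFFFF, 16, 2) == 0x0000000000000000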
@@ -76,7 +107,9 @@ __global__ void decompose_vectorize_init(Torus const *lwe_in, Torus *lwe_out,
   Torus state = init_decomposer_state(a_i, base_log, level_count);
 
   Torus mod_b_mask = (1ll << base_log) - 1ll;
-  lwe_out[write_val_idx] = decompose_one<Torus>(state, mod_b_mask, base_log);
+  KSTorus *kst_ptr_lwe_out = (KSTorus *)lwe_out;
+  kst_ptr_lwe_out[write_val_idx] =
+      decompose_one<Torus>(state, mod_b_mask, base_log);
   __syncthreads();
   lwe_out[write_state_idx] = state;
 }
@@ -86,7 +119,7 @@ __global__ void decompose_vectorize_init(Torus const *lwe_in, Torus *lwe_out,
 // from num_lwe. The maximum index should be <= total_lwe. num_lwe is the number
 // of LWEs to decompose The output buffer should have space for num_lwe LWEs.
 // These will be sorted according to the input indices.
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __global__ void decompose_vectorize_init_with_indices(
     Torus const *lwe_in, const Torus *__restrict__ lwe_input_indices,
     Torus *lwe_out, uint32_t lwe_dimension, uint32_t num_lwe, uint32_t base_log,
@@ -114,7 +147,9 @@ __global__ void decompose_vectorize_init_with_indices(
   Torus state = init_decomposer_state(a_i, base_log, level_count);
 
   Torus mod_b_mask = (1ll << base_log) - 1ll;
-  lwe_out[write_val_idx] = decompose_one<Torus>(state, mod_b_mask, base_log);
+  KSTorus *kst_ptr_lwe_out = (KSTorus *)lwe_out;
+  kst_ptr_lwe_out[write_val_idx] =
+      decompose_one<Torus>(state, mod_b_mask, base_log);
   __syncthreads();
   lwe_out[write_state_idx] = state;
 }
@@ -122,7 +157,7 @@ __global__ void decompose_vectorize_init_with_indices(
 // Continue decomposition of an array of Torus elements in place. Supposes
 // that the array contains already decomposed elements and
 // computes the new decomposed level in place.
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __global__ void
 decompose_vectorize_step_inplace(Torus *buffer_in, uint32_t lwe_dimension,
                                  uint32_t num_lwe, uint32_t base_log,
@@ -144,15 +179,22 @@ decompose_vectorize_step_inplace(Torus *buffer_in, uint32_t lwe_dimension,
 
   Torus mod_b_mask = (1ll << base_log) - 1ll;
 
-  buffer_in[val_idx] = decompose_one<Torus>(state, mod_b_mask, base_log);
+  KSTorus *kst_ptr_lwe_out = (KSTorus *)buffer_in;
+  kst_ptr_lwe_out[val_idx] = decompose_one<Torus>(state, mod_b_mask, base_log);
   __syncthreads();
   buffer_in[state_idx] = state;
 }
 
-template <typename Torus>
+/* LWEs inputs to the keyswitch function are stored as a_0,...,a_{lwe_dim},b,
+ * where a_i are mask elements and b is the message. We initialize
+ * the output keyswitched LWEs to 0, ..., 0, -b. The GEMM keyswitch is computed
+ * as:
+ * -(-b + sum(a_i A_KSK))
+ */
+template <typename Torus, typename KSTorus>
 __global__ void keyswitch_gemm_copy_negated_message_with_indices(
     const Torus *__restrict__ lwe_in,
-    const Torus *__restrict__ lwe_input_indices, Torus *__restrict__ lwe_out,
+    const Torus *__restrict__ lwe_input_indices, KSTorus *__restrict__ lwe_out,
     const Torus *__restrict__ lwe_output_indices,
 
     uint32_t lwe_dimension_in, uint32_t num_lwes, uint32_t lwe_dimension_out) {
@@ -165,16 +207,39 @@ __global__ void keyswitch_gemm_copy_negated_message_with_indices(
   uint32_t lwe_in_idx = lwe_input_indices[lwe_id];
   uint32_t lwe_out_idx = lwe_output_indices[lwe_id];
 
+  Torus body_in =
+      lwe_in[lwe_in_idx * (lwe_dimension_in + 1) + lwe_dimension_in];
+  Torus body_out;
+  if constexpr (std::is_same_v<KSTorus, Torus>) {
+    body_out = -body_in;
+  } else {
+    body_out = closest_repr(
+        lwe_in[lwe_in_idx * (lwe_dimension_in + 1) + lwe_dimension_in],
+        sizeof(KSTorus) * 8, 1);
+
+    // Power of two are encoded in the MSBs of the types so we need to scale
+    // the type to the other one without having to worry about the moduli
+    static_assert(sizeof(Torus) >= sizeof(KSTorus),
+                  "Cannot compile keyswitch with given input/output dtypes");
+    Torus input_to_output_scaling_factor =
+        (sizeof(Torus) - sizeof(KSTorus)) * 8;
+
+    auto rounded_downscaled_body =
+        (KSTorus)(body_out >> input_to_output_scaling_factor);
+
+    body_out = -rounded_downscaled_body;
+  }
   lwe_out[lwe_out_idx * (lwe_dimension_out + 1) + lwe_dimension_out] =
-      -lwe_in[lwe_in_idx * (lwe_dimension_in + 1) + lwe_dimension_in];
+      (KSTorus)body_out;
 }
 
-// Finishes the KS computation by negating all elements in the array
-// using output indices. The array contains -b + SUM(a_i x LWE_i)
-// and this final step computes b - SUM(a_i x LWE_i)
-template <typename Torus>
+// The GEMM keyswitch is computed as: -(-b + sum(a_i A_KSK)).
+// This function finishes the KS computation by negating all elements in the
+// array using output indices. The array contains -b + SUM(a_i x LWE_i) and this
+// final step computes b - SUM(a_i x LWE_i).
+template <typename Torus, typename KSTorus>
 __global__ void keyswitch_negate_with_output_indices(
-    Torus *buffer_in, const Torus *__restrict__ lwe_output_indices,
+    KSTorus *buffer_in, const Torus *__restrict__ lwe_output_indices,
     uint32_t lwe_size, uint32_t num_lwe) {
 
   // index of this LWE ct in the buffer
@@ -191,9 +256,9 @@ __global__ void keyswitch_negate_with_output_indices(
   buffer_in[val_idx] = -val;
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __global__ void keyswitch_zero_output_with_output_indices(
-    Torus *buffer_in, const Torus *__restrict__ lwe_output_indices,
+    KSTorus *buffer_in, const Torus *__restrict__ lwe_output_indices,
     uint32_t lwe_size, uint32_t num_lwe) {
 
   // index of this LWE ct in the buffer
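Reviewer note: in the non-matching-torus branch of the copy kernel above, the 64-bit body is first rounded to the nearest multiple of 2^32 via closest_repr(body, sizeof(KSTorus) * 8, 1), then its top 32 bits are kept — exactly the rounding conversion from torus modulus 2^64 to 2^32. A worked numeric trace, reusing the closest_repr_host sketch from earlier (illustration only):

    uint64_t body64 = 0x1234567890ABCDEFull;
    // Lower word 0x90ABCDEF >= 2^31, so the rounding goes up:
    uint64_t rounded = closest_repr_host(body64, /*base_log=*/32,
                                         /*level_count=*/1);
    // rounded == 0x1234567900000000
    uint32_t body32 = uint32_t(rounded >> 32); // == 0x12345679, now mod 2^32
    uint32_t stored = uint32_t(0) - body32;    // the kernel stores -b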
@@ -235,12 +300,12 @@ __global__ void keyswitch_zero_output_with_output_indices(
 // in two parts, a constant part is calculated before the loop, and a variable
 // part is calculated inside the loop. This seems to help with the register
 // pressure as well.
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __global__ void
-keyswitch(Torus *lwe_array_out, const Torus *__restrict__ lwe_output_indexes,
+keyswitch(KSTorus *lwe_array_out, const Torus *__restrict__ lwe_output_indexes,
           const Torus *__restrict__ lwe_array_in,
           const Torus *__restrict__ lwe_input_indexes,
-          const Torus *__restrict__ ksk, uint32_t lwe_dimension_in,
+          const KSTorus *__restrict__ ksk, uint32_t lwe_dimension_in,
           uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count) {
   const int tid = threadIdx.x + blockIdx.y * blockDim.x;
   const int shmem_index = threadIdx.x + threadIdx.y * blockDim.x;
@@ -252,12 +317,27 @@ keyswitch(Torus *lwe_array_out, const Torus *__restrict__ lwe_output_indexes,
 
   if (tid <= lwe_dimension_out) {
 
-    Torus local_lwe_out = 0;
+    KSTorus local_lwe_out = 0;
     auto block_lwe_array_in = get_chunk(
         lwe_array_in, lwe_input_indexes[blockIdx.x], lwe_dimension_in + 1);
 
     if (tid == lwe_dimension_out && threadIdx.y == 0) {
-      local_lwe_out = -block_lwe_array_in[lwe_dimension_in];
+      if constexpr (std::is_same_v<KSTorus, Torus>) {
+        local_lwe_out = -block_lwe_array_in[lwe_dimension_in];
+      } else {
+        auto new_body = closest_repr(block_lwe_array_in[lwe_dimension_in],
+                                     sizeof(KSTorus) * 8, 1);
+
+        // Power of two are encoded in the MSBs of the types so we need to scale
+        // the type to the other one without having to worry about the moduli
+        Torus input_to_output_scaling_factor =
+            (sizeof(Torus) - sizeof(KSTorus)) * 8;
+
+        auto rounded_downscaled_body =
+            (KSTorus)(new_body >> input_to_output_scaling_factor);
+
+        local_lwe_out = -rounded_downscaled_body;
+      }
     }
     const Torus mask_mod_b = (1ll << base_log) - 1ll;
 
@@ -273,9 +353,10 @@ keyswitch(Torus *lwe_array_out, const Torus *__restrict__ lwe_output_indexes,
     uint32_t offset = i * level_count * (lwe_dimension_out + 1);
     for (int j = 0; j < level_count; j++) {
 
-      Torus decomposed = decompose_one<Torus>(state, mask_mod_b, base_log);
+      KSTorus decomposed = decompose_one<Torus>(state, mask_mod_b, base_log);
       local_lwe_out +=
-          (Torus)ksk[tid + j * (lwe_dimension_out + 1) + offset] * decomposed;
+          (KSTorus)ksk[tid + j * (lwe_dimension_out + 1) + offset] *
+          decomposed;
     }
   }
 
@@ -294,13 +375,13 @@ keyswitch(Torus *lwe_array_out, const Torus *__restrict__ lwe_output_indexes,
   }
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_keyswitch_lwe_ciphertext_vector(
-    cudaStream_t stream, uint32_t gpu_index, Torus *lwe_array_out,
+    cudaStream_t stream, uint32_t gpu_index, KSTorus *lwe_array_out,
     Torus const *lwe_output_indexes, Torus const *lwe_array_in,
-    Torus const *lwe_input_indexes, Torus const *ksk, uint32_t lwe_dimension_in,
-    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
-    uint32_t num_samples) {
+    Torus const *lwe_input_indexes, KSTorus const *ksk,
+    uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
+    uint32_t level_count, uint32_t num_samples) {
 
   cuda_set_device(gpu_index);
 
@@ -322,29 +403,36 @@ __host__ void host_keyswitch_lwe_ciphertext_vector(
   dim3 grid(num_samples, num_blocks_per_sample, 1);
   dim3 threads(num_threads_x, num_threads_y, 1);
 
-  keyswitch<Torus><<<grid, threads, shared_mem, stream>>>(
+  keyswitch<Torus, KSTorus><<<grid, threads, shared_mem, stream>>>(
       lwe_array_out, lwe_output_indexes, lwe_array_in, lwe_input_indexes, ksk,
      lwe_dimension_in, lwe_dimension_out, base_log, level_count);
   check_cuda_error(cudaGetLastError());
 }
 
-template <typename Torus>
+// The GEMM keyswitch is computed as: -(-b + sum(a_i A_KSK))
+template <typename Torus, typename KSTorus>
 __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
-    cudaStream_t stream, uint32_t gpu_index, Torus *lwe_array_out,
+    cudaStream_t stream, uint32_t gpu_index, KSTorus *lwe_array_out,
     Torus const *lwe_output_indices, Torus const *lwe_array_in,
-    Torus const *lwe_input_indices, Torus const *ksk, uint32_t lwe_dimension_in,
-    uint32_t lwe_dimension_out, uint32_t base_log, uint32_t level_count,
-    uint32_t num_samples, Torus *fp_tmp_buffer, bool uses_trivial_indices) {
+    Torus const *lwe_input_indices, KSTorus const *ksk,
+    uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
+    uint32_t level_count, uint32_t num_samples, Torus *fp_tmp_buffer,
+    bool uses_trivial_indices) {
   cuda_set_device(gpu_index);
   check_cuda_error(cudaGetLastError());
 
-  auto d_mem_0 = fp_tmp_buffer; // keeps decomposed value
+  // fp_tmp_buffer contains 2x the space to store the input LWE masks without
+  // the body the first half can be interpreted with a smaller dtype when
+  // performing 64->32 KS the second half, storing decomposition state, must be
+  // interpreted as Torus* (usually 64b)
+  KSTorus *d_mem_0 =
+      (KSTorus *)fp_tmp_buffer; // keeps decomposed value (in KSTorus type)
 
   // Set the scratch buffer to 0 as it is used to accumulate
   // decomposition temporary results
   if (uses_trivial_indices) {
     cuda_memset_async(lwe_array_out, 0,
-                      num_samples * (lwe_dimension_out + 1) * sizeof(Torus),
+                      num_samples * (lwe_dimension_out + 1) * sizeof(KSTorus),
                       stream, gpu_index);
   } else {
     // gemm to ks the individual LWEs to GLWEs
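Reviewer note: the comment introduced above pins down the fp_tmp_buffer layout that makes the reinterpreting cast safe. A short sketch of the assumed layout for num_samples inputs of mask size lwe_dimension_in:

    // Assumed layout of fp_tmp_buffer, per the comment in the hunk above:
    //   first half  : decomposed mask values, read through a KSTorus* view
    //                 (a narrower dtype for the 64->32 keyswitch)
    //   second half : decomposer state, always kept at full Torus width
    // Hence d_mem_0 merely reinterprets the first half:
    KSTorus *d_mem_0 = reinterpret_cast<KSTorus *>(fp_tmp_buffer);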
@@ -352,7 +440,7 @@ __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
                      CEIL_DIV(num_samples, BLOCK_SIZE_DECOMP));
     dim3 threads_zero(BLOCK_SIZE_DECOMP, BLOCK_SIZE_DECOMP);
 
-    keyswitch_zero_output_with_output_indices<Torus>
+    keyswitch_zero_output_with_output_indices<Torus, KSTorus>
         <<<grid_zero, threads_zero, 0, stream>>>(
             lwe_array_out, lwe_output_indices, lwe_dimension_out + 1,
             num_samples);
@@ -364,8 +452,8 @@ __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
 
   // lwe_array_out is num_samples x (lwe_dimension_out + 1). copy the bodies
   // lwe_array_in[:,lwe_dimension_in] to lwe_array_out[:,lwe_dimension_out]
-  // and negate
-  keyswitch_gemm_copy_negated_message_with_indices<Torus>
+  // and negates them
+  keyswitch_gemm_copy_negated_message_with_indices<Torus, KSTorus>
      <<<grid_copy, threads_copy, 0, stream>>>(
          lwe_array_in, lwe_input_indices, lwe_array_out, lwe_output_indices,
          lwe_dimension_in, num_samples, lwe_dimension_out);
@@ -394,21 +482,21 @@ __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
   dim3 threads_gemm(BLOCK_SIZE_GEMM_KS * THREADS_GEMM_KS);
 
   // decompose first level (skips the body in the input buffer)
-  decompose_vectorize_init_with_indices<Torus>
+  decompose_vectorize_init_with_indices<Torus, KSTorus>
       <<<grid_decomp, threads_decomp, 0, stream>>>(
          lwe_array_in, lwe_input_indices, fp_tmp_buffer, lwe_dimension_in,
          num_samples, base_log, level_count);
   check_cuda_error(cudaGetLastError());
 
   if (uses_trivial_indices) {
-    tgemm<Torus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
+    tgemm<KSTorus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
        <<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
            num_samples, (lwe_dimension_out + 1), lwe_dimension_in, d_mem_0,
            ksk, stride_KSK_buffer, lwe_array_out, lwe_dimension_out + 1);
     check_cuda_error(cudaGetLastError());
 
   } else {
-    tgemm_with_indices<Torus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
+    tgemm_with_indices<KSTorus, Torus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
        <<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
            num_samples, (lwe_dimension_out + 1), lwe_dimension_in, d_mem_0,
            ksk, stride_KSK_buffer, lwe_array_out, lwe_dimension_out + 1,
@@ -419,14 +507,14 @@ __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
   auto ksk_block_size = (lwe_dimension_out + 1);
 
   for (int li = 1; li < level_count; ++li) {
-    decompose_vectorize_step_inplace<Torus>
+    decompose_vectorize_step_inplace<Torus, KSTorus>
        <<<grid_decomp, threads_decomp, 0, stream>>>(
            fp_tmp_buffer, lwe_dimension_in, num_samples, base_log,
            level_count);
     check_cuda_error(cudaGetLastError());
 
     if (uses_trivial_indices) {
-      tgemm<Torus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
+      tgemm<KSTorus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
          <<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
              num_samples, (lwe_dimension_out + 1), lwe_dimension_in, d_mem_0,
              ksk + li * ksk_block_size, stride_KSK_buffer, lwe_array_out,
@@ -434,7 +522,7 @@ __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
       check_cuda_error(cudaGetLastError());
 
     } else {
-      tgemm_with_indices<Torus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
+      tgemm_with_indices<KSTorus, Torus, BLOCK_SIZE_GEMM_KS, THREADS_GEMM_KS>
          <<<grid_gemm, threads_gemm, shared_mem_size, stream>>>(
              num_samples, (lwe_dimension_out + 1), lwe_dimension_in, d_mem_0,
              ksk + li * ksk_block_size, stride_KSK_buffer, lwe_array_out,
@@ -447,20 +535,22 @@ __host__ void host_gemm_keyswitch_lwe_ciphertext_vector(
   dim3 grid_negate(CEIL_DIV(lwe_dimension_out + 1, BLOCK_SIZE_DECOMP),
                    CEIL_DIV(num_samples, BLOCK_SIZE_DECOMP));
   dim3 threads_negate(BLOCK_SIZE_DECOMP, BLOCK_SIZE_DECOMP);
-  // Negate all outputs in the LWE
-  keyswitch_negate_with_output_indices<Torus>
+
+  // Negate all outputs in the output LWEs. This is the final step in the GEMM
+  // keyswitch computed as: -(-b + sum(a_i A_KSK))
+  keyswitch_negate_with_output_indices<Torus, KSTorus>
      <<<grid_negate, threads_negate, 0, stream>>>(
          lwe_array_out, lwe_output_indices, lwe_dimension_out + 1,
          num_samples);
   check_cuda_error(cudaGetLastError());
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 void execute_keyswitch_async(
     CudaStreams streams, const LweArrayVariant<Torus> &lwe_array_out,
     const LweArrayVariant<Torus> &lwe_output_indexes,
     const LweArrayVariant<Torus> &lwe_array_in,
-    const LweArrayVariant<Torus> &lwe_input_indexes, Torus *const *ksks,
+    const LweArrayVariant<Torus> &lwe_input_indexes, KSTorus *const *ksks,
     uint32_t lwe_dimension_in, uint32_t lwe_dimension_out, uint32_t base_log,
     uint32_t level_count, uint32_t num_samples, bool uses_trivial_indices,
     const std::vector<ks_mem<Torus> *> &fp_tmp_buffer) {
 
@@ -124,8 +124,10 @@ __host__ void host_packing_keyswitch_lwe_list_to_glwe(
   dim3 threads_decomp(BLOCK_SIZE_DECOMP, BLOCK_SIZE_DECOMP);
 
   // decompose first level
-  decompose_vectorize_init<Torus><<<grid_decomp, threads_decomp, 0, stream>>>(
-      lwe_array_in, d_mem_0, lwe_dimension, num_lwes, base_log, level_count);
+  decompose_vectorize_init<Torus, Torus>
+      <<<grid_decomp, threads_decomp, 0, stream>>>(lwe_array_in, d_mem_0,
+                                                   lwe_dimension, num_lwes,
+                                                   base_log, level_count);
   check_cuda_error(cudaGetLastError());
 
   // gemm to ks the individual LWEs to GLWEs
@@ -151,7 +153,7 @@ __host__ void host_packing_keyswitch_lwe_list_to_glwe(
   auto ksk_block_size = glwe_accumulator_size;
 
   for (int li = 1; li < level_count; ++li) {
-    decompose_vectorize_step_inplace<Torus>
+    decompose_vectorize_step_inplace<Torus, Torus>
        <<<grid_decomp, threads_decomp, 0, stream>>>(
            d_mem_0, lwe_dimension, num_lwes, base_log, level_count);
     check_cuda_error(cudaGetLastError());
 
@@ -144,6 +144,20 @@ __device__ __forceinline__ T modulus_switch(T input, uint32_t log_modulus) {
   return output;
 }
 
+template <typename Torus, class params>
+__device__ uint32_t calculates_monomial_degree(const Torus *lwe_array_group,
+                                               uint32_t ggsw_idx,
+                                               uint32_t grouping_factor) {
+  Torus x = 0;
+  for (int i = 0; i < grouping_factor; i++) {
+    uint32_t mask_position = grouping_factor - (i + 1);
+    int selection_bit = (ggsw_idx >> mask_position) & 1;
+    x += selection_bit * lwe_array_group[i];
+  }
+
+  return modulus_switch(x, params::log2_degree + 1);
+}
+
 template <typename Torus>
 __global__ void modulus_switch_inplace(Torus *array, uint32_t size,
                                        uint32_t log_modulus) {
 
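Reviewer note: calculates_monomial_degree serves the multi-bit PBS path — the bits of ggsw_idx select which of the grouping_factor mask elements of the group are summed (MSB-first), and the sum is modulus-switched down to log2_degree + 1 bits, yielding a monomial degree in [0, 2N). A worked trace of the loop above (illustration only):

    // grouping_factor = 2, ggsw_idx = 0b10:
    //   i = 0: mask_position = 1, selection_bit = (0b10 >> 1) & 1 = 1 -> x += a_0
    //   i = 1: mask_position = 0, selection_bit = (0b10 >> 0) & 1 = 0 -> skip a_1
    // x = a_0 is then reduced by modulus_switch(x, params::log2_degree + 1),
    // i.e. scaled from the full torus down to log2(2N) bits.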
@@ -22,13 +22,13 @@ __host__ uint64_t scratch_cuda_boolean_bitop(
   return size_tracker;
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_boolean_bitop(CudaStreams streams,
                                  CudaRadixCiphertextFFI *lwe_array_out,
                                  CudaRadixCiphertextFFI const *lwe_array_1,
                                  CudaRadixCiphertextFFI const *lwe_array_2,
                                  boolean_bitop_buffer<Torus> *mem_ptr,
-                                 void *const *bsks, Torus *const *ksks) {
+                                 void *const *bsks, KSTorus *const *ksks) {
 
   PANIC_IF_FALSE(
       lwe_array_out->num_radix_blocks == lwe_array_1->num_radix_blocks &&
@@ -203,11 +203,11 @@ __host__ uint64_t scratch_cuda_boolean_bitnot(
   return size_tracker;
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_boolean_bitnot(CudaStreams streams,
                                   CudaRadixCiphertextFFI *lwe_array,
                                   boolean_bitnot_buffer<Torus> *mem_ptr,
-                                  void *const *bsks, Torus *const *ksks) {
+                                  void *const *bsks, KSTorus *const *ksks) {
   bool carries_empty = true;
   for (size_t i = 0; i < lwe_array->num_radix_blocks; ++i) {
     if (lwe_array->degrees[i] >= mem_ptr->params.message_modulus) {
@@ -228,13 +228,13 @@ __host__ void host_boolean_bitnot(CudaStreams streams,
   // this function calls `host_bitnot` with `ct_message_modulus = 2`
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_bitop(CudaStreams streams,
                          CudaRadixCiphertextFFI *lwe_array_out,
                          CudaRadixCiphertextFFI const *lwe_array_1,
                         CudaRadixCiphertextFFI const *lwe_array_2,
                         int_bitop_buffer<Torus> *mem_ptr, void *const *bsks,
-                         Torus *const *ksks) {
+                         KSTorus *const *ksks) {
 
   PANIC_IF_FALSE(
       lwe_array_out->num_radix_blocks == lwe_array_1->num_radix_blocks &&
 
@@ -68,12 +68,12 @@ __host__ uint64_t scratch_extend_radix_with_sign_msb(
   return size_tracker;
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_extend_radix_with_sign_msb(
     CudaStreams streams, CudaRadixCiphertextFFI *output,
     CudaRadixCiphertextFFI const *input,
     int_extend_radix_with_sign_msb_buffer<Torus> *mem_ptr,
-    uint32_t num_additional_blocks, void *const *bsks, Torus *const *ksks) {
+    uint32_t num_additional_blocks, void *const *bsks, KSTorus *const *ksks) {
 
   if (num_additional_blocks == 0) {
     PUSH_RANGE("cast/extend no addblocks")
 
@@ -5,14 +5,14 @@
 #include "integer/cmux.h"
 #include "radix_ciphertext.cuh"
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void zero_out_if(CudaStreams streams,
                           CudaRadixCiphertextFFI *lwe_array_out,
                           CudaRadixCiphertextFFI const *lwe_array_input,
                           CudaRadixCiphertextFFI const *lwe_condition,
                           int_zero_out_if_buffer<Torus> *mem_ptr,
                           int_radix_lut<Torus> *predicate, void *const *bsks,
-                          Torus *const *ksks, uint32_t num_radix_blocks) {
+                          KSTorus *const *ksks, uint32_t num_radix_blocks) {
   PANIC_IF_FALSE(
       lwe_array_out->num_radix_blocks >= num_radix_blocks &&
           lwe_array_input->num_radix_blocks >= num_radix_blocks,
@@ -41,14 +41,14 @@ __host__ void zero_out_if(CudaStreams streams,
       num_radix_blocks);
 }
 
-template <typename Torus>
+template <typename Torus, typename KSTorus>
 __host__ void host_cmux(CudaStreams streams,
                         CudaRadixCiphertextFFI *lwe_array_out,
                         CudaRadixCiphertextFFI const *lwe_condition,
                         CudaRadixCiphertextFFI const *lwe_array_true,
                         CudaRadixCiphertextFFI const *lwe_array_false,
                         int_cmux_buffer<Torus> *mem_ptr, void *const *bsks,
-                        Torus *const *ksks) {
+                        KSTorus *const *ksks) {
 
   if (lwe_array_out->num_radix_blocks != lwe_array_true->num_radix_blocks)
     PANIC("Cuda error: input and output num radix blocks must be the same")
Some files were not shown because too many files have changed in this diff.