Compare commits

...

49 Commits

Author SHA1 Message Date
Agnes Leroy
de710cb2fb Debug with long run tests 2025-05-28 12:15:52 +02:00
Agnes Leroy
59a78c76a9 fix(gpu): fix build after shift/rotate mem tracking merge 2025-05-28 12:08:09 +02:00
Pedro Alves
1025246b17 fix(gpu): fix a linking problem on Hopper GPUs 2025-05-28 09:27:33 +02:00
Agnes Leroy
338e9eaeef feat(gpu): add memory tracking functions for shift/rotate 2025-05-28 09:26:27 +02:00
David Testé
0bec4d2ba1 chore(ci): pin rust-toolchain action to v1 2025-05-27 17:31:33 +02:00
David Testé
c5fab98900 chore(ci): add token to do online workflow security checks 2025-05-27 17:31:33 +02:00
Nicolas Sarlin
14e1ee5bd3 fix(gpu): build with hpu and zk features 2025-05-27 16:10:38 +02:00
Pedro Alves
52bc778629 feat(gpu): completely remove the internal CUDA_STREAMS in the HL API
- From now on the streams stored in the available cuda server key are the ones to be
2025-05-27 10:29:34 -03:00
Pedro Alves
10405c9836 feat(gpu): improve test_specific_gpu_selection() so it always tests all possible GPU configurations 2025-05-27 10:29:34 -03:00
Pedro Alves
5eaf6cec55 feat(gpu): reintroduce the feature that allows a user to perform computation on multi-gpu using a custom selection of GPUs
This reverts commit a7d8d2b1d4.
2025-05-27 10:29:34 -03:00
Agnes Leroy
3bfacc1e9d chore(bench): add swap throughput benchmark 2025-05-27 12:08:31 +02:00
Agnes Leroy
a47a418d41 chore(gpu): rework dex bench to prepare throughput benchmark 2025-05-27 12:08:31 +02:00
David Testé
75b3141e19 chore(ci): fix command parsing for gpu benchmark common workflow
Quote escaping was flawed and would generate an array containing a unique string instead of several ones separated by commas.
2025-05-27 10:14:06 +02:00
Agnes Leroy
d01328e0fe fix(gpu): fix overflow error in clear inputs remainder in long run tests 2025-05-26 22:51:18 +02:00
Agnes Leroy
6e102b5fa1 chore(gpu): fix oom error in ci 2025-05-26 22:50:55 +02:00
Pedro Alves
8aa6fa514e fix(gpu): add missing error checks after some kernels 2025-05-26 16:29:23 -03:00
Nicolas Sarlin
21a19cd3c5 chore(shortint): modswitch noise reduction key upgrade without clone 2025-05-26 16:53:35 +02:00
Nicolas Sarlin
f51c70d536 feat(shortint): adds generic client key for atomic pattern support 2025-05-26 16:53:35 +02:00
Agnes Leroy
66e3c02838 feat(gpu): add memory tracking functions for comparisons 2025-05-23 14:37:39 +02:00
Pedro Alves
408e81c45a feat(gpu): add support for GPU-accelerated expand on the HL Api
- includes documentation about GPU's accelerated expand on the HL API
- rework CudaKeySwitchingKey
- Cloning the key is no longer necessary on the HL API
2025-05-23 11:54:29 +02:00
dependabot[bot]
4152906c5d chore(deps): bump actions/upload-artifact from 4.6.0 to 4.6.2
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.0 to 4.6.2.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4.6.0...ea165f8d65b6e75b540449e92b4886f43607fa02)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 4.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-23 11:23:02 +02:00
dependabot[bot]
9fc8a0b5bc chore(deps): bump codecov/codecov-action from 5.4.2 to 5.4.3
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5.4.2 to 5.4.3.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](ad3126e916...18283e04ce)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: 5.4.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-23 11:22:55 +02:00
dependabot[bot]
5dc3e59d13 chore(deps): bump zgosalvez/github-actions-ensure-sha-pinned-actions
Bumps [zgosalvez/github-actions-ensure-sha-pinned-actions](https://github.com/zgosalvez/github-actions-ensure-sha-pinned-actions) from 3.0.23 to 3.0.25.
- [Release notes](https://github.com/zgosalvez/github-actions-ensure-sha-pinned-actions/releases)
- [Commits](4830be28ce...fc87bb5b5a)

---
updated-dependencies:
- dependency-name: zgosalvez/github-actions-ensure-sha-pinned-actions
  dependency-version: 3.0.25
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-23 11:22:48 +02:00
Nicolas Sarlin
b40996a7e5 chore(shortint): prepare the v1.3 params folder 2025-05-23 10:57:56 +02:00
Pedro Alves
b066ef19fa fix(gpu): fix the internal benchmark 2025-05-23 10:32:24 +02:00
Nicolas Sarlin
25d008bae8 fix(bench): add missing internal keycache feature 2025-05-22 16:14:30 +02:00
David Testé
2749c1088c chore(ci): handle multi directories for parameters records 2025-05-22 15:03:02 +02:00
Guillermo Oyarzun
c19cd9f021 fix(gpu): add indexes to modulus switch noise reduction 2025-05-22 10:50:51 +02:00
Nicolas Sarlin
45fdba04b1 fix(gpu): allow to build with hpu feature enabled 2025-05-22 10:21:35 +02:00
youben11
69d46810b8 feat(core): chunked seeded_lwe_ksk generation 2025-05-21 18:06:58 +01:00
youben11
a16eeb983f feat(core): chunked lwe_ksk generation 2025-05-21 18:06:58 +01:00
Agnes Leroy
8278a9373c fix(gpu): fix degrees after abs 2025-05-21 15:46:18 +02:00
Arthur Meyre
e2a2768484 chore: fix typos
Co-authored-by: crStiv <cryptostiv7@gmail.com>
2025-05-21 13:06:42 +02:00
Arthur Meyre
57cfc38b66 chore: some more CODEOWNERS 2025-05-21 11:30:35 +02:00
Pedro Alves
259d125434 fix(gpu): fix pbs and ks benchmarks 2025-05-20 17:37:48 +02:00
Arthur Meyre
2571196b41 chore: fix ambiguous decrypt 2025-05-20 17:32:05 +02:00
Arthur Meyre
9f3dc6167d chore: remove raw decomposition
- this was left in by mistake
2025-05-20 17:32:05 +02:00
Agnes Leroy
59c17692a3 feat(gpu): add memory tracking functions for bitops 2025-05-20 16:16:22 +02:00
David Testé
e29d615b9d chore(bench): add suitable heuristic for zk throughput
Heuristic based on PBS count was flawed since a ZK verification operation will eat up to 32 threads on the machine. The previous heuristic could generate an input data vector way bigger than the total of threads divided by 32. This in turn lead to long execution time for benchmark and generate bad results.
2025-05-20 15:02:59 +02:00
tmontaigu
8caff604ed chore: use wrapping div in long_run 2025-05-20 14:36:22 +02:00
Agnes Leroy
16badf0c00 chore(gpu): add degree prints in long run tests in case of failure 2025-05-20 14:13:59 +02:00
Nicolas Sarlin
99a27c1cbe chore(hpu): fix Cargo.toml for release 2025-05-19 17:47:40 +02:00
Nicolas Sarlin
9131aaa383 fix(doc): uniformized readme file names 2025-05-19 15:22:34 +02:00
Nicolas Sarlin
a01949e630 fix(bench): compilation error without the internal-keycache feature 2025-05-19 09:50:29 +02:00
Arthur Meyre
30a58cdd1a chore: update version in docs to 1.2.0 2025-05-16 17:10:12 +02:00
Agnes Leroy
03325bf94e feat(gpu): add memory tracking functions for add/sub and scalar add/sub 2025-05-16 16:39:34 +02:00
Nicolas Sarlin
786fe66495 chore(zk): check that crs group element at index n is 0 2025-05-16 16:38:27 +02:00
Baptiste Roux
9ee8259002 feat(hpu): Add Hpu backend implementation
This backend abstract communication with Hpu Fpga hardware.
It define it's proper entities to prevent circular dependencies with
tfhe-rs.
Object lifetime is handle through Arc<Mutex<T>> wrapper, and enforce
that all objects currently alive in Hpu Hw are also kept valid on the
host side.

It contains the second version of HPU instruction set (HIS_V2.0):
* DOp have following properties:
  + Template as first class citizen
  + Support of Immediate template
  + Direct parser and conversion between Asm/Hex
  + Replace deku (and it's associated endianess limitation) by
  + bitfield_struct and manual parsing

* IOp have following properties:
  + Support various number of Destination
  + Support various number of Sources
  + Support various number of Immediat values
  + Support of multiple bitwidth (Not implemented yet in the Fpga
    firmware)

Details could be view in `backends/tfhe-hpu-backend/Readme.md`
2025-05-16 16:30:23 +02:00
Agnes Leroy
a7d8d2b1d4 feat(gpu): revert enables the user to perform computation on multi-gpu using a custom selection of GPUs
This reverts commit 0280dbeb41.
2025-05-15 18:01:17 +02:00
507 changed files with 59854 additions and 2461 deletions

View File

@@ -6,6 +6,7 @@ self-hosted-runner:
- large_windows_16_latest
- large_ubuntu_16
- large_ubuntu_16-22.04
- v80-desktop
# Configuration variables in array of strings defined in your repository or
# organization. `null` means disabling configuration variables check.
# Empty array means no configuration variable is allowed.

View File

@@ -84,7 +84,7 @@ jobs:
run: |
# Use Sed to extract a value from a string, this cannot be done with the ${variable//search/replace} pattern.
# shellcheck disable=SC2001
PARSED_COMMAND=$(echo "${INPUTS_COMMAND}" | sed 's/[[:space:]]*,[[:space:]]*/\\", \\"/g')
PARSED_COMMAND=$(echo "${INPUTS_COMMAND}" | sed 's/[[:space:]]*,[[:space:]]*/\", \"/g')
echo "COMMAND=[\"${PARSED_COMMAND}\"]" >> "${GITHUB_ENV}"
- name: Set single operations flavor

View File

@@ -0,0 +1,88 @@
# Run all integer benchmarks on a permanent HPU instance and return parsed results to Slab CI bot.
name: Hpu Integer Benchmarks
on:
workflow_dispatch:
env:
CARGO_TERM_COLOR: always
RESULTS_FILENAME: parsed_benchmark_results_${{ github.sha }}.json
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
RUST_BACKTRACE: "full"
RUST_MIN_STACK: "8388608"
permissions: {}
jobs:
integer-benchmarks-hpu:
name: Execute integer & erc20 benchmarks for HPU backend
runs-on: v80-desktop
concurrency:
group: ${{ github.workflow }}_${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
timeout-minutes: 1440 # 24 hours
steps:
# Needed as long as hw_regmap repository is private
- name: Configure SSH
uses: webfactory/ssh-agent@a6f90b1f127823b31d4d4a8d96047790581349bd # v0.9.1
with:
ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
- name: Checkout tfhe-rs repo with tags
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Get benchmark details
run: |
{
echo "BENCH_DATE=$(date --iso-8601=seconds)";
echo "COMMIT_DATE=$(git --no-pager show -s --format=%cd --date=iso8601-strict ${{ github.sha }})";
echo "COMMIT_HASH=$(git describe --tags --dirty)";
} >> "${GITHUB_ENV}"
- name: Install rust
uses: dtolnay/rust-toolchain@888c2e1ea69ab0d4330cbf0af1ecc7b68f368cc1
with:
toolchain: nightly
- name: Checkout Slab repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
repository: zama-ai/slab
path: slab
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Run benchmarks
run: |
make bench_integer_hpu
make bench_hlapi_erc20_hpu
- name: Parse results
run: |
python3 ./ci/benchmark_parser.py target/criterion "${RESULTS_FILENAME}" \
--database tfhe_rs \
--hardware "hpu_x1" \
--backend hpu \
--project-version "${COMMIT_HASH}" \
--branch "${REF_NAME}" \
--commit-date "${COMMIT_DATE}" \
--bench-date "${BENCH_DATE}" \
--walk-subdirs
env:
REF_NAME: ${{ github.ref_name }}
- name: Upload parsed results artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
with:
name: ${{ github.sha }}_integer_benchmarks
path: ${{ env.RESULTS_FILENAME }}
- name: Send data to Slab
shell: bash
run: |
python3 slab/scripts/data_sender.py "${RESULTS_FILENAME}" "${{ secrets.JOB_SECRET }}" \
--slab-url "${{ secrets.SLAB_URL }}"

View File

@@ -94,5 +94,10 @@ jobs:
run: |
make build_tfhe_coverage
- name: Run Hpu pcc checks
if: ${{ contains(matrix.os, 'ubuntu') }}
run: |
make pcc_hpu
# The wasm build check is a bit annoying to set-up here and is done during the tests in
# aws_tfhe_tests.yml

View File

@@ -51,7 +51,7 @@ jobs:
runs-on: ${{ matrix.runner_type }}
strategy:
matrix:
runner_type: [ubuntu-latest, macos-latest, windows-latest]
runner_type: [ ubuntu-latest, macos-latest, windows-latest ]
fail-fast: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
@@ -82,7 +82,7 @@ jobs:
runs-on: ${{ matrix.runner_type }}
strategy:
matrix:
runner_type: [ubuntu-latest, macos-latest, windows-latest]
runner_type: [ ubuntu-latest, macos-latest, windows-latest ]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:

View File

@@ -51,7 +51,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
os: [ ubuntu-latest, macos-latest, windows-latest ]
fail-fast: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
@@ -77,7 +77,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
os: [ ubuntu-latest, macos-latest, windows-latest ]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:

View File

@@ -38,9 +38,11 @@ jobs:
- name: Check workflows security
run: |
make check_workflow_security
env:
GH_TOKEN: ${{ env.CHECKOUT_TOKEN }}
- name: Ensure SHA pinned actions
uses: zgosalvez/github-actions-ensure-sha-pinned-actions@4830be28ce81da52ec70d65c552a7403821d98d4 # v3.0.23
uses: zgosalvez/github-actions-ensure-sha-pinned-actions@fc87bb5b5a97953d987372e74478de634726b3e5 # v3.0.25
with:
allowlist: |
slsa-framework/slsa-github-generator

View File

@@ -90,7 +90,7 @@ jobs:
make test_shortint_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@ad3126e916f78f00edff4ed0317cf185271ccc2d
uses: codecov/codecov-action@18283e04ce6e62d37312384ff67231eb8fd56d24
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}
@@ -104,7 +104,7 @@ jobs:
make test_integer_cov
- name: Upload tfhe coverage to Codecov
uses: codecov/codecov-action@ad3126e916f78f00edff4ed0317cf185271ccc2d
uses: codecov/codecov-action@18283e04ce6e62d37312384ff67231eb8fd56d24
if: steps.changed-files.outputs.tfhe_any_changed == 'true'
with:
token: ${{ secrets.CODECOV_TOKEN }}

View File

@@ -149,7 +149,7 @@ jobs:
- name: Run High Level API Tests
run: |
BIG_TESTS_INSTANCE=FALSE make test_high_level_api_gpu
make test_high_level_api_gpu
slack-notify:
name: Slack Notification

View File

@@ -1,4 +1,4 @@
# Perfom tfhe-cuda-backend post-commit checks on an AWS instance
# Perform tfhe-cuda-backend post-commit checks on an AWS instance
name: Cuda - Post-commit Checks
env:
@@ -120,6 +120,10 @@ jobs:
run: |
make pcc_gpu
- name: Check build with hpu enabled
run: |
make clippy_gpu_hpu
- name: Set pull-request URL
if: ${{ failure() && github.event_name == 'pull_request' }}
run: |

73
.github/workflows/hpu_hlapi_tests.yml vendored Normal file
View File

@@ -0,0 +1,73 @@
# Test tfhe-fft
name: Cargo Test HLAPI HPU
on:
pull_request:
push:
branches:
- main
env:
CARGO_TERM_COLOR: always
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}
CHECKOUT_TOKEN: ${{ secrets.REPO_CHECKOUT_TOKEN || secrets.GITHUB_TOKEN }}
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions: { }
jobs:
should-run:
runs-on: ubuntu-latest
permissions:
pull-requests: read
outputs:
hpu_test: ${{ env.IS_PULL_REQUEST == 'false' || steps.changed-files.outputs.hpu_any_changed }}
steps:
- name: Checkout tfhe-rs
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Check for file changes
id: changed-files
uses: tj-actions/changed-files@ed68ef82c095e0d48ec87eccea555d944a631a4c # v46.0.5
with:
files_yaml: |
hpu:
- tfhe/Cargo.toml
- Makefile
- backends/tfhe-hpu-backend/**
- mockups/tfhe-hpu-mockup/**
cargo-tests-hpu:
needs: should-run
if: needs.should-run.outputs.hpu_test == 'true'
runs-on: large_ubuntu_16
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: 'false'
token: ${{ env.CHECKOUT_TOKEN }}
- name: Install Rust
uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af
with:
toolchain: stable
override: true
- name: Install Just
run: |
cargo install just
- name: Test HLAPI HPU
run: |
source setup_hpu.sh
just -f mockups/tfhe-hpu-mockup/Justfile BUILD_PROFILE=release mockup &
make HPU_CONFIG=sim test_high_level_api_hpu

105
.github/workflows/make_release_hpu.yml vendored Normal file
View File

@@ -0,0 +1,105 @@
name: Publish HPU release
on:
workflow_dispatch:
inputs:
dry_run:
description: "Dry-run"
type: boolean
default: true
env:
ACTION_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
SLACK_ICON: https://pbs.twimg.com/profile_images/1274014582265298945/OjBKP9kn_400x400.png
SLACK_USERNAME: ${{ secrets.BOT_USERNAME }}
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
permissions: {}
jobs:
verify_tag:
uses: ./.github/workflows/verify_tagged_commit.yml
secrets:
RELEASE_TEAM: ${{ secrets.RELEASE_TEAM }}
READ_ORG_TOKEN: ${{ secrets.READ_ORG_TOKEN }}
package:
runs-on: ubuntu-latest
needs: verify_tag
outputs:
hash: ${{ steps.hash.outputs.hash }}
steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Prepare package
run: |
cargo package -p tfhe-hpu-backend
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: crate
path: target/package/*.crate
- name: generate hash
id: hash
run: cd target/package && echo "hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
provenance:
if: ${{ !inputs.dry_run }}
needs: [package]
uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0
permissions:
# Needed to detect the GitHub Actions environment
actions: read
# Needed to create the provenance via GitHub OIDC
id-token: write
# Needed to upload assets/artifacts
contents: write
with:
# SHA-256 hashes of the Crate package.
base64-subjects: ${{ needs.package.outputs.hash }}
publish_release:
name: Publish tfhe-hpu-backend Release
runs-on: ubuntu-latest
needs: [verify_tag, package] # for comparing hashes
steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: 'false'
token: ${{ secrets.REPO_CHECKOUT_TOKEN }}
- name: Publish crate.io package
env:
CRATES_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN }}
DRY_RUN: ${{ inputs.dry_run && '--dry-run' || '' }}
run: |
# DRY_RUN expansion cannot be double quoted when variable contains empty string otherwise cargo publish
# would fail. This is safe since DRY_RUN is handled in the env section above.
# shellcheck disable=SC2086
cargo publish -p tfhe-hpu-backend --token "${CRATES_TOKEN}" ${DRY_RUN}
- name: Generate hash
id: published_hash
run: cd target/package && echo "pub_hash=$(sha256sum ./*.crate | base64 -w0)" >> "${GITHUB_OUTPUT}"
- name: Slack notification (hashes comparison)
if: ${{ needs.package.outputs.hash != steps.published_hash.outputs.pub_hash }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: failure
SLACK_MESSAGE: "SLSA tfhe-hpu-backend crate - hash comparison failure: (${{ env.ACTION_RUN_URL }})"
- name: Slack Notification
if: ${{ failure() || (cancelled() && github.event_name != 'pull_request') }}
continue-on-error: true
uses: rtCamp/action-slack-notify@e31e87e03dd19038e411e38ae27cbad084a90661 # v2.3.3
env:
SLACK_COLOR: ${{ job.status }}
SLACK_MESSAGE: "tfhe-hpu-backend release failed: (${{ env.ACTION_RUN_URL }})"

2
.lfsconfig Normal file
View File

@@ -0,0 +1,2 @@
[lfs]
fetchexclude = *

View File

@@ -10,3 +10,9 @@
/tfhe/src/integer/gpu
/tfhe/src/high_level_api/ @tmontaigu
/Makefile @IceTDrinker @soonum
/.github/ @soonum
/CODEOWNERS @IceTDrinker

View File

@@ -9,10 +9,12 @@ members = [
"tasks",
"tfhe-csprng",
"backends/tfhe-cuda-backend",
"backends/tfhe-hpu-backend",
"utils/tfhe-versionable",
"utils/tfhe-versionable-derive",
"utils/param_dedup",
"tests",
"mockups/tfhe-hpu-mockup",
]
exclude = [

View File

@@ -2,6 +2,7 @@ SHELL:=$(shell /usr/bin/env which bash)
OS:=$(shell uname)
RS_CHECK_TOOLCHAIN:=$(shell cat toolchain.txt | tr -d '\n')
CARGO_RS_CHECK_TOOLCHAIN:=+$(RS_CHECK_TOOLCHAIN)
CARGO_BUILD_JOBS=default
CPU_COUNT=$(shell ./scripts/cpu_count.sh)
RS_BUILD_TOOLCHAIN:=stable
CARGO_RS_BUILD_TOOLCHAIN:=+$(RS_BUILD_TOOLCHAIN)
@@ -55,6 +56,9 @@ REGEX_PATTERN?=''
TFHECUDA_SRC=backends/tfhe-cuda-backend/cuda
TFHECUDA_BUILD=$(TFHECUDA_SRC)/build
# tfhe-hpu-backend
HPU_CONFIG=v80
# Exclude these files from coverage reports
define COVERAGE_EXCLUDED_FILES
--exclude-files apps/trivium/src/trivium/* \
@@ -290,7 +294,7 @@ check_typos: install_typos_checker
.PHONY: clippy_gpu # Run clippy lints on tfhe with "gpu" enabled
clippy_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=boolean,shortint,integer,internal-keycache,gpu,pbs-stats,extended-types \
--features=boolean,shortint,integer,internal-keycache,gpu,pbs-stats,extended-types,zk-pok \
--all-targets \
-p $(TFHE_SPEC) -- --no-deps -D warnings
@@ -301,6 +305,20 @@ check_gpu: install_rs_check_toolchain
--all-targets \
-p $(TFHE_SPEC)
.PHONY: clippy_hpu # Run clippy lints on tfhe with "hpu" enabled
clippy_hpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=boolean,shortint,integer,internal-keycache,hpu,pbs-stats,extended-types \
--all-targets \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: clippy_gpu_hpu # Run clippy lints on tfhe with "gpu" and "hpu" enabled
clippy_gpu_hpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy \
--features=boolean,shortint,integer,internal-keycache,gpu,hpu,pbs-stats,extended-types,zk-pok \
--all-targets \
-p $(TFHE_SPEC) -- --no-deps -D warnings
.PHONY: fix_newline # Fix newline at end of file issues to be UNIX compliant
fix_newline: check_linelint_installed
linelint -a .
@@ -473,6 +491,11 @@ clippy_cuda_backend: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
-p tfhe-cuda-backend -- --no-deps -D warnings
.PHONY: clippy_hpu_backend # Run clippy lints on the tfhe-hpu-backend
clippy_hpu_backend: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
-p tfhe-hpu-backend -- --no-deps -D warnings
.PHONY: check_rust_bindings_did_not_change # Check rust bindings are up to date for tfhe-cuda-backend
check_rust_bindings_did_not_change:
cargo build -p tfhe-cuda-backend && "$(MAKE)" fmt_gpu && \
@@ -702,6 +725,28 @@ test_signed_integer_multi_bit_gpu_ci: install_rs_check_toolchain install_cargo_n
--cargo-profile "$(CARGO_PROFILE)" --multi-bit --backend "gpu" \
--signed-only --tfhe-package "$(TFHE_SPEC)"
.PHONY: test_integer_hpu_ci # Run the tests for integer ci on hpu backend
test_integer_hpu_ci: install_rs_check_toolchain install_cargo_nextest
cargo test --release -p $(TFHE_SPEC) --features hpu-v80 --test hpu
.PHONY: test_integer_hpu_mockup_ci # Run the tests for integer ci on hpu backend and mockup
test_integer_hpu_mockup_ci: install_rs_check_toolchain install_cargo_nextest
source ./setup_hpu.sh --config sim ; \
cargo build --release --bin hpu_mockup; \
coproc target/release/hpu_mockup --params mockups/tfhe-hpu-mockup/params/tuniform_64b_pfail64_psi64.toml > mockup.log; \
HPU_TEST_ITER=1 \
cargo test --profile devo -p $(TFHE_SPEC) --features hpu --test hpu -- u32 && \
kill %1
.PHONY: test_integer_hpu_mockup_ci_fast # Run the quick tests for integer ci on hpu backend and mockup.
test_integer_hpu_mockup_ci_fast: install_rs_check_toolchain install_cargo_nextest
source ./setup_hpu.sh --config sim ; \
cargo build --profile devo --bin hpu_mockup; \
coproc target/devo/hpu_mockup --params mockups/tfhe-hpu-mockup/params/tuniform_64b_fast.toml > mockup.log; \
HPU_TEST_ITER=1 \
cargo test --profile devo -p $(TFHE_SPEC) --features hpu --test hpu -- u32 && \
kill %1
.PHONY: test_boolean # Run the tests of the boolean module
test_boolean: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
@@ -854,9 +899,25 @@ test_high_level_api: install_rs_build_toolchain
test_high_level_api_gpu: install_rs_build_toolchain install_cargo_nextest
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) nextest run --cargo-profile $(CARGO_PROFILE) \
--features=integer,internal-keycache,gpu -p $(TFHE_SPEC) \
--test-threads=4 --features=integer,internal-keycache,gpu,zk-pok -p $(TFHE_SPEC) \
-E "test(/high_level_api::.*gpu.*/)"
test_high_level_api_hpu: install_rs_build_toolchain install_cargo_nextest
ifeq ($(HPU_CONFIG), v80)
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) nextest run --cargo-profile $(CARGO_PROFILE) \
--build-jobs=$(CARGO_BUILD_JOBS) \
--test-threads=1 \
--features=integer,internal-keycache,hpu,hpu-v80 -p $(TFHE_SPEC) \
-E "test(/high_level_api::.*hpu.*/)"
else
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) nextest run --cargo-profile $(CARGO_PROFILE) \
--build-jobs=$(CARGO_BUILD_JOBS) \
--test-threads=1 \
--features=integer,internal-keycache,hpu -p $(TFHE_SPEC) \
-E "test(/high_level_api::.*hpu.*/)"
endif
.PHONY: test_strings # Run the tests for strings ci
test_strings: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --profile $(CARGO_PROFILE) \
@@ -1012,7 +1073,7 @@ check_compile_tests: install_rs_build_toolchain
.PHONY: check_compile_tests_benches_gpu # Build tests in debug without running them
check_compile_tests_benches_gpu: install_rs_build_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo $(CARGO_RS_BUILD_TOOLCHAIN) test --no-run \
--features=experimental,boolean,shortint,integer,internal-keycache,gpu \
--features=experimental,boolean,shortint,integer,internal-keycache,gpu,zk-pok \
-p $(TFHE_SPEC)
mkdir -p "$(TFHECUDA_BUILD)" && \
cd "$(TFHECUDA_BUILD)" && \
@@ -1100,6 +1161,12 @@ clippy_bench_gpu: install_rs_check_toolchain
--features=gpu,shortint,integer,internal-keycache,nightly-avx512,pbs-stats,zk-pok \
-p tfhe-benchmark -- --no-deps -D warnings
.PHONY: clippy_bench_hpu # Run clippy lints on tfhe-benchmark
clippy_bench_hpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" cargo "$(CARGO_RS_CHECK_TOOLCHAIN)" clippy --all-targets \
--features=hpu,shortint,integer,internal-keycache,pbs-stats\
-p tfhe-benchmark -- --no-deps -D warnings
.PHONY: print_doc_bench_parameters # Print parameters used in doc benchmarks
print_doc_bench_parameters:
RUSTFLAGS="" cargo run --example print_doc_bench_parameters \
@@ -1133,6 +1200,14 @@ bench_signed_integer_gpu: install_rs_check_toolchain
--bench integer-signed-bench \
--features=integer,gpu,internal-keycache,nightly-avx512,pbs-stats -p tfhe-benchmark --
.PHONY: bench_integer_hpu # Run benchmarks for integer on HPU backend
bench_integer_hpu: install_rs_check_toolchain
source ./setup_hpu.sh --config $(HPU_CONFIG) ; \
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_OP_FLAVOR=$(BENCH_OP_FLAVOR) __TFHE_RS_FAST_BENCH=$(FAST_BENCH) __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench integer-bench \
--features=integer,internal-keycache,pbs-stats,hpu,hpu-v80 -p tfhe-benchmark -- --quick
.PHONY: bench_integer_compression # Run benchmarks for unsigned integer compression
bench_integer_compression: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
@@ -1146,7 +1221,7 @@ bench_integer_compression_gpu: install_rs_check_toolchain
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench glwe_packing_compression-integer-bench \
--features=integer,internal-keycache,gpu,pbs-stats -p tfhe-benchmark --
.PHONY: bench_integer_zk_gpu
bench_integer_zk_gpu: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" __TFHE_RS_BENCH_TYPE=$(BENCH_TYPE) \
@@ -1324,6 +1399,14 @@ bench_hlapi_dex_gpu: install_rs_check_toolchain
--bench hlapi-dex \
--features=integer,gpu,internal-keycache,pbs-stats,nightly-avx512 -p tfhe-benchmark --
.PHONY: bench_hlapi_erc20_hpu # Run benchmarks for ECR20 operations on HPU
bench_hlapi_erc20_hpu: install_rs_check_toolchain
source ./setup_hpu.sh --config $(HPU_CONFIG) ; \
RUSTFLAGS="$(RUSTFLAGS)" \
cargo $(CARGO_RS_CHECK_TOOLCHAIN) bench \
--bench hlapi-erc20 \
--features=integer,internal-keycache,hpu,hpu-v80 -p tfhe-benchmark -- --quick
.PHONY: bench_tfhe_zk_pok # Run benchmarks for the tfhe_zk_pok crate
bench_tfhe_zk_pok: install_rs_check_toolchain
RUSTFLAGS="$(RUSTFLAGS)" \
@@ -1423,6 +1506,9 @@ tfhe_lints
pcc_gpu: check_rust_bindings_did_not_change clippy_rustdoc_gpu \
clippy_gpu clippy_cuda_backend clippy_bench_gpu check_compile_tests_benches_gpu
.PHONY: pcc_hpu # pcc stands for pre commit checks for HPU compilation
pcc_hpu: clippy_hpu clippy_hpu_backend test_integer_hpu_mockup_ci_fast
.PHONY: fpcc # pcc stands for pre commit checks, the f stands for fast
fpcc: no_tfhe_typo no_dbg_log check_parameter_export_ok check_fmt check_typos lint_doc \
check_md_docs_are_tested clippy_fast check_compile_tests

View File

@@ -11,11 +11,13 @@ extend-ignore-identifiers-re = [
# Example with string replacing "hello" with "herlo"
"herlo",
# Example in trivium
"C9217BA0D762ACA1"
"C9217BA0D762ACA1",
"0x[0-9a-fA-F]+"
]
[files]
extend-exclude = [
"backends/tfhe-cuda-backend/cuda/src/fft128/twiddles.cu",
"backends/tfhe-cuda-backend/cuda/src/fft/twiddles.cu",
"backends/tfhe-hpu-backend/config_store/**/*.link_summary",
]

View File

@@ -28,9 +28,10 @@ void cuda_modulus_switch_inplace_64(void *stream, uint32_t gpu_index,
void cuda_improve_noise_modulus_switch_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void const *lwe_array_in, void const *encrypted_zeros, uint32_t lwe_size,
uint32_t num_lwes, uint32_t num_zeros, double input_variance,
double r_sigma, double bound, uint32_t log_modulus);
void const *lwe_array_in, void const *lwe_array_indexes,
void const *encrypted_zeros, uint32_t lwe_size, uint32_t num_lwes,
uint32_t num_zeros, double input_variance, double r_sigma, double bound,
uint32_t log_modulus);
void cuda_glwe_sample_extract_128(
void *stream, uint32_t gpu_index, void *lwe_array_out,

View File

@@ -248,6 +248,7 @@ template <> struct pbs_buffer_128<PBS_TYPE::CLASSICAL> {
__uint128_t *global_accumulator;
double *global_join_buffer;
__uint128_t *temp_lwe_array_in;
uint64_t *trivial_indexes;
PBS_VARIANT pbs_variant;
bool uses_noise_reduction;
@@ -263,11 +264,27 @@ template <> struct pbs_buffer_128<PBS_TYPE::CLASSICAL> {
cuda_set_device(gpu_index);
this->pbs_variant = pbs_variant;
this->uses_noise_reduction = allocate_ms_array;
this->temp_lwe_array_in =
(__uint128_t *)cuda_malloc_with_size_tracking_async(
(lwe_dimension + 1) * input_lwe_ciphertext_count *
sizeof(__uint128_t),
stream, gpu_index, size_tracker, allocate_ms_array);
if (allocate_ms_array) {
this->temp_lwe_array_in =
(__uint128_t *)cuda_malloc_with_size_tracking_async(
(lwe_dimension + 1) * input_lwe_ciphertext_count *
sizeof(__uint128_t),
stream, gpu_index, size_tracker, allocate_ms_array);
this->trivial_indexes = (uint64_t *)cuda_malloc_with_size_tracking_async(
input_lwe_ciphertext_count * sizeof(uint64_t), stream, gpu_index,
size_tracker, allocate_ms_array);
uint64_t *h_trivial_indexes = new uint64_t[input_lwe_ciphertext_count];
for (uint32_t i = 0; i < input_lwe_ciphertext_count; i++)
h_trivial_indexes[i] = i;
cuda_memcpy_with_size_tracking_async_to_gpu(
trivial_indexes, h_trivial_indexes,
input_lwe_ciphertext_count * sizeof(uint64_t), stream, gpu_index,
allocate_gpu_memory);
cuda_synchronize_stream(stream, gpu_index);
delete[] h_trivial_indexes;
}
auto max_shared_memory = cuda_get_max_shared_memory(gpu_index);
size_t global_join_buffer_size = (glwe_dimension + 1) * level_count *
input_lwe_ciphertext_count *
@@ -404,9 +421,12 @@ template <> struct pbs_buffer_128<PBS_TYPE::CLASSICAL> {
cuda_drop_with_size_tracking_async(global_accumulator, stream, gpu_index,
gpu_memory_allocated);
if (uses_noise_reduction)
if (uses_noise_reduction) {
cuda_drop_with_size_tracking_async(temp_lwe_array_in, stream, gpu_index,
gpu_memory_allocated);
cuda_drop_with_size_tracking_async(trivial_indexes, stream, gpu_index,
gpu_memory_allocated);
}
}
};
@@ -502,7 +522,8 @@ template <typename Torus>
bool has_support_to_cuda_programmable_bootstrap_tbc(uint32_t num_samples,
uint32_t glwe_dimension,
uint32_t polynomial_size,
uint32_t level_count);
uint32_t level_count,
uint32_t max_shared_memory);
#ifdef __CUDACC__
__device__ inline int get_start_ith_ggsw(int i, uint32_t polynomial_size,

View File

@@ -86,13 +86,15 @@ void cuda_modulus_switch_inplace_64(void *stream, uint32_t gpu_index,
void cuda_improve_noise_modulus_switch_64(
void *stream, uint32_t gpu_index, void *lwe_array_out,
void const *lwe_array_in, void const *encrypted_zeros, uint32_t lwe_size,
uint32_t num_lwes, uint32_t num_zeros, double input_variance,
double r_sigma, double bound, uint32_t log_modulus) {
void const *lwe_array_in, void const *lwe_array_indexes,
void const *encrypted_zeros, uint32_t lwe_size, uint32_t num_lwes,
uint32_t num_zeros, double input_variance, double r_sigma, double bound,
uint32_t log_modulus) {
host_improve_noise_modulus_switch<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
static_cast<uint64_t *>(lwe_array_out),
static_cast<uint64_t const *>(lwe_array_in),
static_cast<uint64_t const *>(lwe_array_indexes),
static_cast<const uint64_t *>(encrypted_zeros), lwe_size, num_lwes,
num_zeros, input_variance, r_sigma, bound, log_modulus);
}

View File

@@ -178,11 +178,10 @@ __device__ __forceinline__ double measure_modulus_switch_noise(
// Each thread processes two elements of the lwe array
template <typename Torus>
__global__ void
improve_noise_modulus_switch(Torus *array_out, const Torus *array_in,
const Torus *zeros, int lwe_size, int num_zeros,
double input_variance, double r_sigma,
double bound, uint32_t log_modulus) {
__global__ void improve_noise_modulus_switch(
Torus *array_out, const Torus *array_in, const uint64_t *indexes,
const Torus *zeros, int lwe_size, int num_zeros, double input_variance,
double r_sigma, double bound, uint32_t log_modulus) {
// First we will assume size is less than the number of threads per block
// I should switch this to dynamic shared memory
@@ -198,13 +197,13 @@ improve_noise_modulus_switch(Torus *array_out, const Torus *array_in,
// This probably are not needed cause we are setting the values
sum_mask_errors[threadIdx.x] = 0.f;
sum_squared_mask_errors[threadIdx.x] = 0.f;
auto this_block_lwe_in = array_in + indexes[blockIdx.x] * lwe_size;
auto this_block_lwe_out = array_out + indexes[blockIdx.x] * lwe_size;
Torus input_element1 = this_block_lwe_in[threadIdx.x];
Torus input_element1 = array_in[threadIdx.x + blockIdx.x * lwe_size];
Torus input_element2 =
threadIdx.x + blockDim.x < lwe_size
? array_in[threadIdx.x + blockDim.x + blockIdx.x * lwe_size]
: 0;
Torus input_element2 = threadIdx.x + blockDim.x < lwe_size
? this_block_lwe_in[threadIdx.x + blockDim.x]
: 0;
// Base noise is only handled by thread 0
double base_noise = measure_modulus_switch_noise<Torus>(
@@ -218,11 +217,10 @@ improve_noise_modulus_switch(Torus *array_out, const Torus *array_in,
__syncthreads();
if (found)
array_out[threadIdx.x + blockIdx.x * lwe_size] = input_element1;
this_block_lwe_out[threadIdx.x] = input_element1;
if (found && (threadIdx.x + blockDim.x) < lwe_size)
array_out[threadIdx.x + blockDim.x + blockIdx.x * lwe_size] =
input_element2;
this_block_lwe_out[threadIdx.x + blockDim.x] = input_element2;
__syncthreads();
// If we found a zero element we stop iterating (in avg 20 times are
@@ -253,11 +251,10 @@ improve_noise_modulus_switch(Torus *array_out, const Torus *array_in,
// Assumption we always have at least 512 elements
// If we find a useful zero encryption we replace the lwe by lwe + zero
if (found)
array_out[threadIdx.x + blockIdx.x * lwe_size] = zero_element1;
this_block_lwe_out[threadIdx.x] = zero_element1;
if (found && (threadIdx.x + blockDim.x) < lwe_size)
array_out[threadIdx.x + blockDim.x + blockIdx.x * lwe_size] =
zero_element2;
this_block_lwe_out[threadIdx.x + blockDim.x] = zero_element2;
__syncthreads();
// If we found a zero element we stop iterating (in avg 20 times are
@@ -270,9 +267,10 @@ improve_noise_modulus_switch(Torus *array_out, const Torus *array_in,
template <typename Torus>
__host__ void host_improve_noise_modulus_switch(
cudaStream_t stream, uint32_t gpu_index, Torus *array_out,
Torus const *array_in, const Torus *zeros, uint32_t lwe_size,
uint32_t num_lwes, const uint32_t num_zeros, const double input_variance,
const double r_sigma, const double bound, uint32_t log_modulus) {
Torus const *array_in, uint64_t const *indexes, const Torus *zeros,
uint32_t lwe_size, uint32_t num_lwes, const uint32_t num_zeros,
const double input_variance, const double r_sigma, const double bound,
uint32_t log_modulus) {
if (lwe_size < 512) {
PANIC("The lwe_size is less than 512, this is not supported\n");
@@ -289,8 +287,8 @@ __host__ void host_improve_noise_modulus_switch(
int num_threads = 512, num_blocks = num_lwes;
improve_noise_modulus_switch<Torus><<<num_blocks, num_threads, 0, stream>>>(
array_out, array_in, zeros, lwe_size, num_zeros, input_variance, r_sigma,
bound, log_modulus);
array_out, array_in, indexes, zeros, lwe_size, num_zeros, input_variance,
r_sigma, bound, log_modulus);
check_cuda_error(cudaGetLastError());
}

View File

@@ -492,6 +492,7 @@ __host__ void host_fourier_transform_forward_as_integer_f128(
batch_convert_u128_to_f128_as_integer<params>
<<<grid_size, block_size, 0, stream>>>(d_re0, d_re1, d_im0, d_im1,
d_standard);
check_cuda_error(cudaGetLastError());
// call negacyclic 128 bit forward fft.
if (full_sm) {
@@ -503,6 +504,7 @@ __host__ void host_fourier_transform_forward_as_integer_f128(
<<<grid_size, block_size, shared_memory_size, stream>>>(
d_re0, d_re1, d_im0, d_im1, d_re0, d_re1, d_im0, d_im1, buffer);
}
check_cuda_error(cudaGetLastError());
cuda_memcpy_async_to_cpu(re0, d_re0, N / 2 * sizeof(double), stream,
gpu_index);

View File

@@ -1291,7 +1291,7 @@ void host_compute_prefix_sum_hillis_steele(
}
// This function is used to perform step 2 of Thomas' new propagation algorithm
// Consist three steps:
// Consists of three steps:
// - propagates the carry within each group with cheap LWE operations stored in
// simulators
// - calculates the propagation state of each group

View File

@@ -261,6 +261,8 @@ void cuda_fourier_polynomial_mul(void *stream_v, uint32_t gpu_index,
default:
break;
}
check_cuda_error(cudaGetLastError());
cuda_drop_async(buffer, stream, gpu_index);
}

View File

@@ -279,6 +279,7 @@ void cuda_convert_lwe_programmable_bootstrap_key(cudaStream_t stream,
PANIC("Cuda error (convert KSK): unsupported polynomial size. Supported "
"N's are powers of two in the interval [256..16384].")
}
check_cuda_error(cudaGetLastError());
cuda_drop_async(d_bsk, stream, gpu_index);
cuda_drop_async(buffer, stream, gpu_index);
@@ -315,6 +316,7 @@ void convert_u128_to_f128_and_forward_fft_128(cudaStream_t stream,
// convert u128 into 4 x double
batch_convert_u128_to_f128_strided_as_torus<params>
<<<grid_size, block_size, 0, stream>>>(d_bsk, d_standard);
check_cuda_error(cudaGetLastError());
// call negacyclic 128 bit forward fft.
if (full_sm) {
@@ -326,6 +328,7 @@ void convert_u128_to_f128_and_forward_fft_128(cudaStream_t stream,
<<<grid_size, block_size, shared_memory_size, stream>>>(d_bsk, d_bsk,
buffer);
}
check_cuda_error(cudaGetLastError());
cuda_drop_async(buffer, stream, gpu_index);
}

View File

@@ -194,7 +194,8 @@ void execute_pbs_async(
lut_indexes_vec[i] + (ptrdiff_t)(gpu_offset);
void *zeros = nullptr;
if (ms_noise_reduction_key != nullptr)
if (ms_noise_reduction_key != nullptr &&
ms_noise_reduction_key->ptr != nullptr)
zeros = ms_noise_reduction_key->ptr[i];
cuda_programmable_bootstrap_lwe_ciphertext_vector_64(
streams[i], gpu_indexes[i], current_lwe_array_out,

View File

@@ -660,13 +660,15 @@ void cuda_programmable_bootstrap_lwe_ciphertext_vector_64(
(pbs_buffer<uint64_t, CLASSICAL> *)mem_ptr;
// If the parameters contain noise reduction key, then apply it
if (ms_noise_reduction_key != nullptr) {
if (ms_noise_reduction_key != nullptr &&
ms_noise_reduction_key->ptr != nullptr) {
if (ms_noise_reduction_key->num_zeros != 0) {
uint32_t log_modulus = log2(polynomial_size) + 1;
host_improve_noise_modulus_switch<uint64_t>(
static_cast<cudaStream_t>(stream), gpu_index,
buffer->temp_lwe_array_in,
static_cast<uint64_t const *>(lwe_array_in),
static_cast<uint64_t const *>(lwe_input_indexes),
static_cast<uint64_t *>(ms_noise_reduction_ptr), lwe_dimension + 1,
num_samples, ms_noise_reduction_key->num_zeros,
ms_noise_reduction_key->ms_input_variance,
@@ -846,4 +848,7 @@ template uint64_t scratch_cuda_programmable_bootstrap_tbc<uint64_t>(
uint32_t glwe_dimension, uint32_t polynomial_size, uint32_t level_count,
uint32_t input_lwe_ciphertext_count, bool allocate_gpu_memory,
bool allocate_ms_array);
template bool
supports_distributed_shared_memory_on_classic_programmable_bootstrap<
__uint128_t>(uint32_t polynomial_size, uint32_t max_shared_memory);
#endif

View File

@@ -256,6 +256,7 @@ void cuda_programmable_bootstrap_lwe_ciphertext_vector_128(
static_cast<cudaStream_t>(stream), gpu_index,
static_cast<__uint128_t *>(buffer->temp_lwe_array_in),
static_cast<__uint128_t const *>(lwe_array_in),
static_cast<uint64_t const *>(buffer->trivial_indexes),
static_cast<const __uint128_t *>(ms_noise_reduction_ptr),
lwe_dimension + 1, num_samples, ms_noise_reduction_key->num_zeros,
ms_noise_reduction_key->ms_input_variance,

View File

@@ -168,7 +168,7 @@ BENCHMARK_DEFINE_F(MultiBitBootstrap_u64, TbcMultiBit)
(benchmark::State &st) {
if (!has_support_to_cuda_programmable_bootstrap_tbc_multi_bit<uint64_t>(
input_lwe_ciphertext_count, glwe_dimension, polynomial_size,
pbs_level)) {
pbs_level, cuda_get_max_shared_memory(0))) {
st.SkipWithError("Configuration not supported for tbc operation");
return;
}
@@ -256,7 +256,7 @@ BENCHMARK_DEFINE_F(ClassicalBootstrap_u64, TbcPBC)
(benchmark::State &st) {
if (!has_support_to_cuda_programmable_bootstrap_tbc<uint64_t>(
input_lwe_ciphertext_count, glwe_dimension, polynomial_size,
pbs_level)) {
pbs_level, cuda_get_max_shared_memory(0))) {
st.SkipWithError("Configuration not supported for tbc operation");
return;
}

View File

@@ -50,6 +50,7 @@ unsafe extern "C" {
gpu_index: u32,
lwe_array_out: *mut ffi::c_void,
lwe_array_in: *const ffi::c_void,
lwe_array_indexes: *const ffi::c_void,
encrypted_zeros: *const ffi::c_void,
lwe_size: u32,
num_lwes: u32,

View File

@@ -0,0 +1,3 @@
*.xclbin filter=lfs diff=lfs merge=lfs -text
*.pdi filter=lfs diff=lfs merge=lfs -text
python/lib/example.json filter=lfs diff=lfs merge=lfs -text

3
backends/tfhe-hpu-backend/.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
ngt_*
config
kogge_cfg.toml

View File

@@ -0,0 +1,88 @@
[package]
name = "tfhe-hpu-backend"
version = "0.1.0"
edition = "2021"
license = "BSD-3-Clause-Clear"
description = "HPU implementation on FPGA of TFHE-rs primitives."
homepage = "https://www.zama.ai/"
documentation = "https://docs.zama.ai/tfhe-rs"
repository = "https://github.com/zama-ai/tfhe-rs"
readme = "README.md"
keywords = ["encryption", "fhe", "cryptography", "hardware", "fpga"]
[features]
hw-xrt = []
hw-v80 = []
io-dump = ["num-traits"]
rtl_graph = ["dot2"]
utils = ["clap", "clap-num", "bitvec", "serde_json"]
[build-dependencies]
cxx-build = "1.0"
[dependencies]
cxx = "1.0"
hw_regmap = "0.1.0"
strum = { version = "0.26.2", features = ["derive"] }
strum_macros = "0.26.2"
enum_dispatch = "0.3.13"
tracing = "0.1.40"
tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
serde = { version = "1", features = ["derive"] }
toml = { version = "0.8", features = [] }
paste = "1.0.15"
thiserror = "1.0.61"
bytemuck = "1.16.0"
anyhow = "1.0.82"
lazy_static = "1.4.0"
rand = "0.8.5"
regex = "1.10.4"
bitflags = { version = "2.5.0", features = ["serde"] }
itertools = "0.11.0"
lru = "0.12.3"
bitfield-struct = "0.10.0"
crossbeam = { version = "0.8.4", features = ["crossbeam-queue"] }
rayon = { workspace = true }
# Dependencies used for Sim feature
ipc-channel = "0.18.3"
# Dependencies used for debug feature
num-traits = { version = "0.2", optional = true }
clap = { version = "4.4.4", features = ["derive"], optional = true }
clap-num = { version = "1.1.1", optional = true }
nix = { version = "0.29.0", features = ["ioctl", "uio"] }
# Dependencies used for rtl_graph features
dot2 = { version = "1.0", optional = true }
bitvec = { version = "1.0", optional = true }
serde_json = { version = "1.0", optional = true }
# Binary for manual debugging
# Enable to access Hpu register and drive some custom sequence by hand
[[bin]]
name = "hputil"
path = "src/utils/hputil.rs"
required-features = ["utils"]
# Binary for asm manipulation
# Enable to convert back and forth between asm/hex format
[[bin]]
name = "dop_fmt"
path = "src/utils/dop_fmt.rs"
required-features = ["utils"]
# Enable to convert back and forth between asm/hex format
[[bin]]
name = "iop_fmt"
path = "src/utils/iop_fmt.rs"
required-features = ["utils"]
# Firmware generation
# Enable to expand IOp in list of Dop for inspection
[[bin]]
name = "fw"
path = "src/utils/fw.rs"
required-features = ["utils"]

View File

@@ -0,0 +1,28 @@
BSD 3-Clause Clear License
Copyright © 2025 ZAMA.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
3. Neither the name of ZAMA nor the names of its contributors may be used to endorse
or promote products derived from this software without specific prior written permission.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE.
THIS SOFTWARE IS PROVIDED BY THE ZAMA AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
ZAMA OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,261 @@
# TFHE-hpu-backend
## Brief
The `tfhe-hpu-backend` holds the code to interface with the HPU accelerator of TFHE.
It contains a `HpuDevice` abstraction that enables easy configuration and dispatching of TFHE operations on the HPU accelerator.
The user API exposes the following functions for hardware setup:
- `HpuDevice::new`, `HpuDevice::from_config`: Instantiates abstraction device from configuration file.
- `HpuDevice::init`: Configures and uploads the required public material.
- `new_var_from`: Creates a HPU ciphertext from `tfhe-rs` ciphertext.
HPU device could also be used from `integer` with the help of the following function:
- `tfhe::integer::hpu::init_device`: Init given HPU device with server key.
- `tfhe::integer::hpu::ciphertext::HpuRadixCiphertext::from_radix_ciphertext`: Convert a CpuRadixCiphertext in it's HPU counterpart.
HPU device could also be used seamlessly from `hl-api` by setting up a thread-local HPU server key:
- `tfhe::Config::from_hpu_device`: Extract hl-api configuration from HpuDevice.
- `tfhe::set_server_key`: Register the Hpu server key in the current thread.
HPU variables could also be created from a `high-level-api` object, with the help of the `hw-xfer` feature.
This implements a trait that enables `clone_on`, `mv_on` `FheUint` object on the HPU accelerator, and cast back `from` them.
These objects implement the `std::ops` trait and could be used to dispatch operations on HPU hardware.
### Backend structure
`tfhe-hpu-backend` is split in various modules:
- `entities`: Defines structure handled by HPU accelerator. Conversion traits from/into those objects are implemented in `tfhe-rs`.
- `asm`: Describes assembly-like language for the HPU. It enables abstract HPU behavior and easily updates it through micro-code.
- `fw`: Abstraction to help the micro-code designer. Uses a simple rust program for describing new HPU operations. Helps with register/heap management.
- `interface`:
+ `device`: High-level structure that exposes the User API.
+ `backend`: Inner private structure that contains HPU modules
+ `variable`: Wraps HPU ciphertexts. It enables to hook an hardware object lifetime within the `rust` borrow-checker.
+ `memory`: Handles on-board memory allocation and synchronization
+ `config`: Helps to configure HPU accelerator through a TOML configuration file
+ `cmd`: Translates operation over `variable` in concrete HPU commands
+ `regmap`: Communicates with the HPU internal register with ease.
+ `rtl`: Defines concrete `rust` structure populated from HPU's status/configuration registers
Below is an overview of the internal structure of the Backend.
![HPU backend structure](./figures/tfhe-hpu-backend.excalidraw.png)
This picture depicts the internal modules of `tfhe-hpu-backend`, Device is the main entry point for the user. Its lifecycle is as follows:
1. Create HpuDevice, open link with the associated FPGA. Configure associated drivers and upload the bitstream. Read FPGA registers to extract supported configuration and features. Build Firmware conversion table (IOp -> DOps stream).
2. Allocate required memory chunks in the on-board memory. Upload public material required by TFHE computation.
3. Create HPU variables that handle TFHE Ciphertexts. It wraps TFHE Ciphertext with required internal resources and enforces the correct lifetime management. This abstraction enforces that during the variable lifecycle all required resources are valid.
4. Users could trigger HPU operation from the HPU variable.
Variable abstraction enforces that required objects are correctly synced on the hardware and converts each operation in a concrete HPU command.
When HPU operation is acknowledged by the hardware, the internal state of the associated variable is updated.
This mechanism enables asynchronous operation and minimal amount of Host to/from HW memory transfer.
This mechanism also enables offloading a computation graph to the HPU and requires a synchronization only on the final results.
## Example
### Configuration file
HPU configuration knobs are gathered in a TOML configuration file. This file describes the targeted FPGA with its associated configuration:
```toml
[fpga] # FPGA target
# Register layout in the FPGA
regmap=["${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_cfg_1in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_cfg_3in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_prc_1in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_prc_3in3.toml"]
polling_us=10
[fpga.ffi.V80] # Hardware properties
ami_dev="/dev/ami1" # Name of ami device
qdma_h2c="/dev/qdma${V80_PCIE_DEV}001-MM-0" # QDma host to card device
qdma_c2h="/dev/qdma${V80_PCIE_DEV}001-MM-1" # QDma card to host device
[rtl] # RTL option
bpip_used = true # BPIP/IPIP mode
bpip_use_opportunism = false # Use strict flush paradigm
bpip_timeout = 100_000 # BPIP timeout in clock `cycles`
[board] # Board configuration
ct_mem = 32768 # Number of allocated ciphertext
ct_pc = [ # Memory used for ciphertext
{Hbm= {pc=32}},
{Hbm= {pc=33}},
]
heap_size = 16384 # Number of slots reserved for heap
lut_mem = 256 # Number of allocated LUT table
lut_pc = {Hbm={pc=34}} # Memory used for LUT
fw_size= 16777216 # Size in byte of the Firmware translation table
fw_pc = {Ddr= {offset= 0x3900_0000}} # Memory used for firmware translation table
bsk_pc = [ # Memory used for Bootstrapping key
{Hbm={pc=8}},
{Hbm={pc=12}},
{Hbm={pc=24}},
{Hbm={pc=28}},
{Hbm={pc=40}},
{Hbm={pc=44}},
{Hbm={pc=56}},
{Hbm={pc=60}}
]
ksk_pc = [ # Memory used for Keyswitching key
{Hbm={pc=0}},
{Hbm={pc=1}},
{Hbm={pc=2}},
{Hbm={pc=3}},
{Hbm={pc=4}},
{Hbm={pc=5}},
{Hbm={pc=6}},
{Hbm={pc=7}},
{Hbm={pc=16}},
{Hbm={pc=17}},
{Hbm={pc=18}},
{Hbm={pc=19}},
{Hbm={pc=20}},
{Hbm={pc=21}},
{Hbm={pc=22}},
{Hbm={pc=23}}
]
trace_pc = {Hbm={pc=35}} # Memory used for trace log
trace_depth = 32 # Size of Memory in MiB allocated for trace log
[firmware] # Firmware properties
implementation = "Llt" # Firmware flavor to use
integer_w=[4,6,8,10,12,14,16,32,64,128] # List of supported IOp width
min_batch_size = 11 # Minimum batch size for maximum throughput
kogge_cfg = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/kogge_cfg.toml"
custom_iop.'IOP[0]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_0.asm"
# Default firmware configuration. Could be edited on per-IOp basis
[firmware.op_cfg.default]
fill_batch_fifo = true
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
```
### Device setup
Following code snippet shows how to instantiate and configure a `HpuDevice`:
```rust
// Following code snippets used the HighLevelApi abstraction
// Instantiate HpuDevice --------------------------------------------------
let hpu_device = HpuDevice::from_config(&args.config.expand());
// Generate keys ----------------------------------------------------------
let config = Config::from_hpu_device(&hpu_device);
let cks = ClientKey::generate(config);
let csks = CompressedServerKey::new(&cks);
// Register HpuDevice and key as thread-local engine
set_server_key((hpu_device, csks));
```
### Clone CPU ciphertext on HPU
Following code snippet shows how to convert CPU ciphertext in HPU one:
``` rust
// Draw random value as input
let a = rand::thread_rng().gen_range(0..u8::MAX);
// Encrypt them on Cpu side
let a_fhe = FheUint8::encrypt(a, &cks);
// Clone a ciphertext and move them in HpuWorld
// NB: Data doesn't move over Pcie at this stage
// Data are only arranged in Hpu ordered an copy in the host internal buffer
let a_hpu = a_fhe.clone_on(&hpu_device);
```
### Dispatch operation on HPU
Once registered as thread-local engine, HighLevel FheUint are converted in Hpu format.
Following code snippets show how to start operation on HPU:
``` rust
// Sum -------------------------------------------------------------
// Generate random inputs value and compute expected result
let in_a = rng.gen_range(0..u64::max_value());
let in_b = rng.gen_range(0..u64::max_value());
let clear_sum_ab = in_a.wrapping_add(in_b);
// Encrypt input value
let fhe_a = FheUint64::encrypt(in_a, cks);
let fhe_b = FheUint64::encrypt(in_b, cks);
// Triggered operation on HPU through hl_api
let fhe_sum_ab = fhe_a+fhe_b;
// Decrypt values
let dec_sum_ab: u64 = fhe_sum_ab.decrypt(cks);
```
## Pre-made Examples
There are some example applications already available in `tfhe/examples/hpu`:
* hpu_hlapi: Depict the used of HPU device through HighLevelApi.
* hpu_bench: Depict the used of HPU device through Integer abstraction level.
In order to run those applications on hardware, user must build from the project root (i.e `tfhe-rs-internal`) with `hpu-v80` features:
> NB: Running examples required to have correctly pulled the `.pdi` files. Those files, due to their size, are backed by git-lfs and disabled by default.
> In order to retrieve them, use the following command:
> ```bash
> git lfs pull --include="*" --exclude=""
> ```
``` bash
cargo build --release --features="hpu-v80" --example hpu_hlapi --example hpu_bench
# Correctly setup environment with setup_hpu.sh script
source setup_hpu.sh --config v80 --init-qdma
./target/release/examples/hpu_bench --integer-w 64 --integer-w 32 --iop MUL --iter 10
./target/release/examples/hpu_hlapi
```
## Test framework
There is also a set of tests backed in tfhe-rs. Tests are gather in testbundle over various integer width.
Those tests have 5 sub-kind:
* `alu`: Run and check all ct x ct IOp
* `alus`: Run and check all ct x scalar IOp
* `bitwise`: Run and check all bitwise IOp
* `cmp`: Run and check all comparison IOp
* `ternary`: Run and check ternary operation
* `algo`: Run and check IOp dedicated to offload small algorithms
Snippets below give some example of command that could be used for testing:
``` bash
# Correctly setup environment with setup_hpu.sh script
source setup_hpu.sh --config v80 --init-qdma
# Run all sub-kind for 64b integer width
cargo test --release --features="hpu-v80" --test hpu -- u64
# Run only `bitwise` sub-kind for all integer width IOp
cargo test --release --features="hpu-v80" --test hpu -- bitwise
```
## Benches framework
HPU is completely integrated in tfhe benchmark system. Performances results could be extracted from HighLevelApi or Integer Api.
Three benchmarks could be started, through the following Makefile target for simplicity:
``` bash
# Do not forget to correctly set environment before hand
source setup_hpu.sh --config v80 --init-qdma
# Run hlapi benches
make test_high_level_api_hpu
# Run hlapi erc20 benches
make bench_hlapi_erc20_hpu
# Run integer level benches
make bench_integer_hpu
```
## Eager to start without real Hardware ?
You are still waiting your FPGA board and are frustrated by lead time ?
Don't worry, you have backed-up. A dedicated simulation infrastructure with accurate performance estimation is available in tfhe-rs.
You can use it on any linux/MacOs to test HPU integration within tfhe-rs and optimized your application for HPU target.
Simply through an eye to [Hpu mockup](../../mockups/tfhe-hpu-mockup/README.md), and follow the instruction.

View File

@@ -0,0 +1,26 @@
fn main() {
if cfg!(feature = "hw-xrt") {
println!("cargo:rustc-link-search=/opt/xilinx/xrt/lib");
println!("cargo:rustc-link-lib=dylib=stdc++");
println!("cargo:rustc-link-lib=dl");
println!("cargo:rustc-link-lib=rt");
println!("cargo:rustc-link-lib=uuid");
println!("cargo:rustc-link-lib=dylib=xrt_coreutil");
cxx_build::bridge("src/ffi/xrt/mod.rs")
.file("src/ffi/xrt/cxx/hpu_hw.cc")
.file("src/ffi/xrt/cxx/mem_zone.cc")
.flag_if_supported("-std=c++23")
.include("/opt/xilinx/xrt/include") // Enhance: support parsing bash env instead of hard path
.flag("-fmessage-length=0")
.compile("hpu-hw-ffi");
println!("cargo:rerun-if-changed=src/ffi/xrt/mod.rs");
println!("cargo:rerun-if-changed=src/ffi/xrt/cxx/hpu_hw.cc");
println!("cargo:rerun-if-changed=src/ffi/xrt/cxx/hpu_hw.h");
println!("cargo:rerun-if-changed=src/ffi/xrt/cxx/mem_zone.cc");
println!("cargo:rerun-if-changed=src/ffi/xrt/cxx/mem_zone.h");
} else {
// Simulation ffi -> nothing to do
}
}

View File

@@ -0,0 +1,15 @@
# CUST_0
# Simple IOp to check the xfer between Hpu/Cpu
# Construct constant in dest slot -> 249 (0xf9)
SUB R0 R0 R0
ADDS R0 R0 1
ST TD[0].0 R0
SUB R1 R1 R1
ADDS R1 R1 2
ST TD[0].1 R1
SUB R2 R2 R2
ADDS R2 R2 3
ST TD[0].2 R2
SUB R3 R3 R3
ADDS R3 R3 3
ST TD[0].3 R3

View File

@@ -0,0 +1,11 @@
# CUST_1
# Simple IOp to check the xfer between Hpu/Cpu
# Dest <- Src_a
LD R0 TS[0].0
LD R1 TS[0].1
LD R2 TS[0].2
LD R3 TS[0].3
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,25 @@
; CUST_8
; Simple IOp to check the ALU operation
; Dst[0].0 <- Src[0].0 + Src[1].0
LD R1 TS[0].0
LD R2 TS[1].0
ADD R0 R1 R2
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 + Src[1].1
LD R5 TS[0].1
LD R6 TS[1].1
ADD R4 R5 R6
ST TD[0].2 R4
; Dst[0].2 <- Src[0].2 + Src[1].2
LD R9 TS[0].2
LD R10 TS[1].2
ADD R8 R9 R10
ST TD[0].2 R8
; Dst[0].3 <- Src[0].3 + Src[1].3
LD R13 TS[0].3
LD R14 TS[1].3
ADD R12 R13 R14
ST TD[0].3 R0

View File

@@ -0,0 +1,6 @@
# CUST_16
# Simple IOp to check PBS behavior
# Dest <- PBSNone(Src_a.0)
LD R0 TS[0].0
PBS_F R0 R0 PbsNone
ST TD[0].0 R0

View File

@@ -0,0 +1,15 @@
# CUST_17
# Simple IOp to check PBS behavior
# Dest <- PBSNone(Src_a)
LD R0 TS[0].0
PBS R0 R0 PbsNone
ST TD[0].0 R0
LD R1 TS[0].1
PBS R1 R1 PbsNone
ST TD[0].1 R1
LD R2 TS[0].2
PBS R2 R2 PbsNone
ST TD[0].2 R2
LD R3 TS[0].3
PBS_F R3 R3 PbsNone
ST TD[0].3 R3

View File

@@ -0,0 +1,23 @@
; CUST_18
; Simple IOp to check extraction pattern
; Correct result:
; * Dst[0,1] <- Src[0][0,1]
; * Dst[2,3] <- Src[1][0,1]
; Pack Src[0][0,1] with a Mac and extract Carry/Msg in Dst[0][0,1]
LD R0 TS[0].0
LD R1 TS[0].1
MAC R3 R1 R0 4
PBS R4 R3 PbsMsgOnly
PBS R5 R3 PbsCarryInMsg
ST TD[0].0 R4
ST TD[0].1 R5
; Pack Src[1][0,1] with a Mac and extract Carry/Msg in Dst[0][2,3]
LD R10 TS[1].0
LD R11 TS[1].1
MAC R13 R11 R10 4
PBS R14 R13 PbsMsgOnly
PBS R15 R13 PbsCarryInMsg
ST TD[0].2 R14
ST TD[0].3 R15

View File

@@ -0,0 +1,19 @@
; CUST_19
; Simple IOp to check PbsMl2
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- 0
; * Dst[0][2] <- Src[0][0] +1
; * Dst[0][3] <- 0
; i.e Cust_19(0x2) => 0x32
; Construct a 0 for destination padding
SUB R16 R16 R16
; Apply PbsMl2 on Src[0] result goes in dest[0][0-3] (0-padded)
LD R0 TS[0].0
PBS_ML2_F R0 R0 PbsTestMany2
ST TD[0].0 R0
ST TD[0].1 R16
ST TD[0].2 R1
ST TD[0].3 R16

View File

@@ -0,0 +1,11 @@
# CUST_2
# Simple IOp to check the xfer between Hpu/Cpu
# Dest <- Src_b
LD R0 TS[1].0
LD R1 TS[1].1
LD R2 TS[1].2
LD R3 TS[1].3
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,22 @@
; CUST_20
; Simple IOp to check PbsMl4
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- Src[0][0] +1
; * Dst[0][2] <- Src[0][0] +2
; * Dst[0][3] <- Src[0][0] +3
; i.e Cust_20(0x0) => 0xe4
SUB R16 R16 R16
ST TD[0].0 R0
ST TD[0].1 R0
ST TD[0].2 R0
ST TD[0].3 R0
; Apply PbsMl4 on Src[0] result goes in dest[0][0-3]
LD R0 TS[0].0
PBS_ML4_F R0 R0 PbsTestMany4
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,24 @@
; CUST_21
; Simple IOp to check PbsMl8
; WARN: This operation required 16b ct width
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- Src[0][0] +1
; * Dst[0][2] <- Src[0][0] +2
; * Dst[0][3] <- Src[0][0] +3
; * Dst[0][4] <- Src[0][0] +4
; * Dst[0][5] <- Src[0][0] +5
; * Dst[0][6] <- Src[0][0] +6
; * Dst[0][7] <- Src[0][0] +7
; Apply PbsMl8 on Src[0] result goes in dest[0][0-7]
LD R0 TS[0].0
PBS_ML8_F R0 R0 PbsTestMany8
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3
ST TD[0].4 R4
ST TD[0].5 R5
ST TD[0].6 R6
ST TD[0].7 R7

View File

@@ -0,0 +1,16 @@
# CUST_3
# Simple IOp to check isc behavior
# Generate obvious deps and check that isc correctly issued the dop
# Correct result must bu Dest <- Src[0]
LD R0 TS[0].0
LD R1 TS[0].1
LD R2 TS[0].2
LD R3 TS[0].3
PBS R4 R0 PbsNone
ST TD[0].0 R4
PBS R4 R1 PbsNone
ST TD[0].1 R4
PBS R4 R2 PbsNone
ST TD[0].2 R4
PBS_F R4 R3 PbsNone
ST TD[0].3 R4

View File

@@ -0,0 +1,19 @@
; CUST_8
; Simple IOp to check the ALU operation
; Dst[0].0 <- Src[0].0 + Src[1].0
LD R1 TS[0].0
LD R2 TS[1].0
ADD R0 R1 R2
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 - Src[1].1
LD R5 TS[0].1
LD R6 TS[1].1
SUB R4 R5 R6
ST TD[0].1 R4
; Dst[0].2 <- Src[0].2 + (Src[1].2 *4)
LD R9 TS[0].2
LD R10 TS[1].2
MAC R8 R9 R10 4
ST TD[0].2 R8

View File

@@ -0,0 +1,21 @@
; CUST_9
; Simple IOp to check the ALU Scalar operation
; Dst[0].0 <- Src[0].0 + Imm[0].0
LD R1 TS[0].0
ADDS R0 R1 TI[0].0
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 - Imm[0].1
LD R5 TS[0].1
SUBS R4 R5 TI[0].1
ST TD[0].1 R4
; Dst[0].2 <- Imm[0].2 - Src[0].2
LD R9 TS[0].2
SSUB R8 R9 TI[0].2
ST TD[0].2 R8
; Dst[0].3 <- Src[0].3 * Imm[0].3
LD R13 TS[0].3
MULS R12 R13 TI[0].3
ST TD[0].3 R12

View File

@@ -0,0 +1,108 @@
[fpga]
regmap=["${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_cfg_1in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_cfg_3in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_prc_1in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_prc_3in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/tb_hpu_regif_dummy.toml"]
polling_us=100000
[fpga.ffi.Sim]
ipc_name="/tmp/${USER}/hpu_mockup_ipc"
[rtl]
bpip_use = true
bpip_use_opportunism = true
bpip_timeout = 100_000
[board]
ct_mem = 32768
ct_pc = [
{Hbm= {pc=32}},
{Hbm= {pc=33}},
]
heap_size = 16384
lut_mem = 256
lut_pc = {Hbm={pc=34}}
fw_size= 16777215 # i.e. 16 MiB
fw_pc = {Ddr= {offset= 0x3900_0000}} # NB: Allocation must take place in the Discret DDR
bsk_pc = [
{Hbm={pc=8}},
{Hbm={pc=12}},
{Hbm={pc=24}},
{Hbm={pc=28}},
{Hbm={pc=40}},
{Hbm={pc=44}},
{Hbm={pc=56}},
{Hbm={pc=60}}
]
ksk_pc = [
{Hbm={pc=0}},
{Hbm={pc=1}},
{Hbm={pc=2}},
{Hbm={pc=3}},
{Hbm={pc=4}},
{Hbm={pc=5}},
{Hbm={pc=6}},
{Hbm={pc=7}},
{Hbm={pc=16}},
{Hbm={pc=17}},
{Hbm={pc=18}},
{Hbm={pc=19}},
{Hbm={pc=20}},
{Hbm={pc=21}},
{Hbm={pc=22}},
{Hbm={pc=23}}
]
trace_pc = {Hbm={pc=35}}
trace_depth = 32 # In MB
[firmware]
implementation = "Llt"
integer_w=[2,4,6,8,10,12,14,16,32,64,128]
min_batch_size = 11
kogge_cfg = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/kogge_cfg.toml"
custom_iop.'IOP[0]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_0.asm"
custom_iop.'IOP[1]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_1.asm"
custom_iop.'IOP[2]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_2.asm"
custom_iop.'IOP[3]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_3.asm"
custom_iop.'IOP[8]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_8.asm"
custom_iop.'IOP[9]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_9.asm"
custom_iop.'IOP[16]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_16.asm"
custom_iop.'IOP[17]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_17.asm"
custom_iop.'IOP[18]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_18.asm"
custom_iop.'IOP[19]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_19.asm"
custom_iop.'IOP[20]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_20.asm"
custom_iop.'IOP[21]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_21.asm"
[firmware.op_cfg.default]
fill_batch_fifo = true
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.MUL]
fill_batch_fifo = false
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.MULS]
fill_batch_fifo = false
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.ERC_20]
fill_batch_fifo = true
min_batch_size = false
use_tiers = true
flush_behaviour = "Patient"
flush = true

View File

@@ -0,0 +1,256 @@
module_name="hpu_regif_core_cfg_1in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x00
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_cfg_1in3]
description="entry_cfg_1in3 section with known value used for debug."
offset= 0x0
[section.entry_cfg_1in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x01010101}
[section.entry_cfg_1in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x11111111}
[section.entry_cfg_1in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x21212121}
[section.entry_cfg_1in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x31313131}
# =====================================================================================================================
[section.info]
description="RTL architecture parameters"
offset= 0x10
[section.info.register.version]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="VERSION"}
[section.info.register.ntt_architecture]
description="NTT architecture"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="NTT_CORE_ARCH"}
[section.info.register.ntt_structure]
description="NTT structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.radix = { size_b=8, offset_b=0 , default={Param="R"}, description="NTT radix"}
field.psi = { size_b=8, offset_b=8 , default={Param="PSI"}, description="NTT psi"}
field.div = { size_b=8, offset_b=16, default={Param="BWD_PSI_DIV"}, description="NTT backward div"}
field.delta = { size_b=8, offset_b=24, default={Param="DELTA"}, description="NTT network delta (for wmm arch)"}
[section.info.register.ntt_rdx_cut]
description="NTT radix cuts, in log2 unit (for gf64 arch)"
owner="Parameter"
read_access="Read"
write_access="None"
field.radix_cut0 = { size_b=4, offset_b=0 , default={Param="NTT_RDX_CUT_S_0"}, description="NTT radix cut #0"}
field.radix_cut1 = { size_b=4, offset_b=4 , default={Param="NTT_RDX_CUT_S_1"}, description="NTT radix cut #1"}
field.radix_cut2 = { size_b=4, offset_b=8 , default={Param="NTT_RDX_CUT_S_2"}, description="NTT radix cut #2"}
field.radix_cut3 = { size_b=4, offset_b=12, default={Param="NTT_RDX_CUT_S_3"}, description="NTT radix cut #3"}
field.radix_cut4 = { size_b=4, offset_b=16, default={Param="NTT_RDX_CUT_S_4"}, description="NTT radix cut #4"}
field.radix_cut5 = { size_b=4, offset_b=20, default={Param="NTT_RDX_CUT_S_5"}, description="NTT radix cut #5"}
field.radix_cut6 = { size_b=4, offset_b=24, default={Param="NTT_RDX_CUT_S_6"}, description="NTT radix cut #6"}
field.radix_cut7 = { size_b=4, offset_b=28, default={Param="NTT_RDX_CUT_S_7"}, description="NTT radix cut #7"}
[section.info.register.ntt_pbs]
description="Maximum number of PBS in the NTT pipeline"
owner="Parameter"
read_access="Read"
write_access="None"
field.batch_pbs_nb = { size_b=8, offset_b=0 , default={Param="BATCH_PBS_NB"}, description="Maximum number of PBS in the NTT pipe"}
field.total_pbs_nb = { size_b=8, offset_b=8 , default={Param="TOTAL_PBS_NB"}, description="Maximum number of PBS stored in PEP buffer"}
[section.info.register.ntt_modulo]
description="Code associated to the NTT prime"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="MOD_NTT_NAME"}
[section.info.register.application]
description="Code associated with the application"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="APPLICATION_NAME"}
[section.info.register.ks_structure]
description="Key-switch structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.x = { size_b=8, offset_b=0 , default={Param="LBX"}, description="Number of coefficients on X dimension"}
field.y = { size_b=8, offset_b=8 , default={Param="LBY"}, description="Number of coefficients on Y dimension"}
field.z = { size_b=8, offset_b=16, default={Param="LBZ"}, description="Number of coefficients on Z dimension"}
[section.info.register.ks_crypto_param]
description="Key-switch crypto parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.mod_ksk_w = { size_b=8, offset_b=0 , default={Param="MOD_KSK_W"}, description="Width of KSK modulo"}
field.ks_l = { size_b=8, offset_b=8 , default={Param="KS_L"}, description="Number of KS decomposition level"}
field.ks_b = { size_b=8, offset_b=16, default={Param="KS_B_W"}, description="Width of KS decomposition base"}
[section.info.register.regf_structure]
description="Register file structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.reg_nb = { size_b=8, offset_b=0 , default={Param="REGF_REG_NB"}, description="Number of registers in regfile"}
field.coef_nb = { size_b=8, offset_b=8 , default={Param="REGF_COEF_NB"}, description="Number of coefficients at regfile interface"}
[section.info.register.isc_structure]
description="Instruction scheduler structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.depth = { size_b=8, offset_b=0 , default={Param="ISC_DEPTH"}, description="Number of slots in ISC lookahead buffer."}
field.min_iop_size = { size_b=8, offset_b=8 , default={Param="MIN_IOP_SIZE"}, description="Minimum number of DOp per IOp to prevent sync_id overflow."}
[section.info.register.pe_properties]
description="Processing elements parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.alu_nb = { size_b=8, offset_b=24 , default={Param="PEA_ALU_NB"}, description="Number of coefficients processed in parallel in pe_alu"}
field.pep_regf_period = { size_b=8, offset_b=16 , default={Param="PEP_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEP and regfile"}
field.pem_regf_period = { size_b=8, offset_b=8 , default={Param="PEM_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEM and regfile"}
field.pea_regf_period = { size_b=8, offset_b=0 , default={Param="PEA_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEA and regfile"}
[section.info.register.bsk_structure]
description="BSK manager structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.bsk_cut_nb = { size_b=8, offset_b=8 , default={Param="BSK_CUT_NB"}, description="BSK cut nb"}
[section.info.register.ksk_structure]
description="KSK manager structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.ksk_cut_nb = { size_b=8, offset_b=8 , default={Param="KSK_CUT_NB"}, description="KSK cut nb"}
[section.info.register.hbm_axi4_nb]
description="Number of AXI4 connections to HBM"
owner="Parameter"
read_access="Read"
write_access="None"
field.bsk_pc = { size_b=8, offset_b=0 , default={Param="BSK_PC"}, description="Number of HBM connections for BSK"}
field.ksk_pc = { size_b=8, offset_b=8, default={Param="KSK_PC"}, description="Number of HBM connections for KSK"}
field.pem_pc = { size_b=8, offset_b=16, default={Param="PEM_PC"}, description="Number of HBM connections for ciphertexts (PEM)"}
field.glwe_pc = { size_b=8, offset_b=24, default={Param="GLWE_PC"}, description="Number of HBM connections for GLWE"}
[section.info.register.hbm_axi4_dataw_pem]
description="Ciphertext HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_PEM_DATA_W"}
[section.info.register.hbm_axi4_dataw_glwe]
description="GLWE HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_GLWE_DATA_W"}
[section.info.register.hbm_axi4_dataw_bsk]
description="BSK HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_BSK_DATA_W"}
[section.info.register.hbm_axi4_dataw_ksk]
description="KSK HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_KSK_DATA_W"}
# =====================================================================================================================
[section.hbm_axi4_addr_1in3]
offset= 0x1000
description="HBM AXI4 connection address offset"
[section.hbm_axi4_addr_1in3.register.ct]
description="Address offset for each ciphertext HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb","_pc1_lsb", "_pc1_msb"]
[section.hbm_axi4_addr_1in3.register.glwe]
description="Address offset for each GLWE HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb"]
[section.hbm_axi4_addr_1in3.register.ksk]
description="Address offset for each KSK HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb", "_pc1_lsb", "_pc1_msb", "_pc2_lsb", "_pc2_msb", "_pc3_lsb", "_pc3_msb", "_pc4_lsb", "_pc4_msb", "_pc5_lsb", "_pc5_msb", "_pc6_lsb", "_pc6_msb", "_pc7_lsb", "_pc7_msb", "_pc8_lsb", "_pc8_msb", "_pc9_lsb", "_pc9_msb", "_pc10_lsb", "_pc10_msb", "_pc11_lsb", "_pc11_msb", "_pc12_lsb", "_pc12_msb", "_pc13_lsb", "_pc13_msb", "_pc14_lsb", "_pc14_msb", "_pc15_lsb", "_pc15_msb"]
[section.hbm_axi4_addr_1in3.register.trc]
description="Address offset for each trace HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb"]
# =====================================================================================================================
[section.bpip]
offset= 0x2000
description="BPIP configuration"
[section.bpip.register.use]
description="(1) Use BPIP mode, (0) use IPIP mode (default)"
owner="User"
read_access="Read"
write_access="Write"
field.use_bpip = { size_b=1, offset_b=0 , default={Cst=1}, description="use"}
field.use_opportunism = { size_b=1, offset_b=1 , default={Cst=0}, description="use opportunistic PBS flush"}
[section.bpip.register.timeout]
description="Timeout for BPIP mode"
owner="User"
read_access="Read"
write_access="Write"
default={Cst=0xffffffff}

View File

@@ -0,0 +1,51 @@
module_name="hpu_regif_core_cfg_3in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x20000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_cfg_3in3]
description="entry_cfg_3in3 section with known value used for debug."
offset= 0x0
[section.entry_cfg_3in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x03030303}
[section.entry_cfg_3in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x13131313}
[section.entry_cfg_3in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x23232323}
[section.entry_cfg_3in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x33333333}
# =====================================================================================================================
[section.hbm_axi4_addr_3in3]
description="HBM AXI4 connection address offset"
offset= 0x10
[section.hbm_axi4_addr_3in3.register.bsk]
description="Address offset for each BSK HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb", "_pc1_lsb", "_pc1_msb", "_pc2_lsb", "_pc2_msb", "_pc3_lsb", "_pc3_msb", "_pc4_lsb", "_pc4_msb", "_pc5_lsb", "_pc5_msb", "_pc6_lsb", "_pc6_msb", "_pc7_lsb", "_pc7_msb", "_pc8_lsb", "_pc8_msb", "_pc9_lsb", "_pc9_msb", "_pc10_lsb", "_pc10_msb", "_pc11_lsb", "_pc11_msb", "_pc12_lsb", "_pc12_msb", "_pc13_lsb", "_pc13_msb", "_pc14_lsb", "_pc14_msb", "_pc15_lsb", "_pc15_msb"]

View File

@@ -0,0 +1,336 @@
module_name="hpu_regif_core_prc_1in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x10000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_prc_1in3]
description="entry_prc_1in3 section with known value used for debug."
offset= 0x0
[section.entry_prc_1in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x02020202}
[section.entry_prc_1in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x12121212}
[section.entry_prc_1in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x22222222}
[section.entry_prc_1in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x32323232}
# =====================================================================================================================
[section.status_1in3]
description="HPU status of part 1in3"
offset= 0x10
[section.status_1in3.register.error]
description="Error register (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.pbs = { size_b=32, offset_b=0 , default={Cst=0}, description="HPU error part 1in3"}
# =====================================================================================================================
[section.ksk_avail]
description="KSK availability configuration"
offset= 0x1000
[section.ksk_avail.register.avail]
description="KSK available bit"
owner="User"
read_access="Read"
write_access="Write"
field.avail = { size_b=1, offset_b=0 , default={Cst=0}, description="avail"}
[section.ksk_avail.register.reset]
description="KSK reset sequence"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.request = { size_b=1, offset_b=0 , default={Cst=0}, description="request"}
field.done = { size_b=1, offset_b=31 , default={Cst=0}, description="done"}
# =====================================================================================================================
[section.runtime_1in3]
description="Runtime information"
offset= 0x2000
[section.runtime_1in3.register.pep_cmux_loop]
description="PEP: CMUX iteration loop number"
owner="Kernel"
read_access="Read"
write_access="None"
field.br_loop = { size_b=15, offset_b=0 , default={Cst=0}, description="PBS current BR-loop"}
field.br_loop_c = { size_b=1, offset_b=15 , default={Cst=0}, description="PBS current BR-loop parity"}
field.ks_loop = { size_b=15, offset_b=16 , default={Cst=0}, description="KS current KS-loop"}
field.ks_loop_c = { size_b=1, offset_b=31 , default={Cst=0}, description="KS current KS-loop parity"}
[section.runtime_1in3.register.pep_pointer_0]
description="PEP: pointers (part 1)"
owner="Kernel"
read_access="Read"
write_access="None"
field.pool_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP pool_rp"}
field.pool_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP pool_wp"}
field.ldg_pt = { size_b=8, offset_b=16 , default={Cst=0}, description="PEP ldg_pt"}
field.ldb_pt = { size_b=8, offset_b=24 , default={Cst=0}, description="PEP ldb_pt"}
[section.runtime_1in3.register.pep_pointer_1]
description="PEP: pointers (part 2)"
owner="Kernel"
read_access="Read"
write_access="None"
field.ks_in_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP ks_in_rp"}
field.ks_in_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP ks_in_wp"}
field.ks_out_rp = { size_b=8, offset_b=16 , default={Cst=0}, description="PEP ks_out_rp"}
field.ks_out_wp = { size_b=8, offset_b=24 , default={Cst=0}, description="PEP ks_out_wp"}
[section.runtime_1in3.register.pep_pointer_2]
description="PEP: pointers (part 3)"
owner="Kernel"
read_access="Read"
write_access="None"
field.pbs_in_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP pbs_in_rp"}
field.pbs_in_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP pbs_in_wp"}
field.ipip_flush_last_pbs_in_loop = { size_b=16, offset_b=16 , default={Cst=0}, description="PEP IPIP flush last pbs_in_loop"}
[section.runtime_1in3.register.isc_latest_instruction]
description="ISC: 4 latest instructions received ([0] is the most recent)"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_0","_1","_2","_3"]
[section.runtime_1in3.register.pep_seq_bpip_batch_cnt]
description="PEP: BPIP batch counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_flush_cnt]
description="PEP: BPIP batch triggered by a flush counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_timeout_cnt]
description="PEP: BPIP batch triggered by a timeout counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_waiting_batch_cnt]
description="PEP: BPIP batch that waits the trigger counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_filling_cnt]
description="PEP: Count batch with filled with a given number of CT (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_1","_2","_3","_4","_5","_6","_7","_8","_9","_10","_11","_12","_13","_14","_15","_16"]
[section.runtime_1in3.register.pep_seq_ld_ack_cnt]
description="PEP: load BLWE ack counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_cmux_not_full_batch_cnt]
description="PEP: not full batch CMUX counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_ipip_flush_cnt]
description="PEP: IPIP flush CMUX counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldb_rcp_dur]
description="PEP: load BLWE reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldg_req_dur]
description="PEP: load GLWE request max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldg_rcp_dur]
description="PEP: load GLWE reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_load_ksk_rcp_dur]
description="PEP: load KSK slice reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_pc0","_pc1","_pc2","_pc3","_pc4","_pc5","_pc6","_pc7","_pc8","_pc9","_pc10","_pc11","_pc12","_pc13","_pc14","_pc15"]
[section.runtime_1in3.register.pep_mmacc_sxt_rcp_dur]
description="PEP: MMACC SXT reception duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_mmacc_sxt_req_dur]
description="PEP: MMACC SXT request duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_mmacc_sxt_cmd_wait_b_dur]
description="PEP: MMACC SXT command wait for b duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_inst_cnt]
description="PEP: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ack_cnt]
description="PEP: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_inst_cnt]
description="PEM: load input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_ack_cnt]
description="PEM: load instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_store_inst_cnt]
description="PEM: store input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_store_ack_cnt]
description="PEM: store instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pea_inst_cnt]
description="PEA: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pea_ack_cnt]
description="PEA: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.isc_inst_cnt]
description="ISC: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.isc_ack_cnt]
description="ISC: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_info_0]
description="PEM: load first data)"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_pc0_0","_pc0_1","_pc0_2","_pc0_3","_pc1_0","_pc1_1","_pc1_2","_pc1_3"]
[section.runtime_1in3.register.pem_load_info_1]
description="PEM: load first address"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_pc0_lsb","_pc0_msb","_pc1_lsb","_pc1_msb"]
[section.runtime_1in3.register.pem_store_info_0]
description="PEM: store info 0)"
owner="Kernel"
read_access="Read"
write_access="None"
field.cmd_vld = { size_b=1, offset_b=0 , default={Cst=0}, description="PEM_ST cmd vld"}
field.cmd_rdy = { size_b=1, offset_b=1 , default={Cst=0}, description="PEM_ST cmd rdy"}
field.pem_regf_rd_req_vld = { size_b=1, offset_b=2 , default={Cst=0}, description="PEM_ST pem_regf_rd_req_vld"}
field.pem_regf_rd_req_rdy = { size_b=1, offset_b=3 , default={Cst=0}, description="PEM_ST pem_regf_rd_req_rdy"}
field.brsp_fifo_in_vld = { size_b=4, offset_b=4 , default={Cst=0}, description="PEM_ST brsp_fifo_in_vld"}
field.brsp_fifo_in_rdy = { size_b=4, offset_b=8 , default={Cst=0}, description="PEM_ST brsp_fifo_in_rdy"}
field.rcp_fifo_in_vld = { size_b=4, offset_b=12 , default={Cst=0}, description="PEM_ST rcp_fifo_in_vld"}
field.rcp_fifo_in_rdy = { size_b=4, offset_b=16 , default={Cst=0}, description="PEM_ST rcp_fifo_in_rdy"}
field.r2_axi_vld = { size_b=4, offset_b=20 , default={Cst=0}, description="PEM_ST r2_axi_vld"}
field.r2_axi_rdy = { size_b=4, offset_b=24 , default={Cst=0}, description="PEM_ST r2_axi_rdy"}
field.c0_enough_location = { size_b=4, offset_b=28 , default={Cst=0}, description="PEM_ST c0_enough_location"}
[section.runtime_1in3.register.pem_store_info_1]
description="PEM: store info 1"
owner="Kernel"
read_access="Read"
write_access="None"
field.s0_cmd_vld = { size_b=4, offset_b=0 , default={Cst=0}, description="PEM_ST s0_cmd_vld"}
field.s0_cmd_rdy = { size_b=4, offset_b=4 , default={Cst=0}, description="PEM_ST s0_cmd_rdy"}
field.m_axi_bvalid = { size_b=4, offset_b=8 , default={Cst=0}, description="PEM_ST m_axi_bvalid"}
field.m_axi_bready = { size_b=4, offset_b=12 , default={Cst=0}, description="PEM_ST m_axi_bready"}
field.m_axi_wvalid = { size_b=4, offset_b=16 , default={Cst=0}, description="PEM_ST m_axi_wvalid"}
field.m_axi_wready = { size_b=4, offset_b=20 , default={Cst=0}, description="PEM_ST m_axi_wready"}
field.m_axi_awvalid = { size_b=4, offset_b=24 , default={Cst=0}, description="PEM_ST m_axi_awvalid"}
field.m_axi_awready = { size_b=4, offset_b=28 , default={Cst=0}, description="PEM_ST m_axi_awready"}
[section.runtime_1in3.register.pem_store_info_2]
description="PEM: store info 2"
owner="Kernel"
read_access="Read"
write_access="None"
field.c0_free_loc_cnt = { size_b=16, offset_b=0 , default={Cst=0}, description="PEM_ST c0_free_loc_cnt"}
field.brsp_bresp_cnt = { size_b=16, offset_b=16 , default={Cst=0}, description="PEM_ST brsp_bresp_cnt"}
[section.runtime_1in3.register.pem_store_info_3]
description="PEM: store info 3"
owner="Kernel"
read_access="Read"
write_access="None"
field.brsp_ack_seen = { size_b=16, offset_b=0 , default={Cst=0}, description="PEM_ST brsp_ack_seen"}
field.c0_cmd_cnt = { size_b=8, offset_b=16 , default={Cst=0}, description="PEM_ST c0_cmd_cnt"}

View File

@@ -0,0 +1,100 @@
module_name="hpu_regif_core_prc_3in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x30000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_prc_3in3]
description="entry_prc_3in3 section with known value used for debug."
offset= 0x0
[section.entry_prc_3in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x04040404}
[section.entry_prc_3in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x14141414}
[section.entry_prc_3in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x24242424}
[section.entry_prc_3in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x34343434}
# =====================================================================================================================
[section.status_3in3]
description="HPU status of parts 2in3 and 3in3"
offset= 0x10
[section.status_3in3.register.error]
description="Error register (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.pbs = { size_b=32, offset_b=0 , default={Cst=0}, description="HPU error part 3in3"}
# =====================================================================================================================
[section.bsk_avail]
description="BSK availability configuration"
offset= 0x1000
[section.bsk_avail.register.avail]
description="BSK available bit"
owner="User"
read_access="Read"
write_access="Write"
field.avail = { size_b=1, offset_b=0 , default={Cst=0}, description="avail"}
[section.bsk_avail.register.reset]
description="BSK reset sequence"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.request = { size_b=1, offset_b=0 , default={Cst=0}, description="request"}
field.done = { size_b=1, offset_b=31 , default={Cst=0}, description="done"}
# =====================================================================================================================
[section.runtime_3in3]
description="Runtime information"
offset= 0x2000
[section.runtime_3in3.register.pep_load_bsk_rcp_dur]
description="PEP: load BSK slice reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_pc0","_pc1","_pc2","_pc3","_pc4","_pc5","_pc6","_pc7","_pc8","_pc9","_pc10","_pc11","_pc12","_pc13","_pc14","_pc15"]
[section.runtime_3in3.register.pep_bskif_req_info_0]
description="PEP: BSK_IF: requester info 0"
owner="Kernel"
read_access="Read"
write_access="None"
field.req_br_loop_rp = { size_b=16, offset_b=0 , default={Cst=0}, description="PEP BSK_IF requester BSK read pointer"}
field.req_br_loop_wp = { size_b=16, offset_b=16 , default={Cst=0}, description="PEP BSK_IF requester BSK write pointer"}
[section.runtime_3in3.register.pep_bskif_req_info_1]
description="PEP: BSK_IF: requester info 0"
owner="Kernel"
read_access="Read"
write_access="None"
field.req_prf_br_loop = { size_b=16, offset_b=0 , default={Cst=0}, description="PEP BSK_IF requester BSK prefetch pointer"}
field.req_parity = { size_b=1, offset_b=16 , default={Cst=0}, description="PEP BSK_IF requester BSK pointer parity"}
field.req_assigned = { size_b=1, offset_b=31 , default={Cst=0}, description="PEP BSK_IF requester assignment"}

View File

@@ -0,0 +1,22 @@
module_name="tb_hpu_regif_dummy"
description="Fake registers needed by the mockup"
word_size_b = 32
offset = 0x40000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# ==============================================================================
[section.WorkAck]
description="Purpose of this section"
[section.WorkAck.register.workq]
description="Insert work in workq and read status"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.WorkAck.register.ackq]
description="Pop ack from in ackq"
owner="Kernel"
read_access="ReadNotify"
write_access="None"

View File

@@ -0,0 +1,6 @@
# Fpga version
Built with the following command: (i.e. xrt/run_syn_hpu_msplit_3parts_64b.sh)
```
just zaxl-build hpu_msplit_3parts 3 "0:300" "-F TOP_MSPLIT TOP_MSPLIT_1 -F TOP_BATCH TOP_BATCH_TOPhpu_BPBS8_TPBS32 -F TOP_PCMAX TOP_PCMAX_pem2_glwe1_bsk8_ksk8 -F TOP_PC TOP_PC_pem2_glwe1_bsk4_ksk4 -F APPLICATION APPLI_msg2_carry2 -F NTT_MOD NTT_MOD_goldilocks -F NTT_CORE_ARCH NTT_CORE_ARCH_gf64 -F NTT_CORE_R_PSI NTT_CORE_R2_PSI16 -F NTT_CORE_RDX_CUT NTT_CORE_RDX_CUT_n5c5c1 -F NTT_CORE_DIV NTT_CORE_DIV_1 -F BSK_SLOT_CUT BSK_SLOT8_CUT4 -F KSK_SLOT_CUT KSK_SLOT8_CUT4 -F KSLB KSLB_x2y32z3 -F HPU_PART HPU_PART_gf64 -F AXI_DATA_W AXI_DATA_W_512" "1:${PROJECT_DIR}/hw/output/micro_code/ucore_fw.elf" 'D:MEMORY_FILE_PATH=\\\"${PROJECT_DIR}/hw/\\\"' | tee build_out.log
```

View File

@@ -0,0 +1,15 @@
# CUST_0
# Simple IOp to check the xfer between Hpu/Cpu
# Construct constant in dest slot -> 249 (0xf9)
SUB R0 R0 R0
ADDS R0 R0 1
ST TD[0].0 R0
SUB R1 R1 R1
ADDS R1 R1 2
ST TD[0].1 R1
SUB R2 R2 R2
ADDS R2 R2 3
ST TD[0].2 R2
SUB R3 R3 R3
ADDS R3 R3 3
ST TD[0].3 R3

View File

@@ -0,0 +1,11 @@
# CUST_1
# Simple IOp to check the xfer between Hpu/Cpu
# Dest <- Src_a
LD R0 TS[0].0
LD R1 TS[0].1
LD R2 TS[0].2
LD R3 TS[0].3
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,25 @@
; CUST_8
; Simple IOp to check the ALU operation
; Dst[0].0 <- Src[0].0 + Src[1].0
LD R1 TS[0].0
LD R2 TS[1].0
ADD R0 R1 R2
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 + Src[1].1
LD R5 TS[0].1
LD R6 TS[1].1
ADD R4 R5 R6
ST TD[0].2 R4
; Dst[0].2 <- Src[0].2 + Src[1].2
LD R9 TS[0].2
LD R10 TS[1].2
ADD R8 R9 R10
ST TD[0].2 R8
; Dst[0].3 <- Src[0].3 + Src[1].3
LD R13 TS[0].3
LD R14 TS[1].3
ADD R12 R13 R14
ST TD[0].3 R0

View File

@@ -0,0 +1,6 @@
# CUST_16
# Simple IOp to check PBS behavior
# Dest <- PBSNone(Src_a.0)
LD R0 TS[0].0
PBS_F R0 R0 PbsNone
ST TD[0].0 R0

View File

@@ -0,0 +1,15 @@
# CUST_17
# Simple IOp to check PBS behavior
# Dest <- PBSNone(Src_a)
LD R0 TS[0].0
PBS R0 R0 PbsNone
ST TD[0].0 R0
LD R1 TS[0].1
PBS R1 R1 PbsNone
ST TD[0].1 R1
LD R2 TS[0].2
PBS R2 R2 PbsNone
ST TD[0].2 R2
LD R3 TS[0].3
PBS_F R3 R3 PbsNone
ST TD[0].3 R3

View File

@@ -0,0 +1,23 @@
; CUST_18
; Simple IOp to check extraction pattern
; Correct result:
; * Dst[0,1] <- Src[0][0,1]
; * Dst[2,3] <- Src[1][0,1]
; Pack Src[0][0,1] with a Mac and extract Carry/Msg in Dst[0][0,1]
LD R0 TS[0].0
LD R1 TS[0].1
MAC R3 R1 R0 4
PBS R4 R3 PbsMsgOnly
PBS R5 R3 PbsCarryInMsg
ST TD[0].0 R4
ST TD[0].1 R5
; Pack Src[1][0,1] with a Mac and extract Carry/Msg in Dst[0][2,3]
LD R10 TS[1].0
LD R11 TS[1].1
MAC R13 R11 R10 4
PBS R14 R13 PbsMsgOnly
PBS R15 R13 PbsCarryInMsg
ST TD[0].2 R14
ST TD[0].3 R15

View File

@@ -0,0 +1,19 @@
; CUST_19
; Simple IOp to check PbsMl2
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- 0
; * Dst[0][2] <- Src[0][0] +1
; * Dst[0][3] <- 0
; i.e Cust_19(0x2) => 0x32
; Construct a 0 for destination padding
SUB R16 R16 R16
; Apply PbsMl2 on Src[0] result goes in dest[0][0-3] (0-padded)
LD R0 TS[0].0
PBS_ML2_F R0 R0 PbsTestMany2
ST TD[0].0 R0
ST TD[0].1 R16
ST TD[0].2 R1
ST TD[0].3 R16

View File

@@ -0,0 +1,11 @@
# CUST_2
# Simple IOp to check the xfer between Hpu/Cpu
# Dest <- Src_b
LD R0 TS[1].0
LD R1 TS[1].1
LD R2 TS[1].2
LD R3 TS[1].3
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,22 @@
; CUST_20
; Simple IOp to check PbsMl4
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- Src[0][0] +1
; * Dst[0][2] <- Src[0][0] +2
; * Dst[0][3] <- Src[0][0] +3
; i.e Cust_20(0x0) => 0xe4
SUB R16 R16 R16
ST TD[0].0 R0
ST TD[0].1 R0
ST TD[0].2 R0
ST TD[0].3 R0
; Apply PbsMl4 on Src[0] result goes in dest[0][0-3]
LD R0 TS[0].0
PBS_ML4_F R0 R0 PbsTestMany4
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,24 @@
; CUST_21
; Simple IOp to check PbsMl8
; WARN: This operation required 16b ct width
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- Src[0][0] +1
; * Dst[0][2] <- Src[0][0] +2
; * Dst[0][3] <- Src[0][0] +3
; * Dst[0][4] <- Src[0][0] +4
; * Dst[0][5] <- Src[0][0] +5
; * Dst[0][6] <- Src[0][0] +6
; * Dst[0][7] <- Src[0][0] +7
; Apply PbsMl8 on Src[0] result goes in dest[0][0-7]
LD R0 TS[0].0
PBS_ML8_F R0 R0 PbsTestMany8
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3
ST TD[0].4 R4
ST TD[0].5 R5
ST TD[0].6 R6
ST TD[0].7 R7

View File

@@ -0,0 +1,16 @@
# CUST_3
# Simple IOp to check isc behavior
# Generate obvious deps and check that isc correctly issued the dop
# Correct result must bu Dest <- Src[0]
LD R0 TS[0].0
LD R1 TS[0].1
LD R2 TS[0].2
LD R3 TS[0].3
PBS R4 R0 PbsNone
ST TD[0].0 R4
PBS R4 R1 PbsNone
ST TD[0].1 R4
PBS R4 R2 PbsNone
ST TD[0].2 R4
PBS_F R4 R3 PbsNone
ST TD[0].3 R4

View File

@@ -0,0 +1,19 @@
; CUST_8
; Simple IOp to check the ALU operation
; Dst[0].0 <- Src[0].0 + Src[1].0
LD R1 TS[0].0
LD R2 TS[1].0
ADD R0 R1 R2
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 - Src[1].1
LD R5 TS[0].1
LD R6 TS[1].1
SUB R4 R5 R6
ST TD[0].1 R4
; Dst[0].2 <- Src[0].2 + (Src[1].2 *4)
LD R9 TS[0].2
LD R10 TS[1].2
MAC R8 R9 R10 4
ST TD[0].2 R8

View File

@@ -0,0 +1,21 @@
; CUST_9
; Simple IOp to check the ALU Scalar operation
; Dst[0].0 <- Src[0].0 + Imm[0].0
LD R1 TS[0].0
ADDS R0 R1 TI[0].0
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 - Imm[0].1
LD R5 TS[0].1
SUBS R4 R5 TI[0].1
ST TD[0].1 R4
; Dst[0].2 <- Imm[0].2 - Src[0].2
LD R9 TS[0].2
SSUB R8 R9 TI[0].2
ST TD[0].2 R8
; Dst[0].3 <- Src[0].3 * Imm[0].3
LD R13 TS[0].3
MULS R12 R13 TI[0].3
ST TD[0].3 R12

View File

@@ -0,0 +1,98 @@
[fpga]
regmap=["${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core.toml"]
polling_us=10
[fpga.ffi.Xrt]
id= 0
kernel= "hpu_msplit_3parts_1in3"
xclbin="${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_msplit_3parts.xclbin"
[rtl]
bpip_use = true
bpip_use_opportunism = true
bpip_timeout = 100_000
[board]
ct_mem = 4096
ct_pc = [
{Hbm= {pc=10}},
{Hbm= {pc=11}},
]
heap_size = 3584
lut_mem = 256
lut_pc = {Hbm={pc=12}}
fw_size= 65536
fw_pc = {Hbm={pc=1}}
bsk_pc = [
{Hbm={pc=2}},
{Hbm={pc=3}},
{Hbm={pc=4}},
{Hbm={pc=5}},
{Hbm={pc=6}},
{Hbm={pc=7}},
{Hbm={pc=8}},
{Hbm={pc=9}}
]
ksk_pc = [
{Hbm={pc=24}},
{Hbm={pc=25}},
{Hbm={pc=26}},
{Hbm={pc=27}},
{Hbm={pc=28}},
{Hbm={pc=29}},
{Hbm={pc=30}},
{Hbm={pc=31}}
]
trace_pc = {Hbm={pc=0}}
trace_depth = 4 # In MB
[firmware]
implementation = "Llt"
integer_w=[4,6,8,10,12,14,16,32,64,128]
min_batch_size = 6
kogge_cfg = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/kogge_cfg.toml"
custom_iop.'IOP[0]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_0.asm"
custom_iop.'IOP[1]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_1.asm"
custom_iop.'IOP[2]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_2.asm"
custom_iop.'IOP[3]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_3.asm"
custom_iop.'IOP[8]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_8.asm"
custom_iop.'IOP[9]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_9.asm"
custom_iop.'IOP[16]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_16.asm"
custom_iop.'IOP[17]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_17.asm"
custom_iop.'IOP[18]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_18.asm"
custom_iop.'IOP[19]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_19.asm"
custom_iop.'IOP[20]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_20.asm"
custom_iop.'IOP[21]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_21.asm"
[firmware.op_cfg.default]
fill_batch_fifo = true
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.MUL]
fill_batch_fifo = false
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.MULS]
fill_batch_fifo = false
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.ERC_20]
fill_batch_fifo = false
min_batch_size = true
use_tiers = true
flush_behaviour = "Patient"
flush = true

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:35ad67cf9760e37256a6c92cf29ea67334690b724fd3b7b859919ee9b0bde6d3
size 78194785

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,622 @@
# This is a sample example of register-map definition
module_name="hpu_regif_core"
description="Hpu top-level register interface. Used by the host to retrieved RTL information, configure it and issue commands."
word_size_b = 32
offset = 0x00
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.Xrt]
description="Vitis Required registers"
offset= 0x0
# Currently not in used -> Placeholder only
[section.Xrt.register.reserved]
description="Xrt reserved"
default={Cst=0x00}
owner="User"
read_access="Read"
write_access="Write"
# =====================================================================================================================
[section.info]
description="Contain all the RTL parameters used that have impact on associated SW"
offset= 0x10
[section.info.register.version]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="VERSION"}
[section.info.register.ntt_architecture]
description="NTT architecture"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="NTT_CORE_ARCH"}
[section.info.register.ntt_structure]
description="NTT structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.radix = { size_b=8, offset_b=0 , default={Param="R"}, description="NTT radix"}
field.psi = { size_b=8, offset_b=8 , default={Param="PSI"}, description="NTT psi"}
field.div = { size_b=8, offset_b=16, default={Param="BWD_PSI_DIV"}, description="NTT backward div"}
field.delta = { size_b=8, offset_b=24, default={Param="DELTA"}, description="NTT network delta (for wmm arch)"}
[section.info.register.ntt_rdx_cut]
description="NTT radix cuts, in log2 unit (for gf64 arch)"
owner="Parameter"
read_access="Read"
write_access="None"
field.radix_cut0 = { size_b=4, offset_b=0 , default={Param="NTT_RDX_CUT_S_0"}, description="NTT radix cut #0"}
field.radix_cut1 = { size_b=4, offset_b=4 , default={Param="NTT_RDX_CUT_S_1"}, description="NTT radix cut #1"}
field.radix_cut2 = { size_b=4, offset_b=8 , default={Param="NTT_RDX_CUT_S_2"}, description="NTT radix cut #2"}
field.radix_cut3 = { size_b=4, offset_b=12, default={Param="NTT_RDX_CUT_S_3"}, description="NTT radix cut #3"}
field.radix_cut4 = { size_b=4, offset_b=16, default={Param="NTT_RDX_CUT_S_4"}, description="NTT radix cut #4"}
field.radix_cut5 = { size_b=4, offset_b=20, default={Param="NTT_RDX_CUT_S_5"}, description="NTT radix cut #5"}
field.radix_cut6 = { size_b=4, offset_b=24, default={Param="NTT_RDX_CUT_S_6"}, description="NTT radix cut #6"}
field.radix_cut7 = { size_b=4, offset_b=28, default={Param="NTT_RDX_CUT_S_7"}, description="NTT radix cut #7"}
[section.info.register.ntt_pbs]
description="Maximum number of PBS in the NTT pipeline"
owner="Parameter"
read_access="Read"
write_access="None"
field.batch_pbs_nb = { size_b=8, offset_b=0 , default={Param="BATCH_PBS_NB"}, description="Maximum number of PBS in the NTT pipe"}
field.total_pbs_nb = { size_b=8, offset_b=8 , default={Param="TOTAL_PBS_NB"}, description="Maximum number of PBS stored in PEP buffer"}
[section.info.register.ntt_modulo]
description="Code associated to the NTT prime"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="MOD_NTT_NAME"}
[section.info.register.application]
description="Code associated with the application"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="APPLICATION_NAME"}
[section.info.register.ks_structure]
description="Key-switch structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.x = { size_b=8, offset_b=0 , default={Param="LBX"}, description="Number of coefficients on X dimension"}
field.y = { size_b=8, offset_b=8 , default={Param="LBY"}, description="Number of coefficients on Y dimension"}
field.z = { size_b=8, offset_b=16, default={Param="LBZ"}, description="Number of coefficients on Z dimension"}
[section.info.register.ks_crypto_param]
description="Key-switch crypto parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.mod_ksk_w = { size_b=8, offset_b=0 , default={Param="MOD_KSK_W"}, description="Width of KSK modulo"}
field.ks_l = { size_b=8, offset_b=8 , default={Param="KS_L"}, description="Number of KS decomposition level"}
field.ks_b = { size_b=8, offset_b=16, default={Param="KS_B_W"}, description="Width of KS decomposition base"}
[section.info.register.regf_structure]
description="Register file structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.reg_nb = { size_b=8, offset_b=0 , default={Param="REGF_REG_NB"}, description="Number of registers in regfile"}
field.coef_nb = { size_b=8, offset_b=8 , default={Param="REGF_COEF_NB"}, description="Number of coefficients at regfile interface"}
[section.info.register.isc_structure]
description="Instruction scheduler structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.depth = { size_b=8, offset_b=0 , default={Param="ISC_DEPTH"}, description="Number of slots in ISC lookahead buffer."}
field.min_iop_size = { size_b=8, offset_b=8 , default={Param="MIN_IOP_SIZE"}, description="Minimum number of DOp per IOp to prevent sync_id overflow."}
[section.info.register.pe_properties]
description="Processing elements parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.alu_nb = { size_b=8, offset_b=24 , default={Param="PEA_ALU_NB"}, description="Number of coefficients processed in parallel in pe_alu"}
field.pep_regf_period = { size_b=8, offset_b=16 , default={Param="PEP_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEP and regfile"}
field.pem_regf_period = { size_b=8, offset_b=8 , default={Param="PEM_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEM and regfile"}
field.pea_regf_period = { size_b=8, offset_b=0 , default={Param="PEA_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEA and regfile"}
[section.info.register.bsk_structure]
description="BSK manager structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.bsk_cut_nb = { size_b=8, offset_b=8 , default={Param="BSK_CUT_NB"}, description="BSK cut nb"}
[section.info.register.ksk_structure]
description="KSK manager structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.ksk_cut_nb = { size_b=8, offset_b=8 , default={Param="KSK_CUT_NB"}, description="KSK cut nb"}
[section.info.register.hbm_axi4_nb]
description="Number of AXI4 connections to HBM"
owner="Parameter"
read_access="Read"
write_access="None"
field.bsk_pc = { size_b=8, offset_b=0 , default={Param="BSK_PC"}, description="Number of HBM connections for BSK"}
field.ksk_pc = { size_b=8, offset_b=8, default={Param="KSK_PC"}, description="Number of HBM connections for KSK"}
field.pem_pc = { size_b=8, offset_b=16, default={Param="PEM_PC"}, description="Number of HBM connections for ciphertexts (PEM)"}
field.glwe_pc = { size_b=8, offset_b=24, default={Param="GLWE_PC"}, description="Number of HBM connections for GLWE"}
[section.info.register.hbm_axi4_dataw_pem]
description="Ciphertext HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_PEM_DATA_W"}
[section.info.register.hbm_axi4_dataw_glwe]
description="GLWE HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_GLWE_DATA_W"}
[section.info.register.hbm_axi4_dataw_bsk]
description="BSK HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_BSK_DATA_W"}
[section.info.register.hbm_axi4_dataw_ksk]
description="KSK HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_KSK_DATA_W"}
# =====================================================================================================================
[section.bpip]
offset= 0x200
description="BPIP configuration"
[section.bpip.register.use]
description="(1) Use BPIP mode, (0) use IPIP mode (default)"
owner="User"
read_access="Read"
write_access="Write"
field.use_bpip = { size_b=1, offset_b=0 , default={Cst=1}, description="use"}
field.use_opportunism = { size_b=1, offset_b=1 , default={Cst=0}, description="use opportunistic PBS flush"}
[section.bpip.register.timeout]
description="Timeout for BPIP mode"
owner="User"
read_access="Read"
write_access="Write"
default={Cst=0xffffffff}
# =====================================================================================================================
[section.hbm_axi4_addr_1in3]
offset= 0x400
description="HBM AXI4 connection address offset"
[section.hbm_axi4_addr_1in3.register.ct]
description="Address offset for each ciphertext HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb","_pc1_lsb", "_pc1_msb"]
[section.hbm_axi4_addr_1in3.register.glwe]
description="Address offset for each GLWE HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb"]
[section.hbm_axi4_addr_1in3.register.ksk]
description="Address offset for each KSK HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb", "_pc1_lsb", "_pc1_msb", "_pc2_lsb", "_pc2_msb", "_pc3_lsb", "_pc3_msb", "_pc4_lsb", "_pc4_msb", "_pc5_lsb", "_pc5_msb", "_pc6_lsb", "_pc6_msb", "_pc7_lsb", "_pc7_msb"]
[section.hbm_axi4_addr_1in3.register.trc]
description="Address offset for each trace HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb"]
# =====================================================================================================================
[section.hbm_axi4_addr_3in3]
description="HBM AXI4 connection address offset"
[section.hbm_axi4_addr_3in3.register.bsk]
description="Address offset for each BSK HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb", "_pc1_lsb", "_pc1_msb", "_pc2_lsb", "_pc2_msb", "_pc3_lsb", "_pc3_msb", "_pc4_lsb", "_pc4_msb", "_pc5_lsb", "_pc5_msb", "_pc6_lsb", "_pc6_msb", "_pc7_lsb", "_pc7_msb"]
# =====================================================================================================================
[section.status_1in3]
description="HPU status of part 1in3"
offset= 0x800
[section.status_1in3.register.error]
description="Error register (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.pbs = { size_b=32, offset_b=0 , default={Cst=0}, description="HPU error part 1in3"}
# =====================================================================================================================
[section.status_3in3]
description="HPU status of parts 2in3 and 3in3"
[section.status_3in3.register.error]
description="Error register (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.pbs = { size_b=32, offset_b=0 , default={Cst=0}, description="HPU error part 3in3"}
# =====================================================================================================================
[section.ksk_avail]
description="KSK availability configuration"
offset= 0x1000
[section.ksk_avail.register.avail]
description="KSK available bit"
owner="User"
read_access="Read"
write_access="Write"
field.avail = { size_b=1, offset_b=0 , default={Cst=0}, description="avail"}
[section.ksk_avail.register.reset]
description="KSK reset sequence"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.request = { size_b=1, offset_b=0 , default={Cst=0}, description="request"}
field.done = { size_b=1, offset_b=31 , default={Cst=0}, description="done"}
# =====================================================================================================================
[section.bsk_avail]
description="BSK availability configuration"
[section.bsk_avail.register.avail]
description="BSK available bit"
owner="User"
read_access="Read"
write_access="Write"
field.avail = { size_b=1, offset_b=0 , default={Cst=0}, description="avail"}
[section.bsk_avail.register.reset]
description="BSK reset sequence"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.request = { size_b=1, offset_b=0 , default={Cst=0}, description="request"}
field.done = { size_b=1, offset_b=31 , default={Cst=0}, description="done"}
# =====================================================================================================================
[section.runtime_1in3]
description="Runtime information"
offset= 0x2000
[section.runtime_1in3.register.pep_cmux_loop]
description="PEP: CMUX iteration loop number"
owner="Kernel"
read_access="Read"
write_access="None"
field.br_loop = { size_b=15, offset_b=0 , default={Cst=0}, description="PBS current BR-loop"}
field.br_loop_c = { size_b=1, offset_b=15 , default={Cst=0}, description="PBS current BR-loop parity"}
field.ks_loop = { size_b=15, offset_b=16 , default={Cst=0}, description="KS current KS-loop"}
field.ks_loop_c = { size_b=1, offset_b=31 , default={Cst=0}, description="KS current KS-loop parity"}
[section.runtime_1in3.register.pep_pointer_0]
description="PEP: pointers (part 1)"
owner="Kernel"
read_access="Read"
write_access="None"
field.pool_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP pool_rp"}
field.pool_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP pool_wp"}
field.ldg_pt = { size_b=8, offset_b=16 , default={Cst=0}, description="PEP ldg_pt"}
field.ldb_pt = { size_b=8, offset_b=24 , default={Cst=0}, description="PEP ldb_pt"}
[section.runtime_1in3.register.pep_pointer_1]
description="PEP: pointers (part 2)"
owner="Kernel"
read_access="Read"
write_access="None"
field.ks_in_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP ks_in_rp"}
field.ks_in_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP ks_in_wp"}
field.ks_out_rp = { size_b=8, offset_b=16 , default={Cst=0}, description="PEP ks_out_rp"}
field.ks_out_wp = { size_b=8, offset_b=24 , default={Cst=0}, description="PEP ks_out_wp"}
[section.runtime_1in3.register.pep_pointer_2]
description="PEP: pointers (part 3)"
owner="Kernel"
read_access="Read"
write_access="None"
field.pbs_in_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP pbs_in_rp"}
field.pbs_in_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP pbs_in_wp"}
field.ipip_flush_last_pbs_in_loop = { size_b=16, offset_b=16 , default={Cst=0}, description="PEP IPIP flush last pbs_in_loop"}
[section.runtime_1in3.register.isc_latest_instruction]
description="ISC: 4 latest instructions received ([0] is the most recent)"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_0","_1","_2","_3"]
[section.runtime_1in3.register.pep_seq_bpip_batch_cnt]
description="PEP: BPIP batch counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_flush_cnt]
description="PEP: BPIP batch triggered by a flush counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_timeout_cnt]
description="PEP: BPIP batch triggered by a timeout counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_waiting_batch_cnt]
description="PEP: BPIP batch that waits the trigger counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_filling_cnt]
description="PEP: Count batch with filled with a given number of CT (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_1","_2","_3","_4","_5","_6","_7","_8","_9","_10","_11","_12","_13","_14","_15","_16"]
[section.runtime_1in3.register.pep_seq_ld_ack_cnt]
description="PEP: load BLWE ack counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_cmux_not_full_batch_cnt]
description="PEP: not full batch CMUX counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_ipip_flush_cnt]
description="PEP: IPIP flush CMUX counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldb_rcp_dur]
description="PEP: load BLWE reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldg_req_dur]
description="PEP: load GLWE request max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldg_rcp_dur]
description="PEP: load GLWE reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_load_ksk_rcp_dur]
description="PEP: load KSK slice reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_pc0","_pc1","_pc2","_pc3","_pc4","_pc5","_pc6","_pc7","_pc8","_pc9","_pc10","_pc11","_pc12","_pc13","_pc14","_pc15"]
[section.runtime_1in3.register.pep_mmacc_sxt_rcp_dur]
description="PEP: MMACC SXT reception duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_mmacc_sxt_req_dur]
description="PEP: MMACC SXT request duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_mmacc_sxt_cmd_wait_b_dur]
description="PEP: MMACC SXT command wait for b duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_inst_cnt]
description="PEP: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ack_cnt]
description="PEP: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_inst_cnt]
description="PEM: load input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_ack_cnt]
description="PEM: load instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_store_inst_cnt]
description="PEM: store input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_store_ack_cnt]
description="PEM: store instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pea_inst_cnt]
description="PEA: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pea_ack_cnt]
description="PEA: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.isc_inst_cnt]
description="ISC: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.isc_ack_cnt]
description="ISC: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_info_0]
description="PEM: load first data)"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_pc0_0","_pc0_1","_pc0_2","_pc0_3","_pc1_0","_pc1_1","_pc1_2","_pc1_3"]
[section.runtime_1in3.register.pem_load_info_1]
description="PEM: load first address"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_pc0_lsb","_pc0_msb","_pc1_lsb","_pc1_msb"]
[section.runtime_1in3.register.pem_store_info_0]
description="PEM: store info 0)"
owner="Kernel"
read_access="Read"
write_access="None"
field.cmd_vld = { size_b=1, offset_b=0 , default={Cst=0}, description="PEM_ST cmd vld"}
field.cmd_rdy = { size_b=1, offset_b=1 , default={Cst=0}, description="PEM_ST cmd rdy"}
field.pem_regf_rd_req_vld = { size_b=1, offset_b=2 , default={Cst=0}, description="PEM_ST pem_regf_rd_req_vld"}
field.pem_regf_rd_req_rdy = { size_b=1, offset_b=3 , default={Cst=0}, description="PEM_ST pem_regf_rd_req_rdy"}
field.brsp_fifo_in_vld = { size_b=4, offset_b=4 , default={Cst=0}, description="PEM_ST brsp_fifo_in_vld"}
field.brsp_fifo_in_rdy = { size_b=4, offset_b=8 , default={Cst=0}, description="PEM_ST brsp_fifo_in_rdy"}
field.rcp_fifo_in_vld = { size_b=4, offset_b=12 , default={Cst=0}, description="PEM_ST rcp_fifo_in_vld"}
field.rcp_fifo_in_rdy = { size_b=4, offset_b=16 , default={Cst=0}, description="PEM_ST rcp_fifo_in_rdy"}
field.r2_axi_vld = { size_b=4, offset_b=20 , default={Cst=0}, description="PEM_ST r2_axi_vld"}
field.r2_axi_rdy = { size_b=4, offset_b=24 , default={Cst=0}, description="PEM_ST r2_axi_rdy"}
field.c0_enough_location = { size_b=4, offset_b=28 , default={Cst=0}, description="PEM_ST c0_enough_location"}
[section.runtime_1in3.register.pem_store_info_1]
description="PEM: store info 1"
owner="Kernel"
read_access="Read"
write_access="None"
field.s0_cmd_vld = { size_b=4, offset_b=0 , default={Cst=0}, description="PEM_ST s0_cmd_vld"}
field.s0_cmd_rdy = { size_b=4, offset_b=4 , default={Cst=0}, description="PEM_ST s0_cmd_rdy"}
field.m_axi_bvalid = { size_b=4, offset_b=8 , default={Cst=0}, description="PEM_ST m_axi_bvalid"}
field.m_axi_bready = { size_b=4, offset_b=12 , default={Cst=0}, description="PEM_ST m_axi_bready"}
field.m_axi_wvalid = { size_b=4, offset_b=16 , default={Cst=0}, description="PEM_ST m_axi_wvalid"}
field.m_axi_wready = { size_b=4, offset_b=20 , default={Cst=0}, description="PEM_ST m_axi_wready"}
field.m_axi_awvalid = { size_b=4, offset_b=24 , default={Cst=0}, description="PEM_ST m_axi_awvalid"}
field.m_axi_awready = { size_b=4, offset_b=28 , default={Cst=0}, description="PEM_ST m_axi_awready"}
[section.runtime_1in3.register.pem_store_info_2]
description="PEM: store info 2"
owner="Kernel"
read_access="Read"
write_access="None"
field.c0_free_loc_cnt = { size_b=16, offset_b=0 , default={Cst=0}, description="PEM_ST c0_free_loc_cnt"}
field.brsp_bresp_cnt = { size_b=16, offset_b=16 , default={Cst=0}, description="PEM_ST brsp_bresp_cnt"}
[section.runtime_1in3.register.pem_store_info_3]
description="PEM: store info 3"
owner="Kernel"
read_access="Read"
write_access="None"
field.brsp_ack_seen = { size_b=16, offset_b=0 , default={Cst=0}, description="PEM_ST brsp_ack_seen"}
field.c0_cmd_cnt = { size_b=8, offset_b=16 , default={Cst=0}, description="PEM_ST c0_cmd_cnt"}
# =====================================================================================================================
[section.runtime_3in3]
description="Runtime information"
[section.runtime_3in3.register.pep_load_bsk_rcp_dur]
description="PEP: load BSK slice reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_pc0","_pc1","_pc2","_pc3","_pc4","_pc5","_pc6","_pc7","_pc8","_pc9","_pc10","_pc11","_pc12","_pc13","_pc14","_pc15"]
[section.runtime_3in3.register.pep_bskif_req_info_0]
description="PEP: BSK_IF: requester info 0"
owner="Kernel"
read_access="Read"
write_access="None"
field.req_br_loop_rp = { size_b=16, offset_b=0 , default={Cst=0}, description="PEP BSK_IF requester BSK read pointer"}
field.req_br_loop_wp = { size_b=16, offset_b=16 , default={Cst=0}, description="PEP BSK_IF requester BSK write pointer"}
[section.runtime_3in3.register.pep_bskif_req_info_1]
description="PEP: BSK_IF: requester info 0"
owner="Kernel"
read_access="Read"
write_access="None"
field.req_prf_br_loop = { size_b=16, offset_b=0 , default={Cst=0}, description="PEP BSK_IF requester BSK prefetch pointer"}
field.req_parity = { size_b=1, offset_b=16 , default={Cst=0}, description="PEP BSK_IF requester BSK pointer parity"}
field.req_assigned = { size_b=1, offset_b=31 , default={Cst=0}, description="PEP BSK_IF requester assignment"}
# =====================================================================================================================
[section.WorkAck]
description="Purpose of this section"
offset= 0x8000
[section.WorkAck.register.workq]
description="Insert work in workq and read status"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.WorkAck.register.ackq]
description="Pop ack from in ackq"
owner="Kernel"
read_access="ReadNotify"
write_access="None"

View File

@@ -0,0 +1,74 @@
NB: Versal don't have the pdi embedded in the configuration. Instead user is in charge of pdi upload in FPGA flash.
Thus, a given configuration could works on multiple pdi.
# Fpga version @250MHz
This configuration as based on the following Fpga commit:
```
commit ad668f931eff0c281a0848d43360da0b8813539a (HEAD -> dev/hpu_v80, origin/dev/hpu_v80, origin/baroux/dev/hpu_v80, baroux/dev/hpu_v80)
Merge: 1489024a f308f067
Author: Baptiste Roux <baptiste.roux@zama.ai>
Date: Fri Feb 14 19:02:53 2025 +0100
[MERGE] 'dev/hpu' into baroux/dev/hpu_v80
Retrieved CI bugfix from dev/hpu
```
Tagged as `aved_v1.0`
Built with the following command: (i.e. versal/run_syn_hpu_msplit_3parts_psi32.sh)
```
TOP=top_hpu_assembly
TOP_MSPLIT=TOP_MSPLIT_1
TOP_BATCH=TOP_BATCH_TOPhpu_BPBS12_TPBS32
TOP_PCMAX=TOP_PCMAX_pem2_glwe1_bsk16_ksk16
TOP_PC=TOP_PC_pem2_glwe1_bsk8_ksk16
APPLICATION=APPLI_msg2_carry2_pfail64_132b_gaussian_1f72dba
NTT_MOD=NTT_MOD_goldilocks
NTT_CORE_ARCH=NTT_CORE_ARCH_gf64
NTT_CORE_R_PSI=NTT_CORE_R2_PSI32
NTT_CORE_RDX_CUT=NTT_CORE_RDX_CUT_n5c6
NTT_CORE_DIV=NTT_CORE_DIV_1
BSK_SLOT_CUT=BSK_SLOT8_CUT8
KSK_SLOT_CUT=KSK_SLOT8_CUT16
KSLB=KSLB_x3y64z3
HPU_PART=HPU_PART_gf64
AXI_DATA_W=AXI_DATA_W_256
FPGA=FPGA_v80
just build $TOP new "-F TOP_MSPLIT $TOP_MSPLIT -F TOP_BATCH $TOP_BATCH -F TOP_PCMAX $TOP_PCMAX -F TOP_PC $TOP_PC -F APPLICATION $APPLICATION -F NTT_MOD $NTT_MOD -F NTT_CORE_ARCH $NTT_CORE_ARCH -F NTT_CORE_R_PSI $NTT_CORE_R_PSI -F NTT_CORE_RDX_CUT $NTT_CORE_RDX_CUT -F NTT_CORE_DIV $NTT_CORE_DIV -F BSK_SLOT_CUT $BSK_SLOT_CUT -F KSK_SLOT_CUT $KSK_SLOT_CUT -F KSLB $KSLB -F HPU_PART $HPU_PART -F AXI_DATA_W $AXI_DATA_W -F FPGA $FPGA" | tee build_out.log
```
# Fpga version @350MHz
This configuration as based on the following Fpga commit:
```
commit d29dbeaccf09adfe0ee13e326f4633e14726b020 (HEAD -> baroux/dev/hpu_v80_2024.2, origin/baroux/dev/hpu_v80_2024.2)
Author: pgardratzama <pierre.gardrat@zama.ai>
Date: Tue Feb 11 16:12:10 2025 +0100
adds script to synthetize HPU 1 part PSI32
```
Mainly the that commit as above with flow modification from Pierre Gardrat to support Vivado 2024.2.
NB: Based on unofficial branch and thus not tagged
Built with the following command: (i.e. versal/run_syn_hpu_1part_psi32.sh)
```
TOP=fpga_top_hpu
TOP_MSPLIT=TOP_MSPLIT_1
TOP_BATCH=TOP_BATCH_TOPhpu_BPBS12_TPBS32
TOP_PCMAX=TOP_PCMAX_pem2_glwe1_bsk16_ksk16
TOP_PC=TOP_PC_pem2_glwe1_bsk8_ksk16
APPLICATION=APPLI_msg2_carry2_pfail64_132b_gaussian_1f72dba
NTT_MOD=NTT_MOD_goldilocks
NTT_CORE_ARCH=NTT_CORE_ARCH_gf64
NTT_CORE_R_PSI=NTT_CORE_R2_PSI32
NTT_CORE_RDX_CUT=NTT_CORE_RDX_CUT_n5c6
NTT_CORE_DIV=NTT_CORE_DIV_1
BSK_SLOT_CUT=BSK_SLOT8_CUT8
KSK_SLOT_CUT=KSK_SLOT8_CUT16
KSLB=KSLB_x3y64z3
HPU_PART=HPU_PART_gf64
AXI_DATA_W=AXI_DATA_W_256
FPGA=FPGA_v80
just build $TOP new "-F TOP_MSPLIT $TOP_MSPLIT -F TOP_BATCH $TOP_BATCH -F TOP_PCMAX $TOP_PCMAX -F TOP_PC $TOP_PC -F APPLICATION $APPLICATION -F NTT_MOD $NTT_MOD -F NTT_CORE_ARCH $NTT_CORE_ARCH -F NTT_CORE_R_PSI $NTT_CORE_R_PSI -F NTT_CORE_RDX_CUT $NTT_CORE_RDX_CUT -F NTT_CORE_DIV $NTT_CORE_DIV -F BSK_SLOT_CUT $BSK_SLOT_CUT -F KSK_SLOT_CUT $KSK_SLOT_CUT -F KSLB $KSLB -F HPU_PART $HPU_PART -F AXI_DATA_W $AXI_DATA_W -F FPGA $FPGA" | tee build_out.log
```

View File

@@ -0,0 +1,15 @@
# CUST_0
# Simple IOp to check the xfer between Hpu/Cpu
# Construct constant in dest slot -> 249 (0xf9)
SUB R0 R0 R0
ADDS R0 R0 1
ST TD[0].0 R0
SUB R1 R1 R1
ADDS R1 R1 2
ST TD[0].1 R1
SUB R2 R2 R2
ADDS R2 R2 3
ST TD[0].2 R2
SUB R3 R3 R3
ADDS R3 R3 3
ST TD[0].3 R3

View File

@@ -0,0 +1,11 @@
# CUST_1
# Simple IOp to check the xfer between Hpu/Cpu
# Dest <- Src_a
LD R0 TS[0].0
LD R1 TS[0].1
LD R2 TS[0].2
LD R3 TS[0].3
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,25 @@
; CUST_8
; Simple IOp to check the ALU operation
; Dst[0].0 <- Src[0].0 + Src[1].0
LD R1 TS[0].0
LD R2 TS[1].0
ADD R0 R1 R2
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 + Src[1].1
LD R5 TS[0].1
LD R6 TS[1].1
ADD R4 R5 R6
ST TD[0].2 R4
; Dst[0].2 <- Src[0].2 + Src[1].2
LD R9 TS[0].2
LD R10 TS[1].2
ADD R8 R9 R10
ST TD[0].2 R8
; Dst[0].3 <- Src[0].3 + Src[1].3
LD R13 TS[0].3
LD R14 TS[1].3
ADD R12 R13 R14
ST TD[0].3 R0

View File

@@ -0,0 +1,6 @@
# CUST_16
# Simple IOp to check PBS behavior
# Dest <- PBSNone(Src_a.0)
LD R0 TS[0].0
PBS_F R0 R0 PbsNone
ST TD[0].0 R0

View File

@@ -0,0 +1,15 @@
# CUST_17
# Simple IOp to check PBS behavior
# Dest <- PBSNone(Src_a)
LD R0 TS[0].0
PBS R0 R0 PbsNone
ST TD[0].0 R0
LD R1 TS[0].1
PBS R1 R1 PbsNone
ST TD[0].1 R1
LD R2 TS[0].2
PBS R2 R2 PbsNone
ST TD[0].2 R2
LD R3 TS[0].3
PBS_F R3 R3 PbsNone
ST TD[0].3 R3

View File

@@ -0,0 +1,23 @@
; CUST_18
; Simple IOp to check extraction pattern
; Correct result:
; * Dst[0,1] <- Src[0][0,1]
; * Dst[2,3] <- Src[1][0,1]
; Pack Src[0][0,1] with a Mac and extract Carry/Msg in Dst[0][0,1]
LD R0 TS[0].0
LD R1 TS[0].1
MAC R3 R1 R0 4
PBS R4 R3 PbsMsgOnly
PBS R5 R3 PbsCarryInMsg
ST TD[0].0 R4
ST TD[0].1 R5
; Pack Src[1][0,1] with a Mac and extract Carry/Msg in Dst[0][2,3]
LD R10 TS[1].0
LD R11 TS[1].1
MAC R13 R11 R10 4
PBS R14 R13 PbsMsgOnly
PBS R15 R13 PbsCarryInMsg
ST TD[0].2 R14
ST TD[0].3 R15

View File

@@ -0,0 +1,19 @@
; CUST_19
; Simple IOp to check PbsMl2
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- 0
; * Dst[0][2] <- Src[0][0] +1
; * Dst[0][3] <- 0
; i.e Cust_19(0x2) => 0x32
; Construct a 0 for destination padding
SUB R16 R16 R16
; Apply PbsMl2 on Src[0] result goes in dest[0][0-3] (0-padded)
LD R0 TS[0].0
PBS_ML2_F R0 R0 PbsTestMany2
ST TD[0].0 R0
ST TD[0].1 R16
ST TD[0].2 R1
ST TD[0].3 R16

View File

@@ -0,0 +1,11 @@
# CUST_2
# Simple IOp to check the xfer between Hpu/Cpu
# Dest <- Src_b
LD R0 TS[1].0
LD R1 TS[1].1
LD R2 TS[1].2
LD R3 TS[1].3
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,22 @@
; CUST_20
; Simple IOp to check PbsMl4
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- Src[0][0] +1
; * Dst[0][2] <- Src[0][0] +2
; * Dst[0][3] <- Src[0][0] +3
; i.e Cust_20(0x0) => 0xe4
SUB R16 R16 R16
ST TD[0].0 R0
ST TD[0].1 R0
ST TD[0].2 R0
ST TD[0].3 R0
; Apply PbsMl4 on Src[0] result goes in dest[0][0-3]
LD R0 TS[0].0
PBS_ML4_F R0 R0 PbsTestMany4
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3

View File

@@ -0,0 +1,24 @@
; CUST_21
; Simple IOp to check PbsMl8
; WARN: This operation required 16b ct width
; Correct result:
; * Dst[0][0] <- Src[0][0]
; * Dst[0][1] <- Src[0][0] +1
; * Dst[0][2] <- Src[0][0] +2
; * Dst[0][3] <- Src[0][0] +3
; * Dst[0][4] <- Src[0][0] +4
; * Dst[0][5] <- Src[0][0] +5
; * Dst[0][6] <- Src[0][0] +6
; * Dst[0][7] <- Src[0][0] +7
; Apply PbsMl8 on Src[0] result goes in dest[0][0-7]
LD R0 TS[0].0
PBS_ML8_F R0 R0 PbsTestMany8
ST TD[0].0 R0
ST TD[0].1 R1
ST TD[0].2 R2
ST TD[0].3 R3
ST TD[0].4 R4
ST TD[0].5 R5
ST TD[0].6 R6
ST TD[0].7 R7

View File

@@ -0,0 +1,16 @@
# CUST_3
# Simple IOp to check isc behavior
# Generate obvious deps and check that isc correctly issued the dop
# Correct result must bu Dest <- Src[0]
LD R0 TS[0].0
LD R1 TS[0].1
LD R2 TS[0].2
LD R3 TS[0].3
PBS R4 R0 PbsNone
ST TD[0].0 R4
PBS R4 R1 PbsNone
ST TD[0].1 R4
PBS R4 R2 PbsNone
ST TD[0].2 R4
PBS_F R4 R3 PbsNone
ST TD[0].3 R4

View File

@@ -0,0 +1,264 @@
# CUST_4
# Just to check if this batch times out
LD R0 TS[0].31
LD R1 TS[1].31
LD R3 TS[0].27
LD R4 TS[1].27
LD R6 TS[0].30
LD R7 TS[1].30
LD R9 TS[0].28
LD R10 TS[1].28
LD R12 TS[0].29
LD R13 TS[1].29
LD R15 TS[0].23
LD R16 TS[1].23
LD R18 TS[0].26
LD R19 TS[1].26
LD R21 TS[0].24
LD R22 TS[1].24
LD R24 TS[0].20
LD R25 TS[1].20
LD R27 TS[0].13
LD R28 TS[1].13
LD R30 TS[0].25
LD R31 TS[1].25
LD R33 TS[0].22
LD R34 TS[1].22
LD R36 TS[0].17
LD R37 TS[1].17
LD R39 TS[0].19
LD R40 TS[1].19
LD R42 TS[0].15
LD R43 TS[1].15
LD R45 TS[0].12
LD R46 TS[1].12
LD R48 TS[0].7
LD R49 TS[1].7
LD R51 TS[0].6
LD R52 TS[1].6
LD R54 TS[0].10
LD R55 TS[1].10
LD R57 TS[0].14
LD R58 TS[1].14
LD R60 TS[0].11
LD R61 TS[1].11
ADD R2 R0 R1
ADD R5 R3 R4
LD R63 TS[0].18
LD R3 TS[1].18
ADD R8 R6 R7
ST TH.0 R6
ST TH.1 R7
ADD R11 R9 R10
ST TH.2 R11
LD R9 TH.2
ADD R14 R12 R13
ST TH.3 R12
ST TH.4 R13
ADD R17 R15 R16
ST TH.5 R17
ADD R20 R18 R19
ST TH.6 R18
ST TH.7 R19
LD R15 TH.5
ADD R23 R21 R22
ST TH.8 R23
LD R21 TH.8
ADD R26 R24 R25
ST TH.9 R24
ST TH.10 R25
ADD R29 R27 R28
ST TH.11 R29
LD R27 TH.11
ADD R32 R30 R31
ST TH.12 R30
ST TH.13 R31
ADD R35 R33 R34
ST TH.14 R35
ADD R38 R36 R37
ST TH.15 R36
ST TH.16 R37
LD R33 TH.14
PBS_ML2 R0 R2 PbsManyGenProp
PBS_ML2 R6 R5 PbsManyGenProp
PBS_ML2 R10 R9 PbsManyGenProp
PBS_ML2 R12 R8 PbsManyGenProp
PBS_ML2 R16 R14 PbsManyGenProp
PBS_ML2 R18 R15 PbsManyGenProp
PBS_ML2 R22 R21 PbsManyGenProp
PBS_ML2 R24 R20 PbsManyGenProp
PBS_ML2 R28 R27 PbsManyGenProp
PBS_ML2 R30 R26 PbsManyGenProp
PBS_ML2 R34 R32 PbsManyGenProp
PBS_ML2_F R36 R33 PbsManyGenProp
ADD R41 R39 R40
LD R39 TS[0].16
LD R40 TS[1].16
ST TH.17 R38
ST TH.18 R33
LD R33 TS[0].1
ST TH.19 R32
LD R32 TS[1].1
ST TH.20 R26
ST TH.21 R27
LD R27 TS[0].21
ST TH.22 R20
LD R20 TS[1].21
ST TH.23 R21
ST TH.24 R15
LD R15 TS[0].0
ST TH.25 R14
LD R14 TS[1].0
ST TH.26 R8
ST TH.27 R9
LD R9 TS[0].3
ST TH.28 R5
LD R5 TS[1].3
ST TH.29 R2
ADD R44 R42 R43
LD R42 TS[0].2
LD R43 TS[1].2
ST TH.30 R41
ADD R47 R45 R46
LD R45 TS[0].9
LD R46 TS[1].9
ST TH.31 R44
ADD R50 R48 R49
LD R48 TS[0].5
LD R49 TS[1].5
ST TH.32 R47
ADD R53 R51 R52
LD R51 TS[0].4
LD R52 TS[1].4
ST TH.33 R50
ADD R56 R54 R55
LD R54 TS[0].8
LD R55 TS[1].8
ST TH.34 R53
ADD R59 R57 R58
ADD R62 R60 R61
ADD R4 R63 R3
ADD R38 R39 R40
ADD R26 R33 R32
ADD R21 R27 R20
ADD R8 R15 R14
ADD R2 R9 R5
ADD R41 R42 R43
ADD R44 R45 R46
ADD R47 R48 R49
ADD R50 R51 R52
ADD R53 R54 R55
MAC R57 R11 R7 2
LD R58 TH.31
LD R63 TH.32
LD R3 TH.17
ST TH.35 R41
LD R39 TH.30
ST TH.36 R21
ST TH.37 R47
ST TH.38 R53
ST TH.39 R44
ST TH.40 R50
ST TH.41 R0
LD R27 TH.35
ST TH.42 R12
ST TH.43 R13
LD R9 TH.39
ST TH.44 R16
ST TH.45 R17
LD R5 TH.37
ST TH.46 R18
ST TH.47 R19
ST TH.48 R6
LD R6 TH.40
ST TH.49 R22
ST TH.50 R23
ST TH.51 R10
LD R10 TH.38
ST TH.52 R24
ST TH.53 R25
ST TH.54 R28
LD R28 TH.33
ST TH.55 R30
ST TH.56 R31
ST TH.57 R29
LD R29 TH.36
ST TH.58 R34
ST TH.59 R35
ST TH.60 R36
LD R36 TH.34
PBS_ML2 R60 R58 PbsManyGenProp
PBS_ML2 R32 R38 PbsManyGenProp
PBS_ML2 R14 R63 PbsManyGenProp
PBS_ML2 R42 R8 PbsManyGenProp
PBS_ML2 R48 R3 PbsManyGenProp
PBS_ML2 R54 R62 PbsManyGenProp
PBS_ML2 R40 R39 PbsManyGenProp
PBS_ML2 R20 R4 PbsManyGenProp
PBS_ML2 R46 R59 PbsManyGenProp
PBS_ML2 R52 R26 PbsManyGenProp
PBS_ML2 R44 R56 PbsManyGenProp
PBS_ML2_F R50 R2 PbsManyGenProp
LD R11 TH.45
ST TH.61 R37
ST TH.62 R2
LD R2 TH.53
ST TH.63 R56
LD R56 TH.59
ST TH.64 R26
ST TH.65 R59
LD R59 TH.43
ST TH.66 R4
MAC R37 R11 R57 4
MAC R26 R2 R56 2
MAC R4 R59 R11 2
MAC R2 R4 R57 4
MAC R59 R33 R61 2
LD R58 TH.57
LD R62 TH.56
ADDS R4 R42 0
MAC R38 R47 R58 2
MAC R63 R49 R59 4
MAC R8 R21 R49 2
MULS R3 R43 2
ADDS R3 R3 0
MAC R39 R62 R41 2
MAC R42 R8 R59 4
MAC R21 R53 R3 4
PBS_ML2 R0 R27 PbsManyGenProp
PBS_ML2 R12 R9 PbsManyGenProp
PBS_ML2 R16 R5 PbsManyGenProp
PBS_ML2 R18 R6 PbsManyGenProp
PBS_ML2 R22 R10 PbsManyGenProp
PBS_ML2 R24 R28 PbsManyGenProp
PBS_ML2 R30 R29 PbsManyGenProp
PBS_ML2 R34 R36 PbsManyGenProp
PBS R11 R2 PbsReduceCarryPad
PBS R33 R4 PbsGenPropAdd
PBS R47 R3 PbsReduceCarry2
PBS_F R49 R42 PbsReduceCarryPad
MAC R43 R1 R53 2
ST TD[0].0 R33
LD R29 TH.61
MAC R8 R47 R52 4
ADDS R27 R11 1
MAC R9 R31 R39 4
ADDS R5 R49 1
MAC R6 R43 R3 4
MAC R10 R45 R13 2
MAC R28 R23 R25 2
MAC R36 R29 R31 2
MAC R2 R19 R51 2
MAC R4 R35 R17 2
MAC R1 R13 R28 4
MAC R53 R10 R28 4
MAC R47 R36 R39 4
MAC R52 R17 R2 4
MAC R11 R4 R2 4
PBS R62 R21 PbsReduceCarry3
PBS R42 R8 PbsGenPropAdd
PBS R33 R6 PbsReduceCarryPad
PBS R49 R53 PbsReduceCarryPad
PBS R43 R47 PbsReduceCarryPad
PBS_F R3 R11 PbsReduceCarryPad
MAC R45 R62 R0 4

View File

@@ -0,0 +1,19 @@
; CUST_8
; Simple IOp to check the ALU operation
; Dst[0].0 <- Src[0].0 + Src[1].0
LD R1 TS[0].0
LD R2 TS[1].0
ADD R0 R1 R2
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 - Src[1].1
LD R5 TS[0].1
LD R6 TS[1].1
SUB R4 R5 R6
ST TD[0].1 R4
; Dst[0].2 <- Src[0].2 + (Src[1].2 *4)
LD R9 TS[0].2
LD R10 TS[1].2
MAC R8 R9 R10 4
ST TD[0].2 R8

View File

@@ -0,0 +1,21 @@
; CUST_9
; Simple IOp to check the ALU Scalar operation
; Dst[0].0 <- Src[0].0 + Imm[0].0
LD R1 TS[0].0
ADDS R0 R1 TI[0].0
ST TD[0].0 R0
; Dst[0].1 <- Src[0].1 - Imm[0].1
LD R5 TS[0].1
SUBS R4 R5 TI[0].1
ST TD[0].1 R4
; Dst[0].2 <- Imm[0].2 - Src[0].2
LD R9 TS[0].2
SSUB R8 R9 TI[0].2
ST TD[0].2 R8
; Dst[0].3 <- Src[0].3 * Imm[0].3
LD R13 TS[0].3
MULS R12 R13 TI[0].3
ST TD[0].3 R12

View File

@@ -0,0 +1,112 @@
[fpga]
regmap=["${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_cfg_1in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_cfg_3in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_prc_1in3.toml",
"${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/hpu_regif_core_prc_3in3.toml"]
polling_us=10
[fpga.ffi.V80]
ami_id=1 # First ami device in the list
qdma_h2c="/dev/qdma${V80_PCIE_DEV}001-MM-1"
qdma_c2h="/dev/qdma${V80_PCIE_DEV}001-MM-2"
[rtl]
bpip_use = true
bpip_use_opportunism = true
bpip_timeout = 100_000
[board]
ct_mem = 32768
ct_pc = [
{Hbm= {pc=32}},
{Hbm= {pc=33}},
]
heap_size = 16384
lut_mem = 256
lut_pc = {Hbm={pc=34}}
fw_size= 16777216 # i.e. 16 MiB
fw_pc = {Ddr= {offset= 0x3900_0000}} # NB: Allocation must take place in the Discret DDR
bsk_pc = [
{Hbm={pc=8}},
{Hbm={pc=12}},
{Hbm={pc=24}},
{Hbm={pc=28}},
{Hbm={pc=40}},
{Hbm={pc=44}},
{Hbm={pc=56}},
{Hbm={pc=60}}
]
ksk_pc = [
{Hbm={pc=0}},
{Hbm={pc=1}},
{Hbm={pc=2}},
{Hbm={pc=3}},
{Hbm={pc=4}},
{Hbm={pc=5}},
{Hbm={pc=6}},
{Hbm={pc=7}},
{Hbm={pc=16}},
{Hbm={pc=17}},
{Hbm={pc=18}},
{Hbm={pc=19}},
{Hbm={pc=20}},
{Hbm={pc=21}},
{Hbm={pc=22}},
{Hbm={pc=23}}
]
trace_pc = {Hbm={pc=35}}
trace_depth = 32 # In MB
[firmware]
#implementation = "Ilp"
implementation = "Llt"
integer_w=[2,4,6,8,10,12,14,16,32,64,128]
min_batch_size = 11
kogge_cfg = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/kogge_cfg.toml"
custom_iop.'IOP[0]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_0.asm"
custom_iop.'IOP[1]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_1.asm"
custom_iop.'IOP[2]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_2.asm"
custom_iop.'IOP[3]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_3.asm"
custom_iop.'IOP[4]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_4.asm"
custom_iop.'IOP[8]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_8.asm"
custom_iop.'IOP[9]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_9.asm"
custom_iop.'IOP[16]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_16.asm"
custom_iop.'IOP[17]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_17.asm"
custom_iop.'IOP[18]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_18.asm"
custom_iop.'IOP[19]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_19.asm"
custom_iop.'IOP[20]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_20.asm"
custom_iop.'IOP[21]' = "${HPU_BACKEND_DIR}/config_store/${HPU_CONFIG}/custom_iop/cust_21.asm"
[firmware.op_cfg.default]
fill_batch_fifo = true
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.MUL]
fill_batch_fifo = false
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.MULS]
fill_batch_fifo = false
min_batch_size = false
use_tiers = false
flush_behaviour = "Patient"
flush = true
[firmware.op_cfg.by_op.ERC_20]
fill_batch_fifo = true
min_batch_size = false
use_tiers = true
flush_behaviour = "Patient"
flush = true

View File

@@ -0,0 +1,256 @@
module_name="hpu_regif_core_cfg_1in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x00
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_cfg_1in3]
description="entry_cfg_1in3 section with known value used for debug."
offset= 0x0
[section.entry_cfg_1in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x01010101}
[section.entry_cfg_1in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x11111111}
[section.entry_cfg_1in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x21212121}
[section.entry_cfg_1in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x31313131}
# =====================================================================================================================
[section.info]
description="RTL architecture parameters"
offset= 0x10
[section.info.register.version]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="VERSION"}
[section.info.register.ntt_architecture]
description="NTT architecture"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="NTT_CORE_ARCH"}
[section.info.register.ntt_structure]
description="NTT structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.radix = { size_b=8, offset_b=0 , default={Param="R"}, description="NTT radix"}
field.psi = { size_b=8, offset_b=8 , default={Param="PSI"}, description="NTT psi"}
field.div = { size_b=8, offset_b=16, default={Param="BWD_PSI_DIV"}, description="NTT backward div"}
field.delta = { size_b=8, offset_b=24, default={Param="DELTA"}, description="NTT network delta (for wmm arch)"}
[section.info.register.ntt_rdx_cut]
description="NTT radix cuts, in log2 unit (for gf64 arch)"
owner="Parameter"
read_access="Read"
write_access="None"
field.radix_cut0 = { size_b=4, offset_b=0 , default={Param="NTT_RDX_CUT_S_0"}, description="NTT radix cut #0"}
field.radix_cut1 = { size_b=4, offset_b=4 , default={Param="NTT_RDX_CUT_S_1"}, description="NTT radix cut #1"}
field.radix_cut2 = { size_b=4, offset_b=8 , default={Param="NTT_RDX_CUT_S_2"}, description="NTT radix cut #2"}
field.radix_cut3 = { size_b=4, offset_b=12, default={Param="NTT_RDX_CUT_S_3"}, description="NTT radix cut #3"}
field.radix_cut4 = { size_b=4, offset_b=16, default={Param="NTT_RDX_CUT_S_4"}, description="NTT radix cut #4"}
field.radix_cut5 = { size_b=4, offset_b=20, default={Param="NTT_RDX_CUT_S_5"}, description="NTT radix cut #5"}
field.radix_cut6 = { size_b=4, offset_b=24, default={Param="NTT_RDX_CUT_S_6"}, description="NTT radix cut #6"}
field.radix_cut7 = { size_b=4, offset_b=28, default={Param="NTT_RDX_CUT_S_7"}, description="NTT radix cut #7"}
[section.info.register.ntt_pbs]
description="Maximum number of PBS in the NTT pipeline"
owner="Parameter"
read_access="Read"
write_access="None"
field.batch_pbs_nb = { size_b=8, offset_b=0 , default={Param="BATCH_PBS_NB"}, description="Maximum number of PBS in the NTT pipe"}
field.total_pbs_nb = { size_b=8, offset_b=8 , default={Param="TOTAL_PBS_NB"}, description="Maximum number of PBS stored in PEP buffer"}
[section.info.register.ntt_modulo]
description="Code associated to the NTT prime"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="MOD_NTT_NAME"}
[section.info.register.application]
description="Code associated with the application"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="APPLICATION_NAME"}
[section.info.register.ks_structure]
description="Key-switch structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.x = { size_b=8, offset_b=0 , default={Param="LBX"}, description="Number of coefficients on X dimension"}
field.y = { size_b=8, offset_b=8 , default={Param="LBY"}, description="Number of coefficients on Y dimension"}
field.z = { size_b=8, offset_b=16, default={Param="LBZ"}, description="Number of coefficients on Z dimension"}
[section.info.register.ks_crypto_param]
description="Key-switch crypto parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.mod_ksk_w = { size_b=8, offset_b=0 , default={Param="MOD_KSK_W"}, description="Width of KSK modulo"}
field.ks_l = { size_b=8, offset_b=8 , default={Param="KS_L"}, description="Number of KS decomposition level"}
field.ks_b = { size_b=8, offset_b=16, default={Param="KS_B_W"}, description="Width of KS decomposition base"}
[section.info.register.regf_structure]
description="Register file structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.reg_nb = { size_b=8, offset_b=0 , default={Param="REGF_REG_NB"}, description="Number of registers in regfile"}
field.coef_nb = { size_b=8, offset_b=8 , default={Param="REGF_COEF_NB"}, description="Number of coefficients at regfile interface"}
[section.info.register.isc_structure]
description="Instruction scheduler structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.depth = { size_b=8, offset_b=0 , default={Param="ISC_DEPTH"}, description="Number of slots in ISC lookahead buffer."}
field.min_iop_size = { size_b=8, offset_b=8 , default={Param="MIN_IOP_SIZE"}, description="Minimum number of DOp per IOp to prevent sync_id overflow."}
[section.info.register.pe_properties]
description="Processing elements parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.alu_nb = { size_b=8, offset_b=24 , default={Param="PEA_ALU_NB"}, description="Number of coefficients processed in parallel in pe_alu"}
field.pep_regf_period = { size_b=8, offset_b=16 , default={Param="PEP_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEP and regfile"}
field.pem_regf_period = { size_b=8, offset_b=8 , default={Param="PEM_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEM and regfile"}
field.pea_regf_period = { size_b=8, offset_b=0 , default={Param="PEA_REGF_PERIOD"}, description="Number of cycles between 2 consecutive data transfer between PEA and regfile"}
[section.info.register.bsk_structure]
description="BSK manager structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.bsk_cut_nb = { size_b=8, offset_b=8 , default={Param="BSK_CUT_NB"}, description="BSK cut nb"}
[section.info.register.ksk_structure]
description="KSK manager structure parameters"
owner="Parameter"
read_access="Read"
write_access="None"
field.ksk_cut_nb = { size_b=8, offset_b=8 , default={Param="KSK_CUT_NB"}, description="KSK cut nb"}
[section.info.register.hbm_axi4_nb]
description="Number of AXI4 connections to HBM"
owner="Parameter"
read_access="Read"
write_access="None"
field.bsk_pc = { size_b=8, offset_b=0 , default={Param="BSK_PC"}, description="Number of HBM connections for BSK"}
field.ksk_pc = { size_b=8, offset_b=8, default={Param="KSK_PC"}, description="Number of HBM connections for KSK"}
field.pem_pc = { size_b=8, offset_b=16, default={Param="PEM_PC"}, description="Number of HBM connections for ciphertexts (PEM)"}
field.glwe_pc = { size_b=8, offset_b=24, default={Param="GLWE_PC"}, description="Number of HBM connections for GLWE"}
[section.info.register.hbm_axi4_dataw_pem]
description="Ciphertext HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_PEM_DATA_W"}
[section.info.register.hbm_axi4_dataw_glwe]
description="GLWE HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_GLWE_DATA_W"}
[section.info.register.hbm_axi4_dataw_bsk]
description="BSK HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_BSK_DATA_W"}
[section.info.register.hbm_axi4_dataw_ksk]
description="KSK HBM AXI4 connection data width"
owner="Parameter"
read_access="Read"
write_access="None"
default={Param="AXI4_KSK_DATA_W"}
# =====================================================================================================================
[section.hbm_axi4_addr_1in3]
offset= 0x1000
description="HBM AXI4 connection address offset"
[section.hbm_axi4_addr_1in3.register.ct]
description="Address offset for each ciphertext HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb","_pc1_lsb", "_pc1_msb"]
[section.hbm_axi4_addr_1in3.register.glwe]
description="Address offset for each GLWE HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb"]
[section.hbm_axi4_addr_1in3.register.ksk]
description="Address offset for each KSK HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb", "_pc1_lsb", "_pc1_msb", "_pc2_lsb", "_pc2_msb", "_pc3_lsb", "_pc3_msb", "_pc4_lsb", "_pc4_msb", "_pc5_lsb", "_pc5_msb", "_pc6_lsb", "_pc6_msb", "_pc7_lsb", "_pc7_msb", "_pc8_lsb", "_pc8_msb", "_pc9_lsb", "_pc9_msb", "_pc10_lsb", "_pc10_msb", "_pc11_lsb", "_pc11_msb", "_pc12_lsb", "_pc12_msb", "_pc13_lsb", "_pc13_msb", "_pc14_lsb", "_pc14_msb", "_pc15_lsb", "_pc15_msb"]
[section.hbm_axi4_addr_1in3.register.trc]
description="Address offset for each trace HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb"]
# =====================================================================================================================
[section.bpip]
offset= 0x2000
description="BPIP configuration"
[section.bpip.register.use]
description="(1) Use BPIP mode, (0) use IPIP mode (default)"
owner="User"
read_access="Read"
write_access="Write"
field.use_bpip = { size_b=1, offset_b=0 , default={Cst=1}, description="use"}
field.use_opportunism = { size_b=1, offset_b=1 , default={Cst=0}, description="use opportunistic PBS flush"}
[section.bpip.register.timeout]
description="Timeout for BPIP mode"
owner="User"
read_access="Read"
write_access="Write"
default={Cst=0xffffffff}

View File

@@ -0,0 +1,51 @@
module_name="hpu_regif_core_cfg_3in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x20000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_cfg_3in3]
description="entry_cfg_3in3 section with known value used for debug."
offset= 0x0
[section.entry_cfg_3in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x03030303}
[section.entry_cfg_3in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x13131313}
[section.entry_cfg_3in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x23232323}
[section.entry_cfg_3in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x33333333}
# =====================================================================================================================
[section.hbm_axi4_addr_3in3]
description="HBM AXI4 connection address offset"
offset= 0x10
[section.hbm_axi4_addr_3in3.register.bsk]
description="Address offset for each BSK HBM AXI4 connection"
owner="User"
read_access="Read"
write_access="Write"
duplicate=["_pc0_lsb", "_pc0_msb", "_pc1_lsb", "_pc1_msb", "_pc2_lsb", "_pc2_msb", "_pc3_lsb", "_pc3_msb", "_pc4_lsb", "_pc4_msb", "_pc5_lsb", "_pc5_msb", "_pc6_lsb", "_pc6_msb", "_pc7_lsb", "_pc7_msb", "_pc8_lsb", "_pc8_msb", "_pc9_lsb", "_pc9_msb", "_pc10_lsb", "_pc10_msb", "_pc11_lsb", "_pc11_msb", "_pc12_lsb", "_pc12_msb", "_pc13_lsb", "_pc13_msb", "_pc14_lsb", "_pc14_msb", "_pc15_lsb", "_pc15_msb"]

View File

@@ -0,0 +1,336 @@
module_name="hpu_regif_core_prc_1in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x10000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_prc_1in3]
description="entry_prc_1in3 section with known value used for debug."
offset= 0x0
[section.entry_prc_1in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x02020202}
[section.entry_prc_1in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x12121212}
[section.entry_prc_1in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x22222222}
[section.entry_prc_1in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x32323232}
# =====================================================================================================================
[section.status_1in3]
description="HPU status of part 1in3"
offset= 0x10
[section.status_1in3.register.error]
description="Error register (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.pbs = { size_b=32, offset_b=0 , default={Cst=0}, description="HPU error part 1in3"}
# =====================================================================================================================
[section.ksk_avail]
description="KSK availability configuration"
offset= 0x1000
[section.ksk_avail.register.avail]
description="KSK available bit"
owner="User"
read_access="Read"
write_access="Write"
field.avail = { size_b=1, offset_b=0 , default={Cst=0}, description="avail"}
[section.ksk_avail.register.reset]
description="KSK reset sequence"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.request = { size_b=1, offset_b=0 , default={Cst=0}, description="request"}
field.done = { size_b=1, offset_b=31 , default={Cst=0}, description="done"}
# =====================================================================================================================
[section.runtime_1in3]
description="Runtime information"
offset= 0x2000
[section.runtime_1in3.register.pep_cmux_loop]
description="PEP: CMUX iteration loop number"
owner="Kernel"
read_access="Read"
write_access="None"
field.br_loop = { size_b=15, offset_b=0 , default={Cst=0}, description="PBS current BR-loop"}
field.br_loop_c = { size_b=1, offset_b=15 , default={Cst=0}, description="PBS current BR-loop parity"}
field.ks_loop = { size_b=15, offset_b=16 , default={Cst=0}, description="KS current KS-loop"}
field.ks_loop_c = { size_b=1, offset_b=31 , default={Cst=0}, description="KS current KS-loop parity"}
[section.runtime_1in3.register.pep_pointer_0]
description="PEP: pointers (part 1)"
owner="Kernel"
read_access="Read"
write_access="None"
field.pool_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP pool_rp"}
field.pool_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP pool_wp"}
field.ldg_pt = { size_b=8, offset_b=16 , default={Cst=0}, description="PEP ldg_pt"}
field.ldb_pt = { size_b=8, offset_b=24 , default={Cst=0}, description="PEP ldb_pt"}
[section.runtime_1in3.register.pep_pointer_1]
description="PEP: pointers (part 2)"
owner="Kernel"
read_access="Read"
write_access="None"
field.ks_in_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP ks_in_rp"}
field.ks_in_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP ks_in_wp"}
field.ks_out_rp = { size_b=8, offset_b=16 , default={Cst=0}, description="PEP ks_out_rp"}
field.ks_out_wp = { size_b=8, offset_b=24 , default={Cst=0}, description="PEP ks_out_wp"}
[section.runtime_1in3.register.pep_pointer_2]
description="PEP: pointers (part 3)"
owner="Kernel"
read_access="Read"
write_access="None"
field.pbs_in_rp = { size_b=8, offset_b=0 , default={Cst=0}, description="PEP pbs_in_rp"}
field.pbs_in_wp = { size_b=8, offset_b=8 , default={Cst=0}, description="PEP pbs_in_wp"}
field.ipip_flush_last_pbs_in_loop = { size_b=16, offset_b=16 , default={Cst=0}, description="PEP IPIP flush last pbs_in_loop"}
[section.runtime_1in3.register.isc_latest_instruction]
description="ISC: 4 latest instructions received ([0] is the most recent)"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_0","_1","_2","_3"]
[section.runtime_1in3.register.pep_seq_bpip_batch_cnt]
description="PEP: BPIP batch counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_flush_cnt]
description="PEP: BPIP batch triggered by a flush counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_timeout_cnt]
description="PEP: BPIP batch triggered by a timeout counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_waiting_batch_cnt]
description="PEP: BPIP batch that waits the trigger counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_bpip_batch_filling_cnt]
description="PEP: Count batch with filled with a given number of CT (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_1","_2","_3","_4","_5","_6","_7","_8","_9","_10","_11","_12","_13","_14","_15","_16"]
[section.runtime_1in3.register.pep_seq_ld_ack_cnt]
description="PEP: load BLWE ack counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_cmux_not_full_batch_cnt]
description="PEP: not full batch CMUX counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_seq_ipip_flush_cnt]
description="PEP: IPIP flush CMUX counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldb_rcp_dur]
description="PEP: load BLWE reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldg_req_dur]
description="PEP: load GLWE request max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ldg_rcp_dur]
description="PEP: load GLWE reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_load_ksk_rcp_dur]
description="PEP: load KSK slice reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_pc0","_pc1","_pc2","_pc3","_pc4","_pc5","_pc6","_pc7","_pc8","_pc9","_pc10","_pc11","_pc12","_pc13","_pc14","_pc15"]
[section.runtime_1in3.register.pep_mmacc_sxt_rcp_dur]
description="PEP: MMACC SXT reception duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_mmacc_sxt_req_dur]
description="PEP: MMACC SXT request duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_mmacc_sxt_cmd_wait_b_dur]
description="PEP: MMACC SXT command wait for b duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_inst_cnt]
description="PEP: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pep_ack_cnt]
description="PEP: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_inst_cnt]
description="PEM: load input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_ack_cnt]
description="PEM: load instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_store_inst_cnt]
description="PEM: store input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_store_ack_cnt]
description="PEM: store instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pea_inst_cnt]
description="PEA: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pea_ack_cnt]
description="PEA: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.isc_inst_cnt]
description="ISC: input instruction counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.isc_ack_cnt]
description="ISC: instruction acknowledge counter (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
[section.runtime_1in3.register.pem_load_info_0]
description="PEM: load first data)"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_pc0_0","_pc0_1","_pc0_2","_pc0_3","_pc1_0","_pc1_1","_pc1_2","_pc1_3"]
[section.runtime_1in3.register.pem_load_info_1]
description="PEM: load first address"
owner="Kernel"
read_access="Read"
write_access="None"
duplicate=["_pc0_lsb","_pc0_msb","_pc1_lsb","_pc1_msb"]
[section.runtime_1in3.register.pem_store_info_0]
description="PEM: store info 0)"
owner="Kernel"
read_access="Read"
write_access="None"
field.cmd_vld = { size_b=1, offset_b=0 , default={Cst=0}, description="PEM_ST cmd vld"}
field.cmd_rdy = { size_b=1, offset_b=1 , default={Cst=0}, description="PEM_ST cmd rdy"}
field.pem_regf_rd_req_vld = { size_b=1, offset_b=2 , default={Cst=0}, description="PEM_ST pem_regf_rd_req_vld"}
field.pem_regf_rd_req_rdy = { size_b=1, offset_b=3 , default={Cst=0}, description="PEM_ST pem_regf_rd_req_rdy"}
field.brsp_fifo_in_vld = { size_b=4, offset_b=4 , default={Cst=0}, description="PEM_ST brsp_fifo_in_vld"}
field.brsp_fifo_in_rdy = { size_b=4, offset_b=8 , default={Cst=0}, description="PEM_ST brsp_fifo_in_rdy"}
field.rcp_fifo_in_vld = { size_b=4, offset_b=12 , default={Cst=0}, description="PEM_ST rcp_fifo_in_vld"}
field.rcp_fifo_in_rdy = { size_b=4, offset_b=16 , default={Cst=0}, description="PEM_ST rcp_fifo_in_rdy"}
field.r2_axi_vld = { size_b=4, offset_b=20 , default={Cst=0}, description="PEM_ST r2_axi_vld"}
field.r2_axi_rdy = { size_b=4, offset_b=24 , default={Cst=0}, description="PEM_ST r2_axi_rdy"}
field.c0_enough_location = { size_b=4, offset_b=28 , default={Cst=0}, description="PEM_ST c0_enough_location"}
[section.runtime_1in3.register.pem_store_info_1]
description="PEM: store info 1"
owner="Kernel"
read_access="Read"
write_access="None"
field.s0_cmd_vld = { size_b=4, offset_b=0 , default={Cst=0}, description="PEM_ST s0_cmd_vld"}
field.s0_cmd_rdy = { size_b=4, offset_b=4 , default={Cst=0}, description="PEM_ST s0_cmd_rdy"}
field.m_axi_bvalid = { size_b=4, offset_b=8 , default={Cst=0}, description="PEM_ST m_axi_bvalid"}
field.m_axi_bready = { size_b=4, offset_b=12 , default={Cst=0}, description="PEM_ST m_axi_bready"}
field.m_axi_wvalid = { size_b=4, offset_b=16 , default={Cst=0}, description="PEM_ST m_axi_wvalid"}
field.m_axi_wready = { size_b=4, offset_b=20 , default={Cst=0}, description="PEM_ST m_axi_wready"}
field.m_axi_awvalid = { size_b=4, offset_b=24 , default={Cst=0}, description="PEM_ST m_axi_awvalid"}
field.m_axi_awready = { size_b=4, offset_b=28 , default={Cst=0}, description="PEM_ST m_axi_awready"}
[section.runtime_1in3.register.pem_store_info_2]
description="PEM: store info 2"
owner="Kernel"
read_access="Read"
write_access="None"
field.c0_free_loc_cnt = { size_b=16, offset_b=0 , default={Cst=0}, description="PEM_ST c0_free_loc_cnt"}
field.brsp_bresp_cnt = { size_b=16, offset_b=16 , default={Cst=0}, description="PEM_ST brsp_bresp_cnt"}
[section.runtime_1in3.register.pem_store_info_3]
description="PEM: store info 3"
owner="Kernel"
read_access="Read"
write_access="None"
field.brsp_ack_seen = { size_b=16, offset_b=0 , default={Cst=0}, description="PEM_ST brsp_ack_seen"}
field.c0_cmd_cnt = { size_b=8, offset_b=16 , default={Cst=0}, description="PEM_ST c0_cmd_cnt"}

View File

@@ -0,0 +1,100 @@
module_name="hpu_regif_core_prc_3in3"
description="HPU top-level register interface. Used by the host to retrieve design information, and to configure it."
word_size_b = 32
offset = 0x30000
range = 0x10000
ext_pkg = ["axi_if_common_param_pkg", "axi_if_shell_axil_pkg"]
# =====================================================================================================================
[section.entry_prc_3in3]
description="entry_prc_3in3 section with known value used for debug."
offset= 0x0
[section.entry_prc_3in3.register.dummy_val0]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x04040404}
[section.entry_prc_3in3.register.dummy_val1]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x14141414}
[section.entry_prc_3in3.register.dummy_val2]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x24242424}
[section.entry_prc_3in3.register.dummy_val3]
description="RTL version"
owner="Parameter"
read_access="Read"
write_access="None"
default={Cst=0x34343434}
# =====================================================================================================================
[section.status_3in3]
description="HPU status of parts 2in3 and 3in3"
offset= 0x10
[section.status_3in3.register.error]
description="Error register (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.pbs = { size_b=32, offset_b=0 , default={Cst=0}, description="HPU error part 3in3"}
# =====================================================================================================================
[section.bsk_avail]
description="BSK availability configuration"
offset= 0x1000
[section.bsk_avail.register.avail]
description="BSK available bit"
owner="User"
read_access="Read"
write_access="Write"
field.avail = { size_b=1, offset_b=0 , default={Cst=0}, description="avail"}
[section.bsk_avail.register.reset]
description="BSK reset sequence"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
field.request = { size_b=1, offset_b=0 , default={Cst=0}, description="request"}
field.done = { size_b=1, offset_b=31 , default={Cst=0}, description="done"}
# =====================================================================================================================
[section.runtime_3in3]
description="Runtime information"
offset= 0x2000
[section.runtime_3in3.register.pep_load_bsk_rcp_dur]
description="PEP: load BSK slice reception max duration (Could be reset by user)"
owner="Kernel"
read_access="Read"
write_access="WriteNotify"
duplicate=["_pc0","_pc1","_pc2","_pc3","_pc4","_pc5","_pc6","_pc7","_pc8","_pc9","_pc10","_pc11","_pc12","_pc13","_pc14","_pc15"]
[section.runtime_3in3.register.pep_bskif_req_info_0]
description="PEP: BSK_IF: requester info 0"
owner="Kernel"
read_access="Read"
write_access="None"
field.req_br_loop_rp = { size_b=16, offset_b=0 , default={Cst=0}, description="PEP BSK_IF requester BSK read pointer"}
field.req_br_loop_wp = { size_b=16, offset_b=16 , default={Cst=0}, description="PEP BSK_IF requester BSK write pointer"}
[section.runtime_3in3.register.pep_bskif_req_info_1]
description="PEP: BSK_IF: requester info 0"
owner="Kernel"
read_access="Read"
write_access="None"
field.req_prf_br_loop = { size_b=16, offset_b=0 , default={Cst=0}, description="PEP BSK_IF requester BSK prefetch pointer"}
field.req_parity = { size_b=1, offset_b=16 , default={Cst=0}, description="PEP BSK_IF requester BSK pointer parity"}
field.req_assigned = { size_b=1, offset_b=31 , default={Cst=0}, description="PEP BSK_IF requester assignment"}

Binary file not shown.

After

Width:  |  Height:  |  Size: 269 KiB

View File

@@ -0,0 +1,12 @@
This contains a small library to read trace files retrieved from the hardware or the mockup.
To run, please add the lib directory to your PYTHONPATH:
export PYTHONPATH=$(readlink -m ./lib)
Make sure you start from a fresh python virtual environment and install the requirements in
requirements.txt:
python -m venv new_env
source new_env/bin/activate
pip3 install -r requirements.txt

View File

@@ -0,0 +1,28 @@
#!/usr/bin/env python3
from pandas import DataFrame
from isctrace.analysis import Refilled, Retired, Trace
freq_mhz = 300
iops = Trace.from_hw("data/trace.json")
def analyze_iop(iop):
retired = Retired(iop)
# Print the retired instructions as a table
print(retired.to_df().to_string())
# Print a batch latency table
latency_table = retired.pbs_latency_table(freq_mhz=freq_mhz).drop(columns='data')
print(latency_table)
# And the runtime
runtime = retired.runtime_us(freq_mhz=freq_mhz)
print(f"batches: {latency_table['count'].sum()}")
print(f"Runtime: {runtime}us")
if __name__ == "__main__":
analyze_iop(iops[0])
# vim: fdm=marker

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3701d5e7d53eef6478a1b03a2c8e32cf5d20c1eb6829e754fe1ced4a0a16bed
size 693363

View File

@@ -0,0 +1,4 @@
from . import hw
from . import fmt
from . import analysis
from . import mockup

Some files were not shown because too many files have changed in this diff Show More