Compare commits


63 Commits

Author SHA1 Message Date
Gaurav Shukla
4b1a0b43ff [WEB] Remove long prompts support
Removes support for long prompts due to the higher lag when loading them.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs>
2022-11-03 18:57:58 +05:30
Gaurav Shukla
099f2160c3 [WEB] fix background color
Signed-Off-by: Gaurav Shukla
2022-11-03 17:36:24 +05:30
Gaurav Shukla
9d2d62dedf [WEB] Add support for long prompts (#467) 2022-11-03 03:27:36 -07:00
Gaurav Shukla
15ed05b221 [WEB] Update the title (#466) 2022-11-02 14:30:03 -07:00
Gaurav Shukla
7c825fc288 [WEB] CSS changes to the web-ui (#465)
This commit updates the UI styling.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-02 12:36:11 -07:00
Gaurav Shukla
88f8718635 [WEB] Load prompts from json
The prompt examples are now loaded from a JSON file, `prompts.json`.

Signed-Off-by: Gaurav Shukla
2022-11-02 20:52:34 +05:30
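As an aside, a minimal sketch of how such a `prompts.json` could be consumed at startup (the file layout and the `prompts` key below are assumptions for illustration, not taken from the repository):

```python
# Hypothetical sketch: load prompt examples from prompts.json at startup.
# The file layout and key name are assumptions, not the repo's actual format.
import json
from pathlib import Path


def load_prompt_examples(path: str = "prompts.json") -> list[str]:
    """Return the list of example prompts, or an empty list if the file is missing."""
    prompts_file = Path(path)
    if not prompts_file.exists():
        return []
    with prompts_file.open() as f:
        data = json.load(f)
    # Assumes a top-level {"prompts": ["...", ...]} structure.
    return data.get("prompts", [])


examples = load_prompt_examples()
```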
Prashant Kumar
a081733a42 Add the clip text shark_model. (#458) 2022-11-02 00:08:33 -07:00
Gaurav Shukla
06ccfb0533 [WEB] Load vae and unet during server start up
The VAE and UNet models (both fp16 and fp32 variants) can be loaded at
server startup to reduce web response time.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-01 23:11:52 +05:30
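For illustration, a rough sketch of the preload-at-startup idea, assuming a hypothetical `load_compiled_model` helper rather than any actual SHARK API:

```python
# Minimal sketch: load models once before the server handles requests,
# so requests do not pay the load cost. `load_compiled_model` is a
# hypothetical placeholder, not a SHARK function.
PRELOADED_MODELS = {}


def load_compiled_model(name: str, precision: str):
    """Placeholder for whatever actually loads/compiles the module."""
    ...


def preload_models():
    for name in ("vae", "unet"):
        for precision in ("fp16", "fp32"):
            PRELOADED_MODELS[(name, precision)] = load_compiled_model(name, precision)


# Called once at server startup.
preload_models()
```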
Gaurav Shukla
b18d75e3f7 [WEB] Use tuned version of UNET fp16
This commit updates the SD script to use the tuned version of the UNet
fp16 model.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-01 19:00:21 +05:30
Quinn Dawkins
3e7efaa048 Switch stable diffusion to the new tuned model (#455) 2022-10-31 15:15:31 -07:00
Gaurav Shukla
a3fdfc81db [WEB] Minor changes in the shark web (#454)
1. Default steps = 50.
2. Live preview yields an intermediate image every 5 steps (see the sketch after this entry).
3. Add logs to .gitignore

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 14:29:00 -07:00
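A generic sketch of the "intermediate image every 5 steps" behaviour described above; the callbacks are placeholders supplied by the caller, and this is not SHARK's actual code:

```python
# Illustrative sketch only: yield a decoded image every N denoising steps
# so a web UI can show a live preview, plus the final image at the end.
def run_with_preview(denoise_step, decode_to_image, latents,
                     num_steps: int = 50, preview_every: int = 5):
    """Generator yielding (step, image) pairs every `preview_every` steps."""
    for step in range(1, num_steps + 1):
        latents = denoise_step(latents, step)
        if step % preview_every == 0:
            yield step, decode_to_image(latents)
    # Always yield the final image as well.
    yield num_steps, decode_to_image(latents)
```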
Gaurav Shukla
f4c91df1df [WEB] Add pillow dependency (#453)
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 12:57:21 -07:00
Prashant Kumar
32e1ba8c0d Adding batch_size support for stable diffusion. 2022-11-01 00:57:52 +05:30
Gaurav Shukla
1939376d72 [WEB] Cache model parameters (#452)
This commit caches some of the model parameters to reduce the response
time of the SHARK web UI.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 11:55:10 -07:00
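One common way to implement this kind of caching is in-process memoization; the sketch below is a hypothetical illustration, not the commit's code:

```python
# Hedged sketch: cache expensive model artifacts between web requests with
# functools.lru_cache. The loader below is a placeholder, not SHARK code.
from functools import lru_cache


def _load_params_from_disk(model_name: str, precision: str):
    """Placeholder for the actual (slow) parameter load."""
    ...


@lru_cache(maxsize=None)
def get_model_params(model_name: str, precision: str):
    """Load parameters once per (model, precision) pair and reuse them."""
    return _load_params_from_disk(model_name, precision)
```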
Gaurav Shukla
25931d48a3 [WEB] Update stable diffusion UI and enable live preview (#447)
This commit enables the live preview feature and also updates the stable
diffusion web UI.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 04:10:15 -07:00
powderluv
024c5e153a Update Windows in README 2022-10-30 22:27:03 -07:00
powderluv
83f34b645d Add Windows instructions 2022-10-30 22:25:42 -07:00
powderluv
3f9f450e0d Add setup_venv.ps1 for windows (#448)
PowerShell users can run ./setup_venv.ps1 to set up the env
2022-10-30 22:17:35 -07:00
powderluv
fd89b06641 Drop RDNA1 for now 2022-10-29 14:29:09 -07:00
Gaurav Shukla
f8dc996004 Update vulkan-target-triple for Radeon devices. (#446)
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-29 14:27:20 -07:00
Phaneesh Barwaria
e6a964088b Add os agnostic vulkan device name check (#445) 2022-10-29 13:19:14 -07:00
Gaurav Shukla
e3e767c7eb [WEB] Remove live preview and disable resnet|albert_maskfill
This commit removes the live preview feature for now, as it's not functional.
The feature will be added back in a later patch.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-30 00:37:59 +05:30
Quinn Dawkins
239c19eb12 Update Stable diffusion script to enable use of tuned models (#443) 2022-10-29 01:42:49 -04:00
Eliasj42
7f37599a60 Added a dispatch benchmarking tool (#441)
To produce benchmarks of individual dispatches, you can add --dispatch_benchmarks=All --dispatch_benchmarks_dir=<output_dir> to your command-line arguments.

Co-authored-by: Elias Joseph <elias@nod-labs.com>
2022-10-28 14:31:03 -07:00
Prashant Kumar
77c9a2c5ea Add profiling vulkan_device info and minor changes to reflect upstream
changes.
2022-10-28 18:02:07 +05:30
Ean Garvey
fd7baae548 Serialize torch-mlir CAPI module as bytecode instead of string. (#435)
* Serialize torch-mlir CAPI as bytecode instead of string.

* Minor fixes to MLIR data handling in SHARK python.
2022-10-27 14:37:15 -05:00
Stanley Winata
01fdf5ee16 [example][SD] compile fp16 with iree-spirv-unify-aliased-resources (#436) 2022-10-27 05:12:28 -07:00
Gaurav Shukla
e52f533c16 [WEB] Save vmfb and add live preview
This commit updates the SD script to save the compiled module and also adds
a live preview of generated images.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-26 23:20:53 +05:30
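A minimal sketch of the save-and-reuse idea for the compiled module, assuming a generic compile callback rather than the repo's actual flow:

```python
# Hedged sketch: keep the compiled .vmfb on disk and reuse it on later runs
# instead of recompiling. `compile_fn` is a hypothetical callback that
# returns the flatbuffer bytes.
from pathlib import Path


def get_or_compile_vmfb(cache_path: str, compile_fn) -> bytes:
    """Return cached flatbuffer bytes, compiling and saving them if absent."""
    path = Path(cache_path)
    if path.exists():
        return path.read_bytes()
    flatbuffer = compile_fn()
    path.write_bytes(flatbuffer)
    return flatbuffer
```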
Quinn Dawkins
fbd77dc936 Enable iterator space fusion for SD (#432) 2022-10-26 01:08:26 -04:00
Quinn Dawkins
cdc6dd19e3 Force stable diffusion fp16 and fp32 to generate images with similar noise (#431) 2022-10-25 17:28:18 -04:00
PhaneeshB
fd578a48a9 add cli args for vulkan target triple 2022-10-25 21:47:26 +05:30
Ean Garvey
9956099516 Add pytest option for updating tank and fix save_mlir function. (#413)
* Use IREE tf tools to save .mlir modules when generating shark_tank.

* Add option to pytest for enabling auto-updates to local shark tank.

* xfail mobilenet torch on cpu, cuda and fix CI macos setup

* Update test-models.yml to disable macos vulkan CI.
2022-10-25 21:29:18 +05:30
powderluv
f97b8fffed Update README.md 2022-10-24 12:51:49 -07:00
Gaurav Shukla
7b9e309724 [WEB] Expose SD parameters in the web ui (#427) 2022-10-24 04:34:35 -07:00
Quinn Dawkins
1d33913d48 Add option to save and load precompiled flatbuffer (#425) 2022-10-23 16:24:09 -07:00
Prashant Kumar
a48eaaed20 Pass the flags to vae. 2022-10-23 23:57:48 +05:30
Prashant Kumar
2741b8be53 Pass the flags to vae. (#422) 2022-10-23 11:23:13 -07:00
Anush Elangovan
4f906a265c Fix lint 2022-10-22 12:43:52 -07:00
Anush Elangovan
0dff8d7af0 Simple download script to prime the hf model cache 2022-10-21 17:42:05 -07:00
Quinn Dawkins
4f0d0d8167 Update vulkan gui README for iree-vulkan-gui + Stable Diffusion (#399) 2022-10-21 14:02:40 -04:00
Vivek Khandelwal
d513060b21 Add params for Stable Diffusion (#420) 2022-10-21 23:11:09 +05:30
Prashant Kumar
d1a25ce4f3 Update stable_args.py 2022-10-21 17:26:31 +05:30
Gaurav Shukla
51c98695b2 [WEB] Update stable diffusion inference
This commit updates the stable diffusion web UI, incorporating the latest
improvements.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-21 01:26:38 +05:30
Quinn Dawkins
b448770ec2 Add ms/iter timing for stable diffusion script (#414) 2022-10-20 13:32:37 -04:00
Prashant Kumar
5fe22a7980 Minor fix. 2022-10-20 22:57:22 +05:30
Prashant Kumar
38ae6b5af4 Add stable_diffusion fp16 and fp32 with args. 2022-10-20 21:47:11 +05:30
Ean Garvey
0bfe30d75d Fix issues with extra_args in benchmarks, pin tf==2.10 (#411) 2022-10-20 06:55:26 -07:00
Quinn Dawkins
7be1d7d0be Add option for extra arguments through SharkInference.compile (#408) 2022-10-19 15:32:48 -05:00
Prashant Kumar
0d74c873f0 Add stable_diff_f16 version. (#407) 2022-10-19 10:04:24 -07:00
powderluv
139aff2938 Update nightly.yml
fix links
2022-10-18 23:42:22 -07:00
anush elangovan
a3f733490c Force update of packages
Pickup tools from upstream IREE
2022-10-19 05:20:53 +00:00
anush elangovan
8a11f138d1 Update SHARK-Runtime releases page 2022-10-19 05:06:36 +00:00
Ean Garvey
3405607917 (TESTING) Fix .whl assets path (#404) 2022-10-14 12:13:14 -05:00
Ean Garvey
7c99a6bd33 Update README.md (#406) 2022-10-13 20:29:49 -05:00
Ean Garvey
3fba8ce0e6 Update README.md (#405) 2022-10-13 12:43:03 -07:00
Ean Garvey
f3bde3c7fc Cleanup tank directory and move instructions to tank/README.md (#401) 2022-10-13 12:20:02 -05:00
Phaneesh Barwaria
21fee8ef33 enable only one workflow job per branch (#402) 2022-10-13 12:15:30 -05:00
Vivek Khandelwal
0e217d6180 Add Stable Diffusion Img2Img model script 2022-10-13 21:56:46 +05:30
Phaneesh Barwaria
00a8ce75d1 Xfail vulkan tests and Enable MacOs test on CI (#383) 2022-10-13 11:14:41 -05:00
Quinn Dawkins
8f3f00cd99 Add iree-run-module like tool for running in a vulkan session (#398) 2022-10-12 20:46:26 -04:00
Ean Garvey
13bae2538a Update URL for IREE compiler/runtime install (#397)
* Update URL for IREE compiler/runtime install

* Update gh-pages-releases.yml

* Update test_models.py

* Update assets path
2022-10-12 15:47:11 -05:00
Ean Garvey
f508c80c23 Add workflow for GH pages releases and release scraping script. (#394)
* Add workflow for GH pages releases and release scraping script.

* Update test_models.py and change tokens for gh pages.
2022-10-11 22:03:33 -05:00
gpetters94
53df0620e3 Add OPT to tank (#214) 2022-10-11 11:03:56 -05:00
94 changed files with 5940 additions and 881 deletions

.github/workflows/gh-pages-releases.yml (new file, 37 changed lines)

@@ -0,0 +1,37 @@
# See: https://github.com/llvm/torch-mlir/issues/1374
name: Publish releases page
on:
workflow_dispatch:
jobs:
scrape_and_publish_releases:
name: "Scrape and publish releases"
runs-on: ubuntu-latest
# Don't run this in everyone's forks.
if: github.repository == 'nod-ai/SHARK'
steps:
- name: Checking out repository
uses: actions/checkout@v2
with:
token: ${{ secrets.NODAI_INVOCATION_TOKEN }}
- name: Run scrape releases script
run: python ./build_tools/scrape_releases.py nod-ai SHARK > /tmp/index.html
shell: bash
- run: git fetch --all
- run: git switch github-pages
- run: git config --global user.email "none@none.com"
- run: git config --global user.name "nod-ai"
- run: mv /tmp/index.html package-index/index.html
- run: git add package-index/index.html
# Only try to make a commit if the file has changed.
- run: git diff --cached --exit-code || git commit -m "Update releases."
- name: GitHub Push
uses: ad-m/github-push-action@v0.6.0
with:
github_token: ${{ secrets.NODAI_INVOCATION_TOKEN }}
branch: github-pages


@@ -39,6 +39,10 @@ jobs:
tag_name="${package_version}"
echo "package_version=${package_version}" >> $GITHUB_ENV
echo "tag_name=${tag_name}" >> $GITHUB_ENV
- name: Set Environment Variables
run: |
echo "SHORT_SHA=`git rev-parse --short=4 HEAD`" >> $GITHUB_ENV
echo "DATE=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Create Release
id: create_release
uses: actions/create-release@v1
@@ -51,17 +55,11 @@ jobs:
Automatic snapshot release of nod.ai SHARK.
draft: true
prerelease: false
- name: Find Torch-MLIR Release
run: |
TM_HTML_URL="$(python3 -c "import urllib.request, json, sys; u=json.loads(urllib.request.urlopen('https://api.github.com/repos/llvm/torch-mlir/releases/latest').read().decode()).get('html_url', False); print(u) if u else sys.exit(1);")"
TM_RELEASE_DIR=${TM_HTML_URL/"tag"/"expanded_assets"}
echo "TM_RELEASE_DIR=${TM_RELEASE_DIR}" >> $GITHUB_ENV
- name: Install dependencies
run: |
echo "Torch-MLIR Release DIR is ${{ env.TM_RELEASE_DIR }}"
python -m pip install --upgrade pip
python -m pip install flake8 pytest toml
if [ -f requirements.txt ]; then pip install -r requirements.txt -f ${{ env.TM_RELEASE_DIR }} -f https://github.com/nod-ai/SHARK-Runtime/releases; fi
if [ -f requirements.txt ]; then pip install -r requirements.txt -f https://llvm.github.io/torch-mlir/package-index/ -f https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
@@ -76,19 +74,19 @@ jobs:
source iree.venv/bin/activate
package_version="$(printf '%(%Y%m%d)T.${{ github.run_number }}')"
SHARK_PACKAGE_VERSION=${package_version} \
pip wheel -v -w wheelhouse . --pre -f https://download.pytorch.org/whl/nightly/torch -f ${{ env.TM_RELEASE_DIR }} -f https://github.com/iree-org/iree/releases
pip wheel -v -w wheelhouse . --pre -f https://download.pytorch.org/whl/nightly/torch -f https://llvm.github.io/torch-mlir/package-index/ -f https://iree-org.github.io/iree/pip-release-links.html
# Install the built wheel
pip install ./wheelhouse/nodai*
# Validate the Models
/bin/bash "$GITHUB_WORKSPACE/build_tools/populate_sharktank_ci.sh"
pytest tank/test_models.py |
pytest --ci --ci_sha=${SHORT_SHA} --local_tank_cache="./gen_shark_tank/" tank/test_models.py |
tail -n 1 |
tee -a pytest_results.txt
if !(grep -Fxq " failed" pytest_results.txt)
then
export SHA=$(git log -1 --format='%h')
gsutil -m cp -r $GITHUB_WORKSPACE/gen_shark_tank/* gs://shark_tank/$SHA
gsutil -m cp -r gs://shark_tank/$SHA/* gs://shark_tank/latest/
gsutil -m cp -r $GITHUB_WORKSPACE/gen_shark_tank/* gs://shark_tank/${DATE}_$SHA
gsutil -m cp -r gs://shark_tank/${DATE}_$SHA/* gs://shark_tank/latest/
fi
rm -rf ./wheelhouse/nodai*
@@ -100,17 +98,14 @@ jobs:
source shark.venv/bin/activate
package_version="$(printf '%(%Y%m%d)T.${{ github.run_number }}')"
SHARK_PACKAGE_VERSION=${package_version} \
pip wheel -v -w wheelhouse . --pre -f https://download.pytorch.org/whl/nightly/torch -f ${{ env.TM_RELEASE_DIR }} -f https://github.com/nod-ai/SHARK-Runtime/releases
pip wheel -v -w wheelhouse . --pre -f https://download.pytorch.org/whl/nightly/torch -f https://llvm.github.io/torch-mlir/package-index/ -f https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html
# Install the built wheel
pip install ./wheelhouse/nodai*
# Validate the Models
pytest tank/test_models.py |
pytest --ci --ci_sha=${SHORT_SHA} tank/test_models.py |
tail -n 1 |
tee -a pytest_results.txt
publish:
runs-on: a100
needs: build
steps:
- name: Upload Release Assets
if: ${{ matrix.backend == 'SHARK' }}
id: upload-release-assets
@@ -119,7 +114,7 @@ jobs:
GITHUB_TOKEN: ${{ secrets.NODAI_INVOCATION_TOKEN }}
with:
release_id: ${{ steps.create_release.outputs.id }}
assets_path: ${GITHUB_WORKSPACE}/wheelhouse/nodai_*.whl
assets_path: ./wheelhouse/nodai_*.whl
- name: Publish Release
if: ${{ matrix.backend == 'SHARK' }}


@@ -10,6 +10,14 @@ on:
branches: [ main ]
workflow_dispatch:
# Ensure that only a single job or workflow using the same
# concurrency group will run at a time. This would cancel
# any in-progress jobs in the same github workflow and github
# ref (e.g. refs/heads/main or refs/pull/<pr_number>/merge).
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build-validate:
strategy:
@@ -28,12 +36,12 @@ jobs:
suite: cuda
- os: ubuntu-latest
suite: cpu
- os: MacStudio
suite: vulkan
- os: MacStudio
suite: cuda
- os: MacStudio
suite: cpu
- os: MacStudio
suite: vulkan
- os: icelake
suite: vulkan
- os: icelake
@@ -90,7 +98,7 @@ jobs:
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} BENCHMARK=1 IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
pytest --benchmark --ci --ci_sha=${SHORT_SHA} --local_tank_cache="/data/anush" tank/test_models.py -k cpu
pytest --benchmark --ci --ci_sha=${SHORT_SHA} -s --local_tank_cache="/data/anush/shark_cache" tank/test_models.py -k cpu --update_tank
gsutil cp ./bench_results.csv gs://shark-public/builder/bench_results/${DATE}/bench_results_cpu_${SHORT_SHA}.csv
gsutil cp gs://shark-public/builder/bench_results/${DATE}/bench_results_cpu_${SHORT_SHA}.csv gs://shark-public/builder/bench_results/latest/bench_results_cpu_latest.csv
@@ -100,14 +108,29 @@ jobs:
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} BENCHMARK=1 IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
pytest --benchmark --ci --ci_sha=${SHORT_SHA} --local_tank_cache="/data/anush" tank/test_models.py -k cuda
pytest --benchmark --ci --ci_sha=${SHORT_SHA} -s --local_tank_cache="/data/anush/shark_cache" tank/test_models.py -k cuda --update_tank
gsutil cp ./bench_results.csv gs://shark-public/builder/bench_results/${DATE}/bench_results_cuda_${SHORT_SHA}.csv
gsutil cp gs://shark-public/builder/bench_results/${DATE}/bench_results_cuda_${SHORT_SHA}.csv gs://shark-public/builder/bench_results/latest/bench_results_cuda_latest.csv
- name: Validate Vulkan Models
if: matrix.suite == 'vulkan'
- name: Validate Vulkan Models (MacOS)
if: matrix.suite == 'vulkan' && matrix.os == 'MacStudio'
run: |
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
echo "VULKAN SDK PATH wo setup: $VULKAN_SDK"
cd /Users/anush/VulkanSDK/1.3.224.1/
source setup-env.sh
cd $GITHUB_WORKSPACE
echo "VULKAN SDK PATH with setup: $VULKAN_SDK"
echo $PATH
pip list | grep -E "torch|iree"
pytest --ci --ci_sha=${SHORT_SHA} --local_tank_cache="/Volumes/builder/anush/shark_cache" tank/test_models.py -k vulkan --update_tank
- name: Validate Vulkan Models (a100)
if: matrix.suite == 'vulkan' && matrix.os != 'MacStudio'
run: |
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} BENCHMARK=1 IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
pytest --ci --ci_sha=${SHORT_SHA} --local_tank_cache="/data/anush" tank/test_models.py -k vulkan
pytest --ci --ci_sha=${SHORT_SHA} -s --local_tank_cache="/data/anush/shark_cache" tank/test_models.py -k vulkan --update_tank

.gitignore (4 changed lines)

@@ -167,3 +167,7 @@ shark_tmp/
# ORT related artefacts
cache_models/
onnx_models/
#web logging
web/logs/
web/stored_results/stable_diffusion/

README.md (307 changed lines)

@@ -14,16 +14,16 @@ High Performance Machine Learning and Data Analytics for CPUs, GPUs, Accelerator
## Installation
<details>
<summary>Installation (Linux and macOS)</summary>
<summary>Installation (Linux, macOS and Windows)</summary>
### Setup a new pip Virtual Environment
This step sets up a new VirtualEnv for Python
```shell
python --version #Check you have 3.7->3.10 on Linux or 3.10 on macOS
python --version #Check you have 3.10 on Linux, macOS or Windows Powershell
python -m venv shark_venv
source shark_venv/bin/activate
source shark_venv/bin/activate # Use shark_venv/Scripts/activate on Windows
# If you are using conda create and activate a new conda env
@@ -38,9 +38,14 @@ python -m pip install --upgrade pip
This step pip installs SHARK and related packages on Linux Python 3.7, 3.8, 3.9, 3.10 and macOS Python 3.10
```shell
pip install nodai-shark -f https://github.com/nod-ai/SHARK/releases -f https://github.com/llvm/torch-mlir/releases -f https://github.com/nod-ai/shark-runtime/releases --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install nodai-shark -f https://nod-ai.github.io/SHARK/package-index/ -f https://llvm.github.io/torch-mlir/package-index/ -f https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
If you are on an Intel macOS machine you need this [workaround](https://github.com/nod-ai/SHARK/issues/102) for an upstream issue.
### Run shark tank model tests.
```shell
pytest tank/test_models.py
```
See tank/README.md for a more detailed walkthrough of our pytest suite and CLI.
### Download and run Resnet50 sample
@@ -71,14 +76,41 @@ git clone https://github.com/nod-ai/SHARK.git
```
## Setup your Python VirtualEnvironment and Dependencies
### Windows Users
```shell
# Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...).
# Requires Python 3.10 and Powershell
./setup_venv.ps1
shark.venv/Scripts/activate
```
### Linux / macOS Users
```shell
# Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...).
./setup_venv.sh
source shark.venv/bin/activate
```
For example if you want to use Python3.10 and upstream IREE with TF Import tools you can use the environment variables like:
### Run a demo script
```shell
python -m shark.examples.shark_inference.resnet50_script --device="cpu" # Use gpu | vulkan
# Or a pytest
pytest tank/test_models.py -k "MiniLM"
```
# PYTHON=python3.10 VENV_DIR=0617_venv IMPORTER=1 USE_IREE=1 ./setup_venv.sh
</details>
<details>
<summary>Development, Testing and Benchmarks</summary>
If you want to use Python 3.10 with the TF import tools, you can set environment variables like:
Set `USE_IREE=1` to use upstream IREE
```
# PYTHON=python3.10 VENV_DIR=0617_venv IMPORTER=1 ./setup_venv.sh
```
If you are a *Torch-mlir developer or an IREE developer* and want to test local changes you can uninstall
@@ -102,82 +134,38 @@ for Torch-MLIR.
```
Now SHARK will use your locally built Torch-MLIR repo.
### Run a demo script
```shell
python -m shark.examples.shark_inference.resnet50_script --device="cpu" # Use gpu | vulkan
# Or a pytest
pytest tank/test_models.py -k "MiniLM"
## Benchmarking Dispatches
To produce benchmarks of individual dispatches, you can add `--dispatch_benchmarks=All --dispatch_benchmarks_dir=<output_dir>` to your command-line arguments.
If you only want to compile specific dispatches, you can specify them with a space-separated string instead of `"All"`, e.g. `--dispatch_benchmarks="0 1 2 10"`
If you instead want to incorporate this into a Python script, you can pass the `dispatch_benchmarks` and `dispatch_benchmarks_dir` arguments when initializing `SharkInference`, and the benchmarks will be generated at compile time, e.g.:
```
shark_module = SharkInference(
mlir_model,
func_name,
device=args.device,
mlir_dialect="tm_tensor",
dispatch_benchmarks="all",
dispatch_benchmarks_dir="results"
)
```
Output will include:
- Inside the specified directory, there will be a directory for each dispatch (there will be mlir files for all dispatches, but only compiled binaries and benchmark data for the specified dispatches)
- An .mlir file containing the dispatch benchmark
- A compiled .vmfb file containing the dispatch benchmark
- An .mlir file containing just the hal executable
- A compiled .vmfb file of the hal executable
- A .txt file containing benchmark output
See tank/README.md for instructions on how to run model tests and benchmarks from the SHARK tank.
</details>
<details>
<summary>Testing and Benchmarks</summary>
### Run all model tests on CPU/GPU/VULKAN/Metal
```shell
pytest tank/test_models.py
# If on Linux for multithreading on CPU (faster results):
pytest tank/test_models.py -n auto
```
### Running specific tests
```shell
# Search for test cases by including a keyword that matches all or part of the test case's name;
pytest tank/test_models.py -k "keyword"
# Test cases are named uniformly by format test_module_<model_name_underscores_only>_<torch/tf>_<static/dynamic>_<device>.
# Example: Test all models on nvidia gpu:
pytest tank/test_models.py -k "cuda"
# Example: Test all tensorflow resnet models on Vulkan backend:
pytest tank/test_models.py -k "resnet and tf and vulkan"
# Exclude a test case:
pytest tank/test_models.py -k "not ..."
### Run benchmarks on SHARK tank pytests and generate bench_results.csv with results.
(the following requires source installation with `IMPORTER=1 ./setup_venv.sh`)
```shell
pytest --benchmark tank/test_models.py
# Just do static GPU benchmarks for PyTorch tests:
pytest --benchmark tank/test_models.py -k "pytorch and static and cuda"
```
### Benchmark Resnet50, MiniLM on CPU
(requires source installation with `IMPORTER=1 ./setup_venv.sh`)
```shell
# We suggest running the following commands as root before running benchmarks on CPU:
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | awk -F, '{print $2}' | sort -n | uniq | ( while read X ; do echo $X ; echo 0 > /sys/devices/system/cpu/cpu$X/online ; done )
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# Benchmark canonical Resnet50 on CPU via pytest
pytest --benchmark tank/test_models -k "resnet50 and tf_static_cpu"
# Benchmark canonical MiniLM on CPU via pytest
pytest --benchmark tank/test_models -k "MiniLM and cpu"
# Benchmark MiniLM on CPU via transformer-benchmarks:
git clone --recursive https://github.com/nod-ai/transformer-benchmarks.git
cd transformer-benchmarks
./perf-ci.sh -n
# Check detail.csv for MLIR/IREE results.
```
</details>
<details>
<summary>API Reference</summary>
@@ -228,160 +216,21 @@ result = shark_module.forward((arg0, arg1))
```
</details>
## Supported and Validated Models
<details>
<summary>PyTorch Models</summary>
SHARK is maintained to support the latest innovations in ML Models:
### Huggingface PyTorch Models
| TF HuggingFace Models | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------|----------|-------------|
| BERT | :green_heart: | :green_heart: | :green_heart: |
| DistilBERT | :green_heart: | :green_heart: | :green_heart: |
| GPT2 | :green_heart: | :green_heart: | :green_heart: |
| BLOOM | :green_heart: | :green_heart: | :green_heart: |
| Stable Diffusion | :green_heart: | :green_heart: | :green_heart: |
| Vision Transformer | :green_heart: | :green_heart: | :green_heart: |
| ResNet50 | :green_heart: | :green_heart: | :green_heart: |
| Hugging Face Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| Albert | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| BigBird | :green_heart: (AOT) | | | |
| DistilBERT | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| GPT2 | :broken_heart: (AOT) | | | |
| MobileBert | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
### Torchvision Models
| TORCHVISION Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|--------------------|----------------------|----------|----------|-------------|
| AlexNet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| DenseNet121 | :green_heart: (Script) | | | |
| MNasNet1_0 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| MobileNetV2 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| MobileNetV3 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Unet | :broken_heart: (Script) | | | |
| Resnet18 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet50 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet101 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnext50_32x4d | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| ShuffleNet_v2 | :broken_heart: (Script) | | | |
| SqueezeNet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| EfficientNet | :green_heart: (Script) | | | |
| Regnet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnest | :broken_heart: (Script) | | | |
| Vision Transformer | :green_heart: (Script) | | | |
| VGG 16 | :green_heart: (Script) | :green_heart: | :green_heart: | |
| Wide Resnet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| RAFT | :broken_heart: (JIT) | | | |
For more information refer to [MODEL TRACKING SHEET](https://docs.google.com/spreadsheets/d/15PcjKeHZIrB5LfDyuw7DGEEE8XnQEX2aX8lm8qbxV8A/edit#gid=0)
### PyTorch Training Models
| Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
</details>
<details>
<summary>JAX Models</summary>
### JAX Models
| Models | JAX-MHLO lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| DALL-E | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
</details>
<details>
<summary>TFLite Models</summary>
### TFLite Models
| Models | TOSA/LinAlg | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
| albert | :green_heart: | :green_heart: | | |
| asr_conformer | :green_heart: | :green_heart: | | |
| bird_classifier | :green_heart: | :green_heart: | | |
| cartoon_gan | :green_heart: | :green_heart: | | |
| craft_text | :green_heart: | :green_heart: | | |
| deeplab_v3 | :green_heart: | :green_heart: | | |
| densenet | :green_heart: | :green_heart: | | |
| east_text_detector | :green_heart: | :green_heart: | | |
| efficientnet_lite0_int8 | :green_heart: | :green_heart: | | |
| efficientnet | :green_heart: | :green_heart: | | |
| gpt2 | :green_heart: | :green_heart: | | |
| image_stylization | :green_heart: | :green_heart: | | |
| inception_v4 | :green_heart: | :green_heart: | | |
| inception_v4_uint8 | :green_heart: | :green_heart: | | |
| lightning_fp16 | :green_heart: | :green_heart: | | |
| lightning_i8 | :green_heart: | :green_heart: | | |
| lightning | :green_heart: | :green_heart: | | |
| magenta | :green_heart: | :green_heart: | | |
| midas | :green_heart: | :green_heart: | | |
| mirnet | :green_heart: | :green_heart: | | |
| mnasnet | :green_heart: | :green_heart: | | |
| mobilebert_edgetpu_s_float | :green_heart: | :green_heart: | | |
| mobilebert_edgetpu_s_quant | :green_heart: | :green_heart: | | |
| mobilebert | :green_heart: | :green_heart: | | |
| mobilebert_tf2_float | :green_heart: | :green_heart: | | |
| mobilebert_tf2_quant | :green_heart: | :green_heart: | | |
| mobilenet_ssd_quant | :green_heart: | :green_heart: | | |
| mobilenet_v1 | :green_heart: | :green_heart: | | |
| mobilenet_v1_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v2_int8 | :green_heart: | :green_heart: | | |
| mobilenet_v2 | :green_heart: | :green_heart: | | |
| mobilenet_v2_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v3-large | :green_heart: | :green_heart: | | |
| mobilenet_v3-large_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v35-int8 | :green_heart: | :green_heart: | | |
| nasnet | :green_heart: | :green_heart: | | |
| person_detect | :green_heart: | :green_heart: | | |
| posenet | :green_heart: | :green_heart: | | |
| resnet_50_int8 | :green_heart: | :green_heart: | | |
| rosetta | :green_heart: | :green_heart: | | |
| spice | :green_heart: | :green_heart: | | |
| squeezenet | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v1 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v1_uint8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_fpnlite | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_fpnlite_uint8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_int8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2 | :green_heart: | :green_heart: | | |
| ssd_spaghettinet_large | :green_heart: | :green_heart: | | |
| ssd_spaghettinet_large_uint8 | :green_heart: | :green_heart: | | |
| visual_wake_words_i8 | :green_heart: | :green_heart: | | |
</details>
<details>
<summary>TF Models</summary>
### Tensorflow Models (Inference)
| Hugging Face Models | tf-mhlo lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| albert-base-v2 | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| DistilBERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| CamemBert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| ConvBert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| Deberta | | | | |
| electra | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| funnel | | | | |
| layoutlm | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| longformer | | | | |
| mobile-bert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| remembert | | | | |
| tapas | | | | |
| flaubert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| roberta | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| xlm-roberta | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| mpnet | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
</details>
For a complete list of the models supported in SHARK, please refer to [tank/README.md](https://github.com/nod-ai/SHARK/blob/main/tank/README.md).
## Related Projects


@@ -0,0 +1,37 @@
"""Scrapes the github releases API to generate a static pip-install-able releases page.
See https://github.com/llvm/torch-mlir/issues/1374
"""
import argparse
import json
import requests
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("owner", type=str)
parser.add_argument("repo", type=str)
args = parser.parse_args()
# Get releases
response = requests.get(
f"https://api.github.com/repos/{args.owner}/{args.repo}/releases"
)
body = json.loads(response.content)
# Parse releases
releases = []
for row in body:
for asset in row["assets"]:
releases.append((asset["name"], asset["browser_download_url"]))
# Output HTML
html = """<!DOCTYPE html>
<html>
<body>
"""
for name, url in releases:
html += f" <a href='{url}'>{name}</a><br />\n"
html += """ </body>
</html>"""
print(html)


@@ -36,6 +36,12 @@ def pytest_addoption(parser):
default="False",
help="Enables uploading of reproduction artifacts upon test case failure during iree-compile or validation. Must be passed with --ci_sha option ",
)
parser.addoption(
"--update_tank",
action="store_true",
default="False",
help="Update local shark tank with latest artifacts.",
)
parser.addoption(
"--ci_sha",
action="store",
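For context, a hedged sketch of how an option like `--update_tank` is typically declared and then consumed from a fixture; this is not the repository's actual `conftest.py`:

```python
# Hedged sketch (not the repo's conftest.py) of wiring up and consuming a
# pytest flag like --update_tank.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--update_tank",
        action="store_true",
        default=False,  # plain boolean default for this sketch
        help="Update local shark tank with latest artifacts.",
    )


@pytest.fixture
def update_tank(request) -> bool:
    """Expose the --update_tank flag to tests as a boolean."""
    return request.config.getoption("--update_tank")
```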

cpp/.gitignore (new file, 3 changed lines)

@@ -0,0 +1,3 @@
*.mlir
*.vmfb
*.ini


@@ -54,5 +54,29 @@ python -m pip install tensorflow
*Run the vulkan_gui*
```bash
./build/vulkan_gui/iree-samples-vulkan-gui
./build/vulkan_gui/iree-samples-resnet-vulkan-gui
```
## Other models
A tool for benchmarking other models is also built and can be invoked with a command like the following:
```bash
./build/vulkan_gui/iree-vulkan-gui --module-file=path/to/.vmfb --function_input=...
```
See `./build/vulkan_gui/iree-vulkan-gui --help` for an explanation of the function input. For example, the stable diffusion UNet can be tested with the following commands:
```bash
wget https://storage.googleapis.com/shark_tank/quinn/stable_diff_tf/stable_diff_tf.mlir
iree-compile --iree-input-type=mhlo --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host -iree-vulkan-target-triple=rdna2-unknown-linux --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 stable_diff_tf.mlir -o stable_diff_tf.vmfb
./build/vulkan_gui/iree-vulkan-gui --module-file=stable_diff_tf.vmfb --function_input=2x4x64x64xf32 --function_input=1xf32 --function_input=2x77x768xf32
```
The VAE and CLIP autoencoder are also available:
```bash
# VAE
wget https://storage.googleapis.com/shark_tank/quinn/stable_diff_tf/vae_tf/vae.mlir
iree-compile --iree-input-type=mhlo --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host -iree-vulkan-target-triple=rdna2-unknown-linux --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 vae.mlir -o vae.vmfb
./build/vulkan_gui/iree-vulkan-gui --module-file=stable_diff_tf.vmfb --function_input=1x4x64x64xf32
# CLIP Autoencoder
wget https://storage.googleapis.com/shark_tank/quinn/stable_diff_tf/clip_tf/clip_autoencoder.mlir
iree-compile --iree-input-type=mhlo --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=vulkan --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host -iree-vulkan-target-triple=rdna2-unknown-linux --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64 clip_autoencoder.mlir -o clip_autoencoder.vmfb
./build/vulkan_gui/iree-vulkan-gui --module-file=stable_diff_tf.vmfb --function_input=1x77xi32 --function_input=1x77xi32
```


@@ -40,45 +40,77 @@ set(IMGUI_DIR ${CMAKE_BINARY_DIR}/_deps/imgui-src)
message("Looking for Imgui in ${IMGUI_DIR}")
include_directories(${IMGUI_DIR} ${IMGUI_DIR}/backends ..)
# Define the sample executable.
set(_NAME "iree-samples-vulkan-gui")
add_executable(${_NAME} "")
target_sources(${_NAME}
PRIVATE
vulkan_inference_gui.cc
"${IMGUI_DIR}/backends/imgui_impl_sdl.cpp"
"${IMGUI_DIR}/backends/imgui_impl_vulkan.cpp"
"${IMGUI_DIR}/imgui.cpp"
"${IMGUI_DIR}/imgui_draw.cpp"
"${IMGUI_DIR}/imgui_demo.cpp"
"${IMGUI_DIR}/imgui_tables.cpp"
"${IMGUI_DIR}/imgui_widgets.cpp"
)
set_target_properties(${_NAME} PROPERTIES OUTPUT_NAME "iree-samples-vulkan-gui")
target_include_directories(${_NAME} PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
)
target_link_libraries(${_NAME}
SDL2::SDL2
Vulkan::Vulkan
iree_runtime_runtime
iree_base_internal_main
iree_hal_drivers_vulkan_registration_registration
iree_modules_hal_hal
iree_vm_vm
iree_vm_bytecode_module
iree_vm_cc
function(iree_vulkan_sample)
cmake_parse_arguments(
_RULE
""
"NAME"
"SRCS"
${ARGN}
)
# Define the sample executable.
set(_NAME "${_RULE_NAME}")
set(SRCS "${_RULE_SRCS}")
add_executable(${_NAME} "")
target_sources(${_NAME}
PRIVATE
${SRCS}
"${IMGUI_DIR}/backends/imgui_impl_sdl.cpp"
"${IMGUI_DIR}/backends/imgui_impl_vulkan.cpp"
"${IMGUI_DIR}/imgui.cpp"
"${IMGUI_DIR}/imgui_draw.cpp"
"${IMGUI_DIR}/imgui_demo.cpp"
"${IMGUI_DIR}/imgui_tables.cpp"
"${IMGUI_DIR}/imgui_widgets.cpp"
)
set_target_properties(${_NAME} PROPERTIES OUTPUT_NAME "${_NAME}")
target_include_directories(${_NAME} PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
)
target_link_libraries(${_NAME}
SDL2::SDL2
Vulkan::Vulkan
iree_runtime_runtime
iree_base_internal_main
iree_hal_drivers_vulkan_registration_registration
iree_modules_hal_hal
iree_vm_vm
iree_vm_bytecode_module
iree_vm_cc
iree_tooling_vm_util_cc
iree_tooling_context_util
)
if(${CMAKE_SYSTEM_NAME} STREQUAL "Windows")
set(_GUI_LINKOPTS "-SUBSYSTEM:CONSOLE")
else()
set(_GUI_LINKOPTS "")
endif()
target_link_options(${_NAME}
PRIVATE
${_GUI_LINKOPTS}
)
endfunction()
iree_vulkan_sample(
NAME
iree-samples-resnet-vulkan-gui
SRCS
vulkan_resnet_inference_gui.cc
)
if(${CMAKE_SYSTEM_NAME} STREQUAL "Windows")
set(_GUI_LINKOPTS "-SUBSYSTEM:CONSOLE")
else()
set(_GUI_LINKOPTS "")
endif()
iree_vulkan_sample(
NAME
iree-vulkan-gui
target_link_options(${_NAME}
PRIVATE
${_GUI_LINKOPTS}
SRCS
vulkan_inference_gui.cc
)
message(STATUS "Configured vulkan_gui sample successfully")


@@ -18,6 +18,12 @@
#include <set>
#include <vector>
#include <fstream>
#include <array>
#include <cstdio>
#include <cstdlib>
#include <iterator>
#include <string>
#include <utility>
#include "iree/hal/drivers/vulkan/api.h"
@@ -30,6 +36,15 @@
#include "iree/vm/bytecode_module.h"
#include "iree/vm/ref_cc.h"
// iree-run-module
#include "iree/base/internal/flags.h"
#include "iree/base/status_cc.h"
#include "iree/base/tracing.h"
#include "iree/modules/hal/types.h"
#include "iree/tooling/comparison.h"
#include "iree/tooling/context_util.h"
#include "iree/tooling/vm_util_cc.h"
// Other dependencies (helpers, etc.)
#include "iree/base/internal/main.h"
@@ -38,6 +53,49 @@
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
IREE_FLAG(string, entry_function, "",
"Name of a function contained in the module specified by module_file "
"to run.");
// TODO(benvanik): move --function_input= flag into a util.
static iree_status_t parse_function_io(iree_string_view_t flag_name,
void* storage,
iree_string_view_t value) {
auto* list = (std::vector<std::string>*)storage;
list->push_back(std::string(value.data, value.size));
return iree_ok_status();
}
static void print_function_io(iree_string_view_t flag_name, void* storage,
FILE* file) {
auto* list = (std::vector<std::string>*)storage;
if (list->empty()) {
fprintf(file, "# --%.*s=\n", (int)flag_name.size, flag_name.data);
} else {
for (size_t i = 0; i < list->size(); ++i) {
fprintf(file, "--%.*s=\"%s\"\n", (int)flag_name.size, flag_name.data,
list->at(i).c_str());
}
}
}
static std::vector<std::string> FLAG_function_inputs;
IREE_FLAG_CALLBACK(
parse_function_io, print_function_io, &FLAG_function_inputs, function_input,
"An input (a) value or (b) buffer of the format:\n"
" (a) scalar value\n"
" value\n"
" e.g.: --function_input=\"3.14\"\n"
" (b) buffer:\n"
" [shape]xtype=[value]\n"
" e.g.: --function_input=\"2x2xi32=1 2 3 4\"\n"
"Optionally, brackets may be used to separate the element values:\n"
" 2x2xi32=[[1 2][3 4]]\n"
"Raw binary files can be read to provide buffer contents:\n"
" 2x2xi32=@some/file.bin\n"
"numpy npy files (from numpy.save) can be read to provide 1+ values:\n"
" @some.npy\n"
"Each occurrence of the flag indicates an input in the order they were\n"
"specified on the command line.");
typedef struct iree_file_toc_t {
const char* name; // the file's original name
char* data; // beginning of the file
@@ -87,225 +145,6 @@ static void check_vk_result(VkResult err) {
abort();
}
// Helper function to find Vulkan memory type bits. See ImGui_ImplVulkan_MemoryType() in imgui_impl_vulkan.cpp
uint32_t findMemoryType(uint32_t type_filter, VkMemoryPropertyFlags properties)
{
VkPhysicalDeviceMemoryProperties mem_properties;
vkGetPhysicalDeviceMemoryProperties(g_PhysicalDevice, &mem_properties);
for (uint32_t i = 0; i < mem_properties.memoryTypeCount; i++)
{
if ((type_filter & (1 << i)) && (mem_properties.memoryTypes[i].propertyFlags & properties) == properties)
{
return i;
}
}
return 0xFFFFFFFF; // Unable to find memoryType
}
// Helper function to load an image with common settings and return a VkDescriptorSet as a sort of Vulkan pointer
bool LoadTextureFromFile(const char* filename, VkDescriptorSet* img_ds, int* image_width, int* image_height)
{
// Specifying 4 channels forces stb to load the image in RGBA which is an easy format for Vulkan
int image_channels = 4;
unsigned char* image_data = stbi_load(filename, image_width, image_height, 0, image_channels);
if (image_data == NULL)
{
return false;
}
// Calculate allocation size (in number of bytes)
size_t image_size = (*image_width)*(*image_height)*image_channels;
VkResult err;
// Create the Vulkan image.
VkImage texture_image;
VkDeviceMemory texture_image_memory;
{
VkImageCreateInfo info = {};
info.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
info.imageType = VK_IMAGE_TYPE_2D;
info.format = VK_FORMAT_R8G8B8A8_UNORM;
info.extent.width = *image_width;
info.extent.height = *image_height;
info.extent.depth = 1;
info.mipLevels = 1;
info.arrayLayers = 1;
info.samples = VK_SAMPLE_COUNT_1_BIT;
info.tiling = VK_IMAGE_TILING_OPTIMAL;
info.usage = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
info.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
err = vkCreateImage(g_Device, &info, g_Allocator, &texture_image);
check_vk_result(err);
VkMemoryRequirements req;
vkGetImageMemoryRequirements(g_Device, texture_image, &req);
VkMemoryAllocateInfo alloc_info = {};
alloc_info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
alloc_info.allocationSize = req.size;
alloc_info.memoryTypeIndex = findMemoryType(req.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
err = vkAllocateMemory(g_Device, &alloc_info, g_Allocator, &texture_image_memory);
check_vk_result(err);
err = vkBindImageMemory(g_Device, texture_image, texture_image_memory, 0);
check_vk_result(err);
}
// Create the Image View
VkImageView image_view;
{
VkImageViewCreateInfo info = {};
info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
info.image = texture_image;
info.viewType = VK_IMAGE_VIEW_TYPE_2D;
info.format = VK_FORMAT_R8G8B8A8_UNORM;
info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
info.subresourceRange.levelCount = 1;
info.subresourceRange.layerCount = 1;
err = vkCreateImageView(g_Device, &info, g_Allocator, &image_view);
check_vk_result(err);
}
// Create Sampler
VkSampler sampler;
{
VkSamplerCreateInfo sampler_info{};
sampler_info.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
sampler_info.magFilter = VK_FILTER_LINEAR;
sampler_info.minFilter = VK_FILTER_LINEAR;
sampler_info.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
sampler_info.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT; // outside image bounds just use border color
sampler_info.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT;
sampler_info.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT;
sampler_info.minLod = -1000;
sampler_info.maxLod = 1000;
sampler_info.maxAnisotropy = 1.0f;
err = vkCreateSampler(g_Device, &sampler_info, g_Allocator, &sampler);
check_vk_result(err);
}
// Create Descriptor Set using ImGUI's implementation
*img_ds = ImGui_ImplVulkan_AddTexture(sampler, image_view, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
// Create Upload Buffer
VkBuffer upload_buffer;
VkDeviceMemory upload_buffer_memory;
{
VkBufferCreateInfo buffer_info = {};
buffer_info.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
buffer_info.size = image_size;
buffer_info.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;
buffer_info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
err = vkCreateBuffer(g_Device, &buffer_info, g_Allocator, &upload_buffer);
check_vk_result(err);
VkMemoryRequirements req;
vkGetBufferMemoryRequirements(g_Device, upload_buffer, &req);
VkMemoryAllocateInfo alloc_info = {};
alloc_info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
alloc_info.allocationSize = req.size;
alloc_info.memoryTypeIndex = findMemoryType(req.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
err = vkAllocateMemory(g_Device, &alloc_info, g_Allocator, &upload_buffer_memory);
check_vk_result(err);
err = vkBindBufferMemory(g_Device, upload_buffer, upload_buffer_memory, 0);
check_vk_result(err);
}
// Upload to Buffer:
{
void* map = NULL;
err = vkMapMemory(g_Device, upload_buffer_memory, 0, image_size, 0, &map);
check_vk_result(err);
memcpy(map, image_data, image_size);
VkMappedMemoryRange range[1] = {};
range[0].sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
range[0].memory = upload_buffer_memory;
range[0].size = image_size;
err = vkFlushMappedMemoryRanges(g_Device, 1, range);
check_vk_result(err);
vkUnmapMemory(g_Device, upload_buffer_memory);
}
// Release image memory using stb
stbi_image_free(image_data);
// Create a command buffer that will perform following steps when hit in the command queue.
// TODO: this works in the example, but may need input if this is an acceptable way to access the pool/create the command buffer.
VkCommandPool command_pool = g_MainWindowData.Frames[g_MainWindowData.FrameIndex].CommandPool;
VkCommandBuffer command_buffer;
{
VkCommandBufferAllocateInfo alloc_info{};
alloc_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
alloc_info.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
alloc_info.commandPool = command_pool;
alloc_info.commandBufferCount = 1;
err = vkAllocateCommandBuffers(g_Device, &alloc_info, &command_buffer);
check_vk_result(err);
VkCommandBufferBeginInfo begin_info = {};
begin_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
begin_info.flags |= VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
err = vkBeginCommandBuffer(command_buffer, &begin_info);
check_vk_result(err);
}
// Copy to Image
{
VkImageMemoryBarrier copy_barrier[1] = {};
copy_barrier[0].sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
copy_barrier[0].dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
copy_barrier[0].oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
copy_barrier[0].newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
copy_barrier[0].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
copy_barrier[0].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
copy_barrier[0].image = texture_image;
copy_barrier[0].subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
copy_barrier[0].subresourceRange.levelCount = 1;
copy_barrier[0].subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, NULL, 0, NULL, 1, copy_barrier);
VkBufferImageCopy region = {};
region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
region.imageSubresource.layerCount = 1;
region.imageExtent.width = *image_width;
region.imageExtent.height = *image_height;
region.imageExtent.depth = 1;
vkCmdCopyBufferToImage(command_buffer, upload_buffer, texture_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
VkImageMemoryBarrier use_barrier[1] = {};
use_barrier[0].sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
use_barrier[0].srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
use_barrier[0].dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
use_barrier[0].oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
use_barrier[0].newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
use_barrier[0].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
use_barrier[0].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
use_barrier[0].image = texture_image;
use_barrier[0].subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
use_barrier[0].subresourceRange.levelCount = 1;
use_barrier[0].subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0, 0, NULL, 0, NULL, 1, use_barrier);
}
// End command buffer
{
VkSubmitInfo end_info = {};
end_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
end_info.commandBufferCount = 1;
end_info.pCommandBuffers = &command_buffer;
err = vkEndCommandBuffer(command_buffer);
check_vk_result(err);
err = vkQueueSubmit(g_Queue, 1, &end_info, VK_NULL_HANDLE);
check_vk_result(err);
err = vkDeviceWaitIdle(g_Device);
check_vk_result(err);
}
return true;
}
// Returns the names of the Vulkan layers used for the given IREE
// |extensibility_set| and |features|.
std::vector<const char*> GetIreeLayers(
@@ -723,7 +562,16 @@ namespace iree {
extern "C" int iree_main(int argc, char** argv) {
fprintf(stdout, "starting yo\n");
iree_flags_parse_checked(IREE_FLAGS_PARSE_MODE_DEFAULT, &argc, &argv);
if (argc > 1) {
// Avoid iree-run-module spinning endlessly on stdin if the user uses single
// dashes for flags.
printf(
"[ERROR] unexpected positional argument (expected none)."
" Did you use pass a flag with a single dash ('-')?"
" Use '--' instead.\n");
return 1;
}
// --------------------------------------------------------------------------
// Create a window.
@@ -835,8 +683,6 @@ extern "C" int iree_main(int argc, char** argv) {
// Demo state.
bool show_iree_window = true;
// --------------------------------------------------------------------------
// --------------------------------------------------------------------------
// Setup IREE.
@@ -900,69 +746,44 @@ extern "C" int iree_main(int argc, char** argv) {
// Load bytecode module
iree_file_toc_t module_file_toc;
const char network_model[] = "resnet50_tf.vmfb";
fprintf(stdout, "Loading: %s\n", network_model);
if (load_file(network_model, &module_file_toc.data, &module_file_toc.size) == false)
{
abort();
return 1;
}
fprintf(stdout, "module size: %zu\n", module_file_toc.size);
static float input_res50[224*224*3];
static float output_res50[1000];
char filename[] = "dog_imagenet.jpg";
fprintf(stdout, "loading: %s\n", filename);
int x,y,n;
//unsigned char *image_raw = stbi_load(filename, &x, &y, &n, 3);
stbi_load(filename, &x, &y, &n, 3);
fprintf(stdout, "res: %i x %i x %i\n", x, y, n);
/* Preprocessing needs to go here. For now use a buffer preprocessed in python.
//convert image into floating point format
for(int i=0;i<224*224*3;i++)
{
input_res50[i]= ((float)image_raw[i])/255.0f;
}*/
std::ifstream fin("dog.bin", std::ifstream::in | std::ifstream::binary);
fin.read((char*)input_res50, 224*224*3*sizeof(float));
// load image again so imgui can display it
int my_image_width = 0;
int my_image_height = 0;
VkDescriptorSet my_image_texture = 0;
bool ret = LoadTextureFromFile(filename, &my_image_texture, &my_image_width, &my_image_height);
fprintf(stdout, "creating vulkan image: %s\n", ret ?"OK":"FAIL");
IM_ASSERT(ret);
//iree_file_toc_t module_file_toc;
//const char network_model[] = "resnet50_tf.vmfb";
//fprintf(stdout, "Loading: %s\n", network_model);
//if (load_file(network_model, &module_file_toc.data, &module_file_toc.size) == false)
//{
// abort();
// return 1;
//}
//fprintf(stdout, "module size: %zu\n", module_file_toc.size);
iree_vm_module_t* bytecode_module = nullptr;
IREE_CHECK_OK(iree_vm_bytecode_module_create(
iree_instance,
iree_const_byte_span_t{
reinterpret_cast<const uint8_t*>(module_file_toc.data),
module_file_toc.size},
iree_allocator_null(), iree_allocator_system(), &bytecode_module));
// Query for details about what is in the loaded module.
iree_vm_module_signature_t bytecode_module_signature =
iree_vm_module_signature(bytecode_module);
fprintf(stdout, "Module loaded, have <%" PRIhsz "> exported functions:\n",
bytecode_module_signature.export_function_count);
for (int i = 0; i < bytecode_module_signature.export_function_count; ++i) {
iree_vm_function_t function;
IREE_CHECK_OK(iree_vm_module_lookup_function_by_ordinal(
bytecode_module, IREE_VM_FUNCTION_LINKAGE_EXPORT, i, &function));
auto function_name = iree_vm_function_name(&function);
auto function_signature = iree_vm_function_signature(&function);
iree_status_t module_status = iree_tooling_load_module_from_flags(
iree_instance, iree_allocator_system(), &bytecode_module);
if (!iree_status_is_ok(module_status))
return -1;
//IREE_CHECK_OK(iree_vm_bytecode_module_create(
// iree_instance,
// iree_const_byte_span_t{
// reinterpret_cast<const uint8_t*>(module_file_toc.data),
// module_file_toc.size},
// iree_allocator_null(), iree_allocator_system(), &bytecode_module));
//// Query for details about what is in the loaded module.
//iree_vm_module_signature_t bytecode_module_signature =
// iree_vm_module_signature(bytecode_module);
//fprintf(stdout, "Module loaded, have <%" PRIhsz "> exported functions:\n",
// bytecode_module_signature.export_function_count);
//for (int i = 0; i < bytecode_module_signature.export_function_count; ++i) {
// iree_vm_function_t function;
// IREE_CHECK_OK(iree_vm_module_lookup_function_by_ordinal(
// bytecode_module, IREE_VM_FUNCTION_LINKAGE_EXPORT, i, &function));
// auto function_name = iree_vm_function_name(&function);
// auto function_signature = iree_vm_function_signature(&function);
fprintf(stdout, " %d: '%.*s' with calling convention '%.*s'\n", i,
(int)function_name.size, function_name.data,
(int)function_signature.calling_convention.size,
function_signature.calling_convention.data);
}
// fprintf(stdout, " %d: '%.*s' with calling convention '%.*s'\n", i,
// (int)function_name.size, function_name.data,
// (int)function_signature.calling_convention.size,
// function_signature.calling_convention.data);
//}
// Allocate a context that will hold the module state across invocations.
iree_vm_context_t* iree_context = nullptr;
@@ -988,33 +809,42 @@ extern "C" int iree_main(int argc, char** argv) {
// Write inputs into mappable buffers.
iree_hal_allocator_t* allocator =
iree_hal_device_allocator(iree_vk_device);
iree_hal_memory_type_t input_memory_type =
static_cast<iree_hal_memory_type_t>(
IREE_HAL_MEMORY_TYPE_HOST_LOCAL |
IREE_HAL_MEMORY_TYPE_DEVICE_VISIBLE);
iree_hal_buffer_usage_t input_buffer_usage =
static_cast<iree_hal_buffer_usage_t>(IREE_HAL_BUFFER_USAGE_DEFAULT);
iree_hal_buffer_params_t buffer_params;
buffer_params.type = input_memory_type;
buffer_params.usage = input_buffer_usage;
buffer_params.access = IREE_HAL_MEMORY_ACCESS_READ | IREE_HAL_MEMORY_ACCESS_WRITE;
//iree_hal_memory_type_t input_memory_type =
// static_cast<iree_hal_memory_type_t>(
// IREE_HAL_MEMORY_TYPE_HOST_LOCAL |
// IREE_HAL_MEMORY_TYPE_DEVICE_VISIBLE);
//iree_hal_buffer_usage_t input_buffer_usage =
// static_cast<iree_hal_buffer_usage_t>(IREE_HAL_BUFFER_USAGE_DEFAULT);
//iree_hal_buffer_params_t buffer_params;
//buffer_params.type = input_memory_type;
//buffer_params.usage = input_buffer_usage;
//buffer_params.access = IREE_HAL_MEMORY_ACCESS_READ | IREE_HAL_MEMORY_ACCESS_WRITE;
// Wrap input buffers in buffer views.
iree_hal_buffer_view_t* input0_buffer_view = nullptr;
constexpr iree_hal_dim_t input_buffer_shape[] = {1, 224, 224, 3};
IREE_CHECK_OK(iree_hal_buffer_view_allocate_buffer(
allocator,
/*shape_rank=*/4, /*shape=*/input_buffer_shape,
IREE_HAL_ELEMENT_TYPE_FLOAT_32,
IREE_HAL_ENCODING_TYPE_DENSE_ROW_MAJOR, buffer_params,
iree_make_const_byte_span(&input_res50, sizeof(input_res50)),
&input0_buffer_view));
vm::ref<iree_vm_list_t> inputs;
IREE_CHECK_OK(iree_vm_list_create(/*element_type=*/nullptr, 6, iree_allocator_system(), &inputs));
auto input0_buffer_view_ref = iree_hal_buffer_view_move_ref(input0_buffer_view);
IREE_CHECK_OK(iree_vm_list_push_ref_move(inputs.get(), &input0_buffer_view_ref));
iree_status_t input_status = ParseToVariantList(
allocator,
iree::span<const std::string>{FLAG_function_inputs.data(),
FLAG_function_inputs.size()},
iree_allocator_system(), &inputs);
if (!iree_status_is_ok(input_status))
return -1;
//vm::ref<iree_vm_list_t> inputs;
//IREE_CHECK_OK(iree_vm_list_create(/*element_type=*/nullptr, 6, iree_allocator_system(), &inputs));
//iree_hal_buffer_view_t* input0_buffer_view = nullptr;
//constexpr iree_hal_dim_t input_buffer_shape[] = {1, 224, 224, 3};
//IREE_CHECK_OK(iree_hal_buffer_view_allocate_buffer(
// allocator,
// /*shape_rank=*/4, /*shape=*/input_buffer_shape,
// IREE_HAL_ELEMENT_TYPE_FLOAT_32,
// IREE_HAL_ENCODING_TYPE_DENSE_ROW_MAJOR, buffer_params,
// iree_make_const_byte_span(&input_res50, sizeof(input_res50)),
// &input0_buffer_view));
//auto input0_buffer_view_ref = iree_hal_buffer_view_move_ref(input0_buffer_view);
//IREE_CHECK_OK(iree_vm_list_push_ref_move(inputs.get(), &input0_buffer_view_ref));
// Prepare outputs list to accept results from the invocation.
@@ -1023,6 +853,7 @@ extern "C" int iree_main(int argc, char** argv) {
IREE_CHECK_OK(iree_vm_list_create(/*element_type=*/nullptr, kOutputCount * sizeof(float), iree_allocator_system(), &outputs));
// --------------------------------------------------------------------------
// Main loop.
bool done = false;
while (!done) {
@@ -1076,46 +907,11 @@ extern "C" int iree_main(int argc, char** argv) {
/*policy=*/nullptr, inputs.get(),
outputs.get(), iree_allocator_system()));
// Read back the results.
auto* output_buffer_view = reinterpret_cast<iree_hal_buffer_view_t*>(
iree_vm_list_get_ref_deref(outputs.get(),
0,
iree_hal_buffer_view_get_descriptor()));
IREE_CHECK_OK(iree_hal_device_transfer_d2h(
iree_vk_device,
iree_hal_buffer_view_buffer(output_buffer_view),
0,
output_res50, sizeof(output_res50),
IREE_HAL_TRANSFER_BUFFER_FLAG_DEFAULT, iree_infinite_timeout()));
// we want to run continuously so we can use tools like RenderDoc, RGP, etc...
dirty = true;
}
// find maxarg from results
float max = 0.0f;
int max_idx = -1;
for(int i=0;i<1000;i++)
{
if (output_res50[i] > max)
{
max = output_res50[i];
max_idx = i;
}
}
ImGui::Text("pointer = %p", my_image_texture);
ImGui::Text("size = %d x %d", my_image_width, my_image_height);
ImGui::Image((ImTextureID)my_image_texture, ImVec2(my_image_width, my_image_height));
// Display the latest computation output.
ImGui::Text("Max idx = [%i]", max_idx);
ImGui::Text("Max value = [%f]", max);
ImGui::Text("Resnet50 categories:");
ImGui::PlotHistogram("Histogram", output_res50, IM_ARRAYSIZE(output_res50), 0, NULL, 0.0f, 1.0f, ImVec2(0,80));
ImGui::Separator();
// Framerate counter.
ImGui::Text("Application average %.3f ms/frame (%.1f FPS)",
1000.0f / ImGui::GetIO().Framerate, ImGui::GetIO().Framerate);
@@ -1137,6 +933,7 @@ extern "C" int iree_main(int argc, char** argv) {
iree_vm_module_release(bytecode_module);
iree_vm_context_release(iree_context);
iree_hal_device_release(iree_vk_device);
iree_hal_allocator_release(allocator);
iree_hal_driver_release(iree_vk_driver);
iree_hal_vulkan_syms_release(iree_vk_syms);
iree_vm_instance_release(iree_instance);

File diff suppressed because it is too large.

View File

@@ -205,14 +205,14 @@ if __name__ == "__main__":
parser.add_argument(
"--torch_model_csv",
type=lambda x: is_valid_file(x),
default="./tank/pytorch/torch_model_list.csv",
default="./tank/torch_model_list.csv",
help="""Contains the file with torch_model name and args.
Please see: https://github.com/nod-ai/SHARK/blob/main/tank/pytorch/torch_model_list.csv""",
Please see: https://github.com/nod-ai/SHARK/blob/main/tank/torch_model_list.csv""",
)
parser.add_argument(
"--tf_model_csv",
type=lambda x: is_valid_file(x),
default="./tank/tf/tf_model_list.csv",
default="./tank/tf_model_list.csv",
help="Contains the file with tf model name and args.",
)
parser.add_argument(

View File

@@ -4,9 +4,9 @@ requires = [
"wheel",
"packaging",
"numpy==1.22.4",
"torch-mlir>=20220428.420",
"iree-compiler>=20220427.13",
"iree-runtime>=20220427.13",
"numpy>=1.22.4",
"torch-mlir>=20221021.633",
"iree-compiler>=20221022.190",
"iree-runtime>=20221022.190",
]
build-backend = "setuptools.build_meta"

View File

@@ -1,8 +1,8 @@
-f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
-f https://download.pytorch.org/whl/nightly/cpu/
--pre
numpy
torch
torch==1.14.0.dev20221021
torchvision
tqdm

View File

@@ -14,7 +14,8 @@ iree-tools-tf
# TensorFlow and JAX.
gin-config
tensorflow
tensorflow==2.10
keras==2.10
#tf-models-nightly
#tensorflow-text-nightly
transformers

View File

@@ -10,8 +10,8 @@ PACKAGE_VERSION = os.environ.get("SHARK_PACKAGE_VERSION") or "0.0.4"
backend_deps = []
if "NO_BACKEND" in os.environ.keys():
backend_deps = [
"iree-compiler>=20220427.13",
"iree-runtime>=20220427.13",
"iree-compiler>=20221022.190",
"iree-runtime>=20221022.190",
]
setup(
@@ -33,11 +33,11 @@ setup(
"Operating System :: OS Independent",
],
packages=find_packages(exclude=("examples")),
python_requires=">=3.7",
python_requires=">=3.9",
install_requires=[
"numpy",
"PyYAML",
"torch-mlir>=20220428.420",
"torch-mlir>=20221021.633",
]
+ backend_deps,
)

setup_venv.ps1
View File

@@ -0,0 +1,40 @@
#Write-Host "Installing python"
#Start-Process winget install Python.Python.3.10 '/quiet InstallAllUsers=1 PrependPath=1' -wait -NoNewWindow
#Write-Host "python installation completed successfully"
#Write-Host "Reload environment variables"
#$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
#Write-Host "Reloaded environment variables"
# redirect stderr into stdout
$p = &{python -V} 2>&1
# check if an ErrorRecord was returned
$version = if($p -is [System.Management.Automation.ErrorRecord])
{
# grab the version string from the error message
$p.Exception.Message
}
else
{
# otherwise return as is
$p
}
Write-Host "Python version found is"
Write-Host $p
Write-Host "Installing Build Dependencies"
python -m venv .\shark.venv\
.\shark.venv\Scripts\activate
pip install -r requirements.txt
pip install --pre torch-mlir torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu116 -f https://llvm.github.io/torch-mlir/package-index/
pip install --upgrade -f https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html iree-compiler iree-runtime
Write-Host "Building SHARK..."
pip install -e . -f https://llvm.github.io/torch-mlir/package-index/ -f https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html
pip install diffusers transformers scipy pillow gradio
Write-Host "Build and installation completed successfully"
Write-Host "Source your venv with ./shark.venv/Scripts/activate"

View File

@@ -76,11 +76,15 @@ fi
$PYTHON -m pip install --upgrade pip || die "Could not upgrade pip"
$PYTHON -m pip install --upgrade -r "$TD/requirements.txt"
if [ "$torch_mlir_bin" = true ]; then
$PYTHON -m pip install --pre torch-mlir -f https://llvm.github.io/torch-mlir/package-index/
if [ $? -eq 0 ];then
echo "Successfully Installed torch-mlir"
if [[ $(uname -s) = 'Darwin' ]]; then
echo "MacOS detected. Please install torch-mlir from source or .whl, as dependency problems may occur otherwise."
else
echo "Could not install torch-mlir" >&2
$PYTHON -m pip install --pre torch-mlir -f https://llvm.github.io/torch-mlir/package-index/
if [ $? -eq 0 ];then
echo "Successfully Installed torch-mlir"
else
echo "Could not install torch-mlir" >&2
fi
fi
else
echo "${Red}No binaries found for Python $PYTHON_VERSION_X_Y on $(uname -s)"
@@ -89,13 +93,13 @@ else
exit 1
fi
if [[ -z "${USE_IREE}" ]]; then
RUNTIME="nod-ai/SHARK-Runtime"
RUNTIME="https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html"
else
RUNTIME="google/iree"
RUNTIME="https://iree-org.github.io/iree/pip-release-links.html"
fi
if [[ -z "${NO_BACKEND}" ]]; then
echo "Installing ${RUNTIME}..."
$PYTHON -m pip install --find-links https://github.com/${RUNTIME}/releases iree-compiler iree-runtime
$PYTHON -m pip install --upgrade --find-links ${RUNTIME} iree-compiler iree-runtime
else
echo "Not installing a backend, please make sure to add your backend to PYTHONPATH"
fi
@@ -103,15 +107,17 @@ if [[ ! -z "${IMPORTER}" ]]; then
echo "${Yellow}Installing importer tools.."
if [[ $(uname -s) = 'Linux' ]]; then
echo "${Yellow}Linux detected.. installing Linux importer tools"
$PYTHON -m pip install --upgrade -r "$TD/requirements-importer.txt" -f https://github.com/${RUNTIME}/releases --extra-index-url https://download.pytorch.org/whl/nightly/cpu
#Always get the importer tools from upstream IREE
$PYTHON -m pip install --upgrade -r "$TD/requirements-importer.txt" -f https://iree-org.github.io/iree/pip-release-links.html --extra-index-url https://download.pytorch.org/whl/nightly/cpu
elif [[ $(uname -s) = 'Darwin' ]]; then
echo "${Yellow}macOS detected.. installing macOS importer tools"
#Conda seems to have some problems installing these packages; we hope they get resolved upstream.
$PYTHON -m pip install --upgrade -r "$TD/requirements-importer-macos.txt" -f https://github.com/${RUNTIME}/releases --extra-index-url https://download.pytorch.org/whl/nightly/cpu
$PYTHON -m pip install --upgrade -r "$TD/requirements-importer-macos.txt" -f ${RUNTIME} --extra-index-url https://download.pytorch.org/whl/nightly/cpu
$PYTHON -m pip install https://github.com/llvm/torch-mlir/releases/download/snapshot-20221024.636/torch_mlir-20221024.636-cp310-cp310-macosx_11_0_universal2.whl
fi
fi
$PYTHON -m pip install -e . -f https://llvm.github.io/torch-mlir/package-index/ -f https://github.com/${RUNTIME}/releases
$PYTHON -m pip install -e . -f https://llvm.github.io/torch-mlir/package-index/ -f ${RUNTIME}
if [[ $(uname -s) = 'Linux' && ! -z "${BENCHMARK}" ]]; then
$PYTHON -m pip uninstall -y torch torchvision

View File

@@ -69,7 +69,7 @@ labels = load_labels()
mlir_model, func_name, inputs, golden_out = download_torch_model("resnet50")
shark_module = SharkInference(mlir_model, func_name, mlir_dialect="linalg")
# shark_module.compile()
shark_module.compile()
path = shark_module.save_module()
shark_module.load_module(path)
result = shark_module.forward((img.detach().numpy(),))

View File

@@ -47,7 +47,7 @@ def load_mlir(mlir_loc):
return mlir_module
def compile_through_fx(model, inputs, mlir_loc=None):
def compile_through_fx(model, inputs, mlir_loc=None, extra_args=[]):
module = load_mlir(mlir_loc)
if mlir_loc == None:
@@ -98,9 +98,12 @@ def compile_through_fx(model, inputs, mlir_loc=None):
func_name = "forward"
shark_module = SharkInference(
mlir_model, func_name, device=args.device, mlir_dialect="tm_tensor"
mlir_model,
func_name,
device=args.device,
mlir_dialect="tm_tensor",
)
shark_module.compile()
shark_module.compile(extra_args)
return shark_module
@@ -161,6 +164,7 @@ if __name__ == "__main__":
unet,
(latent_model_input, torch.tensor([1.0]), text_embeddings),
args.mlir_loc,
["--iree-flow-enable-conv-nchw-to-nhwc-transform"],
)
# torch.jit.script(unet)

View File

@@ -0,0 +1,278 @@
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
import torch
from PIL import Image
from diffusers import LMSDiscreteScheduler
from tqdm.auto import tqdm
from shark.shark_inference import SharkInference
from torch.fx.experimental.proxy_tensor import make_fx
from torch._decomp import get_decompositions
import torch_mlir
import tempfile
import numpy as np
# pip install diffusers
# pip install scipy
############### Parsing args #####################
import argparse
p = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
p.add_argument(
"--prompt",
type=str,
default="a photograph of an astronaut riding a horse",
help="the text prompt to use",
)
p.add_argument("--device", type=str, default="cpu", help="the device to use")
p.add_argument("--steps", type=int, default=50, help="the device to use")
p.add_argument("--mlir_loc", type=str, default=None, help="the device to use")
p.add_argument("--vae_loc", type=str, default=None, help="the device to use")
args = p.parse_args()
#####################################################
def fp16_unet():
from shark.shark_downloader import download_torch_model
mlir_model, func_name, inputs, golden_out = download_torch_model(
"stable_diff_f16_18_OCT", tank_url="gs://shark_tank/prashant_nod"
)
shark_module = SharkInference(
mlir_model, func_name, device=args.device, mlir_dialect="linalg"
)
shark_module.compile()
return shark_module
def load_mlir(mlir_loc):
import os
if mlir_loc == None:
return None
print(f"Trying to load the model from {mlir_loc}.")
with open(os.path.join(mlir_loc)) as f:
mlir_module = f.read()
return mlir_module
def compile_through_fx(model, inputs, mlir_loc=None):
module = load_mlir(mlir_loc)
if mlir_loc == None:
fx_g = make_fx(
model,
decomposition_table=get_decompositions(
[
torch.ops.aten.embedding_dense_backward,
torch.ops.aten.native_layer_norm_backward,
torch.ops.aten.slice_backward,
torch.ops.aten.select_backward,
torch.ops.aten.norm.ScalarOpt_dim,
torch.ops.aten.native_group_norm,
torch.ops.aten.upsample_bilinear2d.vec,
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
]
),
)(*inputs)
fx_g.graph.set_codegen(torch.fx.graph.CodeGen())
fx_g.recompile()
def strip_overloads(gm):
"""
Modifies the target of graph nodes in :attr:`gm` to strip overloads.
Args:
gm(fx.GraphModule): The input Fx graph module to be modified
"""
for node in gm.graph.nodes:
if isinstance(node.target, torch._ops.OpOverload):
node.target = node.target.overloadpacket
gm.recompile()
strip_overloads(fx_g)
ts_g = torch.jit.script(fx_g)
module = torch_mlir.compile(
ts_g,
inputs,
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
mlir_model = module
func_name = "forward"
shark_module = SharkInference(
mlir_model, func_name, device=args.device, mlir_dialect="linalg"
)
shark_module.compile()
return shark_module
if __name__ == "__main__":
YOUR_TOKEN = "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"
# 1. Load the autoencoder model which will be used to decode the latents into image space.
vae = AutoencoderKL.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="vae",
use_auth_token=YOUR_TOKEN,
)
# 2. Load the tokenizer and text encoder to tokenize and encode the text.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained(
"openai/clip-vit-large-patch14"
)
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = AutoencoderKL.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="vae",
use_auth_token=YOUR_TOKEN,
)
def forward(self, input):
return self.vae.decode(input, return_dict=False)[0]
vae = VaeModel()
vae_input = torch.rand(1, 4, 64, 64)
shark_vae = compile_through_fx(vae, (vae_input,), args.vae_loc)
# Wrap the unet model to return tuples.
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
self.in_channels = self.unet.in_channels
self.train(False)
def forward(self, x, y, z):
return self.unet.forward(x, y, z, return_dict=False)[0]
# # 3. The UNet model for generating the latents.
unet = UnetModel()
shark_unet = fp16_unet()
scheduler = LMSDiscreteScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
num_train_timesteps=1000,
)
prompt = [args.prompt]
height = 512 # default height of Stable Diffusion
width = 512 # default width of Stable Diffusion
num_inference_steps = args.steps # Number of denoising steps
guidance_scale = 7.5 # Scale for classifier-free guidance
generator = torch.manual_seed(
42
) # Seed generator to create the initial latent noise
batch_size = len(prompt)
text_input = tokenizer(
prompt,
padding="max_length",
max_length=tokenizer.model_max_length,
truncation=True,
return_tensors="pt",
)
text_embeddings = text_encoder(text_input.input_ids)[0]
max_length = text_input.input_ids.shape[-1]
uncond_input = tokenizer(
[""] * batch_size,
padding="max_length",
max_length=max_length,
return_tensors="pt",
)
uncond_embeddings = text_encoder(uncond_input.input_ids)[0]
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
latents = torch.randn(
(batch_size, unet.in_channels, height // 8, width // 8),
generator=generator,
)
# latents = latents.to(torch_device)
scheduler.set_timesteps(num_inference_steps)
latents = latents * scheduler.sigmas[0]
# print(latents, latents.shape)
for i, t in tqdm(enumerate(scheduler.timesteps)):
print(f"i = {i} t = {t}")
# expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
latent_model_input = torch.cat([latents] * 2)
sigma = scheduler.sigmas[i]
latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
# predict the noise residual
# with torch.no_grad():
# noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings)
latent_model_input_numpy = (
latent_model_input.detach().numpy().astype(np.half)
)
text_embeddings_numpy = (
text_embeddings.detach().numpy().astype(np.half)
)
noise_pred = shark_unet.forward(
(
latent_model_input_numpy,
np.array([t]).astype(np.half),
text_embeddings_numpy,
)
)
noise_pred = torch.from_numpy(noise_pred).to(torch.float32)
# perform guidance
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (
noise_pred_text - noise_pred_uncond
)
# compute the previous noisy sample x_t -> x_t-1
latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
# print("Latents shape : ", latents.shape)
# scale and decode the image latents with vae
latents = 1 / 0.18215 * latents
latents_numpy = latents.detach().numpy()
image = shark_vae.forward((latents_numpy,))
image = torch.from_numpy(image)
image = (image / 2 + 0.5).clamp(0, 1)
image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
images = (image * 255).round().astype("uint8")
pil_images = [Image.fromarray(image) for image in images]
pil_images[0].save("astro.jpg")

View File

@@ -0,0 +1,2 @@
*.vmfb
*.jpg

View File

@@ -0,0 +1,15 @@
# STABLE DIFFUSION
## Installation
```shell
pip install diffusers
pip install scipy
```
## RUN
```shell
python main.py --precision="fp32"|"fp16" --prompt="enter the text" --device="cpu"|"cuda"|"vulkan" --import_mlir|--no-import_mlir
```
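For instance, a concrete fp16 run on Vulkan using the flags listed above (the prompt text is illustrative) might look like:
```shell
python main.py --precision="fp16" --prompt="a photograph of an astronaut riding a horse" --device="vulkan"
```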

View File

@@ -0,0 +1,25 @@
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(
text=["a photo of a cat", "a photo of a dog"],
images=image,
return_tensors="pt",
padding=True,
)
outputs = model(**inputs)
logits_per_image = (
outputs.logits_per_image
) # this is the image-text similarity score
probs = logits_per_image.softmax(
dim=1
) # we can take the softmax to get the label probabilities

View File

@@ -0,0 +1,241 @@
from transformers import CLIPTextModel, CLIPTokenizer
import torch
from PIL import Image
from diffusers import LMSDiscreteScheduler
from tqdm.auto import tqdm
import numpy as np
from stable_args import args
from model_wrappers import (
get_vae32,
get_vae16,
get_unet16_wrapped,
get_unet32_wrapped,
get_clipped_text,
)
from utils import get_shark_model
import time
GCLOUD_BUCKET = "gs://shark_tank/prashant_nod"
VAE_FP16 = "vae_fp16"
VAE_FP32 = "vae_fp32"
UNET_FP16 = "unet_fp16"
UNET_FP32 = "unet_fp32"
IREE_EXTRA_ARGS = []
TUNED_GCLOUD_BUCKET = "gs://shark_tank/quinn"
UNET_FP16_TUNED = "unet_fp16_tunedv2"
BATCH_SIZE = len(args.prompts)
if BATCH_SIZE not in [1, 2]:
import sys
sys.exit("Only batch size 1 and 2 are supported.")
if BATCH_SIZE > 1 and args.precision != "fp16":
sys.exit("batch size > 1 is supported for fp16 model.")
if BATCH_SIZE != 1:
TUNED_GCLOUD_BUCKET = "gs://shark_tank/prashant_nod"
UNET_FP16_TUNED = f"unet_fp16_{BATCH_SIZE}"
VAE_FP16 = f"vae_fp16_{BATCH_SIZE}"
# Helper function to profile the vulkan device.
def start_profiling(file_path="foo.rdc", profiling_mode="queue"):
if args.vulkan_debug_utils and "vulkan" in args.device:
import iree
print(f"Profiling and saving to {file_path}.")
vulkan_device = iree.runtime.get_device(args.device)
vulkan_device.begin_profiling(mode=profiling_mode, file_path=file_path)
return vulkan_device
return None
def end_profiling(device):
if device:
return device.end_profiling()
def get_models():
global IREE_EXTRA_ARGS
if args.precision == "fp16":
IREE_EXTRA_ARGS += [
"--iree-flow-enable-padding-linalg-ops",
"--iree-flow-linalg-ops-padding-size=32",
]
if args.use_tuned:
unet_gcloud_bucket = TUNED_GCLOUD_BUCKET
vae_gcloud_bucket = GCLOUD_BUCKET
unet_args = IREE_EXTRA_ARGS
vae_args = IREE_EXTRA_ARGS + [
"--iree-flow-enable-conv-nchw-to-nhwc-transform"
]
unet_name = UNET_FP16_TUNED
vae_name = VAE_FP16
else:
unet_gcloud_bucket = GCLOUD_BUCKET
vae_gcloud_bucket = GCLOUD_BUCKET
IREE_EXTRA_ARGS += [
"--iree-flow-enable-conv-nchw-to-nhwc-transform"
]
unet_args = IREE_EXTRA_ARGS
vae_args = IREE_EXTRA_ARGS
unet_name = UNET_FP16
vae_name = VAE_FP16
if BATCH_SIZE > 1:
vae_args = []
if args.import_mlir == True:
return get_vae16(model_name=VAE_FP16), get_unet16_wrapped(
model_name=UNET_FP16
)
else:
return get_shark_model(
vae_gcloud_bucket,
vae_name,
vae_args,
), get_shark_model(
unet_gcloud_bucket,
unet_name,
unet_args,
)
elif args.precision == "fp32":
IREE_EXTRA_ARGS += [
"--iree-flow-enable-conv-nchw-to-nhwc-transform",
"--iree-flow-enable-padding-linalg-ops",
"--iree-flow-linalg-ops-padding-size=16",
]
if args.import_mlir == True:
return get_vae32(model_name=VAE_FP32), get_unet32_wrapped(
model_name=UNET_FP32
)
else:
return get_shark_model(
GCLOUD_BUCKET,
VAE_FP32,
IREE_EXTRA_ARGS,
), get_shark_model(
GCLOUD_BUCKET,
UNET_FP32,
IREE_EXTRA_ARGS,
)
if __name__ == "__main__":
dtype = torch.float32 if args.precision == "fp32" else torch.half
if len(args.iree_vulkan_target_triple) > 0:
IREE_EXTRA_ARGS.append(
f"-iree-vulkan-target-triple={args.iree_vulkan_target_triple}"
)
clip_model = "clip_text"
clip_extra_args = [
"--iree-flow-linalg-ops-padding-size=16",
"--iree-flow-enable-padding-linalg-ops",
]
clip = get_shark_model(GCLOUD_BUCKET, clip_model, clip_extra_args)
prompt = args.prompts
height = 512 # default height of Stable Diffusion
width = 512 # default width of Stable Diffusion
num_inference_steps = args.steps # Number of denoising steps
guidance_scale = args.guidance_scale # Scale for classifier-free guidance
generator = torch.manual_seed(
args.seed
) # Seed generator to create the initial latent noise
batch_size = len(prompt)
vae, unet = get_models()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
scheduler = LMSDiscreteScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
num_train_timesteps=1000,
)
start = time.time()
text_input = tokenizer(
prompt,
padding="max_length",
max_length=args.max_length,
truncation=True,
return_tensors="pt",
)
text_embeddings = clip.forward((text_input.input_ids,))
text_embeddings = torch.from_numpy(text_embeddings).to(dtype)
max_length = text_input.input_ids.shape[-1]
uncond_input = tokenizer(
[""] * batch_size,
padding="max_length",
max_length=max_length,
return_tensors="pt",
)
uncond_embeddings = clip.forward((uncond_input.input_ids,))
uncond_embeddings = torch.from_numpy(uncond_embeddings).to(dtype)
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
latents = torch.randn(
(batch_size, 4, height // 8, width // 8),
generator=generator,
dtype=torch.float32,
).to(dtype)
scheduler.set_timesteps(num_inference_steps)
scheduler.is_scale_input_called = True
latents = latents * scheduler.sigmas[0]
text_embeddings_numpy = text_embeddings.detach().numpy()
avg_ms = 0
for i, t in tqdm(enumerate(scheduler.timesteps)):
step_start = time.time()
print(f"i = {i} t = {t}", end="")
timestep = torch.tensor([t]).to(dtype).detach().numpy()
latents_numpy = latents.detach().numpy()
sigma_numpy = np.array(scheduler.sigmas[i]).astype(np.float32)
profile_device = start_profiling(file_path="unet.rdc")
noise_pred = unet.forward(
(latents_numpy, timestep, text_embeddings_numpy, sigma_numpy)
)
end_profiling(profile_device)
noise_pred = torch.from_numpy(noise_pred)
step_time = time.time() - step_start
avg_ms += step_time
step_ms = int((step_time) * 1000)
print(f" ({step_ms}ms)")
latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
avg_ms = 1000 * avg_ms / args.steps
print(f"Average step time: {avg_ms}ms/it")
# scale and decode the image latents with vae
latents = 1 / 0.18215 * latents
latents_numpy = latents.detach().numpy()
profile_device = start_profiling(file_path="vae.rdc")
image = vae.forward((latents_numpy,))
end_profiling(profile_device)
image = torch.from_numpy(image)
image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
images = (image * 255).round().astype("uint8")
print("Total image generation runtime (s): {}".format(time.time() - start))
pil_images = [Image.fromarray(image) for image in images]
for i in range(batch_size):
pil_images[i].save(f"{args.prompts[i]}_{i}.jpg")

View File

@@ -0,0 +1,223 @@
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from transformers import CLIPTextModel
from utils import compile_through_fx
from stable_args import args
import torch
YOUR_TOKEN = "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"
BATCH_SIZE = len(args.prompts)
def get_clipped_text(model_name="clip_text"):
class CLIPText(torch.nn.Module):
def __init__(self):
super().__init__()
self.text_encoder = CLIPTextModel.from_pretrained(
"openai/clip-vit-large-patch14"
)
def forward(self, input):
return self.text_encoder(input)[0]
clip_model = CLIPText()
clip_input = torch.randint(1, 2, (BATCH_SIZE, 77))
shark_clip = compile_through_fx(
clip_model,
(clip_input,),
model_name=model_name,
)
return shark_clip
def get_vae32(model_name="vae_fp32"):
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = AutoencoderKL.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="vae",
use_auth_token=YOUR_TOKEN,
)
def forward(self, input):
x = self.vae.decode(input, return_dict=False)[0]
return (x / 2 + 0.5).clamp(0, 1)
vae = VaeModel()
vae_input = torch.rand(BATCH_SIZE, 4, 64, 64)
shark_vae = compile_through_fx(
vae,
(vae_input,),
model_name=model_name,
)
return shark_vae
def get_vae16(model_name="vae_fp16"):
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = AutoencoderKL.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="vae",
use_auth_token=YOUR_TOKEN,
revision="fp16",
)
def forward(self, input):
x = self.vae.decode(input, return_dict=False)[0]
return (x / 2 + 0.5).clamp(0, 1)
vae = VaeModel()
vae = vae.half().cuda()
vae_input = torch.rand(BATCH_SIZE, 4, 64, 64, dtype=torch.half).cuda()
shark_vae = compile_through_fx(
vae,
(vae_input,),
model_name=model_name,
)
return shark_vae
def get_unet32(model_name="unet_fp32"):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
self.in_channels = self.unet.in_channels
self.train(False)
def forward(self, x, y, z):
return self.unet.forward(x, y, z, return_dict=False)[0]
unet = UnetModel()
latent_model_input = torch.rand([2, 4, 64, 64])
text_embeddings = torch.rand([2, args.max_length, 768])
shark_unet = compile_through_fx(
unet,
(latent_model_input, torch.tensor([1.0]), text_embeddings),
model_name=model_name,
)
return shark_unet
def get_unet16(model_name="unet_fp16"):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
revision="fp16",
)
self.in_channels = self.unet.in_channels
self.train(False)
def forward(self, x, y, z):
return self.unet.forward(x, y, z, return_dict=False)[0]
unet = UnetModel()
unet = unet.half().cuda()
latent_model_input = torch.rand([2, 4, 64, 64]).half().cuda()
text_embeddings = torch.rand([2, args.max_length, 768]).half().cuda()
shark_unet = compile_through_fx(
unet,
(
latent_model_input,
torch.tensor([1.0]).half().cuda(),
text_embeddings,
),
model_name=model_name,
)
return shark_unet
def get_unet16_wrapped(guidance_scale=7.5, model_name="unet_fp16_wrapped"):
class UnetModel(torch.nn.Module):
def __init__(self, guidance_scale=guidance_scale):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
revision="fp16",
)
self.in_channels = self.unet.in_channels
self.guidance_scale = guidance_scale
self.train(False)
def forward(self, latent, timestep, text_embedding, sigma):
# expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
latents = torch.cat([latent] * 2)
latents = latents / (torch.pow((torch.pow(sigma, 2) + 1), 0.5))
unet_out = self.unet.forward(
latents, timestep, text_embedding, return_dict=False
)[0]
noise_pred_uncond, noise_pred_text = unet_out.chunk(2)
noise_pred = noise_pred_uncond + self.guidance_scale * (
noise_pred_text - noise_pred_uncond
)
return noise_pred
unet = UnetModel()
unet = unet.half().cuda()
latent_model_input = torch.rand([BATCH_SIZE, 4, 64, 64]).half().cuda()
text_embeddings = (
torch.rand([2 * BATCH_SIZE, args.max_length, 768]).half().cuda()
)
sigma = torch.tensor(1).to(torch.float32)
shark_unet = compile_through_fx(
unet,
(
latent_model_input,
torch.tensor([1.0]).half().cuda(),
text_embeddings,
sigma,
),
model_name=model_name,
)
return shark_unet
def get_unet32_wrapped(guidance_scale=7.5, model_name="unet_fp32_wrapped"):
class UnetModel(torch.nn.Module):
def __init__(self, guidance_scale=guidance_scale):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
self.in_channels = self.unet.in_channels
self.guidance_scale = guidance_scale
self.train(False)
def forward(self, latent, timestep, text_embedding, sigma):
latents = torch.cat([latent] * 2)
latents = latents / (torch.pow((torch.pow(sigma, 2) + 1), 0.5))
unet_out = self.unet.forward(
latents, timestep, text_embedding, return_dict=False
)[0]
noise_pred_uncond, noise_pred_text = unet_out.chunk(2)
noise_pred = noise_pred_uncond + self.guidance_scale * (
noise_pred_text - noise_pred_uncond
)
return noise_pred
unet = UnetModel()
latent_model_input = torch.rand([BATCH_SIZE, 4, 64, 64])
text_embeddings = torch.rand([2 * BATCH_SIZE, args.max_length, 768])
sigma = torch.tensor(1).to(torch.float32)
shark_unet = compile_through_fx(
unet,
(latent_model_input, torch.tensor([1.0]), text_embeddings, sigma),
model_name=model_name,
)
return shark_unet

View File

@@ -0,0 +1,88 @@
import argparse
p = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
p.add_argument(
"--prompts",
nargs="+",
default=["a photograph of an astronaut riding a horse"],
help="text of which images to be generated.",
)
p.add_argument(
"--device", type=str, default="cpu", help="device to run the model."
)
p.add_argument(
"--steps",
type=int,
default=10,
help="the no. of steps to do the sampling.",
)
p.add_argument(
"--seed",
type=int,
default=42,
help="the seed to use.",
)
p.add_argument(
"--guidance_scale",
type=float,
default=7.5,
help="the value to be used for guidance scaling.",
)
p.add_argument(
"--import_mlir",
default=False,
action=argparse.BooleanOptionalAction,
help="imports the model from torch module to shark_module otherwise downloads the model from shark_tank.",
)
p.add_argument(
"--precision", type=str, default="fp32", help="precision to run the model."
)
p.add_argument(
"--max_length",
type=int,
default=77,
help="max length of the tokenizer output.",
)
p.add_argument(
"--load_vmfb",
default=True,
action=argparse.BooleanOptionalAction,
help="attempts to load the model from a precompiled flatbuffer and compiles + saves it if not found.",
)
p.add_argument(
"--save_vmfb",
default=False,
action=argparse.BooleanOptionalAction,
help="saves the compiled flatbuffer to the local directory",
)
p.add_argument(
"--iree-vulkan-target-triple",
type=str,
default="",
help="Specify target triple for vulkan",
)
p.add_argument(
"--vulkan_debug_utils",
default=False,
action=argparse.BooleanOptionalAction,
help="Profiles vulkan device and collects the .rdc info",
)
p.add_argument(
"--use_tuned",
default=True,
action=argparse.BooleanOptionalAction,
help="Download and use the tuned version of the model if available",
)
args = p.parse_args()

View File

@@ -0,0 +1,103 @@
import os
import torch
from shark.shark_inference import SharkInference
from shark.shark_importer import SharkImporter
from torch.fx.experimental.proxy_tensor import make_fx
from stable_args import args
from torch._decomp import get_decompositions
import torch_mlir
def _compile_module(shark_module, model_name, extra_args=[]):
if args.load_vmfb or args.save_vmfb:
extended_name = "{}_{}".format(model_name, args.device)
vmfb_path = os.path.join(os.getcwd(), extended_name + ".vmfb")
if args.load_vmfb and os.path.isfile(vmfb_path) and not args.save_vmfb:
print("Loading flatbuffer from {}".format(vmfb_path))
shark_module.load_module(vmfb_path)
else:
if args.save_vmfb:
print("Saving to {}".format(vmfb_path))
else:
print(
"No vmfb found. Compiling and saving to {}".format(
vmfb_path
)
)
path = shark_module.save_module(
os.getcwd(), extended_name, extra_args
)
shark_module.load_module(path)
else:
shark_module.compile(extra_args)
return shark_module
# Downloads the model from shark_tank and returns the shark_module.
def get_shark_model(tank_url, model_name, extra_args=[]):
from shark.shark_downloader import download_torch_model
mlir_model, func_name, inputs, golden_out = download_torch_model(
model_name, tank_url=tank_url
)
shark_module = SharkInference(
mlir_model, func_name, device=args.device, mlir_dialect="linalg"
)
return _compile_module(shark_module, model_name, extra_args)
# Converts the torch-module into shark_module.
def compile_through_fx(model, inputs, model_name, extra_args=[]):
fx_g = make_fx(
model,
decomposition_table=get_decompositions(
[
torch.ops.aten.embedding_dense_backward,
torch.ops.aten.native_layer_norm_backward,
torch.ops.aten.slice_backward,
torch.ops.aten.select_backward,
torch.ops.aten.norm.ScalarOpt_dim,
torch.ops.aten.native_group_norm,
torch.ops.aten.upsample_bilinear2d.vec,
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
]
),
)(*inputs)
fx_g.graph.set_codegen(torch.fx.graph.CodeGen())
fx_g.recompile()
def strip_overloads(gm):
"""
Modifies the target of graph nodes in :attr:`gm` to strip overloads.
Args:
gm(fx.GraphModule): The input Fx graph module to be modified
"""
for node in gm.graph.nodes:
if isinstance(node.target, torch._ops.OpOverload):
node.target = node.target.overloadpacket
gm.recompile()
strip_overloads(fx_g)
ts_g = torch.jit.trace(fx_g, inputs)
mlir_importer = SharkImporter(
ts_g,
inputs,
frontend="torch",
)
(mlir_module, func_name), _, _ = mlir_importer.import_debug()
shark_module = SharkInference(
mlir_module,
func_name,
device=args.device,
mlir_dialect="linalg",
)
return _compile_module(shark_module, model_name, extra_args)

View File

@@ -0,0 +1,41 @@
# Stable Diffusion Img2Img model
## Installation
<details>
<summary>Installation (Linux)</summary>
### Activate shark.venv Virtual Environment
```shell
source shark.venv/bin/activate
# Some older pip installs may not be able to handle the recent PyTorch deps
python -m pip install --upgrade pip
```
### Install dependencies
Run the setup.sh script:
```shell
./setup.sh
```
### Run the Stable diffusion Img2Img model
To run the model with the default set of images and params, run:
```shell
python stable_diffusion_img2img.py
```
To run the model with your own set of images and parameters, you need to specify the following params:
1.) Input images directory, with the arg `--input_dir`, containing 3-5 images.
2.) What to teach the model, using the arg `--what_to_teach`; allowed values are `object` or `style`.
3.) Placeholder token, using the arg `--placeholder_token`, that represents your new concept. It should be passed with the opening and closing angle brackets; for example, the token `cat-toy` should be passed as `<cat-toy>`.
4.) Initializer token, using the arg `--initializer_token`, a word that summarizes your new concept.
For the result, pass the text prompt with the arg `--prompt`. The prompt string should contain a `"*s"`, which will be replaced by the placeholder token during inference.
By default the result images go into the `sd_result` dir; to specify your own output dir, use the arg `--output_dir`.
The default value of max_training_steps is `3000`, which can take several hours to complete; a smaller value can be passed with the arg `--training_steps`. Specify the number of images to be sampled for the result with the `--num_inference_samples` arg. An example invocation is shown below.
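A minimal sketch of such a run, assuming a directory `input_images/` with 3-5 photos of a toy (all values are illustrative):
```shell
python stable_diffusion_img2img.py \
  --input_dir="input_images/" \
  --what_to_teach="object" \
  --placeholder_token="<cat-toy>" \
  --initializer_token="toy" \
  --training_steps=1000 \
  --prompt="a photo of a *s on the beach" \
  --output_dir="sd_result"
```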

View File

@@ -0,0 +1,25 @@
#!/bin/bash
TD="$(cd $(dirname $0) && pwd)"
if [ -z "$PYTHON" ]; then
PYTHON="$(which python3)"
fi
function die() {
echo "Error executing command: $*"
exit 1
}
PYTHON_VERSION_X_Y=`${PYTHON} -c 'import sys; version=sys.version_info[:2]; print("{0}.{1}".format(*version))'`
echo "Python: $PYTHON"
echo "Python version: $PYTHON_VERSION_X_Y"
mkdir input_images
wget https://huggingface.co/datasets/valhalla/images/resolve/main/2.jpeg -P input_images/
wget https://huggingface.co/datasets/valhalla/images/resolve/main/3.jpeg -P input_images/
wget https://huggingface.co/datasets/valhalla/images/resolve/main/5.jpeg -P input_images/
wget https://huggingface.co/datasets/valhalla/images/resolve/main/6.jpeg -P input_images/
pip install diffusers["training"]==0.4.1 transformers ftfy opencv-python

View File

@@ -0,0 +1,597 @@
# Textual-inversion fine-tuning for Stable Diffusion using diffusers
# This script shows how to "teach" Stable Diffusion a new concept via
# textual-inversion using 🤗 Hugging Face [🧨 Diffusers library](https://github.com/huggingface/diffusers).
# By using just 3-5 images you can teach new concepts to Stable Diffusion
# and personalize the model on your own images.
import argparse
import itertools
import math
import os
import random
import cv2
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch.utils.data import Dataset
import PIL
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import (
AutoencoderKL,
DDPMScheduler,
PNDMScheduler,
StableDiffusionPipeline,
UNet2DConditionModel,
)
from diffusers.hub_utils import init_git_repo, push_to_hub
from diffusers.optimization import get_scheduler
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
YOUR_TOKEN = "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"
p = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
p.add_argument(
"--input_dir",
type=str,
default="input_images/",
help="the directory contains the images used for fine tuning",
)
p.add_argument(
"--output_dir",
type=str,
default="sd_result",
help="the directory contains the images used for fine tuning",
)
p.add_argument(
"--training_steps",
type=int,
default=3000,
help="the maximum number of training steps",
)
p.add_argument("--seed", type=int, default=42, help="the random seed")
p.add_argument(
"--what_to_teach",
type=str,
choices=["object", "style"],
default="object",
help="what is it that you are teaching?",
)
p.add_argument(
"--placeholder_token",
type=str,
default="<cat-toy>",
help="It is the token you are going to use to represent your new concept",
)
p.add_argument(
"--initializer_token",
type=str,
default="toy",
help="It is a word that can summarise what is your new concept",
)
p.add_argument(
"--inference_steps",
type=int,
default=50,
help="the number of steps for inference",
)
p.add_argument(
"--num_inference_samples",
type=int,
default=4,
help="the number of samples for inference",
)
p.add_argument(
"--prompt",
type=str,
default="a grafitti in a wall with a *s on it",
help="the text prompt to use",
)
args = p.parse_args()
if "*s" not in args.prompt:
raise ValueError(
'The prompt should contain a "*s", which will be replaced by the placeholder token.'
)
prompt1, prompt2 = args.prompt.split("*s")
args.prompt = prompt1 + args.placeholder_token + prompt2
pretrained_model_name_or_path = "CompVis/stable-diffusion-v1-4"
# Load input images.
images = []
for filename in os.listdir(args.input_dir):
img = cv2.imread(os.path.join(args.input_dir, filename))
if img is not None:
images.append(img)
# Setup the prompt templates for training
imagenet_templates_small = [
"a photo of a {}",
"a rendering of a {}",
"a cropped photo of the {}",
"the photo of a {}",
"a photo of a clean {}",
"a photo of a dirty {}",
"a dark photo of the {}",
"a photo of my {}",
"a photo of the cool {}",
"a close-up photo of a {}",
"a bright photo of the {}",
"a cropped photo of a {}",
"a photo of the {}",
"a good photo of the {}",
"a photo of one {}",
"a close-up photo of the {}",
"a rendition of the {}",
"a photo of the clean {}",
"a rendition of a {}",
"a photo of a nice {}",
"a good photo of a {}",
"a photo of the nice {}",
"a photo of the small {}",
"a photo of the weird {}",
"a photo of the large {}",
"a photo of a cool {}",
"a photo of a small {}",
]
imagenet_style_templates_small = [
"a painting in the style of {}",
"a rendering in the style of {}",
"a cropped painting in the style of {}",
"the painting in the style of {}",
"a clean painting in the style of {}",
"a dirty painting in the style of {}",
"a dark painting in the style of {}",
"a picture in the style of {}",
"a cool painting in the style of {}",
"a close-up painting in the style of {}",
"a bright painting in the style of {}",
"a cropped painting in the style of {}",
"a good painting in the style of {}",
"a close-up painting in the style of {}",
"a rendition in the style of {}",
"a nice painting in the style of {}",
"a small painting in the style of {}",
"a weird painting in the style of {}",
"a large painting in the style of {}",
]
# Setup the dataset
class TextualInversionDataset(Dataset):
def __init__(
self,
data_root,
tokenizer,
learnable_property="object", # [object, style]
size=512,
repeats=100,
interpolation="bicubic",
flip_p=0.5,
set="train",
placeholder_token="*",
center_crop=False,
):
self.data_root = data_root
self.tokenizer = tokenizer
self.learnable_property = learnable_property
self.size = size
self.placeholder_token = placeholder_token
self.center_crop = center_crop
self.flip_p = flip_p
self.image_paths = [
os.path.join(self.data_root, file_path)
for file_path in os.listdir(self.data_root)
]
self.num_images = len(self.image_paths)
self._length = self.num_images
if set == "train":
self._length = self.num_images * repeats
self.interpolation = {
"linear": PIL.Image.LINEAR,
"bilinear": PIL.Image.BILINEAR,
"bicubic": PIL.Image.BICUBIC,
"lanczos": PIL.Image.LANCZOS,
}[interpolation]
self.templates = (
imagenet_style_templates_small
if learnable_property == "style"
else imagenet_templates_small
)
self.flip_transform = transforms.RandomHorizontalFlip(p=self.flip_p)
def __len__(self):
return self._length
def __getitem__(self, i):
example = {}
image = Image.open(self.image_paths[i % self.num_images])
if not image.mode == "RGB":
image = image.convert("RGB")
placeholder_string = self.placeholder_token
text = random.choice(self.templates).format(placeholder_string)
example["input_ids"] = self.tokenizer(
text,
padding="max_length",
truncation=True,
max_length=self.tokenizer.model_max_length,
return_tensors="pt",
).input_ids[0]
# default to score-sde preprocessing
img = np.array(image).astype(np.uint8)
if self.center_crop:
crop = min(img.shape[0], img.shape[1])
h, w, = (
img.shape[0],
img.shape[1],
)
img = img[
(h - crop) // 2 : (h + crop) // 2,
(w - crop) // 2 : (w + crop) // 2,
]
image = Image.fromarray(img)
image = image.resize(
(self.size, self.size), resample=self.interpolation
)
image = self.flip_transform(image)
image = np.array(image).astype(np.uint8)
image = (image / 127.5 - 1.0).astype(np.float32)
example["pixel_values"] = torch.from_numpy(image).permute(2, 0, 1)
return example
# Setting up the model
# Load the tokenizer and add the placeholder token as an additional special token.
# Please read and if you agree accept the LICENSE
# [here](https://huggingface.co/CompVis/stable-diffusion-v1-4) if you see an error
tokenizer = CLIPTokenizer.from_pretrained(
pretrained_model_name_or_path,
subfolder="tokenizer",
use_auth_token=YOUR_TOKEN,
)
# Add the placeholder token in tokenizer
num_added_tokens = tokenizer.add_tokens(args.placeholder_token)
if num_added_tokens == 0:
raise ValueError(
f"The tokenizer already contains the token {args.placeholder_token}. Please pass a different"
" `placeholder_token` that is not already in the tokenizer."
)
# Get token ids for our placeholder and initializer token.
# This code block will complain if initializer string is not a single token
# Convert the initializer_token, placeholder_token to ids
token_ids = tokenizer.encode(args.initializer_token, add_special_tokens=False)
# Check if initializer_token is a single token or a sequence of tokens
if len(token_ids) > 1:
raise ValueError("The initializer token must be a single token.")
initializer_token_id = token_ids[0]
placeholder_token_id = tokenizer.convert_tokens_to_ids(args.placeholder_token)
# Load the Stable Diffusion model
# Load models and create wrapper for stable diffusion
text_encoder = CLIPTextModel.from_pretrained(
pretrained_model_name_or_path,
subfolder="text_encoder",
use_auth_token=YOUR_TOKEN,
)
vae = AutoencoderKL.from_pretrained(
pretrained_model_name_or_path,
subfolder="vae",
use_auth_token=YOUR_TOKEN,
)
unet = UNet2DConditionModel.from_pretrained(
pretrained_model_name_or_path,
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
# We have added the `placeholder_token` to the `tokenizer`, so we resize the token embeddings here;
# this will add a new embedding vector in the token embeddings for our `placeholder_token`
text_encoder.resize_token_embeddings(len(tokenizer))
# Initialise the newly added placeholder token with the embeddings of the initializer token
token_embeds = text_encoder.get_input_embeddings().weight.data
token_embeds[placeholder_token_id] = token_embeds[initializer_token_id]
# In Textual-Inversion we only train the newly added embedding vector,
# so let's freeze the rest of the model parameters here.
def freeze_params(params):
for param in params:
param.requires_grad = False
# Freeze vae and unet
freeze_params(vae.parameters())
freeze_params(unet.parameters())
# Freeze all parameters except for the token embeddings in text encoder
params_to_freeze = itertools.chain(
text_encoder.text_model.encoder.parameters(),
text_encoder.text_model.final_layer_norm.parameters(),
text_encoder.text_model.embeddings.position_embedding.parameters(),
)
freeze_params(params_to_freeze)
# Creating our training data
train_dataset = TextualInversionDataset(
data_root=args.input_dir,
tokenizer=tokenizer,
size=512,
placeholder_token=args.placeholder_token,
repeats=100,
learnable_property=args.what_to_teach, # Option selected above between object and style
center_crop=False,
set="train",
)
def create_dataloader(train_batch_size=1):
return torch.utils.data.DataLoader(
train_dataset, batch_size=train_batch_size, shuffle=True
)
# Create noise_scheduler for training.
noise_scheduler = DDPMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
num_train_timesteps=1000,
tensor_format="pt",
)
# Define hyperparameters for our training
hyperparameters = {
"learning_rate": 5e-04,
"scale_lr": True,
"max_train_steps": args.training_steps,
"train_batch_size": 1,
"gradient_accumulation_steps": 4,
"seed": args.seed,
"output_dir": "sd-concept-output",
}
def training_function(text_encoder, vae, unet):
logger = get_logger(__name__)
train_batch_size = hyperparameters["train_batch_size"]
gradient_accumulation_steps = hyperparameters[
"gradient_accumulation_steps"
]
learning_rate = hyperparameters["learning_rate"]
max_train_steps = hyperparameters["max_train_steps"]
output_dir = hyperparameters["output_dir"]
accelerator = Accelerator(
gradient_accumulation_steps=gradient_accumulation_steps,
)
train_dataloader = create_dataloader(train_batch_size)
if hyperparameters["scale_lr"]:
learning_rate = (
learning_rate
* gradient_accumulation_steps
* train_batch_size
* accelerator.num_processes
)
# Initialize the optimizer
optimizer = torch.optim.AdamW(
text_encoder.get_input_embeddings().parameters(), # only optimize the embeddings
lr=learning_rate,
)
text_encoder, optimizer, train_dataloader = accelerator.prepare(
text_encoder, optimizer, train_dataloader
)
# Move vae and unet to device
vae.to(accelerator.device)
unet.to(accelerator.device)
# Keep vae and unet in eval mode as we don't train these
vae.eval()
unet.eval()
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(
len(train_dataloader) / gradient_accumulation_steps
)
num_train_epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
# Train!
total_batch_size = (
train_batch_size
* accelerator.num_processes
* gradient_accumulation_steps
)
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Instantaneous batch size per device = {train_batch_size}")
logger.info(
f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}"
)
logger.info(
f" Gradient Accumulation steps = {gradient_accumulation_steps}"
)
logger.info(f" Total optimization steps = {max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(
range(max_train_steps), disable=not accelerator.is_local_main_process
)
progress_bar.set_description("Steps")
global_step = 0
for epoch in range(num_train_epochs):
text_encoder.train()
for step, batch in enumerate(train_dataloader):
with accelerator.accumulate(text_encoder):
# Convert images to latent space
latents = (
vae.encode(batch["pixel_values"])
.latent_dist.sample()
.detach()
)
latents = latents * 0.18215
# Sample noise that we'll add to the latents
noise = torch.randn(latents.shape).to(latents.device)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(
0,
noise_scheduler.num_train_timesteps,
(bsz,),
device=latents.device,
).long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(
latents, noise, timesteps
)
# Get the text embedding for conditioning
encoder_hidden_states = text_encoder(batch["input_ids"])[0]
# Predict the noise residual
noise_pred = unet(
noisy_latents, timesteps, encoder_hidden_states
).sample
loss = (
F.mse_loss(noise_pred, noise, reduction="none")
.mean([1, 2, 3])
.mean()
)
accelerator.backward(loss)
# Zero out the gradients for all token embeddings except the newly added
# embeddings for the concept, as we only want to optimize the concept embeddings
if accelerator.num_processes > 1:
grads = (
text_encoder.module.get_input_embeddings().weight.grad
)
else:
grads = text_encoder.get_input_embeddings().weight.grad
# Get the index for tokens that we want to zero the grads for
index_grads_to_zero = (
torch.arange(len(tokenizer)) != placeholder_token_id
)
grads.data[index_grads_to_zero, :] = grads.data[
index_grads_to_zero, :
].fill_(0)
optimizer.step()
optimizer.zero_grad()
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
global_step += 1
logs = {"loss": loss.detach().item()}
progress_bar.set_postfix(**logs)
if global_step >= max_train_steps:
break
accelerator.wait_for_everyone()
# Create the pipeline using the trained modules and save it.
if accelerator.is_main_process:
pipeline = StableDiffusionPipeline(
text_encoder=accelerator.unwrap_model(text_encoder),
vae=vae,
unet=unet,
tokenizer=tokenizer,
scheduler=PNDMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
skip_prk_steps=True,
),
safety_checker=StableDiffusionSafetyChecker.from_pretrained(
"CompVis/stable-diffusion-safety-checker"
),
feature_extractor=CLIPFeatureExtractor.from_pretrained(
"openai/clip-vit-base-patch32"
),
)
pipeline.save_pretrained(output_dir)
# Also save the newly trained embeddings
learned_embeds = (
accelerator.unwrap_model(text_encoder)
.get_input_embeddings()
.weight[placeholder_token_id]
)
learned_embeds_dict = {
args.placeholder_token: learned_embeds.detach().cpu()
}
torch.save(
learned_embeds_dict, os.path.join(output_dir, "learned_embeds.bin")
)
import accelerate
accelerate.notebook_launcher(
training_function, args=(text_encoder, vae, unet), num_processes=1
)
# Set up the pipeline
pipe = StableDiffusionPipeline.from_pretrained(
hyperparameters["output_dir"],
# torch_dtype=torch.float16,
)
all_images = []
for _ in range(args.num_inference_samples):
images = pipe(
[args.prompt],
num_inference_steps=args.inference_steps,
guidance_scale=7.5,
).images
all_images.extend(images)
# output_path = os.path.abspath(os.path.join(os.getcwd(), args.output_dir))
if not os.path.isdir(args.output_dir):
os.mkdir(args.output_dir)
[
image.save(f"{args.output_dir}/{i}.jpeg")
for i, image in enumerate(all_images)
]

View File

@@ -78,6 +78,31 @@ def build_benchmark_args(
return benchmark_cl
def build_benchmark_args_non_tensor_input(
input_file: str,
device: str,
inputs: tuple,
mlir_dialect: str,
function_name: str,
):
"""
Inputs: input_file leading to the .vmfb, target device, non-tensor function
inputs, the MLIR dialect, and the entry function name.
Outputs: the command that executes iree-benchmark-module on the target module.
"""
path = benchmark_module.__path__[0]
benchmarker_path = os.path.join(path, "..", "..", "iree-benchmark-module")
benchmark_cl = [benchmarker_path, f"--module_file={input_file}"]
# TODO: The function name can be passed as one of the args.
benchmark_cl.append(f"--entry_function={function_name}")
benchmark_cl.append(f"--device={IREE_DEVICE_MAP[device]}")
for input in inputs:
benchmark_cl.append(f"--function_input={input}")
time_extractor = "| awk 'END{{print $2 $3}}'"
benchmark_cl.append(time_extractor)
return benchmark_cl
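# A minimal usage sketch (hypothetical file and function names), mirroring how
# compile_benchmark_dirs uses this helper:
#
#   cl = build_benchmark_args_non_tensor_input(
#       input_file="dispatch_0/dispatch_0_benchmark.vmfb",
#       device="vulkan",
#       inputs=(0,),
#       mlir_dialect="linalg",
#       function_name="forward",
#   )
#   iterations_per_second = run_benchmark_module(cl)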
def run_benchmark_module(benchmark_cl):
"""
Run benchmark command, extract result and return iteration/seconds.

View File

@@ -14,11 +14,13 @@
import iree.runtime as ireert
import iree.compiler as ireec
from shark.iree_utils._common import IREE_DEVICE_MAP, IREE_TARGET_MAP
from shark.iree_utils.benchmark_utils import *
import numpy as np
import os
import re
# Get the iree-compile arguments given device.
def get_iree_device_args(device):
def get_iree_device_args(device, extra_args=[]):
if device == "cpu":
from shark.iree_utils.cpu_utils import get_iree_cpu_args
@@ -30,7 +32,7 @@ def get_iree_device_args(device):
if device in ["metal", "vulkan"]:
from shark.iree_utils.vulkan_utils import get_iree_vulkan_args
return get_iree_vulkan_args()
return get_iree_vulkan_args(extra_args=extra_args)
if device == "rocm":
from shark.iree_utils.gpu_utils import get_iree_rocm_args
@@ -58,17 +60,138 @@ def get_iree_common_args():
return [
"--iree-stream-resource-index-bits=64",
"--iree-vm-target-index-bits=64",
"--iree-util-zero-fill-elided-attrs",
]
def create_dispatch_dirs(bench_dir, device):
bench_dir_path = bench_dir.split("/")
bench_dir_path[-1] = "temp_" + bench_dir_path[-1]
tmp_bench_dir = "/".join(bench_dir_path)
for f_ in os.listdir(bench_dir):
if os.path.isfile(f"{bench_dir}/{f_}"):
dir_name = re.sub("\.\S*$", "", f_)
if os.path.exists(f"{bench_dir}/{dir_name}"):
os.system(f"rm -rf {bench_dir}/{dir_name}")
os.system(f"mkdir {bench_dir}/{dir_name}")
os.system(f"mv {bench_dir}/{f_} {bench_dir}/{dir_name}/{f_}")
for f_ in os.listdir(tmp_bench_dir):
if os.path.isfile(f"{tmp_bench_dir}/{f_}"):
dir_name = ""
for d_ in os.listdir(bench_dir):
if re.search(f"{d_}(?=\D)", f_):
dir_name = d_
if dir_name != "":
os.system(
f"mv {tmp_bench_dir}/{f_} {bench_dir}/{dir_name}/{dir_name}_benchmark.mlir"
)
def compile_benchmark_dirs(bench_dir, device, dispatch_benchmarks):
dispatch_list = []
all_dispatches = False
if dispatch_benchmarks.lower().strip() == "all":
all_dispatches = True
else:
try:
dispatch_list = [
int(dispatch_index)
for dispatch_index in dispatch_benchmarks.split(" ")
]
except:
print("ERROR: Invalid dispatch benchmarks")
return None
for d_ in os.listdir(bench_dir):
in_dispatches = False
for dispatch in dispatch_list:
if str(dispatch) in d_:
in_dispatches = True
if all_dispatches or in_dispatches:
for f_ in os.listdir(f"{bench_dir}/{d_}"):
if "benchmark.mlir" in f_:
dispatch_file = open(f"{bench_dir}/{d_}/{f_}", "r")
module = dispatch_file.read()
dispatch_file.close()
flatbuffer_blob = ireec.compile_str(
module, target_backends=[IREE_TARGET_MAP[device]]
)
vmfb_file = open(
f"{bench_dir}/{d_}/{d_}_benchmark.vmfb", "wb"
)
vmfb_file.write(flatbuffer_blob)
vmfb_file.close()
config = ireert.Config(IREE_DEVICE_MAP[device])
vm_module = ireert.VmModule.from_flatbuffer(
config.vm_instance, flatbuffer_blob
)
benchmark_cl = build_benchmark_args_non_tensor_input(
input_file=f"{bench_dir}/{d_}/{d_}_benchmark.vmfb",
device=device,
inputs=(0,),
mlir_dialect="linalg",
function_name=vm_module.function_names[0],
)
benchmark_bash = open(
f"{bench_dir}/{d_}/{d_}_benchmark.sh", "w+"
)
benchmark_bash.write("#!/bin/bash\n")
benchmark_bash.write(" ".join(benchmark_cl))
benchmark_bash.close()
benchmark_data = run_benchmark_module(benchmark_cl)
benchmark_file = open(
f"{bench_dir}/{d_}/{d_}_data.txt", "w+"
)
benchmark_file.write(f"DISPATCH: {d_}\n")
benchmark_file.write(str(benchmark_data) + "\n")
benchmark_file.write(
"SHARK BENCHMARK RESULT: "
+ str(1 / (benchmark_data * 0.001))
+ "\n"
)
benchmark_file.close()
elif ".mlir" in f_ and "benchmark" not in f_:
dispatch_file = open(f"{bench_dir}/{d_}/{f_}", "r")
module = dispatch_file.read()
dispatch_file.close()
module = re.sub(
"hal.executable private",
"hal.executable public",
module,
)
flatbuffer_blob = ireec.compile_str(
module,
target_backends=[IREE_TARGET_MAP[device]],
extra_args=["--compile-mode=hal-executable"],
)
spirv_file = open(
f"{bench_dir}/{d_}/{d_}_spirv.vmfb", "wb"
)
spirv_file.write(flatbuffer_blob)
spirv_file.close()
def compile_module_to_flatbuffer(
module, device, frontend, func_name, model_config_path
module, device, frontend, func_name, model_config_path, extra_args
):
# Setup Compile arguments wrt to frontends.
input_type = ""
args = get_iree_frontend_args(frontend)
args += get_iree_device_args(device)
args += get_iree_device_args(device, extra_args)
args += get_iree_common_args()
args += extra_args
if frontend in ["tensorflow", "tf"]:
input_type = "mhlo"
@@ -77,12 +200,10 @@ def compile_module_to_flatbuffer(
elif frontend in ["tflite", "tflite-tosa"]:
input_type = "tosa"
elif frontend in ["tm_tensor"]:
input_type = frontend
input_type = ireec.InputType.TM_TENSOR
# TODO: make it simpler.
# Compile according to the input type, else just try compiling.
if input_type not in ["mhlo", "tosa"]:
module = str(module)
if input_type != "":
# Currently for MHLO/TOSA.
flatbuffer_blob = ireec.compile_str(
@@ -94,7 +215,7 @@ def compile_module_to_flatbuffer(
else:
# Currently for Torch.
flatbuffer_blob = ireec.compile_str(
str(module),
module,
target_backends=[IREE_TARGET_MAP[device]],
extra_args=args,
)
@@ -120,10 +241,11 @@ def get_iree_compiled_module(
frontend: str = "torch",
func_name: str = "forward",
model_config_path: str = None,
extra_args: list = [],
):
"""Given a module returns the compiled .vmfb and configs"""
flatbuffer_blob = compile_module_to_flatbuffer(
module, device, frontend, func_name, model_config_path
module, device, frontend, func_name, model_config_path, extra_args
)
return get_iree_module(flatbuffer_blob, device, func_name)
@@ -145,12 +267,15 @@ def export_iree_module_to_vmfb(
mlir_dialect: str = "linalg",
func_name: str = "forward",
model_config_path: str = None,
module_name: str = None,
extra_args: list = [],
):
# Compiles the module given specs and saves it as .vmfb file.
flatbuffer_blob = compile_module_to_flatbuffer(
module, device, mlir_dialect, func_name, model_config_path
module, device, mlir_dialect, func_name, model_config_path, extra_args
)
module_name = f"{mlir_dialect}_{func_name}_{device}"
if module_name is None:
module_name = f"{mlir_dialect}_{func_name}_{device}"
filename = os.path.join(directory, module_name + ".vmfb")
print(f"Saved vmfb in {filename}.")
with open(filename, "wb") as f:

View File

@@ -14,12 +14,28 @@
# All the iree_vulkan related functionalities go here.
from os import linesep
from shark.iree_utils._common import run_cmd
def get_vulkan_triple_flag():
vulkan_device_cmd = "vulkaninfo | grep deviceName"
vulkan_device = run_cmd(vulkan_device_cmd).strip()
def get_vulkan_device_name():
vulkaninfo_dump = run_cmd("vulkaninfo").split(linesep)
vulkaninfo_list = [s.strip() for s in vulkaninfo_dump if "deviceName" in s]
if len(vulkaninfo_list) == 0:
raise ValueError("No device name found in VulkanInfo!")
if len(vulkaninfo_list) > 1:
print(
f"Found {len(vulkaninfo_list)} device names. choosing first one: {vulkaninfo_list[0]}"
)
return vulkaninfo_list[0]
def get_vulkan_triple_flag(extra_args=[]):
if "-iree-vulkan-target-triple=" in " ".join(extra_args):
print(f"Using target triple from command line args")
return None
vulkan_device = get_vulkan_device_name()
if all(x in vulkan_device for x in ("Apple", "M1")):
print(f"Found {vulkan_device} Device. Using m1-moltenvk-macos")
return "-iree-vulkan-target-triple=m1-moltenvk-macos"
@@ -32,15 +48,8 @@ def get_vulkan_triple_flag():
elif all(x in vulkan_device for x in ("RTX", "3090")):
print(f"Found {vulkan_device} Device. Using ampere-rtx3090-linux")
return "-iree-vulkan-target-triple=ampere-rtx3090-linux"
elif all(x in vulkan_device for x in ("Radeon", "RX 5")):
print(
"Found AMD Radeon RX 5000 series device. Using rdna1-5700xt-linux"
)
return "-iree-vulkan-target-triple=rdna1-5700xt-linux"
elif all(x in vulkan_device for x in ("Radeon", "RX 6")):
print(
"Found AMD Radeon RX 6000 series device. Using rdna2-unknown-linux"
)
elif "AMD" in vulkan_device:
print("Found AMD device. Using rdna2-unknown-linux")
return "-iree-vulkan-target-triple=rdna2-unknown-linux"
else:
print(
@@ -52,10 +61,10 @@ def get_vulkan_triple_flag():
return None
def get_iree_vulkan_args():
def get_iree_vulkan_args(extra_args=[]):
# vulkan_flag = ["--iree-flow-demote-i64-to-i32"]
vulkan_flag = []
vulkan_triple_flag = get_vulkan_triple_flag()
vulkan_triple_flag = get_vulkan_triple_flag(extra_args)
if vulkan_triple_flag is not None:
vulkan_flag.append(vulkan_triple_flag)
return vulkan_flag
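Below is a minimal sketch of how the new `extra_args` plumbing might be exercised; the explicit triple is only an illustrative value, and whether auto-detection succeeds depends on the local `vulkaninfo` output.
```python
from shark.iree_utils.vulkan_utils import get_iree_vulkan_args

# Auto-detect: queries vulkaninfo and appends a matching
# -iree-vulkan-target-triple=... flag when the device is recognized.
flags = get_iree_vulkan_args()

# User override (illustrative value): when the triple is already present in
# extra_args, get_vulkan_triple_flag() returns None, this call returns an
# empty list, and the user's flag reaches the compiler through the usual
# `args += extra_args` path in compile_module_to_flatbuffer.
flags = get_iree_vulkan_args(
    extra_args=["-iree-vulkan-target-triple=rdna2-unknown-linux"]
)
```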

View File

@@ -93,4 +93,16 @@ parser.add_argument(
help="Specify where to save downloaded shark_tank artifacts. If this is not set, the default is ~/.local/shark_tank/.",
)
parser.add_argument(
"--dispatch_benchmarks",
default=None,
help='Dispatches to return benchmark data on. Use "All" for all, and None for none.',
)
parser.add_argument(
"--dispatch_benchmarks_dir",
default="temp_dispatch_benchmarks",
help='Directory in which to store dispatch data generated with "--dispatch_benchmarks".',
)
shark_args, unknown = parser.parse_known_args()
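A short sketch of how these flags might be consumed, mirroring the parsing done in `compile_benchmark_dirs` earlier in this diff; the command line in the comment is hypothetical.
```python
# Hypothetical invocation:
#   python app.py --dispatch_benchmarks "0 3 7" --dispatch_benchmarks_dir bench_out
from shark.parser import shark_args

selection = shark_args.dispatch_benchmarks       # None, "All", or e.g. "0 3 7"
target_dir = shark_args.dispatch_benchmarks_dir  # e.g. "bench_out"

if selection is not None:
    if selection.lower().strip() == "all":
        print(f"Benchmarking every dispatch into {target_dir}")
    else:
        # Space-separated dispatch indices, as expected by compile_benchmark_dirs.
        indices = [int(i) for i in selection.split(" ")]
        print(f"Benchmarking dispatches {indices} into {target_dir}")
```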

View File

@@ -43,25 +43,34 @@ class SharkBenchmarkRunner(SharkRunner):
# SharkRunner derived class with Benchmarking capabilities.
def __init__(
self,
mlir_module: str,
mlir_module: bytes,
function_name: str = "forward",
device: str = "none",
mlir_dialect: str = "linalg",
extra_args: list = [],
):
self.device = shark_args.device if device == "none" else device
self.frontend_model = None
self.vmfb_file = None
self.mlir_dialect = mlir_dialect
self.extra_args = extra_args
SharkRunner.__init__(
self,
mlir_module,
function_name,
device,
self.mlir_dialect,
self.extra_args,
compile_vmfb=True,
)
if self.vmfb_file is None:
self.vmfb_file = export_iree_module_to_vmfb(
mlir_module, device, shark_args.repro_dir, self.mlir_dialect
mlir_module,
device,
shark_args.repro_dir,
self.mlir_dialect,
function_name,
extra_args=self.extra_args,
)
def setup_cl(self, input_tensors):

View File

@@ -137,7 +137,8 @@ def download_torch_model(
model_dir = os.path.join(WORKDIR, model_dir_name)
with open(
os.path.join(model_dir, model_name + dyn_str + "_torch.mlir")
os.path.join(model_dir, model_name + dyn_str + "_torch.mlir"),
mode="rb",
) as f:
mlir_file = f.read()
@@ -201,7 +202,8 @@ def download_tflite_model(
model_dir = os.path.join(WORKDIR, model_dir_name)
with open(
os.path.join(model_dir, model_name + dyn_str + "_tflite.mlir")
os.path.join(model_dir, model_name + dyn_str + "_tflite.mlir"),
mode="rb",
) as f:
mlir_file = f.read()
@@ -266,7 +268,7 @@ def download_tf_model(
if not os.path.isfile(filename):
filename = os.path.join(model_dir, model_name + "_tf.mlir")
with open(filename) as f:
with open(filename, mode="rb") as f:
mlir_file = f.read()
function_name = str(np.load(os.path.join(model_dir, "function_name.npy")))

View File

@@ -75,14 +75,17 @@ class SharkImporter:
self.module, self.inputs, is_dynamic, tracing_required
)
def _tf_mlir(self, func_name):
def _tf_mlir(self, func_name, save_dir="./shark_tmp/"):
from iree.compiler import tf as tfc
return tfc.compile_module(
self.module, exported_names=[func_name], import_only=True
self.module,
exported_names=[func_name],
import_only=True,
output_file=save_dir,
)
def _tflite_mlir(self, func_name):
def _tflite_mlir(self, func_name, save_dir="./shark_tmp/"):
from iree.compiler import tflite as tflitec
from shark.iree_utils._common import IREE_TARGET_MAP
@@ -90,6 +93,7 @@ class SharkImporter:
self.raw_model_file, # in tflite, it is a path to .tflite file, not a tflite interpreter
input_type="tosa",
import_only=True,
output_file=save_dir,
)
return self.mlir_model
@@ -99,6 +103,7 @@ class SharkImporter:
is_dynamic=False,
tracing_required=False,
func_name="forward",
save_dir="./shark_tmp/",
):
if self.frontend in ["torch", "pytorch"]:
if self.inputs == None:
@@ -108,15 +113,15 @@ class SharkImporter:
sys.exit(1)
return self._torch_mlir(is_dynamic, tracing_required), func_name
if self.frontend in ["tf", "tensorflow"]:
return self._tf_mlir(func_name), func_name
return self._tf_mlir(func_name, save_dir), func_name
if self.frontend in ["tflite", "tf-lite"]:
func_name = "main"
return self._tflite_mlir(func_name), func_name
return self._tflite_mlir(func_name, save_dir), func_name
# Converts the frontend specific tensors into np array.
def convert_to_numpy(self, array_tuple: tuple):
if self.frontend in ["torch", "pytorch"]:
return [x.detach().numpy() for x in array_tuple]
return [x.detach().cpu().numpy() for x in array_tuple]
if self.frontend in ["tf", "tensorflow"]:
return [x.numpy() for x in array_tuple]
@@ -130,19 +135,20 @@ class SharkImporter:
outputs_name = "golden_out.npz"
func_file_name = "function_name"
model_name_mlir = model_name + "_" + self.frontend + ".mlir"
try:
inputs = [x.cpu().detach() for x in inputs]
except AttributeError:
try:
inputs = [x.numpy() for x in inputs]
except AttributeError:
inputs = [x for x in inputs]
np.savez(os.path.join(dir, inputs_name), *inputs)
np.savez(os.path.join(dir, outputs_name), *outputs)
np.save(os.path.join(dir, func_file_name), np.array(func_name))
mlir_str = mlir_data
if self.frontend == "torch":
mlir_str = mlir_data.operation.get_asm()
elif self.frontend == "tf":
mlir_str = mlir_data.decode("utf-8")
elif self.frontend == "tflite":
mlir_str = mlir_data.decode("utf-8")
with open(os.path.join(dir, model_name_mlir), "w") as mlir_file:
mlir_file.write(mlir_str)
with open(os.path.join(dir, model_name_mlir), "wb") as mlir_file:
mlir_file.write(mlir_data)
return
@@ -159,9 +165,13 @@ class SharkImporter:
f"There is no input provided: {self.inputs}, please provide inputs or simply run import_mlir."
)
sys.exit(1)
model_name_mlir = model_name + "_" + self.frontend + ".mlir"
artifact_path = os.path.join(dir, model_name_mlir)
imported_mlir = self.import_mlir(
is_dynamic, tracing_required, func_name
is_dynamic,
tracing_required,
func_name,
save_dir=artifact_path,
)
# TODO: Make sure that any generic function name is accepted. Currently takes in the default function names.
# TODO: Check for multiple outputs.
@@ -171,7 +181,7 @@ class SharkImporter:
golden_out = self.module(*self.inputs)
if torch.is_tensor(golden_out):
golden_out = tuple(
golden_out.detach().numpy(),
golden_out.detach().cpu().numpy(),
)
else:
golden_out = self.convert_to_numpy(golden_out)

View File

@@ -12,6 +12,8 @@
from shark.iree_utils.compile_utils import (
export_iree_module_to_vmfb,
load_flatbuffer,
create_dispatch_dirs,
compile_benchmark_dirs,
)
import os
from shark.shark_runner import SharkRunner
@@ -37,7 +39,7 @@ class SharkInference:
Attributes
----------
mlir_module : str
mlir_module represented in string.
mlir_module represented as a string or bytes; modules from torch-mlir are serialized in the MLIR bytecode format.
function_name : str
function to execute in the given mlir_module.
device : str
@@ -63,21 +65,45 @@ class SharkInference:
def __init__(
self,
mlir_module: str,
mlir_module: bytes,
function_name: str = "forward",
device: str = "none",
mlir_dialect: str = "linalg",
is_benchmark: bool = False,
dispatch_benchmark: str = None,
dispatch_benchmark_dir: str = "temp_dispatch_benchmarks",
):
self.mlir_module = mlir_module
self.function_name = function_name
self.device = shark_args.device if device == "none" else device
self.mlir_dialect = mlir_dialect
self.is_benchmark = is_benchmark
self.dispatch_benchmarks = (
shark_args.dispatch_benchmarks
if dispatch_benchmark is None
else dispatch_benchmark
)
self.dispatch_benchmarks_dir = (
shark_args.dispatch_benchmarks_dir
if dispatch_benchmark_dir == "temp_dispatch_benchmarks"
else dispatch_benchmark_dir
)
self.shark_runner = None
def compile(self):
def compile(self, extra_args=[]):
if self.dispatch_benchmarks is not None:
extra_args.append(
f"--iree-hal-dump-executable-sources-to={self.dispatch_benchmarks_dir}"
)
temp_dir = self.dispatch_benchmarks_dir.split("/")
temp_dir[-1] = "temp_" + temp_dir[-1]
temp_dir = "/".join(temp_dir)
self.temp_dispatch_benchmarks_dir = temp_dir
extra_args.append(
f"--iree-hal-dump-executable-benchmarks-to={self.temp_dispatch_benchmarks_dir}"
)
if self.is_benchmark == True:
from shark.shark_benchmark_runner import SharkBenchmarkRunner
@@ -87,6 +113,7 @@ class SharkInference:
self.function_name,
self.device,
self.mlir_dialect,
extra_args=extra_args,
)
else:
@@ -95,8 +122,18 @@ class SharkInference:
self.function_name,
self.device,
self.mlir_dialect,
extra_args=extra_args,
)
if self.dispatch_benchmarks is not None:
create_dispatch_dirs(self.dispatch_benchmarks_dir, self.device)
compile_benchmark_dirs(
self.dispatch_benchmarks_dir,
self.device,
self.dispatch_benchmarks,
)
os.system(f"rm -rf {self.temp_dispatch_benchmarks_dir}")
# inputs are considered to be tuple of np.array.
def forward(self, inputs: tuple):
return self.shark_runner.run(inputs)
@@ -144,13 +181,15 @@ class SharkInference:
# TODO: Instead of passing directory and having names decided by the module
# , user may want to save the module with manual names.
def save_module(self, dir=os.getcwd()):
def save_module(self, dir=os.getcwd(), module_name=None, extra_args=[]):
return export_iree_module_to_vmfb(
self.mlir_module,
self.device,
dir,
self.mlir_dialect,
self.function_name,
module_name=module_name,
extra_args=extra_args,
)
# load and return the module.
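Putting the pieces together, here is a hedged end-to-end sketch of the dispatch-benchmarking path added in this file; the module bytes, entry function, and input shape below are placeholders rather than values taken from this repository.
```python
import numpy as np
from shark.shark_inference import SharkInference

# Placeholder: in practice this comes from get_torch_mlir_module() or the
# shark_tank downloader, both of which now hand back serialized MLIR bytes.
mlir_bytecode: bytes = ...

shark_module = SharkInference(
    mlir_bytecode,
    "forward",
    device="vulkan",
    mlir_dialect="tm_tensor",
    dispatch_benchmark="All",                # or a space-separated index list, e.g. "0 3"
    dispatch_benchmark_dir="my_dispatches",  # per-dispatch sources, .vmfb and _data.txt land here
)
# compile() also dumps executable sources/benchmarks and then runs
# create_dispatch_dirs / compile_benchmark_dirs when dispatch benchmarking is on.
shark_module.compile()
result = shark_module.forward((np.zeros((1, 16), dtype=np.int64),))  # placeholder input
```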

View File

@@ -25,7 +25,7 @@ import sys
# supported dialects by the shark-runtime.
supported_dialects = {"linalg", "mhlo", "tosa", "tf-lite"}
supported_dialects = {"linalg", "mhlo", "tosa", "tf-lite", "tm_tensor"}
class SharkRunner:
@@ -61,16 +61,18 @@ class SharkRunner:
def __init__(
self,
mlir_module: str = "none",
mlir_module: bytes = None,
function_name: str = "forward",
device: str = "none",
mlir_dialect: str = "linalg",
extra_args: list = [],
compile_vmfb: bool = True,
):
self.mlir_module = mlir_module
self.function_name = function_name
self.device = shark_args.device if device == "none" else device
self.mlir_dialect = mlir_dialect
self.extra_args = extra_args
if check_device_drivers(self.device):
device_driver_info(self.device)
@@ -86,6 +88,7 @@ class SharkRunner:
self.device,
self.mlir_dialect,
func_name=self.function_name,
extra_args=self.extra_args,
)
def run(self, inputs: tuple):

View File

@@ -17,6 +17,7 @@ import torch_mlir
from torch_mlir_e2e_test.linalg_on_tensors_backends import refbackend
import tempfile
from shark.parser import shark_args
import io
def get_module_name_for_asm_dump(module):
@@ -66,11 +67,14 @@ def get_torch_mlir_module(
tempfile.tempdir = shark_args.repro_dir
module = torch_mlir.compile(
mlir_module = torch_mlir.compile(
module,
input,
output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=jit_trace,
ignore_traced_shapes=ignore_traced_shapes,
)
return module
bytecode_stream = io.BytesIO()
mlir_module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
return bytecode

View File

@@ -1,3 +1,211 @@
## Supported and Validated Models
### PyTorch HuggingFace Models
| PyTorch Language Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| Albert | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| BigBird | :green_heart: (AOT) | | | |
| dbmdz/ConvBERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| DistilBERT | :broken_heart: (JIT) | | | |
| GPT2 | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| MobileBert | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| microsoft/beit | :green_heart: | :green_heart: | :broken_heart: | :broken_heart: |
| facebook/deit | :green_heart: | :green_heart: | :broken_heart: | :broken_heart: |
| facebook/convnext | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
### Torchvision Models
| TORCHVISION Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|--------------------|----------------------|----------|----------|-------------|
| AlexNet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| MobileNetV2 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| MobileNetV3 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Unet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet18 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet50 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet101 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnext50_32x4d | :green_heart: (Script) | | | |
| SqueezeNet | :green_heart: (Script) | :green_heart: | :broken_heart: | :broken_heart: |
| EfficientNet | :green_heart: (Script) | | | |
| Regnet | :green_heart: (Script) | | | |
| Resnest | :broken_heart: (Script) | | | |
| Vision Transformer | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| VGG 16 | :green_heart: (Script) | :green_heart: | :green_heart: | |
| Wide Resnet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| RAFT | :broken_heart: (JIT) | | | |
For more information, refer to the [MODEL TRACKING SHEET](https://docs.google.com/spreadsheets/d/15PcjKeHZIrB5LfDyuw7DGEEE8XnQEX2aX8lm8qbxV8A/edit#gid=0).
### Tensorflow Models (Inference)
| Hugging Face Models | tf-mhlo lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| MiniLM | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| albert-base-v2 | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| DistilBERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| CamemBert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| ConvBert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| Deberta | | | | |
| electra | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| funnel | | | | |
| layoutlm | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| longformer | | | | |
| mobile-bert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| rembert | | | | |
| tapas | | | | |
| flaubert | :broken_heart: | :green_heart: | :green_heart: | :green_heart: |
| roberta | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| xlm-roberta | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| mpnet | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
### PyTorch Training Models
| Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: | :green_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
### JAX Models
| Models | JAX-MHLO lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| DALL-E | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
<details>
<summary>TFLite Models</summary>
### TFLite Models
| Models | TOSA/LinAlg | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
| albert | :green_heart: | :green_heart: | | |
| asr_conformer | :green_heart: | :green_heart: | | |
| bird_classifier | :green_heart: | :green_heart: | | |
| cartoon_gan | :green_heart: | :green_heart: | | |
| craft_text | :green_heart: | :green_heart: | | |
| deeplab_v3 | :green_heart: | :green_heart: | | |
| densenet | :green_heart: | :green_heart: | | |
| east_text_detector | :green_heart: | :green_heart: | | |
| efficientnet_lite0_int8 | :green_heart: | :green_heart: | | |
| efficientnet | :green_heart: | :green_heart: | | |
| gpt2 | :green_heart: | :green_heart: | | |
| image_stylization | :green_heart: | :green_heart: | | |
| inception_v4 | :green_heart: | :green_heart: | | |
| inception_v4_uint8 | :green_heart: | :green_heart: | | |
| lightning_fp16 | :green_heart: | :green_heart: | | |
| lightning_i8 | :green_heart: | :green_heart: | | |
| lightning | :green_heart: | :green_heart: | | |
| magenta | :green_heart: | :green_heart: | | |
| midas | :green_heart: | :green_heart: | | |
| mirnet | :green_heart: | :green_heart: | | |
| mnasnet | :green_heart: | :green_heart: | | |
| mobilebert_edgetpu_s_float | :green_heart: | :green_heart: | | |
| mobilebert_edgetpu_s_quant | :green_heart: | :green_heart: | | |
| mobilebert | :green_heart: | :green_heart: | | |
| mobilebert_tf2_float | :green_heart: | :green_heart: | | |
| mobilebert_tf2_quant | :green_heart: | :green_heart: | | |
| mobilenet_ssd_quant | :green_heart: | :green_heart: | | |
| mobilenet_v1 | :green_heart: | :green_heart: | | |
| mobilenet_v1_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v2_int8 | :green_heart: | :green_heart: | | |
| mobilenet_v2 | :green_heart: | :green_heart: | | |
| mobilenet_v2_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v3-large | :green_heart: | :green_heart: | | |
| mobilenet_v3-large_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v35-int8 | :green_heart: | :green_heart: | | |
| nasnet | :green_heart: | :green_heart: | | |
| person_detect | :green_heart: | :green_heart: | | |
| posenet | :green_heart: | :green_heart: | | |
| resnet_50_int8 | :green_heart: | :green_heart: | | |
| rosetta | :green_heart: | :green_heart: | | |
| spice | :green_heart: | :green_heart: | | |
| squeezenet | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v1 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v1_uint8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_fpnlite | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_fpnlite_uint8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_int8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2 | :green_heart: | :green_heart: | | |
| ssd_spaghettinet_large | :green_heart: | :green_heart: | | |
| ssd_spaghettinet_large_uint8 | :green_heart: | :green_heart: | | |
| visual_wake_words_i8 | :green_heart: | :green_heart: | | |
</details>
## Testing and Benchmarks
### Run all model tests on CPU/GPU/VULKAN/Metal
For a list of models included in our pytest model suite, see https://github.com/nod-ai/SHARK/blob/main/tank/all_models.csv
```shell
pytest tank/test_models.py
# Models included in the pytest suite are listed in all_models.csv.
# If on Linux for multithreading on CPU (faster results):
pytest tank/test_models.py -n auto
```
### Running specific tests
```shell
# Search for test cases by passing a keyword that matches all or part of the test case's name:
pytest tank/test_models.py -k "keyword"
# Test cases are named uniformly using the format test_module_<model_name_underscores_only>_<torch/tf>_<static/dynamic>_<device>.
# Example: Test all models on nvidia gpu:
pytest tank/test_models.py -k "cuda"
# Example: Test all tensorflow resnet models on Vulkan backend:
pytest tank/test_models.py -k "resnet and tf and vulkan"
# Exclude a test case:
pytest tank/test_models.py -k "not ..."
### Run benchmarks on SHARK tank pytests and generate bench_results.csv with results.
(the following requires source installation with `IMPORTER=1 ./setup_venv.sh`)
```shell
pytest --benchmark tank/test_models.py
# Just do static GPU benchmarks for PyTorch tests:
pytest --benchmark tank/test_models.py -k "pytorch and static and cuda"
```
### Benchmark Resnet50, MiniLM on CPU
(requires source installation with `IMPORTER=1 ./setup_venv.sh`)
```shell
# We suggest running the following commands as root before running benchmarks on CPU:
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | awk -F, '{print $2}' | sort -n | uniq | ( while read X ; do echo $X ; echo 0 > /sys/devices/system/cpu/cpu$X/online ; done )
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# Benchmark canonical Resnet50 on CPU via pytest
pytest --benchmark tank/test_models.py -k "resnet50 and tf_static_cpu"
# Benchmark canonical MiniLM on CPU via pytest
pytest --benchmark tank/test_models.py -k "MiniLM and cpu"
# Benchmark MiniLM on CPU via transformer-benchmarks:
git clone --recursive https://github.com/nod-ai/transformer-benchmarks.git
cd transformer-benchmarks
./perf-ci.sh -n
# Check detail.csv for MLIR/IREE results.
```
To run the fine tuning example, from the root SHARK directory, run:
```shell
@@ -11,3 +219,5 @@ if running from a google vm, you can view jupyter notebooks on your local system
gcloud compute ssh <YOUR_INSTANCE_DETAILS> --ssh-flag="-N -L localhost:8888:localhost:8888"
```

View File

@@ -0,0 +1,83 @@
#!/usr/bin/env python
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
import tensorflow as tf
from shark.shark_inference import SharkInference
from shark.parser import shark_args
import argparse
seq_parser = argparse.ArgumentParser(
description="Shark Sequence Classification."
)
seq_parser.add_argument(
"--hf_model_name",
type=str,
default="bert-base-uncased",
help="Hugging face model to run sequence classification.",
)
seq_args, unknown = seq_parser.parse_known_args()
BATCH_SIZE = 1
MAX_SEQUENCE_LENGTH = 16
# Create a set of input signature.
inputs_signature = [
tf.TensorSpec(shape=[BATCH_SIZE, MAX_SEQUENCE_LENGTH], dtype=tf.int32),
tf.TensorSpec(shape=[BATCH_SIZE, MAX_SEQUENCE_LENGTH], dtype=tf.int32),
]
# For supported models please see here:
# https://huggingface.co/docs/transformers/model_doc/auto#transformers.TFAutoModelForSequenceClassification
def preprocess_input(text="This is just used to compile the model"):
tokenizer = AutoTokenizer.from_pretrained(seq_args.hf_model_name)
inputs = tokenizer(
text,
padding="max_length",
return_tensors="tf",
truncation=True,
max_length=MAX_SEQUENCE_LENGTH,
)
return inputs
class SeqClassification(tf.Module):
def __init__(self, model_name):
super(SeqClassification, self).__init__()
self.m = TFAutoModelForSequenceClassification.from_pretrained(
model_name, output_attentions=False, num_labels=2
)
self.m.predict = lambda x, y: self.m(input_ids=x, attention_mask=y)[0]
@tf.function(input_signature=inputs_signature)
def forward(self, input_ids, attention_mask):
return tf.math.softmax(
self.m.predict(input_ids, attention_mask), axis=-1
)
if __name__ == "__main__":
inputs = preprocess_input()
shark_module = SharkInference(
SeqClassification(seq_args.hf_model_name),
(inputs["input_ids"], inputs["attention_mask"]),
)
shark_module.set_frontend("tensorflow")
shark_module.compile()
print(f"Model has been successfully compiled on {shark_args.device}")
while True:
input_text = input(
"Enter the text to classify (press q or nothing to exit): "
)
if not input_text or input_text == "q":
break
inputs = preprocess_input(input_text)
print(
shark_module.forward(
(inputs["input_ids"], inputs["attention_mask"])
)
)

View File

@@ -0,0 +1,3 @@
# Running Different OPT Variants
To run different sizes of OPT, change the `OPT_MODEL` string in `opt_torch_test.py`. The default is the 350m-parameter variant. Test cases for the 66b variant also exist in the file; simply uncomment them.
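For instance, a hypothetical edit to the constants defined near the top of `opt_torch_test.py` (the 1.3b name is just one of the published OPT checkpoints):
```python
# opt_torch_test.py
OPT_MODEL = "facebook/opt-1.3b"      # default is "facebook/opt-350m"
OPT_MODEL_66B = "facebook/opt-66b"   # used by the commented-out 66b test cases
```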

View File

@@ -0,0 +1,881 @@
# coding=utf-8
# Copyright 2022 The Fairseq Authors and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" PyTorch OPT model."""
import random
from typing import List, Optional, Tuple, Union
import torch
import torch.utils.checkpoint
from torch import nn
from torch.nn import CrossEntropyLoss
from transformers import OPTConfig, PreTrainedModel
from transformers.activations import ACT2FN
from transformers.modeling_outputs import (
BaseModelOutputWithPast,
CausalLMOutputWithPast,
)
_CHECKPOINT_FOR_DOC = "facebook/opt-350m"
_CONFIG_FOR_DOC = "OPTConfig"
_TOKENIZER_FOR_DOC = "GPT2Tokenizer"
# Base model docstring
_EXPECTED_OUTPUT_SHAPE = [1, 8, 1024]
OPT_PRETRAINED_MODEL_ARCHIVE_LIST = [
"facebook/opt-125m",
"facebook/opt-350m",
"facebook/opt-1.3b",
"facebook/opt-2.7b",
"facebook/opt-6.7b",
"facebook/opt-13b",
"facebook/opt-30b",
# See all OPT models at https://huggingface.co/models?filter=opt
]
def _make_causal_mask(
input_ids_shape: torch.Size,
dtype: torch.dtype,
past_key_values_length: int = 0,
):
"""
Make causal mask used for bi-directional self-attention.
"""
bsz, tgt_len = input_ids_shape
mask = torch.full((tgt_len, tgt_len), float("-inf"))
mask_cond = torch.arange(int(mask.size(-1)))
mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
# mask = mask.to(dtype)
if past_key_values_length > 0:
mask = torch.cat(
[torch.zeros(tgt_len, past_key_values_length, dtype=dtype), mask],
dim=-1,
)
return mask[None, None, :, :].expand(
bsz, 1, tgt_len, tgt_len + past_key_values_length
)
def _expand_mask(
mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None
):
"""
Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
"""
bsz, src_len = map(int, mask.size())
tgt_len = tgt_len if tgt_len is not None else src_len
expanded_mask = (
mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
)
inverted_mask = 1.0 - expanded_mask
return inverted_mask.masked_fill(
inverted_mask.to(torch.bool), torch.finfo(dtype).min
)
# return inverted_mask.masked_fill(inverted_mask, torch.finfo(dtype).min)
class OPTLearnedPositionalEmbedding(nn.Embedding):
"""
This module learns positional embeddings up to a fixed maximum size.
"""
def __init__(self, num_embeddings: int, embedding_dim: int):
# OPT is set up so that if padding_idx is specified then offset the embedding ids by 2
# and adjust num_embeddings appropriately. Other models don't have this hack
self.offset = 2
super().__init__(num_embeddings + self.offset, embedding_dim)
def forward(
self,
attention_mask: torch.LongTensor,
past_key_values_length: int = 0,
):
"""`input_ids_shape` is expected to be [bsz x seqlen]."""
attention_mask = attention_mask.long()
# create positions depending on attention_mask
positions = (
torch.cumsum(attention_mask, dim=1).type_as(attention_mask)
* attention_mask
).long() - 1
# cut positions if `past_key_values_length` is > 0
positions = positions[:, past_key_values_length:]
return super().forward(positions + self.offset)
# Copied from transformers.models.bart.modeling_bart.BartAttention with Bart->OPT
class OPTAttention(nn.Module):
"""Multi-headed attention from 'Attention Is All You Need' paper"""
def __init__(
self,
embed_dim: int,
num_heads: int,
dropout: float = 0.0,
is_decoder: bool = False,
bias: bool = True,
):
super().__init__()
self.embed_dim = embed_dim
self.num_heads = num_heads
self.dropout = dropout
self.head_dim = embed_dim // num_heads
if (self.head_dim * num_heads) != self.embed_dim:
raise ValueError(
"embed_dim must be divisible by num_heads (got `embed_dim`:"
f" {self.embed_dim} and `num_heads`: {num_heads})."
)
self.scaling = self.head_dim**-0.5
self.is_decoder = is_decoder
self.k_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
self.v_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
self.q_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
return (
tensor.view(bsz, seq_len, self.num_heads, self.head_dim)
.transpose(1, 2)
.contiguous()
)
def forward(
self,
hidden_states: torch.Tensor,
key_value_states: Optional[torch.Tensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
attention_mask: Optional[torch.Tensor] = None,
layer_head_mask: Optional[torch.Tensor] = None,
output_attentions: bool = False,
) -> Tuple[
torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]
]:
"""Input shape: Batch x Time x Channel"""
# if key_value_states are provided this layer is used as a cross-attention layer
# for the decoder
is_cross_attention = key_value_states is not None
# bsz, tgt_len, _ = map(int, hidden_states.size())
bsz, tgt_len, _ = hidden_states.size()
# get query proj
query_states = self.q_proj(hidden_states) * self.scaling
# get key, value proj
if is_cross_attention and past_key_value is not None:
# reuse k,v, cross_attentions
key_states = past_key_value[0]
value_states = past_key_value[1]
elif is_cross_attention:
# cross_attentions
key_states = self._shape(self.k_proj(key_value_states), -1, bsz)
value_states = self._shape(self.v_proj(key_value_states), -1, bsz)
elif past_key_value is not None:
# reuse k, v, self_attention
key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
else:
# self_attention
key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
if self.is_decoder:
# if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states.
# Further calls to cross_attention layer can then reuse all cross-attention
# key/value_states (first "if" case)
# if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of
# all previous decoder key/value_states. Further calls to uni-directional self-attention
# can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
# if encoder bi-directional self-attention `past_key_value` is always `None`
past_key_value = (key_states, value_states)
proj_shape = (bsz * self.num_heads, -1, self.head_dim)
query_states = self._shape(query_states, tgt_len, bsz).view(
*proj_shape
)
key_states = key_states.view(*proj_shape)
value_states = value_states.view(*proj_shape)
src_len = key_states.size(1)
attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
raise ValueError(
"Attention weights should be of size"
f" {(bsz * self.num_heads, tgt_len, src_len)}, but is"
f" {attn_weights.size()}"
)
if attention_mask is not None:
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
raise ValueError(
"Attention mask should be of size"
f" {(bsz, 1, tgt_len, src_len)}, but is"
f" {attention_mask.size()}"
)
attn_weights = (
attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
+ attention_mask
)
attn_weights = attn_weights.view(
bsz * self.num_heads, tgt_len, src_len
)
attn_weights = nn.functional.softmax(attn_weights, dim=-1)
if layer_head_mask is not None:
if layer_head_mask.size() != (self.num_heads,):
raise ValueError(
"Head mask for a single layer should be of size"
f" {(self.num_heads,)}, but is {layer_head_mask.size()}"
)
attn_weights = layer_head_mask.view(
1, -1, 1, 1
) * attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
attn_weights = attn_weights.view(
bsz * self.num_heads, tgt_len, src_len
)
if output_attentions:
# this operation is a bit awkward, but it's required to
# make sure that attn_weights keeps its gradient.
# In order to do so, attn_weights have to be reshaped
# twice and have to be reused in the following
attn_weights_reshaped = attn_weights.view(
bsz, self.num_heads, tgt_len, src_len
)
attn_weights = attn_weights_reshaped.view(
bsz * self.num_heads, tgt_len, src_len
)
else:
attn_weights_reshaped = None
attn_probs = nn.functional.dropout(
attn_weights, p=self.dropout, training=self.training
)
attn_output = torch.bmm(attn_probs, value_states)
if attn_output.size() != (
bsz * self.num_heads,
tgt_len,
self.head_dim,
):
raise ValueError(
"`attn_output` should be of size"
f" {(bsz, self.num_heads, tgt_len, self.head_dim)}, but is"
f" {attn_output.size()}"
)
attn_output = attn_output.view(
bsz, self.num_heads, tgt_len, self.head_dim
)
attn_output = attn_output.transpose(1, 2)
# Use the `embed_dim` from the config (stored in the class) rather than `hidden_state` because `attn_output` can be
# partitioned across GPUs when using tensor-parallelism.
attn_output = attn_output.reshape(bsz, tgt_len, self.embed_dim)
attn_output = self.out_proj(attn_output)
return attn_output, attn_weights_reshaped, past_key_value
class OPTDecoderLayer(nn.Module):
def __init__(self, config: OPTConfig):
super().__init__()
self.embed_dim = config.hidden_size
self.self_attn = OPTAttention(
embed_dim=self.embed_dim,
num_heads=config.num_attention_heads,
dropout=config.attention_dropout,
is_decoder=True,
)
self.do_layer_norm_before = config.do_layer_norm_before
self.dropout = config.dropout
self.activation_fn = ACT2FN[config.activation_function]
self.activation_dropout = config.activation_dropout
self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim)
self.fc1 = nn.Linear(self.embed_dim, config.ffn_dim)
self.fc2 = nn.Linear(config.ffn_dim, self.embed_dim)
self.final_layer_norm = nn.LayerNorm(self.embed_dim)
def forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
layer_head_mask: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = False,
use_cache: Optional[bool] = False,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
) -> Tuple[
torch.FloatTensor,
Optional[Tuple[torch.FloatTensor, torch.FloatTensor]],
]:
# TODO: Refactor this function
residual = hidden_states
# 125m, 1.7B, ..., 175B applies layer norm BEFORE attention
if self.do_layer_norm_before:
hidden_states = self.self_attn_layer_norm(hidden_states)
# Self Attention
hidden_states, self_attn_weights, present_key_value = self.self_attn(
hidden_states=hidden_states,
past_key_value=past_key_value,
attention_mask=attention_mask,
layer_head_mask=layer_head_mask,
output_attentions=output_attentions,
)
hidden_states = nn.functional.dropout(
hidden_states, p=self.dropout, training=self.training
)
hidden_states = residual + hidden_states
# 350m applies layer norm AFTER attention
if not self.do_layer_norm_before:
hidden_states = self.self_attn_layer_norm(hidden_states)
# Fully Connected
hidden_states_shape = hidden_states.shape
hidden_states = hidden_states.reshape(-1, hidden_states.size(-1))
residual = hidden_states
# 125m, 1.7B, ..., 175B applies layer norm BEFORE attention
if self.do_layer_norm_before:
hidden_states = self.final_layer_norm(hidden_states)
hidden_states = self.fc1(hidden_states)
hidden_states = self.activation_fn(hidden_states)
hidden_states = self.fc2(hidden_states)
hidden_states = nn.functional.dropout(
hidden_states, p=self.dropout, training=self.training
)
hidden_states = (residual + hidden_states).view(hidden_states_shape)
# 350m applies layer norm AFTER attention
if not self.do_layer_norm_before:
hidden_states = self.final_layer_norm(hidden_states)
outputs = (hidden_states,)
if output_attentions:
outputs += (self_attn_weights,)
if use_cache:
outputs += (present_key_value,)
return outputs
class OPTPreTrainedModel(PreTrainedModel):
config_class = OPTConfig
base_model_prefix = "model"
supports_gradient_checkpointing = True
_no_split_modules = ["OPTDecoderLayer"]
_keys_to_ignore_on_load_unexpected = [r"decoder.version"]
def _init_weights(self, module):
std = self.config.init_std
if isinstance(module, nn.Linear):
module.weight.data.normal_(mean=0.0, std=std)
if module.bias is not None:
module.bias.data.zero_()
elif isinstance(module, nn.Embedding):
module.weight.data.normal_(mean=0.0, std=std)
if module.padding_idx is not None:
module.weight.data[module.padding_idx].zero_()
def _set_gradient_checkpointing(self, module, value=False):
if isinstance(module, (OPTDecoder)):
module.gradient_checkpointing = value
class OPTDecoder(OPTPreTrainedModel):
def __init__(self, config: OPTConfig):
super().__init__(config)
self.dropout = config.dropout
self.layerdrop = config.layerdrop
self.padding_idx = config.pad_token_id
self.max_target_positions = config.max_position_embeddings
self.vocab_size = config.vocab_size
self.embed_tokens = nn.Embedding(
config.vocab_size, config.word_embed_proj_dim, self.padding_idx
)
self.embed_positions = OPTLearnedPositionalEmbedding(
config.max_position_embeddings, config.hidden_size
)
if config.word_embed_proj_dim != config.hidden_size:
self.project_out = nn.Linear(
config.hidden_size, config.word_embed_proj_dim, bias=False
)
else:
self.project_out = None
if config.word_embed_proj_dim != config.hidden_size:
self.project_in = nn.Linear(
config.word_embed_proj_dim, config.hidden_size, bias=False
)
else:
self.project_in = None
self.layer_norm = None
self.layers = nn.ModuleList(
[OPTDecoderLayer(config) for _ in range(config.num_hidden_layers)]
)
self.gradient_checkpointing = False
# Initialize weights and apply final processing
self.post_init()
def get_input_embeddings(self):
return self.embed_tokens
def set_input_embeddings(self, value):
self.embed_tokens = value
# Copied from transformers.models.bart.modeling_bart.BartDecoder._prepare_decoder_attention_mask
def _prepare_decoder_attention_mask(
self,
attention_mask,
input_shape,
inputs_embeds,
past_key_values_length,
):
# create causal mask
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
combined_attention_mask = None
if input_shape[-1] > 1:
combined_attention_mask = _make_causal_mask(
input_shape,
inputs_embeds.dtype,
past_key_values_length=past_key_values_length,
) # .to(inputs_embeds.device)
if attention_mask is not None:
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
expanded_attn_mask = _expand_mask(
attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
)
combined_attention_mask = (
expanded_attn_mask
if combined_attention_mask is None
else expanded_attn_mask + combined_attention_mask
)
return combined_attention_mask
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
head_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple, BaseModelOutputWithPast]:
# TODO: Refactor this function
output_attentions = (
output_attentions
if output_attentions is not None
else self.config.output_attentions
)
output_hidden_states = (
output_hidden_states
if output_hidden_states is not None
else self.config.output_hidden_states
)
use_cache = (
use_cache if use_cache is not None else self.config.use_cache
)
return_dict = (
return_dict
if return_dict is not None
else self.config.use_return_dict
)
# retrieve input_ids and inputs_embeds
if input_ids is not None and inputs_embeds is not None:
raise ValueError(
"You cannot specify both decoder_input_ids and"
" decoder_inputs_embeds at the same time"
)
elif input_ids is not None:
input_shape = input_ids.size()
input_ids = input_ids.view(-1, input_shape[-1])
elif inputs_embeds is not None:
input_shape = inputs_embeds.size()[:-1]
else:
raise ValueError(
"You have to specify either decoder_input_ids or"
" decoder_inputs_embeds"
)
past_key_values_length = (
past_key_values[0][0].shape[2]
if past_key_values is not None
else 0
)
if inputs_embeds is None:
inputs_embeds = self.embed_tokens(input_ids)
# embed positions
if attention_mask is None:
attention_mask = torch.ones(
inputs_embeds.shape[:2],
dtype=torch.bool,
device=inputs_embeds.device,
)
pos_embeds = self.embed_positions(
attention_mask, past_key_values_length
)
attention_mask = self._prepare_decoder_attention_mask(
attention_mask, input_shape, inputs_embeds, past_key_values_length
)
if self.project_in is not None:
inputs_embeds = self.project_in(inputs_embeds)
hidden_states = inputs_embeds + pos_embeds
hidden_states = nn.functional.dropout(
hidden_states, p=self.dropout, training=self.training
)
# decoder layers
all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None
next_decoder_cache = () if use_cache else None
# check if head_mask has a correct number of layers specified if desired
for attn_mask, mask_name in zip([head_mask], ["head_mask"]):
if attn_mask is not None:
if attn_mask.size()[0] != (len(self.layers)):
raise ValueError(
f"The `{mask_name}` should be specified for"
f" {len(self.layers)} layers, but it is for"
f" {head_mask.size()[0]}."
)
for idx, decoder_layer in enumerate(self.layers):
# add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
if output_hidden_states:
all_hidden_states += (hidden_states,)
dropout_probability = random.uniform(0, 1)
if self.training and (dropout_probability < self.layerdrop):
continue
past_key_value = (
past_key_values[idx] if past_key_values is not None else None
)
if self.gradient_checkpointing and self.training:
if use_cache:
use_cache = False
def create_custom_forward(module):
def custom_forward(*inputs):
# None for past_key_value
return module(*inputs, output_attentions, None)
return custom_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
create_custom_forward(decoder_layer),
hidden_states,
attention_mask,
head_mask[idx] if head_mask is not None else None,
None,
)
else:
layer_outputs = decoder_layer(
hidden_states,
attention_mask=attention_mask,
layer_head_mask=(
head_mask[idx] if head_mask is not None else None
),
past_key_value=past_key_value,
output_attentions=output_attentions,
use_cache=use_cache,
)
hidden_states = layer_outputs[0]
if use_cache:
next_decoder_cache += (
layer_outputs[2 if output_attentions else 1],
)
if output_attentions:
all_self_attns += (layer_outputs[1],)
if self.project_out is not None:
hidden_states = self.project_out(hidden_states)
# add hidden states from the last decoder layer
if output_hidden_states:
all_hidden_states += (hidden_states,)
next_cache = next_decoder_cache if use_cache else None
if not return_dict:
# TODO: This tuple needs to be a static list (of tensors)
# return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
return hidden_states
return BaseModelOutputWithPast(
last_hidden_state=hidden_states,
past_key_values=next_cache,
hidden_states=all_hidden_states,
attentions=all_self_attns,
)
class OPTModel(OPTPreTrainedModel):
def __init__(self, config: OPTConfig):
super().__init__(config)
self.decoder = OPTDecoder(config)
# Initialize weights and apply final processing
self.post_init()
def get_input_embeddings(self):
return self.decoder.embed_tokens
def set_input_embeddings(self, value):
self.decoder.embed_tokens = value
def get_decoder(self):
return self.decoder
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
head_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple, BaseModelOutputWithPast]:
output_attentions = (
output_attentions
if output_attentions is not None
else self.config.output_attentions
)
output_hidden_states = (
output_hidden_states
if output_hidden_states is not None
else self.config.output_hidden_states
)
use_cache = (
use_cache if use_cache is not None else self.config.use_cache
)
return_dict = (
return_dict
if return_dict is not None
else self.config.use_return_dict
)
# decoder outputs consists of (dec_features, past_key_value, dec_hidden, dec_attn)
decoder_outputs = self.decoder(
input_ids=input_ids,
attention_mask=attention_mask,
head_mask=head_mask,
past_key_values=past_key_values,
inputs_embeds=inputs_embeds,
use_cache=use_cache,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
# if not return_dict:
# return decoder_outputs
# return BaseModelOutputWithPast(
# last_hidden_state=decoder_outputs.last_hidden_state,
# past_key_values=decoder_outputs.past_key_values,
# hidden_states=decoder_outputs.hidden_states,
# attentions=decoder_outputs.attentions,
# )
return decoder_outputs.last_hidden_state
class OPTForCausalLM(OPTPreTrainedModel):
_keys_to_ignore_on_load_missing = [r"lm_head.weight"]
def __init__(self, config):
super().__init__(config)
self.model = OPTModel(config)
# the lm_head weight is automatically tied to the embed tokens weight
self.lm_head = nn.Linear(
config.word_embed_proj_dim, config.vocab_size, bias=False
)
# Initialize weights and apply final processing
self.post_init()
def get_input_embeddings(self):
return self.model.decoder.embed_tokens
def set_input_embeddings(self, value):
self.model.decoder.embed_tokens = value
def get_output_embeddings(self):
return self.lm_head
def set_output_embeddings(self, new_embeddings):
self.lm_head = new_embeddings
def set_decoder(self, decoder):
self.model.decoder = decoder
def get_decoder(self):
return self.model.decoder
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
head_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
labels: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple, CausalLMOutputWithPast]:
# TODO: Refactor this function
output_attentions = (
output_attentions
if output_attentions is not None
else self.config.output_attentions
)
output_hidden_states = (
output_hidden_states
if output_hidden_states is not None
else self.config.output_hidden_states
)
return_dict = (
return_dict
if return_dict is not None
else self.config.use_return_dict
)
# decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
outputs = self.model.decoder(
input_ids=input_ids,
attention_mask=attention_mask,
head_mask=head_mask,
past_key_values=past_key_values,
inputs_embeds=inputs_embeds,
use_cache=use_cache,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
logits = self.lm_head(outputs[0]).contiguous()
loss = None
if labels is not None:
# Shift so that tokens < n predict n
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
# Flatten the tokens
loss_fct = CrossEntropyLoss()
loss = loss_fct(
shift_logits.view(-1, self.config.vocab_size),
shift_labels.view(-1),
)
if not return_dict:
output = (logits,) + outputs[1:]
return (loss,) + output if loss is not None else output
return CausalLMOutputWithPast(
loss=loss,
logits=logits,
past_key_values=outputs.past_key_values,
hidden_states=outputs.hidden_states,
attentions=outputs.attentions,
)
def prepare_inputs_for_generation(
self,
input_ids,
past=None,
attention_mask=None,
use_cache=None,
**kwargs,
):
# if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly
if attention_mask is None:
attention_mask = input_ids.new_ones(input_ids.shape)
if past:
input_ids = input_ids[:, -1:]
# first step, decoder_cached_states are empty
return {
"input_ids": input_ids, # encoder_outputs is defined. input_ids not needed
"attention_mask": attention_mask,
"past_key_values": past,
"use_cache": use_cache,
}
@staticmethod
def _reorder_cache(past, beam_idx):
reordered_past = ()
for layer_past in past:
reordered_past += (
tuple(
past_state.index_select(0, beam_idx)
for past_state in layer_past
),
)
return reordered_past

View File

@@ -0,0 +1,188 @@
import unittest
import pytest
import torch_mlir
from hacked_hf_opt import OPTModel
from shark.iree_utils._common import check_device_drivers, device_driver_info
from shark.shark_inference import SharkInference
from tank.model_utils import compare_tensors
from transformers import GPT2Tokenizer
OPT_MODEL = "facebook/opt-350m"
OPT_MODEL_66B = "facebook/opt-66b"
class OPTModuleTester:
def __init__(
self,
benchmark=False,
):
self.benchmark = benchmark
def create_and_check_module(self, dynamic, device, model_name):
# model_mlir, func_name, input, act_out = download_torch_model(
# "opt", dynamic
# )
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# config = OPTConfig()
# opt_model = OPTModel(config)
opt_model = OPTModel.from_pretrained(model_name)
opt_model.eval()
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
input_ids, attention_mask = (
inputs.data["input_ids"],
inputs.data["attention_mask"],
)
module = torch_mlir.compile(
opt_model,
(input_ids, attention_mask),
output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=True,
)
model_mlir = module.operation.get_asm(
large_elements_limit=None, enable_debug_info=True
)
func_name = "forward"
act_out = opt_model(input_ids, attention_mask).detach()
# mlir_importer = SharkImporter(
# model,
# (input,),
# frontend="torch",
# )
# minilm_mlir, func_name = mlir_importer.import_mlir(
# is_dynamic=dynamic, tracing_required=True
# )
shark_module = SharkInference(
model_mlir,
func_name,
device=device,
mlir_dialect="tm_tensor",
is_benchmark=self.benchmark,
)
shark_module.compile()
results = shark_module.forward((input_ids, attention_mask))
assert compare_tensors(act_out, results)
if self.benchmark:
shark_module.shark_runner.benchmark_all_csv(
(input_ids, attention_mask),
"opt",
dynamic,
device,
"torch",
)
class OPTModuleTest(unittest.TestCase):
@pytest.fixture(autouse=True)
def configure(self, pytestconfig):
self.module_tester = OPTModuleTester(self)
self.module_tester.save_mlir = False
self.module_tester.save_vmfb = False
self.module_tester.benchmark = pytestconfig.getoption("benchmark")
def test_350m_static_cpu(self):
dynamic = False
device = "cpu"
self.module_tester.create_and_check_module(dynamic, device, OPT_MODEL)
def test_350m_dynamic_cpu(self):
dynamic = True
device = "cpu"
self.module_tester.create_and_check_module(dynamic, device, OPT_MODEL)
@pytest.mark.skipif(
check_device_drivers("cuda"), reason=device_driver_info("cuda")
)
def test_350m_static_cuda(self):
dynamic = False
device = "cuda"
self.module_tester.create_and_check_module(dynamic, device, OPT_MODEL)
@pytest.mark.skipif(
check_device_drivers("cuda"), reason=device_driver_info("cuda")
)
def test_350m_dynamic_cuda(self):
dynamic = True
device = "cuda"
self.module_tester.create_and_check_module(dynamic, device, OPT_MODEL)
@pytest.mark.skipif(
check_device_drivers("vulkan"), reason=device_driver_info("vulkan")
)
def test_350m_static_vulkan(self):
dynamic = False
device = "vulkan"
self.module_tester.create_and_check_module(dynamic, device, OPT_MODEL)
@pytest.mark.skipif(
check_device_drivers("vulkan"), reason=device_driver_info("vulkan")
)
def test_350m_dynamic_vulkan(self):
dynamic = True
device = "vulkan"
self.module_tester.create_and_check_module(dynamic, device, OPT_MODEL)
# def test_66b_static_cpu(self):
# dynamic = False
# device = "cpu"
# self.module_tester.create_and_check_module(
# dynamic, device, OPT_MODEL_66B
# )
# def test_66b_dynamic_cpu(self):
# dynamic = True
# device = "cpu"
# self.module_tester.create_and_check_module(
# dynamic, device, OPT_MODEL_66B
# )
# @pytest.mark.skipif(
# check_device_drivers("cuda"), reason=device_driver_info("cuda")
# )
# def test_66b_static_cuda(self):
# dynamic = False
# device = "cuda"
# self.module_tester.create_and_check_module(
# dynamic, device, OPT_MODEL_66B
# )
# @pytest.mark.skipif(
# check_device_drivers("cuda"), reason=device_driver_info("cuda")
# )
# def test_66b_dynamic_cuda(self):
# dynamic = True
# device = "cuda"
# self.module_tester.create_and_check_module(
# dynamic, device, OPT_MODEL_66B
# )
# @pytest.mark.skipif(
# check_device_drivers("vulkan"), reason=device_driver_info("vulkan")
# )
# def test_66b_static_vulkan(self):
# dynamic = False
# device = "vulkan"
# self.module_tester.create_and_check_module(
# dynamic, device, OPT_MODEL_66B
# )
# @pytest.mark.skipif(
# check_device_drivers("vulkan"), reason=device_driver_info("vulkan")
# )
# def test_66b_dynamic_vulkan(self):
# dynamic = True
# device = "vulkan"
# self.module_tester.create_and_check_module(
# dynamic, device, OPT_MODEL_66B
# )
if __name__ == "__main__":
unittest.main()

View File

@@ -131,6 +131,7 @@ class SharkModuleTester:
def create_and_check_module(self, dynamic, device):
shark_args.local_tank_cache = self.local_tank_cache
shark_args.update_tank = self.update_tank
if self.config["framework"] == "tf":
model, func_name, inputs, golden_out = download_tf_model(
self.config["model_name"],
@@ -266,6 +267,9 @@ class SharkModuleTest(unittest.TestCase):
self.module_tester.local_tank_cache = self.pytestconfig.getoption(
"local_tank_cache"
)
self.module_tester.update_tank = self.pytestconfig.getoption(
"update_tank"
)
self.module_tester.tank_url = self.pytestconfig.getoption("tank_url")
if (
config["model_name"] == "distilbert-base-uncased"
@@ -311,6 +315,7 @@ class SharkModuleTest(unittest.TestCase):
reason="https://github.com/nod-ai/SHARK/issues/311, https://github.com/nod-ai/SHARK/issues/342"
)
if config["model_name"] == "funnel-transformer/small" and device in [
"cpu",
"cuda",
"metal",
"vulkan",
@@ -348,14 +353,55 @@ class SharkModuleTest(unittest.TestCase):
and device == "cuda"
):
pytest.xfail(reason="https://github.com/nod-ai/SHARK/issues/390")
if config["model_name"] == "squeezenet1_0" and device == "vulkan":
if config["model_name"] == "squeezenet1_0" and device in [
"cpu",
"metal",
"vulkan",
]:
pytest.xfail(
reason="Numerics Issues: https://github.com/nod-ai/SHARK/issues/388"
)
if config["model_name"] == "mobilenet_v3_small" and device == "vulkan":
if config["model_name"] == "mobilenet_v3_small" and device in [
"metal",
"vulkan",
]:
pytest.xfail(
reason="Numerics Issues: https://github.com/nod-ai/SHARK/issues/388"
)
if config["model_name"] == "hf-internal-testing/tiny-random-flaubert":
pytest.xfail(reason="Transformers API mismatch")
if config["model_name"] == "alexnet" and device in ["metal", "vulkan"]:
pytest.xfail(reason="Assertion Error: Zeros Output")
if (
config["model_name"] == "camembert-base"
and dynamic == False
and device in ["metal", "vulkan"]
):
pytest.xfail(
reason="chlo.broadcast_compare failed to satify constraint"
)
if (
config["model_name"] == "roberta-base"
and dynamic == False
and device in ["metal", "vulkan"]
):
pytest.xfail(
reason="chlo.broadcast_compare failed to satify constraint"
)
if config["model_name"] in [
"microsoft/MiniLM-L12-H384-uncased",
"wide_resnet50_2",
"resnet50",
"resnet18",
"resnet101",
"microsoft/resnet-50",
] and device in ["metal", "vulkan"]:
pytest.xfail(reason="Vulkan Numerical Error (mostly conv)")
if config["model_name"] == "mobilenet_v3_small" and device in [
"cuda",
"cpu",
]:
pytest.xfail(reason="https://github.com/nod-ai/SHARK/issues/424")
if config["framework"] == "tf" and dynamic == True:
pytest.skip(
reason="Dynamic shapes not supported for this framework."

View File

@@ -1,15 +0,0 @@
## Running SharkInference on CPUs, GPUs and MAC.
### Run the binary sequence_classification.
#### The models supported are: [hugging face sequence classification](https://huggingface.co/docs/transformers/model_doc/auto#transformers.TFAutoModelForSequenceClassification)
```shell
./seq_classification.py --hf_model_name="hf_model" --device="cpu" # Use gpu | vulkan
```
Once the model is compiled to run on the device mentioned, we can pass in text and
get the logits.

View File

@@ -5,7 +5,6 @@ camembert-base,hf
dbmdz/convbert-base-turkish-cased,hf
distilbert-base-uncased,hf
google/electra-small-discriminator,hf
hf-internal-testing/tiny-random-flaubert,hf
funnel-transformer/small,hf
microsoft/layoutlm-base-uncased,hf
google/mobilebert-uncased,hf

Binary file not shown.


View File

@@ -1,13 +1,16 @@
In order to launch SHARK-web, from the root SHARK directory, run:
## Linux
```shell
IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
pip install diffusers scipy
cd web
wget -O stable_diffusion.mlir https://storage.googleapis.com/shark_tank/prashant_nod/stable_diff/stable_diff_torch.mlir
python index.py
```
This will launch a Gradio server and print a public URL like:
Running on public URL: https://xxxxx.gradio.app
## Windows
```shell
./setup_venv.ps1
cd web
python index.py --local_tank_cache=<current_working_dir>
```

View File

@@ -1,142 +1,241 @@
from models.resnet50 import resnet_inf
from models.albert_maskfill import albert_maskfill_inf
from models.stable_diffusion import stable_diff_inf
# from models.diffusion.v_diffusion import vdiff_inf
import gradio as gr
from PIL import Image
def debug_event(debug):
return gr.Textbox.update(visible=debug)
with gr.Blocks() as shark_web:
with gr.Row():
with gr.Group():
with gr.Column(scale=1):
img = Image.open("./Nod_logo.jpg")
gr.Image(value=img, show_label=False, interactive=False).style(
height=70, width=70
)
with gr.Column(scale=9):
gr.Label(value="Shark Models Demo.")
with gr.Tabs():
with gr.TabItem("ResNet50"):
image = device = debug = resnet = output = std_output = None
with gr.Row():
with gr.Column(scale=1, min_width=600):
image = gr.Image(label="Image")
device = gr.Textbox(label="Device", value="cpu")
debug = gr.Checkbox(label="DEBUG", value=False)
resnet = gr.Button("Recognize Image").style(
full_width=True
)
with gr.Column(scale=1, min_width=600):
output = gr.Label(label="Output")
std_output = gr.Textbox(
label="Std Output",
value="Nothing to show.",
visible=False,
)
debug.change(
debug_event,
inputs=[debug],
outputs=[std_output],
show_progress=False,
)
resnet.click(
resnet_inf,
inputs=[image, device],
outputs=[output, std_output],
)
with gr.TabItem("Albert MaskFill"):
masked_text = device = debug = albert_mask = decoded_res = std_output = None
with gr.Row():
with gr.Column(scale=1, min_width=600):
masked_text = gr.Textbox(
label="Masked Text",
placeholder="Give me a sentence with [MASK] to fill",
)
device = gr.Textbox(label="Device", value="cpu")
debug = gr.Checkbox(label="DEBUG", value=False)
albert_mask = gr.Button("Decode Mask")
with gr.Column(scale=1, min_width=600):
decoded_res = gr.Label(label="Decoded Results")
std_output = gr.Textbox(
label="Std Output",
value="Nothing to show.",
visible=False,
)
debug.change(
debug_event,
inputs=[debug],
outputs=[std_output],
show_progress=False,
)
albert_mask.click(
albert_maskfill_inf,
inputs=[masked_text, device],
outputs=[decoded_res, std_output],
)
# with gr.TabItem("V-Diffusion"):
# prompt = sample_count = batch_size = iters = device = v_diffusion = generated_img = None
# with gr.Row():
# with gr.Column(scale=1, min_width=600):
# prompt = gr.Textbox(
# label="Prompt", value="New York City, oil on canvas:5"
# )
# sample_count = gr.Number(label="Sample Count", value=1)
# batch_size = gr.Number(label="Batch Size", value=1)
# iters = gr.Number(label="Steps", value=2)
# device = gr.Textbox(label="Device", value="gpu")
# v_diffusion = gr.Button("Generate image from prompt")
# with gr.Column(scale=1, min_width=600):
# generated_img = gr.Image(type="pil", shape=(100, 100))
# std_output = gr.Textbox(label="Std Output", value="Nothing.")
# v_diffusion.click(
# vdiff_inf,
# inputs=[prompt, sample_count, batch_size, iters, device],
# outputs=[generated_img, std_output]
# )
with gr.TabItem("Stable-Diffusion"):
prompt = iters = device = debug = stable_diffusion = generated_img = std_output = None
with gr.Row():
with gr.Column(scale=1, min_width=600):
prompt = gr.Textbox(
label="Prompt",
value="a photograph of an astronaut riding a horse",
)
iters = gr.Number(label="Steps", value=2)
device = gr.Textbox(label="Device", value="vulkan")
debug = gr.Checkbox(label="DEBUG", value=False)
stable_diffusion = gr.Button("Generate image from prompt")
with gr.Column(scale=1, min_width=600):
generated_img = gr.Image(type="pil", shape=(100, 100))
std_output = gr.Textbox(
label="Std Output", value="Nothing.", visible=False
)
debug.change(
debug_event,
inputs=[debug],
outputs=[std_output],
show_progress=False,
)
stable_diffusion.click(
stable_diff_inf,
inputs=[prompt, iters, device],
outputs=[generated_img, std_output],
)
shark_web.launch(share=True, server_port=8080, enable_queue=True)
# from models.resnet50 import resnet_inf
# from models.albert_maskfill import albert_maskfill_inf
from models.stable_diffusion.main import stable_diff_inf
# from models.diffusion.v_diffusion import vdiff_inf
import gradio as gr
from PIL import Image
import json
import os
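# Show or hide the std-output textbox when the DEBUG checkbox is toggled.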
def debug_event(debug):
return gr.Textbox.update(visible=debug)
prompt_examples = []
prompt_loc = "./prompts.json"
if os.path.exists(prompt_loc):
with open("./prompts.json", encoding="utf-8") as fopen:
prompt_examples = json.load(fopen)
demo_css = """
.gradio-container {background-color: black}
.container {background-color: black !important; padding-top:20px !important; }
#ui_title {padding: 10px !important; }
#top_logo {background-color: transparent; border-radius: 0 !important; border: 0; }
#demo_title {background-color: black; border-radius: 0 !important; border: 0; padding-top: 50px; padding-bottom: 0px; width: 460px !important;}
#demo_title_outer {border-radius: 0; }
#prompt_box_outer div:first-child {border-radius: 0 !important}
#prompt_box textarea {background-color:#1d1d1d !important}
#prompt_examples {margin:0 !important}
#prompt_examples svg {display: none !important;}
.gr-sample-textbox { border-radius: 1rem !important; border-color: rgb(31,41,55) !important; border-width:2px !important; }
#ui_body {background-color: #111111 !important; padding: 10px !important; border-radius: 0.5em !important;}
#img_result+div {display: none !important;}
footer {display: none !important;}
"""
with gr.Blocks(css=demo_css) as shark_web:
# Title row with the demo logo.
with gr.Row(elem_id="ui_title"):
with gr.Column(scale=1, elem_id="demo_title_outer"):
logo2 = Image.open("./logos/sd-demo-logo.png")
gr.Image(
value=logo2,
show_label=False,
interactive=False,
elem_id="demo_title",
).style(width=230)
# with gr.Column(scale=1):
# gr.Label(value="Ultra fast Stable Diffusion")
with gr.Row(elem_id="ui_body"):
prompt = scheduler = iters_count = batch_size = steps = guidance = None
height = width = seed = precision = device = cache = None
iree_vulkan_target_triple = live_preview = debug = save_img = None
stable_diffusion = generated_img = std_output = None
# Prompt input, examples, and generation controls.
with gr.Row():
with gr.Column(scale=1, min_width=600):
with gr.Group(elem_id="prompt_box_outer"):
prompt = gr.Textbox(
label="Prompt",
value="A photograph of an astronaut riding a horse",
lines=1,
elem_id="prompt_box",
)
with gr.Group():
ex = gr.Examples(
label="Examples",
examples=prompt_examples,
inputs=prompt,
cache_examples=False,
elem_id="prompt_examples",
)
with gr.Row():
steps = gr.Slider(1, 100, value=50, step=1, label="Steps")
guidance = gr.Slider(
0,
50,
value=7.5,
step=0.1,
label="Guidance Scale",
interactive=False,
)
with gr.Row():
height = gr.Slider(
384,
768,
value=512,
step=64,
label="Height",
interactive=False,
)
width = gr.Slider(
384,
768,
value=512,
step=64,
label="Width",
interactive=False,
)
with gr.Row():
precision = gr.Radio(
label="Precision",
value="fp16",
choices=["fp16", "fp32"],
)
seed = gr.Textbox(value="42", max_lines=1, label="Seed")
with gr.Row():
cache = gr.Checkbox(label="Cache", value=True)
# debug = gr.Checkbox(label="DEBUG", value=False)
save_img = gr.Checkbox(label="Save Image", value=False)
live_preview = gr.Checkbox(
label="Live Preview", value=False
)
# Hidden Items.
scheduler = gr.Radio(
label="Scheduler",
value="LMS",
choices=["PNDM", "LMS", "DDIM"],
interactive=False,
visible=False,
)
device = gr.Radio(
label="Device",
value="vulkan",
choices=["cpu", "cuda", "vulkan"],
interactive=False,
visible=False,
elem_id="ugly_line",
)
iters_count = gr.Slider(
1,
24,
value=1,
step=1,
label="Iteration Count",
visible=False,
)
batch_size = gr.Slider(
1,
4,
value=1,
step=1,
label="Batch Size",
visible=False,
)
iree_vulkan_target_triple = gr.Textbox(
value="",
max_lines=1,
label="IREE VULKAN TARGET TRIPLE",
visible=False,
elem_id="ugly_line",
)
stable_diffusion = gr.Button("Generate Image")
# logo
nod_logo = Image.open("./logos/amd-nod-logo.png")
gr.Image(
value=nod_logo,
show_label=False,
interactive=False,
elem_id="top_logo",
).style(width=230)
with gr.Column(scale=1, min_width=600):
generated_img = gr.Image(
type="pil", elem_id="img_result", interactive=False
).style(height=768, width=768)
std_output = gr.Textbox(
label="Std Output",
value="Nothing.",
lines=5,
visible=False,
elem_id="ugly_line",
)
"""
debug.change(
debug_event,
inputs=[debug],
outputs=[std_output],
show_progress=False,
)
"""
stable_diffusion.click(
stable_diff_inf,
inputs=[
prompt,
scheduler,
iters_count,
batch_size,
steps,
guidance,
height,
width,
seed,
precision,
device,
cache,
iree_vulkan_target_triple,
live_preview,
save_img,
],
outputs=[generated_img, std_output],
show_progress=False,
)
shark_web.queue()
shark_web.launch(server_name="0.0.0.0", server_port=8080, enable_queue=True)

BIN
web/logos/Nod_logo.png Normal file

Binary file not shown.


BIN
web/logos/amd-nod-logo.png Normal file

Binary file not shown.


BIN
web/logos/other_logo.png Normal file

Binary file not shown.


BIN
web/logos/sd-demo-logo.png Normal file

Binary file not shown.


View File

@@ -23,7 +23,7 @@ def load_mlir(mlir_loc):
return mlir_module
def compile_through_fx(model, inputs, device, mlir_loc=None):
def compile_through_fx(model, inputs, device, mlir_loc=None, extra_args=[]):
module = load_mlir(mlir_loc)
if mlir_loc == None:
@@ -74,9 +74,12 @@ def compile_through_fx(model, inputs, device, mlir_loc=None):
func_name = "forward"
shark_module = SharkInference(
mlir_model, func_name, device=device, mlir_dialect="tm_tensor"
mlir_model,
func_name,
device=device,
mlir_dialect="tm_tensor",
)
shark_module.compile()
shark_module.compile(extra_args)
return shark_module
@@ -150,6 +153,7 @@ def stable_diff_inf(prompt: str, steps, device: str):
(latent_model_input, torch.tensor([1.0]), text_embeddings),
args["device"],
args["mlir_loc"],
["--iree-flow-enable-conv-nchw-to-nhwc-transform"],
)
compiled_module[args["device"]] = shark_unet
if DEBUG:

View File

@@ -0,0 +1,365 @@
from transformers import CLIPTextModel, CLIPTokenizer
import torch
from PIL import Image
from diffusers import DDIMScheduler, LMSDiscreteScheduler, PNDMScheduler
from tqdm.auto import tqdm
import numpy as np
from models.stable_diffusion.model_wrappers import (
get_vae32,
get_vae16,
get_unet16_wrapped,
get_unet32_wrapped,
)
from models.stable_diffusion.utils import get_shark_model
import time
import os
GCLOUD_BUCKET = "gs://shark_tank/prashant_nod"
VAE_FP16 = "vae_fp16"
VAE_FP32 = "vae_fp32"
UNET_FP16 = "unet_fp16"
UNET_FP32 = "unet_fp32"
TUNED_GCLOUD_BUCKET = "gs://shark_tank/quinn"
UNET_FP16_TUNED = "unet_fp16_tunedv2"
args = None
class Arguments:
def __init__(
self,
prompt: str,
scheduler: str,
iteration_count: int,
batch_size: int,
steps: int,
guidance: float,
height: int,
width: int,
seed: int,
precision: str,
device: str,
cache: bool,
iree_vulkan_target_triple: str,
live_preview: bool,
save_img: bool,
import_mlir: bool = False,
max_length: int = 77,
use_tuned: bool = True,
):
self.prompt = prompt
self.scheduler = scheduler
self.iteration_count = iteration_count
self.batch_size = batch_size
self.steps = steps
self.guidance = guidance
self.height = height
self.width = width
self.seed = seed
self.precision = precision
self.device = device
self.cache = cache
self.iree_vulkan_target_triple = iree_vulkan_target_triple
self.live_preview = live_preview
self.save_img = save_img
self.import_mlir = import_mlir
self.max_length = max_length
self.use_tuned = use_tuned
def get_models():
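# Select the gcloud bucket, artifact names, and IREE flags based on precision and whether the tuned UNet is used, then fetch (or import) the VAE and UNet modules.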
global args
IREE_EXTRA_ARGS = []
if args.precision == "fp16":
IREE_EXTRA_ARGS += [
"--iree-flow-enable-padding-linalg-ops",
"--iree-flow-linalg-ops-padding-size=32",
]
if args.use_tuned:
unet_gcloud_bucket = TUNED_GCLOUD_BUCKET
vae_gcloud_bucket = GCLOUD_BUCKET
unet_args = IREE_EXTRA_ARGS
vae_args = IREE_EXTRA_ARGS + [
"--iree-flow-enable-conv-nchw-to-nhwc-transform"
]
unet_name = UNET_FP16_TUNED
vae_name = VAE_FP16
else:
unet_gcloud_bucket = GCLOUD_BUCKET
vae_gcloud_bucket = GCLOUD_BUCKET
IREE_EXTRA_ARGS += [
"--iree-flow-enable-conv-nchw-to-nhwc-transform"
]
unet_args = IREE_EXTRA_ARGS
vae_args = IREE_EXTRA_ARGS
unet_name = UNET_FP16
vae_name = VAE_FP16
if args.import_mlir == True:
return get_vae16(args, model_name=VAE_FP16), get_unet16_wrapped(
args, model_name=UNET_FP16
)
else:
return get_shark_model(
args,
vae_gcloud_bucket,
vae_name,
vae_args,
), get_shark_model(
args,
unet_gcloud_bucket,
unet_name,
unet_args,
)
elif args.precision == "fp32":
IREE_EXTRA_ARGS += [
"--iree-flow-enable-conv-nchw-to-nhwc-transform",
"--iree-flow-enable-padding-linalg-ops",
"--iree-flow-linalg-ops-padding-size=16",
]
if args.import_mlir == True:
return get_vae32(args, model_name=VAE_FP32), get_unet32_wrapped(
args, model_name=UNET_FP32
)
else:
return get_shark_model(
args,
GCLOUD_BUCKET,
VAE_FP32,
IREE_EXTRA_ARGS,
), get_shark_model(
args,
GCLOUD_BUCKET,
UNET_FP32,
IREE_EXTRA_ARGS,
)
schedulers = dict()
# Pre-build the supported noise schedulers.
schedulers["PNDM"] = PNDMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
num_train_timesteps=1000,
)
schedulers["LMS"] = LMSDiscreteScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
num_train_timesteps=1000,
)
schedulers["DDIM"] = DDIMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
clip_sample=False,
set_alpha_to_one=False,
)
cache_obj = dict()
# cache tokenizer and text_encoder
cache_obj["tokenizer"] = CLIPTokenizer.from_pretrained(
"openai/clip-vit-large-patch14"
)
cache_obj["text_encoder"] = CLIPTextModel.from_pretrained(
"openai/clip-vit-large-patch14"
)
# cache vae and unet.
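# Pre-compile the vulkan VAE/UNet (fp16 and fp32) at server startup with a default set of arguments so later requests can reuse them.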
args = Arguments(
prompt="load unet/vmfb",
scheduler="LMS",
iteration_count=1,
batch_size=1,
steps=50,
guidance=7.5,
height=512,
width=512,
seed=42,
precision="fp16",
device="vulkan",
cache=True,
iree_vulkan_target_triple="",
live_preview=False,
save_img=False,
import_mlir=False,
max_length=77,
use_tuned=True,
)
cache_obj["vae_fp16_vulkan"], cache_obj["unet_fp16_vulkan"] = get_models()
args.precision = "fp32"
cache_obj["vae_fp32_vulkan"], cache_obj["unet_fp32_vulkan"] = get_models()
output_dir = "./stored_results/stable_diffusion"
os.makedirs(output_dir, exist_ok=True)
def stable_diff_inf(
prompt: str,
scheduler: str,
iteration_count: int,
batch_size: int,
steps: int,
guidance: float,
height: int,
width: int,
seed: str,
precision: str,
device: str,
cache: bool,
iree_vulkan_target_triple: str,
live_preview: bool,
save_img: bool,
):
global args
global schedulers
global cache_obj
global output_dir
start = time.time()
# set seed value
if seed == "":
seed = int(torch.randint(low=25, high=100, size=()))
else:
try:
seed = int(seed)
if seed < 0 or seed > 10000:
seed = hash(seed)
except (ValueError, OverflowError) as error:
seed = hash(seed)
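# Non-numeric seeds (and numeric seeds outside [0, 10000]) fall back to Python's hash().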
scheduler = schedulers[scheduler]
args = Arguments(
prompt,
scheduler,
iteration_count,
batch_size,
steps,
guidance,
height,
width,
seed,
precision,
device,
cache,
iree_vulkan_target_triple,
live_preview,
save_img,
)
dtype = torch.float32 if args.precision == "fp32" else torch.half
num_inference_steps = int(args.steps) # Number of denoising steps
generator = torch.manual_seed(
args.seed
) # Seed generator to create the initial latent noise
# Initialize vae and unet models.
is_model_initialized = False
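# Reuse the vulkan modules compiled at server startup when caching is enabled and the tuned, non-imported path is selected; otherwise compile on demand.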
if (
args.cache
and args.use_tuned
and args.device == "vulkan"
and not args.import_mlir
):
vae_key = f"vae_{args.precision}_vulkan"
unet_key = f"unet_{args.precision}_vulkan"
cached_keys = cache_obj.keys()
if vae_key in cached_keys and unet_key in cached_keys:
vae, unet = cache_obj[vae_key], cache_obj[unet_key]
is_model_initialized = True
if not is_model_initialized:
vae, unet = get_models()
tokenizer = cache_obj["tokenizer"]
text_encoder = cache_obj["text_encoder"]
text_input = tokenizer(
[args.prompt],
padding="max_length",
max_length=args.max_length,
truncation=True,
return_tensors="pt",
)
text_embeddings = text_encoder(text_input.input_ids)[0].to(dtype)
max_length = text_input.input_ids.shape[-1]
uncond_input = tokenizer(
[""] * batch_size,
padding="max_length",
max_length=max_length,
return_tensors="pt",
)
uncond_embeddings = text_encoder(uncond_input.input_ids)[0].to(dtype)
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
latents = torch.randn(
(batch_size, 4, args.height // 8, args.width // 8),
generator=generator,
dtype=torch.float32,
).to(dtype)
scheduler.set_timesteps(num_inference_steps)
scheduler.is_scale_input_called = True
latents = latents * scheduler.sigmas[0]
text_embeddings_numpy = text_embeddings.detach().numpy()
avg_ms = 0
out_img = None
text_output = ""
for i, t in tqdm(enumerate(scheduler.timesteps)):
text_output += f"\n Iteration = {i} | Timestep = {t} | "
step_start = time.time()
timestep = torch.tensor([t]).to(dtype).detach().numpy()
latents_numpy = latents.detach().numpy()
sigma_numpy = np.array(scheduler.sigmas[i]).astype(np.float32)
noise_pred = unet.forward(
(latents_numpy, timestep, text_embeddings_numpy, sigma_numpy)
)
noise_pred = torch.from_numpy(noise_pred)
step_time = time.time() - step_start
avg_ms += step_time
step_ms = int((step_time) * 1000)
text_output += f"Time = {step_ms}ms."
latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
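# Decode the intermediate latents through the VAE every 5 steps to feed the live preview.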
if live_preview and i % 5 == 0:
scaled_latents = 1 / 0.18215 * latents
latents_numpy = scaled_latents.detach().numpy()
image = vae.forward((latents_numpy,))
image = torch.from_numpy(image)
image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
images = (image * 255).round().astype("uint8")
pil_images = [Image.fromarray(image) for image in images]
out_img = pil_images[0]
yield out_img, text_output
# scale and decode the image latents with vae
latents = 1 / 0.18215 * latents
latents_numpy = latents.detach().numpy()
image = vae.forward((latents_numpy,))
image = torch.from_numpy(image)
image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
images = (image * 255).round().astype("uint8")
pil_images = [Image.fromarray(image) for image in images]
out_img = pil_images[0]
avg_ms = 1000 * avg_ms / args.steps
text_output += f"\n\nAverage step time: {avg_ms}ms/it"
total_time = time.time() - start
text_output += f"\n\nTotal image generation time: {total_time}sec"
if args.save_img:
# save outputs.
output_loc = f"{output_dir}/{time.time()}_{int(args.steps)}_{args.precision}_{args.device}.jpg"
out_img.save(os.path.join(output_loc))
yield out_img, text_output

View File

@@ -0,0 +1,201 @@
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from models.stable_diffusion.utils import compile_through_fx
import torch
YOUR_TOKEN = "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"
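# Each getter below wraps a diffusers module in a small nn.Module with a tensor-only forward and compiles it through FX into a SharkInference module.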
def get_vae32(args, model_name="vae_fp32"):
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = AutoencoderKL.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="vae",
use_auth_token=YOUR_TOKEN,
)
def forward(self, input):
x = self.vae.decode(input, return_dict=False)[0]
return (x / 2 + 0.5).clamp(0, 1)
vae = VaeModel()
vae_input = torch.rand(1, 4, 64, 64)
shark_vae = compile_through_fx(
args,
vae,
(vae_input,),
model_name,
)
return shark_vae
def get_vae16(args, model_name="vae_fp16"):
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = AutoencoderKL.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="vae",
use_auth_token=YOUR_TOKEN,
revision="fp16",
)
def forward(self, input):
x = self.vae.decode(input, return_dict=False)[0]
return (x / 2 + 0.5).clamp(0, 1)
vae = VaeModel()
vae = vae.half().cuda()
vae_input = torch.rand(1, 4, 64, 64, dtype=torch.half).cuda()
shark_vae = compile_through_fx(
args,
vae,
(vae_input,),
model_name,
)
return shark_vae
def get_unet32(args, model_name="unet_fp32"):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
self.in_channels = self.unet.in_channels
self.train(False)
def forward(self, x, y, z):
return self.unet.forward(x, y, z, return_dict=False)[0]
unet = UnetModel()
latent_model_input = torch.rand([2, 4, 64, 64])
text_embeddings = torch.rand([2, args.max_length, 768])
shark_unet = compile_through_fx(
args,
unet,
(latent_model_input, torch.tensor([1.0]), text_embeddings),
model_name,
)
return shark_unet
def get_unet16(args, model_name="unet_fp16"):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
revision="fp16",
)
self.in_channels = self.unet.in_channels
self.train(False)
def forward(self, x, y, z):
return self.unet.forward(x, y, z, return_dict=False)[0]
unet = UnetModel()
unet = unet.half().cuda()
latent_model_input = torch.rand([2, 4, 64, 64]).half().cuda()
text_embeddings = torch.rand([2, args.max_length, 768]).half().cuda()
shark_unet = compile_through_fx(
args,
unet,
(
latent_model_input,
torch.tensor([1.0]).half().cuda(),
text_embeddings,
),
model_name,
)
return shark_unet
def get_unet16_wrapped(args, model_name="unet_fp16_wrapped"):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
revision="fp16",
)
self.in_channels = self.unet.in_channels
self.guidance_scale = args.guidance_scale
self.train(False)
def forward(self, latent, timestep, text_embedding, sigma):
# expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
latents = torch.cat([latent] * 2)
latents = latents / (torch.pow((torch.pow(sigma, 2) + 1), 0.5))
unet_out = self.unet.forward(
latents, timestep, text_embedding, return_dict=False
)[0]
noise_pred_uncond, noise_pred_text = unet_out.chunk(2)
noise_pred = noise_pred_uncond + self.guidance_scale * (
noise_pred_text - noise_pred_uncond
)
return noise_pred
unet = UnetModel()
unet = unet.half().cuda()
latent_model_input = torch.rand([1, 4, 64, 64]).half().cuda()
text_embeddings = torch.rand([2, args.max_length, 768]).half().cuda()
sigma = torch.tensor(1).to(torch.float32)
shark_unet = compile_through_fx(
args,
unet,
(
latent_model_input,
torch.tensor([1.0]).half().cuda(),
text_embeddings,
sigma,
),
model_name,
)
return shark_unet
def get_unet32_wrapped(args, model_name="unet_fp32_wrapped"):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
"CompVis/stable-diffusion-v1-4",
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
self.in_channels = self.unet.in_channels
self.guidance_scale = args.guidance_scale
self.train(False)
def forward(self, latent, timestep, text_embedding, sigma):
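# Same classifier-free guidance wrapper as the fp16 variant: duplicate the latent, scale by sigma, and blend the unconditional and text-conditioned predictions.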
latents = torch.cat([latent] * 2)
latents = latents / (torch.pow((torch.pow(sigma, 2) + 1), 0.5))
unet_out = self.unet.forward(
latents, timestep, text_embedding, return_dict=False
)[0]
noise_pred_uncond, noise_pred_text = unet_out.chunk(2)
noise_pred = noise_pred_uncond + self.guidance_scale * (
noise_pred_text - noise_pred_uncond
)
return noise_pred
unet = UnetModel()
latent_model_input = torch.rand([1, 4, 64, 64])
text_embeddings = torch.rand([2, args.max_length, 768])
sigma = torch.tensor(1).to(torch.float32)
shark_unet = compile_through_fx(
args,
unet,
(latent_model_input, torch.tensor([1.0]), text_embeddings, sigma),
model_name,
)
return shark_unet

View File

@@ -0,0 +1,90 @@
import torch
from shark.shark_inference import SharkInference
from shark.shark_importer import SharkImporter
from torch.fx.experimental.proxy_tensor import make_fx
from torch._decomp import get_decompositions
import torch_mlir
import os
def _compile_module(args, shark_module, model_name, extra_args=[]):
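# Load a cached .vmfb flatbuffer from the current working directory when caching is enabled; otherwise compile the module, save it, and load it.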
extended_name = "{}_{}".format(model_name, args.device)
if args.cache:
vmfb_path = os.path.join(os.getcwd(), extended_name + ".vmfb")
if os.path.isfile(vmfb_path):
print("Loading flatbuffer from {}".format(vmfb_path))
shark_module.load_module(vmfb_path)
return shark_module
print("No vmfb found. Compiling and saving to {}".format(vmfb_path))
path = shark_module.save_module(os.getcwd(), extended_name, extra_args)
shark_module.load_module(path)
return shark_module
# Downloads the model from shark_tank and returns the shark_module.
def get_shark_model(args, tank_url, model_name, extra_args=[]):
from shark.shark_downloader import download_torch_model
mlir_model, func_name, inputs, golden_out = download_torch_model(
model_name, tank_url=tank_url
)
shark_module = SharkInference(
mlir_model, func_name, device=args.device, mlir_dialect="linalg"
)
return _compile_module(args, shark_module, model_name, extra_args)
# Converts the torch-module into shark_module.
def compile_through_fx(args, model, inputs, model_name, extra_args=[]):
fx_g = make_fx(
model,
decomposition_table=get_decompositions(
[
torch.ops.aten.embedding_dense_backward,
torch.ops.aten.native_layer_norm_backward,
torch.ops.aten.slice_backward,
torch.ops.aten.select_backward,
torch.ops.aten.norm.ScalarOpt_dim,
torch.ops.aten.native_group_norm,
torch.ops.aten.upsample_bilinear2d.vec,
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
]
),
)(*inputs)
fx_g.graph.set_codegen(torch.fx.graph.CodeGen())
fx_g.recompile()
def strip_overloads(gm):
"""
Modifies the target of graph nodes in :attr:`gm` to strip overloads.
Args:
gm(fx.GraphModule): The input Fx graph module to be modified
"""
for node in gm.graph.nodes:
if isinstance(node.target, torch._ops.OpOverload):
node.target = node.target.overloadpacket
gm.recompile()
strip_overloads(fx_g)
ts_g = torch.jit.trace(fx_g, inputs)
mlir_importer = SharkImporter(
ts_g,
inputs,
frontend="torch",
)
(mlir_module, func_name), _, _ = mlir_importer.import_debug()
shark_module = SharkInference(
mlir_module,
func_name,
device=args.device,
mlir_dialect="linalg",
)
return _compile_module(args, shark_module, model_name, extra_args)

Binary file not shown.


web/prompts.json Normal file
View File

@@ -0,0 +1,8 @@
[["A high tech solarpunk utopia in the Amazon rainforest"],
["A pikachu fine dining with a view to the Eiffel Tower"],
["A mecha robot in a favela in expressionist style"],
["an insect robot preparing a delicious meal"],
["A digital Illustration of the Babel tower, 4k, detailed, trending in artstation, fantasy vivid colors"],
["Cluttered house in the woods, anime, oil painting, high resolution, cottagecore, ghibli inspired, 4k"],
["A beautiful mansion beside a waterfall in the woods, by josef thoma, matte painting, trending on artstation HQ"],
["portrait photo of a asia old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes"]]