Compare commits

...

1338 Commits

Author SHA1 Message Date
pdhirajkumarprasad
f6dd02fa67 Fix for migration (#2183)
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
2025-11-24 15:32:02 +05:30
pdhirajkumarprasad
fe03539901 Migration to AMDShark (#2182)
Signed-off-by: pdhirajkumarprasad <dhirajp@amd.com>
2025-11-20 12:52:07 +05:30
Ean Garvey
dba2c8a567 Update README.md (#2181) 2025-03-28 15:43:50 -05:00
Marius Brehler
8c923e5eea Pin, update and replace action (#2180)
* Pin and update actions
  Pins actions to tagged versions by its hashes as suggested by
  the OpenSSF Scorecard project.
* Replace the unmaintained `create-release` action
2025-03-21 16:45:15 +01:00
Marius Brehler
5fef4f694a Fix installing and importing iree-turbine (#2175)
The package was renamed from `shark-turbine` to `iree.turbine` with
iree-org/iree-turbine@40016ad.
2024-10-18 17:16:25 +02:00
Marius Brehler
6b7903793a Update links after repository renamings (#2174) 2024-10-17 18:37:30 +02:00
saienduri
ec75e08a2d Update requirements.txt (#2168) 2024-08-06 19:08:38 -07:00
saienduri
d5013fd13e Update requirements.txt (#2157) 2024-06-18 13:41:35 -07:00
Ean Garvey
26f80ccbbb Fixes to UI config defaults, config loading, and warnings. (#2153) 2024-05-31 18:14:27 -04:00
Ean Garvey
d2c3752dc7 Fix batch count and tweaks to chatbot. (#2151)
* Fix batch count

* Add button to unload models manually.

* Add compiled pipeline option

* Add brevitas to requirements

* Tweaks to chatbot

* Change script loading trigger
2024-05-31 18:48:28 +05:30
Ean Garvey
4505c4549f Force inlined weights on igpu for now, small fixes to chatbot (#2149)
* Add igpu and custom triple support.

* Small fixes to igpu, SDXL-turbo

* custom pipe loading

* formatting

* Remove old nodlogo import.
2024-05-30 11:40:42 -05:00
Gaurav Shukla
793495c9c6 [ui] Add AMD logo in shark studio
Signed-Off-by: Gaurav Shukla <gaurav.shukla@amd.com>
2024-05-30 21:43:15 +05:30
Ean Garvey
13e1d8d98a Add igpu and custom triple support. (#2148) 2024-05-29 17:39:36 -05:00
Ean Garvey
2074df40ad Point to nod fork of diffusers. (#2146) 2024-05-29 00:56:21 -05:00
Ean Garvey
7b30582408 Point to SRT links for windows. (#2145) 2024-05-29 01:20:30 -04:00
Ean Garvey
151195ab74 Add a few requirements for ensured parity with turbine-models requirements. (#2142)
* Add scipy to requirements.

Adds diffusers req and a note for torchsde.
2024-05-28 15:37:31 -05:00
Ean Garvey
8146f0bd2f Remove leftover merge conflict line from setup script. (#2141) 2024-05-28 11:04:45 -07:00
Ean Garvey
68e9281778 (Studio2) Refactors SD pipeline to rely on turbine-models pipeline, fixes to LLM, gitignore (#2129)
* Shark Studio SDXL support, HIP driver support, simpler device info, small fixes

* Fixups to llm API/UI and ignore user config files.

* Small fixes for unifying pipelines.

* Update requirements.txt for iree-turbine (#2130)

* Fix Llama2 on CPU (#2133)

* Filesystem cleanup and custom model fixes (#2127)

* Fix some formatting issues

* Remove IREE pin (fixes exe issue) (#2126)

* Update find links for IREE packages (#2136)

* Shark Studio SDXL support, HIP driver support, simpler device info, small fixes

* Abstract out SD pipelines from Studio Webui (WIP)

* Switch from pin to minimum torch version and fix index url

* Fix device parsing.

* Fix linux setup

* Fix custom weights.

---------

Co-authored-by: saienduri <77521230+saienduri@users.noreply.github.com>
Co-authored-by: gpetters-amd <159576198+gpetters-amd@users.noreply.github.com>
Co-authored-by: gpetters94 <gpetters@protonmail.com>
2024-05-28 13:18:31 -04:00
Ean Garvey
fd07cae991 Update find links for IREE packages (#2136) 2024-05-13 11:43:17 -05:00
gpetters94
6cb86a843e Remove IREE pin (fixes exe issue) (#2126)
* Diagnose a build issue

* Remove IREE pin

* Revert the build on pull request change
2024-04-30 12:27:30 -05:00
gpetters-amd
7db1612a5c Filesystem cleanup and custom model fixes (#2127)
* Initial filesystem cleanup

* More filesystem cleanup

* Fix some formatting issues

* Address comments
2024-04-30 11:18:33 -05:00
gpetters-amd
81d6e059ac Fix Llama2 on CPU (#2133) 2024-04-29 12:18:16 -05:00
saienduri
e003d0abe8 Update requirements.txt for iree-turbine (#2130)
* Update requirements.txt to iree-turbine creation

* Update requirements.txt

* Update requirements.txt

* Update requirements.txt
2024-04-29 12:28:14 -04:00
Quinn Dawkins
cf2513e7b1 Update IREE discord link (#2118)
Discord links for IREE were purged, so update the link on the readme.
2024-04-15 12:54:27 -07:00
Ean Garvey
60d8591e95 Change shark-turbine requirement target branch to main. (#2116) 2024-04-11 19:31:39 -04:00
gpetters-amd
ff91982168 Remove target env (#2114) 2024-04-08 16:52:45 -05:00
powderluv
a6a9e524c1 Drop linux nightly for now 2024-04-05 12:04:36 -07:00
powderluv
732df2e263 Updated signtool key 2024-04-05 12:01:42 -07:00
gpetters-amd
1ee16bd256 Fix the nightly build (#2111) 2024-04-05 19:22:33 +05:30
gpetters-amd
752d775fbd Fix a typo in the nightly build script (#2110) 2024-03-30 17:31:51 -07:00
gpetters-amd
4d1a6a204d Fix builder issue (#2109) 2024-03-30 16:21:55 -07:00
Ean Garvey
0eff62a468 (Studio 2.0) add Stable Diffusion features (#2037)
* (WIP): Studio2 app infra and SD API

UI/app structure and utility implementation.

- Initializers for webui/API launch
- Schedulers file for SD scheduling utilities
- Additions to API-level utilities
- Added embeddings module for LoRA, Lycoris, yada yada
- Added image_processing module for resamplers, resize tools,
  transforms, and any image annotation (PNG metadata)
- shared_cmd_opts module -- sorry, this is stable_args.py. It lives on.
  We still want to have some global control over the app exclusively
  from the command-line. At least we will be free from shark_args.
- Moving around some utility pieces.
- Try to make api+webui concurrency possible in index.py
- SD UI -- this is just img2imgUI but hopefully a little better.
- UI utilities for your nod logos and your gradio temps.

Enable UI / bugfixes / tweaks

* Studio2/SD: Use more correct LoRA alpha calculation (#2034)

* Updates ProcessLoRA to use both embedded LoRA alpha, and lora_strength
optional parameter (default 1.0) when applying LoRA weights.
* Updates ProcessLoRA to cover more dim cases.
* This brings ProcessLoRA into line with PR #2015 against Studio1

* Studio2: Remove duplications from api/utils.py (#2035)

* Remove duplicate os import
* Remove duplicate parse_seed_input function

Migrating to JSON requests in SD UI

More UI and app flow improvements, logging, shared device cache

Model loading

Complete SD pipeline.

Tweaks to VAE, pipeline states

Pipeline tweaks, add cmd_opts parsing to sd api

* Add test for SD

* Small cleanup

* Shark2/SD/UI: Respect ckpt_dir, share and server_port args (#2070)

* Takes whether to generate a gradio live link from the existing --share command
line parameter, rather than hardcoding as True.
* Takes server port from existing --server_port command line parameter, rather than
hardcoding as 11911.
* Default --ckpt_dir parameter to '../models'
* Use --ckpt_dir rather than hardcoding ../models as the base directory for
checkpoints, vae, and lora, etc
* Add a 'checkpoints' directory below --ckpt_dir to match ComfyUI folder structure.
Read custom_weights choices from there, and/or subfolders below there matching
the selected base model.
* Fix --ckpt_dir possibly not working correctly when an absolute rather than relative path
is specified.
* Relabel "Custom Weights" to "Custom Weights Checkpoint" in the UI

* Add StreamingLLM support to studio2 chat (#2060)

* Streaming LLM

* Update precision and add gpu support

* (studio2) Separate weights generation for quantization support

* Adapt prompt changes to studio flow

* Remove outdated flag from llm compile flags.

* (studio2) use turbine vmfbRunner

* tweaks to prompts

* Update CPU path and llm api test.

* Change device in test to cpu.

* Fixes to runner, device names, vmfb mgmt

* Use small test without external weights.

* HF-Reference LLM mode + Update test result to match latest Turbine. (#2080)

* HF-Reference LLM mode.

* Fixup test to match current output from Turbine.

* lint

* Fix test error message + Only initialize HF torch model when used.

* Remove redundant format_out change.

* Add rest API endpoint from LanguageModel API

* Add StreamingLLM support to studio2 chat (#2060)

* Streaming LLM

* Update precision and add gpu support

* (studio2) Separate weights generation for quantization support

* Adapt prompt changes to studio flow

* Remove outdated flag from llm compile flags.

* (studio2) use turbine vmfbRunner

* tweaks to prompts

* Update CPU path and llm api test.

* Change device in test to cpu.

* Fixes to runner, device names, vmfb mgmt

* Use small test without external weights.

* Formatting and init files.

* Remove unused import.

* Small fixes

* Studio2/SD/UI: Improve various parts of the UI for Stable Diffusion (#2074)

* Studio2/SD/UI: Improve various parts of the UI of Shark 2

* Update Gradio pin to 4.15.0.
* Port workarounds for Gradio >4.8.0 main container sizing from Shark 1.0.
* Move nod Logo out of the SD tab and onto the top right of the main tab bar.
* Set nod logo icon as the favicon (as current Shark 1.0).
* Create a tabbed right hand panel within the SD UI sized to the viewport height.
* Make Input Image tab 1 in the right hand panel.
* Make output images, generation log, and  generation buttons, tab 2 in the
right hand panel
* Make config JSON display, with config load, save and clear, tab 3 in the
right hand panel
* Make the gallery area of the Output tab take up all vertical space the other controls
on the tab do not.
* Tidy up the controls on the Config tab somewhat.

* Studio2/SD/UI: Reorganise inputs on Left Panel of SD tab

* Rename previously added Right Panel Output tab to 'Generate'.
* Move Batch Count, Batch Size, and Repeatable Seeds, off of Left Panel and onto 'Generate' Tab.
* On 'Generate' tab, rename 'Generate Image(s)' button to 'Start', and 'Stop Batch' button to 'Stop'. They are now below the Batch inputs on a Generate tab so don't need the specificity.
* Move Device, Low VRAM, and Precision inputs into their own 'Device Settings' Accordion control. (starts closed)
* Rename 'Custom Weights Checkpoint' to 'Checkpoint Weights'
* Move Checkpoint Weights, VAE Model, Standalone Lora Weights, and Embeddings Options controls, into their own 'Model Weights' Accordion control.  (starts closed)
* Move Denoising Strength, and Resample Type controls into their own 'Input Image Processing' Accordion. (starts closed)
* Move any remaining controls in the 'Advanced Options' Accordion directly onto the left panel, and remove the Accordion.
* Enable the copy button for all text boxes on the SD tab.
* Add emoji/unicode glyphs to all top level controls and Accordions on the SD Left Panel.
* Start with the 'Generate' as the initially selected tab in the SD Right Panel, working around Gradio issue #7805
* Tweaks to SD Right Tab Panel vertical height.

* Studio2/SD/UI: Sizing tweaks for Right Panel, and >1920 width

* Set height of right panel using vmin rather than vh, with explicit affordances
for fixed areas above and below.
* Port >1920 width Gradio >4.8 CSS workaround from Shark 1.0.

* Studio2/SD: Fix sd pipeline up to "Windows not supported" (#2082)

* Studio2/SD: Fix sd pipeline up to "Windows not supported"

A number of fixes to the SD pipeline as run from the UI, up until the point that dynamo
complains "Windows not yet supported for torch.compile".

* Remove separate install of iree-runtime and iree-compile in setup_venv.ps1, and rely on the
versions installed via the Turbine requirements.txt. Fixes #2063 for me.
* Replace any "None" strings with python None when pulling the config in the UI.
* Add 'hf_auth_token' param to api StableDiffusion class, defaulting to None, and then pass
that in to the various Models where it is required and wasn't already being done before.
* Fix clip custom_weight_params being passed to export_clip_model as "external_weight_file"
rather than "external_weights"
* Don't pass non-existing "custom_vae" parameter to the Turbine Vae Model, instead
pass custom_vae as the "hf_model_id" if it is set. (this may be wrong in the custom vae
case, but stops the code *always* breaking).

* Studio2/SD/UI: Improve UI config None handling

* When populating the UI from a JSON Config set controls to "None" for null/None
values.
* When generating a JSON Config from the UI set props to null/None for controls
set to "None".
* Use null rather than string 'None' in the default config

---------

Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>

* Studio2/SD/UI: Further sd ui pipeline fixes (#2091)

On Windows, this gets us all the way to failing in iree compile with SD 2.1 base.

- Fix merge errors with sd right pane config UI tab.
- Remove non-requirement.txt install/build of torch/mlir/iree/SRT in setup_venv.ps1, fixing "torch.compile not supported on Windows" error.
- Fix gradio deprecation warning for `root=` FileExplorer kwarg.
- Comment out `precision` and `max_length` kwargs being passed to unet, as not yet supported on main Turbine branch. Avoids keyword argument error.

* Tweak compile-time flags for SD submodels.

* Small fixes to sd, pin mpmath

* Add pyinstaller spec and imports script.

* Fix the .exe (#2101)

* Fix _IREE_TARGET_MAP (#2103) (#2108)

- Change target passed to iree for vulkan from 'vulkan'
to 'vulkan-spirv', as 'vulkan' is not a valid value for
--iree-hal-target-backends with the current iree compiler.

Co-authored-by: Stefan Kapusniak <121311569+one-lithe-rune@users.noreply.github.com>

* Cleanup sd model map.

* Update dependencies.

* Studio2/SD/UI: Update gradio to 4.19.2 (sd-studio2) (#2097)

- Move pin for gradio from 4.15 -> 4.19.2 on the sd-studio2 branch

* fix formatting and disable explicit vulkan env settings.

---------

Co-authored-by: Stefan Kapusniak <121311569+one-lithe-rune@users.noreply.github.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: gpetters-amd <159576198+gpetters-amd@users.noreply.github.com>
Co-authored-by: gpetters94 <gpetters@protonmail.com>
2024-03-29 18:13:21 -04:00
dependabot[bot]
5a5de545c9 Bump gradio from 3.34.0 to 4.19.2 in /dataset (#2093)
Bumps [gradio](https://github.com/gradio-app/gradio) from 3.34.0 to 4.19.2.
- [Release notes](https://github.com/gradio-app/gradio/releases)
- [Changelog](https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md)
- [Commits](https://github.com/gradio-app/gradio/compare/v3.34.0...gradio@4.19.2)

---
updated-dependencies:
- dependency-name: gradio
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
2024-03-28 10:01:26 -05:00
Stefan Kapusniak
58f194a450 Fix _IREE_TARGET_MAP (#2103)
- Change target passed to iree for vulkan from 'vulkan'
to 'vulkan-spirv', as 'vulkan' is not a valid value for
--iree-hal-target-backends with the current iree compiler.
2024-03-18 00:21:44 -05:00
Stefan Kapusniak
c5cf005292 Add *.safetensors to .gitignore (#2089)
- shark2 is putting base model .safetensors files in model-specific subfolders. Easiest to just ignore
.safetensors completely.
2024-02-17 21:37:47 -06:00
Stefan Kapusniak
12094ec49c Update README to recommend SHARK-1.0 for now (#2087)
- Add a note at the top of the README to use SHARK-1.0 whilst
rewrite with Turbine is ongoing.
- Update installation section to point at SHARK-1.0 branch.
- Suggest the latest SHARK-1.0 pre-release as well as stable.
- Recommend running the .exe from the command line.
2024-02-08 14:22:27 -06:00
Daniel Garvey
100e5b8244 address refactor in turbine (#2086)
python/turbine_models -> models
shark-turbine -> core
2024-02-05 13:05:01 -08:00
Stanley Winata
6bf51f1f1d HF-Reference LLM mode + Update test result to match latest Turbine. (#2080)
* HF-Reference LLM mode.

* Fixup test to match current output from Turbine.

* lint

* Fix test error message + Only initialize HF torch model when used.

* Remove redundant format_out change.
2024-02-01 11:46:22 -06:00
Ean Garvey
05b498267e Add StreamingLLM support to studio2 chat (#2060)
* Streaming LLM 

* Update precision and add gpu support

* (studio2) Separate weights generation for quantization support

* Adapt prompt changes to studio flow

* Remove outdated flag from llm compile flags.

* (studio2) use turbine vmfbRunner

* tweaks to prompts

* Update CPU path and llm api test.

* Change device in test to cpu.

* Fixes to runner, device names, vmfb mgmt

* Use small test without external weights.
2024-01-18 19:01:07 -06:00
Ean Garvey
fa95ed30d1 Relocate quantized matmul reassociation flag (#2047)
* Remove quantized matmul reassociation flag

This flag should be a model/use-case specific addition, not a default CPU compile flag.
2023-12-20 12:48:40 -08:00
Daniel Garvey
788cc9157c Remove SHARK 1.0 implementations (#2042)
Any reimplementation of these features should be tracked in https://github.com/nod-ai/SHARK/issues/1931.
These implementations are preserved in the SHARK-1.0 branch: https://github.com/nod-ai/SHARK/tree/SHARK-1.0
2023-12-19 11:47:18 -06:00
Daniel Garvey
ebfcfec338 remove shark 1.0 tests, add support for 2.0 llm
* add support for external weights

* add tests and edit deps
2023-12-14 21:44:37 -06:00
Stefan Kapusniak
f692a012e1 UI: Fixes for Gradio 4.7.1/4.8.0 update (#2024)
* Upgrade Gradio pin from 4.7.1 to 4.8.0.
* Make Nod AI logos visible again.
* Remove image toolbars from png import boxes.
* Set Input Images on img2img, outpaint and upscaler tabs to be upload
only.
* Change Image control to an ImageEditor control for masking on the
inpaint tab. Remove previous height restriction as this hides the
editing controls.
* Move Input Image/Masked Image on img2img, inpaint, outpaint and
upscaler tabs to be the first control on their tabs.
* Remove download buttons from all galleries as they download some
html rather than the image (gradio issue #6595)
* Remove add new row and column from Output Gallery parameters
dataframe.
* Add partial workaround for not being able to select text in the Output
Gallery parameters dataframe (gradio issue #6086)
* Fix uglified formatting of subdirectory selection dropdown, refresh
button, and open folder buttons on the Output Gallery tab.
* Force Output Gallery to use the full width of the Gallery control
for the preview overlay when an image is selected, rather than
an overlay the width of the selected image.
* Fix sendto buttons.
* Reset Inpaint ImageEditor control with the Mask Layer after generation
is complete, as it gets lost if the image was sent to the tab from
another tab rather than being uploaded. Also rework queuing and
progress rendering along this codepath. This doesn't solve the
underlying problem of the Mask Layer being removed, but does get inpaint
fully working with the Gradio update.
2023-12-14 14:56:37 -06:00
Vivek Khandelwal
3cc643b2de Add support for StableLM-3B model (#2019)
* Add support for StableLM-3B model

* Add support for Quantized StableLM-3B model

* Update stablelm_pipeline.py
2023-12-12 22:39:50 +05:30
Phaneesh Barwaria
bf70e80d20 vulkan device id fix (#2028) 2023-12-08 19:00:26 -06:00
Ean Garvey
7159698496 (Studio) Fix controlnet switching. (#2026)
* Fix controlnet switching.

* Fix txt2img + control adapters
2023-12-07 00:52:36 -06:00
gpetters94
7e12d1782a Fix stencil pipline to use input image (#2027) 2023-12-07 00:25:18 -06:00
Ean Garvey
bb5f133e1c Many UI fixes and controlnet improvements (#2025)
* multi-controlnet UI and perf fixes

* Controlnet fixes
2023-12-06 20:10:06 -06:00
Richard Pastirčák
3af0c6c658 #1843 - Add Export Default settings button (#2016)
* #1843 - Add Export Default settings button

* #1843 reformatting unit tests

---------

Co-authored-by: Richard Pastirčák <richard.pastircak@student.tuke.sk>
2023-12-06 14:58:17 -06:00
Ean Garvey
3322b7264f (vicuna.py) Move enable_tracy_tracing outside of BenchmarkRunInfo (#2011) 2023-12-06 14:57:32 -06:00
Ean Garvey
eeb7bdd143 Fix nodlogo (#2023) 2023-12-06 14:57:16 -06:00
Ean Garvey
2d6f48821d Fix SharkEulerDiscrete (#2022) 2023-12-06 12:25:06 -06:00
Gaurav Shukla
c74b55f24e [ui] Add UI for sharding
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-12-06 17:25:49 +05:30
Elias Joseph
1a723645fb finalized fixes for sharded llama2 2023-12-06 15:35:29 +05:30
Eliasj42
dfdd3b1f78 improved sharded performance and fixed issue with lmhead on rocm (#2008)
* improved sharded performance and fixed issue with lmhead on rocm

* mmap shards + disable sharing of device arrays across devices

* fix device_idx for non-layer vmfbs

* fix time calc for sharded

---------

Co-authored-by: Elias Joseph <elias@nod-labs.com>
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
2023-12-05 11:53:44 -08:00
Ean Garvey
6384780d16 Fixes to llama2 cpu compilation and studio UI, schedulers (#2013)
* Fix some issues with defaults

Fixes to llama2 cpu compilation (turns off data tiling for old argmax
mode)

---------

Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
2023-12-05 11:19:19 -05:00
gpetters94
db0c53ae59 Fix zoedepth (#2010) 2023-12-05 04:31:50 -05:00
Ean Garvey
ce9ce3a7c8 (SD) Fix schedulers and multi-controlnet. (#2006)
* (SD) Fixes schedulers if receiving noise preds as numpy arrays

* Fix schedulers and stencil name

* Multicontrolnet fixes
2023-12-05 03:29:18 -06:00
Ean Garvey
d72da3801f (Studio) Update gradio and multicontrolnet UI. (#2001)
* (Studio) Update gradio and multicontrolnet UI.

* Fixes for outputgallery, exe build

* Fix image return types.

* Update Gradio to 4.7.1

* Fix send buttons and hiresfix

* Various bugfixes and SDXL additions.

* More UI fixes and txt2img_sdxl presets.

* Enable SDXL-Turbo and custom models, custom VAE for sdxl

* img2img ui tweaks
2023-12-04 12:37:51 -06:00
Eliasj42
9c50edc664 fixed functionality of sharded vicuna/llama2 (#1982)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-12-04 09:11:52 -08:00
Abhishek Varma
a1b7110550 [SDXL] Add SDXL pipeline to SHARK (#1941)
* [SDXL] Add SDXL pipeline to SHARK

-- This commit adds SDXL pipeline to SHARK.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* (SDXL) Fix --ondemand and vae scale factor use, and fix VAE flags.

---------

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
2023-12-02 03:15:15 -06:00
gpetters94
ff15fd74f6 Add multicontrolnet (#1958) 2023-12-01 13:51:20 -06:00
gpetters94
552b2c3ee3 Add controlmode (#1957) 2023-12-01 13:04:47 -06:00
Ean Garvey
795fc33001 Update default compilation flags for data tiling. (#2000)
* Update default CPU compilation flags.

c5a6cdc8dd

52eb7e9b82

tweak CPU iree-compile flags to match upstream changes.

* Add an option for data tiling on SD models.
2023-11-30 17:05:37 -06:00
gpetters94
2910841fe6 Fix an importer issue on Linux (#1986) 2023-11-30 10:50:33 -06:00
Vivek Khandelwal
396a054856 Fix Sharded Falcon-180b 2023-11-30 21:51:57 +05:30
Vivek Khandelwal
5c66948d4f Fix unsharded Falcon pipeline 2023-11-30 21:51:57 +05:30
Ean Garvey
ed3dda94c0 Cleanup xfails in pytest suite. (#1995) 2023-11-29 23:16:15 -06:00
Quinn Dawkins
d31d28b082 [SD] Add flag to collapse reduction dims pre dispatch formation (#1999) 2023-11-30 00:09:17 -05:00
Evan Ruttenberg
78c607e1d3 Fix typo in default_rocm_arch (#1998) 2023-11-29 20:40:56 -05:00
Vivek Khandelwal
666e601dd9 Remove sharding support for non-180B falcon variants 2023-11-27 13:45:13 +05:30
Vivek Khandelwal
ca58908e5b Add Falcon-GPTQ Support for 2-way sharding 2023-11-27 13:45:13 +05:30
Jakub Kuderski
1f5b39f56e [vicuna.py] Add option to enable tracing (#1993)
This makes the program wait for tracy profiler to connect before exiting
and flush profiling data after each token.

I don't know how to select the tracy iree-runtime variant
programmatically -- instead, print an error and exit.
2023-11-24 12:25:03 -08:00
Jakub Kuderski
2da31c4109 [vicuna.py] Rework benchmark statistics calculation (#1992)
- Move statistics out of the main loop
- Add 'end-to-end' numbers
- Switch the main display unit from s to ms
- Start measuring time at 0

The new print format looks like this:
```
Number of iterations: 5
Num tokens: 1 (prompt), 512 (generated), 513 (total)
Prefill: avg. 0.01 ms (stdev 0.00), avg. 97.99 tokens/s
Decode: avg. 4840.44 ms (stdev 28.80), avg. 97.99 tokens/s
Decode end-2-end: avg. 85.78 tokens/s (w/o prompt), avg. 95.98 (w/ prompt)
```
2023-11-23 12:04:03 -05:00
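For illustration only: a minimal sketch of how per-iteration decode timings could be reduced to the averaged figures shown in the print format above. The measurement values and variable names are made up; this is not the vicuna.py code itself.

```
import statistics

# Hypothetical per-iteration measurements: decode wall time in ms for 512 generated tokens each.
decode_ms = [4812.1, 4839.7, 4855.2, 4830.9, 4864.3]
num_generated = 512

tokens_per_s = [num_generated / (ms / 1000.0) for ms in decode_ms]

print(f"Number of iterations: {len(decode_ms)}")
print(
    f"Decode: avg. {statistics.mean(decode_ms):.2f} ms "
    f"(stdev {statistics.stdev(decode_ms):.2f}), "
    f"avg. {statistics.mean(tokens_per_s):.2f} tokens/s"
)
```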
Ean Garvey
da50a16242 Create specified dir if needed during save_mlir and fix vulkan device fetching without URI/ID (#1989) 2023-11-23 01:01:41 -06:00
Stefan Kapusniak
ce38d49f05 Add .mlir to startup shark_tmp cleanup (#1991)
* Add .mlir to the files that are deleted from `./shark_tmp` when studio
is started.
* refactor/rename existing gradio temp file cleanup on startup to be
consistent with a general `./shark_tmp` cleanup
2023-11-22 14:34:28 -06:00
PhaneeshB
2f780f0d38 quick fix rocm None device 2023-11-22 21:17:25 +05:30
Ean Garvey
d051c3a4a7 Use clean_device_info() by default and don't write .mlir to /tmp/ (#1984)
* Move clean_device_info to compile_utils

* Update compile_utils.py

* Fix .mlir writes for some user-level permissions

* Fix cases where full URI is given

* Fix conditionals.

* Fix device path handling in vulkan utils.
2023-11-20 13:10:31 -06:00
Ean Garvey
1b11c82c9d Small UI tweaks for chatbot, fix torchvision requirements (#1988)
- add torchvision to setup_venv.ps1 -- we need this for the torchvision::nms that is now a dependency of controlnet features.
- Don't have bad flashy orange updates when using the chatbot
- Don't limit the height of the chatbot -- there's mixed opinions and solutions around this one. I think the default (400) is just way too small and LLMs generate plenty enough to justify matching the output.
2023-11-21 00:09:10 +05:30
gpetters94
80a33d427f Save intermediate values of controlnet (#1981) 2023-11-17 19:05:41 -05:00
Stefan Kapusniak
4125a26294 API/Docs: Fix incorrect cors arguments listing (#1983)
* Replace `api_cors_origin` in the api/koboldcpp doc with the correct
 `api_accept_origin`
2023-11-17 12:29:01 -06:00
Ean Garvey
905d0103ff Revert "Re-enable SD tunings without matmuls. (#1976)" (#1979)
This reverts commit 70817bb50a.
2023-11-17 23:44:33 +05:30
Stefan Kapusniak
192b3b2c61 UI: Output gallery cleanups (#1959)
* Work around a gradio bug that causes the parameters frame to always show
scrollbars.
* Remove the original funky method of setting the number of image
columns in the gallery using _fn= javascript events. The version
of gradio we now have pinned allows doing this by setting the property
on the gallery directly and also doesn't keep resetting the columns on
other events being fired.
2023-11-15 22:20:42 -06:00
Stefan Kapusniak
8f9adc4a2a UI: Display top tag frequencies for selected LoRA (#1972)
* Adds a function to webui utils to read metadata from
.safetensors LoRA files, and do limited parsing of the format written
out by the Kohya SS scripts (https://github.com/kohya-ss/sd-scripts)
to get tag frequency and trained model information.
* Adds a new common_ui_events.py file for gradio event handlers
needed for multiple UI tabs, and adds an event handler for binding to
the change event of the LoRA selection boxes, that outputs HTML
to display the LoRA tag frequency and model information.
* Adds an HTML gradio control to each of the SD tabs to show the
LoRA model name, and most frequently trained tags.
* Bind the change event of the LoRA selection box on each tab
to our new event handler, with the output set to the relevant HTML
control.
2023-11-15 22:19:54 -06:00
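The parsing described above is made easy by the .safetensors container itself: the file begins with an 8-byte little-endian header length followed by a JSON header whose optional `__metadata__` object holds string key/value pairs. A minimal sketch, assuming Kohya-style keys such as `ss_tag_frequency`; the exact keys and helper names used in the Studio code are not shown in this log.

```
import json
import struct

def read_safetensors_metadata(path):
    # A .safetensors file starts with an 8-byte little-endian header length,
    # followed by a JSON header; user metadata lives under "__metadata__".
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

meta = read_safetensors_metadata("my_lora.safetensors")  # hypothetical file
# Kohya sd-scripts store tag frequency as a JSON string keyed by dataset folder.
tag_freq = json.loads(meta.get("ss_tag_frequency", "{}"))
for dataset, tags in tag_freq.items():
    top = sorted(tags.items(), key=lambda kv: kv[1], reverse=True)[:10]
    print(dataset, top)
```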
Ean Garvey
70817bb50a Re-enable SD tunings without matmuls. (#1976) 2023-11-15 20:42:53 -06:00
jinchen62
dd37c26d36 Update brevitas quant api (#1975) 2023-11-15 10:04:07 -08:00
PhaneeshB
a708879c6c fix iree version mismatch 2023-11-15 01:24:42 +05:30
Ean Garvey
bb1b49eb6f Add --no-index to setup_venv.sh runtime pip install. 2023-11-14 21:44:20 +05:30
Ean Garvey
f6d41affd9 (SHARK Studio) Add Turbine-based llm chatbot. (#1933)
* Dan shark studio (#1970)

* Fix issue in Falcon-GPTQ

* initial webui and llama2

---------

Co-authored-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>

* Fix formatting.

---------

Co-authored-by: Daniel Garvey <34486624+dan-garvey@users.noreply.github.com>
Co-authored-by: Vivek Khandelwal <vivekkhandelwal1424@gmail.com>
2023-11-14 09:56:28 -06:00
Stefan Kapusniak
c2163488d8 SD/UI Restrict hires fix/img2img resamplers/schedulers (#1955)
* Restrict resamplers for img2img and high res fix to the ones that
PIL.Image actually supports, since it uses that to di the resampling.
Removed: Antialias, Affine, Cubic. Added: Hamming.
* Set list of available schedulers to CPU only when high res fix
is selected in the web ui. Set list to all schedulers when high res fix
is deselected.
* Put hi res fix in its own Accordion in the txt2img UI instead of
grouping it with Advanced Options.
2023-11-13 16:08:24 -06:00
PhaneeshB
54bff4611d fix cli rocm device selection 2023-11-13 23:35:55 +05:30
PhaneeshB
11510d5111 add intra rocm vmfb differentiator 2023-11-13 23:35:55 +05:30
PhaneeshB
32cab73a29 add iree-rocm-target-chip only if added by user 2023-11-13 23:35:55 +05:30
PhaneeshB
392bade0bf enable non default rocm device selection for webui 2023-11-13 23:35:55 +05:30
Stefan Kapusniak
91df5f0613 API/Docs: Fix an image link in koboldcpp doc (#1954)
* Fix the image link for the koboldcpp style button pointing to the
dialog image rather than the button image.
2023-11-13 11:14:29 -06:00
dependabot[bot]
df20cf9c8a Bump langchain in /apps/language_models/langchain (#1968)
Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.325 to 0.0.329.
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.325...v0.0.329)

---
updated-dependencies:
- dependency-name: langchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-12 19:46:00 -08:00
Ean Garvey
c4a908c3ea Pin pydantic to 2.4.1 in requirements (#1967)
pyinstaller-hooks-contrib doesn't see beta versions of pydantic as versions greater than 2.0.0, and so it looks for an attribute `compile` only available in versions older than 2.0.0 if you have a beta version of pydantic.
2023-11-10 21:34:52 -06:00
Stefan Kapusniak
6285430d8a UI: Fix webui launch on non-Windows (#1963)
* Moves the imports of winreg and Tk, into the functions that use them,
with winreg behind a guard clause. This should hopefully mean that if
you're not on Windows or not using `ui=app` we won't trip over either
of these due to them not being there.
2023-11-10 16:38:32 -06:00
PhaneeshB
51afe19e20 fix rocm arch selection 2023-11-10 13:22:51 +05:30
Ean Garvey
31005bcf73 Don't require vulkan installation to query devices. (#1953) 2023-11-09 14:46:44 -06:00
dependabot[bot]
f41ad87ef6 Bump langchain in /apps/language_models/langchain (#1926)
Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.202 to 0.0.325.
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.202...v0.0.325)

---
updated-dependencies:
- dependency-name: langchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-09 11:03:47 -06:00
dependabot[bot]
d811524a00 Bump pypdf from 3.12.2 to 3.17.0 in /apps/language_models/langchain (#1929)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 3.12.2 to 3.17.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/3.12.2...3.17.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-09 11:02:43 -06:00
Sungsoon Cho
51e1bd1c5d (OPT) Fix typo in the message; s/reponse/response (#1920) 2023-11-09 11:00:48 -06:00
Phaneesh Barwaria
db89b1bdc1 Fix MacOS web execution flow (#1899)
* fix metal device path for chatbot

* single device remove indexing

* lint fix
2023-11-09 10:59:29 -06:00
Huang Qi
2754e2e257 Fix wrong parameter index passed to 'compile_module_to_flatbuffer' (#1921)
compile_str is always False in compile_module_to_flatbuffer since there
is a parameter 'model_name' before 'debug'.

This issue is related to https://github.com/nod-ai/SHARK/pull/1863.

Then we can use mlir model buffer in RAM to run inference.
2023-11-09 10:58:05 -06:00
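This is the classic positional-argument pitfall: inserting a new parameter ahead of an existing one silently shifts every later positional value at old call sites. A minimal sketch of the failure mode and the keyword-argument fix; the signature below is illustrative, not the actual SHARK function.

```
def compile_module_to_flatbuffer(module, device, frontend, model_name, debug=False, compile_str=False):
    # 'model_name' was inserted before 'debug', so an old positional call site
    # now binds its intended 'debug'/'compile_str' values one slot too early.
    return debug, compile_str

# Old positional call site (written before 'model_name' existed):
print(compile_module_to_flatbuffer("m", "cpu", "torch", True))
# binds model_name=True and leaves debug/compile_str at their defaults -> (False, False)

# Keyword arguments stay correct regardless of where new parameters are inserted:
print(compile_module_to_flatbuffer("m", "cpu", "torch", "opt-1.3b", debug=True, compile_str=True))
```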
PhaneeshB
ab0e870c43 fix vicuna cli vulkan 2023-11-09 22:27:13 +05:30
Stefan Kapusniak
fb30e8c226 UI: Fix some webui launch corner cases (#1952)
* On windows insist on the presence of webview2 as the embeddable
browser for `ui=app`. If we can't find it, effectively switch back to
`ui=web`. This should prevent pywebview trying to use MSHTML, whilst
saying it's deprecated, and apparently we are too much for poor old IE11
* Add webview2 runtime droppings to .gitignore.
* If we can't bind to args.server_port get another suitable port from
the OS and advise the user that we did this in the UI.
* Make `ui=web` mode use 'SHARK AI Studio' as its title. This makes it
consistent with `ui=app`.
* Replace the generic gradio favicon with a nod swirl one instead.
2023-11-09 10:53:28 -06:00
Ean Garvey
a07d542400 (Studio) Disable SD tunings and sub-model downloads (#1944)
* sets --no-use_tuned and --import_mlir as defaults in SHARK Studio.
2023-11-07 15:55:30 -06:00
Stefan Kapusniak
ad55cb696f SD/API: Add missing A1111 APIs to Shark to support koboldcpp image generation (#1924)
* SD/API: Add missing a1111 API features for Koboldcpp

* Refactors SD api functions into their own file
* Adds the following apis implemented by a1111 as needed by koboldcpp:
   - adds /sdapi/v1/sd-models (lists available models)
   - adds /sdapi/v1/options (only the bare minimum needed)
* Adds optional CORS support, use the '--api_accept_origin' command line
argument to activate and configure.
* Extends existing APIs to include optional sampler/scheduler selection
* Extends /sdapi/v1/textimg to recognise the method used by koboldcpp
to select the model.
* Where possible take values not provided to the API in the request from
the existing relevant command line parameters rather than hardcoding
them.
* return a 400 response when a request doesn't have required properties.
* changed default schedulers and models for some apis to ones that
actually seem to work.
* Update api_test.py to include the new APIs.
* Update api_test.py to include a '--verbose' command line option.

* SD/API: Take more API values from args

* Take LoRA from '--use_lora' command line arg if specified
* Take device from '--device' command line arg if specified (substring
match, so a short name such as 'vulkan://0' should work)

* SD/API: add more endpoints and pydantic typing

* Mount the whole of /sdapi from index.py as a FastAPI application,
rather than each endpoint individually
* Add the following additional API endpoints:
  * /sdapi/v1/samplers
  * /sdapi/v1/cmd-flags
* Make scheduler/sampler selection checking and fallback much more
robust.
* Support aliasing some A1111 scheduler/sampler names to the diffusers
ones we are using.
* Expand response /sdapi/v1/options to add a few more things.
* Split non-api functions and variables into their own utils.py file.
* Support 'n_iter' request property and the return of multiple images
from generation endpoints. Equivalent of '--batch_count', batch_size
is still hardcoded at 1
* Include (some) hires_fix request properties in txt2img endpoint
* Rework endpoints using pydantic model classes for better request
validation and so we get much improved swagger api docs at
/sdapi/docs and redoc at /sdapi/redoc

* SD/API Delete commented out code from index.py

* Delete some code that is no longer needed by the SD API in index.py
(and one line in sdapi_v1.py) that I'd previously only commented out.

* SD/UI: Add shark_sd_koboldcpp.md document

* Add documentation on how to set up Koboldcpp with SHARK
* Link this and the existing blender set up document from the main
README.md

* SD/API Improve stencil options in img2img endpoint

In /sdapi/v1/img2img:
  * Add zoedepth to the controlnet use_stencil options
  * Require and use second image as stencil mask for controlnet scribble
2023-11-06 15:20:19 -06:00
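For reference, a hedged example of driving a txt2img-style endpoint with Python's `requests`. The host, port, and request fields follow the general A1111 convention the commit describes; they are assumptions rather than this server's confirmed schema.

```
import base64
import requests

payload = {
    "prompt": "a photo of a corgi wearing sunglasses",
    "negative_prompt": "blurry",
    "steps": 20,
    "width": 512,
    "height": 512,
    "n_iter": 2,  # equivalent of --batch_count; batch_size stays hardcoded at 1
}

# Port and endpoint path are assumptions; adjust to the server's --server_port.
resp = requests.post("http://127.0.0.1:8080/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()
for i, img_b64 in enumerate(resp.json().get("images", [])):
    with open(f"out_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```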
Jakub Kuderski
488a172292 [vicuna.py] Allow to pass extra arguments to iree-compile (#1935)
Add a new flag `-Xiree_compile` to forward extra compiler arguments to
`iree-compile`. This flag can be set multiple times to pass more than
one extra argument.
2023-11-06 12:12:34 -05:00
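A minimal sketch of how a repeatable pass-through flag like this is typically wired up with argparse (`action="append"`). This is illustrative of the mechanism only, not the actual vicuna.py implementation.

```
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "-Xiree_compile",
    dest="extra_iree_compile_args",
    action="append",  # each occurrence appends one more argument
    default=[],
    help="Extra argument forwarded verbatim to iree-compile; may be repeated.",
)

args = parser.parse_args(
    ["-Xiree_compile=--mlir-timing", "-Xiree_compile=--mlir-pretty-debuginfo"]
)
print(args.extra_iree_compile_args)
# ['--mlir-timing', '--mlir-pretty-debuginfo'] -> appended to the iree-compile invocation
```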
Stanley Winata
500c4f2306 [compile utils] Fix ROCM to not expect config.id as a default. (#1939) 2023-11-06 08:44:53 -08:00
Vivek Khandelwal
92b694db4d Add support for Falcon-40b-GPTQ 2023-11-06 19:49:19 +05:30
Vivek Khandelwal
322874f7f9 Fix issue in Falcon-GPTQ 2023-11-03 11:48:36 +05:30
Ean Garvey
5001db3415 Add 7800xt to target triples explicitly. (#1928) 2023-11-01 17:11:45 -05:00
Vivek Khandelwal
71846344a2 Add sharded Falcon-GPTQ support
This commit adds the support for sharded Falcon-7b-GPTQ and
Falcon-180B-GPTQ. This commit also adds the support for 4-way
sharding of the Falcon model for the device ROCM.

Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
2023-11-01 12:11:44 +05:30
gpetters94
72e27c96fc Add ZoeDepth (#1834)
* Add ZoeDepth

* Add einops to Studio imports.

* Specify ref for forked torch.hub repos.

* Unpin timm.

---------

Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: Ean Garvey <garveyej@gmail.com>
2023-10-30 11:57:45 -05:00
PhaneeshB
7963abb8ec remove caching for rocm args 2023-10-29 07:07:57 +05:30
Ean Garvey
98244232dd Add smoothquant OPT to examples. (#1922) 2023-10-27 12:32:12 -05:00
PhaneeshB
679a452139 fix calls and remove unused imports for check_device_drivers 2023-10-27 10:30:40 +05:30
PhaneeshB
72c0a8abc8 remove dependency on external commands for driver installation check 2023-10-27 10:30:40 +05:30
Vivek Khandelwal
ea920f2955 Add sharded Falcon support 2023-10-26 21:53:25 +05:30
Phaneesh Barwaria
486202377a update dependency on rocm/hip info command (#1900)
* add support for rocm flags

* add rocm target flag to chat args

* rm rocm libs dependency message
2023-10-26 15:18:25 +05:30
Sungsoon Cho
0c38c33d0a Add opt_causallm_samples.py. (#1916) 2023-10-25 11:52:51 -05:00
Ean Garvey
841773fa32 Updates to opt_causallm example (#1905)
* Updates to opt_causallm example

* Fixup opt_perf_comparison.py

* Use same filenames across opt examples.
2023-10-24 10:54:39 -07:00
Stefan Kapusniak
0361db46f9 SD: Fix unet untuned opt_flags (#1912)
* correct my sloppy copy/paste for the untuned unet default compilation
flags that introduced an extra 'detach' into what should have been
'iree-global-opt-convert-1x1-filter-conv2d-to-matmul'
2023-10-24 12:47:33 -05:00
xzuyn
a012433ffd Save hiresfix info if used (#1914) 2023-10-24 12:45:10 -05:00
xzuyn
5061193da3 Move Generate, Randomize Seed, & Stop Batch to same positions as txt2img (#1915) 2023-10-24 12:44:39 -05:00
xzuyn
bff48924be LLaMa 2 Chat template fix (#1913) 2023-10-23 18:51:15 -05:00
Stefan Kapusniak
825b36cbdd Fix MLIR Textual PassPipeline Error (#1910) 2023-10-22 07:39:52 -07:00
Stefan Kapusniak
134441957d SD - Fix civitai download on Windows +improvements (#1907) 2023-10-21 11:17:41 -07:00
Stefan Kapusniak
7cd14fdc47 SD/UI: Use a single model selection box on UI tabs (#1906)
* Allow entry of a huggingface model id or civitai download url to be
done in the main model selection dropdown on SD tabs
* Remove separate textbox for entering huggingface model id or civitai
download url on SD Tabs
* Remove 'None' option from the model selection dropdown (no longer
needed) on SD tabs
* Update png metadata drop zone on txt2img tab to work with a single
argument for model selection
* Update UI generate functions on SD tabs to work with single argument
model selection
* Update API code for changes to the UI generate functions
* Move info about the custom model path to the logging textarea on SD
tabs
2023-10-21 10:06:05 -07:00
Ean Garvey
e6cb5cef57 Add --additional_runtime_args option and use in OPT example. (#1855)
* Add --additional_runtime_args option and use in OPT example.

Fix the func name. (#1838)

Co-authored-by: Sungsoon Cho <sungsoon.cho@gmail.com>
2023-10-19 13:29:39 -05:00
Huang Qi
66abee8e5b SharkInference: Fix various examples and README.md (#1903)
Follow https://github.com/nod-ai/SHARK/pull/708, remove parameter 'func_name'
for SharkInference.
2023-10-19 09:28:36 -05:00
Ean Garvey
4797bb89f5 Stringify path for ireec.compile_file (#1901)
* Stringify path for ireec.compile_file

* Update test-models.yml
2023-10-18 14:59:23 -05:00
Vivek Khandelwal
205e57683a Modify Falcon-180b-GPTQ sharded pipeline 2023-10-17 20:26:01 +05:30
Vivek Khandelwal
2866d665ee Fix Sharded Falcon-180b-GPTQ Pipeline 2023-10-17 20:26:01 +05:30
Stefan Kapusniak
71d25ec5d8 SD: Fix repeatable seeds when intial seed is random (#1893) 2023-10-14 22:50:42 -07:00
Vivek Khandelwal
202ffff67b Add support for sharded Falcon model 2023-10-13 22:05:10 +05:30
Ean Garvey
0b77059628 Add matmul reassociation flags (#1891) 2023-10-12 20:12:37 -05:00
Stefan Kapusniak
a208302bb9 Fix repeatable seeds consistency over batch counts (#1889)
* Set the input seed for the random number generator when
generating repeatable seeds to exclude any negative numbers
in the parsed seed input. This makes seeds generated for
different batch counts consistent where they have the same
input for the initial seed or set of seeds.
2023-10-12 17:15:19 -05:00
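A minimal sketch of the idea, assuming a `random.Random`-based generator; the actual SHARK utility and its argument names may differ. Seeding the RNG only from the non-negative entries of the parsed seed input means a larger batch count simply extends the same sequence instead of producing a different one.

```
import random

def expand_seeds(parsed_seeds, batch_count):
    # Seed the generator from the non-negative seeds only, so -1 ("random")
    # placeholders do not change the sequence generated for a given input.
    rng = random.Random(str([s for s in parsed_seeds if s >= 0]))
    seeds = []
    for i in range(batch_count):
        if i < len(parsed_seeds) and parsed_seeds[i] >= 0:
            seeds.append(parsed_seeds[i])
        else:
            seeds.append(rng.randint(0, 2**32 - 1))
    return seeds

print(expand_seeds([1234, -1], 2))  # a prefix of...
print(expand_seeds([1234, -1], 4))  # ...this longer run: consistent across batch counts
```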
Vivek Khandelwal
b83d32fafe Fix Falcon GPTQ Pipeline 2023-10-11 20:09:32 +05:30
Vivek Khandelwal
0a618e1863 Add support for Falcon GPTQ 2023-10-11 10:47:48 +05:30
Phaneesh Barwaria
a731eb6ed4 Macos fixes (#1883)
* fix venv setup for MacOS

* allow stream fuse binding on mac

* clean iree metal args
2023-10-09 23:36:12 -07:00
Ean Garvey
2004d16945 Revert "[SDXL] Add SDXL pipeline to SHARK (#1731)" (#1882)
This reverts commit 9f0a421764.
2023-10-09 18:01:44 -07:00
Gaurav Shukla
6e409bfb77 fix else if syntax error
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-10 06:23:56 +05:30
Gaurav Shukla
77727d149c [warning] Fix dropdown warning
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-10 05:18:43 +05:30
Ean Garvey
66f6e79d68 Split CPU/GPU definitions conditionally outside of torch contexts. (#1879) 2023-10-09 16:46:41 -07:00
Ean Garvey
3b825579a7 (LLaMa-2) Point to int4 + f32 acc .mlir for cpu (#1878)
- fixes some issues with non-system prompt invocation

Co-authored-by: Gaurav Shukla <gauravshukla789@gmail.com>
2023-10-09 14:37:35 -05:00
Abhishek Varma
9f0a421764 [SDXL] Add SDXL pipeline to SHARK (#1731)
-- This commit adds SDXL pipeline to SHARK.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-10-09 13:01:37 -05:00
Gaurav Shukla
c28682110c [chatbot] Flag to add system prompt
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-09 22:17:39 +05:30
Ean Garvey
caf6cc5d8f Switch most compile flows to use ireec.compile_file. (#1863)
* Switch most compile flows to use ireec.compile_file.

* re-add input type to compile_str path.

* Check if mlir_module exists before checking if it's a path or pyobject.

* Fix some save_dir cases
2023-10-06 23:04:43 -05:00
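For context, a hedged sketch of the two iree-compiler Python entry points involved. Exact keyword arguments vary between iree-compiler releases and the file paths here are hypothetical, so treat this as illustrative rather than the SHARK compile utils themselves.

```
import iree.compiler as ireec

# compile_file: hand the compiler a path on disk (what most SHARK flows now do).
vmfb_bytes = ireec.compile_file(
    "unet.mlir",                     # hypothetical input path
    target_backends=["llvm-cpu"],
)

# compile_str: compile an in-memory module; the input dialect can be named explicitly
# (the "re-add input type to compile_str path" bullet above).
module_str = open("unet.mlir").read()
vmfb_bytes = ireec.compile_str(
    module_str,
    target_backends=["llvm-cpu"],
    input_type="auto",
)

with open("unet.vmfb", "wb") as f:
    f.write(vmfb_bytes)
```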
Ean Garvey
8614a18474 Remove tf dependencies from importer path. (#1874)
* Remove tf dependencies from import path.

* Fix formatting.
2023-10-06 12:27:12 -07:00
Jakub Kuderski
86c1c0c215 Add aggregate statistics to microbenchmark (#1871)
Print averaged results at the end of all iterations. Increase the
default number of iterations to 5.

Example:
```
Number of iterations: 5
Prefill: avg. 0.03 s, stddev 0.00
Decode: avg. 43.34 tokens/s, stdev 0.13
```

Also remove the -2 in the number of generated tokens -- I did not find
any evidence we need it.
2023-10-06 10:03:07 -07:00
Daniel Garvey
8bb364bcb8 enforce fp32 accumulates for cpu (#1873) 2023-10-06 11:34:49 -05:00
Daniel Garvey
7abddd01ec argmax inside model + brevitas pin (#1872) 2023-10-05 20:15:21 -07:00
Abhishek Varma
2a451fa0c7 [Llama2] Add a standalone utility for dynamic and combining IRs
-- This script adds a standalone utility for making Llama IRs dynamic
   and combining them as well.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-10-05 20:01:06 +05:30
Jakub Kuderski
9c4610b9da Add microbenchmark mode to vicuna CLI (#1864)
Add flags to enable a non-interactive mode for microbenchmarking llama
models. In this mode, the system and user prompts are specified with CLI
flags, and the number of generated tokens and iterations is fixed.

Also move the stats below the response and trim any response blankspace.
2023-10-05 00:12:08 -04:00
powderluv
a38cc9d216 Update vulkan_utils.py for Radeon 780m igpu (#1866) 2023-10-04 20:33:07 -07:00
Jakub Kuderski
1c382449ec [vulkan] Print note about module load times. NFC. (#1862)
Print a note ahead of a potentially long inactivity to set the right expectations.

Separately, we should add progress to the UI and make this loading faster.
2023-10-03 17:27:27 -04:00
Gaurav Shukla
7cc9b3f8e8 [llama cli] Fix llama cli
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-03 20:39:53 +05:30
Gaurav Shukla
e54517e967 [UI] Disable config generator, lora train and model manager (#1858)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-02 22:34:40 -07:00
Ean Garvey
326327a799 Collect pipeline submodules for diffusers ckpt preprocessing. (#1859) 2023-10-03 00:29:28 -04:00
Ean Garvey
785b65c7b0 Add flag for specifying device-local caching allocator heap key. (#1856) 2023-10-03 00:28:39 -04:00
Sungsoon Cho
0d16c81687 Remove unused import. (#1857) 2023-10-02 11:36:08 -05:00
Vivek Khandelwal
8dd7850c69 Add Falcon-GPTQ support 2023-10-02 16:39:57 +05:30
Gaurav Shukla
e930ba85b4 [os] Remove os dependency from vmfb naming (#1854)
Also fixes a small ui issue for chatbot.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 12:38:17 -05:00
Gaurav Shukla
cd732e7a38 [chatbot] split execution time to prefill and decode
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
Gaurav Shukla
8e0f8b3227 [ui] Update chatbot UI
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
Gaurav Shukla
b8210ef796 [chatbot] Re-instantiate the chatbot object if device id changes
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
PhaneeshB
94594542a9 remove use of vulkaninfo 2023-09-28 21:57:00 +05:30
Gaurav Shukla
82f833e87d [vulkan] Update vmfb naming
Update vmfb naming for vulkan devices in order to resolve naming
conflicts in the presence of multiple vulkan devices.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-28 14:52:11 +05:30
Vivek Khandelwal
c9d6870105 Modify falcon pipeline for 180b support 2023-09-28 12:39:35 +05:30
Jakub Kuderski
4fec03a6cc [vulkan] Switch from coop matrix NV to KHR (#1848) 2023-09-27 21:43:37 -04:00
harsh-nod
9a27f51378 Deprecate inference directory
This patch removes the inference directory that was no longer being used.
2023-09-27 14:29:00 -07:00
Abhishek Varma
ad1a0f35ff Fix misdirection while saving vmfb
-- Currently SHARK suggests that vmfb has been saved, while
    that is not the case and no vmfb is generated. 
    This creates a misdirection for IR/vmfbs which are of larger
    size.
-- This commit therefore fixes that misdirection.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-27 16:25:29 +05:30
Nelson Sharpe
6773278ec2 Fix checkpoint_path unexpected argument (#1832) 2023-09-24 14:17:52 -07:00
Abhishek Varma
9a0efffcca [Llama2] Fix wrong Vulkan device ID + Add Vulkan compile flags
-- This commit fixes the wrong Vulkan device being selected during
   runtime.
-- It also adds a couple of IREE compilation flags to target a specific
   Vulkan device.
-- It also changes the Vulkan device listing to be more in tune with
   lowering control flow.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-22 22:24:18 +05:30
gpetters94
61c6f153d9 Switch to keras-nightly to fix a Linux issue (#1835) 2023-09-21 12:33:45 -04:00
Phaneesh Barwaria
effd42e8f5 pin gradio to v3.44.3 2023-09-21 17:33:43 +05:30
Sungsoon Cho
b5fbb1a8a0 Rename the func arg save_json to avoid name collision. (#1837)
* Rename the func arg save_json to avoid name collision.

* black formatted.
2023-09-19 17:29:27 -05:00
Quinn Dawkins
ded74d09cd [vicuna.py] Keep past key values on device (#1836)
The past key values are only used within the models themselves and can
be kept on device. For vulkan int4, this gives 44 tok/s (for the first
prompt) and settles at around 26 tok/s on 7900xtx.
2023-09-19 18:17:41 -04:00
Boian Petkantchin
79267931c1 Add argument --additional_compile_args (#1119)
This allows passing more arguments to the IREE compiler
Example:
python my-app.py --additional_compile_args="--mlir-pretty-debuginfo --mlir-timing"

Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
2023-09-19 11:26:03 -05:00
zjgarvey
9eceba69b7 local_tank_cache included into clear_all (#1833) 2023-09-18 00:27:23 -05:00
Ean Garvey
ca609afb6a Update README.md (#1830) 2023-09-14 10:33:57 -05:00
Gaurav Shukla
11bdce9790 [flags] Fix vulkan runtime flags as vma is dropped from iree (#1831) 2023-09-14 08:58:59 -05:00
Ean Garvey
684943a4a6 (SD) Fix tokenizers imports in pyinstaller builds. (#1828)
* Fix tokenizers metadata.

* (SD) Disable VAE lowering configs (rdna3) and add versioned tunings.

* Update sd_annotation.py

* (SD) Add cv2 to spec.

* Update stencil pipeline with the new img2img arg.
2023-09-12 12:23:48 -05:00
PhaneeshB
b817bb8455 add roles for llama2 2023-09-12 10:59:28 +05:30
Ean Garvey
780f520f02 Fix vk.target_env extensions and remove redundant SD imports. (#1826)
* Remove redundant IREE runtime imports.

* Fix vulkan target env extensions.
2023-09-11 13:42:52 -05:00
Dom
c61b6f8d65 Code refactoring (#1817)
* use join

* fix bug

* further code optimizations

---------

Co-authored-by: Daniel Garvey <34486624+dan-garvey@users.noreply.github.com>
2023-09-11 11:30:56 -05:00
Abhishek Varma
c854208d49 [Llama2] Prefetch llama2 tokenizer configs (#1824)
-- This commit prefetches llama2 tokenizer configs from shark_tank.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-08 11:29:54 -07:00
Gaurav Shukla
c5dcfc1f13 [vicuna] Exit when mlir is not present in shark tank (#1825)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-08 10:30:29 -07:00
Abhishek Varma
bde63ee8ae Add logging feature in WebUI (#1821) 2023-09-08 05:48:05 -07:00
Vivek Khandelwal
9681d494eb Update decomp list and shark trainer for DLRM 2023-09-06 21:24:50 +05:30
Gaurav Shukla
ede6bf83e2 [vicuna] Disabling the IR generation path
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-06 20:13:17 +05:30
Ean Garvey
2c2693fb7d Fix torchvision versioning in Linux importer setup. (#1809) 2023-09-05 12:57:03 -05:00
Vivek Khandelwal
1d31b2b2c6 Fix StableHLO Compilation flag 2023-09-05 21:32:33 +05:30
Gaurav Shukla
d2f64eefa3 [chatbot] Remove few outdated models from list (#1814) 2023-09-04 09:26:32 -07:00
Abhishek Varma
87ae14b6ff [SD] Add sdpfa decomposition + update IREE flag
-- This commit adds Scaled Dot Product Flash Attention's decomposition
   in shark_importer.
-- It also updates `iree-flow-enable-data-tiling` to `iree-opt-data-tiling`.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-04 18:03:53 +05:30
Phaneesh Barwaria
1ccafa1fc1 fix llama2-70b rewrite tensor dim 2023-09-01 17:27:06 +05:30
jinchen62
4c3d8a0a7f Enable downloading vmfb/mlir for webui (#1807) 2023-08-31 11:05:47 -07:00
jinchen62
3601dc7c3b Fix llama2 13b combined ir (#1803) 2023-08-28 11:34:44 -07:00
Daniel Garvey
671881cf87 Llama2 70b (#1783)
* llama2 70b IR gen

* fix IR sec llama2 + debug

* llama270b

---------

Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
2023-08-25 23:04:28 -07:00
Gaurav Shukla
4e9be6be59 [chatbot] Add debug as class attribute (#1799)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-25 21:46:29 -07:00
Ean Garvey
9c8cbaf498 Add support for ROCM (Windows) in Studio + compile utils (#1770)
* WIP: MSVC ROCM support for SHARK Studio

* Make get_iree_rocm_args platform-agnostic.

* Update stable_args.py

* Update rocm arg handling in SD utils

* Guard quantization imports.

Co-authored-by: jam https://github.com/jammm
2023-08-25 20:56:05 -07:00
Ean Garvey
9e348a114e Revert changes process_skipfiles.py (#1798)
Keeps a small typo fix but reverts the rest of the changes to this file from 450c231171
2023-08-25 15:31:49 -07:00
jinchen62
51f90a4d56 Update conversion passes for brevitas quant op (#1795) 2023-08-25 17:28:07 -05:00
Abhishek Varma
310d5d0a49 Fix llama2 13b crashing + add spec file for CLI execution of Llama (#1797)
* [Llama2] Add a fix for Llama2 13B downloading/crashing

-- This commit fixes downloading/crashing of llama2 13B on wrong
   .mlir file.
-- Also adds support for downloading vmfb from shark_tank in CLI.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* [llama2] Add a spec file to run Llama/Vicuna CLI exe

-- This commit adds a spec file to run Llama/Vicuna CLI exe.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

---------

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-08-25 09:36:09 -05:00
Ean Garvey
9697981004 Pipe through a debug option to iree compile utils. (#1796)
* Update compile_utils.py

* Pipe through a flag to toggle debug options in compile utils.

* Update SharkLLMBase.py
2023-08-25 07:11:11 -07:00
Ean Garvey
450c231171 Add tokenizers to requirements.txt (#1790)
* Add tokenizers to requirements and pin version

* Update process_skipfiles.py
2023-08-24 19:44:04 -05:00
Ean Garvey
07f6f4a2f7 Add a short README for the OPT examples and small tweaks. (#1793)
* Small changes to OPT example.

* Update opt README.

* Add a few modes to batch script.

* Update README.md
2023-08-24 17:26:11 -07:00
jinchen62
610813c72f Add iree flag to strip assertions (#1791) 2023-08-24 10:51:19 -07:00
Ean Garvey
8e3860c9e6 Remove flags that are default in upstream IREE (#1785)
* Remove index bits flags now set by default

* Update shark_studio_imports.py
2023-08-24 11:57:54 -05:00
xzuyn
e37d6720eb Add Hires Fix (#1787)
* improper test hiresfix

* add sliders & use `clear_cache`

* add resample choices & fix step adjustment

* add step adjustment to img2img

* add resample options to img2img

* simplify hiresfix
- import `img2img_inf` from `img2img_ui.py` instead of just copying it into `txt2img_ui.py`

* set `hri` to None after using

* add more resample types, and don't show output until hiresfix is done

* cleaner implementation

* ran black

* ran black again with jupyter dependencies
2023-08-24 09:01:41 -07:00
Vivek Khandelwal
16160d9a7d Fix combine mlir script 2023-08-24 19:10:49 +05:30
Sungsoon Cho
79075a1a07 Opt perf (#1786)
* Define command line args, model-name, max-seq-len, platform, etc.

* Add usage example.

* Add opt_perf_comparision_batch.py.

* Use shlex instead.
2023-08-24 08:33:12 -05:00
Abhishek Varma
db990826d3 Add Llama2 13B int4 fp16 support (#1784)
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-08-23 10:00:32 -07:00
gpetters94
7ee3e4ba5d Add stencil_unet_512 support (#1778)
This should fix any remaining issues with stencils and long prompts.
2023-08-22 12:23:46 -04:00
Vivek Khandelwal
05889a8fe1 Add LLaMa2-int4-fp16 support (#1782) 2023-08-22 07:45:50 -07:00
jinchen62
b87efe7686 Fix venv setup for brevitas (#1779) 2023-08-21 11:58:51 -07:00
gpetters94
82b462de3a Fix stencils for long prompts (#1777) 2023-08-19 00:26:51 -07:00
Daniel Garvey
d8f0f7bade replace public with private (#1776)
unload footguns
2023-08-18 14:22:46 -07:00
gpetters94
79bd0b84a1 Fix an issue with diffusers>0.19.3 (#1775) 2023-08-18 14:06:06 -04:00
jinchen62
8738571d1e Adapt the change of brevitas custom op name (#1772) 2023-08-17 14:24:43 -07:00
Gaurav Shukla
a4c354ce54 [version] Pin diffusers==0.19.3
Once the latest works with LORA train, unpin it.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-17 21:27:10 +05:30
Gaurav Shukla
cc53efa89f [cli] Fix chatbot cli
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-17 21:27:10 +05:30
Gaurav Shukla
9ae8bc921e [chatbot] Fix chatbot cli and webview warning
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-17 21:27:10 +05:30
Gaurav Shukla
32eb78f0f9 [chatbot] Fix switching parameters in chatbot
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-17 19:14:17 +05:30
Ean Garvey
cb509343d9 Fix pytest benchmarks and shark_tank generation. (#1632)
- fix setup_venv.sh for benchmarks/imports etc.
- fix torch benchmarks in SharkBenchmarkRunner
- generate SD artifacts using build_tools/stable_diffusion_testing.py and --import_mlir
- decouple SD gen from tank/generate_sharktank for now
2023-08-16 17:48:47 -05:00
powderluv
6da391c9b1 update signtool to use /fd certHash 2023-08-15 15:11:40 -07:00
Ean Garvey
9dee7ae652 fix tkinter window (#1766) 2023-08-15 13:23:09 -07:00
Ean Garvey
343dfd901c Update SHARK-Runtime links to SRT (#1765)
* Update nightly.yml

* Update setup_venv.ps1

* Update CMakeLists.txt

* Update shark_iree_profiling.md

* Update setup_venv.sh

* Update README.md

* Update .gitmodules

* Update CMakeLists.txt

* Update README.md

* fix signtool flags

* Update nightly.yml

* Update benchmark_utils.py

* uncomment tkinter launch
2023-08-15 12:40:44 -07:00
Ean Garvey
57260b9c37 (Studio) Add hf-hub to pyinstaller metadata (#1761) 2023-08-14 23:01:50 -05:00
Ean Garvey
18e7d2d061 Enable vae tunings for rdna3. (#1764) 2023-08-14 21:00:14 -07:00
Stanley Winata
51a1009796 Add Forward method to SHARKRunner and fix examples. (#1756) 2023-08-14 19:20:37 -07:00
Daniel Garvey
045c3c3852 enable iree-opt-const-expr-hoisting in vicuna (#1742)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-08-14 18:43:42 -07:00
Ean Garvey
0139dd58d9 Specify max allocation size in IREE compile args. (#1760) 2023-08-14 15:43:09 -05:00
Ean Garvey
c96571855a prevents recompiles for cuda benchmarks + update benchmark_module path (#1759)
* xfail resnet50_fp16

* Fix cuda benchmarks and prevent recompilation.
2023-08-14 15:30:32 -05:00
PhaneeshB
4f61d69d86 add support passing iree flags for LLMs 2023-08-15 00:22:56 +05:30
Phaneesh Barwaria
531d447768 set default allocator for metal device creation (#1755) 2023-08-14 06:17:52 -07:00
Vivek Khandelwal
16f46f8de9 Update langchain_requirements.txt 2023-08-14 14:32:19 +05:30
Vivek Khandelwal
c4723f469f Update langchain_requirements.txt 2023-08-14 14:32:19 +05:30
Vivek Khandelwal
d804f45a61 Update langchain_requirements.txt 2023-08-14 14:32:19 +05:30
Vivek Khandelwal
d22177f936 Update requirements.txt 2023-08-14 14:32:19 +05:30
George Petterson
75e68f02f4 Remove CUDNN 2023-08-14 14:32:19 +05:30
Gaurav Shukla
4dc9c59611 [chatbot] Add tokens generated per second (#1753) 2023-08-13 11:25:41 -07:00
Gaurav Shukla
18801dcabc [chat] Update chatbot ui
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-13 18:39:22 +05:30
Gaurav Shukla
3c577f7168 [vicuna] fix shard config generator script (#1747)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-10 11:26:03 -07:00
Stefan Kapusniak
f5e4fa6ffe UI/Web - Revert tab order (#1724)
* Revert ui tab order

* Reverts the tab order, so that SD, LLM, and Experimental are grouped
together again as far as is possible.
* Labelled "Generate Sharding Config" as experimental as pressing the
'Get Model Config' errors for me.

* Fix formatting in index.py
2023-08-10 11:25:36 -07:00
powderluv
48de445325 Enable caching and disable vma (#1746)
* Enable caching allocator by default

Going to toggle VMA off too, and this is required for performance. Will have to monitor in-the-wild reports.

* Disable VMA

Disable VMA
2023-08-10 10:49:44 -07:00
Gaurav Shukla
8e90f1b81a [vicuna] add default config in case of sharded vicuna
Signed-Off-by: Gaurav Shukla<gaurav@nod-labs.com>
2023-08-10 21:28:08 +05:30
Vivek Khandelwal
e8c1203be2 Fix vicuna script (#1745) 2023-08-10 06:11:14 -07:00
Vivek Khandelwal
e4d7abb519 Final patch for fixing Langchain token streaming issue (#1744) 2023-08-09 10:09:41 -07:00
powderluv
96185c9dc1 pin safetensors to 0.3.1 (#1740) 2023-08-08 19:24:44 -07:00
powderluv
bc22a81925 re-enable constant folding (#1739)
Tested and works well. (modulo unrelated driver issue)
2023-08-08 17:17:38 -07:00
Eliasj42
5203679f1f Bandaid fix 2 (#1728)
* download all mlirs

* fixed install method

* download all mlirs (#1727)

Co-authored-by: Elias Joseph <elias@nod-labs.com>

* added tags

* fix name check for file existence

* Remove SD from all_models.csv (#1706)

Removes SD from pytests as it has its own test suite.

* gpt_langchain.py fixes for pydantic (#1722)

* removed dead code

---------

Co-authored-by: Elias Joseph <elias@nod-labs.com>
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: Stefan Kapusniak <121311569+one-lithe-rune@users.noreply.github.com>
2023-08-08 12:14:57 -05:00
Vivek Khandelwal
bf073f8f37 [Langchain] Expand pipelines to fix token streaming issue 2023-08-08 10:27:23 +05:30
Stella Laurenzo
cec6eda6b4 Optimize device enumeration overhead and log details on long operations. (#1734)
* Optimize device enumeration overhead and log details on long operations.

* Various fixes to add `@functools.cache` to what should be one time, expensive, device enumeration and setup activities. Cuts several seconds off of initialization on my machine.
* Add detailed tracing to actual invocations if they exceed a certain timeout or have an exception.
* Add detailed tracing to loading status.
* By default detail logging is only printed if an operation takes an excessive amount of time. All logging/timing can be printed by setting the variable `$env:SHARK_DETAIL_TRACE = "1"`

* Remove cache from unhashable functions
2023-08-07 17:20:53 -07:00
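The entry above describes two patterns worth noting: memoizing one-time, expensive device enumeration with `@functools.cache`, and only emitting detailed timing when an operation is slow or `SHARK_DETAIL_TRACE` is set. A minimal sketch of that approach (function and variable names here are hypothetical stand-ins, not SHARK's actual code):

```python
import functools
import os
import time

# Illustrative only: names are hypothetical, not SHARK's real functions.
SLOW_THRESHOLD_SECONDS = 2.0


@functools.cache
def enumerate_devices():
    """Expensive, one-time device enumeration; cached so repeated calls are free."""
    # ... query drivers and build the device list ...
    return ("cpu", "vulkan://0")


def traced_call(label, fn, *args, **kwargs):
    """Run fn, printing timing details only if it is slow or SHARK_DETAIL_TRACE=1."""
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > SLOW_THRESHOLD_SECONDS or os.environ.get("SHARK_DETAIL_TRACE") == "1":
            print(f"[trace] {label} took {elapsed:.2f}s")
```

Note that caching only works for hashable (here: argument-free) functions, which is why the commit also removes the cache from unhashable ones.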
Stella Laurenzo
9e37e03741 Clearly differentiate phases of loading modules to better understand if things are taking a long time. (#1733) 2023-08-07 14:03:12 -07:00
Stefan Kapusniak
9b8c4401b5 gpt_langchain.py fixes for pydantic (#1722) 2023-08-07 00:55:38 -07:00
Ean Garvey
a9f95a218b Remove SD from all_models.csv (#1706)
Removes SD from pytests as it has its own test suite.
2023-08-05 15:55:52 -05:00
PhaneeshB
872bd72d0b fix name check for file existence 2023-08-05 21:33:53 +05:30
Eliasj42
fd1c4db5d0 download all mlirs (#1727)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-08-04 18:22:06 -05:00
Daniel Garvey
759664bb48 add py files to pyinstaller for shark (#1723) 2023-08-04 14:10:43 -07:00
Daniel Garvey
14fd0cdd87 add missing subprocess import (#1721) 2023-08-04 15:15:22 -05:00
Daniel Garvey
a57eccc997 fix lint (#1720) 2023-08-04 14:54:33 -05:00
Daniel Garvey
a686d7d89f temporarily disable langchain stuff in webui (#1719)
its breaking the exe
2023-08-04 12:48:06 -07:00
Eliasj42
ed484b8253 added functionality for int8 vicuna and 4 shards (#1712)
combined vicuna_4_shards.py and vicuna.py to reduce code duplication

Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-08-04 14:05:05 -05:00
gpetters94
7fe57ebaaf Add vector database and add support on the web UI (#1699) 2023-08-04 13:47:19 -04:00
Nithin Meganathan
c287fd2be8 Add GPU IDs in model_confg.json by default for manual annotation (#1718) 2023-08-04 12:46:27 -05:00
Gaurav Shukla
51ec1a1360 [vicuna] Integrate sharded vicuna in web (#1717)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-04 11:46:53 -05:00
Gaurav Shukla
bd30044c0b [Shard] Add sharding generation in shark studio
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-04 21:51:14 +05:30
Ean Garvey
c9de2729b2 Add flag for toggling constant folding. (#1714) 2023-08-04 04:55:52 -07:00
Vivek Khandelwal
a5b13fcc2f [Langchain] Patch for fixing streaming of tokens (#1709) 2023-08-03 10:06:49 -07:00
Stefan Kapusniak
6bb329c4af Unsharded Vicuna: Fix Memory Error compiling mlir for lmsys/vicuna-7b-v1.3 fp16 with 64 GiB (#1702) 2023-08-01 06:07:56 -07:00
Vivek Khandelwal
98fb6c52df Expand pipelines to fix streaming of tokens 2023-07-31 22:11:01 +05:30
Stefan Kapusniak
206c1b70f4 UI/Web: Reorder tabs to separate SD and LLM (#1701)
Shuffle the tabs around so that:

* All the SD tabs are together
* All the LLM tabs are together
* All the experimental tabs are together
2023-07-29 22:25:30 -04:00
PhaneeshB
cdb037ee54 use shark_args for vulkan debug utils flag 2023-07-30 07:54:26 +05:30
PhaneeshB
ce2fd84538 fix cpu device name for SharkStudio 2023-07-30 07:54:26 +05:30
PhaneeshB
4684afad34 update upscaler example 2023-07-28 21:06:28 +05:30
PhaneeshB
8d65456b7a Move vulkan runtime flags to shark_args 2023-07-28 21:06:28 +05:30
PhaneeshB
d6759a852b add vulkan vma alloc flag 2023-07-28 21:06:28 +05:30
Daniel Garvey
ab57af43c1 Couple of fixes for vicuna.py (#1696)
* mega vicuna merge pt 2

* add fallback to ensure compile is called
2023-07-27 15:53:05 -07:00
jinchen62
4d5c55dd9f Fix vicuna script (#1697) 2023-07-27 17:24:26 -05:00
Vivek Khandelwal
07399ad65c [Langchain] Remove unused code (#1698) 2023-07-27 11:59:54 -05:00
Vivek Khandelwal
776a9c2293 Fix for Langchain (#1694)
For CPU, remove max time stopping criteria
Fix web UI issue
2023-07-26 09:00:23 -07:00
Eliasj42
9d399eb988 fixed bug where device_idx was hardcoded (#1693)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-07-25 19:00:13 -05:00
Vivek Khandelwal
927b662aa7 Add Langchain SHARK Compilation support for all paths 2023-07-25 22:15:42 +05:30
Abhishek Varma
47f8a79c75 [MiniGPT4] Add MiniGPT4 to SHARK (#1554)
* [MiniGPT4] Add MiniGPT4 to SHARK

-- This is the first installment of MiniGPT4 in SHARK.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* Add int8 support for MiniGPT4

-- This commit adds int8 support for MiniGPT4.

Signed-off-by: Abhishek Varma <abhishek@nod-lab.com>

* Update .spec for MiniGPT4's config files

* black format MiniGPT4

---------

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Signed-off-by: Abhishek Varma <abhishek@nod-lab.com>
2023-07-25 09:42:27 -07:00
Stefan Kapusniak
289f983f41 SD - Implement seed arrays for batch runs (#1690)
* SD Scripts and UI tabs that support batch_count can now take a
string containing a JSON array, or a list of integers, as their seed
input.
* Each batch in a run will now take the seed specified at the
corresponding array index if one exists. If there is no seed at
that index, the seed value will be treated as -1 and a random
seed will be assigned at that position. If an integer rather than
a list or JSON array has been given, everything works as before.
* UI seed input controls are now Textboxes with info lines about
the seed formats allowed.
* UI error handling updated to be more helpful if the seed input is
invalid.
2023-07-24 19:22:34 -07:00
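The seed-array behaviour described in the commit above can be sketched roughly as below. This is an illustrative approximation, not SHARK's actual implementation; the function name and the single-integer fallback are simplified assumptions:

```python
import json
import random


def resolve_batch_seeds(seed_input, batch_count):
    """Sketch of the seed handling described above (names are illustrative).

    seed_input may be an int, a list of ints, or a string containing a JSON array.
    Returns one concrete seed per batch; -1 or missing entries become random seeds.
    """
    if isinstance(seed_input, str):
        seed_input = json.loads(seed_input)           # e.g. "[42, -1, 7]"
    if isinstance(seed_input, int):
        # Single integer: simplified here; the real pipeline keeps its prior behaviour.
        seeds = [seed_input] + [-1] * (batch_count - 1)
    else:
        seeds = list(seed_input)[:batch_count]
        seeds += [-1] * (batch_count - len(seeds))    # missing indices are treated as -1
    return [s if s != -1 else random.randint(0, 2**32 - 1) for s in seeds]
```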
Daniel Garvey
453e46562f mega vicuna merge pt 2 (#1685) 2023-07-24 12:42:20 -05:00
Gaurav Shukla
5497af1f56 [config] Add support for uploading sharding config file in chatbot (#1689)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-07-24 10:18:03 -07:00
Vivek Khandelwal
f3cb63fc9c Fix Langchain multiple device issue (#1688) 2023-07-24 08:03:46 -07:00
Vivek Khandelwal
d7092aafaa Fix multiple issues for Langchain
This commit fixes the following issues for Langchain:
1.) Web UI not able to fetch results.
2.) The model getting reloaded for each query.
3.) SHARK module not using the user-provided device and precision.
4.) Create a class for main Langchain code.
5.) Misc issues
2023-07-21 21:56:27 +05:30
Vivek Khandelwal
a415f3f70e Fix Langchain Prompt issue and add web UI support (#1682) 2023-07-21 06:36:55 -07:00
Vivek Khandelwal
c292e5c9d7 Add Langchain CPU support and update requirements 2023-07-20 18:53:34 +05:30
Vivek Khandelwal
03c4d9e171 Add support for Llama-2-70b for web and cli, and for hf_auth_token 2023-07-20 14:57:48 +05:30
jinchen62
3662224c04 Update brevitas requirement (#1677)
also clean up useless args

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-07-19 22:03:32 -07:00
Vivek Khandelwal
db3f222933 Revert "Add Llama2 70B option in CLI and WebUI (#1673)" (#1679)
This reverts commit 41e5088908.
2023-07-19 22:02:48 -07:00
Stefan Kapusniak
68b3021325 Fixes cosmetic problems with Gradio 3.37.0 (#1676)
* Fix nod-ai logo having a white border
* Fix control labels having a black background
* Remove extra lower border below Save Prompt checkboxes in Txt2Img UI
2023-07-19 17:28:53 -07:00
AyaanShah2204
336469154d added copy-metadata for pyyaml (#1678) 2023-07-19 17:27:25 -07:00
Abhishek Varma
41e5088908 Add Llama2 70B option in CLI and WebUI (#1673) 2023-07-19 10:41:42 -07:00
PhaneeshB
0a8f7673f4 Add README for CodeGen server 2023-07-19 23:10:23 +05:30
PhaneeshB
c482ab78da fix second vic clearing for low mem device 2023-07-19 23:10:23 +05:30
Vivek Khandelwal
4be80f7158 Add support for the Llama-2 model 2023-07-19 20:57:08 +05:30
AyaanShah2204
536aba1424 unpinned torch_mlir (#1668)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-07-19 06:28:00 -07:00
Ean Garvey
dd738a0e02 small changes to opt_perf_comparison.py (#1670)
* Use longer prompts for OPT comparison script

* small tweaks
2023-07-19 06:26:50 -07:00
Daniel Garvey
8927cb0a2c set optional vmfb download (#1667) 2023-07-18 10:57:28 -07:00
Daniel Garvey
8c317e4809 fix cli for vicuna (#1666) 2023-07-18 10:03:40 -07:00
Vivek Khandelwal
b0136593df Add support for different compilation paths for DocuChat (#1665) 2023-07-18 09:49:44 -07:00
Vivek Khandelwal
11f62d7fac Minor fixes for MiniLM Training 2023-07-18 17:16:44 +05:30
powderluv
14559dd620 Update DocuChat as experimental (#1660) 2023-07-17 22:12:05 -07:00
AyaanShah2204
e503a3e8d6 fixed joblib import error (#1659) 2023-07-17 12:56:10 -07:00
AyaanShah2204
22a4254adf fixed pyinstaller path for langchain imports (#1658) 2023-07-17 12:19:21 -07:00
Vivek Khandelwal
ab01f0f048 Add Langchain model in SHARK (#1657)
* Add H2OGPT

* Add UI tab for h2ogpt

* Add source files from h2ogpt

* Add the rest of the files

* Add h2ogpt support

* Add SHARK Compilation support for langchain model for cli mode

---------

Co-authored-by: George Petterson <gpetters@protonmail.com>
2023-07-17 09:58:15 -07:00
Phaneesh Barwaria
c471d17cca codegen API (#1655) 2023-07-16 20:00:39 -07:00
Stefan Kapusniak
a2a436eb0c SD - Add repeatable (batch) seeds option (#1654)
* Generates the seeds for all batch_count batches being run up front
rather than generating the seed for a batch just before it is run.
* Adds a --repeatable_seeds argument defaulting to False
* When repeatable_seeds=True, the first seed for a set of batches will
also be used as the rng seed for the subsequent batch seeds in the run.
The rng seed is then reset.
* When repeatable_seeds=False, batch seeding works as it does currently.
* Update scripts under apps/scripts that support the batch_count
argument to also support the repeatable_seeds argument.
* UI/Web: Adds a checkbox element on each SD tab after batch count/size
for toggling repeatable seeds, and update _inf functions to take
this into account.
* UI/Web: Moves the Stop buttons out of the Advanced sections and next
to Generate to make things not fit quite so badly with the extra UI
elements.
* UI/Web: Fixes logging to the upscaler output text box not working
correctly when running multiple batches.
2023-07-15 16:22:41 -07:00
powderluv
1adb51b29d Update docker README.md 2023-07-15 14:31:56 -07:00
anush elangovan
aab2233e25 Add a dev Ubuntu 22.04 docker image 2023-07-15 16:25:37 +00:00
jinchen62
e20cd71314 Change to a separate pass to unpack quantized weights (#1652) 2023-07-15 04:54:53 -07:00
powderluv
5ec91143f5 add a HF accelerate requirement (#1651) 2023-07-14 05:56:12 -07:00
Ean Garvey
7cf19230e2 add perf comparison script for opt. (#1650) 2023-07-13 13:29:48 -05:00
powderluv
1bcf6b2c5b pin diffusers to 0.18.1 (#1648) 2023-07-13 01:02:24 -07:00
jinchen62
91027f8719 Remove done TODOs, a sup PR for #1644 (#1647) 2023-07-12 23:30:45 -07:00
powderluv
a909fc2e78 add tiktoken to spec file (#1646) 2023-07-12 16:12:02 -07:00
jinchen62
247f69cf9d Apply canonicalize for unpacking int4 (#1644)
- tested it unpacks int4 as expected
- tested it doesn't make a difference on int8
2023-07-11 19:41:09 -07:00
PhaneeshB
3b8f7cc231 Add codegen support in UI + lint 2023-07-11 21:58:01 +05:30
PhaneeshB
6e8dbf72bd mlir/vmfb path fixes for vic pipeline 2023-07-11 21:58:01 +05:30
PhaneeshB
38e5b62d80 adapt UI to send model details to pipeline 2023-07-11 21:58:01 +05:30
PhaneeshB
1c7eecc981 add codegen support in vic pipeline 2023-07-11 21:58:01 +05:30
PhaneeshB
be417f0bf4 fix precision for fp16 2023-07-11 21:58:01 +05:30
AyaanShah2204
a517e217b0 Added support for building ZIP distributions (#1639)
* added support for zip files

* making linter happy

* Added temporary fix for NoneType padding

* Removed zip script

* Added shared imports file

* making linter happy
2023-07-09 06:45:36 -07:00
Ranvir Singh Virk
9fcae4f808 Metal testing (#1595)
* Fixing metal_platform and device selection

* fixing for metal platform

* fixed for black lint formatting
2023-07-08 15:22:53 -07:00
Stefan Kapusniak
788d469c5b UI/Web Refix remaining gradio deprecation warning (#1638) 2023-07-08 13:48:36 -07:00
Stefan Kapusniak
8a59f7cc27 UI/Web add 'open folder' button to output gallery (#1634)
* Adds a button that opens the currently selected subdirectory using
the default OS file manager
* Improve output gallery handling of having images deleted out from
under it.
* Don't show VAE or LoRA lines in parameter info panel when their
value is 'None'
* Use a css class for small icon buttons on the output gallery
tab instead of using the same id for multiple buttons
2023-07-08 12:44:59 -07:00
Stefan Kapusniak
1c2ec3c7a2 Some Fixes for Gradio 3.36.1 (#1637)
* Clear .style deprecation warnings.
* Re-remove download button from Nod logos.
* Add workaround for `container=false` not doing what it did before on
dropdowns to the output gallery CSS
2023-07-08 11:20:34 -07:00
powderluv
af0f715e20 Unpin gradio 2023-07-08 09:41:14 -07:00
jinchen62
47ec7275e6 Fix brevitas quantize argument (#1633) 2023-07-07 11:30:31 -07:00
powderluv
3a24cff901 change binary names 2023-07-06 23:59:14 -07:00
powderluv
1f72907886 Fix the pyinstaller for chatbots (#1631) 2023-07-06 23:30:01 -07:00
Daniel Garvey
06c8aabd01 remove local-sync from webui (#1629) 2023-07-06 13:58:59 -07:00
Phaneesh Barwaria
55a12cc0c4 cpu name in device (#1628)
* show cpu name in devices

* change device order for chatbot
2023-07-06 12:00:09 -07:00
Ean Garvey
7dcbbde523 Xfail models for data tiling flag changes (#1624) 2023-07-06 06:57:17 -07:00
Abhishek Varma
1b62dc4529 [Vicuna] Revert the formatting for Brevitas op (#1626)
-- This commit reverts the formatting for Brevitas op.
-- It also excludes vicuna.py script from `black` formatter.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-07-06 06:56:17 -07:00
Daniel Garvey
c5a47887f4 Revert revert negative prompt change (#1625)
* revert default flag changes

* revert revert negative prompt change

* revert revert negative prompt change
2023-07-05 22:09:06 -07:00
Daniel Garvey
c72d0eaf87 revert default flag changes (#1622) 2023-07-05 15:43:26 -05:00
powderluv
c41f58042a Update compile_utils.py (#1617)
* Update compile_utils.py

* Update compile_utils.py

* Update compile_utils.py
2023-07-05 10:06:48 -07:00
xzuyn
043e5a5c7a fix a mistake I made, and more formatting changes, and add ++/Karras (#1619)
* fixed missing line break in `stablelm_ui.py` `start_message`
- also more formatting changes

* fix variable spelling mistake

* revert some formatting cause black wants it different

* one less line, still less than 79

* add ++, karras, and karras++ types of dpmsolver.

* black line length 79

---------

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-07-05 09:00:16 -07:00
Abhishek Varma
a1b1ce935c int8 e2e for WebUI (#1620) 2023-07-05 07:08:36 -07:00
jinchen62
bc6fee1a0c Add int4/int8 vicuna (#1598) 2023-07-05 07:01:51 -07:00
xzuyn
91ab594744 minor fix, some changes, some additions, and cleaning up (#1618)
* - fix overflowing text (a janky fix)
- add DEISMultistep scheduler as an option
- set default scheduler to DEISMultistep
- set default CFG to 3.5
- set default steps to 16
- add `xzuyn/PhotoMerge` as a model option
- add 3 new example prompts (which work nicely with PhotoMerge)
- formatting

* Set DEISMultistep in the cpu_only list instead

* formatting

* formatting

* modify prompts

* resize window to 81% & 85% monitor resolution instead of (WxH / 1.0625).

* increase steps to 32 after some testing. somewhere in between 16 and 32 is best compromise on speed/quality for DEIS, so 32 steps to play it safe.

* black line length 79

* revert settings DEIS as default scheduler.

* add more schedulers & revert accidental DDIM change
- add DPMSolverSingleStep, KDPM2AncestralDiscrete, & HeunDiscrete.
- did not add `DPMSolverMultistepInverse` or `DDIMInverse` as they only output latent noise, there are a few I did not try adding yet.
- accidentally set `upscaler_ui.py` to EulerDiscrete by default last commit while reverting DEIS changes.
- add `xzuyn/PhotoMerge-inpainting` as an in or out painting model.

* black line length 79

* add help section stuff and some other changes
- list the rest of the schedulers in argument help section.
- replace mutable default arguments.
- increased default window height to 91% to remove any scrolling for the main txt2img page (tested on a 1920x1080 monitor). Width is the same as it's just enough to have the image output on the side instead of the bottom.
- cleanup
2023-07-04 18:51:23 -07:00
Eliasj42
4015793f84 changed method of compiling vicuna to remove first and second vicuna (#1611)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-07-03 12:12:43 -07:00
Ean Garvey
d63ce76dd8 Use sortable image filenames for SD outputs. (#1528) 2023-07-03 10:30:47 -07:00
Prashant Kumar
1c32915570 Add the shark compile downstream due to https://github.com/pytorch/pytorch/pull/104185#issuecomment-1615110613 (#1615) 2023-07-01 08:30:58 -07:00
Ean Garvey
6d286c0609 Enable tuning for rectangle sizes on rdna2. (#1608) 2023-06-30 22:28:24 -07:00
Stefan Kapusniak
7392b22731 UI/Web Reduce animation of default --progress_bars (#1613) 2023-06-30 21:12:10 -07:00
jinchen62
534de05791 Update precision check for vicuna (#1610) 2023-06-29 16:16:33 -05:00
Daniel Garvey
5779e8c039 int4/int8 vicuna download support (#1609)
* set task_topology_max_group to cpu_count

by default. Can be overridden with a flag of the same name

* add download for int4/int8 mlir
2023-06-29 13:35:51 -07:00
Abhishek Varma
d496053590 [SHARK] Add a compile API to use for quick testing of inference (#1606) 2023-06-28 08:40:28 -07:00
gpetters94
6274a813c9 Add unet512 support for the other StableDiffusion pipelines (#1602) 2023-06-27 12:28:57 -07:00
Gaurav Shukla
1d6a1f9f8a [vicuna] Add tokens streaming(step=3) (#1600)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-27 08:59:27 -07:00
Daniel Garvey
75672c0e28 set task_topology_max_group to cpu_count (#1594)
by default. Can be overridden with a flag of the same name
2023-06-26 14:54:06 -07:00
Prashant Kumar
74a7202173 Make the tensors contiguous. 2023-06-26 17:29:54 +05:30
Prashant Kumar
27a08735db Add the shark backend for torch.compile API. (#1596) 2023-06-26 03:53:32 -07:00
Stefan Kapusniak
eaa49cce17 UI/App - Allow text selection (#1593)
* When run in app mode on windows, allows selection of text from
non-input controls, which is the same behaviour as web mode.
2023-06-26 02:16:53 -07:00
powderluv
10657d6fb1 Disable upx 2023-06-25 07:28:52 -07:00
Stefan Kapusniak
e3ab844cd1 Fix output gallery for csv format inc. VAE & LoRA (#1591) 2023-06-24 06:20:53 -07:00
powderluv
5ce6001b41 Update stablelm_ui.py to default to fp16 2023-06-23 22:55:47 -07:00
powderluv
501d0ca52e Add sentencepiece to webui for pyinstaller 2023-06-23 22:52:06 -07:00
powderluv
b444528715 Pin torch-mlir for windows too 2023-06-23 19:19:28 -07:00
Ean Garvey
6e6c90f62b Pin torch-mlir and use local-task in OPT. (#1592) 2023-06-23 19:17:05 -07:00
AyaanShah2204
8cdb38496e Final REST API Fixes (#1590)
* fixed outpaint api and added tests

* fixed text2img api

* more elegant generator to subscriptable conversion

* final fixes
2023-06-23 16:46:47 -07:00
powderluv
726d73d6ba Revert "[vicuna] Add streaming of tokens (#1587)" (#1588)
This reverts commit 4d55e51d46.
2023-06-23 10:29:00 -07:00
Gaurav Shukla
4d55e51d46 [vicuna] Add streaming of tokens (#1587)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-23 08:20:46 -07:00
Prashant Kumar
6ef78ee7ba Add cpu compile time flags. (#1585) 2023-06-23 07:23:26 -07:00
jinchen62
4002da7161 Add int4/int8 options to chatbot webui (#1586) 2023-06-23 07:18:34 -07:00
powderluv
ecb5e8e5d8 Update txt2img_ui.py 2023-06-23 06:42:12 -07:00
PhaneeshB
28e0919321 Add AMD cpu device 2023-06-23 18:47:04 +05:30
Daniel Garvey
28f4d44a6b downloader was double downloading (#1580) 2023-06-22 18:30:27 -07:00
AyaanShah2204
97f7e79391 [Blender Integration] Fixed Inpainting REST API (#1577)
* fixed inpaint api

* added inpainting test

* fixed linter errors

---------

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-06-22 16:08:26 -07:00
Nelson Sharpe
44a8f2f8db Include VAE & LoRA data into PNG metadata (#1573)
* include custom lora and vae data in png metadata

* include pycharm settings

* lint with black
2023-06-22 16:05:54 -07:00
Eliasj42
8822b9acd7 added ability to use config file to shard vicuna (#1565)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-06-22 17:40:35 -05:00
Daniel Garvey
0ca3b9fce3 fix some mmap and vicuna bugs (#1576) 2023-06-22 17:39:55 -05:00
Nithin Meganathan
045f2bb147 Add dispatch-level config file generator for manual annotation (#1566) 2023-06-22 15:11:41 -07:00
Prashant Kumar
a811b867b9 Add shark_eager mode.
-- Eager mode with step by step op compilation and execution.
2023-06-22 22:59:14 +05:30
Abhishek Varma
cdd505e2dd [SharkInference-SharkRuntime] Adds capability to mmap vmfbs
-- This commit is based on [VmModule.mmap() API](https://github.com/openxla/iree/pull/14124).
-- It thereby adds capability to mmap vmfbs in SHARK.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-06-22 20:43:40 +05:30
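The commit above relies on IREE's memory-mapped module loading. A rough usage sketch, assuming the `VmModule.mmap(instance, path)` entry point added in the linked IREE PR; the exact API surface may differ across IREE versions, and the vmfb path is a placeholder:

```python
import iree.runtime as ireert

# Assumption: VmModule.mmap(instance, path) is available, per the IREE PR above.
config = ireert.Config("local-task")
vmfb_path = "/path/to/model.vmfb"  # placeholder path

# Memory-map the compiled module instead of reading it fully into memory.
module = ireert.VmModule.mmap(config.vm_instance, vmfb_path)
ctx = ireert.SystemContext(vm_modules=[module], config=config)
```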
powderluv
1b0f39107c Move torch_mlir import to the top (#1574) 2023-06-21 22:31:35 -07:00
powderluv
b9b8955f74 exclude vulkan on macos 2023-06-21 22:22:27 -07:00
powderluv
6f7a85eee3 switch to metal backend for CI 2023-06-21 22:17:11 -07:00
Ranvir Singh Virk
18c8e9e51e Metal typo fix (#1572)
* fixing typos for metal changes

* black formatting
2023-06-21 21:56:11 -07:00
Daniel Garvey
a202bb466a fp16 fixes for webui (#1571) 2023-06-21 20:24:02 -07:00
Ranvir Singh Virk
07c1e1d712 Adding metal_utils for iree_utils (#1561)
* Adding metal_utils for iree_utils

* Add patch for making compile API work for both MEGABYTE and MiniGPT4 (#1559)

-- It also modifies the mega_test.py script

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* [SD] Update unet in_channels API and add PIL metadata to spec. (#1560)

* Fix deprecation warning for unet config.

* Include PIL metadata instead of hidden imports in SD spec.

* Fixing iree-metal-target-platform

* adding metal to txt2img pipeline

* Fixing Copyright date

* removing debug prints

* black lint formatting

* fixing device dump

---------

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-06-21 19:09:03 -07:00
Ranvir Singh Virk
18daec78c8 Added check for python version (#1570)
* Added check for python version

* Update for PYTHON_VERSION_X_Y
2023-06-21 18:56:47 -07:00
Ean Garvey
1a8e2024d6 Exclude non-square sizes from use_tuned on rdna2 (#1568) 2023-06-21 11:36:55 -05:00
AyaanShah2204
d61b6641fb Rest API: Resolved Generator Object not Subscriptable error (#1556) 2023-06-20 19:27:41 -07:00
Phaneesh Barwaria
88cc2423cc Enable Vicuna fp16 cpu (#1562)
* fix second vic mlir gen

* fp16 mlir/vmfb download from shark_tank
2023-06-20 13:43:21 -05:00
Ean Garvey
ccf944c1bd Enable tuner for upscaler unet. (#1563) 2023-06-20 13:40:13 -05:00
Ean Garvey
0def74f520 [SD] Update unet in_channels API and add PIL metadata to spec. (#1560)
* Fix deprecation warning for unet config.

* Include PIL metadata instead of hidden imports in SD spec.
2023-06-20 10:26:36 -07:00
Abhishek Varma
3fb72e192e Add patch for making compile API work for both MEGABYTE and MiniGPT4 (#1559)
-- It also modifies the mega_test.py script

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-06-20 10:04:17 -07:00
Vivek Khandelwal
855435ee24 Fix for the user input for Falcon pipeline 2023-06-20 18:09:32 +05:30
Elias Joseph
6f9f868fc0 fixed a bug where designating device for vicuna didn't work 2023-06-20 17:09:32 +05:30
powderluv
fb865f1b99 Move to checkout@v3
This will break Windows again but we have to fix it up since the old node.js is now deprecated.
2023-06-19 18:44:36 -07:00
rprasad2
3e5c50f07b changes for tuning (#1542)
* Add tuning sizes for rdna3
2023-06-19 15:29:08 -05:00
powderluv
a544f30a8f Move mega to the shark examples (#1555) 2023-06-19 11:10:51 -07:00
Abhishek Varma
1fe56d460a [MEGABYTE] Add script to compile MEGABYTE through SHARK (#1553)
-- Usage: `python mega_test.py`.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-06-19 11:00:35 -07:00
Vivek Khandelwal
fafd713141 Minor change to falcon pipeline 2023-06-19 22:36:32 +05:30
Vivek Khandelwal
015d0132c3 Modify falcon pipeline to add fp16 support (#1551) 2023-06-19 09:57:13 -07:00
powderluv
20ddd96ef7 unpin diffusers (#1550) 2023-06-18 13:45:55 -07:00
powderluv
ee33cfd2d1 Add PIL in main index.py (#1549)
* Add PIL in main index.py

This is to ensure pyinstaller picks it up

* Update index.py
2023-06-18 11:51:44 -07:00
Stefan Kapusniak
a3cba21d5b Fix load of unet512 vmfb fail on get of iree opts (#1546)
* Change retrieval of Iree options used when loading an existing
unet512 vmfb to look up the "unet" options rather than attempt to
find a non-existent set of options for "unet512"

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-06-18 06:42:20 -07:00
Stefan Kapusniak
a7b6ec4095 Fix unet512 always being used when --max_length=77 (#1547)
* Switches a few places in the SD pipeline where an assumption of
max_length=64 was being made, to using the actual max_length
as passed into the pipeline. This prevents unet512 always being
used and producing different images than previously when
--max_length=77
2023-06-18 06:41:25 -07:00
Ean Garvey
d80b087d95 Add PIL hidden imports to sd spec. (#1544)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-06-18 06:39:08 -07:00
Stefan Kapusniak
297a209608 Remove workarounds for gradio tempfile bugs (#1548) 2023-06-17 19:50:36 -07:00
gpetters94
b204113563 Add UNet512 (#1504)
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
2023-06-17 03:46:25 -04:00
Chi_Liu
f60ab1f4fa Add Deberta to stablehlo in shark tank (#1545) 2023-06-16 13:24:44 -07:00
Surya Jasper
b203779462 Added Adreno target triples to vulkan_utils (#1543) 2023-06-15 16:42:59 -07:00
Stefan Kapusniak
38570a9bbb Some Fixes for update to gradio 3.34.0 (#1538)
* Fixes randomize seed buttons that stopped working.
* Update the now-deprecated method for setting initial columns for the output
gallery to the newer, non-deprecated one.
2023-06-15 01:10:36 -07:00
dependabot[bot]
a5c882f296 Bump gradio from 3.15.0 to 3.34.0 (#1518)
Bumps [gradio](https://github.com/gradio-app/gradio) from 3.15.0 to 3.34.0.
- [Release notes](https://github.com/gradio-app/gradio/releases)
- [Changelog](https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md)
- [Commits](https://github.com/gradio-app/gradio/compare/v3.15.0...v3.34.0)

---
updated-dependencies:
- dependency-name: gradio
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-14 18:13:48 -07:00
Ean Garvey
eb6d11cfed Change mlir dialects for tf tests to stablehlo. (#1535)
* Change mlir dialects for tf tests to stablehlo

* Update shark_runner.py
2023-06-14 10:43:49 -07:00
Vivek Khandelwal
46184a81ac Add Falcon pipeline (#1534) 2023-06-14 09:39:16 -07:00
PhaneeshB
149165a2f0 add multi-device multi-precision vmfb names 2023-06-14 22:08:24 +05:30
dan
bec82a665f mega vicuna merge
single endpoint in apps/language/models/scripts/vicuna.py
removed main functions from pipelines
replaced divergent utils compile with shark_importer
adds support for different precisions
2023-06-14 19:06:29 +05:30
Ean Garvey
9551490341 Remove deprecated --iree-mhlo-demote-i64-to-i32 flag usage. (#1533) 2023-06-13 22:40:47 -05:00
Ean Garvey
49b3ecdbca (pytest) don't run redundant tests in cpu suite (#1532) 2023-06-13 22:40:33 -05:00
Ean Garvey
f53e3594c3 OPT Refactor (#1516)
* Change script to 1.3b model and add pytorch comparison

* fix CLI command

* Match OPT transformers model updates + numerics against latest version

* Cleanup OPT sentence completion script.

* Fix formatting and add standalone validation scripts.

* Add minimal OPT wrapper and example with import_with_fx

* Rename OPT full model wrapper.

* Cleanup test scripts for OPT.
2023-06-13 22:40:07 -05:00
Ean Garvey
5562d1dfda Fix xfails for cpu pytest cases (#1527)
Adding cpu-sync and cpu-task device configs was allowing respective tests to bypass the xfail conditional for cpu pytests marked in tank/all_models.csv. This commit updates the conditional to xfail those cases for cpu-sync and cpu-task as well.
2023-06-13 17:01:51 -07:00
Stefan Kapusniak
c7b0c2961e UI/Web Improve output gallery temp file handling (#1531)
* On startup report that cleaning up of temp files is taking place, in
case it takes a long time.
* Have the output gallery tab delete any zero length temporary files
generated by gradio < 3.32.0 for its gallery control whenever it
needs to update that control with images. This prevents such
files multiplying out of control.
2023-06-13 16:25:37 -05:00
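A minimal sketch of the zero-length temp-file cleanup described above, assuming a `shark_tmp` directory name; this is illustrative only, not the actual SHARK code:

```python
from pathlib import Path


def clean_zero_length_tempfiles(tmp_dir="shark_tmp"):
    """Remove empty temp files left behind by older gradio versions (directory assumed)."""
    for f in Path(tmp_dir).rglob("*"):
        if f.is_file() and f.stat().st_size == 0:
            f.unlink()
```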
Ean Garvey
44273b0791 Fix conditional in transform_fx() (#1530) 2023-06-13 16:24:53 -05:00
Prashant Kumar
0a4c8fcb3e Minor changes in the fx transforms. 2023-06-13 21:23:35 +05:30
Stefan Kapusniak
2fec3c8169 re-indents add_upcast in shark importer (#1523)
* The two with blocks in add_upcast appear to be under-indented, making
SD 1.4 break on rdna3; I've pushed them out one more tab, and now
everything appears to work again.
2023-06-12 14:41:10 -05:00
Gaurav Shukla
5e7d5930dd [vicuna] Add device and precision propagation in vicuna (#1520)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-12 12:14:43 -05:00
Prashant Kumar
b6dbd20250 Modify the fx transforms. (#1521)
- The bounds are set properly.
- The upcasting and downcasting is done for vicuna.
2023-06-12 09:40:14 -07:00
Nithin Meganathan
34f1295349 Add a model config generator (#1511)
Model config generator takes a PyTorch model as input and generates a JSON file with model layers and other properties that define sharding on particular hardware.
2023-06-09 15:32:00 -07:00
Phaneesh Barwaria
1980d7b2c3 Cpu device map (#1515)
* update cpu iree device

* fix vmfb paths vic unsharded
2023-06-09 11:27:02 -05:00
powderluv
2cfacc5051 fix osx torch_mlir (#1513)
* fix osx torch_mlir

* Update index.py

* Update index.py
2023-06-09 00:57:26 -07:00
Phaneesh Barwaria
436f58ddc4 cli using generate and mem fixes (#1509) 2023-06-08 13:13:32 -05:00
Phaneesh Barwaria
6b29bd17c8 Enable compilation vicuna (#1507)
* add cli for unsharded vic

* enable mlir download and compile
2023-06-07 13:08:22 -07:00
Ean Garvey
2c3485ca3e Add standalone OPT sentence completion script. (#1506) 2023-06-07 10:58:03 -07:00
Daniel Garvey
f206ecc635 reenable compilation in vicuna pipeline, add flags (#1505)
* replace vicuna.py backend with pipeline

* add some memory management to fist vicuna compile

reenable compilation
2023-06-07 09:49:27 -07:00
Stefan Kapusniak
a187e05ae6 Prevent having no cuda devices breaking the UI (#1503)
Don't break the UI when the LLM tab only wants cuda devices but there
aren't any.
2023-06-06 11:41:16 -07:00
Gaurav Shukla
8c21960486 [vicuna] Set only cuda devices in vicuna UI for now
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-06 22:15:20 +05:30
Gaurav Shukla
be62fce676 [vicuna] Fix vicuna chatbot (#1499)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-06 09:23:32 -07:00
PhaneeshB
f23b778a6c remove old vicuna scripts 2023-06-06 21:35:58 +05:30
PhaneeshB
436edf900d add vic sharded pipeline 2023-06-06 21:35:58 +05:30
Gaurav Shukla
ed58c2553f [vicuna] Integrate vicuna in shark studio
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-06 20:57:48 +05:30
Stefan Kapusniak
f2ca58e844 Add .csv and .json param info to output gallery (#1495) 2023-06-06 07:08:34 -07:00
Ean Garvey
1dbcc736eb [SD] (RDNA2) Enable new tuning for sd1.4 (#1498) 2023-06-06 06:48:58 -07:00
Phaneesh Barwaria
a83808ddc5 Vicuna cuda on A100 40G (#1496)
* vic chat with memory management (precompiled vmfb)

* fix vmfb path and download
2023-06-06 15:10:33 +05:30
Ean Garvey
a07fe80530 Update OPT, ResNet example scripts. (#1492)
* Update API in OPT example.

* fix resnet50 script

* Add OPT1.3b test script.
2023-06-05 20:19:35 -07:00
Ean Garvey
d0ba3ef8fa disable use_tuned on SD1.4 for rdna2 (#1490)
this is a temporary measure while we retune SD1.4 for rdna2. The current config fails during iree-compile.
2023-06-05 19:46:16 -05:00
Stefan Kapusniak
8400529c2c Fix output gallery not using shark_tmp (#1493)
This fixes the gallery component of the output gallery dumping temporary
files into the standard folders rather than shark_tmp, so those files never
got cleared out on restart and would build up.
2023-06-05 16:23:49 -05:00
powderluv
7eaee9c242 update SHARK to nodai SHARK 2023-06-05 00:44:49 -07:00
powderluv
8230eebce5 Switch to CPU torch builds for shark.whl 2023-06-05 00:36:03 -07:00
Ean Garvey
6296ea4be9 fix config handling for sd1.4 on rdna2 (#1489) 2023-06-05 00:02:30 -07:00
Ean Garvey
4151ec3a8f (pytest) tag efficientnet, mobilenet as xfails on vulkan (#1488) 2023-06-04 23:22:32 -07:00
powderluv
a2467e8d43 Enable SHARK whl packages 2023-06-04 23:21:22 -07:00
Ean Garvey
e677178bcc Replace RDNA2 SD lowering configs. (#1486) 2023-06-05 00:57:43 -05:00
Anush Elangovan
7ef1bea953 XFAIL some macos tests 2023-06-04 15:27:03 -07:00
Chi_Liu
ad89bb1413 Add distilgpt2 to stablehlo in shark tank (#1481) 2023-06-02 16:44:46 -05:00
Ean Garvey
218ed78c40 Change instances of input_type='mhlo' to 'auto' (#1482) 2023-06-02 16:43:47 -05:00
Stefan Kapusniak
6046f36ab6 UI/Web: Fix upscaler stop button (mostly) (#1479)
* UI/Web: Fix upscaler stop button

* Hook the cancel_sd function up to the Stop button.
* Adds checks for SD_STATE_CANCEL in the upscaler ui inference function.
* Set and check for SD_STATE_IDLE, SD_STATE_CANCEL in the upscaler
pipeline.

* UI/Web: lint fixes for upscaler stop button fix

---------

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-06-01 22:26:55 -07:00
Foxlum
5915bf7de3 Add to and tweak vulkan configuration environments. (#1475)
* Update vulkan_target_env_utils.py

* Update vulkan_target_env_utils.py

Adjust target environment capabilities.

* Update vulkan_target_env_utils.py

black linted?
2023-06-01 22:25:20 -07:00
Phaneesh Barwaria
f0a4e59758 LLM Pipeline Wrapper (#1477)
* [LLM] Add LLM pipeline

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

* add base pipeline and stableLM

* StableLM on UI - full block

* add SLM default model name

* add vicuna with pipeline

* add one token gen api for vic

* Fix stableLM bugs

* debug vic memory

* lint fix

---------

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
Co-authored-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-31 10:17:20 -07:00
Stefan Kapusniak
1ddef26af5 Web/UI: Add an Output Gallery tab for SD (#1470)
* WebUI: Adds an Output Gallery tab

Adds an new Output Gallery tab to the ui/webui with these features:

* Subdirectory select dropdown listing subdirectories at any depth below
the <output_dir>/generated_imgs directory,
* Large, full height, gallery area displaying the images in the selected
subdirectory. Shows nod logo when no images are in the selected
subdirectory.
* Slider that changes the number of columns of images that the gallery
displays from between 1 to 16 columns (defaults to 4).
* Expandable parameter info panel showing any generation parameters
saved in the file of the selected image for PNGs, alternatively the
image's EXIF data for JPEGs
* Send to buttons for txt2img, img2img, inpaint, outpaint and upscaler.
* Auto update of gallery and gallery label (to show generation status),
when a new image is generated by any of the stable diffusion tabs, and
is outputted to the currently selected subdirectory.
* Command line option for enabling and disabling the output gallery
(defaults to enabled)
* Command line option for following symlinks when getting entries
for the subdirectory list (defaults to off, as Python os.walk doesn't
check for circular references if following symlinks)

* Reformat with black

Reformat changes with black, then adjust the places where black's
formatting needed some rephrasing of the code to make things
clearer.

* Add back transformers and sd_cancel imports

Adds back the transformers import in index.py needed for .exe
generation. Add comment so it doesn't get mistakenly removed
next time.
Adds back sd_cancel import in upscaler.py that is currently unused
but should be being used for the 'Stop' button.
2023-05-30 13:47:48 -07:00
Chi_Liu
ba8eddb12f Add GPT3/OPT to Stablehlo in shark tank (#1468)
Co-authored-by: AmosLewis <Amos_Lewsi@foxmail.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
2023-05-29 21:58:39 -07:00
yzhang93
47b346d428 Modify the lowering config format for SPIRVMatmulPromoteVectorize pipeline (#1471) 2023-05-29 21:53:48 -07:00
Ean Garvey
1b4f4f5f4d Fix download path for SD1.4 Unet. (#1469) 2023-05-26 11:59:51 -07:00
Elias Joseph
73cd7e8320 added full vicuna to vicuna.py 2023-05-26 22:06:40 +05:30
Ean Garvey
19c0ae3702 Cleanup SD pipeline utils (#1466) 2023-05-25 12:50:11 -05:00
Ean Garvey
54e57f7771 Revive SD downloads from shark_tank. (#1465) 2023-05-25 12:03:21 -05:00
PhaneeshB
6d64b8e273 vic and slm common generation base 2023-05-25 20:29:41 +05:30
PhaneeshB
a8ea0326f5 correct SLM saved vmfb naming 2023-05-25 20:29:41 +05:30
PhaneeshB
58e9194553 add Lists import 2023-05-25 20:29:41 +05:30
PhaneeshB
eb360e255d remove unused imports 2023-05-25 20:29:41 +05:30
PhaneeshB
a6f88d7f72 refactor mlir compile 2023-05-25 20:29:41 +05:30
Prashant Kumar
8e571d165f Enable cpu f16 dtype tracing for the vicuna model. (#1461) 2023-05-24 09:37:57 -07:00
Ean Garvey
3cddd01b10 Update OPT tokenizer and xfail a few more large tests on macos CI (#1459)
* Update opt_torch_test.py

* Update all_models.csv
2023-05-23 14:36:57 -07:00
Chi_Liu
64c2b2d96b Add gpt2 to stablehlo support in shark tank (#1447)
- Add torch decomposition support when generating shark tank
- Add gpt2 stablehlo
2023-05-22 10:45:51 -07:00
Phaneesh Barwaria
f5ce121988 SLM on Sharkstudio (#1454)
* localize import, fix file reading, device cpu

* extract out model args
2023-05-19 11:21:08 -07:00
Ean Garvey
991f144598 Add iree hidden imports to SD spec (#1456)
* Add iree hidden imports to SD spec

* Update shark_sd_cli.spec
2023-05-19 11:19:16 -07:00
PhaneeshB
09bea17e59 fix #2 SLM in SharkStudio 2023-05-18 00:56:22 +05:30
Daniel Garvey
aefcf80b48 swap to cpu and remove hardcoded paths (#1448)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-05-17 10:53:34 -07:00
PhaneeshB
512235892e fix SLM for SharkStudio 2023-05-17 22:34:30 +05:30
PhaneeshB
6602a2f5ba add continuous output for CLI 2023-05-17 18:33:46 +05:30
Boian Petkantchin
20114deea0 In MiniLM JAX example verify MLIR result against JAX 2023-05-16 09:54:07 -07:00
Boian Petkantchin
9acf519078 Add option to skip venv creation in setup script 2023-05-16 09:54:07 -07:00
Boian Petkantchin
bdf37b5311 If device/backend is unknown pass it to IREE verbatim 2023-05-16 09:54:07 -07:00
powderluv
8ee2ac89f8 Rename sharded_vicuna_fp32_web.py to vicuna_web.py 2023-05-16 09:41:35 -07:00
powderluv
60cb48be2e Rename sharded_vicuna_fp32.py to vicuna.py 2023-05-16 09:40:51 -07:00
powderluv
86a215b063 Delete sharded_vicunia.py 2023-05-16 09:37:39 -07:00
powderluv
d6e3a9a236 Delete standalone_vicuna.py 2023-05-16 09:37:26 -07:00
Chi_Liu
a0097a1ead Add mlir_type for torch_model_list.csv (#1428)
- Enable stablehlo/tosa mlir output for torch model
- Add BERT stablehlo support
2023-05-15 10:23:54 -07:00
Ean Garvey
a9bae00606 Fix vulkan device selection at compile time and adapt to IREE python changes. (#1407)
* Add support for vulkan device selection at compile time.

* Don't convert device ID to int and fix .exe imports
2023-05-12 23:31:50 -07:00
Daniel Garvey
4731c1a835 prevent loading tokenizer on import (#1432)
also adds sentencepiece dep for exe
moved vicuna imports to after an if statement
in general we should avoid importing files that load whole models as
global variables
2023-05-12 19:11:45 -07:00
Ean Garvey
4c07e47e8c Specify a few models for expected failure on CUDA CI. (#1430) 2023-05-12 17:03:37 -05:00
Gaurav Shukla
e0cc2871bb [SD] Yield 2 tokens at a time in vicuna
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-11 23:49:01 +05:30
Gaurav Shukla
649f39408b [SD] Fix vicuna response
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-11 18:06:21 +05:30
Gaurav Shukla
c142297d73 [SD] Fix gradio to 3.22.0 version
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-11 18:05:55 +05:30
Gaurav Shukla
9e07360b00 [SD] Standalone vicuna with web
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-11 17:23:44 +05:30
Gaurav Shukla
7b74c86e42 [SD] Fix SAMPLE_INPUT_LEN import issue
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-11 15:41:43 +05:30
Eliasj42
fa833f8366 fixed spacing issue with chat-bot (#1417)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-05-10 16:07:50 -07:00
Gaurav Shukla
fcb059aa38 [SD] Integrate vicuna in the web (#1410) 2023-05-10 11:30:22 -07:00
PhaneeshB
517c670f82 vicuna chat cli 2023-05-10 22:55:06 +05:30
Eliasj42
59df14f18b added vicuna demo (#1408)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-05-09 21:18:20 -07:00
Ean Garvey
6c95ac0f37 Revert dialect registration in model annotator (#1406)
Matches https://github.com/nod-ai/SHARK-Runtime/pull/58
2023-05-09 11:50:19 -07:00
Daniel Garvey
7a4a51ae73 vulkan vic f16 (#1404)
Co-authored-by: dan <dan@nod-labs.com>
2023-05-08 16:46:53 -07:00
powderluv
d816cc015e Revert "added standalone vicuna script (#1399)" (#1402)
This reverts commit 0e4a8ca240.
2023-05-05 16:08:05 -07:00
Eliasj42
54ce3d48ca added standalone vicuna script (#1401)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-05-05 18:05:52 -05:00
Eliasj42
0e4a8ca240 added standalone vicuna script (#1399)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-05-05 15:46:05 -07:00
Daniel Garvey
6ca1298675 maximizes window size for webview launch (#1394) 2023-05-04 20:43:06 -07:00
jinchen62
bbef7a6464 Redesign model manager webui (#1391) 2023-05-04 20:41:29 -07:00
Ean Garvey
cdf2d61d53 Remove imports from iree.compiler.transforms from model annotator. (#1392) 2023-05-04 20:40:19 -07:00
Ean Garvey
6c14847d1f xfail some large tests on macOS builder and switch to hash updates. (#1341)
* Update test-models.yml

* Disable large tests on macOS builder
2023-05-04 19:47:03 -05:00
Gaurav Shukla
68ecdd2a73 [SD] Add LoRA as experimental tab
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-04 22:30:25 +05:30
Gaurav Shukla
3f4d444d18 [SD] Fix stable LM chatbot
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-05-04 22:30:25 +05:30
m68k-fr
e473d0375b [Web] Models folders cleanup (#1365) 2023-05-03 16:13:20 -05:00
Ean Garvey
e38d96850f Fix input image loading in img2img rest API (#1388) 2023-05-03 15:51:00 -05:00
Gaurav Shukla
fed63dfd4b [SD] Add stableLM chatbot (#1383)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-05-03 15:37:20 -05:00
Boian Petkantchin
eba4d06405 In MiniLM JAX example do not hardcode device (#1385)
* In MiniLM JAX example do not hardcode device

* In MiniLM JAX example don't use bytecode MLIR

---------

Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
2023-05-03 10:34:42 -07:00
Boian Petkantchin
4cfba153d2 Add example JAX MiniLM inference (#1380)
* Do not hardcode the name of the VM module in get_iree_module

* Add example JAX MiniLM inference

---------

Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
2023-05-02 15:03:54 -07:00
jinchen62
307c05f38d Convert original vae to diffusers (#1382) 2023-05-02 01:27:28 -07:00
jinchen62
696df349cb Fix curl issue (#1369) 2023-04-28 09:31:14 -07:00
jinchen62
cb54cb1348 Add model manager tab for SD webui (#1368) 2023-04-28 02:43:40 -07:00
Daniel Garvey
9bdb86637d add tkinter launch for webui (#1364) 2023-04-27 19:17:55 -05:00
jinchen62
fb6f26517f Fix webui note (#1367) 2023-04-27 16:14:43 -07:00
Chi_Liu
aa8ada9da9 Add support for torch to stablehlo and tosa in shark_importer (#1360) 2023-04-27 08:09:45 -07:00
powderluv
1db906a373 Revert "Add model manager tab for webui (#1359)" (#1362)
This reverts commit 9d1d1617d8.
2023-04-26 22:25:26 -07:00
jinchen62
9d1d1617d8 Add model manager tab for webui (#1359) 2023-04-26 13:38:18 -07:00
jinchen62
7112789cb8 Add support of using civitai model download url (#1357) 2023-04-25 23:39:52 -07:00
jinchen62
d6b8be2849 Add drawing canvas for img2img stencil scribble (#1355) 2023-04-25 14:41:01 -07:00
powderluv
822171277c Revert "[SD] Add FastChat as part of SD WebUI (#1349)" (#1350)
This reverts commit a5ae9d9f02.
2023-04-24 15:22:25 -07:00
Abhishek Varma
a5ae9d9f02 [SD] Add FastChat as part of SD WebUI (#1349)
-- This commit includes FastChat as part of SD WebUI.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-04-24 11:12:58 -07:00
powderluv
09e3f63d5b Fix pascal (#1346)
* Add fp32 for upscaler VAE

* Plumb Pascal vulkan support
2023-04-23 20:28:25 -07:00
powderluv
d60a5a9396 Add fp32 for upscaler VAE (#1345) 2023-04-23 15:27:55 -07:00
m68k-fr
90df0ee365 [Web] Gallery set to a 768px reference for high-end desktop users (#1344) 2023-04-23 11:48:06 -07:00
nirvedhmeshram
133c1bcadd add device to scheduler model names (#1338) 2023-04-22 20:13:56 -05:00
powderluv
caadbe14e9 Revert VAE to use im2col (#1339) 2023-04-22 15:23:41 -07:00
Ean Garvey
5f5823ccd9 Fix inference object imports for SD apps. (#1334) 2023-04-21 13:40:48 -05:00
Vivek Khandelwal
d2f7e03b7e Add StableLM model (#1331) 2023-04-21 09:51:02 -07:00
Gaurav Shukla
0b01bbe479 [SD] Add txt2img/upscaler/inpaint/outpaint Rest API (#1325)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-04-21 09:06:06 -07:00
yzhang93
25c5fc44ae Modify tuner.py to take vulkan target triple flag (#1328) 2023-04-20 14:31:32 -07:00
Daniel Garvey
7330729c92 enable sd pytest (#1322) 2023-04-19 22:11:30 -05:00
Ean Garvey
ce16cd5431 Create local shark_tank if needed for tuning configs. (#1321)
Now that --clear_all successfully deletes local shark_tank cache, we need to make sure it exists before trying to use it.
2023-04-19 11:44:21 -05:00
Ean Garvey
598dc5f79d Don't dump image data on img2img api call. (#1320) 2023-04-19 21:24:46 +05:30
Abhishek Varma
1f8e332cbe [SD] Fix img2img API bug for custom_vae argument (#1319)
-- https://github.com/nod-ai/SHARK/pull/1314 missed adding the `custom_vae`
   parameter to img2img_inf's invocation within img2img_api.
-- This commit adds a fix to the same.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-04-19 10:39:52 -05:00
Abhishek Varma
17b9632659 [SD] Adapted SHARK's v1 img2img API for SdPaint + updated Stencil model ID (#1318) 2023-04-19 06:29:36 -07:00
jinchen62
bda92a54ab Fix custom vae path (#1317) 2023-04-18 20:50:43 -07:00
jinchen62
747ed383b1 Add custom vae dropdown in webui (#1314) 2023-04-18 17:24:02 -07:00
Ean Garvey
1afe07c296 Disable winograd on VAE with rdna2 and fix unet tuning. (#1313)
* Disable winograd on VAE with rdna2 and fix unet tuning.

* Fix batch size 1 downloads and clear_all on windows.
2023-04-18 15:55:10 -05:00
jinchen62
b70919b38d Fix memory leak with ondemand (#1312)
support ondemand for outpainting and multi batch_count
2023-04-18 13:03:16 -05:00
m68k-fr
4e513d647f Update list of scheduler available for inferences (#1298) 2023-04-17 22:37:00 -05:00
jinchen62
94cd2a0fed Fix outpainting config (#1310) 2023-04-17 10:48:52 -07:00
Kyle Herndon
606029c01c Fix LoRA device format bug and allow LoRA to resume from a previous training 2023-04-17 13:19:46 +05:30
powderluv
1aa85222e9 Add AMD W7900 target triple (#1304)
This maps to RDNA3
2023-04-16 00:14:21 -07:00
m68k-fr
1b3f468c04 [Web] Style Fixes for Gradio V3.25.0 (#1300) 2023-04-13 18:40:42 -05:00
m68k-fr
35de7e27fa [Web] remove txt2img ui dependencies from png import metadata (#1275) 2023-04-12 07:32:47 -10:00
yzhang93
467f900759 Add auto-tuner to SD apps (#1291) 2023-04-12 09:21:17 -07:00
Ean Garvey
0bd9d582c7 Add documentation for using SHARK with AI-Render (#1296) 2023-04-12 03:09:34 -10:00
jinchen62
428cfe8dae Fix low vram mode issues (#1295)
- add ondemand back to img2img
- workaround memory leak for batch count
2023-04-11 17:59:09 -07:00
Ean Garvey
f17915bedc Fix batch size appending to model name. (#1294)
* Update shark_downloader.py

* Update shark_downloader.py
2023-04-11 15:34:25 -05:00
Gaurav Shukla
1b49b5149a [SD] Add Img2Img rest API
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-04-11 23:06:58 +05:30
jinchen62
3002793301 Unload clip on demand and workaround memory leak (#1283) 2023-04-10 16:59:03 -07:00
Phaneesh Barwaria
d25ef5529f Add fix for vae fp32 Upscaler (#1284)
- fixes size mismatch error for upscaler vae
2023-04-07 14:36:40 -05:00
Ean Garvey
308856a947 Touch unet if base cfg needed for SD pipeline init (#1281) 2023-04-05 03:02:29 -05:00
m68k-fr
151b4e142f [SD] Fix encoder error for model_max_length not being 77 (#1278)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-04-04 22:39:29 -07:00
Ean Garvey
e5a69a7c36 pin diffusers to e47459c (#1279) 2023-04-04 18:29:21 -07:00
m68k-fr
450b6cafc4 [SD] Add weight emphasis to prompts encoder (#1276) 2023-04-04 09:47:04 -07:00
Daniel Garvey
237d26baa2 update model db to reflect changes (#1277)
* remove 1/1 tqdm progress bar

* update model_db to reflect changes
2023-04-04 11:46:55 -05:00
Daniel Garvey
67d6ee1104 remove 1/1 tqdm progress bar (#1274) 2023-04-03 22:30:09 -05:00
Ean Garvey
98b069488e Add tank_version.json (#1272) 2023-04-03 18:36:23 -07:00
jinchen62
e0f227643a Fix webui circular import issue (#1271) 2023-04-03 16:00:10 -07:00
jinchen62
a0af3bb0cb xload and unload models (#1242) 2023-04-03 14:42:18 -07:00
powderluv
2cd61a5b96 strip source map (#1270) 2023-04-03 14:41:32 -07:00
Gaurav Shukla
f49d41a807 [SD] Add Stable diffusion text2image rest API (#1265)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-04-03 12:02:24 -07:00
Ean Garvey
2191fc8952 Separate pytest benchmark modes and fix model updates for SHARK downloader / pytest. (#1264)
* Only xfail windows models in CI

* downloader: make model updates more robust.

* Separate baseline and native benchmarks in pytest.

* Fix native benchmarks

* Fix torchvision model utils.
2023-04-03 08:24:21 -07:00
PhaneeshB
aea7796e60 add gradio client to spec 2023-04-03 18:57:19 +05:30
Abhishek Varma
a376619f1e [SD] Improve vmfb caching algo and retry mechanism (#1248)
-- This commit gets rid of the all-or-nothing vmfb caching mechanism
   and improves the retry mechanism by providing lower-level granularity
   for compiling each model unit.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
2023-03-31 09:38:14 -07:00
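The finer-grained retry idea from the commit above can be sketched as a per-unit compile helper: each model unit is compiled (or loaded from cache) independently instead of all-or-nothing. Names and retry counts here are hypothetical, not the actual SHARK implementation:

```python
import time


def compile_with_retry(compile_fn, unit_name, attempts=3, delay=2.0):
    """Compile (or fetch from cache) one model unit, retrying on failure
    rather than restarting the whole all-or-nothing pipeline."""
    for attempt in range(1, attempts + 1):
        try:
            return compile_fn(unit_name)
        except Exception as err:
            if attempt == attempts:
                raise
            print(f"compiling {unit_name} failed ({err}); retry {attempt}/{attempts}")
            time.sleep(delay)
```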
powderluv
02d52bb626 Add Intel ARC A770 target triple (#1263)
This just enables the plumbing. It generates black images.
2023-03-29 14:49:05 -07:00
Abhishek Varma
3b63645f79 [SD] Fix custom model path for WebUI (#1260)
-- This commit fixes custom model path for WebUI.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-29 09:48:11 -07:00
Ean Garvey
d6f740b998 allow pytest to retry getting model artifacts + disable autotuning for pytorch benchmarks (#1257)
* Adds a few xfails to enable macOS builder

* Convert string batch sizes to ints where needed.

* allow pytest to retry getting model artifacts

* Reduce attempts and add assert msg.
2023-03-28 23:38:45 -05:00
Daniel Garvey
594c6b8ea2 fix ckpt dir (#1258) 2023-03-28 14:31:01 -07:00
Ean Garvey
96b1560da5 Make batch size configurable via pytest and fix sharktank generation. (#1227)
* Fix sharktank generation and add batch_size pytest option for torch.

* Disable torch dynamo until py3.11 supported

* Compile torchmodel without dynamo if torch.compile fails

* Use release versions of TF/Keras for importer.

* Pin torchvision and remove debug prints.

* Remove duplicates from torch model list.

* Update generate_sharktank.py

* xfail a few models that fail sharktank generation/ numerics
2023-03-28 14:33:39 -05:00
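A minimal conftest.py sketch of how a batch_size option can be exposed to pytest, in the spirit of the commit above; the option name, default, and fixture are assumptions.

# conftest.py sketch: add a --batch_size option and read it from a fixture.
import pytest

def pytest_addoption(parser):
    parser.addoption("--batch_size", type=int, default=1, help="Batch size for torch model tests")

@pytest.fixture
def batch_size(request):
    return request.config.getoption("--batch_size")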
Abhishek Varma
0ef6a0e234 [SD] Fix Stencil scribble crash by updating image resize (#1255)
-- This commit updates the Stencil resize feature to cap the size of
   images within [128,768] as supported by the SD pipeline.
-- This solves the issue of scribble crashing on larger images.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-28 10:13:11 -07:00
Gaurav Shukla
641d535f44 [SD] Fix device path issue for cpu (#1256)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-28 10:09:49 -07:00
Daniel Garvey
5bb7846227 single entry point exe for all cli apps (#1158)
usage:
add --app="img2img" (or "inpaint" "outpaint" "txt2img")
2023-03-28 11:15:21 -05:00
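A rough sketch of how a single entry point can dispatch on the --app flag named above; the pipeline calls are placeholders and the module wiring is an assumption, not the actual implementation.

# Sketch: one CLI entry point, dispatched via --app.
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--app", choices=["txt2img", "img2img", "inpaint", "outpaint"], default="txt2img")
    args, passthrough = parser.parse_known_args()
    # In the real app each branch would invoke the corresponding pipeline's main().
    print(f"would launch the {args.app} app with extra args: {passthrough}")

if __name__ == "__main__":
    main()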
yzhang93
8f84258fb8 Fix check for use_tuned conditions (#1252) 2023-03-27 11:21:25 -07:00
Ean Garvey
7619e76bbd Disable and xfail some models that fail validation/compilation. (#1251)
* Rollback T5 models for torch as the inputs give some issues that aren't trivial to resolve
* xfail efficientnet-b0 on torch+cuda -- see CUDA requesting shared memory size larger than allowed size openxla/iree#12771
2023-03-27 12:42:53 -05:00
Daniel Garvey
9267eadbfa disable openjourney gen for nightly (#1249) 2023-03-27 11:55:34 -05:00
Phaneesh Barwaria
431132b8ee Fix img2img mode switch (#1247)
* add updated scheduler value in global config

* clear scheduler global variable with others
2023-03-27 07:01:22 -07:00
cstueckrath
fb35e13e7a fix Python version detection bug (#1246)
* fix Python version detection bug

* Update setup_venv.ps1
2023-03-27 07:00:40 -07:00
yzhang93
17a67897d1 Add SD v2.1 768x768 tuned model (#1244)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-03-24 10:39:15 -07:00
Gaurav Shukla
da449b73aa [SD] Disable lora training tab for now (#1241)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-24 09:16:24 -07:00
Kyle Herndon
0b0526699a Fix incorrect device argument initialization for LoRA training by extracting the device type and number and formatting it for pytorch (#1237)
Co-authored-by: Kyle Herndon <kyle@nod-labs.com>
2023-03-24 01:10:50 -07:00
Boian Petkantchin
4fac46f7bb In models testing fix paths to be relative to the script dir not cwd (#1128)
authored-by: Boian Petkantchin <boian@nod-labs.com>
2023-03-22 15:26:52 -05:00
Daniel Garvey
49925950f1 fix false positives (#1193) 2023-03-22 15:25:39 -05:00
Thomas
807947c0c8 Remove deprecated cli option iree-hal-cuda-disable-loop-nounroll-wa (#1235) 2023-03-22 12:05:15 -05:00
Abhishek Varma
593428bda4 [SD] Fix for transformers/__init__.py issue in PyInstaller (#1233)
-- This commit fixes the transformers/__init__.py issue in PyInstaller.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-22 08:43:53 -07:00
Abhishek Varma
cede9b4fec [SD] Fix custom_vae as a required parameter in inpaint (#1232) 2023-03-22 04:30:17 -07:00
Prashant Kumar
c2360303f0 Add the int8 quantized model. 2023-03-22 16:28:13 +05:30
jinchen62
420366c1b8 Move schedulers to global obj (#1225) 2023-03-21 22:40:43 -07:00
Ean Garvey
d31bae488c Set iree-input-type to tm_tensor for SD (#1228) 2023-03-21 19:07:31 -07:00
Kyle Herndon
c23fcf3748 Fix incorrect device argument initialization for LoRA training (#1231)
Co-authored-by: Kyle Herndon <kyle@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-03-21 19:07:18 -07:00
jinchen62
7dbbb1726a Fix SD obj not defined if fail to get models from pretrained (#1222) 2023-03-21 07:55:17 -07:00
Abhishek Varma
8b8cc7fd33 [SD] Update LoRA inference to handle various checkpoints (#1215) 2023-03-21 06:52:20 -07:00
Ean Garvey
e3c96a2b9d Move sentencepiece to importer requirements. (#1218) 2023-03-21 00:39:57 -05:00
Ean Garvey
5e3f50647d Set --vulkan_large_heap_block_size default to 2gb. (#1220) 2023-03-20 21:07:09 -07:00
gpetters94
7899e1803a Add fix for attention slicing fp16 (#1217) 2023-03-20 19:11:29 -07:00
mariecwhite
d105246b9c Fix t5 models 2023-03-21 10:39:59 +11:00
mariecwhite
90c958bca2 Add T5-base and T5-large Torch and TF Models (#1116) 2023-03-20 17:32:50 -05:00
mariecwhite
f99903e023 Add EfficientNet B0 and B7 Torch and TF models 2023-03-21 09:22:05 +11:00
mariecwhite
c6f44ef1b3 Add EfficientNet B0 and B7 Torch and TF models 2023-03-21 09:14:45 +11:00
mariecwhite
8dcd4d5aeb Make batch size configurable 2023-03-20 18:03:17 -04:00
Phoenix Meadowlark
d319f4684e Add peak memory reporting for IREE, TF and PyTorch (#1216) 2023-03-20 15:40:49 -05:00
Ean Garvey
54d7b6d83e Generate model artifacts in pytests if they don't exist in the cloud. (#1121)
* Add gen_shark_files fn to shark_downloader for OTF artifact generation

* add generate_sharktank as a tank/ python module.

* Fix some paths in tank generation.
2023-03-20 12:13:19 -05:00
m68k-fr
4a622532e5 [Web] Stop images (#1212) 2023-03-19 14:37:30 -07:00
cstueckrath
650b2ada58 add pytorch_lightning to requirements (#1211)
* add pytorch_lightning to requirements

this will additionally add lightning-utilities and torchmetrics

* Update shark_sd.spec

* Update shark_sd_cli.spec
2023-03-19 12:29:54 -07:00
m68k-fr
f87f8949f3 [Web] CSS fix for gradio V3.22.1 (#1210) 2023-03-19 06:13:59 -07:00
m68k-fr
7dc9bf8148 [Web] Move "stop Batch" button to "Advanced Options" toggle (#1209) 2023-03-18 20:54:42 -07:00
Kyle Herndon
ba48ff8d25 Implement LoRA training and UI for training and UI for inference in img2img, inpaint, outpaint (#1200)
txt2img inference UI is already committed.

Co-authored-by: Kyle Herndon <kyle@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-03-17 12:54:56 -07:00
Gaurav Shukla
638840925c [SD] Add support for larger size upscaling (#1204)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-17 10:20:48 -07:00
m68k-fr
b661656c03 [Web] Fix custom model path for upscaler (#1199) 2023-03-16 15:57:23 -07:00
Gaurav Shukla
0225434389 [SD] Add sendTo Upscaler
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-16 20:49:19 +05:30
Gaurav Shukla
7ffe20b1c2 [SD] Release memory used by upscaler when not in use
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-16 20:49:19 +05:30
Gaurav Shukla
d8f0c4655d [SD] Add Upscaler web
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-16 20:49:19 +05:30
Gaurav Shukla
7e8d3ec0df [SD] Add upscaler pipeline
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-16 20:49:19 +05:30
jinchen62
9c08eec565 Clear memory cache when switching model and mode (#1194) 2023-03-15 22:18:26 -07:00
m68k-fr
2d2c523ac5 [Web] Upgrade Gradio to v3.21.0 (#1188)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-03-15 10:14:49 -07:00
Abhishek Varma
f17b3128c0 [SD] Add LoRA inference to SD pipeline (#1189)
-- This commit adds LoRA inference to SD pipeline.
-- It also modifies txt2img to incorporate the new feature.
   img2img, inpaint, outpaint, etc using Unet can also be extended in a
   similar way.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-15 10:13:45 -07:00
Abhishek Varma
7c7e630099 [SD] Add fix for using latest diffusers + add scribble variant to Stencil (#1191)
* [SD] Add Scribble variant in Stencil

-- This commit adds scribble variant in Stencil.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* [SD] Use latest diffusers

-- This commit points back to the latest diffusers and updates the
   processing script to tackle the Pix2Pix import issue.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

---------

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-15 10:13:20 -07:00
m68k-fr
2dd1491ec1 [Web] Add clear queue button (#1192) 2023-03-15 10:12:59 -07:00
Daniel Garvey
236357fb61 add missing import for shark_sd.spec (#1190)
2023-03-15 09:23:29 -05:00
Phoenix Meadowlark
7bc38719de Add benchmark artifacts to .gitignore (#1186) 2023-03-14 15:19:06 -07:00
Daniel Garvey
bdbe992769 Add IREE_SAVE_TEMPS for import_debug command (#1184)
based on hf_model_id. Works on windows
2023-03-14 11:40:23 -07:00
Abhishek Varma
e6b925e012 [SD] Add Openpose to Stencil + image size issue fix (#1181)
-- This commit adds openpose model variant to stencil.
-- Fixes image size issue.
-- Also includes fix for the .exe bug introduced by https://github.com/nod-ai/SHARK/pull/1175

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-14 10:30:52 -07:00
cstueckrath
771120b76c workaround Gradio issue (#1183)
https://discord.com/channels/973663919757492264/975522729564446740/1085109774758191164
2023-03-14 01:27:24 -07:00
Boian Petkantchin
a8ce7680db Add flag to augment the device allocator (#1182)
Example:
$ python my_app.py --device_allocator caching debug
This will wrap the device allocator with first caching allocator then
debug allocator.

$ python my_app.py --device_allocator caching
Only wrap with caching allocator.

Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
2023-03-13 15:49:26 -07:00
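A small sketch of parsing a multi-valued --device_allocator flag like the one above; the actual wrapping of the allocator happens in the runtime and is not shown.

# Sketch: accept zero or more allocator wrappers, e.g. --device_allocator caching debug
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--device_allocator", nargs="*", default=[],
                    help="allocator wrappers to apply, e.g. 'caching' or 'caching debug'")
args = parser.parse_args()
print("allocator wrappers:", args.device_allocator)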
Phaneesh Barwaria
b6dcf2401b Stencil perf improvement (#1179)
* remove conditioning strength multiplier

* mod diffusers lib to v0.14.0
2023-03-13 14:37:38 -07:00
Daniel Garvey
62b5a9fd49 generate sharktank for apps dir (#966)
* merge conflict resolution

* add support to other scripts

---------

Co-authored-by: dan <dan@nod-labs.com>
2023-03-13 10:54:15 -07:00
m68k-fr
2f133e9d5c Fix png metadata (#1178) 2023-03-12 22:43:39 -07:00
powderluv
f898a1d332 Update README.md 2023-03-12 16:54:42 -07:00
m68k-fr
b94266d2b9 [Web] Randomize seed to -1 (#1176) 2023-03-12 12:42:31 -07:00
m68k-fr
1b08242aaa [Web] Improve dropdowns ux (#1175) 2023-03-12 12:41:51 -07:00
Abhishek Varma
691030fbab [SD] Improve Stencil feature to handle general image sizes
-- Currently the stencil feature works with 512x512 images only.
-- This commit relaxes this constraint and adds support for various
   image sizes.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-11 21:48:31 +05:30
m68k-fr
16ad7d57a3 [WebUi] txt2img_ui: Import png metadata (#1147) 2023-03-10 16:26:34 -08:00
Anush Elangovan
c561ebf43c Drop the torch-mlir pin
Seems to work now with top of master
2023-03-10 15:39:04 -08:00
Prashant Kumar
97fdff7f19 Add instructions how to run the LLaMA model. (#1168)
* Add instructions how to run the LLaMA model.

* Update README.md
2023-03-10 12:36:37 -08:00
Anush Elangovan
ce6d82eab2 Fix bloom lint 2023-03-10 11:53:08 -08:00
Abhishek Varma
b8f4b18951 [SD] Use dynamic stencil HF repo id
-- This commit removes the hardcoded HF ID for Stencil and instead
   utilizes a dynamic instantiation of HF model.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-03-10 23:31:45 +05:30
Eliasj42
b23d3aa584 added a more memory-efficient method to run large bloom models with sharded blooms (#1165)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-03-10 09:32:56 -08:00
Vivek Khandelwal
495670d9b6 Fix SD fine tuning script device arg usage 2023-03-10 18:37:53 +05:30
Boian Petkantchin
815e23a0b8 Update iree-compile flags --iree-llvm-xxx -> --iree-llvmcpu-xxx (#1164) 2023-03-09 11:31:50 -08:00
Boian Petkantchin
783538fe11 Move linting opts from github workflow to config files
This helps development where you can be sure that running locally

black .
flake8 .

will do the same as in the github job.
2023-03-09 10:46:30 -08:00
Boian Petkantchin
996c645f6a In SD don't include device path in vmfb filename
Include only the driver name instead.
2023-03-09 10:45:32 -08:00
m68k-fr
1f7d249a62 Use utf-8 format for imgs_details.csv 2023-03-09 16:15:58 +05:30
jinchen62
7f6c9a2dc2 Add an inpainting option for only masked area (#1154) 2023-03-07 09:46:05 -08:00
Eliasj42
93891984f3 made sharded bloom example more user friendly (#1153)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-03-06 10:23:48 -08:00
Vivek Khandelwal
cc0ef54e0e Fix Stable diffusion fine tuning script 2023-03-06 17:52:16 +05:30
Daniel Garvey
812152485d temporarily xfail tiny convnext macos (#1142) 2023-03-03 13:30:56 -06:00
Vivek Khandelwal
0816fb403a Add Stable diffusion fine tuning script
This commit adds the sd fine tuning script which runs through the
torchdynamo path.
2023-03-03 21:59:00 +05:30
Gaurav Shukla
4f171772be [SD] Fix SD web flags
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-03 21:55:40 +05:30
mariecwhite
a52331d4aa Install IREE pre-releases (#1139) 2023-03-02 23:17:56 -06:00
yzhang93
ad821a1fc8 Use old torch-mlir package to avoid crash on rdna2 (#1137) 2023-03-02 18:16:58 -08:00
Ean Garvey
116b128802 Use nightly shark_tank for test-models (#1133)
* Use nightly shark_tank for test-models

* Update all_models.csv
2023-03-02 12:33:36 -06:00
Gaurav Shukla
b118f183d1 [SD] Fix few things in sendTo feature (#1132) 2023-03-02 09:11:55 -08:00
Gaurav Shukla
911dff16f1 [SD] Add sendTo feature in stable diffusion (#1131)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-03-02 08:42:38 -08:00
Abhishek Varma
de59a66ae4 [SD] Update diffusers to point to the fix for Stencil + add opencv-python (#1130) 2023-03-02 08:19:29 -08:00
Daniel Garvey
23f1468cc6 disable most models on windows pytest (#1125) 2023-03-02 01:37:50 -06:00
jinchen62
080350d311 Make loading custom inpainting models general (#1126) 2023-03-01 22:14:04 -08:00
Phaneesh Barwaria
7f3f92b9d5 remove extra return arg (#1123)
* remove extra return arg

txt2img expects only 3 mlirs

* add venv reqs for stencils
2023-03-01 11:45:24 -08:00
Abhishek Varma
be3cdec290 [SD] Add Stencil feature to SD pipeline (#1111)
* [WIP] Add ControlNet to SD pipeline

-- This commit adds ControlNet to SD pipeline.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* [SD] Add ControlNet to img2img + fix bug for img2img scheduler

-- This commit adds ControlNet execution to img2img.
-- It restructures the addition of ControlNet variants.
-- It also fixes scheduler selecting bug for img2img pipeline.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* add shark models for stencilSD

* Add Stencil controlled SD in img2img pipeline (#1106)

* use shark stencil modules

* adjust diffusers change

* modify to use pipeline

* remove control from unet

* pump stencils through unet

* complete integration in img2img

* fix lint and comments

* [SD] Add ControlNet pipeline + integrate with WebUI + add compiled flow execution

-- This commit creates a dedicated SD pipeline for ControlNet.
-- Integrates it with img2img WebUI.
-- Integrates the compiled execution flow for ControlNet.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

* [SD] Stencil execution

* Remove integration setup

* [SD] Fix args.use_stencil overriding bug + vmfb caching issue

-- This commit fixes args.use_stencil overriding issue which caused
   img2img pipeline to pick wrong set of modules.
-- It also fixes vmfb caching issue to speed up the loading time
   and pick right set of modules based on a mask.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>

---------

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
2023-03-01 10:44:40 -08:00
m68k-fr
f09574538c [WebUi] Remove unsupported full_width parameter, Reactivate gallery nav while multiple images are generated 2023-03-01 23:17:12 +05:30
Daniel Garvey
b1113ab551 disable benchmark on windows for pytest (#1100) 2023-02-28 18:10:29 -06:00
powderluv
ef756389e3 Revert "add cv2 and nod diffusers (#1112)" (#1114)
This reverts commit cb17d017df.
2023-02-28 14:31:40 -08:00
Phaneesh Barwaria
cb17d017df add cv2 and nod diffusers (#1112) 2023-03-01 01:33:43 +05:30
Gaurav Shukla
798f231792 [SD] Update metadata info and canvas size (#1109)
* [SD] Save missing metadata in case of img2img and outpaint

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

* [SD] Update the canvas size for inpaint/outpaint

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

* [SD] Update output gallery on each inference

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

---------

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-28 11:25:30 -08:00
m68k-fr
7136890da3 [Fix] Unsupported width and height argument error 2023-02-28 23:32:58 +05:30
mariecwhite
d567192fd3 Fix call to Torch Inductor 2023-02-28 00:35:57 -08:00
jinchen62
dcc4025c78 Fix loading custom inpainting models (#1103) 2023-02-27 17:06:09 -08:00
yzhang93
c6c8ec36a1 Enable tuned models for inpainting (#1102) 2023-02-27 16:46:57 -08:00
Quinn Dawkins
1344c0659a Add doc on profiling with Shark (#1101)
* Add doc on profiling with Shark

* Rename doc
2023-02-27 11:31:27 -08:00
powderluv
973f6d20f4 Try pre-pix2pix 2023-02-25 00:09:05 -08:00
powderluv
8b5c9c51e7 Revert "Update diffusers (#1094)" (#1096)
This reverts commit 0064cc2a6e.
2023-02-24 19:27:56 -08:00
jinchen62
bae208bcc4 Fix outpainting params (#1089) 2023-02-24 14:41:32 -08:00
Daniel Garvey
b6c14ad468 Make sd tests output performance metrics into csv (#1085)
* make some paths windows friendly (#1066)

* add csv output to builder script

and reduce number of models tested
2023-02-24 16:27:52 -06:00
powderluv
0064cc2a6e Update diffusers (#1094) 2023-02-24 14:09:19 -08:00
Gaurav Shukla
0a0567e944 [SD] Avoid unnecessary temp file creations (#1092)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-24 10:53:34 -08:00
gpetters94
694b1d43a8 Add attention slicing support (#1087) 2023-02-24 02:43:02 -08:00
Ean Garvey
e7eb116bd2 use tf-nightly for importer (#1077) 2023-02-23 23:14:48 -06:00
yzhang93
596499a08c Disable tuned configs on all inpainting models (#1086) 2023-02-23 13:15:22 -08:00
naveen raj
2a2e460df2 Add DEISMultistep scheduler #1076 (#1084)
* Add DEISMultistep scheduler #1076

* line lenght lint fix
2023-02-23 10:15:05 -08:00
jinchen62
a9039b35ed Add outpainting web UI (#1083) 2023-02-23 01:02:25 -08:00
jinchen62
a01154a507 Add SD outpainting (#1072)
python apps/stable_diffusion/scripts/outpaint.py --prompt="Face of a yellow cat, high resolution, sitting on a park bench" --img_path=test_imgs/overture-creations-5sI6fQgYIuo.png --import_mlir --hf_model_id="stabilityai/stable-diffusion-2-inpainting" --pixels=128 --mask_blur=8 --left --right --top --bottom --steps=20
2023-02-22 23:16:05 -08:00
powderluv
1d9204282d Update README.md 2023-02-22 23:12:41 -08:00
Eliasj42
5ff40a0d2d added an example to run sharded bloom (#1079)
added ability to compile sharded mlir files from Hugging Face models

Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-02-22 22:48:58 -08:00
jinchen62
fab6d2e4e0 Resize input image and mask for SD inpainting (#1082) 2023-02-22 22:46:59 -08:00
powderluv
abab59c25f Update nightly.yml 2023-02-22 18:44:43 -08:00
powderluv
c25840b585 Update nightly.yml 2023-02-22 18:34:37 -08:00
powderluv
1b3f9125bb Update nightly.yml 2023-02-22 18:23:44 -08:00
powderluv
b5d9f5ba49 Update nightly.yml 2023-02-22 18:20:31 -08:00
powderluv
1c22aa9c8f Resolve __init__.py issues (#1080)
Also drop torchvision. The test passed, but we can't be sure it
fixes the __init__.py issue yet.
2023-02-22 18:17:00 -08:00
Daniel Garvey
e1d7fb879c make some paths windows friendly (#1066) 2023-02-22 14:44:55 -06:00
powderluv
e912c42bf0 update the openxla links 2023-02-22 12:10:23 -08:00
powderluv
e6841acf36 Publish nightlies as pre-releases
So stable versions can be marked on the Releases page
2023-02-22 12:05:28 -08:00
Gaurav Shukla
bc4459b6f4 [SD] Add inpainting web UI (#1069)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-22 11:01:18 -08:00
cstueckrath
9b544491e0 Update setup_venv.ps1 (#1073)
* Update setup_venv.ps1

fix a bug that occurs when Python is installed but no py.exe is available

* Update setup_venv.ps1
2023-02-22 07:52:59 -08:00
m68k-fr
9c5415b598 [WebUi] css fix for Gradio v3.19.0 (#1059)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-02-21 23:50:54 -08:00
powderluv
040dbc317f unpin diffuser to latest (#1071)
Currently 0.13.x
2023-02-21 23:47:19 -08:00
powderluv
65775046d8 update IREE pip links 2023-02-21 19:31:23 -08:00
Daniel Garvey
b18bc36127 force creation of workdir (#1070) 2023-02-21 18:10:36 -08:00
cstueckrath
f01c526efd Update setup_venv.ps1 (#1064) 2023-02-21 14:13:04 -05:00
Gaurav Shukla
16168ab6b3 [SD] Update need_vae_encode correctly
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-21 20:26:06 +05:30
Gaurav Shukla
4233218629 [SD] Reset args.img_path to None in txt2img to avoid vae_encode
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-21 18:46:15 +05:30
RaINi_
b63fb36dc0 Use path.join for the winograd config directory (#1065) 2023-02-20 22:04:25 -06:00
Daniel Garvey
4e92304b89 remove annoying accelerate warning (#1056)
Disables usage of low_cpu_mem_usage=True in from_pretrained() calls.
Can be re-enabled with the --low_cpu_mem_usage flag.
Defaults to False to avoid spam, since we don't include accelerate in our
requirements.txt.
2023-02-20 14:46:26 -06:00
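A hedged sketch of gating low_cpu_mem_usage in from_pretrained() behind the flag described above; the model id and surrounding code are illustrative only.

# Sketch: only request low_cpu_mem_usage (which needs accelerate) when the flag is set.
from diffusers import StableDiffusionPipeline

def load_pipeline(model_id, low_cpu_mem_usage=False):
    return StableDiffusionPipeline.from_pretrained(
        model_id,
        low_cpu_mem_usage=low_cpu_mem_usage,  # defaults to False to avoid the accelerate warning
    )

pipe = load_pipeline("runwayml/stable-diffusion-v1-5")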
Ean Garvey
2ae047f1a8 Update importer/benchmark setup for python3.11 (#1043) 2023-02-20 11:29:00 -06:00
Ean Garvey
6d2a485264 Add --benchmark_dispatches option to pytest. (#800)
* Add --benchmark_dispatches option to pytest.

* Update README.md and fix filepath for dispatch benchmarks
2023-02-19 12:16:18 -06:00
Daniel Garvey
4f045db024 disable anythingv3 until issue is resolved (#1053) 2023-02-18 23:47:21 -05:00
yzhang93
5b33597b6d Enable v1.5 to use tuned configs (#1049) 2023-02-18 16:54:26 -05:00
m68k-fr
962470f610 [WebUi] Minor interface cleanup and Ui cosmetics 2023-02-17 22:00:47 +05:30
cstueckrath
ba8c116380 add KDPM2Discrete and a force flag for setup_venv (#1044)
* add KDPM2Discrete and a force flag for setup_venv

* add KDPM2Discrete and a force flag for setup_venv
also made sure that Python 3.11 is used for the venv as 3.10
doesn't work anymore
2023-02-17 07:19:56 -05:00
jinchen62
ad7330eae4 Add inpainting test (#1011) 2023-02-16 22:17:10 -06:00
yzhang93
cf126e4839 Use tuned configs on custom models with ckpt_loc (#1038) 2023-02-16 17:06:21 -08:00
powderluv
c96d25c3e2 Delete stable_diffusion_amd.md
All instructions are common now and on the main page.
2023-02-16 14:57:32 -08:00
powderluv
006aa0dae2 Update README.md 2023-02-16 14:54:00 -08:00
Daniel Garvey
5b204bee86 temporarily xfail microsoft resnet50 (#1037)
Co-authored-by: dan <dan@nod-labs.com>
2023-02-16 16:14:51 -06:00
Phaneesh Barwaria
d98b2afbe9 img2img denoise strength (#1040) 2023-02-16 13:40:20 -08:00
Daniel Garvey
681332ef32 fix tests after default flag changes (#1009)
* fix tests after default flag changes

also adds support for import-mlir

* Update setup_venv.ps1

---------
2023-02-16 12:57:50 -06:00
mariecwhite
c3a4fdcbfc Add bert-large-uncased TF model 2023-02-15 21:42:44 -08:00
mariecwhite
aac5de5b02 Add bert-large-uncased Torch model 2023-02-15 21:25:32 -08:00
powderluv
13a255afad Update nightly.yml 2023-02-15 17:11:38 -08:00
powderluv
3bffda52f9 Pin to latest diffusers (#1031) 2023-02-15 14:23:10 -08:00
Daniel Garvey
d4e62ce557 add an import-mlir fallback in case of failure (#1030)
may not cover all cases; will observe

Co-authored-by: dan <dan@nod-labs.com>
2023-02-15 16:15:23 -06:00
yzhang93
9738483b18 [SD] Map v2_1 to v2_1_base until fix (#1029) 2023-02-15 13:44:41 -08:00
Abhishek Varma
143492fe94 [SD] Add support for standalone Vae checkpoints (#1020)
-- This commit adds support for standalone Vae checkpoints.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-15 12:17:32 -08:00
Gaurav Shukla
ecc5c662c4 [SD] Save output images to different loc every day (#1027)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-15 12:16:36 -08:00
yzhang93
d973ba191d Add conditions to force use --import_mlir (#1028) 2023-02-15 10:37:09 -08:00
Gaurav Shukla
0198b183a2 [SD] Img2Img works for limited schedulers.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-15 23:06:28 +05:30
Gaurav Shukla
0d44a3527b [SD][web] Add strength UI for img2img
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-15 22:47:41 +05:30
Gaurav Shukla
2147b6a397 [SD] Move some common code to utility
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-15 22:47:41 +05:30
Gaurav Shukla
6b5b4ba27b [SD] Add batch count in Image2Image
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-15 22:47:41 +05:30
Gaurav Shukla
67005bf57c [SD] Update iree-vulkan-target-triple after device switch
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-15 22:47:41 +05:30
PhaneeshB
0430c741c6 add strength param 2023-02-15 20:59:03 +05:30
powderluv
1ce02e365d Update README.md 2023-02-15 01:22:28 -08:00
m68k-fr
eae862adc2 Fix lint and path for gradio_tmp_imgs_folder 2023-02-15 14:27:29 +05:30
drumicube
dffa89524a Save gradio tmp images to shark_tmp folder and clean it at launch 2023-02-15 14:27:29 +05:30
yzhang93
2af1102441 [SD] Merge configs of different max lengths from the same variant into one config file (#1019) 2023-02-15 00:25:29 -08:00
powderluv
c4b472842a Update stable_diffusion_amd.md 2023-02-14 19:02:20 -08:00
powderluv
750a7d806f update docs to 3.11 2023-02-14 17:12:09 -08:00
powderluv
bc7333f1e5 Remove forcing LLPC setting (#1018)
also fix logo paths
2023-02-14 17:09:03 -08:00
powderluv
55ae50f991 Update inpaint.py 2023-02-14 14:12:05 -08:00
powderluv
a590c331ef Update img2img.py 2023-02-14 14:11:50 -08:00
powderluv
8c241b06cb Update txt2img.py 2023-02-14 14:11:36 -08:00
powderluv
9c072c8068 Update index.py 2023-02-14 14:11:20 -08:00
powderluv
ebd8b5122a Update stable_diffusion_amd.md 2023-02-14 14:09:34 -08:00
powderluv
055e484a40 Update README.md 2023-02-14 14:06:46 -08:00
powderluv
912c4a1d12 Update shark_sd.spec 2023-02-14 13:21:29 -08:00
Abhishek Varma
c203b65bf1 Fix __file__ AttributeError + Remove --enable_stack_trace (#1015) 2023-02-14 07:55:02 -08:00
powderluv
307f0334ee Drop im2col for VAE since it crashes the driver (#1010)
This is for untuned models.
2023-02-13 19:02:51 -05:00
yzhang93
5167df08b9 [SD] Fix cuda OTF annotation (#1008) 2023-02-13 12:32:50 -08:00
Gaurav Shukla
dd2e482214 [SD] Fix multiple call to device check (#1007)
- Also makes the dark theme default.
- Fix custom_vae parameter in img2img.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-13 11:57:52 -08:00
Eliasj42
87fd13d8eb added an example to run sharded bloom (#1003)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-02-13 10:37:47 -08:00
yzhang93
dd423bc6de [SD] Using --compile-to to dump mlir for OTF annotation (#1004)
* [SD] Using --compile-to to dump mlir for preprocessing

* Use python api for dumping process
2023-02-13 09:17:59 -08:00
powderluv
899cb9cc1f Temporarily disable signing of exe 2023-02-12 20:37:42 -08:00
drumicube
0464c7e558 Add support for command arguments to the WebUi (#1000)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-02-11 19:20:21 -08:00
powderluv
f64e1fb926 Fix dark theme again for exe builds (#1001) 2023-02-11 19:08:17 -08:00
powderluv
ef7d31293d Update tests to 3.11 2023-02-11 15:38:27 -08:00
powderluv
6d54eb68dc update to support 3.11 2023-02-11 15:23:18 -08:00
powderluv
30eb10c990 Update to 3.11 2023-02-11 03:47:14 -08:00
Abhishek Varma
591bbcd058 [SD] Fix vmfb locating bug
-- This commit fixes a bug in vmfb caching due to vae_encoder and also
   involves a minor NFC change in the code.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-10 23:33:47 +05:30
Abhishek Varma
99aa77d036 [SD] Add a common way to name vmfbs including custom_vae
-- This commit adds a common way to name vmfbs and adds to it `custom_vae`
   support as well.
-- This was required to provide a single place to change vmfb names
   without breaking any feature support and also to handle the caching
   of vmfbs gracefully.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-10 23:33:47 +05:30
Abhishek Varma
9c13f1e635 Add custom vae support using --custom_vae flag
-- This commit adds custom vae support to SD wherein the user can
   point to a model's checkpoint file whose Vae needs to be plugged
   into the main model.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-10 23:33:47 +05:30
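A rough sketch of the precedence the --custom_vae flag implies: a user-supplied VAE checkpoint replaces the base model's VAE; load_vae_from_checkpoint() and the pipeline object are hypothetical stand-ins.

# Sketch (hypothetical helpers): plug a standalone VAE checkpoint into the SD pipeline.
def select_vae(args, base_pipeline):
    if args.custom_vae:
        return load_vae_from_checkpoint(args.custom_vae)  # hypothetical loader
    return base_pipeline.vae  # fall back to the VAE bundled with the base model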
Gaurav Shukla
24af983cfb [SD] Fix input image type
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-10 23:27:52 +05:30
Gaurav Shukla
67842a7525 [SD] Fix parameters in img2img
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-10 22:03:33 +05:30
PhaneeshB
3159a6f3e1 add support for img2img 2023-02-10 21:29:02 +05:30
Gaurav Shukla
b2f3c96835 [SD][web] Add Img2Img UI
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-10 21:27:31 +05:30
jinchen62
6582475955 Add SD inpainting
python apps/stable_diffusion/scripts/inpaint.py --prompt="prompt" --img_path=path/to/img --mask_path=path/to/mask --import_mlir --max_length=77 --hf_model_id="stabilityai/stable-diffusion-2-inpainting"
2023-02-10 15:33:20 +05:30
Anush Elangovan
41ee65b377 Revert "Enable --device_allocator=caching"
This reverts commit 83fe477066.
2023-02-09 23:00:06 -08:00
Anush Elangovan
83fe477066 Enable --device_allocator=caching 2023-02-09 22:58:46 -08:00
yzhang93
4ca84ee4ee Revert "Delete unnecessary arg setting (#978)" (#985)
This reverts commit 83c69ecd49.
2023-02-09 16:44:26 -08:00
Ean Garvey
c28cc4c919 Fix local_tank_cache handling in shark_downloader. (#981) 2023-02-09 14:52:03 -06:00
yzhang93
e9864cb3f7 Modify the annotation OTF to return bytecode module (#980) 2023-02-08 14:29:43 -08:00
yzhang93
83c69ecd49 Delete unnecessary arg setting (#978) 2023-02-08 10:30:18 -08:00
Prashant Kumar
3595b4aaff Incorporate latest changes in the shark_dynamo backend. 2023-02-08 20:37:30 +05:30
Abhishek Varma
3a9cfe113a Fix SD restart error in exe file (#975)
-- This commit fixes SD restart error in exe file by creating
   variants.json in CWD instead of a relative path.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-08 06:14:08 -08:00
yzhang93
c9966127da Fix iree flags to be able to run on rdna2 (#972) 2023-02-07 16:39:32 -08:00
Ean Garvey
51300d33a7 Remove non-SD args from generate_sharktank.py (#970) 2023-02-07 13:29:55 -06:00
Gaurav Shukla
5af124c5a5 [SD] Add batch count in stable diffusion
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-07 23:26:46 +05:30
Abhishek Varma
eeb20b531a Fix restart SD session error + override args.use_tuned temporarily
-- This commit fixes the session restart error for SD.
-- It also overrides `args.use_tuned` for `import_mlir`, and sets
   `use_tuned` as `False`.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-07 19:50:48 +05:30
cstueckrath
9dca842c22 Update .gitignore to exclude models (#967)
The models folder would otherwise be stashed along with other changes, most likely killing git in the process.
2023-02-07 01:48:36 -08:00
Ean Garvey
1eb9436836 Fix generate_sharktank args. 2023-02-07 14:06:07 +05:30
Ean Garvey
9604d9ce81 make --update_tank update only if hash mismatch 2023-02-07 14:06:07 +05:30
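A minimal sketch of the "only update on hash mismatch" idea behind the commit above, assuming an expected hash is published alongside each artifact; the helper name and hash scheme are assumptions.

# Sketch: skip re-downloading when the local artifact already matches the published hash.
import hashlib
import os

def needs_update(local_path, expected_sha256):
    if not os.path.exists(local_path):
        return True
    with open(local_path, "rb") as f:
        local_hash = hashlib.sha256(f.read()).hexdigest()
    return local_hash != expected_sha256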
Ean Garvey
481d0553d8 Remove unnecessary repro_dir / shark_tmp usage 2023-02-07 14:06:07 +05:30
powderluv
60035cd63a Add css in exe (#963)
exe should now default to dark theme too
2023-02-06 15:26:08 -08:00
drumicube
d35f992ace Bring back the --runs options for the cmd command and fix wrong seed/model reported in json, csv and png (#962) 2023-02-06 15:16:50 -06:00
Daniel Garvey
157ae64f9d print to stdout for test visibility (#937)
Co-authored-by: dan <dan@nod-labs.com>
2023-02-06 01:03:27 -08:00
powderluv
ffa17f6057 Update sd_dark_theme.css 2023-02-06 01:01:50 -08:00
drumicube
d695a43e32 Make the dark theme default while launching web server (#954) 2023-02-05 07:25:45 -08:00
powderluv
01f6b4e6f0 Update README.md 2023-02-04 23:40:13 -08:00
yzhang93
7cf31a6ae4 Fix iree-benchmark flag names (#952) 2023-02-04 22:24:18 -08:00
Quinn Dawkins
fbd6224b04 Revert "Revert pipelines (#948)" (#951)
This reverts commit 8115b26079.
Additionally fixes img2col by adding detach elementwise from named op
passes.
2023-02-04 22:44:08 -05:00
powderluv
8115b26079 Revert pipelines (#948)
* Revert "[SD] Modify the flags to use --iree-preprocessing-pass-pipeline (#914)"

This reverts commit a783c089a9.

* Revert "Fix iree flags due to the change in shark-runtime (#944)"

This reverts commit 1d38d49162.
2023-02-04 07:09:51 -08:00
powderluv
820586ac68 Update README.md 2023-02-04 01:01:11 -08:00
powderluv
4a7441ed07 Update profiling_with_iree.md 2023-02-04 00:47:57 -08:00
powderluv
383741f284 Update stable_diffusion_amd.md 2023-02-04 00:40:47 -08:00
powderluv
2bbc4e0e9f Update README.md 2023-02-04 00:35:40 -08:00
powderluv
a7237244b0 Send users to the .exe file first 2023-02-04 00:30:32 -08:00
yzhang93
1d38d49162 Fix iree flags due to the change in shark-runtime (#944) 2023-02-03 21:34:02 -08:00
yzhang93
a783c089a9 [SD] Modify the flags to use --iree-preprocessing-pass-pipeline (#914)
* [SD] Modify the flags to use --iree-preprocessing-pass-pipeline

* Fix flags in sd_annotation
2023-02-03 15:08:02 -08:00
powderluv
e7907dc532 Disable tuned models for sm_89 (#943)
Looks like tuning on A100 doesn't necessarily translate to 40xx.
2023-02-03 14:30:46 -08:00
powderluv
394413679d Fix ckpt_dir (#939) 2023-02-03 12:54:19 -08:00
powderluv
37189f14cb roll to 492 2023-02-03 11:59:18 -08:00
powderluv
0b1ee81901 Minor webui changes (#938) 2023-02-03 11:26:45 -08:00
Gaurav Shukla
00cf73f9b8 [SD] Merge model id dropdown and .ckpt dropdown (#936)
- use_tuned is set to False for custom checkpoints.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-03 10:43:33 -08:00
Abhishek Varma
5a5f285493 [apps-SD] Prepone loading of vmfbs + restructure the SD pipeline
-- This commit prepones loading of vmfbs, if present, for all sub-models.
-- It also restructures the SD pipeline to load the vmfbs smoothly and
   defers processing of checkpoint files until required.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-02-03 20:21:24 +05:30
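A sketch of the "load a cached vmfb first, compile only as a fallback" flow the commit above moves toward; load_vmfb() and the compile callback are hypothetical stand-ins, not the project's API.

# Sketch: per sub-model, prefer the cached vmfb and only compile when it is missing.
import os

def get_module(vmfb_path, compile_fn):
    if os.path.isfile(vmfb_path):
        return load_vmfb(vmfb_path)  # hypothetical: load the precompiled module
    return compile_fn()              # hypothetical: compile from checkpoint/MLIR on demand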
powderluv
7f2ea454b6 revert /base variants as they are different (#929)
sd2_1base is different from VAE base (for older cards)
2023-02-03 01:27:32 -08:00
Daniel Garvey
7c14002118 Map 2_1 to 2_1_base (#927)
* fix broken paths for older models

* adds a mapping from sd_2_1 to sd_2_1_base

we only have models in models_db for 2_1_base.
now that diffusers is fixed we can actually generate
2_1 itself, but until we add support for that in the tank
we should fetch 2_1_base for no-import_mlir

---------

Co-authored-by: dan <dan@nod-labs.com>
2023-02-02 19:03:19 -08:00
powderluv
3e9554f0a1 roll to 487 2023-02-02 19:02:39 -08:00
Daniel Garvey
e11ffec544 fix broken paths for older models (#926)
Co-authored-by: dan <dan@nod-labs.com>
2023-02-02 15:48:19 -08:00
powderluv
8a47ddbe99 Update models/ location in UI (#925)
default to png metadata on
2023-02-02 15:28:39 -08:00
powderluv
821108c7bd Fix models path (#924) 2023-02-02 15:16:00 -08:00
Gaurav Shukla
339738f8a3 [SD][web] Populate checkpoints as dropdown UI (#918)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-02 13:59:50 -08:00
powderluv
9b90672f63 Fix LLPC env var (#920) 2023-02-02 11:45:08 -08:00
Ean Garvey
ba07e94a5e disable Torch Inductor autotuner in benchmarks (#919) 2023-02-02 13:25:43 -06:00
aldesilv
b3fc0f29cc enable additional flags for tank test models (#866)
Co-authored-by: Alex <alexander@nod-labs.com>
2023-02-02 11:19:33 -08:00
Gaurav Shukla
5c7deb3611 [SD] Fix output image location (#917)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-02-02 09:50:37 -08:00
Daniel Garvey
15604e374f change bytecode model paths (#913)
Co-authored-by: dan <dan@nod-labs.com>
2023-02-02 11:12:13 -06:00
Abhishek Varma
7cfc0fa55b [APPS-SD] Fix a few bugs and bring it up to speed with SD CLI (#908) 2023-02-02 07:12:01 -08:00
Ean Garvey
a90812133b Enable pytests on Windows (#901) 2023-02-01 18:36:41 -06:00
powderluv
e26a70aa4f Drop old cli and webui (#911) 2023-02-01 13:13:46 -08:00
Daniel Garvey
6a32a4e26c move ci sd stuff to apps (#912)
Co-authored-by: dan <dan@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-02-01 12:15:07 -08:00
powderluv
e853abf98b Update stable_diffusion_amd.md 2023-02-01 11:11:58 -08:00
powderluv
51e81e6ef8 update main readme 2023-02-01 11:09:00 -08:00
powderluv
e355000ceb Drop torchvision 2023-02-01 10:26:37 -08:00
Daniel Garvey
e374074013 Windows test (#896)
* add generate_sharktank for stable_diffusion model defaults

* add windows test for sd

---------

Co-authored-by: dan <dan@nod-labs.com>
2023-02-01 12:03:54 -06:00
powderluv
81e3d1c2c6 switch to apps/ 2023-02-01 06:54:20 -08:00
powderluv
ab0cbb4475 Add PyInstaller for apps/ webui and cli (#909)
tested webui, cli and webui exe and cli exe
2023-02-01 06:51:27 -08:00
powderluv
1c64e40722 Add PyInstaller for apps/ (#907)
Build with pyinstaller.exe .\apps\stable_diffusion\web\shark_sd.spec

normal flow works. exe is missing a few json files
2023-02-01 06:04:49 -08:00
Evan Guan
8cafe56eb4 Added flags for metadata information. (#894) 2023-02-01 05:16:11 -08:00
Eliasj42
3eceeb7b23 fixed a bug that would sometimes cause intel-gpu to appear unsupported (#899)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-01-31 22:32:05 -08:00
powderluv
1a37675435 Revert "move beta to release (#898)" (#905)
This reverts commit 7edcaf5a06.
2023-01-31 20:31:41 -08:00
powderluv
198ebede8d Revert "replace new model_db.json (#902)" (#904)
This reverts commit 842adef29c.
2023-01-31 20:29:40 -08:00
Ean Garvey
a504903dd5 Fix formatting issues. (#903) 2023-02-01 09:12:45 +05:30
Daniel Garvey
842adef29c replace new model_db.json (#902) 2023-01-31 18:55:22 -08:00
Daniel Garvey
7edcaf5a06 move beta to release (#898)
Co-authored-by: dan <dan@nod-labs.com>
2023-01-31 17:14:08 -06:00
Gaurav Shukla
c124b76328 [SD] Reorganize the stable diffusion model. (#806)
The stable diffusion codebase has been reorganized to make it more
modular so that the same script can be used for web as well as cli,
instead of duplicating the whole codebase.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-31 14:42:41 -08:00
aldesilv
e9c744ee5d find rocm arch used in rocminfo (#893)
Co-authored-by: Alex <alexander@nod-labs.com>
2023-01-31 10:22:31 -08:00
Ean Garvey
83302930d8 Update generate_sharktank.py (#897) 2023-01-31 10:21:22 -08:00
Daniel Garvey
a4634632ba add generate_sharktank for stable_diffusion model defaults (#742)
Co-authored-by: dan <dan@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-01-31 09:44:54 -08:00
Abhishek Varma
d17e8dc5ad [NFC] Rename SD negative_prompts flag
-- This commit renames SD `negative-prompts` -> `negative_prompts` flag.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-31 21:38:59 +05:30
powderluv
9fe63de4d4 Pin macOS SDK to 216 2023-01-31 01:09:44 -08:00
Eliasj42
8111f8bf35 added ability to select gpu (#891)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-01-30 13:39:12 -08:00
Abhishek Varma
fcd62513cf [SD-CLI] Add support for .safetensors + Use diffusers pipeline to load SD
-- This commit uses `load_pipeline_from_original_stable_diffusion_ckpt`
   as exposed by the [Diffusers PR](https://github.com/huggingface/diffusers/pull/2019).
-- It also adds support for end users to use `.safetensors` along
   with `.ckpt` files.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-31 00:00:37 +05:30
Abhishek Varma
c3c701e654 Update requirements.txt + README.md of SD
-- This commit includes two Python modules as part of requirements.txt.
-- It also updates README.md to include `--no-use_tuned` so users can
   try `hf_model_id` or `ckpt_loc` without any issue.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-30 14:12:54 +05:30
Daniel Garvey
6bf991edf6 adding more robust main.py testing (#889)
Co-authored-by: dan <dan@nod-labs.com>
2023-01-30 00:14:26 -08:00
yzhang93
9644e78545 Fix CUDA tuned model annotation (#880) 2023-01-27 11:35:18 -08:00
dymil
c911189ef0 Add note about latest RDNA3 driver support (#881)
Also tweak other wording
2023-01-27 09:39:19 -08:00
Abhishek Varma
1118b4b651 [SD-CLI] Clean up vmfbs if a retry method fails
-- This commit cleans up vmfb files generated as a result of retry method.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-27 21:55:36 +05:30
PhaneeshB
4be75d4418 fix seed values in SD json and filename 2023-01-27 18:40:26 +05:30
Ean Garvey
fb6beae27c Adds pytest-forked dependency to fix pytest memory accumulation issues. (#876)
* Minor improvements to test-models workflow

- cleaned up pytest command line args in Validate Models job scripts.
- Removed -s flag to provide more readable logs
- Changed shark_cache location to within github workspace and removed --update_tank flag from Linux workflows.

* Use pytest-forked for managing pytest memory usage.
2023-01-26 18:20:15 -06:00
yzhang93
fee73b0b63 Add SD model annotation on fly (#869)
* Add SD model annotation on fly

* Move tuned_compile_through_fx to utils

* Fix SD compilation flags
2023-01-26 11:46:36 -08:00
powderluv
9bbffa519e Add an option to respect LLPC env var (#875)
Also add OSX paths
2023-01-25 13:56:55 -08:00
jinchen62
c3a641f0ab Address TODOs for dataset annotator (#872)
- add args usage, pass gs_url by CL flag
- add support for no existing prompts
2023-01-25 09:28:23 -08:00
yzhang93
aafe7c4701 Add more cuda devices to use tuned model (#868) 2023-01-25 06:36:17 -08:00
Abhishek Varma
9a0b082cf8 [SD-CLI] Add batch_size command-line arg + prompt processing
-- This commit adds `batch_size` command-line arg.
-- It also involves replicating the prompt `batch_size` no. of times.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-25 19:21:25 +05:30
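A one-line sketch of the prompt replication described above: the single prompt is repeated batch_size times before being fed to the pipeline; the values are illustrative.

# Sketch: replicate the prompt batch_size times (batch_size comes from the CLI arg).
batch_size = 4
prompt = "a photo of an astronaut riding a horse"
prompts = [prompt] * batch_size
print(prompts)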
powderluv
8265e34a29 Add SHARK SD CLI tool (#870) 2023-01-24 23:14:32 -08:00
powderluv
8ef8ae097f Update to build 469 2023-01-24 22:16:13 -08:00
powderluv
c3d14293c0 Update sample results 2023-01-24 22:14:06 -08:00
powderluv
d55d8be504 Add signing of release builds 2023-01-24 21:32:21 -08:00
powderluv
03543030d3 use pefile 2023-01-24 18:35:51 -08:00
powderluv
fc6b474b92 Add ordlookup to requirements.txt 2023-01-24 18:30:16 -08:00
powderluv
a5db785dd7 checkoutv2 on windows 2023-01-24 18:23:22 -08:00
powderluv
1c1c5cd611 Build Windows nightly on 7950x 2023-01-24 16:21:56 -08:00
Abhishek Varma
6ed02f70ec [SD-CLI] Make using ckpt_loc and hf_model_id easier
-- Currently we require users to specify the base model on which the custom
   model (.ckpt) is tuned. Even for running a Hugging Face repo id, we
   require users to go through a tedious process of adding entries to variants.json.

-- This commit aims to address the above issues and will be treated as a
   starting point for a series of design changes which make using SHARK's SD
   easier.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-24 23:03:46 +05:30
Prashant Kumar
cb78cd8ac0 Add the support for the batch size parameter. 2023-01-24 22:33:13 +05:30
Ean Garvey
0c4590b45a Update generate_sharktank.py 2023-01-24 10:18:03 +05:30
jinchen62
d2e2ee6efa Add multiple prompts support for dataset annotator (#862) 2023-01-23 18:40:36 -08:00
powderluv
6a380a0b48 Add more nvidia cards 2023-01-23 17:07:45 -08:00
powderluv
e5d5acbf1f Remove torchvision requirements from web (#860) 2023-01-23 13:48:53 -08:00
powderluv
00e38abbf0 Add 4080 support 2023-01-23 09:56:34 -08:00
Abhishek Varma
e3e4ea5443 Update README.md
-- Make usage of `hf_model_id` clearer.
2023-01-23 23:25:23 +05:30
Prashant Kumar
a3e4ea3228 Remove the dependency on torchvision. (#858)
Remove the dependency on the torchvision library for converting the
tensor layout format to what the PIL library expects.
2023-01-23 08:49:57 -08:00
powderluv
56f16d6baf Update SD readme 2023-01-23 06:51:54 -08:00
Abhishek Varma
7a55ab900e [SD-CLI] Fix CKPT script + add more variants + update README.md
-- This commit fixes the CKPT script to rely on the previous CKPT-to-Diffusers
   script.
   TODO: Drop the script once CKPT support is included in the next release
         of diffusers.
-- It also adds many variants as part of `variants.json` and updates
   `README.md` to reflect change in default `hf_model_id`.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-23 18:34:24 +05:30
Abhishek Varma
137643fe72 [SD-CLI] Update README.md of custom models to include hf_model_id 2023-01-23 11:37:13 +05:30
Anush Elangovan
d6e59c6241 black format comments 2023-01-22 16:34:40 -08:00
powderluv
458eb5d34c detect RX 7900 better 2023-01-22 16:32:27 -08:00
Erkin Alp Güney
8259f08864 Collapsibles for Win10 and Linux users (#851)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-01-22 09:50:33 -08:00
Prashant Kumar
b3ab0a1843 Add width and height support for the scheduler. 2023-01-22 23:16:50 +05:30
dependabot[bot]
f09f217478 Bump tensorflow from 2.10 to 2.10.1 (#853)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.10 to 2.10.1.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.10.0...v2.10.1)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-22 06:40:17 -08:00
Daniel Garvey
e842c8c19b add main.py testing for sdiff (#836)
Co-authored-by: dan <dan@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-01-22 01:16:17 -08:00
powderluv
f6c3112d44 Revert "potential fix to pre-load DLL dir for torch-mlir (#848)" (#852)
This reverts commit 6c470d8131.
2023-01-22 00:09:35 -08:00
yzhang93
7059610632 Modify the default for --hf_model_id flag 2023-01-21 11:21:47 +05:30
powderluv
2d272930d9 Update to signed build 455 2023-01-20 16:50:42 -08:00
powderluv
6c470d8131 potential fix to pre-load DLL dir for torch-mlir (#848)
Doesn't regress the main.py script but system already pre-loaded
the DLL so needs more testing.
2023-01-20 14:48:45 -08:00
jinchen62
30b29ce8cd Add readme for dataset annotator (#847) 2023-01-20 01:03:33 -08:00
jinchen62
1a9933002f Add dataset annotation tool (#835) 2023-01-19 16:56:08 -08:00
stanley
c4a9365aa1 [Shark][Training] Refresh SharkTrainer to latest APIs. 2023-01-19 20:30:15 +00:00
Prashant Kumar
9d3af37104 bugfix related to the height width params. 2023-01-20 00:21:44 +05:30
Prashant Kumar
7b3d57cff7 Add height and width as args. 2023-01-19 23:43:29 +05:30
Abhishek Varma
a802270da9 [SD-CLI] Update README.md about variants.json 2023-01-19 22:46:54 +05:30
Abhishek Varma
dd194a8758 [SD-CLI] Reorder loading of opt_params when needed
-- This commit reorders loading of opt_params when `import_mlir` is not used.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-19 22:02:51 +05:30
Abhishek Varma
6de02de221 [SD-CLI] Make using custom models easier
-- This commit makes using custom models easier using a combination of
   `import_mlir`, `ckpt_loc` and `hf_model_id`.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-19 22:02:36 +05:30
Abhishek Varma
85259750bf [SD-CLI] Fix variants.json mapping
-- This commit fixes variants.json's mapping.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-19 22:02:36 +05:30
Prashant Kumar
1249f0007d Remove args.variant and args.version with args.custom_model. 2023-01-19 19:55:12 +05:30
Abhishek Varma
db0514d3fa [SD-CLI] Fix get_model_configuration to use max_length
-- This commit fixes `get_model_configuration` to use `max_length`.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-19 19:10:04 +05:30
Abhishek Varma
dce42a7fad [SD-CLI] Fix args.max_length range check
This commit fixes args.max_length range check.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-19 18:26:23 +05:30
Prashant Kumar
ec0b380194 Refactor shark_tank models and custom models.
The custom models shouldn't depend on shark_tank in any way.
2023-01-19 13:56:11 +05:30
Ean Garvey
7f27b61c98 Update setup_venv.sh to install triton if BENCHMARK=1 2023-01-19 00:26:46 -06:00
Guy Nachshon
f0b3557b02 fix: replace malicious and deleted package (#833) 2023-01-18 13:41:05 -08:00
xzuyn
2a1d1c1001 make jpeg optimized and progressive (#820)
* GUI make jpeg optimized and progressive

* CLI make jpeg optimized and progressive
2023-01-17 16:35:36 -08:00
Abhishek Varma
df7eb80e5b [SD-CLI] Make custom_model take highest priority for generating models if present
-- This commit makes `custom_model` take highest priority for generating models if present.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-17 22:50:58 +05:30
Fraser Humphries
b9d947ce6f style: 🎨 Restore whitespace 2023-01-17 17:45:32 +05:30
Fraser Humphries
e6589d2454 fix: 🏗️ Add demo.css to spec file datas 2023-01-17 17:45:32 +05:30
Fraser Humphries
0f5ac6afcf fix: 🐛 resolve css file path relative to __file__
issues-816
2023-01-17 17:45:32 +05:30
Abhishek Varma
bc1bb1d188 [SD-CLI] Fix vmfb naming + update README.md for custom_model
-- This commit introduces a fix for .vmfb naming to strip away any
   non-alphanumeric characters from `custom_model` path.
-- It also updates the README.md to include the `custom_model` arg.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-17 16:27:54 +05:30
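A small sketch of stripping non-alphanumeric characters from a custom_model path when building a vmfb name, as the commit above describes; the exact naming scheme is an assumption.

# Sketch: sanitize a checkpoint path into a safe vmfb name component.
import re

def vmfb_name_from_custom_model(custom_model_path, device):
    safe = re.sub(r"[^A-Za-z0-9]", "_", custom_model_path)
    return f"{safe}_{device}.vmfb"

print(vmfb_name_from_custom_model("models/my model v1.ckpt", "vulkan"))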
Abhishek Varma
3af2dd10ce [SD-CLI] Add CKPT support to update models irrespective of import_mlir flag
-- This commit adds CKPT support to update models irrespective of `import_mlir` flag.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-17 13:24:27 +05:30
yzhang93
dd22c65855 Add CUDA tuned models for SD variants (#814) 2023-01-16 09:38:27 -08:00
PhaneeshB
48137ced19 add png as default format 2023-01-16 18:37:36 +05:30
Phaneesh Barwaria
6eb47c12d1 add multi-run in single execution (#812) 2023-01-13 11:12:43 -08:00
Prashant Kumar
5a1fc6675a This PR adds --import-mlir for f16 tensors without cuda. 2023-01-13 22:19:53 +05:30
Prashant Kumar
6f80825814 Modify import_with_fx to import with dtype=f16. 2023-01-13 22:19:53 +05:30
PhaneeshB
f0dd48ed2a remaining disk space warning 2023-01-13 19:34:05 +05:30
Gaurav Shukla
15e2df0db0 [SD][web] Add a UI textbox to show the output location
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-13 19:33:04 +05:30
Fraser Humphries
4ad0109769 fix: 🐛 Extract demo css string to css file
fix: 🐛 Extract demo css string to css file

issues/807

fix: 🐛 Revert background colors
2023-01-13 16:42:05 +05:30
PhaneeshB
ee0009d4b8 pythonize uname for cpu target triple in windows 2023-01-12 22:39:49 +05:30
PhaneeshB
9d851c3346 small fixes 2023-01-12 22:32:24 +05:30
xzuyn
5d117af8ae Increase JPEG output quality & disable subsampling (#801)
* Increase JPEG output quality & disable subsampling

Increased to JPEG quality 95 from the default 75, which is far too compressed. Output image size is now ~100 kB; previously it was ~20 kB.

* Increase JPEG output quality & disable subsampling

Add jpeg quality increase on cli

* line length changes

* line length changes
2023-01-11 23:06:11 -08:00
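A hedged Pillow sketch of the JPEG settings the commits above describe (quality 95, optimized, progressive, subsampling disabled); the image and file name are illustrative.

# Sketch: save a generated image as a higher-quality, optimized, progressive JPEG.
from PIL import Image

img = Image.new("RGB", (512, 512))
img.save("output.jpg", format="JPEG", quality=95, optimize=True, progressive=True, subsampling=0)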
yzhang93
bb41c2d15e Add VAE cuda tuned model (#796) 2023-01-11 14:15:03 -08:00
powderluv
eba138ee4a Revert "Change address for connection test (#785)" (#797)
This reverts commit 187f0fa70c.
2023-01-11 12:01:37 -08:00
Gaurav Shukla
3b2bbb74f8 [SD][web] Add support for saving generated images
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-11 22:47:32 +05:30
fokin33
dbc0f81211 Add simple telegram bot (#787) 2023-01-11 09:20:23 -06:00
mariecwhite
d0b613d22e Enable Torch-Inductor Part 2 2023-01-10 20:15:29 -08:00
Ean Garvey
72f29b67d5 Add Resnet50 fp16 variant to pytests. (#760) 2023-01-10 16:31:11 -08:00
Quinn Dawkins
9570045cc3 Fix tuned model selection for non-vulkan devices (#792) 2023-01-10 19:04:21 -05:00
Phaneesh Barwaria
e4efdb5cbb add json data for each image (#790)
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-01-10 13:13:07 -08:00
calcifer11
187f0fa70c Change address for connection test (#785)
Some ISPs (like mine) reserve 1.1.1.1 for internal testing, meaning _internet_connected() needlessly retries for a minute until it fails even though the connection is fine.
Propose 8.8.8.8 instead, as it is also publicly available and not normally blocked by ISPs.
2023-01-10 10:51:30 -08:00
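A minimal sketch of the kind of connectivity probe _internet_connected() performs, using the 8.8.8.8 address the commit proposes; the port and timeout are assumptions.

# Sketch: treat a successful TCP connection to a public DNS server as "online".
import socket

def internet_connected(host="8.8.8.8", port=53, timeout=3.0):
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

print(internet_connected())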
Gaurav Shukla
472185c3e4 [SD][web] Fix device key error
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-10 20:51:01 +05:30
Gaurav Shukla
f94a571773 [SD] Update spec file to include model_config.json
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-10 20:38:10 +05:30
mariecwhite
183e447d35 Enable Torch Inductor (#784) 2023-01-10 20:57:58 +11:00
xzuyn
12f844d93a Git pull through argument in setup_venv (#623) 2023-01-09 15:42:13 -08:00
yzhang93
47a119a37f [SD] Add CUDA A100 tuned model (#773) 2023-01-09 15:22:27 -08:00
Gaurav Shukla
ee56559b9a [SD][web] Add a json file for model configuration
This cleans model_wrappers.py file.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-10 00:05:46 +05:30
Gaurav Shukla
00e594deea [SD][web] Add version number in performance details
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-09 21:32:34 +05:30
George Petterson
6ad9b213b9 Add GCN4
(cherry picked from commit 3be072b3c09c9b38bc2d79ad6e6900eefee49a1c)
2023-01-09 21:09:50 +05:30
PhaneeshB
e4375e8195 Add support for vulkan target env 2023-01-09 21:09:50 +05:30
mariecwhite
487bf8e29b Enable TF32 in Torch if specified (#768) 2023-01-09 06:48:57 -08:00
Prashant Kumar
fea1694e74 Delete the cached objects explicitly. 2023-01-06 23:04:52 +05:30
Prashant Kumar
4102c124a9 Add the shark upscaler model. (#759) 2023-01-05 14:07:20 -08:00
yzhang93
135bad3280 [SD] Update v1.4 tuned model (#758) 2023-01-05 11:04:30 -08:00
Gaurav Shukla
b604f36881 [SD][web] Add flags for global URL and server port
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-01-05 15:30:30 +05:30
yzhang93
782b449c71 Add script to auto annotate SD models and variants (#751)
* Add script to auto annotate SD models and variants

* Add model config files

* Add script to auto annotate SD models and variants

* Add model config files

* Move config files to shark_tank
2023-01-04 15:53:10 -08:00
jinchen62
017dcab685 Add target triple support for TITAN RTX (#756) 2023-01-04 15:39:00 -08:00
Abhishek Varma
e60b4568c6 [SharkInference] Make SharkInference compile the entire module (#708)
* [SharkInference] Make SharkInference compile the entire module

-- Previously SharkInference was compiling and providing run APIs
   for a hardcoded function named "forward".
-- This commit makes the compiling functionality generic, and now
   any function defined within the module can be run.
-- It also creates an API to fetch all the function names defined
   within the compiled module.
-- This commit updates both web and command-line execution of Stable
   Diffusion to use the new API of SharkInference.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-03 23:25:23 +05:30
powderluv
4ee3d95a5a Update to build 423
Post pytorch security breach
2023-01-01 12:10:23 -08:00
Graham
f18725bacc replaced <username> with %username% for easy copy/paste (#744) 2022-12-31 21:29:37 -08:00
jinchen62
f6064a2b84 Add a prototype of the model compilation configs for SD (#734) 2022-12-28 15:14:36 -08:00
Quinn Dawkins
2e90cb7b95 Set default warmup count to 0 (#736) 2022-12-28 12:27:43 -06:00
powderluv
2c09d63cd9 Update to build 417 2022-12-27 14:25:20 -08:00
powderluv
cc6fbdb0c3 Add sm_89 and point to nvcuda.dll (#731) 2022-12-26 10:54:38 -08:00
powderluv
ecfdec12f3 Update requirements.txt 2022-12-25 15:39:20 -08:00
Gaurav Shukla
45af40fd14 [SD][web] Add openjourney and dreamlike in SD web UI
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-26 01:59:36 +05:30
Phaneesh Barwaria
d11cf42501 Add support for dreamlike diffusion (#725)
* Add support for dreamlike diffusion

* model wrapper to support 77 dreamlike

* lint fix
2022-12-26 01:35:17 +05:30
Gaurav Shukla
c3c1e3b055 [SD] Add bucket info in the model_db.json
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-25 20:38:33 +05:30
Gaurav Shukla
7c5e3b1d99 [SD] Fix flags for cuda devices
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-25 19:03:02 +05:30
Gaurav Shukla
ed6cec71e7 [SD] Fix clip inference time
Fix clip inference time by adding default warmup_count to 5.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-25 18:16:53 +05:30
Tobby "GTD-Carthage" Ong
d6bcdd069c - Added missing double linebreak from linting 2022-12-25 12:07:43 +05:30
Tobby "GTD-Carthage" Ong
a26347826d - Revised code to also use get_schedulers function instead 2022-12-25 12:07:43 +05:30
Tobby "GTD-Carthage" Ong
5d1c099b31 [SD] Add Euler Ancestral scheduler as option to WebUI 2022-12-25 12:07:43 +05:30
Gaurav Shukla
220bee1365 [SD][web] Add device support in the SD web UI
1. Device selection is now available through the UI.
2. Model reloading will only happen when there is a change in the
   settings (variant + device).

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-25 01:45:07 +05:30
PhaneeshB
1261074d95 Add tuned models for av3 and ad 2022-12-24 22:56:15 +05:30
Stanley Winata
136021424c [SD] Change default VMA large heap block size for windows perf. (#715)
Windows performance improves from 2.67s/image to 2.4523s/image,
while Linux stays the same.
2022-12-24 01:40:58 +07:00
PhaneeshB
fee4ba3746 Add openjourney 2022-12-23 23:34:22 +05:30
Gaurav Shukla
a5b70335d4 [SD][web] Add variant support in the web UI
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-23 23:18:27 +05:30
Stanley Winata
5cf4976054 [Vulkan][utils] Add GTX Pascal support. (#709) 2022-12-22 15:24:15 -08:00
PhaneeshB
1aa3255061 Add vaebase for av3 and ad 2022-12-23 04:17:17 +05:30
Daniel Garvey
b01f29f10d add support for clear_all (#691) 2022-12-22 11:25:03 -06:00
Boian Petkantchin
2673abca88 Fix concurrency issue in stress_test for CUDA devices 2022-12-22 08:54:19 -08:00
Gaurav Shukla
7eeb7f0715 [SD] Update all the utilities to make web and CLI codebase closer (#707)
At this point, all the utilities of the SD web and CLI are exactly the same.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-22 02:49:48 -08:00
powderluv
37262a2479 Remove spurious characters 2022-12-21 19:23:54 -08:00
Gaurav Shukla
de6e304959 [SD] Fix the resource location in shark_sd.spec (#706) 2022-12-21 14:41:56 -08:00
Quinn Dawkins
234475bbc7 Add base_vae entries for variant models (#705) 2022-12-21 14:35:08 -08:00
Quinn Dawkins
abbd9f7cfc [SD] Set unet flags for cuda (#704) 2022-12-21 13:22:04 -08:00
Gaurav Shukla
dfd6ba67b3 [SD] Update SD CLI to use model_db.json
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-22 02:13:04 +05:30
yzhang93
1595254eab Modify model annotation tool to walk through ops by shape (#692) 2022-12-21 10:46:30 -08:00
PhaneeshB
6964c5eeba encapsulate relevant methods in one method 2022-12-21 23:56:17 +05:30
PhaneeshB
2befe771b3 Add support for automatic target triple selection for SD 2022-12-21 22:38:06 +05:30
Prashant Kumar
b133a035a4 Add the download progress bar. 2022-12-21 15:47:33 +05:30
Gaurav Shukla
726c062327 [SD] Update spec files
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-21 14:16:04 +05:30
Gaurav Shukla
9083672de3 [SD][web] Tuned models only for stablediffusion/fp16 and rdna3 cards
Currently tuned models are only available for stablediffusion/fp16 and
rdna3 cards.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-21 14:15:39 +05:30
Quinn Dawkins
cdbaf880af [SD] [web] Add model variants to web 2022-12-21 13:42:22 +05:30
Quinn Dawkins
9434981cdc Add random seed generation for seed = -1 in cli (#689) 2022-12-20 17:15:22 -05:00
Phaneesh Barwaria
8b3706f557 Add Anything v3 and AnalogDiffusion variants of SD (#685)
* base support for anythingv3

* add analogdiffusion

* Update readme

* keep max len 77 until support for 64 is added for variants

* lint fix
2022-12-20 13:08:13 -08:00
Gaurav Shukla
0d5173833d [SD] Add a json file for model names information. (#687)
This commit simplifies the code that identifies the model name for a
particular set of flags. This is achieved by introducing a json file
that stores the model name information. The models are uploaded to
gcloud under these names.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-20 11:47:31 -08:00
powderluv
bf1178eb79 roll to build 400 2022-12-20 10:34:31 -08:00
yzhang93
abcd3fa94a [SD] Set model max length 64 as default (#681) 2022-12-19 21:13:04 -08:00
Quinn Dawkins
62aa1614b6 [SD] Add --use_base_vae flag to do conversion to pixel space on cpu (#682) 2022-12-19 21:09:39 -08:00
Quinn Dawkins
7027356126 [SD] Fix warmup for max length 64 (#680) 2022-12-19 21:04:44 -05:00
yzhang93
5ebe13a13d Add Unet len 64 tuned model (#679) 2022-12-19 16:24:08 -08:00
Gaurav Shukla
c3bed9a2b7 [SD][web] Add flag to disable the progress bar animation
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-20 02:50:04 +05:30
yzhang93
f865222882 Update VAE 19dec tuned model (#676) 2022-12-19 12:42:28 -08:00
powderluv
e2fe2e4095 Point to 398 2022-12-19 12:08:30 -08:00
powderluv
0532a95f08 Update stable_diffusion_amd.md 2022-12-19 12:04:42 -08:00
Quinn Dawkins
ff536f6015 [SD] Deduplicate initial noise generation (#677) 2022-12-19 14:38:41 -05:00
Gaurav Shukla
097d0f27bb [SD][web] Add 64 max_length support in SD web
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-20 00:00:58 +05:30
Prashant Kumar
2257f87edf Update opt_params.py 2022-12-19 23:43:30 +05:30
PhaneeshB
a17800da00 Add 64 len f16 untuned mlir 2022-12-19 22:53:17 +05:30
Prashant Kumar
059c1b3a19 Disable vae --use_tuned version. 2022-12-19 22:45:45 +05:30
Stanley Winata
9a36816d27 [SD][CLI] Add a warmup phase (#670) 2022-12-20 00:14:23 +07:00
Gaurav Shukla
7986b9b20b [SD][WEB] Update VAE model and wrapper
This commit updates the VAE model, which significantly improves performance
by ~300ms.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-19 22:32:05 +05:30
Gaurav Shukla
b2b3a0a62b [SD] Move initial latent generation out of inference time
The initial random latent generation is no longer counted in the
total SD inference time.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-19 22:32:05 +05:30
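A rough illustration of the timing change described above, with stand-in functions instead of the real SD pipeline: the random latents are created before the timer starts, so only the denoising work is measured.

```python
import time
import random

def generate_initial_latents(width=64, height=64, seed=0):
    # Stand-in for the random latent tensor; a real pipeline would use
    # torch.randn with the model's latent shape.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]

def denoise(latents, steps=5):
    # Stand-in for the UNet denoising loop.
    for _ in range(steps):
        latents = [[v * 0.99 for v in row] for row in latents]
    return latents

latents = generate_initial_latents(seed=42)   # not counted in inference time
start = time.time()
result = denoise(latents)                     # only this region is timed
print(f"inference time: {time.time() - start:.4f}s")
```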
Prashant Kumar
3173b7d1d9 Update VAE model and wrapper. 2022-12-19 19:54:50 +05:30
Gaurav Shukla
9d716d70d6 [SD][web] Fix performance issues on shark scheduler
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-19 17:44:37 +05:30
Stanley Winata
e1901a8608 [SD][CL] Disable print at every iteration. (#664)
Printing at every iteration might add extra time to the runtime. Hence, we add a flag to hide it. To disable printing, please set the `--hide_steps` flag.

Co-authored-by: Stanley <stanley@MacStudio.lan>
2022-12-19 15:39:57 +07:00
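As a rough illustration of how such a flag is usually wired up (not the project's actual argument parser), a boolean argparse option can gate the per-iteration print:

```python
import argparse

parser = argparse.ArgumentParser()
# Flag named after the commit above; the real SD script defines many more options.
parser.add_argument(
    "--hide_steps",
    action="store_true",
    help="do not print progress at every denoising iteration",
)
args = parser.parse_args(["--hide_steps"])   # simulate passing the flag

for step in range(3):
    if not args.hide_steps:
        print(f"step {step}: ...")   # skipped when --hide_steps is set
```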
Quinn Dawkins
7d0cbd8d90 [SD][web] Set default tuned unet to v2 (#663) 2022-12-19 11:50:08 +07:00
Quinn Dawkins
59358361f9 [SD] Make clip batch 2 for positive and negative prompts (#662)
Combines the forward passes for the positive and negative prompts into a single batched CLIP pass.
2022-12-18 23:46:21 -05:00
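The batching idea above can be sketched as follows; the `text_encoder` here is a placeholder Embedding layer standing in for the real CLIP text model:

```python
import torch

def encode_prompts(text_encoder, pos_tokens, neg_tokens):
    # Instead of two separate forward passes (one per prompt), stack the
    # negative and positive token ids into a batch of 2 and run the text
    # encoder once.
    batch = torch.stack([neg_tokens, pos_tokens], dim=0)   # shape: (2, seq_len)
    embeddings = text_encoder(batch)                       # shape: (2, seq_len, dim)
    neg_embed, pos_embed = embeddings[0], embeddings[1]
    return pos_embed, neg_embed

if __name__ == "__main__":
    # Placeholder encoder: an embedding layer standing in for the CLIP text model.
    text_encoder = torch.nn.Embedding(49408, 768)
    pos = torch.randint(0, 49408, (77,))
    neg = torch.randint(0, 49408, (77,))
    pos_embed, neg_embed = encode_prompts(text_encoder, pos, neg)
    print(pos_embed.shape, neg_embed.shape)   # torch.Size([77, 768]) twice
```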
Quinn Dawkins
7fea2d3b68 [SD] update default large heap size for web as well (#661) 2022-12-18 21:50:26 -05:00
Quinn Dawkins
b6d3ff26bd [SD] Change default VMA large heap block size (#660) 2022-12-18 21:41:46 -05:00
Stella Laurenzo
523e63f5c1 Fix NoneType exception if vulkan tuning flags not detected. (#659)
(This goes on to produce compilation errors, but one step at a time)
2022-12-18 16:40:56 -08:00
Stella Laurenzo
10630ab597 Add config stanza for NVIDIA RTX 2080. (#658)
Just happened to have this card on my Windows machine and verified that the SD demo works on it.

```
Average step time: 144.26142692565918ms/it
Clip Inference Avg time (ms) = (205.001 + 44.000) / 2 = 124.501
VAE Inference time (ms): 281.001

Total image generation time: 7.856997728347778sec
```

I'd love to add an API upstream to derive compiler tuning flags from a host device.
2022-12-18 16:40:47 -08:00
Quinn Dawkins
2bc6de650d [SD] Add support for a compiled version of the discrete Euler scheduler (#657)
* Add Shark version of euler scheduler

* Add Shark version of euler scheduler to web ui
2022-12-17 19:25:43 -08:00
powderluv
ffef1681e3 Update stable_diffusion_amd.md 2022-12-17 03:40:08 -08:00
yzhang93
d935006a4a Update Unet tuned model to v2 (#656) 2022-12-16 22:10:15 -08:00
powderluv
660cb5946e Update to 392 release 2022-12-16 16:00:49 -08:00
Gaurav Shukla
10160a066a [SD][WEB] Add vae tuned model in the SD web (#653)
1. Add the tuned vae model to the SD web.
2. Use tuned models for rdna3 cards.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-16 15:29:48 -08:00
Anush Elangovan
72976a2ece Import env vars first 2022-12-16 15:12:28 -08:00
Phaneesh Barwaria
831f206cd0 Revert "Add target triple selection for multiple cards" (#655)
This reverts commit acb905f0cc.
2022-12-16 15:01:45 -08:00
Gaurav Shukla
72648aa9f2 Revert "[SD][WEB] Deduce vulkan-target-triple in the presence of multiple cards"
This reverts commit 35e623deaf.
2022-12-17 04:28:18 +05:30
Gaurav Shukla
35e623deaf [SD][WEB] Deduce vulkan-target-triple in the presence of multiple cards
1. Get the correct vulkan-target-triple for a specified device in the
   presence of multiple cards.
2. Use tuned unet model for rdna3 cards.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-17 03:04:47 +05:30
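Although the change above was reverted, the idea it describes, mapping the selected Vulkan device's name to a target triple, can be sketched roughly as below; the substrings and the rdna2 triple spelling are guesses for illustration, not the project's actual table:

```python
# Illustrative device-name -> vulkan-target-triple mapping; the real logic in
# the project inspects the selected device and covers far more hardware, and
# the rdna2 entry here is a guess about the triple's spelling.
TRIPLES = {
    "7900": "rdna3-unknown-linux",
    "6800": "rdna2-unknown-linux",
}

def deduce_target_triple(device_name, default=None):
    name = device_name.lower()
    for key, triple in TRIPLES.items():
        if key in name:
            return triple
    return default

# With several cards present, the triple is deduced from the *selected*
# device rather than from whichever card happens to be enumerated first.
devices = ["AMD Radeon RX 7900 XTX", "AMD Radeon RX 6800"]
print(deduce_target_triple(devices[0]))   # rdna3-unknown-linux
print(deduce_target_triple(devices[1]))   # rdna2-unknown-linux
```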
Anush Elangovan
6263636738 Fix more lints 2022-12-16 13:26:15 -08:00
Anush Elangovan
535d012ded Fix lint 2022-12-16 13:24:51 -08:00
yzhang93
c73eed2e51 Add VAE winograd tuned model (#647) 2022-12-16 13:01:45 -08:00
Anush Elangovan
30fdc99f37 Set to enable llpc
Use an env var to enable llpc
2022-12-16 12:57:30 -08:00
PhaneeshB
acb905f0cc Add target triple selection for multiple cards 2022-12-17 02:24:37 +05:30
Gaurav Shukla
bba06d0142 [SD][WEB] Avoid passing args to utils APIs
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-17 01:41:33 +05:30
Ean Garvey
a14a47af12 Move most xfails to entries in tank/all_models.csv and temporarily remove multiprocessing and TF gpu support. (#646)
- Adds the date variable back to nightly.yml so shark_tank uploads are dated again.
- Adds a specification for nightly pytests to not run tests on metal (vulkan is sufficient).
- Adds some paths/filetypes to be ignored when triggering workflow runs (no test-models on changes to .md files or anything in the shark/examples/ directory or its subdirectories).
- pytest only picks up tank/test_models.py, so there is no need to specify which file to run when running pytest from the SHARK base directory.
- Cleans up xfails so that they can be added to models as csv entries. Columns 7-9 in all_models.csv trigger xfails with cpu, cuda, and vulkan, respectively, and column 10 can be populated with a reason for the xfails.
- Fixes a few defaults for shark_args and pytest args (defined in conftest.py).
- Fixes the --update_tank option in shark_downloader.
- Removes some multiprocessing in pytest / TF+CUDA support because it breaks pytest and causes false passes, leaving regressions at large.
- Adds xfails for and removes albert torch from the gen_sharktank list (tank/torch_model_list.csv).
- Cleans up xfails for cpu, cuda, vulkan (removing old ones).
2022-12-16 12:56:32 +05:30
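A rough sketch of reading per-device xfails from such a CSV layout, following the column description above (columns 7-9 for cpu/cuda/vulkan, column 10 for the reason); the sample rows and helper are invented and do not mirror the real all_models.csv or conftest.py:

```python
import csv
import io

# Hypothetical rows in the spirit of all_models.csv: the first columns describe
# the model, columns 7-9 flag xfails for cpu/cuda/vulkan, column 10 gives a reason.
SAMPLE = """\
alexnet,torch,linalg,1,224,224,False,False,False,
resnet50_fp16,torch,linalg,1,224,224,False,True,True,fp16 not supported on this backend
"""

def xfail_for(row, device):
    index = {"cpu": 6, "cuda": 7, "vulkan": 8}[device]   # 0-based columns 7-9
    return row[index].strip() == "True", row[9].strip()

for row in csv.reader(io.StringIO(SAMPLE)):
    for device in ("cpu", "cuda", "vulkan"):
        expected_fail, reason = xfail_for(row, device)
        if expected_fail:
            print(f"{row[0]} on {device}: xfail ({reason or 'no reason given'})")
```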
Phaneesh Barwaria
73457336bc add flag for toggling vulkan validation layers (#624)
* add vulkan_validation_layers flag

* categorize SD flags

* stringify true and false for flag
2022-12-15 20:40:59 -06:00
Ean Garvey
a14c53ad31 Remove albert-base-v2 since it fails torch_mlir.compile() (#644) 2022-12-15 16:05:19 -06:00
Gaurav Shukla
e7e763551a [WEB][SD] Make unet tuned model default for rdna3 devices (#642) 2022-12-15 12:02:03 -08:00
nirvedhmeshram
2928179331 Add more NVIDIA targets (#640) 2022-12-15 11:24:38 -06:00
Stanley Winata
24a16a4cfe [Stable Diffusion] Disable binding fusion to work with moltenVK on mac. (#639)
Co-authored-by: Stanley <stanley@MacStudio.lan>
2022-12-16 00:22:49 +07:00
Phaneesh Barwaria
6aed4423b2 add vulkan lib path (#638) 2022-12-15 19:48:29 +07:00
yzhang93
6508e3fcc9 Update tuned model SD v2.1base (#634) 2022-12-14 16:02:35 -05:00
Gaurav Shukla
a15cb140ae [WEB] Display the 512x512 image size
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-14 22:43:03 +05:30
Prashant Kumar
898bc9e009 Add the stable diffusion v2.1 version. 2022-12-14 20:19:41 +05:30
Gaurav Shukla
e67ea31ee2 [SHARK][SD] Add --local_tank_cache flag in the stable diffusion
This flag can be used to set the local shark_tank cache directory.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-14 20:00:25 +05:30
Gaurav Shukla
986c126a5c [SHARK][SD] Add support for negative prompts
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-14 18:20:09 +05:30
Gaurav Shukla
0eee7616b9 [WEB] Launch only one SD version at a time
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-14 17:30:24 +05:30
powderluv
5ddce749b8 lint fix 2022-12-13 22:02:32 -08:00
powderluv
d946cffabc Revert "Move most xfails to entries in tank/all_models.csv and temporarily remove multiprocessing and TF gpu support. (#602)" (#622)
This reverts commit fe618811ee.
2022-12-13 21:49:46 -08:00
Ean Garvey
fe618811ee Move most xfails to entries in tank/all_models.csv and temporarily remove multiprocessing and TF gpu support. (#602)
* Move most xfails to entries in tank/all_models.csv

* enable usage of pytest without specifying tank/test_models.py

* add dict_configs.py to gitignore.

* Pin versions for runtimes and torch-mlir for setup.
2022-12-13 18:11:17 -08:00
powderluv
09c45bfb80 clean up cache printf 2022-12-13 14:11:14 -08:00
Boian Petkantchin
e9e9ccd379 Add stress test 2022-12-13 13:21:51 -08:00
Boian Petkantchin
a9b27c78a3 Return dynamic model if specified when downloading from the tank 2022-12-13 13:21:51 -08:00
Boian Petkantchin
bc17c29b2e In get_iree_runtime_config get the specific device instead of the default 2022-12-13 13:21:51 -08:00
Boian Petkantchin
aaf60bdee6 Simplify iree_device_map 2022-12-13 13:21:51 -08:00
Gaurav Shukla
d913453e57 [WEB] Update models to 8dec and also default values (#620)
1. Update the models to 8 Dec.
2. precision now defaults to `fp16` in the CLI.
3. version now defaults to `v2.1base` in the CLI as well as the web.
4. The default scheduler is now set to `EulerDiscrete`.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-13 13:08:33 -08:00
powderluv
08e373aef4 Update stable_diffusion_amd.md 2022-12-13 11:47:29 -08:00
Prashant Kumar
4cb50a3d06 Update the models to 8th Dec version. 2022-12-14 00:01:46 +05:30
Gaurav Shukla
b03038222d [SHARK] Update dependencies
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-13 22:12:00 +05:30
Gaurav Shukla
5f5e0766dd [WEB] Add SD2.1 web support
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-13 21:36:01 +05:30
powderluv
48ec11c514 Build wheels (#613)
* Build wheels

* Update nightly.yml

* Update nightly.yml

* Update nightly.yml
2022-12-12 20:53:08 -08:00
Prashant Kumar
8ae76d18b5 Add euler scheduler. Also, make it default for sd2.1. 2022-12-13 00:03:45 +05:30
Prashant Kumar
e5be1790e5 Enable the v2.1 base version with --version="v2.1base". (#611) 2022-12-12 07:02:01 -08:00
powderluv
e64aa40b17 Add Windows nightly builder 2022-12-11 19:31:02 -08:00
mariecwhite
eb8114ece8 Initialize TF models locally (#610) 2022-12-12 11:35:34 +11:00
Ean Garvey
616ee9b824 Don't include baseline benchmarks if setup without IMPORTER=1. (#607) 2022-12-10 14:58:29 -06:00
Stanley Winata
57c94f8f80 [vulkan] Add "radeon" check to the default AMD triple (#604) 2022-12-10 09:05:48 -08:00
powderluv
2a59c4f670 Update stable_diffusion_amd.md 2022-12-09 16:54:47 -08:00
Boian Petkantchin
192ff487c4 Fix wrong path to script in tank readme (#598) 2022-12-09 11:51:17 -06:00
Gaurav Shukla
b62ee3fcb9 [WEB] Add schedulers in the web UI (#594)
1. Add schedulers option in web UI.
2. Remove the random seed checkbox, as the same functionality can be achieved
   by passing -1 (or any negative number) as the seed.

Signed-Off-by: Gaurav Shukla

Signed-off-by: Gaurav Shukla
2022-12-08 13:53:20 -08:00
Ean Garvey
0225292a44 Remove print statements from compile utils (#593) 2022-12-08 13:40:47 -08:00
Ean Garvey
589a7ed02f Print a message when a model is downloaded via shark_downloader. (#595) 2022-12-08 15:27:58 -06:00
Quinn Dawkins
b3a42cd0b1 Don't do nchw-to-nhwc transpose for stable diffusion models (#592) 2022-12-08 12:19:23 -05:00
Gaurav Shukla
e3e1ca7cc6 [WEB] Fix seed when out of uint32 range
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-08 22:46:33 +05:30
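Taken together, the seed commits above (random seed for negative values, and fixing seeds that fall outside the uint32 range) amount to a normalization step like the sketch below; the function name and exact wrapping policy are assumptions, not the project's code:

```python
import random

UINT32_MAX = 2**32 - 1

def normalize_seed(seed):
    # A negative seed (e.g. -1) requests a freshly generated random seed.
    if seed < 0:
        return random.randint(0, UINT32_MAX)
    # Seeds outside the uint32 range are wrapped back into it, since the
    # runtime's RNG expects a 32-bit unsigned value.
    return seed % (UINT32_MAX + 1)

print(normalize_seed(-1))           # random value in [0, 2**32 - 1]
print(normalize_seed(12345))        # 12345
print(normalize_seed(2**40 + 7))    # wrapped into uint32 range
```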
Gaurav Shukla
57e417d174 [WEB] Fix web performance
Set the iree flags before compilation.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-08 19:57:20 +05:30
Ean Garvey
1699db79b5 Disable SHARK-Runtime flags if USE_IREE=1 specified during setup. (#588)
* Disable SHARK-Runtime flags if USE_IREE=1 specified during setup.

* Update setup_venv.sh

* Autodetect cpu count for runtime flags.
2022-12-08 02:31:31 -06:00
Quinn Dawkins
dab9403b8f Fix slow conversion to image in SD web gui (#586) 2022-12-08 00:35:51 -05:00
Ean Garvey
9a14298146 Revert changes to multiprocessing (#585) 2022-12-07 19:59:17 -06:00
Ean Garvey
40eea21863 Enable conv nchw-to-nhwc flag by default for most models + minor fixes (#584) 2022-12-07 16:24:02 -08:00
Ean Garvey
d2475ec169 Add mnasnet to torch models and minor fixes. (#577)
* Minor fixes to benchmark runner

* Add Mnasnet to tank.
2022-12-07 22:30:58 +05:30
Ean Garvey
b3bcf4bf44 Update expected failures in pytest suite. (#574) 2022-12-06 23:05:12 -08:00
Stanley Winata
6049f86bc4 [Vulkan][Utils] Automatic platform/OS detection (#569)
To enable AMD gpus on macOS, we need this detection to let the compiler know that moltenVK is required to use this GPU.
2022-12-07 12:05:00 +07:00
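A small sketch of the kind of platform check described above, detecting macOS so the compiler can be pointed at MoltenVK; the helper name is illustrative, not the actual SHARK utility:

```python
import platform

def needs_moltenvk() -> bool:
    # macOS has no native Vulkan driver, so Vulkan work must go through
    # MoltenVK (Vulkan layered on top of Metal). Detecting the OS lets the
    # compiler invocation opt into the MoltenVK-specific flags.
    return platform.system() == "Darwin"

if __name__ == "__main__":
    if needs_moltenvk():
        print("macOS detected: compile for Vulkan via MoltenVK")
    else:
        print(f"{platform.system()} detected: use the native Vulkan driver")
```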
mariecwhite
ff649b52ef Add TF EfficientNet Model (#502) 2022-12-06 13:51:59 -06:00
Gaurav Shukla
e9e138c757 [WEB] Add random seed checkbox
When True, it will not use the user-specified seed and will instead
generate a random seed.

Signed-Off-by: Gaurav Shukla
2022-12-06 21:44:22 +05:30
Phaneesh Barwaria
1096936a15 Enable f32 path for SD (#567) 2022-12-06 19:29:12 +05:30
Gaurav Shukla
29cc478525 [WEB] Add command line args to shark web
1. Now the server can be launched with command line args.
2. The `precision` and `scheduler` parameters are now part of command
   line args instead of UI.
3. Add vae encode model wrapper.

Signed-Off-by: Gaurav Shukla
2022-12-06 17:21:05 +05:30
Stanley Winata
05e9eb40b5 [Misc] Ignore vmfbs from getting tracked by git. (#566) 2022-12-06 00:01:52 -08:00
Stanley Winata
c4444ff695 [vulkan][utils] Add rdna3 detection (#565) 2022-12-05 23:56:06 -08:00
Anush Elangovan
27b34f3929 Add gcs instead of gsutil
Test .exe on AMD hardware.
2022-12-05 22:17:58 -08:00
powderluv
2b8d784660 update latest sd build 2022-12-05 22:16:13 -08:00
Daniel Garvey
18f447d8d8 fix hash comparison (#563)
Co-authored-by: dan <dan@nod-labs.com>
2022-12-05 21:43:05 -08:00
Daniel Garvey
d7e1078d68 remove nodcloud from client (#562)
Co-authored-by: dan <dan@nod-labs.com>
2022-12-05 23:13:19 -06:00
Daniel Garvey
6be592653f remove gsutil_flags and fix download (#559) 2022-12-05 20:29:00 -08:00
Daniel Garvey
8859853b41 Revert "Revert "find gsutil on linux (#557)" (#560)" (#561)
This reverts commit 3c46021102.
2022-12-05 20:27:43 -08:00
Daniel Garvey
3c46021102 Revert "find gsutil on linux (#557)" (#560)
This reverts commit bba8646669.
2022-12-05 21:53:47 -06:00
Daniel Garvey
bba8646669 find gsutil on linux (#557)
* find gsutil on linux

* cleaned up downloader and ditched gsutil

Co-authored-by: dan <dan@nod-labs.com>
2022-12-05 19:03:48 -08:00
Daniel Garvey
b0dc19a910 revert parallel downloads to 1 (#555)
Co-authored-by: dan <dan@nod-labs.com>
2022-12-05 15:42:42 -08:00
Daniel Garvey
df79ebd0f2 replace gsutil with variable path for pyinstaller (#541)
Co-authored-by: dan <dan@nod-labs.com>
2022-12-05 15:08:57 -08:00
Quinn Dawkins
e19a97f316 Don't do a numpy copy on the results from compiled vm (#543) 2022-12-05 14:21:47 -05:00
Harish Anand
482ffd6275 Move discord link from advanced instructions (#542) 2022-12-04 06:15:34 -08:00
Quinn Dawkins
5117e50602 Revert "Enable the clip f16 model." until correctness is fixed 2022-12-04 19:17:34 +05:30
powderluv
83b138208d Add gradio to requirements.txt 2022-12-03 16:06:52 -08:00
Quinn Dawkins
1870cb4557 Add a note to the Stable Diffusion README about clearing vulkan cache (#545) 2022-12-03 15:12:45 -08:00
Prashant Kumar
42ad5b9c5c Enable the clip f16 model.
-- Enabled the clip f16 model.
-- Updated the location of sdv2 model.
2022-12-03 18:50:40 +05:30
yzhang93
333975eb8f Update Unet fp16 tuned model and Vae flag (#539) 2022-12-02 23:21:18 -05:00
Gaurav Shukla
aa0195e4ef [SHARK] Add vae encoder wrapper
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-03 08:42:25 +05:30
Anush Elangovan
56109fe09b Add one click installer
Build with pyinstaller web\shark_sd.spec
2022-12-02 14:07:10 -08:00
powderluv
e74046478b Update stable_diffusion_amd.md 2022-12-02 13:57:03 -08:00
Gaurav Shukla
aa5a60812f [SHARK] Fix space issues in download path
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-03 00:52:10 +05:30
Ean Garvey
ebb60019aa Minor formatting fix. (#538) 2022-12-03 00:17:31 +05:30
mariecwhite
6393dc5d14 Use correct TF device depending on configuration (#492) 2022-12-02 11:33:56 -06:00
Anush Elangovan
8c158f2452 Fix onedir pyinstall
Use relative paths for install

pyinstaller web/shark_sd.spec creates an exe
2022-12-02 07:28:22 -08:00
powderluv
8c3eabdcee Update stable_diffusion_amd.md 2022-12-02 07:13:10 -08:00
powderluv
8aa0ce6a24 Update stable_diffusion_amd.md 2022-12-02 07:10:31 -08:00
Gaurav Shukla
a27ee141b3 [WEB] Fix a few warnings and generate seed faster
1. Fix gsutil warnings while copying multiple files.
2. Enhance random seed generation speed.
3. Add support for multiple schedulers.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-02 17:16:19 +05:30
Anush Elangovan
1106456651 Update cuda 11.7 nightly URL and add index.spec 2022-12-01 22:49:23 -08:00
Quinn Dawkins
8856878cbd Add flag for enabling rgp from the main.py SD script (#533) 2022-12-01 19:01:29 -05:00
Gaurav Shukla
a9bac0287d [WEB] Update to latest models.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-01 22:55:31 +05:30
Gaurav Shukla
efbd3dc778 [WEB] Fix debug option and add random seed generation
Signed-Off-by: Gaurav Shukla
2022-12-01 21:08:34 +05:30
Phaneesh Barwaria
a0d0eaa408 add clip and vae timing (#527) 2022-12-01 16:17:40 +05:30
Prashant Kumar
e2bf734b67 Update f32 models. 2022-12-01 14:16:03 +05:30
Prashant Kumar
a333a90441 Update to the latest bytecode. 2022-12-01 12:44:54 +05:30
powderluv
6dc0057d3d Update README.md 2022-11-30 17:02:28 -08:00
powderluv
0f9e69d48c Update README.md 2022-11-30 17:01:23 -08:00
powderluv
e6a7c019ab Update README.md 2022-11-30 16:59:55 -08:00
powderluv
1d32eabd14 Update stable_diffusion_amd.md 2022-11-30 16:52:07 -08:00
powderluv
53d03f06a6 Update stable_diffusion_amd.md 2022-11-30 16:04:53 -08:00
powderluv
a2d8c40455 Update stable_diffusion_amd.md 2022-11-30 15:56:38 -08:00
powderluv
4f7d950c8d Update README.md 2022-11-30 15:54:50 -08:00
Harish Anand
cac54b8c26 Update stable_diffusion_amd.md (#525)
- Mention `git clone` after installing git in Windows
- Remove the extra . in powershell set-executionpolicy
2022-11-30 14:48:10 -08:00
powderluv
cd0e881d7d Update stable_diffusion_amd.md 2022-11-30 13:43:24 -08:00
powderluv
fee406e220 Update README.md 2022-11-30 13:43:02 -08:00
powderluv
128342f47f Update stable_diffusion_amd.md 2022-11-30 13:42:25 -08:00
powderluv
024487c5fe Update stable_diffusion_amd.md 2022-11-30 13:40:00 -08:00
powderluv
879ba27ccb Update stable_diffusion_amd.md 2022-11-30 13:33:04 -08:00
powderluv
6d6d9627e7 Update stable_diffusion_amd.md 2022-11-30 13:31:53 -08:00
powderluv
af4bc82543 Update stable_diffusion_amd.md 2022-11-30 13:30:15 -08:00
powderluv
439a18bcc3 Update README.md 2022-11-30 13:27:13 -08:00
powderluv
e12a1e0444 Update README.md 2022-11-30 13:01:19 -08:00
powderluv
4400b0d3c3 Update README.md 2022-11-30 12:38:02 -08:00
powderluv
5dff28ff99 streamline README.md 2022-11-30 12:23:36 -08:00
powderluv
d5ac841a1a Update requirements.txt
add transformers to base venv
2022-11-30 12:12:28 -08:00
powderluv
232ce12e9b Create stable_diffusion_amd.md 2022-11-30 12:10:34 -08:00
aldesilv
9a8638a6d0 dump all isas with amdllpc (#517)
SHARK/shark/examples/shark_inference/stable_diffusion$ python main.py --precision="fp16" --device="vulkan" --iree-vulkan-target-triple=rdna3-unknown-linux --no-load_vmfb --dispatch_benchmarks="all" --dispatch_benchmarks_dir="SD_dispatches" --dump_isa

Co-authored-by: alexander <alexander@nod-labs.com>
2022-11-30 11:33:30 -08:00
Gaurav Shukla
a5445866b8 [WEB] Update the iree flag
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-30 18:56:48 +05:30
powderluv
e8ded71a7b Default to 50 steps for SD 2022-11-29 16:45:23 -08:00
Prashant Kumar
a14c615def Update with the new flag. (#522) 2022-11-29 09:39:32 -08:00
Gaurav Shukla
3903b6ff0c [WEB] Enable Debug and disable live preview
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-29 22:39:53 +05:30
Ean Garvey
41bf262482 Update SD README.md (#516)
* Update README.md

* Create profiling_with_iree.md
2022-11-29 10:21:28 -06:00
Gaurav Shukla
645b658da0 [WEB] Update model wrappers and scheduler
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-29 21:22:33 +05:30
Prashant Kumar
6ee8f61fbe Add the stable diffusion v2 model.
The f16 version of stable diffusion v2 model is added.
--version="v2" will run the v2 model.
2022-11-29 18:18:04 +05:30
Prashant Kumar
3c4c4231ce Add new args. 2022-11-29 18:18:04 +05:30
Prashant Kumar
d0eef19eba Remove the lms versions as they were redundant.
Tested with the DPM scheduler.
2022-11-29 15:05:05 +05:30
Ean Garvey
6ca2eb3ad7 Update README.md (#515) 2022-11-28 14:09:30 -06:00
Prashant Kumar
74aeb55733 Add support for different schedulers.
Initial support for adding schedulers. This verifies the model running
with the PNDM scheduler too.
2022-11-28 22:12:09 +05:30
Gaurav Shukla
3eb7965ca0 [WEB] Pressing Enter at prompt triggers Image generation
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-28 20:56:20 +05:30
Phaneesh Barwaria
04f20070d1 xfail for cpu models with tensor shape inf error (#512) 2022-11-24 16:12:04 -06:00
Gaurav Shukla
88937fcb2f [WEB] Add vulkan-heap-block-size flag
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-24 16:58:27 +05:30
aldesilv
f80b85f10c dump spv for dispatches (#509) 2022-11-23 22:34:27 -06:00
Quinn Dawkins
32a2ec432d [Stable Diffusion] Revive the tuned model (#506) 2022-11-23 15:42:24 -05:00
Gaurav Shukla
f4821d0d39 [WEB] Update seed calculation and model versions.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-23 19:21:48 +05:30
Prashant Kumar
fdf2aa54ef Update the sd models. 2022-11-22 23:09:04 +05:30
Gaurav Shukla
275c032264 [WEB] Fix set_param prototype
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-22 20:15:45 +05:30
Gaurav Shukla
d88979fe19 [WEB] Enable guidance scale and update seed calculation
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-22 19:32:21 +05:30
Phaneesh Barwaria
e67bcffea7 add vulkan-heap-block-size flag (#498) 2022-11-22 13:30:25 +05:30
Ean Garvey
005ded3c6f Update xfails. (#500)
* Update test_models.py

* Fix formatting.
2022-11-22 01:30:34 +05:30
Gaurav Shukla
d624940e12 Remove unnecessary torch_mlir import
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-21 21:47:21 +05:30
Gaurav Shukla
7763403b0e [WEB] Cache text-encoder and reorganize the codebase
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-21 17:21:12 +05:30
Prashant Kumar
88c58244b9 Update stable diffusion models to point to new location. 2022-11-18 19:39:21 +05:30
yzhang93
0754c6ea20 Update model annotation to take vulkan configs (#495)
Co-authored-by: vivian <vivian@nod-labs.com>
2022-11-17 14:34:17 -08:00
Prashant Kumar
7b1f04d121 Changes incorporating the recent torch_mlir compile api changes. 2022-11-15 15:25:37 +05:30
Phaneesh Barwaria
d8a9bee244 Add internet connection check for re-downloading models (#488) 2022-11-14 13:56:42 -06:00
Phaneesh Barwaria
ac0ea6bd3c xfail albert tf static cpu (#490) 2022-11-14 13:56:26 -06:00
Ean Garvey
45677c1e23 Install torch version required by torch-mlir when setting up importer venv. (#486) 2022-11-14 14:01:01 +05:30
Phaneesh Barwaria
d9f4a9954a modify to get correct target triple (#485) 2022-11-13 20:13:44 -08:00
mariecwhite
ec461a4456 Enable XLA compiler for TF models (#484) 2022-11-13 20:10:47 -08:00
Mehdi Amini
559928e93b Actually print the error message when SharkRunner can't initialize the driver (#482)
Right now it would just terminate the process silently
2022-11-13 19:08:46 -08:00
Mehdi Amini
a526f7d5b8 Fix dispatch saving code after 749a2c2d (#483)
In 749a2c2d iree_device_map and iree_target_map have been made functions
but not all of the uses were updated.
2022-11-14 05:39:01 +05:30
Phaneesh Barwaria
749a2c2dec add support for choosing vulkan device (#439) 2022-11-12 14:00:41 -08:00
Gaurav Shukla
29a317dbb6 [WEB] Update SD styling and prompt loading. (#479)
* [WEB] CSS changes to the web-ui (#465)

This commit updates UI with styling.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>

* [WEB] Update the title (#466)

* [WEB] Add support for long prompts (#467)

* [WEB] fix background color

Signed-Off-by: Gaurav Shukla

* [WEB] Remove long prompts support

It removes support for long prompts due to the higher lag when loading them.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs>

* [WEB] Update nod logo and enable debug feature.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
Signed-off-by: Gaurav Shukla
Signed-off-by: Gaurav Shukla <gaurav@nod-labs>
2022-11-10 10:55:22 -08:00
Abhishek Varma
2f36de319a [SHARK_INFERENCE] Add ESRGAN model test file
-- This commit adds ESRGAN model test file to SHARK_INFERENCE.

Signed-off-by: Abhishek Varma <abhishek@nod-ai.com>
2022-11-10 17:12:42 +05:30
Quinn Dawkins
2005bce419 Fix flags for untuned Stable Diffusion FP16 model (#478) 2022-11-09 21:31:10 -05:00
Ean Garvey
8a02d7729d Add a few xfails. (#477) 2022-11-09 09:33:09 -08:00
Prashant Kumar
1cdf301c14 Update the guidance parameter argument and add the int8 version of the
stable diffusion model.
2022-11-08 23:14:44 +05:30
yzhang93
9a86e5c476 Fix dispatch benchmarking tool (#460) 2022-11-08 09:37:12 -08:00
Eliasj42
32d3f4bd5f added ordered benchmarks to dispatch benchmarking tool (#450)
* added ordered benchmarks to dispatch benchmarking tool

* saved changes

* updated readme

Co-authored-by: Elias Joseph <elias@nod-labs.com>
2022-11-07 09:36:21 -08:00
Prashant Kumar
18689afc1a Make separate function for each model. 2022-11-07 20:20:38 +05:30
PhaneeshB
64d6da75c7 Resolve Mac torch-mlir torch setup dependency. Enable MacOS CI 2022-11-07 15:38:37 +05:30
Ean Garvey
1e95e4b502 Change dependency installation order in venv setup script. (#470) 2022-11-04 20:53:54 -05:00
Ean Garvey
c63009a6db Update test_models.py (#464) 2022-11-04 16:59:01 -07:00
Gaurav Shukla
88f8718635 [WEB] Load prompts from json
The prompt examples will now be loaded from a json file `prompts.json`.

Signed-Off-by: Gaurav Shukla
2022-11-02 20:52:34 +05:30
Prashant Kumar
a081733a42 Add the clip text shark_model. (#458) 2022-11-02 00:08:33 -07:00
Gaurav Shukla
06ccfb0533 [WEB] Load vae and unet during server start up
The vae and unet models (both fp16 and fp32 variants) can be loaded at
server startup in order to reduce web response time.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-01 23:11:52 +05:30
Gaurav Shukla
b18d75e3f7 [WEB] Use tuned version of UNET fp16
This commit updates the SD script to use the tuned version of the Unet
fp16 model.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-11-01 19:00:21 +05:30
Quinn Dawkins
3e7efaa048 Switch stable diffusion to the new tuned model (#455) 2022-10-31 15:15:31 -07:00
Gaurav Shukla
a3fdfc81db [WEB] Minor changes in the shark web (#454)
1. Default steps = 50.
2. Live preview will yield intermediate image at every 5 steps.
3. Add logs to .gitignore

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 14:29:00 -07:00
Gaurav Shukla
f4c91df1df [WEB] Add pillow dependency (#453)
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 12:57:21 -07:00
Prashant Kumar
32e1ba8c0d Adding batch_size support for stable diffusion. 2022-11-01 00:57:52 +05:30
Gaurav Shukla
1939376d72 [WEB] Cache model parameters (#452)
This commit caches some of the model parameters to reduce the response
time of the shark web.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 11:55:10 -07:00
Gaurav Shukla
25931d48a3 [WEB] Update stable diffusion UI and enable live preview (#447)
This commit enables the live preview feature and also updates the stable
diffusion web UI.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-31 04:10:15 -07:00
powderluv
024c5e153a Update Windows in README 2022-10-30 22:27:03 -07:00
powderluv
83f34b645d Add Windows instructions 2022-10-30 22:25:42 -07:00
powderluv
3f9f450e0d Add setup_venv.ps1 for windows (#448)
Powershell users can run ./setup_venv.ps1 to set up the env
2022-10-30 22:17:35 -07:00
powderluv
fd89b06641 Drop RDNA1 for now 2022-10-29 14:29:09 -07:00
Gaurav Shukla
f8dc996004 Update vulkan-target-triple for Radeon devices. (#446)
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-29 14:27:20 -07:00
Phaneesh Barwaria
e6a964088b Add os agnostic vulkan device name check (#445) 2022-10-29 13:19:14 -07:00
Gaurav Shukla
e3e767c7eb [WEB] Remove live preview and disable resnet|albert_maskfill
This commit removes the live preview feature for now as it's not functional.
The feature will be added back in the next patch.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-30 00:37:59 +05:30
Quinn Dawkins
239c19eb12 Update Stable diffusion script to enable use of tuned models (#443) 2022-10-29 01:42:49 -04:00
Eliasj42
7f37599a60 Added a dispatch benchmarking tool (#441)
To produce benchmarks of individual dispatches, you can add --dispatch_benchmarks=All --dispatch_benchmarks_dir=<output_dir> to your command-line arguments.

Co-authored-by: Elias Joseph <elias@nod-labs.com>
2022-10-28 14:31:03 -07:00
Prashant Kumar
77c9a2c5ea Add profiling vulkan_device info and minor changes to reflect upstream
changes.
2022-10-28 18:02:07 +05:30
Ean Garvey
fd7baae548 Serialize torch-mlir CAPI module as bytecode instead of string. (#435)
* Serialize torch-mlir CAPI as bytecode instead of string.

* Minor fixes to MLIR data handling in SHARK python.
2022-10-27 14:37:15 -05:00
Stanley Winata
01fdf5ee16 [example][SD] compile fp16 with iree-spirv-unify-aliased-resources (#436) 2022-10-27 05:12:28 -07:00
Gaurav Shukla
e52f533c16 [WEB] Save vmfb and add live preview
This commit updates the SD script to save the compiled module and also adds
a live preview of generated images.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-26 23:20:53 +05:30
Quinn Dawkins
fbd77dc936 Enable iterator space fusion for SD (#432) 2022-10-26 01:08:26 -04:00
Quinn Dawkins
cdc6dd19e3 Force stable diffusion fp16 and fp32 to generate images with similar noise (#431) 2022-10-25 17:28:18 -04:00
PhaneeshB
fd578a48a9 add cli args for vulkan target triple 2022-10-25 21:47:26 +05:30
Ean Garvey
9956099516 Add pytest option for updating tank and fix save_mlir function. (#413)
* Use IREE tf tools to save .mlir modules when generating shark_tank.

* Add option to pytest for enabling auto-updates to local shark tank.

* xfail mobilenet torch on cpu, cuda and fix CI macos setup

* Update test-models.yml to disable macos vulkan CI.
2022-10-25 21:29:18 +05:30
powderluv
f97b8fffed Update README.md 2022-10-24 12:51:49 -07:00
Gaurav Shukla
7b9e309724 [WEB] Expose SD parameters in the web ui (#427) 2022-10-24 04:34:35 -07:00
Quinn Dawkins
1d33913d48 Add option to save and load precompiled flatbuffer (#425) 2022-10-23 16:24:09 -07:00
Prashant Kumar
a48eaaed20 Pass the flags to vae. 2022-10-23 23:57:48 +05:30
Prashant Kumar
2741b8be53 Pass the flags to vae. (#422) 2022-10-23 11:23:13 -07:00
Anush Elangovan
4f906a265c Fix lint 2022-10-22 12:43:52 -07:00
Anush Elangovan
0dff8d7af0 Simple download script to prime the hf model cache 2022-10-21 17:42:05 -07:00
Quinn Dawkins
4f0d0d8167 Update vulkan gui README for iree-vulkan-gui + Stable Diffusion (#399) 2022-10-21 14:02:40 -04:00
Vivek Khandelwal
d513060b21 Add params for Stable Diffusion (#420) 2022-10-21 23:11:09 +05:30
Prashant Kumar
d1a25ce4f3 Update stable_args.py 2022-10-21 17:26:31 +05:30
Gaurav Shukla
51c98695b2 [WEB] Update stable diffusion inference
This commit updates the stable diffusion web incorporating the latest
improvements.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-10-21 01:26:38 +05:30
Quinn Dawkins
b448770ec2 Add ms/iter timing for stable diffusion script (#414) 2022-10-20 13:32:37 -04:00
Prashant Kumar
5fe22a7980 Minor fix. 2022-10-20 22:57:22 +05:30
Prashant Kumar
38ae6b5af4 Add stable_diffusion fp16 and fp32 with args. 2022-10-20 21:47:11 +05:30
Ean Garvey
0bfe30d75d Fix issues with extra_args in benchmarks, pin tf==2.10 (#411) 2022-10-20 06:55:26 -07:00
Quinn Dawkins
7be1d7d0be Add option for extra arguments through SharkInference.compile (#408) 2022-10-19 15:32:48 -05:00
Prashant Kumar
0d74c873f0 Add stable_diff_f16 version. (#407) 2022-10-19 10:04:24 -07:00
powderluv
139aff2938 Update nightly.yml
fix links
2022-10-18 23:42:22 -07:00
anush elangovan
a3f733490c Force update of packages
Pick up tools from upstream IREE
2022-10-19 05:20:53 +00:00
anush elangovan
8a11f138d1 Update SHARK-Runtime releases page 2022-10-19 05:06:36 +00:00
Ean Garvey
3405607917 (TESTING) Fix .whl assets path (#404) 2022-10-14 12:13:14 -05:00
Ean Garvey
7c99a6bd33 Update README.md (#406) 2022-10-13 20:29:49 -05:00
Ean Garvey
3fba8ce0e6 Update README.md (#405) 2022-10-13 12:43:03 -07:00
Ean Garvey
f3bde3c7fc Cleanup tank directory and move instructions to tank/README.md (#401) 2022-10-13 12:20:02 -05:00
Phaneesh Barwaria
21fee8ef33 enable only one workflow job per branch (#402) 2022-10-13 12:15:30 -05:00
Vivek Khandelwal
0e217d6180 Add Stable Diffusion Img2Img model script 2022-10-13 21:56:46 +05:30
Phaneesh Barwaria
00a8ce75d1 Xfail vulkan tests and Enable MacOs test on CI (#383) 2022-10-13 11:14:41 -05:00
Quinn Dawkins
8f3f00cd99 Add iree-run-module like tool for running in a vulkan session (#398) 2022-10-12 20:46:26 -04:00
Ean Garvey
13bae2538a Update URL for IREE compiler/runtime install (#397)
* Update URL for IREE compiler/runtime install

* Update gh-pages-releases.yml

* Update test_models.py

* Update assets path
2022-10-12 15:47:11 -05:00
Ean Garvey
f508c80c23 Add workflow for GH pages releases and release scraping script. (#394)
* Add workflow for GH pages releases and release scraping script.

* Update test_models.py and change tokens for gh pages.
2022-10-11 22:03:33 -05:00
gpetters94
53df0620e3 Add OPT to tank (#214) 2022-10-11 11:03:56 -05:00
powderluv
a63755bc24 Correct spelling 2022-10-11 01:53:55 -07:00
Quinn Dawkins
d93d0783a8 Add script for tensorflow stable diffusion (#391) 2022-10-10 12:01:49 -04:00
Daniel Garvey
d38e37bd99 separate importer and benchmark deps (#393) 2022-10-08 23:31:20 -05:00
Ean Garvey
3618fb3ada Move old test scripts out of base tank directory and add xfails. (#389) 2022-10-07 16:02:46 -07:00
Vivek Khandelwal
70a29b03e0 Add FP16 Resnet50 script 2022-10-06 21:56:43 +05:30
Ean Garvey
006adf8746 Fix issue with FASTAPI pip install. (#382) 2022-10-01 14:55:24 -05:00
Quinn Dawkins
33b53e7caf Add flag for specifying the vae mlir file location in stable diffusion (#381) 2022-09-30 00:37:58 -04:00
Daniel Garvey
c54815de17 edit assets path (#376) 2022-09-28 16:42:36 -05:00
Gaurav Shukla
0013fb0753 [WEB] Add shark-web logging
1. This commit adds support to display logs in the shark-web.
2. It also adds the nod logo to the home page.
3. Stable-diffusion outputs are now saved.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-09-29 01:20:42 +05:30
Ean Garvey
56f8a0d85a Update torch-mlir releases page in setup_venv.sh (#374)
* Update README.md

* Update setup_venv.sh
2022-09-28 11:07:44 -07:00
Ean Garvey
9035a2eed3 Add --local_tank_cache flag and update requirements. (#368)
* Add --local_tank_cache flag and update requirements.

* Update requirements-importer.txt
2022-09-28 03:02:59 -05:00
Vivek Khandelwal
28daf410b6 Add instructions to use locally build Torch-MLIR with SHARK 2022-09-28 10:16:38 +05:30
Ean Garvey
cbf3f784aa Add pytest option to specify a URL for shark tank artifacts. (#363)
* Xfail updates.

* Generalize tank SHA option to bucket address and add pytest option.
2022-09-27 02:40:40 -05:00
Anush Elangovan
ef4b306c7b Add diffusers and scipy 2022-09-26 13:35:23 -07:00
powderluv
5316c1e0bf Use latest transformers (#346) 2022-09-26 13:11:41 -07:00
Gaurav Shukla
0228973eef [WEB] Fix the mlir location of stable-diffusion model (#367)
Update the location of the stable-diffusion mlir file since there is a
problem with iree-compile.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-09-26 10:56:36 -07:00
Gaurav Shukla
d4eeff0a5d [WEB] Add Stable-Diffusion in the SHARK web (#366)
1. This commit adds stable-diffusion as a part of the shark web.
2. The V-diffusion model has been disabled for now as it's not
   working (a separate patch with the fix will follow).
3. Add standard output in the web ui.
4. Add instructions to launch the shark-web.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-09-26 10:42:02 -07:00
Prashant Kumar
c7b2d39ab2 Update stable_diff to contain vae. 2022-09-26 20:11:43 +05:30
Gaurav Shukla
21958cc02a [WEB] Remove unused parameters in the v-diffusion model (#314)
This commit removes unused parameters in the v-diffusion model. It also
updates the server parameters so that multiple requests are handled
sequentially.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-09-25 10:57:06 -07:00
Ean Garvey
de23e5d9d7 update xfails for PyTorch DistilBERT (#355) 2022-09-24 14:53:20 -05:00
Quinn Dawkins
6438bce023 Add a script to convert a jpg to the correct input for resnet50 with the vulkan gui (#362) 2022-09-23 16:32:52 -07:00
yzhang93
587d74b449 Update model annotation tool (#361)
Usage:
with create_context() as ctx:
  module = model_annotation(ctx, input_contents=..., config_path=..., search_op=...)

Example:
The example is to annotate the minilm model with GPU config files.
python model_annotation.py /nodclouddata/vivian/minilm_model/model.mlir /nodclouddata/vivian/minilm_model/model_config.json
2022-09-23 15:44:51 -07:00
Prashant Kumar
b9c8985047 Add sharkdynamo which combines shark with torchdynamo.
-- Adds graph breaks when necessary.
-- Even for loops are supported.
2022-09-23 22:40:02 +05:30
Vivek Khandelwal
93ebe07d2b Add bert_tosa script 2022-09-23 10:52:06 +05:30
Ean Garvey
d82b305781 Fix issues with loading .vmfb into SharkInference 2022-09-23 09:53:13 +05:30
Quinn Dawkins
1df20fac95 [Lockstep] Hack to avoid aten._reshape_alias (#332)
This enforces the decomposition for aten._reshape_alias used in AOTAutograd to essentially avoid having to deal with problems with strides when running in eager mode.
2022-09-22 18:02:09 -04:00
Prashant Kumar
991e7043d1 Add stable diffusion model. 2022-09-22 13:40:51 +05:30
powderluv
1c4d6c23fa Update CMakeLists.txt 2022-09-21 22:48:56 -07:00
Anush Elangovan
87895446a5 Roll SHARK-Runtime 2022-09-22 00:09:04 -07:00
Ean Garvey
c0f3a09a40 Include SHA in path to failure reproducers. Add --save_fails option. (#352) 2022-09-21 17:55:06 -05:00
Anush Elangovan
e9ad4b9fc4 Update SHARK Runtime 2022-09-21 06:31:48 -07:00
Ean Garvey
c061a8897d Add pytest options to save reproducers. (#350)
* Add pytest options to save and/or upload reproducers.

* pass shark_module to benchmark method.
2022-09-20 20:29:46 -05:00
Ean Garvey
4253551b67 Update README with new testing instructions and filter test cases. (#349) 2022-09-20 15:55:46 -05:00
Vivek Khandelwal
e4991c049e Add Readme file for the bloom model 2022-09-20 20:27:52 +05:30
Daniel Garvey
5df582e7e8 creates abstract test case class (#333) 2022-09-20 07:06:38 -07:00
Ean Garvey
814a6f8295 Modify vulkan target triple substring searches. (#318) 2022-09-20 01:20:20 -05:00
Vivek Khandelwal
7013c3cd4a Add bloom e2e script 2022-09-20 10:56:04 +05:30
powderluv
0ddd65b6f1 Create LICENSE 2022-09-19 15:07:59 -07:00
powderluv
44d8f08bfc Fix Torch-MLIR release page 2022-09-17 00:50:39 -07:00
erman-gurses
fc8aa6ae63 Add ROCM parameters (#335) 2022-09-16 09:12:19 -07:00
Quinn Dawkins
9bd951b083 Clean up the v-diffusion install pipeline (#327) 2022-09-16 11:47:07 -04:00
Vivek Khandelwal
c43448a826 Update compile_utils.py 2022-09-15 18:28:10 +05:30
Vivek Khandelwal
864723a473 add bloom model example 2022-09-15 18:23:09 +05:30
Anush Elangovan
3b0ec8ce4e Update resnet paths 2022-09-14 16:56:20 -07:00
Anush Elangovan
174b171913 Clean up SDL linking 2022-09-14 13:18:55 -07:00
powderluv
cfd9733c2b Delete shark_web directory 2022-09-14 06:38:30 -07:00
Anush Elangovan
8d4d543a49 Update shark runtime 2022-09-14 06:14:02 -07:00
Anush Elangovan
1b9c88a052 Update vulkan gui readme 2022-09-13 19:35:47 -07:00
Anush Elangovan
e212ff2071 Fix resnet50 vulkan_gui to work with tank models 2022-09-13 19:22:41 -07:00
Quinn Dawkins
8d21292d34 Fix input tensors with non-floating point dtype in the lockstep tracer (#328) 2022-09-13 21:14:38 -04:00
Anush Elangovan
e304041574 Remove redundant {} 2022-09-13 16:12:35 -07:00
Anush Elangovan
1776c55e73 Fix torch-mlir download URL 2022-09-13 16:07:25 -07:00
Anush Elangovan
4e4c34c717 fix release downloads 2022-09-13 15:00:47 -07:00
Anush Elangovan
23378b6be8 Add resnet to vulkan-gui 2022-09-13 07:06:47 -07:00
Ean Garvey
6cf5564c84 Remove "gpu" device alias and migrate to using "cuda" for NVIDIA GPU. (#325)
* Replace instances of "gpu" alias for devices with "cuda"
2022-09-13 01:16:56 -05:00
Ean Garvey
7143902a90 Update test-models.yml (#323) 2022-09-12 22:47:40 -05:00
Anush Elangovan
15186db73f Hardcode SDL2 for now (works on linux) 2022-09-12 10:17:41 -07:00
powderluv
ccd7a01ce2 Update README.md 2022-09-12 07:12:57 -07:00
Anush Elangovan
1d7035117d Add cpp inference examples and vulkan_gui 2022-09-12 07:07:33 -07:00
Ean Garvey
1710abd366 Update mobilenet_v3_small_torch_test.py (#322) 2022-09-10 15:22:57 -05:00
Ean Garvey
6aeda3670f Split nightly workflow by backend (IREE / SHARK) (#313)
* Fix validation for nightly builds.

* Add option to generate shark_tank inside SHARK project
Add shark_arg for updating tank on mismatched hash (downloader)

* Fixup CI tank dir option.

* Fixup work directory variable
2022-09-09 22:51:30 +05:30
Prashant Kumar
bb52b224d0 Add sparse architecture and test with torchrec SparseArch.
Features that don't work with current implementation:
    -- embeddingbag config with multiple features.
2022-09-09 21:49:30 +05:30
Stanley Winata
95ec3d7216 [tank][v-diffusion] Polish up v-diffusion UX (#315) 2022-09-08 12:55:51 -07:00
powderluv
18872222d3 Update README.md 2022-09-07 01:14:30 -07:00
Ean Garvey
d453f2e49d Enable CPU benchmarks on test-models workflows. (#299)
* Update test-models.yml

* Update README.md
2022-09-07 01:22:58 -05:00
Ean Garvey
3824d37d27 Add metadata to benchmark results. (#297) 2022-09-06 13:03:48 -05:00
Ean Garvey
d946287723 Update xfails for torchvision models. (#310) 2022-09-01 13:06:12 -05:00
Prashant Kumar
885b0969f5 [WEB] Cache the compiled module.
-- Don't compile the module again and again.
2022-09-01 23:08:08 +05:30
Gaurav Shukla
a886cba655 [WEB] Add v_diffusion model in the shark web (#306)
This commit adds `v_diffusion` model web visualization as a part of the
shark web.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-09-01 06:34:51 -07:00
Vivek Khandelwal
4afe2e3adb Add func to save intermediate images in v_diffusion_pytorch 2022-09-01 18:36:58 +05:30
Gaurav Shukla
fe080eaee6 [WEB] Introduce web interface for the SHARK models (#298)
This commit introduces a web application for SHARK using the gradio
platform. It adds web visualization of the `Resnet50` and `Albert_Maskfill`
models as a start.

Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-08-31 23:17:52 -07:00
Quinn Dawkins
3703f014d9 Add scripts for generating images on ats-m (#305) 2022-08-31 23:07:02 -07:00
Daniel Garvey
d45a496030 adds a flag to enable directory choice (#303)
Individual tests will require implementation of the flag;
alternatively, simply passing shark_default_sha in your
individual app's download function will allow for this behavior.
2022-08-31 22:17:40 -07:00
powderluv
4ee164c66f remove a100 cpu 2022-08-31 12:59:47 -07:00
powderluv
bf84c033bb add icelake 2022-08-31 12:58:40 -07:00
Prashant Kumar
5105f62551 Add the dlrm_model in shark example. (#301)
-- DLRM model is added in the shark example.
-- The model is verified on cpu, gpu and vulkan.

Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2022-08-31 12:54:21 -07:00
Quinn Dawkins
99be837d84 Add lockstep tracer based on TorchMLIR eager mode + examples (#243) 2022-08-31 15:50:24 -04:00
Quinn Dawkins
b7766898ee Add cfg sampling from tank model for v-diffusion and move compilation outside of the sampling loop (#302) 2022-08-31 11:35:04 -07:00
powderluv
57f73dfbc9 Update nightly.yml 2022-08-28 23:59:03 -07:00
powderluv
50b2b9638d Update nightly.yml 2022-08-28 23:43:32 -07:00
Daniel Garvey
1bfd00e2f8 fixes an install issue (#295) 2022-08-25 18:52:00 -05:00
Daniel Garvey
64424877ac No iree install (#294)
* adds support to default to tuned model

currently set up for tf bert/resnet50
going to refactor test class to avoid having to
add an argument to 50+ files

* adds an option to avoid installing iree

useful when building iree from source
specify env variable NO_BACKEND=1
2022-08-25 15:02:28 -05:00
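The setup flow itself is a shell script, but the NO_BACKEND=1 gating described above is simple to sketch; this Python stand-in only illustrates the check, not the actual setup_venv.sh logic:

```python
import os

# The setup flow can skip installing a prebuilt IREE backend when the user
# plans to build IREE from source; NO_BACKEND=1 is the switch named above.
skip_backend = os.environ.get("NO_BACKEND") == "1"

if skip_backend:
    print("NO_BACKEND=1 set: skipping IREE backend install")
else:
    print("installing the default IREE backend packages")
```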
Phaneesh Barwaria
02d857260c Update ReadMe
-Add gsutil installation for resnet50 example
2022-08-25 20:28:50 +05:30
Phaneesh Barwaria
1322ec5935 Simplified Testing Interface (#289) 2022-08-24 23:54:56 -05:00
Daniel Garvey
48e9818f7e adds support to default to tuned model (#287)
currently set up for tf bert/resnet50
going to refactor test class to avoid having to
add an argument to 50+ files
2022-08-24 16:30:02 -05:00
Ean Garvey
14857770dc Fix local artifact recognition and usage by SHARK downloader. (#286)
* Fix local artifact recognition and usage by SHARK downloader.

* Update generate_sharktank.py

* Update generate_sharktank.py
2022-08-24 14:37:16 -05:00
Vivek Khandelwal
f79a6bf5aa Update setup_v_diffusion_pytorch.sh (#291)
Fix minor issue with v-diffusion PyTorch version
2022-08-24 22:00:02 +05:30
Prashant Kumar
7dc27a7477 Don't remove the latest .whl package from CI. (#290)
Previously, the CI was removing the latest package and pointing to the
stale package.
2022-08-24 09:03:48 -07:00
Chi_Liu
17dba601c8 Add huggingface top5 image classification automodel (#268) 2022-08-22 15:05:38 -07:00
Chi_Liu
064aa3b1f4 Fix tmp dir bug (#285) 2022-08-22 15:00:35 -07:00
Ean Garvey
4960efc686 Update requirements-importer.txt (#284) 2022-08-19 23:21:41 -05:00
334 changed files with 38517 additions and 8222 deletions

5
.flake8 Normal file

@@ -0,0 +1,5 @@
[flake8]
count = 1
show-source = 1
select = E9,F63,F7,F82
exclude = lit.cfg.py, apps/language_models/scripts/vicuna.py, apps/language_models/src/pipelines/minigpt4_pipeline.py, apps/language_models/langchain/h2oai_pipeline.py

37
.github/workflows/gh-pages-releases.yml vendored Normal file

@@ -0,0 +1,37 @@
# See: https://github.com/llvm/torch-mlir/issues/1374
name: Publish releases page
on:
  workflow_dispatch:
jobs:
  scrape_and_publish_releases:
    name: "Scrape and publish releases"
    runs-on: ubuntu-latest
    # Don't run this in everyone's forks.
    if: github.repository == 'nod-ai/AMD-SHARK-Studio'
    steps:
      - name: Checking out repository
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          token: ${{ secrets.NODAI_INVOCATION_TOKEN }}
      - name: Run scrape releases script
        run: python ./build_tools/scrape_releases.py nod-ai AMD-SHARK-Studio > /tmp/index.html
        shell: bash
      - run: git fetch --all
      - run: git switch github-pages
      - run: git config --global user.email "none@none.com"
      - run: git config --global user.name "nod-ai"
      - run: mv /tmp/index.html package-index/index.html
      - run: git add package-index/index.html
      # Only try to make a commit if the file has changed.
      - run: git diff --cached --exit-code || git commit -m "Update releases."
      - name: GitHub Push
        uses: ad-m/github-push-action@d91a481090679876dfc4178fef17f286781251df # v0.8.0
        with:
          github_token: ${{ secrets.NODAI_INVOCATION_TOKEN }}
          branch: github-pages


@@ -9,123 +9,68 @@ on:
workflow_dispatch:
jobs:
build:
runs-on: a100
windows-build:
runs-on: 7950X
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
python-version: ["3.11"]
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
uses: actions/setup-python@42375524e23c412d93fb67b49958b491fce71c38 # v5.4.0
with:
python-version: ${{ matrix.python-version }}
- name: Setup pip cache
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-
- name: Compute version
shell: powershell
run: |
package_version="$(printf '%(%Y%m%d)T.${{ github.run_number }}')"
tag_name="${package_version}"
echo "package_version=${package_version}" >> $GITHUB_ENV
echo "tag_name=${tag_name}" >> $GITHUB_ENV
$package_version = $(Get-Date -UFormat "%Y%m%d")+"."+${{ github.run_number }}
$package_version_ = $(Get-Date -UFormat "%Y%m%d")+"_"+${{ github.run_number }}
$tag_name=$package_version
echo "package_version=$package_version" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf8 -Append
echo "package_version_=$package_version_" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf8 -Append
echo "tag_name=$tag_name" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf8 -Append
- name: Create Release
id: create_release
uses: actions/create-release@v1
uses: ncipollo/release-action@440c8c1cb0ed28b9f43e4d1d670870f059653174 # v1.16.0
env:
GITHUB_TOKEN: ${{ secrets.NODAI_INVOCATION_TOKEN }}
with:
tag_name: ${{ env.tag_name }}
release_name: nod.ai SHARK ${{ env.tag_name }}
tag: ${{ env.tag_name }}
name: nod.ai AMDSHARK ${{ env.tag_name }}
body: |
Automatic snapshot release of nod.ai SHARK.
Automatic snapshot release of nod.ai AMDSHARK.
draft: true
prerelease: false
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install flake8 pytest toml
if [ -f requirements.txt ]; then pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/cpu -f https://github.com/llvm/torch-mlir/releases -f https://github.com/nod-ai/SHARK-Runtime/releases; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude shark.venv,lit.cfg.py
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude shark.venv,lit.cfg.py
- name: Build and validate the IREE package
run: |
cd $GITHUB_WORKSPACE
USE_IREE=1 VENV_DIR=iree.venv ./setup_venv.sh
source iree.venv/bin/activate
package_version="$(printf '%(%Y%m%d)T.${{ github.run_number }}')"
SHARK_PACKAGE_VERSION=${package_version} \
pip wheel -v -w wheelhouse . --pre -f https://download.pytorch.org/whl/nightly/torch -f https://github.com/llvm/torch-mlir/releases -f https://github.com/iree-org/iree/releases
# Install the built wheel
pip install ./wheelhouse/nodai*
# Validate the Models
/bin/bash "$GITHUB_WORKSPACE/build_tools/populate_sharktank_ci.sh"
pytest -k 'cpu' --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py --ignore=shark/tests/test_shark_importer.py --ignore=tank/tf/ |
tail -n 1 |
tee -a pytest_results.txt
pytest -k 'gpu' --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py --ignore=shark/tests/test_shark_importer.py --ignore=tank/tf/ |
tail -n 1 |
tee -a pytest_results.txt
pytest -k 'vulkan' --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py --ignore=shark/tests/test_shark_importer.py --ignore=tank/tf/ |
tail -n 1 |
tee -a pytest_results.txt
rm -rf ./wheelhouse/nodai*
prerelease: true
- name: Build and validate the SHARK Runtime package
- name: Build Package
shell: powershell
run: |
cd $GITHUB_WORKSPACE
./setup_venv.sh
source shark.venv/bin/activate
package_version="$(printf '%(%Y%m%d)T.${{ github.run_number }}')"
SHARK_PACKAGE_VERSION=${package_version} \
pip wheel -v -w wheelhouse . --pre -f https://download.pytorch.org/whl/nightly/torch -f https://github.com/llvm/torch-mlir/releases -f https://github.com/nod-ai/SHARK-Runtime/releases
# Install the built wheel
pip install ./wheelhouse/nodai*
# Validate the Models
pytest -k 'cpu' --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py --ignore=shark/tests/test_shark_importer.py --ignore=tank/tf/ |
tail -n 1 |
tee -a pytest_results.txt
pytest -k 'gpu' --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py --ignore=shark/tests/test_shark_importer.py --ignore=tank/tf/ |
tail -n 1 |
tee -a pytest_results.txt
pytest -k 'vulkan' --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py --ignore=shark/tests/test_shark_importer.py --ignore=tank/tf/ |
tail -n 1 |
tee -a pytest_results.txt
if !(grep -Fxq " failed" pytest_results.txt)
then
export SHA=$(git log -1 --format='%h')
gsutil -m cp -r $GITHUB_WORKSPACE/gen_shark_tank/* gs://shark_tank/$SHA
gsutil -m cp -r gs://shark_tank/$SHA/* gs://shark_tank/latest/
fi
rm pytest_results.txt
rm -rf ./wheelhouse/nodai*
./setup_venv.ps1
python process_skipfiles.py
$env:AMDSHARK_PACKAGE_VERSION=${{ env.package_version }}
pip install -e .
pip freeze -l
pyinstaller .\apps\amdshark_studio\amdshark_studio.spec
mv ./dist/nodai_amdshark_studio.exe ./dist/nodai_amdshark_studio_${{ env.package_version_ }}.exe
signtool sign /f c:\g\amdshark_02152023.cer /fd certHash /csp "eToken Base Cryptographic Provider" /k "${{ secrets.CI_CERT }}" ./dist/nodai_amdshark_studio_${{ env.package_version_ }}.exe
- name: Upload Release Assets
id: upload-release-assets
uses: dwenegar/upload-release-assets@v1
uses: dwenegar/upload-release-assets@fe47e06814723c7b1bea3a7e46cf93d5f020d0c3 # v3
env:
GITHUB_TOKEN: ${{ secrets.NODAI_INVOCATION_TOKEN }}
with:
release_id: ${{ steps.create_release.outputs.id }}
assets_path: ./wheelhouse/nodai_*.whl
assets_path: ./dist/nodai*
#asset_content_type: application/vnd.microsoft.portable-executable
- name: Publish Release
id: publish_release
uses: eregon/publish-release@v1
uses: eregon/publish-release@01df127f5e9a3c26935118e22e738d95b59d10ce # v1.0.6
env:
GITHUB_TOKEN: ${{ secrets.NODAI_INVOCATION_TOKEN }}
with:


@@ -1,104 +0,0 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: Validate Models on Shark Runtime
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
workflow_dispatch:
jobs:
build-validate:
strategy:
fail-fast: true
matrix:
os: [a100, MacStudio, ubuntu-latest]
suite: [cpu,gpu,vulkan]
python-version: ["3.10"]
include:
- os: ubuntu-latest
suite: lint
exclude:
- os: ubuntu-latest
suite: vulkan
- os: ubuntu-latest
suite: gpu
- os: ubuntu-latest
suite: cpu
- os: MacStudio
suite: gpu
- os: MacStudio
suite: cpu
- os: MacStudio
suite: vulkan
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- name: Set Environment Variables
run: |
echo "SHORT_SHA=`git rev-parse --short=4 HEAD`" >> $GITHUB_ENV
echo "DATE=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Set up Python Version File ${{ matrix.python-version }}
if: matrix.os == 'a100' || matrix.os == 'ubuntu-latest'
run: |
# See https://github.com/actions/setup-python/issues/433
echo ${{ matrix.python-version }} >> $GITHUB_WORKSPACE/.python-version
- name: Set up Python ${{ matrix.python-version }}
if: matrix.os == 'a100' || matrix.os == 'ubuntu-latest'
uses: actions/setup-python@v4
with:
python-version: '${{ matrix.python-version }}'
#cache: 'pip'
#cache-dependency-path: |
# **/requirements-importer.txt
# **/requirements.txt
- name: Install dependencies
if: matrix.suite == 'lint'
run: |
python -m pip install --upgrade pip
python -m pip install flake8 pytest toml black
- name: Lint with flake8
if: matrix.suite == 'lint'
run: |
# black format check
black --version
black --line-length 79 --check .
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude lit.cfg.py
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude lit.cfg.py
- name: Validate CPU Models
if: matrix.suite == 'cpu'
run: |
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
pytest -k 'cpu' --ignore=shark/tests/test_shark_importer.py --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py
- name: Validate GPU Models
if: matrix.suite == 'gpu'
run: |
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} IMPORTER=1 ./setup_venv.sh
source shark.venv/bin/activate
pytest --benchmark -k "gpu" --ignore=shark/tests/test_shark_importer.py --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py
gsutil cp ./bench_results.csv gs://shark-public/builder/bench_results/${DATE}/bench_results_gpu_${SHORT_SHA}.csv
- name: Validate Vulkan Models
if: matrix.suite == 'vulkan'
run: |
cd $GITHUB_WORKSPACE
PYTHON=python${{ matrix.python-version }} ./setup_venv.sh
source shark.venv/bin/activate
pytest -k 'vulkan' --ignore=shark/tests/test_shark_importer.py --ignore=benchmarks/tests/test_hf_benchmark.py --ignore=benchmarks/tests/test_benchmark.py

.github/workflows/test-studio.yml

@@ -0,0 +1,85 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: Validate AMDShark Studio
on:
push:
branches: [ main ]
paths-ignore:
- '**.md'
- 'amdshark/examples/**'
pull_request:
branches: [ main ]
paths-ignore:
- '**.md'
- 'amdshark/examples/**'
workflow_dispatch:
# Ensure that only a single job or workflow using the same
# concurrency group will run at a time. This would cancel
# any in-progress jobs in the same github workflow and github
# ref (e.g. refs/heads/main or refs/pull/<pr_number>/merge).
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build-validate:
strategy:
fail-fast: true
matrix:
os: [nodai-ubuntu-builder-large]
suite: [cpu] #,cuda,vulkan]
python-version: ["3.11"]
include:
- os: nodai-ubuntu-builder-large
suite: lint
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set Environment Variables
run: |
echo "SHORT_SHA=`git rev-parse --short=4 HEAD`" >> $GITHUB_ENV
echo "DATE=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Set up Python Version File ${{ matrix.python-version }}
run: |
echo ${{ matrix.python-version }} >> $GITHUB_WORKSPACE/.python-version
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@42375524e23c412d93fb67b49958b491fce71c38 # v5.4.0
with:
python-version: '${{ matrix.python-version }}'
- name: Install dependencies
if: matrix.suite == 'lint'
run: |
python -m pip install --upgrade pip
python -m pip install flake8 pytest toml black
- name: Lint with flake8
if: matrix.suite == 'lint'
run: |
# black format check
black --version
black --check apps/amdshark_studio
# stop the build if there are Python syntax errors or undefined names
flake8 . --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --isolated --count --exit-zero --max-complexity=10 --max-line-length=127 \
--statistics --exclude lit.cfg.py
- name: Validate Models on CPU
if: matrix.suite == 'cpu'
run: |
cd $GITHUB_WORKSPACE
python${{ matrix.python-version }} -m venv amdshark.venv
source amdshark.venv/bin/activate
pip install -r requirements.txt --no-cache-dir
pip install -e .
# Disabled due to hang when exporting test llama2
# python apps/amdshark_studio/tests/api_test.py

.gitignore

@@ -2,6 +2,8 @@
__pycache__/
*.py[cod]
*$py.class
*.mlir
*.vmfb
# C extensions
*.so
@@ -31,7 +33,6 @@ MANIFEST
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
@@ -158,12 +159,53 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/
# Shark related artefacts
# vscode related
.vscode
# AMDShark related artifacts
*venv/
shark_tmp/
amdshark_tmp/
*.vmfb
.use-iree
tank/dict_configs.py
*.csv
reproducers/
apps/amdshark_studio/web/configs
# ORT related artefacts
cache_models/
onnx_models/
# Generated images
generated_imgs/
# Custom model related artefacts
variants.json
/models/
*.safetensors
# models folder
apps/stable_diffusion/web/models/
# model artifacts (AMDSHARK)
*.tempfile
*.mlir
*.vmfb
# Stencil annotators.
stencil_annotator/
# For DocuChat
apps/language_models/langchain/user_path/
db_dir_UserData
# Embedded browser cache and other
apps/stable_diffusion/web/EBWebView/
# Llama2 tokenizer configs
llama2_tokenizer_configs/
# Webview2 runtime artefacts
EBWebView/

.gitmodules

@@ -1,4 +1,4 @@
[submodule "inference/thirdparty/shark-runtime"]
path = inference/thirdparty/shark-runtime
url =https://github.com/nod-ai/SHARK-Runtime.git
[submodule "inference/thirdparty/amdshark-runtime"]
path = inference/thirdparty/amdshark-runtime
url =https://github.com/nod-ai/SRT.git
branch = shark-06032022


@@ -1,3 +0,0 @@
[style]
based_on_style = google
column_limit = 80

LICENSE

@@ -0,0 +1,218 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---- LLVM Exceptions to the Apache 2.0 License ----
As an exception, if, as a result of your compiling your source code, portions
of this Software are embedded into an Object form of such source code, you
may redistribute such embedded portions in such Object form without complying
with the conditions of Sections 4(a), 4(b) and 4(d) of the License.
In addition, if you combine or link compiled forms of this Software with
software that is licensed under the GPLv2 ("Combined Software") and if a
court of competent jurisdiction determines that the patent provision (Section
3), the indemnity provision (Section 9) or other Section of the License
conflicts with the conditions of the GPLv2, you may retroactively and
prospectively choose to deem waived or otherwise exclude such Section(s) of
the License, but only in their entirety and only with respect to the Combined
Software.

README.md

@@ -1,29 +1,175 @@
# SHARK
# AMDSHARK
High Performance Machine Learning and Data Analytics for CPUs, GPUs, Accelerators and Heterogeneous Clusters
High Performance Machine Learning Distribution
[![Nightly Release](https://github.com/nod-ai/SHARK/actions/workflows/nightly.yml/badge.svg)](https://github.com/nod-ai/SHARK/actions/workflows/nightly.yml)
[![Validate torch-models on Shark Runtime](https://github.com/nod-ai/SHARK/actions/workflows/test-models.yml/badge.svg)](https://github.com/nod-ai/SHARK/actions/workflows/test-models.yml)
<h2>NOTE: This project is not currently maintained.</h2>
## Communication Channels
*The latest versions of this project are developments towards a refactor on top of IREE-Turbine. Until further notice, make sure you use an .exe release or a checkout of the `AMDSHARK-1.0` branch for a working AMDSHARK-Studio.*
* [SHARK Discord server](https://discord.gg/RUqY2h2s9u): Real time discussions with the SHARK team and other users
* [GitHub issues](https://github.com/nod-ai/SHARK/issues): Feature requests, bugs etc
## Installation
[![Nightly Release](https://github.com/nod-ai/AMD-SHARK-Studio/actions/workflows/nightly.yml/badge.svg)](https://github.com/nod-ai/AMD-SHARK-Studio/actions/workflows/nightly.yml)
<details>
<summary>Installation (Linux and macOS)</summary>
<summary>Prerequisites - Drivers </summary>
#### Install your Windows hardware drivers
* [AMD RDNA Users] Download the latest driver (23.2.1 is the oldest supported) [here](https://www.amd.com/en/support).
* [macOS Users] Download and install the 1.3.216 Vulkan SDK from [here](https://sdk.lunarg.com/sdk/download/1.3.216.0/mac/vulkansdk-macos-1.3.216.0.dmg). Newer versions of the SDK will not work.
* [Nvidia Users] Download and install the latest CUDA / Vulkan drivers from [here](https://developer.nvidia.com/cuda-downloads)
#### Linux Drivers
* MESA / RADV drivers won't work with FP16. Please use the latest AMDGPU-PRO drivers (non-pro OSS drivers also won't work) or the latest NVIDIA Linux drivers.
Other users, please ensure you have the latest vendor drivers and the Vulkan SDK from [here](https://vulkan.lunarg.com/sdk/home), and if you are using Vulkan, check that `vulkaninfo` works in a terminal window
</details>
### Quick Start for AMDSHARK Stable Diffusion for Windows 10/11 Users
Install the Driver from [Prerequisites](https://github.com/nod-ai/AMD-SHARK-Studio#install-your-hardware-drivers) above
Download the [stable release](https://github.com/nod-ai/AMD-SHARK-Studio/releases/latest) or the most recent [AMDSHARK 1.0 pre-release](https://github.com/nod-ai/AMD-SHARK-Studio/releases).
Double click the .exe, or [run from the command line](#running) (recommended), and you should have the [UI](http://localhost:8080/) in the browser.
If you have custom models, put them in a `models/` directory in the same folder as the .exe.
Enjoy.
<details>
<summary>More installation notes</summary>
* We recommend that you download the EXE into a new folder whenever you download a new EXE version. If you download it into the same folder as a previous install, you must delete the old `*.vmfb` files with `rm *.vmfb`. You can also use the `--clear_all` flag once to clean up all the old files.
* If you recently updated the driver or this binary (EXE file), we recommend you clear all the local artifacts with `--clear_all`.
## Running
* Open a Command Prompt or PowerShell terminal and change folder (`cd`) to the .exe folder. Then run the EXE from the command prompt. That way, if an error occurs, you'll be able to cut-and-paste it to ask for help. (If it always works for you without error, you may simply double-click the EXE.)
* The first run may take a few minutes while the models are downloaded and compiled. Your patience is appreciated. The download could be about 5GB.
* You will likely see a Windows Defender message asking you to give permission to open a web server port. Accept it.
* Open a browser to access the Stable Diffusion web server. By default, the port is 8080, so you can go to http://localhost:8080/.
* If you prefer to always run in the browser, use the `--ui=web` command argument when running the EXE.
## Stopping
* Select the command prompt that's running the EXE. Press CTRL-C and wait a moment or close the terminal.
</details>
<details>
<summary>Advanced Installation (Only for developers)</summary>
## Advanced Installation (Windows, Linux and macOS) for developers
### Windows 10/11 Users
* Install Git for Windows from [here](https://git-scm.com/download/win) if you don't already have it.
## Check out the code
```shell
git clone https://github.com/nod-ai/AMD-SHARK-Studio.git
cd AMD-SHARK-Studio
```
## Switch to the Correct Branch (IMPORTANT!)
Currently AMDSHARK is being rebuilt for [Turbine](https://github.com/iree-org/iree-turbine) on the `main` branch. For now you are strongly discouraged from using `main` unless you are working on the rebuild effort, and you should not expect the code there to produce a working application for image generation. For now you'll need to switch over to the `AMDSHARK-1.0` branch and use the stable code.
```shell
git checkout AMDSHARK-1.0
```
The following setup instructions assume you are on this branch.
## Setup your Python VirtualEnvironment and Dependencies
### Windows 10/11 Users
* Install the latest Python 3.11.x version from [here](https://www.python.org/downloads/windows/)
#### Allow the install script to run in Powershell
```powershell
set-executionpolicy remotesigned
```
#### Setup venv and install necessary packages (torch-mlir, nodLabs/AMDShark, ...)
```powershell
./setup_venv.ps1 #You can re-run this script to get the latest version
```
### Linux / macOS Users
```shell
./setup_venv.sh
source amdshark1.venv/bin/activate
```
### Run Stable Diffusion on your device - WebUI
#### Windows 10/11 Users
```powershell
(amdshark1.venv) PS C:\g\amdshark> cd .\apps\stable_diffusion\web\
(amdshark1.venv) PS C:\g\amdshark\apps\stable_diffusion\web> python .\index.py
```
#### Linux / macOS Users
```shell
(amdshark1.venv) > cd apps/stable_diffusion/web
(amdshark1.venv) > python index.py
```
#### Access Stable Diffusion on http://localhost:8080/?__theme=dark
<img width="1607" alt="webui" src="https://user-images.githubusercontent.com/74956/204939260-b8308bc2-8dc4-47f6-9ac0-f60b66edab99.png">
### Run Stable Diffusion on your device - Commandline
#### Windows 10/11 Users
```powershell
(amdshark1.venv) PS C:\g\amdshark> python .\apps\stable_diffusion\scripts\main.py --app="txt2img" --precision="fp16" --prompt="tajmahal, snow, sunflowers, oil on canvas" --device="vulkan"
```
#### Linux / macOS Users
```shell
python3.11 apps/stable_diffusion/scripts/main.py --app=txt2img --precision=fp16 --device=vulkan --prompt="tajmahal, oil on canvas, sunflowers, 4k, uhd"
```
You can replace `vulkan` with `cpu` to run on your CPU or with `cuda` to run on CUDA devices. If you have multiple Vulkan devices, you can address them with `--device=vulkan://1`, etc.
</details>
The output on an AMD 7900XTX would look something like:
```shell
Average step time: 47.19188690185547ms/it
Clip Inference time (ms) = 109.531
VAE Inference time (ms): 78.590
Total image generation time: 2.5788655281066895sec
```
Here are some samples generated:
![tajmahal, snow, sunflowers, oil on canvas_0](https://user-images.githubusercontent.com/74956/204934186-141f7e43-6eb2-4e89-a99c-4704d20444b3.jpg)
![a photo of a crab playing a trumpet](https://user-images.githubusercontent.com/74956/204933258-252e7240-8548-45f7-8253-97647d38313d.jpg)
Find us on [AMDSHARK Discord server](https://discord.gg/RUqY2h2s9u) if you have any trouble with running it on your hardware.
<details>
<summary>Binary Installation</summary>
### Setup a new pip Virtual Environment
This step sets up a new VirtualEnv for Python
```shell
python --version #Check you have 3.7->3.10 on Linux or 3.10 on macOS
python -m venv shark_venv
source shark_venv/bin/activate
python --version #Check you have 3.11 on Linux, macOS or Windows Powershell
python -m venv amdshark_venv
source amdshark_venv/bin/activate # Use amdshark_venv/Scripts/activate on Windows
# If you are using conda create and activate a new conda env
@@ -33,27 +179,32 @@ python -m pip install --upgrade pip
*macOS Metal* users please install https://sdk.lunarg.com/sdk/download/latest/mac/vulkan-sdk.dmg and enable "System wide install"
### Install SHARK
### Install AMD-SHARK
This step pip installs SHARK and related packages on Linux Python 3.7, 3.8, 3.9, 3.10 and macOS Python 3.10
This step pip installs AMD-SHARK and related packages on Linux Python 3.8, 3.10 and 3.11 and macOS / Windows Python 3.11
```shell
pip install nodai-shark -f https://github.com/nod-ai/SHARK/releases -f https://github.com/llvm/torch-mlir/releases -f https://github.com/nod-ai/shark-runtime/releases --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install nodai-amdshark -f https://nod-ai.github.io/AMD-SHARK-Studio/package-index/ -f https://llvm.github.io/torch-mlir/package-index/ -f https://nod-ai.github.io/SRT/pip-release-links.html --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
If you are on an Intel macOS machine you need this [workaround](https://github.com/nod-ai/SHARK/issues/102) for an upstream issue.
### Run amdshark tank model tests.
```shell
pytest tank/test_models.py
```
See tank/README.md for a more detailed walkthrough of our pytest suite and CLI.
### Download and run Resnet50 sample
```shell
curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
curl -O https://raw.githubusercontent.com/nod-ai/AMD-SHARK-Studio/main/amdshark/examples/amdshark_inference/resnet50_script.py
#Install deps for test script
pip install --pre torch torchvision torchaudio tqdm pillow --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install --pre torch torchvision torchaudio tqdm pillow gsutil --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./resnet50_script.py --device="cpu" #use cuda or vulkan or metal
```
### Download and run BERT (MiniLM) sample
```shell
curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/minilm_jit.py
curl -O https://raw.githubusercontent.com/nod-ai/AMD-SHARK-Studio/main/amdshark/examples/amdshark_inference/minilm_jit.py
#Install deps for test script
pip install transformers torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./minilm_jit.py --device="cpu" #use cuda or vulkan or metal
@@ -61,127 +212,107 @@ python ./minilm_jit.py --device="cpu" #use cuda or vulkan or metal
</details>
<details>
<summary>Source Installation</summary>
<summary>Development, Testing and Benchmarks</summary>
## Check out the code
If you want to use Python 3.11 with the TF import tools, you can use environment variables like:
Set `USE_IREE=1` to use upstream IREE
```
# PYTHON=python3.11 VENV_DIR=0617_venv IMPORTER=1 ./setup_venv.sh
```
### Run any of the hundreds of AMDSHARK tank models via the test framework
```shell
git clone https://github.com/nod-ai/SHARK.git
python -m amdshark.examples.amdshark_inference.resnet50_script --device="cpu" # Use gpu | vulkan
# Or a pytest
pytest tank/test_models.py -k "MiniLM"
```
## Setup your Python VirtualEnvironment and Dependencies
```shell
# Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...).
./setup_venv.sh
source shark.venv/bin/activate
```
For example, if you want to use Python 3.10 and upstream IREE with TF import tools, you can use environment variables like:
```
# PYTHON=python3.10 VENV_DIR=0617_venv IMPORTER=1 USE_IREE=1 ./setup_venv.sh
```
If you are a Torch-mlir developer or an IREE developer and want to test local changes you can uninstall
### How to use your locally built IREE / Torch-MLIR with AMDSHARK
If you are a *Torch-mlir developer or an IREE developer* and want to test local changes you can uninstall
the provided packages with `pip uninstall torch-mlir` and / or `pip uninstall iree-compiler iree-runtime` and build locally
with Python bindings and set your PYTHONPATH as mentioned [here](https://google.github.io/iree/bindings/python/)
with Python bindings and set your PYTHONPATH as mentioned [here](https://github.com/iree-org/iree/tree/main/docs/api_docs/python#install-iree-binaries)
for IREE and [here](https://github.com/llvm/torch-mlir/blob/main/development.md#setup-python-environment-to-export-the-built-python-packages)
for Torch-MLIR.
### Run a demo script
How to use your locally built Torch-MLIR with AMDSHARK:
```shell
python -m shark.examples.shark_inference.resnet50_script --device="cpu" # Use gpu | vulkan
# Or a pytest
pytest tank/tf/hf_masked_lm/albert-base-v2_test.py::AlbertBaseModuleTest::test_module_static_cpu
1.) Run `./setup_venv.sh` in AMDSHARK and activate the `amdshark.venv` virtual env.
2.) Run `pip uninstall torch-mlir`.
3.) Go to your local Torch-MLIR directory.
4.) Activate the mlir_venv virtual environment.
5.) Run `pip uninstall -r requirements.txt`.
6.) Run `pip install -r requirements.txt`.
7.) Build Torch-MLIR.
8.) Activate the amdshark.venv virtual environment from the Torch-MLIR directory.
9.) Run `export PYTHONPATH=`pwd`/build/tools/torch-mlir/python_packages/torch_mlir:`pwd`/examples` in the Torch-MLIR directory.
10.) Go to the AMDSHARK directory.
```
Now AMDSHARK will use your locally built Torch-MLIR repo.
## Benchmarking Dispatches
To produce benchmarks of individual dispatches, you can add `--dispatch_benchmarks=All --dispatch_benchmarks_dir=<output_dir>` to your pytest command line.
If you only want to compile specific dispatches, you can specify them with a space-separated string instead of `"All"`, e.g. `--dispatch_benchmarks="0 1 2 10"`.
For example, to generate and run dispatch benchmarks for MiniLM on CUDA:
```
pytest -k "MiniLM and torch and static and cuda" --benchmark_dispatches=All -s --dispatch_benchmarks_dir=./my_dispatch_benchmarks
```
The given command will populate `<dispatch_benchmarks_dir>/<model_name>/` with an `ordered_dispatches.txt` that lists and orders the dispatches and their latencies, as well as folders for each dispatch that contain .mlir, .vmfb, and results of the benchmark for that dispatch.
If you want to instead incorporate this into a Python script, you can pass the `dispatch_benchmarks` and `dispatch_benchmarks_dir` arguments when initializing `AMDSharkInference`, and the benchmarks will be generated when the module is compiled, e.g.:
```
amdshark_module = AMDSharkInference(
mlir_model,
device=args.device,
mlir_dialect="tm_tensor",
dispatch_benchmarks="all",
dispatch_benchmarks_dir="results"
)
```
Output will include:
- An ordered list, `ordered-dispatches.txt`, of all the dispatches with their runtimes
- Inside the specified directory, there will be a directory for each dispatch (there will be mlir files for all dispatches, but only compiled binaries and benchmark data for the specified dispatches)
- An .mlir file containing the dispatch benchmark
- A compiled .vmfb file containing the dispatch benchmark
- An .mlir file containing just the hal executable
- A compiled .vmfb file of the hal executable
- A .txt file containing benchmark output
See tank/README.md for further instructions on how to run model tests and benchmarks from the AMDSHARK tank.
</details>
<details>
<summary>Testing</summary>
### Run all model tests on CPU/GPU/VULKAN/Metal
```shell
pytest tank
# If on Linux for multithreading on CPU (faster results):
pytest tank -n auto
```
### Running specific tests
```shell
# Run tests for a specific model:
pytest tank/<MODEL_NAME> #i.e., pytest tank/bert-base-uncased
# Run tests for a specific case:
pytest tank/<MODEL_NAME> -k "keyword"
# i.e., pytest tank/bert-base-uncased/bert-base-uncased_test.py -k "static_gpu"
```
### Run benchmarks on SHARK tank pytests and generate bench_results.csv with results.
(requires source installation with `IMPORTER=1 ./setup_venv.sh`)
```shell
pytest --benchmark tank
# Just do static GPU benchmarks for PyTorch tests:
pytest --benchmark tank --ignore-glob="_tf*" -k "static_gpu"
```
### Benchmark Resnet50, MiniLM on CPU
(requires source installation with `IMPORTER=1 ./setup_venv.sh`)
```shell
# We suggest running the following commands as root before running benchmarks on CPU:
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | awk -F, '{print $2}' | sort -n | uniq | ( while read X ; do echo $X ; echo 0 > /sys/devices/system/cpu/cpu$X/online ; done )
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# Benchmark canonical Resnet50 on CPU via pytest
pytest --benchmark tank/resnet50/ -k "cpu"
# Benchmark canonical MiniLM on CPU via pytest
pytest --benchmark tank/MiniLM-L12-H384-uncased/ -k "cpu"
# Benchmark MiniLM on CPU via transformer-benchmarks:
git clone --recursive https://github.com/nod-ai/transformer-benchmarks.git
cd transformer-benchmarks
./perf-ci.sh -n
# Check detail.csv for MLIR/IREE results.
```
</details>
<details>
<summary>API Reference</summary>
### Shark Inference API
### AMDShark Inference API
```
from shark.shark_importer import SharkImporter
from amdshark.amdshark_importer import AMDSharkImporter
# SharkImporter imports mlir file from the torch, tensorflow or tf-lite module.
# AMDSharkImporter imports mlir file from the torch, tensorflow or tf-lite module.
mlir_importer = SharkImporter(
mlir_importer = AMDSharkImporter(
torch_module,
(input),
frontend="torch", #tf, #tf-lite
)
torch_mlir, func_name = mlir_importer.import_mlir(tracing_required=True)
# SharkInference accepts mlir in linalg, mhlo, and tosa dialect.
# AMDSharkInference accepts mlir in linalg, mhlo, and tosa dialect.
from shark.shark_inference import SharkInference
shark_module = SharkInference(torch_mlir, func_name, device="cpu", mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((input))
from amdshark.amdshark_inference import AMDSharkInference
amdshark_module = AMDSharkInference(torch_mlir, device="cpu", mlir_dialect="linalg")
amdshark_module.compile()
result = amdshark_module.forward((input))
```
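The block above interleaves the removed `Shark*` lines with the added `AMDShark*` lines of the diff, so it does not read as a single example. Below is a consolidated, hypothetical sketch of just the new-style API using the module paths shown in the added lines; the `torch.nn.Linear` model and `sample_input` are illustrative stand-ins, not part of the original.
```python
# Consolidated (hypothetical) sketch of the renamed API from the added lines above.
import torch
from amdshark.amdshark_importer import AMDSharkImporter
from amdshark.amdshark_inference import AMDSharkInference

torch_module = torch.nn.Linear(4, 4)   # stand-in model for illustration
sample_input = torch.randn(1, 4)       # stand-in input used for tracing

# AMDSharkImporter produces MLIR from a torch, tensorflow, or tf-lite module.
mlir_importer = AMDSharkImporter(
    torch_module,
    (sample_input,),
    frontend="torch",  # or "tf", "tf-lite"
)
torch_mlir, func_name = mlir_importer.import_mlir(tracing_required=True)

# AMDSharkInference accepts MLIR in the linalg, mhlo, or tosa dialect.
amdshark_module = AMDSharkInference(torch_mlir, device="cpu", mlir_dialect="linalg")
amdshark_module.compile()
result = amdshark_module.forward((sample_input,))
```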
@@ -189,7 +320,7 @@ result = shark_module.forward((input))
### Example demonstrating running MHLO IR.
```
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
import numpy as np
mhlo_ir = r"""builtin.module {
@@ -202,166 +333,37 @@ mhlo_ir = r"""builtin.module {
arg0 = np.ones((1, 4)).astype(np.float32)
arg1 = np.ones((4, 1)).astype(np.float32)
shark_module = SharkInference(mhlo_ir, func_name="forward", device="cpu", mlir_dialect="mhlo")
shark_module.compile()
result = shark_module.forward((arg0, arg1))
amdshark_module = AMDSharkInference(mhlo_ir, device="cpu", mlir_dialect="mhlo")
amdshark_module.compile()
result = amdshark_module.forward((arg0, arg1))
```
</details>
## Examples Using the REST API
* [Setting up AMDSHARK for use with Blender](./docs/amdshark_sd_blender.md)
* [Setting up AMDSHARK for use with Koboldcpp](./docs/amdshark_sd_koboldcpp.md)
## Supported and Validated Models
<details>
<summary>PyTorch Models</summary>
AMDSHARK is maintained to support the latest innovations in ML Models:
### Huggingface PyTorch Models
| Hugging Face PyTorch Models | AMDSHARK-CPU | AMDSHARK-CUDA | AMDSHARK-METAL |
|---------------------|----------|----------|-------------|
| BERT | :green_heart: | :green_heart: | :green_heart: |
| DistilBERT | :green_heart: | :green_heart: | :green_heart: |
| GPT2 | :green_heart: | :green_heart: | :green_heart: |
| BLOOM | :green_heart: | :green_heart: | :green_heart: |
| Stable Diffusion | :green_heart: | :green_heart: | :green_heart: |
| Vision Transformer | :green_heart: | :green_heart: | :green_heart: |
| ResNet50 | :green_heart: | :green_heart: | :green_heart: |
| Hugging Face Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| Albert | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| BigBird | :green_heart: (AOT) | | | |
| DistilBERT | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
| GPT2 | :broken_heart: (AOT) | | | |
| MobileBert | :green_heart: (JIT) | :green_heart: | :green_heart: | :green_heart: |
For a complete list of the models supported in AMDSHARK, please refer to [tank/README.md](https://github.com/nod-ai/AMD-SHARK-Studio/blob/main/tank/README.md).
### Torchvision Models
## Communication Channels
| TORCHVISION Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|--------------------|----------------------|----------|----------|-------------|
| AlexNet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| DenseNet121 | :green_heart: (Script) | | | |
| MNasNet1_0 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| MobileNetV2 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| MobileNetV3 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Unet | :broken_heart: (Script) | | | |
| Resnet18 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet50 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnet101 | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnext50_32x4d | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| ShuffleNet_v2 | :broken_heart: (Script) | | | |
| SqueezeNet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| EfficientNet | :green_heart: (Script) | | | |
| Regnet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| Resnest | :broken_heart: (Script) | | | |
| Vision Transformer | :green_heart: (Script) | | | |
| VGG 16 | :green_heart: (Script) | :green_heart: | :green_heart: | |
| Wide Resnet | :green_heart: (Script) | :green_heart: | :green_heart: | :green_heart: |
| RAFT | :broken_heart: (JIT) | | | |
For more information refer to [MODEL TRACKING SHEET](https://docs.google.com/spreadsheets/d/15PcjKeHZIrB5LfDyuw7DGEEE8XnQEX2aX8lm8qbxV8A/edit#gid=0)
### PyTorch Training Models
| Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
</details>
<details>
<summary>JAX Models</summary>
### JAX Models
| Models | JAX-MHLO lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| DALL-E | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
</details>
<details>
<summary>TFLite Models</summary>
### TFLite Models
| Models | TOSA/LinAlg | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :broken_heart: | :broken_heart: | | |
| FullyConnected | :green_heart: | :green_heart: | | |
| albert | :green_heart: | :green_heart: | | |
| asr_conformer | :green_heart: | :green_heart: | | |
| bird_classifier | :green_heart: | :green_heart: | | |
| cartoon_gan | :green_heart: | :green_heart: | | |
| craft_text | :green_heart: | :green_heart: | | |
| deeplab_v3 | :green_heart: | :green_heart: | | |
| densenet | :green_heart: | :green_heart: | | |
| east_text_detector | :green_heart: | :green_heart: | | |
| efficientnet_lite0_int8 | :green_heart: | :green_heart: | | |
| efficientnet | :green_heart: | :green_heart: | | |
| gpt2 | :green_heart: | :green_heart: | | |
| image_stylization | :green_heart: | :green_heart: | | |
| inception_v4 | :green_heart: | :green_heart: | | |
| inception_v4_uint8 | :green_heart: | :green_heart: | | |
| lightning_fp16 | :green_heart: | :green_heart: | | |
| lightning_i8 | :green_heart: | :green_heart: | | |
| lightning | :green_heart: | :green_heart: | | |
| magenta | :green_heart: | :green_heart: | | |
| midas | :green_heart: | :green_heart: | | |
| mirnet | :green_heart: | :green_heart: | | |
| mnasnet | :green_heart: | :green_heart: | | |
| mobilebert_edgetpu_s_float | :green_heart: | :green_heart: | | |
| mobilebert_edgetpu_s_quant | :green_heart: | :green_heart: | | |
| mobilebert | :green_heart: | :green_heart: | | |
| mobilebert_tf2_float | :green_heart: | :green_heart: | | |
| mobilebert_tf2_quant | :green_heart: | :green_heart: | | |
| mobilenet_ssd_quant | :green_heart: | :green_heart: | | |
| mobilenet_v1 | :green_heart: | :green_heart: | | |
| mobilenet_v1_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v2_int8 | :green_heart: | :green_heart: | | |
| mobilenet_v2 | :green_heart: | :green_heart: | | |
| mobilenet_v2_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v3-large | :green_heart: | :green_heart: | | |
| mobilenet_v3-large_uint8 | :green_heart: | :green_heart: | | |
| mobilenet_v35-int8 | :green_heart: | :green_heart: | | |
| nasnet | :green_heart: | :green_heart: | | |
| person_detect | :green_heart: | :green_heart: | | |
| posenet | :green_heart: | :green_heart: | | |
| resnet_50_int8 | :green_heart: | :green_heart: | | |
| rosetta | :green_heart: | :green_heart: | | |
| spice | :green_heart: | :green_heart: | | |
| squeezenet | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v1 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v1_uint8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_fpnlite | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_fpnlite_uint8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2_int8 | :green_heart: | :green_heart: | | |
| ssd_mobilenet_v2 | :green_heart: | :green_heart: | | |
| ssd_spaghettinet_large | :green_heart: | :green_heart: | | |
| ssd_spaghettinet_large_uint8 | :green_heart: | :green_heart: | | |
| visual_wake_words_i8 | :green_heart: | :green_heart: | | |
</details>
<details>
<summary>TF Models</summary>
### Tensorflow Models (Inference)
| Hugging Face Models | tf-mhlo lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|----------|----------|-------------|
| BERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| albert-base-v2 | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| DistilBERT | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| CamemBert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| ConvBert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| Deberta | | | | |
| electra | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| funnel | | | | |
| layoutlm | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| longformer | | | | |
| mobile-bert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| remembert | | | | |
| tapas | | | | |
| flaubert | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| roberta | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| xlm-roberta | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
| mpnet | :green_heart: | :green_heart: | :green_heart: | :green_heart: |
</details>
* [AMDSHARK Discord server](https://discord.gg/RUqY2h2s9u): Real time discussions with the AMDSHARK team and other users
* [GitHub issues](https://github.com/nod-ai/AMD-SHARK-Studio/issues): Feature requests, bugs etc
## Related Projects
@@ -370,7 +372,7 @@ For more information refer to [MODEL TRACKING SHEET](https://docs.google.com/spr
* [Upstream IREE issues](https://github.com/google/iree/issues): Feature requests,
bugs, and other work tracking
* [Upstream IREE Discord server](https://discord.gg/26P4xW4): Daily development
* [Upstream IREE Discord server](https://discord.gg/wEWh6Z9nMU): Daily development
discussions with the core team and collaborators
* [iree-discuss email list](https://groups.google.com/forum/#!forum/iree-discuss):
Announcements, general and low-priority discussion
@@ -383,10 +385,10 @@ For more information refer to [MODEL TRACKING SHEET](https://docs.google.com/spr
* Torch-MLIR Github issues [here](https://github.com/llvm/torch-mlir/issues)
* [`torch-mlir` section](https://llvm.discourse.group/c/projects-that-want-to-become-official-llvm-projects/torch-mlir/41) of LLVM Discourse
* Weekly meetings on Mondays 9AM PST. See [here](https://discourse.llvm.org/t/community-meeting-developer-hour-refactoring-recurring-meetings/62575) for more information.
* [MLIR topic within LLVM Discourse](https://llvm.discourse.group/c/llvm-project/mlir/31) SHARK and IREE is enabled by and heavily relies on [MLIR](https://mlir.llvm.org).
* [MLIR topic within LLVM Discourse](https://llvm.discourse.group/c/llvm-project/mlir/31) AMDSHARK and IREE are enabled by and heavily rely on [MLIR](https://mlir.llvm.org).
</details>
## License
nod.ai SHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions.
nod.ai AMDSHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions.
See [LICENSE](LICENSE) for more information.

amdshark/__init__.py

@@ -0,0 +1,28 @@
import importlib
import logging
from torch._dynamo import register_backend
log = logging.getLogger(__name__)
@register_backend
def amdshark(model, inputs, *, options):
try:
from amdshark.dynamo_backend.utils import AMDSharkBackend
except ImportError:
log.exception(
"Unable to import AMDSHARK - High Performance Machine Learning Distribution"
"Please install the right version of AMDSHARK that matches the PyTorch version being used. "
"Refer to https://github.com/nod-ai/AMD-SHARK-Studio/ for details."
)
raise
return AMDSharkBackend(model, inputs, options)
def has_amdshark():
try:
importlib.import_module("amdshark")
return True
except ImportError:
return False
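Since the snippet above registers `amdshark` as a TorchDynamo backend, here is a minimal usage sketch. It is not from the repo: it assumes a PyTorch version with `torch.compile`, an installed `amdshark` package whose `AMDSharkBackend` import succeeds, and that the backend handles an empty options dict.
```python
# Minimal usage sketch (hypothetical): exercising the backend registered above.
import torch
import amdshark  # importing runs the @register_backend decorator above

print("amdshark" in torch._dynamo.list_backends())  # expect True once registration ran

model = torch.nn.Linear(4, 4)
# The backend function takes a keyword-only `options` argument, so pass one through.
compiled = torch.compile(model, backend="amdshark", options={})
print(compiled(torch.randn(1, 4)))
```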


@@ -0,0 +1,501 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from amdshark.amdshark_runner import AMDSharkRunner
from amdshark.iree_utils.compile_utils import (
export_iree_module_to_vmfb,
load_flatbuffer,
get_iree_runtime_config,
)
from amdshark.iree_utils.benchmark_utils import (
build_benchmark_args,
run_benchmark_module,
)
from amdshark.parser import amdshark_args
from datetime import datetime
import time
from typing import Optional
import csv
import os
TF_CPU_DEVICE = "/CPU:0"
TF_GPU_DEVICE = "/GPU:0"
def _bytes_to_mb_str(bytes_: Optional[int]) -> str:
return "" if bytes_ is None else f"{bytes_ / 1e6:.6f}"
class OnnxFusionOptions(object):
def __init__(self):
self.disable_gelu = False
self.disable_layer_norm = False
self.disable_attention = False
self.disable_skip_layer_norm = False
self.disable_embed_layer_norm = False
self.disable_bias_skip_layer_norm = False
self.disable_bias_gelu = False
self.enable_gelu_approximation = False
self.use_mask_index = False
self.no_attention_mask = False
def check_requirements(frontend):
import importlib
has_pkgs = False
if frontend == "torch":
tv_spec = importlib.util.find_spec("torchvision")
has_pkgs = tv_spec is not None
elif frontend in ["tensorflow", "tf"]:
keras_spec = importlib.util.find_spec("keras")
tf_spec = importlib.util.find_spec("tensorflow")
has_pkgs = keras_spec is not None and tf_spec is not None
return has_pkgs
class AMDSharkBenchmarkRunner(AMDSharkRunner):
# AMDSharkRunner derived class with Benchmarking capabilities.
def __init__(
self,
mlir_module: bytes,
device: str = "none",
mlir_dialect: str = "linalg",
extra_args: list = [],
):
self.device = amdshark_args.device if device == "none" else device
self.enable_tf32 = amdshark_args.enable_tf32
self.frontend_model = None
self.vmfb_file = None
self.mlir_dialect = mlir_dialect
self.extra_args = extra_args
self.import_args = {}
self.temp_file_to_unlink = None
if not os.path.isfile(mlir_module):
print(
"Warning: Initializing AMDSharkRunner with a mlir string/bytecode object will duplicate the model in RAM at compile time. To avoid this, initialize AMDSharkInference with a path to a MLIR module on your hard disk instead."
)
self.compile_str = True
else:
self.compile_str = False
AMDSharkRunner.__init__(
self,
mlir_module,
device,
self.mlir_dialect,
self.extra_args,
compile_vmfb=False,
)
self.vmfb_file = export_iree_module_to_vmfb(
mlir_module,
device,
".",
self.mlir_dialect,
extra_args=self.extra_args,
compile_str=self.compile_str,
)
params = load_flatbuffer(
self.vmfb_file,
device,
mmap=True,
)
self.iree_compilation_module = params["vmfb"]
self.iree_config = params["config"]
self.temp_file_to_unlink = params["temp_file_to_unlink"]
del params
def setup_cl(self, input_tensors):
self.benchmark_cl = build_benchmark_args(
self.vmfb_file,
self.device,
input_tensors,
mlir_dialect=self.mlir_dialect,
)
def benchmark_frontend(self, modelname):
if self.mlir_dialect in ["linalg", "torch"]:
return self.benchmark_torch(modelname)
elif self.mlir_dialect in ["mhlo", "tf"]:
return self.benchmark_tf(modelname)
def benchmark_torch(self, modelname, device="cpu"):
import torch
from tank.model_utils import get_torch_model
# TODO: Pass this as an arg. currently the best way is to setup with BENCHMARK=1 if we want to use torch+cuda, else use cpu.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
torch.set_default_device("cuda:0")
# if self.enable_tf32:
# torch.backends.cuda.matmul.allow_tf32 = True
else:
torch.set_default_dtype(torch.float32)
torch.set_default_device("cpu")
torch_device = torch.device("cuda:0" if device == "cuda" else "cpu")
HFmodel, input = get_torch_model(modelname, self.import_args)[:2]
frontend_model = HFmodel.model
frontend_model.to(torch_device)
if device == "cuda":
frontend_model.cuda()
input.to(torch.device("cuda:0"))
print(input)
else:
frontend_model.cpu()
input.cpu()
for i in range(amdshark_args.num_warmup_iterations):
frontend_model.forward(input)
if device == "cuda":
torch.cuda.reset_peak_memory_stats()
begin = time.time()
for i in range(amdshark_args.num_iterations):
out = frontend_model.forward(input)
end = time.time()
if device == "cuda":
stats = torch.cuda.memory_stats()
device_peak_b = stats["allocated_bytes.all.peak"]
frontend_model.to(torch.device("cpu"))
input.to(torch.device("cpu"))
torch.cuda.empty_cache()
else:
device_peak_b = None
print(
f"Torch benchmark:{amdshark_args.num_iterations/(end-begin)} iter/second, Total Iterations:{amdshark_args.num_iterations}"
)
if device == "cuda":
# Set device to CPU so we don't run into segfaults exiting pytest subprocesses.
torch_device = torch.device("cpu")
return [
f"{amdshark_args.num_iterations/(end-begin)}",
f"{((end-begin)/amdshark_args.num_iterations)*1000}",
"", # host_peak_b (CPU usage) is not reported by PyTorch.
_bytes_to_mb_str(device_peak_b),
]
def benchmark_tf(self, modelname):
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import tensorflow as tf
visible_default = tf.config.list_physical_devices("GPU")
try:
tf.config.set_visible_devices([], "GPU")
visible_devices = tf.config.get_visible_devices()
for device in visible_devices:
assert device.device_type != "GPU"
except Exception:
# Invalid device or cannot modify virtual devices once initialized.
pass
from tank.model_utils_tf import get_tf_model
# tf_device = TF_GPU_DEVICE if self.device == "cuda" else TF_CPU_DEVICE
tf_device = TF_CPU_DEVICE
with tf.device(tf_device):
(
model,
input,
) = get_tf_model(
modelname, self.import_args
)[:2]
frontend_model = model
for i in range(amdshark_args.num_warmup_iterations):
frontend_model.forward(*input)
if tf_device == TF_GPU_DEVICE:
tf.config.experimental.reset_memory_stats(tf_device)
begin = time.time()
for i in range(amdshark_args.num_iterations):
out = frontend_model.forward(*input)
end = time.time()
if tf_device == TF_GPU_DEVICE:
memory_info = tf.config.experimental.get_memory_info(tf_device)
device_peak_b = memory_info["peak"]
else:
# tf.config.experimental does not currently support measuring
# CPU memory usage.
device_peak_b = None
print(
f"TF benchmark:{amdshark_args.num_iterations/(end-begin)} iter/second, Total Iterations:{amdshark_args.num_iterations}"
)
return [
f"{amdshark_args.num_iterations/(end-begin)}",
f"{((end-begin)/amdshark_args.num_iterations)*1000}",
"", # host_peak_b (CPU usage) is not reported by TensorFlow.
_bytes_to_mb_str(device_peak_b),
]
def benchmark_c(self):
iter_per_second, host_peak_b, device_peak_b = run_benchmark_module(
self.benchmark_cl
)
print(f"AMDShark-IREE-C benchmark:{iter_per_second} iter/second")
return [
f"{iter_per_second}",
f"{1000/iter_per_second}",
_bytes_to_mb_str(host_peak_b),
_bytes_to_mb_str(device_peak_b),
]
def benchmark_python(self, inputs):
input_list = [x for x in inputs]
for i in range(amdshark_args.num_warmup_iterations):
self.run("forward", input_list)
begin = time.time()
for i in range(amdshark_args.num_iterations):
out = self.run("forward", input_list)
end = time.time()
print(
f"AMDShark-IREE Python benchmark:{amdshark_args.num_iterations/(end-begin)} iter/second, Total Iterations:{amdshark_args.num_iterations}"
)
return [
f"{amdshark_args.num_iterations/(end-begin)}",
f"{((end-begin)/amdshark_args.num_iterations)*1000}",
]
def benchmark_onnx(self, modelname, inputs):
if self.device == "cuda":
print(
"Currently GPU benchmarking on ONNX is not supported in AMDSHARK."
)
return ["N/A", "N/A"]
else:
from onnxruntime.transformers.benchmark import run_onnxruntime
from onnxruntime.transformers.huggingface_models import MODELS
from onnxruntime.transformers.benchmark_helper import (
ConfigModifier,
Precision,
)
import psutil
if modelname == "microsoft/MiniLM-L12-H384-uncased":
modelname = "bert-base-uncased"
if modelname not in MODELS:
print(
f"{modelname} is currently not supported in ORT's HF. Check \
https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/huggingface_models.py \
for currently supported models. Exiting benchmark ONNX."
)
return ["N/A", "N/A"]
use_gpu = self.device == "cuda"
num_threads = psutil.cpu_count(logical=False)
batch_sizes = [1]
sequence_lengths = [128]
cache_dir = os.path.join(".", "cache_models")
onnx_dir = os.path.join(".", "onnx_models")
verbose = False
input_counts = [1]
optimize_onnx = True
validate_onnx = False
disable_ort_io_binding = False
use_raw_attention_mask = True
model_fusion_statistics = {}
overwrite = False
model_source = "pt" # Either "pt" or "tf"
provider = None
config_modifier = ConfigModifier(None)
onnx_args = OnnxFusionOptions()
result = run_onnxruntime(
use_gpu,
provider,
(modelname,),
None,
config_modifier,
Precision.FLOAT32,
num_threads,
batch_sizes,
sequence_lengths,
amdshark_args.num_iterations,
input_counts,
optimize_onnx,
validate_onnx,
cache_dir,
onnx_dir,
verbose,
overwrite,
disable_ort_io_binding,
use_raw_attention_mask,
model_fusion_statistics,
model_source,
onnx_args,
)
print(
f"ONNX ORT-benchmark:{result[0]['QPS']} iter/second, Total Iterations:{amdshark_args.num_iterations}"
)
return [
result[0]["QPS"],
result[0]["average_latency_ms"],
]
def get_metadata(self, modelname):
metadata_path = os.path.join(".", "tank", "model_metadata.csv")
with open(metadata_path, mode="r") as csvfile:
torch_reader = csv.reader(csvfile, delimiter=",")
fields = next(torch_reader)
for row in torch_reader:
torch_model_name = row[0]
if torch_model_name == modelname:
param_count = row[3]
model_tags = row[4]
model_notes = row[5]
return [param_count, model_tags, model_notes]
def compare_bench_results(self, baseline: str, result: str):
if baseline is not None:
# Takes a baseline and a result string and calculates a comparison, e.g. "1.04x baseline".
a = float(baseline)
b = float(result)
comparison = a / b
comp_str = f"{round(comparison, 2)}x baseline"
else:
comp_str = "N/A"
return comp_str
def benchmark_all_csv(
self,
inputs: tuple,
modelname,
dynamic,
device_str,
frontend,
import_args,
mode="native",
):
self.setup_cl(inputs)
self.import_args = import_args
self.mode = mode
field_names = [
"model",
"batch_size",
"engine",
"dialect",
"device",
"shape_type",
"data_type",
"iter/sec",
"ms/iter",
"vs. PyTorch/TF",
"iterations",
"param_count",
"tags",
"notes",
"datetime",
"host_memory_mb",
"device_memory_mb",
"measured_host_memory_mb",
"measured_device_memory_mb",
]
# "frontend" must be the first element.
if self.mode == "native":
engines = ["amdshark_python", "amdshark_iree_c"]
if self.mode == "baseline":
engines = ["frontend"]
if self.mode == "all":
engines = ["frontend", "amdshark_python", "amdshark_iree_c"]
if amdshark_args.onnx_bench == True:
engines.append("onnxruntime")
if not os.path.exists("bench_results.csv"):
with open("bench_results.csv", mode="w", newline="") as f:
writer = csv.writer(f)
writer.writerow(field_names)
with open("bench_results.csv", mode="a", newline="") as f:
writer = csv.DictWriter(f, fieldnames=field_names)
bench_info = {}
bench_info["model"] = modelname
bench_info["batch_size"] = str(import_args["batch_size"])
bench_info["dialect"] = self.mlir_dialect
bench_info["iterations"] = amdshark_args.num_iterations
if dynamic == True:
bench_info["shape_type"] = "dynamic"
else:
bench_info["shape_type"] = "static"
bench_info["device"] = device_str
if "fp16" in modelname:
bench_info["data_type"] = "float16"
else:
bench_info["data_type"] = inputs[0].dtype
for e in engines:
engine_result = {}
self.frontend_result = None
if e == "frontend":
engine_result["engine"] = frontend
if check_requirements(frontend):
(
engine_result["iter/sec"],
engine_result["ms/iter"],
engine_result["host_memory_mb"],
engine_result["device_memory_mb"],
) = self.benchmark_frontend(modelname)
self.frontend_result = engine_result["ms/iter"]
engine_result["vs. PyTorch/TF"] = "baseline"
(
engine_result["param_count"],
engine_result["tags"],
engine_result["notes"],
) = self.get_metadata(modelname)
else:
self.frontend_result = None
continue
elif e == "amdshark_python":
engine_result["engine"] = "amdshark_python"
(
engine_result["iter/sec"],
engine_result["ms/iter"],
) = self.benchmark_python(inputs)
engine_result[
"vs. PyTorch/TF"
] = self.compare_bench_results(
self.frontend_result, engine_result["ms/iter"]
)
elif e == "amdshark_iree_c":
engine_result["engine"] = "amdshark_iree_c"
(
engine_result["iter/sec"],
engine_result["ms/iter"],
engine_result["host_memory_mb"],
engine_result["device_memory_mb"],
) = self.benchmark_c()
engine_result[
"vs. PyTorch/TF"
] = self.compare_bench_results(
self.frontend_result, engine_result["ms/iter"]
)
elif e == "onnxruntime":
engine_result["engine"] = "onnxruntime"
(
engine_result["iter/sec"],
engine_result["ms/iter"],
) = self.benchmark_onnx(modelname, inputs)
engine_result["datetime"] = str(datetime.now())
writer.writerow(bench_info | engine_result)
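# Usage sketch: driving AMDSharkBenchmarkRunner end to end. The MLIR path,
# model name and input shape below are hypothetical; any linalg module on
# disk with matching inputs should work the same way.
if __name__ == "__main__":
    import numpy as np

    example_inputs = (np.random.rand(1, 128).astype(np.float32),)
    runner = AMDSharkBenchmarkRunner(
        "example_model_torch_linalg.mlir",  # hypothetical path to an MLIR module
        device="cpu",
        mlir_dialect="linalg",
    )
    # Appends one row per engine ("amdshark_python", "amdshark_iree_c", ...)
    # to bench_results.csv in the working directory.
    runner.benchmark_all_csv(
        example_inputs,
        "example_model",
        dynamic=False,
        device_str="cpu",
        frontend="torch",
        import_args={"batch_size": 1},
        mode="native",
    )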


@@ -0,0 +1,241 @@
import os
import tempfile
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import import_with_fx, save_mlir
import torch
import torch_mlir
from torch_mlir.compiler_utils import run_pipeline_with_repro_report
from typing import List, Tuple
from io import BytesIO
from brevitas_examples.common.generative.quantize import quantize_model
from brevitas_examples.llm.llm_quant.run_utils import get_model_impl
# fmt: off
def quantmatmul_rhs_group_quant〡shape(lhs: List[int], rhs: List[int], rhs_scale: List[int], rhs_zero_point: List[int], rhs_bit_width: int, rhs_group_size: int) -> List[int]:
if len(lhs) == 3 and len(rhs) == 2:
return [lhs[0], lhs[1], rhs[0]]
elif len(lhs) == 2 and len(rhs) == 2:
return [lhs[0], rhs[0]]
else:
raise ValueError("Input shapes not supported.")
def quantmatmul_rhs_group_quant〡dtype(lhs_rank_dtype: Tuple[int, int], rhs_rank_dtype: Tuple[int, int], rhs_scale_rank_dtype: Tuple[int, int], rhs_zero_point_rank_dtype: Tuple[int, int], rhs_bit_width: int, rhs_group_size: int) -> int:
# output dtype is the dtype of the lhs float input
lhs_rank, lhs_dtype = lhs_rank_dtype
return lhs_dtype
def quantmatmul_rhs_group_quant〡has_value_semantics(lhs, rhs, rhs_scale, rhs_zero_point, rhs_bit_width, rhs_group_size) -> None:
return
brevitas_matmul_rhs_group_quant_library = [
quantmatmul_rhs_group_quant〡shape,
quantmatmul_rhs_group_quant〡dtype,
quantmatmul_rhs_group_quant〡has_value_semantics]
# fmt: on
def load_vmfb(extended_model_name, device, mlir_dialect, extra_args=[]):
vmfb_path = os.path.join(os.getcwd(), extended_model_name + ".vmfb")
amdshark_module = None
if os.path.isfile(vmfb_path):
amdshark_module = AMDSharkInference(
None,
device=device,
mlir_dialect=mlir_dialect,
)
print(f"loading existing vmfb from: {vmfb_path}")
amdshark_module.load_module(vmfb_path, extra_args=extra_args)
return amdshark_module
def compile_module(
amdshark_module, extended_model_name, generate_vmfb, extra_args=[]
):
if generate_vmfb:
vmfb_path = os.path.join(os.getcwd(), extended_model_name + ".vmfb")
if os.path.isfile(vmfb_path):
print(f"loading existing vmfb from: {vmfb_path}")
amdshark_module.load_module(vmfb_path, extra_args=extra_args)
else:
print(
"No vmfb found. Compiling and saving to {}".format(vmfb_path)
)
path = amdshark_module.save_module(
os.getcwd(), extended_model_name, extra_args
)
amdshark_module.load_module(path, extra_args=extra_args)
else:
amdshark_module.compile(extra_args)
return amdshark_module
def compile_int_precision(
model, inputs, precision, device, generate_vmfb, extended_model_name
):
weight_bit_width = 4 if precision == "int4" else 8
weight_group_size = 128
quantize_model(
get_model_impl(model),
dtype=torch.float32,
weight_quant_type="asym",
weight_bit_width=weight_bit_width,
weight_param_method="stats",
weight_scale_precision="float_scale",
weight_quant_granularity="per_group",
weight_group_size=weight_group_size,
quantize_weight_zero_point=False,
input_bit_width=None,
input_scale_type="float",
input_param_method="stats",
input_quant_type="asym",
input_quant_granularity="per_tensor",
quantize_input_zero_point=False,
seqlen=2048,
)
print("Weight quantization applied.")
torchscript_module = import_with_fx(
model,
inputs,
precision=precision,
mlir_type="torchscript",
)
mlir_module = torch_mlir.compile(
torchscript_module,
inputs,
output_type="torch",
backend_legal_ops=["quant.matmul_rhs_group_quant"],
extra_library=brevitas_matmul_rhs_group_quant_library,
use_tracing=False,
verbose=False,
)
print(f"[DEBUG] converting torch to linalg")
run_pipeline_with_repro_report(
mlir_module,
"builtin.module(func.func(torch-unpack-quant-tensor),func.func(torch-convert-custom-quant-op),torch-backend-to-linalg-on-tensors-backend-pipeline)",
description="Lowering Torch Backend IR -> Linalg-on-Tensors Backend IR",
)
from contextlib import redirect_stdout
mlir_file_path = os.path.join(
os.getcwd(), f"{extended_model_name}_linalg.mlir"
)
with open(mlir_file_path, "w") as f:
with redirect_stdout(f):
print(mlir_module.operation.get_asm())
mlir_module = str(mlir_module)
mlir_module = mlir_module.encode("UTF-8")
mlir_module = BytesIO(mlir_module)
bytecode = mlir_module.read()
bytecode_path = os.path.join(
os.getcwd(), f"{extended_model_name}_linalg.mlirbc"
)
with open(bytecode_path, "wb") as f:
f.write(bytecode)
del bytecode
del mlir_module
print(f"Elided IR written for {extended_model_name}")
return bytecode_path
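    # NOTE: because of the early return above, the AMDSharkInference setup and
    # compile_module call below are currently skipped.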
amdshark_module = AMDSharkInference(
mlir_module=bytecode_path, device=device, mlir_dialect="tm_tensor"
)
extra_args = [
"--iree-hal-dump-executable-sources-to=ies",
"--iree-vm-target-truncate-unsupported-floats",
"--iree-codegen-check-ir-before-llvm-conversion=false",
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
]
return (
compile_module(
amdshark_module,
extended_model_name=extended_model_name,
generate_vmfb=generate_vmfb,
extra_args=extra_args,
),
bytecode_path,
)
def amdshark_compile_through_fx(
model,
inputs,
extended_model_name,
precision,
f16_input_mask=None,
save_dir=tempfile.gettempdir(),
debug=False,
generate_or_load_vmfb=True,
extra_args=[],
device=None,
mlir_dialect="tm_tensor",
):
is_f16 = precision == "fp16"
if generate_or_load_vmfb:
amdshark_module = load_vmfb(
extended_model_name=extended_model_name,
device=device,
mlir_dialect=mlir_dialect,
extra_args=extra_args,
)
if amdshark_module:
return (
amdshark_module,
None,
)
from amdshark.parser import amdshark_args
if "cuda" in device:
amdshark_args.enable_tf32 = True
if precision in ["int4", "int8"]:
mlir_module = compile_int_precision(
model,
inputs,
precision,
device,
generate_or_load_vmfb,
extended_model_name,
)
extra_args = [
"--iree-hal-dump-executable-sources-to=ies",
"--iree-vm-target-truncate-unsupported-floats",
"--iree-codegen-check-ir-before-llvm-conversion=false",
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
]
else:
(
bytecode,
_,
) = import_with_fx(
model=model,
inputs=inputs,
is_f16=is_f16,
f16_input_mask=f16_input_mask,
debug=debug,
model_name=extended_model_name,
save_dir=save_dir,
)
mlir_module = save_mlir(
mlir_module=bytecode,
model_name=extended_model_name,
mlir_dialect=mlir_dialect,
)
amdshark_module = AMDSharkInference(
mlir_module,
device=device,
mlir_dialect=mlir_dialect,
)
return (
compile_module(
amdshark_module,
extended_model_name,
generate_vmfb=generate_or_load_vmfb,
extra_args=extra_args,
),
mlir_module,
)
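# Usage sketch: compiling a small PyTorch module through the FX path. The toy
# model, name and shapes are illustrative; the first value returned is the
# compiled AMDSharkInference module, the second the saved MLIR path.
if __name__ == "__main__":
    import torch

    class TinyModel(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x) + 1.0

    example_input = (torch.randn(1, 16),)
    amdshark_module, mlir_path = amdshark_compile_through_fx(
        TinyModel(),
        example_input,
        extended_model_name="tiny_model_fp32_cpu",
        precision="fp32",
        device="cpu",
    )
    outputs = amdshark_module("forward", (example_input[0].numpy(),))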


@@ -0,0 +1,297 @@
# Lint as: python3
"""AMDSHARK Downloader"""
# Requirements: Put amdshark_tank in the AMDSHARK directory
# /AMDSHARK
# /gen_amdshark_tank
# /tflite
# /albert_lite_base
# /...model_name...
# /tf
# /pytorch
#
#
#
import numpy as np
import os
from tqdm.std import tqdm
import sys
from pathlib import Path
from amdshark.parser import amdshark_args
from google.cloud import storage
def download_public_file(
full_gs_url, destination_folder_name, single_file=False
):
"""Downloads a public blob from the bucket."""
# bucket_name = "gs://your-bucket-name/path/to/file"
# destination_file_name = "local/path/to/file"
storage_client = storage.Client.create_anonymous_client()
bucket_name = full_gs_url.split("/")[2]
source_blob_name = None
dest_filename = None
desired_file = None
if single_file:
desired_file = full_gs_url.split("/")[-1]
source_blob_name = "/".join(full_gs_url.split("/")[3:-1])
destination_folder_name, dest_filename = os.path.split(
destination_folder_name
)
else:
source_blob_name = "/".join(full_gs_url.split("/")[3:])
bucket = storage_client.bucket(bucket_name)
blobs = bucket.list_blobs(prefix=source_blob_name)
if not os.path.exists(destination_folder_name):
os.mkdir(destination_folder_name)
for blob in blobs:
blob_name = blob.name.split("/")[-1]
if single_file:
if blob_name == desired_file:
destination_filename = os.path.join(
destination_folder_name, dest_filename
)
with open(destination_filename, "wb") as f:
with tqdm.wrapattr(
f, "write", total=blob.size
) as file_obj:
storage_client.download_blob_to_file(blob, file_obj)
else:
continue
else:
destination_filename = os.path.join(
destination_folder_name, blob_name
)
if os.path.isdir(destination_filename):
continue
with open(destination_filename, "wb") as f:
with tqdm.wrapattr(f, "write", total=blob.size) as file_obj:
storage_client.download_blob_to_file(blob, file_obj)
input_type_to_np_dtype = {
"float32": np.float32,
"float64": np.float64,
"bool": np.bool_,
"int32": np.int32,
"int64": np.int64,
"uint8": np.uint8,
"int8": np.int8,
}
# Save the model in the user's home directory so it needn't be fetched every time in CI.
home = str(Path.home())
alt_path = os.path.join(os.path.dirname(__file__), "../gen_amdshark_tank/")
custom_path = amdshark_args.local_tank_cache
if custom_path is not None:
if not os.path.exists(custom_path):
os.mkdir(custom_path)
WORKDIR = custom_path
print(f"Using {WORKDIR} as local amdshark_tank cache directory.")
elif os.path.exists(alt_path):
WORKDIR = alt_path
print(
f"Using {WORKDIR} as amdshark_tank directory. Delete this directory if you aren't working from locally generated amdshark_tank."
)
else:
WORKDIR = os.path.join(home, ".local/amdshark_tank/")
print(
f"amdshark_tank local cache is located at {WORKDIR} . You may change this by setting the --local_tank_cache= flag"
)
os.makedirs(WORKDIR, exist_ok=True)
# Checks whether the directory and files exist.
def check_dir_exists(model_name, frontend="torch", dynamic=""):
model_dir = os.path.join(WORKDIR, model_name)
# Remove the _tf keyword from end only for non-SD models.
if not any(model in model_name for model in ["clip", "unet", "vae"]):
if frontend in ["tf", "tensorflow"]:
model_name = model_name[:-3]
elif frontend in ["tflite"]:
model_name = model_name[:-7]
elif frontend in ["torch", "pytorch"]:
model_name = model_name[:-6]
model_mlir_file_name = f"{model_name}{dynamic}_{frontend}.mlir"
if os.path.isdir(model_dir):
if (
os.path.isfile(os.path.join(model_dir, model_mlir_file_name))
and os.path.isfile(os.path.join(model_dir, "function_name.npy"))
and os.path.isfile(os.path.join(model_dir, "inputs.npz"))
and os.path.isfile(os.path.join(model_dir, "golden_out.npz"))
and os.path.isfile(os.path.join(model_dir, "hash.npy"))
):
print(
f"""Model artifacts for {model_name} found at {WORKDIR}..."""
)
return True
return False
def _internet_connected():
import requests as req
try:
req.get("http://1.1.1.1")
return True
except Exception:
return False
def get_git_revision_short_hash() -> str:
import subprocess
if amdshark_args.amdshark_prefix is not None:
prefix_kw = amdshark_args.amdshark_prefix
else:
import json
dir_path = os.path.dirname(os.path.realpath(__file__))
src = os.path.join(dir_path, "..", "tank_version.json")
with open(src, "r") as f:
data = json.loads(f.read())
prefix_kw = data["version"]
print(f"Checking for updates from gs://amdshark_tank/{prefix_kw}")
return prefix_kw
def get_amdsharktank_prefix():
tank_prefix = ""
if not _internet_connected():
print(
"No internet connection. Using the model already present in the tank."
)
tank_prefix = "none"
else:
desired_prefix = get_git_revision_short_hash()
storage_client_a = storage.Client.create_anonymous_client()
base_bucket_name = "amdshark_tank"
base_bucket = storage_client_a.bucket(base_bucket_name)
dir_blobs = base_bucket.list_blobs(prefix=f"{desired_prefix}")
for blob in dir_blobs:
dir_blob_name = blob.name.split("/")
if desired_prefix in dir_blob_name[0]:
tank_prefix = dir_blob_name[0]
break
else:
continue
if tank_prefix == "":
print(
f"amdshark_tank bucket not found matching ({desired_prefix}). Defaulting to nightly."
)
tank_prefix = "nightly"
return tank_prefix
# Downloads the torch model from gs://amdshark_tank dir.
def download_model(
model_name,
dynamic=False,
tank_url=None,
frontend=None,
tuned=None,
import_args={"batch_size": 1},
):
model_name = model_name.replace("/", "_")
dyn_str = "_dynamic" if dynamic else ""
os.makedirs(WORKDIR, exist_ok=True)
amdshark_args.amdshark_prefix = get_amdsharktank_prefix()
if import_args["batch_size"] and import_args["batch_size"] != 1:
model_dir_name = (
model_name
+ "_"
+ frontend
+ "_BS"
+ str(import_args["batch_size"])
)
elif any(model in model_name for model in ["clip", "unet", "vae"]):
# TODO(Ean Garvey): rework extended naming such that device is only included in model_name after .vmfb compilation.
model_dir_name = model_name
else:
model_dir_name = model_name + "_" + frontend
model_dir = os.path.join(WORKDIR, model_dir_name)
if not tank_url:
tank_url = "gs://amdshark_tank/" + amdshark_args.amdshark_prefix
full_gs_url = tank_url.rstrip("/") + "/" + model_dir_name
if not check_dir_exists(
model_dir_name, frontend=frontend, dynamic=dyn_str
):
print(
f"Downloading artifacts for model {model_name} from: {full_gs_url}"
)
download_public_file(full_gs_url, model_dir)
elif amdshark_args.force_update_tank == True:
print(
f"Force-updating artifacts for model {model_name} from: {full_gs_url}"
)
download_public_file(full_gs_url, model_dir)
else:
if not _internet_connected():
print(
"No internet connection. Using the model already present in the tank."
)
else:
local_hash = str(np.load(os.path.join(model_dir, "hash.npy")))
gs_hash_url = (
tank_url.rstrip("/") + "/" + model_dir_name + "/hash.npy"
)
download_public_file(
gs_hash_url,
os.path.join(model_dir, "upstream_hash.npy"),
single_file=True,
)
try:
upstream_hash = str(
np.load(os.path.join(model_dir, "upstream_hash.npy"))
)
except FileNotFoundError:
print(f"Model artifact hash not found at {model_dir}.")
upstream_hash = None
if local_hash != upstream_hash and amdshark_args.update_tank == True:
print(f"Updating artifacts for model {model_name}...")
download_public_file(full_gs_url, model_dir)
elif local_hash != upstream_hash:
print(
"Hash does not match upstream in gs://amdshark_tank/. If you want to use locally generated artifacts, this is working as intended. Otherwise, run with --update_tank."
)
else:
print(
"Local and upstream hashes match. Using cached model artifacts."
)
model_dir = os.path.join(WORKDIR, model_dir_name)
tuned_str = "" if tuned is None else "_" + tuned
suffix = f"{dyn_str}_{frontend}{tuned_str}.mlir"
mlir_filename = os.path.join(model_dir, model_name + suffix)
print(
f"Verifying that model artifacts were downloaded successfully to {mlir_filename}..."
)
if not os.path.exists(mlir_filename):
from tank.generate_amdsharktank import gen_amdshark_files
print(
"The model data was not found. Trying to generate artifacts locally."
)
gen_amdshark_files(model_name, frontend, WORKDIR, import_args)
assert os.path.exists(mlir_filename), f"MLIR not found at {mlir_filename}"
function_name = str(np.load(os.path.join(model_dir, "function_name.npy")))
inputs = np.load(os.path.join(model_dir, "inputs.npz"))
golden_out = np.load(os.path.join(model_dir, "golden_out.npz"))
inputs_tuple = tuple([inputs[key] for key in inputs])
golden_out_tuple = tuple([golden_out[key] for key in golden_out])
return mlir_filename, function_name, inputs_tuple, golden_out_tuple
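# Usage sketch: fetching artifacts from the tank and running them. The model
# name is illustrative; it must exist in gs://amdshark_tank (or be generated
# locally) for the download to succeed.
if __name__ == "__main__":
    from amdshark.amdshark_inference import AMDSharkInference

    mlir_file, func_name, inputs, golden_out = download_model(
        "resnet50", frontend="torch"
    )
    module = AMDSharkInference(mlir_file, device="cpu", mlir_dialect="linalg")
    module.compile()
    outputs = module(func_name, inputs)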


@@ -0,0 +1,212 @@
from typing import Any, Dict, List, Tuple
from collections import defaultdict
from amdshark.amdshark_importer import import_with_fx, save_mlir
import torchvision.models as models
import copy
import io
import numpy as np
import sys
import torch
import torch.fx
from torch.fx.node import Node
from typing import Dict
import torch_mlir
def amdshark_backend(fx_g: torch.fx.GraphModule, inputs, device: str = "cpu"):
mlir_module = torch_mlir.compile(
fx_g, inputs, output_type="linalg-on-tensors"
)
bytecode_stream = io.BytesIO()
mlir_module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
bytecode_path = save_mlir(
bytecode,
model_name="amdshark_eager_module",
frontend="torch",
mlir_dialect="tm_tensor",
)
from amdshark.amdshark_inference import AMDSharkInference
amdshark_module = AMDSharkInference(
mlir_module=bytecode_path,
device=device,
mlir_dialect="tm_tensor",
)
amdshark_module.compile(extra_args=[])
return amdshark_module
def _make_single_op_gm(node, captured_val, compiled_graph):
"""Make a GraphModule that just executes the given node."""
g = torch.fx.Graph()
env = {}
inputs = []
for arg in node.args:
if arg and hasattr(arg, "name"):
env[arg.name] = g.placeholder(arg.name)
if isinstance(captured_val[arg.name], (list, tuple)):
for val in captured_val[arg.name]:
inputs.append(val)
else:
inputs.append(captured_val[arg.name])
call = g.node_copy(node, lambda n: env[n.name])
g.output(call)
g.lint()
single_node = torch.fx.GraphModule(torch.nn.Module(), g)
compiled_module = amdshark_backend(single_node, inputs)
compiled_graph[node.name] = {
"module": compiled_module,
"inputs": [i for i in env],
"result": None,
}
return
def compiled_graph(gm: torch.fx.GraphModule, attr_info):
compiled_graph = {}
g = gm.graph
for node in g.nodes:
if node.op == "call_function":
if not (
node.target in [torch.ops.aten.empty]
or node.name.startswith("getitem")
):
_make_single_op_gm(node, attr_info, compiled_graph)
# Currently torch.aten.empty has a compilation issue, so it is run natively.
elif node.target in [torch.ops.aten.empty]:
compiled_graph[node.name] = {
"target": node.target,
"args": node.args,
"kwargs": node.kwargs,
"result": None,
}
# getitem is a simple case: it takes a tuple and returns the tensor at a particular index.
elif node.name.startswith("getitem"):
compiled_graph[node.name] = {
"input": node.args[0].name,
"pos": node.args[1],
"result": None,
}
return compiled_graph
class ShapeProp:
"""
Shape propagation. This class takes a `GraphModule`.
Then, its `propagate` method executes the `GraphModule`
node-by-node with the given arguments. As each operation
executes, the ShapeProp class stores away the shape and
element type for the output values of each operation on
the `shape` and `dtype` attributes of the operation's
`Node`.
"""
def __init__(self, mod):
self.mod = mod
self.graph = mod.graph
self.modules = dict(self.mod.named_modules())
def propagate(self, *args):
args_iter = iter(args)
env: Dict[str, Node] = {}
def load_arg(a):
return torch.fx.graph.map_arg(a, lambda n: env[n.name])
def fetch_attr(target: str):
target_atoms = target.split(".")
attr_itr = self.mod
for i, atom in enumerate(target_atoms):
if not hasattr(attr_itr, atom):
raise RuntimeError(
f"Node referenced nonexistant target {'.'.join(target_atoms[:i])}"
)
attr_itr = getattr(attr_itr, atom)
return attr_itr
for node in self.graph.nodes:
if node.op == "placeholder":
result = next(args_iter)
elif node.op == "get_attr":
result = fetch_attr(node.target)
elif node.op == "call_function":
result = node.target(
*load_arg(node.args), **load_arg(node.kwargs)
)
elif node.op == "call_method":
self_obj, *args = load_arg(node.args)
kwargs = load_arg(node.kwargs)
result = getattr(self_obj, node.target)(*args, **kwargs)
elif node.op == "call_module":
result = self.modules[node.target](
*load_arg(node.args), **load_arg(node.kwargs)
)
# This is the only code specific to shape propagation.
# you can delete this `if` branch and this becomes
# a generic GraphModule interpreter.
if isinstance(result, torch.Tensor):
node.shape = result.shape
node.dtype = result.dtype
env[node.name] = result
return env
# return load_arg(self.graph.result)
resnet18 = models.resnet18(pretrained=True)
resnet18.train(False)
input = (torch.randn(1, 3, 224, 224),)
print(resnet18(input[0]))
fx_graph = import_with_fx(resnet18, input, mlir_type="fx")
shape_prop = ShapeProp(fx_graph)
x = shape_prop.propagate(input[0])
amdshark_graph = compiled_graph(fx_graph, x)
for key in amdshark_graph:
if key.startswith("getitem"):
input_val = amdshark_graph[key]["input"]
pos = amdshark_graph[key]["pos"]
if input_val not in amdshark_graph:
amdshark_graph[key]["result"] = x[input_val][pos].detach()
else:
amdshark_graph[key]["result"] = amdshark_graph[input_val]["result"][
pos
].detach()
elif key.startswith("empty"):
operator = amdshark_graph[key]["target"]
args = amdshark_graph[key]["args"]
kwargs = amdshark_graph[key]["kwargs"]
amdshark_graph[key]["result"] = operator(*args, **kwargs).detach()
else:
input_val = amdshark_graph[key]["inputs"]
input_tensors = []
for input in input_val:
if input not in amdshark_graph:
input_tensors.append(x[input].detach())
else:
input_tensors.append(amdshark_graph[input]["result"])
val = amdshark_graph[key]["module"]("forward", input_tensors)
if isinstance(val, (tuple, list)):
list_val = []
for v in val:
list_val.append(torch.from_numpy(v))
amdshark_graph[key]["result"] = list_val
else:
amdshark_graph[key]["result"] = torch.from_numpy(val)
print(amdshark_graph)


@@ -0,0 +1,153 @@
import re
import json
import numpy as np
import torch_mlir
from iree.compiler import compile_file
from amdshark.amdshark_importer import import_with_fx, get_f16_inputs, save_mlir
class GenerateConfigFile:
def __init__(
self,
model,
num_sharding_stages: int,
sharding_stages_id: list[str],
units_in_each_stage: list[int],
model_input=None,
config_file_path="model_config.json",
):
self.model = model
self.num_sharding_stages = num_sharding_stages
self.sharding_stages_id = sharding_stages_id
assert self.num_sharding_stages == len(
self.sharding_stages_id
), "Number of sharding stages should be equal to the list of their ID"
self.model_input = model_input
self.config_file_path = config_file_path
# (Nithin) this is a quick fix - revisit and rewrite
self.units_in_each_stage = np.array(units_in_each_stage)
self.track_loop = np.zeros(len(self.sharding_stages_id)).astype(int)
def split_into_dispatches(
self,
backend,
fx_tracing_required=False,
f16_model=False,
torch_mlir_tracing=True,
):
graph_for_compilation = self.model
if fx_tracing_required:
graph_for_compilation = import_with_fx(
self.model,
self.model_input,
is_f16=f16_model,
f16_input_mask=[False, False],
mlir_type="torchscript",
)
module = torch_mlir.compile(
graph_for_compilation,
(self.model_input),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=torch_mlir_tracing,
verbose=False,
)
module = module.operation.get_asm(large_elements_limit=4)
module_file = save_mlir(
module,
model_name="module_pre_split",
frontend="torch",
mlir_dialect="linalg",
)
compiled_module_str = str(
compile_file(
module_file,
target_backends=[backend],
extra_args=[
"--compile-to=flow",
"--mlir-elide-elementsattrs-if-larger=4",
],
)
)
substring_start_idx = [
m.start()
for m in re.finditer("flow.dispatch @", compiled_module_str)
]
dispatch_list = dict()
# dispatch_no is the 'i'th index of a dispatch out of n total dispatches of a model
# dispatch_id is the unique id of a dispatch, multiple instances of the same dispatch
# can occur in a model
for dispatch_no, substring_idx in enumerate(substring_start_idx):
dispatch_idx = (
compiled_module_str[substring_idx:]
.split(":")[0]
.split("@")[-1]
)
key = "dispatch_no_" + str(dispatch_no)
dispatch_list[key] = {n: "None" for n in self.sharding_stages_id}
dispatch_list[key]["dispatch_id"] = dispatch_idx
self.generate_json(dispatch_list)
def split_into_layers(self):
model_dictionary = dict()
for name, m in self.model.named_modules():
if name == "":
continue
# Remove non-leaf nodes from the config as they aren't operations.
substring_before_final_period = name.split(".")[:-1]
substring_before_final_period = ".".join(
substring_before_final_period
)
if substring_before_final_period in model_dictionary:
del model_dictionary[substring_before_final_period]
# layer_dict = {n: "None" for n in self.sharding_stages_id}
# By default, embed increasing device ids for each layer.
increasing_wraparound_idx_list = (
self.track_loop % self.units_in_each_stage
)
layer_dict = {
n: int(increasing_wraparound_idx_list[idx][0][0])
for idx, n in enumerate(self.sharding_stages_id)
}
self.track_loop += 1
model_dictionary[name] = layer_dict
self.generate_json(model_dictionary)
def generate_json(self, artifacts):
with open(self.config_file_path, "w") as outfile:
json.dump(artifacts, outfile)
if __name__ == "__main__":
import torch
from transformers import AutoTokenizer
hf_model_path = "TheBloke/vicuna-7B-1.1-HF"
tokenizer = AutoTokenizer.from_pretrained(hf_model_path, use_fast=False)
compilation_prompt = "".join(["0" for _ in range(17)])
compilation_input_ids = tokenizer(
compilation_prompt,
return_tensors="pt",
).input_ids
compilation_input_ids = torch.tensor(compilation_input_ids).reshape(
[1, 19]
)
firstVicunaCompileInput = (compilation_input_ids,)
from apps.language_models.src.model_wrappers.vicuna_model import (
FirstVicuna,
SecondVicuna7B,
CombinedModel,
)
model = CombinedModel()
c = GenerateConfigFile(model, 1, ["gpu_id"], firstVicunaCompileInput)
c.split_into_layers()
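    # Sketch of the dispatch-level split on the same model; "llvm-cpu" is an
    # illustrative IREE target backend, and the call assumes the model can be
    # lowered by torch-mlir with tracing enabled.
    c.split_into_dispatches("llvm-cpu")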


@@ -0,0 +1,819 @@
# Lint as: python3
"""AMDSHARK Importer"""
import sys
import tempfile
import os
import hashlib
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
def create_hash(file_name):
with open(file_name, "rb") as f:
file_hash = hashlib.blake2b(digest_size=64)
while chunk := f.read(2**10):
file_hash.update(chunk)
return file_hash.hexdigest()
# List of the supported frontends.
supported_frontends = {
"tensorflow",
"tf",
"pytorch",
"torch",
"tf-lite",
"tflite",
}
class AMDSharkImporter:
"""
AMDSharkImporter converts frontend modules into a
mlir_module. The supported frameworks are tensorflow,
pytorch, and tf-lite.
...
Attributes
----------
module :
torch, tensorflow or tf-lite module.
inputs :
inputs to the module, may be required for the shape
information.
frontend: str
frontend to which the module belongs.
raw_model_file: str
temp tflite model path
Methods
-------
import_mlir(is_dynamic, tracing_required, func_name):
is_dynamic: input shapes to be totally dynamic (pytorch specific).
tracing_required: whether tracing is required (pytorch specific).
func_name: The function to be traced out or imported to mlir.
import_debug(is_dynamic, tracing_required, func_name):
returns the converted (mlir_module,func_name) with inputs and golden
outputs.
The inputs and outputs are converted into np array.
"""
def __init__(
self,
module,
inputs: tuple = (),
frontend: str = "torch",
raw_model_file: str = "",
return_str: bool = False,
):
self.module = module
self.inputs = None if len(inputs) == 0 else inputs
self.frontend = frontend
if not self.frontend in supported_frontends:
print(
f"The frontend is not in the supported_frontends: {supported_frontends}"
)
sys.exit(1)
self.raw_model_file = raw_model_file
self.return_str = return_str
# NOTE: The default function for torch is "forward" and tf-lite is "main".
def _torch_mlir(self, is_dynamic, tracing_required, mlir_type):
from amdshark.torch_mlir_utils import get_torch_mlir_module
return get_torch_mlir_module(
self.module,
self.inputs,
is_dynamic,
tracing_required,
self.return_str,
mlir_type,
)
def _tf_mlir(self, func_name, save_dir="."):
from iree.compiler import tf as tfc
return tfc.compile_module(
self.module,
exported_names=[func_name],
import_only=True,
output_file=save_dir,
)
def _tflite_mlir(self, func_name, save_dir="."):
from iree.compiler import tflite as tflitec
self.mlir_model = tflitec.compile_file(
self.raw_model_file, # in tflite, it is a path to .tflite file, not a tflite interpreter
input_type="tosa",
import_only=True,
output_file=save_dir,
)
return self.mlir_model
# Dispatches the conversion to the frontend-specific private function.
def import_mlir(
self,
is_dynamic=False,
tracing_required=False,
func_name="forward",
save_dir=cmd_opts.tmp_dir, #"./amdshark_tmp/",
mlir_type="linalg",
):
if self.frontend in ["torch", "pytorch"]:
if self.inputs == None:
print(
"Please pass in the inputs, the inputs are required to determine the shape of the mlir_module"
)
sys.exit(1)
return (
self._torch_mlir(is_dynamic, tracing_required, mlir_type),
func_name,
)
if self.frontend in ["tf", "tensorflow"]:
return self._tf_mlir(func_name, save_dir), func_name
if self.frontend in ["tflite", "tf-lite"]:
func_name = "main"
return self._tflite_mlir(func_name, save_dir), func_name
# Converts the frontend-specific tensors into np arrays.
def convert_to_numpy(self, array_tuple: tuple):
if self.frontend in ["torch", "pytorch"]:
return [x.detach().cpu().numpy() for x in array_tuple]
if self.frontend in ["tf", "tensorflow"]:
return [x.numpy() for x in array_tuple]
# Saves `function_name.npy`, `inputs.npz`, `golden_out.npz` and `model_name.mlir` in the directory `dir`.
def save_data(
self,
dir,
model_name,
mlir_data,
func_name,
inputs,
outputs,
mlir_type="linalg",
):
import numpy as np
inputs_name = "inputs.npz"
outputs_name = "golden_out.npz"
func_file_name = "function_name"
model_name_mlir = (
model_name + "_" + self.frontend + "_" + mlir_type + ".mlir"
)
print(f"saving {model_name_mlir} to {dir}")
try:
inputs = [x.cpu().detach() for x in inputs]
except AttributeError:
try:
inputs = [x.numpy() for x in inputs]
except AttributeError:
inputs = [x for x in inputs]
np.savez(os.path.join(dir, inputs_name), *inputs)
np.savez(os.path.join(dir, outputs_name), *outputs)
np.save(os.path.join(dir, func_file_name), np.array(func_name))
if self.frontend == "torch":
with open(os.path.join(dir, model_name_mlir), "wb") as mlir_file:
mlir_file.write(mlir_data)
hash_gen_attempts = 2
for i in range(hash_gen_attempts):
try:
mlir_hash = create_hash(os.path.join(dir, model_name_mlir))
except FileNotFoundError as err:
if i < hash_gen_attempts - 1:  # retry once, then re-raise on the final attempt
continue
else:
raise err
np.save(os.path.join(dir, "hash"), np.array(mlir_hash))
return
def import_debug(
self,
is_dynamic=False,
tracing_required=False,
func_name="forward",
dir=tempfile.gettempdir(),
model_name="model",
golden_values=None,
mlir_type="linalg",
):
if self.inputs == None:
print(
f"There is no input provided: {self.inputs}, please provide inputs or simply run import_mlir."
)
sys.exit(1)
model_name_mlir = (
model_name + "_" + self.frontend + "_" + mlir_type + ".mlir"
)
artifact_path = os.path.join(dir, model_name_mlir)
imported_mlir = self.import_mlir(
is_dynamic,
tracing_required,
func_name,
save_dir=artifact_path,
mlir_type=mlir_type,
)
# TODO: Make sure that any generic function name is accepted. Currently takes in the default function names.
# TODO: Check for multiple outputs.
if self.frontend in ["torch", "pytorch"]:
import torch
golden_out = None
if golden_values is not None:
golden_out = golden_values
else:
golden_out = self.module(*self.inputs)
if torch.is_tensor(golden_out):
golden_out = tuple(
golden_out.detach().cpu().numpy(),
)
else:
golden_out = self.convert_to_numpy(golden_out)
# Save the artifacts in the directory dir.
self.save_data(
dir,
model_name,
imported_mlir[0],
imported_mlir[1],
self.inputs,
golden_out,
mlir_type,
)
return (
imported_mlir,
self.convert_to_numpy(self.inputs),
golden_out,
)
if self.frontend in ["tf", "tensorflow"]:
import tensorflow as tf
golden_out = self.module.forward(*self.inputs)
if tf.is_tensor(golden_out):
golden_out = tuple(
golden_out.numpy(),
)
elif isinstance(golden_out, tuple):
golden_out = self.convert_to_numpy(golden_out)
elif hasattr(golden_out, "logits"):
# from transformers import TFSequenceClassifierOutput
golden_out = golden_out.logits
else:
golden_out = golden_out.last_hidden_state
# Save the artifacts in the directory dir.
self.save_data(
dir,
model_name,
imported_mlir[0],
imported_mlir[1],
self.inputs,
golden_out,
)
return (
imported_mlir,
self.convert_to_numpy(self.inputs),
golden_out,
)
if self.frontend in ["tflite", "tf-lite"]:
# TODO(Chi): Validate it for tflite models.
golden_out = self.module.invoke_tflite(self.inputs)
self.save_data(
dir,
model_name,
imported_mlir[0],
imported_mlir[1],
self.inputs,
golden_out,
)
return (
imported_mlir,
self.inputs,
golden_out,
)
def get_f16_inputs(inputs, is_f16, f16_input_mask):
if is_f16 == False:
return inputs
if f16_input_mask == None:
return tuple([x.half() for x in inputs])
f16_masked_inputs = []
for i in range(len(inputs)):
if f16_input_mask[i]:
f16_masked_inputs.append(inputs[i].half())
else:
f16_masked_inputs.append(inputs[i])
return tuple(f16_masked_inputs)
# Upcasts the block/list of ops.
def add_upcast(fx_g):
import torch
for node in fx_g.graph.nodes:
if node.target in [torch.ops.aten.mul]:
# This is a very strict check.
if hasattr(node.args[1], "target"):
if (
node.args[1].target in [torch.ops.aten.rsqrt]
and node.args[1].args[0].target in [torch.ops.aten.add]
and node.args[1].args[0].args[0].target
in [torch.ops.aten.mean]
and node.args[1].args[0].args[0].args[0].target
in [torch.ops.aten.pow]
):
print("found an upcasting block let's upcast it.")
pow_node = node.args[1].args[0].args[0].args[0]
mul_node = node
with fx_g.graph.inserting_before(pow_node):
lhs = pow_node.args[0]
upcast_lhs = fx_g.graph.call_function(
torch.ops.aten._to_copy,
args=(lhs,),
kwargs={"dtype": torch.float32},
)
pow_node.args = (upcast_lhs, pow_node.args[1])
with fx_g.graph.inserting_before(mul_node):
new_node = fx_g.graph.call_function(
torch.ops.aten._to_copy,
args=(mul_node,),
kwargs={"dtype": torch.float16},
)
mul_node.append(new_node)
mul_node.replace_all_uses_with(new_node)
new_node.args = (mul_node,)
new_node.kwargs = {"dtype": torch.float16}
fx_g.graph.lint()
def transform_fx(fx_g, quantized=False):
import torch
kwargs_dict = {
"dtype": torch.float16,
"device": torch.device(type="cpu"),
"pin_memory": False,
}
kwargs_dict1 = {
"dtype": torch.float16,
}
for node in fx_g.graph.nodes:
if node.op == "call_function":
# aten.empty should be filled with zeros.
if node.target in [torch.ops.aten.empty]:
with fx_g.graph.inserting_after(node):
new_node = fx_g.graph.call_function(
torch.ops.aten.zero_,
args=(node,),
)
node.append(new_node)
node.replace_all_uses_with(new_node)
new_node.args = (node,)
if quantized:
continue
if node.target in [
torch.ops.aten.arange,
torch.ops.aten.empty,
torch.ops.aten.zeros,
torch.ops.aten.zeros_like,
]:
if node.kwargs.get("dtype") == torch.float32:
node.kwargs = kwargs_dict
# Vicuna
if node.target in [
torch.ops.aten._to_copy,
]:
if node.kwargs.get("dtype") == torch.float32:
node.kwargs = kwargs_dict1
if node.target in [
torch.ops.aten.masked_fill,
]:
if node.args[2] > torch.finfo(torch.half).max:
max_val = torch.finfo(torch.half).max
node.args = (node.args[0], node.args[1], max_val)
elif node.args[2] < torch.finfo(torch.half).min:
min_val = torch.finfo(torch.half).min
node.args = (node.args[0], node.args[1], min_val)
if node.target in [
torch.ops.aten.full,
]:
if node.args[1] > torch.finfo(torch.half).max:
max_val = torch.finfo(torch.half).max
node.args = (node.args[0], max_val)
node.kwargs = kwargs_dict
elif node.args[1] < torch.finfo(torch.half).min:
min_val = torch.finfo(torch.half).min
node.args = (node.args[0], min_val)
node.kwargs = kwargs_dict
# Inputs and outputs of aten.var.mean should be upcasted to fp32.
if node.target in [torch.ops.aten.var_mean]:
with fx_g.graph.inserting_before(node):
new_node = fx_g.graph.call_function(
torch.ops.prims.convert_element_type,
args=(node.args[0], torch.float32),
kwargs={},
)
node.args = (new_node, node.args[1])
if node.name.startswith("getitem"):
with fx_g.graph.inserting_before(node):
if node.args[0].target in [torch.ops.aten.var_mean]:
new_node = fx_g.graph.call_function(
torch.ops.aten._to_copy,
args=(node,),
kwargs={"dtype": torch.float16},
)
node.append(new_node)
node.replace_all_uses_with(new_node)
new_node.args = (node,)
new_node.kwargs = {"dtype": torch.float16}
# Required for cuda debugging.
# for node in fx_g.graph.nodes:
# if node.op == "call_function":
# if node.kwargs.get("device") == torch.device(type="cpu"):
# new_kwargs = node.kwargs.copy()
# new_kwargs["device"] = torch.device(type="cuda")
# node.kwargs = new_kwargs
fx_g.graph.lint()
def gptq_transforms(fx_g):
import torch
for node in fx_g.graph.nodes:
if node.op == "call_function":
if node.target in [
torch.ops.aten.arange,
torch.ops.aten.empty,
torch.ops.aten.ones,
torch.ops.aten._to_copy,
]:
if node.kwargs.get("device") == torch.device(device="cuda:0"):
updated_kwargs = node.kwargs.copy()
updated_kwargs["device"] = torch.device(device="cpu")
node.kwargs = updated_kwargs
if node.target in [
torch.ops.aten._to_copy,
]:
if node.kwargs.get("dtype") == torch.bfloat16:
updated_kwargs = node.kwargs.copy()
updated_kwargs["dtype"] = torch.float16
node.kwargs = updated_kwargs
# Inputs of aten.native_layer_norm should be upcasted to fp32.
if node.target in [torch.ops.aten.native_layer_norm]:
with fx_g.graph.inserting_before(node):
new_node_arg0 = fx_g.graph.call_function(
torch.ops.prims.convert_element_type,
args=(node.args[0], torch.float32),
kwargs={},
)
node.args = (
new_node_arg0,
node.args[1],
node.args[2],
node.args[3],
node.args[4],
)
# Inputs of aten.mm should be upcasted to fp32.
if node.target in [torch.ops.aten.mm]:
with fx_g.graph.inserting_before(node):
new_node_arg0 = fx_g.graph.call_function(
torch.ops.prims.convert_element_type,
args=(node.args[0], torch.float32),
kwargs={},
)
new_node_arg1 = fx_g.graph.call_function(
torch.ops.prims.convert_element_type,
args=(node.args[1], torch.float32),
kwargs={},
)
node.args = (new_node_arg0, new_node_arg1)
# Outputs of aten.mm should be downcasted to fp16.
if type(node.args[0]) == torch.fx.node.Node and node.args[
0
].target in [torch.ops.aten.mm]:
with fx_g.graph.inserting_before(node):
tmp = node.args[0]
new_node = fx_g.graph.call_function(
torch.ops.aten._to_copy,
args=(node.args[0],),
kwargs={"dtype": torch.float16},
)
node.args[0].append(new_node)
node.args[0].replace_all_uses_with(new_node)
new_node.args = (tmp,)
new_node.kwargs = {"dtype": torch.float16}
# Inputs of aten._softmax should be upcasted to fp32.
if node.target in [torch.ops.aten._softmax]:
with fx_g.graph.inserting_before(node):
new_node_arg0 = fx_g.graph.call_function(
torch.ops.prims.convert_element_type,
args=(node.args[0], torch.float32),
kwargs={},
)
node.args = (new_node_arg0, node.args[1], node.args[2])
# Outputs of aten._softmax should be downcasted to fp16.
if (
type(node.args[0]) == torch.fx.node.Node
and node.args[0].target in [torch.ops.aten._softmax]
and node.target in [torch.ops.aten.expand]
):
with fx_g.graph.inserting_before(node):
tmp = node.args[0]
new_node = fx_g.graph.call_function(
torch.ops.aten._to_copy,
args=(node.args[0],),
kwargs={"dtype": torch.float16},
)
node.args[0].append(new_node)
node.args[0].replace_all_uses_with(new_node)
new_node.args = (tmp,)
new_node.kwargs = {"dtype": torch.float16}
fx_g.graph.lint()
# Doesn't replace the None type.
def change_fx_graph_return_to_tuple(fx_g):
for node in fx_g.graph.nodes:
if node.op == "output":
# output nodes always have one argument
node_arg = node.args[0]
out_nodes = []
if isinstance(node_arg, list):
# Don't return NoneType elements.
for out_node in node_arg:
if not isinstance(out_node, type(None)):
out_nodes.append(out_node)
# If there is a single tensor/element to be returned, don't create
# a tuple for it.
if len(out_nodes) == 1:
node.args = out_nodes
else:
node.args = (tuple(out_nodes),)
fx_g.graph.lint()
fx_g.recompile()
return fx_g
def flatten_training_input(inputs):
flattened_input = []
for i in inputs:
if isinstance(i, dict):
for value in i.values():
flattened_input.append(value.detach())
elif isinstance(i, tuple):
for value in i:
flattened_input.append(value)
else:
flattened_input.append(i)
return tuple(flattened_input)
# TODO: Remove is_f16 and fix all calls to use precision instead.
# Applies fx conversion to the model and imports the mlir.
def import_with_fx(
model,
inputs,
is_f16=False,
f16_input_mask=None,
debug=False,
training=False,
return_str=False,
save_dir=tempfile.gettempdir(),
model_name="model",
mlir_type="linalg",
is_dynamic=False,
tracing_required=False,
precision="fp32",
is_gptq=False,
):
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch._decomp import get_decompositions
from typing import List
golden_values = None
if debug:
try:
golden_values = model(*inputs)
except Exception:
golden_values = None
def _remove_nones(fx_g: torch.fx.GraphModule) -> List[int]:
removed_indexes = []
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, (list, tuple)):
node_arg = list(node_arg)
node_args_len = len(node_arg)
for i in range(node_args_len):
curr_index = node_args_len - (i + 1)
if node_arg[curr_index] is None:
removed_indexes.append(curr_index)
node_arg.pop(curr_index)
node.args = (tuple(node_arg),)
break
if len(removed_indexes) > 0:
fx_g.graph.lint()
fx_g.graph.eliminate_dead_code()
fx_g.recompile()
removed_indexes.sort()
return removed_indexes
def _unwrap_single_tuple_return(fx_g: torch.fx.GraphModule) -> bool:
"""
Replace tuple with tuple element in functions that return one-element tuples.
Returns true if an unwrapping took place, and false otherwise.
"""
unwrapped_tuple = False
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, tuple):
if len(node_arg) == 1:
node.args = (node_arg[0],)
unwrapped_tuple = True
break
if unwrapped_tuple:
fx_g.graph.lint()
fx_g.recompile()
return unwrapped_tuple
# TODO: Control the decompositions.
decomps_list = [
torch.ops.aten.embedding_dense_backward,
torch.ops.aten.native_layer_norm_backward,
torch.ops.aten.slice_backward,
torch.ops.aten.select_backward,
torch.ops.aten.norm.ScalarOpt_dim,
torch.ops.aten.native_group_norm,
torch.ops.aten.upsample_bilinear2d.vec,
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
torch.ops.aten.native_layer_norm,
torch.ops.aten.masked_fill.Tensor,
torch.ops.aten.masked_fill.Scalar,
torch.ops.aten._scaled_dot_product_flash_attention.default,
torch.ops.aten.index_add,
torch.ops.aten.index_add_,
]
if precision in ["int4", "int8"] and not is_gptq:
from brevitas_examples.llm.llm_quant.export import (
block_quant_layer_level_manager,
)
from brevitas_examples.llm.llm_quant.export import (
brevitas_layer_export_mode,
)
from brevitas_examples.llm.llm_quant.sharded_mlir_group_export import (
LinearWeightBlockQuantHandlerFwd,
)
from brevitas_examples.llm.llm_quant.export import (
replace_call_fn_target,
)
from brevitas_examples.llm.llm_quant.sharded_mlir_group_export import (
matmul_rhs_group_quant_placeholder,
)
from brevitas.backport.fx.experimental.proxy_tensor import (
make_fx as brevitas_make_fx,
)
export_context_manager = brevitas_layer_export_mode
export_class = block_quant_layer_level_manager(
export_handlers=[LinearWeightBlockQuantHandlerFwd]
)
with export_context_manager(model, export_class):
fx_g = brevitas_make_fx(
model,
decomposition_table=get_decompositions(decomps_list),
)(*inputs)
transform_fx(fx_g, quantized=True)
replace_call_fn_target(
fx_g,
src=matmul_rhs_group_quant_placeholder,
target=torch.ops.quant.matmul_rhs_group_quant,
)
fx_g.recompile()
removed_none_indexes = _remove_nones(fx_g)
was_unwrapped = _unwrap_single_tuple_return(fx_g)
else:
fx_g = make_fx(
model,
decomposition_table=get_decompositions(decomps_list),
)(*inputs)
fx_g.graph.set_codegen(torch.fx.graph.CodeGen())
fx_g.recompile()
def strip_overloads(gm):
"""
Modifies the target of graph nodes in :attr:`gm` to strip overloads.
Args:
gm(fx.GraphModule): The input Fx graph module to be modified
"""
for node in gm.graph.nodes:
if isinstance(node.target, torch._ops.OpOverload):
node.target = node.target.overloadpacket
gm.recompile()
strip_overloads(fx_g)
if is_f16:
fx_g = fx_g.half()
transform_fx(fx_g)
# TODO: Have to make it more generic.
add_upcast(fx_g)
fx_g.recompile()
if is_gptq:
gptq_transforms(fx_g)
fx_g.recompile()
if mlir_type == "fx":
return fx_g
if training:
change_fx_graph_return_to_tuple(fx_g)
inputs = flatten_training_input(inputs)
ts_graph = torch.jit.script(fx_g)
if mlir_type == "torchscript":
return ts_graph
inputs = get_f16_inputs(inputs, is_f16, f16_input_mask)
mlir_importer = AMDSharkImporter(
ts_graph,
inputs,
frontend="torch",
return_str=return_str,
)
if debug: # and not is_f16:
(mlir_module, func_name), _, _ = mlir_importer.import_debug(
dir=save_dir,
model_name=model_name,
golden_values=golden_values,
mlir_type=mlir_type,
is_dynamic=is_dynamic,
tracing_required=tracing_required,
)
return mlir_module, func_name
mlir_module, func_name = mlir_importer.import_mlir(mlir_type=mlir_type)
return mlir_module, func_name
# Saves a .mlir module python object to the directory 'dir' with 'model_name' and returns a path to the saved file.
def save_mlir(
mlir_module,
model_name,
mlir_dialect="linalg",
frontend="torch",
dir="",
):
model_name_mlir = (
model_name + "_" + frontend + "_" + mlir_dialect + ".mlir"
)
if dir == "":
dir = cmd_opts.tmp_dir  # os.path.join(".", "amdshark_tmp")
mlir_path = os.path.join(dir, model_name_mlir)
print(f"saving {model_name_mlir} to {dir}")
if not os.path.exists(dir):
os.makedirs(dir)
if frontend == "torch":
with open(mlir_path, "wb") as mlir_file:
mlir_file.write(mlir_module)
return mlir_path
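# Usage sketch: importing a toy PyTorch module to linalg MLIR via FX and
# saving it to the current directory. Model and file names are illustrative.
if __name__ == "__main__":
    import torch

    class ToyModel(torch.nn.Module):
        def forward(self, x):
            return torch.tanh(x) * 2.0

    example_inputs = (torch.randn(4, 8),)
    mlir_module, func_name = import_with_fx(
        ToyModel(), example_inputs, model_name="toy_model"
    )
    mlir_path = save_mlir(mlir_module, model_name="toy_model", dir=".")
    print(f"{func_name} saved to {mlir_path}")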


@@ -0,0 +1,243 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from amdshark.iree_utils.compile_utils import (
export_iree_module_to_vmfb,
load_flatbuffer,
create_dispatch_dirs,
compile_benchmark_dirs,
)
import os
from amdshark.amdshark_runner import AMDSharkRunner
from amdshark.parser import amdshark_args
import numpy as np
dtype_to_np_dtype = {
"f32": np.float32,
"f64": np.float64,
"i32": np.int32,
"i64": np.int64,
"i1": np.bool_,
}
class AMDSharkInference:
"""
Runs prediction or inference on mlir_module.
...
Attributes
----------
mlir_module : str
mlir_module or path represented in string; modules from torch-mlir are serialized in bytecode format.
device : str
device to execute the mlir_module on.
currently supports cpu, cuda, vulkan, and metal backends.
mlir_dialect: str
The dialect in which the given mlir_module is in.
Refer to {https://mlir.llvm.org/docs/Dialects/}
is_benchmark: bool
Whether this AMDSharkInference module should be benchmark-enabled.
mmap: bool
Whether to load/run vmfb using mmap. It's `True` by default.
Methods
-------
__call__(function_name, inputs=None):
Runs the function with `function_name` within the mlir_module along
with the given inputs, if the inputs are not given it autogenerates the
inputs. Also, the inputs should be a numpy array.
input_info():
Gives the information about the inputs required by the `function_name`.
This can be expensive as it does string matching to do so.
"""
def __init__(
self,
mlir_module,
device: str = "none",
mlir_dialect: str = "linalg",
is_benchmark: bool = False,
dispatch_benchmark: str = None,
dispatch_benchmark_dir: str = "temp_dispatch_benchmarks",
device_idx: int = None,
mmap: bool = True,
rt_flags: list = [],
):
self.mlir_module = mlir_module
if mlir_module is not None:
if mlir_module and not os.path.isfile(mlir_module):
print(
"Warning: Initializing AMDSharkInference with a mlir string/bytecode object will duplicate the model in RAM at compile time. To avoid this, initialize AMDSharkInference with a path to a MLIR module on your hard disk instead."
)
self.compile_str = True
else:
self.compile_str = False
self.device = amdshark_args.device if device == "none" else device
self.mlir_dialect = mlir_dialect
self.is_benchmark = is_benchmark
self.device_idx = device_idx
self.dispatch_benchmarks = (
amdshark_args.dispatch_benchmarks
if dispatch_benchmark is None
else dispatch_benchmark
)
self.dispatch_benchmarks_dir = (
amdshark_args.dispatch_benchmarks_dir
if dispatch_benchmark_dir == "temp_dispatch_benchmarks"
else dispatch_benchmark_dir
)
self.amdshark_runner = None
self.mmap = mmap
self.rt_flags = rt_flags
def compile(self, extra_args=[]):
if self.dispatch_benchmarks is not None:
extra_args.append(
f"--iree-hal-dump-executable-sources-to={self.dispatch_benchmarks_dir}"
)
extra_args.append(
f"--iree-hal-dump-executable-binaries-to={self.dispatch_benchmarks_dir}"
)
temp_dir = self.dispatch_benchmarks_dir.split("/")
temp_dir[-1] = "temp_" + temp_dir[-1]
temp_dir = "/".join(temp_dir)
self.temp_dispatch_benchmarks_dir = temp_dir
extra_args.append(
f"--iree-hal-dump-executable-benchmarks-to={self.temp_dispatch_benchmarks_dir}"
)
if self.is_benchmark == True:
from amdshark.amdshark_benchmark_runner import AMDSharkBenchmarkRunner
self.amdshark_runner = AMDSharkBenchmarkRunner(
self.mlir_module,
self.device,
self.mlir_dialect,
extra_args=extra_args,
)
else:
self.amdshark_runner = AMDSharkRunner(
self.mlir_module,
self.device,
self.mlir_dialect,
extra_args=extra_args,
device_idx=self.device_idx,
rt_flags=self.rt_flags,
)
if self.dispatch_benchmarks is not None:
create_dispatch_dirs(self.dispatch_benchmarks_dir, self.device)
compile_benchmark_dirs(
self.dispatch_benchmarks_dir,
self.device,
self.dispatch_benchmarks,
)
os.system(f"rm -rf {self.temp_dispatch_benchmarks_dir}")
# Inputs are expected to be a tuple of np.arrays.
def __call__(self, function_name: str, inputs: tuple, send_to_host=True):
return self.amdshark_runner.run(
function_name, inputs, send_to_host, device=self.device
)
# forward function.
def forward(self, inputs: tuple, send_to_host=True):
return self.amdshark_runner.run(
"forward", inputs, send_to_host, device=self.device
)
# Get all function names defined within the compiled module.
def get_functions_in_module(self):
return self.amdshark_runner.get_functions_in_module()
# Captures the static input information from the mlir_module.
# TODO(pashu123): Generate the input information for dynamic shapes.
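# Example (illustrative signature): for a header like
#   func.func @forward(%arg0: tensor<1x128xi64>, %arg1: tensor<1x128xi64>) -> ...
# this returns ([(1, 128), (1, 128)], ["i64", "i64"]).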
def _input_info(self, function_name):
# Use func_key to find the line that contains the function signature.
func_key = "func.func @" + function_name
func_header = None
for line in str(self.mlir_module).splitlines():
if func_key in line:
func_header = line
break
if func_header is None:
print(f"Function: {function_name} not found")
return [], []
import re
inputs = re.findall(r"\(.*?\)", func_header)[0].split(",")
shapes = []
dtype = []
for inp in inputs:
shape_dtype = re.findall(r"<[^>]*>", inp)[0].split("x")
shape_dtype[0], shape_dtype[-1] = (
shape_dtype[0][1:],
shape_dtype[-1][:-1],
)
shapes.append(tuple([int(x) for x in shape_dtype[:-1]]))
dtype.append(shape_dtype[-1])
return shapes, dtype
# Generates random inputs to be fed into the graph.
def generate_random_inputs(self, low=0, high=1):
shapes, dtype = self._input_info("forward")  # assumes the entry point is named "forward"
inputs = []
for i, j in zip(shapes, dtype):
inputs.append(
np.random.uniform(low, high, size=i).astype(
dtype_to_np_dtype[j]
)
)
return tuple(inputs)
# TODO: Instead of passing a directory and having names decided by the module,
# the user may want to save the module under a manually chosen name.
def save_module(
self, dir=os.getcwd(), module_name=None, extra_args=[], debug=False
):
return export_iree_module_to_vmfb(
self.mlir_module,
self.device,
dir,
self.mlir_dialect,
module_name=module_name,
extra_args=extra_args,
debug=debug,
compile_str=self.compile_str,
)
# Load a precompiled .vmfb module from `path` into the runner.
def load_module(self, path, extra_args=[]):
self.amdshark_runner = AMDSharkRunner(
device=self.device,
compile_vmfb=False,
extra_args=extra_args,
rt_flags=self.rt_flags,
)
params = load_flatbuffer(
path,
self.device,
self.device_idx,
mmap=self.mmap,
rt_flags=self.rt_flags,
)
self.amdshark_runner.iree_compilation_module = params["vmfb"]
self.amdshark_runner.iree_config = params["config"]
self.amdshark_runner.temp_file_to_unlink = params["temp_file_to_unlink"]
del params
return
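For reference, here is a minimal usage sketch of the class above, mirroring the MiniLM and ResNet50 examples further down in this diff. The module path, entry point, and input shape are placeholders and must match the actual model:

```
import numpy as np
from amdshark.amdshark_inference import AMDSharkInference

# "model.mlir" is a placeholder path to a serialized MLIR module on disk.
module = AMDSharkInference("model.mlir", device="cpu", mlir_dialect="linalg")
module.compile()
outputs = module("forward", (np.ones((1, 128), dtype=np.float32),))

# The compiled .vmfb can also be cached and reloaded later.
path = module.save_module()
module.load_module(path)
```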

amdshark/amdshark_runner.py
View File

@@ -0,0 +1,127 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from amdshark.iree_utils.compile_utils import (
get_iree_compiled_module,
get_results,
export_iree_module_to_vmfb,
load_flatbuffer,
)
from amdshark.iree_utils._common import check_device_drivers, device_driver_info
from amdshark.parser import amdshark_args
import os
import sys
# Dialects supported by the amdshark runtime.
supported_dialects = {
"linalg",
"auto",
"stablehlo",
"tosa",
"tf-lite",
"tm_tensor",
}
class AMDSharkRunner:
"""
Base class for AMDSharkInference and AMDSharkTrainer
used to execute an mlir_module.
...
Attributes
----------
mlir_module : str
Path to an MLIR module, or the module itself as a string/bytecode object.
device : str
The device to execute the mlir_module on.
Currently supports the cpu, cuda, vulkan, and metal backends.
mlir_dialect : str
The dialect of the given mlir_module.
Refer to https://mlir.llvm.org/docs/Dialects/.
Methods
-------
run(function_name, inputs=None):
Runs the function named `function_name` in the mlir_module with the given
inputs (a tuple of numpy arrays).
input_info():
Returns information about the inputs required by `function_name`.
This can be expensive, as it relies on string matching.
"""
def __init__(
self,
mlir_module: bytes = None,
device: str = "none",
mlir_dialect: str = "linalg",
extra_args: list = [],
compile_vmfb: bool = True,
device_idx: int = None,
rt_flags: list = [],
):
self.mlir_module = mlir_module
if self.mlir_module is not None:
if not os.path.isfile(mlir_module):
print(
"Warning: Initializing AMDSharkRunner with a mlir string/bytecode object will duplicate the model in RAM at compile time. To avoid this, initialize AMDSharkInference with a path to a MLIR module on your hard disk instead."
)
self.compile_str = True
else:
self.compile_str = False
self.device = amdshark_args.device if device == "none" else device
self.mlir_dialect = mlir_dialect
self.extra_args = extra_args
self.device_idx = device_idx
self.rt_flags = rt_flags
if check_device_drivers(self.device):
print(device_driver_info(self.device))
sys.exit(1)
if compile_vmfb:
# Compile the module to get the .vmfb.
params = get_iree_compiled_module(
self.mlir_module,
self.device,
self.mlir_dialect,
extra_args=self.extra_args,
device_idx=self.device_idx,
rt_flags=self.rt_flags,
compile_str=self.compile_str,
)
self.iree_compilation_module = params["vmfb"]
self.iree_config = params["config"]
self.temp_file_to_unlink = params["temp_file_to_unlink"]
del params
def run(
self, function_name, inputs: tuple, send_to_host=False, device=None
):
return get_results(
self.iree_compilation_module,
function_name,
inputs,
self.iree_config,
self.mlir_dialect,
send_to_host,
device=device,
)
# Get all function names defined within the compiled module.
def get_functions_in_module(self):
return self.iree_compilation_module._vm_module.function_names

View File

@@ -12,9 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from shark.parser import shark_args
from shark.shark_runner import SharkRunner
from shark.backward_makefx import MakeFxModule
from amdshark.parser import amdshark_args
from amdshark.amdshark_runner import AMDSharkRunner
from amdshark.backward_makefx import MakeFxModule
from amdshark.amdshark_importer import import_with_fx, save_mlir
import numpy as np
from tqdm import tqdm
import sys
@@ -25,8 +26,8 @@ def print_err(*a):
print(*a, file=sys.stderr)
class SharkTrainer:
"""Training pytorch, tensorflow module on shark runtime."""
class AMDSharkTrainer:
"""Training pytorch, tensorflow module on amdshark runtime."""
def __init__(
self,
@@ -47,9 +48,9 @@ class SharkTrainer:
# By default it's the torch frontend.
self.frontend = "pytorch"
self.device = device if device is not None else shark_args.device
self.device = device if device is not None else amdshark_args.device
self.shark_runner = None
self.amdshark_runner = None
# Sets the frontend i.e `pytorch` or `tensorflow`.
def set_frontend(self, frontend: str):
@@ -58,6 +59,7 @@ class SharkTrainer:
"torch",
"tensorflow",
"tf",
"stablehlo",
"mhlo",
"linalg",
"tosa",
@@ -67,26 +69,35 @@ class SharkTrainer:
self.frontend = frontend
# Training function is needed in the case of torch_fn.
def compile(self, training_fn=None):
def compile(self, training_fn=None, mlir_type="linalg", extra_args=[]):
if self.frontend in ["torch", "pytorch"]:
aot_module = MakeFxModule(
self.model, tuple(self.input), custom_inference_fn=training_fn
packed_inputs = (
dict(self.model.named_parameters()),
dict(self.model.named_buffers()),
tuple(self.input),
)
aot_module.generate_graph()
# Returns the backward graph.
training_graph = aot_module.training_graph
weights = self.get_torch_params()
self.shark_runner = SharkRunner(
training_graph,
weights + self.input,
self.dynamic,
mlir_module, func_name = import_with_fx(
training_fn,
packed_inputs,
False,
[],
training=True,
mlir_type=mlir_type,
)
mlir_module = save_mlir(
mlir_module,
model_name="amdshark_model",
frontend="torch",
mlir_dialect=mlir_type,
)
self.amdshark_runner = AMDSharkRunner(
mlir_module,
self.device,
self.jit_trace,
self.from_aot,
self.frontend,
"tm_tensor",
extra_args=extra_args,
)
elif self.frontend in ["tensorflow", "tf", "mhlo"]:
self.shark_runner = SharkRunner(
elif self.frontend in ["tensorflow", "tf", "mhlo", "stablehlo"]:
self.amdshark_runner = AMDSharkRunner(
self.model,
self.input,
self.dynamic,
@@ -112,15 +123,15 @@ class SharkTrainer:
params = [x.numpy() for x in params]
print(f"Training started for {num_iters} iterations:")
for i in tqdm(range(num_iters)):
params = self.shark_runner.forward(
params + self.input, self.frontend
params = self.amdshark_runner.run(
"forward", params + self.input, self.frontend
)
return params
# Function to train tensorflow module.
# Output final loss.
# TODO(raikonenfnu): Save updated weight/states in SHARK.
# TODO(raikonenfnu): Save updated weight/states in AMDSHARK.
def _train_tf(self, num_iters):
input_list = []
for x in self.input:
@@ -139,7 +150,7 @@ class SharkTrainer:
print(f"Training started for {num_iters} iterations:")
for i in tqdm(range(num_iters)):
outputs = self.shark_runner.forward(input_list, self.frontend)
outputs = self.amdshark_runner.forward(input_list, self.frontend)
return outputs
def train(self, num_iters=1):

View File

@@ -15,7 +15,7 @@
import torch
from torch._decomp import get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx
from torch.nn.utils import _stateless
from torch.nn.utils import stateless
from torch import fx
import tempfile
@@ -71,7 +71,7 @@ class MakeFxModule:
fx_g = self.change_fx_graph_return_to_tuple(fx_g)
ts_g = torch.jit.script(fx_g)
temp = tempfile.NamedTemporaryFile(
suffix="_shark_ts", prefix="temp_ts_"
suffix="_amdshark_ts", prefix="temp_ts_"
)
ts_g.save(temp.name)
new_ts = torch.jit.load(temp.name)

View File

@@ -0,0 +1,154 @@
import functools
from typing import List, Optional
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch._functorch.compile_utils import strip_overloads
from amdshark.amdshark_inference import AMDSharkInference
from torch._decomp import get_decompositions
from torch.func import functionalize
import io
import torch_mlir
# TODO: Control decompositions.
def default_decompositions():
return get_decompositions(
[
torch.ops.aten.embedding_dense_backward,
torch.ops.aten.native_layer_norm_backward,
torch.ops.aten.slice_backward,
torch.ops.aten.select_backward,
torch.ops.aten.norm.ScalarOpt_dim,
torch.ops.aten.native_group_norm,
torch.ops.aten.upsample_bilinear2d.vec,
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
torch.ops.aten.native_layer_norm,
torch.ops.aten.masked_fill.Tensor,
torch.ops.aten.masked_fill.Scalar,
]
)
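# _remove_nones drops None entries from the FX graph's output tuple and records
# their positions so AMDSharkBackend.__call__ can re-insert them in the results.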
def _remove_nones(fx_g: torch.fx.GraphModule) -> List[int]:
removed_indexes = []
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, (list, tuple)):
node_arg = list(node_arg)
node_args_len = len(node_arg)
for i in range(node_args_len):
curr_index = node_args_len - (i + 1)
if node_arg[curr_index] is None:
removed_indexes.append(curr_index)
node_arg.pop(curr_index)
node.args = (tuple(node_arg),)
break
if len(removed_indexes) > 0:
fx_g.graph.lint()
fx_g.graph.eliminate_dead_code()
fx_g.recompile()
removed_indexes.sort()
return removed_indexes
def _returns_nothing(fx_g: torch.fx.GraphModule) -> bool:
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, tuple):
return len(node_arg) == 0
return False
def _unwrap_single_tuple_return(fx_g: torch.fx.GraphModule) -> bool:
"""
Replace tuple with tuple element in functions that return one-element tuples.
Returns true if an unwrapping took place, and false otherwise.
"""
unwrapped_tuple = False
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, tuple):
if len(node_arg) == 1:
node.args = (node_arg[0],)
unwrapped_tuple = True
break
if unwrapped_tuple:
fx_g.graph.lint()
fx_g.recompile()
return unwrapped_tuple
class AMDSharkBackend:
def __init__(
self, fx_g: torch.fx.GraphModule, inputs: tuple, options: dict
):
self.fx_g = fx_g
self.inputs = inputs
self.amdshark_module = None
self.device: str = options.get("device", "cpu")
self.was_unwrapped: bool = False
self.none_indices: list = []
self._modify_fx_g()
self.compile()
def _modify_fx_g(self):
self.none_indices = _remove_nones(self.fx_g)
self.was_unwrapped = _unwrap_single_tuple_return(self.fx_g)
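# compile() re-traces the (possibly modified) FX graph with functionalization and
# the default decompositions, lowers it to linalg-on-tensors via torch_mlir, and
# compiles the resulting bytecode with AMDSharkInference (tm_tensor dialect).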
def compile(self):
gm = make_fx(
functionalize(self.fx_g),
decomposition_table=default_decompositions(),
)(*self.inputs)
gm.graph.set_codegen(torch.fx.graph.CodeGen())
gm.recompile()
strip_overloads(gm)
ts_g = torch.jit.script(gm)
mlir_module = torch_mlir.compile(
ts_g, self.inputs, output_type="linalg-on-tensors"
)
bytecode_stream = io.BytesIO()
mlir_module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
from amdshark.amdshark_inference import AMDSharkInference
amdshark_module = AMDSharkInference(
mlir_module=bytecode,
device=self.device,
mlir_dialect="tm_tensor",
)
amdshark_module.compile(extra_args=[])
self.amdshark_module = amdshark_module
def __call__(self, *inputs):
np_inputs = [x.contiguous().detach().cpu().numpy() for x in inputs]
np_outs = self.amdshark_module("forward", np_inputs)
if self.was_unwrapped:
np_outs = [
np_outs,
]
if not isinstance(np_outs, list):
res = torch.from_numpy(np_outs)
return res
result = [torch.from_numpy(x) for x in np_outs]
for r_in in self.none_indices:
result.insert(r_in, None)
result = tuple(result)
return result

View File

@@ -0,0 +1,25 @@
import torch
import amdshark
def foo(x, a):
if x.shape[0] > 3:
return x + a
else:
return x + 3
amdshark_options = {"device": "cpu"}
compiled = torch.compile(foo, backend="amdshark", options=amdshark_options)
input = torch.ones(4)
x = compiled(input, input)
print(x)
input = torch.ones(3)
x = compiled(input, input)
print(x)
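# Note: the second call uses a different input shape (3 vs. 4), so Dynamo retraces
# `foo`; the branch on `x.shape[0] > 3` is resolved at trace time, and each traced
# graph is compiled separately through the "amdshark" backend registered above.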

View File

@@ -22,7 +22,7 @@
"source": [
"# standard imports\n",
"import torch\n",
"from shark.iree_utils import get_iree_compiled_module"
"from amdshark.iree_utils import get_iree_compiled_module"
]
},
{
@@ -36,7 +36,9 @@
" from torchdynamo.optimizations.backends import create_backend\n",
" from torchdynamo.optimizations.subgraph import SubGraph\n",
"except ModuleNotFoundError:\n",
" print(\"Please install TorchDynamo using pip install git+https://github.com/pytorch/torchdynamo\")\n",
" print(\n",
" \"Please install TorchDynamo using pip install git+https://github.com/pytorch/torchdynamo\"\n",
" )\n",
" exit()\n",
"\n",
"# torch-mlir imports for compiling\n",
@@ -97,7 +99,9 @@
"\n",
" for node in fx_g.graph.nodes:\n",
" if node.op == \"output\":\n",
" assert len(node.args) == 1, \"Output node must have a single argument\"\n",
" assert (\n",
" len(node.args) == 1\n",
" ), \"Output node must have a single argument\"\n",
" node_arg = node.args[0]\n",
" if isinstance(node_arg, tuple) and len(node_arg) == 1:\n",
" node.args = (node_arg[0],)\n",
@@ -116,8 +120,12 @@
" if len(args) == 1 and isinstance(args[0], list):\n",
" args = args[0]\n",
"\n",
" linalg_module = compile(ts_graph, args, output_type=OutputType.LINALG_ON_TENSORS)\n",
" callable, _ = get_iree_compiled_module(linalg_module, \"cuda\", func_name=\"forward\")\n",
" linalg_module = compile(\n",
" ts_graph, args, output_type=OutputType.LINALG_ON_TENSORS\n",
" )\n",
" callable, _ = get_iree_compiled_module(\n",
" linalg_module, \"cuda\", func_name=\"forward\"\n",
" )\n",
"\n",
" def forward(*inputs):\n",
" return callable(*inputs)\n",
@@ -212,6 +220,7 @@
" assert isinstance(subgraph, SubGraph), \"Model must be a dynamo SubGraph.\"\n",
" return __torch_mlir(subgraph.model, *list(subgraph.example_inputs))\n",
"\n",
"\n",
"@torchdynamo.optimize(\"torch_mlir\")\n",
"def toy_example2(*args):\n",
" a, b = args\n",

View File

@@ -1,7 +1,7 @@
import torch
from torch_mlir import compile, OutputType
from shark.iree_utils import get_iree_compiled_module
from amdshark.iree_utils import get_iree_compiled_module
try:
import torchdynamo

View File

@@ -32,7 +32,7 @@
"source": [
"# eager mode imports\n",
"from torch_mlir.eager_mode.torch_mlir_tensor import TorchMLIRTensor\n",
"from shark.iree_eager_backend import EagerModeIREELinalgOnTensorsBackend"
"from amdshark.iree_eager_backend import EagerModeIREELinalgOnTensorsBackend"
],
"metadata": {
"collapsed": false,
@@ -440,7 +440,7 @@
{
"cell_type": "markdown",
"source": [
"There is a convenience class `SharkEagerMode` that will handle both the installation of the backend and the wrapping of `torch.Tensor`s:"
"There is a convenience class `AMDSharkEagerMode` that will handle both the installation of the backend and the wrapping of `torch.Tensor`s:"
],
"metadata": {
"collapsed": false,
@@ -684,9 +684,9 @@
],
"source": [
"# eager mode RAII\n",
"from shark.shark_runner import SharkEagerMode\n",
"from amdshark.amdshark_runner import AMDSharkEagerMode\n",
"\n",
"shark_eager_mode = SharkEagerMode(\"cpu\")\n",
"amdshark_eager_mode = AMDSharkEagerMode(\"cpu\")\n",
"\n",
"t = torch.ones((10, 10))\n",
"u = torch.ones((10, 10))\n",
@@ -712,7 +712,7 @@
{
"cell_type": "markdown",
"source": [
"The `SharkEagerMode` class is a hacky take on [RAII](https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization) that defines a \"deleter\" that runs when an instantiation (of `SharkEagerMode`) is garbage collected. Takeaway is that if you want to turn off `SharkEagerMode`, or switch backends, you need to `del` the instance:"
"The `AMDSharkEagerMode` class is a hacky take on [RAII](https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization) that defines a \"deleter\" that runs when an instantiation (of `AMDSharkEagerMode`) is garbage collected. Takeaway is that if you want to turn off `AMDSharkEagerMode`, or switch backends, you need to `del` the instance:"
],
"metadata": {
"collapsed": false,
@@ -757,8 +757,8 @@
}
],
"source": [
"del shark_eager_mode\n",
"shark_eager_mode = SharkEagerMode(\"cuda\")\n",
"del amdshark_eager_mode\n",
"amdshark_eager_mode = AMDSharkEagerMode(\"cuda\")\n",
"\n",
"t = torch.ones((10, 10))\n",
"u = torch.ones((10, 10))\n",

View File

@@ -17,8 +17,8 @@ from torch.utils.cpp_extension import load_inline, include_paths
from torch_mlir.eager_mode import torch_mlir_tensor
from torch_mlir.eager_mode.torch_mlir_tensor import TorchMLIRTensor
from shark.iree_eager_backend import EagerModeIREELinalgOnTensorsBackend
from shark.shark_runner import SharkEagerMode
from amdshark.iree_eager_backend import EagerModeIREELinalgOnTensorsBackend
from amdshark.amdshark_runner import AMDSharkEagerMode
def test_cpu():
@@ -85,7 +85,7 @@ def test_gpu():
def test_python_mode_ref_backend():
# hide this wherever you want?
_ = SharkEagerMode("refbackend")
_ = AMDSharkEagerMode("refbackend")
t = torch.ones((10, 10), device="cpu")
u = torch.ones((10, 10), device="cpu")
@@ -103,7 +103,7 @@ def test_python_mode_ref_backend():
def test_python_mode_iree_cpu():
# hide this wherever you want?
_ = SharkEagerMode("cpu")
_ = AMDSharkEagerMode("cpu")
t = torch.ones((10, 10), device="cpu")
u = torch.ones((10, 10), device="cpu")
@@ -121,7 +121,7 @@ def test_python_mode_iree_cpu():
def test_python_mode_iree_gpu():
_ = SharkEagerMode("gpu")
_ = AMDSharkEagerMode("gpu")
t = torch.ones((10, 10), device="cpu")
u = torch.ones((10, 10), device="cpu")

View File

@@ -0,0 +1,73 @@
import torch
import numpy as np
model = torch.hub.load(
"pytorch/vision:v0.10.0", "squeezenet1_0", pretrained=True
)
model.eval()
# from PIL import Image
# from torchvision import transforms
# import urllib
#
# url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
# try: urllib.URLopener().retrieve(url, filename)
# except: urllib.request.urlretrieve(url, filename)
#
#
# input_image = Image.open(filename)
# preprocess = transforms.Compose([
# transforms.Resize(256),
# transforms.CenterCrop(224),
# transforms.ToTensor(),
# transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
# ])
# input_tensor = preprocess(input_image)
# input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
# print(input_batch.shape) # size = [1, 3, 224, 224]
# The above is code for generating sample inputs from an image. We can just use
# random values for accuracy testing though
input_batch = torch.randn(1, 3, 224, 224)
# Focus on CPU for now
if False and torch.cuda.is_available():
input_batch = input_batch.to("cuda")
model.to("cuda")
with torch.no_grad():
output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
golden_confidences = output[0]
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
golden_probabilities = torch.nn.functional.softmax(
golden_confidences, dim=0
).numpy()
golden_confidences = golden_confidences.numpy()
from amdshark.torch_mlir_lockstep_tensor import TorchMLIRLockstepTensor
input_detached_clone = input_batch.clone()
eager_input_batch = TorchMLIRLockstepTensor(input_detached_clone)
print("getting torch-mlir result")
output = model(eager_input_batch)
static_output = output.elem
confidences = static_output[0]
probabilities = torch.nn.functional.softmax(
torch.from_numpy(confidences), dim=0
).numpy()
print("The obtained result via amdshark is: ", confidences)
print("The golden result is:", golden_confidences)
np.testing.assert_allclose(
golden_confidences, confidences, rtol=1e-02, atol=1e-03
)
np.testing.assert_allclose(
golden_probabilities, probabilities, rtol=1e-02, atol=1e-03
)

View File

@@ -3,7 +3,7 @@ import requests
from transformers import CLIPProcessor, TFCLIPModel
import tensorflow as tf
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
# Create a set of inputs
clip_vit_inputs = [
@@ -22,7 +22,7 @@ class CLIPModule(tf.Module):
input_ids=x, attention_mask=y, pixel_values=z
)
@tf.function(input_signature=clip_vit_inputs)
@tf.function(input_signature=clip_vit_inputs, jit_compile=True)
def forward(self, input_ids, attention_mask, pixel_values):
return self.m.predict(
input_ids, attention_mask, pixel_values
@@ -43,7 +43,7 @@ if __name__ == "__main__":
padding=True,
)
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
CLIPModule(),
(
inputs["input_ids"],
@@ -51,11 +51,11 @@ if __name__ == "__main__":
inputs["pixel_values"],
),
)
shark_module.set_frontend("tensorflow")
shark_module.compile()
amdshark_module.set_frontend("tensorflow")
amdshark_module.compile()
print(
shark_module.forward(
amdshark_module.forward(
(
inputs["input_ids"],
inputs["attention_mask"],

View File

@@ -0,0 +1,15 @@
## Running ESRGAN
```
1. pip install numpy opencv-python
2. mkdir InputImages
(this is where all the input images will reside)
3. mkdir OutputImages
(this is where the model will generate all the images)
4. mkdir models
(save the .pth checkpoint file here)
5. python esrgan.py
```
- Download [RRDB_ESRGAN_x4.pth](https://drive.google.com/drive/u/0/folders/17VYV_SoZZesU6mbxz2dMAIccSSlqLecY) and place it in the `models` directory as mentioned above in step 4.
- Credits : [ESRGAN](https://github.com/xinntao/ESRGAN)

View File

@@ -0,0 +1,239 @@
from ast import arg
import os.path as osp
import glob
import cv2
import numpy as np
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch._decomp import get_decompositions
from amdshark.amdshark_inference import AMDSharkInference
import torch_mlir
import tempfile
import functools
import torch
import torch.nn as nn
import torch.nn.functional as F
def make_layer(block, n_layers):
layers = []
for _ in range(n_layers):
layers.append(block())
return nn.Sequential(*layers)
class ResidualDenseBlock_5C(nn.Module):
def __init__(self, nf=64, gc=32, bias=True):
super(ResidualDenseBlock_5C, self).__init__()
# gc: growth channel, i.e. intermediate channels
self.conv1 = nn.Conv2d(nf, gc, 3, 1, 1, bias=bias)
self.conv2 = nn.Conv2d(nf + gc, gc, 3, 1, 1, bias=bias)
self.conv3 = nn.Conv2d(nf + 2 * gc, gc, 3, 1, 1, bias=bias)
self.conv4 = nn.Conv2d(nf + 3 * gc, gc, 3, 1, 1, bias=bias)
self.conv5 = nn.Conv2d(nf + 4 * gc, nf, 3, 1, 1, bias=bias)
self.lrelu = nn.LeakyReLU(negative_slope=0.2, inplace=True)
# initialization
# mutil.initialize_weights([self.conv1, self.conv2, self.conv3, self.conv4, self.conv5], 0.1)
def forward(self, x):
x1 = self.lrelu(self.conv1(x))
x2 = self.lrelu(self.conv2(torch.cat((x, x1), 1)))
x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
x4 = self.lrelu(self.conv4(torch.cat((x, x1, x2, x3), 1)))
x5 = self.conv5(torch.cat((x, x1, x2, x3, x4), 1))
return x5 * 0.2 + x
class RRDB(nn.Module):
"""Residual in Residual Dense Block"""
def __init__(self, nf, gc=32):
super(RRDB, self).__init__()
self.RDB1 = ResidualDenseBlock_5C(nf, gc)
self.RDB2 = ResidualDenseBlock_5C(nf, gc)
self.RDB3 = ResidualDenseBlock_5C(nf, gc)
def forward(self, x):
out = self.RDB1(x)
out = self.RDB2(out)
out = self.RDB3(out)
return out * 0.2 + x
class RRDBNet(nn.Module):
def __init__(self, in_nc, out_nc, nf, nb, gc=32):
super(RRDBNet, self).__init__()
RRDB_block_f = functools.partial(RRDB, nf=nf, gc=gc)
self.conv_first = nn.Conv2d(in_nc, nf, 3, 1, 1, bias=True)
self.RRDB_trunk = make_layer(RRDB_block_f, nb)
self.trunk_conv = nn.Conv2d(nf, nf, 3, 1, 1, bias=True)
#### upsampling
self.upconv1 = nn.Conv2d(nf, nf, 3, 1, 1, bias=True)
self.upconv2 = nn.Conv2d(nf, nf, 3, 1, 1, bias=True)
self.HRconv = nn.Conv2d(nf, nf, 3, 1, 1, bias=True)
self.conv_last = nn.Conv2d(nf, out_nc, 3, 1, 1, bias=True)
self.lrelu = nn.LeakyReLU(negative_slope=0.2, inplace=True)
def forward(self, x):
fea = self.conv_first(x)
trunk = self.trunk_conv(self.RRDB_trunk(fea))
fea = fea + trunk
fea = self.lrelu(
self.upconv1(F.interpolate(fea, scale_factor=2, mode="nearest"))
)
fea = self.lrelu(
self.upconv2(F.interpolate(fea, scale_factor=2, mode="nearest"))
)
out = self.conv_last(self.lrelu(self.HRconv(fea)))
return out
############### Parsing args #####################
import argparse
p = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
p.add_argument("--device", type=str, default="cpu", help="the device to use")
p.add_argument(
"--mlir_loc",
type=str,
default=None,
help="location of the model's mlir file",
)
args = p.parse_args()
###################################################
def inference(input_m):
return model(input_m)
def load_mlir(mlir_loc):
import os
if mlir_loc is None:
return None
print(f"Trying to load the model from {mlir_loc}.")
with open(os.path.join(mlir_loc)) as f:
mlir_module = f.read()
return mlir_module
def compile_through_fx(model, inputs, mlir_loc=None):
module = load_mlir(mlir_loc)
if module is None:
fx_g = make_fx(
model,
decomposition_table=get_decompositions(
[
torch.ops.aten.embedding_dense_backward,
torch.ops.aten.native_layer_norm_backward,
torch.ops.aten.slice_backward,
torch.ops.aten.select_backward,
torch.ops.aten.norm.ScalarOpt_dim,
torch.ops.aten.native_group_norm,
torch.ops.aten.upsample_bilinear2d.vec,
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
]
),
)(inputs)
fx_g.graph.set_codegen(torch.fx.graph.CodeGen())
fx_g.recompile()
def strip_overloads(gm):
"""
Modifies the target of graph nodes in :attr:`gm` to strip overloads.
Args:
gm(fx.GraphModule): The input Fx graph module to be modified
"""
for node in gm.graph.nodes:
if isinstance(node.target, torch._ops.OpOverload):
node.target = node.target.overloadpacket
gm.recompile()
strip_overloads(fx_g)
ts_g = torch.jit.script(fx_g)
print("Torchscript graph generated successfully")
module = torch_mlir.compile(
ts_g,
inputs,
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
mlir_model = str(module)
func_name = "forward"
amdshark_module = AMDSharkInference(
mlir_model, device=args.device, mlir_dialect="linalg"
)
amdshark_module.compile()
return amdshark_module
model_path = "models/RRDB_ESRGAN_x4.pth" # models/RRDB_ESRGAN_x4.pth OR models/RRDB_PSNR_x4.pth
# device = torch.device('cuda') # if you want to run on CPU, change 'cuda' -> cpu
device = torch.device("cpu")
test_img_folder = "InputImages/*"
model = RRDBNet(3, 3, 64, 23, gc=32)
model.load_state_dict(torch.load(model_path), strict=True)
model.eval()
model = model.to(device)
print("Model path {:s}. \nTesting...".format(model_path))
if __name__ == "__main__":
idx = 0
for path in glob.glob(test_img_folder):
idx += 1
base = osp.splitext(osp.basename(path))[0]
print(idx, base)
# read images
img = cv2.imread(path, cv2.IMREAD_COLOR)
img = img * 1.0 / 255
img = torch.from_numpy(
np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))
).float()
img_LR = img.unsqueeze(0)
img_LR = img_LR.to(device)
with torch.no_grad():
amdshark_module = compile_through_fx(inference, img_LR)
amdshark_output = amdshark_module.forward((img_LR,))
amdshark_output = torch.from_numpy(amdshark_output)
amdshark_output = (
amdshark_output.data.squeeze().float().cpu().clamp_(0, 1).numpy()
)
esrgan_output = (
model(img_LR).data.squeeze().float().cpu().clamp_(0, 1).numpy()
)
# AMDSHARK OUTPUT
amdshark_output = np.transpose(amdshark_output[[2, 1, 0], :, :], (1, 2, 0))
amdshark_output = (amdshark_output * 255.0).round()
cv2.imwrite(
"OutputImages/{:s}_rlt_amdshark_output.png".format(base), amdshark_output
)
print("Generated AMDSHARK's output")
# ESRGAN OUTPUT
esrgan_output = np.transpose(esrgan_output[[2, 1, 0], :, :], (1, 2, 0))
esrgan_output = (esrgan_output * 255.0).round()
cv2.imwrite(
"OutputImages/{:s}_rlt_esrgan_output.png".format(base),
esrgan_output,
)
print("Generated ESRGAN's output")

View File

@@ -1,7 +1,7 @@
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch
from shark.shark_inference import SharkInference
from shark.shark_importer import SharkImporter
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import AMDSharkImporter
from iree.compiler import compile_str
from iree import runtime as ireert
import os
@@ -35,7 +35,7 @@ if __name__ == "__main__":
return_tensors="pt",
)
inputs = (encoded_inputs["input_ids"], encoded_inputs["attention_mask"])
mlir_importer = SharkImporter(
mlir_importer = AMDSharkImporter(
AlbertModule(),
inputs,
frontend="torch",
@@ -43,11 +43,9 @@ if __name__ == "__main__":
minilm_mlir, func_name = mlir_importer.import_mlir(
is_dynamic=False, tracing_required=True
)
shark_module = SharkInference(
minilm_mlir, func_name, mlir_dialect="linalg"
)
shark_module.compile()
token_logits = torch.tensor(shark_module.forward(inputs))
amdshark_module = AMDSharkInference(minilm_mlir)
amdshark_module.compile()
token_logits = torch.tensor(amdshark_module.forward(inputs))
mask_id = torch.where(
encoded_inputs["input_ids"] == tokenizer.mask_token_id
)[1]
@@ -71,7 +69,7 @@ if __name__ == "__main__":
encoded_inputs["input_ids"],
encoded_inputs["attention_mask"],
)
token_logits = torch.tensor(shark_module.forward(inputs))
token_logits = torch.tensor(amdshark_module.forward(inputs))
mask_id = torch.where(
encoded_inputs["input_ids"] == tokenizer.mask_token_id
)[1]

View File

@@ -3,8 +3,8 @@ import requests
from transformers import TFAutoModelForMaskedLM, AutoTokenizer
import tensorflow as tf
from shark.shark_inference import SharkInference
from shark.shark_importer import SharkImporter
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import AMDSharkImporter
from iree.compiler import tf as tfc
from iree.compiler import compile_str
from iree import runtime as ireert
@@ -28,7 +28,7 @@ class AlbertModule(tf.Module):
self.m = TFAutoModelForMaskedLM.from_pretrained("albert-base-v2")
self.m.predict = lambda x, y: self.m(input_ids=x, attention_mask=y)
@tf.function(input_signature=t5_inputs)
@tf.function(input_signature=t5_inputs, jit_compile=True)
def forward(self, input_ids, attention_mask):
return self.m.predict(input_ids, attention_mask)
@@ -46,7 +46,7 @@ if __name__ == "__main__":
return_tensors="tf",
)
inputs = (encoded_inputs["input_ids"], encoded_inputs["attention_mask"])
mlir_importer = SharkImporter(
mlir_importer = AMDSharkImporter(
AlbertModule(),
inputs,
frontend="tf",
@@ -54,11 +54,11 @@ if __name__ == "__main__":
minilm_mlir, func_name = mlir_importer.import_mlir(
is_dynamic=False, tracing_required=False
)
shark_module = SharkInference(minilm_mlir, func_name, mlir_dialect="mhlo")
shark_module.compile()
amdshark_module = AMDSharkInference(minilm_mlir, mlir_dialect="mhlo")
amdshark_module.compile()
output_idx = 0
data_idx = 1
token_logits = shark_module.forward(inputs)[output_idx][data_idx]
token_logits = amdshark_module.forward(inputs)[output_idx][data_idx]
mask_id = np.where(
tf.squeeze(encoded_inputs["input_ids"]) == tokenizer.mask_token_id
)
@@ -82,7 +82,7 @@ if __name__ == "__main__":
encoded_inputs["input_ids"],
encoded_inputs["attention_mask"],
)
token_logits = shark_module.forward(inputs)[output_idx][data_idx]
token_logits = amdshark_module.forward(inputs)[output_idx][data_idx]
mask_id = np.where(
tf.squeeze(encoded_inputs["input_ids"])
== tokenizer.mask_token_id

View File

@@ -0,0 +1,14 @@
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_downloader import download_model
mlir_model, func_name, inputs, golden_out = download_model(
"bloom", frontend="torch"
)
amdshark_module = AMDSharkInference(
mlir_model, device="cpu", mlir_dialect="tm_tensor"
)
amdshark_module.compile()
result = amdshark_module.forward(inputs)
print("The obtained result via amdshark is: ", result)
print("The golden result is:", golden_out)

View File

@@ -3,7 +3,7 @@ import requests
from transformers import GPT2Tokenizer, TFGPT2Model
import tensorflow as tf
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
# Create a set of inputs
gpt2_inputs = [
@@ -19,7 +19,7 @@ class GPT2Module(tf.Module):
self.m.predict = lambda x, y: self.m(input_ids=x, attention_mask=y)
@tf.function(input_signature=gpt2_inputs)
@tf.function(input_signature=gpt2_inputs, jit_compile=True)
def forward(self, input_ids, attention_mask):
return self.m.predict(input_ids, attention_mask)
@@ -30,11 +30,11 @@ if __name__ == "__main__":
text = "I love the distilled version of models."
inputs = tokenizer(text, return_tensors="tf")
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
GPT2Module(), (inputs["input_ids"], inputs["attention_mask"])
)
shark_module.set_frontend("tensorflow")
shark_module.compile()
amdshark_module.set_frontend("tensorflow")
amdshark_module.compile()
print(
shark_module.forward((inputs["input_ids"], inputs["attention_mask"]))
amdshark_module.forward((inputs["input_ids"], inputs["attention_mask"]))
)

View File

@@ -0,0 +1,18 @@
# AMDSHARK LLaMA
## TORCH-MLIR Version
```
https://github.com/nod-ai/torch-mlir.git
```
Then check out the `complex` branch, run `git submodule update --init`, and build with `.\build_tools\python_deploy\build_windows.ps1`.
### Setup & Run
```
git clone https://github.com/nod-ai/llama.git
```
Then in this repository
```
pip install -e .
python llama/amdshark_model.py
```

View File

@@ -0,0 +1,72 @@
import torch
import torch_mlir
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_compile import amdshark_compile_through_fx
from MEGABYTE_pytorch import MEGABYTE
import os
class MegaModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.model = MEGABYTE(
num_tokens=16000, # number of tokens
dim=(
512,
256,
), # transformer model dimension (512 for coarsest, 256 for fine in this example)
max_seq_len=(
1024,
4,
), # sequence length for global and then local. this can be more than 2
depth=(
6,
4,
), # number of layers for global and then local. this can be more than 2, but length must match the max_seq_len's
dim_head=64, # dimension per head
heads=8, # number of attention heads
flash_attn=True, # use flash attention
)
def forward(self, input):
return self.model(input)
megaModel = MegaModel()
inputs = [torch.randint(0, 16000, (1, 1024, 4))]
# CURRENTLY IT BAILS OUT HERE BECAUSE OF MISSING OP LOWERINGS :-
# 1. aten.alias
amdshark_module, _ = amdshark_compile_through_fx(
model=megaModel,
inputs=inputs,
extended_model_name="mega_amdshark",
is_f16=False,
f16_input_mask=None,
save_dir=os.getcwd(),
debug=False,
generate_or_load_vmfb=True,
extra_args=[],
device="cuda",
mlir_dialect="tm_tensor",
)
# logits = model(x)
def print_output_info(output, msg):
print("\n", msg)
print("\n\t", output.shape)
ans = amdshark_module("forward", inputs)
print_output_info(torch.from_numpy(ans), "AMDSHARK's output")
ans = megaModel.forward(*inputs)
print_output_info(ans, "ORIGINAL Model's output")
# and sample from the logits accordingly
# or you can use the generate function
# NEED TO LOOK AT THIS LATER IF REQUIRED IN AMDSHARK.
# sampled = model.generate(temperature = 0.9, filter_thres = 0.9) # (1, 1024, 4)

View File

@@ -0,0 +1,31 @@
from amdshark.amdshark_inference import AMDSharkInference
import numpy as np
mhlo_ir = r"""builtin.module {
func.func @forward(%arg0: tensor<1x4xf32>, %arg1: tensor<4x1xf32>) -> tensor<4x4xf32> {
%0 = chlo.broadcast_add %arg0, %arg1 : (tensor<1x4xf32>, tensor<4x1xf32>) -> tensor<4x4xf32>
%1 = "mhlo.abs"(%0) : (tensor<4x4xf32>) -> tensor<4x4xf32>
return %1 : tensor<4x4xf32>
}
}"""
arg0 = np.ones((1, 4)).astype(np.float32)
arg1 = np.ones((4, 1)).astype(np.float32)
print("Running amdshark on cpu backend")
amdshark_module = AMDSharkInference(mhlo_ir, device="cpu", mlir_dialect="mhlo")
# Generate the random inputs and feed into the graph.
x = amdshark_module.generate_random_inputs()
amdshark_module.compile()
print(amdshark_module.forward(x))
print("Running amdshark on cuda backend")
amdshark_module = AMDSharkInference(mhlo_ir, device="cuda", mlir_dialect="mhlo")
amdshark_module.compile()
print(amdshark_module.forward(x))
print("Running amdshark on vulkan backend")
amdshark_module = AMDSharkInference(mhlo_ir, device="vulkan", mlir_dialect="mhlo")
amdshark_module.compile()
print(amdshark_module.forward(x))

View File

@@ -1,6 +1,6 @@
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
torch.manual_seed(0)
tokenizer = AutoTokenizer.from_pretrained("microsoft/MiniLM-L12-H384-uncased")
@@ -23,13 +23,13 @@ class MiniLMSequenceClassification(torch.nn.Module):
test_input = torch.randint(2, (1, 128))
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
MiniLMSequenceClassification(),
(test_input,),
jit_trace=True,
benchmark_mode=True,
)
shark_module.compile()
shark_module.forward((test_input,))
shark_module.benchmark_all((test_input,))
amdshark_module.compile()
amdshark_module.forward((test_input,))
amdshark_module.benchmark_all((test_input,))

View File

@@ -1,6 +1,6 @@
import tensorflow as tf
from transformers import BertModel, BertTokenizer, TFBertModel
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
MAX_SEQUENCE_LENGTH = 512
BATCH_SIZE = 1
@@ -26,7 +26,7 @@ class BertModule(tf.Module):
input_ids=x, attention_mask=y, token_type_ids=z, training=False
)
@tf.function(input_signature=bert_input)
@tf.function(input_signature=bert_input, jit_compile=True)
def forward(self, input_ids, attention_mask, token_type_ids):
return self.m.predict(input_ids, attention_mask, token_type_ids)
@@ -53,9 +53,9 @@ if __name__ == "__main__":
encoded_input["attention_mask"],
encoded_input["token_type_ids"],
)
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
BertModule(), test_input, benchmark_mode=True
)
shark_module.set_frontend("tensorflow")
shark_module.compile()
shark_module.benchmark_all(test_input)
amdshark_module.set_frontend("tensorflow")
amdshark_module.compile()
amdshark_module.benchmark_all(test_input)

View File

@@ -0,0 +1,73 @@
from transformers import AutoTokenizer, FlaxAutoModel
import torch
import jax
from typing import Union, Dict, List, Any
import numpy as np
from amdshark.amdshark_inference import AMDSharkInference
import io
NumpyTree = Union[np.ndarray, Dict[str, np.ndarray], List[np.ndarray]]
def convert_torch_tensor_tree_to_numpy(
tree: Union[torch.tensor, Dict[str, torch.tensor], List[torch.tensor]]
) -> NumpyTree:
return jax.tree_util.tree_map(
lambda torch_tensor: torch_tensor.cpu().detach().numpy(), tree
)
def convert_int64_to_int32(tree: NumpyTree) -> NumpyTree:
return jax.tree_util.tree_map(
lambda tensor: np.array(tensor, dtype=np.int32)
if tensor.dtype == np.int64
else tensor,
tree,
)
def get_sample_input():
tokenizer = AutoTokenizer.from_pretrained(
"microsoft/MiniLM-L12-H384-uncased"
)
inputs_torch = tokenizer("Hello, World!", return_tensors="pt")
return convert_int64_to_int32(
convert_torch_tensor_tree_to_numpy(inputs_torch.data)
)
def get_jax_model():
return FlaxAutoModel.from_pretrained("microsoft/MiniLM-L12-H384-uncased")
def export_jax_to_mlir(jax_model: Any, sample_input: NumpyTree):
model_mlir = jax.jit(jax_model).lower(**sample_input).compiler_ir()
byte_stream = io.BytesIO()
model_mlir.operation.write_bytecode(file=byte_stream)
return byte_stream.getvalue()
def assert_array_list_allclose(x, y, *args, **kwargs):
assert len(x) == len(y)
for a, b in zip(x, y):
np.testing.assert_allclose(
np.asarray(a), np.asarray(b), *args, **kwargs
)
sample_input = get_sample_input()
jax_model = get_jax_model()
mlir = export_jax_to_mlir(jax_model, sample_input)
# Compile and load module.
amdshark_inference = AMDSharkInference(mlir_module=mlir, mlir_dialect="mhlo")
amdshark_inference.compile()
# Run main function.
result = amdshark_inference("main", jax.tree_util.tree_flatten(sample_input)[0])
# Run JAX model.
reference_result = jax.tree_util.tree_flatten(jax_model(**sample_input))[0]
# Verify result.
assert_array_list_allclose(result, reference_result, atol=1e-5)

View File

@@ -0,0 +1,6 @@
flax
jax[cpu]
nodai-AMDSHARK
orbax
transformers
torch

View File

@@ -0,0 +1,23 @@
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_downloader import download_model
mlir_model, func_name, inputs, golden_out = download_model(
"microsoft/MiniLM-L12-H384-uncased",
frontend="torch",
)
amdshark_module = AMDSharkInference(mlir_model, device="cpu", mlir_dialect="linalg")
amdshark_module.compile()
result = amdshark_module.forward(inputs)
print("The obtained result via amdshark is: ", result)
print("The golden result is:", golden_out)
# Let's generate random inputs, currently supported
# for static models.
rand_inputs = amdshark_module.generate_random_inputs()
rand_results = amdshark_module.forward(rand_inputs)
print("Running amdshark_module with random_inputs is: ", rand_results)

View File

@@ -1,6 +1,6 @@
import tensorflow as tf
from transformers import BertModel, BertTokenizer, TFBertModel
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
MAX_SEQUENCE_LENGTH = 512
BATCH_SIZE = 1
@@ -26,7 +26,7 @@ class BertModule(tf.Module):
input_ids=x, attention_mask=y, token_type_ids=z, training=False
)
@tf.function(input_signature=bert_input)
@tf.function(input_signature=bert_input, jit_compile=True)
def forward(self, input_ids, attention_mask, token_type_ids):
return self.m.predict(input_ids, attention_mask, token_type_ids)
@@ -48,7 +48,7 @@ if __name__ == "__main__":
tf.convert_to_tensor(encoded_input[key]), 0
)
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
BertModule(),
(
encoded_input["input_ids"],
@@ -56,11 +56,11 @@ if __name__ == "__main__":
encoded_input["token_type_ids"],
),
)
shark_module.set_frontend("tensorflow")
shark_module.compile()
amdshark_module.set_frontend("tensorflow")
amdshark_module.compile()
print(
shark_module.forward(
amdshark_module.forward(
(
encoded_input["input_ids"],
encoded_input["attention_mask"],

View File

@@ -1,7 +1,7 @@
import torch
import torchvision.models as models
from shark.shark_inference import SharkInference
from shark.shark_importer import SharkImporter
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import AMDSharkImporter
torch.hub.list("zhanghang1989/ResNeSt", force_reload=True)
@@ -21,7 +21,7 @@ class ResnestModule(torch.nn.Module):
input = torch.randn(1, 3, 224, 224)
mlir_importer = SharkImporter(
mlir_importer = AMDSharkImporter(
ResnestModule(),
(input,),
frontend="torch",
@@ -33,7 +33,7 @@ mlir_importer = SharkImporter(
print(golden_out)
shark_module = SharkInference(vision_mlir, func_name, mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((input,))
amdshark_module = AMDSharkInference(vision_mlir, mlir_dialect="linalg")
amdshark_module.compile()
result = amdshark_module.forward((input,))
print("Obtained result", result)

View File

@@ -0,0 +1,74 @@
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.parser import amdshark_args
import torch
import numpy as np
import sys
import torchvision.models as models
import torch_mlir
torch.manual_seed(0)
class VisionModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.model = models.resnet50(pretrained=True)
self.train(False)
def forward(self, input):
return self.model.forward(input)
model = VisionModule()
test_input = torch.randn(1, 3, 224, 224)
actual_out = model(test_input)
test_input_fp16 = test_input.to(device=torch.device("cuda"), dtype=torch.half)
model_fp16 = model.half()
model_fp16.eval()
model_fp16.to("cuda")
actual_out_fp16 = model_fp16(test_input_fp16)
ts_g = torch.jit.trace(model_fp16, [test_input_fp16])
module = torch_mlir.compile(
ts_g,
(test_input_fp16),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=True,
verbose=False,
)
# from contextlib import redirect_stdout
# with open('resnet50_fp16_linalg_ir.mlir', 'w') as f:
# with redirect_stdout(f):
# print(module.operation.get_asm())
mlir_model = module
func_name = "forward"
amdshark_module = AMDSharkInference(mlir_model, device="cuda", mlir_dialect="linalg")
amdshark_module.compile()
def amdshark_result(x):
x_ny = x.cpu().detach().numpy()
inputs = (x_ny,)
result = amdshark_module.forward(inputs)
return torch.from_numpy(result)
observed_out = amdshark_result(test_input_fp16)
print("Golden result:", actual_out_fp16)
print("AMDSHARK result:", observed_out)
actual_out_fp16 = actual_out_fp16.to(device=torch.device("cpu"))
print(
torch.testing.assert_allclose(
actual_out_fp16, observed_out, rtol=1e-2, atol=1e-2
)
)

View File

@@ -4,8 +4,8 @@ import torch
import torchvision.models as models
from torchvision import transforms
import sys
from shark.shark_inference import SharkInference
from shark.shark_downloader import download_torch_model
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_downloader import download_model
################################## Preprocessing inputs and model ############
@@ -66,13 +66,17 @@ labels = load_labels()
## Can pass any img or input to the forward module.
mlir_model, func_name, inputs, golden_out = download_torch_model("resnet50")
mlir_model, func_name, inputs, golden_out = download_model(
"resnet50", frontend="torch"
)
shark_module = SharkInference(mlir_model, func_name, mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((img.detach().numpy(),))
amdshark_module = AMDSharkInference(mlir_model, mlir_dialect="linalg")
amdshark_module.compile()
path = amdshark_module.save_module()
amdshark_module.load_module(path)
result = amdshark_module("forward", (img.detach().numpy(),))
print("The top 3 results obtained via shark_runner is:")
print("The top 3 results obtained via amdshark_runner is:")
print(top3_possibilities(torch.from_numpy(result)))
print()

View File

@@ -0,0 +1,842 @@
####################################################################################
# Please make sure you have transformers 4.21.2 installed before running this demo
#
# -p --model_path: the directory in which you want to store the bloom files.
# -dl --device_list: the list of device indices you want to use. If you only want to use the first device, or you are running on CPU, leave this blank.
#                    Otherwise, please give this argument in this format: "[0, 1, 2]"
# -de --device: the device you want to run bloom on, e.g. cpu, cuda
# -c, --recompile: set to true if you want to recompile to vmfb.
# -d, --download: set to true if you want to redownload the mlir files
# -cm, --create_mlirs: set to true if you want to create the mlir files from scratch. Please make sure you have transformers 4.21.2 before using this option.
# -t --token_count: the number of tokens you want to generate
# -pr --prompt: the prompt you want to feed to the model
# -m --model_name: the name of the model, e.g. bloom-560m
#
# If you don't specify a prompt when you run this example, you will be able to give prompts through the terminal. Run the
# example in this way if you want to run multiple examples without reinitializing the model
#####################################################################################
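# Example invocation (the script filename below is illustrative; the flags are documented above):
#   python sharded_bloom_demo.py -p ./bloom-560m -de cpu -m bloom-560m -t 20 -pr "Hello, world"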
import os
import io
import torch
import torch.nn as nn
from collections import OrderedDict
import torch_mlir
from torch_mlir import TensorPlaceholder
import re
from transformers.models.bloom.configuration_bloom import BloomConfig
import json
import sys
import argparse
import json
import urllib.request
import subprocess
from torch.fx.experimental.proxy_tensor import make_fx
from torch._decomp import get_decompositions
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_downloader import download_public_file
from transformers import (
BloomTokenizerFast,
BloomForSequenceClassification,
BloomForCausalLM,
)
from transformers.models.bloom.modeling_bloom import (
BloomBlock,
build_alibi_tensor,
)
IS_CUDA = False
class ShardedBloom:
def __init__(self, src_folder):
f = open(f"{src_folder}/config.json")
config = json.load(f)
f.close()
self.layers_initialized = False
self.src_folder = src_folder
try:
self.n_embed = config["n_embed"]
except KeyError:
self.n_embed = config["hidden_size"]
self.vocab_size = config["vocab_size"]
self.n_layer = config["n_layer"]
try:
self.n_head = config["num_attention_heads"]
except KeyError:
self.n_head = config["n_head"]
def _init_layer(self, layer_name, device, replace, device_idx):
if replace or not os.path.exists(
f"{self.src_folder}/{layer_name}.vmfb"
):
f_ = open(f"{self.src_folder}/{layer_name}.mlir", encoding="utf-8")
module = f_.read()
f_.close()
module = bytes(module, "utf-8")
amdshark_module = AMDSharkInference(
module,
device=device,
mlir_dialect="tm_tensor",
device_idx=device_idx,
)
amdshark_module.save_module(
module_name=f"{self.src_folder}/{layer_name}",
extra_args=[
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-stream-resource-max-allocation-size=1000000000",
"--iree-codegen-check-ir-before-llvm-conversion=false",
],
)
else:
amdshark_module = AMDSharkInference(
"",
device=device,
mlir_dialect="tm_tensor",
device_idx=device_idx,
)
return amdshark_module
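# init_layers assigns shards to devices round-robin over `device_idx` when a list of
# indices is given: embeddings, layernorm, ln_f, and lm_head first, then one entry per
# transformer block, so a large model can be spread across multiple devices.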
def init_layers(self, device, replace=False, device_idx=[0]):
if device_idx is not None:
n_devices = len(device_idx)
self.word_embeddings_module = self._init_layer(
"word_embeddings",
device,
replace,
device_idx if device_idx is None else device_idx[0 % n_devices],
)
self.word_embeddings_layernorm_module = self._init_layer(
"word_embeddings_layernorm",
device,
replace,
device_idx if device_idx is None else device_idx[1 % n_devices],
)
self.ln_f_module = self._init_layer(
"ln_f",
device,
replace,
device_idx if device_idx is None else device_idx[2 % n_devices],
)
self.lm_head_module = self._init_layer(
"lm_head",
device,
replace,
device_idx if device_idx is None else device_idx[3 % n_devices],
)
self.block_modules = [
self._init_layer(
f"bloom_block_{i}",
device,
replace,
device_idx
if device_idx is None
else device_idx[(i + 4) % n_devices],
)
for i in range(self.n_layer)
]
self.layers_initialized = True
def load_layers(self):
assert self.layers_initialized
self.word_embeddings_module.load_module(
f"{self.src_folder}/word_embeddings.vmfb"
)
self.word_embeddings_layernorm_module.load_module(
f"{self.src_folder}/word_embeddings_layernorm.vmfb"
)
for block_module, i in zip(self.block_modules, range(self.n_layer)):
block_module.load_module(f"{self.src_folder}/bloom_block_{i}.vmfb")
self.ln_f_module.load_module(f"{self.src_folder}/ln_f.vmfb")
self.lm_head_module.load_module(f"{self.src_folder}/lm_head.vmfb")
def forward_pass(self, input_ids, device):
if IS_CUDA:
cudaSetDevice(self.word_embeddings_module.device_idx)
input_embeds = self.word_embeddings_module(
inputs=(input_ids,), function_name="forward"
)
input_embeds = torch.tensor(input_embeds).float()
if IS_CUDA:
cudaSetDevice(self.word_embeddings_layernorm_module.device_idx)
hidden_states = self.word_embeddings_layernorm_module(
inputs=(input_embeds,), function_name="forward"
)
hidden_states = torch.tensor(hidden_states).float()
attention_mask = torch.ones(
[hidden_states.shape[0], len(input_ids[0])]
)
alibi = build_alibi_tensor(
attention_mask,
self.n_head,
hidden_states.dtype,
hidden_states.device,
)
causal_mask = _prepare_attn_mask(
attention_mask, input_ids.size(), input_embeds, 0
)
causal_mask = torch.tensor(causal_mask).float()
presents = ()
all_hidden_states = tuple(hidden_states)
for block_module, i in zip(self.block_modules, range(self.n_layer)):
if IS_CUDA:
cudaSetDevice(block_module.device_idx)
output = block_module(
inputs=(
hidden_states.detach().numpy(),
alibi.detach().numpy(),
causal_mask.detach().numpy(),
),
function_name="forward",
)
hidden_states = torch.tensor(output[0]).float()
all_hidden_states = all_hidden_states + (hidden_states,)
presents = presents + (
tuple(
(
output[1],
output[2],
)
),
)
if IS_CUDA:
cudaSetDevice(self.ln_f_module.device_idx)
hidden_states = self.ln_f_module(
inputs=(hidden_states,), function_name="forward"
)
if IS_CUDA:
cudaSetDevice(self.lm_head_module.device_idx)
logits = self.lm_head_module(
inputs=(hidden_states,), function_name="forward"
)
logits = torch.tensor(logits).float()
return torch.argmax(logits[:, -1, :], dim=-1)
def _make_causal_mask(
input_ids_shape: torch.Size,
dtype: torch.dtype,
past_key_values_length: int = 0,
):
"""
Make causal mask used for bi-directional self-attention.
"""
batch_size, target_length = input_ids_shape
mask = torch.full((target_length, target_length), torch.finfo(dtype).min)
mask_cond = torch.arange(mask.size(-1))
intermediate_mask = mask_cond < (mask_cond + 1).view(mask.size(-1), 1)
mask.masked_fill_(intermediate_mask, 0)
mask = mask.to(dtype)
if past_key_values_length > 0:
mask = torch.cat(
[
torch.zeros(
target_length, past_key_values_length, dtype=dtype
),
mask,
],
dim=-1,
)
expanded_mask = mask[None, None, :, :].expand(
batch_size, 1, target_length, target_length + past_key_values_length
)
return expanded_mask
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: int = None):
"""
Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
"""
batch_size, source_length = mask.size()
tgt_len = tgt_len if tgt_len is not None else source_length
expanded_mask = (
mask[:, None, None, :]
.expand(batch_size, 1, tgt_len, source_length)
.to(dtype)
)
inverted_mask = 1.0 - expanded_mask
return inverted_mask.masked_fill(
inverted_mask.to(torch.bool), torch.finfo(dtype).min
)
def _prepare_attn_mask(
attention_mask, input_shape, inputs_embeds, past_key_values_length
):
# create causal mask
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
combined_attention_mask = None
if input_shape[-1] > 1:
combined_attention_mask = _make_causal_mask(
input_shape,
inputs_embeds.dtype,
past_key_values_length=past_key_values_length,
).to(attention_mask.device)
if attention_mask is not None:
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
expanded_attn_mask = _expand_mask(
attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
)
combined_attention_mask = (
expanded_attn_mask
if combined_attention_mask is None
else expanded_attn_mask + combined_attention_mask
)
return combined_attention_mask
def download_model(destination_folder, model_name):
download_public_file(
f"gs://amdshark_tank/sharded_bloom/{model_name}/", destination_folder
)
def compile_embeddings(embeddings_layer, input_ids, path):
input_ids_placeholder = torch_mlir.TensorPlaceholder.like(
input_ids, dynamic_axes=[1]
)
module = torch_mlir.compile(
embeddings_layer,
(input_ids_placeholder),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
bytecode_stream = io.BytesIO()
module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
f_ = open(path, "w+")
f_.write(str(module))
f_.close()
return
def compile_word_embeddings_layernorm(
embeddings_layer_layernorm, embeds, path
):
embeds_placeholder = torch_mlir.TensorPlaceholder.like(
embeds, dynamic_axes=[1]
)
module = torch_mlir.compile(
embeddings_layer_layernorm,
(embeds_placeholder),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
bytecode_stream = io.BytesIO()
module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
f_ = open(path, "w+")
f_.write(str(module))
f_.close()
return
def strip_overloads(gm):
"""
Modifies the target of graph nodes in :attr:`gm` to strip overloads.
Args:
gm(fx.GraphModule): The input Fx graph module to be modified
"""
for node in gm.graph.nodes:
if isinstance(node.target, torch._ops.OpOverload):
node.target = node.target.overloadpacket
gm.recompile()
def compile_to_mlir(
bblock,
hidden_states,
layer_past=None,
attention_mask=None,
head_mask=None,
use_cache=None,
output_attentions=False,
alibi=None,
block_index=0,
path=".",
):
fx_g = make_fx(
bblock,
decomposition_table=get_decompositions(
[
torch.ops.aten.split.Tensor,
torch.ops.aten.split_with_sizes,
]
),
tracing_mode="real",
_allow_non_fake_inputs=False,
)(hidden_states, alibi, attention_mask)
fx_g.graph.set_codegen(torch.fx.graph.CodeGen())
fx_g.recompile()
strip_overloads(fx_g)
hidden_states_placeholder = TensorPlaceholder.like(
hidden_states, dynamic_axes=[1]
)
attention_mask_placeholder = TensorPlaceholder.like(
attention_mask, dynamic_axes=[2, 3]
)
alibi_placeholder = TensorPlaceholder.like(alibi, dynamic_axes=[2])
ts_g = torch.jit.script(fx_g)
module = torch_mlir.compile(
ts_g,
(
hidden_states_placeholder,
alibi_placeholder,
attention_mask_placeholder,
),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
module_placeholder = module
module_context = module_placeholder.context
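# The emitted MLIR is post-processed as plain text below: lines holding private
# declarations or module attributes (and what is presumably the module's closing
# brace) are dropped, and the hard-coded example sequence length of 17 is rewritten
# into a dynamic dimension (%dim) so the re-parsed module accepts prompts of any
# length.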
def check_valid_line(line, line_n, mlir_file_len):
if "private" in line:
return False
if "attributes" in line:
return False
if mlir_file_len - line_n == 2:
return False
return True
mlir_file_len = len(str(module).split("\n"))
def remove_constant_dim(line):
if "17x" in line:
line = re.sub("17x", "?x", line)
line = re.sub("tensor.empty\(\)", "tensor.empty(%dim)", line)
if "tensor.empty" in line and "?x?" in line:
line = re.sub(
r"tensor.empty\(%dim\)", "tensor.empty(%dim, %dim)", line
)
if "arith.cmpi eq" in line:
line = re.sub("c17", "dim", line)
if " 17," in line:
line = re.sub(" 17,", " %dim,", line)
return line
module = "\n".join(
[
remove_constant_dim(line)
for line, line_n in zip(
str(module).split("\n"), range(mlir_file_len)
)
if check_valid_line(line, line_n, mlir_file_len)
]
)
module = module_placeholder.parse(module, context=module_context)
bytecode_stream = io.BytesIO()
module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
f_ = open(path, "w+")
f_.write(str(module))
f_.close()
return
def compile_ln_f(ln_f, hidden_layers, path):
hidden_layers_placeholder = torch_mlir.TensorPlaceholder.like(
hidden_layers, dynamic_axes=[1]
)
module = torch_mlir.compile(
ln_f,
(hidden_layers_placeholder),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
bytecode_stream = io.BytesIO()
module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
f_ = open(path, "w+")
f_.write(str(module))
f_.close()
return
def compile_lm_head(lm_head, hidden_layers, path):
hidden_layers_placeholder = torch_mlir.TensorPlaceholder.like(
hidden_layers, dynamic_axes=[1]
)
module = torch_mlir.compile(
lm_head,
(hidden_layers_placeholder),
torch_mlir.OutputType.LINALG_ON_TENSORS,
use_tracing=False,
verbose=False,
)
bytecode_stream = io.BytesIO()
module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
f_ = open(path, "w+")
f_.write(str(module))
f_.close()
return
def create_mlirs(destination_folder, model_name):
model_config = "bigscience/" + model_name
sample_input_ids = torch.ones([1, 17], dtype=torch.int64)
urllib.request.urlretrieve(
f"https://huggingface.co/bigscience/{model_name}/resolve/main/config.json",
filename=f"{destination_folder}/config.json",
)
urllib.request.urlretrieve(
f"https://huggingface.co/bigscience/bloom/resolve/main/tokenizer.json",
filename=f"{destination_folder}/tokenizer.json",
)
class HuggingFaceLanguage(torch.nn.Module):
def __init__(self):
super().__init__()
self.model = BloomForCausalLM.from_pretrained(model_config)
def forward(self, tokens):
return self.model.forward(tokens)[0]
class HuggingFaceBlock(torch.nn.Module):
def __init__(self, block):
super().__init__()
self.model = block
def forward(self, tokens, alibi, attention_mask):
output = self.model(
hidden_states=tokens,
alibi=alibi,
attention_mask=attention_mask,
use_cache=True,
output_attentions=False,
)
return (output[0], output[1][0], output[1][1])
model = HuggingFaceLanguage()
compile_embeddings(
model.model.transformer.word_embeddings,
sample_input_ids,
f"{destination_folder}/word_embeddings.mlir",
)
inputs_embeds = model.model.transformer.word_embeddings(sample_input_ids)
compile_word_embeddings_layernorm(
model.model.transformer.word_embeddings_layernorm,
inputs_embeds,
f"{destination_folder}/word_embeddings_layernorm.mlir",
)
hidden_states = model.model.transformer.word_embeddings_layernorm(
inputs_embeds
)
input_shape = sample_input_ids.size()
current_sequence_length = hidden_states.shape[1]
past_key_values_length = 0
past_key_values = tuple([None] * len(model.model.transformer.h))
attention_mask = torch.ones(
(hidden_states.shape[0], current_sequence_length), device="cpu"
)
alibi = build_alibi_tensor(
attention_mask,
model.model.transformer.n_head,
hidden_states.dtype,
"cpu",
)
causal_mask = _prepare_attn_mask(
attention_mask, input_shape, inputs_embeds, past_key_values_length
)
head_mask = model.model.transformer.get_head_mask(
None, model.model.transformer.config.n_layer
)
output_attentions = model.model.transformer.config.output_attentions
all_hidden_states = ()
for i, (block, layer_past) in enumerate(
zip(model.model.transformer.h, past_key_values)
):
all_hidden_states = all_hidden_states + (hidden_states,)
proxy_model = HuggingFaceBlock(block)
compile_to_mlir(
proxy_model,
hidden_states,
layer_past=layer_past,
attention_mask=causal_mask,
head_mask=head_mask[i],
use_cache=True,
output_attentions=output_attentions,
alibi=alibi,
block_index=i,
path=f"{destination_folder}/bloom_block_{i}.mlir",
)
compile_ln_f(
model.model.transformer.ln_f,
hidden_states,
f"{destination_folder}/ln_f.mlir",
)
hidden_states = model.model.transformer.ln_f(hidden_states)
compile_lm_head(
model.model.lm_head,
hidden_states,
f"{destination_folder}/lm_head.mlir",
)
def run_large_model(
token_count,
recompile,
model_path,
prompt,
device_list,
script_path,
device,
):
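# Memory-efficient driver: every stage of the sharded model ("start" for the
# embeddings, one numeric stage per transformer block, "end" for ln_f + lm_head)
# runs in its own subprocess, so only one stage's weights and compiled module are
# resident at a time; stages hand off tensors and the growing prompt through files
# in model_path.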
f = open(f"{model_path}/prompt.txt", "w+")
f.write(prompt)
f.close()
for i in range(token_count):
if i == 0:
will_compile = recompile
else:
will_compile = False
f = open(f"{model_path}/prompt.txt", "r")
prompt = f.read()
f.close()
subprocess.run(
[
"python",
script_path,
model_path,
"start",
str(will_compile),
"cpu",
"None",
prompt,
]
)
for i in range(config["n_layer"]):
if device_list is not None:
device_idx = str(device_list[i % len(device_list)])
else:
device_idx = "None"
subprocess.run(
[
"python",
script_path,
model_path,
str(i),
str(will_compile),
device,
device_idx,
prompt,
]
)
subprocess.run(
[
"python",
script_path,
model_path,
"end",
str(will_compile),
"cpu",
"None",
prompt,
]
)
f = open(f"{model_path}/prompt.txt", "r")
output = f.read()
f.close()
print(output)
if __name__ == "__main__":
parser = argparse.ArgumentParser(prog="Bloom-560m")
parser.add_argument("-p", "--model_path")
parser.add_argument("-dl", "--device_list", default=None)
parser.add_argument("-de", "--device", default="cpu")
parser.add_argument("-c", "--recompile", default=False, type=bool)
parser.add_argument("-d", "--download", default=False, type=bool)
parser.add_argument("-t", "--token_count", default=10, type=int)
parser.add_argument("-m", "--model_name", default="bloom-560m")
parser.add_argument("-cm", "--create_mlirs", default=False, type=bool)
parser.add_argument(
"-lm", "--large_model_memory_efficient", default=False, type=_str2bool
)
parser.add_argument(
"-pr",
"--prompt",
default=None,
)
args = parser.parse_args()
if args.create_mlirs and args.large_model_memory_efficient:
print(
"Warning: If you need to use memory efficient mode, you probably want to use 'download' instead"
)
if not os.path.isdir(args.model_path):
os.mkdir(args.model_path)
if args.device_list is not None:
args.device_list = json.loads(args.device_list)
if args.device == "cuda" and args.device_list is not None:
IS_CUDA = True
from cuda.cudart import cudaSetDevice
if args.download and args.create_mlirs:
print(
"WARNING: It is not advised to turn on both download and create_mlirs"
)
if args.download:
download_model(args.model_path, args.model_name)
if args.create_mlirs:
create_mlirs(args.model_path, args.model_name)
from transformers import AutoTokenizer, AutoModelForCausalLM, BloomConfig
tokenizer = AutoTokenizer.from_pretrained(args.model_path)
if args.prompt is not None:
input_ids = tokenizer.encode(args.prompt, return_tensors="pt")
if args.large_model_memory_efficient:
f = open(f"{args.model_path}/config.json")
config = json.load(f)
f.close()
self_path = os.path.dirname(os.path.abspath(__file__))
script_path = os.path.join(self_path, "sharded_bloom_large_models.py")
if args.prompt is not None:
run_large_model(
args.token_count,
args.recompile,
args.model_path,
args.prompt,
args.device_list,
script_path,
args.device,
)
else:
while True:
prompt = input("Enter Prompt: ")
try:
token_count = int(
input("Enter number of tokens you want to generate: ")
)
except ValueError:
print(
"Invalid integer entered. Using default value of 10"
)
token_count = 10
run_large_model(
token_count,
args.recompile,
args.model_path,
prompt,
args.device_list,
script_path,
args.device,
)
else:
shardedbloom = ShardedBloom(args.model_path)
shardedbloom.init_layers(
device=args.device,
replace=args.recompile,
device_idx=args.device_list,
)
shardedbloom.load_layers()
if args.prompt is not None:
for _ in range(args.token_count):
next_token = shardedbloom.forward_pass(
torch.tensor(input_ids), device=args.device
)
input_ids = torch.cat(
[input_ids, next_token.unsqueeze(-1)], dim=-1
)
print(tokenizer.decode(input_ids.squeeze()))
else:
while True:
prompt = input("Enter Prompt: ")
try:
token_count = int(
input("Enter number of tokens you want to generate: ")
)
except ValueError:
print(
"Invalid integer entered. Using default value of 10"
)
token_count = 10
input_ids = tokenizer.encode(prompt, return_tensors="pt")
for _ in range(token_count):
next_token = shardedbloom.forward_pass(
torch.tensor(input_ids), device=args.device
)
input_ids = torch.cat(
[input_ids, next_token.unsqueeze(-1)], dim=-1
)
print(tokenizer.decode(input_ids.squeeze()))


@@ -0,0 +1,381 @@
import sys
import os
from transformers import AutoTokenizer, AutoModelForCausalLM, BloomConfig
import re
from amdshark.amdshark_inference import AMDSharkInference
import torch
import torch.nn as nn
from collections import OrderedDict
from transformers.models.bloom.modeling_bloom import (
BloomBlock,
build_alibi_tensor,
)
import time
import json
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: int = None):
"""
Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
"""
batch_size, source_length = mask.size()
tgt_len = tgt_len if tgt_len is not None else source_length
expanded_mask = (
mask[:, None, None, :]
.expand(batch_size, 1, tgt_len, source_length)
.to(dtype)
)
inverted_mask = 1.0 - expanded_mask
return inverted_mask.masked_fill(
inverted_mask.to(torch.bool), torch.finfo(dtype).min
)
def _prepare_attn_mask(
attention_mask, input_shape, inputs_embeds, past_key_values_length
):
# create causal mask
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
combined_attention_mask = None
if input_shape[-1] > 1:
combined_attention_mask = _make_causal_mask(
input_shape,
inputs_embeds.dtype,
past_key_values_length=past_key_values_length,
).to(attention_mask.device)
if attention_mask is not None:
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
expanded_attn_mask = _expand_mask(
attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
)
combined_attention_mask = (
expanded_attn_mask
if combined_attention_mask is None
else expanded_attn_mask + combined_attention_mask
)
return combined_attention_mask
def _make_causal_mask(
input_ids_shape: torch.Size,
dtype: torch.dtype,
past_key_values_length: int = 0,
):
"""
Make causal mask used for bi-directional self-attention.
"""
batch_size, target_length = input_ids_shape
mask = torch.full((target_length, target_length), torch.finfo(dtype).min)
mask_cond = torch.arange(mask.size(-1))
intermediate_mask = mask_cond < (mask_cond + 1).view(mask.size(-1), 1)
mask.masked_fill_(intermediate_mask, 0)
mask = mask.to(dtype)
if past_key_values_length > 0:
mask = torch.cat(
[
torch.zeros(
target_length, past_key_values_length, dtype=dtype
),
mask,
],
dim=-1,
)
expanded_mask = mask[None, None, :, :].expand(
batch_size, 1, target_length, target_length + past_key_values_length
)
return expanded_mask
if __name__ == "__main__":
working_dir = sys.argv[1]
layer_name = sys.argv[2]
will_compile = sys.argv[3]
device = sys.argv[4]
device_idx = sys.argv[5]
prompt = sys.argv[6]
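# Positional arguments, as passed by the sharded-Bloom driver script:
#   argv[1] working directory holding config/tokenizer/MLIR artifacts,
#   argv[2] "start", a block index ("0".."n_layer-1"), or "end",
#   argv[3] "True"/"False" to (re)compile the .mlir into a .vmfb,
#   argv[4] device string, argv[5] device index or "None", argv[6] prompt text.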
if device_idx.lower().strip() == "none":
device_idx = None
else:
device_idx = int(device_idx)
if will_compile.lower().strip() == "true":
will_compile = True
else:
will_compile = False
f = open(f"{working_dir}/config.json")
config = json.load(f)
f.close()
layers_initialized = False
try:
n_embed = config["n_embed"]
except KeyError:
n_embed = config["hidden_size"]
vocab_size = config["vocab_size"]
n_layer = config["n_layer"]
try:
n_head = config["num_attention_heads"]
except KeyError:
n_head = config["n_head"]
if not os.path.isdir(working_dir):
os.mkdir(working_dir)
if layer_name == "start":
tokenizer = AutoTokenizer.from_pretrained(working_dir)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
mlir_str = ""
if will_compile:
f = open(f"{working_dir}/word_embeddings.mlir", encoding="utf-8")
mlir_str = f.read()
f.close()
mlir_str = bytes(mlir_str, "utf-8")
amdshark_module = AMDSharkInference(
mlir_str,
device="cpu",
mlir_dialect="tm_tensor",
device_idx=None,
)
if will_compile:
amdshark_module.save_module(
module_name=f"{working_dir}/word_embeddings",
extra_args=[
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-stream-resource-max-allocation-size=1000000000",
"--iree-codegen-check-ir-before-llvm-conversion=false",
],
)
amdshark_module.load_module(f"{working_dir}/word_embeddings.vmfb")
input_embeds = amdshark_module(
inputs=(input_ids,), function_name="forward"
)
input_embeds = torch.tensor(input_embeds).float()
mlir_str = ""
if will_compile:
f = open(
f"{working_dir}/word_embeddings_layernorm.mlir",
encoding="utf-8",
)
mlir_str = f.read()
f.close()
mlir_str = bytes(mlir_str, "utf-8")
amdshark_module = AMDSharkInference(
mlir_str,
device="cpu",
mlir_dialect="tm_tensor",
device_idx=None,
)
if will_compile:
amdshark_module.save_module(
module_name=f"{working_dir}/word_embeddings_layernorm",
extra_args=[
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-stream-resource-max-allocation-size=1000000000",
"--iree-codegen-check-ir-before-llvm-conversion=false",
],
)
amdshark_module.load_module(
f"{working_dir}/word_embeddings_layernorm.vmfb"
)
hidden_states = amdshark_module(
inputs=(input_embeds,), function_name="forward"
)
hidden_states = torch.tensor(hidden_states).float()
torch.save(hidden_states, f"{working_dir}/hidden_states_0.pt")
attention_mask = torch.ones(
[hidden_states.shape[0], len(input_ids[0])]
)
attention_mask = torch.tensor(attention_mask).float()
alibi = build_alibi_tensor(
attention_mask,
n_head,
hidden_states.dtype,
device="cpu",
)
torch.save(alibi, f"{working_dir}/alibi.pt")
causal_mask = _prepare_attn_mask(
attention_mask, input_ids.size(), input_embeds, 0
)
causal_mask = torch.tensor(causal_mask).float()
torch.save(causal_mask, f"{working_dir}/causal_mask.pt")
elif layer_name in [str(x) for x in range(n_layer)]:
hidden_states = torch.load(
f"{working_dir}/hidden_states_{layer_name}.pt"
)
alibi = torch.load(f"{working_dir}/alibi.pt")
causal_mask = torch.load(f"{working_dir}/causal_mask.pt")
mlir_str = ""
if will_compile:
f = open(
f"{working_dir}/bloom_block_{layer_name}.mlir",
encoding="utf-8",
)
mlir_str = f.read()
f.close()
mlir_str = bytes(mlir_str, "utf-8")
amdshark_module = AMDSharkInference(
mlir_str,
device=device,
mlir_dialect="tm_tensor",
device_idx=device_idx,
)
if will_compile:
amdshark_module.save_module(
module_name=f"{working_dir}/bloom_block_{layer_name}",
extra_args=[
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-stream-resource-max-allocation-size=1000000000",
"--iree-codegen-check-ir-before-llvm-conversion=false",
],
)
amdshark_module.load_module(
f"{working_dir}/bloom_block_{layer_name}.vmfb"
)
output = amdshark_module(
inputs=(
hidden_states.detach().numpy(),
alibi.detach().numpy(),
causal_mask.detach().numpy(),
),
function_name="forward",
)
hidden_states = torch.tensor(output[0]).float()
torch.save(
hidden_states,
f"{working_dir}/hidden_states_{int(layer_name) + 1}.pt",
)
elif layer_name == "end":
mlir_str = ""
if will_compile:
f = open(f"{working_dir}/ln_f.mlir", encoding="utf-8")
mlir_str = f.read()
f.close()
mlir_str = bytes(mlir_str, "utf-8")
amdshark_module = AMDSharkInference(
mlir_str,
device="cpu",
mlir_dialect="tm_tensor",
device_idx=None,
)
if will_compile:
amdshark_module.save_module(
module_name=f"{working_dir}/ln_f",
extra_args=[
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-stream-resource-max-allocation-size=1000000000",
"--iree-codegen-check-ir-before-llvm-conversion=false",
],
)
amdshark_module.load_module(f"{working_dir}/ln_f.vmfb")
hidden_states = torch.load(f"{working_dir}/hidden_states_{n_layer}.pt")
hidden_states = amdshark_module(
inputs=(hidden_states,), function_name="forward"
)
mlir_str = ""
if will_compile:
f = open(f"{working_dir}/lm_head.mlir", encoding="utf-8")
mlir_str = f.read()
f.close()
mlir_str = bytes(mlir_str, "utf-8")
if config["n_embed"] == 14336:
def get_state_dict():
d = torch.load(
f"{working_dir}/pytorch_model_00001-of-00072.bin"
)
return OrderedDict(
(k.replace("word_embeddings.", ""), v)
for k, v in d.items()
)
def load_causal_lm_head():
linear = nn.utils.skip_init(
nn.Linear, 14336, 250880, bias=False, dtype=torch.float
)
linear.load_state_dict(get_state_dict(), strict=False)
return linear.float()
lm_head = load_causal_lm_head()
logits = lm_head(torch.tensor(hidden_states).float())
else:
amdshark_module = AMDSharkInference(
mlir_str,
device="cpu",
mlir_dialect="tm_tensor",
device_idx=None,
)
if will_compile:
amdshark_module.save_module(
module_name=f"{working_dir}/lm_head",
extra_args=[
"--iree-vm-bytecode-module-output-format=flatbuffer-binary",
"--iree-stream-resource-max-allocation-size=1000000000",
"--iree-codegen-check-ir-before-llvm-conversion=false",
],
)
amdshark_module.load_module(f"{working_dir}/lm_head.vmfb")
logits = amdshark_module(
inputs=(hidden_states,), function_name="forward"
)
logits = torch.tensor(logits).float()
tokenizer = AutoTokenizer.from_pretrained(working_dir)
next_token = tokenizer.decode(torch.argmax(logits[:, -1, :], dim=-1))
f = open(f"{working_dir}/prompt.txt", "w+")
f.write(prompt + next_token)
f.close()


@@ -0,0 +1,390 @@
# Description: an implementation of a deep learning recommendation model (DLRM)
# The model input consists of dense and sparse features. The former is a vector
# of floating point values. The latter is a list of sparse indices into
# embedding tables, which consist of vectors of floating point values.
# The selected vectors are passed to mlp networks denoted by triangles,
# in some cases the vectors are interacted through operators (Ops).
#
# output:
# vector of values
# model: |
# /\
# /__\
# |
# _____________________> Op <___________________
# / | \
# /\ /\ /\
# /__\ /__\ ... /__\
# | | |
# | Op Op
# | ____/__\_____ ____/__\____
# | |_Emb_|____|__| ... |_Emb_|__|___|
# input:
# [ dense features ] [sparse indices] , ..., [sparse indices]
#
# More precise definition of model layers:
# 1) fully connected layers of an mlp
# z = f(y)
# y = Wx + b
#
# 2) embedding lookup (for a list of sparse indices p=[p1,...,pk])
# z = Op(e1,...,ek)
# obtain vectors e1=E[:,p1], ..., ek=E[:,pk]
#
# 3) Operator Op can be one of the following
# Sum(e1,...,ek) = e1 + ... + ek
# Dot(e1,...,ek) = [e1'e1, ..., e1'ek, ..., ek'e1, ..., ek'ek]
# Cat(e1,...,ek) = [e1', ..., ek']'
# where ' denotes transpose operation
#
# References:
# [1] Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang,
# Narayanan Sundaram, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu,
# Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii,
# Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko,
# Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong,
# Misha Smelyanskiy, "Deep Learning Recommendation Model for Personalization and
# Recommendation Systems", CoRR, arXiv:1906.00091, 2019
import argparse
import sys
import numpy as np
import torch
import torch.nn as nn
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import AMDSharkImporter
torch.manual_seed(0)
np.random.seed(0)
### define dlrm in PyTorch ###
class DLRM_Net(nn.Module):
def create_mlp(self, ln, sigmoid_layer):
# build MLP layer by layer
layers = nn.ModuleList()
for i in range(0, ln.size - 1):
n = ln[i]
m = ln[i + 1]
# construct fully connected operator
LL = nn.Linear(int(n), int(m), bias=True)
# initialize the weights
# with torch.no_grad():
# custom Xavier input, output or two-sided fill
mean = 0.0 # std_dev = np.sqrt(variance)
std_dev = np.sqrt(2 / (m + n)) # np.sqrt(1 / m) # np.sqrt(1 / n)
W = np.random.normal(mean, std_dev, size=(m, n)).astype(np.float32)
std_dev = np.sqrt(1 / m) # np.sqrt(2 / (m + 1))
bt = np.random.normal(mean, std_dev, size=m).astype(np.float32)
LL.weight.data = torch.tensor(W, requires_grad=True)
LL.bias.data = torch.tensor(bt, requires_grad=True)
# approach 2
# LL.weight.data.copy_(torch.tensor(W))
# LL.bias.data.copy_(torch.tensor(bt))
# approach 3
# LL.weight = Parameter(torch.tensor(W),requires_grad=True)
# LL.bias = Parameter(torch.tensor(bt),requires_grad=True)
layers.append(LL)
# construct sigmoid or relu operator
if i == sigmoid_layer:
layers.append(nn.Sigmoid())
else:
layers.append(nn.ReLU())
# approach 1: use ModuleList
# return layers
# approach 2: use Sequential container to wrap all layers
return torch.nn.Sequential(*layers)
def create_emb(self, m, ln, weighted_pooling=None):
emb_l = nn.ModuleList()
v_W_l = []
for i in range(0, ln.size):
n = ln[i]
# construct embedding operator
EE = nn.EmbeddingBag(n, m, mode="sum")
# initialize embeddings
# nn.init.uniform_(EE.weight, a=-np.sqrt(1 / n), b=np.sqrt(1 / n))
W = np.random.uniform(
low=-np.sqrt(1 / n), high=np.sqrt(1 / n), size=(n, m)
).astype(np.float32)
# approach 1
print(W)
EE.weight.data = torch.tensor(W, requires_grad=True)
# approach 2
# EE.weight.data.copy_(torch.tensor(W))
# approach 3
# EE.weight = Parameter(torch.tensor(W),requires_grad=True)
if weighted_pooling is None:
v_W_l.append(None)
else:
v_W_l.append(torch.ones(n, dtype=torch.float32))
emb_l.append(EE)
return emb_l, v_W_l
def __init__(
self,
m_spa=None,
ln_emb=None,
ln_bot=None,
ln_top=None,
arch_interaction_op=None,
arch_interaction_itself=False,
sigmoid_bot=-1,
sigmoid_top=-1,
weighted_pooling=None,
):
super(DLRM_Net, self).__init__()
if (
(m_spa is not None)
and (ln_emb is not None)
and (ln_bot is not None)
and (ln_top is not None)
and (arch_interaction_op is not None)
):
# save arguments
self.output_d = 0
self.arch_interaction_op = arch_interaction_op
self.arch_interaction_itself = arch_interaction_itself
if weighted_pooling is not None and weighted_pooling != "fixed":
self.weighted_pooling = "learned"
else:
self.weighted_pooling = weighted_pooling
# create operators
self.emb_l, w_list = self.create_emb(
m_spa, ln_emb, weighted_pooling
)
if self.weighted_pooling == "learned":
self.v_W_l = nn.ParameterList()
for w in w_list:
self.v_W_l.append(nn.Parameter(w))
else:
self.v_W_l = w_list
self.bot_l = self.create_mlp(ln_bot, sigmoid_bot)
self.top_l = self.create_mlp(ln_top, sigmoid_top)
def apply_mlp(self, x, layers):
return layers(x)
def apply_emb(self, lS_o, lS_i, emb_l, v_W_l):
# WARNING: notice that we are processing the batch at once. We implicitly
# assume that the data is laid out such that:
# 1. each embedding is indexed with a group of sparse indices,
# corresponding to a single lookup
# 2. for each embedding the lookups are further organized into a batch
# 3. for a list of embedding tables there is a list of batched lookups
# TORCH-MLIR
# We are passing all the embeddings as arguments for easy parsing.
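# Illustrative example (not executed): with an EmbeddingBag E in "sum" mode,
# indices lS_i[k] = [1, 2, 3] and offsets lS_o[k] = [0, 2] describe two lookups
# into this table: the first pools rows 1 and 2 (E[1] + E[2]) and the second
# pools row 3, giving one row vector per sample in the batch.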
ly = []
for k, sparse_index_group_batch in enumerate(lS_i):
sparse_offset_group_batch = lS_o[k]
# embedding lookup
# We are using EmbeddingBag, which implicitly uses sum operator.
# The embeddings are represented as tall matrices, with sum
# happening vertically across 0 axis, resulting in a row vector
# E = emb_l[k]
if v_W_l[k] is not None:
per_sample_weights = v_W_l[k].gather(
0, sparse_index_group_batch
)
else:
per_sample_weights = None
E = emb_l[k]
V = E(
sparse_index_group_batch,
sparse_offset_group_batch,
per_sample_weights=per_sample_weights,
)
ly.append(V)
return ly
def interact_features(self, x, ly):
if self.arch_interaction_op == "dot":
# concatenate dense and sparse features
(batch_size, d) = x.shape
T = torch.cat([x] + ly, dim=1).view((batch_size, -1, d))
# perform a dot product
Z = torch.bmm(T, torch.transpose(T, 1, 2))
# append dense feature with the interactions (into a row vector)
# approach 1: all
# Zflat = Z.view((batch_size, -1))
# approach 2: unique
_, ni, nj = Z.shape
# approach 1: tril_indices
# offset = 0 if self.arch_interaction_itself else -1
# li, lj = torch.tril_indices(ni, nj, offset=offset)
# approach 2: custom
offset = 1 if self.arch_interaction_itself else 0
li = torch.tensor(
[i for i in range(ni) for j in range(i + offset)]
)
lj = torch.tensor(
[j for i in range(nj) for j in range(i + offset)]
)
Zflat = Z[:, li, lj]
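# Zflat keeps one copy of each pairwise interaction per sample:
# num_fea * (num_fea - 1) / 2 entries, plus num_fea self-interactions when
# arch_interaction_itself is set (offset = 1 above).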
# concatenate dense features and interactions
R = torch.cat([x] + [Zflat], dim=1)
elif self.arch_interaction_op == "cat":
# concatenation features (into a row vector)
R = torch.cat([x] + ly, dim=1)
else:
sys.exit(
"ERROR: --arch-interaction-op="
+ self.arch_interaction_op
+ " is not supported"
)
return R
def forward(self, dense_x, lS_o, *lS_i):
return self.sequential_forward(dense_x, lS_o, lS_i)
def sequential_forward(self, dense_x, lS_o, lS_i):
# process dense features (using bottom mlp), resulting in a row vector
x = self.apply_mlp(dense_x, self.bot_l)
# debug prints
# print("intermediate")
# print(x.detach().cpu().numpy())
# process sparse features(using embeddings), resulting in a list of row vectors
ly = self.apply_emb(lS_o, lS_i, self.emb_l, self.v_W_l)
# for y in ly:
# print(y.detach().cpu().numpy())
# interact features (dense and sparse)
z = self.interact_features(x, ly)
# print(z.detach().cpu().numpy())
# obtain probability of a click (using top mlp)
p = self.apply_mlp(z, self.top_l)
# # clamp output if needed
# if 0.0 < self.loss_threshold and self.loss_threshold < 1.0:
# z = torch.clamp(p, min=self.loss_threshold, max=(1.0 - self.loss_threshold))
# else:
# z = p
return p
def dash_separated_ints(value):
vals = value.split("-")
for val in vals:
try:
int(val)
except ValueError:
raise argparse.ArgumentTypeError(
"%s is not a valid dash separated list of ints" % value
)
return value
# model related parameters
parser = argparse.ArgumentParser(
description="Train Deep Learning Recommendation Model (DLRM)"
)
parser.add_argument("--arch-sparse-feature-size", type=int, default=2)
parser.add_argument(
"--arch-embedding-size", type=dash_separated_ints, default="4-3-2"
)
# j will be replaced with the table number
parser.add_argument(
"--arch-mlp-bot", type=dash_separated_ints, default="4-3-2"
)
parser.add_argument(
"--arch-mlp-top", type=dash_separated_ints, default="8-2-1"
)
parser.add_argument(
"--arch-interaction-op", type=str, choices=["dot", "cat"], default="dot"
)
parser.add_argument(
"--arch-interaction-itself", action="store_true", default=False
)
parser.add_argument("--weighted-pooling", type=str, default=None)
args = parser.parse_args()
ln_bot = np.fromstring(args.arch_mlp_bot, dtype=int, sep="-")
ln_top = np.fromstring(args.arch_mlp_top, dtype=int, sep="-")
m_den = ln_bot[0]
ln_emb = np.fromstring(args.arch_embedding_size, dtype=int, sep="-")
m_spa = args.arch_sparse_feature_size
ln_emb = np.asarray(ln_emb)
num_fea = ln_emb.size + 1 # num sparse + num dense features
# Initialize the model.
dlrm_model = DLRM_Net(
m_spa=m_spa,
ln_emb=ln_emb,
ln_bot=ln_bot,
ln_top=ln_top,
arch_interaction_op=args.arch_interaction_op,
)
# Inputs to the model.
dense_inp = torch.tensor([[0.6965, 0.2861, 0.2269, 0.5513]])
vs0 = torch.tensor([[0], [0], [0]], dtype=torch.int64)
vsi = torch.tensor([1, 2, 3]), torch.tensor([1]), torch.tensor([1])
input_dlrm = (dense_inp, vs0, *vsi)
golden_output = dlrm_model(dense_inp, vs0, *vsi)
mlir_importer = AMDSharkImporter(
dlrm_model,
input_dlrm,
frontend="torch",
)
(dlrm_mlir, func_name), inputs, golden_out = mlir_importer.import_debug(
tracing_required=True
)
amdshark_module = AMDSharkInference(
dlrm_mlir, device="vulkan", mlir_dialect="linalg"
)
amdshark_module.compile()
result = amdshark_module.forward(input_dlrm)
np.testing.assert_allclose(
golden_output.detach().numpy(), result, rtol=1e-02, atol=1e-03
)
# Verified via torch-mlir.
# import torch_mlir
# from torch_mlir_e2e_test.linalg_on_tensors_backends import refbackend
# module = torch_mlir.compile(
# dlrm_model, inputs, use_tracing=True, output_type="linalg-on-tensors"
# )
# backend = refbackend.RefBackendLinalgOnTensorsBackend()
# compiled = backend.compile(module)
# jit_module = backend.load(compiled)
# dense_numpy = dense_inp.numpy()
# vs0_numpy = vs0.numpy()
# vsi_numpy = [inp.numpy() for inp in vsi]
# numpy_inp = (dense_numpy, vs0_numpy, *vsi_numpy)
# print(jit_module.forward(*numpy_inp))


@@ -0,0 +1,311 @@
import torch
from torch import nn
from torchrec.datasets.utils import Batch
from torchrec.modules.crossnet import LowRankCrossNet
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor, KeyedTensor
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor
from typing import Dict, List, Optional, Tuple
from torchrec.models.dlrm import (
choose,
DenseArch,
DLRM,
InteractionArch,
SparseArch,
OverArch,
)
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import AMDSharkImporter
import numpy as np
torch.manual_seed(0)
np.random.seed(0)
def calculate_offsets(tensor_list, prev_values, prev_offsets):
offset_init = 0
offset_list = []
if prev_offsets is not None:
offset_init = prev_values.shape[-1]
for tensor in tensor_list:
offset_list.append(offset_init)
offset_init += tensor.shape[0]
concatenated_tensor_list = torch.cat(tensor_list)
if prev_values is not None:
concatenated_tensor_list = torch.cat(
[prev_values, concatenated_tensor_list]
)
concatenated_offsets = torch.tensor(offset_list)
if prev_offsets is not None:
concatenated_offsets = torch.cat([prev_offsets, concatenated_offsets])
return concatenated_tensor_list, concatenated_offsets
# Have to make combined_keys as dict as to which embedding bags they
# point to. {f1: 0, f3: 0, f2: 1}
# The result will be a triple containing values, indices and pointer tensor.
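# Illustrative example (assumed shapes): with combined_keys = {"f1": 0, "f3": 0, "f2": 1},
# to_list returns [values_f1, offsets_f1, tensor(0), values_f3, offsets_f3, tensor(0),
# values_f2, offsets_f2, tensor(1)], i.e. one (values, offsets, embedding-bag index)
# triple per feature, flattened into a single argument list.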
def to_list(key_jagged, combined_keys):
key_jagged_dict = key_jagged.to_dict()
combined_list = []
for key in combined_keys:
prev_values, prev_offsets = calculate_offsets(
key_jagged_dict[key].to_dense(), None, None
)
print(prev_values)
print(prev_offsets)
combined_list.append(prev_values)
combined_list.append(prev_offsets)
combined_list.append(torch.tensor(combined_keys[key]))
return combined_list
class SparseArchAMDShark(nn.Module):
def create_emb(self, embedding_dim, num_embeddings_list):
embedding_list = nn.ModuleList()
for i in range(0, num_embeddings_list.size):
num_embeddings = num_embeddings_list[i]
EE = nn.EmbeddingBag(num_embeddings, embedding_dim, mode="sum")
W = np.random.uniform(
low=-np.sqrt(1 / num_embeddings),
high=np.sqrt(1 / num_embeddings),
size=(num_embeddings, embedding_dim),
).astype(np.float32)
EE.weight.data = torch.tensor(W, requires_grad=True)
embedding_list.append(EE)
return embedding_list
def __init__(
self,
embedding_dim,
total_features,
num_embeddings_list,
):
super(SparseArchAMDShark, self).__init__()
self.embedding_dim = embedding_dim
self.num_features = total_features
self.embedding_list = self.create_emb(
embedding_dim, num_embeddings_list
)
def forward(self, *batched_inputs):
concatenated_list = []
input_enum, embedding_enum = 0, 0
for k in range(len(batched_inputs) // 3):
values = batched_inputs[input_enum]
input_enum += 1
offsets = batched_inputs[input_enum]
input_enum += 1
embedding_pointer = int(batched_inputs[input_enum])
input_enum += 1
E = self.embedding_list[embedding_pointer]
V = E(values, offsets)
concatenated_list.append(V)
return torch.cat(concatenated_list, dim=1).reshape(
-1, self.num_features, self.embedding_dim
)
def test_sparse_arch() -> None:
D = 3
eb1_config = EmbeddingBagConfig(
name="t1",
embedding_dim=D,
num_embeddings=10,
feature_names=["f1", "f3"],
)
eb2_config = EmbeddingBagConfig(
name="t2",
embedding_dim=D,
num_embeddings=10,
feature_names=["f2"],
)
ebc = EmbeddingBagCollection(tables=[eb1_config, eb2_config])
w1 = ebc.embedding_bags["t1"].weight
w2 = ebc.embedding_bags["t2"].weight
sparse_arch = SparseArch(ebc)
keys = ["f1", "f2", "f3", "f4", "f5"]
offsets = torch.tensor([0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 19])
features = KeyedJaggedTensor.from_offsets_sync(
keys=keys,
values=torch.tensor(
[1, 2, 4, 5, 4, 3, 2, 9, 1, 2, 4, 5, 4, 3, 2, 9, 1, 2, 3]
),
offsets=offsets,
)
sparse_archi = SparseArchAMDShark(D, 3, np.array([10, 10]))
sparse_archi.embedding_list[0].weight = w1
sparse_archi.embedding_list[1].weight = w2
inputs = to_list(features, {"f1": 0, "f3": 0, "f2": 1})
test_results = sparse_archi(*inputs)
sparse_features = sparse_arch(features)
assert torch.allclose(
sparse_features,
test_results,
rtol=1e-4,
atol=1e-4,
)
test_sparse_arch()
class DLRMAMDShark(nn.Module):
def __init__(
self,
embedding_dim,
total_features,
num_embeddings_list,
dense_in_features: int,
dense_arch_layer_sizes: List[int],
over_arch_layer_sizes: List[int],
) -> None:
super().__init__()
self.sparse_arch: SparseArchAMDShark = SparseArchAMDShark(
embedding_dim, total_features, num_embeddings_list
)
num_sparse_features: int = total_features
self.dense_arch = DenseArch(
in_features=dense_in_features,
layer_sizes=dense_arch_layer_sizes,
)
self.inter_arch = InteractionArch(
num_sparse_features=num_sparse_features,
)
over_in_features: int = (
embedding_dim
+ choose(num_sparse_features, 2)
+ num_sparse_features
)
self.over_arch = OverArch(
in_features=over_in_features,
layer_sizes=over_arch_layer_sizes,
)
def forward(
self, dense_features: torch.Tensor, *sparse_features
) -> torch.Tensor:
embedded_dense = self.dense_arch(dense_features)
embedded_sparse = self.sparse_arch(*sparse_features)
concatenated_dense = self.inter_arch(
dense_features=embedded_dense, sparse_features=embedded_sparse
)
logits = self.over_arch(concatenated_dense)
return logits
def test_dlrm() -> None:
B = 2
D = 8
dense_in_features = 100
eb1_config = EmbeddingBagConfig(
name="t1",
embedding_dim=D,
num_embeddings=100,
feature_names=["f1", "f3"],
)
eb2_config = EmbeddingBagConfig(
name="t2",
embedding_dim=D,
num_embeddings=100,
feature_names=["f2"],
)
ebc = EmbeddingBagCollection(tables=[eb1_config, eb2_config])
sparse_features = KeyedJaggedTensor.from_offsets_sync(
keys=["f1", "f3", "f2"],
values=torch.tensor([1, 2, 4, 5, 4, 3, 2, 9, 1, 2, 3]),
offsets=torch.tensor([0, 2, 4, 6, 8, 10, 11]),
)
ebc = EmbeddingBagCollection(tables=[eb1_config, eb2_config])
sparse_nn = DLRM(
embedding_bag_collection=ebc,
dense_in_features=dense_in_features,
dense_arch_layer_sizes=[20, D],
over_arch_layer_sizes=[5, 1],
)
sparse_nn_nod = DLRMAMDShark(
embedding_dim=8,
total_features=3,
num_embeddings_list=np.array([100, 100]),
dense_in_features=dense_in_features,
dense_arch_layer_sizes=[20, D],
over_arch_layer_sizes=[5, 1],
)
dense_features = torch.rand((B, dense_in_features))
x = to_list(sparse_features, {"f1": 0, "f3": 0, "f2": 1})
w1 = ebc.embedding_bags["t1"].weight
w2 = ebc.embedding_bags["t2"].weight
sparse_nn_nod.sparse_arch.embedding_list[0].weight = w1
sparse_nn_nod.sparse_arch.embedding_list[1].weight = w2
sparse_nn_nod.dense_arch.load_state_dict(sparse_nn.dense_arch.state_dict())
sparse_nn_nod.inter_arch.load_state_dict(sparse_nn.inter_arch.state_dict())
sparse_nn_nod.over_arch.load_state_dict(sparse_nn.over_arch.state_dict())
logits = sparse_nn(
dense_features=dense_features,
sparse_features=sparse_features,
)
logits_nod = sparse_nn_nod(dense_features, *x)
# print(logits)
# print(logits_nod)
# Import the module and print.
mlir_importer = AMDSharkImporter(
sparse_nn_nod,
(dense_features, *x),
frontend="torch",
)
(dlrm_mlir, func_name), inputs, golden_out = mlir_importer.import_debug(
tracing_required=True
)
amdshark_module = AMDSharkInference(
dlrm_mlir, device="cpu", mlir_dialect="linalg"
)
amdshark_module.compile()
result = amdshark_module.forward(inputs)
np.testing.assert_allclose(golden_out, result, rtol=1e-02, atol=1e-03)
assert torch.allclose(
logits,
logits_nod,
rtol=1e-4,
atol=1e-4,
)
test_dlrm()


@@ -3,7 +3,7 @@ import requests
from transformers import T5Tokenizer, TFT5Model
import tensorflow as tf
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
# Create a set of inputs
t5_inputs = [
@@ -18,7 +18,7 @@ class T5Module(tf.Module):
self.m = TFT5Model.from_pretrained("t5-small")
self.m.predict = lambda x, y: self.m(input_ids=x, decoder_input_ids=y)
@tf.function(input_signature=t5_inputs)
@tf.function(input_signature=t5_inputs, jit_compile=True)
def forward(self, input_ids, decoder_input_ids):
return self.m.predict(input_ids, decoder_input_ids)
@@ -29,7 +29,7 @@ if __name__ == "__main__":
text = "I love the distilled version of models."
inputs = tokenizer(text, return_tensors="tf").input_ids
shark_module = SharkInference(T5Module(), (inputs, inputs))
shark_module.set_frontend("tensorflow")
shark_module.compile()
print(shark_module.forward((inputs, inputs)))
amdshark_module = AMDSharkInference(T5Module(), (inputs, inputs))
amdshark_module.set_frontend("tensorflow")
amdshark_module.compile()
print(amdshark_module.forward((inputs, inputs)))


@@ -1,6 +1,6 @@
import torch
import torchvision.models as models
from shark.shark_inference import SharkInference
from amdshark.amdshark_inference import AMDSharkInference
class VisionModule(torch.nn.Module):
@@ -35,9 +35,9 @@ vision_models_list = [
]
for i, vision_model in enumerate(vision_models_list):
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
VisionModule(vision_model),
(input,),
)
shark_module.compile()
shark_module.forward((input,))
amdshark_module.compile()
amdshark_module.forward((input,))


@@ -1,7 +1,7 @@
import torch
import numpy as np
from shark.shark_inference import SharkInference
from shark.shark_importer import SharkImporter
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_importer import AMDSharkImporter
class UnetModule(torch.nn.Module):
@@ -23,7 +23,7 @@ class UnetModule(torch.nn.Module):
input = torch.randn(1, 3, 224, 224)
mlir_importer = SharkImporter(
mlir_importer = AMDSharkImporter(
UnetModule(),
(input,),
frontend="torch",
@@ -33,7 +33,7 @@ mlir_importer = SharkImporter(
tracing_required=False
)
shark_module = SharkInference(vision_mlir, func_name, mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((input,))
amdshark_module = AMDSharkInference(vision_mlir, mlir_dialect="linalg")
amdshark_module.compile()
result = amdshark_module.forward((input,))
np.testing.assert_allclose(golden_out, result, rtol=1e-02, atol=1e-03)


@@ -0,0 +1,21 @@
import requests
from PIL import Image
from io import BytesIO
from pipeline_amdshark_stable_diffusion_upscale import (
AMDSharkStableDiffusionUpscalePipeline,
)
import torch
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = AMDSharkStableDiffusionUpscalePipeline(model_id)
# let's download an image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))
prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")


@@ -0,0 +1,98 @@
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel
from utils import compile_through_fx
import torch
model_id = "stabilityai/stable-diffusion-x4-upscaler"
model_input = {
"clip": (torch.randint(1, 2, (1, 77)),),
"vae": (torch.randn(1, 4, 128, 128),),
"unet": (
torch.randn(2, 7, 128, 128), # latents
torch.tensor([1]).to(torch.float32), # timestep
torch.randn(2, 77, 1024), # embedding
torch.randn(2).to(torch.int64), # noise_level
),
}
def get_clip_mlir(model_name="clip_text", extra_args=[]):
text_encoder = CLIPTextModel.from_pretrained(
model_id,
subfolder="text_encoder",
)
class CLIPText(torch.nn.Module):
def __init__(self):
super().__init__()
self.text_encoder = text_encoder
def forward(self, input):
return self.text_encoder(input)[0]
clip_model = CLIPText()
amdshark_clip = compile_through_fx(
clip_model,
model_input["clip"],
model_name=model_name,
extra_args=extra_args,
)
return amdshark_clip
def get_vae_mlir(model_name="vae", extra_args=[]):
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = AutoencoderKL.from_pretrained(
model_id,
subfolder="vae",
)
def forward(self, input):
x = self.vae.decode(input, return_dict=False)[0]
return x
vae = VaeModel()
amdshark_vae = compile_through_fx(
vae,
model_input["vae"],
model_name=model_name,
extra_args=extra_args,
)
return amdshark_vae
def get_unet_mlir(model_name="unet", extra_args=[]):
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = UNet2DConditionModel.from_pretrained(
model_id,
subfolder="unet",
)
self.in_channels = self.unet.in_channels
self.train(False)
def forward(self, latent, timestep, text_embedding, noise_level):
unet_out = self.unet.forward(
latent,
timestep,
text_embedding,
noise_level,
return_dict=False,
)[0]
return unet_out
unet = UnetModel()
f16_input_mask = (True, True, True, False)
amdshark_unet = compile_through_fx(
unet,
model_input["unet"],
model_name=model_name,
is_f16=True,
f16_input_mask=f16_input_mask,
extra_args=extra_args,
)
return amdshark_unet


@@ -0,0 +1,48 @@
import sys
from model_wrappers import (
get_vae_mlir,
get_unet_mlir,
get_clip_mlir,
)
from upscaler_args import args
from utils import get_amdshark_model
BATCH_SIZE = len(args.prompts)
if BATCH_SIZE != 1:
sys.exit("Only batch size 1 is supported.")
unet_flag = [
"--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-preprocessing-convert-conv2d-to-img2col,iree-preprocessing-pad-linalg-ops{pad-size=32}))"
]
vae_flag = [
"--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-flow-convert-conv-nchw-to-nhwc,iree-preprocessing-pad-linalg-ops{pad-size=16}))"
]
clip_flag = [
"--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-preprocessing-pad-linalg-ops{pad-size=16}))"
]
bucket = "gs://amdshark_tank/stable_diffusion/"
def get_unet():
model_name = "upscaler_unet"
if args.import_mlir:
return get_unet_mlir(model_name, unet_flag)
return get_amdshark_model(bucket, model_name, unet_flag)
def get_vae():
model_name = "upscaler_vae"
if args.import_mlir:
return get_vae_mlir(model_name, vae_flag)
return get_amdshark_model(bucket, model_name, vae_flag)
def get_clip():
model_name = "upscaler_clip"
if args.import_mlir:
return get_clip_mlir(model_name, clip_flag)
return get_amdshark_model(bucket, model_name, clip_flag)


@@ -0,0 +1,489 @@
import inspect
from typing import Callable, List, Optional, Union
import numpy as np
import torch
import PIL
from PIL import Image
from diffusers.utils import is_accelerate_available
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel
from diffusers import (
DDIMScheduler,
DDPMScheduler,
LMSDiscreteScheduler,
PNDMScheduler,
)
from diffusers import logging
from diffusers.pipeline_utils import ImagePipelineOutput
from opt_params import get_unet, get_vae, get_clip
from tqdm.auto import tqdm
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
def preprocess(image):
if isinstance(image, torch.Tensor):
return image
elif isinstance(image, PIL.Image.Image):
image = [image]
if isinstance(image[0], PIL.Image.Image):
w, h = image[0].size
w, h = map(
lambda x: x - x % 64, (w, h)
) # resize to integer multiple of 64
image = [np.array(i.resize((w, h)))[None, :] for i in image]
image = np.concatenate(image, axis=0)
image = np.array(image).astype(np.float32) / 255.0
image = image.transpose(0, 3, 1, 2)
image = 2.0 * image - 1.0
image = torch.from_numpy(image)
elif isinstance(image[0], torch.Tensor):
image = torch.cat(image, dim=0)
return image
def amdshark_run_wrapper(model, *args):
np_inputs = tuple([x.detach().numpy() for x in args])
outputs = model("forward", np_inputs)
return torch.from_numpy(outputs)
class AMDSharkStableDiffusionUpscalePipeline:
def __init__(
self,
model_id,
):
self.tokenizer = CLIPTokenizer.from_pretrained(
model_id, subfolder="tokenizer"
)
self.low_res_scheduler = DDPMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
)
self.scheduler = DDIMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
)
self.vae = get_vae()
self.unet = get_unet()
self.text_encoder = get_clip()
self.max_noise_level = 350
self._execution_device = "cpu"
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline._encode_prompt
def _encode_prompt(
self,
prompt,
device,
num_images_per_prompt,
do_classifier_free_guidance,
negative_prompt,
):
r"""
Encodes the prompt into text encoder hidden states.
Args:
prompt (`str` or `list(int)`):
prompt to be encoded
device: (`torch.device`):
torch device
num_images_per_prompt (`int`):
number of images that should be generated per prompt
do_classifier_free_guidance (`bool`):
whether to use classifier free guidance or not
negative_prompt (`str` or `List[str]`):
The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored
if `guidance_scale` is less than `1`).
"""
batch_size = len(prompt) if isinstance(prompt, list) else 1
text_inputs = self.tokenizer(
prompt,
padding="max_length",
max_length=self.tokenizer.model_max_length,
truncation=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
untruncated_ids = self.tokenizer(
prompt, padding="longest", return_tensors="pt"
).input_ids
if untruncated_ids.shape[-1] >= text_input_ids.shape[
-1
] and not torch.equal(text_input_ids, untruncated_ids):
removed_text = self.tokenizer.batch_decode(
untruncated_ids[:, self.tokenizer.model_max_length - 1 : -1]
)
logger.warning(
"The following part of your input was truncated because CLIP can only handle sequences up to"
f" {self.tokenizer.model_max_length} tokens: {removed_text}"
)
# if (
# hasattr(self.text_encoder.config, "use_attention_mask")
# and self.text_encoder.config.use_attention_mask
# ):
# attention_mask = text_inputs.attention_mask.to(device)
# else:
# attention_mask = None
text_embeddings = amdshark_run_wrapper(
self.text_encoder, text_input_ids.to(device)
)
# duplicate text embeddings for each generation per prompt, using mps friendly method
bs_embed, seq_len, _ = text_embeddings.shape
text_embeddings = text_embeddings.repeat(1, num_images_per_prompt, 1)
text_embeddings = text_embeddings.view(
bs_embed * num_images_per_prompt, seq_len, -1
)
# get unconditional embeddings for classifier free guidance
if do_classifier_free_guidance:
uncond_tokens: List[str]
if negative_prompt is None:
uncond_tokens = [""] * batch_size
elif type(prompt) is not type(negative_prompt):
raise TypeError(
f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !="
f" {type(prompt)}."
)
elif isinstance(negative_prompt, str):
uncond_tokens = [negative_prompt]
elif batch_size != len(negative_prompt):
raise ValueError(
f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:"
f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches"
" the batch size of `prompt`."
)
else:
uncond_tokens = negative_prompt
max_length = text_input_ids.shape[-1]
uncond_input = self.tokenizer(
uncond_tokens,
padding="max_length",
max_length=max_length,
truncation=True,
return_tensors="pt",
)
# if (
# hasattr(self.text_encoder.config, "use_attention_mask")
# and self.text_encoder.config.use_attention_mask
# ):
# attention_mask = uncond_input.attention_mask.to(device)
# else:
# attention_mask = None
uncond_embeddings = amdshark_run_wrapper(
self.text_encoder,
uncond_input.input_ids.to(device),
)
uncond_embeddings = uncond_embeddings
# duplicate unconditional embeddings for each generation per prompt, using mps friendly method
seq_len = uncond_embeddings.shape[1]
uncond_embeddings = uncond_embeddings.repeat(
1, num_images_per_prompt, 1
)
uncond_embeddings = uncond_embeddings.view(
batch_size * num_images_per_prompt, seq_len, -1
)
# For classifier free guidance, we need to do two forward passes.
# Here we concatenate the unconditional and text embeddings into a single batch
# to avoid doing two forward passes
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
return text_embeddings
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
def prepare_extra_step_kwargs(self, generator, eta):
# prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
# eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
# eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
# and should be between [0, 1]
accepts_eta = "eta" in set(
inspect.signature(self.scheduler.step).parameters.keys()
)
extra_step_kwargs = {}
if accepts_eta:
extra_step_kwargs["eta"] = eta
# check if the scheduler accepts generator
accepts_generator = "generator" in set(
inspect.signature(self.scheduler.step).parameters.keys()
)
if accepts_generator:
extra_step_kwargs["generator"] = generator
return extra_step_kwargs
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.decode_latents with 0.18215->0.08333
def decode_latents(self, latents):
latents = 1 / 0.08333 * latents
image = amdshark_run_wrapper(self.vae, latents)
image = (image / 2 + 0.5).clamp(0, 1)
# we always cast to float32 as this does not cause significant overhead and is compatible with bfloa16
image = image.cpu().permute(0, 2, 3, 1).float().numpy()
return image
def check_inputs(self, prompt, image, noise_level, callback_steps):
if not isinstance(prompt, str) and not isinstance(prompt, list):
raise ValueError(
f"`prompt` has to be of type `str` or `list` but is {type(prompt)}"
)
if (
not isinstance(image, torch.Tensor)
and not isinstance(image, PIL.Image.Image)
and not isinstance(image, list)
):
raise ValueError(
f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or `list` but is {type(image)}"
)
# verify batch size of prompt and image are same if image is a list or tensor
if isinstance(image, list) or isinstance(image, torch.Tensor):
if isinstance(prompt, str):
batch_size = 1
else:
batch_size = len(prompt)
if isinstance(image, list):
image_batch_size = len(image)
else:
image_batch_size = image.shape[0]
if batch_size != image_batch_size:
raise ValueError(
f"`prompt` has batch size {batch_size} and `image` has batch size {image_batch_size}."
" Please make sure that passed `prompt` matches the batch size of `image`."
)
@staticmethod
def numpy_to_pil(images):
"""
Convert a numpy image or a batch of images to a PIL image.
"""
if images.ndim == 3:
images = images[None, ...]
images = (images * 255).round().astype("uint8")
if images.shape[-1] == 1:
# special case for grayscale (single channel) images
pil_images = [
Image.fromarray(image.squeeze(), mode="L") for image in images
]
else:
pil_images = [Image.fromarray(image) for image in images]
return pil_images
def prepare_latents(
self,
batch_size,
num_channels_latents,
height,
width,
dtype,
device,
generator,
latents=None,
):
shape = (batch_size, num_channels_latents, height, width)
if latents is None:
if device == "mps":
# randn does not work reproducibly on mps
latents = torch.randn(
shape, generator=generator, device="cpu", dtype=dtype
).to(device)
else:
latents = torch.randn(
shape, generator=generator, device=device, dtype=dtype
)
else:
if latents.shape != shape:
raise ValueError(
f"Unexpected latents shape, got {latents.shape}, expected {shape}"
)
latents = latents.to(device)
# scale the initial noise by the standard deviation required by the scheduler
latents = latents * self.scheduler.init_noise_sigma
return latents
@torch.no_grad()
def __call__(
self,
prompt: Union[str, List[str]],
image: Union[
torch.FloatTensor, PIL.Image.Image, List[PIL.Image.Image]
],
num_inference_steps: int = 75,
guidance_scale: float = 9.0,
noise_level: int = 20,
negative_prompt: Optional[Union[str, List[str]]] = None,
num_images_per_prompt: Optional[int] = 1,
eta: float = 0.0,
generator: Optional[
Union[torch.Generator, List[torch.Generator]]
] = None,
latents: Optional[torch.FloatTensor] = None,
output_type: Optional[str] = "pil",
return_dict: bool = True,
callback: Optional[
Callable[[int, int, torch.FloatTensor], None]
] = None,
callback_steps: Optional[int] = 1,
):
# 1. Check inputs
self.check_inputs(prompt, image, noise_level, callback_steps)
# 2. Define call parameters
batch_size = 1 if isinstance(prompt, str) else len(prompt)
device = self._execution_device
# here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
# 3. Encode input prompt
text_embeddings = self._encode_prompt(
prompt,
device,
num_images_per_prompt,
do_classifier_free_guidance,
negative_prompt,
)
# 4. Preprocess image
image = preprocess(image)
image = image.to(dtype=text_embeddings.dtype, device=device)
# 5. set timesteps
self.scheduler.set_timesteps(num_inference_steps, device=device)
timesteps = self.scheduler.timesteps
# 5. Add noise to image
noise_level = torch.tensor(
[noise_level], dtype=torch.long, device=device
)
if device == "mps":
# randn does not work reproducibly on mps
noise = torch.randn(
image.shape,
generator=generator,
device="cpu",
dtype=text_embeddings.dtype,
).to(device)
else:
noise = torch.randn(
image.shape,
generator=generator,
device=device,
dtype=text_embeddings.dtype,
)
image = self.low_res_scheduler.add_noise(image, noise, noise_level)
batch_multiplier = 2 if do_classifier_free_guidance else 1
image = torch.cat([image] * batch_multiplier * num_images_per_prompt)
noise_level = torch.cat([noise_level] * image.shape[0])
# 6. Prepare latent variables
height, width = image.shape[2:]
# num_channels_latents = self.vae.config.latent_channels
num_channels_latents = 4
latents = self.prepare_latents(
batch_size * num_images_per_prompt,
num_channels_latents,
height,
width,
text_embeddings.dtype,
device,
generator,
latents,
)
# 7. Check that sizes of image and latents match
num_channels_image = image.shape[1]
# if (
# num_channels_latents + num_channels_image
# != self.unet.config.in_channels
# ):
# raise ValueError(
# f"Incorrect configuration settings! The config of `pipeline.unet`: {self.unet.config} expects"
# f" {self.unet.config.in_channels} but received `num_channels_latents`: {num_channels_latents} +"
# f" `num_channels_image`: {num_channels_image} "
# f" = {num_channels_latents+num_channels_image}. Please verify the config of"
# " `pipeline.unet` or your `image` input."
# )
# 8. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
# 9. Denoising loop
num_warmup_steps = (
len(timesteps) - num_inference_steps * self.scheduler.order
)
for i, t in tqdm(enumerate(timesteps)):
# expand the latents if we are doing classifier free guidance
latent_model_input = (
torch.cat([latents] * 2)
if do_classifier_free_guidance
else latents
)
# concat latents, mask, masked_image_latents in the channel dimension
latent_model_input = self.scheduler.scale_model_input(
latent_model_input, t
)
latent_model_input = torch.cat([latent_model_input, image], dim=1)
timestep = torch.tensor([t]).to(torch.float32)
# predict the noise residual
noise_pred = amdshark_run_wrapper(
self.unet,
latent_model_input.half(),
timestep,
text_embeddings.half(),
noise_level,
)
# perform guidance
if do_classifier_free_guidance:
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (
noise_pred_text - noise_pred_uncond
)
# compute the previous noisy sample x_t -> x_t-1
latents = self.scheduler.step(
noise_pred, t, latents, **extra_step_kwargs
).prev_sample
# # call the callback, if provided
# if i == len(timesteps) - 1 or (
# (i + 1) > num_warmup_steps
# and (i + 1) % self.scheduler.order == 0
# ):
# progress_bar.update()
# if callback is not None and i % callback_steps == 0:
# callback(i, t, latents)
# 11. Post-processing
# make sure the VAE is in float32 mode, as it overflows in float16
# self.vae.to(dtype=torch.float32)
image = self.decode_latents(latents.float())
# 12. Convert to PIL
if output_type == "pil":
image = self.numpy_to_pil(image)
if not return_dict:
return (image,)
return ImagePipelineOutput(images=image)

View File

@@ -0,0 +1,98 @@
import argparse
p = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
##############################################################################
### Stable Diffusion Params
##############################################################################
p.add_argument(
"--prompts",
nargs="+",
default=["cyberpunk forest by Salvador Dali"],
help="text of which images to be generated.",
)
p.add_argument(
"--negative-prompts",
nargs="+",
default=[""],
help="text you don't want to see in the generated image.",
)
p.add_argument(
"--steps",
type=int,
default=50,
help="the no. of steps to do the sampling.",
)
p.add_argument(
"--seed",
type=int,
default=42,
help="the seed to use.",
)
p.add_argument(
"--guidance_scale",
type=float,
default=7.5,
help="the value to be used for guidance scaling.",
)
##############################################################################
### Model Config and Usage Params
##############################################################################
p.add_argument(
"--device", type=str, default="vulkan", help="device to run the model."
)
p.add_argument(
"--precision", type=str, default="fp16", help="precision to run the model."
)
p.add_argument(
"--import_mlir",
default=False,
action=argparse.BooleanOptionalAction,
help="imports the model from torch module to amdshark_module otherwise downloads the model from amdshark_tank.",
)
p.add_argument(
"--load_vmfb",
default=True,
action=argparse.BooleanOptionalAction,
help="attempts to load the model from a precompiled flatbuffer and compiles + saves it if not found.",
)
p.add_argument(
"--save_vmfb",
default=False,
action=argparse.BooleanOptionalAction,
help="saves the compiled flatbuffer to the local directory",
)
##############################################################################
### IREE - Vulkan supported flags
##############################################################################
p.add_argument(
"--iree-vulkan-target-triple",
type=str,
default="",
help="Specify target triple for vulkan",
)
p.add_argument(
"--vulkan_debug_utils",
default=False,
action=argparse.BooleanOptionalAction,
help="Profiles vulkan device and collects the .rdc info",
)
args = p.parse_args()

View File

@@ -0,0 +1,230 @@
import os
import torch
from amdshark.amdshark_inference import AMDSharkInference
from upscaler_args import args
from amdshark.amdshark_importer import import_with_fx
from amdshark.iree_utils.vulkan_utils import (
set_iree_vulkan_runtime_flags,
get_vulkan_target_triple,
get_iree_vulkan_runtime_flags,
)
def _compile_module(amdshark_module, model_name, extra_args=[]):
if args.load_vmfb or args.save_vmfb:
device = (
args.device
if "://" not in args.device
else "-".join(args.device.split("://"))
)
extended_name = "{}_{}".format(model_name, device)
vmfb_path = os.path.join(os.getcwd(), extended_name + ".vmfb")
if args.load_vmfb and os.path.isfile(vmfb_path) and not args.save_vmfb:
print(f"loading existing vmfb from: {vmfb_path}")
amdshark_module.load_module(vmfb_path, extra_args=extra_args)
else:
if args.save_vmfb:
print("Saving to {}".format(vmfb_path))
else:
print(
"No vmfb found. Compiling and saving to {}".format(
vmfb_path
)
)
path = amdshark_module.save_module(
os.getcwd(), extended_name, extra_args
)
amdshark_module.load_module(path, extra_args=extra_args)
else:
amdshark_module.compile(extra_args)
return amdshark_module
# Downloads the model from amdshark_tank and returns the amdshark_module.
def get_amdshark_model(tank_url, model_name, extra_args=[]):
from amdshark.amdshark_downloader import download_model
from amdshark.parser import amdshark_args
# Set local amdshark_tank cache directory.
# amdshark_args.local_tank_cache = args.local_tank_cache
mlir_model, func_name, inputs, golden_out = download_model(
model_name,
tank_url=tank_url,
frontend="torch",
)
amdshark_module = AMDSharkInference(
mlir_model, device=args.device, mlir_dialect="linalg"
)
return _compile_module(amdshark_module, model_name, extra_args)
# Converts the torch-module into a amdshark_module.
def compile_through_fx(
model, inputs, model_name, is_f16=False, f16_input_mask=None, extra_args=[]
):
mlir_module, func_name = import_with_fx(
model, inputs, is_f16, f16_input_mask
)
amdshark_module = AMDSharkInference(
mlir_module,
device=args.device,
mlir_dialect="linalg",
)
return _compile_module(amdshark_module, model_name, extra_args)
def set_iree_runtime_flags():
vulkan_runtime_flags = get_iree_vulkan_runtime_flags()
if args.enable_rgp:
vulkan_runtime_flags += [
f"--enable_rgp=true",
f"--vulkan_debug_utils=true",
]
set_iree_vulkan_runtime_flags(flags=vulkan_runtime_flags)
def get_all_devices(driver_name):
"""
Inputs: driver_name
Returns a list of all the available devices for a given driver sorted by
the iree path names of the device as in --list_devices option in iree.
"""
from iree.runtime import get_driver
driver = get_driver(driver_name)
device_list_src = driver.query_available_devices()
device_list_src.sort(key=lambda d: d["path"])
return device_list_src
def get_device_mapping(driver, key_combination=3):
"""This method ensures consistent device ordering when choosing
specific devices for execution
Args:
driver (str): execution driver (vulkan, cuda, rocm, etc)
key_combination (int, optional): choice for mapping value for device name.
1 : path
2 : name
3 : (name, path)
Defaults to 3.
Returns:
dict: maps each device name a user can input to the desired combination of name/path.
"""
from amdshark.iree_utils._common import iree_device_map
driver = iree_device_map(driver)
device_list = get_all_devices(driver)
device_map = dict()
def get_output_value(dev_dict):
if key_combination == 1:
return f"{driver}://{dev_dict['path']}"
if key_combination == 2:
return dev_dict["name"]
if key_combination == 3:
return (dev_dict["name"], f"{driver}://{dev_dict['path']}")
# mapping driver name to default device (driver://0)
device_map[f"{driver}"] = get_output_value(device_list[0])
for i, device in enumerate(device_list):
# mapping with index
device_map[f"{driver}://{i}"] = get_output_value(device)
# mapping with full path
device_map[f"{driver}://{device['path']}"] = get_output_value(device)
return device_map
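# Illustrative example (device names and paths below are hypothetical): with a
# single Vulkan device, get_device_mapping("vulkan") returns something like
#   {
#       "vulkan": ("GPU-0", "vulkan://<path>"),
#       "vulkan://0": ("GPU-0", "vulkan://<path>"),
#       "vulkan://<path>": ("GPU-0", "vulkan://<path>"),
#   }
# so a device can be selected by driver name, index, or full path.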
def map_device_to_name_path(device, key_combination=3):
"""Gives the appropriate device data (supported name/path) for user selected execution device
Args:
device (str): user-selected execution device
key_combination (int, optional): choice for mapping value for device name.
1 : path
2 : name
3 : (name, path)
Defaults to 3.
Raises:
ValueError: if the given device is not a valid device.
Returns:
str / tuple: the mapping str, or tuple of mapping strs, for the device depending on the key_combination value
"""
driver = device.split("://")[0]
device_map = get_device_mapping(driver, key_combination)
try:
device_mapping = device_map[device]
except KeyError:
raise ValueError(f"Device '{device}' is not a valid device.")
return device_mapping
def set_init_device_flags():
if "vulkan" in args.device:
# set runtime flags for vulkan.
set_iree_runtime_flags()
# set triple flag to avoid multiple calls to get_vulkan_triple_flag
device_name, args.device = map_device_to_name_path(args.device)
if not args.iree_vulkan_target_triple:
triple = get_vulkan_target_triple(device_name)
if triple is not None:
args.iree_vulkan_target_triple = triple
print(
f"Found device {device_name}. Using target triple {args.iree_vulkan_target_triple}."
)
elif "cuda" in args.device:
args.device = "cuda"
elif "cpu" in args.device:
args.device = "cpu"
# set max_length based on availability.
if args.variant in ["anythingv3", "analogdiffusion", "dreamlike"]:
args.max_length = 77
elif args.variant == "openjourney":
args.max_length = 64
# use tuned models only in the case of stablediffusion/fp16 and rdna3 cards.
if (
args.variant in ["openjourney", "dreamlike"]
or args.precision != "fp16"
or "vulkan" not in args.device
or "rdna3" not in args.iree_vulkan_target_triple
):
args.use_tuned = False
print("Tuned models are currently not supported for this setting.")
elif args.use_base_vae and args.variant != "stablediffusion":
args.use_tuned = False
print("Tuned models are currently not supported for this setting.")
if args.use_tuned:
print("Using tuned models for stablediffusion/fp16 and rdna3 card.")
# Utility to get list of devices available.
def get_available_devices():
def get_devices_by_name(driver_name):
from amdshark.iree_utils._common import iree_device_map
device_list = []
try:
driver_name = iree_device_map(driver_name)
device_list_dict = get_all_devices(driver_name)
print(f"{driver_name} devices are available.")
except Exception:
print(f"{driver_name} devices are not available.")
else:
for i, device in enumerate(device_list_dict):
device_list.append(f"{driver_name}://{i} => {device['name']}")
return device_list
set_iree_runtime_flags()
available_devices = []
vulkan_devices = get_devices_by_name("vulkan")
available_devices.extend(vulkan_devices)
cuda_devices = get_devices_by_name("cuda")
available_devices.extend(cuda_devices)
available_devices.append("cpu")
return available_devices

View File

@@ -0,0 +1,15 @@
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.amdshark_downloader import download_model
mlir_model, func_name, inputs, golden_out = download_model(
"v_diffusion", frontend="torch"
)
amdshark_module = AMDSharkInference(
mlir_model, device="vulkan", mlir_dialect="linalg"
)
amdshark_module.compile()
result = amdshark_module.forward(inputs)
print("The obtained result via amdshark is: ", result)
print("The golden result is:", golden_out)

View File

@@ -1,7 +1,7 @@
import torch
from torch.nn.utils import _stateless
from torch.nn.utils import stateless
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from shark.shark_runner import SharkTrainer
from amdshark.amdshark_trainer import AMDSharkTrainer
class MiniLMSequenceClassification(torch.nn.Module):
@@ -33,7 +33,7 @@ inp = (torch.randint(2, (1, 128)),)
def forward(params, buffers, args):
params_and_buffers = {**params, **buffers}
_stateless.functional_call(
stateless.functional_call(
mod, params_and_buffers, args, {}
).sum().backward()
optim = torch.optim.SGD(get_sorted_params(params), lr=0.01)
@@ -42,6 +42,7 @@ def forward(params, buffers, args):
return params, buffers
shark_module = SharkTrainer(mod, inp, custom_inference_fn=forward)
print(shark_module.forward())
amdshark_module = AMDSharkTrainer(mod, inp)
amdshark_module.compile(forward)
amdshark_module.train(num_iters=2)
print("training done")

View File

@@ -3,8 +3,8 @@ import os
import time
import tensorflow as tf
from shark.shark_trainer import SharkTrainer
from shark.parser import parser
from amdshark.amdshark_trainer import AMDSharkTrainer
from amdshark.parser import parser
from urllib import request
parser.add_argument(
@@ -28,7 +28,7 @@ if __name__ == "__main__":
np.random.randint(5, size=(BATCH_SIZE, SEQUENCE_LENGTH)),
np.random.randint(5, size=(BATCH_SIZE, SEQUENCE_LENGTH)),
]
file_link = "https://storage.googleapis.com/shark_tank/users/stanley/bert_tf_training.mlir"
file_link = "https://storage.googleapis.com/amdshark_tank/users/stanley/bert_tf_training.mlir"
response = request.urlretrieve(file_link, load_args.download_mlir_path)
sample_input_tensors = [
tf.convert_to_tensor(val, dtype=tf.int32)
@@ -41,7 +41,7 @@ if __name__ == "__main__":
)
with open(load_args.download_mlir_path, "rb") as input_file:
bert_mlir = input_file.read()
shark_module = SharkTrainer(
amdshark_module = AMDSharkTrainer(
bert_mlir,
(
sample_input_tensors,
@@ -50,10 +50,10 @@ if __name__ == "__main__":
),
),
)
shark_module.set_frontend("mhlo")
shark_module.compile()
amdshark_module.set_frontend("mhlo")
amdshark_module.compile()
start = time.time()
print(shark_module.train(num_iter))
print(amdshark_module.train(num_iter))
end = time.time()
total_time = end - start
print("time: " + str(total_time))

View File

@@ -8,7 +8,7 @@ from official.nlp.modeling import layers
from official.nlp.modeling import networks
from official.nlp.modeling.models import bert_classifier
from shark.shark_trainer import SharkTrainer
from amdshark.amdshark_trainer import AMDSharkTrainer
tf.random.set_seed(0)
@@ -52,7 +52,8 @@ class BertModule(tf.Module):
input_signature=[
bert_input, # inputs
tf.TensorSpec(shape=[BATCH_SIZE], dtype=tf.int32), # labels
]
],
jit_compile=True,
)
def forward(self, inputs, labels):
with tf.GradientTape() as tape:
@@ -78,7 +79,7 @@ if __name__ == "__main__":
for val in predict_sample_input
]
num_iter = 10
shark_module = SharkTrainer(
amdshark_module = AMDSharkTrainer(
BertModule(),
(
sample_input_tensors,
@@ -87,10 +88,10 @@ if __name__ == "__main__":
),
),
)
shark_module.set_frontend("tensorflow")
shark_module.compile()
amdshark_module.set_frontend("tensorflow")
amdshark_module.compile()
start = time.time()
print(shark_module.train(num_iter))
print(amdshark_module.train(num_iter))
end = time.time()
total_time = end - start
print("time: " + str(total_time))

View File

@@ -1,6 +1,6 @@
import torch
from torch.nn.utils import _stateless
from shark.shark_trainer import SharkTrainer
from amdshark.amdshark_trainer import AMDSharkTrainer
class Foo(torch.nn.Module):
@@ -37,8 +37,8 @@ def forward(params, buffers, args):
# fx_graph = forward(dict(mod.named_parameters()), dict(mod.named_buffers()), inp)
shark_module = SharkTrainer(mod, inp)
amdshark_module = AMDSharkTrainer(mod, inp)
# Pass the training function in case of torch
shark_module.compile(training_fn=forward)
amdshark_module.compile(training_fn=forward)
shark_module.train(num_iters=10)
amdshark_module.train(num_iters=10)

View File

@@ -0,0 +1,41 @@
# Stable Diffusion Img2Img model
## Installation
<details>
<summary>Installation (Linux)</summary>
### Activate amdshark.venv Virtual Environment
```shell
source amdshark.venv/bin/activate
# Some older pip installs may not be able to handle the recent PyTorch deps
python -m pip install --upgrade pip
```
### Install dependencies
Run the `setup.sh` script:
```shell
./setup.sh
```
### Run the Stable Diffusion Img2Img model
To run the model with the default set of images and params, run:
```shell
python stable_diffusion_img2img.py
```
To run the model with your own set of images and parameters, you need to specify the following:
1.) The input images directory, via `--input_dir`, containing 3-5 images.
2.) What to teach the model, via `--what_to_teach`; allowed values are `object` and `style`.
3.) The placeholder token, via `--placeholder_token`, that represents your new concept. It should be passed with the opening and closing angle brackets; for example, the token `cat-toy` should be passed as `<cat-toy>`.
4.) The initializer token, via `--initializer_token`, a single word that summarises your new concept.
For the result, pass the text prompt with `--prompt`. The prompt string should contain a `*s`, which will be replaced by the placeholder token during inference.
By default the result images go into the `sd_result` directory. To specify your own output directory, use `--output_dir`.
The default number of training steps (`--training_steps`) is `3000`, which can take several hours to complete; you can pass a smaller value. Specify the number of images to be sampled for the result with `--num_inference_samples`.
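An end-to-end invocation with custom arguments might look like the following sketch (the image directory, token names, and prompt are illustrative placeholders, not values from this repository):
```shell
python stable_diffusion_img2img.py \
  --input_dir=my_images/ \
  --what_to_teach=object \
  --placeholder_token="<my-toy>" \
  --initializer_token="toy" \
  --prompt="a photo of a *s on a beach" \
  --training_steps=500 \
  --num_inference_samples=2 \
  --output_dir=sd_result
```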

View File

@@ -0,0 +1,25 @@
#!/bin/bash
TD="$(cd $(dirname $0) && pwd)"
if [ -z "$PYTHON" ]; then
PYTHON="$(which python3)"
fi
function die() {
echo "Error executing command: $*"
exit 1
}
PYTHON_VERSION_X_Y=`${PYTHON} -c 'import sys; version=sys.version_info[:2]; print("{0}.{1}".format(*version))'`
echo "Python: $PYTHON"
echo "Python version: $PYTHON_VERSION_X_Y"
mkdir -p input_images
wget https://huggingface.co/datasets/valhalla/images/resolve/main/2.jpeg -P input_images/
wget https://huggingface.co/datasets/valhalla/images/resolve/main/3.jpeg -P input_images/
wget https://huggingface.co/datasets/valhalla/images/resolve/main/5.jpeg -P input_images/
wget https://huggingface.co/datasets/valhalla/images/resolve/main/6.jpeg -P input_images/
pip install "diffusers[training]==0.4.1" transformers ftfy opencv-python

View File

@@ -0,0 +1,600 @@
# Textual-inversion fine-tuning for Stable Diffusion using diffusers
# This script shows how to "teach" Stable Diffusion a new concept via
# textual-inversion using 🤗 Hugging Face [🧨 Diffusers library](https://github.com/huggingface/diffusers).
# By using just 3-5 images you can teach new concepts to Stable Diffusion
# and personalize the model on your own images.
import argparse
import itertools
import math
import os
import random
import cv2
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch.utils.data import Dataset
import PIL
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import (
AutoencoderKL,
DDPMScheduler,
PNDMScheduler,
StableDiffusionPipeline,
UNet2DConditionModel,
)
from diffusers.hub_utils import init_git_repo, push_to_hub
from diffusers.optimization import get_scheduler
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
YOUR_TOKEN = "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"
p = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
p.add_argument(
"--input_dir",
type=str,
default="input_images/",
help="the directory contains the images used for fine tuning",
)
p.add_argument(
"--output_dir",
type=str,
default="sd_result",
help="the directory contains the images used for fine tuning",
)
p.add_argument(
"--training_steps",
type=int,
default=3000,
help="the maximum number of training steps",
)
p.add_argument("--seed", type=int, default=42, help="the random seed")
p.add_argument(
"--what_to_teach",
type=str,
choices=["object", "style"],
default="object",
help="what is it that you are teaching?",
)
p.add_argument(
"--placeholder_token",
type=str,
default="<cat-toy>",
help="It is the token you are going to use to represent your new concept",
)
p.add_argument(
"--initializer_token",
type=str,
default="toy",
help="It is a word that can summarise what is your new concept",
)
p.add_argument(
"--inference_steps",
type=int,
default=50,
help="the number of steps for inference",
)
p.add_argument(
"--num_inference_samples",
type=int,
default=4,
help="the number of samples for inference",
)
p.add_argument(
"--prompt",
type=str,
default="a grafitti in a wall with a *s on it",
help="the text prompt to use",
)
args = p.parse_args()
if "*s" not in args.prompt:
raise ValueError(
f'The prompt should have a "*s" which will be replaced by a placeholder token.'
)
prompt1, prompt2 = args.prompt.split("*s")
args.prompt = prompt1 + args.placeholder_token + prompt2
pretrained_model_name_or_path = "CompVis/stable-diffusion-v1-4"
# Load input images.
images = []
for filename in os.listdir(args.input_dir):
img = cv2.imread(os.path.join(args.input_dir, filename))
if img is not None:
images.append(img)
# Setup the prompt templates for training
imagenet_templates_small = [
"a photo of a {}",
"a rendering of a {}",
"a cropped photo of the {}",
"the photo of a {}",
"a photo of a clean {}",
"a photo of a dirty {}",
"a dark photo of the {}",
"a photo of my {}",
"a photo of the cool {}",
"a close-up photo of a {}",
"a bright photo of the {}",
"a cropped photo of a {}",
"a photo of the {}",
"a good photo of the {}",
"a photo of one {}",
"a close-up photo of the {}",
"a rendition of the {}",
"a photo of the clean {}",
"a rendition of a {}",
"a photo of a nice {}",
"a good photo of a {}",
"a photo of the nice {}",
"a photo of the small {}",
"a photo of the weird {}",
"a photo of the large {}",
"a photo of a cool {}",
"a photo of a small {}",
]
imagenet_style_templates_small = [
"a painting in the style of {}",
"a rendering in the style of {}",
"a cropped painting in the style of {}",
"the painting in the style of {}",
"a clean painting in the style of {}",
"a dirty painting in the style of {}",
"a dark painting in the style of {}",
"a picture in the style of {}",
"a cool painting in the style of {}",
"a close-up painting in the style of {}",
"a bright painting in the style of {}",
"a cropped painting in the style of {}",
"a good painting in the style of {}",
"a close-up painting in the style of {}",
"a rendition in the style of {}",
"a nice painting in the style of {}",
"a small painting in the style of {}",
"a weird painting in the style of {}",
"a large painting in the style of {}",
]
# Setup the dataset
class TextualInversionDataset(Dataset):
def __init__(
self,
data_root,
tokenizer,
learnable_property="object", # [object, style]
size=512,
repeats=100,
interpolation="bicubic",
flip_p=0.5,
set="train",
placeholder_token="*",
center_crop=False,
):
self.data_root = data_root
self.tokenizer = tokenizer
self.learnable_property = learnable_property
self.size = size
self.placeholder_token = placeholder_token
self.center_crop = center_crop
self.flip_p = flip_p
self.image_paths = [
os.path.join(self.data_root, file_path)
for file_path in os.listdir(self.data_root)
]
self.num_images = len(self.image_paths)
self._length = self.num_images
if set == "train":
self._length = self.num_images * repeats
self.interpolation = {
"linear": PIL.Image.LINEAR,
"bilinear": PIL.Image.BILINEAR,
"bicubic": PIL.Image.BICUBIC,
"lanczos": PIL.Image.LANCZOS,
}[interpolation]
self.templates = (
imagenet_style_templates_small
if learnable_property == "style"
else imagenet_templates_small
)
self.flip_transform = transforms.RandomHorizontalFlip(p=self.flip_p)
def __len__(self):
return self._length
def __getitem__(self, i):
example = {}
image = Image.open(self.image_paths[i % self.num_images])
if not image.mode == "RGB":
image = image.convert("RGB")
placeholder_string = self.placeholder_token
text = random.choice(self.templates).format(placeholder_string)
example["input_ids"] = self.tokenizer(
text,
padding="max_length",
truncation=True,
max_length=self.tokenizer.model_max_length,
return_tensors="pt",
).input_ids[0]
# default to score-sde preprocessing
img = np.array(image).astype(np.uint8)
if self.center_crop:
crop = min(img.shape[0], img.shape[1])
(
h,
w,
) = (
img.shape[0],
img.shape[1],
)
img = img[
(h - crop) // 2 : (h + crop) // 2,
(w - crop) // 2 : (w + crop) // 2,
]
image = Image.fromarray(img)
image = image.resize(
(self.size, self.size), resample=self.interpolation
)
image = self.flip_transform(image)
image = np.array(image).astype(np.uint8)
image = (image / 127.5 - 1.0).astype(np.float32)
example["pixel_values"] = torch.from_numpy(image).permute(2, 0, 1)
return example
# Setting up the model
# Load the tokenizer and add the placeholder token as an additional special token.
# If you see an error here, please read and, if you agree, accept the LICENSE at
# https://huggingface.co/CompVis/stable-diffusion-v1-4
tokenizer = CLIPTokenizer.from_pretrained(
pretrained_model_name_or_path,
subfolder="tokenizer",
use_auth_token=YOUR_TOKEN,
)
# Add the placeholder token in tokenizer
num_added_tokens = tokenizer.add_tokens(args.placeholder_token)
if num_added_tokens == 0:
raise ValueError(
f"The tokenizer already contains the token {args.placeholder_token}. Please pass a different"
" `placeholder_token` that is not already in the tokenizer."
)
# Get token ids for our placeholder and initializer token.
# This code block will complain if initializer string is not a single token
# Convert the initializer_token, placeholder_token to ids
token_ids = tokenizer.encode(args.initializer_token, add_special_tokens=False)
# Check if initializer_token is a single token or a sequence of tokens
if len(token_ids) > 1:
raise ValueError("The initializer token must be a single token.")
initializer_token_id = token_ids[0]
placeholder_token_id = tokenizer.convert_tokens_to_ids(args.placeholder_token)
# Load the Stable Diffusion model
# Load models and create wrapper for stable diffusion
text_encoder = CLIPTextModel.from_pretrained(
pretrained_model_name_or_path,
subfolder="text_encoder",
use_auth_token=YOUR_TOKEN,
)
vae = AutoencoderKL.from_pretrained(
pretrained_model_name_or_path,
subfolder="vae",
use_auth_token=YOUR_TOKEN,
)
unet = UNet2DConditionModel.from_pretrained(
pretrained_model_name_or_path,
subfolder="unet",
use_auth_token=YOUR_TOKEN,
)
# We have added the `placeholder_token` to the `tokenizer`, so we resize the token embeddings here;
# this will add a new embedding vector to the token embeddings for our `placeholder_token`.
text_encoder.resize_token_embeddings(len(tokenizer))
# Initialise the newly added placeholder token with the embeddings of the initializer token
token_embeds = text_encoder.get_input_embeddings().weight.data
token_embeds[placeholder_token_id] = token_embeds[initializer_token_id]
# In Textual Inversion we only train the newly added embedding vector,
# so let's freeze the rest of the model parameters here.
def freeze_params(params):
for param in params:
param.requires_grad = False
# Freeze vae and unet
freeze_params(vae.parameters())
freeze_params(unet.parameters())
# Freeze all parameters except for the token embeddings in text encoder
params_to_freeze = itertools.chain(
text_encoder.text_model.encoder.parameters(),
text_encoder.text_model.final_layer_norm.parameters(),
text_encoder.text_model.embeddings.position_embedding.parameters(),
)
freeze_params(params_to_freeze)
# Creating our training data
train_dataset = TextualInversionDataset(
data_root=args.input_dir,
tokenizer=tokenizer,
size=512,
placeholder_token=args.placeholder_token,
repeats=100,
learnable_property=args.what_to_teach, # Option selected above between object and style
center_crop=False,
set="train",
)
def create_dataloader(train_batch_size=1):
return torch.utils.data.DataLoader(
train_dataset, batch_size=train_batch_size, shuffle=True
)
# Create noise_scheduler for training.
noise_scheduler = DDPMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
num_train_timesteps=1000,
tensor_format="pt",
)
# Define hyperparameters for our training
hyperparameters = {
"learning_rate": 5e-04,
"scale_lr": True,
"max_train_steps": args.training_steps,
"train_batch_size": 1,
"gradient_accumulation_steps": 4,
"seed": args.seed,
"output_dir": "sd-concept-output",
}
def training_function(text_encoder, vae, unet):
logger = get_logger(__name__)
train_batch_size = hyperparameters["train_batch_size"]
gradient_accumulation_steps = hyperparameters[
"gradient_accumulation_steps"
]
learning_rate = hyperparameters["learning_rate"]
max_train_steps = hyperparameters["max_train_steps"]
output_dir = hyperparameters["output_dir"]
accelerator = Accelerator(
gradient_accumulation_steps=gradient_accumulation_steps,
)
train_dataloader = create_dataloader(train_batch_size)
if hyperparameters["scale_lr"]:
learning_rate = (
learning_rate
* gradient_accumulation_steps
* train_batch_size
* accelerator.num_processes
)
# Initialize the optimizer
optimizer = torch.optim.AdamW(
text_encoder.get_input_embeddings().parameters(), # only optimize the embeddings
lr=learning_rate,
)
text_encoder, optimizer, train_dataloader = accelerator.prepare(
text_encoder, optimizer, train_dataloader
)
# Move vae and unet to device
vae.to(accelerator.device)
unet.to(accelerator.device)
# Keep vae and unet in eval mode as we don't train them
vae.eval()
unet.eval()
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(
len(train_dataloader) / gradient_accumulation_steps
)
num_train_epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
# Train!
total_batch_size = (
train_batch_size
* accelerator.num_processes
* gradient_accumulation_steps
)
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Instantaneous batch size per device = {train_batch_size}")
logger.info(
f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}"
)
logger.info(
f" Gradient Accumulation steps = {gradient_accumulation_steps}"
)
logger.info(f" Total optimization steps = {max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(
range(max_train_steps), disable=not accelerator.is_local_main_process
)
progress_bar.set_description("Steps")
global_step = 0
for epoch in range(num_train_epochs):
text_encoder.train()
for step, batch in enumerate(train_dataloader):
with accelerator.accumulate(text_encoder):
# Convert images to latent space
latents = (
vae.encode(batch["pixel_values"])
.latent_dist.sample()
.detach()
)
latents = latents * 0.18215
# Sample noise that we'll add to the latents
noise = torch.randn(latents.shape).to(latents.device)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(
0,
noise_scheduler.num_train_timesteps,
(bsz,),
device=latents.device,
).long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(
latents, noise, timesteps
)
# Get the text embedding for conditioning
encoder_hidden_states = text_encoder(batch["input_ids"])[0]
# Predict the noise residual
noise_pred = unet(
noisy_latents, timesteps, encoder_hidden_states
).sample
loss = (
F.mse_loss(noise_pred, noise, reduction="none")
.mean([1, 2, 3])
.mean()
)
accelerator.backward(loss)
# Zero out the gradients for all token embeddings except the newly added
# embeddings for the concept, as we only want to optimize the concept embeddings
if accelerator.num_processes > 1:
grads = (
text_encoder.module.get_input_embeddings().weight.grad
)
else:
grads = text_encoder.get_input_embeddings().weight.grad
# Get the index for tokens that we want to zero the grads for
index_grads_to_zero = (
torch.arange(len(tokenizer)) != placeholder_token_id
)
grads.data[index_grads_to_zero, :] = grads.data[
index_grads_to_zero, :
].fill_(0)
optimizer.step()
optimizer.zero_grad()
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
global_step += 1
logs = {"loss": loss.detach().item()}
progress_bar.set_postfix(**logs)
if global_step >= max_train_steps:
break
accelerator.wait_for_everyone()
# Create the pipeline using the trained modules and save it.
if accelerator.is_main_process:
pipeline = StableDiffusionPipeline(
text_encoder=accelerator.unwrap_model(text_encoder),
vae=vae,
unet=unet,
tokenizer=tokenizer,
scheduler=PNDMScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear",
skip_prk_steps=True,
),
safety_checker=StableDiffusionSafetyChecker.from_pretrained(
"CompVis/stable-diffusion-safety-checker"
),
feature_extractor=CLIPFeatureExtractor.from_pretrained(
"openai/clip-vit-base-patch32"
),
)
pipeline.save_pretrained(output_dir)
# Also save the newly trained embeddings
learned_embeds = (
accelerator.unwrap_model(text_encoder)
.get_input_embeddings()
.weight[placeholder_token_id]
)
learned_embeds_dict = {
args.placeholder_token: learned_embeds.detach().cpu()
}
torch.save(
learned_embeds_dict, os.path.join(output_dir, "learned_embeds.bin")
)
import accelerate
accelerate.notebook_launcher(
training_function, args=(text_encoder, vae, unet), num_processes=1
)
# Set up the pipeline
pipe = StableDiffusionPipeline.from_pretrained(
hyperparameters["output_dir"],
# torch_dtype=torch.float16,
)
all_images = []
for _ in range(args.num_inference_samples):
images = pipe(
[args.prompt],
num_inference_steps=args.inference_steps,
guidance_scale=7.5,
).images
all_images.extend(images)
# output_path = os.path.abspath(os.path.join(os.getcwd(), args.output_dir))
if not os.path.isdir(args.output_dir):
os.mkdir(args.output_dir)
[
image.save(f"{args.output_dir}/{i}.jpeg")
for i, image in enumerate(all_images)
]

View File

@@ -0,0 +1,43 @@
# Stable Diffusion Fine Tuning
## Installation (Linux)
### Activate amdshark.venv Virtual Environment
```shell
source amdshark.venv/bin/activate
# Some older pip installs may not be able to handle the recent PyTorch deps
python -m pip install --upgrade pip
```
## Install dependencies
### Run the following installation commands:
```
pip install -U git+https://github.com/huggingface/diffusers.git
pip install accelerate transformers ftfy
```
### Build torch-mlir with the following branch:
Please cherry-pick this branch of torch-mlir: https://github.com/vivekkhandelwal1/torch-mlir/tree/sd-ops
and build it locally. You can find the instructions for using a locally built Torch-MLIR
here: https://github.com/nod-ai/AMD-SHARK-Studio#how-to-use-your-locally-built-iree--torch-mlir-with-amdshark
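As a minimal sketch, one way to fetch that branch locally (the build itself follows the standard Torch-MLIR development instructions linked above) is:
```shell
git clone -b sd-ops https://github.com/vivekkhandelwal1/torch-mlir.git
cd torch-mlir
# build per the Torch-MLIR development docs, then point AMD-SHARK at this local build
```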
## Run the Stable Diffusion fine tuning
To run the model with the default set of images and params, run:
```shell
python stable_diffusion_fine_tuning.py
```
By default, training runs through the PyTorch path. If you want to train the model using the TorchDynamo path of Torch-MLIR, specify `--use_torchdynamo=True`.
The default number of training steps is `2000`, which can take many hours to complete depending on your system configuration. You can pass a smaller value with the arg `--training_steps`. You can specify the number of images to be sampled for the result with the `--num_inference_samples` arg, and the number of inference steps with the `--inference_steps` flag.
For example, you can run training for a limited number of steps via the dynamo path with the following command:
```
python stable_diffusion_fine_tuning.py --training_steps=1 --inference_steps=1 --num_inference_samples=1 --train_batch_size=1 --use_torchdynamo=True
```
You can also specify the execution device via the flag `--device`. The default value is `cpu`; for GPU execution you can specify `--device="cuda"`, as in the example below.
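For example, a short sanity-check run on a GPU through the default PyTorch path might look like this (the step counts are illustrative, not recommended settings):
```shell
python stable_diffusion_fine_tuning.py --training_steps=250 --inference_steps=25 --num_inference_samples=2 --device="cuda"
```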

View File

@@ -0,0 +1,914 @@
# Install the required libs
# pip install -U git+https://github.com/huggingface/diffusers.git
# pip install accelerate transformers ftfy
# Import required libraries
import argparse
import itertools
import math
import os
from typing import List
import random
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch.utils.data import Dataset
import PIL
import logging
import torch_mlir
from torch_mlir.dynamo import make_simple_dynamo_backend
import torch._dynamo as dynamo
from torch.fx.experimental.proxy_tensor import make_fx
from torch_mlir_e2e_test.linalg_on_tensors_backends import refbackend
from amdshark.amdshark_inference import AMDSharkInference
torch._dynamo.config.verbose = True
from diffusers import (
AutoencoderKL,
DDPMScheduler,
PNDMScheduler,
StableDiffusionPipeline,
UNet2DConditionModel,
)
from diffusers.optimization import get_scheduler
from diffusers.pipelines.stable_diffusion import (
StableDiffusionSafetyChecker,
)
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import (
CLIPFeatureExtractor,
CLIPTextModel,
CLIPTokenizer,
)
# Enter your HuggingFace Token
# Note: You can comment out this prompt and hard-code your token here instead of entering it on the command line for every run.
hf_token = input("Please enter your huggingface token here: ")
YOUR_TOKEN = hf_token
def image_grid(imgs, rows, cols):
assert len(imgs) == rows * cols
w, h = imgs[0].size
grid = Image.new("RGB", size=(cols * w, rows * h))
grid_w, grid_h = grid.size
for i, img in enumerate(imgs):
grid.paste(img, box=(i % cols * w, i // cols * h))
return grid
# `pretrained_model_name_or_path` selects which Stable Diffusion checkpoint to use.
# Options: 1.) "stabilityai/stable-diffusion-2"
# 2.) "stabilityai/stable-diffusion-2-base"
# 3.) "CompVis/stable-diffusion-v1-4"
# 4.) "runwayml/stable-diffusion-v1-5"
pretrained_model_name_or_path = "stabilityai/stable-diffusion-2"
# Add here the URLs to the images of the concept you are adding. 3-5 should be fine
urls = [
"https://huggingface.co/datasets/valhalla/images/resolve/main/2.jpeg",
"https://huggingface.co/datasets/valhalla/images/resolve/main/3.jpeg",
"https://huggingface.co/datasets/valhalla/images/resolve/main/5.jpeg",
"https://huggingface.co/datasets/valhalla/images/resolve/main/6.jpeg",
## You can add additional images here
]
# Downloading Images
import requests
import glob
from io import BytesIO
def download_image(url):
try:
response = requests.get(url)
except Exception:
return None
return Image.open(BytesIO(response.content)).convert("RGB")
images = list(filter(None, [download_image(url) for url in urls]))
save_path = "./my_concept"
if not os.path.exists(save_path):
os.mkdir(save_path)
[image.save(f"{save_path}/{i}.jpeg") for i, image in enumerate(images)]
p = argparse.ArgumentParser(
description=__doc__,
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
p.add_argument(
"--input_dir",
type=str,
default="my_concept/",
help="the directory contains the images used for fine tuning",
)
p.add_argument(
"--output_dir",
type=str,
default="sd_result",
help="the directory contains the images used for fine tuning",
)
p.add_argument(
"--training_steps",
type=int,
default=2000,
help="the maximum number of training steps",
)
p.add_argument(
"--train_batch_size",
type=int,
default=4,
help="The batch size for training",
)
p.add_argument(
"--save_steps",
type=int,
default=250,
help="the number of steps after which to save the learned concept",
)
p.add_argument("--seed", type=int, default=42, help="the random seed")
p.add_argument(
"--what_to_teach",
type=str,
choices=["object", "style"],
default="object",
help="what is it that you are teaching?",
)
p.add_argument(
"--placeholder_token",
type=str,
default="<cat-toy>",
help="It is the token you are going to use to represent your new concept",
)
p.add_argument(
"--initializer_token",
type=str,
default="toy",
help="It is a word that can summarise what is your new concept",
)
p.add_argument(
"--inference_steps",
type=int,
default=50,
help="the number of steps for inference",
)
p.add_argument(
"--num_inference_samples",
type=int,
default=4,
help="the number of samples for inference",
)
p.add_argument(
"--prompt",
type=str,
default="a grafitti in a wall with a *s on it",
help="the text prompt to use",
)
p.add_argument(
"--device",
type=str,
default="cpu",
help="The device to use",
)
p.add_argument(
"--use_torchdynamo",
type=bool,
default=False,
help="Run training through the TorchDynamo path. Note: with argparse's `type=bool`, any non-empty value (including 'False') is treated as True; omit the flag to keep the default PyTorch path.",
)
args = p.parse_args()
torch.manual_seed(args.seed)
if "*s" not in args.prompt:
raise ValueError(
f'The prompt should have a "*s" which will be replaced by a placeholder token.'
)
prompt1, prompt2 = args.prompt.split("*s")
args.prompt = prompt1 + args.placeholder_token + prompt2
# `images_path` is the path to the directory containing the training images.
images_path = args.input_dir
while not os.path.exists(str(images_path)):
print(
"The images_path specified does not exist. Please enter a valid path to the training images:"
)
images_path = input("")
save_path = images_path
# Setup and check the images you have just added
images = []
for file_path in os.listdir(save_path):
try:
image_path = os.path.join(save_path, file_path)
images.append(Image.open(image_path).resize((512, 512)))
except Exception:
print(
f"{image_path} is not a valid image; please remove this file from the directory, otherwise training could fail."
)
image_grid(images, 1, len(images))
########### Create Dataset ##########
# Setup the prompt templates for training
imagenet_templates_small = [
"a photo of a {}",
"a rendering of a {}",
"a cropped photo of the {}",
"the photo of a {}",
"a photo of a clean {}",
"a photo of a dirty {}",
"a dark photo of the {}",
"a photo of my {}",
"a photo of the cool {}",
"a close-up photo of a {}",
"a bright photo of the {}",
"a cropped photo of a {}",
"a photo of the {}",
"a good photo of the {}",
"a photo of one {}",
"a close-up photo of the {}",
"a rendition of the {}",
"a photo of the clean {}",
"a rendition of a {}",
"a photo of a nice {}",
"a good photo of a {}",
"a photo of the nice {}",
"a photo of the small {}",
"a photo of the weird {}",
"a photo of the large {}",
"a photo of a cool {}",
"a photo of a small {}",
]
imagenet_style_templates_small = [
"a painting in the style of {}",
"a rendering in the style of {}",
"a cropped painting in the style of {}",
"the painting in the style of {}",
"a clean painting in the style of {}",
"a dirty painting in the style of {}",
"a dark painting in the style of {}",
"a picture in the style of {}",
"a cool painting in the style of {}",
"a close-up painting in the style of {}",
"a bright painting in the style of {}",
"a cropped painting in the style of {}",
"a good painting in the style of {}",
"a close-up painting in the style of {}",
"a rendition in the style of {}",
"a nice painting in the style of {}",
"a small painting in the style of {}",
"a weird painting in the style of {}",
"a large painting in the style of {}",
]
# Setup the dataset
class TextualInversionDataset(Dataset):
def __init__(
self,
data_root,
tokenizer,
learnable_property="object", # [object, style]
size=512,
repeats=100,
interpolation="bicubic",
flip_p=0.5,
set="train",
placeholder_token="*",
center_crop=False,
):
self.data_root = data_root
self.tokenizer = tokenizer
self.learnable_property = learnable_property
self.size = size
self.placeholder_token = placeholder_token
self.center_crop = center_crop
self.flip_p = flip_p
self.image_paths = [
os.path.join(self.data_root, file_path)
for file_path in os.listdir(self.data_root)
]
self.num_images = len(self.image_paths)
self._length = self.num_images
if set == "train":
self._length = self.num_images * repeats
self.interpolation = {
"linear": PIL.Image.LINEAR,
"bilinear": PIL.Image.BILINEAR,
"bicubic": PIL.Image.BICUBIC,
"lanczos": PIL.Image.LANCZOS,
}[interpolation]
self.templates = (
imagenet_style_templates_small
if learnable_property == "style"
else imagenet_templates_small
)
self.flip_transform = transforms.RandomHorizontalFlip(p=self.flip_p)
def __len__(self):
return self._length
def __getitem__(self, i):
example = {}
image = Image.open(self.image_paths[i % self.num_images])
if not image.mode == "RGB":
image = image.convert("RGB")
placeholder_string = self.placeholder_token
text = random.choice(self.templates).format(placeholder_string)
example["input_ids"] = self.tokenizer(
text,
padding="max_length",
truncation=True,
max_length=self.tokenizer.model_max_length,
return_tensors="pt",
).input_ids[0]
# default to score-sde preprocessing
img = np.array(image).astype(np.uint8)
if self.center_crop:
crop = min(img.shape[0], img.shape[1])
(
h,
w,
) = (
img.shape[0],
img.shape[1],
)
img = img[
(h - crop) // 2 : (h + crop) // 2,
(w - crop) // 2 : (w + crop) // 2,
]
image = Image.fromarray(img)
image = image.resize(
(self.size, self.size), resample=self.interpolation
)
image = self.flip_transform(image)
image = np.array(image).astype(np.uint8)
image = (image / 127.5 - 1.0).astype(np.float32)
example["pixel_values"] = torch.from_numpy(image).permute(2, 0, 1)
return example
########## Setting up the model ##########
# Load the tokenizer and add the placeholder token as an additional special token.
tokenizer = CLIPTokenizer.from_pretrained(
pretrained_model_name_or_path,
subfolder="tokenizer",
)
# Add the placeholder token in tokenizer
num_added_tokens = tokenizer.add_tokens(args.placeholder_token)
if num_added_tokens == 0:
raise ValueError(
f"The tokenizer already contains the token {args.placeholder_token}. Please pass a different"
" `placeholder_token` that is not already in the tokenizer."
)
# Get token ids for our placeholder and initializer token.
# This code block will complain if initializer string is not a single token
# Convert the initializer_token, placeholder_token to ids
token_ids = tokenizer.encode(args.initializer_token, add_special_tokens=False)
# Check if initializer_token is a single token or a sequence of tokens
if len(token_ids) > 1:
raise ValueError("The initializer token must be a single token.")
initializer_token_id = token_ids[0]
placeholder_token_id = tokenizer.convert_tokens_to_ids(args.placeholder_token)
# Load the Stable Diffusion model
# Load models and create wrapper for stable diffusion
# pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path)
# del pipeline
text_encoder = CLIPTextModel.from_pretrained(
pretrained_model_name_or_path, subfolder="text_encoder"
)
vae = AutoencoderKL.from_pretrained(
pretrained_model_name_or_path, subfolder="vae"
)
unet = UNet2DConditionModel.from_pretrained(
pretrained_model_name_or_path, subfolder="unet"
)
# We have added the placeholder_token to the tokenizer, so we resize the token embeddings here;
# this will add a new embedding vector to the token embeddings for our placeholder_token
text_encoder.resize_token_embeddings(len(tokenizer))
# Initialise the newly added placeholder token with the embeddings of the initializer token
token_embeds = text_encoder.get_input_embeddings().weight.data
token_embeds[placeholder_token_id] = token_embeds[initializer_token_id]
# In Textual Inversion we only train the newly added embedding vector,
# so let's freeze the rest of the model parameters here
def freeze_params(params):
for param in params:
param.requires_grad = False
# Freeze vae and unet
freeze_params(vae.parameters())
freeze_params(unet.parameters())
# Freeze all parameters except for the token embeddings in text encoder
params_to_freeze = itertools.chain(
text_encoder.text_model.encoder.parameters(),
text_encoder.text_model.final_layer_norm.parameters(),
text_encoder.text_model.embeddings.position_embedding.parameters(),
)
freeze_params(params_to_freeze)
# Move vae and unet to device
# For the dynamo path the default compilation device is `cpu`, since torch-mlir
# supports only that. Therefore, move modules to the device only for the PyTorch path.
if not args.use_torchdynamo:
vae.to(args.device)
unet.to(args.device)
# Keep vae in eval mode as we don't train it
vae.eval()
# Keep unet in train mode to enable gradient checkpointing
unet.train()
class VaeModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.vae = vae
def forward(self, input):
x = self.vae.encode(input, return_dict=False)[0]
return x
class UnetModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.unet = unet
def forward(self, x, y, z):
return self.unet.forward(x, y, z, return_dict=False)[0]
amdshark_vae = VaeModel()
amdshark_unet = UnetModel()
####### Creating our training data ########
# Let's create the Dataset and Dataloader
train_dataset = TextualInversionDataset(
data_root=save_path,
tokenizer=tokenizer,
size=vae.sample_size,
placeholder_token=args.placeholder_token,
repeats=100,
learnable_property=args.what_to_teach, # Option selected above between object and style
center_crop=False,
set="train",
)
def create_dataloader(train_batch_size=1):
return torch.utils.data.DataLoader(
train_dataset, batch_size=train_batch_size, shuffle=True
)
# Create noise_scheduler for training
noise_scheduler = DDPMScheduler.from_config(
pretrained_model_name_or_path, subfolder="scheduler"
)
######## Training ###########
# Define hyperparameters for our training. If you are not happy with your results,
# you can tune the `learning_rate` and the `max_train_steps`
# Setting up all training args
hyperparameters = {
"learning_rate": 5e-04,
"scale_lr": True,
"max_train_steps": args.training_steps,
"save_steps": args.save_steps,
"train_batch_size": args.train_batch_size,
"gradient_accumulation_steps": 1,
"gradient_checkpointing": True,
"mixed_precision": "fp16",
"seed": 42,
"output_dir": "sd-concept-output",
}
# Create the output directory if it does not already exist.
cwd = os.getcwd()
out_dir = os.path.join(cwd, hyperparameters["output_dir"])
os.makedirs(out_dir, exist_ok=True)
###### Torch-MLIR Compilation ######
def _remove_nones(fx_g: torch.fx.GraphModule) -> List[int]:
removed_indexes = []
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, (list, tuple)):
node_arg = list(node_arg)
node_args_len = len(node_arg)
for i in range(node_args_len):
curr_index = node_args_len - (i + 1)
if node_arg[curr_index] is None:
removed_indexes.append(curr_index)
node_arg.pop(curr_index)
node.args = (tuple(node_arg),)
break
if len(removed_indexes) > 0:
fx_g.graph.lint()
fx_g.graph.eliminate_dead_code()
fx_g.recompile()
removed_indexes.sort()
return removed_indexes
def _unwrap_single_tuple_return(fx_g: torch.fx.GraphModule) -> bool:
"""
Replace tuple with tuple element in functions that return one-element tuples.
Returns true if an unwrapping took place, and false otherwise.
"""
unwrapped_tuple = False
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, tuple):
if len(node_arg) == 1:
node.args = (node_arg[0],)
unwrapped_tuple = True
break
if unwrapped_tuple:
fx_g.graph.lint()
fx_g.recompile()
return unwrapped_tuple
def _returns_nothing(fx_g: torch.fx.GraphModule) -> bool:
for node in fx_g.graph.nodes:
if node.op == "output":
assert (
len(node.args) == 1
), "Output node must have a single argument"
node_arg = node.args[0]
if isinstance(node_arg, tuple):
return len(node_arg) == 0
return False
def transform_fx(fx_g):
for node in fx_g.graph.nodes:
if node.op == "call_function":
if node.target in [
torch.ops.aten.empty,
]:
# aten.empty should be filled with zeros.
if node.target in [torch.ops.aten.empty]:
with fx_g.graph.inserting_after(node):
new_node = fx_g.graph.call_function(
torch.ops.aten.zero_,
args=(node,),
)
node.append(new_node)
node.replace_all_uses_with(new_node)
new_node.args = (node,)
fx_g.graph.lint()
@make_simple_dynamo_backend
def refbackend_torchdynamo_backend(
fx_graph: torch.fx.GraphModule, example_inputs: List[torch.Tensor]
):
# handling usage of empty tensor without initializing
transform_fx(fx_graph)
fx_graph.recompile()
if _returns_nothing(fx_graph):
return fx_graph
removed_none_indexes = _remove_nones(fx_graph)
was_unwrapped = _unwrap_single_tuple_return(fx_graph)
mlir_module = torch_mlir.compile(
fx_graph, example_inputs, output_type="linalg-on-tensors"
)
bytecode_stream = BytesIO()
mlir_module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
amdshark_module = AMDSharkInference(
mlir_module=bytecode, device=args.device, mlir_dialect="tm_tensor"
)
amdshark_module.compile()
def compiled_callable(*inputs):
inputs = [x.numpy() for x in inputs]
result = amdshark_module("forward", inputs)
if was_unwrapped:
result = [
result,
]
if not isinstance(result, list):
result = torch.from_numpy(result)
else:
result = tuple(torch.from_numpy(x) for x in result)
result = list(result)
for removed_index in removed_none_indexes:
result.insert(removed_index, None)
result = tuple(result)
return result
return compiled_callable
def predictions(torch_func, jit_func, batchA, batchB):
res = jit_func(batchA.numpy(), batchB.numpy())
if res is not None:
prediction = res
else:
prediction = None
return prediction
logger = logging.getLogger(__name__)
# def save_progress(text_encoder, placeholder_token_id, accelerator, save_path):
def save_progress(text_encoder, placeholder_token_id, save_path):
logger.info("Saving embeddings")
learned_embeds = (
# accelerator.unwrap_model(text_encoder)
text_encoder.get_input_embeddings().weight[placeholder_token_id]
)
learned_embeds_dict = {
args.placeholder_token: learned_embeds.detach().cpu()
}
torch.save(learned_embeds_dict, save_path)
train_batch_size = hyperparameters["train_batch_size"]
gradient_accumulation_steps = hyperparameters["gradient_accumulation_steps"]
learning_rate = hyperparameters["learning_rate"]
if hyperparameters["scale_lr"]:
learning_rate = (
learning_rate
* gradient_accumulation_steps
* train_batch_size
# * accelerator.num_processes
)
# Initialize the optimizer
optimizer = torch.optim.AdamW(
text_encoder.get_input_embeddings().parameters(), # only optimize the embeddings
lr=learning_rate,
)
# Training function
def train_func(batch_pixel_values, batch_input_ids):
# Convert images to latent space
latents = amdshark_vae(batch_pixel_values).sample().detach()
latents = latents * 0.18215
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(
0,
noise_scheduler.num_train_timesteps,
(bsz,),
device=latents.device,
).long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
# Get the text embedding for conditioning
encoder_hidden_states = text_encoder(batch_input_ids)[0]
# Predict the noise residual
noise_pred = amdshark_unet(
noisy_latents,
timesteps,
encoder_hidden_states,
)
# Get the target for loss depending on the prediction type
if noise_scheduler.config.prediction_type == "epsilon":
target = noise
elif noise_scheduler.config.prediction_type == "v_prediction":
target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
raise ValueError(
f"Unknown prediction type {noise_scheduler.config.prediction_type}"
)
loss = (
F.mse_loss(noise_pred, target, reduction="none").mean([1, 2, 3]).mean()
)
loss.backward()
# Zero out the gradients for all token embeddings except the newly added
# embeddings for the concept, as we only want to optimize the concept embeddings
grads = text_encoder.get_input_embeddings().weight.grad
# Get the index for tokens that we want to zero the grads for
index_grads_to_zero = torch.arange(len(tokenizer)) != placeholder_token_id
grads.data[index_grads_to_zero, :] = grads.data[
index_grads_to_zero, :
].fill_(0)
optimizer.step()
optimizer.zero_grad()
return loss
def training_function():
max_train_steps = hyperparameters["max_train_steps"]
output_dir = hyperparameters["output_dir"]
gradient_checkpointing = hyperparameters["gradient_checkpointing"]
train_dataloader = create_dataloader(train_batch_size)
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(
len(train_dataloader) / gradient_accumulation_steps
)
num_train_epochs = math.ceil(max_train_steps / num_update_steps_per_epoch)
# Train!
total_batch_size = (
train_batch_size
* gradient_accumulation_steps
# train_batch_size * accelerator.num_processes * gradient_accumulation_steps
)
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Instantaneous batch size per device = {train_batch_size}")
logger.info(
f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}"
)
logger.info(
f" Gradient Accumulation steps = {gradient_accumulation_steps}"
)
logger.info(f" Total optimization steps = {max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(
# range(max_train_steps), disable=not accelerator.is_local_main_process
range(max_train_steps)
)
progress_bar.set_description("Steps")
global_step = 0
params_ = [i for i in text_encoder.get_input_embeddings().parameters()]
if args.use_torchdynamo:
print("******** TRAINING STARTED - TORCHYDNAMO PATH ********")
else:
print("******** TRAINING STARTED - PYTORCH PATH ********")
print("Initial weights:")
print(params_, params_[0].shape)
for epoch in range(num_train_epochs):
text_encoder.train()
for step, batch in enumerate(train_dataloader):
if args.use_torchdynamo:
dynamo_callable = dynamo.optimize(
refbackend_torchdynamo_backend
)(train_func)
lam_func = lambda x, y: dynamo_callable(
torch.from_numpy(x), torch.from_numpy(y)
)
loss = predictions(
train_func,
lam_func,
batch["pixel_values"],
batch["input_ids"],
# params[0].detach(),
)
else:
loss = train_func(batch["pixel_values"], batch["input_ids"])
print(loss)
# Checks if the accelerator has performed an optimization step behind the scenes
progress_bar.update(1)
global_step += 1
if global_step % hyperparameters["save_steps"] == 0:
save_path = os.path.join(
output_dir,
f"learned_embeds-step-{global_step}.bin",
)
save_progress(
text_encoder,
placeholder_token_id,
save_path,
)
logs = {"loss": loss.detach().item()}
progress_bar.set_postfix(**logs)
if global_step >= max_train_steps:
break
# Create the pipeline using the trained modules and save it.
params__ = [i for i in text_encoder.get_input_embeddings().parameters()]
print("******** TRAINING PROCESS FINISHED ********")
print("Updated weights:")
print(params__, params__[0].shape)
pipeline = StableDiffusionPipeline.from_pretrained(
pretrained_model_name_or_path,
# text_encoder=accelerator.unwrap_model(text_encoder),
text_encoder=text_encoder,
tokenizer=tokenizer,
vae=vae,
unet=unet,
)
pipeline.save_pretrained(output_dir)
# Also save the newly trained embeddings
save_path = os.path.join(output_dir, f"learned_embeds.bin")
save_progress(text_encoder, placeholder_token_id, save_path)
training_function()
for param in itertools.chain(unet.parameters(), text_encoder.parameters()):
if param.grad is not None:
del param.grad # free some memory
torch.cuda.empty_cache()
# Set up the pipeline
from diffusers import DPMSolverMultistepScheduler
pipe = StableDiffusionPipeline.from_pretrained(
hyperparameters["output_dir"],
scheduler=DPMSolverMultistepScheduler.from_pretrained(
hyperparameters["output_dir"], subfolder="scheduler"
),
)
if not args.use_torchdynamo:
pipe.to(args.device)
# Run the Stable Diffusion pipeline
# Don't forget to use the placeholder token in your prompt
all_images = []
for _ in range(args.num_inference_samples):
images = pipe(
[args.prompt],
num_inference_steps=args.inference_steps,
guidance_scale=7.5,
).images
all_images.extend(images)
output_path = os.path.abspath(os.path.join(os.getcwd(), args.output_dir))
if not os.path.isdir(args.output_dir):
os.mkdir(args.output_dir)
[
image.save(f"{args.output_dir}/{i}.jpeg")
for i, image in enumerate(all_images)
]
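The gradient masking near the end of train_func is what makes this textual inversion: every row of the token-embedding matrix is frozen except the row for the placeholder token. A minimal standalone sketch of that step, with assumed toy sizes (not part of the script above):
# Illustrative only: zero the gradient of every embedding row except the placeholder token's.
import torch

vocab_size, embed_dim, placeholder_token_id = 1000, 16, 999  # assumed toy values
embeddings = torch.nn.Embedding(vocab_size, embed_dim)

ids = torch.tensor([[1, 2, placeholder_token_id]])
loss = embeddings(ids).pow(2).mean()
loss.backward()

keep = torch.arange(vocab_size) == placeholder_token_id
embeddings.weight.grad[~keep, :] = 0.0  # only the concept embedding keeps its gradient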

View File

@@ -21,7 +21,6 @@ import torch
from iree.runtime import DeviceArray
from torch_mlir._mlir_libs._mlir.ir import Module
from torch_mlir.compiler_utils import (
get_module_name_for_debug_dump,
run_pipeline_with_repro_report,
)
from torch_mlir.eager_mode.torch_mlir_eager_backend import (
@@ -32,7 +31,7 @@ from torch_mlir_e2e_test.eager_backends.refbackend import (
NUMPY_TO_TORCH_DTYPE_DICT,
)
from shark.iree_utils.compile_utils import (
from amdshark.iree_utils.compile_utils import (
get_iree_compiled_module,
IREE_DEVICE_MAP,
)
@@ -64,14 +63,13 @@ class EagerModeIREELinalgOnTensorsBackend(TorchMLIREagerBackend):
)
def compile(self, imported_module: Module):
fn_name = get_module_name_for_debug_dump(imported_module)
run_pipeline_with_repro_report(
imported_module,
"torch-function-to-torch-backend-pipeline,torch-backend-to-linalg-on-tensors-backend-pipeline",
"EagerMode",
)
callable, _ = get_iree_compiled_module(
imported_module, self.raw_device_str, func_name=fn_name
imported_module, self.raw_device_str
)
return callable

View File

@@ -0,0 +1,164 @@
# Copyright 2023 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Common utilities to be shared by iree utilities.
import functools
import os
import sys
import subprocess
def run_cmd(cmd, debug=False, raise_err=False):
"""
Inputs:
cmd : cli command string.
debug : if True, prints debug info
raise_err : if True, raise exception to caller
"""
if debug:
print("IREE run command: \n\n")
print(cmd)
print("\n\n")
try:
result = subprocess.run(
cmd,
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
check=True,
)
stdout = result.stdout.decode()
stderr = result.stderr.decode()
return stdout, stderr
except subprocess.CalledProcessError as e:
if raise_err:
raise Exception from e
else:
print(e.output)
sys.exit(f"Exiting program due to error running {cmd}")
def iree_device_map(device):
uri_parts = device.split("://", 2)
iree_driver = (
_IREE_DEVICE_MAP[uri_parts[0]]
if uri_parts[0] in _IREE_DEVICE_MAP
else uri_parts[0]
)
if len(uri_parts) == 1:
return iree_driver
elif "rocm" in uri_parts:
return "rocm"
else:
return f"{iree_driver}://{uri_parts[1]}"
def get_supported_device_list():
return list(_IREE_DEVICE_MAP.keys())
_IREE_DEVICE_MAP = {
"cpu": "local-task",
"cpu-task": "local-task",
"cpu-sync": "local-sync",
"cuda": "cuda",
"vulkan": "vulkan",
"metal": "metal",
"rocm": "rocm",
"hip": "hip",
"intel-gpu": "level_zero",
}
def iree_target_map(device):
if "://" in device:
device = device.split("://")[0]
return _IREE_TARGET_MAP[device] if device in _IREE_TARGET_MAP else device
_IREE_TARGET_MAP = {
"cpu": "llvm-cpu",
"cpu-task": "llvm-cpu",
"cpu-sync": "llvm-cpu",
"cuda": "cuda",
"vulkan": "vulkan-spirv",
"metal": "metal",
"rocm": "rocm",
"hip": "rocm",
"intel-gpu": "opencl-spirv",
}
# Finds whether the required drivers are installed for the given device.
@functools.cache
def check_device_drivers(device):
"""
Checks whether the necessary drivers are present for gpu and vulkan devices.
Returns False when drivers are present, True when they appear to be missing.
"""
if "://" in device:
device = device.split("://")[0]
from iree.runtime import get_driver
device_mapped = iree_device_map(device)
try:
_ = get_driver(device_mapped)
except ValueError as ve:
print(
f"[ERR] device `{device}` not registered with IREE. "
"Ensure IREE is configured for use with this device.\n"
f"Full Error: \n {repr(ve)}"
)
return True
except RuntimeError as re:
print(f"[ERR] Failed to get driver for {device} with error:\n{repr(re)}")
return True
# Unknown device. We assume drivers are installed.
return False
# Installation info for the missing device drivers.
def device_driver_info(device):
device_driver_err_map = {
"cuda": {
"debug": "Try `nvidia-smi` on system to check.",
"solution": " from https://www.nvidia.in/Download/index.aspx?lang=en-in for your system.",
},
"vulkan": {
"debug": "Try `vulkaninfo` on system to check.",
"solution": " from https://vulkan.lunarg.com/sdk/home for your distribution.",
},
"metal": {
"debug": "Check if Bare metal is supported and enabled on your system.",
"solution": ".",
},
"rocm": {
"debug": f"Try `{'hip' if sys.platform == 'win32' else 'rocm'}info` on system to check.",
"solution": " from https://rocm.docs.amd.com/en/latest/rocm.html for your system.",
},
}
if device in device_driver_err_map:
err_msg = (
f"Required drivers for {device} not found. {device_driver_err_map[device]['debug']} "
f"Please install the required drivers{device_driver_err_map[device]['solution']} "
f"For further assistance please reach out to the community on discord [https://discord.com/invite/RUqY2h2s9u]"
f" and/or file a bug at https://github.com/nod-ai/AMD-SHARK-Studio/issues"
)
return err_msg
else:
return f"{device} is not supported."

View File

@@ -0,0 +1,154 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from amdshark.iree_utils._common import run_cmd, iree_device_map
from amdshark.iree_utils.cpu_utils import get_cpu_count
import numpy as np
import os
import re
import platform
UNIT_TO_SECOND_MAP = {"us": 1e-6, "ms": 0.001, "s": 1}
def tensor_to_type_str(input_tensors: tuple, mlir_dialect: str):
"""
Input: A tuple of input tensors, i.e. tuple(torch.tensor)
Output: list of strings that represent mlir types (e.g. 1x24xf64)
# TODO: Support more than floats, and ints
"""
list_of_type = []
for input_tensor in input_tensors:
type_string = "x".join([str(dim) for dim in input_tensor.shape])
if mlir_dialect in ["linalg", "tosa"]:
dtype_string = str(input_tensor.dtype).replace("torch.", "")
elif mlir_dialect in ["mhlo", "tflite"]:
dtype = input_tensor.dtype
try:
dtype_string = re.findall("'[^\"]*'", str(dtype))[0].replace(
"'", ""
)
except IndexError:
dtype_string = str(dtype)
regex_split = re.compile("([a-zA-Z]+)([0-9]+)")
match = regex_split.match(dtype_string)
mlir_type_string = str(match.group(1)[0]) + str(match.group(2))
type_string += f"x{mlir_type_string}"
list_of_type.append(type_string)
return list_of_type
def build_benchmark_args(
input_file: str,
device: str,
input_tensors: tuple,
mlir_dialect: str,
training=False,
):
"""
Inputs: input_file leading to vmfb, input tensors for the function, target device,
and whether it is training or not.
Outputs: list of command-line args that runs iree-benchmark-module on the target model.
"""
path = os.path.join(os.environ["VIRTUAL_ENV"], "bin")
if platform.system() == "Windows":
benchmarker_path = os.path.join(path, "iree-benchmark-module.exe")
time_extractor = None
else:
benchmarker_path = os.path.join(path, "iree-benchmark-module")
time_extractor = "| awk 'END{{print $2 $3}}'"
benchmark_cl = [benchmarker_path, f"--module={input_file}"]
# TODO: The function named can be passed as one of the args.
fn_name = "forward"
if training == True:
# TODO: Replace name of train with actual train fn name.
fn_name = "train"
benchmark_cl.append(f"--function={fn_name}")
benchmark_cl.append(f"--device={iree_device_map(device)}")
mlir_input_types = tensor_to_type_str(input_tensors, mlir_dialect)
for mlir_input in mlir_input_types:
benchmark_cl.append(f"--input={mlir_input}")
if device == "cpu":
num_cpus = get_cpu_count()
if num_cpus is not None:
benchmark_cl.append(f"--task_topology_max_group_count={num_cpus}")
# if time_extractor:
# benchmark_cl.append(time_extractor)
benchmark_cl.append(f"--print_statistics=true")
return benchmark_cl
def build_benchmark_args_non_tensor_input(
input_file: str,
device: str,
inputs: tuple,
mlir_dialect: str,
function_name: str,
):
"""
Inputs: input_file leading to vmfb, non-tensor inputs for the function, target device,
and the entry function name.
Outputs: list of command-line args that runs iree-benchmark-module on the target model.
"""
path = os.path.join(os.environ["VIRTUAL_ENV"], "bin")
if platform.system() == "Windows":
benchmarker_path = os.path.join(path, "iree-benchmark-module.exe")
time_extractor = None
else:
benchmarker_path = os.path.join(path, "iree-benchmark-module")
time_extractor = "| awk 'END{{print $2 $3}}'"
benchmark_cl = [benchmarker_path, f"--module={input_file}"]
# TODO: The function named can be passed as one of the args.
if function_name:
benchmark_cl.append(f"--function={function_name}")
benchmark_cl.append(f"--device={iree_device_map(device)}")
for input in inputs:
benchmark_cl.append(f"--input={input}")
if platform.system() != "Windows":
time_extractor = "| awk 'END{{print $2 $3}}'"
benchmark_cl.append(time_extractor)
return benchmark_cl
def run_benchmark_module(benchmark_cl):
"""
Run benchmark command, extract result and return iteration/seconds, host
peak memory, and device peak memory.
# TODO: Add an example of the benchmark command.
Input: benchmark command.
"""
benchmark_path = benchmark_cl[0]
assert os.path.exists(
benchmark_path
), "Cannot find iree_benchmark_module, Please contact AMDSHARK maintainer on discord."
bench_stdout, bench_stderr = run_cmd(" ".join(benchmark_cl))
try:
regex_split = re.compile("(\d+[.]*\d*)( *)([a-zA-Z]+)")
match = regex_split.search(bench_stdout)
time_ms = float(match.group(1))
unit = match.group(3)
except AttributeError:
regex_split = re.compile("(\d+[.]*\d*)([a-zA-Z]+)")
match = regex_split.search(bench_stdout)
time_ms = float(match.group(1))
unit = match.group(2)
iter_per_second = 1.0 / (time_ms * 0.001)
# Extract peak memory.
host_regex = re.compile(r".*HOST_LOCAL:\s*([0-9]+)B peak")
host_peak_b = int(host_regex.search(bench_stderr).group(1))
device_regex = re.compile(r".*DEVICE_LOCAL:\s*([0-9]+)B peak")
device_peak_b = int(device_regex.search(bench_stderr).group(1))
return iter_per_second, host_peak_b, device_peak_b
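A hedged sketch of how these helpers compose; the vmfb path and the input shape are assumptions, and it presumes iree-benchmark-module is installed in the active VIRTUAL_ENV:
# Illustrative only: build a benchmark command for a vmfb and parse its output.
import torch
from amdshark.iree_utils.benchmark_utils import build_benchmark_args, run_benchmark_module

inputs = (torch.ones(1, 3, 224, 224, dtype=torch.float32),)  # hypothetical input
cmd = build_benchmark_args(
    input_file="forward.vmfb",  # hypothetical path
    device="cpu",
    input_tensors=inputs,
    mlir_dialect="linalg",
)
iter_per_second, host_peak_b, device_peak_b = run_benchmark_module(cmd)
print(f"{iter_per_second:.2f} it/s, host peak {host_peak_b} B, device peak {device_peak_b} B")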

View File

@@ -0,0 +1,704 @@
# Copyright 2023 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import functools
import numpy as np
import os
import re
import tempfile
from pathlib import Path
import iree.runtime as ireert
import iree.compiler as ireec
from amdshark.parser import amdshark_args
from .trace import DetailLogger
from ._common import iree_device_map, iree_target_map
from .cpu_utils import get_iree_cpu_rt_args
from .benchmark_utils import *
# Get the iree-compile arguments given device.
def get_iree_device_args(device, extra_args=[]):
print("Configuring for device:" + device)
device, device_num = clean_device_info(device)
if "cpu" in device:
from amdshark.iree_utils.cpu_utils import get_iree_cpu_args
u_kernel_flag = ["--iree-llvmcpu-enable-ukernels"]
stack_size_flag = ["--iree-llvmcpu-stack-allocation-limit=256000"]
return (
get_iree_cpu_args()
+ u_kernel_flag
+ stack_size_flag
)
if device == "cuda":
from amdshark.iree_utils.gpu_utils import get_iree_gpu_args
return get_iree_gpu_args()
if device == "vulkan":
from amdshark.iree_utils.vulkan_utils import get_iree_vulkan_args
return get_iree_vulkan_args(
device_num=device_num, extra_args=extra_args
)
if device == "metal":
from amdshark.iree_utils.metal_utils import get_iree_metal_args
return get_iree_metal_args(extra_args=extra_args)
if device == "rocm":
from amdshark.iree_utils.gpu_utils import get_iree_rocm_args
return get_iree_rocm_args(device_num=device_num, extra_args=extra_args)
if device == "hip":
from amdshark.iree_utils.gpu_utils import get_iree_rocm_args
return get_iree_rocm_args(device_num=device_num, extra_args=extra_args, hip_driver=True)
return []
def get_iree_target_triple(device):
args = get_iree_device_args(device)
for flag in args:
if "triple" in flag:
triple = flag.split("=")[-1]
return triple
return ""
def clean_device_info(raw_device):
# return appropriate device and device_id for consumption by Studio pipeline
# Multiple devices only supported for vulkan, rocm, and hip (as of now).
# default device must be selected for all others
device_id = None
device = (
raw_device
if "=>" not in raw_device
else raw_device.split("=>")[1].strip()
)
if "://" in device:
device, device_id = device.split("://")
if len(device_id) <= 2:
device_id = int(device_id)
if device not in ["hip", "rocm", "vulkan"]:
device_id = None
if device in ["hip", "rocm", "vulkan"] and device_id == None:
device_id = 0
return device, device_id
# Get the iree-compiler arguments given frontend.
def get_iree_frontend_args(frontend):
if frontend in ["torch", "pytorch", "linalg", "tm_tensor"]:
return ["--iree-llvmcpu-target-cpu-features=host"]
elif frontend in ["tensorflow", "tf", "mhlo", "stablehlo"]:
return [
"--iree-llvmcpu-target-cpu-features=host",
"--iree-input-demote-i64-to-i32",
]
else:
# Frontend not found.
return []
# Common args to be used given any frontend or device.
def get_iree_common_args(debug=False):
common_args = [
"--iree-util-zero-fill-elided-attrs",
"--mlir-elide-elementsattrs-if-larger=10",
]
if debug == True:
common_args.extend(
[
"--iree-opt-strip-assertions=false",
"--verify=true",
]
)
else:
common_args.extend(
[
"--iree-opt-strip-assertions=true",
"--verify=false",
]
)
return common_args
# Args that are suitable only for certain models or groups of models.
# amdshark_args are passed down from pytests to control which models compile with these flags,
# but they can also be set in amdshark/parser.py
def get_model_specific_args():
ms_args = []
if amdshark_args.enable_conv_transform == True:
ms_args += [
"--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-flow-convert-conv-nchw-to-nhwc))"
]
if amdshark_args.enable_img2col_transform == True:
ms_args += [
"--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-preprocessing-convert-conv2d-to-img2col))"
]
if amdshark_args.use_winograd == True:
ms_args += [
"--iree-preprocessing-pass-pipeline=builtin.module(func.func(iree-linalg-ext-convert-conv2d-to-winograd))"
]
return ms_args
def create_dispatch_dirs(bench_dir, device):
protected_files = ["ordered-dispatches.txt"]
bench_dir_path = bench_dir.split("/")
bench_dir_path[-1] = "temp_" + bench_dir_path[-1]
tmp_bench_dir = "/".join(bench_dir_path)
for f_ in os.listdir(bench_dir):
if os.path.isfile(f"{bench_dir}/{f_}") and f_ not in protected_files:
dir_name = re.sub("\.\S*$", "", f_)
if os.path.exists(f"{bench_dir}/{dir_name}"):
os.system(f"rm -rf {bench_dir}/{dir_name}")
os.system(f"mkdir {bench_dir}/{dir_name}")
os.system(f"mv {bench_dir}/{f_} {bench_dir}/{dir_name}/{f_}")
for f_ in os.listdir(tmp_bench_dir):
if os.path.isfile(f"{tmp_bench_dir}/{f_}"):
dir_name = ""
for d_ in os.listdir(bench_dir):
if re.search(f"{d_}(?=\D)", f_):
dir_name = d_
if dir_name != "":
os.system(
f"mv {tmp_bench_dir}/{f_} {bench_dir}/{dir_name}/{dir_name}_benchmark.mlir"
)
def dump_isas(bench_dir):
for d_ in os.listdir(bench_dir):
if os.path.isdir(f"{bench_dir}/{d_}"):
for f_ in os.listdir(f"{bench_dir}/{d_}"):
if f_.endswith(".spv"):
os.system(
f"amdllpc -gfxip 11.0 {bench_dir}/{d_}/{f_} -v > \
{bench_dir}/{d_}/isa.txt"
)
def compile_benchmark_dirs(bench_dir, device, dispatch_benchmarks):
benchmark_runtimes = {}
dispatch_list = []
all_dispatches = False
if dispatch_benchmarks.lower().strip() == "all":
all_dispatches = True
else:
try:
dispatch_list = [
int(dispatch_index)
for dispatch_index in dispatch_benchmarks.split(" ")
]
except:
print("ERROR: Invalid dispatch benchmarks")
return None
for d_ in os.listdir(bench_dir):
if os.path.isdir(f"{bench_dir}/{d_}"):
in_dispatches = False
for dispatch in dispatch_list:
if str(dispatch) in d_:
in_dispatches = True
if all_dispatches or in_dispatches:
for f_ in os.listdir(f"{bench_dir}/{d_}"):
if "benchmark.mlir" in f_:
dispatch_file = open(f"{bench_dir}/{d_}/{f_}", "r")
module = dispatch_file.read()
dispatch_file.close()
flatbuffer_blob = ireec.compile_str(
module, target_backends=[iree_target_map(device)]
)
vmfb_file = open(
f"{bench_dir}/{d_}/{d_}_benchmark.vmfb", "wb"
)
vmfb_file.write(flatbuffer_blob)
vmfb_file.close()
config = get_iree_runtime_config(device)
vm_module = ireert.VmModule.from_buffer(
config.vm_instance,
flatbuffer_blob,
warn_if_copy=False,
)
benchmark_cl = build_benchmark_args_non_tensor_input(
input_file=f"{bench_dir}/{d_}/{d_}_benchmark.vmfb",
device=device,
inputs=(0,),
mlir_dialect="linalg",
function_name="",
)
benchmark_bash = open(
f"{bench_dir}/{d_}/{d_}_benchmark.sh", "w+"
)
benchmark_bash.write("#!/bin/bash\n")
benchmark_bash.write(" ".join(benchmark_cl))
benchmark_bash.close()
iter_per_second, _, _ = run_benchmark_module(
benchmark_cl
)
benchmark_file = open(
f"{bench_dir}/{d_}/{d_}_data.txt", "w+"
)
benchmark_file.write(f"DISPATCH: {d_}\n")
benchmark_file.write(str(iter_per_second) + "\n")
benchmark_file.write(
"AMDSHARK BENCHMARK RESULT: "
+ str(1 / (iter_per_second * 0.001))
+ "\n"
)
benchmark_file.close()
benchmark_runtimes[d_] = 1 / (iter_per_second * 0.001)
elif ".mlir" in f_ and "benchmark" not in f_:
dispatch_file = open(f"{bench_dir}/{d_}/{f_}", "r")
module = dispatch_file.read()
dispatch_file.close()
module = re.sub(
"hal.executable private",
"hal.executable public",
module,
)
flatbuffer_blob = ireec.compile_str(
module,
target_backends=[iree_target_map(device)],
extra_args=["--compile-mode=hal-executable"],
)
spirv_file = open(
f"{bench_dir}/{d_}/{d_}_spirv.vmfb", "wb"
)
spirv_file.write(flatbuffer_blob)
spirv_file.close()
ordered_dispatches = [
(k, v)
for k, v in sorted(
benchmark_runtimes.items(), key=lambda item: item[1]
)
][::-1]
f_ = open(f"{bench_dir}/ordered-dispatches.txt", "w+")
for dispatch in ordered_dispatches:
f_.write(f"{dispatch[0]}: {dispatch[1]}ms\n")
f_.close()
def compile_module_to_flatbuffer(
module,
device,
frontend,
model_config_path,
extra_args,
model_name="None",
debug=False,
compile_str=False,
write_to=None,
):
# Setup Compile arguments wrt to frontends.
input_type = "auto"
args = get_iree_frontend_args(frontend)
args += get_iree_device_args(device, extra_args)
args += get_iree_common_args(debug=debug)
args += get_model_specific_args()
args += extra_args
args += amdshark_args.additional_compile_args
if frontend in ["tensorflow", "tf"]:
input_type = "auto"
elif frontend in ["stablehlo", "tosa"]:
input_type = frontend
elif frontend in ["tflite", "tflite-tosa"]:
input_type = "tosa"
elif frontend in ["tm_tensor"]:
input_type = ireec.InputType.TM_TENSOR
elif frontend in ["torch", "pytorch"]:
input_type = "torch"
if compile_str:
flatbuffer_blob = ireec.compile_str(
module,
target_backends=[iree_target_map(device)],
extra_args=args,
input_type=input_type,
)
else:
assert os.path.isfile(module)
flatbuffer_blob = ireec.compile_file(
str(module),
input_type=input_type,
target_backends=[iree_target_map(device)],
extra_args=args,
)
if write_to is not None:
with open(write_to, "wb") as f:
f.write(flatbuffer_blob)
return None
return flatbuffer_blob
def get_iree_module(
flatbuffer_blob,
device,
device_idx=None,
rt_flags: list = [],
external_weight_file=None,
):
if external_weight_file is not None:
index = ireert.ParameterIndex()
index.load(external_weight_file)
# Returns the compiled module and the configs.
for flag in rt_flags:
ireert.flags.parse_flags(flag)
if device_idx is not None:
device = iree_device_map(device)
print("registering device id: ", device_idx)
haldriver = ireert.get_driver(device)
hal_device_id = haldriver.query_available_devices()[device_idx][
"device_id"
]
haldevice = haldriver.create_device(
hal_device_id,
allocators=amdshark_args.device_allocator,
)
config = ireert.Config(device=haldevice)
config.id = hal_device_id
else:
config = get_iree_runtime_config(device)
vm_module = ireert.VmModule.from_buffer(
config.vm_instance, flatbuffer_blob, warn_if_copy=False
)
modules = []
if external_weight_file is not None:
modules.append(index.create_provider(scope="model"))
ctx = ireert.SystemContext(vm_modules=modules, config=config)
ctx.add_vm_module(vm_module)
ModuleCompiled = getattr(ctx.modules, vm_module.name)
return ModuleCompiled, config
def load_vmfb_using_mmap(
flatbuffer_blob_or_path,
device: str,
device_idx: int = None,
rt_flags: list = [],
external_weight_file: str = None,
):
print(f"Loading module {flatbuffer_blob_or_path}...")
if "task" in device:
print(
f"[DEBUG] setting iree runtime flags for cpu:\n{' '.join(get_iree_cpu_rt_args())}"
)
for flag in get_iree_cpu_rt_args():
rt_flags.append(flag)
for flag in rt_flags:
print(flag)
ireert.flags.parse_flags(flag)
if "rocm" in device:
device = "rocm"
with DetailLogger(timeout=2.5) as dl:
# First get configs.
if device_idx is not None:
dl.log(f"Mapping device id: {device_idx}")
device = iree_device_map(device)
haldriver = ireert.get_driver(device)
dl.log(f"ireert.get_driver()")
hal_device_id = haldriver.query_available_devices()[device_idx][
"device_id"
]
haldevice = haldriver.create_device(
hal_device_id,
allocators=amdshark_args.device_allocator,
)
dl.log(f"ireert.create_device()")
config = ireert.Config(device=haldevice)
config.id = hal_device_id
dl.log(f"ireert.Config()")
else:
config = get_iree_runtime_config(device)
dl.log("get_iree_runtime_config")
if "task" in device:
print(
f"[DEBUG] setting iree runtime flags for cpu:\n{' '.join(get_iree_cpu_rt_args())}"
)
for flag in get_iree_cpu_rt_args():
ireert.flags.parse_flags(flag)
# Now load vmfb.
# Two scenarios we have here :-
# 1. We either have the vmfb already saved and therefore pass the path of it.
# (This would arise if we're invoking `load_module` from a AMDSharkInference obj)
# OR 2. We are compiling on the fly, therefore we have the flatbuffer blob to play with.
# (This would arise if we're invoking `compile` from a AMDSharkInference obj)
temp_file_to_unlink = None
if isinstance(flatbuffer_blob_or_path, Path):
flatbuffer_blob_or_path = flatbuffer_blob_or_path.__str__()
if (
isinstance(flatbuffer_blob_or_path, str)
and ".vmfb" in flatbuffer_blob_or_path
):
vmfb_file_path = flatbuffer_blob_or_path
mmaped_vmfb = ireert.VmModule.mmap(
config.vm_instance, flatbuffer_blob_or_path
)
vm_modules = []
if external_weight_file is not None:
index = ireert.ParameterIndex()
index.load(external_weight_file)
param_module = ireert.create_io_parameters_module(
config.vm_instance, index.create_provider(scope="model")
)
vm_modules.append(param_module)
vm_modules.append(mmaped_vmfb)
vm_modules.append(
ireert.create_hal_module(config.vm_instance, config.device)
)
dl.log(f"mmap {flatbuffer_blob_or_path}")
if "vulkan" in device:
# Vulkan pipeline creation consumes significant amount of time.
print(
"\tCompiling Vulkan shaders. This may take a few minutes."
)
ctx = ireert.SystemContext(config=config, vm_modules=vm_modules)
dl.log(f"ireert.SystemContext created")
for flag in amdshark_args.additional_runtime_args:
ireert.flags.parse_flags(flag)
dl.log(f"module initialized")
mmaped_vmfb = getattr(ctx.modules, mmaped_vmfb.name)
else:
with tempfile.NamedTemporaryFile(delete=False) as tf:
tf.write(flatbuffer_blob_or_path)
tf.flush()
vmfb_file_path = tf.name
temp_file_to_unlink = vmfb_file_path
mmaped_vmfb = ireert.VmModule.mmap(config.vm_instance, vmfb_file_path)
dl.log(f"mmap temp {vmfb_file_path}")
return mmaped_vmfb, config, temp_file_to_unlink
def get_iree_compiled_module(
module,
device: str,
frontend: str = "torch",
model_config_path: str = None,
extra_args: list = [],
rt_flags: list = [],
device_idx: int = None,
mmap: bool = False,
debug: bool = False,
compile_str: bool = False,
external_weight_file: str = None,
write_to: bool = None,
):
"""Given a module returns the compiled .vmfb and configs"""
flatbuffer_blob = compile_module_to_flatbuffer(
module=module,
device=device,
frontend=frontend,
model_config_path=model_config_path,
extra_args=extra_args,
debug=debug,
compile_str=compile_str,
write_to=write_to,
)
temp_file_to_unlink = None
# TODO: Currently mmap=True control flow path has been switched off for mmap.
# Got to find a cleaner way to unlink/delete the temporary file since
# we're setting delete=False when creating NamedTemporaryFile. That's why
# I'm getting hold of the name of the temporary file in `temp_file_to_unlink`.
if mmap:
if write_to is not None:
flatbuffer_blob = write_to
vmfb, config, temp_file_to_unlink = load_vmfb_using_mmap(
flatbuffer_blob,
device,
device_idx,
rt_flags,
external_weight_file=external_weight_file,
)
else:
vmfb, config = get_iree_module(
flatbuffer_blob,
device,
device_idx=device_idx,
rt_flags=rt_flags,
external_weight_file=external_weight_file,
)
ret_params = {
"vmfb": vmfb,
"config": config,
"temp_file_to_unlink": temp_file_to_unlink,
}
return ret_params
def load_flatbuffer(
flatbuffer_path: str,
device: str,
device_idx: int = None,
mmap: bool = False,
rt_flags: list = [],
):
temp_file_to_unlink = None
if mmap:
vmfb, config, temp_file_to_unlink = load_vmfb_using_mmap(
flatbuffer_path, device, device_idx, rt_flags
)
else:
with open(os.path.join(flatbuffer_path), "rb") as f:
flatbuffer_blob = f.read()
vmfb, config = get_iree_module(
flatbuffer_blob,
device,
device_idx=device_idx,
rt_flags=rt_flags,
)
ret_params = {
"vmfb": vmfb,
"config": config,
"temp_file_to_unlink": temp_file_to_unlink,
}
return ret_params
def export_iree_module_to_vmfb(
module,
device: str,
directory: str,
mlir_dialect: str = "linalg",
model_config_path: str = None,
module_name: str = None,
extra_args: list = [],
debug: bool = False,
compile_str: bool = False,
):
# Compiles the module given specs and saves it as .vmfb file.
flatbuffer_blob = compile_module_to_flatbuffer(
module=module,
device=device,
frontend=mlir_dialect,
model_config_path=model_config_path,
extra_args=extra_args,
debug=debug,
compile_str=compile_str,
)
if module_name is None:
device_name = (
device if "://" not in device else "-".join(device.split("://"))
)
module_name = f"{mlir_dialect}_{device_name}"
filename = os.path.join(directory, module_name + ".vmfb")
with open(filename, "wb") as f:
f.write(flatbuffer_blob)
print(f"Saved vmfb in {filename}.")
return filename
def export_module_to_mlir_file(module, frontend, directory: str):
# TODO: write proper documentation.
mlir_str = module
if frontend in ["tensorflow", "tf", "mhlo", "stablehlo", "tflite"]:
mlir_str = module.decode("utf-8")
elif frontend in ["pytorch", "torch"]:
mlir_str = module.operation.get_asm()
filename = os.path.join(directory, "model.mlir")
with open(filename, "w") as f:
f.write(mlir_str)
print(f"Saved mlir in {filename}.")
return filename
def get_results(
compiled_vm,
function_name,
input,
config,
frontend="torch",
send_to_host=True,
debug_timeout: float = 5.0,
device: str = None,
):
"""Runs a .vmfb file given inputs and config and returns output."""
with DetailLogger(debug_timeout) as dl:
device_inputs = []
if device == "rocm" and hasattr(config, "id"):
haldriver = ireert.get_driver("rocm")
haldevice = haldriver.create_device(
config.id,
allocators=amdshark_args.device_allocator,
)
for input_array in input:
dl.log(f"Load to device: {input_array.shape}")
device_inputs.append(
ireert.asdevicearray(config.device, input_array)
)
dl.log(f"Invoke function: {function_name}")
result = compiled_vm[function_name](*device_inputs)
dl.log(f"Invoke complete")
result_tensors = []
if isinstance(result, tuple):
if send_to_host:
for val in result:
dl.log(f"Result to host: {val.shape}")
result_tensors.append(np.asarray(val, val.dtype))
else:
for val in result:
result_tensors.append(val)
return result_tensors
elif isinstance(result, dict):
data = list(result.items())
if send_to_host:
res = np.array(data, dtype=object)
return np.copy(res)
return data
else:
if send_to_host and result is not None:
dl.log("Result to host")
return result.to_host()
return result
dl.log("Execution complete")
@functools.cache
def get_iree_runtime_config(device):
device = iree_device_map(device)
haldriver = ireert.get_driver(device)
if "metal" in device and amdshark_args.device_allocator == "caching":
print(
"[WARNING] metal devices can not have a `caching` allocator."
"\nUsing default allocator `None`"
)
haldevice = haldriver.create_device_by_uri(
device,
# metal devices have a failure with caching allocators atm. blocking this until it gets fixed upstream.
allocators=amdshark_args.device_allocator
if "metal" not in device
else None,
)
config = ireert.Config(device=haldevice)
return config
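A hedged end-to-end sketch of the central entry points above, compiling an MLIR string and invoking it; the model.mlir path, the "forward" entry point, and the input shape are assumptions for illustration:
# Illustrative only: compile an MLIR module string and run it through IREE.
import numpy as np
from amdshark.iree_utils.compile_utils import get_iree_compiled_module, get_results

mlir_str = open("model.mlir").read()  # hypothetical torch-dialect module
compiled = get_iree_compiled_module(
    mlir_str,
    device="cpu",
    frontend="torch",
    compile_str=True,
)
inputs = [np.zeros((1, 4), dtype=np.float32)]  # hypothetical input
outputs = get_results(
    compiled["vmfb"],
    "forward",            # assumed exported function name
    inputs,
    compiled["config"],
)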

View File

@@ -14,31 +14,52 @@
# All the iree_cpu related functionalities go here.
import functools
import subprocess
import platform
from amdshark.parser import amdshark_args
def get_cpu_count():
import multiprocessing
try:
cpu_count = multiprocessing.cpu_count()
return cpu_count
except NotImplementedError:
return None
# Get the default cpu args.
@functools.cache
def get_iree_cpu_args():
find_triple_cmd = "uname -s -m"
os_name, proc_name = (
subprocess.run(
find_triple_cmd, shell=True, stdout=subprocess.PIPE, check=True
)
.stdout.decode("utf-8")
.split()
)
uname = platform.uname()
os_name, proc_name = uname.system, uname.machine
if os_name == "Darwin":
find_kernel_version_cmd = "uname -r"
kernel_version = subprocess.run(
find_kernel_version_cmd,
shell=True,
stdout=subprocess.PIPE,
check=True,
).stdout.decode("utf-8")
kernel_version = uname.release
target_triple = f"{proc_name}-apple-darwin{kernel_version}"
elif os_name == "Linux":
target_triple = f"{proc_name}-linux-gnu"
elif os_name == "Windows":
target_triple = "x86_64-pc-windows-msvc"
else:
error_message = f"OS Type f{os_name} not supported and triple can't be determined, open issue to dSHARK team please :)"
error_message = f"OS Type f{os_name} not supported and triple can't be determined, open issue to dAMDSHARK team please :)"
raise Exception(error_message)
print(f"Target triple found:{target_triple}")
return [f"-iree-llvm-target-triple={target_triple}"]
return [
f"--iree-llvmcpu-target-triple={target_triple}",
]
# Get iree runtime flags for cpu
@functools.cache
def get_iree_cpu_rt_args():
default = get_cpu_count()
default = default if default <= 8 else default - 2
cpu_count = (
default
if amdshark_args.task_topology_max_group_count is None
else amdshark_args.task_topology_max_group_count
)
return [f"--task_topology_max_group_count={cpu_count}"]

View File

@@ -0,0 +1,209 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# All the iree_gpu related functionalities go here.
import functools
import iree.runtime as ireert
import ctypes
import sys
from subprocess import CalledProcessError
from amdshark.parser import amdshark_args
from amdshark.iree_utils._common import run_cmd
# TODO: refactor to rocm and cuda utils
# Get the default gpu args given the architecture.
@functools.cache
def get_iree_gpu_args():
ireert.flags.FUNCTION_INPUT_VALIDATION = False
ireert.flags.parse_flags("--cuda_allow_inline_execution")
# TODO: Give the user_interface to pass the sm_arch.
sm_arch = get_cuda_sm_cc()
if (
sm_arch
in ["sm_70", "sm_72", "sm_75", "sm_80", "sm_84", "sm_86", "sm_89"]
) and (amdshark_args.enable_tf32 == True):
return [
f"--iree-hal-cuda-llvm-target-arch={sm_arch}",
]
else:
return []
def check_rocm_device_arch_in_args(extra_args):
# Check if the target arch flag for rocm device present in extra_args
for flag in extra_args:
if "iree-rocm-target-chip" in flag:
flag_arch = flag.split("=")[1]
return flag_arch
return None
def get_rocm_device_arch(device_num=0, extra_args=[], hip_driver=False):
# ROCM Device Arch selection:
# 1 : User given device arch using `--iree-rocm-target-chip` flag
# 2 : Device arch from `iree-run-module --dump_devices=rocm` for device on index <device_num>
# 3 : default arch : gfx1100
arch_in_flag = check_rocm_device_arch_in_args(extra_args)
if arch_in_flag is not None:
print(
f"User Specified rocm target device arch from flag : {arch_in_flag} will be used"
)
return arch_in_flag
arch_in_device_dump = None
# get rocm arch from iree dump devices
def get_devices_info_from_dump(dump, driver):
from os import linesep
if driver == "hip":
dump_clean = list(
filter(
lambda s: "AMD" in s,
dump.split(linesep),
)
)
else:
dump_clean = list(
filter(
lambda s: f"--device={driver}" in s or "gpu-arch-name:" in s,
dump.split(linesep),
)
)
arch_pairs = [
(
dump_clean[i].split("=")[1].strip(),
dump_clean[i + 1].split(":")[1].strip(),
)
for i in range(0, len(dump_clean), 2)
]
return arch_pairs
dump_device_info = None
driver = "hip" if hip_driver else "rocm"
try:
dump_device_info = run_cmd(
"iree-run-module --dump_devices=" + driver, raise_err=True
)
except Exception as e:
print("could not execute `iree-run-module --dump_devices=" + driver + "`")
if dump_device_info is not None:
device_num = 0 if device_num is None else device_num
device_arch_pairs = get_devices_info_from_dump(dump_device_info[0], driver)
if len(device_arch_pairs) > device_num: # can find arch in the list
arch_in_device_dump = device_arch_pairs[device_num][1]
if arch_in_device_dump is not None:
print(f"Found ROCm device arch : {arch_in_device_dump}")
return arch_in_device_dump
default_rocm_arch = "gfx1100"
print(
"Did not find ROCm architecture from `--iree-rocm-target-chip` flag"
"\n or from `iree-run-module --dump_devices` command."
f"\nUsing {default_rocm_arch} as ROCm arch for compilation."
)
return default_rocm_arch
# Get the default gpu args given the architecture.
def get_iree_rocm_args(device_num=0, extra_args=[], hip_driver=False):
ireert.flags.FUNCTION_INPUT_VALIDATION = False
rocm_flags = []
if check_rocm_device_arch_in_args(extra_args) is None:
rocm_arch = get_rocm_device_arch(device_num, extra_args, hip_driver=hip_driver)
rocm_flags.append(f"--iree-rocm-target-chip={rocm_arch}")
return rocm_flags
# Some constants taken from cuda.h
CUDA_SUCCESS = 0
CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16
CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39
CU_DEVICE_ATTRIBUTE_CLOCK_RATE = 13
CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE = 36
@functools.cache
def get_cuda_sm_cc():
libnames = ("libcuda.so", "libcuda.dylib", "nvcuda.dll")
for libname in libnames:
try:
cuda = ctypes.CDLL(libname)
except OSError:
continue
else:
break
else:
raise OSError("could not load any of: " + " ".join(libnames))
nGpus = ctypes.c_int()
name = b" " * 100
cc_major = ctypes.c_int()
cc_minor = ctypes.c_int()
result = ctypes.c_int()
device = ctypes.c_int()
context = ctypes.c_void_p()
error_str = ctypes.c_char_p()
result = cuda.cuInit(0)
if result != CUDA_SUCCESS:
cuda.cuGetErrorString(result, ctypes.byref(error_str))
print(
"cuInit failed with error code %d: %s"
% (result, error_str.value.decode())
)
return 1
result = cuda.cuDeviceGetCount(ctypes.byref(nGpus))
if result != CUDA_SUCCESS:
cuda.cuGetErrorString(result, ctypes.byref(error_str))
print(
"cuDeviceGetCount failed with error code %d: %s"
% (result, error_str.value.decode())
)
return 1
print("Found %d device(s)." % nGpus.value)
for i in range(nGpus.value):
result = cuda.cuDeviceGet(ctypes.byref(device), i)
if result != CUDA_SUCCESS:
cuda.cuGetErrorString(result, ctypes.byref(error_str))
print(
"cuDeviceGet failed with error code %d: %s"
% (result, error_str.value.decode())
)
return 1
print("Device: %d" % i)
if (
cuda.cuDeviceGetName(ctypes.c_char_p(name), len(name), device)
== CUDA_SUCCESS
):
print(" Name: %s" % (name.split(b"\0", 1)[0].decode()))
if (
cuda.cuDeviceComputeCapability(
ctypes.byref(cc_major), ctypes.byref(cc_minor), device
)
== CUDA_SUCCESS
):
print(
" Compute Capability: %d.%d"
% (cc_major.value, cc_minor.value)
)
sm = f"sm_{cc_major.value}{cc_minor.value}"
return sm
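A hedged sketch of consuming the ROCm helpers above: an explicit --iree-rocm-target-chip flag takes precedence, otherwise the arch is read from `iree-run-module --dump_devices`, falling back to gfx1100:
# Illustrative only: the user-supplied chip flag short-circuits device-dump detection.
from amdshark.iree_utils.gpu_utils import get_iree_rocm_args

flags = get_iree_rocm_args(device_num=0, extra_args=["--iree-rocm-target-chip=gfx90a"])
print(flags)  # -> [] (the chip flag already lives in extra_args, so nothing is added)

flags = get_iree_rocm_args(device_num=0)
print(flags)  # e.g. ["--iree-rocm-target-chip=gfx1100"] when no device dump is available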

View File

@@ -0,0 +1,102 @@
# Copyright 2023 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# All the iree_metal related functionalities go here.
import functools
from amdshark.iree_utils._common import run_cmd
import iree.runtime as ireert
from sys import platform
from amdshark.iree_utils.vulkan_target_env_utils import get_vulkan_target_env_flag
@functools.cache
def get_metal_device_name(device_num=0):
iree_device_dump = run_cmd("iree-run-module --dump_devices")
iree_device_dump = iree_device_dump[0].split("\n\n")
metal_device_list = [
s.split("\n#")[2] for s in iree_device_dump if "--device=metal" in s
]
if len(metal_device_list) == 0:
raise ValueError("No device name found in device dump!")
if len(metal_device_list) > 1:
print("Following devices found:")
for i, dname in enumerate(metal_device_list):
print(f"{i}. {dname}")
print(f"Choosing device: {metal_device_list[device_num]}")
return metal_device_list[device_num]
def get_os_name():
if platform.startswith("linux"):
return "linux"
elif platform == "darwin":
return "macos"
elif platform == "win32":
return "windows"
else:
print("Cannot detect OS type, defaulting to linux.")
return "linux"
def get_metal_target_triple(device_name):
"""This method provides a target triple str for specified vulkan device.
Args:
device_name (str): name of the hardware device to be used with vulkan
Returns:
str or None: target triple or None if no match found for given name
"""
return "macos"
def get_metal_triple_flag(device_name="", device_num=0, extra_args=[]):
for flag in extra_args:
if "-iree-metal-target-platform=" in flag:
print(f"Using target triple {flag.split('=')[1]}")
return None
if device_name == "" or device_name == [] or device_name is None:
metal_device = get_metal_device_name(device_num=device_num)
else:
metal_device = device_name
triple = get_metal_target_triple(metal_device)
if triple is not None:
print(
f"Found metal device {metal_device}. Using metal target platform {triple}"
)
return f"-iree-metal-target-platform={triple}"
print(
"""Optimized kernel for your target device is not added yet.
Contact AMDSHARK Admin on discord[https://discord.com/invite/RUqY2h2s9u]
or file an issue."""
)
print(f"Target : {metal_device}")
return None
def get_iree_metal_args(device_num=0, extra_args=[]):
# Add any metal specific compilation flags here
res_metal_flag = []
if len(extra_args) > 0:
res_metal_flag.extend(extra_args)
return res_metal_flag
def set_iree_metal_runtime_flags(flags):
for flag in flags:
ireert.flags.parse_flags(flag)
return
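A hedged usage sketch for the metal helpers above, assuming a macOS host with a metal-capable IREE runtime; the triple flag is probed from `iree-run-module --dump_devices` when no device name is given:
# Illustrative only: resolve the metal target-platform flag for compilation.
from amdshark.iree_utils.metal_utils import get_metal_triple_flag, get_iree_metal_args

triple_flag = get_metal_triple_flag(device_num=0)
extra = get_iree_metal_args(extra_args=[triple_flag] if triple_flag else [])
print(extra)  # e.g. ["-iree-metal-target-platform=macos"]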

View File

@@ -0,0 +1,76 @@
# Copyright 2023 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List, Tuple
import os
import threading
import time
def _enable_detail_trace() -> bool:
return os.getenv("AMDSHARK_DETAIL_TRACE", "0") == "1"
class DetailLogger:
"""Context manager which can accumulate detailed log messages.
Detailed log is only emitted if the operation takes a long time
or errors.
"""
def __init__(self, timeout: float):
self._timeout = timeout
self._messages: List[Tuple[float, str]] = []
self._start_time = time.time()
self._active = not _enable_detail_trace()
self._lock = threading.RLock()
self._cond = threading.Condition(self._lock)
self._thread = None
def __enter__(self):
self._thread = threading.Thread(target=self._run)
self._thread.start()
return self
def __exit__(self, type, value, traceback):
with self._lock:
self._active = False
self._cond.notify()
if traceback:
self.dump_on_error(f"exception")
def _run(self):
with self._lock:
timed_out = not self._cond.wait(self._timeout)
if timed_out:
self.dump_on_error(f"took longer than {self._timeout}s")
def log(self, msg):
with self._lock:
timestamp = time.time()
if self._active:
self._messages.append((timestamp, msg))
else:
print(f" +{(timestamp - self._start_time) * 1000}ms: {msg}")
def dump_on_error(self, summary: str):
with self._lock:
if self._active:
print(f"::: Detailed report ({summary}):")
for timestamp, msg in self._messages:
print(
f" +{(timestamp - self._start_time) * 1000}ms: {msg}"
)
self._active = False
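A hedged usage sketch of the context manager above: messages stay buffered and are only printed if the body exceeds the timeout or raises (or immediately when AMDSHARK_DETAIL_TRACE=1); the timeout value here is an arbitrary example:
# Illustrative only: buffered tracing that surfaces detail on slowness or errors.
import time
from amdshark.iree_utils.trace import DetailLogger

with DetailLogger(timeout=0.5) as dl:
    dl.log("loading module")  # buffered, not printed yet
    time.sleep(1.0)           # exceeds the timeout, so the buffered messages are dumped
    dl.log("module loaded")   # after the dump, further messages print immediately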

View File

@@ -0,0 +1,538 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from collections import OrderedDict
import functools
@functools.cache
def get_vulkan_target_env(vulkan_target_triple):
arch, product, os = vulkan_target_triple.split("=")[1].split("-")
triple = (arch, product, os)
# get version
version = get_version(triple=triple)
# TODO get revision
revision = 120
# extensions
extensions = get_extensions(triple)
# get vendor
vendor = get_vendor(triple)
# get device type
device_type = get_device_type(triple)
# get capabilities
capabilities = get_vulkan_target_capabilities(triple)
target_env = f"<#spirv.vce<{version}, r({revision}), {extensions}>, {vendor}:{device_type}, #spirv.resource_limits< {capabilities} >>"
return target_env
def get_vulkan_target_env_flag(vulkan_target_triple):
target_env = get_vulkan_target_env(vulkan_target_triple)
target_env_flag = f"--iree-vulkan-target-env={target_env}"
return target_env_flag
def get_version(triple):
arch, product, os = triple
if os in ["android30", "android31"]:
return "v1.1"
if product in ["android30", "android31"]:
return "v1.1"
if arch in ["unknown"]:
return "v1.1"
return "v1.3"
@functools.cache
def get_extensions(triple):
def make_ext_list(ext_list):
res = ", ".join(ext_list)
return f"[{res}]"
arch, product, os = triple
if arch == "m1":
ext = [
"SPV_KHR_16bit_storage",
"SPV_KHR_8bit_storage",
"SPV_KHR_shader_float16_int8",
"SPV_KHR_storage_buffer_storage_class",
"SPV_KHR_variable_pointers",
]
return make_ext_list(ext_list=ext)
if arch == "valhall":
ext = [
"SPV_KHR_16bit_storage",
"SPV_KHR_8bit_storage",
"SPV_KHR_shader_float16_int8",
"SPV_KHR_spirv_1_4",
"SPV_KHR_storage_buffer_storage_class",
"SPV_KHR_variable_pointers",
]
return make_ext_list(ext_list=ext)
if arch == "adreno":
ext = [
"SPV_KHR_16bit_storage",
"SPV_KHR_shader_float16_int8",
"SPV_KHR_spirv_1_4",
"SPV_KHR_storage_buffer_storage_class",
"SPV_KHR_variable_pointers",
]
if os == "android31":
ext.append("SPV_KHR_8bit_storage")
return make_ext_list(ext_list=ext)
if get_vendor(triple) == "SwiftShader":
ext = ["SPV_KHR_storage_buffer_storage_class"]
return make_ext_list(ext_list=ext)
if arch == "unknown":
ext = [
"SPV_KHR_storage_buffer_storage_class",
"SPV_KHR_variable_pointers",
]
return make_ext_list(ext_list=ext)
ext = [
"SPV_KHR_16bit_storage",
"SPV_KHR_8bit_storage",
"SPV_KHR_shader_float16_int8",
"SPV_KHR_spirv_1_4",
"SPV_KHR_storage_buffer_storage_class",
"SPV_KHR_variable_pointers",
"VK_EXT_subgroup_size_control",
]
if get_vendor(triple) == "NVIDIA" or arch == "rdna3":
ext.append("SPV_KHR_cooperative_matrix")
if get_vendor(triple) in ["NVIDIA", "AMD", "Intel"]:
ext.append("SPV_KHR_shader_integer_dot_product")
return make_ext_list(ext_list=ext)
@functools.cache
def get_vendor(triple):
arch, product, os = triple
if arch == "unknown":
return "Unknown"
if arch in ["rdna1", "rdna2", "rdna3", "rgcn3", "rgcn4", "rgcn5"]:
return "AMD"
if arch == "valhall":
return "ARM"
if arch == "m1":
return "Apple"
if arch in ["arc", "UHD"]:
return "Intel"
if arch in ["turing", "ampere", "pascal"]:
return "NVIDIA"
if arch == "adreno":
return "Qualcomm"
if arch == "cpu":
if product == "swiftshader":
return "SwiftShader"
return "Unknown"
print(f"Vendor for target triple - {triple} not found. Using unknown")
return "Unknown"
@functools.cache
def get_device_type(triple):
arch, product, _ = triple
if arch == "unknown":
return "Unknown"
if arch == "cpu":
return "CPU"
if arch in ["turing", "ampere", "arc", "pascal"]:
return "DiscreteGPU"
if arch in ["rdna1", "rdna2", "rdna3", "rgcn3", "rgcn5"]:
if product == "ivega10":
return "IntegratedGPU"
return "DiscreteGPU"
if arch in ["m1", "valhall", "adreno"]:
return "IntegratedGPU"
print(f"Device type for target triple - {triple} not found. Using unknown")
return "Unknown"
# get all the capabilities for the device
# TODO: make a dataclass for capabilities and init using vulkaninfo
@functools.cache
def get_vulkan_target_capabilities(triple):
def get_subgroup_val(l):
return int(sum([subgroup_feature[sgf] for sgf in l]))
cap = OrderedDict()
arch, product, os = triple
subgroup_feature = {
"Basic": 1,
"Vote": 2,
"Arithmetic": 4,
"Ballot": 8,
"Shuffle": 16,
"ShuffleRelative": 32,
"Clustered": 64,
"Quad": 128,
"PartitionedNV": 256,
}
cap["max_compute_shared_memory_size"] = 16384
cap["max_compute_workgroup_invocations"] = 128
cap["max_compute_workgroup_size"] = [128, 128, 64]
cap["subgroup_size"] = 32
cap["subgroupFeatures"] = ["Basic"]
cap["min_subgroup_size"] = None
cap["max_subgroup_size"] = None
cap["shaderFloat16"] = False
cap["shaderFloat64"] = False
cap["shaderInt8"] = False
cap["shaderInt16"] = False
cap["shaderInt64"] = False
cap["storageBuffer16BitAccess"] = False
cap["storagePushConstant16"] = False
cap["uniformAndStorageBuffer16BitAccess"] = False
cap["storageBuffer8BitAccess"] = False
cap["storagePushConstant8"] = False
cap["uniformAndStorageBuffer8BitAccess"] = False
cap["variablePointers"] = False
cap["variablePointersStorageBuffer"] = False
cap["coopmatCases"] = None
if arch in ["rdna1", "rdna2", "rdna3"]:
cap["max_compute_shared_memory_size"] = 65536
cap["max_compute_workgroup_invocations"] = 1024
cap["max_compute_workgroup_size"] = [1024, 1024, 1024]
cap["subgroup_size"] = 64
cap["min_subgroup_size"] = 32
cap["max_subgroup_size"] = 64
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Clustered",
"Quad",
]
cap["shaderFloat16"] = True
cap["shaderFloat64"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["shaderInt64"] = True
cap["shaderIntegerDotProduct"] = True
cap["storageBuffer16BitAccess"] = True
cap["storagePushConstant16"] = True
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = True
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
if arch == "rdna3":
# TODO: Get scope value
cap["coopmatCases"] = [
"m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f16, result_type = f16, acc_sat = false, scope = <Subgroup>",
"m_size = 16, n_size = 16, k_size = 16, a_type = f16, b_type = f16, c_type = f32, result_type = f32, acc_sat = false, scope = <Subgroup>"
]
if product == "rx5700xt":
cap["storagePushConstant16"] = False
cap["storagePushConstant8"] = False
elif arch in ["rgcn5", "rgcn4", "rgcn3"]:
cap["max_compute_shared_memory_size"] = 65536
cap["max_compute_workgroup_invocations"] = 1024
cap["max_compute_workgroup_size"] = [1024, 1024, 1024]
cap["subgroup_size"] = 64
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Clustered",
"Quad",
]
cap["min_subgroup_size"] = 64
cap["max_subgroup_size"] = 64
if arch == "rgcn5":
cap["shaderFloat16"] = True
cap["shaderFloat64"] = True
cap["storageBuffer16BitAccess"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["shaderInt64"] = True
cap["shaderIntegerDotProduct"] = True
cap["storagePushConstant16"] = False
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = False
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
elif arch == "m1":
cap["max_compute_shared_memory_size"] = 32768
cap["max_compute_workgroup_invocations"] = 1024
cap["max_compute_workgroup_size"] = [1024, 1024, 1024]
cap["subgroup_size"] = 32
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Quad",
]
cap["shaderFloat16"] = True
cap["shaderFloat64"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["shaderInt64"] = True
cap["shaderIntegerDotProduct"] = False
cap["storageBuffer16BitAccess"] = True
cap["storagePushConstant16"] = True
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = True
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
elif arch == "valhall":
cap["max_compute_shared_memory_size"] = 32768
cap["max_compute_workgroup_invocations"] = 512
cap["max_compute_workgroup_size"] = [512, 512, 512]
cap["subgroup_size"] = 16
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Clustered",
"Quad",
]
if os == "android31":
cap["subgroupFeatures"].append("Shuffle")
cap["subgroupFeatures"].append("ShuffleRelative")
cap["shaderFloat16"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["storageBuffer16BitAccess"] = True
cap["storagePushConstant16"] = True
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = True
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
elif arch == "arc":
cap["max_compute_shared_memory_size"] = 32768
cap["max_compute_workgroup_invocations"] = 1024
cap["max_compute_workgroup_size"] = [1024, 1024, 64]
cap["subgroup_size"] = 32
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Clustered",
"Quad",
]
cap["shaderFloat16"] = True
cap["shaderFloat64"] = False
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["shaderInt64"] = False
cap["shaderIntegerDotProduct"] = True
cap["storageBuffer16BitAccess"] = True
cap["storagePushConstant16"] = True
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = True
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
elif arch == "cpu":
if product == "swiftshader":
cap["max_compute_shared_memory_size"] = 16384
cap["subgroup_size"] = 4
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
]
elif arch in ["pascal"]:
cap["max_compute_shared_memory_size"] = 49152
cap["max_compute_workgroup_invocations"] = 1536
cap["max_compute_workgroup_size"] = [1536, 1024, 64]
cap["subgroup_size"] = 32
cap["min_subgroup_size"] = 32
cap["max_subgroup_size"] = 32
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Clustered",
"Quad",
]
cap["shaderFloat16"] = False
cap["shaderFloat64"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["shaderInt64"] = True
cap["shaderIntegerDotProduct"] = True
cap["storageBuffer16BitAccess"] = True
cap["storagePushConstant16"] = True
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = True
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
elif arch in ["ampere", "turing"]:
cap["max_compute_shared_memory_size"] = 49152
cap["max_compute_workgroup_invocations"] = 1024
cap["max_compute_workgroup_size"] = [1024, 1024, 1024]
cap["subgroup_size"] = 32
cap["min_subgroup_size"] = 32
cap["max_subgroup_size"] = 32
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Clustered",
"Quad",
]
cap["shaderFloat16"] = True
cap["shaderFloat64"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["shaderInt64"] = True
cap["shaderIntegerDotProduct"] = True
cap["storageBuffer16BitAccess"] = True
cap["storagePushConstant16"] = True
cap["uniformAndStorageBuffer16BitAccess"] = True
cap["storageBuffer8BitAccess"] = True
cap["storagePushConstant8"] = True
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
cap["coopmatCases"] = [
"mSize = 8, nSize = 8, kSize = 32, aType = i8, bType = i8, cType = i32, resultType = i32, accSat = false, scope = #vk.scope<Subgroup>",
"mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f16, resultType = f16, accSat = false, scope = #vk.scope<Subgroup>",
"mSize = 16, nSize = 16, kSize = 16, aType = f16, bType = f16, cType = f32, resultType = f32, accSat = false, scope = #vk.scope<Subgroup>",
]
elif arch == "adreno":
cap["max_compute_shared_memory_size"] = 32768
cap["max_compute_workgroup_invocations"] = 1024
cap["max_compute_workgroup_size"] = [1024, 1024, 64]
cap["subgroup_size"] = 64
cap["subgroupFeatures"] = [
"Basic",
"Vote",
"Arithmetic",
"Ballot",
"Shuffle",
"ShuffleRelative",
"Quad",
]
cap["shaderFloat16"] = True
cap["shaderInt8"] = True
cap["shaderInt16"] = True
cap["storageBuffer16BitAccess"] = True
if os == "android31":
cap["uniformAndStorageBuffer8BitAccess"] = True
cap["variablePointers"] = True
cap["variablePointersStorageBuffer"] = True
elif arch == "unknown":
cap["subgroup_size"] = 64
cap["variablePointers"] = False
cap["variablePointersStorageBuffer"] = False
else:
print(
f"Architecture {arch} not matched. Using default vulkan target device capability"
)
def get_comma_sep_str(ele_list):
l = ""
for ele in ele_list:
l += f"{ele}, "
l = f"[{l[:-2]}]"
return l
res = ""
for k, v in cap.items():
if v is None or v == False:
continue
if isinstance(v, bool):
res += f"{k} = {'unit' if v == True else None}, "
elif isinstance(v, list):
if k == "subgroupFeatures":
res += f"subgroup_features = {get_subgroup_val(v)}: i32, "
elif k == "max_compute_workgroup_size":
res += f"max_compute_workgroup_size = dense<{get_comma_sep_str(v)}>: vector<{len(v)}xi32>, "
elif k == "coopmatCases":
cmc = ""
for case in v:
cmc += f"#spirv.coop_matrix_props_khr<{case}>, "
res += f"cooperative_matrix_properties_khr = [{cmc[:-2]}], "
else:
res += f"{k} = {get_comma_sep_str(v)}, "
else:
res += f"{k} = {v}, "
res = res[:-2]
return res
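# A sketch of the kind of attribute string the loop above produces (values are
# illustrative, and the subgroup_features integer comes from get_subgroup_val):
# "max_compute_shared_memory_size = 32768, subgroup_size = 32,
#  subgroup_features = 255: i32, shaderFloat16 = unit,
#  max_compute_workgroup_size = dense<[1024, 1024, 64]>: vector<3xi32>"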

View File

@@ -0,0 +1,221 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# All the iree_vulkan related functionalities go here.
import functools
from os import linesep
from amdshark.iree_utils._common import run_cmd
import iree.runtime as ireert
from sys import platform
from amdshark.iree_utils.vulkan_target_env_utils import get_vulkan_target_env_flag
from amdshark.parser import amdshark_args
@functools.cache
def get_all_vulkan_devices():
from iree.runtime import get_driver
try:
driver = get_driver("vulkan")
device_list_src = driver.query_available_devices()
except Exception:
device_list_src = []
return [d["name"] for d in device_list_src]
@functools.cache
def get_vulkan_device_name(device_num=0):
if isinstance(device_num, int):
vulkaninfo_list = get_all_vulkan_devices()
if len(vulkaninfo_list) == 0:
raise ValueError("No device name found in VulkanInfo!")
if len(vulkaninfo_list) > 1:
print("Following devices found:")
for i, dname in enumerate(vulkaninfo_list):
print(f"{i}. {dname}")
print(f"Choosing device: vulkan://{device_num}")
vulkan_device_name = vulkaninfo_list[device_num]
else:
from iree.runtime import get_driver
vulkan_device_driver = get_driver(device_num)
vulkan_device_name = vulkan_device_driver.query_available_devices()[0]
print(vulkan_device_name)
return vulkan_device_name
def get_os_name():
if platform.startswith("linux"):
return "linux"
elif platform == "darwin":
return "macos"
elif platform == "win32":
return "windows"
else:
print("Cannot detect OS type, defaulting to linux.")
return "linux"
@functools.cache
def get_vulkan_target_triple(device_name):
"""This method provides a target triple str for specified vulkan device.
Args:
device_name (str): name of the hardware device to be used with vulkan
Returns:
str or None: target triple or None if no match found for given name
"""
# TODO: Replace this with a dict or something smarter.
system_os = get_os_name()
# Apple Targets
if all(x in device_name for x in ("Apple", "M1")):
triple = "m1-moltenvk-macos"
elif all(x in device_name for x in ("Apple", "M2")):
triple = "m1-moltenvk-macos"
# Nvidia Targets
elif all(x in device_name for x in ("RTX", "2080")):
triple = f"turing-rtx2080-{system_os}"
elif all(x in device_name for x in ("A100", "SXM4")):
triple = f"ampere-a100-{system_os}"
elif all(x in device_name for x in ("RTX", "3090")):
triple = f"ampere-rtx3090-{system_os}"
elif all(x in device_name for x in ("RTX", "3080")):
triple = f"ampere-rtx3080-{system_os}"
elif all(x in device_name for x in ("RTX", "3070")):
triple = f"ampere-rtx3070-{system_os}"
elif all(x in device_name for x in ("RTX", "3060")):
triple = f"ampere-rtx3060-{system_os}"
elif all(x in device_name for x in ("RTX", "3050")):
triple = f"ampere-rtx3050-{system_os}"
# We use ampere until lovelace target triples are plumbed in.
elif all(x in device_name for x in ("RTX", "4090")):
triple = f"ampere-rtx4090-{system_os}"
elif all(x in device_name for x in ("RTX", "4080")):
triple = f"ampere-rtx4080-{system_os}"
elif all(x in device_name for x in ("RTX", "4070")):
triple = f"ampere-rtx4070-{system_os}"
elif all(x in device_name for x in ("RTX", "4000")):
triple = f"turing-rtx4000-{system_os}"
elif all(x in device_name for x in ("RTX", "5000")):
triple = f"turing-rtx5000-{system_os}"
elif all(x in device_name for x in ("RTX", "6000")):
triple = f"turing-rtx6000-{system_os}"
elif all(x in device_name for x in ("RTX", "8000")):
triple = f"turing-rtx8000-{system_os}"
elif all(x in device_name for x in ("TITAN", "RTX")):
triple = f"turing-titanrtx-{system_os}"
elif all(x in device_name for x in ("GTX", "1060")):
triple = f"pascal-gtx1060-{system_os}"
elif all(x in device_name for x in ("GTX", "1070")):
triple = f"pascal-gtx1070-{system_os}"
elif all(x in device_name for x in ("GTX", "1080")):
triple = f"pascal-gtx1080-{system_os}"
# Amd Targets
# Linux: Radeon RX 7900 XTX
# Windows: AMD Radeon RX 7900 XTX
elif all(x in device_name for x in ("RX", "7800")):
triple = f"rdna3-7800-{system_os}"
elif all(x in device_name for x in ("RX", "7900")):
triple = f"rdna3-7900-{system_os}"
elif all(x in device_name for x in ("Radeon", "780M")):
triple = f"rdna3-780m-{system_os}"
elif all(x in device_name for x in ("AMD", "PRO", "W7900")):
triple = f"rdna3-w7900-{system_os}"
elif any(x in device_name for x in ("AMD", "Radeon")):
triple = f"rdna2-unknown-{system_os}"
# Intel Targets
elif any(x in device_name for x in ("A770", "A750")):
triple = f"arc-770-{system_os}"
elif "v620" in device_name:
triple = f"rdna2-v620-{system_os}"
# Adreno Targets
elif all(x in device_name for x in ("Adreno", "740")):
triple = f"adreno-a740-{system_os}"
else:
triple = None
return triple
def get_vulkan_triple_flag(device_name="", device_num=0, extra_args=[]):
for flag in extra_args:
if "-iree-vulkan-target-triple=" in flag:
print(f"Using target triple {flag.split('=')[1]}")
return None
if device_name == "" or device_name == [] or device_name is None:
vulkan_device = get_vulkan_device_name(device_num=device_num)
else:
vulkan_device = device_name
triple = get_vulkan_target_triple(vulkan_device)
if triple is not None:
print(
f"Found vulkan device {vulkan_device}. Using target triple {triple}"
)
return f"--iree-vulkan-target-triple={triple}"
print(
"""Optimized kernel for your target device is not added yet.
Contact AMDSHARK Admin on discord[https://discord.com/invite/RUqY2h2s9u]
or file an issue."""
)
print(f"Target : {vulkan_device}")
return None
def get_iree_vulkan_args(device_num=0, extra_args=[]):
# res_vulkan_flag = ["--iree-flow-demote-i64-to-i32"]
res_vulkan_flag = []
res_vulkan_flag += [
"--iree-stream-resource-max-allocation-size=3221225472",
"--iree-flow-inline-constants-max-byte-length=0"
]
vulkan_triple_flag = None
for arg in extra_args:
if "-iree-vulkan-target-triple=" in arg:
print(f"Using target triple {arg} from command line args")
vulkan_triple_flag = arg
break
if vulkan_triple_flag is None:
vulkan_triple_flag = get_vulkan_triple_flag(
device_num=device_num, extra_args=extra_args
)
if vulkan_triple_flag is not None:
res_vulkan_flag += [vulkan_triple_flag]
return res_vulkan_flag
@functools.cache
def get_iree_vulkan_runtime_flags():
vulkan_runtime_flags = [
f"--vulkan_validation_layers={'true' if amdshark_args.vulkan_debug_utils else 'false'}",
f"--vulkan_debug_verbosity={'4' if amdshark_args.vulkan_debug_utils else '0'}"
f"--vulkan-robust-buffer-access={'true' if amdshark_args.vulkan_debug_utils else 'false'}",
]
return vulkan_runtime_flags
def set_iree_vulkan_runtime_flags(flags):
for flag in flags:
ireert.flags.parse_flags(flag)
return

View File

@@ -0,0 +1,468 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Usage:
This function takes the model mlir file and the tuned config file as input,
and outputs a new mlir file with lowering configs annotated on certain ops.
There are two ways to utilize the function:
1. Call model_annotation function within another python script
from amdshark.model_annotation import model_annotation
with create_context() as ctx:
module = model_annotation(ctx, input_contents=..., config_path=..., search_op=...)
2. Run model_annotation.py directly
python model_annotation.py -model path_to_original_mlir -config_path path_to_config_file
"""
import json
import os
import sys
from typing import Dict, List
import iree.compiler._mlir_libs
from iree.compiler import ir
def model_annotation(
ctx: ir.Context,
*,
input_contents: str,
config_path: str,
search_op: str,
winograd: bool = False,
):
if os.path.isfile(input_contents):
with open(input_contents, "rb") as f:
input_contents = f.read()
module = ir.Module.parse(input_contents)
if config_path == "":
return module
if winograd:
with open(config_path, "r") as f:
data = json.load(f)
configs = data["c,f"]
else:
configs = load_model_configs(config_path)
# The Python API does not expose a general walk() function, so we just
# do it ourselves.
walk_children(module.operation, configs, search_op, winograd)
if not module.operation.verify():
raise RuntimeError("Modified program does not verify!")
return module
def load_model_configs(config_path: str):
config = {}
with open(config_path, "r") as f:
for line in f:
data = json.loads(line)
if "identifier" not in data.keys():
continue
if data["identifier"] == "matmul":
matrix_size = [data["m"], data["n"], data["k"]]
elif data["identifier"] == "bmm":
matrix_size = [data["b"], data["m"], data["n"], data["k"]]
elif data["identifier"] == "generic":
matrix_size = [1, data["b"], data["m"], data["n"], data["k"]]
elif data["identifier"] == "conv":
matrix_size = [
data["n"],
data["ih"],
data["iw"],
data["c"],
data["kh"],
data["kw"],
data["f"],
data["oh"],
data["ow"],
data["d"],
data["s"],
data["p"],
]
config[shape_list_to_string(matrix_size)] = data
return config
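# A sketch of one line of the JSONL config file this expects for a matmul; the
# field names mirror how they are read here and in add_attributes below, but
# the concrete values are made up for illustration:
#   {"identifier": "matmul", "m": 128, "n": 256, "k": 64,
#    "options": [{"pipeline": "SPIRVMatmulPromoteVectorize",
#                 "work_group_tile_sizes": [32, 32],
#                 "reduction_tile_sizes": [0, 0, 16],
#                 "work_group_sizes": [8, 2, 1]}]}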
def walk_children(
op: ir.Operation, configs: List[Dict], search_op: str, winograd: bool
):
if search_op == "matmul":
op_names = ["linalg.matmul", "mhlo.dot"]
elif search_op == "bmm":
op_names = ["linalg.batch_matmul", "mhlo.dot_general"]
elif search_op == "conv":
op_names = ["mhlo.convolution", "linalg.conv_2d_nhwc_hwcf"]
elif search_op == "generic":
op_names = ["linalg.generic"]
elif search_op == "all":
op_names = [
"mhlo.dot",
"mhlo.dot_general",
"mhlo.convolution",
"linalg.matmul",
"linalg.batch_matmul",
"linalg.conv_2d_nhwc_hwcf",
"linalg.generic",
]
else:
raise ValueError(f"{search_op} op is not tunable.")
for region in op.regions:
for block in region.blocks:
for child_op in block.operations:
# TODO: This is dumb. Both Operation and OpView should expose
# 'operation' and 'name' attributes.
if isinstance(child_op, ir.OpView):
child_op = child_op.operation
if winograd and child_op.name in [
"linalg.conv_2d_nchw_fchw",
"linalg.conv_2d_nhwc_hwcf",
]:
add_winograd_attribute(child_op, configs)
if child_op.name in op_names:
if child_op.name == "linalg.generic":
# This is for generic op that has contractionOpInterface
# which is basically einsum("mk,bkn->bmn")
op_result = str(child_op.results[0])
op_iterator = str(
child_op.attributes["iterator_types"]
)
if len(child_op.operands) != 3:
continue
if "reduction" not in op_iterator:
continue
if (
"arith.addf" not in op_result
or "arith.mulf" not in op_result
):
continue
if "arith.subf" in op_result:
continue
child_op_shape = get_op_shape(child_op, search_op)
if (
child_op_shape in configs.keys()
and configs[child_op_shape]["options"][0] != None
):
add_attributes(
child_op, configs[child_op_shape]["options"][0]
)
walk_children(child_op, configs, search_op, winograd)
def get_op_shape(op: ir.Operation, search_op: str):
shape_list = []
if search_op in ["generic", "all"]:
if op.name in ["linalg.generic"]:
input1 = str(op.operands[0].type)
input2 = str(op.operands[1].type)
m = input1.split("tensor<")[1].split("x")[0]
b = input2.split("tensor<")[1].split("x")[0]
k = input2.split("tensor<")[1].split("x")[1]
n = input2.split("tensor<")[1].split("x")[2]
shape_list = [1, int(b), int(m), int(n), int(k)]
if search_op in ["matmul", "all"]:
if op.name in ["mhlo.dot"]:
op_result = str(op.results[0])
m = op_result.split("tensor<")[1].split("x")[0]
k = op_result.split("tensor<")[1].split("x")[1]
n = op_result.split("tensor<")[2].split("x")[1]
shape_list = [int(m), int(n), int(k)]
elif op.name in ["linalg.matmul"]:
op_result = str(op.results[0]).split("ins(")[1]
m = op_result.split("tensor<")[1].split("x")[0]
k = op_result.split("tensor<")[1].split("x")[1]
n = op_result.split("tensor<")[2].split("x")[1]
shape_list = [int(m), int(n), int(k)]
if search_op in ["bmm", "all"]:
if op.name in ["mhlo.dot_general"]:
op_result = str(op.results[0])
b = op_result.split("tensor<")[1].split("x")[1]
m = op_result.split("tensor<")[1].split("x")[2]
k = op_result.split("tensor<")[1].split("x")[3]
n = op_result.split("tensor<")[3].split("x")[3]
shape_list = [int(b), int(m), int(n), int(k)]
elif op.name in ["linalg.batch_matmul"]:
op_result = str(op.results[0]).split("ins(")[1]
b = op_result.split("tensor<")[1].split("x")[0]
m = op_result.split("tensor<")[1].split("x")[1]
k = op_result.split("tensor<")[1].split("x")[2]
n = op_result.split("tensor<")[3].split("x")[2]
shape_list = [int(b), int(m), int(n), int(k)]
if search_op in ["conv", "all"]:
if op.name in ["mhlo.convolution"]:
op_result = str(op.results[0])
dilation = (
str(op.attributes["rhs_dilation"])
.split("dense<")[1]
.split(">")[0]
)
stride = (
str(op.attributes["window_strides"])
.split("dense<")[1]
.split(">")[0]
)
pad = (
str(op.attributes["padding"]).split("dense<")[1].split(">")[0]
)
n = op_result.split("tensor<")[1].split("x")[0]
ih = op_result.split("tensor<")[1].split("x")[1]
iw = op_result.split("tensor<")[1].split("x")[2]
c = op_result.split("tensor<")[1].split("x")[3]
kh = op_result.split("tensor<")[2].split("x")[0]
kw = op_result.split("tensor<")[2].split("x")[1]
f = op_result.split("tensor<")[2].split("x")[3]
oh = op_result.split("tensor<")[3].split("x")[1]
ow = op_result.split("tensor<")[3].split("x")[2]
shape_list = [
int(n),
int(ih),
int(iw),
int(c),
int(kh),
int(kw),
int(f),
int(oh),
int(ow),
int(dilation),
int(stride),
int(pad),
]
elif op.name in ["linalg.conv_2d_nhwc_hwcf"]:
op_result = str(op.results[0]).split("ins(")[1]
dilation = (
str(op.attributes["dilations"])
.split("dense<")[1]
.split(">")[0]
)
stride = (
str(op.attributes["strides"]).split("dense<")[1].split(">")[0]
)
pad = 0
n = op_result.split("tensor<")[1].split("x")[0]
ih = op_result.split("tensor<")[1].split("x")[1]
iw = op_result.split("tensor<")[1].split("x")[2]
c = op_result.split("tensor<")[1].split("x")[3]
kh = op_result.split("tensor<")[2].split("x")[0]
kw = op_result.split("tensor<")[2].split("x")[1]
f = op_result.split("tensor<")[2].split("x")[3]
oh = op_result.split("tensor<")[3].split("x")[1]
ow = op_result.split("tensor<")[3].split("x")[2]
shape_list = [
int(n),
int(ih),
int(iw),
int(c),
int(kh),
int(kw),
int(f),
int(oh),
int(ow),
int(dilation),
int(stride),
int(pad),
]
shape_str = shape_list_to_string(shape_list)
return shape_str
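# For example (illustrative shapes), a 128x256x64 linalg.matmul produces the
# key "128x256x64", which is what gets looked up in the config dictionary
# built by load_model_configs above.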
def add_attributes(op: ir.Operation, config: List[Dict]):
# Parse the config file
split_k = None
pipeline_depth = None
store_stage = None
subgroup_size = None
if "GPU" in config["pipeline"]:
pipeline = (
"LLVMGPUMatmulSimt"
if config["pipeline"] == "GPU"
else "LLVMGPUMatmulTensorCore"
)
tile_sizes = [config["work_group_tile_sizes"]]
workgroup_size = config["work_group_sizes"]
if "pipeline_depth" in config.keys():
pipeline_depth = config["pipeline_depth"]
if "split_k" in config.keys():
split_k = config["split_k"]
elif "SPIRV" in config["pipeline"]:
pipeline = config["pipeline"]
if pipeline == "SPIRVMatmulPromoteVectorize":
tile_sizes = [
config["work_group_tile_sizes"]
+ [config["reduction_tile_sizes"][-1]],
]
else:
tile_sizes = [
config["work_group_tile_sizes"],
config["parallel_tile_sizes"],
config["reduction_tile_sizes"],
]
workgroup_size = config["work_group_sizes"]
if "vector_tile_sizes" in config.keys():
tile_sizes += [config["vector_tile_sizes"]]
if "window_tile_sizes" in config.keys():
tile_sizes += [config["window_tile_sizes"]]
if "subgroup_size" in config.keys():
subgroup_size = config["subgroup_size"]
if "pipeline_depth" in config.keys():
pipeline_depth = config["pipeline_depth"]
if "store_stage" in config.keys():
store_stage = config["store_stage"]
else:
# For IREE CPU pipelines
pipeline = config["pipeline"]
tile_sizes = [
config["work_group_tile_sizes"],
config["parallel_tile_sizes"],
config["reduction_tile_sizes"],
]
workgroup_size = []
# Add compilation info as an attribute. We don't have a Python binding for CompilationInfo,
# so we just parse its string form.
if pipeline_depth != None:
translation_info = f"{pipeline} pipeline_depth = {pipeline_depth}"
if store_stage != None:
translation_info += f" store_stage = {store_stage}"
else:
translation_info = f"{pipeline}"
compilation_info = (
f"#iree_codegen.compilation_info<"
f"lowering_config = <tile_sizes = {repr(tile_sizes)}>, "
f"translation_info = <{translation_info}>, "
f"workgroup_size = {repr(workgroup_size)} "
)
if subgroup_size != None:
compilation_info += f", subgroup_size = {subgroup_size}>"
else:
compilation_info += ">"
attr = ir.Attribute.parse(compilation_info)
op.attributes["compilation_info"] = attr
# Add other attributes if required.
if split_k:
add_attribute_by_name(op, "iree_flow_split_k", split_k)
def add_winograd_attribute(op: ir.Operation, config: List):
op_result = str(op.results[0]).split("ins(")[1]
dilation = int(
str(op.attributes["dilations"]).split("dense<")[1].split(">")[0]
)
stride = int(
str(op.attributes["strides"]).split("dense<")[1].split(">")[0]
)
if op.name == "linalg.conv_2d_nchw_fchw":
f = int(op_result.split("tensor<")[2].split("x")[0])
c = int(op_result.split("tensor<")[2].split("x")[1])
kh = int(op_result.split("tensor<")[2].split("x")[2])
kw = int(op_result.split("tensor<")[2].split("x")[3])
else:
kh = int(op_result.split("tensor<")[2].split("x")[0])
kw = int(op_result.split("tensor<")[2].split("x")[1])
c = int(op_result.split("tensor<")[2].split("x")[2])
f = int(op_result.split("tensor<")[2].split("x")[3])
if (
dilation == 1
and stride == 1
and kh == 3
and kw == 3
and [c, f] in config
):
op.attributes["iree_winograd_conv"] = ir.IntegerAttr.get(
ir.IntegerType.get_signless(64), 1
)
def add_attribute_by_name(op: ir.Operation, name: str, val: int):
attr = ir.IntegerAttr.get(ir.IntegerType.get_signless(64), val)
op.attributes[name] = attr
def shape_list_to_string(input):
return "x".join([str(d) for d in input])
def create_context() -> ir.Context:
context = ir.Context()
context.allow_unregistered_dialects = True
return context
if __name__ == "__main__":
import argparse
from pathlib import Path
def path_expand(s):
return Path(s).expanduser().resolve()
parser = argparse.ArgumentParser()
parser.add_argument(
"-model",
type=path_expand,
default="model.mlir",
help="Path to the input mlir file",
)
parser.add_argument(
"-config_path",
type=path_expand,
default="best_configs.json",
help="Path where stores the op config file",
)
parser.add_argument(
"-output_path",
type=path_expand,
default="tuned_model.mlir",
help="Path to save the annotated mlir file",
)
parser.add_argument(
"-search_op",
type=str,
default="all",
help="Op to be optimized. options are matmul, bmm, conv.",
)
args = parser.parse_args()
with create_context() as ctx:
module = model_annotation(
ctx,
input_contents=args.model,
config_path=args.config_path,
search_op=args.search_op,
)
mlir_str = str(module)
with open(args.output_path, "w") as f:
f.write(mlir_str)
print(f"Saved mlir in {args.output_path}.")

170
amdshark/parser.py Normal file
View File

@@ -0,0 +1,170 @@
# Copyright 2020 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import shlex
import subprocess
class SplitStrToListAction(argparse.Action):
def __init__(self, option_strings, dest, *args, **kwargs):
super(SplitStrToListAction, self).__init__(
option_strings=option_strings, dest=dest, *args, **kwargs
)
def __call__(self, parser, namespace, values, option_string=None):
del parser, option_string
setattr(namespace, self.dest, shlex.split(values[0] if isinstance(values, list) else values))
parser = argparse.ArgumentParser(description="AMDSHARK runner.")
parser.add_argument(
"--device",
type=str,
default="cpu",
help="Device on which amdshark_runner runs. options are cpu, cuda, and vulkan",
)
parser.add_argument(
"--additional_compile_args",
default=list(),
nargs=1,
action=SplitStrToListAction,
help="Additional arguments to pass to the compiler. These are appended as the last arguments.",
)
parser.add_argument(
"--additional_runtime_args",
default=list(),
nargs=1,
action=SplitStrToListAction,
help="Additional arguments to pass to the IREE runtime. These are appended as the last arguments.",
)
parser.add_argument(
"--enable_tf32",
type=bool,
default=False,
help="Enables TF32 precision calculations on supported GPUs.",
)
parser.add_argument(
"--model_config_path",
help="Directory to where the tuned model config file is located.",
default=None,
)
parser.add_argument(
"--num_warmup_iterations",
type=int,
default=5,
help="Run the model for the specified number of warmup iterations.",
)
parser.add_argument(
"--num_iterations",
type=int,
default=100,
help="Run the model for the specified number of iterations.",
)
parser.add_argument(
"--onnx_bench",
default=False,
action="store_true",
help="When enabled, pytest bench results will include ONNX benchmark results.",
)
parser.add_argument(
"--amdshark_prefix",
default=None,
help="gs://amdshark_tank/<this_flag>/model_directories",
)
parser.add_argument(
"--update_tank",
default=True,
action="store_true",
help="When enabled, AMDSHARK downloader will update local amdshark_tank if local hash is different from latest upstream hash.",
)
parser.add_argument(
"--force_update_tank",
default=False,
action="store_true",
help="When enabled, AMDSHARK downloader will force an update of local amdshark_tank artifacts for each request.",
)
parser.add_argument(
"--local_tank_cache",
default=None,
help="Specify where to save downloaded amdshark_tank artifacts. If this is not set, the default is ~/.local/amdshark_tank/.",
)
parser.add_argument(
"--dispatch_benchmarks",
default=None,
help='Dispatches to return benchmark data on. Use "All" for all, and None for none.',
)
parser.add_argument(
"--dispatch_benchmarks_dir",
default="temp_dispatch_benchmarks",
help='directory where you want to store dispatch data generated with "--dispatch_benchmarks"',
)
parser.add_argument(
"--enable_conv_transform",
default=False,
action="store_true",
help="Enables the --iree-flow-enable-conv-nchw-to-nhwc-transform flag.",
)
parser.add_argument(
"--enable_img2col_transform",
default=False,
action="store_true",
help="Enables the --iree-flow-enable-conv-img2col-transform flag.",
)
parser.add_argument(
"--use_winograd",
default=False,
action="store_true",
help="Enables the --iree-flow-enable-conv-winograd-transform flag.",
)
parser.add_argument(
"--device_allocator",
type=str,
nargs="*",
default=["caching"],
help="Specifies one or more HAL device allocator specs "
"to augment the base device allocator",
choices=["debug", "caching"],
)
parser.add_argument(
"--task_topology_max_group_count",
type=str,
default=None,
help="passthrough flag for the iree flag of the same name. If None, defaults to cpu-count",
)
parser.add_argument(
"--vulkan_debug_utils",
default=False,
action=argparse.BooleanOptionalAction,
help="Profiles vulkan device and collects the .rdc info.",
)
parser.add_argument(
"--vulkan_validation_layers",
default=False,
action=argparse.BooleanOptionalAction,
help="Flag for disabling vulkan validation layers when benchmarking.",
)
amdshark_args, unknown = parser.parse_known_args()

315
amdshark/stress_test.py Normal file
View File

@@ -0,0 +1,315 @@
# Copyright 2022 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from iree.runtime import query_available_drivers, get_driver
from amdshark.amdshark_downloader import download_model
from amdshark.amdshark_inference import AMDSharkInference
from typing import List, Optional, Tuple
import numpy as np
import argparse
from amdshark.iree_utils._common import _IREE_DEVICE_MAP
import multiprocessing
from amdshark.amdshark_runner import supported_dialects
import logging
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.thread import ThreadPoolExecutor
import time
import numpy as np
IREE_TO_AMDSHARK_DRIVER_MAP = {v: k for k, v in _IREE_DEVICE_MAP.items()}
def stress_test_compiled_model(
amdshark_module_path: str,
function_name: str,
device: str,
inputs: List[np.ndarray],
golden_out: List[np.ndarray],
batch_size: int,
max_iterations: int,
max_duration_seconds: float,
inference_timeout_seconds: float,
tolerance_nulp: int,
stress_test_index: int,
):
logging.info(
f"Running stress test {stress_test_index} on device {device}."
)
# All interactions with the module must run in a single thread.
# We are using execution in a separate thread in order to be able
# to wait with a timeout on the inference operation.
module_executor = ThreadPoolExecutor(1)
amdshark_module = module_executor.submit(
AMDSharkInference,
mlir_module=bytes(),
function_name=function_name,
device=device,
).result()
module_executor.submit(
amdshark_module.load_module, amdshark_module_path
).result()
input_batches = [np.repeat(arr, batch_size, axis=0) for arr in inputs]
golden_output_batches = np.repeat(golden_out, batch_size, axis=0)
report_interval_seconds = 10
start_time = time.time()
previous_report_time = start_time
first_iteration_output = None
for i in range(max_iterations):
output = module_executor.submit(
amdshark_module.forward, input_batches
).result(inference_timeout_seconds)
if first_iteration_output is None:
np.testing.assert_array_almost_equal_nulp(
golden_output_batches, output, nulp=tolerance_nulp
)
first_iteration_output = output
else:
np.testing.assert_array_equal(output, first_iteration_output)
current_time = time.time()
if report_interval_seconds < current_time - previous_report_time:
logging.info(
f"Stress test {stress_test_index} on device "
f"{device} at iteration {i+1}"
)
previous_report_time = current_time
if max_duration_seconds < current_time - start_time:
return
logging.info(f"Stress test {stress_test_index} on device {device} done.")
def get_device_type(device_name: str):
return device_name.split("://", 1)[0]
def get_device_types(device_names: str):
return [get_device_type(device_name) for device_name in device_names]
def query_devices(device_types: Optional[List[str]] = None) -> List[str]:
devices = []
if device_types is None:
device_types = [
IREE_TO_AMDSHARK_DRIVER_MAP[name]
for name in query_available_drivers()
if name in IREE_TO_AMDSHARK_DRIVER_MAP
]
for device_type in device_types:
driver = get_driver(_IREE_DEVICE_MAP[device_type])
device_infos = driver.query_available_devices()
for device_info in device_infos:
uri_path = (
device_info["path"]
if device_info["path"] != ""
else str(device_info["device_id"])
)
device_uri = f"{device_type}://{uri_path}"
devices.append(device_uri)
return devices
def compile_stress_test_module(
device_types: List[str], mlir_model: str, func_name: str, mlir_dialect: str
) -> List[str]:
amdshark_module_paths = []
for device_type in device_types:
logging.info(
f"Compiling stress test model for device type {device_type}."
)
amdshark_module = AMDSharkInference(
mlir_model,
func_name,
mlir_dialect=mlir_dialect,
device=device_type,
)
amdshark_module_paths.append(amdshark_module.save_module())
return amdshark_module_paths
def stress_test(
model_name: str,
dynamic_model: bool = False,
device_types: Optional[List[str]] = None,
device_names: Optional[List[str]] = None,
batch_size: int = 1,
max_iterations: int = 10**7,
max_duration_seconds: float = 3600,
inference_timeout_seconds: float = 60,
mlir_dialect: str = "linalg",
frontend: str = "torch",
oversubscription_factor: int = 1,
tolerance_nulp: int = 50000,
):
logging.info(f"Downloading stress test model {model_name}.")
mlir_model, func_name, inputs, golden_out = download_model(
model_name=model_name, dynamic=dynamic_model, frontend=frontend
)
if device_names is None or device_types is not None:
device_names = [] if device_names is None else device_names
with ProcessPoolExecutor() as executor:
# query_devices needs to run in a separate process,
# because it will interfere with other processes that are forked later.
device_names.extend(
executor.submit(query_devices, device_types).result()
)
device_types_set = list(set(get_device_types(device_names)))
with ProcessPoolExecutor() as executor:
# This needs to run in a subprocess because when compiling for CUDA,
# some state gets initialized and cuInit will fail in a forked process
# later. It should be just compiling, but alas.
amdshark_module_paths_set = executor.submit(
compile_stress_test_module,
device_types_set,
mlir_model,
func_name,
mlir_dialect,
).result()
device_type_amdshark_module_path_map = {
device_type: module_path
for device_type, module_path in zip(
device_types_set, amdshark_module_paths_set
)
}
device_name_amdshark_module_path_map = {
device_name: device_type_amdshark_module_path_map[
get_device_type(device_name)
]
for device_name in device_names
}
# This needs to run in a separate process, because it uses the driver cache
# in IREE and a subsequent call to `iree.runtime.SystemContext.add_vm_module`
# in a forked process will hang.
with multiprocessing.Pool(
len(device_name_amdshark_module_path_map) * oversubscription_factor
) as process_pool:
process_pool.starmap(
stress_test_compiled_model,
[
(
module_path,
func_name,
device_name,
inputs,
golden_out,
batch_size,
max_iterations,
max_duration_seconds,
inference_timeout_seconds,
tolerance_nulp,
stress_test_index,
)
for stress_test_index, (device_name, module_path) in enumerate(
list(device_name_amdshark_module_path_map.items())
* oversubscription_factor
)
],
)
if __name__ == "__main__":
logging.basicConfig(encoding="utf-8", level=logging.INFO)
parser = argparse.ArgumentParser(
description="Downloads, compiles and runs a model from the tank to stress test the system."
)
parser.add_argument(
"--model", type=str, help="Model name in the tank.", default="alexnet"
)
parser.add_argument(
"--dynamic",
help="Use dynamic version of the model.",
action="store_true",
default=False,
)
parser.add_argument(
"--frontend", type=str, help="Frontend of the model.", default="torch"
)
parser.add_argument(
"--mlir-dialect",
type=str,
help="MLIR dialect of the model.",
default="linalg",
choices=supported_dialects,
)
parser.add_argument(
"--device-types",
type=str,
nargs="*",
choices=_IREE_DEVICE_MAP.keys(),
help="Runs the stress test on all devices with that type. "
"If absent and no deveices are specified "
"will run against all available devices.",
)
parser.add_argument(
"--devices",
type=str,
nargs="*",
help="List of devices to run the stress test on. "
"If device-types is specified will run against the union of the two.",
)
parser.add_argument(
"--batch-size",
type=int,
help="Number of inputs to feed into the model",
default=1,
)
parser.add_argument(
"--oversubscription",
type=int,
help="Oversubscrption factor. Each device will execute the model simultaneously "
"this many number of times.",
default=1,
)
parser.add_argument(
"--max-iterations",
type=int,
help="Maximum number of iterations to run the stress test per device.",
default=10**7,
)
parser.add_argument(
"--max-duration",
type=float,
help="Maximum number of seconds to run the stress test.",
default=3600,
)
parser.add_argument(
"--inference-timeout",
type=float,
help="Timeout in seconds for a single model inference operation.",
default=60,
)
parser.add_argument(
"--tolerance-nulp",
type=int,
help="The maximum number of unit in the last place for tolerance "
"when verifing results with the golden reference output.",
default=50000,
)
args = parser.parse_known_args()[0]
stress_test(
model_name=args.model,
dynamic_model=args.dynamic,
frontend=args.frontend,
mlir_dialect=args.mlir_dialect,
device_types=args.device_types,
device_names=args.devices,
batch_size=args.batch_size,
oversubscription_factor=args.oversubscription,
max_iterations=args.max_iterations,
max_duration_seconds=args.max_duration,
inference_timeout_seconds=args.inference_timeout,
tolerance_nulp=args.tolerance_nulp,
)

View File

@@ -1,10 +1,10 @@
# RUN: %PYTHON %s
import numpy as np
from shark.shark_importer import SharkImporter
from amdshark.amdshark_importer import AMDSharkImporter
import pytest
from shark.parser import shark_args
from shark.shark_inference import SharkInference
from shark.tflite_utils import TFLitePreprocessor
from amdshark.parser import amdshark_args
from amdshark.amdshark_inference import AMDSharkInference
from amdshark.tflite_utils import TFLitePreprocessor
import sys
# model_path = "https://tfhub.dev/tensorflow/lite-model/albert_lite_base/squadv1/1?lite-format=tflite"
@@ -66,32 +66,32 @@ class AlbertTfliteModuleTester:
self.save_vmfb = save_vmfb
def create_and_check_module(self):
shark_args.save_mlir = self.save_mlir
shark_args.save_vmfb = self.save_vmfb
amdshark_args.save_mlir = self.save_mlir
amdshark_args.save_vmfb = self.save_vmfb
tflite_preprocessor = TFLitePreprocessor(model_name="albert_lite_base")
raw_model_file_path = tflite_preprocessor.get_raw_model_file()
inputs = tflite_preprocessor.get_inputs()
tflite_interpreter = tflite_preprocessor.get_interpreter()
my_shark_importer = SharkImporter(
my_amdshark_importer = AMDSharkImporter(
module=tflite_interpreter,
inputs=inputs,
frontend="tflite",
raw_model_file=raw_model_file_path,
)
mlir_model, func_name = my_shark_importer.import_mlir()
mlir_model, func_name = my_amdshark_importer.import_mlir()
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
mlir_module=mlir_model,
function_name=func_name,
device=self.device,
mlir_dialect="tflite",
)
# Case1: Use shark_importer default generate inputs
shark_module.compile()
mlir_results = shark_module.forward(inputs)
# Case 1: Use amdshark_importer's default generated inputs
amdshark_module.compile()
mlir_results = amdshark_module.forward(inputs)
## post process results for compare
input_details, output_details = tflite_preprocessor.get_model_details()
mlir_results = list(mlir_results)
@@ -105,14 +105,14 @@ class AlbertTfliteModuleTester:
input_details, output_details = tflite_preprocessor.get_model_details()
inputs = generate_inputs(input_details) # new inputs
shark_module = SharkInference(
amdshark_module = AMDSharkInference(
mlir_module=mlir_model,
function_name=func_name,
device=self.device,
mlir_dialect="tflite",
)
shark_module.compile()
mlir_results = shark_module.forward(inputs)
amdshark_module.compile()
mlir_results = amdshark_module.forward(inputs)
## post process results for compare
tflite_results = tflite_preprocessor.get_golden_output()
compare_results(mlir_results, tflite_results, output_details)

View File

@@ -0,0 +1,31 @@
# Copyright 2022 The Nod Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import pytest
import subprocess
import sys
import importlib.util
def test_stress_test():
subprocess.check_call(
[
sys.executable,
importlib.util.find_spec("amdshark.stress_test").origin,
"--model=squeezenet1_0",
"--devices",
"cpu",
"--max-iterations=1",
]
)

View File

@@ -0,0 +1,62 @@
import unittest
from unittest.mock import mock_open, patch
from apps.stable_diffusion.web.ui.txt2img_ui import (
export_settings,
load_settings,
all_gradio_labels,
)
class TestExportSettings(unittest.TestCase):
@patch("builtins.open", new_callable=mock_open)
@patch("json.dump")
def test_export_settings(self, mock_json_dump, mock_file):
test_values = ["value1", "value2", "value3"]
expected_output = {
"txt2img": {
label: value
for label, value in zip(all_gradio_labels, test_values)
}
}
export_settings(*test_values)
mock_file.assert_called_once_with("./ui/settings.json", "w")
mock_json_dump.assert_called_once_with(
expected_output, mock_file(), indent=4
)
@patch("apps.stable_diffusion.web.ui.txt2img_ui.json.load")
@patch(
"builtins.open",
new_callable=mock_open,
read_data='{"txt2img": {"some_setting": "some_value"}}',
)
def test_load_settings_file_exists(self, mock_file, mock_json_load):
mock_json_load.return_value = {
"txt2img": {
"txt2img_custom_model": "custom_model_value",
"custom_vae": "custom_vae_value",
}
}
settings = load_settings()
self.assertEqual(settings[0], "custom_model_value")
self.assertEqual(settings[1], "custom_vae_value")
@patch("apps.stable_diffusion.web.ui.txt2img_ui.json.load")
@patch("builtins.open", side_effect=FileNotFoundError)
def test_load_settings_file_not_found(self, mock_file, mock_json_load):
settings = load_settings()
default_lora_weights = "None"
self.assertEqual(settings[4], default_lora_weights)
@patch("apps.stable_diffusion.web.ui.txt2img_ui.json.load")
@patch("builtins.open", new_callable=mock_open, read_data="{}")
def test_load_settings_key_error(self, mock_file, mock_json_load):
mock_json_load.return_value = {}
settings = load_settings()
default_lora_weights = "None"
self.assertEqual(settings[4], default_lora_weights)

View File

@@ -96,7 +96,7 @@ class TFLitePreprocessor:
print("Setting up for TMP_WORK_DIR")
self.workdir = os.path.join(
os.path.dirname(__file__), "./../gen_shark_tank"
os.path.dirname(__file__), "./../gen_amdshark_tank"
)
os.makedirs(self.workdir, exist_ok=True)
print(f"TMP_WORK_DIR = {self.workdir}")

View File

@@ -0,0 +1,220 @@
# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
# Also available under a BSD-style license. See LICENSE.
import contextlib
import re
import traceback
import warnings
from typing import Any
import numpy as np
import torch
from torch.utils._pytree import tree_map
from torch_mlir.eager_mode.ir_building import build_mlir_module
from torch_mlir.eager_mode.torch_mlir_dispatch import (
UnsupportedByTorchMlirEagerMode,
normalize_args_kwargs,
check_get_aliased_arg,
)
from torch_mlir.eager_mode import EAGER_MODE_DEBUG
from torch_mlir.eager_mode.torch_mlir_tensor import (
TorchMLIRTensor,
check_requires_grad,
make_wrapper_subclass_from_torch_tensor,
make_bare_wrapper_subclass,
UNSUPPORTED_OPS,
no_dispatch,
)
from torch_mlir.eager_mode import torch_mlir_tensor
from amdshark.iree_eager_backend import EagerModeIREELinalgOnTensorsBackend
backend = EagerModeIREELinalgOnTensorsBackend("cpu")
torch_mlir_tensor.backend = backend
rtol = 1e-04
atol = 1e-05
class TorchMLIRLockstepTensor(TorchMLIRTensor):
"""This class overrides the dispatching for TorchMLIRTensor to allow for an op-by-op numerical comparison between PyTorch and the Torch-MLIR -> IREE backend compilation pipeline. This only supports the IREE backend and focuses on op-by-op level verification.
TODO: Extend this to do a cumulative trace with summary statistics at the end. Possibly requires a wrapper environment to store full trace info.
"""
def __new__(cls, elem, **kwargs):
if kwargs.get("constructing_from_device_tensor", False):
tensor_meta_data = backend.get_torch_metadata(elem, kwargs)
r = make_bare_wrapper_subclass(
cls=cls,
size=tensor_meta_data.size,
strides=tensor_meta_data.strides,
storage_offset=tensor_meta_data.storage_offset,
dtype=tensor_meta_data.dtype,
layout=tensor_meta_data.layout,
device=tensor_meta_data.device,
requires_grad=tensor_meta_data.requires_grad,
)
r.elem = elem
elif isinstance(elem, torch.nn.Parameter):
r = make_wrapper_subclass_from_torch_tensor(
cls, elem.data, **kwargs
)
# This is a hack to handle non-contiguous data through IREE-backend
nt = elem.detach().data.numpy()
if not nt.flags["C_CONTIGUOUS"]:
nt = np.ascontiguousarray(nt, dtype=nt.dtype)
r.elem = backend.transfer_from_torch_to_device(
torch.from_numpy(nt)
)
elif isinstance(elem, torch.Tensor):
r = make_wrapper_subclass_from_torch_tensor(cls, elem, **kwargs)
# Ditto TODO: Find a better way to handle this
nt = elem.numpy()
if not nt.flags["C_CONTIGUOUS"]:
nt = np.ascontiguousarray(nt, dtype=nt.dtype)
r.elem = backend.transfer_from_torch_to_device(
torch.from_numpy(nt)
)
# This branch handles the case when a python scalar is passed to some op
# or is returned from some aten op, such as _local_scalar_dense.
elif isinstance(elem, (int, float, bool)):
return elem
else:
raise ValueError(f"Unknown element type: {type(elem)}")
return r
def __repr__(self):
if self.grad_fn:
return f"TorchMLIRLockstepTensor({self.elem}, backend={backend.__class__.__name__}, grad_fn={self.grad_fn})"
else:
return f"TorchMLIRLockstepTensor({self.elem}, backend={backend.__class__.__name__})"
"""This does essentially the same dispatch as TorchMLIRTensor but operates as if debug mode is enabled. The numeric verification happens after the Torch-MLIR result is obtained by comparing against the
"""
@classmethod
def __torch_dispatch__(cls, func, _types, args=(), kwargs=None):
requires_grad = check_requires_grad(*args, **kwargs)
try:
with no_dispatch():
if hasattr(func, "op_name"):
op_name = func.op_name
elif hasattr(func, "__name__"):
# Handle builtin_function_or_method.
op_name = func.__name__
else:
raise RuntimeError(f"op {func} has no name")
if UNSUPPORTED_OPS.match(op_name):
raise UnsupportedByTorchMlirEagerMode(op_name)
if not hasattr(func, "_schema"):
raise RuntimeError(f"op {func} has no schema.")
normalized_kwargs = normalize_args_kwargs(func, args, kwargs)
if "layout" in normalized_kwargs and normalized_kwargs[
"layout"
] not in {0, None}:
raise UnsupportedByTorchMlirEagerMode(
f"{normalized_kwargs['layout']} layout not supported."
)
if "memory_format" in normalized_kwargs and normalized_kwargs[
"memory_format"
] not in {0, None}:
raise UnsupportedByTorchMlirEagerMode(
f"{normalized_kwargs['memory_format']} memory format not supported."
)
eager_module = build_mlir_module(func, normalized_kwargs)
device_tensor_args = [
kwarg.elem
for _, kwarg in normalized_kwargs.items()
if isinstance(kwarg, cls)
]
assert len(eager_module.body.operations[0].arguments) == len(
device_tensor_args
), "Number of parameters and number of arguments differs."
op_mlir_backend_callable = backend.compile(eager_module)
out = op_mlir_backend_callable(*device_tensor_args)
out = tree_map(
lambda x: cls(
x,
requires_grad=requires_grad,
constructing_from_device_tensor=True,
),
out,
)
# Numeric verification; Value for comparison comes from PyTorch eager
with no_dispatch():
unwrapped_args = tree_map(cls.unwrap, args)
unwrapped_kwargs = tree_map(cls.unwrap, kwargs)
if "_reshape_alias" in op_name:
native_out = torch.ops.aten.view(
unwrapped_args[0], unwrapped_args[1]
)
else:
native_out = func(*unwrapped_args, **unwrapped_kwargs)
native_out = tree_map(
lambda x: cls(x, requires_grad=requires_grad), native_out
).elem
tmp_out = out.elem
try:
np.testing.assert_allclose(
native_out.to_host(),
tmp_out.to_host(),
rtol=rtol,
atol=atol,
)
except Exception as e:
shaped_args = [
arg.shape if torch.is_tensor(arg) else arg
for arg in unwrapped_args
]
shaped_kwargs = [
kwarg.shape if torch.is_tensor(kwarg) else kwarg
for kwarg in unwrapped_kwargs
]
warnings.warn(
f"Lockstep accuracy verification failed with error: *{str(e)}*; "
f"Dispatched function name: *{str(func)}*; "
f"Dispatched function args: *{str(shaped_args)}*; "
f"Dispatched function kwargs: *{str(shaped_kwargs)}*; "
)
except Exception as e:
warnings.warn(traceback.format_exc())
if isinstance(e, UnsupportedByTorchMlirEagerMode):
warnings.warn(
f"Couldn't use TorchMLIR eager because current incompatibility: *{str(e)}*; running through PyTorch eager."
)
else:
warnings.warn(
f"Couldn't use TorchMLIR eager because of error: *{str(e)}*; "
f"Running through PyTorch eager"
)
with no_dispatch():
unwrapped_args = tree_map(cls.unwrap, args)
unwrapped_kwargs = tree_map(cls.unwrap, kwargs)
if "_reshape_alias" in op_name:
out = torch.ops.aten.view(
unwrapped_args[0], unwrapped_args[1]
)
else:
out = func(*unwrapped_args, **unwrapped_kwargs)
out = tree_map(lambda x: cls(x, requires_grad=requires_grad), out)
maybe_aliased_arg_name = check_get_aliased_arg(func)
if maybe_aliased_arg_name is not None:
warnings.warn(
f"Found aliased arg, but didn't copy tensor contents. This could lead to incorrect results for E2E model execution but doesn't affect the validity of the lockstep op verification."
)
# TODO: Find a way to handle argument aliasing for IREE backend
# backend.copy_into(normalized_kwargs[maybe_aliased_arg_name].elem, out.elem)
return out

View File

@@ -15,6 +15,15 @@
from torch_mlir.ir import StringAttr
import torch_mlir
from torch_mlir_e2e_test.linalg_on_tensors_backends import refbackend
import tempfile
from amdshark.parser import amdshark_args
import io
mlir_type_mapping_dict = {
"linalg": torch_mlir.OutputType.LINALG_ON_TENSORS,
"stablehlo": torch_mlir.OutputType.STABLEHLO,
"tosa": torch_mlir.OutputType.TOSA,
}
def get_module_name_for_asm_dump(module):
@@ -53,20 +62,29 @@ def get_torch_mlir_module(
input: tuple,
dynamic: bool,
jit_trace: bool,
from_torchscript: bool = False,
return_str: bool = False,
mlir_type: str = "linalg",
):
"""Get the MLIR's linalg-on-tensors module from torchscipt module."""
"""Get the MLIR's linalg-on-tensors module from the torchscipt module."""
ignore_traced_shapes = False
if dynamic:
input = create_dynamic_placeholders(input)
if jit_trace:
ignore_traced_shapes = True
module = torch_mlir.compile(
tempfile.tempdir = "."
mlir_module = torch_mlir.compile(
module,
input,
output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
output_type=mlir_type_mapping_dict[mlir_type],
use_tracing=jit_trace,
ignore_traced_shapes=ignore_traced_shapes,
)
return module
if return_str:
return mlir_module.operation.get_asm()
bytecode_stream = io.BytesIO()
mlir_module.operation.write_bytecode(bytecode_stream)
bytecode = bytecode_stream.getvalue()
return bytecode

0
apps/__init__.py Normal file
View File

View File

@@ -0,0 +1,48 @@
# -*- mode: python ; coding: utf-8 -*-
from apps.amdshark_studio.studio_imports import pathex, datas, hiddenimports
binaries = []
block_cipher = None
a = Analysis(
['web/index.py'],
pathex=pathex,
binaries=binaries,
datas=datas,
hiddenimports=hiddenimports,
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False,
module_collection_mode={
'gradio': 'py', # Collect gradio package as source .py files
},
)
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
exe = EXE(
pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='nodai_amdshark_studio',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=False,
upx_exclude=[],
runtime_tmpdir=None,
console=True,
disable_windowed_traceback=False,
argv_emulation=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None,
)

View File

@@ -0,0 +1,107 @@
# from turbine_models.custom_models.controlnet import control_adapter, preprocessors
import os
import PIL
import numpy as np
from apps.amdshark_studio.web.utils.file_utils import (
get_generated_imgs_path,
)
from datetime import datetime
from PIL import Image
from gradio.components.image_editor import (
EditorValue,
)
class control_adapter:
def __init__(
self,
model: str,
):
self.model = None
def export_control_adapter_model(model_keyword):
return None
def export_xl_control_adapter_model(model_keyword):
return None
class preprocessors:
def __init__(
self,
model: str,
):
self.model = None
def export_controlnet_model(model_keyword):
return None
control_adapter_map = {
"sd15": {
"canny": {"initializer": control_adapter.export_control_adapter_model},
"openpose": {"initializer": control_adapter.export_control_adapter_model},
"scribble": {"initializer": control_adapter.export_control_adapter_model},
"zoedepth": {"initializer": control_adapter.export_control_adapter_model},
},
"sdxl": {
"canny": {"initializer": control_adapter.export_xl_control_adapter_model},
},
}
preprocessor_model_map = {
"canny": {"initializer": preprocessors.export_controlnet_model},
"openpose": {"initializer": preprocessors.export_controlnet_model},
"scribble": {"initializer": preprocessors.export_controlnet_model},
"zoedepth": {"initializer": preprocessors.export_controlnet_model},
}
class PreprocessorModel:
def __init__(
self,
hf_model_id,
device="cpu",
):
self.model = hf_model_id
self.device = device
def compile(self):
print("compile not implemented for preprocessor.")
return
def run(self, inputs):
print("run not implemented for preprocessor.")
return inputs
def cnet_preview(model, input_image):
curr_datetime = datetime.now().strftime("%Y-%m-%d.%H-%M-%S")
control_imgs_path = os.path.join(get_generated_imgs_path(), "control_hints")
if not os.path.exists(control_imgs_path):
os.mkdir(control_imgs_path)
img_dest = os.path.join(control_imgs_path, model + curr_datetime + ".png")
match model:
case "canny":
canny = PreprocessorModel("canny")
result = canny(
np.array(input_image),
100,
200,
)
Image.fromarray(result).save(fp=img_dest)
return result, img_dest
case "openpose":
openpose = PreprocessorModel("openpose")
result = openpose(np.array(input_image))
Image.fromarray(result[0]).save(fp=img_dest)
return result, img_dest
case "zoedepth":
zoedepth = PreprocessorModel("ZoeDepth")
result = zoedepth(np.array(input_image))
Image.fromarray(result).save(fp=img_dest)
return result, img_dest
case "scribble":
input_image.save(fp=img_dest)
return input_image, img_dest
case _:
return None, None

View File

@@ -0,0 +1,125 @@
import importlib
import os
import signal
import sys
import warnings
import json
from threading import Thread
from apps.amdshark_studio.modules.timer import startup_timer
from apps.amdshark_studio.web.utils.tmp_configs import (
config_tmp,
clear_tmp_mlir,
clear_tmp_imgs,
amdshark_tmp,
)
def imports():
import torch # noqa: F401
startup_timer.record("import torch")
warnings.filterwarnings(
action="ignore", category=DeprecationWarning, module="torch"
)
warnings.filterwarnings(action="ignore", category=UserWarning, module="torchvision")
warnings.filterwarnings(action="ignore", category=UserWarning, module="torch")
import gradio # noqa: F401
startup_timer.record("import gradio")
import apps.amdshark_studio.web.utils.globals as global_obj
global_obj._init()
startup_timer.record("initialize globals")
from apps.amdshark_studio.modules import (
img_processing,
) # noqa: F401
startup_timer.record("other imports")
def initialize():
configure_sigint_handler()
# Setup to use amdshark_tmp for gradio's temporary image files and clear any
# existing temporary images there if they exist. Then we can import gradio.
# It has to be in this order or gradio ignores what we've set up.
config_tmp()
# clear_tmp_mlir()
clear_tmp_imgs()
from apps.amdshark_studio.web.utils.file_utils import (
create_model_folders,
)
# Create custom models folders if they don't exist
create_model_folders()
import gradio as gr
# initialize_rest(reload_script_modules=False)
def initialize_rest(*, reload_script_modules=False):
"""
Called both from initialize() and when reloading the webui.
"""
# Keep this for adding reload options to the webUI.
def dumpstacks():
import threading
import traceback
id2name = {th.ident: th.name for th in threading.enumerate()}
code = []
for threadId, stack in sys._current_frames().items():
code.append(f"\n# Thread: {id2name.get(threadId, '')}({threadId})")
for filename, lineno, name, line in traceback.extract_stack(stack):
code.append(f"""File: "{filename}", line {lineno}, in {name}""")
if line:
code.append(" " + line.strip())
with open(os.path.join(amdshark_tmp, "stack_dump.log"), "w") as f:
f.write("\n".join(code))
def setup_middleware(app):
from starlette.middleware.gzip import GZipMiddleware
app.middleware_stack = (
None # reset current middleware to allow modifying user provided list
)
app.add_middleware(GZipMiddleware, minimum_size=1000)
configure_cors_middleware(app)
app.build_middleware_stack() # rebuild middleware stack on-the-fly
def configure_cors_middleware(app):
from starlette.middleware.cors import CORSMiddleware
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
cors_options = {
"allow_methods": ["*"],
"allow_headers": ["*"],
"allow_credentials": True,
}
if cmd_opts.api_accept_origin:
cors_options["allow_origins"] = cmd_opts.api_accept_origin.split(",")
app.add_middleware(CORSMiddleware, **cors_options)
def configure_sigint_handler():
# make the program just exit at ctrl+c without waiting for anything
def sigint_handler(sig, frame):
print(f"Interrupted with signal {sig} in {frame}")
dumpstacks()
os._exit(0)
signal.signal(signal.SIGINT, sigint_handler)

View File

@@ -0,0 +1,475 @@
from turbine_models.custom_models import stateless_llama
from turbine_models.model_runner import vmfbRunner
from turbine_models.gen_external_params.gen_external_params import gen_external_params
import time
from amdshark.iree_utils.compile_utils import compile_module_to_flatbuffer
from apps.amdshark_studio.web.utils.file_utils import (
get_resource_path,
get_checkpoints_path,
)
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
from apps.amdshark_studio.api.utils import parse_device
from urllib.request import urlopen
import iree.runtime as ireert
from itertools import chain
import gc
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
llm_model_map = {
"meta-llama/Llama-2-7b-chat-hf": {
"initializer": stateless_llama.export_transformer_model,
"hf_model_name": "meta-llama/Llama-2-7b-chat-hf",
"compile_flags": ["--iree-opt-const-expr-hoisting=False"],
"stop_token": 2,
"max_tokens": 4096,
"system_prompt": """<s>[INST] <<SYS>>Be concise. You are a helpful, respectful and honest assistant. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. <</SYS>>""",
},
"Trelis/Llama-2-7b-chat-hf-function-calling-v2": {
"initializer": stateless_llama.export_transformer_model,
"hf_model_name": "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
"compile_flags": ["--iree-opt-const-expr-hoisting=False"],
"stop_token": 2,
"max_tokens": 4096,
"system_prompt": """<s>[INST] <<SYS>>Be concise. You are a helpful, respectful and honest assistant. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. <</SYS>>""",
},
"TinyPixel/small-llama2": {
"initializer": stateless_llama.export_transformer_model,
"hf_model_name": "TinyPixel/small-llama2",
"compile_flags": ["--iree-opt-const-expr-hoisting=True"],
"stop_token": 2,
"max_tokens": 1024,
"system_prompt": """<s>[INST] <<SYS>>Be concise. You are a helpful, respectful and honest assistant. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. <</SYS>>""",
},
}
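# Illustrative sketch: a new chat model could be registered by adding an entry
# shaped like the ones above. The model id below is a hypothetical placeholder,
# not a real checkpoint:
#
#   llm_model_map["example-org/example-llama2-7b-chat"] = {
#       "initializer": stateless_llama.export_transformer_model,
#       "hf_model_name": "example-org/example-llama2-7b-chat",
#       "compile_flags": ["--iree-opt-const-expr-hoisting=False"],
#       "stop_token": 2,
#       "max_tokens": 4096,
#       "system_prompt": "<s>[INST] <<SYS>>Be concise.<</SYS>>",
#   }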
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<s>", "</s>"
DEFAULT_CHAT_SYS_PROMPT = """<s>[INST] <<SYS>>
Be concise. You are a helpful, respectful and honest assistant. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n <</SYS>>\n\n
"""
def append_user_prompt(history, input_prompt):
user_prompt = f"{B_INST} {input_prompt} {E_INST}"
history += user_prompt
return history
class LanguageModel:
def __init__(
self,
model_name,
hf_auth_token=None,
device=None,
quantization="int4",
precision="",
external_weights=None,
use_system_prompt=True,
streaming_llm=False,
):
_, _, self.triple = parse_device(device)
self.hf_model_name = llm_model_map[model_name]["hf_model_name"]
self.device = device.split("=>")[-1].strip()
self.backend = self.device.split("://")[0]
self.driver = self.backend
if "cpu" in device:
self.device = "cpu"
self.backend = "llvm-cpu"
self.driver = "local-task"
print(f"Selected {self.backend} as IREE target backend.")
self.precision = "f32" if "cpu" in device else "f16"
self.quantization = quantization
self.safe_name = self.hf_model_name.replace("/", "_").replace("-", "_")
self.external_weight_file = None
# TODO: find a programmatic solution for model arch spec instead of hardcoding llama2
self.file_spec = "_".join(
[
self.safe_name,
self.precision,
]
)
if self.quantization != "None":
self.file_spec += "_" + self.quantization
if external_weights in ["safetensors", "gguf"]:
self.external_weight_file = get_resource_path(
os.path.join("..", self.file_spec + "." + external_weights)
)
else:
self.external_weights = None
self.external_weight_file = None
if streaming_llm:
# Add streaming suffix to file spec after setting external weights filename.
self.file_spec += "_streaming"
self.streaming_llm = streaming_llm
self.tempfile_name = get_resource_path(
os.path.join("..", f"{self.file_spec}.tempfile")
)
# TODO: Tag vmfb with target triple of device instead of HAL backend
self.vmfb_name = str(
get_resource_path(
os.path.join("..", f"{self.file_spec}_{self.backend}.vmfb.tempfile")
)
)
self.max_tokens = llm_model_map[model_name]["max_tokens"]
self.iree_module_dict = None
self.use_system_prompt = use_system_prompt
self.global_iter = 0
self.prev_token_len = 0
self.first_input = True
self.hf_auth_token = hf_auth_token
if self.external_weight_file is not None:
if not os.path.exists(self.external_weight_file):
print(
f"External weight file {self.external_weight_file} does not exist. Generating..."
)
gen_external_params(
hf_model_name=self.hf_model_name,
quantization=self.quantization,
weight_path=self.external_weight_file,
hf_auth_token=hf_auth_token,
precision=self.precision,
)
else:
print(
f"External weight file {self.external_weight_file} found for {self.vmfb_name}"
)
self.external_weight_file = str(self.external_weight_file)
if os.path.exists(self.vmfb_name) and (
external_weights is None or os.path.exists(str(self.external_weight_file))
):
self.runner = vmfbRunner(
device=self.driver,
vmfb_path=self.vmfb_name,
external_weight_path=self.external_weight_file,
)
if self.streaming_llm:
self.model = self.runner.ctx.modules.streaming_state_update
else:
self.model = self.runner.ctx.modules.state_update
self.tokenizer = AutoTokenizer.from_pretrained(
self.hf_model_name,
use_fast=False,
use_auth_token=hf_auth_token,
)
elif not os.path.exists(self.tempfile_name):
self.torch_ir, self.tokenizer = llm_model_map[self.hf_model_name][
"initializer"
](
self.hf_model_name,
hf_auth_token,
compile_to="torch",
external_weights=external_weights,
precision=self.precision,
quantization=self.quantization,
streaming_llm=self.streaming_llm,
decomp_attn=True,
)
with open(self.tempfile_name, "w+") as f:
f.write(self.torch_ir)
del self.torch_ir
gc.collect()
self.compile()
else:
self.tokenizer = AutoTokenizer.from_pretrained(
self.hf_model_name,
use_fast=False,
use_auth_token=hf_auth_token,
)
self.compile()
# Reserved for running HF torch model as reference.
self.hf_mod = None
def compile(self) -> None:
# this comes with keys: "vmfb", "config", and "temp_file_to_unlink".
# ONLY architecture/api-specific compile-time flags for each backend, if needed.
# hf_model_id-specific global flags currently in model map.
flags = []
if "cpu" in self.backend:
flags.extend(
[
"--iree-global-opt-enable-quantized-matmul-reassociation",
]
)
elif self.backend == "vulkan":
flags.extend(["--iree-stream-resource-max-allocation-size=4294967296"])
elif self.backend == "rocm":
flags.extend(
[
"--iree-codegen-llvmgpu-enable-transform-dialect-jit=false",
"--iree-llvmgpu-enable-prefetch=true",
"--iree-opt-outer-dim-concat=true",
"--iree-flow-enable-aggressive-fusion",
]
)
if "gfx9" in self.triple:
flags.extend(
[
f"--iree-codegen-transform-dialect-library={get_mfma_spec_path(self.triple, get_checkpoints_path())}",
"--iree-codegen-llvmgpu-use-vector-distribution=true",
]
)
flags.extend(llm_model_map[self.hf_model_name]["compile_flags"])
flatbuffer_blob = compile_module_to_flatbuffer(
self.tempfile_name,
device=self.device,
frontend="auto",
model_config_path=None,
extra_args=flags,
write_to=self.vmfb_name,
)
self.runner = vmfbRunner(
device=self.driver,
vmfb_path=self.vmfb_name,
external_weight_path=self.external_weight_file,
)
if self.streaming_llm:
self.model = self.runner.ctx.modules.streaming_state_update
else:
self.model = self.runner.ctx.modules.state_update
def sanitize_prompt(self, prompt):
if isinstance(prompt, list):
prompt = list(chain.from_iterable(prompt))
prompt = " ".join([x for x in prompt if isinstance(x, str)])
prompt = prompt.replace("\n", " ")
prompt = prompt.replace("\t", " ")
prompt = prompt.replace("\r", " ")
if self.use_system_prompt and self.global_iter == 0:
prompt = append_user_prompt(DEFAULT_CHAT_SYS_PROMPT, prompt)
return prompt
else:
return f"{B_INST} {prompt} {E_INST}"
def chat(self, prompt):
prompt = self.sanitize_prompt(prompt)
input_tensor = self.tokenizer(prompt, return_tensors="pt").input_ids
def format_out(results):
return torch.tensor(results.to_host()[0][0])
history = []
for iter in range(self.max_tokens):
if self.streaming_llm:
token_slice = max(self.prev_token_len - 1, 0)
input_tensor = input_tensor[:, token_slice:]
if self.streaming_llm and self.model["get_seq_step"]() > 600:
print("Evicting cache space!")
self.model["evict_kvcache_space"]()
token_len = input_tensor.shape[-1]
device_inputs = [
ireert.asdevicearray(self.runner.config.device, input_tensor)
]
if self.first_input or not self.streaming_llm:
st_time = time.time()
token = self.model["run_initialize"](*device_inputs)
total_time = time.time() - st_time
token_len += 1
self.first_input = False
else:
st_time = time.time()
token = self.model["run_cached_initialize"](*device_inputs)
total_time = time.time() - st_time
token_len += 1
history.append(format_out(token))
while (
format_out(token) != llm_model_map[self.hf_model_name]["stop_token"]
and len(history) < self.max_tokens
):
dec_time = time.time()
if self.streaming_llm and self.model["get_seq_step"]() > 600:
print("Evicting cache space!")
self.model["evict_kvcache_space"]()
token = self.model["run_forward"](token)
history.append(format_out(token))
total_time = time.time() - dec_time
yield self.tokenizer.decode(history), total_time
self.prev_token_len = token_len + len(history)
if format_out(token) == llm_model_map[self.hf_model_name]["stop_token"]:
break
for i in range(len(history)):
if not isinstance(history[i], int):
history[i] = int(history[i])
result_output = self.tokenizer.decode(history)
self.global_iter += 1
return result_output, total_time
# Reference HF model function for sanity checks.
def chat_hf(self, prompt):
if self.hf_mod is None:
self.hf_mod = AutoModelForCausalLM.from_pretrained(
self.hf_model_name,
torch_dtype=torch.float,
token=self.hf_auth_token,
)
prompt = self.sanitize_prompt(prompt)
input_tensor = self.tokenizer(prompt, return_tensors="pt").input_ids
history = []
for iter in range(self.max_tokens):
token_len = input_tensor.shape[-1]
if self.first_input:
st_time = time.time()
result = self.hf_mod(input_tensor)
token = torch.argmax(result.logits[:, -1, :], dim=1)
total_time = time.time() - st_time
token_len += 1
pkv = result.past_key_values
self.first_input = False
history.append(int(token))
while token != llm_model_map[self.hf_model_name]["stop_token"]:
dec_time = time.time()
result = self.hf_mod(token.reshape([1, 1]), past_key_values=pkv)
token = torch.argmax(result.logits[:, -1, :], dim=1)
pkv = result.past_key_values
# Append only the newly generated token so the first token is not duplicated.
history.append(int(token))
total_time = time.time() - dec_time
yield self.tokenizer.decode(history), total_time
self.prev_token_len = token_len + len(history)
if token == llm_model_map[self.hf_model_name]["stop_token"]:
break
for i in range(len(history)):
if not isinstance(history[i], int):
history[i] = int(history[i])
result_output = self.tokenizer.decode(history)
self.global_iter += 1
return result_output, total_time
def get_mfma_spec_path(target_chip, save_dir):
url = "https://raw.githubusercontent.com/iree-org/iree/main/build_tools/pkgci/external_test_suite/attention_and_matmul_spec.mlir"
attn_spec = urlopen(url).read().decode("utf-8")
spec_path = os.path.join(save_dir, "attention_and_matmul_spec_mfma.mlir")
if os.path.exists(spec_path):
return spec_path
with open(spec_path, "w") as f:
f.write(attn_spec)
return spec_path
def llm_chat_api(InputData: dict):
from datetime import datetime as dt
import apps.amdshark_studio.web.utils.globals as global_obj
print(f"Input keys : {InputData.keys()}")
# print(f"model : {InputData['model']}")
is_chat_completion_api = (
"messages" in InputData.keys()
) # else it is the legacy `completion` api
# For Debugging input data from API
if is_chat_completion_api:
print(f"message -> role : {InputData['messages'][0]['role']}")
print(f"message -> content : {InputData['messages'][0]['content']}")
else:
print(f"prompt : {InputData['prompt']}")
model_name = (
InputData["model"]
if "model" in InputData.keys()
else "meta-llama/Llama-2-7b-chat-hf"
)
model_path = llm_model_map[model_name]
device = InputData["device"] if "device" in InputData.keys() else "cpu"
precision = "fp16"
max_tokens = InputData["max_tokens"] if "max_tokens" in InputData.keys() else 4096
device_id = None
if not global_obj.get_llm_obj():
print("\n[LOG] Initializing new pipeline...")
global_obj.clear_cache()
gc.collect()
if "cuda" in device:
device = "cuda"
elif "vulkan" in device:
device_id = int(device.split("://")[1])
device = "vulkan"
elif "cpu" in device:
device = "cpu"
precision = "fp32"
else:
print("unrecognized device")
llm_model = LanguageModel(
model_name=model_name,
hf_auth_token=cmd_opts.hf_auth_token,
device=device,
quantization=cmd_opts.quantization,
external_weights="safetensors",
use_system_prompt=True,
streaming_llm=False,
)
global_obj.set_llm_obj(llm_model)
else:
llm_model = global_obj.get_llm_obj()
llm_model.max_tokens = max_tokens
# TODO: add role dict for different models
if is_chat_completion_api:
# TODO: add functionality for multiple messages
prompt = append_user_prompt(
InputData["messages"][0]["role"], InputData["messages"][0]["content"]
)
else:
prompt = InputData["prompt"]
print("prompt = ", prompt)
for res_op, _ in llm_model.chat(prompt):
if is_chat_completion_api:
choices = [
{
"index": 0,
"message": {
"role": "assistant",
"content": res_op, # since we are yeilding the result
},
"finish_reason": "stop", # or length
}
]
else:
choices = [
{
"text": res_op,
"index": 0,
"logprobs": None,
"finish_reason": "stop", # or length
}
]
end_time = dt.now().strftime("%Y%m%d%H%M%S%f")
return {
"id": end_time,
"object": "chat.completion" if is_chat_completion_api else "text_completion",
"created": int(end_time),
"choices": choices,
}
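# Illustrative sketch: llm_chat_api() accepts either an OpenAI-style chat payload
# ("messages") or a legacy completion payload ("prompt"). A minimal hypothetical
# chat-completion request, using only the keys read above, might look like:
#
#   example_request = {
#       "model": "meta-llama/Llama-2-7b-chat-hf",
#       "device": "cpu",
#       "max_tokens": 256,
#       "messages": [{"role": "user", "content": "What is IREE?"}],
#   }
#   response = llm_chat_api(example_request)
#   print(response["choices"][0]["message"]["content"])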
if __name__ == "__main__":
lm = LanguageModel(
"Trelis/Llama-2-7b-chat-hf-function-calling-v2",
hf_auth_token=None,
device="cpu-task",
external_weights="safetensors",
)
print("model loaded")
for i in lm.chat("hi, what are you?"):
print(i)

View File

@@ -0,0 +1,505 @@
import gc
import torch
import gradio as gr
import time
import os
import json
import numpy as np
import copy
import importlib.util
import sys
from tqdm.auto import tqdm
from pathlib import Path
from random import randint
from turbine_models.custom_models.sd_inference.sd_pipeline import AMDSharkSDPipeline
from turbine_models.custom_models.sdxl_inference.sdxl_compiled_pipeline import (
AMDSharkSDXLPipeline,
)
from apps.amdshark_studio.api.controlnet import control_adapter_map
from apps.amdshark_studio.api.utils import parse_device
from apps.amdshark_studio.web.utils.state import status_label
from apps.amdshark_studio.web.utils.file_utils import (
safe_name,
get_resource_path,
get_checkpoints_path,
)
from apps.amdshark_studio.modules.img_processing import (
save_output_img,
)
from apps.amdshark_studio.modules.ckpt_processing import (
preprocessCKPT,
save_irpa,
)
EMPTY_SD_MAP = {
"clip": None,
"scheduler": None,
"unet": None,
"vae_decode": None,
}
EMPTY_SDXL_MAP = {
"prompt_encoder": None,
"scheduled_unet": None,
"vae_decode": None,
"pipeline": None,
"full_pipeline": None,
}
EMPTY_FLAGS = {
"clip": None,
"unet": None,
"vae": None,
"pipeline": None,
}
def load_script(source, module_name):
"""
reads file source and loads it as a module
:param source: file to load
:param module_name: name of module to register in sys.modules
:return: loaded module
"""
spec = importlib.util.spec_from_file_location(module_name, source)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
return module
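# Illustrative sketch: load_script() is how custom pipelines are pulled in below.
# The file name "my_pipeline.py" and the attributes it exposes (StudioPipeline,
# MODEL_MAP) are assumptions based on how StableDiffusion.__init__ consumes the
# loaded module:
#
#   custom_module = load_script(
#       os.path.join(get_checkpoints_path("scripts"), "my_pipeline.py"),
#       "custom_pipeline",
#   )
#   pipe_cls = custom_module.StudioPipeline
#   model_map = custom_module.MODEL_MAP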
class StableDiffusion:
# This class is responsible for executing image generation and creating
# /managing a set of compiled modules to run Stable Diffusion. The init
# aims to be as general as possible, and the class will infer and compile
# a list of necessary modules or a combined "pipeline module" for a
# specified job based on the inference task.
def __init__(
self,
base_model_id,
height: int,
width: int,
batch_size: int,
steps: int,
scheduler: str,
precision: str,
device: str,
target_triple: str = None,
custom_vae: str = None,
num_loras: int = 0,
import_ir: bool = True,
is_controlled: bool = False,
external_weights: str = "safetensors",
):
self.precision = precision
self.compiled_pipeline = False
self.base_model_id = base_model_id
self.custom_vae = custom_vae
self.is_sdxl = "xl" in self.base_model_id.lower()
self.is_custom = ".py" in self.base_model_id.lower()
if self.is_custom:
custom_module = load_script(
os.path.join(get_checkpoints_path("scripts"), self.base_model_id),
"custom_pipeline",
)
self.turbine_pipe = custom_module.StudioPipeline
self.model_map = custom_module.MODEL_MAP
elif self.is_sdxl:
self.turbine_pipe = AMDSharkSDXLPipeline
self.model_map = EMPTY_SDXL_MAP
else:
self.turbine_pipe = AMDSharkSDPipeline
self.model_map = EMPTY_SD_MAP
max_length = 64
target_backend, self.rt_device, triple = parse_device(device, target_triple)
pipe_id_list = [
safe_name(base_model_id),
str(batch_size),
str(max_length),
f"{str(height)}x{str(width)}",
precision,
triple,
]
if num_loras > 0:
pipe_id_list.append(str(num_loras) + "lora")
if is_controlled:
pipe_id_list.append("controlled")
if custom_vae:
pipe_id_list.append(custom_vae)
self.pipe_id = "_".join(pipe_id_list)
self.pipeline_dir = Path(os.path.join(get_checkpoints_path(), self.pipe_id))
self.weights_path = Path(
os.path.join(
get_checkpoints_path(), safe_name(self.base_model_id + "_" + precision)
)
)
if not os.path.exists(self.weights_path):
os.mkdir(self.weights_path)
decomp_attn = True
attn_spec = None
if triple in ["gfx940", "gfx942", "gfx90a"]:
decomp_attn = False
attn_spec = "mfma"
elif triple in ["gfx1100", "gfx1103", "gfx1150"]:
decomp_attn = False
attn_spec = "wmma"
if triple in ["gfx1103", "gfx1150"]:
# external weights have issues on igpu
external_weights = None
elif target_backend == "llvm-cpu":
decomp_attn = False
self.sd_pipe = self.turbine_pipe(
hf_model_name=base_model_id,
scheduler_id=scheduler,
height=height,
width=width,
precision=precision,
max_length=max_length,
batch_size=batch_size,
num_inference_steps=steps,
device=target_backend,
iree_target_triple=triple,
ireec_flags=EMPTY_FLAGS,
attn_spec=attn_spec,
decomp_attn=decomp_attn,
pipeline_dir=self.pipeline_dir,
external_weights_dir=self.weights_path,
external_weights=external_weights,
custom_vae=custom_vae,
)
print(f"\n[LOG] Pipeline initialized with pipe_id: {self.pipe_id}.")
gc.collect()
def prepare_pipe(
self, custom_weights, adapters, embeddings, is_img2img, compiled_pipeline
):
print(f"\n[LOG] Preparing pipeline...")
self.is_img2img = False
mlirs = copy.deepcopy(self.model_map)
vmfbs = copy.deepcopy(self.model_map)
weights = copy.deepcopy(self.model_map)
if not self.is_sdxl:
compiled_pipeline = False
self.compiled_pipeline = compiled_pipeline
if custom_weights:
custom_weights = os.path.join(
get_checkpoints_path("checkpoints"),
safe_name(self.base_model_id.split("/")[-1]),
custom_weights,
)
diffusers_weights_path = preprocessCKPT(custom_weights, self.precision)
for key in weights:
if key in ["scheduled_unet", "unet"]:
unet_weights_path = os.path.join(
diffusers_weights_path,
"unet",
"diffusion_pytorch_model.safetensors",
)
weights[key] = save_irpa(unet_weights_path, "unet.")
elif key in ["clip", "prompt_encoder"]:
if not self.is_sdxl:
sd1_path = os.path.join(
diffusers_weights_path, "text_encoder", "model.safetensors"
)
weights[key] = save_irpa(sd1_path, "text_encoder_model.")
else:
clip_1_path = os.path.join(
diffusers_weights_path, "text_encoder", "model.safetensors"
)
clip_2_path = os.path.join(
diffusers_weights_path,
"text_encoder_2",
"model.safetensors",
)
weights[key] = [
save_irpa(clip_1_path, "text_encoder_model_1."),
save_irpa(clip_2_path, "text_encoder_model_2."),
]
elif key in ["vae_decode"] and weights[key] is None:
vae_weights_path = os.path.join(
diffusers_weights_path,
"vae",
"diffusion_pytorch_model.safetensors",
)
weights[key] = save_irpa(vae_weights_path, "vae.")
vmfbs, weights = self.sd_pipe.check_prepared(
mlirs, vmfbs, weights, interactive=False
)
print(f"\n[LOG] Loading pipeline to device {self.rt_device}.")
self.sd_pipe.load_pipeline(
vmfbs, weights, self.rt_device, self.compiled_pipeline
)
print(
"\n[LOG] Pipeline successfully prepared for runtime. Generating images..."
)
return
def generate_images(
self,
prompt,
negative_prompt,
image,
strength,
guidance_scale,
seed,
ondemand,
resample_type,
control_mode,
hints,
):
img = self.sd_pipe.generate_images(
prompt,
negative_prompt,
1,
guidance_scale,
seed,
return_imgs=True,
)
return img
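# Illustrative sketch: the intended call sequence for the class above is
# construct -> prepare_pipe -> generate_images. The argument values below are
# hypothetical placeholders chosen only to satisfy the signatures, not
# recommended settings:
#
#   sd = StableDiffusion(
#       base_model_id="stabilityai/sdxl-turbo",
#       height=512,
#       width=512,
#       batch_size=1,
#       steps=2,
#       scheduler="EulerDiscrete",
#       precision="fp16",
#       device="rocm://0",
#       target_triple="gfx1100",
#   )
#   sd.prepare_pipe(
#       custom_weights=None,
#       adapters={},
#       embeddings={},
#       is_img2img=False,
#       compiled_pipeline=False,
#   )
#   images = sd.generate_images(
#       prompt="a photo of a cat",
#       negative_prompt="",
#       image=None,
#       strength=1.0,
#       guidance_scale=0.0,
#       seed=42,
#       ondemand=False,
#       resample_type="Nearest Neighbor",
#       control_mode=None,
#       hints=[],
#   )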
def amdshark_sd_fn_dict_input(
sd_kwargs: dict,
):
print("\n[LOG] Submitting Request...")
for key in sd_kwargs:
if sd_kwargs[key] in [None, []]:
sd_kwargs[key] = None
if sd_kwargs[key] in ["None"]:
sd_kwargs[key] = ""
if key == "seed":
sd_kwargs[key] = int(sd_kwargs[key])
# TODO: move these checks into the UI code so we don't have gradio warnings in a generalized dict input function.
if not sd_kwargs["device"]:
gr.Warning("No device specified. Please specify a device.")
return None, ""
if sd_kwargs["height"] not in [512, 1024]:
gr.Warning("Height must be 512 or 1024. This is a temporary limitation.")
return None, ""
if sd_kwargs["height"] != sd_kwargs["width"]:
gr.Warning("Height and width must be the same. This is a temporary limitation.")
return None, ""
if sd_kwargs["base_model_id"] == "stabilityai/sdxl-turbo":
if sd_kwargs["steps"] > 10:
gr.Warning("Max steps for sdxl-turbo is 10. 1 to 4 steps are recommended.")
return None, ""
if sd_kwargs["guidance_scale"] > 3:
gr.Warning(
"sdxl-turbo CFG scale should be less than 2.0 if using negative prompt, 0 otherwise."
)
return None, ""
if sd_kwargs["target_triple"] == "":
if parse_device(sd_kwargs["device"], sd_kwargs["target_triple"])[2] == "":
gr.Warning(
"Target device architecture could not be inferred. Please specify a target triple, e.g. 'gfx1100' for a Radeon 7900xtx."
)
return None, ""
generated_imgs = yield from amdshark_sd_fn(**sd_kwargs)
return generated_imgs
def amdshark_sd_fn(
prompt,
negative_prompt,
sd_init_image: list,
height: int,
width: int,
steps: int,
strength: float,
guidance_scale: float,
seed: list,
batch_count: int,
batch_size: int,
scheduler: str,
base_model_id: str,
custom_weights: str,
custom_vae: str,
precision: str,
device: str,
target_triple: str,
ondemand: bool,
compiled_pipeline: bool,
resample_type: str,
controlnets: dict,
embeddings: dict,
):
sd_kwargs = locals()
if not isinstance(sd_init_image, list):
sd_init_image = [sd_init_image]
is_img2img = sd_init_image[0] is not None
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
import apps.amdshark_studio.web.utils.globals as global_obj
adapters = {}
is_controlled = False
control_mode = None
hints = []
num_loras = 0
import_ir = True
for i in embeddings:
num_loras += 1 if embeddings[i] else 0
if "model" in controlnets:
for i, model in enumerate(controlnets["model"]):
if "xl" not in base_model_id.lower():
adapters[f"control_adapter_{model}"] = {
"hf_id": control_adapter_map["runwayml/stable-diffusion-v1-5"][
model
],
"strength": controlnets["strength"][i],
}
else:
adapters[f"control_adapter_{model}"] = {
"hf_id": control_adapter_map["stabilityai/stable-diffusion-xl-1.0"][
model
],
"strength": controlnets["strength"][i],
}
if model is not None:
is_controlled = True
control_mode = controlnets["control_mode"]
for i in controlnets["hint"]:
hints.append(i)
submit_pipe_kwargs = {
"base_model_id": base_model_id,
"height": height,
"width": width,
"batch_size": batch_size,
"precision": precision,
"device": device,
"target_triple": target_triple,
"custom_vae": custom_vae,
"num_loras": num_loras,
"import_ir": import_ir,
"is_controlled": is_controlled,
"steps": steps,
"scheduler": scheduler,
}
submit_prep_kwargs = {
"custom_weights": custom_weights,
"adapters": adapters,
"embeddings": embeddings,
"is_img2img": is_img2img,
"compiled_pipeline": compiled_pipeline,
}
submit_run_kwargs = {
"prompt": prompt,
"negative_prompt": negative_prompt,
"image": sd_init_image,
"strength": strength,
"guidance_scale": guidance_scale,
"seed": seed,
"ondemand": ondemand,
"resample_type": resample_type,
"control_mode": control_mode,
"hints": hints,
}
if (
not global_obj.get_sd_obj()
or global_obj.get_pipe_kwargs() != submit_pipe_kwargs
):
print("\n[LOG] Initializing new pipeline...")
global_obj.clear_cache()
gc.collect()
# Initializes the pipeline and retrieves IR based on all
# parameters that are static in the turbine output format,
# which is currently MLIR in the torch dialect.
sd_pipe = StableDiffusion(
**submit_pipe_kwargs,
)
global_obj.set_sd_obj(sd_pipe)
global_obj.set_pipe_kwargs(submit_pipe_kwargs)
if (
not global_obj.get_prep_kwargs()
or global_obj.get_prep_kwargs() != submit_prep_kwargs
):
global_obj.set_prep_kwargs(submit_prep_kwargs)
global_obj.get_sd_obj().prepare_pipe(**submit_prep_kwargs)
generated_imgs = []
for current_batch in range(batch_count):
start_time = time.time()
out_imgs = global_obj.get_sd_obj().generate_images(**submit_run_kwargs)
if not isinstance(out_imgs, list):
out_imgs = [out_imgs]
# total_time = time.time() - start_time
# text_output = f"Total image(s) generation time: {total_time:.4f}sec"
# print(f"\n[LOG] {text_output}")
# if global_obj.get_sd_status() == SD_STATE_CANCEL:
# break
# else:
for batch in range(batch_size):
save_output_img(
out_imgs[batch],
seed,
sd_kwargs,
)
generated_imgs.extend(out_imgs)
# TODO: make seed changes over batch counts more configurable.
submit_run_kwargs["seed"] = submit_run_kwargs["seed"] + 1
yield generated_imgs, status_label(
"Stable Diffusion", current_batch + 1, batch_count, batch_size
)
return (generated_imgs, "")
def unload_sd():
print("Unloading models.")
import apps.amdshark_studio.web.utils.globals as global_obj
global_obj.clear_cache()
gc.collect()
def cancel_sd():
print("Inject call to cancel longer API calls.")
return
def view_json_file(file_path):
content = ""
with open(file_path, "r") as fopen:
content = fopen.read()
return content
def safe_name(name):
return name.replace("/", "_").replace("\\", "_").replace(".", "_")
if __name__ == "__main__":
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
import apps.amdshark_studio.web.utils.globals as global_obj
global_obj._init()
sd_json = view_json_file(
get_resource_path(os.path.join(cmd_opts.config_dir, "default_sd_config.json"))
)
sd_kwargs = json.loads(sd_json)
for arg in vars(cmd_opts):
if arg in sd_kwargs:
sd_kwargs[arg] = getattr(cmd_opts, arg)
for i in amdshark_sd_fn_dict_input(sd_kwargs):
print(i)

View File

@@ -0,0 +1,389 @@
import numpy as np
import json
from random import (
randint,
seed as seed_random,
getstate as random_getstate,
setstate as random_setstate,
)
from pathlib import Path
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
from cpuinfo import get_cpu_info
# TODO: migrate these utils to studio
from amdshark.iree_utils.vulkan_utils import (
set_iree_vulkan_runtime_flags,
get_vulkan_target_triple,
get_iree_vulkan_runtime_flags,
)
def get_available_devices():
def get_devices_by_name(driver_name):
from amdshark.iree_utils._common import iree_device_map
device_list = []
try:
driver_name = iree_device_map(driver_name)
device_list_dict = get_all_devices(driver_name)
print(f"{driver_name} devices are available.")
except Exception:
print(f"{driver_name} devices are not available.")
else:
cpu_name = get_cpu_info()["brand_raw"]
for i, device in enumerate(device_list_dict):
device_name = (
cpu_name if device["name"] == "default" else device["name"]
)
if "local" in driver_name:
device_list.append(
f"{device_name} => {driver_name.replace('local', 'cpu')}"
)
else:
# for drivers with single devices
# let the default device be selected without any indexing
if len(device_list_dict) == 1:
device_list.append(f"{device_name} => {driver_name}")
else:
device_list.append(f"{device_name} => {driver_name}://{i}")
return device_list
set_iree_runtime_flags()
available_devices = []
rocm_devices = get_devices_by_name("rocm")
available_devices.extend(rocm_devices)
cpu_device = get_devices_by_name("cpu-sync")
available_devices.extend(cpu_device)
cpu_device = get_devices_by_name("cpu-task")
available_devices.extend(cpu_device)
from amdshark.iree_utils.vulkan_utils import (
get_all_vulkan_devices,
)
vulkaninfo_list = get_all_vulkan_devices()
vulkan_devices = [
f"{device.strip()} => vulkan://{idx}"
for idx, device in enumerate(vulkaninfo_list)
]
if vulkan_devices:
print("vulkan devices are available.")
available_devices.extend(vulkan_devices)
metal_devices = get_devices_by_name("metal")
available_devices.extend(metal_devices)
cuda_devices = get_devices_by_name("cuda")
available_devices.extend(cuda_devices)
hip_devices = get_devices_by_name("hip")
available_devices.extend(hip_devices)
for idx, device_str in enumerate(available_devices):
if "AMD Radeon(TM) Graphics =>" in device_str:
igpu_id_candidates = [
x.split("w/")[-1].split("=>")[0]
for x in available_devices
if "M Graphics" in x
]
for igpu_name in igpu_id_candidates:
if igpu_name:
available_devices[idx] = device_str.replace(
"AMD Radeon(TM) Graphics", igpu_name
)
break
return available_devices
def set_init_device_flags():
if "vulkan" in cmd_opts.device:
# set runtime flags for vulkan.
set_iree_runtime_flags()
# set triple flag to avoid multiple calls to get_vulkan_triple_flag
device_name, cmd_opts.device = map_device_to_name_path(cmd_opts.device)
if not cmd_opts.iree_vulkan_target_triple:
triple = get_vulkan_target_triple(device_name)
if triple is not None:
cmd_opts.iree_vulkan_target_triple = triple
print(
f"Found device {device_name}. Using target triple "
f"{cmd_opts.iree_vulkan_target_triple}."
)
elif "cuda" in cmd_opts.device:
cmd_opts.device = "cuda"
elif "metal" in cmd_opts.device:
device_name, cmd_opts.device = map_device_to_name_path(cmd_opts.device)
if not cmd_opts.iree_metal_target_platform:
from amdshark.iree_utils.metal_utils import get_metal_target_triple
triple = get_metal_target_triple(device_name)
if triple is not None:
cmd_opts.iree_metal_target_platform = triple.split("-")[-1]
print(
f"Found device {device_name}. Using target triple "
f"{cmd_opts.iree_metal_target_platform}."
)
elif "cpu" in cmd_opts.device:
cmd_opts.device = "cpu"
def set_iree_runtime_flags():
# TODO: This function should be device-agnostic and piped properly
# to general runtime driver init.
vulkan_runtime_flags = get_iree_vulkan_runtime_flags()
if cmd_opts.enable_rgp:
vulkan_runtime_flags += [
f"--enable_rgp=true",
f"--vulkan_debug_utils=true",
]
if cmd_opts.device_allocator_heap_key:
vulkan_runtime_flags += [
f"--device_allocator=caching:device_local={cmd_opts.device_allocator_heap_key}",
]
set_iree_vulkan_runtime_flags(flags=vulkan_runtime_flags)
def parse_device(device_str, target_override=""):
from amdshark.iree_utils.compile_utils import (
clean_device_info,
get_iree_target_triple,
iree_target_map,
)
rt_driver, device_id = clean_device_info(device_str)
target_backend = iree_target_map(rt_driver)
if device_id:
rt_device = f"{rt_driver}://{device_id}"
else:
rt_device = rt_driver
if target_override:
return target_backend, rt_device, target_override
match target_backend:
case "vulkan-spirv":
triple = get_iree_target_triple(device_str)
return target_backend, rt_device, triple
case "rocm":
triple = get_rocm_target_chip(device_str)
return target_backend, rt_device, triple
case "llvm-cpu":
return "llvm-cpu", "local-task", "x86_64-linux-gnu"
def get_rocm_target_chip(device_str):
# TODO: Use a data file to map device_str to target chip.
rocm_chip_map = {
"6700": "gfx1031",
"6800": "gfx1030",
"6900": "gfx1030",
"7900": "gfx1100",
"MI300X": "gfx942",
"MI300A": "gfx940",
"MI210": "gfx90a",
"MI250": "gfx90a",
"MI100": "gfx908",
"MI50": "gfx906",
"MI60": "gfx906",
"780M": "gfx1103",
}
for key in rocm_chip_map:
if key in device_str:
return rocm_chip_map[key]
raise AssertionError(
f"Device {device_str} not recognized. Please file an issue at https://github.com/nod-ai/AMD-SHARK-Studio/issues."
)
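# Illustrative sketch: parse_device() plus the chip map above turn a UI device
# string into the (target backend, runtime device, target triple) tuple used at
# compile time. The device string below is a hypothetical example of the
# "<name> => <driver>://<id>" format produced by get_available_devices():
#
#   backend, rt_device, triple = parse_device(
#       "AMD Radeon RX 7900 XTX => rocm://0"
#   )
#   # backend is expected to be "rocm" and triple "gfx1100" (via
#   # get_rocm_target_chip); passing target_override skips the lookup entirely.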
def get_all_devices(driver_name):
"""
Inputs: driver_name
Returns a list of all the available devices for a given driver sorted by
the iree path names of the device as in --list_devices option in iree.
"""
from iree.runtime import get_driver
driver = get_driver(driver_name)
device_list_src = driver.query_available_devices()
device_list_src.sort(key=lambda d: d["path"])
return device_list_src
def get_device_mapping(driver, key_combination=3):
"""This method ensures consistent device ordering when choosing
specific devices for execution
Args:
driver (str): execution driver (vulkan, cuda, rocm, etc)
key_combination (int, optional): choice for mapping value for
device name.
1 : path
2 : name
3 : (name, path)
Defaults to 3.
Returns:
dict: map to possible device names user can input mapped to desired
combination of name/path.
"""
from amdshark.iree_utils._common import iree_device_map
driver = iree_device_map(driver)
device_list = get_all_devices(driver)
device_map = dict()
def get_output_value(dev_dict):
if key_combination == 1:
return f"{driver}://{dev_dict['path']}"
if key_combination == 2:
return dev_dict["name"]
if key_combination == 3:
return dev_dict["name"], f"{driver}://{dev_dict['path']}"
# mapping driver name to default device (driver://0)
device_map[f"{driver}"] = get_output_value(device_list[0])
for i, device in enumerate(device_list):
# mapping with index
device_map[f"{driver}://{i}"] = get_output_value(device)
# mapping with full path
device_map[f"{driver}://{device['path']}"] = get_output_value(device)
return device_map
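# Illustrative sketch: with the default key_combination=3, the mapping resolves
# "vulkan", "vulkan://<index>", or "vulkan://<path>" to a (name, path-style
# device) tuple, which map_device_to_name_path() below relies on. Hypothetical
# usage, assuming at least one vulkan device is present:
#
#   device_map = get_device_mapping("vulkan")
#   name, path = device_map["vulkan://0"]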
def get_opt_flags(model, precision="fp16"):
iree_flags = []
if len(cmd_opts.iree_vulkan_target_triple) > 0:
iree_flags.append(
f"-iree-vulkan-target-triple={cmd_opts.iree_vulkan_target_triple}"
)
if "rocm" in cmd_opts.device:
from amdshark.iree_utils.gpu_utils import get_iree_rocm_args
rocm_args = get_iree_rocm_args()
iree_flags.extend(rocm_args)
if cmd_opts.iree_constant_folding == False:
iree_flags.append("--iree-opt-const-expr-hoisting=False")
iree_flags.append(
"--iree-codegen-linalg-max-constant-fold-elements=9223372036854775807"
)
if cmd_opts.data_tiling == False:
iree_flags.append("--iree-opt-data-tiling=False")
if "vae" not in model:
# Due to lack of support for multi-reduce, we always collapse reduction
# dims before dispatch formation right now.
iree_flags += ["--iree-flow-collapse-reduction-dims"]
return iree_flags
def map_device_to_name_path(device, key_combination=3):
"""Gives the appropriate device data (supported name/path) for user
selected execution device
Args:
device (str): user
key_combination (int, optional): choice for mapping value for
device name.
1 : path
2 : name
3 : (name, path)
Defaults to 3.
Raises:
ValueError:
Returns:
str / tuple: returns the mapping str or tuple of mapping str for
the device depending on key_combination value
"""
driver = device.split("://")[0]
device_map = get_device_mapping(driver, key_combination)
try:
device_mapping = device_map[device]
except KeyError:
raise ValueError(f"Device '{device}' is not a valid device.")
return device_mapping
def get_devices_by_name(driver_name):
from amdshark.iree_utils._common import iree_device_map
device_list = []
try:
driver_name = iree_device_map(driver_name)
device_list_dict = get_all_devices(driver_name)
print(f"{driver_name} devices are available.")
except Exception:
print(f"{driver_name} devices are not available.")
else:
cpu_name = get_cpu_info()["brand_raw"]
for i, device in enumerate(device_list_dict):
device_name = (
cpu_name if device["name"] == "default" else device["name"]
)
if "local" in driver_name:
device_list.append(
f"{device_name} => {driver_name.replace('local', 'cpu')}"
)
else:
# for drivers with single devices
# let the default device be selected without any indexing
if len(device_list_dict) == 1:
device_list.append(f"{device_name} => {driver_name}")
else:
device_list.append(f"{device_name} => {driver_name}://{i}")
return device_list
set_iree_runtime_flags()
available_devices = []
from amdshark.iree_utils.vulkan_utils import (
get_all_vulkan_devices,
)
vulkaninfo_list = get_all_vulkan_devices()
vulkan_devices = [
f"{device.strip()} => vulkan://{idx}"
for idx, device in enumerate(vulkaninfo_list)
]
if vulkan_devices:
print("vulkan devices are available.")
available_devices.extend(vulkan_devices)
metal_devices = get_devices_by_name("metal")
available_devices.extend(metal_devices)
cuda_devices = get_devices_by_name("cuda")
available_devices.extend(cuda_devices)
rocm_devices = get_devices_by_name("rocm")
available_devices.extend(rocm_devices)
cpu_device = get_devices_by_name("cpu-sync")
available_devices.extend(cpu_device)
cpu_device = get_devices_by_name("cpu-task")
available_devices.extend(cpu_device)
return available_devices
# Generate and return a new seed if the provided one is not in the
# supported range (including -1)
def sanitize_seed(seed: int | str):
seed = int(seed)
uint32_info = np.iinfo(np.uint32)
uint32_min, uint32_max = uint32_info.min, uint32_info.max
if seed < uint32_min or seed >= uint32_max:
seed = randint(uint32_min, uint32_max)
return seed
# take a seed expression in an input format and convert it to
# a list of integers, where possible
def parse_seed_input(seed_input: str | list | int):
if isinstance(seed_input, str):
try:
seed_input = json.loads(seed_input)
except (ValueError, TypeError):
seed_input = None
if isinstance(seed_input, int):
return [seed_input]
if isinstance(seed_input, list) and all(type(seed) is int for seed in seed_input):
return seed_input
raise TypeError(
"Seed input must be an integer or an array of integers in JSON format"
)
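# Illustrative sketch: the two helpers above are meant to be chained: parse the
# UI seed field into a list, then sanitize each entry into the uint32 range
# (with -1 or any out-of-range value replaced by a random seed):
#
#   seeds = [sanitize_seed(s) for s in parse_seed_input("[-1, 42]")]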

View File

@@ -0,0 +1,145 @@
import os
import json
import re
import requests
import torch
import safetensors
from iree.turbine.aot.params import (
ParameterArchiveBuilder,
)
from io import BytesIO
from pathlib import Path
from tqdm import tqdm
from omegaconf import OmegaConf
from diffusers import StableDiffusionPipeline
from apps.amdshark_studio.modules.shared_cmd_opts import cmd_opts
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import (
download_from_original_stable_diffusion_ckpt,
create_vae_diffusers_config,
convert_ldm_vae_checkpoint,
)
def get_path_to_diffusers_checkpoint(custom_weights, precision="fp16"):
path = Path(custom_weights)
diffusers_path = path.parent.absolute()
diffusers_directory_name = os.path.join("diffusers", path.stem + f"_{precision}")
complete_path_to_diffusers = diffusers_path / diffusers_directory_name
complete_path_to_diffusers.mkdir(parents=True, exist_ok=True)
path_to_diffusers = complete_path_to_diffusers.as_posix()
return path_to_diffusers
def preprocessCKPT(custom_weights, precision="fp16", is_inpaint=False):
path_to_diffusers = get_path_to_diffusers_checkpoint(custom_weights, precision)
if next(Path(path_to_diffusers).iterdir(), None):
print("Checkpoint already loaded at : ", path_to_diffusers)
return path_to_diffusers
else:
print(
"Diffusers' checkpoint will be identified here : ",
path_to_diffusers,
)
from_safetensors = custom_weights.lower().endswith(".safetensors")
# EMA weights usually yield higher quality images for inference but
# non-EMA weights have been yielding better results in our case.
# TODO: Add an option `--ema` (`--no-ema`) for users to specify if
# they want to go for EMA weight extraction or not.
extract_ema = False
print("Loading diffusers' pipeline from original stable diffusion checkpoint")
num_in_channels = 9 if is_inpaint else 4
pipe = download_from_original_stable_diffusion_ckpt(
checkpoint_path_or_dict=custom_weights,
extract_ema=extract_ema,
from_safetensors=from_safetensors,
num_in_channels=num_in_channels,
)
if precision == "fp16":
pipe.to(dtype=torch.float16)
pipe.save_pretrained(path_to_diffusers)
del pipe
print("Loading complete")
return path_to_diffusers
def save_irpa(weights_path, prepend_str):
weights = safetensors.torch.load_file(weights_path)
archive = ParameterArchiveBuilder()
for key in weights.keys():
new_key = prepend_str + key
archive.add_tensor(new_key, weights[key])
irpa_file = weights_path.replace(".safetensors", ".irpa")
archive.save(irpa_file)
return irpa_file
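# Illustrative sketch: save_irpa() repacks a .safetensors file into an IREE
# parameter archive (.irpa), prefixing every tensor name so the runtime can
# scope it to a submodel. The path below is a hypothetical placeholder:
#
#   irpa_path = save_irpa(
#       "/path/to/unet/diffusion_pytorch_model.safetensors",
#       "unet.",
#   )
#   # irpa_path == "/path/to/unet/diffusion_pytorch_model.irpa"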
def convert_original_vae(vae_checkpoint):
vae_state_dict = {}
for key in list(vae_checkpoint.keys()):
vae_state_dict["first_stage_model." + key] = vae_checkpoint.get(key)
config_url = (
"https://raw.githubusercontent.com/CompVis/stable-diffusion/"
"main/configs/stable-diffusion/v1-inference.yaml"
)
original_config_file = BytesIO(requests.get(config_url).content)
original_config = OmegaConf.load(original_config_file)
vae_config = create_vae_diffusers_config(original_config, image_size=512)
converted_vae_checkpoint = convert_ldm_vae_checkpoint(vae_state_dict, vae_config)
return converted_vae_checkpoint
def process_custom_pipe_weights(custom_weights):
if custom_weights != "":
if custom_weights.startswith("https://civitai.com/api/"):
# download the checkpoint from civitai if we don't already have it
weights_path = get_civitai_checkpoint(custom_weights)
# act as if we were given the local file as custom_weights originally
custom_weights_tgt = get_path_to_diffusers_checkpoint(weights_path)
custom_weights_params = weights_path
else:
assert custom_weights.lower().endswith(
(".ckpt", ".safetensors")
), "checkpoint files supported can be any of [.ckpt, .safetensors] type"
custom_weights_tgt = get_path_to_diffusers_checkpoint(custom_weights)
custom_weights_params = custom_weights
return custom_weights_params, custom_weights_tgt
def get_civitai_checkpoint(url: str):
with requests.get(url, allow_redirects=True, stream=True) as response:
response.raise_for_status()
# civitai api returns the filename in the content disposition
base_filename = re.findall(
'"([^"]*)"', response.headers["Content-Disposition"]
)[0]
destination_path = Path.cwd() / (cmd_opts.model_dir or "models") / base_filename
# we don't have this model downloaded yet
if not destination_path.is_file():
print(f"downloading civitai model from {url} to {destination_path}")
size = int(response.headers["content-length"], 0)
progress_bar = tqdm(total=size, unit="iB", unit_scale=True)
with open(destination_path, "wb") as f:
for chunk in response.iter_content(chunk_size=65536):
f.write(chunk)
progress_bar.update(len(chunk))
progress_bar.close()
# we already have this model downloaded
else:
print(f"civitai model already downloaded to {destination_path}")
response.close()
return destination_path.as_posix()

Some files were not shown because too many files have changed in this diff.