Abhishek-Varma
ce00c1c5e1
[SharkInference] Make SharkInference compile the entire module
...
-- Previously SharkInference compiled and provided run APIs only
for a hardcoded function with function name "forward".
-- This commit makes the compilation functionality generic, so
any function defined within the module can now be run.
-- It also creates an API to fetch all the function names defined
within the compiled module.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2022-12-24 09:05:06 +00:00
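The generic compile-and-run flow described above can be sketched as follows. This is an illustrative assumption, not SharkInference's actual API: the class shape and the method names `function_names` and `run` are invented for the sketch.

```python
# Illustrative sketch only: the real SharkInference compiles an MLIR module;
# here a plain dict of callables stands in for the compiled module.

class CompiledModule:
    def __init__(self, functions):
        # Maps function name -> callable, mimicking a compiled module that
        # defines several entry points, not just "forward".
        self._functions = dict(functions)

    def function_names(self):
        # New-style API: fetch every function name defined in the module.
        return sorted(self._functions)

    def run(self, name, *args):
        # Dispatch to any compiled function instead of a hardcoded "forward".
        return self._functions[name](*args)
```

With this shape, `run("encode", x)` works just as well as `run("forward", x)`.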
Stanley Winata
136021424c
[SD] Change default VMA large heap block size for windows perf. ( #715 )
...
Windows performance improves from 2.67s/image to 2.4523s/image.
Linux stays the same.
20221224.411
20221224.410
20221224.409
20221224.408
2022-12-24 01:40:58 +07:00
PhaneeshB
fee4ba3746
Add openjourney
2022-12-23 23:34:22 +05:30
Gaurav Shukla
a5b70335d4
[SD][web] Add variant support in the web UI
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-23 23:18:27 +05:30
Stanley Winata
5cf4976054
[Vulkan][utils] Add GTX Pascal support. ( #709 )
20221223.407
20221223.406
2022-12-22 15:24:15 -08:00
PhaneeshB
1aa3255061
Add vaebase for av3 and ad
2022-12-23 04:17:17 +05:30
Daniel Garvey
b01f29f10d
add support for clear_all ( #691 )
2022-12-22 11:25:03 -06:00
Boian Petkantchin
2673abca88
Fix concurrency issue in stress_test for CUDA devices
2022-12-22 08:54:19 -08:00
Gaurav Shukla
7eeb7f0715
[SD] Update all the utilities to make web and CLI codebase closer ( #707 )
...
At this point, all the utilities of SD web and CLI are exactly the same.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-22 02:49:48 -08:00
powderluv
37262a2479
Remove spurious characters
20221222.405
20221222.404
2022-12-21 19:23:54 -08:00
Gaurav Shukla
de6e304959
[SD] Fix the resource location in shark_sd.spec ( #706 )
2022-12-21 14:41:56 -08:00
Quinn Dawkins
234475bbc7
Add base_vae entries for variant models ( #705 )
2022-12-21 14:35:08 -08:00
Quinn Dawkins
abbd9f7cfc
[SD] Set unet flags for cuda ( #704 )
2022-12-21 13:22:04 -08:00
Gaurav Shukla
dfd6ba67b3
[SD] Update SD CLI to use model_db.json
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-22 02:13:04 +05:30
yzhang93
1595254eab
Modify model annotation tool to walk through ops by shape ( #692 )
2022-12-21 10:46:30 -08:00
PhaneeshB
6964c5eeba
encapsulate relevant methods in one method
20221221.402
2022-12-21 23:56:17 +05:30
PhaneeshB
2befe771b3
Add support for automatic target triple selection for SD
2022-12-21 22:38:06 +05:30
Prashant Kumar
b133a035a4
Add the download progress bar.
2022-12-21 15:47:33 +05:30
Gaurav Shukla
726c062327
[SD] Update spec files
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-21 14:16:04 +05:30
Gaurav Shukla
9083672de3
[SD][web] Tuned models only for stablediffusion/fp16 and rdna3 cards
...
Currently tuned models are only available for stablediffusion/fp16 and
rdna3 cards.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-21 14:15:39 +05:30
Quinn Dawkins
cdbaf880af
[SD] [web] Add model variants to web
2022-12-21 13:42:22 +05:30
Quinn Dawkins
9434981cdc
Add random seed generation for seed = -1 in cli ( #689 )
2022-12-20 17:15:22 -05:00
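The seed handling above can be sketched minimally as below; the function name and the seed range are assumptions for illustration, not the CLI's actual code.

```python
import random

# Sketch: a seed of -1 means "pick a random seed"; any other value is
# used as-is. The 32-bit range is an assumption.
def resolve_seed(seed: int) -> int:
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed
```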
Phaneesh Barwaria
8b3706f557
Add Anything v3 and AnalogDiffusion variants of SD ( #685 )
...
* base support for anythingv3
* add analogdiffusion
* Update readme
* keep max len 77 till support for 64 added for variants
* lint fix
2022-12-20 13:08:13 -08:00
Gaurav Shukla
0d5173833d
[SD] Add a json file for model names information. ( #687 )
...
This commit simplifies the code to identify the model name for a
particular set of flags. This is achieved by introducing a json file
that stores the model names information. The models are uploaded in
gcloud with these names.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-20 11:47:31 -08:00
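A minimal sketch of the flag-to-model-name lookup via a JSON file follows. The file name model_db.json comes from the commit, but the schema, keys, and model names below are invented for illustration; the log does not show the actual contents.

```python
import json

# Invented example database; the real model_db.json schema is not shown
# in this log.
MODEL_DB = json.loads("""
{
  "stablediffusion/fp16/tuned": "stable_diffusion_fp16_tuned",
  "stablediffusion/fp16/untuned": "stable_diffusion_fp16"
}
""")

def model_name(variant: str, precision: str, tuned: bool) -> str:
    # Build a key from the flag set and look up the name the model was
    # uploaded under.
    key = f"{variant}/{precision}/{'tuned' if tuned else 'untuned'}"
    return MODEL_DB[key]
```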
powderluv
bf1178eb79
roll to build 400
2022-12-20 10:34:31 -08:00
yzhang93
abcd3fa94a
[SD] Set model max length 64 as default ( #681 )
20221220.400
2022-12-19 21:13:04 -08:00
Quinn Dawkins
62aa1614b6
[SD] Add --use_base_vae flag to do conversion to pixel space on cpu ( #682 )
2022-12-19 21:09:39 -08:00
Quinn Dawkins
7027356126
[SD] Fix warmup for max length 64 ( #680 )
2022-12-19 21:04:44 -05:00
yzhang93
5ebe13a13d
Add Unet len 64 tuned model ( #679 )
2022-12-19 16:24:08 -08:00
Gaurav Shukla
c3bed9a2b7
[SD][web] Add flag to disable the progress bar animation
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-20 02:50:04 +05:30
yzhang93
f865222882
Update VAE 19dec tuned model ( #676 )
2022-12-19 12:42:28 -08:00
powderluv
e2fe2e4095
Point to 398
2022-12-19 12:08:30 -08:00
powderluv
0532a95f08
Update stable_diffusion_amd.md
2022-12-19 12:04:42 -08:00
Quinn Dawkins
ff536f6015
[SD] Deduplicate initial noise generation ( #677 )
2022-12-19 14:38:41 -05:00
Gaurav Shukla
097d0f27bb
[SD][web] Add 64 max_length support in SD web
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-20 00:00:58 +05:30
Prashant Kumar
2257f87edf
Update opt_params.py
2022-12-19 23:43:30 +05:30
PhaneeshB
a17800da00
Add 64 len f16 untuned mlir
2022-12-19 22:53:17 +05:30
Prashant Kumar
059c1b3a19
Disable vae --use_tuned version.
20221219.398
2022-12-19 22:45:45 +05:30
Stanley Winata
9a36816d27
[SD][CLI] Add a warmup phase ( #670 )
2022-12-20 00:14:23 +07:00
Gaurav Shukla
7986b9b20b
[SD][WEB] Update VAE model and wrapper
...
This commit updates the VAE model, which significantly improves performance
by ~300ms.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-19 22:32:05 +05:30
Gaurav Shukla
b2b3a0a62b
[SD] Move initial latent generation out of inference time
...
The initial random latent generation is no longer counted toward
total SD inference time.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2022-12-19 22:32:05 +05:30
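Excluding latent generation from the timed window can be sketched like this; the function names are placeholders rather than SHARK's actual code.

```python
import time

# Sketch: generate latents before starting the clock so that only the
# denoising loop counts toward the reported inference time.
def timed_generation(make_latents, run_inference):
    latents = make_latents()         # excluded from the timing below
    start = time.time()
    result = run_inference(latents)  # timed portion
    return result, time.time() - start
```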
Prashant Kumar
3173b7d1d9
Update VAE model and wrapper.
2022-12-19 19:54:50 +05:30
Gaurav Shukla
9d716d70d6
[SD][web] Fix performance issues on shark scheduler
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
20221219.397
2022-12-19 17:44:37 +05:30
Stanley Winata
e1901a8608
[SD][CL] Disable print at every iteration. ( #664 )
...
Printing at every iteration can add extra runtime, so a flag is added to hide it. To disable printing, set `--hide_steps`.
Co-authored-by: Stanley <stanley@MacStudio.lan>
2022-12-19 15:39:57 +07:00
Quinn Dawkins
7d0cbd8d90
[SD][web] Set default tuned unet to v2 ( #663 )
20221219.396
2022-12-19 11:50:08 +07:00
Quinn Dawkins
59358361f9
[SD] Make clip batch 2 for positive and negative prompts ( #662 )
...
Combines the forward passes for each input prompt type into a single batched clip pass.
2022-12-18 23:46:21 -05:00
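The batch-2 clip pass described above can be sketched as below; `clip_forward` is a stand-in for the real text encoder, which this log does not show.

```python
# Stand-in encoder: returns one "embedding" per prompt in the batch.
def clip_forward(token_batch):
    return [[float(t) for t in tokens] for tokens in token_batch]

def encode_prompts(pos_tokens, neg_tokens):
    # Before: one forward pass per prompt type. After: a single batch-2
    # pass over [positive, negative], then split the outputs.
    embeddings = clip_forward([pos_tokens, neg_tokens])
    return embeddings[0], embeddings[1]
```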
Quinn Dawkins
7fea2d3b68
[SD] update default large heap size for web as well ( #661 )
20221219.395
2022-12-18 21:50:26 -05:00
Quinn Dawkins
b6d3ff26bd
[SD] Change default VMA large heap block size ( #660 )
2022-12-18 21:41:46 -05:00
Stella Laurenzo
523e63f5c1
Fix NoneType exception if vulkan tuning flags not detected. ( #659 )
...
(This goes on to produce compilation errors, but one step at a time)
2022-12-18 16:40:56 -08:00
Stella Laurenzo
10630ab597
Add config stanza for NVIDIA RTX 2080. ( #658 )
...
Just happened to have this card on my Windows machine and verified that the SD demo works on it.
```
Average step time: 144.26142692565918ms/it
Clip Inference Avg time (ms) = (205.001 + 44.000) / 2 = 124.501
VAE Inference time (ms): 281.001
Total image generation time: 7.856997728347778sec
```
I'd love to add an API upstream to derive compiler tuning flags from a host device.
2022-12-18 16:40:47 -08:00