Abhishek Varma
2a451fa0c7
[Llama2] Add a standalone utility for dynamic and combining IRs
...
-- This commit adds a standalone utility script that converts Llama IRs
to dynamic and combines them.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-10-05 20:01:06 +05:30
Jakub Kuderski
9c4610b9da
Add microbenchmark mode to vicuna CLI ( #1864 )
...
Add flags to enable a non-interactive mode for microbenchmarking llama
models. In this mode, the system and user prompts are specified with CLI
flags, and the number of generated tokens and iterations is fixed.
Also move the stats below the response and trim whitespace from the response.
20231004.977
2023-10-05 00:12:08 -04:00
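The microbenchmark mode described in 9c4610b9da can be pictured roughly as below. This is only a sketch under stated assumptions, not the code from the vicuna CLI: `run_microbenchmark`, `fake_generate`, and every parameter name are hypothetical, and the model call is replaced by a stand-in so the example runs on its own.

```python
import time


def run_microbenchmark(generate, system_prompt, user_prompt, num_tokens, num_iterations):
    """Run one fixed prompt several times and collect per-run stats.

    `generate` is a hypothetical callable (prompt, num_tokens) -> str.
    """
    prompt = f"{system_prompt}\n{user_prompt}"
    results = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        response = generate(prompt, num_tokens)
        elapsed = time.perf_counter() - start
        results.append(
            {
                # Trim whitespace so the stats can be printed right below the response.
                "response": response.strip(),
                "seconds": elapsed,
                "tokens_per_second": num_tokens / elapsed if elapsed > 0 else float("inf"),
            }
        )
    return results


def fake_generate(prompt, num_tokens):
    # Stand-in for a real model call (assumption): returns padded dummy tokens.
    return "  " + " ".join(["tok"] * num_tokens) + " \n"


if __name__ == "__main__":
    runs = run_microbenchmark(fake_generate, "You are a helpful assistant.", "Say hi.", 8, 3)
    for run in runs:
        print(run["response"])
        print(f"{run['tokens_per_second']:.1f} tok/s")
```

Fixing the prompts, token count, and iteration count up front is what makes the runs comparable with each other.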
powderluv
a38cc9d216
Update vulkan_utils.py for Radeon 780m igpu ( #1866 )
2023-10-04 20:33:07 -07:00
Jakub Kuderski
1c382449ec
[vulkan] Print note about module load times. NFC. ( #1862 )
...
Print a note ahead of a potentially long inactivity to set the right expectations.
Separately, we should add progress to the UI and make this loading faster.
20231004.976
20231003.975
2023-10-03 17:27:27 -04:00
Gaurav Shukla
7cc9b3f8e8
[llama cli] Fix llama cli
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
20231003.974
2023-10-03 20:39:53 +05:30
Gaurav Shukla
e54517e967
[UI] Disable config generator, lora train and model manager ( #1858 )
...
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-02 22:34:40 -07:00
Ean Garvey
326327a799
Collect pipeline submodules for diffusers ckpt preprocessing. ( #1859 )
20231002.973
20231002.972
2023-10-03 00:29:28 -04:00
Ean Garvey
785b65c7b0
Add flag for specifying device-local caching allocator heap key. ( #1856 )
2023-10-03 00:28:39 -04:00
Sungsoon Cho
0d16c81687
Remove unused import. ( #1857 )
2023-10-02 11:36:08 -05:00
Vivek Khandelwal
8dd7850c69
Add Falcon-GPTQ support
2023-10-02 16:39:57 +05:30
Gaurav Shukla
e930ba85b4
[os] Remove os dependency from vmfb naming ( #1854 )
...
Also fixes a small ui issue for chatbot.
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
20231001.971
20230930.970
20230930.969
20230929.968
2023-09-29 12:38:17 -05:00
Gaurav Shukla
cd732e7a38
[chatbot] split execution time to prefill and decode
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
Gaurav Shukla
8e0f8b3227
[ui] Update chatbot UI
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
Gaurav Shukla
b8210ef796
[chatbot] Re-instantiate the chatbot object if device id changes
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
PhaneeshB
94594542a9
remove use of vulkaninfo
20230928.967
2023-09-28 21:57:00 +05:30
Gaurav Shukla
82f833e87d
[vulkan] Update vmfb naming
...
Update vmfb naming for vulkan devices in order to resolve naming
conflicts in the presence of multiple vulkan devices.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-28 14:52:11 +05:30
Vivek Khandelwal
c9d6870105
Modify falcon pipeline for 180b support
2023-09-28 12:39:35 +05:30
Jakub Kuderski
4fec03a6cc
[vulkan] Switch from coop matrix NV to KHR ( #1848 )
20230927.965
2023-09-27 21:43:37 -04:00
harsh-nod
9a27f51378
Deprecate inference directory
...
This patch removes the inference directory that was no longer being used.
2023-09-27 14:29:00 -07:00
Abhishek Varma
ad1a0f35ff
Fix misdirection while saving vmfb
...
-- Currently SHARK reports that the vmfb has been saved even when
no vmfb was generated.
This is misleading for larger IRs/vmfbs.
-- This commit therefore fixes that misleading message.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-27 16:25:29 +05:30
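One way to avoid the kind of misleading "saved" message that ad1a0f35ff addresses is to check the file on disk before reporting success. The sketch below is an illustration only, not the actual SHARK change: `report_vmfb_saved` and its messages are hypothetical.

```python
import tempfile
from pathlib import Path


def report_vmfb_saved(vmfb_path):
    """Report success only if the vmfb file exists and is non-empty.

    Hypothetical helper; the function name and messages are assumptions.
    """
    path = Path(vmfb_path)
    if path.is_file() and path.stat().st_size > 0:
        print(f"Saved vmfb in {path}.")
        return True
    print(f"No vmfb was generated at {path}.")
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "model.vmfb"
        report_vmfb_saved(target)          # file is missing, so this reports a failure
        target.write_bytes(b"\x00")        # pretend the compiler wrote something
        report_vmfb_saved(target)          # file now exists and is non-empty
```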
Nelson Sharpe
6773278ec2
Fix checkpoint_path unexpected argument ( #1832 )
20230926.964
20230925.963
20230924.962
2023-09-24 14:17:52 -07:00
Abhishek Varma
9a0efffcca
[Llama2] Fix wrong Vulkan device ID + Add Vulkan compile flags
...
-- This commit fixes the wrong Vulkan device being selected during
runtime.
-- It also adds a couple of IREE compilation flags to target a specific
Vulkan device.
-- It also changes the Vulkan device listing to be more in tune with
lowering control flow.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
20230923.961
20230922.960
2023-09-22 22:24:18 +05:30
gpetters94
61c6f153d9
Switch to keras-nightly to fix a Linux issue ( #1835 )
20230921.959
2023-09-21 12:33:45 -04:00
Phaneesh Barwaria
effd42e8f5
pin gradio to v3.44.3
2023-09-21 17:33:43 +05:30
Sungsoon Cho
b5fbb1a8a0
Rename the func arg save_json to avoid name collision. ( #1837 )
...
* Rename the func arg save_json to avoid name collision.
* black formatted.
20230920.958
20230919.957
2023-09-19 17:29:27 -05:00
Quinn Dawkins
ded74d09cd
[vicuna.py] Keep past key values on device ( #1836 )
...
The past key values are only used within the models themselves and can
be kept on device. For vulkan int4, this gives 44 tok/s (for the first
prompt) and settles at around 26 tok/s on 7900xtx.
2023-09-19 18:17:41 -04:00
Boian Petkantchin
79267931c1
Add argument --additional_compile_args ( #1119 )
...
This allows passing more arguments to the IREE compiler.
Example:
python my-app.py --additional_compile_args="--mlir-pretty-debuginfo --mlir-timing"
Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
2023-09-19 11:26:03 -05:00
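A flag such as `--additional_compile_args` from 79267931c1 carries several compiler arguments in one quoted string, so the receiving side has to split it before forwarding. The following is a minimal sketch assuming an `argparse`-based CLI; `build_parser` and `split_compile_args` are hypothetical names, and the real SHARK implementation may differ.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical parser; only the flag name comes from the commit message.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--additional_compile_args",
        default="",
        type=str,
        help="Extra arguments to forward to the compiler, as one quoted string.",
    )
    return parser


def split_compile_args(raw):
    """Split the quoted string into individual arguments, shell-style."""
    return shlex.split(raw)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--additional_compile_args=--mlir-pretty-debuginfo --mlir-timing"]
    )
    print(split_compile_args(args.additional_compile_args))
```

Using the `--flag=value` form matters here: it lets the value itself start with `--` without being mistaken for another option.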
zjgarvey
9eceba69b7
local_tank_cache included into clear_all ( #1833 )
20230918.956
2023-09-18 00:27:23 -05:00
Ean Garvey
ca609afb6a
Update README.md ( #1830 )
20230917.955
20230916.954
20230915.953
20230914.951
20230914.950
2023-09-14 10:33:57 -05:00
Gaurav Shukla
11bdce9790
[flags] Fix vulkan runtime flags as vma is dropped from iree ( #1831 )
2023-09-14 08:58:59 -05:00
Ean Garvey
684943a4a6
(SD) Fix tokenizers imports in pyinstaller builds. ( #1828 )
...
* Fix tokenizers metadata.
* (SD) Disable VAE lowering configs (rdna3) and add versioned tunings.
* Update sd_annotation.py
* (SD) Add cv2 to spec.
* Update stencil pipeline with the new img2img arg.
20230913.949
20230912.948
2023-09-12 12:23:48 -05:00
PhaneeshB
b817bb8455
add roles for llama2
2023-09-12 10:59:28 +05:30
Ean Garvey
780f520f02
Fix vk.target_env extensions and remove redundant SD imports. ( #1826 )
...
* Remove redundant IREE runtime imports.
* Fix vulkan target env extensions.
20230911.946
2023-09-11 13:42:52 -05:00
Dom
c61b6f8d65
Code refactoring ( #1817 )
...
* use join
* fix bug
* further code optimizations
---------
Co-authored-by: Daniel Garvey <34486624+dan-garvey@users.noreply.github.com>
2023-09-11 11:30:56 -05:00
Abhishek Varma
c854208d49
[Llama2] Prefetch llama2 tokenizer configs ( #1824 )
...
-- This commit prefetches llama2 tokenizer configs from shark_tank.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
20230910.942
20230909.941
20230908.940
2023-09-08 11:29:54 -07:00
Gaurav Shukla
c5dcfc1f13
[vicuna] Exit when mlir is not present in shark tank ( #1825 )
...
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-08 10:30:29 -07:00
Abhishek Varma
bde63ee8ae
Add logging feature in WebUI ( #1821 )
2023-09-08 05:48:05 -07:00
Vivek Khandelwal
9681d494eb
Update decomp list and shark trainer for DLRM
20230907.939
20230907.938
2023-09-06 21:24:50 +05:30
Gaurav Shukla
ede6bf83e2
[vicuna] Disabling the IR generation path
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-06 20:13:17 +05:30
Ean Garvey
2c2693fb7d
Fix torchvision versioning in Linux importer setup. ( #1809 )
20230905.935
20230905.934
2023-09-05 12:57:03 -05:00
Vivek Khandelwal
1d31b2b2c6
Fix StableHLO Compilation flag
2023-09-05 21:32:33 +05:30
Gaurav Shukla
d2f64eefa3
[chatbot] Remove few outdated models from list ( #1814 )
20230904.932
2023-09-04 09:26:32 -07:00
Abhishek Varma
87ae14b6ff
[SD] Add sdpfa decomposition + update IREE flag
...
-- This commit adds Scaled Dot Product Flash Attention's decomposition
in shark_importer.
-- It also updates `iree-flow-enable-data-tiling` to `iree-opt-data-tiling`.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
20230904.931
2023-09-04 18:03:53 +05:30
Phaneesh Barwaria
1ccafa1fc1
fix llama2-70b rewrite tensor dim
20230903.930
20230903.929
20230902.928
20230901.927
2023-09-01 17:27:06 +05:30
jinchen62
4c3d8a0a7f
Enable downloading vmfb/mlir for webui ( #1807 )
20230831.925
20230831.924
2023-08-31 11:05:47 -07:00
jinchen62
3601dc7c3b
Fix llama2 13b combined ir ( #1803 )
20230830.923
20230829.922
20230829.921
20230828.920
2023-08-28 11:34:44 -07:00
Daniel Garvey
671881cf87
Llama2 70b ( #1783 )
...
* llama2 70b IR gen
* fix IR sec llama2 + debug
* llama270b
---------
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
20230827.919
20230826.918
20230826.917
2023-08-25 23:04:28 -07:00
Gaurav Shukla
4e9be6be59
[chatbot] Add debug as class attribute ( #1799 )
...
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
20230825.916
20230825.915
2023-08-25 21:46:29 -07:00
Ean Garvey
9c8cbaf498
Add support for ROCM (Windows) in Studio + compile utils ( #1770 )
...
* WIP: MSVC ROCM support for SHARK Studio
* Make get_iree_rocm_args platform-agnostic.
* Update stable_args.py
* Update rocm arg handling in SD utils
* Guard quantization imports.
Co-authored-by: jam https://github.com/jammm
20230825.914
2023-08-25 20:56:05 -07:00
Ean Garvey
9e348a114e
Revert changes process_skipfiles.py ( #1798 )
...
Keeps a small typo fix but reverts the rest of the changes to this file from 450c231171
2023-08-25 15:31:49 -07:00