Stefan Kapusniak
71d25ec5d8
SD: Fix repeatable seeds when initial seed is random ( #1893 )
2023-10-14 22:50:42 -07:00
Vivek Khandelwal
202ffff67b
Add support for sharded Falcon model
2023-10-13 22:05:10 +05:30
Stefan Kapusniak
a208302bb9
Fix repeatable seeds consistency over batch counts ( #1889 )
...
* Set the input seed for the random number generator when
generating repeatable seeds to exclude any negative numbers
in the parsed seed input. This makes seeds generated for
different batch counts consistent when they have the same
initial seed or set of seeds as input.
2023-10-12 17:15:19 -05:00
Vivek Khandelwal
b83d32fafe
Fix Falcon GPTQ Pipeline
2023-10-11 20:09:32 +05:30
Vivek Khandelwal
0a618e1863
Add support for Falcon GPTQ
2023-10-11 10:47:48 +05:30
Phaneesh Barwaria
a731eb6ed4
macOS fixes ( #1883 )
...
* fix venv setup for macOS
* allow stream fuse binding on mac
* clean iree metal args
2023-10-09 23:36:12 -07:00
Ean Garvey
2004d16945
Revert "[SDXL] Add SDXL pipeline to SHARK ( #1731 )" ( #1882 )
...
This reverts commit 9f0a421764.
2023-10-09 18:01:44 -07:00
Gaurav Shukla
6e409bfb77
fix else if syntax error
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-10 06:23:56 +05:30
Gaurav Shukla
77727d149c
[warning] Fix dropdown warning
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-10 05:18:43 +05:30
Ean Garvey
66f6e79d68
Split CPU/GPU definitions conditionally outside of torch contexts. ( #1879 )
2023-10-09 16:46:41 -07:00
Ean Garvey
3b825579a7
(LLaMa-2) Point to int4 + f32 acc .mlir for cpu ( #1878 )
...
- fixes some issues with non-system prompt invocation
Co-authored-by: Gaurav Shukla <gauravshukla789@gmail.com>
2023-10-09 14:37:35 -05:00
Abhishek Varma
9f0a421764
[SDXL] Add SDXL pipeline to SHARK ( #1731 )
...
-- This commit adds SDXL pipeline to SHARK.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-10-09 13:01:37 -05:00
Gaurav Shukla
c28682110c
[chatbot] Flag to add system prompt
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-09 22:17:39 +05:30
Ean Garvey
caf6cc5d8f
Switch most compile flows to use ireec.compile_file. ( #1863 )
...
* Switch most compile flows to use ireec.compile_file.
* re-add input type to compile_str path.
* Check if mlir_module exists before checking if it's a path or pyobject.
* Fix some save_dir cases
2023-10-06 23:04:43 -05:00
Ean Garvey
8614a18474
Remove tf dependencies from importer path. ( #1874 )
...
* Remove tf dependencies from import path.
* Fix formatting.
2023-10-06 12:27:12 -07:00
Jakub Kuderski
86c1c0c215
Add aggregate statistics to microbenchmark ( #1871 )
...
Print averaged results at the end of all iterations. Increase the
default number of iterations to 5.
Example:
```
Number of iterations: 5
Prefill: avg. 0.03 s, stddev 0.00
Decode: avg. 43.34 tokens/s, stddev 0.13
```
Also remove the -2 in the number of generated tokens -- I did not find
any evidence we need it.
2023-10-06 10:03:07 -07:00
Daniel Garvey
8bb364bcb8
enforce fp32 accumulates for cpu ( #1873 )
2023-10-06 11:34:49 -05:00
Daniel Garvey
7abddd01ec
argmax inside model + brevitas pin ( #1872 )
2023-10-05 20:15:21 -07:00
Abhishek Varma
2a451fa0c7
[Llama2] Add a standalone utility for dynamic and combining IRs
...
-- This script adds a standalone utility for converting Llama IRs
to dynamic shapes and combining them.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-10-05 20:01:06 +05:30
Jakub Kuderski
9c4610b9da
Add microbenchmark mode to vicuna CLI ( #1864 )
...
Add flags to enable a non-interactive mode for microbenchmarking llama
models. In this mode, the system and user prompts are specified with CLI
flags, and the number of generated tokens and iterations is fixed.
Also move the stats below the response and trim any response whitespace.
2023-10-05 00:12:08 -04:00
Gaurav Shukla
7cc9b3f8e8
[llama cli] Fix llama cli
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-03 20:39:53 +05:30
Gaurav Shukla
e54517e967
[UI] Disable config generator, lora train and model manager ( #1858 )
...
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-10-02 22:34:40 -07:00
Ean Garvey
326327a799
Collect pipeline submodules for diffusers ckpt preprocessing. ( #1859 )
2023-10-03 00:29:28 -04:00
Ean Garvey
785b65c7b0
Add flag for specifying device-local caching allocator heap key. ( #1856 )
2023-10-03 00:28:39 -04:00
Vivek Khandelwal
8dd7850c69
Add Falcon-GPTQ support
2023-10-02 16:39:57 +05:30
Gaurav Shukla
e930ba85b4
[os] Remove os dependency from vmfb naming ( #1854 )
...
Also fixes a small ui issue for chatbot.
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 12:38:17 -05:00
Gaurav Shukla
cd732e7a38
[chatbot] split execution time to prefill and decode
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
Gaurav Shukla
8e0f8b3227
[ui] Update chatbot UI
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
Gaurav Shukla
b8210ef796
[chatbot] Re-instantiate the chatbot object if device id changes
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-29 13:18:03 +05:30
PhaneeshB
94594542a9
remove use of vulkaninfo
2023-09-28 21:57:00 +05:30
Gaurav Shukla
82f833e87d
[vulkan] Update vmfb naming
...
Update vmfb naming for vulkan devices in order to resolve naming
conflicts in the presence of multiple vulkan devices.
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-28 14:52:11 +05:30
Vivek Khandelwal
c9d6870105
Modify falcon pipeline for 180b support
2023-09-28 12:39:35 +05:30
Nelson Sharpe
6773278ec2
Fix checkpoint_path unexpected argument ( #1832 )
2023-09-24 14:17:52 -07:00
Abhishek Varma
9a0efffcca
[Llama2] Fix wrong Vulkan device ID + Add Vulkan compile flags
...
-- This commit fixes the wrong Vulkan device being selected during
runtime.
-- It also adds a couple of IREE compilation flags to target a
specific Vulkan device.
-- It also changes the Vulkan device listing to be more in tune with
lowering control flow.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-22 22:24:18 +05:30
Quinn Dawkins
ded74d09cd
[vicuna.py] Keep past key values on device ( #1836 )
...
The past key values are only used within the models themselves and can
be kept on device. For vulkan int4, this gives 44 tok/s (for the first
prompt) and settles at around 26 tok/s on 7900xtx.
2023-09-19 18:17:41 -04:00
zjgarvey
9eceba69b7
local_tank_cache included into clear_all ( #1833 )
2023-09-18 00:27:23 -05:00
Ean Garvey
684943a4a6
(SD) Fix tokenizers imports in pyinstaller builds. ( #1828 )
...
* Fix tokenizers metadata.
* (SD) Disable VAE lowering configs (rdna3) and add versioned tunings.
* Update sd_annotation.py
* (SD) Add cv2 to spec.
* Update stencil pipeline with the new img2img arg.
2023-09-12 12:23:48 -05:00
PhaneeshB
b817bb8455
add roles for llama2
2023-09-12 10:59:28 +05:30
Ean Garvey
780f520f02
Fix vk.target_env extensions and remove redundant SD imports. ( #1826 )
...
* Remove redundant IREE runtime imports.
* Fix vulkan target env extensions.
2023-09-11 13:42:52 -05:00
Abhishek Varma
c854208d49
[Llama2] Prefetch llama2 tokenizer configs ( #1824 )
...
-- This commit prefetches llama2 tokenizer configs from shark_tank.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-09-08 11:29:54 -07:00
Gaurav Shukla
c5dcfc1f13
[vicuna] Exit when mlir is not present in shark tank ( #1825 )
...
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-08 10:30:29 -07:00
Abhishek Varma
bde63ee8ae
Add logging feature in WebUI ( #1821 )
2023-09-08 05:48:05 -07:00
Gaurav Shukla
ede6bf83e2
[vicuna] Disabling the IR generation path
...
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-09-06 20:13:17 +05:30
Gaurav Shukla
d2f64eefa3
[chatbot] Remove few outdated models from list ( #1814 )
2023-09-04 09:26:32 -07:00
Phaneesh Barwaria
1ccafa1fc1
fix llama2-70b rewrite tensor dim
2023-09-01 17:27:06 +05:30
jinchen62
4c3d8a0a7f
Enable downloading vmfb/mlir for webui ( #1807 )
2023-08-31 11:05:47 -07:00
jinchen62
3601dc7c3b
Fix llama2 13b combined ir ( #1803 )
2023-08-28 11:34:44 -07:00
Daniel Garvey
671881cf87
Llama2 70b ( #1783 )
...
* llama2 70b IR gen
* fix IR sec llama2 + debug
* llama270b
---------
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
2023-08-25 23:04:28 -07:00
Gaurav Shukla
4e9be6be59
[chatbot] Add debug as class attribute ( #1799 )
...
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-25 21:46:29 -07:00
Ean Garvey
9c8cbaf498
Add support for ROCM (Windows) in Studio + compile utils ( #1770 )
...
* WIP: MSVC ROCM support for SHARK Studio
* Make get_iree_rocm_args platform-agnostic.
* Update stable_args.py
* Update rocm arg handling in SD utils
* Guard quantization imports.
Co-authored-by: jam https://github.com/jammm
2023-08-25 20:56:05 -07:00