-- This commit adds a standalone utility script for converting Llama IRs
to dynamic form and combining them.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Add flags to enable a non-interactive mode for microbenchmarking llama
models. In this mode, the system and user prompts are specified with CLI
flags, and the number of generated tokens and iterations is fixed.
Also move the stats below the response and trim whitespace from the response.
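As a rough illustration of the non-interactive mode described above, here is a
minimal sketch. The flag names (`--enable_microbenchmark`, `--system_prompt`,
`--user_prompt`, `--num_tokens`, `--num_iterations`) and the `generate`
callable are hypothetical and chosen for illustration only; they are not taken
from the actual CLI.

```python
import argparse
import time


def build_parser():
    # All flag names here are hypothetical, for illustration only.
    p = argparse.ArgumentParser(description="Llama microbenchmark sketch")
    p.add_argument("--enable_microbenchmark", action="store_true",
                   help="run non-interactively with fixed prompts")
    p.add_argument("--system_prompt", default="")
    p.add_argument("--user_prompt", default="Hi")
    p.add_argument("--num_tokens", type=int, default=32,
                   help="fixed number of tokens to generate")
    p.add_argument("--num_iterations", type=int, default=5,
                   help="fixed number of benchmark iterations")
    return p


def run_benchmark(generate, args):
    """Call generate() a fixed number of times and time each call.

    `generate` is a stand-in for the model call; it takes
    (system_prompt, user_prompt, num_tokens) and returns a string.
    Returns a list of (trimmed_response, tokens_per_second) pairs.
    """
    results = []
    for _ in range(args.num_iterations):
        start = time.perf_counter()
        response = generate(args.system_prompt, args.user_prompt,
                            args.num_tokens)
        elapsed = time.perf_counter() - start
        tok_per_s = args.num_tokens / elapsed if elapsed > 0 else float("inf")
        # Trim surrounding whitespace from the response.
        results.append((response.strip(), tok_per_s))
    return results


def report(results):
    # Print each response first, then its stats below it.
    for response, tok_per_s in results:
        print(response)
        print(f"[stats] {tok_per_s:.2f} tok/s")
```

The throughput here assumes exactly `num_tokens` tokens are produced per call,
which only holds when generation length is fixed as in this mode.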
Update vmfb naming for Vulkan devices to resolve naming conflicts when
multiple Vulkan devices are present.
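One way to avoid such conflicts is to fold the device index into the file
name. The sketch below is only an illustration: the helper `vmfb_name`, the
name layout, and the `vulkan://<index>` device string format are assumptions,
not the scheme this commit actually uses.

```python
import re


def vmfb_name(model, precision, device):
    """Build a vmfb file name that is unique per device.

    Hypothetical helper: non-alphanumeric runs in the device string are
    collapsed to underscores, so "vulkan://1" becomes "vulkan_1".
    """
    safe_device = re.sub(r"[^A-Za-z0-9]+", "_", device).strip("_")
    return f"{model}_{precision}_{safe_device}.vmfb"


# Two Vulkan devices on the same machine get distinct artifacts.
names = [vmfb_name("llama2_7b", "int4", d)
         for d in ("vulkan://0", "vulkan://1")]
```

With this layout a module compiled for one Vulkan device is never picked up
by mistake when running on another.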
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
-- This commit fixes selection of the wrong Vulkan device at
runtime.
-- It also adds a couple of IREE compilation flags to target a specific
Vulkan device.
-- It also changes the Vulkan device listing to be more in tune with
lowering control flow.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
The past key values are only used within the models themselves and can
be kept on device. For Vulkan int4, this gives 44 tok/s (for the first
prompt) and settles at around 26 tok/s on a 7900xtx.
* WIP: MSVC ROCM support for SHARK Studio
* Make get_iree_rocm_args platform-agnostic.
* Update stable_args.py
* Update rocm arg handling in SD utils
* Guard quantization imports.
Co-authored-by: jam https://github.com/jammm
* [Llama2] Add a fix for Llama2 13B downloading/crashing
-- This commit fixes Llama2 13B downloading, and then crashing on, the
wrong .mlir file.
-- Also adds support for downloading vmfb from shark_tank in CLI.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
* [llama2] Add a spec file to run Llama/Vicuna CLI exe
-- This commit adds a spec file to run Llama/Vicuna CLI exe.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
---------
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
* improper test hiresfix
* add sliders & use `clear_cache`
* add resample choices & fix step adjustment
* add step adjustment to img2img
* add resample options to img2img
* simplify hiresfix
- import `img2img_inf` from `img2img_ui.py` instead of just copying it into `txt2img_ui.py`
* set `hri` to None after using
* add more resample types, and don't show output until hiresfix is done
* cleaner implementation
* ran black
* ran black again with jupyter dependencies