* HF-Reference LLM mode.
* Fixup test to match current output from Turbine.
* lint
* Fix test error message + Only initialize HF torch model when used.
* Remove redundant format_out change.
* Streaming LLM
* Update precision and add gpu support
* (studio2) Separate weights generation for quantization support
* Adapt prompt changes to studio flow
* Remove outdated flag from llm compile flags.
* (studio2) use turbine vmfbRunner
* tweaks to prompts
* Update CPU path and llm api test.
* Change device in test to cpu.
* Fixes to runner, device names, vmfb mgmt
* Use small test without external weights.
* Upgrade Gradio pin from 4.7.1 to 4.8.0.
* Make Nod AI logos visible again.
* Remove image toolbars from png import boxes.
* Set Input Images on img2img, outpaint and upscaler tabs to be upload
only.
* Change Image control to an ImageEditor control for masking on the
inpaint tab. Remove previous height restriction as this hides the
editing controls.
* Move Input Image/Masked Image on img2img, inpaint, outpaint and
upscaler tabs to be the first control on their tabs.
* Remove download buttons from all galleries, as they download some
html rather than the image (gradio issue #6595)
* Remove add new row and column from Output Gallery parameters
dataframe.
* Add partial workaround for not being able to select text in the Output
Gallery parameters dataframe (gradio issue #6086)
* Fix uglified formatting of the subdirectory selection dropdown, refresh
button, and open folder buttons on the Output Gallery tab.
* Force Output Gallery to use the full width of the Gallery control
for the preview overlay when an image is selected, rather than
an overlay the width of the selected image.
* Fix sendto buttons.
* Reset Inpaint ImageEditor control with the Mask Layer after generation
is complete, as it gets lost if the image was sent to the tab from
another tab rather than being uploaded. Also rework queuing and
progress rendering along this codepath. This doesn't solve the
underlying problem of the Mask Layer being removed, but does get inpaint
fully working with the Gradio update.
* improve sharded performance and fix an issue with lmhead on ROCm
* mmap shards + disable sharing of device arrays across devices
* fix device_idx for non-layer vmfbs
* fix time calc for sharded
---------
Co-authored-by: Elias Joseph <elias@nod-labs.com>
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
* Fix some issues with defaults
Fixes to llama2 cpu compilation (turns off data tiling for old argmax
mode)
---------
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
* Update default CPU compilation flags.
c5a6cdc8dd52eb7e9b82
tweak CPU iree-compile flags to match upstream changes.
* Add an option for data tiling on SD models.
This makes the program wait for the Tracy profiler to connect before exiting
and flush profiling data after each token.
I don't know how to select the Tracy iree-runtime variant
programmatically, so instead it prints an error and exits.
- Move statistics out of the main loop
- Add 'end-to-end' numbers
- Switch the main display unit from s to ms
- Start measuring time at 0
The new print format looks like this:
```
Number of iterations: 5
Num tokens: 1 (prompt), 512 (generated), 513 (total)
Prefill: avg. 0.01 ms (stdev 0.00), avg. 97.99 tokens/s
Decode: avg. 4840.44 ms (stdev 28.80), avg. 97.99 tokens/s
Decode end-2-end: avg. 85.78 tokens/s (w/o prompt), avg. 95.98 (w/ prompt)
```
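A minimal sketch of how a summary in this format could be produced once the per-iteration timings have been collected outside the main loop. The function names `avg_stdev` and `format_stats`, the per-iteration inputs in seconds, and the throughput formulas (tokens divided by the average wall time of the phase; end-to-end uses average prefill plus average decode) are assumptions for illustration, not taken from the commit.

```python
import statistics
from typing import List, Sequence, Tuple


def avg_stdev(samples: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (0.0 when only one sample exists)."""
    avg = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return avg, stdev


def format_stats(prefill_s: Sequence[float], decode_s: Sequence[float],
                 n_prompt: int, n_generated: int) -> List[str]:
    """Build summary lines from per-iteration wall-clock times in seconds."""
    n_total = n_prompt + n_generated
    # Statistics are gathered after the loop and displayed in milliseconds.
    pre_ms, pre_sd = avg_stdev([t * 1000.0 for t in prefill_s])
    dec_ms, dec_sd = avg_stdev([t * 1000.0 for t in decode_s])
    # Assumed end-to-end time: average prefill plus average decode, in seconds.
    e2e_s = (pre_ms + dec_ms) / 1000.0
    return [
        f"Number of iterations: {len(decode_s)}",
        f"Num tokens: {n_prompt} (prompt), {n_generated} (generated), "
        f"{n_total} (total)",
        f"Prefill: avg. {pre_ms:.2f} ms (stdev {pre_sd:.2f}), "
        f"avg. {n_prompt / (pre_ms / 1000.0):.2f} tokens/s",
        f"Decode: avg. {dec_ms:.2f} ms (stdev {dec_sd:.2f}), "
        f"avg. {n_generated / (dec_ms / 1000.0):.2f} tokens/s",
        f"Decode end-2-end: avg. {n_generated / e2e_s:.2f} tokens/s (w/o prompt), "
        f"avg. {n_total / e2e_s:.2f} (w/ prompt)",
    ]


if __name__ == "__main__":
    # Two hypothetical iterations: 50 ms prefill, 5 s decode each.
    print("\n".join(format_stats([0.05, 0.05], [5.0, 5.0], 1, 512)))
```

Keeping the raw timings in seconds and converting only for display avoids mixing units in the tokens/s arithmetic.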
* Add .mlir to the files that are deleted from `./shark_tmp` when studio
is started.
* refactor/rename existing gradio temp file cleanup on startup to be
consistent with a general `./shark_tmp` cleanup
* Move clean_device_info to compile_utils
* Update compile_utils.py
* Fix .mlir writes for some user-level permissions
* Fix cases where full URI is given
* Fix conditionals.
* Fix device path handling in vulkan utils.
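A minimal sketch of a startup cleanup of the kind described above, which removes leftover files with selected extensions from the temp directory. Only `./shark_tmp` and `.mlir` come from the commit; the function name `clear_tmp_files`, the recursive walk, and the case-insensitive suffix match are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def clear_tmp_files(tmp_dir: str = "./shark_tmp",
                    extensions: Iterable[str] = (".mlir",)) -> List[Path]:
    """Delete files under tmp_dir (recursively) whose suffix is in `extensions`.

    Returns the removed paths. A missing directory is treated as already clean.
    """
    root = Path(tmp_dir)
    if not root.is_dir():
        return []
    wanted = {ext.lower() for ext in extensions}
    removed = []
    # Materialize the listing first so deleting files doesn't disturb the walk.
    for path in list(root.rglob("*")):
        # Only touch regular files with a matching extension.
        if path.is_file() and path.suffix.lower() in wanted:
            path.unlink()
            removed.append(path)
    return removed
```

Passing the extensions as a parameter lets the same helper cover both the Gradio temp files and the `.mlir` files without separate cleanup routines.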
- add torchvision to setup_venv.ps1: we need it for the torchvision::nms op that is now a dependency of controlnet features.
- Don't show the distracting flashing orange updates when using the chatbot.
- Don't limit the height of the chatbot. There are mixed opinions and solutions around this one, but I think the default (400) is way too small, and LLMs generate enough text to justify letting the height match the output.
* Workaround gradio bug that causes the parameters frame to always show
scrollbars.
* Remove the original funky method of setting the number of image
columns in the gallery using _fn= JavaScript events. The version
of gradio we now have pinned allows doing this by setting the property
on the gallery directly, and also doesn't keep resetting the columns when
other events are fired.
* Adds a function to webui utils to read metadata from
.safetensors LoRA files and do limited parsing of the format written
out by the Kohya SS scripts (https://github.com/kohya-ss/sd-scripts)
to get tag frequency and trained model information.
* Adds a new common_ui_events.py file for gradio event handlers
needed for multiple UI tabs, and adds an event handler for binding to
the change event of the LoRA selection boxes, that outputs HTML
to display the LoRA tag frequency and model information.
* Adds an HTML gradio control to each of the SD tabs to show the
LoRA model name, and most frequently trained tags.
* Bind the change event of the LoRA selection box on each tab
to our new event handler, with the output set to the relevant HTML
control.
* Restrict resamplers for img2img and high res fix to the ones that
PIL.Image actually supports, since it uses PIL to do the resampling.
Removed: Antialias, Affine, Cubic. Added: Hamming.
* Set the list of available schedulers to the CPU-only ones when high res fix
is selected in the web ui. Set the list to all schedulers when high res fix
is deselected.
* Put hi res fix in its own Accordion in the txt2img UI instead of
grouping it with Advanced Options.
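The LoRA metadata reading described in the webui utils bullets above can be sketched with the standard library alone, since a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header whose optional `__metadata__` entry maps strings to strings. The function names here are hypothetical, and the Kohya SS keys `ss_sd_model_name` and `ss_tag_frequency` (a JSON string of dataset folder to `{tag: count}`) are assumptions about what the scripts write, not the studio implementation.

```python
import json
import struct
from collections import Counter
from typing import Dict, List, Tuple


def read_safetensors_metadata(path: str) -> Dict[str, str]:
    """Return the optional "__metadata__" dict from a .safetensors header."""
    with open(path, "rb") as f:
        # The header length is an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def summarize_lora(metadata: Dict[str, str],
                   top_n: int = 5) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (trained model name, most frequent tags) from Kohya-style metadata.

    Assumed keys: "ss_sd_model_name" and "ss_tag_frequency", the latter being
    a JSON string mapping dataset folder -> {tag: count}.
    """
    model = metadata.get("ss_sd_model_name", "unknown")
    try:
        freq = json.loads(metadata.get("ss_tag_frequency", "{}"))
    except json.JSONDecodeError:
        freq = {}
    counts: Counter = Counter()
    for tags in freq.values():
        # Merge per-folder counts so a tag used in several folders adds up.
        for tag, n in tags.items():
            counts[tag.strip()] += int(n)
    return model, counts.most_common(top_n)
```

Only the header is read, so this stays cheap even for large LoRA files; the tensor data after the header is never loaded.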