* improved sharded performance and fixed issue with lmhead on rocm
* mmap shards + disable sharing of device arrays across devices
* fix device_idx for non-layer vmfbs
* fix time calc for sharded
---------
Co-authored-by: Elias Joseph <elias@nod-labs.com>
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
* Fix some issues with defaults
Fixes to llama2 cpu compilation (turns off data tiling for old argmax
mode)
---------
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
* Update default CPU compilation flags.
c5a6cdc8dd52eb7e9b82
tweak CPU iree-compile flags to match upstream changes.
* Add an option for data tiling on SD models.
* Move clean_device_info to compile_utils
* Update compile_utils.py
* Fix .mlir writes for some user-level permissions
* Fix cases where full URI is given
* Fix conditionals.
* Fix device path handling in vulkan utils.
compile_str is always False in compile_module_to_flatbuffer because the
parameter 'model_name' comes before 'debug'.
This issue is related to https://github.com/nod-ai/SHARK/pull/1863.
With the fix, the MLIR model buffer in RAM can be used to run inference.
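A minimal sketch of how a positional call can leave compile_str stuck at False. The function below is a hypothetical stand-in, not the real compile_module_to_flatbuffer signature; only the parameter names model_name, debug, and compile_str come from the note above, and the caller shown is an assumption about how such a shift could arise.

```python
def compile_to_flatbuffer_sketch(
    module, device, frontend, model_name="", debug=False, compile_str=False
):
    """Hypothetical stand-in that just reports how its arguments were bound."""
    return {"model_name": model_name, "debug": debug, "compile_str": compile_str}


# Caller written as if the order were (..., debug, compile_str):
# every positional value lands one slot too early.
shifted = compile_to_flatbuffer_sketch("module", "cpu", "torch", False, True)

# Passing the trailing options by keyword avoids the shift.
fixed = compile_to_flatbuffer_sketch(
    "module", "cpu", "torch", debug=False, compile_str=True
)

print(shifted)  # model_name receives False, debug receives True
print(fixed)
```

Keyword arguments keep call sites correct even if another parameter is later added in the middle of the signature.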
* Switch most compile flows to use ireec.compile_file.
* re-add input type to compile_str path.
* Check if mlir_module exists before checking if it's a path or pyobject.
* Fix some save_dir cases
Print a note ahead of a potentially long inactivity to set the right expectations.
Separately, we should add progress to the UI and make this loading faster.
-- Currently SHARK reports that the vmfb has been saved even when
that is not the case and no vmfb is generated.
This is misleading for IR/vmfbs which are of larger
size.
-- This commit therefore fixes the misleading message.
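The actual change is not shown in this message. As a rough sketch of the idea only, a save helper can print the "saved" note after confirming that something was written. The name save_vmfb_sketch and the message wording are assumptions for illustration, not SHARK's code.

```python
import os
import tempfile


def save_vmfb_sketch(flatbuffer_blob, path):
    """Write the blob and report success only when a file really exists."""
    if not flatbuffer_blob:
        # Nothing was generated, so do not claim that anything was saved.
        print(f"No vmfb was generated; nothing written to {path}")
        return None
    with open(path, "wb") as handle:
        handle.write(flatbuffer_blob)
    if not os.path.isfile(path):
        print(f"Writing {path} failed")
        return None
    print(f"Saved vmfb in {path}")
    return path


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "model.vmfb")
missing = save_vmfb_sketch(None, demo_path)
written = save_vmfb_sketch(b"abc", demo_path)
```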
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
This allows passing more arguments to the IREE compiler.
Example:
python my-app.py --additional_compile_args="--mlir-pretty-debuginfo --mlir-timing"
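One way an application could turn that single quoted string into separate compiler flags is sketched below. The helper name parse_extra_compile_args is hypothetical, and how SHARK itself parses and forwards the option is not shown here; only the option name and the two flags come from the example above.

```python
import argparse
import shlex


def parse_extra_compile_args(argv):
    """Return the extra compiler flags as a list of separate strings."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--additional_compile_args",
        default="",
        help="Extra flags for the compiler, given as one quoted string.",
    )
    args = parser.parse_args(argv)
    # shlex.split honours shell-style quoting inside the string.
    return shlex.split(args.additional_compile_args)


extra_flags = parse_extra_compile_args(
    ["--additional_compile_args=--mlir-pretty-debuginfo --mlir-timing"]
)
print(extra_flags)
```

The `=` form matters here: it lets argparse accept a value that itself starts with `--`.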
Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
-- This commit adds Scaled Dot Product Flash Attention's decomposition
in shark_importer.
-- It also updates `iree-flow-enable-data-tiling` to `iree-opt-data-tiling`.
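For scripts that still carry the old flag name, a small rewrite step along these lines could be used. This is an illustrative sketch: the helper is hypothetical, and the leading `--` spelling is assumed from how command-line flags are usually written; only the two flag names come from the note above.

```python
OLD_FLAG = "--iree-flow-enable-data-tiling"
NEW_FLAG = "--iree-opt-data-tiling"


def migrate_data_tiling_flag(flags):
    """Rename the old data-tiling flag, keeping any '=value' suffix."""
    migrated = []
    for flag in flags:
        name, separator, value = flag.partition("=")
        if name == OLD_FLAG:
            name = NEW_FLAG
        migrated.append(name + separator + value)
    return migrated


updated_flags = migrate_data_tiling_flag(
    ["--iree-flow-enable-data-tiling=false", "--mlir-timing"]
)
print(updated_flags)
```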
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
* WIP: MSVC ROCM support for SHARK Studio
* Make get_iree_rocm_args platform-agnostic.
* Update stable_args.py
* Update rocm arg handling in SD utils
* Guard quantization imports.
Co-authored-by: jam https://github.com/jammm
* Optimize device enumeration overhead and log details on long operations.
* Various fixes to add `@functools.cache` to what should be one-time, expensive device enumeration and setup activities. Cuts several seconds off initialization on my machine.
* Add detailed tracing to actual invocations if they exceed a certain timeout or have an exception.
* Add detailed tracing to loading status.
* By default, detailed logging is only printed if an operation takes an excessive amount of time. All logging/timing can be printed by setting the variable `$env:SHARK_DETAIL_TRACE = "1"`.
* Remove cache from unhashable functions
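The caching pattern mentioned above, and the reason it cannot be applied to functions that take unhashable arguments, can be sketched as follows. The function names and device strings are hypothetical stand-ins, not SHARK's actual enumeration code.

```python
import functools

enumeration_calls = 0  # counts how often the expensive body actually runs


@functools.cache
def enumerate_devices(driver):
    """Hypothetical stand-in for a slow, one-time driver query."""
    global enumeration_calls
    enumeration_calls += 1
    return tuple(f"{driver}://{index}" for index in range(2))


first = enumerate_devices("vulkan")
second = enumerate_devices("vulkan")  # served from the cache, body not re-run


@functools.cache
def describe_devices(devices):
    """Cached helper; its argument must be hashable to act as a cache key."""
    return ", ".join(devices)


def cache_rejects(argument):
    """Return True if calling the cached helper raises TypeError."""
    try:
        describe_devices(argument)
    except TypeError:
        return True
    return False


print(cache_rejects(["vulkan://0"]))   # a list is unhashable
print(cache_rejects(("vulkan://0",)))  # a tuple is hashable
```

`functools.cache` needs Python 3.9 or newer. Functions that receive lists, dicts, or other unhashable objects either have to go uncached or take a hashable form of the same data.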
* Adding metal_utils for iree_utils
* Add patch for making compile API work for both MEGABYTE and MiniGPT4 (#1559)
-- It also modifies the mega_test.py script
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
* [SD] Update unet in_channels API and add PIL metadata to spec. (#1560)
* Fix deprecation warning for unet config.
* Include PIL metadata instead of hidden imports in SD spec.
* Fixing iree-metal-target-platform
* adding metal to txt2img pipeline
* Fixing Copyright date
* removing debug prints
* black lint formatting
* fixing device dump
---------
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: Ean Garvey <87458719+monorimet@users.noreply.github.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
* Do not hardcode the name of the VM module in get_iree_module
* Add example JAX MiniLM inference
---------
Co-authored-by: Boian Petkantchin <boian@nod-labs.com>
Example:
$ python my_app.py --device_allocator caching debug
This wraps the device allocator first with the caching allocator, then
with the debug allocator.
$ python my_app.py --device_allocator caching
Only wrap with caching allocator.
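The wrapping order can be sketched with a multi-value option. This is an assumption-laden illustration: the use of `nargs="*"`, the helper names, and the string rendering of the wrappers are hypothetical, and the real allocator objects are not modelled. Only the option name and the values caching and debug come from the examples above.

```python
import argparse


def parse_allocators(argv):
    """Collect the allocator wrappers in the order given on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--device_allocator",
        nargs="*",
        choices=["caching", "debug"],
        default=[],
    )
    return parser.parse_args(argv).device_allocator


def describe_wrapping(base, wrappers):
    """Apply wrappers in order, so the last one named ends up outermost."""
    description = base
    for wrapper in wrappers:
        description = f"{wrapper}({description})"
    return description


both = parse_allocators(["--device_allocator", "caching", "debug"])
only_caching = parse_allocators(["--device_allocator", "caching"])
print(describe_wrapping("device", both))
print(describe_wrapping("device", only_caching))
```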
Co-authored-by: Boian Petkantchin <boian@nod-labs.com>