Update setup_venv.ps1 (#1064)

[SD] Update need_vae_encode correctly
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2026-04-20 03:00:34 -04:00 · 2023-02-21 14:13:04 -05:00 · 2023-02-21 20:26:06 +05:30 · 2023-02-21 18:46:15 +05:30 · 2023-02-20 22:04:25 -06:00 · 2023-02-20 14:46:26 -06:00
16 changed files with 109 additions and 61 deletions
--- a/README.md
+++ b/README.md
@@ -215,14 +215,14 @@ python -m  shark.examples.shark_inference.resnet50_script --device="cpu" # Use g
 pytest tank/test_models.py -k "MiniLM"
 ```
  
-
+### How to use your locally built IREE / Torch-MLIR with SHARK
 If you are a *Torch-MLIR developer or an IREE developer* and want to test local changes, you can uninstall
 the provided packages with `pip uninstall torch-mlir` and / or `pip uninstall iree-compiler iree-runtime` and build locally
 with Python bindings and set your PYTHONPATH as mentioned [here](https://github.com/iree-org/iree/tree/main/docs/api_docs/python#install-iree-binaries)
 for IREE and [here](https://github.com/llvm/torch-mlir/blob/main/development.md#setup-python-environment-to-export-the-built-python-packages)
 for Torch-MLIR.

-### How to use your locally built Torch-MLIR with SHARK
+How to use your locally built Torch-MLIR with SHARK:
 ```shell
 1.) Run `./setup_venv.sh` in SHARK and activate the `shark.venv` virtual env.
 2.) Run `pip uninstall torch-mlir`.
@@ -240,9 +240,15 @@ Now the SHARK will use your locally build Torch-MLIR repo.

 ## Benchmarking Dispatches

-To produce benchmarks of individual dispatches, you can add `--dispatch_benchmarks=All --dispatch_benchmarks_dir=<output_dir>` to your command line argument.  
+To produce benchmarks of individual dispatches, you can add `--benchmark_dispatches=All --dispatch_benchmarks_dir=<output_dir>` to your pytest command-line arguments.  
 If you only want to compile specific dispatches, you can specify them with a space-separated string instead of `"All"`, e.g. `--benchmark_dispatches="0 1 2 10"`

+For example, to generate and run dispatch benchmarks for MiniLM on CUDA:
+```
+pytest -k "MiniLM and torch and static and cuda" --benchmark_dispatches=All -s --dispatch_benchmarks_dir=./my_dispatch_benchmarks
+```
+This command populates `<dispatch_benchmarks_dir>/<model_name>/` with an `ordered_dispatches.txt` file that lists the dispatches, ordered, together with their latencies, as well as a folder for each dispatch containing its .mlir, its .vmfb, and the benchmark results for that dispatch.
+
 If you want to incorporate this into a Python script instead, you can pass the `dispatch_benchmarks` and `dispatch_benchmarks_dir` arguments when initializing `SharkInference`, and the benchmarks will be generated at compile time. E.g.:

 ```
@@ -266,7 +272,7 @@ Output will include:
 - A .txt file containing benchmark output


-See tank/README.md for instructions on how to run model tests and benchmarks from the SHARK tank.
+See tank/README.md for further instructions on how to run model tests and benchmarks from the SHARK tank.

 </details>

--- a/apps/stable_diffusion/scripts/img2img.py
+++ b/apps/stable_diffusion/scripts/img2img.py
@@ -104,7 +104,7 @@ def img2img_inf(
        width,
        device,
    )
-    if config_obj != new_config_obj:
+    if not img2img_obj or config_obj != new_config_obj:
        config_obj = new_config_obj
        args.precision = precision
        args.batch_size = batch_size
@@ -136,11 +136,9 @@ def img2img_inf(
            args.width,
            args.use_base_vae,
            args.use_tuned,
+            low_cpu_mem_usage=args.low_cpu_mem_usage,
        )

-    if not img2img_obj:
-        sys.exit("text to image pipeline must not return a null value")
-
    img2img_obj.scheduler = schedulers[scheduler]

    start_time = time.time()
@@ -230,6 +228,7 @@ if __name__ == "__main__":
        args.width,
        args.use_base_vae,
        args.use_tuned,
+        low_cpu_mem_usage=args.low_cpu_mem_usage,
    )

    start_time = time.time()
--- a/apps/stable_diffusion/scripts/inpaint.py
+++ b/apps/stable_diffusion/scripts/inpaint.py
@@ -97,7 +97,7 @@ def inpaint_inf(
        width,
        device,
    )
-    if config_obj != new_config_obj:
+    if not inpaint_obj or config_obj != new_config_obj:
        config_obj = new_config_obj
        args.precision = precision
        args.batch_size = batch_size
@@ -131,9 +131,6 @@ def inpaint_inf(
            args.use_tuned,
        )

-    if not inpaint_obj:
-        sys.exit("text to image pipeline must not return a null value")
-
    inpaint_obj.scheduler = schedulers[scheduler]

    start_time = time.time()
--- a/apps/stable_diffusion/scripts/txt2img.py
+++ b/apps/stable_diffusion/scripts/txt2img.py
@@ -94,7 +94,7 @@ def txt2img_inf(
        width,
        device,
    )
-    if config_obj != new_config_obj:
+    if not txt2img_obj or config_obj != new_config_obj:
        config_obj = new_config_obj
        args.precision = precision
        args.batch_size = batch_size
@@ -105,6 +105,7 @@ def txt2img_inf(
        args.iree_vulkan_target_triple = ""
        args.use_tuned = True
        args.import_mlir = False
+        args.img_path = None
        set_init_device_flags()
        model_id = (
            args.hf_model_id
@@ -126,11 +127,9 @@ def txt2img_inf(
            args.width,
            args.use_base_vae,
            args.use_tuned,
+            low_cpu_mem_usage=args.low_cpu_mem_usage,
        )

-    if not txt2img_obj:
-        sys.exit("text to image pipeline must not return a null value")
-
    txt2img_obj.scheduler = schedulers[scheduler]

    start_time = time.time()
@@ -199,6 +198,7 @@ if __name__ == "__main__":
        args.width,
        args.use_base_vae,
        args.use_tuned,
+        low_cpu_mem_usage=args.low_cpu_mem_usage,
    )

    for current_batch in range(args.batch_count):
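All three scripts now rebuild the pipeline when no cached pipeline exists yet, not only when the configuration changes, which is what makes the old `sys.exit` null checks unnecessary. A minimal, self-contained sketch of that cache-or-rebuild pattern; `PipelineCache`, `build_pipeline`, and the tuple config are hypothetical stand-ins, not SHARK's API:

```python
from typing import Any, Callable, Optional


class PipelineCache:
    """Hypothetical cache mirroring `if not obj or config_obj != new_config_obj`."""

    def __init__(self, build_pipeline: Callable[[Any], Any]) -> None:
        self._build = build_pipeline
        self._config: Optional[Any] = None
        self._pipeline: Optional[Any] = None

    def get(self, new_config: Any) -> Any:
        # Rebuild on first use (nothing cached yet) or when the config differs.
        if self._pipeline is None or self._config != new_config:
            self._config = new_config
            self._pipeline = self._build(new_config)
        return self._pipeline
```

The scripts test `not txt2img_obj`; the sketch uses `is None` instead so that a pipeline object which happens to be falsy is not rebuilt on every call.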
--- a/apps/stable_diffusion/src/models/model_wrappers.py
+++ b/apps/stable_diffusion/src/models/model_wrappers.py
@@ -80,6 +80,7 @@ class SharkifyStableDiffusionModel:
        batch_size: int = 1,
        use_base_vae: bool = False,
        use_tuned: bool = False,
+        low_cpu_mem_usage: bool = False
    ):
        self.check_params(max_len, width, height)
        self.max_len = max_len
@@ -114,6 +115,7 @@ class SharkifyStableDiffusionModel:
        if use_tuned:
            self.model_name = self.model_name + "_tuned"
        self.model_name = self.model_name + "_" + get_path_stem(self.model_id)
+        self.low_cpu_mem_usage = low_cpu_mem_usage

    def get_extended_name_for_all_model(self):
        model_name = {}
@@ -139,11 +141,12 @@ class SharkifyStableDiffusionModel:

    def get_vae_encode(self):
        class VaeEncodeModel(torch.nn.Module):
-            def __init__(self, model_id=self.model_id):
+            def __init__(self, model_id=self.model_id, low_cpu_mem_usage=False):
                super().__init__()
                self.vae = AutoencoderKL.from_pretrained(
                    model_id,
                    subfolder="vae",
+                    low_cpu_mem_usage=low_cpu_mem_usage,
                )

            def forward(self, input):
@@ -165,23 +168,26 @@ class SharkifyStableDiffusionModel:

    def get_vae(self):
        class VaeModel(torch.nn.Module):
-            def __init__(self, model_id=self.model_id, base_vae=self.base_vae, custom_vae=self.custom_vae):
+            def __init__(self, model_id=self.model_id, base_vae=self.base_vae, custom_vae=self.custom_vae, low_cpu_mem_usage=False):
                super().__init__()
                self.vae = None
                if custom_vae == "":
                    self.vae = AutoencoderKL.from_pretrained(
                        model_id,
                        subfolder="vae",
+                        low_cpu_mem_usage=low_cpu_mem_usage,
                    )
                elif not isinstance(custom_vae, dict):
                    self.vae = AutoencoderKL.from_pretrained(
                        custom_vae,
                        subfolder="vae",
+                        low_cpu_mem_usage=low_cpu_mem_usage,
                    )
                else:
                    self.vae = AutoencoderKL.from_pretrained(
                        model_id,
                        subfolder="vae",
+                        low_cpu_mem_usage=low_cpu_mem_usage,
                    )
                    self.vae.load_state_dict(custom_vae)
                self.base_vae = base_vae
@@ -196,7 +202,7 @@ class SharkifyStableDiffusionModel:
                x = x * 255.0
                return x.round()

-        vae = VaeModel()
+        vae = VaeModel(low_cpu_mem_usage=self.low_cpu_mem_usage)
        inputs = tuple(self.inputs["vae"])
        is_f16 = True if self.precision == "fp16" else False
        shark_vae = compile_through_fx(
@@ -211,11 +217,12 @@ class SharkifyStableDiffusionModel:

    def get_unet(self):
        class UnetModel(torch.nn.Module):
-            def __init__(self, model_id=self.model_id):
+            def __init__(self, model_id=self.model_id, low_cpu_mem_usage=False):
                super().__init__()
                self.unet = UNet2DConditionModel.from_pretrained(
                    model_id,
                    subfolder="unet",
+                    low_cpu_mem_usage=low_cpu_mem_usage,
                )
                self.in_channels = self.unet.in_channels
                self.train(False)
@@ -234,7 +241,7 @@ class SharkifyStableDiffusionModel:
                )
                return noise_pred

-        unet = UnetModel()
+        unet = UnetModel(low_cpu_mem_usage=self.low_cpu_mem_usage)
        is_f16 = True if self.precision == "fp16" else False
        inputs = tuple(self.inputs["unet"])
        input_mask = [True, True, True, False]
@@ -251,17 +258,18 @@ class SharkifyStableDiffusionModel:

    def get_clip(self):
        class CLIPText(torch.nn.Module):
-            def __init__(self, model_id=self.model_id):
+            def __init__(self, model_id=self.model_id, low_cpu_mem_usage=False):
                super().__init__()
                self.text_encoder = CLIPTextModel.from_pretrained(
                    model_id,
                    subfolder="text_encoder",
+                    low_cpu_mem_usage=low_cpu_mem_usage,
                )

            def forward(self, input):
                return self.text_encoder(input)[0]

-        clip_model = CLIPText()
+        clip_model = CLIPText(low_cpu_mem_usage=self.low_cpu_mem_usage)
        shark_clip = compile_through_fx(
            clip_model,
            tuple(self.inputs["clip"]),
@@ -326,6 +334,8 @@ class SharkifyStableDiffusionModel:
            if args.hf_model_id == "":
                sys.exit("Base model configuration for the custom model is missing. Use `--clear_all` and re-run.")
            print("Loaded vmfbs from cache and successfully fetched base model configuration.")
+            if not need_vae_encode:
+                return vmfbs[:3]
            return vmfbs

        # Step 2:
--- a/apps/stable_diffusion/src/pipelines/pipeline_shark_stable_diffusion_utils.py
+++ b/apps/stable_diffusion/src/pipelines/pipeline_shark_stable_diffusion_utils.py
@@ -201,6 +201,7 @@ class StableDiffusionPipeline:
        width: int,
        use_base_vae: bool,
        use_tuned: bool,
+        low_cpu_mem_usage: bool = False,
    ):
        if import_mlir:
            mlir_import = SharkifyStableDiffusionModel(
@@ -214,6 +215,7 @@ class StableDiffusionPipeline:
                width=width,
                use_base_vae=use_base_vae,
                use_tuned=use_tuned,
+                low_cpu_mem_usage=low_cpu_mem_usage,
            )
            if cls.__name__ in ["Image2ImagePipeline", "InpaintPipeline"]:
                clip, unet, vae, vae_encode = mlir_import()
@@ -248,6 +250,7 @@ class StableDiffusionPipeline:
                width=width,
                use_base_vae=use_base_vae,
                use_tuned=use_tuned,
+                low_cpu_mem_usage=low_cpu_mem_usage,
            )
            if cls.__name__ in ["Image2ImagePipeline", "InpaintPipeline"]:
                clip, unet, vae, vae_encode = mlir_import()
--- a/apps/stable_diffusion/src/utils/sd_annotation.py
+++ b/apps/stable_diffusion/src/utils/sd_annotation.py
@@ -70,7 +70,7 @@ def load_winograd_configs():
    config_bucket = "gs://shark_tank/sd_tuned/configs/"
    config_name = f"{args.annotation_model}_winograd_{device}.json"
    full_gs_url = config_bucket + config_name
-    winograd_config_dir = f"{WORKDIR}configs/" + config_name
+    winograd_config_dir = os.path.join(WORKDIR, "configs", config_name)
    print("Loading Winograd config file from ", winograd_config_dir)
    download_public_file(full_gs_url, winograd_config_dir, True)
    return winograd_config_dir
@@ -113,7 +113,7 @@ def load_lower_configs():
            config_name = f"{args.annotation_model}_{version}_{args.precision}_{device}_{spec}.json"

    full_gs_url = config_bucket + config_name
-    lowering_config_dir = f"{WORKDIR}configs/" + config_name
+    lowering_config_dir = os.path.join(WORKDIR, "configs", config_name)
    print("Loading lowering config file from ", lowering_config_dir)
    download_public_file(full_gs_url, lowering_config_dir, True)
    return lowering_config_dir
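Switching to `os.path.join` matters when `WORKDIR` does not end with a path separator, and on Windows it also yields native separators. A small illustration of the difference; it uses `posixpath` so the result is identical on every OS, and the workdir and config file name are made-up values:

```python
import posixpath


def old_config_path(workdir: str, config_name: str) -> str:
    # Previous behaviour: plain string concatenation, which only works
    # when workdir already ends with a separator.
    return f"{workdir}configs/" + config_name


def new_config_path(workdir: str, config_name: str) -> str:
    # New behaviour: the path module inserts separators only where needed.
    return posixpath.join(workdir, "configs", config_name)


print(old_config_path("/home/user/shark_tank", "unet_winograd_vulkan.json"))
# /home/user/shark_tankconfigs/unet_winograd_vulkan.json
print(new_config_path("/home/user/shark_tank", "unet_winograd_vulkan.json"))
# /home/user/shark_tank/configs/unet_winograd_vulkan.json
```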
--- a/apps/stable_diffusion/src/utils/stable_args.py
+++ b/apps/stable_diffusion/src/utils/stable_args.py
@@ -193,6 +193,13 @@ p.add_argument(
    help="The repo-id of hugging face.",
 )

+p.add_argument(
+    "--low_cpu_mem_usage",
+    default=False,
+    action=argparse.BooleanOptionalAction,
+    help="Use the accelerate package to reduce CPU memory consumption when loading models (accelerate must be installed separately)",
+)
+
 ##############################################################################
 ### IREE - Vulkan supported flags
 ##############################################################################
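`argparse.BooleanOptionalAction` (Python 3.9+) registers both a positive and a `--no-` form of the flag, and neither form takes a value. A standalone reproduction of the new option showing how it parses; only the option definition comes from the diff:

```python
import argparse

parser = argparse.ArgumentParser()
# Same definition as the --low_cpu_mem_usage option added to stable_args.py.
parser.add_argument(
    "--low_cpu_mem_usage",
    default=False,
    action=argparse.BooleanOptionalAction,
    help="Use the accelerate package to reduce CPU memory consumption",
)

print(parser.parse_args([]).low_cpu_mem_usage)                          # False
print(parser.parse_args(["--low_cpu_mem_usage"]).low_cpu_mem_usage)     # True
print(parser.parse_args(["--no-low_cpu_mem_usage"]).low_cpu_mem_usage)  # False
```

Because the default is `False`, the `from_pretrained(..., low_cpu_mem_usage=...)` calls in `model_wrappers.py` only request the accelerate-based loading path when the flag is passed explicitly.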
--- a/build_tools/stable_diffusion_testing.py
+++ b/build_tools/stable_diffusion_testing.py
@@ -54,6 +54,8 @@ def test_loop(device="vulkan", beta=False, extra_flags=[]):
        extra_flags.append("--beta_models=True")
    for import_opt in import_options:
        for model_name in hf_model_names:
+            if model_name == "Linaqruf/anything-v3.0":
+                continue
            for use_tune in tuned_options:
                command = (
                    [
--- a/conftest.py
+++ b/conftest.py
@@ -60,3 +60,13 @@ def pytest_addoption(parser):
        default="gs://shark_tank/latest",
        help="URL to bucket from which to download SHARK tank artifacts. Default is gs://shark_tank/latest",
    )
+    parser.addoption(
+        "--benchmark_dispatches",
+        default=None,
+        help="Benchmark individual dispatch kernels produced by IREE compiler. Use 'All' for all, or specific dispatches e.g. '0 1 2 10'",
+    )
+    parser.addoption(
+        "--dispatch_benchmarks_dir",
+        default="./temp_dispatch_benchmarks",
+        help="Directory in which dispatch benchmarks are saved.",
+    )
--- a/generate_sharktank.py
+++ b/generate_sharktank.py
@@ -162,13 +162,13 @@ def save_tf_model(tf_model_list):
            tf_model_name = tf_model_name.replace("/", "_")
            tf_model_dir = os.path.join(WORKDIR, str(tf_model_name) + "_tf")
            os.makedirs(tf_model_dir, exist_ok=True)
-
            mlir_importer = SharkImporter(
                model,
-                input,
+                inputs=input,
                frontend="tf",
            )
            mlir_importer.import_debug(
+                is_dynamic=False,
                dir=tf_model_dir,
                model_name=tf_model_name,
            )
--- a/requirements-importer.txt
+++ b/requirements-importer.txt
@@ -1,7 +1,7 @@
 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
 --pre

-numpy==1.22.4
+numpy>1.22.4
 torchvision
 pytorch-triton
 tabulate
@@ -15,8 +15,8 @@ iree-tools-tf

 # TensorFlow and JAX.
 gin-config
-tensorflow==2.10.1
-keras==2.10
+tensorflow>=2.10.1
+keras>=2.10
 #tf-models-nightly
 #tensorflow-text-nightly
 transformers
--- a/setup_venv.ps1
+++ b/setup_venv.ps1
@@ -9,10 +9,10 @@
  If that environment does not exist, it creates it.
  
 .PARAMETER update-src
-  updates to latest version from git .\source
+  git pulls latest version

 .PARAMETER force
-  removes and recreates venv to force update all dependencies
+  removes and recreates venv to force update of all dependencies
  
 .EXAMPLE
  .\setup_venv.ps1 --force
@@ -26,12 +26,6 @@
 .OUTPUTS
  None

-.NOTES
-  Version        1.0
-  Author         powderluv, xzuyn
-  Creation Date  2023-02-17
-  PurposeChange Initial script development
-
 #>

 param([string]$arguments)
@@ -56,18 +50,6 @@ if ($arguments -eq "--force"){
    }
 }

-
-#Write-Host "Installing python"
-
-#Start-Process winget install Python.Python.3.10 '/quiet InstallAllUsers=1 PrependPath=1' -wait -NoNewWindow
-
-#Write-Host "python installation completed successfully"
-
-#Write-Host "Reload environment variables"
-#$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
-#Write-Host "Reloaded environment variables"
-
-
 # redirect stderr into stdout
 $p = &{python -V} 2>&1
 # check if an ErrorRecord was returned
@@ -78,19 +60,27 @@ $version = if($p -is [System.Management.Automation.ErrorRecord])
 }
 else
 {
-    # otherwise return as is
-    $p
+    # otherwise get the complete list of installed Python versions
+    $PyVer = py --list
 }

-Write-Host "Python version found is"
-Write-Host $p
-if ($p -notlike "*3.11*")
+# deactivate any activated venvs
+if ($PyVer -like "*venv*")
+{
+  deactivate # make sure we don't update the wrong venv
+  $PyVer = py --list # update list
+}
+
+Write-Host "Python versions found are"
+Write-Host ($PyVer | Out-String) # formatted output with line breaks
+if (!($PyVer -like "*3.11*")) # if 3.11 is not in list
 {
    Write-Host "Please install Python 3.11 and try again"
    break
 }

 Write-Host "Installing Build Dependencies"
+# make sure we really use 3.11 from list, even if it's not the default.
 py -3.11 -m venv .\shark.venv\
 .\shark.venv\Scripts\activate
 python -m pip install --upgrade pip
--- a/setup_venv.sh
+++ b/setup_venv.sh
@@ -129,7 +129,7 @@ if [[ $(uname -s) = 'Linux' && ! -z "${BENCHMARK}" ]]; then
  TV_VERSION=${TV_VER:9:18}
  $PYTHON -m pip uninstall -y torch torchvision
  $PYTHON -m pip install -U --pre --no-warn-conflicts triton
-  $PYTHON -m pip install --no-deps https://download.pytorch.org/whl/nightly/cu117/torch-${TORCH_VERSION}%2Bcu117-cp310-cp310-linux_x86_64.whl https://download.pytorch.org/whl/nightly/cu117/torchvision-${TV_VERSION}%2Bcu117-cp310-cp310-linux_x86_64.whl
+  $PYTHON -m pip install --no-deps https://download.pytorch.org/whl/nightly/cu117/torch-${TORCH_VERSION}%2Bcu117-cp311-cp311-linux_x86_64.whl https://download.pytorch.org/whl/nightly/cu117/torchvision-${TV_VERSION}%2Bcu117-cp311-cp311-linux_x86_64.whl
  if [ $? -eq 0 ];then
    echo "Successfully Installed torch + cu117."
  else
--- a/shark/iree_utils/benchmark_utils.py
+++ b/shark/iree_utils/benchmark_utils.py
@@ -139,9 +139,14 @@ def run_benchmark_module(benchmark_cl):
        benchmark_path
    ), "Cannot find benchmark_module, Please contact SHARK maintainer on discord."
    bench_result = run_cmd(" ".join(benchmark_cl))
-    print(bench_result)
-    regex_split = re.compile("(\d+[.]*\d*)(  *)([a-zA-Z]+)")
-    match = regex_split.search(bench_result)
-    time = float(match.group(1))
-    unit = match.group(3)
+    try:
+        regex_split = re.compile(r"(\d+[.]*\d*)(  *)([a-zA-Z]+)")
+        match = regex_split.search(bench_result)
+        time = float(match.group(1))
+        unit = match.group(3)
+    except AttributeError:
+        regex_split = re.compile(r"(\d+[.]*\d*)([a-zA-Z]+)")
+        match = regex_split.search(bench_result)
+        time = float(match.group(1))
+        unit = match.group(2)
    return 1.0 / (time * 0.001)
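The new fallback first looks for a number followed by spaces and a unit (`12.5 ms`) and, if that fails, for a number glued to its unit (`12.5ms`). A sketch of the same two-step parse without relying on `AttributeError`; `parse_latency` is a hypothetical helper, and the sample strings are illustrative rather than captured benchmark output:

```python
import re
from typing import Tuple

# Raw strings so "\d" is a regex escape rather than an invalid string escape.
_SPACED = re.compile(r"(\d+[.]*\d*)(  *)([a-zA-Z]+)")  # e.g. "12.5 ms"
_UNSPACED = re.compile(r"(\d+[.]*\d*)([a-zA-Z]+)")     # e.g. "12.5ms"


def parse_latency(bench_result: str) -> Tuple[float, str]:
    # Try the "number, spaces, unit" form first, as run_benchmark_module does.
    match = _SPACED.search(bench_result)
    if match is not None:
        return float(match.group(1)), match.group(3)
    # Fall back to a number immediately followed by its unit.
    match = _UNSPACED.search(bench_result)
    if match is None:
        raise ValueError(f"no latency found in: {bench_result!r}")
    return float(match.group(1)), match.group(2)


print(parse_latency("BM_forward/real_time 12.5 ms 12.4 ms 55"))  # (12.5, 'ms')
print(parse_latency("latency: 7ms"))                             # (7.0, 'ms')
```

Note that `run_benchmark_module` ignores the parsed `unit` and always converts with `time * 0.001`, i.e. it assumes the latency is reported in milliseconds.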
--- a/tank/test_models.py
+++ b/tank/test_models.py
@@ -137,6 +137,19 @@ class SharkModuleTester:
    def create_and_check_module(self, dynamic, device):
        shark_args.local_tank_cache = self.local_tank_cache
        shark_args.force_update_tank = self.update_tank
+        shark_args.dispatch_benchmarks = self.benchmark_dispatches
+        if self.benchmark_dispatches is not None:
+            _m = self.config["model_name"].split("/")
+            _m.extend([self.config["framework"], str(dynamic), device])
+            _m = "_".join(_m)
+            shark_args.dispatch_benchmarks_dir = os.path.join(
+                self.dispatch_benchmarks_dir,
+                _m,
+            )
+            if not os.path.exists(self.dispatch_benchmarks_dir):
+                os.mkdir(self.dispatch_benchmarks_dir)
+            if not os.path.exists(shark_args.dispatch_benchmarks_dir):
+                os.mkdir(shark_args.dispatch_benchmarks_dir)
        if "nhcw-nhwc" in self.config["flags"] and not os.path.isfile(
            ".use-iree"
        ):
@@ -278,6 +291,12 @@ class SharkModuleTest(unittest.TestCase):
            "update_tank"
        )
        self.module_tester.tank_url = self.pytestconfig.getoption("tank_url")
+        self.module_tester.benchmark_dispatches = self.pytestconfig.getoption(
+            "benchmark_dispatches"
+        )
+        self.module_tester.dispatch_benchmarks_dir = (
+            self.pytestconfig.getoption("dispatch_benchmarks_dir")
+        )

        if config["xfail_cpu"] == "True" and device == "cpu":
            pytest.xfail(reason=config["xfail_reason"])
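When `--benchmark_dispatches` is set, each test writes into a per-configuration subdirectory of `--dispatch_benchmarks_dir`, named from the model name, framework, dynamic flag, and device. A sketch of the same naming scheme as a standalone helper (the function name and the example model id are illustrative); it uses `os.makedirs(..., exist_ok=True)` in place of the two `os.path.exists`/`os.mkdir` checks:

```python
import os


def dispatch_benchmark_dir(
    root: str, model_name: str, framework: str, dynamic: bool, device: str
) -> str:
    # Same scheme as create_and_check_module: split the HF-style model name
    # on "/", then append framework, dynamic flag and device, joined by "_".
    parts = model_name.split("/")
    parts.extend([framework, str(dynamic), device])
    path = os.path.join(root, "_".join(parts))
    # exist_ok=True makes repeated runs safe and also creates `root` if missing.
    os.makedirs(path, exist_ok=True)
    return path
```

For example, `dispatch_benchmark_dir("./my_dispatch_benchmarks", "microsoft/MiniLM-L12-H384-uncased", "torch", False, "cuda")` would create `./my_dispatch_benchmarks/microsoft_MiniLM-L12-H384-uncased_torch_False_cuda`.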
Author	SHA1	Message	Date
cstueckrath	f01c526efd	Update setup_venv.ps1 (#1064)	2023-02-21 14:13:04 -05:00
Gaurav Shukla	16168ab6b3	[SD] Update need_vae_encode correctly Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-02-21 20:26:06 +05:30
Gaurav Shukla	4233218629	[SD] Reset args.img_path to None in txt2img to avoid vae_encode Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-02-21 18:46:15 +05:30
RaINi_	b63fb36dc0	Use path.join for the winograd config directory (#1065)	2023-02-20 22:04:25 -06:00
Daniel Garvey	4e92304b89	remove annoying accelerate warning (#1056) Disables usage of low_cpu_mem_usage=True in from_pretrained() calls. It can be re-enabled with the --low_cpu_mem_usage flag, which defaults to False to avoid warning spam, since accelerate is not included in our requirements.txt.	2023-02-20 14:46:26 -06:00
Ean Garvey	2ae047f1a8	Update importer/benchmark setup for python3.11 (#1043)	2023-02-20 11:29:00 -06:00
Ean Garvey	6d2a485264	Add --benchmark_dispatches option to pytest. (#800) * Add --benchmark_dispatches option to pytest. * Update README.md and fix filepath for dispatch benchmarks	2023-02-19 12:16:18 -06:00
Daniel Garvey	4f045db024	disable anythingv3 until issue is resolved (#1053)	2023-02-18 23:47:21 -05:00