chore: bump version to v5.10.1

fix(mm): disable new model probe API

There is a subtle change in behaviour with the new model probe API.

Previously, checks for model types were done in a specific order. For example, we did all main model checks before LoRA checks.

With the new API, the order of checks has changed. Check ordering is as follows:

- New API checks are run first, then legacy API checks.
- New API checks are categorized by their speed. When we run new API checks, we sort them from fastest to slowest, and run them in that order. This is a performance optimization.

Currently, LoRA and LLaVA models are the only model types with the new API. Checks for them are thus run first.

LoRA checks involve checking the state dict for the presence of keys with specific prefixes. We expect these keys to only exist in LoRAs.

It turns out that main models may have some of these keys. For example, this model has keys that match the LoRA prefix `lora_te_`: https://civitai.com/models/134442/helloyoung25d

Under the old probe, we'd do the main model checks first and correctly identify this as a main model. But with the new setup, we do the LoRA checks first, and those pass. So we import this model as a LoRA.

Thankfully, the old probe still exists. For now, the new probe is fully disabled. It was only called in one spot.

I've also added the example affected model as a test case for the model probe. Right now, this causes the test to fail, and I've marked the test as xfail. CI will pass.

Once we enable the new API again, the xfail test will pass unexpectedly, CI will fail, and we'll be reminded to update the test.
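
The misclassification described in this commit message can be reproduced in a few lines. The sketch below is a simplified illustration, not InvokeAI's probe code: the checker functions, the `model.diffusion_model.` prefix used to spot a main model, and the sample state dict are all assumptions made up for the example. Only the `lora_te_` prefix comes from the commit message.

```python
from typing import Callable, Optional

StateDict = dict[str, object]
Check = tuple[str, Callable[[StateDict], bool]]


# Hypothetical checks. Only the `lora_te_` prefix is taken from the commit message.
def looks_like_lora(sd: StateDict) -> bool:
    return any(key.startswith("lora_te_") for key in sd)


def looks_like_main(sd: StateDict) -> bool:
    return any(key.startswith("model.diffusion_model.") for key in sd)


def classify(sd: StateDict, checks: list[Check]) -> Optional[str]:
    """Return the label of the first matching check, like a first-match probe."""
    for label, check in checks:
        if check(sd):
            return label
    return None


# A main model that also carries a stray LoRA-prefixed key (made-up key names).
state_dict: StateDict = {
    "model.diffusion_model.input_blocks.0.0.weight": None,
    "lora_te_text_model_encoder_layers_0_mlp_fc1.alpha": None,
}

old_order: list[Check] = [("main", looks_like_main), ("lora", looks_like_lora)]
new_order: list[Check] = [("lora", looks_like_lora), ("main", looks_like_main)]

print(classify(state_dict, old_order))  # main
print(classify(state_dict, new_order))  # lora
```

Both checks are individually true for this state dict, so the result depends only on which one runs first. That is why reordering checks for speed changed the outcome.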
2026-01-22 21:08:08 -05:00 · 2025-04-19 00:05:02 +10:00 · 2025-04-18 22:44:10 +10:00 · 2025-04-18 10:12:03 +10:00 · 2025-04-18 10:12:03 +10:00 · 2025-04-18 10:12:03 +10:00
13 changed files with 77 additions and 73 deletions
--- a/invokeai/app/invocations/flux_redux.py
+++ b/invokeai/app/invocations/flux_redux.py
@@ -3,6 +3,7 @@ from typing import Literal, Optional

 import torch
 from PIL import Image
+from transformers import SiglipImageProcessor, SiglipVisionModel

 from invokeai.app.invocations.baseinvocation import (
    BaseInvocation,
@@ -115,8 +116,14 @@ class FluxReduxInvocation(BaseInvocation):
    @torch.no_grad()
    def _siglip_encode(self, context: InvocationContext, image: Image.Image) -> torch.Tensor:
        siglip_model_config = self._get_siglip_model(context)
-        with context.models.load(siglip_model_config.key).model_on_device() as (_, siglip_pipeline):
-            assert isinstance(siglip_pipeline, SigLipPipeline)
+        with context.models.load(siglip_model_config.key).model_on_device() as (_, model):
+            assert isinstance(model, SiglipVisionModel)
+
+            model_abs_path = context.models.get_absolute_path(siglip_model_config)
+            processor = SiglipImageProcessor.from_pretrained(model_abs_path, local_files_only=True)
+            assert isinstance(processor, SiglipImageProcessor)
+
+            siglip_pipeline = SigLipPipeline(processor, model)
            return siglip_pipeline.encode_image(
                x=image, device=TorchDevice.choose_torch_device(), dtype=TorchDevice.choose_torch_dtype()
            )
--- a/invokeai/app/invocations/llava_onevision_vllm.py
+++ b/invokeai/app/invocations/llava_onevision_vllm.py
@@ -3,13 +3,14 @@ from typing import Any
 import torch
 from PIL.Image import Image
 from pydantic import field_validator
+from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration, LlavaOnevisionProcessor

 from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
 from invokeai.app.invocations.fields import FieldDescriptions, ImageField, InputField, UIComponent, UIType
 from invokeai.app.invocations.model import ModelIdentifierField
 from invokeai.app.invocations.primitives import StringOutput
 from invokeai.app.services.shared.invocation_context import InvocationContext
-from invokeai.backend.llava_onevision_model import LlavaOnevisionModel
+from invokeai.backend.llava_onevision_pipeline import LlavaOnevisionPipeline
 from invokeai.backend.util.devices import TorchDevice


@@ -54,10 +55,17 @@ class LlavaOnevisionVllmInvocation(BaseInvocation):
    @torch.no_grad()
    def invoke(self, context: InvocationContext) -> StringOutput:
        images = self._get_images(context)
+        model_config = context.models.get_config(self.vllm_model)

-        with context.models.load(self.vllm_model) as vllm_model:
-            assert isinstance(vllm_model, LlavaOnevisionModel)
-            output = vllm_model.run(
+        with context.models.load(self.vllm_model).model_on_device() as (_, model):
+            assert isinstance(model, LlavaOnevisionForConditionalGeneration)
+
+            model_abs_path = context.models.get_absolute_path(model_config)
+            processor = AutoProcessor.from_pretrained(model_abs_path, local_files_only=True)
+            assert isinstance(processor, LlavaOnevisionProcessor)
+
+            model = LlavaOnevisionPipeline(model, processor)
+            output = model.run(
                prompt=self.prompt,
                images=images,
                device=TorchDevice.choose_torch_device(),
--- a/invokeai/app/services/model_install/model_install_default.py
+++ b/invokeai/app/services/model_install/model_install_default.py
@@ -38,7 +38,6 @@ from invokeai.backend.model_manager.config import (
    AnyModelConfig,
    CheckpointConfigBase,
    InvalidModelConfigException,
-    ModelConfigBase,
 )
 from invokeai.backend.model_manager.legacy_probe import ModelProbe
 from invokeai.backend.model_manager.metadata import (
@@ -647,10 +646,14 @@ class ModelInstallService(ModelInstallServiceBase):
        hash_algo = self._app_config.hashing_algorithm
        fields = config.model_dump()

-        try:
-            return ModelConfigBase.classify(model_path=model_path, hash_algo=hash_algo, **fields)
-        except InvalidModelConfigException:
-            return ModelProbe.probe(model_path=model_path, fields=fields, hash_algo=hash_algo)  # type: ignore
+        return ModelProbe.probe(model_path=model_path, fields=fields, hash_algo=hash_algo)
+
+        # New model probe API is disabled pending resolution of issue caused by a change of the ordering of checks.
+        # See commit message for details.
+        # try:
+        #     return ModelConfigBase.classify(model_path=model_path, hash_algo=hash_algo, **fields)
+        # except InvalidModelConfigException:
+        #     return ModelProbe.probe(model_path=model_path, fields=fields, hash_algo=hash_algo)  # type: ignore

    def _register(
        self, model_path: Path, config: Optional[ModelRecordChanges] = None, info: Optional[AnyModelConfig] = None
--- a/invokeai/app/services/shared/invocation_context.py
+++ b/invokeai/app/services/shared/invocation_context.py
@@ -21,6 +21,7 @@ from invokeai.app.services.shared.sqlite.sqlite_common import SQLiteDirection
 from invokeai.app.util.step_callback import diffusion_step_callback
 from invokeai.backend.model_manager.config import (
    AnyModelConfig,
+    ModelConfigBase,
 )
 from invokeai.backend.model_manager.load.load_base import LoadedModel, LoadedModelWithoutConfig
 from invokeai.backend.model_manager.taxonomy import AnyModel, BaseModelType, ModelFormat, ModelType, SubModelType
@@ -543,6 +544,30 @@ class ModelsInterface(InvocationContextInterface):
        self._util.signal_progress(f"Loading model {source}")
        return self._services.model_manager.load.load_model_from_path(model_path=model_path, loader=loader)

+    def get_absolute_path(self, config_or_path: AnyModelConfig | Path | str) -> Path:
+        """Gets the absolute path for a given model config or path.
+
+        For example, if the model's path is `flux/main/FLUX Dev.safetensors`, and the models path is
+        `/home/username/InvokeAI/models`, this method will return
+        `/home/username/InvokeAI/models/flux/main/FLUX Dev.safetensors`.
+
+        Args:
+            config_or_path: The model config or path.
+
+        Returns:
+            The absolute path to the model.
+        """
+
+        model_path = Path(config_or_path.path) if isinstance(config_or_path, ModelConfigBase) else Path(config_or_path)
+
+        if model_path.is_absolute():
+            return model_path.resolve()
+
+        base_models_path = self._services.configuration.models_path
+        joined_path = base_models_path / model_path
+        resolved_path = joined_path.resolve()
+        return resolved_path
+

 class ConfigInterface(InvocationContextInterface):
    def get(self) -> InvokeAIAppConfig:
--- a/invokeai/backend/llava_onevision_pipeline.py
+++ b/invokeai/backend/llava_onevision_pipeline.py
@@ -1,26 +1,15 @@
-from pathlib import Path
-from typing import Optional
-
 import torch
 from PIL.Image import Image
-from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration, LlavaOnevisionProcessor
-
-from invokeai.backend.raw_model import RawModel
+from transformers import LlavaOnevisionForConditionalGeneration, LlavaOnevisionProcessor


-class LlavaOnevisionModel(RawModel):
+class LlavaOnevisionPipeline:
+    """A wrapper for a LLaVA Onevision model + processor."""
+
    def __init__(self, vllm_model: LlavaOnevisionForConditionalGeneration, processor: LlavaOnevisionProcessor):
        self._vllm_model = vllm_model
        self._processor = processor

-    @classmethod
-    def load_from_path(cls, path: str | Path):
-        vllm_model = LlavaOnevisionForConditionalGeneration.from_pretrained(path, local_files_only=True)
-        assert isinstance(vllm_model, LlavaOnevisionForConditionalGeneration)
-        processor = AutoProcessor.from_pretrained(path, local_files_only=True)
-        assert isinstance(processor, LlavaOnevisionProcessor)
-        return cls(vllm_model, processor)
-
    def run(self, prompt: str, images: list[Image], device: torch.device, dtype: torch.dtype) -> str:
        # TODO(ryand): Tune the max number of images that are useful for the model.
        if len(images) > 3:
@@ -44,13 +33,3 @@ class LlavaOnevisionModel(RawModel):
        # The output_str will include the prompt, so we extract the response.
        response = output_str.split("assistant\n", 1)[1].strip()
        return response
-
-    def to(self, device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = None) -> None:
-        self._vllm_model.to(device=device, dtype=dtype)
-
-    def calc_size(self) -> int:
-        """Get size of the model in memory in bytes."""
-        # HACK(ryand): Fix this issue with circular imports.
-        from invokeai.backend.model_manager.load.model_util import calc_module_size
-
-        return calc_module_size(self._vllm_model)
--- a/invokeai/backend/model_manager/load/model_loaders/llava_onevision.py
+++ b/invokeai/backend/model_manager/load/model_loaders/llava_onevision.py
@@ -1,7 +1,8 @@
 from pathlib import Path
 from typing import Optional

-from invokeai.backend.llava_onevision_model import LlavaOnevisionModel
+from transformers import LlavaOnevisionForConditionalGeneration
+
 from invokeai.backend.model_manager.config import (
    AnyModelConfig,
 )
@@ -23,6 +24,8 @@ class LlavaOnevisionModelLoader(ModelLoader):
            raise ValueError("Unexpected submodel requested for LLaVA OneVision model.")

        model_path = Path(config.path)
-        model = LlavaOnevisionModel.load_from_path(model_path)
-        model.to(dtype=self._torch_dtype)
+        model = LlavaOnevisionForConditionalGeneration.from_pretrained(
+            model_path, local_files_only=True, torch_dtype=self._torch_dtype
+        )
+        assert isinstance(model, LlavaOnevisionForConditionalGeneration)
        return model
--- a/invokeai/backend/model_manager/load/model_loaders/sig_lip_pipeline.py
+++ b/invokeai/backend/model_manager/load/model_loaders/sig_lip_pipeline.py
@@ -1,13 +1,14 @@
 from pathlib import Path
 from typing import Optional

+from transformers import SiglipVisionModel
+
 from invokeai.backend.model_manager.config import (
    AnyModelConfig,
 )
 from invokeai.backend.model_manager.load.load_default import ModelLoader
 from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
 from invokeai.backend.model_manager.taxonomy import AnyModel, BaseModelType, ModelFormat, ModelType, SubModelType
-from invokeai.backend.sig_lip.sig_lip_pipeline import SigLipPipeline


@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.SigLIP, format=ModelFormat.Diffusers)
@@ -23,6 +24,5 @@ class SigLIPModelLoader(ModelLoader):
            raise ValueError("Unexpected submodel requested for LLaVA OneVision model.")

        model_path = Path(config.path)
-        model = SigLipPipeline.load_from_path(model_path)
-        model.to(dtype=self._torch_dtype)
+        model = SiglipVisionModel.from_pretrained(model_path, local_files_only=True, torch_dtype=self._torch_dtype)
        return model
--- a/invokeai/backend/model_manager/load/model_util.py
+++ b/invokeai/backend/model_manager/load/model_util.py
@@ -16,11 +16,9 @@ from invokeai.backend.image_util.depth_anything.depth_anything_pipeline import D
 from invokeai.backend.image_util.grounding_dino.grounding_dino_pipeline import GroundingDinoPipeline
 from invokeai.backend.image_util.segment_anything.segment_anything_pipeline import SegmentAnythingPipeline
 from invokeai.backend.ip_adapter.ip_adapter import IPAdapter
-from invokeai.backend.llava_onevision_model import LlavaOnevisionModel
 from invokeai.backend.model_manager.taxonomy import AnyModel
 from invokeai.backend.onnx.onnx_runtime import IAIOnnxRuntimeModel
 from invokeai.backend.patches.model_patch_raw import ModelPatchRaw
-from invokeai.backend.sig_lip.sig_lip_pipeline import SigLipPipeline
 from invokeai.backend.spandrel_image_to_image_model import SpandrelImageToImageModel
 from invokeai.backend.textual_inversion import TextualInversionModelRaw
 from invokeai.backend.util.calc_tensor_size import calc_tensor_size
@@ -51,8 +49,6 @@ def calc_model_size_by_data(logger: logging.Logger, model: AnyModel) -> int:
            GroundingDinoPipeline,
            SegmentAnythingPipeline,
            DepthAnythingPipeline,
-            SigLipPipeline,
-            LlavaOnevisionModel,
        ),
    ):
        return model.calc_size()
--- a/invokeai/backend/sig_lip/sig_lip_pipeline.py
+++ b/invokeai/backend/sig_lip/sig_lip_pipeline.py
@@ -1,14 +1,9 @@
-from pathlib import Path
-from typing import Optional
-
 import torch
 from PIL import Image
 from transformers import SiglipImageProcessor, SiglipVisionModel

-from invokeai.backend.raw_model import RawModel

-
-class SigLipPipeline(RawModel):
+class SigLipPipeline:
    """A wrapper for a SigLIP model + processor."""

    def __init__(
@@ -19,25 +14,7 @@ class SigLipPipeline(RawModel):
        self._siglip_processor = siglip_processor
        self._siglip_model = siglip_model

-    @classmethod
-    def load_from_path(cls, path: str | Path):
-        siglip_model = SiglipVisionModel.from_pretrained(path, local_files_only=True)
-        assert isinstance(siglip_model, SiglipVisionModel)
-        siglip_processor = SiglipImageProcessor.from_pretrained(path, local_files_only=True)
-        assert isinstance(siglip_processor, SiglipImageProcessor)
-        return cls(siglip_processor, siglip_model)
-
-    def to(self, device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = None) -> None:
-        self._siglip_model.to(device=device, dtype=dtype)
-
    def encode_image(self, x: Image.Image, device: torch.device, dtype: torch.dtype) -> torch.Tensor:
        imgs = self._siglip_processor.preprocess(images=[x], do_resize=True, return_tensors="pt", do_convert_rgb=True)
        encoded_x = self._siglip_model(**imgs.to(device=device, dtype=dtype)).last_hidden_state
        return encoded_x
-
-    def calc_size(self) -> int:
-        """Get size of the model in memory in bytes."""
-        # HACK(ryand): Fix this issue with circular imports.
-        from invokeai.backend.model_manager.load.model_util import calc_module_size
-
-        return calc_module_size(self._siglip_model)
--- a/invokeai/frontend/web/src/features/gallery/components/ImageMetadataViewer/DataViewer.tsx
+++ b/invokeai/frontend/web/src/features/gallery/components/ImageMetadataViewer/DataViewer.tsx
@@ -1,5 +1,5 @@
 import type { FlexProps } from '@invoke-ai/ui-library';
-import { Box, Flex, IconButton, Tooltip, useShiftModifier } from '@invoke-ai/ui-library';
+import { Box, chakra, Flex, IconButton, Tooltip, useShiftModifier } from '@invoke-ai/ui-library';
 import { getOverlayScrollbarsParams } from 'common/components/OverlayScrollbars/constants';
 import { useClipboard } from 'common/hooks/useClipboard';
 import { Formatter } from 'fracturedjsonjs';
@@ -26,6 +26,8 @@ const overlayscrollbarsOptions = getOverlayScrollbarsParams({
  overflowY: 'scroll',
 }).options;

+const ChakraPre = chakra('pre');
+
 const DataViewer = (props: Props) => {
  const { label, data, fileName, withDownload = true, withCopy = true, extraCopyActions, ...rest } = props;
  const dataString = useMemo(() => (isString(data) ? data : formatter.Serialize(data)) ?? '', [data]);
@@ -51,7 +53,7 @@ const DataViewer = (props: Props) => {
    <Flex bg="base.800" borderRadius="base" flexGrow={1} w="full" h="full" position="relative" {...rest}>
      <Box position="absolute" top={0} left={0} right={0} bottom={0} overflow="auto" p={2} fontSize="sm">
        <OverlayScrollbarsComponent defer style={overlayScrollbarsStyles} options={overlayscrollbarsOptions}>
-          <pre>{dataString}</pre>
+          <ChakraPre whiteSpace="pre-wrap">{dataString}</ChakraPre>
        </OverlayScrollbarsComponent>
      </Box>
      <Flex position="absolute" top={0} insetInlineEnd={0} p={2}>
--- a/invokeai/version/invokeai_version.py
+++ b/invokeai/version/invokeai_version.py
@@ -1 +1 @@
-__version__ = "5.10.0"
+__version__ = "5.10.1"
--- a/tests/test_model_probe.py
+++ b/tests/test_model_probe.py
@@ -137,6 +137,7 @@ def test_minimal_working_example(datadir: Path):
    assert config.fun_quote == "Minimal working example of a ModelConfigBase subclass"


+@pytest.mark.xfail(reason="Known issue with 'helloyoung25d_V15j.safetensors'.", strict=True)
 def test_regression_against_model_probe(datadir: Path, override_model_loading):
    """Verifies results from ModelConfigBase.classify are consistent with those from ModelProbe.probe.
    The test paths are gathered from the 'test_model_probe' directory.
--- a/tests/test_model_probe/stripped_models/helloyoung25d_V15j.safetensors
+++ b/tests/test_model_probe/stripped_models/helloyoung25d_V15j.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0f0547f89bdcbb0dfd8b6ff1d8de63336df20107e9a27afc0934e8d3cce584d7
+size 308563
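
The path resolution added in `ModelsInterface.get_absolute_path` can be tried out in isolation. The sketch below mirrors the logic in the diff as a free function. The name `resolve_model_path` and the explicit `models_root` argument are assumptions for illustration, since the real method reads the models directory from the app configuration and also accepts a model config.

```python
from pathlib import Path
from typing import Union


def resolve_model_path(models_root: Path, model_path: Union[Path, str]) -> Path:
    """Return an absolute path for a model, treating relative paths as relative to models_root."""
    model_path = Path(model_path)

    # Absolute paths are only normalized, never re-rooted.
    if model_path.is_absolute():
        return model_path.resolve()

    # Relative paths are joined onto the models directory, then normalized.
    return (models_root / model_path).resolve()


root = Path("/home/username/InvokeAI/models")
print(resolve_model_path(root, "flux/main/FLUX Dev.safetensors"))
```

`Path.resolve()` also follows symlinks and collapses `..` segments, so the returned path may differ textually from a plain join.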
Author	SHA1	Message	Date
psychedelicious	298444f2bc	chore: bump version to v5.10.1	2025-04-19 00:05:02 +10:00
psychedelicious	deb1984289	fix(mm): disable new model probe API There is a subtle change in behaviour with the new model probe API. Previously, checks for model types were done in a specific order. For example, we did all main model checks before LoRA checks. With the new API, the order of checks has changed. Check ordering is as follows: - New API checks are run first, then legacy API checks. - New API checks are categorized by their speed. When we run new API checks, we sort them from fastest to slowest, and run them in that order. This is a performance optimization. Currently, LoRA and LLaVA models are the only model types with the new API. Checks for them are thus run first. LoRA checks involve checking the state dict for the presence of keys with specific prefixes. We expect these keys to only exist in LoRAs. It turns out that main models may have some of these keys. For example, this model has keys that match the LoRA prefix `lora_te_`: https://civitai.com/models/134442/helloyoung25d Under the old probe, we'd do the main model checks first and correctly identify this as a main model. But with the new setup, we do the LoRA checks first, and those pass. So we import this model as a LoRA. Thankfully, the old probe still exists. For now, the new probe is fully disabled. It was only called in one spot. I've also added the example affected model as a test case for the model probe. Right now, this causes the test to fail, and I've marked the test as xfail. CI will pass. Once we enable the new API again, the xfail test will pass unexpectedly, CI will fail, and we'll be reminded to update the test.	2025-04-18 22:44:10 +10:00
psychedelicious	814406d98a	feat(mm): siglip model loading supports partial loading In the previous commit, the LLaVA model was updated to support partial loading. In this commit, the SigLIP model is updated in the same way. This model is used for FLUX Redux. It's <4GB and only ever run in isolation, so it won't benefit from partial loading for the vast majority of users. Regardless, I think it is best if we make _all_ models work with partial loading. PS: I also fixed the initial load dtype issue, described in the prev commit. It's probably a non-issue for this model, but we may as well fix it.	2025-04-18 10:12:03 +10:00
psychedelicious	c054501103	feat(mm): llava model loading supports partial loading; fix OOM crash on initial load The model manager has two types of model cache entries: - `CachedModelOnlyFullLoad`: The model may only ever be loaded and unloaded as a single object. - `CachedModelWithPartialLoad`: The model may be partially loaded and unloaded. Partial loading is enabled by overwriting certain torch layer classes, adding the ability to autocast the layer to a device on-the-fly. See `CustomLinear` for an example. So, to take advantage of partial loading and be cached as a `CachedModelWithPartialLoad`, the model must inherit from `torch.nn.Module`. The LLaVA classes provided by `transformers` do inherit from `torch.nn.Module`, but we wrap those classes in a separate class called `LlavaOnevisionModel`. The wrapper encapsulates both the LLaVA model and its "processor" - a lightweight class that prepares model inputs like text and images. While it is more elegant to encapsulate both model and processor classes in a single entity, this prevents the model cache from enabling partial loading for the chunky vLLM model. Fixing this involved a few changes. - Update the `LlavaOnevisionModelLoader` class to operate on the vLLM model directly, instead of the `LlavaOnevisionModel` wrapper class. - Instantiate the processor directly in the node. The processor is lightweight and does its business on the CPU. We don't need to worry about caching in the model manager. - Remove caching support code from the `LlavaOnevisionModel` wrapper class. It's not needed, because we do not cache this class. The class now only handles running the models provided to it. - Rename `LlavaOnevisionModel` to `LlavaOnevisionPipeline` to better represent its purpose. These changes have a bonus effect of fixing an OOM crash when initially loading the models. This was most apparent when loading LLaVA 7B, which is pretty chunky. The initial load is onto CPU RAM.
In the old version of the loaders, we ignored the loader's target dtype for the initial load. Instead, we loaded the model at `transformers`'s "default" dtype of fp32. LLaVA 7B is fp16 and weighs ~17GB. Loading as fp32 means we need double that amount (~34GB) of CPU RAM. Many users only have 32GB RAM, so this causes a _CPU_ OOM - which is a hard crash of the whole process. With the updated loaders, the initial load now uses the target dtype. LLaVA now needs the expected ~17GB RAM for its initial load. PS: If we didn't make the accompanying partial loading changes, we still could have solved this OOM. We'd just need to pass the initial load dtype to the wrapper class and have it load in that dtype. But we may as well fix both issues. PPS: There are other models whose model classes are wrappers around a torch module class, and thus cannot be partially loaded. However, these models are typically fairly small and/or are run only on their own, so they don't benefit as much from partial loading. It's the really big models (like LLaVA 7B) that benefit most from partial loading.	2025-04-18 10:12:03 +10:00
psychedelicious	c1d819c7e5	feat(nodes): add get_absolute_path method to context.models API Given a model config or path (presumably to a model), returns the absolute path to the model. Check the next few commits for use-case.	2025-04-18 10:12:03 +10:00
psychedelicious	2a8e91f94d	feat(ui): wrap JSON in dataviewer	2025-04-17 22:55:04 +10:00
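
The RAM figures quoted in the LLaVA commit message (c054501103) follow from the bytes used per weight at each dtype. The sketch below is back-of-the-envelope arithmetic only: the helper name `size_when_loaded_as` and the lookup table are assumptions for illustration, and the ~17GB starting point is the approximate figure from the commit message, not a measurement.

```python
# Bytes per element for common floating-point dtypes.
BYTES_PER_ELEMENT = {"fp16": 2, "bf16": 2, "fp32": 4}


def size_when_loaded_as(size_gb: float, stored_dtype: str, load_dtype: str) -> float:
    """Scale a weight footprint from its stored dtype to the dtype it is loaded as."""
    return size_gb * BYTES_PER_ELEMENT[load_dtype] / BYTES_PER_ELEMENT[stored_dtype]


llava_7b_fp16_gb = 17.0  # approximate figure quoted in the commit message

# Old loaders: initial load at fp32, regardless of the target dtype.
print(size_when_loaded_as(llava_7b_fp16_gb, "fp16", "fp32"))  # 34.0

# Updated loaders: initial load at the target dtype.
print(size_when_loaded_as(llava_7b_fp16_gb, "fp16", "fp16"))  # 17.0
```

On a machine with 32GB of RAM, the first figure does not fit and the second does, which matches the crash described in the commit message.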