Upstream bug in `transformers` breaks use of `AutoModelForMaskGeneration` class to load SAM models
Simple fix - directly load the model with `SamModel` class instead.
See upstream issue https://github.com/huggingface/transformers/issues/38228
Adds full support for managing model-to-model relationships in the UI and backend.
Introduces RelatedModels subpanel for linking and unlinking models in model management.
- Adds REST API routes for adding, removing, and retrieving model relationships.
- New database migration: creates model_relationships table for bidirectional links.
- New service layer (model_relationships) for relationship management.
- Updated frontend: Related models float to top of LoRA/Main grouped model comboboxes for quick access.
- Added 'Show Only Related' toggle badge to MainModelPicker filter bar
**Amended commit to remove changes to ParamMainModelSelect.tsx and MainModelPicker.tsx to avoid conflict with upstream deletion/ rewrite**
When we do our field type overrides to allow invocations to be instantiated without all required fields, we were not modifying the annotation of the field but did set the default value of the field to `None`.
This results in an error when doing a ser/de round trip. Here's what we end up doing:
```py
from pydantic import BaseModel, Field
class MyModel(BaseModel):
foo: str = Field(default=None)
```
And here is a simple round-trip, which should not error but which does:
```py
MyModel(**MyModel().model_dump())
# ValidationError: 1 validation error for MyModel
# foo
# Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
# For further information visit https://errors.pydantic.dev/2.11/v/string_type
```
To fix this, we now check every incoming field and update its annotation to match its default value. In other words, when we override the default field value to `None`, we make its type annotation `<original type> | None`.
This prevents the error during deserialization.
This slightly alters the schema for all invocations and outputs - the values of all fields without default values are now typed as `<original type> | None`, reflecting the overrides.
This means the autogenerated types for fields have also changed for fields without defaults:
```ts
// Old
image?: components["schemas"]["ImageField"];
// New
image?: components["schemas"]["ImageField"] | null;
```
This does not break anything on the frontend.
There is a subtle change in behaviour with the new model probe API.
Previously, checks for model types was done in a specific order. For example, we did all main model checks before LoRA checks.
With the new API, the order of checks has changed. Check ordering is as follows:
- New API checks are run first, then legacy API checks.
- New API checks categorized by their speed. When we run new API checks, we sort them from fastest to slowest, and run them in that order. This is a performance optimization.
Currently, LoRA and LLaVA models are the only model types with the new API. Checks for them are thus run first.
LoRA checks involve checking the state dict for presence of keys with specific prefixes. We expect these keys to only exist in LoRAs.
It turns out that main models may have some of these keys.
For example, this model has keys that match the LoRA prefix `lora_te_`: https://civitai.com/models/134442/helloyoung25d
Under the old probe, we'd do the main model checks first and correctly identify this as a main model. But with the new setup, we do the LoRA check first, and those pass. So we import this model as a LoRA.
Thankfully, the old probe still exists. For now, the new probe is fully disabled. It was only called in one spot.
I've also added the example affected model as a test case for the model probe. Right now, this causes the test to fail, and I've marked the test as xfail. CI will pass.
Once we enable the new API again, the xfail will pass, and CI will fail, and we'll be reminded to update the test.
In the previous commit, the LLaVA model was updated to support partial loading.
In this commit, the SigLIP model is updated in the same way.
This model is used for FLUX Redux. It's <4GB and only ever run in isolation, so it won't benefit from partial loading for the vast majority of users. Regardless, I think it is best if we make _all_ models work with partial loading.
PS: I also fixed the initial load dtype issue, described in the prev commit. It's probably a non-issue for this model, but we may as well fix it.
The model manager has two types of model cache entries:
- `CachedModelOnlyFullLoad`: The model may only ever be loaded and unloaded as a single object.
- `CachedModelWithPartialLoad`: The model may be partially loaded and unloaded.
Partial loaded is enabled by overwriting certain torch layer classes, adding the ability to autocast the layer to a device on-the-fly. See `CustomLinear` for an example.
So, to take advantage of partial loading and be cached as a `CachedModelWithPartialLoad`, the model must inherit from `torch.nn.Module`.
The LLaVA classes provided by `transformers` do inherit from `torch.nn.Module`, but we wrap those classes in a separate class called `LlavaOnevisionModel`. The wrapper encapsulate both the LLaVA model and its "processor" - a lightweight class that prepares model inputs like text and images.
While it is more elegant to encapsulate both model and processor classes in a single entity, this prevents the model cache from enabling partial loading for the chunky vLLM model.
Fixing this involved a few changes.
- Update the `LlavaOnevisionModelLoader` class to operate on the vLLM model directly, instead the `LlavaOnevisionModel` wrapper class.
- Instantiate the processor directly in the node. The processor is lightweight and does its business on the CPU. We don't need to worry about caching in the model manager.
- Remove caching support code from the `LlavaOnevisionModel` wrapper class. It's not needed, because we do not cache this class. The class now only handles running the models provided to it.
- Rename `LlavaOnevisionModel` to `LlavaOnevisionPipeline` to better represent its purpose.
These changes have a bonus effect of fixing an OOM crash when initially loading the models. This was most apparent when loading LLaVA 7B, which is pretty chunky.
The initial load is onto CPU RAM. In the old version of the loaders, we ignored the loader's target dtype for the initial load. Instead, we loaded the model at `transformers`'s "default" dtype of fp32.
LLaVA 7B is fp16 and weighs ~17GB. Loading as fp32 means we need double that amount (~34GB) of CPU RAM. Many users only have 32GB RAM, so this causes a _CPU_ OOM - which is a hard crash of the whole process.
With the updated loaders, the initial load logic now uses the target dtype for the initial load. LLaVA now needs the expected ~17GB RAM for its initial load.
PS: If we didn't make the accompanying partial loading changes, we still could have solved this OOM. We'd just need to pass the initial load dtype to the wrapper class and have it load on that dtype. But we may as well fix both issues.
PPS: There are other models whose model classes are wrappers around a torch module class, and thus cannot be partially loaded. However, these models are typically fairly small and/or are run only on their own, so they don't benefit as much from partial loading. It's the really big models (like LLaVA 7B) that benefit most from the partial loading.