35 Commits

Author SHA1 Message Date
Lincoln Stein
47a634d8fb fix(naming style) change name of model_cache_keep_alive to model_cache_keep_alive_min 2026-01-04 17:36:55 -05:00
copilot-swe-agent[bot]
8d76b4e4d4 Fix ruff whitespace errors and improve timeout logging
- Remove all trailing whitespace (W293 errors)
- Add debug logging when timeout fires but activity detected
- Add debug logging when timeout fires but cache is empty
- Only log "Clearing model cache" message when actually clearing
- Prevents misleading timeout messages during active generation

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 04:05:57 +00:00
copilot-swe-agent[bot]
b16717bbf8 Explicitly pass all ModelCache constructor parameters
- Add explicit storage_device parameter (cpu)
- Add explicit log_memory_usage parameter from config
- Improves code clarity and configuration transparency

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:30:51 +00:00
copilot-swe-agent[bot]
9bbd2b3f11 Add model_cache_keep_alive config option and timeout mechanism
- Added model_cache_keep_alive config field (minutes, default 0 = infinite)
- Implemented timeout tracking in ModelCache class
- Added _record_activity() to track model usage
- Added _on_timeout() to auto-clear cache when timeout expires
- Added shutdown() method to clean up timers
- Integrated timeout with get(), lock(), unlock(), and put() operations
- Updated ModelManagerService to pass keep_alive parameter
- Added cleanup in stop() method
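
For readers who want a picture of the mechanism, here is a minimal sketch of an inactivity timeout built on `threading.Timer`. It is not the actual `ModelCache` code: the class name `KeepAliveCache`, the `_arm()` helper, and the `keep_alive_min` parameter are assumptions made for illustration. Only the `_record_activity()`, `_on_timeout()`, and `shutdown()` names and the "0 = infinite" rule come from the list above.

```python
import threading
import time
from typing import Optional


class KeepAliveCache:
    """Toy cache that clears itself after a period of inactivity (illustrative only)."""

    def __init__(self, keep_alive_min: float = 0.0) -> None:
        self._keep_alive_s = keep_alive_min * 60.0  # 0 means "never expire"
        self._models = {}
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._last_activity = time.monotonic()
        self._closed = False

    def _arm(self, delay_s: float) -> None:
        # Caller must hold self._lock. Replace any pending timer with a new one.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(delay_s, self._on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def _record_activity(self) -> None:
        # Caller must hold self._lock. Every use restarts the countdown.
        self._last_activity = time.monotonic()
        if self._keep_alive_s > 0 and not self._closed:
            self._arm(self._keep_alive_s)

    def _on_timeout(self) -> None:
        with self._lock:
            if self._closed:
                return
            idle = time.monotonic() - self._last_activity
            remaining = self._keep_alive_s - idle
            if remaining > 0:
                # Activity was recorded after this timer was armed: wait again.
                self._arm(remaining)
                return
            self._models.clear()

    def put(self, key: str, model: object) -> None:
        with self._lock:
            self._models[key] = model
            self._record_activity()

    def get(self, key: str) -> Optional[object]:
        with self._lock:
            self._record_activity()
            return self._models.get(key)

    def __len__(self) -> int:
        # Inspecting the size does not count as activity.
        with self._lock:
            return len(self._models)

    def shutdown(self) -> None:
        with self._lock:
            self._closed = True
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

The timeout handler re-checks the idle time under the lock, so a timer that fires just as a model is being used re-arms itself instead of clearing the cache.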

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:22:59 +00:00
psychedelicious
454d05bbde refactor: model manager v3 (#8607)
* feat(mm): add UnknownModelConfig

* refactor(ui): move model categorisation-ish logic to central location, simplify model manager models list

* refactor(ui): more cleanup of model categories

* refactor(ui): remove unused excludeSubmodels

I can't remember what this was for and don't see any reference to it.
Maybe it's just remnants from a previous implementation?

* feat(nodes): add unknown as model base

* chore(ui): typegen

* feat(ui): add unknown model base support in ui

* feat(ui): allow changing model type in MM, fix up base and variant selects

* feat(mm): omit model description instead of making it "base type filename model"

* feat(app): add setting to allow unknown models

* feat(ui): allow changing model format in MM

* feat(app): add the installed model config to install complete events

* chore(ui): typegen

* feat(ui): toast warning when installed model is unidentified

* docs: update config docstrings

* chore(ui): typegen

* tests(mm): fix test for MM, leave the UnknownModelConfig class in the list of configs

* tidy(ui): prefer types from zod schemas for model attrs

* chore(ui): lint

* fix(ui): wrong translation string

* feat(mm): normalized model storage

Store models in a flat directory structure. Each model is in a dir named
its unique key (a UUID). Inside that dir is either the model file or the
model dir.

* feat(mm): add migration to flat model storage

* fix(mm): normalized multi-file/diffusers model installation no worky

now worky

* refactor: port MM probes to new api

- Add concept of match certainty to new probe
- Port CLIP Embed models to new API
- Fiddle with stuff

* feat(mm): port TIs to new API

* tidy(mm): remove unused probes

* feat(mm): port spandrel to new API

* fix(mm): parsing for spandrel

* fix(mm): loader for clip embed

* fix(mm): tis use existing weight_files method

* feat(mm): port vae to new API

* fix(mm): vae class inheritance and config_path

* tidy(mm): patcher types and import paths

* feat(mm): better errors when invalid model config found in db

* feat(mm): port t5 to new API

* feat(mm): make config_path optional

* refactor(mm): simplify model classification process

Previously, we had a multi-phase strategy to identify models from their
files on disk:
1. Run each model config class's `matches()` method on the files. It
checks if the model could possibly be identified as the candidate
model type. This was intended to be a quick check. Break on the first
match.
2. If we have a match, run the config class's `parse()` method. It
derives some additional model config attrs from the model files. This was
intended to encapsulate heavier operations that may require loading the
model into memory.
3. Derive the common model config attrs, like name, description,
calculate the hash, etc. Some of these are also heavier operations.

This strategy has some issues:
- It is not clear how the pieces fit together. There is some
back-and-forth between different methods and the config base class. It
is hard to trace the flow of logic until you fully wrap your head around
the system and therefore difficult to add a model architecture to the
probe.
- The assumption that we could do quick, lightweight checks before
heavier checks is incorrect. We often _must_ load the model state dict
in the `matches()` method. So there is no practical perf benefit to
splitting up the responsibility of `matches()` and `parse()`.
- Sometimes we need to do the same checks in `matches()` and `parse()`.
In these cases, splitting the logic has a negative perf impact
because we are doing the same work twice.
- As we introduce the concept of an "unknown" model config (i.e. a model
that we cannot identify, but still record in the db; see #8582), we will
_always_ run _all_ the checks for every model. Therefore we need not try
to defer heavier checks or resource-intensive ops like hashing. We are
going to do them anyways.
- There are situations where a model may match multiple configs. One
known case is SD pipeline models with merged LoRAs. In the old probe
API, we relied on the implicit order of checks to know that if a model
matched for pipeline _and_ LoRA, we prefer the pipeline match. But, in
the new API, we do not have this implicit ordering of checks. To resolve
this in a resilient way, we need to get all matches up front, then use
tie-breaker logic to figure out which should win (or add "differential
diagnosis" logic to the matchers).
- Field overrides weren't handled well by this strategy. They were only
applied at the very end, if a model matched successfully. This means we
cannot tell the system "Hey, this model is type X with base Y. Trust me
bro.". We cannot override the match logic. As we move towards letting
users correct mis-identified models (see #8582), this is a requirement.

We can simplify the process significantly and better support "unknown"
models.

Firstly, model config classes now have a single `from_model_on_disk()`
method that attempts to construct an instance of the class from the
model files. This replaces the `matches()` and `parse()` methods.

If we fail to create the config instance, a special exception is raised
that indicates why we think the files cannot be identified as the given
model config class.

Next, the flow for model identification is a bit simpler:
- Derive all the common fields up-front (name, desc, hash, etc).
- Merge in overrides.
- Call `from_model_on_disk()` for every config class, passing in the
fields. Overrides are handled in this method.
- Record the results for each config class and choose the best one.

The identification logic is a bit more verbose, with the special
exceptions and handling of overrides, but it is very clear what is
happening.

The one downside I can think of for this strategy is we do need to check
every model type, instead of stopping at the first match. It's a bit
less efficient. In practice, however, this isn't a hot code path, and
the improved clarity is worth far more than perf optimizations that the
end user will likely never notice.
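
The flow described above can be sketched roughly as follows. This is a toy illustration, not the code from this commit: the class names, the `NotAMatchError` exception, the numeric `priority` tie-breaker, and the use of a set of state-dict key names as a stand-in for the model files are all assumptions. Only the `from_model_on_disk()` name and the overall order of steps come from the text.

```python
import hashlib
from dataclasses import dataclass


class NotAMatchError(Exception):
    """Hypothetical exception: explains why the files are not this config class."""


@dataclass
class ConfigBase:
    name: str
    hash: str
    priority = 0  # un-annotated, so a plain class attribute, not a dataclass field

    @classmethod
    def from_model_on_disk(cls, keys: set, fields: dict) -> "ConfigBase":
        raise NotImplementedError


class PipelineConfig(ConfigBase):
    priority = 2  # assumed tie-breaker: a pipeline beats a LoRA match

    @classmethod
    def from_model_on_disk(cls, keys: set, fields: dict) -> "ConfigBase":
        if not any(k.startswith("unet.") for k in keys):
            raise NotAMatchError("no unet.* keys")
        return cls(**fields)


class LoRAConfig(ConfigBase):
    priority = 1

    @classmethod
    def from_model_on_disk(cls, keys: set, fields: dict) -> "ConfigBase":
        if not any("lora_" in k for k in keys):
            raise NotAMatchError("no lora_* keys")
        return cls(**fields)


class UnknownConfig(ConfigBase):
    priority = -1  # fallback when nothing else matches


CANDIDATES = [LoRAConfig, PipelineConfig]


def identify(name: str, keys: set, overrides: dict = None):
    # 1. Derive the common fields up front.
    digest = hashlib.sha256("\n".join(sorted(keys)).encode()).hexdigest()
    fields = {"name": name, "hash": digest}
    # 2. Merge in overrides (only name/hash are supported in this sketch).
    fields.update(overrides or {})
    # 3. Try every config class and record the outcome of each attempt.
    matches, failures = [], {}
    for cls in CANDIDATES:
        try:
            matches.append(cls.from_model_on_disk(keys, fields))
        except NotAMatchError as e:
            failures[cls.__name__] = str(e)
    # 4. Choose the best match, or fall back to the unknown config.
    if not matches:
        return UnknownConfig(**fields), failures
    return max(matches, key=lambda c: c.priority), failures
```

With this shape, a pipeline model with merged LoRA keys matches both classes, and the explicit tie-breaker picks the pipeline instead of relying on the order of checks.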

* refactor(mm): remove unused methods in config.py

* refactor(mm): add model config parsing utils

* fix(mm): abstractmethod bork

* tidy(mm): clarify that model id utils are private

* fix(mm): fall back to UnknownModelConfig correctly

* feat(mm): port CLIPVisionDiffusersConfig to new api

* feat(mm): port SigLIPDiffusersConfig to new api

* feat(mm): make match helpers more succinct

* feat(mm): port flux redux to new api

* feat(mm): port ip adapter to new api

* tidy(mm): skip optimistic override handling for now

* refactor(mm): continue iterating on config

* feat(mm): port flux "control lora" and t2i adapter to new api

* tidy(ui): use Extract to get model config types

* fix(mm): t2i base determination

* feat(mm): port cnet to new api

* refactor(mm): add config validation utils, make it all consistent and clean

* feat(mm): wip port of main models to new api

* feat(mm): wip port of main models to new api

* feat(mm): wip port of main models to new api

* docs(mm): add todos

* tidy(mm): removed unused model merge class

* feat(mm): wip port main models to new api

* tidy(mm): clean up model heuristic utils

* tidy(mm): clean up ModelOnDisk caching

* tidy(mm): flux lora format util

* refactor(mm): make config classes narrow

Simpler logic to identify, less complexity to add new model, fewer
useless attrs that do not relate to the model arch, etc

* refactor(mm): diffusers loras

w

* feat(mm): consistent naming for all model config classes

* fix(mm): tag generation & scattered probe fixes

* tidy(mm): consistent class names

* refactor(mm): split configs into separate files

* docs(mm): add comments for identification utils

* chore(ui): typegen

* refactor(mm): remove legacy probe, new configs dir structure, update imports

* fix(mm): inverted condition

* docs(mm): update docstrings in factory.py

* docs(mm): document flux variant attr

* feat(mm): add helper method for legacy configs

* feat(mm): satisfy type checker in flux denoise

* docs(mm): remove extraneous comment

* fix(mm): ensure unknown model configs get unknown attrs

* fix(mm): t5 identification

* fix(mm): sdxl ip adapter identification

* feat(mm): more flexible config matching utils

* fix(mm): clip vision identification

* feat(mm): add sanity checks before probing paths

* docs(mm): add reminder for self for field migrations

* feat(mm): clearer naming for main config class hierarchy

* feat(mm): fix clip vision starter model bases, add ref to actual models

* feat(mm): add model config schema migration logic

* fix(mm): duplicate import

* refactor(mm): split big migration into 3

Split the big migration that did all of these things into 3:

- Migration 22: Remove unique constraint on base/name/type in models
table
- Migration 23: Migrate configs to v6.8.0 schemas
- Migration 24: Normalize file storage

* fix(mm): pop base/type/format when creating unknown model config

* fix(db): migration 22 insert only real cols

* fix(db): migration 23 fall back to unknown model when config change fails

* feat(db): run migrations 23 and 24

* fix(mm): false negative on flux lora

* fix(mm): vae checkpoint probe checking for dir instead of file

* fix(mm): ModelOnDisk skips dirs when looking for weights

Previously a path w/ any of the known weights suffixes would be seen as
a weights file, even if it was a directory. We now check to ensure the
candidate path is actually a file before adding it to the list of
weights.
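
A minimal sketch of the check described above, assuming a hypothetical helper `find_weight_files()` and an illustrative suffix list (neither is taken from the actual `ModelOnDisk` code):

```python
from pathlib import Path
from typing import List

# Assumed set of weight-file suffixes, for illustration only.
WEIGHT_SUFFIXES = {".safetensors", ".ckpt", ".pt", ".pth", ".bin", ".gguf"}


def find_weight_files(root: Path) -> List[Path]:
    """Return weight files under root, ignoring directories with weight-like names."""
    candidates = [root] if root.is_file() else sorted(root.rglob("*"))
    # The is_file() test is the fix: a directory named "unet.safetensors"
    # has a matching suffix but must not be treated as a weights file.
    return [p for p in candidates if p.is_file() and p.suffix in WEIGHT_SUFFIXES]
```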

* feat(mm): add method to get main model defaults from a base

* feat(mm): do not log when multiple non-unknown model matches

* refactor(mm): continued iteration on model identification

* tests(mm): refactor model identification tests

Overhaul of model identification (probing) tests. Previously we didn't
test the correctness of probing except in a few narrow cases - now we
do.

See tests/model_identification/README.md for a detailed overview of the
new test setup. It includes instructions for adding a new test case. In
brief:

- Download the model you want to add as a test case
- Run a script against it to generate the test model files
- Fill in the expected model type/format/base/etc in the generated test
metadata JSON file

Included test cases:
- All starter models
- A handful of other models that I had installed
- Models present in the previous test cases as smoke tests, now also
tested for correctness

* fix(mm): omit type/format/base when creating unknown config instance

* feat(mm): use ValueError for model id sanity checks

* feat(mm): add flag for updating models to allow class changes

* tests(mm): fix remaining MM tests

* feat: allow users to edit models freely

* feat(ui): add warning for model settings edit

* tests(mm): flux state dict tests

* tidy: remove unused file

* fix(mm): lora state dict loading in model id

* feat(ui): use translation string for model edit warning

* docs(db): update version numbers in migration comments

* chore: bump version to v6.9.0a1

* docs: update model id readme

* tests(mm): attempt to fix windows model id tests

* fix(mm): issue with deleting single file models

* feat(mm): just delete the dir w/ rmtree when deleting model

* tests(mm): windows CI issue

* fix(ui): typegen schema sync

* fix(mm): fixes for migration 23

- Handle CLIP Embed and Main SD models missing variant field
- Handle errors when calling the discriminator function, previously only
handled ValidationError but it could be a ValueError or something else
- Better logging for config migration

* chore: bump version to v6.9.0a2

* chore: bump version to v6.9.0a3
2025-10-15 10:18:53 +11:00
Billy
182580ff69 Imports 2025-03-26 12:55:10 +11:00
Ryan Dick
36a3869af0 Add keep_ram_copy_of_weights config option. 2025-01-16 15:35:25 +00:00
Ryan Dick
974b4671b1 Deprecate the ram and vram configs to make the migration to dynamic
memory limits smoother for users who had previously overridden these
values.
2025-01-07 16:45:29 +00:00
Ryan Dick
a167632f09 Calculate model cache size limits dynamically based on the available RAM / VRAM. 2025-01-07 01:14:20 +00:00
Ryan Dick
d0bfa019be Add 'enable_partial_loading' config flag. 2025-01-07 00:31:00 +00:00
Ryan Dick
535e45cedf First pass at adding partial loading support to the ModelCache. 2025-01-07 00:30:58 +00:00
Ryan Dick
d30a9ced38 Rename model_cache_default.py -> model_cache.py. 2024-12-24 14:23:18 +00:00
Ryan Dick
e0bfa6157b Remove ModelCacheBase. 2024-12-24 14:23:18 +00:00
Ryan Dick
1d449097cc Apply ruff rule to disallow all relative imports. 2024-07-04 09:35:37 -04:00
Ryan Dick
9da5925287 Add ruff rule to disallow relative parent imports. 2024-07-04 09:35:37 -04:00
Lincoln Stein
3e0fb45dd7 Load single-file checkpoints directly without conversion (#6510)
* use model_class.load_singlefile() instead of converting; works, but performance is poor

* adjust the convert api - not right just yet

* working, needs sql migrator update

* rename migration_11 before conflict merge with main

* Update invokeai/backend/model_manager/load/model_loaders/stable_diffusion.py

Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>

* Update invokeai/backend/model_manager/load/model_loaders/stable_diffusion.py

Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>

* implement lightweight version-by-version config migration

* simplified config schema migration code

* associate sdxl config with sdxl VAEs

* remove use of original_config_file in load_single_file()

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
2024-06-27 17:31:28 -04:00
Lincoln Stein
e93f4d632d [util] Add generic torch device class (#6174)
* introduce new abstraction layer for GPU devices

* add unit test for device abstraction

* fix ruff

* convert TorchDeviceSelect into a stateless class

* move logic to select context-specific execution device into context API

* add mock hardware environments to pytest

* remove dangling mocker fixture

* fix unit test for running on non-CUDA systems

* remove unimplemented get_execution_device() call

* remove autocast precision

* Multiple changes:

1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to
   context.models.get_execution_device().
2. Rename TorchDeviceSelect to TorchDevice
3. Added back the legacy public API defined in `invocation_api`, including
   choose_precision().
4. Added a config file migration script to accommodate removal of precision=autocast.

* add deprecation warnings to choose_torch_device() and choose_precision()

* fix test crash

* remove app_config argument from choose_torch_device() and choose_torch_dtype()

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-15 13:12:49 +00:00
Lincoln Stein
812f10730f adjust free vram calculation for models that will be removed by lazy offloading (#6150)
Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-04 22:51:12 -04:00
psychedelicious
60492500db chore: ruff 2024-03-19 09:24:28 +11:00
psychedelicious
897fe497dc fix(config): use new get_config across the app, use correct settings 2024-03-19 09:24:28 +11:00
psychedelicious
528ac5dd25 refactor(nodes): model identifiers
- All models are identified by a key and optionally a submodel type via new model `ModelField`. Previously, a few model types had their own class, but not all of them. This inconsistency just added complexity without any benefit.
- Update all invocation to use the new format.
- In the node API, models are loaded by key or an instance of `ModelField` as a convenience.
- Add an enriched model schema for metadata. It includes key, hash, name, base and type.
2024-03-07 10:56:59 +11:00
psychedelicious
afd9ae7712 tidy(mm): remove convenience methods from high level model manager service
These were added as a hold-me-over for the nodes API changes, no longer needed. A followup commit will fix the nodes API to not rely on these.
2024-03-07 10:56:59 +11:00
Brandon Rising
de9287a3e4 Run ruff 2024-03-01 10:42:33 +11:00
Brandon Rising
008716040b Allow users to run model manager without cuda 2024-03-01 10:42:33 +11:00
psychedelicious
5a3195f757 final tidying before marking PR as ready for review
- Replace AnyModelLoader with ModelLoaderRegistry
- Fix type check errors in multiple files
- Remove apparently unneeded `get_model_config_enum()` method from model manager
- Remove last vestiges of old model manager
- Updated tests and documentation

resolve conflict with seamless.py
2024-03-01 10:42:33 +11:00
Lincoln Stein
5d612ec095 Tidy names and locations of modules
- Rename old "model_management" directory to "model_management_OLD" in order to catch
  dangling references to original model manager.
- Caught and fixed most dangling references (still checking)
- Rename lora, textual_inversion and model_patcher modules
- Introduce a RawModel base class to simplify the Union returned by the
  model loaders.
- Tidy up the model manager 2-related tests. Add useful fixtures, and
  a finalizer to the queue and installer fixtures that will stop the
  services and release threads.
2024-03-01 10:42:33 +11:00
Lincoln Stein
996eb96b4e Fix issues identified during PR review by RyanjDick and brandonrising
- ModelMetadataStoreService is now injected into ModelRecordStoreService
  (these two services are really joined at the hip, and should someday be merged)
- ModelRecordStoreService is now injected into ModelManagerService
- Reduced timeout value for the various installer and download wait*() methods
- Introduced a Mock modelmanager for testing
- Replaced bare print() statement with _logger in the install helper backend.
- Removed unused code from model loader init file
- Made `locker` a private variable in the `LoadedModel` object.
- Fixed up model merge frontend (will be deprecated anyway!)
2024-03-01 10:42:33 +11:00
Lincoln Stein
a23dedd2ee make model manager v2 ready for PR review
- Replace legacy model manager service with the v2 manager.

- Update invocations to use new load interface.

- Fixed many but not all type checking errors in the invocations. Most
  were unrelated to model manager

- Updated routes. All the new routes live under the route tag
  `model_manager_v2`. To avoid confusion with the old routes,
  they have the URL prefix `/api/v2/models`. The old routes
  have been de-registered.

- Added a pytest for the loader.

- Updated documentation in contributing/MODEL_MANAGER.md
2024-03-01 10:42:33 +11:00
Lincoln Stein
7956602b19 consolidate model manager parts into a single class 2024-03-01 10:42:33 +11:00
psychedelicious
0f8af643d1 chore(backend): rename ModelInfo -> LoadedModelInfo
We have two different classes named `ModelInfo` which might need to be used by API consumers. We need to export both but have to deal with this naming collision.

The `ModelInfo` I've renamed here is the one that is returned when a model is loaded. It's the object least likely to be used by API consumers.
2024-03-01 10:42:33 +11:00
psychedelicious
8637c40661 feat(nodes): update all invocations to use new invocation context
Update all invocations to use the new context. The changes are all fairly simple, but there are a lot of them.

Supporting minor changes:
- Patch bump for all nodes that use the context
- Update invocation processor to provide new context
- Minor change to `EventServiceBase` to accept a node's ID instead of the dict version of a node
- Minor change to `ModelManagerService` to support the new wrapped context
- Finagling of imports to avoid circular dependencies
2024-03-01 10:42:33 +11:00
psychedelicious
859e3d5a61 chore: flake8 2023-10-30 01:49:10 +11:00
Lincoln Stein
3546c41f4a close #4975 2023-10-23 18:48:14 -04:00
psychedelicious
c238a7f18b feat(api): chore: pydantic & fastapi upgrade
Upgrade pydantic and fastapi to latest.

- pydantic~=2.4.2
- fastapi~=0.103.2
- fastapi-events~=0.9.1

**Big Changes**

There are a number of logic changes needed to support pydantic v2. Most changes are very simple, like using the new methods to serialize and deserialize models, but there are a few more complex changes.

**Invocations**

The biggest change relates to invocation creation, instantiation and validation.

Because pydantic v2 moves all validation logic into the rust pydantic-core, we may no longer directly stick our fingers into the validation pie.

Previously, we (ab)used models and fields to allow invocation fields to be optional at instantiation, but required when `invoke()` is called. We directly manipulated the fields and invocation models when calling `invoke()`.

With pydantic v2, this is much more involved. Changes to the python wrapper do not propagate down to the rust validation logic - you have to rebuild the model. This causes problems with concurrent access to the invocation classes and is not a free operation.

This logic has been totally refactored and we do not need to change the model any more. The details are in `baseinvocation.py`, in the `InputField` function and `BaseInvocation.invoke_internal()` method.

In the end, this implementation is cleaner.

**Invocation Fields**

In pydantic v2, you can no longer directly add or remove fields from a model.

Previously, we did this to add the `type` field to invocations.

**Invocation Decorators**

With pydantic v2, we instead use the imperative `create_model()` API to create a new model with the additional field. This is done in `baseinvocation.py` in the `invocation()` wrapper.

A similar technique is used for `invocation_output()`.

**Minor Changes**

There are a number of minor changes around the pydantic v2 models API.

**Protected `model_` Namespace**

All models' pydantic-provided methods and attributes are prefixed with `model_` and this is considered a protected namespace. This causes some conflict, because "model" means something to us, and we have a ton of pydantic models with attributes starting with "model_".

Fortunately, there are no direct conflicts. However, in any pydantic model where we define an attribute or method that starts with "model_", we must set the protected namespaces to an empty tuple.

```py
class IPAdapterModelField(BaseModel):
    model_name: str = Field(description="Name of the IP-Adapter model")
    base_model: BaseModelType = Field(description="Base model")

    model_config = ConfigDict(protected_namespaces=())
```

**Model Serialization**

Pydantic models no longer have `Model.dict()` or `Model.json()`.

Instead, we use `Model.model_dump()` or `Model.model_dump_json()`.

**Model Deserialization**

Pydantic models no longer have `Model.parse_obj()` or `Model.parse_raw()`, and there are no `parse_raw_as()` or `parse_obj_as()` functions.

Instead, you need to create a `TypeAdapter` object to parse python objects or JSON into a model.

```py
adapter_graph = TypeAdapter(Graph)
deserialized_graph_from_json = adapter_graph.validate_json(graph_json)
deserialized_graph_from_dict = adapter_graph.validate_python(graph_dict)
```

**Field Customisation**

Pydantic `Field`s no longer accept arbitrary args.

Now, you must put all additional arbitrary args in a `json_schema_extra` arg on the field.

**Schema Customisation**

FastAPI and pydantic schema generation now follows the OpenAPI version 3.1 spec.

This necessitates two changes:
- Our schema customization logic has been revised
- Schema parsing to build node templates has been revised

The specifics aren't important, but this does present additional surface area for bugs.

**Performance Improvements**

Pydantic v2 is a full rewrite with a rust backend. This offers a substantial performance improvement (pydantic claims 5x to 50x depending on the task). We'll notice this the most during serialization and deserialization of sessions/graphs, which happens very very often - a couple times per node.

I haven't done any benchmarks, but anecdotally, graph execution is much faster. Also, very large graphs - like with massive iterators - are much, much faster.
2023-10-17 14:59:25 +11:00
psychedelicious
402cf9b0ee feat: refactor services folder/module structure
Refactor services folder/module structure.

**Motivation**

While working on our services I've repeatedly encountered circular imports and a general lack of clarity regarding where to put things. The structure introduced goes a long way towards resolving those issues, setting us up for a clean structure going forward.

**Services**

Services are now in their own folder with a few files:

- `services/{service_name}/__init__.py`: init as needed, mostly empty now
- `services/{service_name}/{service_name}_base.py`: the base class for the service
- `services/{service_name}/{service_name}_{impl_type}.py`: the default concrete implementation of the service - typically one of `sqlite`, `default`, or `memory`
- `services/{service_name}/{service_name}_common.py`: any common items - models, exceptions, utilities, etc

Though it's a bit verbose to have the service name both as the folder name and the prefix for files, I found it is _extremely_ confusing to have all of the base classes just be named `base.py`. So, at the cost of some verbosity when importing things, I've included the service name in the filename.
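
As a rough illustration of this layout, the sketch below shows what might live in each file for a hypothetical `note_storage` service. The service name, class names, and methods are made up for the example, and everything is shown in one block for brevity. Only the file naming pattern comes from the list above.

```python
from abc import ABC, abstractmethod


# services/note_storage/note_storage_common.py: shared models, exceptions, utilities
class NoteNotFoundError(KeyError):
    """Raised when a note id is not present in the store."""


# services/note_storage/note_storage_base.py: the base class for the service
class NoteStorageBase(ABC):
    @abstractmethod
    def get(self, note_id: str) -> str:
        """Return the note text, or raise NoteNotFoundError."""

    @abstractmethod
    def set(self, note_id: str, text: str) -> None:
        """Store or overwrite a note."""


# services/note_storage/note_storage_memory.py: a concrete `memory` implementation
class NoteStorageMemory(NoteStorageBase):
    def __init__(self) -> None:
        self._notes = {}

    def get(self, note_id: str) -> str:
        try:
            return self._notes[note_id]
        except KeyError:
            raise NoteNotFoundError(note_id) from None

    def set(self, note_id: str, text: str) -> None:
        self._notes[note_id] = text
```

Callers would depend on `NoteStorageBase` only, so a `sqlite` implementation could be swapped in without touching them.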

There are some minor logic changes. For example, in `InvocationProcessor`, instead of assigning the model manager service to a variable to be used later in the file, the service is used directly via the `Invoker`.

**Shared**

Things that are used across disparate services are in `services/shared/`:

- `default_graphs.py`: previously in `services/`
- `graphs.py`: previously in `services/`
- `pagination`: generic pagination models used in a few services
- `sqlite`: the `SqliteDatabase` class, other sqlite-specific things
2023-10-12 12:15:06 -04:00