InvokeAI

mirror of https://github.com/invoke-ai/InvokeAI.git synced 2026-01-15 02:47:58 -05:00

Author	SHA1	Message	Date
psychedelicious	35c7c59455	fix(app): reduce peak memory usage We've long suspected there is a memory leak in Invoke, but that may not be true. What looks like a memory leak may in fact be the expected behaviour for our allocation patterns. We observe ~20 to ~30 MB increase in memory usage per session executed. I did some prolonged tests, where I measured the process's RSS in bytes while doing 200 SDXL generations. I found that it eventually leveled off at around 100 generations, at which point memory usage had climbed by ~900MB from its starting point. I used tracemalloc to diff the allocations of single session executions and found that we are allocating ~20MB or so per session in `ModelPatcher.apply_ti()`. In `ModelPatcher.apply_ti()` we add tokens to the tokenizer when handling TIs. The added tokens should be scoped to only the current invocation, but there is no simple way to remove the tokens afterwards. As a workaround for this, we clone the tokenizer, add the TI tokens to the clone, and use the clone to when running compel. Afterwards, this cloned tokenizer is discarded. The tokenizer uses ~20MB of memory, and it has referrers/referents to other compel stuff. This is what is causing the observed increases in memory per session! We'd expect these objects to be GC'd but python doesn't do it immediately. After creating the cond tensors, we quickly move on to denoising. So there isn't any time for the GC to happen to free up its existing memory arenas/blocks to reuse them. Instead, python needs to request more memory from the OS. We can improve the situation by immediately calling `del` on the tokenizer clone and related objects. In fact, we already had some code in the compel nodes to `del` some of these objects, but not all. Adding the `del`s vastly improves things. We hit peak RSS in half the sessions (~50 or less) and it's now ~100MB more than starting value. There is still a gradual increase in memory usage until we level off.	2025-06-11 12:56:16 +10:00
Kent Keirsey	d4c4926caa	Update Compel to 2.1.1 and apply Sentences Split logic	2025-05-26 22:54:15 -04:00
psychedelicious	830880a6fc	chore(nodes): update titles of all model-specific nodes to reference their models Also bump versions on all of them.	2025-03-17 10:32:19 +11:00
Billy	f2689598c0	Formatting	2025-03-06 09:11:00 +11:00
Ryan Dick	71b97ce7be	Reduce the likelihood of encountering https://github.com/invoke-ai/InvokeAI/issues/7513 by elminating places where the door was left open for this to happen.	2025-01-07 01:20:15 +00:00
Ryan Dick	bcd29c5d74	Remove all cases where we check the 'model.device'. This is no longer trustworthy now that partial loading is permitted.	2025-01-07 00:31:00 +00:00
Ryan Dick	477d87ec31	Fix layer patch dtype selection for CLIP text encoder models.	2024-12-29 21:48:51 +00:00
Ryan Dick	80db9537ff	Rename model_patcher.py -> layer_patcher.py.	2024-12-24 15:57:54 +00:00
Ryan Dick	61253b91f1	Enable LoRAPatcher.apply_smart_lora_patches(...) throughout the stack.	2024-12-24 15:57:54 +00:00
Ryan Dick	dd09509dbd	Rename ModelPatcher -> LayerPatcher to avoid conflicts with another ModelPatcher definition.	2024-12-17 13:20:19 +00:00
Ryan Dick	7fad4c9491	Rename LoRAModelRaw to ModelPatchRaw.	2024-12-17 13:20:19 +00:00
Ryan Dick	b820862eab	Rename ModelPatcher methods to reflect that they are general model patching methods and are not LoRA-specific.	2024-12-17 13:20:19 +00:00
Ryan Dick	c604a0956e	Rename LoRAPatcher -> ModelPatcher.	2024-12-17 13:20:19 +00:00
Ryan Dick	42f8d6aa11	Rename backend/lora/ to backend/patches	2024-12-17 13:20:19 +00:00
psychedelicious	96a31a5563	feat(app): add more events when loading/running models	2024-11-15 05:49:05 +11:00
Ryan Dick	fef26a5f2f	Consolidate all LoRA patching logic in the LoRAPatcher.	2024-09-15 04:39:56 +03:00
Ryan Dick	2b3e4e123d	Split LoRA layer implementations into separate files.	2024-09-12 15:53:30 +00:00
Sergey Borisov	faa88f72bf	Make lora as separate extensions	2024-07-27 02:39:53 +03:00
Ryan Dick	1d449097cc	Apply ruff rule to disallow all relative imports.	2024-07-04 09:35:37 -04:00
Lincoln Stein	2871676f79	LoRA patching optimization (#6439 ) * allow model patcher to optimize away the unpatching step when feasible * remove lazy_offloading functionality * allow model patcher to optimize away the unpatching step when feasible * remove lazy_offloading functionality * do not save original weights if there is a CPU copy of state dict * Update invokeai/backend/model_manager/load/load_base.py Co-authored-by: Ryan Dick <ryanjdick3@gmail.com> * documentation fixes added during penultimate review --------- Co-authored-by: Lincoln Stein <lstein@gmail.com> Co-authored-by: Kent Keirsey <31807370+hipsterusername@users.noreply.github.com> Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>	2024-06-06 13:53:35 +00:00
Lincoln Stein	532f82cb97	Optimize RAM to VRAM transfer (#6312 ) * avoid copying model back from cuda to cpu * handle models that don't have state dicts * add assertions that models need a `device()` method * do not rely on torch.nn.Module having the device() method * apply all patches after model is on the execution device * fix model patching in latents too * log patched tokenizer * closes #6375 --------- Co-authored-by: Lincoln Stein <lstein@gmail.com>	2024-05-24 17:06:09 +00:00
Lincoln Stein	e93f4d632d	[util] Add generic torch device class (#6174 ) * introduce new abstraction layer for GPU devices * add unit test for device abstraction * fix ruff * convert TorchDeviceSelect into a stateless class * move logic to select context-specific execution device into context API * add mock hardware environments to pytest * remove dangling mocker fixture * fix unit test for running on non-CUDA systems * remove unimplemented get_execution_device() call * remove autocast precision * Multiple changes: 1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to context.models.get_execution_device(). 2. Rename TorchDeviceSelect to TorchDevice 3. Added back the legacy public API defined in `invocation_api`, including choose_precision(). 4. Added a config file migration script to accommodate removal of precision=autocast. * add deprecation warnings to choose_torch_device() and choose_precision() * fix test crash * remove app_config argument from choose_torch_device() and choose_torch_dtype() --------- Co-authored-by: Lincoln Stein <lstein@gmail.com>	2024-04-15 13:12:49 +00:00
Ryan Dick	4a828818da	Remove support for Prompt-to-Prompt cross-attention control (aka .swap()). This feature is not widely used. It does not work with SDXL and is incompatible with IP-Adapter and regional prompting. The implementation is also intertwined with both text embedding and the UNet attention layers, resulting in a high maintenance burden. For all of these reasons, we have decided to drop support.	2024-04-09 10:57:02 -04:00
Ryan Dick	338bf808d6	Rename MaskField to be a generice TensorField.	2024-04-09 08:12:12 -04:00
Ryan Dick	4e64b26702	Update compel nodes to accept an optional prompt mask.	2024-04-09 08:12:12 -04:00
psychedelicious	29b04b7e83	chore: bump nodes versions Bump all nodes in prep for v4.0.0.	2024-03-20 10:28:07 +11:00
psychedelicious	132790eebe	tidy(nodes): use canonical capitalizations	2024-03-07 10:56:59 +11:00
psychedelicious	528ac5dd25	refactor(nodes): model identifiers - All models are identified by a key and optionally a submodel type via new model `ModelField`. Previously, a few model types had their own class, but not all of them. This inconsistency just added complexity without any benefit. - Update all invocation to use the new format. - In the node API, models are loaded by key or an instance of `ModelField` as a convenience. - Add an enriched model schema for metadata. It includes key, hash, name, base and type.	2024-03-07 10:56:59 +11:00
blessedcoolant	ae34bcfbc0	fix: Assertion issue with SDXL Compel	2024-03-01 10:42:33 +11:00
Brandon Rising	f475b78734	Ruff check	2024-03-01 10:42:33 +11:00
Brandon Rising	ca9b815c89	Extract TI loading logic into util, disallow it from ever failing a generation	2024-03-01 10:42:33 +11:00
Brandon Rising	8efd4284e9	Fix one last reference to the uncasted model	2024-03-01 10:42:33 +11:00
Brandon Rising	5922cee541	Allow TIs to be either a key or a name in the prompt during our transition to using keys	2024-03-01 10:42:33 +11:00
psychedelicious	34f3a39cc9	fix(nodes): fix TI loading	2024-03-01 10:42:33 +11:00
psychedelicious	731860c332	feat(nodes): JIT graph nodes validation We use pydantic to validate a union of valid invocations when instantiating a graph. Previously, we constructed the union while creating the `Graph` class. This introduces a dependency on the order of imports. For example, consider a setup where we have 3 invocations in the app: - Python executes the module where `FirstInvocation` is defined, registering `FirstInvocation`. - Python executes the module where `SecondInvocation` is defined, registering `SecondInvocation`. - Python executes the module where `Graph` is defined. A union of invocations is created and used to define the `Graph.nodes` field. The union contains `FirstInvocation` and `SecondInvocation`. - Python executes the module where `ThirdInvocation` is defined, registering `ThirdInvocation`. - A graph is created that includes `ThirdInvocation`. Pydantic validates the graph using the union, which does not know about `ThirdInvocation`, raising a `ValidationError` about an unknown invocation type. This scenario has been particularly problematic in tests, where we may create invocations dynamically. The test files have to be structured in such a way that the imports happen in the right order. It's a major pain. This PR refactors the validation of graph nodes to resolve this issue: - `BaseInvocation` gets a new method `get_typeadapter`. This builds a pydantic `TypeAdapter` for the union of all registered invocations, caching it after the first call. - `Graph.nodes`'s type is widened to `dict[str, BaseInvocation]`. This actually is a nice bonus, because we get better type hints whenever we reference `some_graph.nodes`. - A "plain" field validator takes over the validation logic for `Graph.nodes`. "Plain" validators totally override pydantic's own validation logic. The validator grabs the `TypeAdapter` from `BaseInvocation`, then validates each node with it. The validation is identical to the previous implementation - we get the same errors. `BaseInvocationOutput` gets the same treatment.	2024-03-01 10:42:33 +11:00
psychedelicious	5a3195f757	final tidying before marking PR as ready for review - Replace AnyModelLoader with ModelLoaderRegistry - Fix type check errors in multiple files - Remove apparently unneeded `get_model_config_enum()` method from model manager - Remove last vestiges of old model manager - Updated tests and documentation resolve conflict with seamless.py	2024-03-01 10:42:33 +11:00
Lincoln Stein	5d612ec095	Tidy names and locations of modules - Rename old "model_management" directory to "model_management_OLD" in order to catch dangling references to original model manager. - Caught and fixed most dangling references (still checking) - Rename lora, textual_inversion and model_patcher modules - Introduce a RawModel base class to simplfy the Union returned by the model loaders. - Tidy up the model manager 2-related tests. Add useful fixtures, and a finalizer to the queue and installer fixtures that will stop the services and release threads.	2024-03-01 10:42:33 +11:00
psychedelicious	539570cc7a	feat(nodes): update invocation context for mm2, update nodes model usage	2024-03-01 10:42:33 +11:00
Lincoln Stein	a23dedd2ee	make model manager v2 ready for PR review - Replace legacy model manager service with the v2 manager. - Update invocations to use new load interface. - Fixed many but not all type checking errors in the invocations. Most were unrelated to model manager - Updated routes. All the new routes live under the route tag `model_manager_v2`. To avoid confusion with the old routes, they have the URL prefix `/api/v2/models`. The old routes have been de-registered. - Added a pytest for the loader. - Updated documentation in contributing/MODEL_MANAGER.md	2024-03-01 10:42:33 +11:00
Lincoln Stein	78ef946e01	BREAKING CHANGES: invocations now require model key, not base/type/name - Implement new model loader and modify invocations and embeddings - Finish implementation loaders for all models currently supported by InvokeAI. - Move lora, textual_inversion, and model patching support into backend/embeddings. - Restore support for model cache statistics collection (a little ugly, needs work). - Fixed up invocations that load and patch models. - Move seamless and silencewarnings utils into better location	2024-03-01 10:42:33 +11:00
psychedelicious	4ce21087d3	fix(nodes): restore type annotations for `InvocationContext`	2024-03-01 10:42:33 +11:00
psychedelicious	05fb485d33	feat(nodes): move `ConditioningFieldData` to `conditioning_data.py`	2024-03-01 10:42:33 +11:00
psychedelicious	8637c40661	feat(nodes): update all invocations to use new invocation context Update all invocations to use the new context. The changes are all fairly simple, but there are a lot of them. Supporting minor changes: - Patch bump for all nodes that use the context - Update invocation processor to provide new context - Minor change to `EventServiceBase` to accept a node's ID instead of the dict version of a node - Minor change to `ModelManagerService` to support the new wrapped context - Fanagling of imports to avoid circular dependencies	2024-03-01 10:42:33 +11:00
psychedelicious	992b02aa65	tidy(nodes): move all field things to fields.py Unfortunately, this is necessary to prevent circular imports at runtime.	2024-03-01 10:42:33 +11:00
Brandon	32ad742f3e	Ti trigger from prompt util (#5294 ) * Pull logic for extracting TI triggers into a util function * Remove duplicate regex for ti triggers * Fix linting for ruff * Remove unused imports	2023-12-22 03:04:44 +00:00
psychedelicious	e8b83fecff	fix(backend): apply clip skip after lora This handles LoRAs that attempt to modify layers skipped by CLIP Skip.	2023-11-14 11:30:15 +11:00
psychedelicious	6aa87f973e	fix(nodes): create `app/shared/` module to prevent circular imports We have a number of shared classes, objects, and functions that are used in multiple places. This causes circular import issues. This commit creates a new `app/shared/` module to hold these shared classes, objects, and functions. Initially, only `FreeUConfig` and `FieldDescriptions` are moved here. This resolves a circular import issue with custom nodes. Other shared classes, objects, and functions will be moved here in future commits.	2023-11-09 16:41:55 +11:00
Ryan Dick	379d68f595	Patch LoRA on device when model is already on device.	2023-11-02 10:03:17 -07:00
psychedelicious	c238a7f18b	feat(api): chore: pydantic & fastapi upgrade Upgrade pydantic and fastapi to latest. - pydantic~=2.4.2 - fastapi~=103.2 - fastapi-events~=0.9.1 Big Changes There are a number of logic changes needed to support pydantic v2. Most changes are very simple, like using the new methods to serialized and deserialize models, but there are a few more complex changes. Invocations The biggest change relates to invocation creation, instantiation and validation. Because pydantic v2 moves all validation logic into the rust pydantic-core, we may no longer directly stick our fingers into the validation pie. Previously, we (ab)used models and fields to allow invocation fields to be optional at instantiation, but required when `invoke()` is called. We directly manipulated the fields and invocation models when calling `invoke()`. With pydantic v2, this is much more involved. Changes to the python wrapper do not propagate down to the rust validation logic - you have to rebuild the model. This causes problem with concurrent access to the invocation classes and is not a free operation. This logic has been totally refactored and we do not need to change the model any more. The details are in `baseinvocation.py`, in the `InputField` function and `BaseInvocation.invoke_internal()` method. In the end, this implementation is cleaner. Invocation Fields In pydantic v2, you can no longer directly add or remove fields from a model. Previously, we did this to add the `type` field to invocations. Invocation Decorators With pydantic v2, we instead use the imperative `create_model()` API to create a new model with the additional field. This is done in `baseinvocation.py` in the `invocation()` wrapper. A similar technique is used for `invocation_output()`. Minor Changes There are a number of minor changes around the pydantic v2 models API. Protected `model_` Namespace All models' pydantic-provided methods and attributes are prefixed with `model_` and this is considered a protected namespace. This causes some conflict, because "model" means something to us, and we have a ton of pydantic models with attributes starting with "model_". Forunately, there are no direct conflicts. However, in any pydantic model where we define an attribute or method that starts with "model_", we must tell set the protected namespaces to an empty tuple. ```py class IPAdapterModelField(BaseModel): model_name: str = Field(description="Name of the IP-Adapter model") base_model: BaseModelType = Field(description="Base model") model_config = ConfigDict(protected_namespaces=()) ``` Model Serialization Pydantic models no longer have `Model.dict()` or `Model.json()`. Instead, we use `Model.model_dump()` or `Model.model_dump_json()`. Model Deserialization Pydantic models no longer have `Model.parse_obj()` or `Model.parse_raw()`, and there are no `parse_raw_as()` or `parse_obj_as()` functions. Instead, you need to create a `TypeAdapter` object to parse python objects or JSON into a model. ```py adapter_graph = TypeAdapter(Graph) deserialized_graph_from_json = adapter_graph.validate_json(graph_json) deserialized_graph_from_dict = adapter_graph.validate_python(graph_dict) ``` Field Customisation Pydantic `Field`s no longer accept arbitrary args. Now, you must put all additional arbitrary args in a `json_schema_extra` arg on the field. Schema Customisation FastAPI and pydantic schema generation now follows the OpenAPI version 3.1 spec. This necessitates two changes: - Our schema customization logic has been revised - Schema parsing to build node templates has been revised The specific aren't important, but this does present additional surface area for bugs. Performance Improvements Pydantic v2 is a full rewrite with a rust backend. This offers a substantial performance improvement (pydantic claims 5x to 50x depending on the task). We'll notice this the most during serialization and deserialization of sessions/graphs, which happens very very often - a couple times per node. I haven't done any benchmarks, but anecdotally, graph execution is much faster. Also, very larges graphs - like with massive iterators - are much, much faster.	2023-10-17 14:59:25 +11:00
Ryan Dick	b57acb7353	Merge branch 'main' into feat/ip-adapter	2023-09-15 13:15:25 -04:00

1 2 3

139 Commits