2330 Commits

Author SHA1 Message Date
psychedelicious
518a896521 feat(mm): add usage_info to model config 2025-05-06 09:07:52 -04:00
Kent Keirsey
1f63b60021 Implementing support for Non-Standard LoRA Format (#7985)
* integrate LoRA

* idk anymore tbh

* enable fused matrix for quantized models

* ruff fix

---------

Co-authored-by: Sam <bhaskarmdutt@gmail.com>
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
2025-05-05 09:40:38 -04:00
Mary Hipp
fb91f48722 change base model for chatGPT 4o 2025-04-29 09:12:49 +10:00
Mary Hipp
04c005284c add gpt-image to possible base model types 2025-04-28 15:39:11 -04:00
psychedelicious
14944872c4 feat(mm): add model taxonomy for API models & Imagen3 as base model type 2025-04-28 13:31:26 -04:00
psychedelicious
814406d98a feat(mm): siglip model loading supports partial loading
In the previous commit, the LLaVA model was updated to support partial loading.

In this commit, the SigLIP model is updated in the same way.

This model is used for FLUX Redux. It's <4GB and only ever run in isolation, so it won't benefit from partial loading for the vast majority of users. Regardless, I think it is best if we make _all_ models work with partial loading.

PS: I also fixed the initial load dtype issue, described in the previous commit. It's probably a non-issue for this model, but we may as well fix it.
2025-04-18 10:12:03 +10:00
psychedelicious
c054501103 feat(mm): llava model loading supports partial loading; fix OOM crash on initial load
The model manager has two types of model cache entries:
- `CachedModelOnlyFullLoad`: The model may only ever be loaded and unloaded as a single object.
- `CachedModelWithPartialLoad`: The model may be partially loaded and unloaded.

Partial loading is enabled by overriding certain torch layer classes, adding the ability to autocast the layer to a device on the fly. See `CustomLinear` for an example.

So, to take advantage of partial loading and be cached as a `CachedModelWithPartialLoad`, the model must inherit from `torch.nn.Module`.
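
For illustration, the idea behind such an override looks roughly like this (a minimal sketch only - not the actual `CustomLinear` implementation):

```python
import torch


class AutocastLinear(torch.nn.Linear):
    """Sketch of a partial-loading-aware layer: if the weights are not on the
    same device as the incoming activations, cast them over for this call only.
    Illustrative only - the real CustomLinear handles more cases."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight if self.weight.device == x.device else self.weight.to(x.device)
        bias = None
        if self.bias is not None:
            bias = self.bias if self.bias.device == x.device else self.bias.to(x.device)
        return torch.nn.functional.linear(x, weight, bias)
```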

The LLaVA classes provided by `transformers` do inherit from `torch.nn.Module`, but we wrap those classes in a separate class called `LlavaOnevisionModel`. The wrapper encapsulates both the LLaVA model and its "processor" - a lightweight class that prepares model inputs like text and images.

While it is more elegant to encapsulate both model and processor classes in a single entity, this prevents the model cache from enabling partial loading for the chunky vLLM model.

Fixing this involved a few changes.
- Update the `LlavaOnevisionModelLoader` class to operate on the vLLM model directly, instead of the `LlavaOnevisionModel` wrapper class.
- Instantiate the processor directly in the node. The processor is lightweight and does its business on the CPU. We don't need to worry about caching in the model manager.
- Remove caching support code from the `LlavaOnevisionModel` wrapper class. It's not needed, because we do not cache this class. The class now only handles running the models provided to it.
- Rename `LlavaOnevisionModel` to `LlavaOnevisionPipeline` to better represent its purpose.

These changes have a bonus effect of fixing an OOM crash when initially loading the models. This was most apparent when loading LLaVA 7B, which is pretty chunky.

The initial load is onto CPU RAM. In the old version of the loaders, we ignored the loader's target dtype for the initial load. Instead, we loaded the model at `transformers`'s "default" dtype of fp32.

LLaVA 7B is fp16 and weighs ~17GB. Loading as fp32 means we need double that amount (~34GB) of CPU RAM. Many users only have 32GB RAM, so this causes a _CPU_ OOM - which is a hard crash of the whole process.

With the updated loaders, the initial load logic now uses the target dtype for the initial load. LLaVA now needs the expected ~17GB RAM for its initial load.
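
For illustration, the gist of the dtype fix looks something like this (the path is hypothetical and the exact class usage is illustrative - the real loader resolves these from the model config):

```python
import torch
from transformers import LlavaOnevisionForConditionalGeneration

model_path = "/path/to/llava-onevision-7b"  # hypothetical path

# Old behaviour: no dtype given, so transformers materializes the weights as
# fp32 and a ~17GB fp16 checkpoint needs ~34GB of CPU RAM.
# model = LlavaOnevisionForConditionalGeneration.from_pretrained(model_path)

# New behaviour: load directly at the target dtype, so the initial load needs
# only the expected ~17GB.
model = LlavaOnevisionForConditionalGeneration.from_pretrained(model_path, torch_dtype=torch.float16)
```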

PS: If we hadn't made the accompanying partial loading changes, we still could have solved this OOM. We'd just need to pass the initial load dtype to the wrapper class and have it load at that dtype. But we may as well fix both issues.

PPS: There are other models whose model classes are wrappers around a torch module class, and thus cannot be partially loaded. However, these models are typically fairly small and/or are run only on their own, so they don't benefit as much from partial loading. It's the really big models (like LLaVA 7B) that benefit most from partial loading.
2025-04-18 10:12:03 +10:00
Mary Hipp
9846229e52 build graph for cogview4 2025-04-10 10:50:13 +10:00
Ryan Dick
46316e43f0 typegen 2025-04-10 10:50:13 +10:00
Ryan Dick
7e894ffe83 Consolidate InpaintExtension implementations for SD3 and FLUX. 2025-04-10 10:50:13 +10:00
Ryan Dick
321c2d358c Add CogView4 model loader. And various other fixes to get a CogView4 workflow running (though quality is still below expectations). 2025-04-10 10:50:13 +10:00
Ryan Dick
0338983895 Update CogView4 starter model entry with approximate bundle size. 2025-04-10 10:50:13 +10:00
Ryan Dick
bac05a7885 Add CogView4TextEncoderInvocation 2025-04-10 10:50:13 +10:00
Ryan Dick
e2c4ea8e89 Add CogView4 model probing. 2025-04-10 10:50:13 +10:00
Kevin Turner
52a8ad1c18 chore: rename model.size to model.file_size
to disambiguate from RAM size or pixel size
2025-04-10 09:53:03 +10:00
Kevin Turner
f09aacf992 fix: ModelProbe.probe needs to return a size field 2025-04-10 09:53:03 +10:00
Kevin Turner
9590e8ff39 feat: expose model storage size 2025-04-10 09:53:03 +10:00
psychedelicious
8294e2cdea feat(mm): support size calculation for onnx models 2025-04-07 11:37:55 +10:00
psychedelicious
8d32ede082 tidy(nodes): remove matplotlib dependency
It was only used for a single color conversion function. Replaced it with equivalent cv2 code and tested to confirm it behaves the same.
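
For example, a matplotlib-style HSV-to-RGB conversion can be reproduced with cv2 along these lines (illustrative - the exact function that was replaced may differ):

```python
import cv2
import numpy as np


def hsv_to_rgb(hsv: np.ndarray) -> np.ndarray:
    """Convert a float32 HSV array (H, S, V all in [0, 1]) to RGB in [0, 1],
    mirroring matplotlib.colors.hsv_to_rgb. Illustrative sketch only."""
    hsv_cv = hsv.astype(np.float32).copy()
    hsv_cv[..., 0] *= 360.0  # cv2 expects H in degrees for float images
    return cv2.cvtColor(hsv_cv, cv2.COLOR_HSV2RGB)
```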
2025-04-04 18:42:13 +11:00
psychedelicious
8188484a40 tidy: delete unused file 2025-04-04 18:42:13 +11:00
psychedelicious
986b7426d2 tidy(nodes): remove unused old dw openpose detector class 2025-04-04 18:42:13 +11:00
psychedelicious
89c999ca58 fix(backend): remove mps_fixes
The fixes in this module monkeypatched `torch` to resolve some issues with FP16 on macOS. These issues have long since been resolved.

Included in the now-removed fixes is `CustomSlicedAttentionProcessor`, which was intended to reduce memory requirements on MPS. It overrode `diffusers`' own `SlicedAttentionProcessor`.

Unfortunately, `attention_type: sliced` produces hot garbage with the fixes and black images without them, so this class now appears to be moot.

Regardless, SDPA is supported on MPS and very efficient, so sliced attention is largely obsolete.
2025-04-04 18:42:13 +11:00
psychedelicious
5fa2cf59e2 fix(app): add trusted classes to torch safe globals to prevent errors when loading them
In `ObjectSerializerDisk`, we use `torch.load` to load serialized objects from disk. With torch 2.6.0, torch defaults to `weights_only=True`. As a result, torch will raise when attempting to deserialize anything with an unrecognized class.

For example, our `ConditioningFieldData` class is untrusted. When we load conditioning from disk, we will get a runtime error.

Torch provides a method to add trusted classes to an allowlist. This change adds an arg to `ObjectSerializerDisk` to add a list of safe globals to the allowlist and uses it for both `ObjectSerializerDisk` instances.

Note: My first attempt inferred the class from the generic type arg that `ObjectSerializerDisk` accepts, and added that to the allowlist. Unfortunately, this doesn't work.

For example, `ConditioningFieldData` has a `conditionings` attribute that may contain instances of other untrusted classes representing model-specific conditioning data. So, even if we allowlist `ConditioningFieldData`, loading will fail when torch deserializes the `conditionings` attribute.
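
For illustration, the allowlist mechanism is roughly as follows (the module path is a placeholder; in practice the app's actual conditioning classes are registered):

```python
import torch

# Placeholder import - in practice this is ConditioningFieldData plus the
# model-specific conditioning classes it may contain.
from my_app.conditioning import ConditioningFieldData  # hypothetical module

# Register the trusted classes so torch.load with weights_only=True accepts them.
torch.serialization.add_safe_globals([ConditioningFieldData])

obj = torch.load("conditioning.pt", weights_only=True)
```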
2025-04-04 18:42:13 +11:00
jazzhaiku
9868c3bfe3 Merge branch 'main' into lora-classification 2025-03-31 16:43:26 +11:00
psychedelicious
a44bfb4658 fix(mm): handle FLUX models w/ diff in_channels keys
Before FLUX Fill was merged, we didn't do any checks for the model variant. We always returned "normal".

To determine if a model is a FLUX Fill model, we need to check the state dict for a specific key. Initially, this logic was too strict and rejected quantized FLUX models. This issue was resolved, but it turns out there is another failure mode - some fine-tunes use a different key.

This change further reduces the strictness, handling the alternate key and also falling back to "normal" if we don't see either key. This effectively restores the previous probing behaviour for all FLUX models.
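
In pseudocode, the relaxed check is roughly the following (the key names, threshold, and variant strings are illustrative, not the exact values the probe uses):

```python
import torch

FLUX_FILL_IN_CHANNELS = 384  # illustrative threshold; "normal" FLUX uses 64


def get_flux_variant(state_dict: dict[str, torch.Tensor]) -> str:
    # Check both the original key and the alternate key some fine-tunes use.
    for key in ("img_in.weight", "model.diffusion_model.img_in.weight"):  # illustrative key names
        weight = state_dict.get(key)
        if weight is not None:
            return "inpaint" if weight.shape[-1] == FLUX_FILL_IN_CHANNELS else "normal"
    # Neither key present: fall back to the pre-FLUX-Fill behaviour.
    return "normal"
```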

Closes #7856
Closes #7859
2025-03-31 12:32:55 +11:00
Billy
965753bf8b Ruff formatting 2025-03-31 08:18:00 +11:00
Billy
40c53ab95c Guard 2025-03-29 09:58:02 +11:00
jazzhaiku
c25f6d1f84 Merge branch 'main' into lora-classification 2025-03-28 12:32:22 +11:00
jazzhaiku
1af9930951 Merge branch 'main' into small-improvements 2025-03-28 12:11:09 +11:00
Billy
c276c1cbee Comment 2025-03-28 10:57:46 +11:00
Billy
c619348f29 Extract ModelOnDisk to its own module 2025-03-28 10:35:13 +11:00
Billy
0d75c99476 Caching 2025-03-27 17:55:09 +11:00
Billy
323d409fb6 Make ruff happy 2025-03-27 17:47:57 +11:00
Billy
f251722f56 LoRA classification API 2025-03-27 17:47:01 +11:00
psychedelicious
7004fde41b fix(mm): vllm model calculates its own size 2025-03-27 09:36:14 +11:00
Billy
efd14ec0e4 Make ruff happy 2025-03-27 08:11:39 +11:00
Billy
82dd2d508f Deprecate checkpoint as file, diffusers as directory terminology 2025-03-27 08:10:12 +11:00
Billy
60b5aef16a Log error -> warning 2025-03-27 06:56:22 +11:00
Billy
0e8b5484d5 Error handling 2025-03-26 19:31:57 +11:00
Billy
454506c83e Type hints 2025-03-26 19:12:49 +11:00
Billy
8f6ab67376 Logs 2025-03-26 16:34:32 +11:00
Billy
5afcc7778f Redundant 2025-03-26 16:32:19 +11:00
Billy
325e07d330 Error handling 2025-03-26 16:30:45 +11:00
Billy
a016bdc159 Add todo 2025-03-26 16:17:26 +11:00
Billy
a14f0b2864 Fail early on invalid config 2025-03-26 16:10:32 +11:00
Billy
721483318a Extend ModelOnDisk 2025-03-26 16:10:00 +11:00
Billy
a6b94e8ca4 Revert some files 2025-03-26 13:18:50 +11:00
Billy
182580ff69 Imports 2025-03-26 12:55:10 +11:00
Billy
8e9d5c1187 Ruff formatting 2025-03-26 12:30:31 +11:00
Billy
99aac5870e Remove star imports 2025-03-26 12:27:00 +11:00