InvokeAI

mirror of https://github.com/invoke-ai/InvokeAI.git synced 2026-02-14 14:14:56 -05:00

Author	SHA1	Message	Date
skunkworxdark	604763d20f	Update flux.py Replace T5Tokenizer with T5TokenizerFast	2025-07-03 08:04:08 +10:00
Billy	3cd4306eec	Update import path	2025-06-26 19:47:06 +10:00
Billy	2832ca300f	Formatting	2025-06-24 07:26:42 +10:00
Billy	de5f413440	Filter bundle_emb for all LoRAs	2025-06-24 07:12:11 +10:00
Billy	150a876c73	Formatting	2025-06-23 13:52:19 +10:00
Billy	62c3b01e4f	Merge branch 'main' into OMI	2025-06-23 13:52:07 +10:00
Billy	e1157f343b	Support for Flux and SDXL	2025-06-23 13:51:16 +10:00
Billy	4ee54eac1d	Another attempt	2025-06-20 14:10:06 +10:00
Billy	1fd83f5e68	Import	2025-06-19 11:01:50 +10:00
Billy	637487c573	Convert FROM OMI to diffusers	2025-06-19 11:00:27 +10:00
Billy	4e98e7d0a2	Typo: dot should be comma	2025-06-19 10:47:24 +10:00
Billy	12f65d800d	Formatting	2025-06-19 09:40:58 +10:00
Billy	45d09f8f51	Use OMI conversion utils	2025-06-19 09:40:49 +10:00
Billy	9b4fdb493e	Loader	2025-06-18 10:53:54 +10:00
Billy	47e21d6e04	Formatting	2025-06-17 13:56:38 +10:00
Billy	84ab4a1c30	Convert from OMI to default LoRA state dict	2025-06-17 13:56:22 +10:00
Kevin Turner	2981591c36	test: add some aitoolkit lora tests	2025-06-16 19:08:11 +10:00
Kevin Turner	ab8c739cd8	fix(LoRA): add ai-toolkit to lora loader	2025-06-16 19:08:11 +10:00
psychedelicious	814406d98a	feat(mm): siglip model loading supports partial loading In the previous commit, the LLaVA model was updated to support partial loading. In this commit, the SigLIP model is updated in the same way. This model is used for FLUX Redux. It's <4GB and only ever run in isolation, so it won't benefit from partial loading for the vast majority of users. Regardless, I think it is best if we make _all_ models work with partial loading. PS: I also fixed the initial load dtype issue, described in the prev commit. It's probably a non-issue for this model, but we may as well fix it.	2025-04-18 10:12:03 +10:00
psychedelicious	c054501103	feat(mm): llava model loading supports partial loading; fix OOM crash on initial load The model manager has two types of model cache entries: - `CachedModelOnlyFullLoad`: The model may only ever be loaded and unloaded as a single object. - `CachedModelWithPartialLoad`: The model may be partially loaded and unloaded. Partial loading is enabled by overwriting certain torch layer classes, adding the ability to autocast the layer to a device on-the-fly. See `CustomLinear` for an example. So, to take advantage of partial loading and be cached as a `CachedModelWithPartialLoad`, the model must inherit from `torch.nn.Module`. The LLaVA classes provided by `transformers` do inherit from `torch.nn.Module`, but we wrap those classes in a separate class called `LlavaOnevisionModel`. The wrapper encapsulates both the LLaVA model and its "processor" - a lightweight class that prepares model inputs like text and images. While it is more elegant to encapsulate both model and processor classes in a single entity, this prevents the model cache from enabling partial loading for the chunky vLLM model. Fixing this involved a few changes. - Update the `LlavaOnevisionModelLoader` class to operate on the vLLM model directly, instead of the `LlavaOnevisionModel` wrapper class. - Instantiate the processor directly in the node. The processor is lightweight and does its business on the CPU. We don't need to worry about caching in the model manager. - Remove caching support code from the `LlavaOnevisionModel` wrapper class. It's not needed, because we do not cache this class. The class now only handles running the models provided to it. - Rename `LlavaOnevisionModel` to `LlavaOnevisionPipeline` to better represent its purpose. These changes have a bonus effect of fixing an OOM crash when initially loading the models. This was most apparent when loading LLaVA 7B, which is pretty chunky. The initial load is onto CPU RAM.
In the old version of the loaders, we ignored the loader's target dtype for the initial load. Instead, we loaded the model at `transformers`'s "default" dtype of fp32. LLaVA 7B is fp16 and weighs ~17GB. Loading as fp32 means we need double that amount (~34GB) of CPU RAM. Many users only have 32GB RAM, so this causes a _CPU_ OOM - which is a hard crash of the whole process. With the updated loaders, the initial load logic now uses the target dtype for the initial load. LLaVA now needs the expected ~17GB RAM for its initial load. PS: If we didn't make the accompanying partial loading changes, we still could have solved this OOM. We'd just need to pass the initial load dtype to the wrapper class and have it load on that dtype. But we may as well fix both issues. PPS: There are other models whose model classes are wrappers around a torch module class, and thus cannot be partially loaded. However, these models are typically fairly small and/or are run only on their own, so they don't benefit as much from partial loading. It's the really big models (like LLaVA 7B) that benefit most from the partial loading.	2025-04-18 10:12:03 +10:00
Ryan Dick	46316e43f0	typegen	2025-04-10 10:50:13 +10:00
Ryan Dick	321c2d358c	Add CogView4 model loader. And various other fixes to get a CogView4 workflow running (though quality is still below expectations).	2025-04-10 10:50:13 +10:00
Billy	182580ff69	Imports	2025-03-26 12:55:10 +11:00
Ryan Dick	2ef1ecf381	Fix copy-paste errors.	2025-03-18 11:53:06 +11:00
Ryan Dick	e9714fe476	Add LLaVA Onevision model loading and inference support.	2025-03-18 11:53:06 +11:00
Ryan Dick	f1fde792ee	Get FLUX Redux working: model loading and inference.	2025-03-06 10:31:17 +11:00
Ryan Dick	0db6639b4b	Add FLUX OneTrainer model probing.	2025-01-28 14:51:35 +00:00
Ryan Dick	b2bb359d47	Update the model loading logic for several of the large FLUX-related models to ensure that the model is initialized on the meta device prior to loading the state dict into it. This helps to keep peak memory down.	2025-01-16 02:30:28 +00:00
Ryan Dick	a7c72992a6	Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type.	2024-12-24 14:23:18 +00:00
Ryan Dick	d30a9ced38	Rename model_cache_default.py -> model_cache.py.	2024-12-24 14:23:18 +00:00
Ryan Dick	e0bfa6157b	Remove ModelCacheBase.	2024-12-24 14:23:18 +00:00
Brandon Rising	c9b2cce627	Add diffusers config object for control loras	2024-12-17 14:01:41 -05:00
Ryan Dick	41664f88db	Rename backend/patches/conversions/ to backend/patches/lora_conversions/	2024-12-17 13:20:19 +00:00
Ryan Dick	42f8d6aa11	Rename backend/lora/ to backend/patches	2024-12-17 13:20:19 +00:00
Brandon Rising	046d19446c	Rename Structural Lora to Control Lora	2024-12-17 07:28:45 -05:00
Brandon Rising	f3b253987f	Initial setup for flux tools control loras	2024-12-17 07:28:45 -05:00
David Burnett	bb3cedddd5	Rework change based on comments	2024-11-08 10:27:47 +00:00
David Burnett	7b5efc2203	Flux Vae broke for float16, force bfloat16 or float32 where compatible	2024-11-06 17:47:22 -05:00
Brandon Rising	ebabf4f7a8	Setup Model and T5 Encoder selection fields for sd3 nodes	2024-11-04 12:42:09 -05:00
Ryan Dick	c620581699	Bug fixes to get SD3 text-to-image workflow running.	2024-11-04 12:42:09 -05:00
Ryan Dick	586c00bc02	(minor) Remove unused dict.	2024-11-04 12:42:09 -05:00
Ryan Dick	a2486a5f06	Remove unused prediction_type and upcast_attention from from_single_file(...) calls.	2024-10-28 13:05:17 -04:00
Ryan Dick	07ab116efb	Remove `load_safety_checker=False` from calls to from_single_file(...). This param has been deprecated, and by including it (even when set to False) the safety checker automatically gets downloaded.	2024-10-28 13:05:17 -04:00
David Burnett	24f9b46fbc	ruff fix	2024-10-23 10:09:24 +11:00
David Burnett	54b3aa1d01	load t5 model in the same format as it is saved, seems to load as float32 on Macs	2024-10-23 10:09:24 +11:00
Ryan Dick	e545f18a45	(minor) Fix ruff.	2024-10-21 22:38:06 +00:00
Ryan Dick	f70a8e2c1a	A bunch of HACKS to get ViT-L CLIP vision encoder working for FLUX IP-Adapter. Need to revisit how to clean this all up long term.	2024-10-21 15:43:00 +00:00
Ryan Dick	c2a8fbd8d6	(minor) Move infer_xlabs_ip_adapter_params_from_state_dict(...) to state_dict_utils.py.	2024-10-21 15:38:50 +00:00
Ryan Dick	d6643d7263	Add model loading code for xlabs FLUX IP-Adapter (not tested).	2024-10-21 15:38:50 +00:00
Ryan Dick	8d1a45863c	Support installing InstantX ControlNet models from diffusers directory format.	2024-10-09 17:04:10 +00:00
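
The memory arithmetic behind commit c054501103 (fp16 weights loaded as fp32 need twice the CPU RAM) can be sketched as below. This is an illustrative calculation only, not code from the repository: the `weights_size_gb` helper and the `NUM_PARAMS` value are hypothetical, with the parameter count back-derived from the ~17GB fp16 figure quoted in the commit message rather than taken from the model itself.

```python
# Bytes per element for common weight dtypes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def weights_size_gb(num_params: int, dtype: str) -> float:
    """Approximate RAM needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


# Hypothetical parameter count, chosen so that fp16 weights come to ~17GB
# as stated in the commit message. Not read from the actual model.
NUM_PARAMS = 8_500_000_000

fp16_gb = weights_size_gb(NUM_PARAMS, "fp16")  # load at the target dtype
fp32_gb = weights_size_gb(NUM_PARAMS, "fp32")  # load at the fp32 default

print(f"fp16: {fp16_gb:.0f} GB, fp32: {fp32_gb:.0f} GB")
```

Under these assumptions the fp32 load needs about 34GB, which exceeds the 32GB of RAM the commit message mentions, while loading at the fp16 target dtype stays near 17GB.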

141 Commits