Commit Graph

15166 Commits

Author SHA1 Message Date
Ryan Dick  f01e41ceaf  First pass at dynamically calculating the working memory requirements for the VAE decoding operation. Still need to tune SD3 and FLUX.  2024-12-19 15:26:16 -05:00
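
A back-of-the-envelope sketch of what "working memory for VAE decoding" could look like. The function name, scale factor, and activation multiplier below are illustrative assumptions, not the formula this commit implements:

    import torch

    def estimate_vae_decode_working_memory(latent_h: int, latent_w: int,
                                           dtype: torch.dtype = torch.float16,
                                           scale_factor: int = 8,
                                           activation_multiplier: int = 16) -> int:
        # The decoder upsamples latents by `scale_factor` in each dimension.
        out_h, out_w = latent_h * scale_factor, latent_w * scale_factor
        element_size = torch.empty((), dtype=dtype).element_size()
        # 3 output channels, inflated by a guessed factor for the intermediate
        # activations held alive during decoding.
        return out_h * out_w * 3 * element_size * activation_multiplier
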
Ryan Dick  609ed06265  Add AutoencoderKL to the list of models that opt-out of partial loading.  2024-12-19 15:25:23 -05:00
Ryan Dick  f9e899a6ba  Make pinned pytorch version slightly more specific. We need at least 2.4 for access to torch.nn.functional.rms_norm(...).  2024-12-19 14:03:01 -05:00
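
For context on the pin: torch.nn.functional.rms_norm() first shipped in PyTorch 2.4, so any earlier version fails as soon as it is called. A minimal usage example:

    import torch
    import torch.nn.functional as F

    x = torch.randn(2, 8)
    weight = torch.ones(8)
    # Present from PyTorch 2.4 onwards; older versions raise AttributeError here.
    y = F.rms_norm(x, normalized_shape=[8], weight=weight, eps=1e-6)
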
Ryan Dick  9262c0ec53  Do not raise if a cache entry is deleted twice and ensure that OOM errors propagate up the stack.  2024-12-19 18:32:01 +00:00
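
A minimal sketch of the two behaviors this commit describes; the class and method names are invented for illustration and are not InvokeAI's actual ModelCache API:

    import torch

    class CacheSketch:
        """Not InvokeAI's ModelCache; a toy illustrating the two behaviors."""

        def __init__(self) -> None:
            self._entries: dict[str, object] = {}

        def delete(self, key: str) -> None:
            # pop() with a default makes a double delete a harmless no-op
            # instead of a KeyError.
            self._entries.pop(key, None)

        def load(self, key: str) -> object:
            try:
                return self._load_to_vram(key)
            except torch.cuda.OutOfMemoryError:
                raise  # never swallow OOM; callers need to see it and react
            except Exception:
                self.delete(key)  # drop entries that fail to load (cf. a8f3471fc7)
                raise

        def _load_to_vram(self, key: str) -> object:
            return self._entries[key]  # stand-in for the real GPU load
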
Ryan Dick  7fddb06dc4  Add a list of models that opt-out of partial loading.  2024-12-19 16:00:56 +00:00
Ryan Dick  239297caf6  Tidy the API for overriding the working_mem_bytes for a particular operation.  2024-12-19 05:05:04 +00:00
Ryan Dick  20f0b2f4fa  Update app config docstring.  2024-12-19 04:33:26 +00:00
Ryan Dick  cfb8815355  Remove unused and outdated get_cache_size and set_cache_size endpoints.  2024-12-19 04:06:08 +00:00
Ryan Dick  c866b5a799  Allow legacy ram/vram configs to override default behavior if set.  2024-12-19 04:06:08 +00:00
Ryan Dick  3b76812d43  Only support partial model loading on CUDA.  2024-12-18 19:13:15 -05:00
Ryan Dick  a8f3471fc7  Drop models from the cache if we fail loading/unloading them.  2024-12-18 23:53:25 +00:00
Ryan Dick  6d8dee05a9  Use the cpu state dict strategy for managing CachedModelOnlyFullLoad memory.  2024-12-18 22:52:57 +00:00
Ryan Dick  e684e49299  Do not apply the autocast context when models are fully loaded onto the GPU - it adds some overhead.  2024-12-18 21:51:39 +00:00
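
The idea behind this change, sketched with invented names; device_autocast below is a stand-in for the project's custom autocast context, not a real API:

    from contextlib import contextmanager, nullcontext

    import torch

    @contextmanager
    def device_autocast(model: torch.nn.Module):
        # Stand-in for the custom context that remaps each op's tensors to the
        # compute device; its per-op hook is pure overhead once the whole model
        # already lives on the GPU.
        yield

    def run(model: torch.nn.Module, x: torch.Tensor, fully_loaded: bool) -> torch.Tensor:
        ctx = nullcontext() if fully_loaded else device_autocast(model)
        with ctx:
            return model(x)
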
Ryan Dick  4ce2042d65  Add remove_autocast_from_module_forward(...) utility.  2024-12-18 20:28:32 +00:00
Ryan Dick  05a50b557a  Update logic to enforce max size of RAM cache to avoid overfilling.  2024-12-18 20:21:38 +00:00
Ryan Dick  85e1e9587e  Add info logs each time a model is loaded.  2024-12-18 19:52:54 +00:00
Ryan Dick  8e763e87bb  Allow invocations to request more working VRAM when loading a model via the ModelCache.  2024-12-18 19:52:34 +00:00
Ryan Dick  4a4360a40c  Add enable_partial_loading config.  2024-12-18 17:17:08 +00:00
Ryan Dick  612d6b00e3  In FluxTextEncoderInvocation, make sure model is locked before loading next model.  2024-12-18 17:12:12 +00:00
Ryan Dick  7a5dd084ad  Update MPS cache limit logic.  2024-12-17 23:44:17 -05:00
Ryan Dick  79a4d0890f  WIP - add device_working_mem_gb config  2024-12-18 03:31:37 +00:00
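
Taken together with enable_partial_loading above, the cache appears to be steered by settings along these lines. The keys come from the commit messages; the values shown are placeholder guesses, not documented defaults:

    # Illustrative invokeai.yaml-style settings, expressed as a Python dict.
    model_cache_settings = {
        "enable_partial_loading": True,  # let large models straddle VRAM and RAM
        "device_working_mem_gb": 3.0,    # VRAM head-room reserved for activations
    }
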
Ryan Dick  e0c899104b  Consolidate the LayerPatching patching modes into a single implementation.  2024-12-17 18:33:36 +00:00
Ryan Dick  c37bb6375c  Rename model_patcher.py -> layer_patcher.py.  2024-12-17 17:19:12 +00:00
Ryan Dick  4716170988  Use torch.device('cpu') instead of 'cpu' when calling .to(), because some custom models don't support the latter.  2024-12-17 17:14:42 +00:00
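
A concrete example of the distinction this commit works around; stock torch modules accept either form, but a custom .to() override may only handle a torch.device:

    import torch

    model = torch.nn.Linear(4, 4)
    model.to(torch.device("cpu"))  # robust: always pass a torch.device instance
    model.to("cpu")                # equivalent for stock modules, but a custom
                                   # .to() override may only accept a torch.device
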
Ryan Dick  463196d781  Update apply_smart_model_patches() so that layer restore matches the behavior of non-smart mode.  2024-12-17 17:13:45 +00:00
Ryan Dick  e1e756800d  Enable LoRAPatcher.apply_smart_lora_patches(...) throughout the stack.  2024-12-17 15:50:51 +00:00
Ryan Dick  ab337594b8  (minor) Rename num_layers -> num_loras in unit tests.  2024-12-17 15:39:01 +00:00
Ryan Dick  699e4e5995  Add test_apply_smart_lora_patches_to_partially_loaded_model(...).  2024-12-17 15:32:51 +00:00
Ryan Dick  33f17520ca  Add LoRAPatcher.smart_apply_lora_patches()  2024-12-17 15:29:04 +00:00
Ryan Dick  46d061212c  Update CachedModelWithPartialLoad to operate on state_dicts rather than moving torch.nn.Modules around.  2024-12-17 15:18:55 +00:00
Ryan Dick  829dddefc8  Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some hardware.  2024-12-17 15:18:55 +00:00
Ryan Dick  b6c159cfdb  Fix bug with partial offload of model buffers.  2024-12-17 15:18:55 +00:00
Ryan Dick  5a31c467a3  Fix bug in ModelCache that was causing it to offload more models from VRAM than necessary.  2024-12-17 15:18:55 +00:00
Ryan Dick  13dbde2429  Fix handling of torch.nn.Module buffers in CachedModelWithPartialLoad.  2024-12-17 15:18:55 +00:00
Ryan Dick  a8ee72d7fb  Maintain a read-only CPU state dict copy in CachedModelWithPartialLoad.  2024-12-17 15:18:55 +00:00
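
A compressed sketch of the read-only CPU copy idea (invented class name; CachedModelWithPartialLoad is considerably more involved): keeping a pristine CPU state dict means offloading is a tensor swap rather than a GPU-to-CPU copy.

    import torch

    class PartialLoadSketch:
        """Invented name, illustrating the read-only CPU state dict idea."""

        def __init__(self, model: torch.nn.Module) -> None:
            self._model = model
            # Read-only reference copy of the weights, kept on the CPU and
            # never mutated while parts of the model are moved to the GPU.
            self._cpu_state_dict = {
                k: v.detach().cpu() for k, v in model.state_dict().items()
            }

        def offload_all(self) -> None:
            # assign=True re-points the module at the pristine CPU tensors
            # rather than copying data back from the GPU (PyTorch >= 2.1).
            self._model.load_state_dict(self._cpu_state_dict, assign=True)
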
Ryan Dick  7a002e1b05  Memoize frequently accessed values in CachedModelWithPartialLoad.  2024-12-17 15:18:55 +00:00
Ryan Dick  b50dd8502f  More ModelCache logging improvements.  2024-12-17 15:18:55 +00:00
Ryan Dick  f4c13b057d  Cleanup of ModelCache and added a bunch of debug logging.  2024-12-17 15:18:55 +00:00
Ryan Dick  cb884ee567  Fix a couple of bugs to get basic vanilla partial model load working with the model cache.  2024-12-17 15:18:55 +00:00
Ryan Dick  050d4465e6  WIP - first pass at overhauling ModelCache to work with partial loads.  2024-12-17 15:18:55 +00:00
Ryan Dick  e48bb844b9  Delete experimental torch device autocasting solutions and clean up TorchFunctionAutocastDeviceContext.  2024-12-17 15:18:55 +00:00
Ryan Dick  57eb05983b  Create CachedModelOnlyFullLoad class.  2024-12-17 15:18:55 +00:00
Ryan Dick  dc3be08653  Move CachedModelWithPartialLoad into the main model_cache/ directory.  2024-12-17 15:18:55 +00:00
Ryan Dick  ae1041286f  Get rid of ModelLocker. It was an unnecessary layer of indirection.  2024-12-17 15:18:55 +00:00
Ryan Dick  6e270cc5bf  Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private.  2024-12-17 15:18:55 +00:00
Ryan Dick  6dc447aba8  Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type.  2024-12-17 15:18:55 +00:00
Ryan Dick  a4c0fcb6c8  Rename model_cache_default.py -> model_cache.py.  2024-12-17 15:18:55 +00:00
Ryan Dick  1f3580716c  Remove ModelCacheBase.  2024-12-17 15:18:55 +00:00
Ryan Dick  405e53f80a  Move CacheStats to its own file.  2024-12-17 15:18:55 +00:00
Ryan Dick  be120ff587  Move CacheRecord out to its own file.  2024-12-17 15:18:55 +00:00