InvokeAI

mirror of https://github.com/invoke-ai/InvokeAI.git synced 2026-02-02 16:05:13 -05:00

Author	SHA1	Message	Date
Ryan Dick	b301785dc8	Normalize the T5 model identifiers so that a FLUX T5 or an SD3 T5 model can be used interchangeably.	2025-01-16 08:33:58 +11:00
Ryan Dick	607d19f4dd	We should not trust the value of since the model could be partially-loaded.	2025-01-07 19:22:31 -05:00
Ryan Dick	d7ab464176	Offload the current model when locking if it is already partially loaded and we have insufficient VRAM.	2025-01-07 02:53:44 +00:00
Ryan Dick	5b42b7bd45	Add a utility to help with determining the working memory required for expensive operations.	2025-01-07 01:20:15 +00:00
Ryan Dick	b343f81644	Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits.	2025-01-07 01:20:15 +00:00
Ryan Dick	fc4a22fe78	Allow expensive operations to request more working memory.	2025-01-07 01:20:13 +00:00
Ryan Dick	a167632f09	Calculate model cache size limits dynamically based on the available RAM / VRAM.	2025-01-07 01:14:20 +00:00
Ryan Dick	6a9de1fcf3	Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated().	2025-01-07 00:31:53 +00:00
Ryan Dick	e5180c4e6b	Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded.	2025-01-07 00:31:00 +00:00
Ryan Dick	2619ef53ca	Handle device casting in ia2_layer.py.	2025-01-07 00:31:00 +00:00
Ryan Dick	bcd29c5d74	Remove all cases where we check the 'model.device'. This is no longer trustworthy now that partial loading is permitted.	2025-01-07 00:31:00 +00:00
Ryan Dick	1b7bb70bde	Improve handling of cases when application code modifies the size of a model after registering it with the model cache.	2025-01-07 00:31:00 +00:00
Ryan Dick	7127040c3a	Remove unused function set_nested_attr(...).	2025-01-07 00:31:00 +00:00
Ryan Dick	ceb2498a67	Add log prefix to model cache logs.	2025-01-07 00:31:00 +00:00
Ryan Dick	d0bfa019be	Add 'enable_partial_loading' config flag.	2025-01-07 00:31:00 +00:00
Ryan Dick	535e45cedf	First pass at adding partial loading support to the ModelCache.	2025-01-07 00:30:58 +00:00
Ryan Dick	c579a218ef	Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513 ).	2025-01-06 23:02:52 +00:00
Ryan Dick	8b4b0ff0cf	Fix bug in CustomConv1d and CustomConv2d patch calculations.	2024-12-29 19:10:19 +00:00
Ryan Dick	6fd9b0a274	Delete old sidecar wrapper implementation. This functionality has moved into the custom layers.	2024-12-29 17:33:08 +00:00
Ryan Dick	a8bef59699	First pass at making custom layer patches work with weights streamed from the CPU to the GPU.	2024-12-29 17:01:37 +00:00
Ryan Dick	6d49ee839c	Switch the LayerPatcher to use 'custom modules' to manage layer patching.	2024-12-29 01:18:30 +00:00
Ryan Dick	0525f967c2	Fix the _autocast_forward_with_patches() function for CustomConv1d and CustomConv2d.	2024-12-29 00:22:37 +00:00
Ryan Dick	2855bb6b41	Update BaseLayerPatch.get_parameters(...) to accept a dict of orig_parameters rather than orig_module. This will enable compatibility between patching and cpu->gpu streaming.	2024-12-28 21:12:53 +00:00
Ryan Dick	20acfc9a00	Raise in CustomEmbedding and CustomGroupNorm if a patch is applied.	2024-12-28 20:49:17 +00:00
Ryan Dick	918f541af8	Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer.	2024-12-28 20:44:48 +00:00
Ryan Dick	93e76b61d6	Add CustomFluxRMSNorm layer.	2024-12-28 20:33:38 +00:00
Ryan Dick	f692e217ea	Add patch support to CustomConv1d and CustomConv2d (no unit tests yet).	2024-12-27 22:23:17 +00:00
Ryan Dick	f2981979f9	Get custom layer patches working with all quantized linear layer types.	2024-12-27 22:00:22 +00:00
Ryan Dick	ef970a1cdc	Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it.	2024-12-27 21:00:47 +00:00
Ryan Dick	e24e386a27	Add support for patches to CustomModuleMixin and add a single unit test (more to come).	2024-12-27 18:57:13 +00:00
Ryan Dick	b06d61e3c0	Improve custom layer wrap/unwrap logic.	2024-12-27 16:29:48 +00:00
Ryan Dick	7d6ab0ceb2	Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead.)	2024-12-26 20:08:30 +00:00
Ryan Dick	a8b2c4c3d2	Add inference tests for all custom module types (i.e. to test autocasting from cpu to device).	2024-12-26 18:33:46 +00:00
Ryan Dick	987c9ae076	Move custom autocast modules to separate files in a custom_modules/ directory.	2024-12-24 22:21:31 +00:00
Ryan Dick	6d7314ac0a	Consolidate the LayerPatching patching modes into a single implementation.	2024-12-24 15:57:54 +00:00
Ryan Dick	80db9537ff	Rename model_patcher.py -> layer_patcher.py.	2024-12-24 15:57:54 +00:00
Ryan Dick	6f926f05b0	Update apply_smart_model_patches() so that layer restore matches the behavior of non-smart mode.	2024-12-24 15:57:54 +00:00
Ryan Dick	cefcb340d9	Add LoRAPatcher.smart_apply_lora_patches()	2024-12-24 15:57:54 +00:00
Ryan Dick	7214d4969b	Workaround a weird quirk of QuantState.to() and add a unit test to exercise it.	2024-12-24 14:32:11 +00:00
Ryan Dick	f8a6accf8a	Fix bitsandbytes imports to avoid ImportErrors on MacOS.	2024-12-24 14:32:11 +00:00
Ryan Dick	f8ab414f99	Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded.	2024-12-24 14:32:11 +00:00
Ryan Dick	c6795a1b47	Make CachedModelWithPartialLoad work with models that have non-persistent buffers.	2024-12-24 14:32:11 +00:00
Ryan Dick	0a8fc74ae9	Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules.	2024-12-24 14:32:11 +00:00
Ryan Dick	dc54e8763b	Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers.	2024-12-24 14:32:11 +00:00
Ryan Dick	1b56020876	Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers.	2024-12-24 14:32:11 +00:00
Ryan Dick	3f990393a1	Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from cpu to gpu.	2024-12-24 14:32:11 +00:00
Ryan Dick	fe0ef2c27c	Add torch module autocast utilities.	2024-12-24 14:32:11 +00:00
Ryan Dick	65fcbf5f60	Bump bitsandbytes. The new verson contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW.	2024-12-24 14:32:11 +00:00
Ryan Dick	55b13c1da3	(minor) Add TODO comment regarding the location of get_model_cache_key().	2024-12-24 14:23:19 +00:00
Ryan Dick	7dc3e0fdbe	Get rid of ModelLocker. It was an unnecessary layer of indirection.	2024-12-24 14:23:18 +00:00

1 2 3 4 5 ...

2208 Commits