2235 Commits

Author SHA1 Message Date
Ryan Dick
cc9d215a9b Add endpoint for emptying the model cache. Also adds a threading lock to the ModelCache to make it thread-safe. 2025-01-30 09:18:28 -05:00
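
A minimal sketch of the locking pattern this commit describes, with hypothetical class and method names (the real ModelCache API is not reproduced here): a single threading.Lock serializes cache mutations so an empty-cache HTTP request can run safely alongside generation threads.

```python
import threading

class ModelCacheSketch:
    """Hypothetical stand-in for a thread-safe model cache."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._models: dict[str, object] = {}

    def put(self, key: str, model: object) -> None:
        # Every mutation takes the same lock, so concurrent callers
        # (e.g. an HTTP endpoint and a worker thread) cannot race.
        with self._lock:
            self._models[key] = model

    def empty(self) -> None:
        with self._lock:
            self._models.clear()
```
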
Ryan Dick
f7315f0432 Make the default max RAM cache size more conservative. 2025-01-30 08:46:59 -05:00
Ryan Dick
229834a5e8 Performance optimizations for LoRAs applied on top of GGML-quantized tensors. 2025-01-28 14:51:35 +00:00
Ryan Dick
6c919e1bca Handle DoRA layer device casting when model is partially-loaded. 2025-01-28 14:51:35 +00:00
Ryan Dick
5357d6e08e Rename ConcatenatedLoRALayer to MergedLayerPatch. And other minor cleanup. 2025-01-28 14:51:35 +00:00
Ryan Dick
7fef569e38 Update frontend graph building logic to support FLUX LoRAs that modify the T5 encoder weights. 2025-01-28 14:51:35 +00:00
Ryan Dick
e7fb435cc5 Update DoRALayer with a custom get_parameters() override that 1) applies alpha scaling to delta_v, and 2) warns if the base model is incompatible. 2025-01-28 14:51:35 +00:00
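(Commit e7fb435cc5 continued.) A sketch of what such an override could look like, under stated assumptions: delta_v is the low-rank update, alpha scaling follows the usual alpha/rank convention, and a shape mismatch triggers a warning. All names here are illustrative, not the actual DoRALayer code.

```python
import warnings

import torch

class DoRALayerSketch:
    def __init__(self, up: torch.Tensor, down: torch.Tensor, alpha: float) -> None:
        self.up, self.down, self.alpha = up, down, alpha

    def get_parameters(self, orig_weight: torch.Tensor, weight: float) -> dict[str, torch.Tensor]:
        rank = self.down.shape[0]
        # (1) Apply alpha scaling to the low-rank delta.
        delta_v = (self.up @ self.down) * (self.alpha / rank)
        # (2) Warn if the patch cannot line up with the base model.
        if delta_v.shape != orig_weight.shape:
            warnings.warn("DoRA patch shape does not match the base model weight.")
        return {"weight": delta_v * weight}
```
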
Ryan Dick
5d472ac1b8 Move quantized weight handling for patch layers up from ConcatenatedLoRALayer to CustomModuleMixin. 2025-01-28 14:51:35 +00:00
Ryan Dick
28514ba59a Update ConcatenatedLoRALayer to work with all sub-layer types. 2025-01-28 14:51:35 +00:00
Ryan Dick
5ea7953537 Update GGMLTensor with ops necessary to work with ConcatenatedLoRALayer. 2025-01-28 14:51:35 +00:00
Ryan Dick
0db6639b4b Add FLUX OneTrainer model probing. 2025-01-28 14:51:35 +00:00
Ryan Dick
b8eed2bdcb Relax lora_layers_from_flux_diffusers_grouped_state_dict(...) so that it can work with more LoRA variants (e.g. hada). 2025-01-28 14:51:35 +00:00
Ryan Dick
1054283f5c Fix bug in FLUX T5 Kohya-style LoRA key parsing. 2025-01-28 14:51:35 +00:00
Ryan Dick
409b69ee5d Fix typo in DoRALayer. 2025-01-28 14:51:35 +00:00
Ryan Dick
206f261e45 Add utils for loading FLUX OneTrainer DoRA models. 2025-01-28 14:51:35 +00:00
Ryan Dick
7eee4da896 Further updates to lora_model_from_flux_diffusers_state_dict() so that it can be re-used for OneTrainer LoRAs. 2025-01-28 14:51:35 +00:00
Ryan Dick
908976ac08 Add support for LyCoris-style LoRA keys in lora_model_from_flux_diffusers_state_dict(). Previously, it only supported PEFT-style LoRA keys. 2025-01-28 14:51:35 +00:00
Ryan Dick
dfa253e75b Add utils for working with Kohya LoRA keys. 2025-01-28 14:51:35 +00:00
Ryan Dick
4f369e3dfb First draft of DoRALayer. Not tested yet. 2025-01-28 14:51:35 +00:00
Ryan Dick
5bd6428fdd Add is_state_dict_likely_in_flux_onetrainer_format() util function. 2025-01-28 14:51:35 +00:00
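Format-detection utilities like this one typically boil down to key-pattern heuristics over the state dict. The following is a hypothetical illustration of the idea only; the key patterns the real function checks are not reproduced here.

```python
from typing import Any

def looks_like_flux_onetrainer(state_dict: dict[str, Any]) -> bool:
    # Hypothetical heuristic: look for Kohya-style T5 keys alongside
    # DoRA-specific tensors, which (by assumption) distinguishes
    # OneTrainer FLUX checkpoints from plain Kohya LoRAs.
    keys = state_dict.keys()
    has_t5_keys = any(k.startswith("lora_te2_") for k in keys)
    has_dora_keys = any(k.endswith(".dora_scale") for k in keys)
    return has_t5_keys and has_dora_keys
```
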
Ryan Dick
f88c1ba0c3 Fix bug with some LoRA variants when applied to a BnB NF4 quantized model. Note the previous commit which added a unit test to trigger this bug. 2025-01-22 09:20:40 +11:00
Ryan Dick
0cf51cefe8 Revise the logic for calculating the RAM model cache limit. 2025-01-16 23:46:07 +00:00
Ryan Dick
da589b3f1f Memory optimization to load state dicts one module at a time in CachedModelWithPartialLoad when we are not storing a CPU copy of the state dict (i.e. when keep_ram_copy_of_weights=False). 2025-01-16 17:00:33 +00:00
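The idea behind loading one module at a time is to avoid materializing a second full copy of the weights in RAM alongside the model. A minimal sketch, assuming a flat state dict of parameters saved in PyTorch's default zipfile format (buffers and nested corner cases ignored):

```python
import torch

def load_params_one_at_a_time(model: torch.nn.Module, path: str) -> None:
    # mmap=True (PyTorch >= 2.1) keeps tensors backed by the file until they
    # are read, so only one parameter is fully materialized at any moment.
    state_dict = torch.load(path, map_location="cpu", mmap=True)
    for key, tensor in state_dict.items():
        module_path, _, param_name = key.rpartition(".")
        module = model.get_submodule(module_path)
        setattr(module, param_name, torch.nn.Parameter(tensor, requires_grad=False))
```
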
Ryan Dick
36a3869af0 Add keep_ram_copy_of_weights config option. 2025-01-16 15:35:25 +00:00
Ryan Dick
c76d08d1fd Add keep_ram_copy option to CachedModelOnlyFullLoad. 2025-01-16 15:08:23 +00:00
Ryan Dick
04087c38ce Add keep_ram_copy option to CachedModelWithPartialLoad. 2025-01-16 14:51:44 +00:00
Ryan Dick
b2bb359d47 Update the model loading logic for several of the large FLUX-related models to ensure that the model is initialized on the meta device prior to loading the state dict into it. This helps to keep peak memory down. 2025-01-16 02:30:28 +00:00
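The meta-device pattern avoids allocating throwaway random weights during model construction: the skeleton is built from zero-cost meta tensors, and real tensors are attached straight from the checkpoint. A generic sketch of the pattern (TinyModel and the inline state dict are placeholders, not the FLUX loading code):

```python
import torch

class TinyModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(4096, 4096)

with torch.device("meta"):
    model = TinyModel()  # parameters exist only as shapes/dtypes: no RAM used

# assign=True (PyTorch >= 2.1) adopts the checkpoint tensors directly rather
# than copying into the meta parameters, keeping peak memory near one copy.
state_dict = {"proj.weight": torch.zeros(4096, 4096), "proj.bias": torch.zeros(4096)}
model.load_state_dict(state_dict, assign=True)
```
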
Ryan Dick
b301785dc8 Normalize the T5 model identifiers so that a FLUX T5 or an SD3 T5 model can be used interchangeably. 2025-01-16 08:33:58 +11:00
Ryan Dick
607d19f4dd We should not trust the value of model.device since the model could be partially-loaded. 2025-01-07 19:22:31 -05:00
Ryan Dick
d7ab464176 Offload the current model when locking if it is already partially loaded and we have insufficient VRAM. 2025-01-07 02:53:44 +00:00
Ryan Dick
5b42b7bd45 Add a utility to help with determining the working memory required for expensive operations. 2025-01-07 01:20:15 +00:00
Ryan Dick
b343f81644 Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits. 2025-01-07 01:20:15 +00:00
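The two CUDA memory counters named in this commit measure different things, which is why the choice matters for a dynamic limit. A small sketch using only documented torch.cuda calls; the budgeting formula itself is hypothetical, not the repo's actual policy:

```python
import torch

def dynamic_vram_limit(device: torch.device, working_memory: int) -> int:
    """Hypothetical cache-limit formula; illustrates the two counters."""
    total = torch.cuda.get_device_properties(device).total_memory
    # Bytes held by live tensors right now.
    allocated = torch.cuda.memory_allocated(device)
    # Bytes held by live tensors PLUS the allocator's cached-but-free blocks;
    # always >= allocated, and can overstate real pressure after frees.
    reserved = torch.cuda.memory_reserved(device)
    assert allocated <= reserved
    return max(0, total - allocated - working_memory)
```
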
Ryan Dick
fc4a22fe78 Allow expensive operations to request more working memory. 2025-01-07 01:20:13 +00:00
Ryan Dick
a167632f09 Calculate model cache size limits dynamically based on the available RAM / VRAM. 2025-01-07 01:14:20 +00:00
Ryan Dick
6a9de1fcf3 Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated(). 2025-01-07 00:31:53 +00:00
Ryan Dick
e5180c4e6b Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded. 2025-01-07 00:31:00 +00:00
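When a model may be split across devices, its "effective" device has to be derived from the tensors themselves rather than from a stored attribute. A sketch of the idea (the real utility's exact policy is not reproduced here):

```python
import itertools

import torch

def effective_device(model: torch.nn.Module) -> torch.device:
    # If any weight already lives on a CUDA device, treat that as the
    # model's effective device; otherwise fall back to CPU.
    for t in itertools.chain(model.parameters(), model.buffers()):
        if t.device.type == "cuda":
            return t.device
    return torch.device("cpu")
```
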
Ryan Dick
2619ef53ca Handle device casting in ia2_layer.py. 2025-01-07 00:31:00 +00:00
Ryan Dick
bcd29c5d74 Remove all cases where we check the 'model.device'. This is no longer trustworthy now that partial loading is permitted. 2025-01-07 00:31:00 +00:00
Ryan Dick
1b7bb70bde Improve handling of cases when application code modifies the size of a model after registering it with the model cache. 2025-01-07 00:31:00 +00:00
Ryan Dick
7127040c3a Remove unused function set_nested_attr(...). 2025-01-07 00:31:00 +00:00
Ryan Dick
ceb2498a67 Add log prefix to model cache logs. 2025-01-07 00:31:00 +00:00
Ryan Dick
d0bfa019be Add 'enable_partial_loading' config flag. 2025-01-07 00:31:00 +00:00
Ryan Dick
535e45cedf First pass at adding partial loading support to the ModelCache. 2025-01-07 00:30:58 +00:00
Ryan Dick
c579a218ef Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513). 2025-01-06 23:02:52 +00:00
Ryan Dick
8b4b0ff0cf Fix bug in CustomConv1d and CustomConv2d patch calculations. 2024-12-29 19:10:19 +00:00
Ryan Dick
6fd9b0a274 Delete old sidecar wrapper implementation. This functionality has moved into the custom layers. 2024-12-29 17:33:08 +00:00
Ryan Dick
a8bef59699 First pass at making custom layer patches work with weights streamed from the CPU to the GPU. 2024-12-29 17:01:37 +00:00
Ryan Dick
6d49ee839c Switch the LayerPatcher to use 'custom modules' to manage layer patching. 2024-12-29 01:18:30 +00:00
Ryan Dick
0525f967c2 Fix the _autocast_forward_with_patches() function for CustomConv1d and CustomConv2d. 2024-12-29 00:22:37 +00:00
Ryan Dick
2855bb6b41 Update BaseLayerPatch.get_parameters(...) to accept a dict of orig_parameters rather than orig_module. This will enable compatibility between patching and cpu->gpu streaming. 2024-12-28 21:12:53 +00:00
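
Passing raw tensors instead of the module decouples the patch math from where the module object lives, which is what lets patching compose with CPU-to-GPU streaming. An illustrative patch class under that contract (the class and field names are assumptions, not the real API):

```python
import torch

class LoRAPatchSketch:
    def __init__(self, up: torch.Tensor, down: torch.Tensor, scale: float) -> None:
        self.up, self.down, self.scale = up, down, scale

    def get_parameters(
        self, orig_parameters: dict[str, torch.Tensor], weight: float
    ) -> dict[str, torch.Tensor]:
        # The patch only sees tensors, so it behaves the same whether the
        # layer's weight is resident on the GPU or streamed in per forward pass.
        w = orig_parameters["weight"]
        delta = (self.up @ self.down) * (self.scale * weight)
        return {"weight": delta.to(device=w.device, dtype=w.dtype)}
```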