David Burnett
|
6c0bd7d150
|
fix import ordering, remove code I reverted that the resync added back
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
8abcc99ced
|
add check for state_dict, required to load TIs
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
73ab4b8895
|
fix offload device
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
86719f2065
|
revert to overload due to failing tests, use Torch futures instead
|
2025-05-19 11:16:23 +10:00 |
|
psychedelicious
|
5f12b9185f
|
feat(mm): add cache_snapshot to model cache clear callback
|
2025-05-15 16:06:47 +10:00 |
|
psychedelicious
|
d958d2e5a0
|
feat(mm): iterate on cache callbacks API
|
2025-05-15 14:37:22 +10:00 |
|
psychedelicious
|
823ca214e6
|
feat(mm): iterate on cache callbacks API
|
2025-05-15 13:28:51 +10:00 |
|
psychedelicious
|
a33da450fd
|
feat(mm): support cache callbacks
|
2025-05-15 11:23:58 +10:00 |
|
Kent Keirsey
|
1f63b60021
|
Implementing support for Non-Standard LoRA Format (#7985)
* integrate LoRA
* idk anymore tbh
* enable fused matrix for quantized models
* ruff fix
---------
Co-authored-by: Sam <bhaskarmdutt@gmail.com>
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
|
2025-05-05 09:40:38 -04:00 |
|
Billy
|
182580ff69
|
Imports
|
2025-03-26 12:55:10 +11:00 |
|
Billy
|
f2689598c0
|
Formatting
|
2025-03-06 09:11:00 +11:00 |
|
Ryan Dick
|
cc9d215a9b
|
Add endpoint for emptying the model cache. Also, adds a threading lock to the ModelCache to make it thread-safe.
|
2025-01-30 09:18:28 -05:00 |
|
Ryan Dick
|
f7315f0432
|
Make the default max RAM cache size more conservative.
|
2025-01-30 08:46:59 -05:00 |
|
Ryan Dick
|
229834a5e8
|
Performance optimizations for LoRAs applied on top of GGML-quantized tensors.
|
2025-01-28 14:51:35 +00:00 |
|
Ryan Dick
|
5d472ac1b8
|
Move quantized weight handling for patch layers up from ConcatenatedLoRALayer to CustomModuleMixin.
|
2025-01-28 14:51:35 +00:00 |
|
Ryan Dick
|
28514ba59a
|
Update ConcatenatedLoRALayer to work with all sub-layer types.
|
2025-01-28 14:51:35 +00:00 |
|
Ryan Dick
|
0cf51cefe8
|
Revise the logic for calculating the RAM model cache limit.
|
2025-01-16 23:46:07 +00:00 |
|
Ryan Dick
|
da589b3f1f
|
Memory optimization to load state dicts one module at a time in CachedModelWithPartialLoad when we are not storing a CPU copy of the state dict (i.e. when keep_ram_copy_of_weights=False).
|
2025-01-16 17:00:33 +00:00 |
|
Ryan Dick
|
36a3869af0
|
Add keep_ram_copy_of_weights config option.
|
2025-01-16 15:35:25 +00:00 |
|
Ryan Dick
|
c76d08d1fd
|
Add keep_ram_copy option to CachedModelOnlyFullLoad.
|
2025-01-16 15:08:23 +00:00 |
|
Ryan Dick
|
04087c38ce
|
Add keep_ram_copy option to CachedModelWithPartialLoad.
|
2025-01-16 14:51:44 +00:00 |
|
Ryan Dick
|
d7ab464176
|
Offload the current model when locking if it is already partially loaded and we have insufficient VRAM.
|
2025-01-07 02:53:44 +00:00 |
|
Ryan Dick
|
5b42b7bd45
|
Add a utility to help with determining the working memory required for expensive operations.
|
2025-01-07 01:20:15 +00:00 |
|
Ryan Dick
|
b343f81644
|
Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits.
|
2025-01-07 01:20:15 +00:00 |
|
Ryan Dick
|
fc4a22fe78
|
Allow expensive operations to request more working memory.
|
2025-01-07 01:20:13 +00:00 |
|
Ryan Dick
|
a167632f09
|
Calculate model cache size limits dynamically based on the available RAM / VRAM.
|
2025-01-07 01:14:20 +00:00 |
|
Ryan Dick
|
6a9de1fcf3
|
Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated().
|
2025-01-07 00:31:53 +00:00 |
|
Ryan Dick
|
e5180c4e6b
|
Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded.
|
2025-01-07 00:31:00 +00:00 |
|
Ryan Dick
|
1b7bb70bde
|
Improve handling of cases when application code modifies the size of a model after registering it with the model cache.
|
2025-01-07 00:31:00 +00:00 |
|
Ryan Dick
|
7127040c3a
|
Remove unused function set_nested_attr(...).
|
2025-01-07 00:31:00 +00:00 |
|
Ryan Dick
|
ceb2498a67
|
Add log prefix to model cache logs.
|
2025-01-07 00:31:00 +00:00 |
|
Ryan Dick
|
d0bfa019be
|
Add 'enable_partial_loading' config flag.
|
2025-01-07 00:31:00 +00:00 |
|
Ryan Dick
|
535e45cedf
|
First pass at adding partial loading support to the ModelCache.
|
2025-01-07 00:30:58 +00:00 |
|
Ryan Dick
|
c579a218ef
|
Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513).
|
2025-01-06 23:02:52 +00:00 |
|
Ryan Dick
|
8b4b0ff0cf
|
Fix bug in CustomConv1d and CustomConv2d patch calculations.
|
2024-12-29 19:10:19 +00:00 |
|
Ryan Dick
|
a8bef59699
|
First pass at making custom layer patches work with weights streamed from the CPU to the GPU.
|
2024-12-29 17:01:37 +00:00 |
|
Ryan Dick
|
6d49ee839c
|
Switch the LayerPatcher to use 'custom modules' to manage layer patching.
|
2024-12-29 01:18:30 +00:00 |
|
Ryan Dick
|
0525f967c2
|
Fix the _autocast_forward_with_patches() function for CustomConv1d and CustomConv2d.
|
2024-12-29 00:22:37 +00:00 |
|
Ryan Dick
|
2855bb6b41
|
Update BaseLayerPatch.get_parameters(...) to accept a dict of orig_parameters rather than orig_module. This will enable compatibility between patching and cpu->gpu streaming.
|
2024-12-28 21:12:53 +00:00 |
|
Ryan Dick
|
20acfc9a00
|
Raise in CustomEmbedding and CustomGroupNorm if a patch is applied.
|
2024-12-28 20:49:17 +00:00 |
|
Ryan Dick
|
918f541af8
|
Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer.
|
2024-12-28 20:44:48 +00:00 |
|
Ryan Dick
|
93e76b61d6
|
Add CustomFluxRMSNorm layer.
|
2024-12-28 20:33:38 +00:00 |
|
Ryan Dick
|
f692e217ea
|
Add patch support to CustomConv1d and CustomConv2d (no unit tests yet).
|
2024-12-27 22:23:17 +00:00 |
|
Ryan Dick
|
f2981979f9
|
Get custom layer patches working with all quantized linear layer types.
|
2024-12-27 22:00:22 +00:00 |
|
Ryan Dick
|
ef970a1cdc
|
Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it.
|
2024-12-27 21:00:47 +00:00 |
|
Ryan Dick
|
e24e386a27
|
Add support for patches to CustomModuleMixin and add a single unit test (more to come).
|
2024-12-27 18:57:13 +00:00 |
|
Ryan Dick
|
b06d61e3c0
|
Improve custom layer wrap/unwrap logic.
|
2024-12-27 16:29:48 +00:00 |
|
Ryan Dick
|
7d6ab0ceb2
|
Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead).
|
2024-12-26 20:08:30 +00:00 |
|
Ryan Dick
|
987c9ae076
|
Move custom autocast modules to separate files in a custom_modules/ directory.
|
2024-12-24 22:21:31 +00:00 |
|
Ryan Dick
|
7214d4969b
|
Work around a weird quirk of QuantState.to() and add a unit test to exercise it.
|
2024-12-24 14:32:11 +00:00 |
|