Ryan Dick
fc4a22fe78
Allow expensive operations to request more working memory.
2025-01-07 01:20:13 +00:00
Ryan Dick
a167632f09
Calculate model cache size limits dynamically based on the available RAM / VRAM.
2025-01-07 01:14:20 +00:00
Ryan Dick
6a9de1fcf3
Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated().
2025-01-07 00:31:53 +00:00
Ryan Dick
e5180c4e6b
Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded.
2025-01-07 00:31:00 +00:00
Ryan Dick
1b7bb70bde
Improve handling of cases when application code modifies the size of a model after registering it with the model cache.
2025-01-07 00:31:00 +00:00
Ryan Dick
7127040c3a
Remove unused function set_nested_attr(...).
2025-01-07 00:31:00 +00:00
Ryan Dick
ceb2498a67
Add log prefix to model cache logs.
2025-01-07 00:31:00 +00:00
Ryan Dick
d0bfa019be
Add 'enable_partial_loading' config flag.
2025-01-07 00:31:00 +00:00
Ryan Dick
535e45cedf
First pass at adding partial loading support to the ModelCache.
2025-01-07 00:30:58 +00:00
Ryan Dick
c579a218ef
Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513).
2025-01-06 23:02:52 +00:00
Ryan Dick
8b4b0ff0cf
Fix bug in CustomConv1d and CustomConv2d patch calculations.
2024-12-29 19:10:19 +00:00
Ryan Dick
a8bef59699
First pass at making custom layer patches work with weights streamed from the CPU to the GPU.
2024-12-29 17:01:37 +00:00
Ryan Dick
6d49ee839c
Switch the LayerPatcher to use 'custom modules' to manage layer patching.
2024-12-29 01:18:30 +00:00
Ryan Dick
0525f967c2
Fix the _autocast_forward_with_patches() function for CustomConv1d and CustomConv2d.
2024-12-29 00:22:37 +00:00
Ryan Dick
2855bb6b41
Update BaseLayerPatch.get_parameters(...) to accept a dict of orig_parameters rather than orig_module. This will enable compatibility between patching and cpu->gpu streaming.
2024-12-28 21:12:53 +00:00
Ryan Dick
20acfc9a00
Raise in CustomEmbedding and CustomGroupNorm if a patch is applied.
2024-12-28 20:49:17 +00:00
Ryan Dick
918f541af8
Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer.
2024-12-28 20:44:48 +00:00
Ryan Dick
93e76b61d6
Add CustomFluxRMSNorm layer.
2024-12-28 20:33:38 +00:00
Ryan Dick
f692e217ea
Add patch support to CustomConv1d and CustomConv2d (no unit tests yet).
2024-12-27 22:23:17 +00:00
Ryan Dick
f2981979f9
Get custom layer patches working with all quantized linear layer types.
2024-12-27 22:00:22 +00:00
Ryan Dick
ef970a1cdc
Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it.
2024-12-27 21:00:47 +00:00
Ryan Dick
e24e386a27
Add support for patches to CustomModuleMixin and add a single unit test (more to come).
2024-12-27 18:57:13 +00:00
Ryan Dick
b06d61e3c0
Improve custom layer wrap/unwrap logic.
2024-12-27 16:29:48 +00:00
Ryan Dick
7d6ab0ceb2
Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead).
2024-12-26 20:08:30 +00:00
Ryan Dick
987c9ae076
Move custom autocast modules to separate files in a custom_modules/ directory.
2024-12-24 22:21:31 +00:00
Ryan Dick
7214d4969b
Work around a weird quirk of QuantState.to() and add a unit test to exercise it.
2024-12-24 14:32:11 +00:00
Ryan Dick
f8a6accf8a
Fix bitsandbytes imports to avoid ImportErrors on MacOS.
2024-12-24 14:32:11 +00:00
Ryan Dick
f8ab414f99
Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded.
2024-12-24 14:32:11 +00:00
Ryan Dick
c6795a1b47
Make CachedModelWithPartialLoad work with models that have non-persistent buffers.
2024-12-24 14:32:11 +00:00
Ryan Dick
0a8fc74ae9
Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules.
2024-12-24 14:32:11 +00:00
Ryan Dick
dc54e8763b
Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers.
2024-12-24 14:32:11 +00:00
Ryan Dick
1b56020876
Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers.
2024-12-24 14:32:11 +00:00
Ryan Dick
fe0ef2c27c
Add torch module autocast utilities.
2024-12-24 14:32:11 +00:00
Ryan Dick
55b13c1da3
(minor) Add TODO comment regarding the location of get_model_cache_key().
2024-12-24 14:23:19 +00:00
Ryan Dick
7dc3e0fdbe
Get rid of ModelLocker. It was an unnecessary layer of indirection.
2024-12-24 14:23:18 +00:00
Ryan Dick
a39bcf7e85
Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private.
2024-12-24 14:23:18 +00:00
Ryan Dick
a7c72992a6
Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type.
2024-12-24 14:23:18 +00:00
Ryan Dick
d30a9ced38
Rename model_cache_default.py -> model_cache.py.
2024-12-24 14:23:18 +00:00
Ryan Dick
e0bfa6157b
Remove ModelCacheBase.
2024-12-24 14:23:18 +00:00
Ryan Dick
83ea6420e2
Move CacheStats to its own file.
2024-12-24 14:23:18 +00:00
Ryan Dick
ce11a1952e
Move CacheRecord out to its own file.
2024-12-24 14:23:18 +00:00
Ryan Dick
e48dee4c4a
Rip out ModelLockerBase.
2024-12-24 14:23:18 +00:00
Lincoln Stein
8d35af946e
[MM] add API routes for getting & setting MM cache sizes (#6523)
* [MM] add API routes for getting & setting MM cache sizes, and retrieving MM stats
* Update invokeai/app/api/routers/model_manager.py
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
* code cleanup after @ryand review
* Update invokeai/app/api/routers/model_manager.py
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
* fix merge conflicts; tested and working
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
2024-09-02 12:18:21 -04:00
Ryan Dick
6ba9b1b6b0
Tidy up GIG -> GB and remove unused GIG constant.
2024-08-29 19:08:18 +00:00
Ryan Dick
c578b8df1e
Improve ModelCache docs.
2024-08-29 19:08:18 +00:00
Ryan Dick
cad9a41433
Remove unused ModelCache.exists(...) function.
2024-08-29 19:08:18 +00:00
Ryan Dick
5fefb3b0f4
Remove unused param from ModelCache.
2024-08-29 19:08:18 +00:00
Ryan Dick
5284a870b0
Remove unused constructor params from ModelCache.
2024-08-29 19:08:18 +00:00
Ryan Dick
e064377c05
Remove default model cache sizes from model_cache_default.py. These defaults were misleading, because the config defaults take precedence over them.
2024-08-29 19:08:18 +00:00
Lincoln Stein
97a7f51721
don't use cpu state_dict for model unpatching when executing on cpu (#6631)
Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-07-18 15:34:01 -04:00