Ryan Dick | f01e41ceaf | First pass at dynamically calculating the working memory requirements for the VAE decoding operation. Still need to tune SD3 and FLUX. | 2024-12-19 15:26:16 -05:00
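The commit does not say how the VAE decode estimate is computed. As a hypothetical illustration only, an estimate could scale with the decoded image area. The function name, the 8x latent-to-pixel scale factor, the 128-channel peak activation, and the 2x fudge factor below are all assumptions chosen for the sketch, not values taken from the change.

```python
def estimate_vae_decode_working_mem_bytes(
    latent_height: int,
    latent_width: int,
    scale_factor: int = 8,       # assumed latent -> pixel upscaling factor
    element_size: int = 2,       # bytes per element (2 for fp16/bf16)
    peak_channels: int = 128,    # assumed channel count of the largest activation
    fudge_factor: float = 2.0,   # assumed headroom for coexisting temporaries
) -> int:
    """Hypothetical estimate of peak activation memory for one VAE decode."""
    out_h = latent_height * scale_factor
    out_w = latent_width * scale_factor
    # The largest intermediate is assumed to live at full output resolution.
    peak_activation = out_h * out_w * peak_channels * element_size
    return int(peak_activation * fudge_factor)


# A 128x128 latent decodes to 1024x1024 under these assumptions.
print(estimate_vae_decode_working_mem_bytes(128, 128) / 2**20, "MiB")
```

The point of a size-dependent estimate is that a fixed working-memory reserve is either wasteful for small images or too small for large ones; the per-model constants are what "still need to tune" would adjust.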
Ryan Dick | 609ed06265 | Add AutoencoderKL to the list of models that opt-out of partial loading. | 2024-12-19 15:25:23 -05:00
Ryan Dick | f9e899a6ba | Make pinned pytorch version slightly more specific. We need at least 2.4 for access to torch.nn.functional.rms_norm(...). | 2024-12-19 14:03:01 -05:00
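The exact version specifier used in the pin is not shown here. A runtime guard for the same 2.4 requirement can be sketched with the standard library alone. The helper names are hypothetical; in real code the argument would be `torch.__version__`.

```python
import re

# torch.nn.functional.rms_norm first appeared in PyTorch 2.4.
MIN_TORCH_FOR_RMS_NORM = (2, 4)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.4.1+cu121' or '2.5.0.dev20240901'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_rms_norm(torch_version: str) -> bool:
    """Hypothetical guard: True if this torch version should provide rms_norm."""
    return parse_major_minor(torch_version) >= MIN_TORCH_FOR_RMS_NORM
```

Comparing integer tuples rather than raw strings matters: as strings, `"2.10.0" < "2.4"` is true, so a plain string comparison would wrongly reject newer releases.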
Ryan Dick | 9262c0ec53 | Do not raise if a cache entry is deleted twice and ensure that OOM errors propagate up the stack. | 2024-12-19 18:32:01 +00:00
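Both behaviours in this commit are general patterns: an idempotent delete, and cleanup that re-raises instead of swallowing the error. The toy class below is a hypothetical sketch. It catches `Exception` broadly as a stand-in for GPU out-of-memory errors such as `torch.cuda.OutOfMemoryError`, so that the sketch does not depend on torch.

```python
from typing import Callable


class ToyCache:
    """Hypothetical cache illustrating idempotent deletion and error propagation."""

    def __init__(self) -> None:
        self._entries: dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._entries[key] = value

    def delete(self, key: str) -> None:
        # pop() with a default turns a second delete of the same key into a
        # no-op instead of raising KeyError.
        self._entries.pop(key, None)

    def load(self, key: str, loader: Callable[[], object]) -> object:
        try:
            value = loader()
        except Exception:
            # Drop any stale entry for this key, then re-raise so the caller
            # (and ultimately the user) sees the original error, e.g. an OOM.
            self.delete(key)
            raise
        self.put(key, value)
        return value
```

The bare `raise` preserves the original exception type and traceback, which is what lets an OOM surface as an OOM rather than as a generic cache failure.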
Ryan Dick | 7fddb06dc4 | Add a list of models that opt-out of partial loading. | 2024-12-19 16:00:56 +00:00
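Taken together with the later commits restricting partial loading to CUDA and adding an `enable_partial_loading` config, the opt-out list suggests a selection rule along these lines. This is a hypothetical sketch: the set name, the function, and matching on the class name are assumptions, and the two wrapper class names are returned as strings only to keep the example self-contained.

```python
# Hypothetical opt-out set; the log only says such a list exists and that
# AutoencoderKL was added to it.
MODELS_WITHOUT_PARTIAL_LOAD: frozenset[str] = frozenset({"AutoencoderKL"})


def choose_cache_wrapper(model: object, enable_partial_loading: bool, device_type: str) -> str:
    """Pick which cached-model wrapper a model would get (illustrative only)."""
    if (
        enable_partial_loading
        and device_type == "cuda"  # partial loading is only supported on CUDA
        and type(model).__name__ not in MODELS_WITHOUT_PARTIAL_LOAD
    ):
        return "CachedModelWithPartialLoad"
    return "CachedModelOnlyFullLoad"
```

Any model on the list, or any non-CUDA device, falls back to the full-load wrapper regardless of the config flag.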
Ryan Dick | 239297caf6 | Tidy the API for overriding the working_mem_bytes for a particular operation. | 2024-12-19 05:05:04 +00:00
Ryan Dick | 20f0b2f4fa | Update app config docstring. | 2024-12-19 04:33:26 +00:00
Ryan Dick | cfb8815355 | Remove unused and outdated get_cache_size and set_cache_size endpoints. | 2024-12-19 04:06:08 +00:00
Ryan Dick | c866b5a799 | Allow legacy ram/vram configs to override default behavior if set. | 2024-12-19 04:06:08 +00:00
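"Override if set" means distinguishing between an unset value and a value that was explicitly configured. A minimal sketch, assuming the legacy setting is `None` when absent. The function name and the half-of-system-RAM default are hypothetical placeholders, not the actual default behaviour.

```python
from typing import Optional


def resolve_ram_cache_gb(legacy_ram_gb: Optional[float], total_system_ram_gb: float) -> float:
    """Use the legacy config value if the user set one, else a computed default."""
    if legacy_ram_gb is not None:
        return legacy_ram_gb
    # Hypothetical default heuristic, for illustration only.
    return total_system_ram_gb / 2
```

Testing `is not None` rather than truthiness means an explicit `0` is still honoured as a user choice instead of silently falling back to the default.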
Ryan Dick | 3b76812d43 | Only support partial model loading on CUDA. | 2024-12-18 19:13:15 -05:00
Ryan Dick | a8f3471fc7 | Drop models from the cache if we fail loading/unloading them. | 2024-12-18 23:53:25 +00:00
Ryan Dick | 6d8dee05a9 | Use the cpu state dict strategy for managing CachedModelOnlyFullLoad memory. | 2024-12-18 22:52:57 +00:00
Ryan Dick | e684e49299 | Do not apply the autocast context when models are fully loaded onto the GPU - it adds some overhead. | 2024-12-18 21:51:39 +00:00
Ryan Dick | 4ce2042d65 | Add remove_autocast_from_module_forward(...) utility. | 2024-12-18 20:28:32 +00:00
Ryan Dick | 05a50b557a | Update logic to enforce max size of RAM cache to avoid overfilling. | 2024-12-18 20:21:38 +00:00
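One common way to keep a size-bounded cache from overfilling is to evict least-recently-used entries *before* inserting, until the new entry fits. The class below is a hypothetical sketch of that policy using `collections.OrderedDict`. It tracks only byte sizes, not models, and it is not claimed to be the ModelCache's actual eviction code.

```python
from collections import OrderedDict


class LRUSizeCache:
    """Hypothetical size-bounded LRU cache that tracks entry sizes in bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        # Ordered from least- to most-recently used.
        self._sizes: OrderedDict[str, int] = OrderedDict()

    @property
    def total_bytes(self) -> int:
        return sum(self._sizes.values())

    def _make_room(self, needed: int) -> None:
        # Evict LRU entries until the incoming entry fits under the limit.
        # An entry larger than max_bytes still ends up alone in the cache.
        while self._sizes and self.total_bytes + needed > self.max_bytes:
            self._sizes.popitem(last=False)

    def put(self, key: str, size: int) -> None:
        if key in self._sizes:
            self._sizes.move_to_end(key)  # already cached: just mark as recent
            return
        self._make_room(size)
        self._sizes[key] = size

    def touch(self, key: str) -> None:
        self._sizes.move_to_end(key)  # record a use without changing the size
```

Checking the limit before insertion, rather than trimming afterwards, is what prevents the cache from ever briefly exceeding `max_bytes` when admitting an entry that fits.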
Ryan Dick | 85e1e9587e | Add info logs each time a model is loaded. | 2024-12-18 19:52:54 +00:00
Ryan Dick | 8e763e87bb | Allow invocations to request more working VRAM when loading a model via the ModelCache. | 2024-12-18 19:52:34 +00:00
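The idea behind a per-operation working-memory request is that VRAM set aside for activations is unavailable for model weights. A minimal sketch of that budget arithmetic, assuming that an explicit `working_mem_bytes` replaces the configured default, as the later "overriding the working_mem_bytes" commit suggests. The function name is hypothetical.

```python
from typing import Optional


def vram_available_for_weights(
    total_vram_bytes: int,
    default_working_mem_bytes: int,
    working_mem_bytes: Optional[int] = None,
) -> int:
    """Hypothetical: VRAM left for model weights after reserving working memory."""
    # A per-operation request overrides the configured default reserve.
    reserve = default_working_mem_bytes if working_mem_bytes is None else working_mem_bytes
    return max(0, total_vram_bytes - reserve)


GiB = 1024**3
print(vram_available_for_weights(24 * GiB, 3 * GiB, working_mem_bytes=8 * GiB) // GiB)
```

A memory-hungry step, such as a large VAE decode, can then ask for a bigger reserve, which forces more weights off the device before the step runs.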
Ryan Dick | 4a4360a40c | Add enable_partial_loading config. | 2024-12-18 17:17:08 +00:00
Ryan Dick | 612d6b00e3 | In FluxTextEncoderInvocation, make sure model is locked before loading next model. | 2024-12-18 17:12:12 +00:00
Ryan Dick | 7a5dd084ad | Update MPS cache limit logic. | 2024-12-17 23:44:17 -05:00
Ryan Dick | 79a4d0890f | WIP - add device_working_mem_gb config | 2024-12-18 03:31:37 +00:00
Ryan Dick | e0c899104b | Consolidate the LayerPatching patching modes into a single implementation. | 2024-12-17 18:33:36 +00:00
Ryan Dick | c37bb6375c | Rename model_patcher.py -> layer_patcher.py. | 2024-12-17 17:19:12 +00:00
Ryan Dick | 4716170988 | Use torch.device('cpu') instead of 'cpu' when calling .to(), because some custom models don't support the latter. | 2024-12-17 17:14:42 +00:00
Ryan Dick | 463196d781 | Update apply_smart_model_patches() so that layer restore matches the behavior of non-smart mode. | 2024-12-17 17:13:45 +00:00
Ryan Dick | e1e756800d | Enable LoRAPatcher.apply_smart_lora_patches(...) throughout the stack. | 2024-12-17 15:50:51 +00:00
Ryan Dick | ab337594b8 | (minor) Rename num_layers -> num_loras in unit tests. | 2024-12-17 15:39:01 +00:00
Ryan Dick | 699e4e5995 | Add test_apply_smart_lora_patches_to_partially_loaded_model(...). | 2024-12-17 15:32:51 +00:00
Ryan Dick | 33f17520ca | Add LoRAPatcher.smart_apply_lora_patches() | 2024-12-17 15:29:04 +00:00
Ryan Dick | 46d061212c | Update CachedModelWithPartialLoad to operate on state_dicts rather than moving torch.nn.Modules around. | 2024-12-17 15:18:55 +00:00
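Working at the state-dict level makes partial loading a question of which tensors to copy to the device under a budget. The planner below is a hypothetical, greedy sketch that operates on tensor sizes only. It assumes, in line with the "read-only CPU state dict copy" commit, that the CPU copy always stays intact, so offloading a tensor can simply drop its device copy without a device-to-CPU transfer. The function name and the greedy state-dict-order policy are assumptions.

```python
def plan_partial_load(
    param_sizes: dict[str, int],
    already_on_device: set[str],
    vram_budget_bytes: int,
) -> list[str]:
    """Return state-dict keys to copy to the device without exceeding the budget.

    Greedy in state-dict order: a tensor that does not fit is skipped, and
    later (smaller) tensors may still be loaded.
    """
    used = sum(param_sizes[k] for k in already_on_device)
    to_load: list[str] = []
    for key, size in param_sizes.items():
        if key in already_on_device:
            continue
        if used + size > vram_budget_bytes:
            continue  # this tensor stays on the CPU for now
        to_load.append(key)
        used += size
    return to_load
```

Tensors left off the plan would be computed from the CPU copy at run time, which is where the autocast-to-device machinery mentioned elsewhere in this log comes in.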
Ryan Dick | 829dddefc8 | Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW. | 2024-12-17 15:18:55 +00:00
Ryan Dick | b6c159cfdb | Fix bug with partial offload of model buffers. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 5a31c467a3 | Fix bug in ModelCache that was causing it to offload more models from VRAM than necessary. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 13dbde2429 | Fix handling of torch.nn.Module buffers in CachedModelWithPartialLoad. | 2024-12-17 15:18:55 +00:00
Ryan Dick | a8ee72d7fb | Maintain a read-only CPU state dict copy in CachedModelWithPartialLoad. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 7a002e1b05 | Memoize frequently accessed values in CachedModelWithPartialLoad. | 2024-12-17 15:18:55 +00:00
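The log does not say which values are memoized. For values that never change after construction, such as a model's total size in bytes, `functools.cached_property` is a standard way to compute them once. The class and counter below are hypothetical and exist only to show that the computation runs a single time.

```python
from functools import cached_property


class PartialLoadStats:
    """Hypothetical holder of per-model values that are expensive to recompute."""

    def __init__(self, param_sizes: dict[str, int]) -> None:
        self._param_sizes = param_sizes
        self.compute_count = 0  # counts how often total_bytes is actually computed

    @cached_property
    def total_bytes(self) -> int:
        # Computed on first access, then stored on the instance.
        self.compute_count += 1
        return sum(self._param_sizes.values())
```

Memoization of this kind is only safe for immutable quantities; a value like "bytes currently on the device" changes with every load or offload and would have to be updated explicitly instead.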
Ryan Dick | b50dd8502f | More ModelCache logging improvements. | 2024-12-17 15:18:55 +00:00
Ryan Dick | f4c13b057d | Cleanup of ModelCache and added a bunch of debug logging. | 2024-12-17 15:18:55 +00:00
Ryan Dick | cb884ee567 | Fix a couple of bugs to get basic vanilla partial model load working with the model cache. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 050d4465e6 | WIP - first pass at overhauling ModelCache to work with partial loads. | 2024-12-17 15:18:55 +00:00
Ryan Dick | e48bb844b9 | Delete experimental torch device autocasting solutions and clean up TorchFunctionAutocastDeviceContext. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 57eb05983b | Create CachedModelOnlyFullLoad class. | 2024-12-17 15:18:55 +00:00
Ryan Dick | dc3be08653 | Move CachedModelWithPartialLoad into the main model_cache/ directory. | 2024-12-17 15:18:55 +00:00
Ryan Dick | ae1041286f | Get rid of ModelLocker. It was an unnecessary layer of indirection. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 6e270cc5bf | Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 6dc447aba8 | Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type. | 2024-12-17 15:18:55 +00:00
Ryan Dick | a4c0fcb6c8 | Rename model_cache_default.py -> model_cache.py. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 1f3580716c | Remove ModelCacheBase. | 2024-12-17 15:18:55 +00:00
Ryan Dick | 405e53f80a | Move CacheStats to its own file. | 2024-12-17 15:18:55 +00:00
Ryan Dick | be120ff587 | Move CacheRecord out to its own file. | 2024-12-17 15:18:55 +00:00