Author | Commit | Message | Date
psychedelicious | 5f12b9185f | feat(mm): add cache_snapshot to model cache clear callback | 2025-05-15 16:06:47 +10:00
psychedelicious | d958d2e5a0 | feat(mm): iterate on cache callbacks API | 2025-05-15 14:37:22 +10:00
psychedelicious | 823ca214e6 | feat(mm): iterate on cache callbacks API | 2025-05-15 13:28:51 +10:00
psychedelicious | a33da450fd | feat(mm): support cache callbacks | 2025-05-15 11:23:58 +10:00
Billy | 182580ff69 | Imports | 2025-03-26 12:55:10 +11:00
Billy | f2689598c0 | Formatting | 2025-03-06 09:11:00 +11:00
Ryan Dick | cc9d215a9b | Add endpoint for emptying the model cache. Also, adds a threading lock to the ModelCache to make it thread-safe. | 2025-01-30 09:18:28 -05:00
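Commit cc9d215a9b pairs the new empty-cache endpoint with a threading lock so a clear cannot race with concurrent model loads. Below is a minimal sketch of that pattern; the class shape, method names, and lock placement are illustrative assumptions, not the actual InvokeAI ModelCache code.

```python
import threading


class ModelCache:
    """Sketch of a lock-guarded model cache (illustrative, not InvokeAI's implementation)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # serializes access across request threads
        self._cached_models: dict[str, object] = {}

    def put(self, key: str, model: object) -> None:
        with self._lock:
            self._cached_models[key] = model

    def get(self, key: str) -> object | None:
        with self._lock:
            return self._cached_models.get(key)

    def clear(self) -> None:
        # What an "empty model cache" endpoint would call; the lock prevents a
        # clear from interleaving with a concurrent put/get on another thread.
        with self._lock:
            self._cached_models.clear()
```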
Ryan Dick | f7315f0432 | Make the default max RAM cache size more conservative. | 2025-01-30 08:46:59 -05:00
Ryan Dick | 0cf51cefe8 | Revise the logic for calculating the RAM model cache limit. | 2025-01-16 23:46:07 +00:00
Ryan Dick | 36a3869af0 | Add keep_ram_copy_of_weights config option. | 2025-01-16 15:35:25 +00:00
Ryan Dick | 04087c38ce | Add keep_ram_copy option to CachedModelWithPartialLoad. | 2025-01-16 14:51:44 +00:00
Ryan Dick | d7ab464176 | Offload the current model when locking if it is already partially loaded and we have insufficient VRAM. | 2025-01-07 02:53:44 +00:00
Ryan Dick | b343f81644 | Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits. | 2025-01-07 01:20:15 +00:00
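Commit b343f81644 switches the VRAM-in-use measurement from torch.cuda.memory_reserved() to torch.cuda.memory_allocated(). The reserved figure includes blocks held by PyTorch's caching allocator, while the allocated figure counts only live tensor allocations; the commit describes basing the dynamic limit on the allocated value as the more conservative choice. A small sketch of such a measurement helper (the function name and device handling are assumptions):

```python
import torch


def vram_in_use_bytes(device: int = 0) -> int:
    """Report VRAM currently held by live tensors on the given CUDA device.

    torch.cuda.memory_allocated() counts only active tensor allocations, whereas
    torch.cuda.memory_reserved() also includes blocks cached by PyTorch's
    allocator, so the reserved figure is always >= the allocated one.
    """
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.memory_allocated(device)
```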
Ryan Dick | fc4a22fe78 | Allow expensive operations to request more working memory. | 2025-01-07 01:20:13 +00:00
Ryan Dick | a167632f09 | Calculate model cache size limits dynamically based on the available RAM / VRAM. | 2025-01-07 01:14:20 +00:00
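Commit a167632f09 derives the cache limits from what the host actually has free rather than from fixed defaults. A rough sketch of that idea follows; the helper name, headroom values, and formula are illustrative assumptions, not InvokeAI's actual heuristics.

```python
import psutil
import torch


def dynamic_cache_limits_gb(ram_headroom_gb: float = 4.0, vram_headroom_gb: float = 1.0) -> tuple[float, float]:
    """Derive (ram_limit_gb, vram_limit_gb) from currently available memory."""
    gb = 1024**3

    # RAM limit: what is currently available, minus headroom for the rest of the process / OS.
    ram_limit = max(psutil.virtual_memory().available / gb - ram_headroom_gb, 0.0)

    # VRAM limit: based on free device memory, again minus a working-memory margin.
    vram_limit = 0.0
    if torch.cuda.is_available():
        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        vram_limit = max(free_bytes / gb - vram_headroom_gb, 0.0)

    return ram_limit, vram_limit
```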
Ryan Dick | 6a9de1fcf3 | Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated(). | 2025-01-07 00:31:53 +00:00
Ryan Dick | ceb2498a67 | Add log prefix to model cache logs. | 2025-01-07 00:31:00 +00:00
Ryan Dick | d0bfa019be | Add 'enable_partial_loading' config flag. | 2025-01-07 00:31:00 +00:00
Ryan Dick | 535e45cedf | First pass at adding partial loading support to the ModelCache. | 2025-01-07 00:30:58 +00:00
Ryan Dick | c579a218ef | Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513). | 2025-01-06 23:02:52 +00:00
Ryan Dick | 6d49ee839c | Switch the LayerPatcher to use 'custom modules' to manage layer patching. | 2024-12-29 01:18:30 +00:00
Ryan Dick | 55b13c1da3 | (minor) Add TODO comment regarding the location of get_model_cache_key(). | 2024-12-24 14:23:19 +00:00
Ryan Dick | 7dc3e0fdbe | Get rid of ModelLocker. It was an unnecessary layer of indirection. | 2024-12-24 14:23:18 +00:00
Ryan Dick | a39bcf7e85 | Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private. | 2024-12-24 14:23:18 +00:00
Ryan Dick | a7c72992a6 | Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type. | 2024-12-24 14:23:18 +00:00
Ryan Dick | d30a9ced38 | Rename model_cache_default.py -> model_cache.py. | 2024-12-24 14:23:18 +00:00