Lincoln Stein
8cf4c6944a
(style) ruff fix
2026-01-03 14:54:15 -05:00
Lincoln Stein
db228ddc4f
(style) add @record_activity and @synchronized to locked methods
2026-01-03 14:52:31 -05:00
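The `@synchronized` decorator mentioned in this commit can be sketched as a small wrapper that routes every call through the instance's lock. This is a minimal, hypothetical reconstruction (names like `ModelCacheStub` and `_lock` are assumptions, not the actual InvokeAI code):

```python
import functools
import threading


def synchronized(method):
    """Hypothetical sketch: serialize calls to a method through self._lock."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)

    return wrapper


class ModelCacheStub:
    """Toy stand-in for a cache whose public methods must be thread-safe."""

    def __init__(self):
        # RLock so a synchronized method may call another synchronized method.
        self._lock = threading.RLock()
        self.calls = 0

    @synchronized
    def get(self, key):
        self.calls += 1
        return key
```

Using an `RLock` keeps the decorator safe when one locked method calls another on the same instance.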
copilot-swe-agent[bot]
4987b4da1c
Fix timeout message appearing during active generation
Only log "Clearing model cache" message when there are actually unlocked
models to clear. This prevents the misleading message from appearing during
active generation when all models are locked.
Changes:
- Check for unlocked models before logging clear message
- Add count of unlocked models in log message
- Add debug log when all models are locked
- Improves user experience by avoiding confusing messages
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 05:31:11 +00:00
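The "check before logging" change described above can be illustrated with a short sketch. All names here (`CacheRecord`, `is_locked`, `clear_unlocked`) are hypothetical stand-ins for the real cache internals:

```python
import logging

logger = logging.getLogger("model_cache")


class CacheRecord:
    """Minimal stand-in for a cached model entry (hypothetical)."""

    def __init__(self, key: str, locked: bool):
        self.key = key
        self.is_locked = locked


def clear_unlocked(cache: dict) -> int:
    """Drop only unlocked models; log the clear message only when there is work to do."""
    unlocked = [k for k, rec in cache.items() if not rec.is_locked]
    if not unlocked:
        # All models are in use (e.g. during active generation): log at DEBUG
        # instead of emitting a misleading "Clearing model cache" message.
        logger.debug("Timeout fired, but all %d model(s) are locked; skipping clear.", len(cache))
        return 0
    logger.info("Clearing model cache: %d unlocked model(s).", len(unlocked))
    for k in unlocked:
        del cache[k]
    return len(unlocked)
```

The count of unlocked models in the INFO line mirrors the "add count of unlocked models in log message" bullet.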
copilot-swe-agent[bot]
8d76b4e4d4
Fix ruff whitespace errors and improve timeout logging
- Remove all trailing whitespace (W293 errors)
- Add debug logging when timeout fires but activity detected
- Add debug logging when timeout fires but cache is empty
- Only log "Clearing model cache" message when actually clearing
- Prevents misleading timeout messages during active generation
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 04:05:57 +00:00
copilot-swe-agent[bot]
c3217d8a08
Address code review feedback
- Remove unused variable in test
- Add clarifying comment for daemon thread setting
- Add detailed comment explaining cache clearing with 1000 GB value
- Improve code documentation
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:27:39 +00:00
copilot-swe-agent[bot]
2500153ed8
Fix race condition in timeout mechanism
- Added clarifying comment that _record_activity is called with lock held
- Enhanced double-check in _on_timeout for thread safety
- Added lock protection to shutdown method
- Improved handling of edge cases where timer fires during activity
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:26:01 +00:00
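The race being fixed is the classic window between a timer firing and its callback acquiring the lock: activity can be recorded in that window, so the callback must re-check under the lock before clearing. A minimal sketch of that double-check pattern, assuming hypothetical names (`TimeoutGuard`, `record_activity`, `_on_timeout`):

```python
import threading
import time


class TimeoutGuard:
    """Hypothetical sketch: a timer callback that re-verifies, under the lock,
    that no activity arrived after the timer was armed."""

    def __init__(self, timeout_s: float, on_expire):
        self._lock = threading.RLock()
        self._timeout_s = timeout_s
        self._on_expire = on_expire
        self._last_activity = time.monotonic()
        self._timer = None

    def record_activity(self):
        with self._lock:
            self._last_activity = time.monotonic()
            self._arm()

    def _arm(self):
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._timeout_s, self._on_timeout)
        # Daemon so a pending timer never blocks interpreter shutdown.
        self._timer.daemon = True
        self._timer.start()

    def _on_timeout(self):
        with self._lock:
            # Double-check: activity may have been recorded between the timer
            # firing and this thread acquiring the lock.
            elapsed = time.monotonic() - self._last_activity
            if elapsed < self._timeout_s:
                self._arm()  # fresh activity restarted the window; re-arm
                return
            self._on_expire()

    def shutdown(self):
        # Lock-protected, mirroring the "lock protection to shutdown" bullet.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```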
copilot-swe-agent[bot]
9bbd2b3f11
Add model_cache_keep_alive config option and timeout mechanism
- Added model_cache_keep_alive config field (minutes, default 0 = infinite)
- Implemented timeout tracking in ModelCache class
- Added _record_activity() to track model usage
- Added _on_timeout() to auto-clear cache when timeout expires
- Added shutdown() method to clean up timers
- Integrated timeout with get(), lock(), unlock(), and put() operations
- Updated ModelManagerService to pass keep_alive parameter
- Added cleanup in stop() method
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:22:59 +00:00
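The "minutes, default 0 = infinite" convention for `model_cache_keep_alive` can be sketched as a small translation from config value to timer interval. The config class and helper below are illustrative assumptions, not the actual InvokeAI config code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    # Mirrors the described field: keep-alive in minutes; 0 means keep forever.
    model_cache_keep_alive: int = 0


def keep_alive_seconds(cfg: CacheConfig) -> Optional[float]:
    """Translate the config value into a timer interval; None disables the timer."""
    if cfg.model_cache_keep_alive <= 0:
        return None  # infinite: never arm the auto-clear timer
    return cfg.model_cache_keep_alive * 60.0
```

A `None` return lets the cache skip arming any timer at all, rather than arming one with a sentinel duration.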
psychedelicious
5f12b9185f
feat(mm): add cache_snapshot to model cache clear callback
2025-05-15 16:06:47 +10:00
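A cache-clear callback API that passes a `cache_snapshot`, as this commit describes, might look like the following sketch. The registry class and method names are hypothetical; only the snapshot-to-callback idea comes from the commit message:

```python
from typing import Callable, Dict, List


class CallbackRegistry:
    """Hypothetical sketch of a model-cache clear-callback API."""

    def __init__(self):
        self._on_clear: List[Callable[[Dict], None]] = []

    def register_on_clear(self, cb: Callable[[Dict], None]):
        """Register a callback; returns a handle that unregisters it."""
        self._on_clear.append(cb)
        return lambda: self._on_clear.remove(cb)

    def _fire_clear(self, cache_snapshot: Dict):
        # Iterate over a copy so a callback may unregister itself safely.
        for cb in list(self._on_clear):
            cb(cache_snapshot)
```

Passing a snapshot (rather than the live cache) keeps callbacks from observing, or mutating, state mid-clear.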
psychedelicious
d958d2e5a0
feat(mm): iterate on cache callbacks API
2025-05-15 14:37:22 +10:00
psychedelicious
823ca214e6
feat(mm): iterate on cache callbacks API
2025-05-15 13:28:51 +10:00
psychedelicious
a33da450fd
feat(mm): support cache callbacks
2025-05-15 11:23:58 +10:00
Billy
182580ff69
Imports
2025-03-26 12:55:10 +11:00
Billy
f2689598c0
Formatting
2025-03-06 09:11:00 +11:00
Ryan Dick
cc9d215a9b
Add endpoint for emptying the model cache. Also, adds a threading lock to the ModelCache to make it thread-safe.
2025-01-30 09:18:28 -05:00
Ryan Dick
f7315f0432
Make the default max RAM cache size more conservative.
2025-01-30 08:46:59 -05:00
Ryan Dick
0cf51cefe8
Revise the logic for calculating the RAM model cache limit.
2025-01-16 23:46:07 +00:00
Ryan Dick
36a3869af0
Add keep_ram_copy_of_weights config option.
2025-01-16 15:35:25 +00:00
Ryan Dick
04087c38ce
Add keep_ram_copy option to CachedModelWithPartialLoad.
2025-01-16 14:51:44 +00:00
Ryan Dick
d7ab464176
Offload the current model when locking if it is already partially loaded and we have insufficient VRAM.
2025-01-07 02:53:44 +00:00
Ryan Dick
b343f81644
Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits.
2025-01-07 01:20:15 +00:00
Ryan Dick
fc4a22fe78
Allow expensive operations to request more working memory.
2025-01-07 01:20:13 +00:00
Ryan Dick
a167632f09
Calculate model cache size limits dynamically based on the available RAM / VRAM.
2025-01-07 01:14:20 +00:00
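One common shape for a dynamic limit like the one this commit introduces is "reserve a fixed amount for the OS, then take a fraction of the rest". The function below is only a generic illustration of that pattern; the actual heuristic and its constants live in the commit, not here:

```python
def dynamic_ram_cache_limit_gb(
    total_ram_gb: float,
    reserve_gb: float = 8.0,   # assumed headroom for the OS and other apps
    fraction: float = 0.5,     # assumed share of the remainder for the cache
) -> float:
    """Hypothetical heuristic: cap the model cache at a fraction of the RAM
    left after a fixed reserve. Constants are illustrative, not InvokeAI's."""
    usable = max(total_ram_gb - reserve_gb, 0.0)
    return usable * fraction
```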
Ryan Dick
6a9de1fcf3
Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated().
2025-01-07 00:31:53 +00:00
Ryan Dick
ceb2498a67
Add log prefix to model cache logs.
2025-01-07 00:31:00 +00:00
Ryan Dick
d0bfa019be
Add 'enable_partial_loading' config flag.
2025-01-07 00:31:00 +00:00
Ryan Dick
535e45cedf
First pass at adding partial loading support to the ModelCache.
2025-01-07 00:30:58 +00:00
Ryan Dick
c579a218ef
Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513).
2025-01-06 23:02:52 +00:00
Ryan Dick
6d49ee839c
Switch the LayerPatcher to use 'custom modules' to manage layer patching.
2024-12-29 01:18:30 +00:00
Ryan Dick
55b13c1da3
(minor) Add TODO comment regarding the location of get_model_cache_key().
2024-12-24 14:23:19 +00:00
Ryan Dick
7dc3e0fdbe
Get rid of ModelLocker. It was an unnecessary layer of indirection.
2024-12-24 14:23:18 +00:00
Ryan Dick
a39bcf7e85
Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private.
2024-12-24 14:23:18 +00:00
Ryan Dick
a7c72992a6
Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type.
2024-12-24 14:23:18 +00:00
Ryan Dick
d30a9ced38
Rename model_cache_default.py -> model_cache.py.
2024-12-24 14:23:18 +00:00