Jonathan
dc5007fe95
Fix/model cache Qwen/CogView4 cancel repair (#8959)
* Repair partially loaded Qwen models after cancel to avoid device mismatches
* ruff
* Repair CogView4 text encoder after canceled partial loads
* Avoid MPS CI crash in repair regression test
* Fix MPS device assertion in repair test
2026-03-15 10:04:15 -04:00
Lincoln Stein
76b0838094
Feature(backend): Add user toggle to run encoder models on CPU (#8777)
* feature(backend) Add user toggle to run encoder models on CPU
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Add frontend UI for CPU-only model execution toggle
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
* chore(frontend): remove package lock file created by npm
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Co-authored-by: Jonathan <34005131+JPPhoto@users.noreply.github.com>
2026-02-04 15:13:29 -05:00
David Burnett
6c0bd7d150
fix import ordering, remove code I reverted that the resync added back
2025-05-19 11:16:23 +10:00
David Burnett
8abcc99ced
add check for state_dict, required to load TIs
2025-05-19 11:16:23 +10:00
David Burnett
73ab4b8895
fix offload device
2025-05-19 11:16:23 +10:00
David Burnett
86719f2065
revert to overload due to failing tests, use Torch futures instead
2025-05-19 11:16:23 +10:00
Ryan Dick
da589b3f1f
Memory optimization to load state dicts one module at a time in CachedModelWithPartialLoad when we are not storing a CPU copy of the state dict (i.e. when keep_ram_copy_of_weights=False).
2025-01-16 17:00:33 +00:00
Ryan Dick
c76d08d1fd
Add keep_ram_copy option to CachedModelOnlyFullLoad.
2025-01-16 15:08:23 +00:00
Ryan Dick
04087c38ce
Add keep_ram_copy option to CachedModelWithPartialLoad.
2025-01-16 14:51:44 +00:00
Ryan Dick
d7ab464176
Offload the current model when locking if it is already partially loaded and we have insufficient VRAM.
2025-01-07 02:53:44 +00:00
Ryan Dick
1b7bb70bde
Improve handling of cases when application code modifies the size of a model after registering it with the model cache.
2025-01-07 00:31:00 +00:00
Ryan Dick
7127040c3a
Remove unused function set_nested_attr(...).
2025-01-07 00:31:00 +00:00
Ryan Dick
6d49ee839c
Switch the LayerPatcher to use 'custom modules' to manage layer patching.
2024-12-29 01:18:30 +00:00
Ryan Dick
f8ab414f99
Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded.
2024-12-24 14:32:11 +00:00
Ryan Dick
c6795a1b47
Make CachedModelWithPartialLoad work with models that have non-persistent buffers.
2024-12-24 14:32:11 +00:00
Ryan Dick
0a8fc74ae9
Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules.
2024-12-24 14:32:11 +00:00