Ryan Dick | f8a6accf8a | Fix bitsandbytes imports to avoid ImportErrors on MacOS. | 2024-12-24 14:32:11 +00:00
Ryan Dick | f8ab414f99 | Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded. | 2024-12-24 14:32:11 +00:00
Ryan Dick | c6795a1b47 | Make CachedModelWithPartialLoad work with models that have non-persistent buffers. | 2024-12-24 14:32:11 +00:00
Ryan Dick | 0a8fc74ae9 | Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules. | 2024-12-24 14:32:11 +00:00
Ryan Dick | dc54e8763b | Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers. | 2024-12-24 14:32:11 +00:00
Ryan Dick | 1b56020876 | Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers. | 2024-12-24 14:32:11 +00:00
Ryan Dick | 97d56f7dc9 | Add torch module autocast unit test for GGUF-quantized models. | 2024-12-24 14:32:11 +00:00
Ryan Dick | fe0ef2c27c | Add torch module autocast utilities. | 2024-12-24 14:32:11 +00:00