Commit Graph

14 Commits

Author  SHA1  Message  Date
Ryan Dick  a8b2c4c3d2  Add inference tests for all custom module types (i.e. to test autocasting from cpu to device).  2024-12-26 18:33:46 +00:00
Ryan Dick  03944191db  Split test_autocast_modules.py into separate test files to mirror the source file structure.  2024-12-24 22:29:11 +00:00
Ryan Dick  987c9ae076  Move custom autocast modules to separate files in a custom_modules/ directory.  2024-12-24 22:21:31 +00:00
Ryan Dick  0fc538734b  Skip flaky test when running on Github Actions, and further reduce peak unit test memory.  2024-12-24 14:32:11 +00:00
Ryan Dick  7214d4969b  Workaround a weird quirk of QuantState.to() and add a unit test to exercise it.  2024-12-24 14:32:11 +00:00
Ryan Dick  a83a999b79  Reduce peak memory used for unit tests.  2024-12-24 14:32:11 +00:00
Ryan Dick  f8a6accf8a  Fix bitsandbytes imports to avoid ImportErrors on MacOS.  2024-12-24 14:32:11 +00:00
Ryan Dick  f8ab414f99  Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded.  2024-12-24 14:32:11 +00:00
Ryan Dick  c6795a1b47  Make CachedModelWithPartialLoad work with models that have non-persistent buffers.  2024-12-24 14:32:11 +00:00
Ryan Dick  0a8fc74ae9  Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules.  2024-12-24 14:32:11 +00:00
Ryan Dick  dc54e8763b  Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers.  2024-12-24 14:32:11 +00:00
Ryan Dick  1b56020876  Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers.  2024-12-24 14:32:11 +00:00
Ryan Dick  97d56f7dc9  Add torch module autocast unit test for GGUF-quantized models.  2024-12-24 14:32:11 +00:00
Ryan Dick  fe0ef2c27c  Add torch module autocast utilities.  2024-12-24 14:32:11 +00:00