Ryan Dick
|
a8b2c4c3d2
|
Add inference tests for all custom module types (i.e., to test autocasting from CPU to device).
|
2024-12-26 18:33:46 +00:00 |
|
Ryan Dick
|
03944191db
|
Split test_autocast_modules.py into separate test files to mirror the source file structure.
|
2024-12-24 22:29:11 +00:00 |
|
Ryan Dick
|
987c9ae076
|
Move custom autocast modules to separate files in a custom_modules/ directory.
|
2024-12-24 22:21:31 +00:00 |
|
Ryan Dick
|
0fc538734b
|
Skip flaky test when running on GitHub Actions, and further reduce peak unit test memory.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
7214d4969b
|
Work around a weird quirk of QuantState.to() and add a unit test to exercise it.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
a83a999b79
|
Reduce peak memory used for unit tests.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
f8a6accf8a
|
Fix bitsandbytes imports to avoid ImportErrors on macOS.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
f8ab414f99
|
Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
c6795a1b47
|
Make CachedModelWithPartialLoad work with models that have non-persistent buffers.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
0a8fc74ae9
|
Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
dc54e8763b
|
Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
1b56020876
|
Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
97d56f7dc9
|
Add torch module autocast unit test for GGUF-quantized models.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
fe0ef2c27c
|
Add torch module autocast utilities.
|
2024-12-24 14:32:11 +00:00 |
|