Ryan Dick
|
3f990393a1
|
Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from CPU to GPU.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
65fcbf5f60
|
Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8() and promises improved speed on some hardware.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
29fe1533f2
|
Fix bug in InvokeLinear8bitLt that was causing old state information to persist after loading from a state dict. This manifested as state tensors being left on the GPU even when a model had been offloaded to the CPU cache.
|
2024-08-29 19:08:18 +00:00 |
|
Ryan Dick
|
19a68afb3a
|
Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
d3a5ca5247
|
More improvements for LLM.int8() - not fully tested.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
f01f56a98e
|
LLM.int8() quantization is working, but still some rough edges to solve.
|
2024-08-26 20:17:50 -04:00 |
|