Commit Graph

40 Commits

Author SHA1 Message Date
Ryan Dick
3f990393a1 Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from cpu to gpu. 2024-12-24 14:32:11 +00:00
Ryan Dick
65fcbf5f60 Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW. 2024-12-24 14:32:11 +00:00
Ryan Dick
9369b39a12 Add GGMLTensor op. 2024-12-17 13:20:19 +00:00
David Burnett
9bd17ea02f Get flux working with MPS on 2.4.1, with GGUF support 2024-10-23 10:20:42 +11:00
Brandon Rising
d328eaf743 Remove no longer used dequantize_tensor function 2024-10-02 18:33:05 -04:00
Ryan Dick
bc63e2acc5 Add workaround for FLUX GGUF models with incorrect img_in.weight shape. 2024-10-02 18:33:05 -04:00
Ryan Dick
ec7e771942 Add a compute_dtype field to GGMLTensor. 2024-10-02 18:33:05 -04:00
Ryan Dick
fe84013392 Add unit tests for GGMLTensor. 2024-10-02 18:33:05 -04:00
Ryan Dick
710f81266b Fix type errors in GGMLTensor. 2024-10-02 18:33:05 -04:00
Brandon Rising
446e2884bc Remove no longer used code paths, general cleanup of new dequantization code, update probe 2024-10-02 18:33:05 -04:00
Brandon Rising
7d9f125232 Run ruff and update imports 2024-10-02 18:33:05 -04:00
Brandon Rising
66bbd62758 Run ruff and fix typing in torch patcher 2024-10-02 18:33:05 -04:00
Brandon Rising
0875e861f5 Various updates to gguf performance 2024-10-02 18:33:05 -04:00
Ryan Dick
f06765dfba Get alternative GGUF implementation working... barely. 2024-10-02 18:33:05 -04:00
Ryan Dick
f347b26999 Initial experimentation with Tensor-like extension for GGUF. 2024-10-02 18:33:05 -04:00
Brandon Rising
2bfb0ddff5 Initial GGUF support for flux models 2024-10-02 18:33:05 -04:00
Ryan Dick
29fe1533f2 Fix bug in InvokeLinear8bitLt that was causing old state information to persist after loading from a state dict. This manifested as state tensors being left on the GPU even when a model had been offloaded to the CPU cache. 2024-08-29 19:08:18 +00:00
Brandon Rising
65bb46bcca Rename params for flux and flux vae, add comments explaining use of the config_path in model config 2024-08-26 20:17:50 -04:00
Ryan Dick
635d2f480d ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
56b9906e2e Setup scaffolding for in progress images and add ability to cancel the flux node 2024-08-26 20:17:50 -04:00
Ryan Dick
dff4a88baa Move quantization scripts to a scripts/ subdir. 2024-08-26 20:17:50 -04:00
Ryan Dick
a21f6c4964 Update docs for T5 quantization script. 2024-08-26 20:17:50 -04:00
Ryan Dick
97562504b7 Remove all references to optimum-quanto and downgrade diffusers. 2024-08-26 20:17:50 -04:00
Ryan Dick
b9dd354e2b Fixes to the T5XXL quantization script. 2024-08-26 20:17:50 -04:00
Ryan Dick
33c2fbd201 Add script for quantizing a T5 model. 2024-08-26 20:17:50 -04:00
Ryan Dick
b66f19d4d1 Add docs to the quantization scripts. 2024-08-26 20:17:50 -04:00
Ryan Dick
4105a78b83 Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint. 2024-08-26 20:17:50 -04:00
Ryan Dick
19a68afb3a Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM. 2024-08-26 20:17:50 -04:00
Ryan Dick
cfac7c8189 Move requantize.py to the quantization/ dir. 2024-08-26 20:17:50 -04:00
Ryan Dick
ac96f187bd Remove duplicate log_time(...) function. 2024-08-26 20:17:50 -04:00
Brandon Rising
57168d719b Fix styling/lint 2024-08-26 20:17:50 -04:00
Brandon Rising
4bd7fda694 Install sub directories with folders correctly, ensure consistent dtype of tensors in flux pipeline and vae 2024-08-26 20:17:50 -04:00
Brandon Rising
2d9042fb93 Run Ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
9ed53af520 Run Ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
56fda669fd Manage quantization of models within the loader 2024-08-26 20:17:50 -04:00
Ryan Dick
1fa6bddc89 WIP on moving from diffusers to FLUX 2024-08-26 20:17:50 -04:00
Ryan Dick
d3a5ca5247 More improvements for LLM.int8() - not fully tested. 2024-08-26 20:17:50 -04:00
Ryan Dick
f01f56a98e LLM.int8() quantization is working, but still some rough edges to solve. 2024-08-26 20:17:50 -04:00
Ryan Dick
99b0f79784 Clean up NF4 implementation. 2024-08-26 20:17:50 -04:00
Ryan Dick
eeabb7ebe5 Make quantized loading fast for both T5XXL and FLUX transformer. 2024-08-26 20:17:50 -04:00
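Several of the commits above (f347b26999, 9369b39a12, ec7e771942, fe84013392) build a tensor-like extension for GGUF-quantized FLUX weights. The sketch below only illustrates the general wrapper-subclass technique using PyTorch's `_make_wrapper_subclass` / `__torch_dispatch__` machinery; the class name, the toy int8-plus-scale scheme, and the field names are illustrative assumptions, not the repository's actual GGMLTensor.

```python
import torch
from torch.utils._pytree import tree_map


class QuantizedLazyTensor(torch.Tensor):
    """Toy tensor wrapper: stores a quantized payload, dequantizes on use."""

    @staticmethod
    def __new__(cls, quantized: torch.Tensor, scale: float, compute_dtype: torch.dtype):
        # Advertise the dequantized dtype/shape to the rest of torch.
        return torch.Tensor._make_wrapper_subclass(
            cls, quantized.shape, dtype=compute_dtype, device=quantized.device
        )

    def __init__(self, quantized: torch.Tensor, scale: float, compute_dtype: torch.dtype):
        self.quantized = quantized
        self.scale = scale
        self.compute_dtype = compute_dtype

    def dequantize(self) -> torch.Tensor:
        # Toy int8 * scale scheme; real GGUF block formats are more involved.
        return self.quantized.to(self.compute_dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Unwrap to plain dequantized tensors, then run the op normally,
        # so matmul/linear/etc. work without per-op special cases.
        def unwrap(x):
            return x.dequantize() if isinstance(x, QuantizedLazyTensor) else x

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))


# Usage: an int8 "weight" participates in a float32 linear op.
weight = QuantizedLazyTensor(
    torch.randint(-127, 128, (4, 8), dtype=torch.int8),
    scale=0.05,
    compute_dtype=torch.float32,
)
out = torch.nn.functional.linear(torch.randn(2, 8), weight)  # dequantized on the fly
```

The payoff of this pattern, and presumably the motivation behind the "compute_dtype field" commit, is that quantized weights can sit in a model's state dict and still flow through ordinary torch ops, dequantizing only at the point of use.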
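The InvokeLinear8bitLt / InvokeInt8Params commits (29fe1533f2, 19a68afb3a, 3f990393a1) wrap bitsandbytes' LLM.int8() layers; those Invoke-specific subclasses are not shown here. The snippet below is only the generic bitsandbytes pattern for converting an existing nn.Linear, assuming a CUDA device and the stock bnb.nn.Linear8bitLt / Int8Params API.

```python
import torch
import bitsandbytes as bnb


def to_llm_int8(linear: torch.nn.Linear, threshold: float = 6.0) -> bnb.nn.Linear8bitLt:
    """Convert a plain nn.Linear to an LLM.int8() layer (generic bitsandbytes pattern)."""
    qlinear = bnb.nn.Linear8bitLt(
        linear.in_features,
        linear.out_features,
        bias=linear.bias is not None,
        has_fp16_weights=False,  # keep weights in int8 after quantization
        threshold=threshold,     # outlier threshold for mixed-precision matmul
    )
    qlinear.weight = bnb.nn.Int8Params(
        linear.weight.data, requires_grad=False, has_fp16_weights=False
    )
    if linear.bias is not None:
        qlinear.bias = torch.nn.Parameter(linear.bias.data, requires_grad=False)
    # Int8Params quantizes its data when the layer is moved to a CUDA device.
    return qlinear.cuda()


# Usage (requires a CUDA device and bitsandbytes installed):
# q = to_llm_int8(torch.nn.Linear(4096, 4096, dtype=torch.float16))
# y = q(torch.randn(1, 4096, dtype=torch.float16, device="cuda"))
```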