jiangmencity
|
5259693ed1
|
chore: fix some comments
Signed-off-by: jiangmencity <jiangmen@52it.net>
|
2025-08-14 09:32:54 +10:00 |
|
Kevin Turner
|
8bd52ed744
|
fix: improve gguf performance with torch.compile
pytorch 2.7 does not implement `set.__contains__`, so make this a list instead.
See https://github.com/pytorch/pytorch/issues/145761
|
2025-05-22 13:42:09 +10:00 |
|
David Burnett
|
6c0bd7d150
|
fix import ordering, remove code I reverted that the resync added back
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
99e154d773
|
fix picky ruff issue
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
e4e43ae126
|
fix missing bracket
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
a07fac6180
|
raise expected exception when attempting to change dtype
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
93d4b00082
|
Add `to` overload for GGMLTensor, so calling `to` on the model moves the quantized data as well
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
86719f2065
|
revert to overload due to failing tests, use Torch futures instead
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
5271fc1cac
|
fix picky ruff issue
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
96ff7d9093
|
fix missing bracket
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
6f73d9e9c6
|
raise expected exception when attempting to change dtype
|
2025-05-19 11:16:23 +10:00 |
|
David Burnett
|
29b406a84b
|
Add `to` overload for GGMLTensor, so calling `to` on the model moves the quantized data as well
|
2025-05-19 11:16:23 +10:00 |
|
Ryan Dick
|
5ea7953537
|
Update GGMLTensor with ops necessary to work with ConcatenatedLoRALayer.
|
2025-01-28 14:51:35 +00:00 |
|
Ryan Dick
|
a8b2c4c3d2
|
Add inference tests for all custom module types (i.e. to test autocasting from cpu to device).
|
2024-12-26 18:33:46 +00:00 |
|
Ryan Dick
|
3f990393a1
|
Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from cpu to gpu.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
65fcbf5f60
|
Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW.
|
2024-12-24 14:32:11 +00:00 |
|
Ryan Dick
|
9369b39a12
|
Add GGMLTensor op.
|
2024-12-17 13:20:19 +00:00 |
|
David Burnett
|
9bd17ea02f
|
Get flux working with MPS on 2.4.1, with GGUF support
|
2024-10-23 10:20:42 +11:00 |
|
Brandon Rising
|
d328eaf743
|
Remove no longer used dequantize_tensor function
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
bc63e2acc5
|
Add workaround for FLUX GGUF models with incorrect img_in.weight shape.
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
ec7e771942
|
Add a compute_dtype field to GGMLTensor.
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
fe84013392
|
Add unit tests for GGMLTensor.
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
710f81266b
|
Fix type errors in GGMLTensor.
|
2024-10-02 18:33:05 -04:00 |
|
Brandon Rising
|
446e2884bc
|
Remove no longer used code paths, general cleanup of new dequantization code, update probe
|
2024-10-02 18:33:05 -04:00 |
|
Brandon Rising
|
7d9f125232
|
Run ruff and update imports
|
2024-10-02 18:33:05 -04:00 |
|
Brandon Rising
|
66bbd62758
|
Run ruff and fix typing in torch patcher
|
2024-10-02 18:33:05 -04:00 |
|
Brandon Rising
|
0875e861f5
|
Various updates to gguf performance
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
f06765dfba
|
Get alternative GGUF implementation working... barely.
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
f347b26999
|
Initial experimentation with Tensor-like extension for GGUF.
|
2024-10-02 18:33:05 -04:00 |
|
Brandon Rising
|
2bfb0ddff5
|
Initial GGUF support for flux models
|
2024-10-02 18:33:05 -04:00 |
|
Ryan Dick
|
29fe1533f2
|
Fix bug in InvokeLinear8bitLt that was causing old state information to persist after loading from a state dict. This manifested as state tensors being left on the GPU even when a model had been offloaded to the CPU cache.
|
2024-08-29 19:08:18 +00:00 |
|
Brandon Rising
|
65bb46bcca
|
Rename params for flux and flux vae, add comments explaining use of the config_path in model config
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
635d2f480d
|
ruff
|
2024-08-26 20:17:50 -04:00 |
|
Brandon Rising
|
56b9906e2e
|
Setup scaffolding for in progress images and add ability to cancel the flux node
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
dff4a88baa
|
Move quantization scripts to a scripts/ subdir.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
a21f6c4964
|
Update docs for T5 quantization script.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
97562504b7
|
Remove all references to optimum-quanto and downgrade diffusers.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
b9dd354e2b
|
Fixes to the T5XXL quantization script.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
33c2fbd201
|
Add script for quantizing a T5 model.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
b66f19d4d1
|
Add docs to the quantization scripts.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
4105a78b83
|
Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
19a68afb3a
|
Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
cfac7c8189
|
Move requantize.py to the quantization/ dir.
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
ac96f187bd
|
Remove duplicate log_time(...) function.
|
2024-08-26 20:17:50 -04:00 |
|
Brandon Rising
|
57168d719b
|
Fix styling/lint
|
2024-08-26 20:17:50 -04:00 |
|
Brandon Rising
|
4bd7fda694
|
Install sub directories with folders correctly, ensure consistent dtype of tensors in flux pipeline and vae
|
2024-08-26 20:17:50 -04:00 |
|
Brandon Rising
|
2d9042fb93
|
Run Ruff
|
2024-08-26 20:17:50 -04:00 |
|
Brandon Rising
|
9ed53af520
|
Run Ruff
|
2024-08-26 20:17:50 -04:00 |
|
Brandon Rising
|
56fda669fd
|
Manage quantization of models within the loader
|
2024-08-26 20:17:50 -04:00 |
|
Ryan Dick
|
1fa6bddc89
|
WIP on moving from diffusers to FLUX
|
2024-08-26 20:17:50 -04:00 |
|