27 Commits

Author SHA1 Message Date
Kevin Turner
8bd52ed744 fix: improve gguf performance with torch.compile
pytorch 2.7 does not implement `set.__contains__`, so make this a list instead.

See https://github.com/pytorch/pytorch/issues/145761
2025-05-22 13:42:09 +10:00
David Burnett
6c0bd7d150 fix import ordering, remove code I reverted that the resync added back 2025-05-19 11:16:23 +10:00
David Burnett
99e154d773 fix picky ruff issue 2025-05-19 11:16:23 +10:00
David Burnett
e4e43ae126 fix missing bracket 2025-05-19 11:16:23 +10:00
David Burnett
a07fac6180 raise expected exception when attempting to change dtype 2025-05-19 11:16:23 +10:00
David Burnett
93d4b00082 Add `to` overload for GGMLTensor, so calling `to` on the model moves the quantized data as well 2025-05-19 11:16:23 +10:00
David Burnett
86719f2065 revert `to` overload due to failing tests, use Torch futures instead 2025-05-19 11:16:23 +10:00
David Burnett
5271fc1cac fix picky ruff issue 2025-05-19 11:16:23 +10:00
David Burnett
96ff7d9093 fix missing bracket 2025-05-19 11:16:23 +10:00
David Burnett
6f73d9e9c6 raise expected exception when attempting to change dtype 2025-05-19 11:16:23 +10:00
David Burnett
29b406a84b Add `to` overload for GGMLTensor, so calling `to` on the model moves the quantized data as well 2025-05-19 11:16:23 +10:00
Ryan Dick
5ea7953537 Update GGMLTensor with ops necessary to work with ConcatenatedLoRALayer. 2025-01-28 14:51:35 +00:00
Ryan Dick
a8b2c4c3d2 Add inference tests for all custom module types (i.e. to test autocasting from cpu to device). 2024-12-26 18:33:46 +00:00
Ryan Dick
9369b39a12 Add GGMLTensor op. 2024-12-17 13:20:19 +00:00
David Burnett
9bd17ea02f Get flux working with MPS on 2.4.1, with GGUF support 2024-10-23 10:20:42 +11:00
Brandon Rising
d328eaf743 Remove no longer used dequantize_tensor function 2024-10-02 18:33:05 -04:00
Ryan Dick
bc63e2acc5 Add workaround for FLUX GGUF models with incorrect img_in.weight shape. 2024-10-02 18:33:05 -04:00
Ryan Dick
ec7e771942 Add a compute_dtype field to GGMLTensor. 2024-10-02 18:33:05 -04:00
Ryan Dick
fe84013392 Add unit tests for GGMLTensor. 2024-10-02 18:33:05 -04:00
Ryan Dick
710f81266b Fix type errors in GGMLTensor. 2024-10-02 18:33:05 -04:00
Brandon Rising
446e2884bc Remove no longer used code paths, general cleanup of new dequantization code, update probe 2024-10-02 18:33:05 -04:00
Brandon Rising
7d9f125232 Run ruff and update imports 2024-10-02 18:33:05 -04:00
Brandon Rising
66bbd62758 Run ruff and fix typing in torch patcher 2024-10-02 18:33:05 -04:00
Brandon Rising
0875e861f5 Various updates to gguf performance 2024-10-02 18:33:05 -04:00
Ryan Dick
f06765dfba Get alternative GGUF implementation working... barely. 2024-10-02 18:33:05 -04:00
Ryan Dick
f347b26999 Initial experimentation with Tensor-like extension for GGUF. 2024-10-02 18:33:05 -04:00
Brandon Rising
2bfb0ddff5 Initial GGUF support for flux models 2024-10-02 18:33:05 -04:00