Commit Graph

40 Commits

Author SHA1 Message Date
Ryan Dick
3f990393a1 Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from cpu to gpu. 2024-12-24 14:32:11 +00:00
Ryan Dick
65fcbf5f60 Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW. 2024-12-24 14:32:11 +00:00
Ryan Dick
9369b39a12 Add GGMLTensor op. 2024-12-17 13:20:19 +00:00
David Burnett
9bd17ea02f Get flux working with MPS on 2.4.1, with GGUF support 2024-10-23 10:20:42 +11:00
Brandon Rising
d328eaf743 Remove no longer used dequantize_tensor function 2024-10-02 18:33:05 -04:00
Ryan Dick
bc63e2acc5 Add workaround for FLUX GGUF models with incorrect img_in.weight shape. 2024-10-02 18:33:05 -04:00
Ryan Dick
ec7e771942 Add a compute_dtype field to GGMLTensor. 2024-10-02 18:33:05 -04:00
Ryan Dick
fe84013392 Add unit tests for GGMLTensor. 2024-10-02 18:33:05 -04:00
Ryan Dick
710f81266b Fix type errors in GGMLTensor. 2024-10-02 18:33:05 -04:00
Brandon Rising
446e2884bc Remove no longer used code paths, general cleanup of new dequantization code, update probe 2024-10-02 18:33:05 -04:00
Brandon Rising
7d9f125232 Run ruff and update imports 2024-10-02 18:33:05 -04:00
Brandon Rising
66bbd62758 Run ruff and fix typing in torch patcher 2024-10-02 18:33:05 -04:00
Brandon Rising
0875e861f5 Various updates to gguf performance 2024-10-02 18:33:05 -04:00
Ryan Dick
f06765dfba Get alternative GGUF implementation working... barely. 2024-10-02 18:33:05 -04:00
Ryan Dick
f347b26999 Initial experimentation with Tensor-like extension for GGUF. 2024-10-02 18:33:05 -04:00
Brandon Rising
2bfb0ddff5 Initial GGUF support for flux models 2024-10-02 18:33:05 -04:00
Ryan Dick
29fe1533f2 Fix bug in InvokeLinear8bitLt that was causing old state information to persist after loading from a state dict. This manifested as state tensors being left on the GPU even when a model had been offloaded to the CPU cache. 2024-08-29 19:08:18 +00:00
Brandon Rising
65bb46bcca Rename params for flux and flux vae, add comments explaining use of the config_path in model config 2024-08-26 20:17:50 -04:00
Ryan Dick
635d2f480d ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
56b9906e2e Setup scaffolding for in progress images and add ability to cancel the flux node 2024-08-26 20:17:50 -04:00
Ryan Dick
dff4a88baa Move quantization scripts to a scripts/ subdir. 2024-08-26 20:17:50 -04:00
Ryan Dick
a21f6c4964 Update docs for T5 quantization script. 2024-08-26 20:17:50 -04:00
Ryan Dick
97562504b7 Remove all references to optimum-quanto and downgrade diffusers. 2024-08-26 20:17:50 -04:00
Ryan Dick
b9dd354e2b Fixes to the T5XXL quantization script. 2024-08-26 20:17:50 -04:00
Ryan Dick
33c2fbd201 Add script for quantizing a T5 model. 2024-08-26 20:17:50 -04:00
Ryan Dick
b66f19d4d1 Add docs to the quantization scripts. 2024-08-26 20:17:50 -04:00
Ryan Dick
4105a78b83 Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint. 2024-08-26 20:17:50 -04:00
Ryan Dick
19a68afb3a Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM. 2024-08-26 20:17:50 -04:00
Ryan Dick
cfac7c8189 Move requantize.py to the quantization/ dir. 2024-08-26 20:17:50 -04:00
Ryan Dick
ac96f187bd Remove duplicate log_time(...) function. 2024-08-26 20:17:50 -04:00
Brandon Rising
57168d719b Fix styling/lint 2024-08-26 20:17:50 -04:00
Brandon Rising
4bd7fda694 Install sub directories with folders correctly, ensure consistent dtype of tensors in flux pipeline and vae 2024-08-26 20:17:50 -04:00
Brandon Rising
2d9042fb93 Run Ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
9ed53af520 Run Ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
56fda669fd Manage quantization of models within the loader 2024-08-26 20:17:50 -04:00
Ryan Dick
1fa6bddc89 WIP on moving from diffusers to FLUX 2024-08-26 20:17:50 -04:00
Ryan Dick
d3a5ca5247 More improvements for LLM.int8() - not fully tested. 2024-08-26 20:17:50 -04:00
Ryan Dick
f01f56a98e LLM.int8() quantization is working, but still some rough edges to solve. 2024-08-26 20:17:50 -04:00
Ryan Dick
99b0f79784 Clean up NF4 implementation. 2024-08-26 20:17:50 -04:00
Ryan Dick
eeabb7ebe5 Make quantized loading fast for both T5XXL and FLUX transformer. 2024-08-26 20:17:50 -04:00
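Several of the commits above (f347b26999, 9369b39a12, ec7e771942, fe84013392) build a tensor-like extension for GGUF-quantized FLUX weights. The sketch below only illustrates the general wrapper-subclass technique using PyTorch's `_make_wrapper_subclass` / `__torch_dispatch__` machinery; the class name, the toy int8-plus-scale scheme, and the field names are illustrative assumptions, not the repository's actual GGMLTensor.

```python
import torch
from torch.utils._pytree import tree_map


class QuantizedLazyTensor(torch.Tensor):
    """Toy tensor wrapper: stores a quantized payload, dequantizes on use."""

    @staticmethod
    def __new__(cls, quantized: torch.Tensor, scale: float, compute_dtype: torch.dtype):
        # Advertise the dequantized dtype/shape to the rest of torch.
        return torch.Tensor._make_wrapper_subclass(
            cls, quantized.shape, dtype=compute_dtype, device=quantized.device
        )

    def __init__(self, quantized: torch.Tensor, scale: float, compute_dtype: torch.dtype):
        self.quantized = quantized
        self.scale = scale
        self.compute_dtype = compute_dtype

    def dequantize(self) -> torch.Tensor:
        # Toy int8 * scale scheme; real GGUF block formats are more involved.
        return self.quantized.to(self.compute_dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Unwrap to plain dequantized tensors, then run the op normally,
        # so matmul/linear/etc. work without per-op special cases.
        def unwrap(x):
            return x.dequantize() if isinstance(x, QuantizedLazyTensor) else x

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))


# Usage: an int8 "weight" participates in a float32 linear op.
weight = QuantizedLazyTensor(
    torch.randint(-127, 128, (4, 8), dtype=torch.int8),
    scale=0.05,
    compute_dtype=torch.float32,
)
out = torch.nn.functional.linear(torch.randn(2, 8), weight)  # dequantized on the fly
```

The payoff of this pattern, and presumably the motivation behind the "compute_dtype field" commit, is that quantized weights can sit in a model's state dict and still flow through ordinary torch ops, dequantizing only at the point of use.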
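The InvokeLinear8bitLt / InvokeInt8Params commits (29fe1533f2, 19a68afb3a, 3f990393a1) wrap bitsandbytes' LLM.int8() layers; those Invoke-specific subclasses are not shown here. The snippet below is only the generic bitsandbytes pattern for converting an existing nn.Linear, assuming a CUDA device and the stock bnb.nn.Linear8bitLt / Int8Params API.

```python
import torch
import bitsandbytes as bnb


def to_llm_int8(linear: torch.nn.Linear, threshold: float = 6.0) -> bnb.nn.Linear8bitLt:
    """Convert a plain nn.Linear to an LLM.int8() layer (generic bitsandbytes pattern)."""
    qlinear = bnb.nn.Linear8bitLt(
        linear.in_features,
        linear.out_features,
        bias=linear.bias is not None,
        has_fp16_weights=False,  # keep weights in int8 after quantization
        threshold=threshold,     # outlier threshold for mixed-precision matmul
    )
    qlinear.weight = bnb.nn.Int8Params(
        linear.weight.data, requires_grad=False, has_fp16_weights=False
    )
    if linear.bias is not None:
        qlinear.bias = torch.nn.Parameter(linear.bias.data, requires_grad=False)
    # Int8Params quantizes its data when the layer is moved to a CUDA device.
    return qlinear.cuda()


# Usage (requires a CUDA device and bitsandbytes installed):
# q = to_llm_int8(torch.nn.Linear(4096, 4096, dtype=torch.float16))
# y = q(torch.randn(1, 4096, dtype=torch.float16, device="cuda"))
```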