Commit Graph

15240 Commits

Author SHA1 Message Date
Ryan Dick
548b3eddb8 pnpm typegen 2025-01-07 01:20:15 +00:00
Ryan Dick
5b42b7bd45 Add a utility to help with determining the working memory required for expensive operations. 2025-01-07 01:20:15 +00:00
Ryan Dick
71b97ce7be Reduce the likelihood of encountering https://github.com/invoke-ai/InvokeAI/issues/7513 by eliminating places where the door was left open for this to happen. 2025-01-07 01:20:15 +00:00
Ryan Dick
b343f81644 Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits. 2025-01-07 01:20:15 +00:00
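
The commit above swaps which CUDA counter is treated as "VRAM in use". As a minimal illustration of the distinction (the helper name is made up, and the surrounding cache-limit logic is not shown):

```python
import torch

def cuda_memory_counters() -> dict:
    """Illustrative helper, not InvokeAI code: contrast the two counters
    referenced in the commit above."""
    return {
        # Bytes currently occupied by live tensors on the GPU.
        "allocated": torch.cuda.memory_allocated(),
        # Bytes held by PyTorch's caching allocator, including cached blocks
        # that are not backing live tensors (always >= allocated).
        "reserved": torch.cuda.memory_reserved(),
    }
```
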
Ryan Dick
4abfb35321 Tune SD3 VAE decode working memory estimate. 2025-01-07 01:20:15 +00:00
Ryan Dick
cba6528ea7 Add a 20% buffer to all VAE decode working memory estimates. 2025-01-07 01:20:15 +00:00
Ryan Dick
6a5cee61be Tune the working memory estimate for FLUX VAE decoding. 2025-01-07 01:20:15 +00:00
Ryan Dick
bd8017ecd5 Update working memory estimate for VAE decoding when tiling is being applied. 2025-01-07 01:20:15 +00:00
Ryan Dick
299eb94a05 Estimate the working memory required for VAE decoding, since this operation tends to be memory-intensive. 2025-01-07 01:20:15 +00:00
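
The estimates in the commits above are heuristics. A rough sketch of what such an estimate can look like; the scale factor, overhead multiplier, and function name are assumptions for illustration, not InvokeAI's actual estimator:

```python
import torch

def estimate_vae_decode_working_memory(
    latents: torch.Tensor,
    scale_factor: int = 8,         # assumed latent-to-pixel spatial upscale
    out_channels: int = 3,
    element_size: int = 4,         # fp32 bytes per element
    overhead_factor: float = 2.0,  # assumed multiplier for intermediate activations
    buffer: float = 1.2,           # the 20% safety buffer mentioned above
) -> int:
    """Hypothetical heuristic: size of the decoded image tensor, scaled up to
    cover intermediate activations, plus a safety buffer."""
    batch, _, height, width = latents.shape
    output_bytes = (
        batch * out_channels * (height * scale_factor) * (width * scale_factor) * element_size
    )
    return int(output_bytes * overhead_factor * buffer)
```
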
Ryan Dick
fc4a22fe78 Allow expensive operations to request more working memory. 2025-01-07 01:20:13 +00:00
Ryan Dick
a167632f09 Calculate model cache size limits dynamically based on the available RAM / VRAM. 2025-01-07 01:14:20 +00:00
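
"Dynamically" here means the limits are derived from what is free at runtime rather than from fixed settings. A sketch of the general idea; the fractions and function name are illustrative assumptions, not InvokeAI's defaults or API:

```python
import psutil
import torch

def suggest_cache_limits(ram_fraction: float = 0.5, vram_fraction: float = 0.9) -> tuple:
    """Return (ram_limit_bytes, vram_limit_bytes) based on currently free memory."""
    ram_limit = int(psutil.virtual_memory().available * ram_fraction)
    if torch.cuda.is_available():
        free_vram, _total_vram = torch.cuda.mem_get_info()
        vram_limit = int(free_vram * vram_fraction)
    else:
        vram_limit = 0
    return ram_limit, vram_limit
```
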
Ryan Dick
1321fac8f2 Remove get_cache_size() and set_cache_size() endpoints. These were unused by the frontend and refer to cache fields that are no longer accessible. 2025-01-07 01:06:20 +00:00
Ryan Dick
6a9de1fcf3 Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated(). 2025-01-07 00:31:53 +00:00
Ryan Dick
e5180c4e6b Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded. 2025-01-07 00:31:00 +00:00
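
The commit message states the utility's purpose; the following is a guess at its general shape (the real implementation may differ): a partially loaded model has parameters on more than one device, so prefer the first non-CPU device found.

```python
import torch

def get_effective_device(model: torch.nn.Module) -> torch.device:
    """Illustrative sketch only, not the actual InvokeAI utility."""
    for param in model.parameters():
        if param.device.type != "cpu":
            return param.device
    return torch.device("cpu")
```
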
Ryan Dick
2619ef53ca Handle device casting in ia2_layer.py. 2025-01-07 00:31:00 +00:00
Ryan Dick
bcd29c5d74 Remove all cases where we check 'model.device'. This value is no longer trustworthy now that partial loading is permitted. 2025-01-07 00:31:00 +00:00
Ryan Dick
1b7bb70bde Improve handling of cases when application code modifies the size of a model after registering it with the model cache. 2025-01-07 00:31:00 +00:00
Ryan Dick
402dd840a1 Add seed to flaky unit test. 2025-01-07 00:31:00 +00:00
Ryan Dick
7127040c3a Remove unused function set_nested_attr(...). 2025-01-07 00:31:00 +00:00
Ryan Dick
ceb2498a67 Add log prefix to model cache logs. 2025-01-07 00:31:00 +00:00
Ryan Dick
d0bfa019be Add 'enable_partial_loading' config flag. 2025-01-07 00:31:00 +00:00
Ryan Dick
535e45cedf First pass at adding partial loading support to the ModelCache. 2025-01-07 00:30:58 +00:00
Ryan Dick
782ee7a0ec Partial Loading PR 3.5: Fix premature model drops from the RAM cache (#7522)
## Summary

This is an unplanned fix between PR3 and PR4 in the sequence of partial
loading (i.e. low-VRAM) PRs. This PR restores the 'Current Workaround'
documented in https://github.com/invoke-ai/InvokeAI/issues/7513. In
other words, to work around a flaw in the model cache API, this fix
allows models to be loaded into VRAM _even if_ they have been dropped
from the RAM cache.

This PR also adds an info log each time that this workaround is hit. In
a future PR (#7509), we will eliminate the places in the application
code that are capable of triggering this condition.
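
As a rough illustration of the behavior change (the names and structure below are hypothetical and do not mirror the ModelCache API):

```python
import logging

logger = logging.getLogger(__name__)

def lock_in_vram(ram_cache: dict, cache_key: str, cache_entry) -> None:
    """Hypothetical sketch: the caller still holds a reference to `cache_entry`
    even if the RAM cache has already evicted it."""
    if cache_key not in ram_cache:
        # Before the fix this path raised a KeyError; now we log and continue.
        logger.info(
            "Model '%s' was dropped from the RAM cache before being locked in "
            "VRAM; loading it into VRAM anyway.",
            cache_key,
        )
    cache_entry.lock()  # hypothetical method: keep the weights on the GPU while in use
```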

## Related Issues / Discussions

- #7492 
- #7494
- #7500 
- https://github.com/invoke-ai/InvokeAI/issues/7513

## QA Instructions

- Set RAM cache limit to a small value. E.g. `ram: 4`
- Run FLUX text-to-image with the full T5 encoder, which exceeds 4GB.
This will trigger the error condition.
- Before the fix, this test configuration would cause a `KeyError`. After
the fix, we should see an info-level log explaining that the condition was
hit, and generation should continue successfully.

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2025-01-06 19:05:48 -05:00
Ryan Dick
c579a218ef Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513). 2025-01-06 23:02:52 +00:00
Riku
f4f7415a3b fix(app): remove obsolete DEFAULT_PRECISION variable 2025-01-06 11:14:58 +11:00
Mary Hipp
7d6c443d6f fix(api): limit board_name length to 300 characters 2025-01-06 10:49:49 +11:00
psychedelicious
868e06eb8b tests: fix test_model_install.py 2025-01-03 11:21:23 -05:00
psychedelicious
40e4dbe1fb docs: add blurb about setting a HF token when downloading HF models by URL and not repo id 2025-01-03 11:21:23 -05:00
psychedelicious
4815b4ea80 feat(ui): tweak verbiage for model install errors 2025-01-03 11:21:23 -05:00
psychedelicious
d77a6ccd76 fix(ui): model install error toasts not updating correctly 2025-01-03 11:21:23 -05:00
psychedelicious
3e860c8338 feat(ui): starter models filter works with model base
For example, "flux" now matches any starter model with a model base of "FLUX".
2025-01-03 11:21:23 -05:00
psychedelicious
4f2ef7ce76 refactor(ui): handle hf vs civitai/other url model install errors separately
Previously, we didn't differentiate between model install errors for the different types of model install sources, resulting in a buggy UX:
- If an HF model install failed, but it was an HF URL install and not a repo id install, the link to the HF model page was incorrect.
- If a non-HF URL install (e.g. civitai) failed, we treated it as an HF URL install. In this case, if the user's HF token was invalid or unset, we directed the user to set it; if the HF token was valid, we displayed an empty red toast. Since it's not an HF URL install, neither of these is correct.

Also, the logic for handling the toasts was a bit complicated.

This change does a few things:
- Consolidates the model install error toasts into one place - the socket.io event handler for the model install error event. There is no more global state for the toasts and there are no hooks managing them.
- Handles the different error cases, including all combinations of HF/non-HF and unauthorized/forbidden/unknown.
2025-01-03 11:21:23 -05:00
psychedelicious
d7e9ad52f9 chore(ui): typegen 2025-01-03 11:21:23 -05:00
psychedelicious
b6d7a44004 refactor(events): include full model source in model install events
This is required to fix an issue with the MM UI's error handling.

Previously, we only included the model source as a string. That could be an arbitrary URL, file path, or HF repo id, but the frontend has no parsing logic to differentiate between these kinds of sources.

Without access to the type of model source, it is difficult to determine how the user should proceed. For example, if it's an HF URL with an HTTP unauthorized error, we should direct the user to log in to HF. But if it's a civitai URL with the same error, we should not direct the user to HF.

There are a variety of related edge cases.

With this change, the full `ModelSource` object is included in each model install event, including error events.

I had to fix some circular import issues, hence the import changes to files other than `events_common.py`.
2025-01-03 11:21:23 -05:00
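
A toy illustration of why a typed source helps; the classes and fields below are made-up stand-ins, not InvokeAI's actual `ModelSource` schema:

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical stand-ins for model source variants.
@dataclass
class HFModelSource:
    repo_id: str

@dataclass
class URLModelSource:
    url: str

@dataclass
class LocalModelSource:
    path: str

ModelSource = Union[HFModelSource, URLModelSource, LocalModelSource]

def describe_install_error(source: ModelSource, status_code: int) -> str:
    """Branch on the kind of source instead of guessing from a raw string."""
    if isinstance(source, HFModelSource) and status_code in (401, 403):
        return "Hugging Face rejected the request; check that your HF token is set and valid."
    if isinstance(source, URLModelSource) and "huggingface.co" in source.url and status_code in (401, 403):
        return "This Hugging Face URL requires a valid HF token."
    if isinstance(source, URLModelSource):
        return f"Failed to download from {source.url} (HTTP {status_code})."
    return "Model install failed."
```
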
psychedelicious
e18100ae7e refactor(ui): move model install error event handling to own file
No logic change.
2025-01-03 11:21:23 -05:00
psychedelicious
ad0aa0e6b2 feat(ui): reset canvas layers only resets the layers 2025-01-03 11:02:04 -05:00
psychedelicious
157b92e0fd docs: no need to specify version for dev env setup 2025-01-03 10:59:39 -05:00
psychedelicious
fd838ad9d4 docs: update dev env docs to mirror the launcher's install method 2025-01-03 14:27:45 +11:00
psychedelicious
5e9227c052 docs: update manual install docs to mirror the launcher's install method 2025-01-03 14:27:45 +11:00
Kent Keirsey
94785231ce Update href to correct link 2025-01-02 09:39:41 +11:00
Ryan Dick
b46d7abfb0 Partial Loading PR3: Integrate 1) partial loading, 2) quantized models, 3) model patching (#7500)
## Summary

This PR is the third in a sequence of PRs working towards support for
partial loading of models onto the compute device (for low-VRAM
operation). This PR updates the LoRA patching code so that the following
features can cooperate fully:
- Partial loading of weights onto the GPU
- Quantized layers / weights
- Model patches (e.g. LoRA)

Note that this PR does not yet enable partial loading. It adds support
in the model patching code so that partial loading can be enabled in a
future PR.

## Technical Design Decisions

The layer patching logic has been integrated into the custom layers (via
`CustomModuleMixin`) rather than keeping it in a separate set of wrapper
layers, as before. This has the following advantages:
- It makes it easier to calculate the modified weights on the fly and
then reuse the normal forward() logic.
- In the future, it makes it possible to pass original parameters that
have been cast to the device down to the LoRA calculation without having
to re-cast (but the current implementation hasn't fully taken advantage
of this yet).
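
To make "calculate the modified weights on the fly and then reuse the normal forward() logic" concrete, here is a toy version of that pattern. It is a simplification under assumptions (class name, patch representation) and is not the actual `CustomModuleMixin`:

```python
import torch

class PatchedLinear(torch.nn.Linear):
    """Toy illustration of forward-time patching, not InvokeAI code."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._patches = []  # list of (delta_weight, scale) tuples

    def add_patch(self, delta_weight: torch.Tensor, scale: float = 1.0) -> None:
        self._patches.append((delta_weight, scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight
        for delta, scale in self._patches:
            # Cast each patch to wherever the (possibly partially loaded) weight lives.
            weight = weight + scale * delta.to(device=weight.device, dtype=weight.dtype)
        # Reuse the normal functional forward with the patched weight.
        return torch.nn.functional.linear(x, weight, self.bias)
```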

## Known Limitations

1. I haven't fully solved device management for patch types that require
the original layer value to calculate the patch. These aren't very
common, and are not compatible with some quantized layers, so I'm leaving
this for the future if there's demand.
2. There is a small speed regression for models that have CPU
bottlenecks. This seems to be caused by slightly slower method
resolution on the custom layer sub-classes. The regression does not
show up on larger models, like FLUX, that are almost entirely
GPU-limited. I think this small regression is tolerable, but if we
decide that it's not, then the slowdown can easily be reclaimed by
optimizing other CPU operations (e.g. if we only sent every 2nd progress
image, we'd see a much more significant speedup).

## Related Issues / Discussions

- https://github.com/invoke-ai/InvokeAI/pull/7492
- https://github.com/invoke-ai/InvokeAI/pull/7494

## QA Instructions

Speed tests:
- Vanilla SD1 speed regression
    - Before: 3.156s (8.78 it/s)
    - After: 3.54s (8.35 it/s)
- Vanilla SDXL speed regression
    - Before: 6.23s (4.46 it/s)
    - After: 6.45s (4.31 it/s)
- Vanilla FLUX speed regression
    - Before: 12.02s (2.27 it/s)
    - After: 11.91s (2.29 it/s)

LoRA tests with default configuration:
- [x] SD1: A handful of LoRA variants
- [x] SDXL: A handful of LoRA variants
- [x] flux non-quantized: multiple LoRA variants
- [x] flux bnb-quantized: multiple LoRA variants
- [x] flux ggml-quantized: multiple LoRA variants
- [x] flux non-quantized: FLUX control LoRA
- [x] flux bnb-quantized: FLUX control LoRA
- [x] flux ggml-quantized: FLUX control LoRA

LoRA tests with sidecar patching forced:
- [x] SD1: A handful of LoRA variants
- [x] SDXL: A handful of LoRA variants
- [x] flux non-quantized: multiple LoRA variants
- [x] flux bnb-quantized: multiple LoRA variants
- [x] flux ggml-quantized: multiple LoRA variants
- [x] flux non-quantized: FLUX control LoRA
- [x] flux bnb-quantized: FLUX control LoRA
- [x] flux ggml-quantized: FLUX control LoRA

Other:
- [x] Smoke testing of IP-Adapter, ControlNet

All tests repeated on:
- [x] cuda
- [x] cpu (only tested SD1, because larger models are prohibitively slow)
- [x] mps (skipped FLUX tests, because my Mac doesn't have enough memory
to run them in a reasonable amount of time)

## Merge Plan

No special instructions.

## Checklist

- [x] _The PR has a short but descriptive title, suitable for a
changelog_
- [x] _Tests added / updated (if applicable)_
- [x] _Documentation added / updated (if applicable)_
- [ ] _Updated `What's New` copy (if doing a release after this PR)_
2024-12-31 13:58:13 -05:00
Ryan Dick
9a0a226ce1 Fix bitsandbytes imports in unit tests on MacOS. 2024-12-30 10:41:48 -05:00
Ryan Dick
477d87ec31 Fix layer patch dtype selection for CLIP text encoder models. 2024-12-29 21:48:51 +00:00
Ryan Dick
8b4b0ff0cf Fix bug in CustomConv1d and CustomConv2d patch calculations. 2024-12-29 19:10:19 +00:00
Ryan Dick
6fd9b0a274 Delete old sidecar wrapper implementation. This functionality has moved into the custom layers. 2024-12-29 17:33:08 +00:00
Ryan Dick
52fc5a64d4 Add a unit test for a LoRA patch applied to a quantized linear layer with weights streamed from CPU to GPU. 2024-12-29 17:14:55 +00:00
Ryan Dick
a8bef59699 First pass at making custom layer patches work with weights streamed from the CPU to the GPU. 2024-12-29 17:01:37 +00:00
Ryan Dick
6d49ee839c Switch the LayerPatcher to use 'custom modules' to manage layer patching. 2024-12-29 01:18:30 +00:00
Ryan Dick
0525f967c2 Fix the _autocast_forward_with_patches() function for CustomConv1d and CustomConv2d. 2024-12-29 00:22:37 +00:00
Ryan Dick
2855bb6b41 Update BaseLayerPatch.get_parameters(...) to accept a dict of orig_parameters rather than orig_module. This will enable compatibility between patching and cpu->gpu streaming. 2024-12-28 21:12:53 +00:00
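
The signature change described in the last commit can be pictured roughly as follows; only the orig_module-to-orig_parameters shift comes from the commit message, and the class and method names here are illustrative:

```python
import torch

class ExampleLayerPatch:
    """Illustrative only; the real BaseLayerPatch interface may differ."""

    # Before: the patch received the whole module, tying the calculation to
    # wherever that module's parameters happened to live.
    def get_parameters_from_module(self, orig_module: torch.nn.Module) -> dict:
        raise NotImplementedError

    # After: the caller passes only the parameters it has already streamed or
    # cast to the target device, so patching composes with cpu->gpu streaming.
    def get_parameters(self, orig_parameters: dict) -> dict:
        raise NotImplementedError
```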